Native Apps At The Client & Cloud

Srinivasan Sundara Rajan




Improving the Productivity of Knowledge Workers

Importance of automated content classification

What Is Content Classification
The term content classification is best understood in an enterprise information context, defined by the following concepts.

Taxonomy is the hierarchical representation of topics of interest. For example, a basic taxonomy might consist of a class called "Transport," which might have subclasses "Air Transport" and "Land Transport." Then "Land Transport" might in turn have subclasses "Bus" and "Car." This hierarchy means that a "Car" is a type of "Land Transport," and is also a type of "Transport."
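The "Transport" hierarchy above can be sketched in a few lines of Python. This is only an illustration, assuming a taxonomy stored as a child-to-parent map; the names `PARENT`, `ancestors` and `is_a` are hypothetical and do not come from any product discussed in this article.

```python
# Hypothetical representation: each topic maps to its parent (None for the root).
PARENT = {
    "Transport": None,
    "Air Transport": "Transport",
    "Land Transport": "Transport",
    "Bus": "Land Transport",
    "Car": "Land Transport",
}


def ancestors(topic):
    """Return the broader topics of `topic`, nearest first."""
    chain = []
    parent = PARENT[topic]
    while parent is not None:
        chain.append(parent)
        parent = PARENT[parent]
    return chain


def is_a(topic, candidate):
    """True if `topic` is `candidate` itself or one of its descendants."""
    return topic == candidate or candidate in ancestors(topic)
```

Here `ancestors("Car")` walks up through "Land Transport" to "Transport", which is exactly the "is a type of" chain described in the text.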

Ontology defines the relationships between the topics of interest.

Content classification is the process of analyzing a document and adding metadata 'tags' that describe it, where the tags are drawn from a taxonomy or another form of controlled vocabulary.
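As a rough illustration of tagging against a controlled vocabulary, the sketch below assigns taxonomy terms to a document when any of their trigger words appears in the text. The vocabulary, the trigger words and the `classify` function are assumptions made for this example; real classification engines use far richer rules and linguistics.

```python
import re

# Hypothetical controlled vocabulary: taxonomy term -> trigger words.
VOCABULARY = {
    "Bus": {"bus", "buses", "coach"},
    "Car": {"car", "cars", "sedan"},
    "Air Transport": {"flight", "flights", "airline", "aircraft"},
}


def classify(text):
    """Return the sorted taxonomy terms whose trigger words occur in `text`."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    return sorted(
        term for term, triggers in VOCABULARY.items() if words & triggers
    )
```

Because the tags can only come from `VOCABULARY`, two documents that mention "sedan" and "cars" both end up tagged "Car", which is what makes the resulting metadata searchable in a consistent way.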

Content Classification in Enterprises
Today's enterprises deal with data of which 80% is unstructured. This massive amount of unstructured data holds a tremendous amount of intelligence and insight. However, most enterprises depend on their information workers' knowledge to bring meaning to it. In most enterprises, relevance is entirely subjective to the individual performing the search: only that individual can judge how relevant a particular piece of information is to what they are trying to discover.

Evidently, enterprises need to augment their knowledge workers with insights that go beyond human expertise, so that the workers can find and analyze the topics of interest and enrich them further for end customers.

A typical enterprise application of content classification is the analysis of warranty claims and customer complaints in order to improve product quality. Without automated classification, this analysis runs into several problems:

  • Problems occur in different geographies, and the same problem is described differently in each
  • Most of the time, problems are not grouped into a larger category because the problem area lacks a taxonomy
  • Problems cannot be associated with each other automatically, since human intervention is needed to link two similar problems
  • This results in lost opportunities to identify the true problem areas, or in wrongly classified problems, ultimately hurting product quality and leading to product recalls and lost market share.

The following are some of the vendors and products that support content classification. Adopting these or similar products helps enterprises make the best use of their information workers while improving their productivity and reducing manual work. It also lets enterprises keep their core knowledge in automated business-rule processing systems rather than only in human minds.

Smartlogic Semaphore Content Intelligence Platform
Semaphore, the Content Intelligence Platform from Smartlogic, works with an enterprise's existing search and content management systems. It organizes business-critical content by automatically tagging and categorizing it, enabling precise searching, guided navigation, and effective management and governance.

Semaphore consists of four core modules:

  1. Ontology Server & Manager - allows multiple users to collaborate on the development and management of ontologies which capture the essential topics, resources and vocabulary for the business.
  2. Advanced Linguistics Pack - provides text mining and entity extraction based on part-of-speech tagging.
  3. Classification Server - a rules-based semantic classification engine providing accurate metadata tagging of content in 26 languages.
  4. Semantic Enhancement Server / Search Application Framework - enhances search engines (e.g., Microsoft SharePoint Search, Microsoft FAST, Lucene/Solr, Google Search Appliance).

Built on these core modules, Semaphore is an enterprise semantic platform that captures an organization's subjects and topics in a taxonomy or ontology model. It enhances traditional information management systems such as search, content management and business workflow engines by adding advanced content classification, metadata enrichment, and navigation capabilities, delivering a more complete enterprise information management experience.

Additional information about the product is available on the Smartlogic website.

IBM ECM - Classification Module
IBM's content management portfolio has been extended with features for content classification. Documents can be categorized by using IBM Classification Module. The Classification Module annotator uses the capabilities of Classification Module to classify content into categories and generate metadata that can be used for facets or keywords in Content Analytics.

Much like the Smartlogic platform, IBM Classification Module has the following core components:

  • Classification Workbench:
      - Taxonomy Proposer
  • Classification Module server:
      - Management Console
      - Client APIs
  • IBM FileNet P8 integration asset:
      - Classification Center
      - Content Extractor

The Taxonomy Proposer, which is installed with the Classification Workbench, allows you to discover new categories in an uncategorized or partially categorized body of documents. The Taxonomy Proposer uses custom clustering algorithms to analyze and group similar documents to help you to create a taxonomy for your content.
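To give a feel for how grouping similar documents can suggest candidate categories, here is a minimal sketch. It is not IBM's algorithm, which the source describes only as "custom clustering algorithms"; it swaps in a simple greedy clustering based on Jaccard word overlap, and the function names and the `threshold` value are assumptions chosen for illustration.

```python
import re


def tokens(text):
    """Lowercased set of words in a document."""
    return set(re.findall(r"[a-z]+", text.lower()))


def jaccard(a, b):
    """Word-set overlap between 0.0 (disjoint) and 1.0 (identical)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def cluster(documents, threshold=0.3):
    """Greedy single-pass clustering.

    A document joins the first cluster whose first member is at least
    `threshold` similar to it; otherwise it starts a new cluster.
    Returns a list of clusters, each a list of document indices.
    """
    clusters = []
    for i, doc in enumerate(documents):
        words = tokens(doc)
        for group in clusters:
            if jaccard(words, tokens(documents[group[0]])) >= threshold:
                group.append(i)
                break
        else:
            # No existing cluster was similar enough.
            clusters.append([i])
    return clusters
```

Each resulting cluster is a candidate category that an analyst could then name and place in the taxonomy, for example grouping brake-related complaints apart from radio-related ones.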

As is typical for IBM products, Redbooks and other materials are available for readers who want to go deeper. Further information can be found on the IBM website.

Most enterprises want to differentiate themselves by providing a unique value proposition for their information delivery. Enterprises invest heavily in knowledge workers or information analysts to provide meaningful insight into their unstructured data; however, this process is not repeatable and is prone to failure. Automated content classification solutions such as those identified above will enable enterprises to be more efficient.
