Tuesday, December 18, 2012

Content Wars: Instagram Privacy update and opportunity for Flickr?

Yesterday there was a lot of activity on Twitter about Instagram's new terms of service (TOS). Instagram plans to productize millions of pieces of user-generated content, namely the beautiful photographs created by users of its free service. Users are strongly protesting the change in TOS and plan to stop using Instagram. Instagram built up its subscriber base by offering a great product packaged as a free service, luring in professional photographers and content publishers to share their work through it. After the acquisition by Facebook, there is tremendous pressure to monetize that content to drive revenue. The crux of the problem is ownership of the content: the content producers are against Instagram exploiting their user-generated content.

This debate raises an interesting question about user content on free services such as Instagram, Flickr and Dropbox that people use to share and collaborate, and it is an important question for the future of the social web. At the end of the day, social media companies need to monetize, and they will use user-generated content to sell ads, analyze patterns and sell data to third-party companies. As consumers of such services, we need to weigh the convenience of these freemium services against the cost to our privacy. The updated TOS from Instagram may open the door for competitors like Flickr that are more consumer-friendly in their service agreements on content use.

Saturday, December 15, 2012

Machine Learning and AI are taking over our lives

Google has hired futurist and artificial intelligence guru Ray Kurzweil as a Director of Engineering. The hire highlights the growing importance of machine learning to the internet, to search and discovery, and to everything from self-driving cars to document analysis. Machine learning is moving in a big way from R&D labs into mainstream consumer and enterprise applications, and 2013 will be even bigger, with machine learning incorporated into many Big Data processes. Kurzweil truly believes that artificial intelligence will grow beyond the capabilities of humans. With massive amounts of computing power available through cloud computing and SSD storage, it's time for machines to step into the mainstream and take over duties performed by humans. It may sound futuristic, but so was the idea of the internet taking over our communication.

Monday, August 15, 2011

Content Analytics: Integrating Natural Language Processing (NLP) in your ECM solutions

Natural Language Processing (NLP) is a branch of Artificial Intelligence that deals with the interaction between computers and human (natural) languages. The recent Jeopardy! match in which IBM's Watson competed against human champions is a great real-life example of an NLP system. NLP technology can be leveraged successfully to solve problems in the unstructured content management space within enterprises.
Enterprises deal with a mountain of unstructured content in their day-to-day business processes. Until recently, organizations ingested this huge volume of data into a traditional content management system such as IBM FileNet or EMC Documentum, indexed with a few fields like account number, name and location. Invariably, most of this indexing is performed manually because of the wildly unstructured nature of the content, and the only searchable information in these content objects is the handful of index fields. This way of dealing with content is no longer sufficient for today's enterprises, which need their content to be in motion and widely used across all departments.
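To make the limitation concrete, here is a minimal sketch of field-only search; the field names and sample records are invented purely for illustration. A query can match the keyed metadata but never the body text of the document.

    # Hypothetical records: a handful of manually keyed index fields plus the
    # body text, which is stored but not indexed.
    documents = [
        {"account_number": "100234", "name": "Jane Smith", "location": "Austin",
         "body": "Customer disputes the late fee applied on the March statement."},
        {"account_number": "100871", "name": "Raj Patel", "location": "Chicago",
         "body": "Request to update beneficiary details on the retirement account."},
    ]

    def search_by_index_fields(docs, **criteria):
        """Return documents whose index fields match every supplied criterion."""
        return [d for d in docs
                if all(d.get(field) == value for field, value in criteria.items())]

    # Works: the query hits a keyed field.
    print(search_by_index_fields(documents, account_number="100234"))

    # Finds nothing useful: "late fee" lives only in the unindexed body text,
    # so a field-based search cannot surface it.
    print(search_by_index_fields(documents, body="late fee"))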


To address this ever-growing problem, my organization AlphaCloud Labs decided to embrace NLP using machine learning. The machine-learning paradigm differs from most prior attempts at language processing, which typically involved directly hand-coding large sets of rules. Machine learning instead uses general learning algorithms (often, although not always, grounded in statistical inference) to automatically learn such rules through the analysis of large corpora of typical real-world examples. A corpus (plural "corpora") is a set of documents (or sometimes individual sentences) that have been hand-annotated with the correct values to be learned.
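As a minimal sketch of this paradigm, the snippet below trains a toy text classifier on a tiny hand-labeled corpus. It assumes the open-source scikit-learn library is available, and the example sentences and labels are invented purely for illustration.

    # The learning algorithm induces its own statistical "rules" from the
    # annotated examples instead of relying on hand-coded rules.
    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.naive_bayes import MultinomialNB
    from sklearn.pipeline import make_pipeline

    corpus = [
        "Please close my savings account effective immediately.",
        "I would like to dispute the charge on my last statement.",
        "Attached is the signed loan application form.",
        "The interest charge on my card looks incorrect.",
    ]
    labels = ["account_service", "dispute", "loan", "dispute"]

    model = make_pipeline(CountVectorizer(), MultinomialNB())
    model.fit(corpus, labels)

    # Classify a new, unseen sentence; on this toy corpus it lands in "dispute".
    print(model.predict(["I want to dispute a fee on my statement."]))

In a real deployment the corpus would be thousands of documents annotated by subject-matter experts, but the workflow is the same: annotate, train, then classify incoming content automatically.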


Our NLP solutions address the following requirements:

  • Information retrieval (IR): This is concerned with storing, searching and retrieving information. It is a separate field within computer science (closer to databases), but IR relies on some NLP methods (for example, stemming). Some current research and applications seek to bridge the gap between IR and NLP.
  • Information extraction (IE): This is concerned in general with the extraction of semantic information from text. This covers tasks such as named entity recognition, coreference resolution, relationship extraction, etc.
  • Question answering: Given a human-language question, determine its answer. Typical questions have a specific right answer (such as "What is the capital of Canada?"), but sometimes open-ended questions are also considered (such as "What is the meaning of life?").
  • Automatic summarization: Produce a readable summary of a chunk of text. Often used to provide summaries of text of a known type, such as articles in the financial section of a newspaper.
  • Named entity recognition (NER): Given a stream of text, determine which items in the text map to proper names, such as people or places, and what the type of each such name is (e.g. person, location, organization); see the sketch after this list.
  • Optical character recognition (OCR): Given an image representing printed text, determine the corresponding text.
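To make one of these tasks concrete, here is a minimal sketch of named entity recognition using the open-source NLTK toolkit (my choice for illustration; any NLP library with a pre-trained NER model would serve equally well).

    import nltk

    # One-time downloads of the tokenizer, part-of-speech tagger and NE chunker
    # models (resource names may vary between NLTK versions).
    for pkg in ("punkt", "averaged_perceptron_tagger", "maxent_ne_chunker", "words"):
        nltk.download(pkg, quiet=True)

    sentence = "IBM's Watson competed on Jeopardy in New York against human champions."

    tokens = nltk.word_tokenize(sentence)   # split into words
    tagged = nltk.pos_tag(tokens)           # add part-of-speech tags
    tree = nltk.ne_chunk(tagged)            # group tokens into named entities

    # Collect (entity text, entity type) pairs such as ('IBM', 'ORGANIZATION').
    entities = [(" ".join(word for word, tag in subtree.leaves()), subtree.label())
                for subtree in tree.subtrees() if subtree.label() != "S"]
    print(entities)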
As you can infer from the above, NLP can be widely leveraged to solve unstructured information and content analytics use cases in an enterprise. Typical use cases include risk management in financial services and insurance, electronic medical records in health care, and sentiment analysis for social media and brand management. Our NLP solution is built on an open-standards architecture that can be easily integrated into existing content management systems such as FileNet, Content Manager, SharePoint and Documentum. One of our clients recently asked us to analyze 5 TB of image data and provide a searchable interface for finding patterns and meaning in the documents. We are seeing a growing trend toward this kind of content analytics targeting unstructured content.
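As a rough illustration of that kind of engagement, the sketch below OCRs a folder of scanned images and builds a simple in-memory inverted index for keyword search. It assumes the pytesseract and Pillow packages are installed, and the directory path is hypothetical; a production system would lean on the OCR and indexing services of the underlying content platform.

    import os
    from collections import defaultdict

    from PIL import Image
    import pytesseract

    image_dir = "/data/scanned_documents"    # hypothetical location
    inverted_index = defaultdict(set)        # term -> set of file paths

    for name in os.listdir(image_dir):
        if not name.lower().endswith((".png", ".tif", ".tiff", ".jpg")):
            continue
        path = os.path.join(image_dir, name)
        text = pytesseract.image_to_string(Image.open(path))   # OCR the page
        for term in set(text.lower().split()):
            inverted_index[term].add(path)

    def search(term):
        """Return the image files whose OCR'd text contains the given term."""
        return sorted(inverted_index.get(term.lower(), set()))

    print(search("invoice"))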

I hope this post has provided an introduction to how organizations can plan and create a strategy to leverage NLP for content analytics.




Friday, July 8, 2011

Content at Rest vs Content in Motion - What's good for your enterprise?

Enterprises without a clear information management strategy invariably manage unstructured content by storing it in file systems, mail servers and desktops. These content objects are rarely searched or accessed, and once an employee switches jobs, the device and user account hosting the content are wiped as part of the off-boarding process, leading to information destruction and loss of data. This is the typical life cycle of content at rest: Content at Rest = Risk + Static Enterprise.

On the contrary, agile enterprises with a clear information strategy keep their content objects in motion. Content in motion supports business intelligence and analytics, enables collaboration and workflow, and delivers true customer service. Such organizations lead the market in innovation, rank high in customer satisfaction and are forward-looking in terms of returns to shareholders: Content in Motion = Reward + Agile Enterprise. Craig Rhinehart from IBM explored the content at rest vs. content in motion theme in detail in his blog post: http://craigrhinehart.wordpress.com/2011/05/26/content-at-rest-or-content-in-motion-which-is-better/

Organizations can transform Content at Rest to Content in Motion through the following actions:

  1. Identify Content: Build an inventory of your current content objects by systematically identifying them across all departments (see the sketch after this list).
  2. Defensibly Dispose: Dispose of content objects that are erroneous, duplicated or unused. Examples include stranded SharePoint sites, legacy file formats and documents that are no longer valid. Ensure that you keep objects relevant to current business processes and to legal and compliance requirements.
  3. Content Analytics: Work with your line-of-business leaders to study and analyze the content, understand trends and patterns, and incorporate the results into the decision-making process.
  4. Customer Service: Ensure that customer service associates are brought into the loop to use this intelligent content in their day-to-day activities.
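As a minimal sketch of step 1, the script below walks a few hypothetical departmental file shares and writes a simple inventory of content objects; the size, age and format columns then become inputs to the disposal and analytics steps.

    import csv
    import os
    from datetime import datetime

    roots = ["/shares/finance", "/shares/hr", "/shares/legal"]   # hypothetical shares

    with open("content_inventory.csv", "w", newline="") as out:
        writer = csv.writer(out)
        writer.writerow(["path", "size_bytes", "last_modified", "extension"])
        for root in roots:
            for dirpath, _, filenames in os.walk(root):
                for name in filenames:
                    path = os.path.join(dirpath, name)
                    stat = os.stat(path)
                    writer.writerow([
                        path,
                        stat.st_size,
                        datetime.fromtimestamp(stat.st_mtime).isoformat(),
                        os.path.splitext(name)[1].lower(),
                    ])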
The steps above need to become a repeatable process within organizations to put content into motion. I hope it's obvious by now that content in motion is always the winner in any organization. I have borrowed ideas for this post from Craig Rhinehart of IBM and would like to thank him for his excellent blog post. I hope this encourages more organizations to put their content into motion. Watch this space for my next post on content analytics using machine learning and natural language processing to get content moving.

Saturday, November 27, 2010

Why is an ECM strategy important for all enterprises?

Welcome to the new enterprise, Enterprise 2.0. Knowledge workers in today's enterprises create unstructured content at every step of the business process. Unstructured content is the daily email, chat, IM, documents, images, web content, blogs, tweets, PowerPoint presentations, videos, music and many other files constantly created and stored on PCs, mobile devices and e-readers. Surveys conducted across enterprises project unstructured content to be the most difficult to manage and the fastest-growing category of data within an organization. The value of the information in this content directly affects revenue, process efficiency (cost savings), and legal and compliance exposure, and in almost all cases it delivers business benefit.

This creates several challenging problems for the enterprise, which lead to several questions: How can this content be managed? How can it be processed to extract meaningful information that helps the business? What is the business impact if the content is lost or destroyed? What does it cost to store the content on PCs and mobile devices long term? Every CIO and IT department has to answer these questions in the age of Enterprise 2.0, which leads directly to the topic of our discussion: why is an ECM strategy important for all enterprises?

ECM is the strategies, methods and tools used to capture, manage, store, preserve and deliver content and documents related to organizational processes. ECM covers the management of information within the entire scope of an enterprise, whether that information is a paper document, an electronic file, a database print stream or even an email. As you can infer from the very definition, ECM is needed for Enterprise 2.0 to deliver innovation, create smart business processes through workflow optimization and protect the organization through the legal discovery process, in addition to enabling collaboration across the enterprise. Yet only 10% of enterprises across the globe have an active ECM strategy. The main barriers to ECM deployment have been cost, followed by a lengthy return on investment. With the emergence of quality open-source, open-standards products, entry costs have dropped dramatically in most cases. Enterprises and governments should look at open-source ECM as an alternative path to adopting this enabling technology.