Chris Brogan got me thinking about information filtering in this age of information overload. As I thought about it, I wondered what the leap from an abacus to a typewriter must have been like in terms of the ability to express information. The current information access and filtering issue is analogous in some ways.
By IDC estimates, 80% of digital content is created by individuals and 70% of content is unstructured (this roughly means too complex to fit neatly into one database field). Thinking about those numbers, it makes sense - most content that we absorb is language-, music-, or graphics-based - not data or individual words.
The enterprise software market has excelled at solving data-based problems: accounting, supply-chain, and financial workflows can be largely automated through collecting, managing, and producing data. While those solutions have taken decades to evolve, in many ways they solve the 'easy' information problems.
Now that we have developed the abacus... how do we get to the typewriter? The typewriter is where systems understand language, meaning, and tone and can 'converse' with the end user to narrow in on the information, people, or conversations they need. This is where search and fuzzy matching technologies come in. Search technologies are quite varied, though, and they base their matching on a variety of methodologies depending on the need and approach. Keyword matching, link analysis, and 'most viewed' are some basic approaches. As you move up the complexity scale there is categorization, machine learning, natural language processing, text analysis, stemming, co-occurrence, semantic analysis, and sentiment extraction. It's enough to make one's head spin, but these technologies are becoming more and more important as the amount of digital content expands. All discovery applications - advertising platforms, recommendation tools, merchandising tools, and online intelligence applications - need search and matching technologies at their core.
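To make the bottom of that complexity scale a little more concrete, here is a rough Python sketch of keyword matching with a very crude form of stemming layered on top. Everything in it is my own illustration rather than how any particular search product works: the function names (`crude_stem`, `tokenize`, `score`, `rank`), the suffix list, and the sample documents are all assumptions, and real engines use far more careful stemmers and ranking formulas.

```python
import re
from collections import Counter

# Hypothetical suffix list for illustration only; a real stemmer
# (Porter, Snowball, etc.) handles many more cases and exceptions.
SUFFIXES = ("ing", "ed", "es", "s")


def crude_stem(word):
    """Strip one common suffix if at least three letters remain."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def tokenize(text):
    """Lowercase the text, pull out alphabetic words, and stem each one."""
    return [crude_stem(w) for w in re.findall(r"[a-z]+", text.lower())]


def score(query, document):
    """Count how often the (stemmed) query terms occur in the document."""
    doc_counts = Counter(tokenize(document))
    return sum(doc_counts[term] for term in set(tokenize(query)))


def rank(query, documents):
    """Return the documents that match at all, highest score first."""
    scored = [(score(query, doc), doc) for doc in documents]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for s, doc in scored if s > 0]


if __name__ == "__main__":
    docs = [
        "Filtering blogs by topic",
        "Quarterly accounting report",
        "A filter for blog feeds",
    ]
    for doc in rank("blog filters", docs):
        print(doc)
```

With the stemming step, a query for "blog filters" matches both "Filtering blogs" and "filter for blog", which plain keyword matching on the exact words would miss. It still has no notion of meaning or tone, which is roughly the gap the more complex techniques are trying to close.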
We've all learned how the abacus works, and now we have some typewriters, but we are just beginning to see how they can be used for information discovery and filtering. As this technology has become more important, it is not hard to understand why Microsoft recently bought FAST.
I believe we are only beginning to see the solutions that are possible using search and matching engines. Please drop a note if you have seen any particularly interesting deployments, examples, or technologies in this space.