Seth Grimes, an analyst specialising in business intelligence and text analysis, gave a fascinating presentation – “an introduction to the semantic web and text-mining” – last week in London.
What is “the semantic web”?
When you look up a key word or phrase on Google, the search engine returns content on the basis of the frequency of those words within the text and the links to it from other sites, among other things. The semantic web takes that concept further, returning content by recognising not only the frequency of the words and calibre of the links, but also the context of the request. In short, the semantic web aims to understand user searches in a more human way, adding context to queries.
Seth kicked off with an article by Hans Peter Luhn in the IBM Journal of 1958, in which Luhn, the pioneer of information services, complains that “no attention is paid to the logical and semantic relationship the author has established”.
Seth argues that even then Luhn foresaw a time when “sense making” would matter:
Statistical information derived from word frequency and distribution is used by the machine to compute a relative measure of significance
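To make Luhn's idea of frequency-based significance concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not Luhn's exact 1958 procedure: the stop-word list, the function names and the sample text are all hypothetical, and a sentence's score is simply the sum of the document-wide counts of its words.

```python
import re
from collections import Counter

# Small illustrative stop-word list (an assumption, not Luhn's own list).
STOP_WORDS = {"the", "a", "an", "of", "and", "is", "to", "in", "on", "it", "by"}


def word_frequencies(text):
    """Count how often each non-stop word appears in the text."""
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(w for w in words if w not in STOP_WORDS)


def score_sentences(text):
    """Score each sentence by the summed document-wide frequency of its words."""
    freqs = word_frequencies(text)
    sentences = [s.strip() for s in re.split(r"[.!?]+", text) if s.strip()]
    scored = []
    for sentence in sentences:
        words = re.findall(r"[a-z']+", sentence.lower())
        score = sum(freqs[w] for w in words if w not in STOP_WORDS)
        scored.append((score, sentence))
    # Highest-scoring (most "significant") sentences come first.
    return sorted(scored, reverse=True)


# Hypothetical sample text, invented for this sketch.
sample = (
    "Search engines count words. "
    "Words that recur across a text signal its topic. "
    "The weather was pleasant."
)
ranked = score_sentences(sample)
```

The sentence built from recurring words ranks first and the off-topic one last. Note what the score cannot see: it favours longer sentences and knows nothing about what the words mean, which is the gap Luhn was pointing at.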
Today that need for sense among all the disorder of content is even greater with the “unstructured data challenge”, as Seth called it, of blogs, emails, surveys and office documents.
Ever more relevant today
Seth used, as an example, a Twitter application called twitrratr, which runs “sentiment analysis” on tweets. But, taking the word “kind” as an example, he showed how difficult that is in English, where a single word can carry several meanings.
“Is search up to the job?” he asked.
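The problem with “kind” is easy to reproduce. The sketch below is a deliberately naive keyword scorer in Python; the lexicon, the function name and the sample tweets are all invented for illustration, and this is not a description of how twitrratr itself is implemented.

```python
import re

# Tiny hypothetical sentiment lexicon: +1 for positive words, -1 for negative.
LEXICON = {"kind": 1, "great": 1, "awful": -1, "hate": -1}


def naive_sentiment(tweet):
    """Sum the lexicon scores of the words in a tweet, ignoring context."""
    words = re.findall(r"[a-z']+", tweet.lower())
    return sum(LEXICON.get(w, 0) for w in words)


# Invented examples: "kind" is an adjective, a noun, then part of "kind of".
tweets = [
    "She was so kind to me today",
    "What kind of service is this",
    "I kind of hate this phone",
]
scores = [naive_sentiment(t) for t in tweets]
```

Only the first tweet uses “kind” as praise, yet the second is scored just as positively, and in the third the stray “kind” cancels out “hate” and makes a complaint look neutral. Telling these uses apart requires the context around the word, not just the word itself.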
Only if it provides content semantically enriched with linked data, that is context sensitive and location aware.
And the sooner media companies get in on the act, the better.
Seth quoted from a survey by IDC to show just how little those who are responsible for content benefit from it.
The broadcast, media and entertainment industries garner about 4% of the world’s revenues but already generate, manage, or otherwise oversee 50% of the digital universe
Finally Seth went on to push text mining on sites using automatic content categorization, text augmentation and information extraction (disclosure: the presentation was sponsored by text mining platform Nstein). Citing a study he had published (partly funded by Nstein), “Text Analytics 2009: user perspective on solutions and providers”, he argued that the market was worth $350 in 2008 and due to increase by 25% in 2009.
Seth’s own research showed…
Yet, surprisingly, when clients were asked about the relative importance of several online qualities, they placed content management a lowly fifth, below brand values.
But, interestingly, clients were more likely now to analyse social media content than traditional news articles.
Is B2B media ready to exploit the semantic web?
B2B media is the opposite of mass media: the former is a mass of sites for small but very well defined communities, the latter a few big sites for millions of people. Indeed, B2B sites do not want millions of the wrong people coming to their sites but rather a few of the right people. Users’ familiarity with an existing print brand and their social media activity both help to refine who gets to the sites. But key to this refinement is search.
Take SHP, a B2B site for safety and health professionals, as an example. It boasts articles on stress. We do not want millions of people finding the site because they are Googling the word “stress”. What we do want is for safety and health practitioners who are searching for phrases such as “stress in factories in northern England” to find the site. The more complex the keyword phrase used to get to a B2B site, the more qualified the user.
The issue for us, therefore, is that our more qualified readers have for years been making choices informed by the very things Seth describes: linked data, context and even location awareness. How then will the semantic web benefit them? In fact, does the semantic web have something to learn from B2B?