Google Base: the rise of the semantic web?

Posted on Tue 22 November 2005
Google Base introduces structured data to web search engines, which until now have had mostly raw text to base their search results on. A few days ago, I was chatting with Bertrand about how GBase could be tied to the search engine simply by adding a link to web pages:
<link 
  rel="alternate" type="application/googlebase+xml" 
  title="Metadata about this page" 
  href="http://somewhere-on-the-web.com/page224/metadata.xml"/>
Bill Burnham had the same thoughts, but he expects Google to ask people to register their RSS feeds with Google Base. Why would that be needed? Google already crawls the whole web. Or maybe that would simply be a helper for the crawler, just as you can help it today by registering a sitemap.
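For reference, a sitemap is nothing more than an XML file listing the URLs you'd like crawled, registered through Google Sitemaps. A minimal sketch (the URL and date below are made up for illustration):
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.google.com/schemas/sitemap/0.84">
  <!-- one <url> entry per page we want the crawler to visit -->
  <url>
    <loc>http://somewhere-on-the-web.com/page224/</loc>
    <lastmod>2005-11-22</lastmod>
  </url>
</urlset>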

Some of the problems that prevent the semantic web from emerging are the lack of available semantic data and the huge processing power needed to crunch that data once it becomes available. With Google Base, people will happily publish their structured data, and the Google search engine will crunch it just as it routinely crunches the whole web.
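To give an idea, a Google Base bulk upload is reported to be a plain RSS 2.0 feed extended with typed attributes in a g: namespace. Here's a sketch of what publishing an item could look like (the item and attribute names are made up for illustration and may not match exactly what Google requires):
<?xml version="1.0"?>
<rss version="2.0" xmlns:g="http://base.google.com/ns/1.0">
  <channel>
    <title>My items</title>
    <link>http://somewhere-on-the-web.com/</link>
    <description>Structured items for Google Base</description>
    <item>
      <!-- the usual RSS fields... -->
      <title>Vintage bicycle</title>
      <link>http://somewhere-on-the-web.com/items/42</link>
      <!-- ...plus structured, machine-readable attributes -->
      <g:item_type>Products</g:item_type>
      <g:price>120 EUR</g:price>
      <g:condition>used</g:condition>
    </item>
  </channel>
</rss>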

This isn't without problems, however:
  • Many people will invent their own data structures, making the semantic meaning of the information difficult to analyze (see the sketch after this list)
  • Google will have all the earth's information in its hands. Sure, their motto is "don't be evil", but can we expect that to hold forever for a business entity? It looks to me like they'll have to open up or share their data in some way, at least with some international organizations (the U.N.?), or else they risk antitrust action in the long term. Now if they just crawl data (using links like the one described above) and don't require it to be uploaded, other engines can use that data as well, limiting the monopoly problem to one of processing power.
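To illustrate the first point above, here are two made-up ways of writing the very same fact; nothing tells a search engine that these two structures carry the same meaning:
<!-- publisher A describes a price like this... -->
<item>
  <price currency="EUR">120</price>
</item>

<!-- ...while publisher B invents another structure for it -->
<item>
  <cost>120 euros</cost>
</item>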
Interesting times ahead...

