Google now crawling forms

Posted on Sun 04 May 2008

A few days ago, I started receiving Google alerts about me (yes, I follow what is said about me on the web) that linked to search results pages on my own blog, with strange query terms such as "steve", "near", "idea" or "known".

Why would such searches show up in Google? Who would link from their website to the search results for these weird words? I tried to find the linking pages, but failed.

Today I finally found the answer: Google is now crawling through forms, by filling inputs with words they find on the page containing the form. The intent is to crawl the "deep web" that is not normally accessible through regular links.
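To make the idea concrete, here is a minimal sketch of how a crawler could turn the words on a page into form submissions. It is only an illustration of the principle, not Google's actual algorithm: the function name `candidate_queries`, the four-letter minimum and the "most frequent words first" rule are all my own assumptions, and it only handles a form submitted with GET.

```python
import re
from collections import Counter
from urllib.parse import urlencode


def candidate_queries(page_text, action_url, field_name, limit=5):
    """Build GET URLs that submit a search form with words found on the page.

    Hypothetical heuristic: keep lowercase words of 4+ letters and try
    the most frequent ones first.
    """
    words = re.findall(r"[a-z]{4,}", page_text.lower())
    most_frequent = [word for word, _ in Counter(words).most_common(limit)]
    # urlencode takes care of escaping the value for use in a query string.
    return [f"{action_url}?{urlencode({field_name: word})}" for word in most_frequent]


if __name__ == "__main__":
    text = "Steve had an idea. Steve is known near here; idea!"
    for url in candidate_queries(text, "/search", "q", limit=2):
        print(url)
```

Each URL produced this way looks like an ordinary link to a results page, which is why such pages can end up in the index even though nobody ever linked to them.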

I'm not sure how effective this new feature will be, since every web site whose product catalog seems to be reachable by humans only through forms already has semi-hidden link pages to feed the search engines. But it will certainly bring to the surface a lot of information that people thought was more or less hidden (not to say protected) behind a form.

Give a box full of keys to a monkey, and it will probably find one that opens your door if you have a weak lock! Inspecting my server logs shows that the current form crawler is not really different from a monkey: it tries lots of very common words, 100 per day.
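If you want to check your own logs for the same behaviour, something along these lines should do. It is a rough sketch under several assumptions: the access log is in the usual Apache/nginx "combined" format, the search form submits with GET, the search field is named `q`, and the crawler identifies itself with "Googlebot" in its user agent. The function name `googlebot_search_terms` is made up for this example.

```python
import re
from collections import Counter
from urllib.parse import parse_qs, urlsplit

# Matches the request part of a combined-format log line, e.g. "GET /search?q=x HTTP/1.1"
REQUEST = re.compile(r'"GET (\S+) HTTP/[\d.]+"')


def googlebot_search_terms(log_lines, param="q"):
    """Count the search terms submitted by requests whose line mentions Googlebot."""
    terms = Counter()
    for line in log_lines:
        if "Googlebot" not in line:
            continue
        match = REQUEST.search(line)
        if not match:
            continue
        query = parse_qs(urlsplit(match.group(1)).query)
        for term in query.get(param, []):
            terms[term] += 1
    return terms


if __name__ == "__main__":
    # Hypothetical log file name; adjust to your server's configuration.
    with open("access.log", encoding="utf-8", errors="replace") as log:
        for term, count in googlebot_search_terms(log).most_common(20):
            print(count, term)
```

Matching on the user agent string alone can be fooled by anyone pretending to be Googlebot, so treat the counts as an indication rather than proof.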

Now the nice thing is that, according to Google's explanation, my blog is a "high-quality site" :-)


