The deal with Google buying reCAPTCHA

Google acquires reCAPTCHA
Image by Fenng (dbanotes) via Flickr

I had no idea why Google would buy reCAPTCHA, a company that makes captchas. For those of you who don’t know, captchas are the little squiggly words that people type in to prove they are human. The word “captcha” actually stands for “Completely Automated Public Turing test to tell Computers and Humans Apart.”

With captchas all over the internet, why would Google buy this specific company? I found a few reasons. First, they are the original gangsters: it turns out the guy who invented captchas is the founder of reCAPTCHA. Second, the way they generate the captcha words is quite innovative. Check this out: reCAPTCHA takes words from scanned newspaper clippings, articles and old books that optical character recognition (OCR) software could not read, then shows them to humans one at a time in a captcha, each paired with a word it already knows. The user types in both words. Only the known word is checked; if the user gets it right, reCAPTCHA keeps their answer for the unknown word and learns a new word it can use in other challenges. This is how it builds up its database of words from scans.
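The check described above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not reCAPTCHA's actual code: the `Challenge` class, the `check_answer` function and the in-memory `learned` dictionary are hypothetical names, and the case-insensitive comparison is an assumption.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Challenge:
    # Hypothetical structure: one word whose text is already known
    # (the control word) and one scanned word that OCR could not read.
    control_image_id: str
    control_text: str
    unknown_image_id: str


def check_answer(challenge: Challenge, control_guess: str,
                 unknown_guess: str, learned: dict[str, list[str]]) -> bool:
    """Return True if the user passes, recording their reading of the unknown word.

    Only the control word is graded. If it matches, the user is treated as
    human and their answer for the unknown word is stored as a candidate
    transcription of that scan.
    """
    # Assumption: comparison ignores case and surrounding whitespace.
    if control_guess.strip().lower() != challenge.control_text.lower():
        return False
    learned.setdefault(challenge.unknown_image_id, []).append(
        unknown_guess.strip().lower())
    return True


learned = {}
c = Challenge("img-001", "morning", "scan-042")
check_answer(c, "Morning", "upon", learned)  # passes; "upon" recorded for scan-042
check_answer(c, "mornig", "upon", learned)   # fails; nothing recorded
```

Grading only the control word is what lets the system accept text it cannot verify itself: the user's answer for the unknown word is trusted only as far as their answer for the known one.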

For the past six years, Google has been scanning books like crazy, and it now has millions of them. What it doesn’t have is searchable text for all of those books. The idea is that using captchas to surface the words OCR can’t read, one at a time, turns captcha solving into a massive crowdsourcing project to build a database of literature. It’s a very interesting experiment. I rarely hear of such clever business development deals. I love it.
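The crowdsourcing side can be sketched as a vote count per scanned word: each accepted answer counts as one vote, and a transcription goes into the database only once enough users agree. The `transcribe` function and the threshold of three matching answers are assumptions for illustration; the post does not say how reCAPTCHA decides that a word is solved.

```python
from __future__ import annotations

from collections import Counter


def transcribe(answers: dict[str, list[str]], min_votes: int = 3) -> dict[str, str]:
    """Turn crowdsourced answers into a word database.

    `answers` maps each scanned-word id to the readings users typed for it.
    The most common reading is accepted only if at least `min_votes` users
    typed it (hypothetical rule); unresolved scans are left out so they can
    be shown to more users.
    """
    solved = {}
    for scan_id, readings in answers.items():
        if not readings:
            continue
        word, votes = Counter(readings).most_common(1)[0]
        if votes >= min_votes:
            solved[scan_id] = word
    return solved


answers = {
    "scan-042": ["upon", "upon", "upon", "span"],
    "scan-043": ["harbour", "harbor"],
}
print(transcribe(answers))  # {'scan-042': 'upon'}
```

Requiring agreement from several people guards against a single careless or malicious user poisoning the database, at the cost of showing each scan more than once.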
