But, I Babble: Bafflegab!: Captcha? What's that?

Monday, May 24, 2010

Captcha? What's that?

Captchas - just what are they? The coinage is slowly moving into the mainstream, but there are a number of people who do not know what they are or understand how they work. I mentioned using them in the post below, so it's worth spending a brief moment on this topic...

The intent of a Captcha is to prevent automated systems from scouring the web, scraping up information as they go, or posting spam en masse. The idea is that, if human intervention can be imposed, we can slow down the vile intents of these systems.

But first, a brief word about the word itself. Strictly speaking, it should be rendered in all caps, because it supposedly is a backronym for Completely Automated Public Turing test to tell Computers and Humans Apart. Personally, I find the all-cap rendering ugly, so I've resorted to initial-capping it.

Computers are not very good at interpreting images. A Captcha is an image of (usually) a series of letters and/or numbers, which then have to be interpreted by human eyes, and re-keyed manually. Current technology is not at the point where machines can process these efficiently.

Needless to say, there are attempts to overcome this situation, but typically this still involves humans - essentially, it's done by paying a token sum to individuals (often in third-world countries) to interpret the images. This move is only partially successful because spammers work at enormous volume: no matter how low the pay is (typically about US$0.001 per image), the total still becomes an intolerably large amount for them.
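To see how a tiny per-image fee adds up, here is a rough back-of-the-envelope sketch in Python. The US$0.001 rate is the figure quoted above; the volumes in the loop are made-up numbers chosen purely for illustration, and the function name is my own.

```python
# Rate quoted in the post: roughly US$0.001 per solved Captcha.
COST_PER_SOLVE_USD = 0.001


def solving_cost(num_captchas, cost_per_solve=COST_PER_SOLVE_USD):
    """Total cost in US dollars of paying humans to solve num_captchas images."""
    return num_captchas * cost_per_solve


# Hypothetical volumes, for illustration only.
for volume in (1_000, 1_000_000, 100_000_000):
    print(f"{volume:>11,} captchas -> US${solving_cost(volume):,.2f}")
```

The cost grows in a straight line with volume, so a fee that looks like nothing for a thousand images turns into real money at the scale spam is usually sent.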

Typically, a Captcha is a series of characters which are "warped" just enough that human eyes can still read them, but image-scanning software cannot. Techniques involve the use of color, skewing the image, or superimposing lines over it. There are a few variations on this theme, one being the use of pictures which then need to be identified (e.g. a picture of a camera, or an apple). A fascinating new one is the "Recaptcha", which deserves more explanation.

You may be aware that companies like Google have embarked on massive projects which involve the scanning of text and automatically converting it to computer characters, using optical character recognition (OCR). Much of this is the interpretation of old books, where the text is not always clear. As I said above, computers have trouble with this. So, why not use the legions of users out there who are asked to enter Captchas, and get them to assist with the interpretation of the unreadable stuff? Enter the Recaptcha (well, OK, officially reCAPTCHA!). You are probably familiar with seeing not one, but two words that must be interpreted.

The words are presented in random order. One of these words is already known by the requesting machine; the other one is not. The known word is used to perform the original purpose of the Captcha, while the unknown word is the one that still needs deciphering. (It's often possible to guess which is which, based purely on the relative quality of each.) Your answer then gets sent back to Google, or wherever, and, when there is enough consensus that a word is what we say it is, then it is assumed to be correct and is inserted into the text where the mystery blob originally appeared. Clever, huh?
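For the curious, the two-word scheme can be sketched in a few lines of Python. This is only my own illustration of the idea, not how reCAPTCHA is actually implemented: the class name, the method names, the sample words, and the threshold of three matching answers are all assumptions.

```python
from collections import Counter

# Assumed threshold: how many matching answers count as "enough consensus".
CONSENSUS_VOTES = 3


class WordPairChallenge:
    """One known (control) word paired with one word that OCR could not read."""

    def __init__(self, known_answer):
        self.known_answer = known_answer.strip().lower()
        self.votes = Counter()  # tallies of guesses for the unknown word

    def submit(self, known_guess, unknown_guess):
        """Return True if the user got the control word right.

        The guess for the unknown word is only counted when the control
        word is correct, since that is the only evidence of a careful human.
        """
        passed = known_guess.strip().lower() == self.known_answer
        if passed:
            self.votes[unknown_guess.strip().lower()] += 1
        return passed

    def accepted_reading(self):
        """Return the agreed reading of the unknown word, or None if undecided."""
        if not self.votes:
            return None
        word, count = self.votes.most_common(1)[0]
        return word if count >= CONSENSUS_VOTES else None


challenge = WordPairChallenge(known_answer="morning")
challenge.submit("morning", "overlooks")
challenge.submit("mourning", "overlocks")  # control word wrong, vote ignored
challenge.submit("morning", "overlooks")
challenge.submit("Morning", "overlooks")
print(challenge.accepted_reading())
```

Note that the user never learns which word was the control. That is what keeps people honest about the second one.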

Wikipedia has articles on both Captchas and Recaptchas if you'd like to learn more. Tell me what YOU think about this idea, as well as the use of Captchas in general.


Must-reads

I came across this thoughtful, well-written, objective essay in Newsweek, discussing how the Teabaggers use the Constitution as their "bible" to get their message across, and how incredibly misguided and just plain wrong this notion is.

And an item about how the Republicans, in this election cycle, have decided that the electorate is just plain stupid (hmm...).

This article from Techdirt contemplates the nature of centralized vs distributed organisms, in the context of WikiLeaks' recent exposures of the U.S. Government's behavior in Iraq. Regardless of one's opinion of this particular situation, it's worth reading because it provides food for thought about what our future may look like.

English as she is spoke!

This space features funny language-related boo-boos from World Wide Words ©, with regular updates.

----------------

Now that I'm back, I'm finally getting to update this...

• Last Saturday's Guardian included a quote from a woman whom it identified as Atha Cain "who carried her husband's ashes into Hull Crown Court for the trial of the woman who caused his death by wreckless driving."

• A recent comment about having a backup program that issued the message "An invalid argument was encountered" put Robert Hart in mind of a response he once received from an earlier generation of computers: "Wrong Error". He commented, "This was from IBM in its heyday. I feel it transcended mere obscurity and approached the metaphysical."

>> I can personally identify with this, having encountered and enjoyed many nonsensical errors over the years. My personal fave was the ever-present gem from IBM:
Error: An error has occurred.
Action: Correct the error.

Back then, you had to consult a hardcopy manual to discover this profundity. Sadly, to this day, I still see this sort of BS.

• The Observer's monthly food supplement in December recommended the single malt Highland Park as "a genuine classic that never fails to disappoint."

• An aerial photo submitted to the Boston Globe, Anne Reece tells us, had the intriguing caption "A view of Diamond Head crater taking off from Honolulu at the end of August".

• On November 8, this was reported in the Huffington Post: "White House Press Secretary Robert Gibbs grew so angry with Indian security officials on Sunday that he got into a heated shouting match with them - and even blocked a door they were trying to close with his foot." Maybe they shoulda just used their own hands?

• On his blog, Making Light, on 24 October, Patrick Nielsen Hayden reported an example of a classic error that's reminiscent of the famous (and apocryphal) book dedication, "To my parents, Ayn Rand and God". It was a caption to a picture of Merle Haggard in the Los Angeles Times of 21 July and referred to a documentary about him: "Among those interviewed were his two ex-wives, Kris Kristofferson and Robert Duvall".

• Peter Smith read an item on Sky News, dated 22 October, about a plane crash in the Congo that killed 20 people, including the pilot, Chris Wilson: "Generally viewed as being in a chronic state of disrepair, Mr Wilson had apparently expressed concern about the Czech-built Let-410 before the crash."
