Fun With AOL Data
By Annalee Newitz
LAST WEEK, AOL did another stupid thing, but at least it was in the name of science. The giant web portal released a data chunk containing three months' worth of queries to its search engine, taken from roughly half a million users. Gathered during March, April and May, the data show queries along with the date and time, as well as which websites the users ultimately visited. The idea was that this information might be of some use to researchers.
To protect user privacy, AOL replaced the login names of searchers with numbers. So you could still see everything that searcher No. 4356 looked for, but you wouldn't know who No. 4356 was. Except there's one problem: It's incredibly easy to figure out who people are based on their searches, because they tend to look for themselves, family members and things in their immediate geographical vicinity. The New York Times did a great story in which reporters examined searches done by user No. 4417749, and within hours managed to locate their author, a nice old lady in Georgia who now plans to cancel her AOL subscription.
Bloggers and privacy advocates have pointed out that the information AOL released contains more than just innocent Georgia ladies. It's unclear what law enforcement might do with the thousands of searches for illegal drugs and pornography. It's equally unclear what the feds will make of the handful of searches for "Muslim death rituals," "Muslim brotherhood" and "Islamic militant web forums." In a nation where the government is seriously contemplating blanket warrants for online surveillance, it's hard to imagine there aren't law enforcement types combing this treasure trove of prepackaged personal data. Imagine getting enough dirt on somebody to haul them in for questioning just by downloading 400 megabytes of stuff from AOL! That's like free candy.
After public outcry reached a crescendo, AOL apologized and took the data down. Of course, privacy advocates like the Electronic Privacy Information Center's Marc Rotenberg and Electronic Frontier Foundation's Kurt Opsahl remain pissed off. Why? Because this is the interweb, folks. Data never dies here. In fact, you can search the records yourself via Don't Delete (www.dontdelete.com). Once I visited Don't Delete, I couldn't leave. There's a button you can click to give you the search terms from a random user, and every time I hit it I got another gem. My favorite was user No. 4206444, who was obviously a college student trying to cheat quickly on her or his exams in order to get around to the more important things in life. Search phrases like "does social darwinism persist in social welfare policies and in the attitudes of the general public about social welfare" were followed by "free essays on adolescent depression and suicide risks," and "free essays on Charles Dickens Hard Times." In between these queries were hundreds for "sailor moon pictures," "pokemon pictures," "sonic x" and "selena pictures."
As blogger Thomas Claburn (www.lot49.com) points out, there's a kind of poetry to some of the queries. He excerpts a dozen lines from the 8,200 queries done by user No. 23187425, all of which seem to be a sort of conversation this person was having with the search engine. He or she never actually clicks on any links, just keeps querying with plaintive phrases like "i have had trouble," "i want to change" and "i know who i am."
I'm torn. I love having access to this data, both for its touching human qualities and for the kinds of anthropological information it could yield. But as someone who believes strongly in digital privacy, I simply can't sanction what AOL did. It would be different if I had faith that discovering all those porn searches would somehow inspire people to accept that sexual curiosity is normal. And it would be different if I thought law enforcement would assume that the people searching for "Islamic militant web forums" were simply trying to understand the world. But I don't. This data will be used to "prove" that the Internet is crawling with child pornographers and terrorists.
Someday AOL's information should be put into the public domain for anthropologists and cultural researchers of the future. That future, however, is probably decades if not a century away. The data is too close to us now—too easily weaponized. Nevertheless, I still hold out hope that one day our search queries will illuminate us and provide for another generation a digital outline of our daily desires.
Annalee Newitz (firstname.lastname@example.org) is a surly media nerd whose search history is known only to Google, which isn't exactly comforting.