
OMG, THE FILES ARE IN THE COMPUTER? Many users don't realize how the information they post to social media is being used.

New Data

A quarter of the world's citizens use social media. In early 2015, Twitter had 236 million active users. Facebook had 1.44 billion. The world generated 98,000 tweets and 695,000 Facebook posts every minute in 2012, and the numbers keep rising. Social media is like a "firehose" spewing an unfathomable amount of data, says Warshaw.

This torrent is so relentless that social media has become a key source of "big data"—large, complex and continuous sets of information, like the DNA sequence of a species or every purchase ever made at Wal-Mart. Until recently, computers lacked the processing capabilities to analyze anything at the scale of social media. They were overwhelmed by the firehose of tweets and posts, like an ant trying to make sense of a skyscraper.

But with faster processing speeds and specialized algorithms, our understanding of the world has transcended such barriers of scale. Some of the insights from social media are not what we might have guessed. For instance, one study examined what people "like" on Facebook and their IQs. The strongest social media indicator of a person's intelligence, the results showed, is whether he or she likes curly fries. (Hint: Only an idiot doesn't like curly fries.)

The algorithm in Warshaw's study drew from two well-known models in psychology, called "Big-5 Personality" and "Schwartz's Basic Human Values." The Big-5 traits are openness, conscientiousness, extraversion, agreeableness and neuroticism. Schwartz's Values "describe the beliefs and motivations that guide a person throughout their life," the team's paper explains. Five values make up this model: self-transcendence, openness to change, conservation, hedonism and self-enhancement.

The algorithm links specific categories of words to different dimensions of personality and values, based on findings from previous research. The system scores each person for each trait, compares their score to the rest of the population, and assigns a percentile rank between 0 and 100. The language used in these two models is clear to psychologists, but ambiguous to the rest of us. To make profiles easier to interpret, the UCSC team presented a summary of each participant's most defining traits.
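To make the mechanics concrete, here is a minimal sketch of such a lexicon-based scorer in Python. Everything in it is an assumption for illustration: the word-to-trait mappings, the weights and the tiny sample population are invented stand-ins, since the article doesn't publish the study's actual lexicon or model.

```python
# A toy lexicon-based profiler. The word-to-trait mappings and weights below
# are invented for illustration; the study's real lexicon, drawn from prior
# psychology research, is far larger and empirically derived.
from bisect import bisect_left

LEXICON = {
    "imagine": [("openness", 1.0)],                         # Big-5 trait
    "party":   [("extraversion", 1.0), ("hedonism", 0.5)],  # trait + Schwartz value
    "worry":   [("neuroticism", 1.0)],
    "plan":    [("conscientiousness", 1.0)],
    "thanks":  [("agreeableness", 1.0)],
}

def raw_scores(posts):
    """Sum lexicon weights for every matching word across a user's posts."""
    scores = {}
    for post in posts:
        for word in post.lower().split():
            for trait, weight in LEXICON.get(word, []):
                scores[trait] = scores.get(trait, 0.0) + weight
    return scores

def percentile(score, population):
    """Rank one raw trait score against the population's scores, 0 to 100."""
    ranked = sorted(population)
    return round(100 * bisect_left(ranked, score) / len(ranked))

# Example: profile one user, then rank their openness against a tiny sample.
me = raw_scores(["Imagine a party where no one has to worry"])
print(percentile(me.get("openness", 0.0), [0.0, 0.5, 1.0, 2.0]))  # -> 50
```

The percentile step is what matters for interpretation: it turns an opaque raw word count into the 0-to-100 rank, relative to everyone else, that participants actually saw.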

Information like this equals dollars for many companies. It's no secret that they track consumer behavior. Amazon recommends products based on previous online purchases, and not just purchases from its own store but from many other retailers.

Until recently, researchers and companies focused on user behavior. Now they want to track us and understand us more deeply by probing our personalities.

"It might affect the ways things are sold to you, not just what's sold to you," says UCSC psychologist Steve Whittaker, co-author of the study.

Employers are also intrigued by this technology. Many began asking job applicants for access to their online profiles, hoping to glean information beyond what a basic interview reveals. Legal protections have since been put in place, but the idea that first impressions are made in person may now be a fallacy.

Uncanny Accuracy

When I used the same algorithm to generate my own profile, I fed it posts from my old Facebook account. Before seeing the results, I figured this technology would be as esoteric as an astrology reading. Instead, the reading was surprisingly straightforward, and I was impressed by its accuracy, especially since the algorithm saw only the silly things I had written in high school.

The study's 18 volunteers also acknowledged the system's accuracy. When Warshaw asked participants if they agreed with how the algorithm captured their personalities, all but one participant said yes. Some people used only professional accounts. Others rarely posted on social media. Still, their profile results were uncannily accurate.

"I don't know how it would derive that from the limited number of tweets that I made," one participant said. "I guess I'm a little shocked that it works so well."

The team presented several hypothetical scenarios to each participant in random order, and in each scenario the volunteers could choose whether to share their profiles. The incentive to share ranged from getting an online shopping discount to being matched with professional mentors.

More than half of the participants shared their profiles in each situation. Thirteen out of 14 shared them for the reward of recommendations about local events. Fewer volunteers—10 out of 17—shared their profiles in a mock job application.

The team also gave participants the choice of posting their computer-generated profiles on their real social media accounts. More than half of them shared the reports.

The next step was to understand why participants decided to share or not to share. The volunteers perceived several risks, including pre-judgment from potential employers.

"People don't feel good about that at all," Warshaw says.

The realities of data sharing and privacy unnerved the group, especially the fact that some companies already take our data without asking. "This technology is out there, and some versions aren't requiring user consent," Warshaw says.

A study by High-Tech Bridge, an information security company, revealed that both Facebook and Google+ "click" on links found in users' private messages. Soon after, two Facebook users filed a class action lawsuit against the company. Campbell et al. v. Facebook alleged that Facebook was reading private messages, not to search for scams or spam, but to collect valuable data about users. Any link found in a private message would be counted as a "like," information useful for developing better advertisements. The case is ongoing.

Privacy Demands

Perhaps unsurprisingly, one of the key takeaways from the report was just how much people like reading about themselves. Participants in the study felt like they were learning about themselves—or at least how they appear to others.

And yet, although participants recognized the technology's benefits, almost half still felt uncomfortable about sharing their profiles.

Whittaker notes the paradox: "One sad thing about the study is that it shows people don't seem to believe they have a lot of control." Participants feared that in real life, their information would be used regardless of their consent. In their minds, there was no use in trying to keep it private.

Aside from privacy, people also worried about not sharing. "Non-sharing is interpreted as hiding terrible information, pressuring non-sharers to share against their wishes," the team writes. This phenomenon is known as the "unraveling effect."

"If they know you decline, that's more of a red flag to them," said one participant.

Despite these attitudes, Warshaw and Whittaker have hope for social media. Users may eventually feel empowered enough to demand privacy and consent.

"I think these systems will continue to be deployed, but if papers like ours have an impact on people's consciousness, it will lead them to be more careful," Whittaker says.

This trend already is taking root. In 2007, just 20 percent of Facebook profiles were private. That number is now up to 70 percent. "I think companies see a role for these kinds of analytics in employment situations and market research," says Warshaw. "Right now, companies assume they can get data without consent. But now that people are going more private, eventually that won't be viable."

The irony, of course, is not lost on the researchers, who appreciate the strangeness of using computers to condense our characters into neat packages of words. "Granting the algorithm this level of 'humanity' simultaneously reduces our humanity by supplanting people as the sole judges of character," the team writes. "This result raises the ethical question: Should an algorithm judge character?"

"People aren't perfect, and systems aren't perfect," Warshaw says. "We're still at the point where it might be better for a person to be wrong than for an algorithm to be wrong."

Regardless, someone or something will be watching.