More Privacy Perils: Facebook Data Is Greater Than The Sum Of Your Likes

This article is more than 10 years old.

New research from the University of Cambridge in England shows that a person's political slant, age, gender and even whether they're gay can be accurately predicted from their Facebook Likes. Much of what the study found was obvious (liking Jesus Christ correlates strongly with being Christian, Cover Girl makeup with being female and Rush Limbaugh with being Republican. Triple duh!), but some was less so, and that's where things get either interesting or scary.

The report, "Private traits and attributes are predictable from digital records of human behavior," was just published in the Proceedings of the National Academy of Sciences and is coauthored by David Stillwell and Michal Kosinski of the University of Cambridge and Thore Graepel of Microsoft Research in Cambridge. In the authors' words, the study shows that "easily accessible digital records of behavior, Facebook Likes, can be used to automatically and accurately predict a range of highly sensitive personal attributes including: sexual orientation, ethnicity, religious and political views, personality traits, intelligence, happiness, use of addictive substances, parental separation, age, and gender."

Some of the predictions are less accurate than standard personality tests, but many are strikingly accurate: "The model correctly discriminates between homosexual and heterosexual men in 88% of cases, African Americans and Caucasian Americans in 95% of cases, and between Democrat and Republican in 85% of cases." And although, as ScienceNews reporter Rachel Ehrenberg points out, "The line between intuition and stereotyping is a blurry one," coauthor Kosinski insists that "the computer isn’t biased.… It just spits out correlations."

And this is true. Computers aren't biased, but the people who program them can be, as can the people who use the data they produce. The more colorful results seem innocuous enough: "the best predictors of high intelligence include 'Thunderstorms,' 'The Colbert Report,' 'Science,' and 'Curly Fries,' whereas low intelligence was indicated by 'Sephora,' 'I Love Being A Mom,' 'Harley Davidson,' and 'Lady Antebellum.'" But in the wrong hands, the expression of preference can be a demographic trap leading to psychographic profiling. Leading indicators for homosexuality include "liking the TV show Desperate Housewives or the musical Wicked."

The study is based on data from 58,000 Facebook users who volunteered to use an app called myPersonality, created by study coauthor Stillwell. In their conclusion, the authors argue for the scientific value of this kind of large-scale data collection: "digital records of behavior may provide a convenient and reliable way to measure psychological traits. Automated assessment based on large samples of behavior may not only be more accurate and less prone to cheating and misrepresentation but may also permit assessment across time to detect trends." All true enough. The game-like nature of Facebook apps may indeed make them more accurate gauges of how people actually feel about things than more clinical methods. But the authors also acknowledge that "the predictability of individual attributes from digital records of behavior may have considerable negative implications, because it can easily be applied to large numbers of people without obtaining their individual consent and without them noticing."

This is a point that has caused Facebook trouble before, as when it came out (so to speak) that ads on the social network could be used to "out" gay users. The point then as now is that any given Facebook Like may be meaningless, but a collection of Likes, a profile of Likes, is inherently meaningful, often in ways that a user is not aware of. Statistically speaking, we are greater than the sum of our Facebook Likes. The authors of the study admit they "can imagine situations in which such predictions, even if incorrect, could pose a threat to an individual’s well-being, freedom, or even life."

There is a parallel here with computer security. The accumulation of too much data about any one entity in a single location poses a threat. In cyber security, this risk is mitigated by dispersing data in such a way that no one bit of it leads to any other. Studies like this one from Cambridge suggest that we may need to think about privacy in similar ways. Using third-party tools to distribute our data among different servers, preferably ones for which users hold their own unique encryption keys, may be the only way to prevent third parties from painting possibly misleading pictures of us without our consent or knowledge.

For Mark Zuckerberg's empire, the question is whether our consent to participate in the social network implies our consent to being stereotyped. If this data can be used in this manner, I think it is safe to say it is being used in this manner, and will continue to be unless users become outraged enough to demand explicit changes in what can or cannot be inferred algorithmically from their behavior on the network. Would you like curly fries with your Colbert Report and Harvard degree, sir?
