Big Data's Dark Side

This article is more than 9 years old.

I don’t mean to pick on big data. All technology has a dark side, if you let it loose. Mobile devices can be leashes or liberators (speaking as someone who once took his laptop to spring training, I count them as the latter). Sometimes I think security devices trip up the people they’re trying to protect more than they stymie the hackers they’re trying to target.

Arguably, most of my time here is spent promoting the value that big data provides. So when I start highlighting its limitations, it’s not that I want you to think of a haunting, mellifluous voice cajoling, “Come to the dark side, Luuuuuuuke.” It’s that I want to point out things to think about so that we can be aware of those limitations, and thereby sidestep or overcome them.

Forbes columnist Steve Andriole concurs with this philosophy, I suspect. In his recent piece, Unstructured Data: The Other Side of Analytics, he writes, “‘Big data analytics’ is already a cult (like so many cults we’ve seen before). The data Gods are angry, my friends, and they’re pouring data onto us so fast that it’s impossible to avoid being buried alive – or so the pundits and consultants would have us believe.”

He adds, “[U]nstructured data is noisy. So one of the major challenges of unstructured data analytics (UDA) is finding diagnostic signals within mountains of unstructured noise. Once it’s cleaned and analyzed, unstructured data must then be integrated with structured data. … Many vendors talk about UDA, but they talk better than they deliver. The signal-to-noise, integration and real-time challenges are real, and just a few vendors have it all sorted out. Most are still figuring it out, still trying to determine to Hadoop or not Hadoop, and other similar conundrums.”

Sounds negative, sure, but he’s just trying to get our attention: everyone who’s made their fortune in structured data analysis is trying to sell you unstructured data analysis, but it’s not easy, so don’t take them at their word.

Over at the Smart Data Collective site last week, consultant Martyn Jones railed about the contradictions of big data, going so far as to cite three success stories of companies combining structured and unstructured data. Oh, by the way, those projects took place in 1989, 1993, and 2001. Jones went on to hammer away at all the pillars of big data, until we’re left with one unassailable truth. No worries, it’s one we’ve danced around for years: data is data. Ultimately, we’re getting smarter about analyzing it. But don’t dazzle yourself into thinking you’re doing something new. It’s still data analysis.

(In my mind, that’s the great thing about technology: it’s all incremental. People get all het up about smartphone interfaces, but the principles really aren’t that much different than those computer-human interaction [CHI] experts have been arguing about for years.)

You can’t talk about digs against big data without a look at privacy issues. This article by Chris Cottrell, about privacy advocates taking big data to task, is from the site of a German broadcaster, reporting from last week’s Mobile World Congress in Barcelona. And you know how seriously they take privacy in Europe.

Cottrell quotes technology executive Gary Kovacs’ keynote, in which Kovacs cited a user agreement that popped up while his 10-year-old daughter was trying to download a video game. The company apparently wanted access to his daughter's name, location, searches, and chat. "Somebody please explain to me why a massive worldwide company needs to collect this information on a 10-year-old girl. What do they do with it? Why don't they tell me what they do with it?"

All fair questions, which serve to bring up still more teachable moments. Sure, to follow Martyn Jones’ argument, what we think of now as big data has been around for years. What’s new? It’s easier and sleeker than ever before. The fact is, we’re still figuring out how to take advantage of that ease. It can be smooth, or it can be slippery.

When data is too easy to get, it’s easy to overreach. The value of identifying big data overreaches is that it helps us figure out how not to go too far – to think about not just the data we can get, but the data we really need to understand the world around us. Shining a light on what we really should do with big data helps keep us away from the dark side.