Unstructured Data: The Other Side of Analytics

This article is more than 9 years old.

Everyone is obsessed with “analytics.”  I cannot keep up with what the technology pundits and consultants are telling me about analytics, no matter how hard I try.  (I just Googled “big data analytics” and got 107,000,000 results in 0.29 seconds.)

If you listen to these people, you will absolutely, positively believe that without huge investments in analytics immediately, your company will at any moment explode in a huge fireball.

“Big data analytics” is already a cult (like so many cults we’ve seen before). The data gods are angry, my friends, and they’re pouring data onto us so fast that it’s impossible to avoid being buried alive – or so the pundits and consultants would have us believe. So we need more servers, databases, tools and, most of all, “data scientists.”

Universities are scrambling to create this expertise, churning out data scientists as quickly as they can. Students are flocking to these programs because companies pay data scientists really well, and everyone knows it: try finding data scientists at the mall – or anywhere, for that matter. (At Villanova, we offer a new master’s program in analytics and it’s already very popular.)

“Predictive analytics” is the goal, of course: who wouldn’t want to predict the future? Yes, “analytics” is important, just as digital commerce, location awareness, cloud computing, mobility and digital security are important. Hopefully, we’ve learned how to ride these technology waves – because there will be others. There are always others.

But there’s another side to the analytics obsession that needs attention, because most analytics applications and technologies are focused more on structured than unstructured data. (Oh, the joys of relational database management, now largely gone, as old database managers endlessly pine for the good old days of rows and columns.)

Unstructured data includes social media (tweets, blogs, posts, etc.), call center notes, email, images, and open-ended surveys, among other forms.  Some unstructured data is internal to companies (like call center notes) while other forms are external (like social media).  (By the way, there’s also internal structured data – like SKU sales – and external structured data – like market trends.)

The distinction between structured and unstructured data is important because automated reasoning, one of the pillars of Web 3.0, requires both kinds of data analytics. Predictive analytics especially requires both: without integrating unstructured data with structured data and analyzing them together, it’s impossible to comprehensively describe, explain, predict or prescribe behavior. For example, a company might track the sales of specific products and services, and correlate structured sales data with all kinds of variables (like time of year and customer demographics). But without unstructured data (like social media and call center logs) it’s impossible to fully understand why sales rise or fall: structured data analytics can describe and explain what’s happening and unstructured data analytics can explain why it’s happening. Together you get the whole picture.

Without both, you’re half blind.

The pundits tell us that unstructured data is the fastest growing form of data. In fact, by commonly cited estimates, nearly 80% of new data is unstructured.

But unstructured data is noisy. So one of the major challenges of unstructured data analytics (UDA) is finding diagnostic signals within mountains of unstructured noise. Once it’s cleaned and analyzed, unstructured data must then be integrated with structured data. This can be done “manually” or with the major business intelligence (BI) platforms that companies already have in their analytics arsenals, platforms like SAS, Tibco’s Spotfire, SAP/NetWeaver, and IBM/Cognos, among so many others.
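To make the clean-then-integrate step concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the `SIGNAL_TERMS` keyword list, the record layouts, and the `extract_signals` and `integrate` helpers are hypothetical and not taken from any BI platform. A real system would likely swap the keyword match for a trained text classifier, but the shape of the problem is the same: discard the notes that carry no signal, then join what remains to structured sales rows on a shared key.

```python
from collections import Counter

# Hypothetical signal vocabulary; a stand-in for a real text classifier.
SIGNAL_TERMS = {"broken", "refund", "late", "cancel"}


def extract_signals(notes):
    """Count signal terms per (week, sku); notes with no hits are treated as noise."""
    counts = {}
    for note in notes:
        # Lowercase each word and strip common trailing punctuation.
        words = {w.strip(".,!?").lower() for w in note["text"].split()}
        hits = words & SIGNAL_TERMS
        if not hits:
            continue  # noise: nothing diagnostic in this note
        key = (note["week"], note["sku"])
        counts.setdefault(key, Counter()).update(hits)
    return counts


def integrate(sales_rows, signals):
    """Attach unstructured signal counts to each structured sales row."""
    merged = []
    for row in sales_rows:
        key = (row["week"], row["sku"])
        merged.append({**row, "signals": dict(signals.get(key, {}))})
    return merged


# Illustrative, made-up data: call center notes (unstructured) and SKU sales (structured).
notes = [
    {"week": 1, "sku": "A100", "text": "Customer says the hinge arrived broken, wants a refund."},
    {"week": 1, "sku": "A100", "text": "Asked about store hours."},
    {"week": 2, "sku": "A100", "text": "Another broken hinge!"},
]
sales = [
    {"week": 1, "sku": "A100", "units": 120},
    {"week": 2, "sku": "A100", "units": 85},
]

report = integrate(sales, extract_signals(notes))
for row in report:
    print(row)
```

The merged rows put the "what" (units sold) next to a hint at the "why" (complaints about broken parts), which is the kind of combined view a BI dashboard would then chart.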

Many vendors talk about UDA, but they talk better than they deliver. The signal-to-noise, integration and real-time challenges are real, and just a few vendors have it all sorted out. Most are still figuring it out, still trying to decide whether to Hadoop or not to Hadoop, and wrestling with other similar conundrums.

What am I talking about?

Go to your current analytics/BI vendors/consultants and tell them you only want internal and external unstructured data signals (no noise), you want unstructured/structured data integration, you want unstructured/structured predictive analytics and you want it all in real-time through your installed BI platform with a personalized dashboard – and you want everything to adapt to structured and unstructured data sources that you can turn on and off depending on your analytics requirements.  Oh, and you also want all this at a price you can afford, a price that generates measurable ROI.

See how many vendors can comply with your request and provide you with the other half of analytics.  See how many can comply with full-view analytics anywhere, anytime on any device and cooked just the way you like your data done.