The problem with big data is that there’s so much of it. No, that’s not just a Yogi-ism; it accurately describes the paradox facing many organizations today. We have entered an era in which it is easy to collect vast amounts of data, from the most mundane (think server logs or retail register receipts) to the most profound (like one’s genome or daily online meanderings and preferences), cheap to store it indefinitely and easier than ever to aggregate it from multiple sources. Yet this bounty of data perversely makes it harder to refine into useful information: more costly to process, more complex to analyze, harder to comprehend and thus less effective at improving decisions.
The situation will only get worse as the technology trends summed up by today’s hottest buzzword, the Internet of Things (IoT), shift our data supply, both as individuals and as businesses, into overdrive, adding data collection and connectedness to all manner of objects, whether home thermostats, industrial equipment or even livestock. Never heard of the quantified cow? Your local dairy farmer has, and may already be outfitting the herd with the bovine equivalent of a Fitbit to increase milk production while reducing greenhouse gas emissions.
Data plenitude = business transformation
Yet the plethora of data raises the serious issue of how best to actually use it for informed, optimized business decisions. That’s where big data systems come in; however, they introduce a new set of problems: the need for data science expertise to handle the analytic complexity; additional infrastructure, and its concomitant cost, to process the information; and added lag in the decision process, since these systems typically must process data in bulk, not as a continuous stream. But superabundant data offers business benefits that are too compelling to sacrifice on the altar of IT simplicity.
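To make the bulk-versus-stream distinction concrete, here is a toy Python sketch (purely illustrative, not any vendor’s architecture) contrasting a batch computation, which must wait for the full data set, with a streaming one that has a current answer after every event:

```python
# Toy contrast between batch and streaming aggregation; purely
# illustrative, not any specific product's architecture.
readings = [3, 1, 4, 1, 5, 9, 2, 6]

# Batch style: wait until the full data set has arrived, then
# compute the answer once at the end.
batch_total = sum(readings)

# Streaming style: update a running total as each value arrives,
# so a current answer exists at every step, with no bulk delay.
running_total = 0
for value in readings:
    running_total += value
    print("running total so far:", running_total)

assert running_total == batch_total
```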
Problems beget business opportunity
BitYota CEO Dev Patel characterizes the problem as one of cost and complexity. He says businesses today are faced with data everywhere, in every process, coming from many sources; that’s the complexity. But companies don't have the resources to cleanse, normalize and warehouse the mix of structured and unstructured data, nor to build their own tools; that’s the cost. Much like Salesforce democratized access to sophisticated CRM software, the new generation of data analysis services does the same for big data systems.
Some, like BitYota, focus on providing the core infrastructure to ingest and process large data streams in real time. Others, like Quid, attack the problem of making sense of enormous, multivariate data sets through AI-like natural language processing, clustering algorithms, creative visualizations and new UI paradigms that provide an interactive environment for exploring data. Interana, which just launched with a founding team from Intel and Facebook, differs in that it’s a pure software product that can be run either on-premises or in the cloud and attacks both ends of the big data problem: algorithmic processing and human analysis. According to CEO Ann Johnson, Interana provides the full big data stack: storage, analytics and visualization.
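For a flavor of what such clustering involves, here is a minimal Python sketch of the kind of text clustering these services automate at vastly greater scale. The sample documents and parameters are invented for illustration; this is in no way Quid’s actual pipeline.

```python
# Minimal sketch of the kind of text clustering such services
# automate at scale; sample documents and parameters are invented
# for illustration, and this is not Quid's actual pipeline.
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer

docs = [
    "dairy sensors track milk yield per cow",
    "sensor data streams from dairy herds",
    "cloud warehouse ingests clickstream events",
    "user click events land in the cloud warehouse",
]

# Turn free text into numeric TF-IDF feature vectors.
vectors = TfidfVectorizer().fit_transform(docs)

# Group the documents into two thematic clusters.
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(vectors)

for label, doc in zip(labels, docs):
    print(label, doc)
```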
In each case, the goal is to insulate customers from the costly and complex details of building a scalable, distributed analysis system capable of handling data from a variety of sources. Each does so by exploiting cloud economics, whether that means running on public cloud infrastructure or, in the case of Interana’s hybrid on-premises option, on commodity, cloud-like bare-metal hardware.
Different services for different usage scenarios
Befitting a new, dynamic and still chaotic market, there are no hard and fast product categories: various data analytics services address different problems. For example, BitYota and Interana focus on user, not machine, event data, things like Web hits, DVR remote clicks and call center logs, to help companies understand customer behavior. Medio targets mobile marketing, while [24]7 specializes in omnichannel sales engagement and customer support. Quid, whose co-founder and CTO Sean Gourley keynoted at VMworld 2014, takes a more exploratory approach to the data analysis problem, one Gourley calls “data intelligence.” Quid's interactive software is reminiscent of the way Mathematica revolutionized scientific and engineering analysis over two decades ago by making it easy to explore and refine many possible predictive models.
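As a rough illustration of what understanding customer behavior from event data means in practice, here is a short Python sketch that rolls raw events up into per-user action counts. The event schema is hypothetical, not any vendor’s actual format.

```python
# Illustrative sketch of turning raw user events into per-user
# behavior summaries; the event schema is hypothetical, not any
# vendor's actual format.
from collections import Counter, defaultdict

events = [
    {"user": "alice", "action": "page_view"},
    {"user": "alice", "action": "add_to_cart"},
    {"user": "bob",   "action": "page_view"},
    {"user": "alice", "action": "checkout"},
    {"user": "bob",   "action": "page_view"},
]

# Tally actions per user: the raw material for funnel,
# retention and engagement analysis.
behavior = defaultdict(Counter)
for event in events:
    behavior[event["user"]][event["action"]] += 1

for user, counts in behavior.items():
    print(user, dict(counts))
```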
Big data analytics as a service is a natural progression, and it's exciting to see the application of cloud economies of scale to higher-level problems like predictive analytics, data intelligence and graph analysis. By allowing vast compute resources to be deployed in short bursts against specific problems, cloud software surfaces previously invisible or inscrutable correlations, connections and likely outcomes. For businesses, data analytics services mean unlocking the value of big data without building dedicated infrastructure or hiring a staff of PhD data scientists. It will be interesting to see the creative potential unleashed by making such power available to anyone with a credit card.