The Right Process For Big Data Analytics Profit

This article is more than 8 years old.

The term “Big Data” is relatively new, but the challenge it represents, realizing value from voluminous, complex and growing electronic data resources, has existed for decades. So has a meaningful, open standard for data analysis that was specifically developed for dealing with massive datasets; it's called CRISP-DM.

In a 2001 essay, 3D Data Management: Controlling Data Volume, Velocity, and Variety, Doug Laney laid out his now famous “Three Vs” of Big Data. His clients were overwhelmed by the volume (sheer quantity), velocity (ongoing addition of new cases) and variety (diverse formats) of data at hand. Yet even before Laney’s article was published, an international consortium of over 200 organizations had already banded together (with funding from the European Union) to define and publish an open standard, CRISP-DM, for analysis of massive datasets.

Although industry surveys indicate that it’s the most widely used analytics process standard, CRISP-DM is not particularly famous. It’s known by many hands-on analysts, but not the wider business community. Not every segment of the analytics community uses, or is even aware of, the standard. That’s a shame, because good process maximizes the chance of producing information that executives can actually use.

What’s in this process standard? CRISP-DM lays out six major phases of the analytics process, with steps to be taken in each of them. It’s not a linear process; the phases represent an ongoing cycle of action and analysis, and there’s often a lot of back and forth within and between phases.

The major phases of the CRISP-DM process are:

  • Business understanding. Defining the problem to be addressed.
  • Data understanding. Assessing the data available and its suitability for addressing the business problem.
  • Data preparation. Making the data ready for use, filling gaps, correcting errors, merging data sources and so on.
  • Modeling. Using mathematics to describe business processes and make useful predictions.
  • Evaluation. Judging how well the mathematical models perform in real-world application.
  • Deployment. Integrating results into everyday business practice.
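
For readers who think in code, the cycle of phases can be sketched in a few lines of Python. This is only an illustration, not part of the standard: the names Phase, next_phase and can_move are hypothetical, the last three phases use the names CRISP-DM gives them (modeling, evaluation, deployment), and the rule for stepping backward is a simplifying assumption standing in for the "back and forth" described above.

```python
from enum import Enum


class Phase(Enum):
    """The six CRISP-DM phases, in their usual forward order."""
    BUSINESS_UNDERSTANDING = 1
    DATA_UNDERSTANDING = 2
    DATA_PREPARATION = 3
    MODELING = 4
    EVALUATION = 5
    DEPLOYMENT = 6


# Enum members iterate in definition order.
ORDER = list(Phase)


def next_phase(current):
    """Return the phase after `current`, wrapping around to the start of the cycle."""
    i = ORDER.index(current)
    return ORDER[(i + 1) % len(ORDER)]


def can_move(src, dst):
    """Assumed rule (not taken from the standard): a project may advance
    to the next phase or step back to any earlier phase."""
    return dst is next_phase(src) or dst.value < src.value


if __name__ == "__main__":
    phase = Phase.BUSINESS_UNDERSTANDING
    for _ in range(len(ORDER) + 1):
        print(phase.name)
        phase = next_phase(phase)
```

Running the script walks once around the cycle and back to business understanding, which is the point: deployment feeds the next round of questions instead of ending the work.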

Each phase has specific deliverables and documentation requirements, which are explained in a detailed step-by-step guide created by the original CRISP-DM consortium in 2000. (It’s 75 pages of small type, with lots of juicy detail. If you prefer a lighter version, you’ll find it in my book, Data Mining for Dummies.)

Each phase involves several tasks. For example, the business understanding phase tasks are: identify your business goals, assess your situation, define your data mining goals and produce your project plan. Each of these elements is further defined with specific deliverables. The single task “identify your business goals” calls for preparing three documents, to explain background, business goals and success criteria.
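
The phase, task and deliverable hierarchy lends itself to a simple checklist. The sketch below is a hypothetical illustration in Python; the names BUSINESS_UNDERSTANDING_TASKS and missing_deliverables are invented for this example. It records only the deliverables named above, and the lists for the other three tasks are deliberately left empty; the CRISP-DM guide is the place to look them up.

```python
# Tasks of the business understanding phase, mapped to their deliverables.
# Only the deliverables named in this article are filled in; the empty
# lists are placeholders, not a claim that those tasks have no deliverables.
BUSINESS_UNDERSTANDING_TASKS = {
    "identify your business goals": [
        "background",
        "business goals",
        "success criteria",
    ],
    "assess your situation": [],
    "define your data mining goals": [],
    "produce your project plan": [],
}


def missing_deliverables(completed):
    """Return (task, deliverable) pairs that are not yet in `completed`.

    `completed` is a set of (task, deliverable) tuples.
    """
    return [
        (task, doc)
        for task, docs in BUSINESS_UNDERSTANDING_TASKS.items()
        for doc in docs
        if (task, doc) not in completed
    ]


if __name__ == "__main__":
    done = {("identify your business goals", "background")}
    for task, doc in missing_deliverables(done):
        print(f"Still to write for '{task}': {doc}")
```

Even a checklist this small makes the documentation burden visible at the start of a project, when it is cheapest to plan for.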

Not every analyst is enthusiastic about process. When I speak about process, I sometimes get heckled. At one talk, a data science graduate student raised strong objections to my recommendation to use CRISP-DM. “This is so ten years ago,” he said. At another, a young man complained that the process seemed slow and required a lot of writing from the start. People who are trained in programming and math are not always trained to care about business process and documentation.

The good news is that CRISP-DM is used worldwide today, across sectors in government, non-profit and many industries. CRISP-DM is a flexible standard that makes sense for today’s datasets and applications. It’s mentioned in some current textbooks, and some analytics tools offer special features to support it. But there’s no organization to maintain and promote use of the standard. The consortium that created the standard disbanded when EU funding ran out.

A recently founded professional organization, the Society of Data Miners, based in London, may form the heart of a new constituency for CRISP-DM. The society aims to advance the data mining profession and promote high standards for competency and ethics.

All analytics practice, whether you prefer to call it statistics, data science, data mining, or by any other name, needs good process to ensure that results are relevant to real business problems, appropriate for use, and preserved as valued intellectual capital. The open standard exists, complete with step-by-step instructions. Analytics professionals should embrace it to maximize the value of their own work to clients and employers. Executives should ask about CRISP-DM, and insist on its use, because poor analytics process means poor analytics returns.
