


Best Practices For Managing Big Data

This article is more than 10 years old.

Guest post written by Ash Ashutosh

Ash Ashutosh is CEO of Actifio, a provider of data management software.

Big Data is the result of practically everything in the world being monitored and measured, creating data faster than the available technologies can store, process or manage it. Because it is far more intuitive to represent information as a “file” than as a relational object, there has been a surge of unstructured data, which now makes up as much as 80% of the new data organizations must manage.

Organizations are struggling to manage Big Data. According to IDC, the amount of information created, captured or replicated exceeded available storage for the first time in 2007. The size of the digital universe this year will be tenfold what it was just five years earlier.

Therefore, organizations must find smarter data management approaches that enable them to effectively corral and optimize their data.

Too many organizations think they can manage Big Data by throwing ever more storage at the problem. They often buy additional capacity every six to 12 months, which not only drives exorbitant costs but forces their frazzled IT teams to spend their time on data management rather than on more strategic IT initiatives. The lack of a real solution for managing Big Data causes tremendous inefficiency across the organization.

At the same time, Big Data just keeps growing and growing, according to Forrester Research:

--The average organization will grow its data by 50 percent in the coming year.

--Overall corporate data will grow by a staggering 94 percent.

--Database systems will grow by 97 percent.

--Server backups for disaster recovery and continuity will expand by 89 percent.

Big Data results in three basic challenges: storing, processing and managing it efficiently. Scale-out architectures have been developed to store large amounts of data and purpose-built appliances have improved the processing capability. The next frontier is learning how to manage Big Data throughout its entire lifecycle.

What most people don’t know is that the vast majority of Big Data is either duplicated data or synthesized data. Let’s take a look at a leading medical research facility that generates 100 terabytes of data from various instruments.

This data is then copied by 18 different research departments, each of which further processes it and adds 5 terabytes of additional synthesized data. The facility now manages nearly 2 petabytes in total, of which only about 190 terabytes (the original 100, plus 18 x 5 of synthesized data) is unique. Yet the entire data set is backed up and replicated to a disaster recovery site, consuming still more power and space to store it all. All told, the medical center has used roughly 10 petabytes of storage to manage less than 200 terabytes of genuinely unique data. This is not efficient.
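Running the numbers in this example makes the waste concrete. The figures below come from the example itself; the number of backup and disaster-recovery copies is an illustrative assumption, not a number from the article.

```python
# Back-of-the-envelope storage math for the medical-center example.
TB = 1
source_data = 100 * TB          # raw instrument data
departments = 18
synthesized_per_dept = 5 * TB   # new data each department adds

# Total managed footprint: every department holds a full copy
# of the source data plus its own synthesized additions.
managed = source_data + departments * (source_data + synthesized_per_dept)

# Truly unique data: the source plus each department's additions.
unique = source_data + departments * synthesized_per_dept

# Backup and DR copies multiply the managed footprint.
# Assumption (not from the article): four extra copies in total.
backup_copies = 4
total_stored = managed * (1 + backup_copies)

print(f"Managed footprint:  {managed} TB")       # 1990 TB, nearly 2 PB
print(f"Unique data:        {unique} TB")        # 190 TB
print(f"Stored with copies: {total_stored} TB")  # 9950 TB, roughly 10 PB
```

Even this rough sketch shows how a few hundred terabytes of unique data can balloon into roughly 10 petabytes of stored copies.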

So how should it be managed?

The first step is to reduce the data to its unique set, shrinking the amount of data that must be managed.
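Reducing data to its unique set is, at bottom, deduplication: store each distinct piece of content once and let every copy point to it. Here is a minimal sketch of content-addressed deduplication, assuming SHA-256 hashing; the function and variable names are illustrative, not any vendor's actual implementation.

```python
import hashlib

def dedupe(blocks):
    """Store each unique block once, keyed by its content hash.

    Returns the unique-block store and, for each input block, the
    hash that lets it be reconstructed later.
    """
    store = {}   # content hash -> the one stored copy
    refs = []    # per-input references into the store
    for block in blocks:
        digest = hashlib.sha256(block).hexdigest()
        store.setdefault(digest, block)   # keep only the first copy
        refs.append(digest)
    return store, refs

# 18 "departments" each holding an identical copy of the source data:
source = b"instrument-data" * 1000
blocks = [source] * 18
store, refs = dedupe(blocks)
print(len(blocks), "logical copies ->", len(store), "stored block(s)")
```

Eighteen logical copies collapse to a single stored block; each department keeps only a hash-sized reference instead of a full duplicate.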

Next, leverage the power of virtualization technology. Organizations must virtualize this unique data set so that multiple applications can reuse the same data footprint, and so that the smaller footprint can be stored on hardware from any storage vendor.
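One common way to let many applications reuse a single data footprint is copy-on-write: each consumer gets a lightweight virtual copy that reads from the shared "golden" data and records only its own changes. The sketch below illustrates the idea; the class and method names are hypothetical, not a real product's API.

```python
class GoldenCopy:
    """The one physical copy of the unique data set."""
    def __init__(self, data):
        self.data = dict(data)

class VirtualCopy:
    """A lightweight, writable view over the golden copy.

    Reads fall through to the shared data; writes land in a private
    overlay (copy-on-write), so many applications can share one
    physical footprint without disturbing each other.
    """
    def __init__(self, golden):
        self.golden = golden
        self.overlay = {}

    def read(self, key):
        return self.overlay.get(key, self.golden.data.get(key))

    def write(self, key, value):
        self.overlay[key] = value   # only the change is stored

golden = GoldenCopy({"scan_001": "raw instrument data"})
dept_a = VirtualCopy(golden)
dept_b = VirtualCopy(golden)
dept_b.write("scan_001", "processed result")

print(dept_a.read("scan_001"))  # "raw instrument data" (shared copy)
print(dept_b.read("scan_001"))  # "processed result" (private overlay)
```

Department B's processing never duplicates the golden copy; only its delta is stored, which is the footprint reduction the paragraph above describes.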

Virtualization is the secret weapon that organizations can wield to battle the Big Data management challenge.

By reducing the data footprint, virtualizing the reuse and storage of the data and centralizing the management of the data set, Big Data is ultimately transformed into small data and managed like virtual data. Now that the data footprint is smaller, organizations will dramatically improve data management in three key areas:

  • Less time is required by applications to process data.
  • Data can be better secured since the management is centralized, even though access is distributed.
  • Results of data analysis are more accurate since all copies of data are visible.

Virtualization is indeed the “hero” when it comes to managing Big Data. And it gives organizations many additional benefits: end-users enjoy flexibility, lower costs and freedom from IT vendor lock-in.

A smarter data management approach not only allows Big Data to be backed up far more effectively but also makes it more easily recoverable and accessible, with a whopping 90% cost savings. It also frees IT staff to drive strategic technology initiatives that fuel corporate growth, instead of waging a futile battle with an out-of-control Big Data beast.