Why Hasn't Open Source Taken Over Storage?

This article is more than 10 years old.

Open source products have very uneven penetration into the world of business technology. If you look at content management systems or programming languages, open source rules. But if you look at the market for ERP software or for storage systems, open source hasn't made much of a dent.

Businesses like Amazon and Google would clearly have had a much harder time, and probably would not exist, if they had to pay someone a license fee for every one of their servers instead of running free Linux and BSD. IDC's latest analysis of the server market shows that Linux is making its biggest dent in other forms of Unix. Servers running IBM's mainframe operating system z/OS grew by 9 percent in the second quarter of 2013. In the same period, Linux server revenue was up 1.8 percent, non-Linux Unix share dropped by 21 percent, and Microsoft server revenue was down by 5.1 percent.

But there is another realm in which open source has yet to mount a serious attack: storage. CIOs may complain loudly about writing big checks to NetApp, EMC, and other storage vendors, but write them they do. So do the large Internet companies, although where they could, they have built their own storage systems, such as the Google File System, albeit only for internal use.

For a long time I've wondered: Where does open source win? Where does it make slow progress? Where does it have no hope? And specifically, why hasn't an open source storage system succeeded? (Yes, I know that Red Hat's Gluster is one attempt at this, and there are others, but so far these systems are not replacing mission-critical use cases in large numbers. All of you out there who disagree with my premise, please comment on this article and I will be happy to respond.)

In a recent conversation with Jay Kidd, the CTO of storage vendor NetApp, I got the best answer yet. Certain problems have resisted open source because open source is great at solving computer science problems but far less effective at meeting the challenges of engineering problems.

“Computer science gets you 90% of the way there, in concept, and engineering gets you the other 90% of the way there in reality,” said Kidd. The primary reason: nothing always works the way it should and often fails in unpleasant ways. “One big job of engineering is to fix in the software the problems that happen in the hardware.”

Think of it this way: You could sic a bunch of grad students from MIT on the problem of creating APIs to define and configure virtual computing resources. In three months they would probably come up with abstractions even better than those created by Amazon Web Services or the Rackspace Cloud. But even if you put all of the engineers at MIT on the job, it is not clear that they would ever create a system that works at scale the way Amazon Web Services or the Rackspace Cloud does. Werner Vogels, Amazon's CTO, refers to this work as "undifferentiated heavy lifting."

One big qualification: I'm not saying that open source hasn't mightily influenced storage. Kidd says that NetApp's software is based on BSD, and I'm sure most large storage vendors use open source heavily. But that's very different from having an open source storage system that you can just download and use.

Another reason storage has resisted open source is that storing data is not a realm in which people want to experiment. "Storage is the only thing that keeps working when all the power is turned off to the data center," said Kidd. "The switches don't switch, the servers don't serve, the applications don't run, but the storage keeps storing." Kidd admits that there are domains of storage in which people are willing to experiment, for example, storage of log files or other data that is volatile but arrives in great volume. In these areas, open source storage is used more often.

In addition, Kidd argues that the open source process is naturally iterative, and iteration is not something that can be tolerated for mission-critical systems. "Customers are conservative around the technologies they choose to store their data. But open source has always depended on a few people to try open source to solve a problem, learn something from it, give feedback to the community. Open source iterates toward quality," said Kidd.

“It is a creative process. It’s not clear that Linux had a clear set of product requirements that it followed linearly to get to where it is today from when it started 20+ years ago. There is this process of wandering that is healthy for some problems and won’t be tolerated in others.”

This difference between computer science and engineering also explains why ERP has resisted open source. Writing software that handles all the complex accounting rules around the world is much more an engineering problem than a computer science problem. So are many of the other realms automated by ERP.

For storage, the problem is that dozens of different kinds of hardware are involved, each with its own quirks and weird failure modes. To complete the engineering solution, you need to see all the problems. "Engineering is done at large scale. Computer science is done at small scale. To test storage software requires a lot of storage. Disk drive manufacturers don't make software and vice versa," said Kidd. "We've learned our storage software is hardened by dealing with so many disk drives over so many generations."

Kidd is careful to point out that he is not at war with computer science or with open source. He just thinks that business people will make better decisions if they understand the relationship between computer science and engineering.

“Are there any engineering victories that don’t start with computer science? Maybe a few,” said Kidd. “But there are some problems that started by computer science that need a lot of engineering to finish, and storage is one of them. Until the hardware is standardized and works almost perfectly, I don’t see open source providing a storage solution suitable for enterprise applications.”
