
What If Big Data Is Too Big? A Radical Solution May Be In DNA

NetApp

Where would you store 70 million books?

One answer—albeit very old school—is a really big library. After all, the U.S. Library of Congress has 35 million books; San Francisco’s main library has fewer than 3 million. A more twenty-first century answer is to digitize those books onto hard drives. Hundreds of hard drives.

Or how about encoding it in just one gram of DNA? George Church and Sriram Kosuri from Harvard’s Wyss Institute and Yuan Gao, now at Johns Hopkins, have stored those books—700 terabytes of data—in a single gram of DNA, the stuff that makes us, well, us.


They treated DNA as just another digital storage device, but instead of encoding binary data as magnetic regions on a hard drive, they encoded it in the sequence of chemical building blocks that make up the DNA molecules themselves.

It’s not as off-the-wall an idea as you might think; our genes, made of strands of DNA, are information storehouses. In them are the building plans for every last cell in our bodies; researchers like Church have spent the last few decades learning to decode our genetic information.

Time for a Quick Biology Refresher...

Deoxyribonucleic acid is a large, complex molecule built from repeating chemical units. Four of those units' components, the bases cytosine, adenine, guanine and thymine, are what store the information in our genes and in the researchers' books. The bases occur in pairs, forming the rungs of the molecule's twisted ladder, or double helix. (The rails of the ladder are a sugar-phosphate backbone and aren't used for encoding.)

Church and co. converted a book, Church's own 53,000-word tome Regenesis: How Synthetic Biology Will Reinvent Nature and Ourselves, into 10 MB of binary data (strings of 0s and 1s), making 70 million copies of it. They then translated that code into DNA: zeros were encoded as adenine or cytosine, ones as guanine or thymine. Using standard laboratory gene-synthesizing equipment, they made almost 55,000 strands of DNA, each containing a portion of the text and an address block indicating where that portion occurred in the flow of the book.
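To make the idea concrete, here is a minimal sketch in Python of what a one-bit-per-base encoding with addressed fragments might look like. The fragment size, address width, and exact bit-to-base mapping are illustrative assumptions for this toy example, not the precise parameters the Harvard team used.

```python
# Toy sketch of one-bit-per-base DNA encoding with addressed fragments.
# Assumptions (not the paper's exact parameters): a 0 bit becomes A or C,
# a 1 bit becomes G or T, 96 data bits per fragment, 19-bit address block.

import random

ZERO_BASES = "AC"   # a 0 bit is written as adenine or cytosine
ONE_BASES = "GT"    # a 1 bit is written as guanine or thymine

def bits_to_bases(bits):
    """Encode a string of '0'/'1' characters as DNA bases, one bit per base."""
    return "".join(random.choice(ZERO_BASES if b == "0" else ONE_BASES) for b in bits)

def encode_message(data: bytes, data_bits=96, addr_bits=19):
    """Split a byte string into short, addressed DNA fragments."""
    bitstream = "".join(f"{byte:08b}" for byte in data)
    fragments = []
    for index, start in enumerate(range(0, len(bitstream), data_bits)):
        chunk = bitstream[start:start + data_bits].ljust(data_bits, "0")
        address = f"{index:0{addr_bits}b}"   # where this chunk sits in the book
        fragments.append(bits_to_bases(address + chunk))
    return fragments

def decode_fragment(fragment, addr_bits=19):
    """Recover (address, data bits) from one DNA fragment."""
    bits = "".join("0" if base in ZERO_BASES else "1" for base in fragment)
    return int(bits[:addr_bits], 2), bits[addr_bits:]

if __name__ == "__main__":
    strands = encode_message(b"Regenesis, chapter one...")
    print(len(strands), "fragments, e.g.", strands[0])
    print("fragment 0 decodes to address", decode_fragment(strands[0])[0])
```

Reassembling the book is then a matter of sequencing the strands, decoding each one, and sorting the chunks by their address blocks.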

Church and his colleagues are not the first to store binary information in DNA, but they stored a tremendous amount of it and were able to write it and read it back, using commercially available gene-synthesis and sequencing machinery, in just a couple of weeks.

The results, published in the journal Science, were astonishing, and they garnered interest throughout the corporate world. “Every storage and many device makers as well as the large data companies have contacted us. You would know the names,” says Church. This might not solve every big-data problem—fast access to the data is an issue—but as a solution to large-scale archival storage it could be a godsend.

But Why DNA?

As our own Dave Einstein reported, “The capacity of hard drives isn’t increasing fast enough to keep up with the explosion of digital data worldwide. Forecasts call for a 50-fold increase in global data by 2020, but hard drives may grow only by a factor of 15.”

But DNA is unimaginably dense. Four grams could theoretically hold 1.82 trillion gigabytes: roughly all the data the world produces in a year.
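For the curious, a back-of-the-envelope check shows where a figure of that scale comes from. The numbers below are illustrative assumptions (about 330 g/mol per nucleotide and a theoretical two bits per base), not figures from the article, so treat this as a sanity check of the order of magnitude rather than its source.

```python
# Rough check of DNA's theoretical storage density.
# Assumptions: ~330 g/mol per nucleotide, 2 bits encoded per base.
AVOGADRO = 6.022e23
GRAMS_PER_MOL_NUCLEOTIDE = 330

nucleotides_per_gram = AVOGADRO / GRAMS_PER_MOL_NUCLEOTIDE   # ~1.8e21 bases
bytes_per_gram = nucleotides_per_gram * 2 / 8                # ~4.6e20 bytes
print(f"four grams ~ {4 * bytes_per_gram / 1e9:.2e} gigabytes")  # ~1.8e12 GB
```

That works out to roughly 1.8 trillion gigabytes in four grams, in line with the estimate above.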

It requires no power, only a dry, non-volcanic climate, and can last indefinitely. No cold-aisle datacenter required. This is chemistry, not electro-mechanics; the unique chemical properties that allow DNA to hold life’s hereditary information make it possible.

Why Now?

Reading information from DNA and writing it into DNA are still much slower than any of today's computer storage technology, but both are improving rapidly. While current storage technology is improving at only about 50% per year, gene-sequencing technology has improved by orders of magnitude in the last decade.

It took years to analyze a single human genome for the original Human Genome Project; now it takes just a few hours. And costs have dropped from $10,000 per million base pairs of DNA in 2001 to about 10 cents per million in 2012. Church says he can see another 1,000-fold improvement in the next few years.
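Put another way, that price drop is a factor of 100,000 in eleven years. The snippet below is just arithmetic on the article's own figures, not an additional claim, and it makes the implied rate explicit:

```python
# Doubling (halving) time implied by the article's cost figures:
# $10,000 per million base pairs in 2001 -> $0.10 per million in 2012.
import math

factor = 10_000 / 0.10            # 100,000-fold cheaper
years = 2012 - 2001
halvings = math.log2(factor)      # ~16.6 halvings
print(f"cost halves roughly every {12 * years / halvings:.1f} months")  # ~8 months
```

A halving time of around eight months is considerably faster than the pace of improvement in conventional storage.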

With the reams of digital data we’re creating, there’s an immense potential for DNA to be a stable, long-term archive for ordinary information, such as photographs, books, financial records, medical files, and videos—all of which today are stored as computer code on fallible, power-hungry storage devices that, unlike DNA, become obsolete.

What's Next?

Church’s next project is to build a biological VCR. His vision is to record everything that happens around us and archive the information in DNA (the volume of data would make it impossible with today’s methods). Ever the researcher, Church foresees being able to go back through that information and study pivotal events in people’s lives.
