Blueprints Of NSA's Ridiculously Expensive Data Center In Utah Suggest It Holds Less Info Than Thought

This article is more than 10 years old.

For the last two months, we’ve been bombarded with stories about the information-collection practices of the NSA thanks to documents leaked by the agency’s most regretted contract employee, Edward Snowden. The degree of forced exposure has gotten to the point that once-secret information gathered for the agency -- whose acronym is jokingly said to stand for “No Such Agency” and “Never Say Anything” -- was the subject of a press release on Friday; the Office of the Director of National Intelligence announced that it got the legal sign-off for a fresh batch of “telephony metadata in bulk” from companies such as Verizon and AT&T – despite continuing controversy over that collection including the call records of millions of Americans who are neither terrorists nor criminal suspects.

The NSA will soon cut the ribbon on a facility in Utah built to help house and process data collected from telephone and Internet companies, satellites, fiber-optic cables and anywhere else it can plant listening devices. An NSA spokesperson says the center will be up and running by the "end of the fiscal year," i.e., the end of September. Much has been written about just how much data that facility might hold, with estimates ranging from “yottabytes” (in Wired) to “5 zettabytes” (on NPR), a.k.a. words that you probably can’t pronounce that translate to “a lot.” A guide from Cisco explains that a yottabyte = 1,000 zettabytes = 1,000,000 exabytes = 1 billion petabytes = 1 trillion terabytes. For some sense of scale, you would need just 400 terabytes to hold all of the books ever written in any language. Dana Priest at the Washington Post decided to go with a simpler, non-technical approximation, saying the million-square-foot facility will store “oceans of bulk data.”
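For readers who want to check the unit ladder themselves, here is a minimal sketch in Python. It assumes decimal (power-of-ten) units, as in the Cisco guide quoted above; the `convert` helper and the `UNITS` table are illustrative names, not anything from the reporting.

```python
# Decimal (SI) storage units, expressed in bytes.
# Assumption: powers of ten, matching the "1,000 zettabytes" ladder quoted above.
UNITS = {
    "terabyte": 10**12,
    "petabyte": 10**15,
    "exabyte": 10**18,
    "zettabyte": 10**21,
    "yottabyte": 10**24,
}


def convert(amount, from_unit, to_unit):
    """Convert an amount of storage from one named unit to another."""
    return amount * UNITS[from_unit] / UNITS[to_unit]


# The ladder from the text: one yottabyte in each smaller unit.
yb_in_zb = convert(1, "yottabyte", "zettabyte")  # 1,000
yb_in_eb = convert(1, "yottabyte", "exabyte")    # 1,000,000
yb_in_pb = convert(1, "yottabyte", "petabyte")   # 1 billion
yb_in_tb = convert(1, "yottabyte", "terabyte")   # 1 trillion

# How many "all books ever written" libraries (400 TB each, per the text)
# would fit in the 5-zettabyte estimate?
ALL_BOOKS_TB = 400
libraries_in_5_zb = convert(5, "zettabyte", "terabyte") / ALL_BOOKS_TB

print(f"{libraries_in_5_zb:,.0f} copies of every book ever written")
```

On those figures, 5 zettabytes works out to 12.5 million copies of the 400-terabyte library.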

However, based on blueprints of the facility obtained by FORBES – and published here for the first time -- experts estimate that the storage capacity of the data center is lower than has previously been reported given the technology currently available and the square footage that the center has allocated for its servers.

The motherlode for the massive, multi-building site is a set of four identical data halls; they are flanked by buildings containing power sources, batteries and back-up generators, as well as an administrative building and a kennel (dogs are part of the site’s security plan). Within those data halls, an area in the middle of the room – marked “MR – machine room/data center” on the blueprints – is the juicy center of the information Tootsie pop, where the digital dirt will reside. It’s surrounded by cooling and power equipment, which take up a goodly part of the floor space, leaving just over 25,000 square feet per building for data storage, or 100,000 square feet for all four buildings, which is the equivalent of a Wal-Mart superstore.

Brewster Kahle is the engineering genius behind the Internet Archive, which is kind of like the NSA for the public Web. The NSA data center will accumulate private interactions and information and make them searchable; the Internet Archive’s Wayback Machine does the same thing for the open Web for historical purposes. Kahle estimates that a space of that size could hold 10,000 racks of servers (assuming each rack takes up 10 square feet). "One of these racks cost about $100,000," says Kahle. "So we are talking $1 billion in machines."

Kahle estimates each rack would be capable of storing 1.2 petabytes of data. He says that voice recordings of all the phone calls made in the U.S. in a year would take up about 272 petabytes, or roughly 227 of those 10,000 racks.

If Kahle’s estimations and assumptions are correct, the facility could hold up to 12,000 petabytes, or 12 exabytes – which is a lot of information(!) – but is not of the scale previously reported. Previous estimates would allow the data center to easily hold hypothetical 24-hour video and audio recordings of every person in the United States for a full year. The data center’s capacity as calculated by Kahle would only allow the NSA to create Beyonce-style archives for the 13 million people living in the Los Angeles metro area.
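Kahle's back-of-the-envelope math can be reproduced in a few lines. This is a sketch of the arithmetic only, using the figures quoted above (100,000 square feet, 10 square feet per rack, 1.2 petabytes and $100,000 per rack); the variable names are ours.

```python
# Inputs as quoted in the article.
FLOOR_SQFT = 100_000          # machine-room space across all four data halls
SQFT_PER_RACK = 10            # Kahle's assumed footprint per rack
PB_PER_RACK = 1.2             # Kahle's assumed storage per rack, in petabytes
COST_PER_RACK_USD = 100_000   # Kahle's quoted price per rack
US_CALLS_PB_PER_YEAR = 272    # Kahle's figure for a year of U.S. call audio

racks = FLOOR_SQFT // SQFT_PER_RACK
capacity_pb = racks * PB_PER_RACK
capacity_eb = capacity_pb / 1000          # 1 exabyte = 1,000 petabytes
hardware_cost_usd = racks * COST_PER_RACK_USD
racks_for_calls = US_CALLS_PB_PER_YEAR / PB_PER_RACK

print(f"racks:            {racks:,}")
print(f"capacity:         {capacity_pb:,.0f} PB ({capacity_eb:.0f} EB)")
print(f"hardware cost:    ${hardware_cost_usd:,}")
print(f"racks for calls:  {racks_for_calls:.0f}")
```

Change `SQFT_PER_RACK` or `PB_PER_RACK` and the total moves in direct proportion, which is why the experts' estimates diverge so widely.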

Even that reduced number struck Internet infrastructure expert Paul Vixie as high given the space allocated for data in the facility. He came up with a lower estimate. Assuming larger, 13-square-foot racks would be used, factoring in space between the racks, and assuming a lower amount of data storage per rack, he arrived at less than 3 exabytes of data capacity for the facility. That would only allow for 24-hour recordings of what every one of Philadelphia’s 1.5 million residents was up to for a year. (But who would want to watch that?) Still, he says that's a lot of data, pointing to a 2009 article about Google planning multiple data centers for a single exabyte of info.

"For all I know, Google has hit and exceeded that target by now," says Vixie. "But in 2009, one XB split across many global data centers was a lot."

William Binney, a former employee of the NSA turned whistleblower, believes the agency is guilty of unconstitutional information gathering on American citizens, and that the sheer size of the data centers in Utah and elsewhere suggests that the agency wants to vacuum up everything it can, including the content of people's phone calls and emails. He points to a former FBI agent claiming on CNN that the government could listen to phone conversations that Boston marathon bomber Tamerlan Tsarnaev had with his wife before the bombing. When a CNN correspondent expressed surprise that it would be possible to get access to those pre-attack phone calls, the former FBI agent responded, "Welcome to America. All of that stuff is being captured as we speak whether we know it or like it or not."

Binney gave the 5 zettabyte estimate to NPR and also included it in an affidavit filed in Jewel vs. NSA, a lawsuit filed by AT&T customers over the NSA's placing surveillance equipment in a secret room at one of the telecom's outposts in San Francisco. His estimate is based on the assumption the facility might offer equipment like that developed by Cleversafe. The company says it has a 10-exabyte data storage system that involves portable data centers with 21 racks each:

Cleversafe is the only company that can deliver on this requirement today. Its 10 Exabyte data storage system configuration, which uses the same innovative object-based dispersed storage technology originally developed by the company, has been expanded to allow for an independent scaling of storage capacity and performance through a Portable Datacenter (PD), a collection of storage and network racks that can be easily deployed or moved. Each PD contains 21 racks with 189 Storage Nodes per PD and 45 3TB drives per Storage Node. This geographically distributed PD model allows for rapid scale and mobility and is further optimized for site failure tolerance and high availability. The company’s current configuration includes 16 sites across the U.S. with 35 PDs per site and hundreds of simultaneous readers/writers to deliver instantaneous access to billions of objects.

Binney read this as meaning that 21 racks could hold 10 exabytes, and assumed an efficient rack size of 4 square feet, which he says is standard for government use. However, he misread the Cleversafe marketing materials. They actually say that 560 portable data centers of 21 racks each (or 11,760 racks) can hold 10 exabytes. Chris Gladwin, founder of Cleversafe, says that in January of 2012, a 10-exabyte storage system would have needed "about 2 million square feet."
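The Cleversafe numbers can be checked directly from the marketing copy quoted above. The sketch below multiplies out the raw drive capacity; the gap between that raw total and the advertised 10 exabytes is presumably redundancy overhead in the dispersed-storage scheme, but that reading is our assumption, not something the company states here.

```python
# Figures from the Cleversafe description quoted above.
SITES = 16
PDS_PER_SITE = 35        # Portable Datacenters per site
RACKS_PER_PD = 21
NODES_PER_PD = 189       # Storage Nodes per Portable Datacenter
DRIVES_PER_NODE = 45
TB_PER_DRIVE = 3

total_pds = SITES * PDS_PER_SITE
total_racks = total_pds * RACKS_PER_PD

# Raw (unformatted, pre-redundancy) capacity.
raw_tb_per_pd = NODES_PER_PD * DRIVES_PER_NODE * TB_PER_DRIVE
raw_tb_total = raw_tb_per_pd * total_pds
raw_eb_total = raw_tb_total / 1_000_000   # 1 exabyte = 1,000,000 terabytes
raw_tb_per_rack = raw_tb_per_pd / RACKS_PER_PD

print(f"portable data centers: {total_pds}")
print(f"racks:                 {total_racks:,}")
print(f"raw per 21-rack PD:    {raw_tb_per_pd:,} TB")
print(f"raw per rack:          {raw_tb_per_rack:,.0f} TB")
print(f"raw total:             {raw_eb_total:.2f} EB")
```

A single 21-rack unit comes to about 25.5 petabytes of raw disk, nowhere near 10 exabytes, and the per-rack figure of roughly 1.2 petabytes lines up with Kahle's assumption.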

"A 10 EB 'usable' system today would take about 1 million square feet," he says by email. "And in a couple years, you could build a 10 EB system in half that space."

Back to Kahle's estimate: he thought the power bill for supporting a facility with that many servers would be $70 million per year for 75 megawatts, based on each rack consuming 5 kilowatts of power, plus air conditioning. Wired estimated a $40 million bill based on a 65-megawatt power demand. The NSA declined to make a representative available for this story or confirm the authenticity of the data center blueprints. However, its director for installations and logistics, Harvey Davis, told NPR the annual maintenance bill on the center would be $20 million.
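The two power estimates imply different electricity prices, which a quick calculation makes visible. This sketch assumes the facility draws its full load around the clock (8,760 hours a year), a simplification of ours; neither Kahle nor Wired states a utilization figure here.

```python
HOURS_PER_YEAR = 24 * 365   # assumption: constant full load all year

# Kahle: 10,000 racks at 5 kW each, plus air conditioning, for 75 MW total.
RACKS = 10_000
KW_PER_RACK = 5
it_load_mw = RACKS * KW_PER_RACK / 1000
kahle_total_mw = 75
cooling_overhead_ratio = kahle_total_mw / it_load_mw


def implied_price_per_kwh(annual_bill_usd, demand_mw):
    """Electricity price implied by an annual bill at constant demand."""
    kwh_per_year = demand_mw * 1000 * HOURS_PER_YEAR
    return annual_bill_usd / kwh_per_year


kahle_rate = implied_price_per_kwh(70_000_000, 75)
wired_rate = implied_price_per_kwh(40_000_000, 65)

print(f"IT load:               {it_load_mw:.0f} MW")
print(f"total / IT load ratio: {cooling_overhead_ratio:.1f}")
print(f"Kahle implied rate:    ${kahle_rate:.3f}/kWh")
print(f"Wired implied rate:    ${wired_rate:.3f}/kWh")
```

Under that assumption, Kahle's figures imply roughly 11 cents per kilowatt-hour and Wired's roughly 7 cents, so most of the gap between the two bills comes from the assumed price of power rather than the assumed demand.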

Randy Sparks, a consultant who has helped construct data centers for a major tech company and Lucasfilm, says any estimation based on a blueprint is going to be heavily dependent on very specific assumptions. Some data centers privilege computing power over storage power, and we don’t know for sure what the NSA wants from this facility as it has not released specific information about the center’s storage capacity or function.

“It’s somewhat immaterial,” says Sparks. “Given the rapid advancements in data storage technology, is there really an upper limit to the amount of data that can be stored?”

All experts queried for this piece cited Moore’s Law, saying that storage capacity will increase exponentially over the years. Per that principle, capacity will double every year and a half, meaning that the NSA’s ability to store information will only increase.
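To see what doubling every year and a half would mean in practice, here is an illustrative projection. It takes Kahle's 12-exabyte estimate as a starting point and assumes the doubling applies cleanly to the same floor space; both are assumptions for the sake of the arithmetic, not a forecast from any of the experts.

```python
import math

BASELINE_EB = 12            # Kahle's estimate for the facility today
DOUBLING_PERIOD_YEARS = 1.5  # "every year and a half," per the text


def capacity_after(years):
    """Projected capacity in exabytes after the given number of years."""
    return BASELINE_EB * 2 ** (years / DOUBLING_PERIOD_YEARS)


def years_to_reach(target_eb):
    """Years of steady doubling needed to grow from the baseline to target_eb."""
    return DOUBLING_PERIOD_YEARS * math.log2(target_eb / BASELINE_EB)


for years in (3, 6, 15):
    print(f"after {years:>2} years: {capacity_after(years):,.0f} EB")

# 5 zettabytes = 5,000 exabytes, the figure Binney gave NPR.
print(f"years to reach 5 ZB: {years_to_reach(5_000):.1f}")
```

On these assumptions, the same footprint would take about 13 years of uninterrupted doubling to reach the 5-zettabyte figure.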

“I always build everything expandable,” said the NSA’s Davis in his interview with The Salt Lake Tribune.

Plus, this is just one NSA data center of many, as detailed in Dana Priest's recent comprehensive overview of the NSA's expanding operations in the Washington Post.

Given the increasing ease of keeping data, Kahle is worried about a future in which we live in a completely-surveilled society, where any information that can be collected will be collected and will go into a data warehouse for potential analysis.

“If there’s the will and the opportunity, there is the technical ability and the cost is low,” he says. “You don’t need dossiers of people if you just collect all of the data and can collate it at any time.”