
Will Your Data Still Be Around Tomorrow?

Xerox

By Jean-Pierre Chanod, Senior Scientist, Xerox Research Centre Europe

If you’ve ever wondered whether, in 10 or 20 years’ time, you’ll still be able to access all of your electronic data and documents, you’re not alone. All of those emails, pictures, videos, websites, and even scientific measurements and medical or legal records could be lost as systems and software change.

Preservation of this digital content is becoming a major concern for organizations with large amounts of data and the need to preserve the critical knowledge contained within.

Digital obsolescence is a threat to preserving the records of the 21st century. Information systems of the future will need to be preservation-aware by design to ensure long-term access to, and the integrity of, valuable economic, cultural and intellectual assets.

Lessons from Mesopotamia

Clay tablets appeared in Mesopotamia around 2400 B.C. They were intended for writings that were meant to last, as opposed to writing on more perishable material like papyrus. Some of these tablets managed to last far beyond the expectations of their authors.

Collections of tablets impressed with a stylus in soft clay and then baked in the sun have reached us today, in various languages and different states of conservation. They cover a range of topics, such as business records, poetry, prayers, hymns, history, divination and science.

These tablets are striking examples of tangible cultural heritage: the legacy of physical artifacts inherited from past generations, which conveys artistic, cultural, religious, documentary or aesthetic meaning, often produced within a long-gone society.

The Problem with Digital

A tangible heritage is one that can be stored and physically touched. Accessing the meaning behind physical items inherited from the past requires the expertise to recover, translate, compare, interpret and contextualize the text inscribed in the clay tablets or in any other ancient artifact, such as the Rosetta Stone.

But what about preserving intellectual assets created today? Physically stored digital data cannot, on its own, be considered a tangible item in the traditional sense of preservation. That’s because it cannot be accessed without a complex computer environment that includes physical storage as well as various combinations of hardware and software.

Moreover, computer environments change at a rapid pace and are quickly obsolete. Digital content that relies on a specific environment from a given point in time is at risk of becoming inaccessible and lost.

Digital obsolescence is a greater threat to the preservation of digital content than the combined hazards facing traditional paper documents, such as acid, mold, and looting. Digital obsolescence happens quickly, is pervasive, and is hard to control.

Obsolescence threatens all aspects of the rendering chain, including:

  • Bits in storage that degrade, or for which readers are no longer available.
  • Data formats with outdated documentation or for which the rendering software has disappeared.
  • Software that runs only on dead or rare devices, or on retired operating systems.
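
The first item, bits that degrade in storage, is the one threat in this list that can be detected mechanically. One common approach, sketched here as an illustration rather than as a method described in this article, is a fixity check: record a cryptographic digest when content is stored and compare it again later. The function names and the sample content below are hypothetical.

```python
import hashlib


def fingerprint(data: bytes) -> str:
    """Return the SHA-256 hex digest of a stored bit stream."""
    return hashlib.sha256(data).hexdigest()


def is_intact(data: bytes, recorded_digest: str) -> bool:
    """True if the bits still match the digest recorded when they were stored."""
    return fingerprint(data) == recorded_digest


# Hypothetical content and the digest recorded at storage time.
original = b"Quarterly ledger, 2013"
recorded = fingerprint(original)

# Simulate a single flipped bit in storage.
damaged = bytes([original[0] ^ 0x01]) + original[1:]
```

A check like this only reveals that bits have changed. It does nothing for the other two threats, obsolete formats and vanished software, which is why they are harder to control.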

Digital Becomes Tangible: Manage the Unpredictable

In the good old days, maintaining the physical integrity of books, newspapers, manuscripts, and pictures was achieved with reasonable effort and care, and at a manageable pace.

There is little hope of securing durable access to digital content without taking specific actions before digital obsolescence comes into play. Soon after the content has been produced, one must make irrevocable decisions about whether it should be sent to the future, and in which form. Otherwise it will be lost forever.

At some point, digital preservation will be integrated seamlessly into the information lifecycle: information systems of the future will be preservation-aware by design.

To make this happen, the digital objects of the future will not be treated simply as bit streams associated with adequate hardware and software at their time of creation. Each will become part of a rich information ecosystem that describes everything essential to know about the object: its purpose, intended behavior, the context within which it was created, the user experience, and more.

To make the descriptions of such ecosystems sustainable, they will be infrastructure-independent, with a strong focus on capturing their temporal evolution and authenticity.
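
As a rough illustration of what a self-descriptive, infrastructure-independent description might look like, the sketch below bundles a few of the properties named above (purpose, intended behavior, creation context) with a digest for authenticity and a dated history for temporal evolution, then serializes everything to plain JSON text. The `PreservationRecord` class, its field names, and the sample values are all assumptions made for this example, not an established preservation schema.

```python
import hashlib
import json
from dataclasses import asdict, dataclass, field


@dataclass
class PreservationRecord:
    """Hypothetical self-description that travels with a digital object."""
    title: str
    purpose: str
    intended_behavior: str
    creation_context: str
    sha256: str                                   # authenticity: digest of the content
    history: list = field(default_factory=list)   # temporal evolution: dated events


def describe(content: bytes, **fields) -> PreservationRecord:
    """Build a record whose digest is computed from the content itself."""
    return PreservationRecord(sha256=hashlib.sha256(content).hexdigest(), **fields)


def record_event(record: PreservationRecord, date: str, event: str) -> None:
    """Append a dated event so later readers can trace how the object changed."""
    record.history.append({"date": date, "event": event})


def to_portable_json(record: PreservationRecord) -> str:
    """Serialize to plain text that does not depend on the creating software."""
    return json.dumps(asdict(record), indent=2, sort_keys=True)


# Hypothetical sample object.
content = b"Minutes of the annual meeting"
record = describe(
    content,
    title="Annual meeting minutes",
    purpose="Legal record of decisions taken",
    intended_behavior="Read-only text, paginated",
    creation_context="Written by the board secretary in a word processor",
)
record_event(record, "2013-05-01", "created")
portable = to_portable_json(record)
```

The point of the design is that the description is ordinary structured text: a future system needs only to read text and understand the field names, not to run the software that produced the original.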

Eventually these descriptions will travel into the future, where yet unknown information systems will need to make sense of them, irrespective of the hardware and software that initially created and used the content. Future generations will then reconstruct not the original digital objects, but new ones. These new objects will convey the essential properties of the originals, albeit rendered in a significantly different mode.

This is a major shift in preservation: one no longer preserves tangible physical objects per se, but rather abstract representations of such objects that can be reconstructed in an unpredictable technological future.

This shift represents a major challenge for the long-term preservation of modern cultural, intellectual and economic assets, the consequences of which are not yet widely recognized. Eventually, digital preservation will become a natural and transparent side effect of the proper governance of information and data.
