The lifetime of a major academic institution is measured in centuries, yet the lifetime of the computer media that archive its scientific data is measured in decades or less. Centinel is a project to perfect methods for archiving digital scientific data so that the archives can be read reliably by the computers that will exist a century from now, with no human effort needed in the intervening years. That might seem a hopeless goal, given the continual, rapid, and unpredictable changes in methods and technology. But attention to what has lasted through past centuries, and to what it means to be "digital", suggests a solution.
The Centinel method targets the kind of moderate-volume scientific data that was once recorded in scientific notebooks but is now typically stored in assorted databases on assorted computer media. The method applies to printed paper media the same kind of error-detecting and error-correcting codes that are now used to make magnetic and optical computer media reliable. It uses (1) a simple data format accessible to both human readers and computers; (2) a digital error-correcting code for symbols, related to what computer science calls Hamming codes; (3) complete documentation of the mathematical and logical methods; (4) multiple independent copies of the archives; and (5) printed or inscribed media that do not depend on present or future technologies.
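The text does not spell out Centinel's actual symbol code, so the sketch below uses the classic Hamming(7,4) code purely to illustrate the principle behind item (2): a few extra check bits let a reader not only detect but locate and repair a single damaged bit. The function names (`encode`, `decode`, `encode_text`, `decode_text`) are hypothetical, and splitting each ASCII character into two 4-bit halves is an assumption made for illustration, not Centinel's method.

```python
# Illustrative Hamming(7,4) code: 4 data bits -> 7-bit codeword,
# correcting any single flipped bit. Not Centinel's actual code.
# Bit positions are numbered 1..7; parity bits sit at positions 1, 2 and 4.

def encode(data: list[int]) -> list[int]:
    """Encode 4 data bits [d1, d2, d3, d4] into a 7-bit codeword."""
    if len(data) != 4 or any(b not in (0, 1) for b in data):
        raise ValueError("expected exactly 4 bits")
    d1, d2, d3, d4 = data
    p1 = d1 ^ d2 ^ d4  # parity over positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # parity over positions 2, 3, 6, 7
    p4 = d2 ^ d3 ^ d4  # parity over positions 4, 5, 6, 7
    # Positions 1..7 hold: p1 p2 d1 p4 d2 d3 d4
    return [p1, p2, d1, p4, d2, d3, d4]


def decode(code: list[int]) -> tuple[list[int], int]:
    """Return (4 data bits, position of the corrected bit or 0 if none)."""
    c = list(code)  # copy so the caller's codeword is not modified
    # XOR of the positions of all 1-bits is the syndrome: 0 for a valid
    # codeword, otherwise the position of the single flipped bit.
    syndrome = 0
    for pos in range(1, 8):
        if c[pos - 1]:
            syndrome ^= pos
    if syndrome:
        c[syndrome - 1] ^= 1  # repair the damaged bit
    return [c[2], c[4], c[5], c[6]], syndrome


def _nibble_bits(n: int) -> list[int]:
    """4-bit value -> bits, most significant first."""
    return [(n >> i) & 1 for i in (3, 2, 1, 0)]


def _bits_nibble(b: list[int]) -> int:
    """Bits (most significant first) -> 4-bit value."""
    return (b[0] << 3) | (b[1] << 2) | (b[2] << 1) | b[3]


def encode_text(text: str) -> list[list[int]]:
    """Encode each ASCII character as two codewords (high and low nibble)."""
    words = []
    for ch in text.encode("ascii"):
        words.append(encode(_nibble_bits(ch >> 4)))
        words.append(encode(_nibble_bits(ch & 0xF)))
    return words


def decode_text(words: list[list[int]]) -> str:
    """Decode pairs of codewords back to ASCII text, fixing single-bit errors."""
    out = bytearray()
    for i in range(0, len(words), 2):
        hi, _ = decode(words[i])
        lo, _ = decode(words[i + 1])
        out.append((_bits_nibble(hi) << 4) | _bits_nibble(lo))
    return out.decode("ascii")
```

For example, `w = encode_text("42.7")` followed by damaging one bit with `w[3][5] ^= 1` still yields `decode_text(w) == "42.7"`. Hamming(7,4) was chosen here only because it is the smallest textbook member of the Hamming family; a real archive code would likely work on larger symbols and guard against more than one error per block.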
A prototype of the system has been tested on the core datasets of the Cedar Creek Natural History Area.
The project's objectives are:
- To perfect a method for long-term digital data storage
- To raise awareness of digital data loss and advance the art of digital preservation
- To gain experience by implementing the method on a specific, well-defined application
- To produce a viable archive for that application (the long-term wetland data of Gorham et al.)