To mark World Digital Preservation Day, we would like to address the topic of long-term preservation. Needless to say, a comprehensive overview is out of the question; the topic is simply too diverse and complex. Instead, this blog entry examines long-term preservation as an ongoing process of communication.
Challenges of digital long-term preservation
Digital long-term preservation can be defined as ensuring that digital data remains curated, accessible and usable over time. What sounds straightforward enough poses a major challenge in practice. Data is information in encoded form: it can only be viewed with suitable software, and it can only be saved on storage devices (e.g. hard drives) that are not permanently stable. Defects can occur without warning; at worst the information is lost entirely, at best a complicated recovery is needed. Read the Explora story “Surfing the sea of data” for further information on this. By contrast, damage to a physical object such as a book or letter can, if spotted early, be repaired or stabilised with appropriate conservation methods, preventing the partial or total loss of the item.
Safeguarding data technically
Long-term preservation of digital data therefore boils down to countering the instability of the storage medium and ensuring that the binary code, also known as the bitstream, can always be interpreted and used correctly.
We tackle the storage problem with backups and by replacing ageing hardware. To preserve the bitstream, we use software that calculates a characteristic checksum for each file with a hashing algorithm. Recomputing the checksum later and comparing it with the stored value lets us verify that the bitstream has not changed.
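The blog does not say which software or algorithm is used. As a minimal sketch, assuming SHA-256 from Python's standard `hashlib` module, a fixity check records a checksum when a file is archived and recomputes it later; any change to the bitstream, even a single altered byte, produces a different value. The function names and the file name `thesis.pdf` are hypothetical.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Compute the SHA-256 checksum of a file, reading it in chunks
    so that large files do not have to fit into memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_fixity(path: Path, expected: str) -> bool:
    """Return True if the file's current checksum matches the stored one."""
    return sha256_of(path) == expected.lower()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        # Hypothetical archived file, used only for illustration.
        f = Path(tmp) / "thesis.pdf"
        f.write_bytes(b"original content")
        stored = sha256_of(f)  # checksum recorded at ingest
        print(verify_fixity(f, stored))  # True: bitstream unchanged
        f.write_bytes(b"original contend")  # simulate silent corruption
        print(verify_fixity(f, stored))  # False: bitstream has changed
```

A checksum only detects damage; it cannot repair it. In practice it is paired with backups, so that a file failing the check can be restored from an intact copy.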
Guaranteeing the interpretability of data
To guarantee interpretability, we consider which file formats are best suited and which metadata to store alongside the data, so that the data remains legible. In doing so, we assume that rapid technological development will require ongoing maintenance of the data and, sooner or later, migration, i.e. conversion into new formats. Another approach is emulation, i.e. recreating a suitable software environment in which the data can be used and interpreted.
Digital long-term archiving as continuous communication
Both approaches presuppose that long-term preservation is understood as an ongoing process of communication. We communicate the information we deem necessary to largely unknown recipients in the near or distant future. Today, we consider what will be needed to interpret the data in a future environment that is unknown to us. Ideally, the package of data and metadata that we assemble and archive will be self-explanatory, so that a recipient we do not know can read and interpret the data without further queries.
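One way to make a package of data and metadata describe itself is to store descriptive metadata and checksums as plain-text files next to the payload, so a future recipient needs nothing more than a text editor to understand what the package contains. The sketch below loosely follows the layout of the BagIt packaging format (RFC 8493); the function names, the example file and the metadata values are hypothetical, and the output is not guaranteed to be a fully valid bag.

```python
import hashlib
import tempfile
from pathlib import Path


def _sha256(path: Path) -> str:
    """Checksum of a single file (read whole; fine for small examples)."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def build_package(root: Path, files: dict[str, bytes], metadata: dict[str, str]) -> None:
    """Write payload files plus descriptive and fixity metadata into one directory.

    Layout loosely modelled on BagIt: payload in data/, a declaration in
    bagit.txt, descriptive metadata in bag-info.txt and checksums in
    manifest-sha256.txt. This is a sketch, not a validated bag.
    """
    data_dir = root / "data"
    data_dir.mkdir(parents=True, exist_ok=True)
    for name, content in files.items():
        (data_dir / name).write_bytes(content)
    (root / "bagit.txt").write_text(
        "BagIt-Version: 1.0\nTag-File-Character-Encoding: UTF-8\n", encoding="utf-8"
    )
    # Human-readable "Label: value" lines describing the package.
    (root / "bag-info.txt").write_text(
        "".join(f"{key}: {value}\n" for key, value in metadata.items()), encoding="utf-8"
    )
    # One "checksum  relative/path" line per payload file.
    manifest = [f"{_sha256(data_dir / name)}  data/{name}\n" for name in sorted(files)]
    (root / "manifest-sha256.txt").write_text("".join(manifest), encoding="utf-8")


def check_package(root: Path) -> list[str]:
    """Return the payload paths whose checksum no longer matches the manifest."""
    failures = []
    for line in (root / "manifest-sha256.txt").read_text(encoding="utf-8").splitlines():
        expected, rel_path = line.split(maxsplit=1)
        if _sha256(root / rel_path) != expected:
            failures.append(rel_path)
    return failures


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        pkg = Path(tmp) / "example-package"  # hypothetical package name
        build_package(
            pkg,
            {"measurements.csv": b"t,value\n0,1.5\n"},
            {
                "Source-Organization": "Example Institute (hypothetical)",
                "External-Description": "Example dataset (hypothetical)",
            },
        )
        print(check_package(pkg))  # []
        (pkg / "data" / "measurements.csv").write_bytes(b"t,value\n0,1.6\n")
        print(check_package(pkg))  # ['data/measurements.csv']
```

Plain-text tag files are a deliberate choice here: they remain readable without any particular software, which fits the goal of a package that explains itself to an unknown future recipient.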
Illustration: Andres Bucher
Nevertheless, our efforts should not be seen as a mere message in a bottle that might be discovered in thousands of years. Instead, continuity in these communication activities may be crucial to the success of long-term archiving for generations to come. Besides the (meta)data we define today as important or essential, new metadata that is just as important will accumulate through the use of new formats and software. This cycle of information and the continuity of its transfer must not be broken.
ETH Data Archive
To store data and metadata safely for the medium and long term, we therefore promote research data management and digital curation as part of everyday research practice. With the ETH Data Archive, we operate a long-term archiving solution for ETH Zurich that is used, for instance, to safeguard digital collections and to preserve digital publications and research data in the Research Collection.
Please don’t hesitate to contact us. We would be glad to answer any questions you may have or advise you on your next research project: firstname.lastname@example.org
DOI Link: 10.16911/ethz-ib-3129-en