
Introduction to digital long-term preservation

Digital long-term preservation can be defined as "a series of managed activities necessary to ensure continued access to digital materials for as long as necessary" [2]. In accordance with the "Guidelines for Safeguarding Good Research Practice" of the DFG, a retention period of ten years for research data has been established in many scientific disciplines [3]. However, research data, and ESS data in particular, can be and often is of great relevance long beyond the ten-year retention period and needs to be preserved accordingly.

Digital long-term archiving aims to ensure the (re)usability and comprehensibility of digital data over an undefined period of time by securing the following properties of the data:

  • Authenticity: "The digital material is what it purports to be. In the case of electronic records, it refers to the trustworthiness of the electronic record as a record. In the case of 'born digital' and digitized materials, it refers to the fact that whatever is being cited is the same as it was when it was first created unless the accompanying metadata indicates any changes. Confidence in the authenticity of digital materials over time is particularly crucial owing to the ease with which alterations can be made." [2]
  • Integrity: "Internal consistency or lack of corruption of digital objects. Integrity can be compromised by hardware errors even when digital objects are not touched, or by software or human errors when they are transferred or processed." [4]
  • Access: "Access is assumed to mean continued, ongoing usability of a digital resource, retaining all qualities of authenticity, accuracy and functionality deemed to be essential for the purposes the digital material was created and/or acquired for." [2]
  • Interpretability: The ability to understand data on a semantic level. Long-term interpretability of digital data requires sufficient descriptive metadata about the digital object on a contextual and content-related level.

Regarding the conceptual structure of a digital archive, the Open Archival Information System (OAIS) Reference Model [5, 1] has established itself as a de facto standard in the archival community.

Challenges

  • Digital obsolescence: Digital data requires a software environment to render and present its content. Continuous and rapid technological change in hardware and software environments, as well as in file formats, poses a major challenge to ensuring continuous, comprehensible access to the content of digital data.
  • Documentation: Digital data needs to be adequately documented and described with metadata on several levels in order to ensure its independent long-term comprehensibility. In addition to technical specifications (e.g. regarding file format, software, etc.) and information on creation context and content, all steps undertaken over time for long-term preservation must also be documented.
  • Issue of scale and appraisal criteria: The volume, diversity and complexity of newly created digital data, the diversity of file formats used, and the size of individual files have been growing continuously, posing an issue of scale on several levels. Not all digital data needs to be, or even can be, preserved indefinitely. Criteria for the appraisal of digital data are therefore required to determine its archival value and the intended timeframe of preservation.
  • Organizational issues: Long-term archiving of digital data requires continuous curation with corresponding allocation of financial and human resources. Responsibilities within an organization (or, in the case of collaborations, with partner organizations) need to be clearly defined and assigned, and preservation planning policies need to be implemented, taking possible organizational changes into account.
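
The documentation challenge above asks that every preservation step taken over time be recorded. As a minimal sketch of what such a record could look like, the following Python example keeps a simple event log. The class name PreservationEvent, its fields, and the helper functions are assumptions made for illustration only; they do not follow any particular metadata standard.

```python
import json
from dataclasses import dataclass, asdict, field
from datetime import datetime, timezone


@dataclass
class PreservationEvent:
    """One documented preservation action (hypothetical, illustrative schema)."""
    object_id: str    # identifier of the archived digital object
    event_type: str   # e.g. "ingest", "fixity check", "format migration"
    outcome: str      # e.g. "success", "failure"
    agent: str        # person or system responsible for the action
    # UTC timestamp in ISO 8601 form, filled in automatically if not given
    timestamp: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )


def append_event(log, event):
    """Append the event to the log as a plain dictionary and return the log."""
    log.append(asdict(event))
    return log


def to_json(log):
    """Serialize the whole log to a JSON string for storage next to the data."""
    return json.dumps(log, indent=2)
```

A log built this way can be stored alongside the archived object, so that later users can trace which actions were applied to it and when.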

Strategies

Bitstream preservation is a strategy to preserve digital data (the exact bit string of a data file) as it was at the point of ingest. It is a basic requirement for digital long-term preservation, but it does not ensure access to the content of digital data over time.
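
One common way to check that a bit string is still exactly what it was at ingest is to record a checksum for each file and recompute it later. The following Python sketch illustrates the idea with SHA-256 from the standard library. The function names and the in-memory manifest are assumptions for illustration, not a description of any specific archive's workflow.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def record_fixity(paths):
    """Build a manifest {file path: digest}, e.g. at the point of ingest."""
    return {str(p): sha256_of(p) for p in paths}


def verify_fixity(manifest):
    """Return the paths that are missing or whose digest no longer matches."""
    changed = []
    for name, expected in manifest.items():
        path = Path(name)
        if not path.is_file() or sha256_of(path) != expected:
            changed.append(name)
    return changed
```

An empty list from verify_fixity means that every file still matches its recorded digest. As the paragraph above notes, this only confirms that the bits are unchanged; it says nothing about whether the content can still be rendered or understood.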

There are two main strategies for digital long-term preservation:

  • The migration strategy ensures continuous access to the content of digital data, as well as its interpretability and usability, by migrating the data to newer file formats once the format currently in use is in danger of becoming obsolete. This requires ongoing monitoring of technological change in order to determine the point in time at which action has to be taken. In addition, documentation of the original file format and systems, as well as a definition of the significant properties to be retained, is necessary. Despite the possibility of information loss during the migration process, this strategy has so far been considered the most feasible one.
  • The emulation strategy, on the other hand, keeps the original data as it is and aims to ensure continuous access to it by (re)creating the original software environment on newer hardware environments and operating systems. This approach has so far proven difficult to implement due to technical and legal challenges.
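
The monitoring step of the migration strategy can be pictured as comparing the holdings of an archive against a watch list of formats considered at risk. The Python sketch below is a simplified illustration: the watch list AT_RISK_FORMATS, its entries, and the function plan_migration are hypothetical, and the mapping is not a format recommendation. It also identifies formats by file extension only, whereas real repositories typically identify formats by file signature.

```python
from pathlib import PurePosixPath

# Hypothetical watch list: at-risk extension -> assumed migration target.
# The entries are placeholders for illustration, not recommendations.
AT_RISK_FORMATS = {
    ".wpd": ".odt",
    ".dbf": ".csv",
}


def plan_migration(file_names, at_risk=AT_RISK_FORMATS):
    """Return (source, target) pairs for files whose extension is on the watch list.

    Files in formats that are not on the list are left out of the plan.
    The original files are not touched; this only proposes target names.
    """
    plan = []
    for name in file_names:
        path = PurePosixPath(name)
        target_suffix = at_risk.get(path.suffix.lower())
        if target_suffix is not None:
            plan.append((name, str(path.with_suffix(target_suffix))))
    return plan
```

For example, plan_migration(["a/report.WPD", "b/data.nc"]) proposes migrating only the first file. The actual conversion, the check that significant properties survived it, and the documentation of the step would all follow separately.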

References


  1. This text is identical to the standard ISO 14721:2012. 

  2. Digital Preservation Coalition, editor. Digital Preservation Handbook. Digital Preservation Coalition, 2nd edition, 2015. URL: https://www.dpconline.org/handbook

  3. Deutsche Forschungsgemeinschaft. Guidelines for Safeguarding Good Research Practice. Code of Conduct. 2022. doi:10.5281/zenodo.6472827

  4. CoreTrustSeal Standards and Certification Board. CoreTrustSeal Trustworthy Data Repositories Requirements: Glossary 2023-2025. 2022. doi:10.5281/zenodo.7051125

  5. The Consultative Committee for Space Data Systems. Recommendation for Space Data System Practices: Reference Model for an Open Archival Information System (OAIS): Recommended Practice CCSDS 650.0-M-2. Volume 2 of Recommended Practice. Magenta Book, Washington, DC, 2012. URL: https://public.ccsds.org/Pubs/650x0m2.pdf