Bitstream preservation
What is a bitstream?
A bitstream (or bit stream) is the sequence of digital bits, zeros and ones, that represents a digital object. A bit (short for "binary digit") is the fundamental unit of information that computers work with, so every piece of digital information ultimately comes down to a bitstream. The term refers to how information is physically processed and stored, either in electronic circuits or on physical storage media. A bitstream represents a digital object, such as a file containing research data. Files have specific file formats, but when we talk about the bitstream we usually set file formats aside and focus on the raw sequence of zeros and ones.
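To make the idea concrete, the following Python sketch prints the raw bits behind a short piece of text. The helper name `to_bitstream` and the sample string are illustrative assumptions, not part of any standard tool.

```python
def to_bitstream(data: bytes) -> str:
    """Return the bits of `data` as zeros and ones, eight per byte."""
    return " ".join(f"{byte:08b}" for byte in data)


# Three ASCII characters become three bytes, i.e. 24 bits.
sample = "Hi!".encode("ascii")
print(to_bitstream(sample))  # 01001000 01101001 00100001
```

The same function works on the bytes of any file, whatever its format, which is why bitstream preservation can ignore file formats entirely.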
What does bitstream preservation mean?
The objective of bitstream (or: bit-level) preservation is to maintain an unaltered, physical representation of the digital object in its original state at the time of its delivery.
The preservation of the unaltered digital object is a basic prerequisite for its digital long-term preservation, but it is not sufficient on its own to guarantee access to the content of digital data over time.
In summary, bitstream preservation is mostly concerned with the mere storage of a digital object, whereas the task of archiving takes into account the preservation of its content and its authentic meaning.
Long-term archiving is more than bitstream preservation
Even if the integrity of the physical bitstream is maintained, other barriers can prevent reuse of the data after a long period of time. In particular, the ability to interpret the bitstream correctly may be lost: media, file formats, software, and storage locations may become obsolete and thus inaccessible or unusable. To overcome these obstacles, bitstream preservation needs to be complemented by additional measures. These are referred to as content preservation and include migration, emulation, or mixed strategies for long-term archiving.
Challenges on the bitstream level
Various forms of decay can challenge the integrity of a bitstream:
-
Data degradation, or bit rot, is the gradual deterioration of a storage medium such as a hard drive or solid-state media. As a result, the stored bitstream is altered in an uncontrolled manner or lost completely.
-
Risks and disasters such as fire, earthquake, flood, lightning, power failures, airplane crashes, nuclear accidents, war damage, terrorist attacks or acts of sabotage threaten to physically destroy the storage facility or make it inaccessible.
The challenge of bitstream preservation itself is addressed by technical measures that include redundant storage and data checks.
Methods to ensure bitstream preservation
Bitstream preservation policies should be set out in a comprehensive storage and backup strategy. Depending on the preservation needs of the data, this strategy may combine various techniques and procedures, some of them complex. [1] The following aspects should be considered for a successful strategy to preserve the bitstream in the long term:
-
Redundant Storage
Analog media tend to degrade gradually and partially, whereas the loss or damage of a bitstream is typically instantaneous and complete. A change to a single bit, whether through decay, unauthorized manipulation, or undocumented alteration, can be enough to make an entire file unreadable or uninterpretable. It is therefore advisable to keep several copies of the data object at all times so that it is never lost completely. The following techniques help ensure that at least one complete copy of the original bitstream always exists:
-
Multiple Copies
According to the 3-2-1 rule, at least three independent copies on two different types of storage media should be kept, with one copy located offsite.
-
Resilient Storage Systems
Storage systems with built-in resilience have become standard today, for example hard drives that operate within a Redundant Array of Independent Disks (RAID). The requirements for long-term preservation differ from those for systems that store data for active use. Some systems combine the requirements for accessibility and long-term preservation, such as the LOCKSS program's open-source technologies and services.
-
Storage Diversity and Geographic Redundancy
Storing data in different locations helps to protect it from local disasters. Choose locations with different disaster threats. Store data in at least one remote location and use different storage technologies to avoid unexpected losses. Different storage technologies use different resiliency mechanisms, which helps to reduce the likelihood of a systematic failure.
-
Data Checks, Fixity Checks and Monitoring
Stored data should be checked regularly, ideally by means of fixity checks. Fixity means that a digital object is fixed, that is, unchanged. A fixity check determines whether an object has been altered without authorization and documentation. In most cases, checksum algorithms are used for this purpose. These algorithms typically rely on cryptographic hash functions, which make it practically infeasible for two different data objects to produce the same checksum (collision resistance). To perform a fixity check, a newly computed message digest is compared with the original one. The checksum or cryptographic hash of a digital object can be stored in its PREMIS record [@PREMIS2015], in a separate database, or in a manifest file. To maintain fixity over time, check it at regular intervals, and also perform a fixity check whenever files are copied or converted to different formats. A format conversion changes the bitstream, so in that case recalculate the checksum or hash for the new file.
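As an illustration of how redundant copies and fixity checks work together, here is a minimal Python sketch. It assumes SHA-256 as the hash function (the text above does not prescribe one) and that the original digest was recorded at ingest; the function names `sha256_of` and `check_copies`, as well as the file names, are hypothetical.

```python
import hashlib
import tempfile
from pathlib import Path
from typing import Dict, Iterable


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Compute the SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check_copies(recorded_digest: str, copies: Iterable[Path]) -> Dict[Path, bool]:
    """Map each copy to True if it exists and still matches the recorded digest."""
    return {
        copy: copy.is_file() and sha256_of(copy) == recorded_digest
        for copy in copies
    }


# Demonstration: temporary files stand in for two storage locations.
with tempfile.TemporaryDirectory() as tmp:
    original = b"research data"
    recorded = hashlib.sha256(original).hexdigest()  # digest recorded at ingest

    copy_a = Path(tmp) / "copy_a.bin"
    copy_b = Path(tmp) / "copy_b.bin"
    copy_a.write_bytes(original)
    # Simulate bit rot: flip the lowest bit of the first byte in copy B.
    copy_b.write_bytes(bytes([original[0] ^ 0x01]) + original[1:])

    for path, intact in check_copies(recorded, [copy_a, copy_b]).items():
        print(path.name, "intact" if intact else "altered or missing")
```

A single flipped bit is enough for the second copy to fail the check, while the intact copy can then be used to restore it. In practice the recorded digest would come from a PREMIS record, a database, or a manifest file rather than being computed on the spot.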
Responsibility for bitstream preservation
At the institutional level, bitstream preservation is the responsibility of technical staff. But the concepts also need to be understood by managers who must assign responsibilities in the research data life cycle and establish monitoring routines for data in long-term storage.
Additional resources
- [@waters1996preserving]
- [@Beagrie2015_pre]
- [@Beagrie2015_star]
- [@Beagrie2015_stor]
- [@Beagrie2015_ris]
- [@forschungsdatenArchiveCalculation]
- [@Ullrich2010]
References
-
The US NDSA has developed a matrix that describes different levels of preservation for different functional areas. Three of the five areas apply to bitstream preservation: storage, integrity and control. Levels of Digital Preservation (ndsa.org).