Keep or delete data? - A decision guide

This article is adapted from the corresponding article by [@anders_2024].

Introduction

Data is of great importance in both private and professional environments, as it is essential for communication, organisation, entertainment, business decisions, innovation and security. The effective use and management of data (including data protection aspects and, if necessary, the deletion of data) are therefore crucial for success, efficiency and security in various areas of life.

Data also plays an important role in scientific research. It not only forms the basis for findings, but is also crucial for the reproducibility, traceability and integrity of research results and may be reused for further research. The long-term storage and accessibility of data may therefore be necessary. Data is made accessible or preserved in databases, repositories and archives. On the other hand, sensitivity aspects may require the secure deletion of data.

As technological and methodological capabilities advance, the volume of data continues to increase; consider, for example, the huge amounts of data generated by climate modelling. This also increases the need to sort research data into data that is still relevant (to be retained) and data that is no longer relevant (to be deleted). Many researchers struggle to delete data. Whether data should or can be retained or deleted, and how it should be stored, depends on various factors and can vary with the context. The main aspects to consider are:

Relevance and benefit

Start by checking whether data is still relevant and useful. For example, data that is outdated, inaccurate or no longer required can be deleted. Secondary use, i.e., the use of the data in a different context, needs to be taken into account. Only data that is relevant for current and future purposes should be retained.

Data protection and data security

Data protection and security aspects must be considered before deleting or retaining data. Personal or otherwise sensitive data must be securely stored or, if necessary, securely deleted in accordance with data protection regulations in order to avoid data protection breaches.

Storage space and resources

Limiting factors in data storage are the available storage space and the resources required to store and manage data. Deleting data that is no longer required can free up storage space and improve the efficiency of data management. Whether data can be reproduced also plays a key role. Measurement or survey data, for example, cannot be reproduced, so its retention should be considered. Numerical simulation data that can be reproduced with relatively little effort, on the other hand, is likely a sensible candidate for deletion to reduce resource consumption. Where data is retained, the storage hardware matters as well: storage on magnetic tape, for example, is much more economical than storage on hard drives, which are constantly in motion and require large amounts of electrical energy.

Legal requirements and compliance

The decision to delete or keep data must be in accordance with applicable legal requirements and compliance guidelines. Some data may need to be retained for a certain period of time for legal or contractual reasons before it can be deleted.

Archiving and storage

Unused data that must be retained for historical or legal reasons can be archived. This avoids overloading the "active storage space", i.e., the storage that is constantly in use for data processing, which is often in short supply at scientific institutions.

In general, it should always be weighed up whether data can be deleted, must be retained for legal or contractual reasons, or should be retained because it may be reused later. It is therefore advisable to evaluate the respective risks. These may include security risks, data breaches or legal risks that such decisions can affect.

The aspects described form the basis for the decision aid, which is shown in Figure 1 in the form of a decision tree. "Keep or delete data?" uses clear questions and answers to facilitate the decision in favour of data retention or deletion.

Figure 1: Decision guide for deleting or retaining data. Source: Anders et al. (2024)
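
To make the idea of a question-and-answer decision aid more concrete, the aspects discussed above can be sketched as a small Python function. This is not a transcription of Figure 1: the field names, the order of the questions and the four possible outcomes are assumptions chosen for illustration, and a real decision would also have to account for institutional policies and the specific legal context.

```python
from dataclasses import dataclass
from enum import Enum


class Decision(Enum):
    KEEP = "keep in active storage"
    ARCHIVE = "archive"
    DELETE = "delete"
    SECURE_DELETE = "securely delete"


@dataclass
class Dataset:
    # All field names are hypothetical; they mirror the aspects in the text.
    legally_required: bool      # a legal or contractual retention period applies
    relevant: bool              # useful for current, future or secondary use
    reproducible_cheaply: bool  # e.g. simulation output that is easy to regenerate
    sensitive: bool             # personal or otherwise sensitive data
    in_active_use: bool         # needed in active storage for ongoing work


def decide(ds: Dataset) -> Decision:
    """Illustrative decision logic; the question order is an assumption."""
    # Legal or contractual retention takes precedence over everything else.
    if ds.legally_required:
        return Decision.KEEP if ds.in_active_use else Decision.ARCHIVE
    # Data that is irrelevant, or unused and cheap to regenerate, can go.
    if not ds.relevant or (ds.reproducible_cheaply and not ds.in_active_use):
        return Decision.SECURE_DELETE if ds.sensitive else Decision.DELETE
    # Relevant data is retained; unused data moves to the archive.
    return Decision.KEEP if ds.in_active_use else Decision.ARCHIVE


# Example: survey data that cannot be reproduced and is no longer in active use.
survey = Dataset(
    legally_required=False,
    relevant=True,
    reproducible_cheaply=False,
    sensitive=True,
    in_active_use=False,
)
print(decide(survey).value)
```

Putting the legal check first reflects the point that retention obligations cannot be overridden by storage or relevance considerations; the remaining questions could reasonably be asked in a different order.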

References