What are repositories?

(adapted from the corresponding article by E. Böker on forschungsdaten.info)

What are repositories?

Repositories are storage locations for digital objects that make these objects available to the public or to a restricted group of users. They can be categorised according to:

the type of objects to be stored (publications, code or research data),
the domain of the data contained (institutional, specialised or generic),
the storage period of the data (e.g. 10 years to fulfil the rules of good scientific practice or permanently) or
the policies with which the data may be retrieved and reused.

Examples of repositories include a university's institutional publication server, a subject-specific open access repository, a subject-specific data repository and a long-term archive for data and publications.

Before data is included in a repository, curators often check its content and technical quality, and sometimes also legal aspects (copyright, data protection). In this way, they ensure that third parties can use the data in its current form.

How do repositories work?

A repository essentially consists of repository software and a database. Data providers can transfer data to the repository via a web-based user interface, or the repository operators can collect it automatically from other platforms via suitable protocols and interfaces.
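The automatic collection route can be illustrated with a short Python sketch. It assumes OAI-PMH as the harvesting protocol and Dublin Core (oai_dc) as the metadata format; the article itself does not name a protocol, so this is only one common possibility. The base URL and the sample response are invented for illustration, and no network request is made.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# XML namespace of the Dublin Core element set used by the oai_dc format.
NS = {"dc": "http://purl.org/dc/elements/1.1/"}


def build_list_records_url(base_url: str, metadata_prefix: str = "oai_dc") -> str:
    """Build the URL of an OAI-PMH ListRecords request (nothing is sent)."""
    query = urlencode({"verb": "ListRecords", "metadataPrefix": metadata_prefix})
    return f"{base_url}?{query}"


def extract_titles(response_xml: str) -> list[str]:
    """Return the Dublin Core titles found in a ListRecords response."""
    root = ET.fromstring(response_xml)
    return [el.text.strip() for el in root.iterfind(".//dc:title", NS) if el.text]


# Hand-written, shortened sample response; a real one carries more elements.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:repository.example.org:1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Example survey dataset</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""

# Hypothetical endpoint; replace with the base URL a repository documents.
print(build_list_records_url("https://repository.example.org/oai"))
print(extract_titles(SAMPLE_RESPONSE))
```

In practice the operator would fetch the generated URL, follow any resumption tokens the server returns, and store the harvested records; those steps are left out here.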

For subsequent use by third parties, metadata is required in addition to the actual data. The data provider can import some of this metadata from other applications or add it manually. Metadata describes the content of the research data and provides information about its creation, the software or methods used, and legal aspects. The metadata should also specify terms of use in the form of licences and state the conditions that regulate access to the data (registration, embargo, etc.).
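As a rough illustration of what such a metadata record might contain, the following Python sketch models the elements mentioned above and reports which ones are still empty. The class, the field names and the choice of required fields are assumptions made for this example; real repositories follow their own metadata schemas.

```python
from dataclasses import dataclass, field


@dataclass
class DatasetMetadata:
    """Hypothetical minimal metadata record; field names are illustrative."""
    title: str = ""                                     # what the data is about
    creators: list[str] = field(default_factory=list)   # who created the data
    created: str = ""                                   # creation date
    methods: str = ""                                   # software or methods used
    licence: str = ""                                   # terms of use
    access: str = "open"                                # "open", "registration" or "embargo"
    embargo_until: str = ""                             # needed when access is "embargo"


def missing_fields(record: DatasetMetadata) -> list[str]:
    """List the fields assumed to be required that are still empty."""
    required = ("title", "creators", "created", "licence")
    missing = [name for name in required if not getattr(record, name)]
    # An embargo without an end date leaves the access conditions unclear.
    if record.access == "embargo" and not record.embargo_until:
        missing.append("embargo_until")
    return missing


record = DatasetMetadata(
    title="Example survey dataset",
    creators=["A. Researcher"],
    created="2023-05-01",
    methods="Online questionnaire, analysed with R",
    access="embargo",
)
print(missing_fields(record))  # ['licence', 'embargo_until']
```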

To ensure that the data can be permanently referenced and cited, most repositories assign unique persistent identifiers (often DOIs or URNs). The content of many repositories is indexed in search engines and specialised databases (e.g. Google Scholar), both via the persistent identifiers and via corresponding interfaces. Furthermore, repositories offer a search function with which users can find, view and download the data they contain.
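To show how a persistent identifier becomes a citable link, the Python sketch below turns a DOI into a URL for the doi.org resolver. The pattern is a heuristic for the common form "10.<registrant>/<suffix>" and does not cover the full DOI syntax; the function name and the example DOI are made up for illustration.

```python
import re
from urllib.parse import quote

# Heuristic for the common DOI form "10.<registrant>/<suffix>";
# this is an assumption for the sketch, not the complete DOI syntax.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")


def doi_to_url(doi: str) -> str:
    """Return the doi.org resolver URL for a DOI, or raise ValueError."""
    doi = doi.strip()
    # Accept the frequently used "doi:" prefix.
    if doi.lower().startswith("doi:"):
        doi = doi[4:].strip()
    if not DOI_PATTERN.match(doi):
        raise ValueError(f"not a recognisable DOI: {doi!r}")
    # Percent-encode characters that are unsafe in a URL path.
    return "https://doi.org/" + quote(doi, safe="/")


# Invented example DOI, used only to demonstrate the conversion.
print(doi_to_url("doi:10.1234/example.5678"))
```

Because the identifier stays the same even if the landing page moves, citing the resolver URL rather than the repository's current web address keeps references stable.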

Selecting a suitable repository

The selection of a suitable repository should be based on the practices of the respective discipline or the requirements of funding organisations or publishers. It also depends on whether data is to be preserved for a specific period (e.g. ten years) or archived for the long term.

If no specifications exist, specialised repositories should be considered first as storage locations. There are several directories that facilitate the search for a suitable repository. The Registry of Research Data Repositories (re3data.org), for example, provides a worldwide overview of research data repositories. ROAR and OpenDOAR are directories that list open access repositories from all over the world. The results on these sites can be narrowed down using search and filter functions.

Institutional repositories, offered by a growing number of universities and research institutions, or generic repositories, often provided by central institutions or non-profit organisations, are suitable for storing and publishing research data for which no suitable subject-specific repository exists.
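The selection order described in this section can be summarised as a small decision function. This is a simplified Python sketch: the function and parameter names are hypothetical, and preferring an institutional over a generic repository is an assumption of the example, since the text treats the two as equal alternatives.

```python
from typing import Optional


def choose_repository_type(
    required_repository: Optional[str],
    subject_repository_exists: bool,
    institutional_repository_exists: bool,
) -> str:
    """Suggest where to deposit data, following the order given in the text."""
    # 1. Requirements of funders, publishers or the discipline take precedence.
    if required_repository:
        return required_repository
    # 2. Without such specifications, consider specialised repositories first.
    if subject_repository_exists:
        return "subject-specific repository"
    # 3. Otherwise fall back to an institutional or a generic repository
    #    (the order between these two is an assumption of this sketch).
    if institutional_repository_exists:
        return "institutional repository"
    return "generic repository"


print(choose_repository_type(None, subject_repository_exists=False,
                             institutional_repository_exists=True))
```

The function leaves out the storage period (for example ten years versus long-term archiving), which also has to be checked against the policies of the chosen repository.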

Certificates for repositories

Quality criteria, documented in the form of certificates or seals of approval, can make the decision for or against a repository much easier. Such certificates give the data producer the assurance that the data will be stored, usable and citable in the long term. Data users can rely on a minimum level of quality (data format, citability, etc.) of the data held in certified repositories. Certified repositories, archives, libraries and museums benefit from increased visibility of their services. Several initiatives award seals of approval or certificates for repositories based on different criteria.

Examples:

CoreTrustSeal (https://www.coretrustseal.org/)
nestor Seal for Trustworthy Digital Archives