Describe
What is the describe phase?
Once the data has been observed, collected, created and evaluated, an appropriate organisation and storage strategy is chosen to guarantee access to the raw data; data processing and analysis follow. In the fifth phase of the data lifecycle, the information associated with these three steps is described and documented.
Describe: The aim of data description is to use structured, machine-interpretable metadata to explain how the data was collected (e.g., methods, periods, equipment), how to access the raw datasets (for example, through data repositories), and what techniques were used to process the raw data (e.g., standardised clean-up tools, machine learning). Ideally, the required metadata descriptions are defined in the Data Management Plan and added while the project is ongoing.
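As a rough illustration of what "structured, machine-interpretable" can mean in practice, the following Python sketch stores the three kinds of information named above (collection, access, processing) in a single record and serialises it to JSON. All field names and values are invented for this example and do not follow any particular metadata standard; a real project would use the schema agreed in its Data Management Plan.

```python
import json

# Hypothetical metadata record. Field names and values are illustrative
# assumptions, not taken from any metadata standard or real dataset.
metadata = {
    "collection": {
        "methods": ["soil core sampling"],
        "period": {"start": "2023-05-01", "end": "2023-09-30"},
        "equipment": ["hand auger"],
    },
    "access": {
        "repository": "example-repository",  # placeholder name
        "identifier": "placeholder-id",      # placeholder identifier
    },
    "processing": {
        "techniques": ["standardised clean-up tool", "machine learning"],
    },
}

# The three top-level sections this sketch treats as required.
REQUIRED_SECTIONS = ("collection", "access", "processing")


def missing_sections(record):
    """Return the required top-level sections that are absent from the record."""
    return [section for section in REQUIRED_SECTIONS if section not in record]


def to_json(record):
    """Serialise the record to JSON so that other tools can read it."""
    return json.dumps(record, indent=2, sort_keys=True)


if __name__ == "__main__":
    print(missing_sections(metadata))
    print(to_json(metadata))
```

Keeping the record as plain key-value data means it can be checked automatically for completeness during the project, rather than being reconstructed from memory at the end.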
Why is the describe phase important?
Precise data descriptions help researchers interpret data, integrate different datasets, reproduce datasets, and reuse them in future projects. With proper data descriptions that use standardised metadata and vocabularies, it becomes possible, for example, to combine geochemical soil data with satellite images, atmospheric data, and agricultural maps of the same location. Similarly, high-pressure experiments that simulate the deep Earth can only be replicated if equipment details and experimental parameters are provided using common vocabularies and protocols. Datasets with standardised and detailed metadata are also required for the next phase of the data lifecycle: archiving.
What will be offered in this article collection?
Each scientific domain has its own vocabulary and, hence, its own domain-specific metadata. These vocabularies are deposited in research vocabulary databases. It is highly recommended to use metadata and vocabularies that are agreed on by the community and widely applied, for example by the respective domain-specific databases or repositories. This article collection lists and describes ESS domain-specific vocabulary databases and provides best practices on how to access and use them, how to apply them to a dataset, and how to choose appropriate metadata and the correct vocabulary.
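To make "applying a vocabulary to a dataset" more concrete, here is a minimal Python sketch that maps the column labels of a dataset onto the preferred terms of a controlled vocabulary and reports the labels that have no match. The vocabulary, its synonyms, and the function names are hypothetical and defined only for this example; an actual workflow would load the terms from the vocabulary database used in the domain.

```python
# Hypothetical controlled vocabulary: preferred term -> accepted synonyms.
# These entries are invented for illustration only.
VOCABULARY = {
    "soil organic carbon": {"soc", "organic carbon (soil)"},
    "air temperature": {"temp_air", "atmospheric temperature"},
}


def normalise(label):
    """Lower-case a label and collapse surrounding and repeated whitespace."""
    return " ".join(label.strip().lower().split())


def map_to_vocabulary(labels, vocabulary):
    """Map dataset labels to preferred terms.

    Returns a dict of matched labels and a list of unmatched labels.
    """
    # Build a lookup from every normalised term or synonym to its preferred term.
    lookup = {}
    for preferred, synonyms in vocabulary.items():
        lookup[normalise(preferred)] = preferred
        for synonym in synonyms:
            lookup[normalise(synonym)] = preferred

    mapped, unmatched = {}, []
    for label in labels:
        term = lookup.get(normalise(label))
        if term is None:
            unmatched.append(label)
        else:
            mapped[label] = term
    return mapped, unmatched


if __name__ == "__main__":
    mapped, unmatched = map_to_vocabulary(["SOC", "temp_air", "pH"], VOCABULARY)
    print(mapped)
    print(unmatched)
```

Labels that remain unmatched are the ones that need a decision: either a suitable term exists in the vocabulary under another name, or the term should be proposed to the community that maintains it.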
We encourage contributions from the community to help refine and enhance this collection and to provide general or domain-specific strategies for more effective and collaborative data solutions.
For more details on how to contribute, please refer to our template article.