Reusability of data with complex semantic structure

Data on the occurrence and abundance of fossils provide invaluable insights into past climate and biodiversity change. However, the lack of common taxonomic standards and associated vocabularies limits the reusability of fossil data and thus global assessments. Inconsistent and variable taxonomy is a common challenge in biodiversity research that uses species occurrence data. This pilot aimed to resolve those semantic barriers using planktonic foraminifera as an example. We designed and developed an R workflow (Media 1) that applies the resolved semantics to legacy data stored in PANGAEA while making use of WoRMS (World Register of Marine Species). Furthermore, we provide community guidelines for new submissions of species abundance data to establish sustainable ways of combining legacy and new data. As the pilot is closely linked to PANGAEA, we expect that many users will benefit from our workflows and best-practice solutions. Since heterogeneous data structures and inadequate ontology support are common problems in other geoscientific and biodiversity research communities, we hope that our approach can be transferred to other types of long-tail data.

Media 1: Simplified cross-functional flowchart sketching the functioning of the R script to harmonize planktonic foraminifera abundance data.
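
The core idea of harmonizing abundance data can be illustrated with a minimal sketch. It is written in Python for illustration only and is not the pilot's R script. The synonym table, the record layout (sample, reported name, count) and the function name `harmonize` are all assumptions; in a real workflow the mapping from reported to accepted names would come from a taxonomic authority such as WoRMS, and the entries below should be checked against it.

```python
from collections import defaultdict

# Hypothetical synonym table: reported name -> accepted name.
# Illustrative entries only; verify against WoRMS before any real use.
SYNONYMS = {
    "Globigerinoides sacculifer": "Trilobatus sacculifer",
    "Trilobatus sacculifer": "Trilobatus sacculifer",
}


def harmonize(records, synonyms):
    """Map reported names to accepted names and sum counts per sample.

    records: iterable of (sample_id, reported_name, count) tuples.
    Returns (totals, unresolved), where totals maps
    (sample_id, accepted_name) to the summed count and unresolved
    lists the (sample_id, reported_name) pairs with no known mapping.
    """
    totals = defaultdict(float)
    unresolved = []
    for sample_id, name, count in records:
        key = " ".join(name.split())  # collapse stray whitespace
        accepted = synonyms.get(key)
        if accepted is None:
            # Keep unknown names for manual review instead of guessing.
            unresolved.append((sample_id, name))
            continue
        totals[(sample_id, accepted)] += count
    return dict(totals), unresolved


# Hypothetical legacy records using two names for the same taxon.
records = [
    ("S1", "Globigerinoides sacculifer", 12),
    ("S1", "Trilobatus  sacculifer", 3),
    ("S1", "Unknown sp.", 1),
]
totals, unresolved = harmonize(records, SYNONYMS)
```

Counts reported under synonymous names end up under a single accepted name, while names without a mapping are set aside rather than silently dropped, so they can be resolved by hand.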

Resources