NFDI4Earth System Architecture Documentation

Here we document the NFDI4Earth software architecture. The documentation summarizes the major use cases, gives an overview of the requirement analysis approaches for the main NFDI4Earth software components, and points out the envisioned quality goals and the architecture's potential stakeholders. Getting more specific, it lists the externally and internally driven constraints that need to be considered when developing or contributing to NFDI4Earth software components. We thereby help users to better understand the solution strategy for the NFDI4Earth architecture. We use the arc42 template for documenting the software and system architecture. The detailed description of the NFDI4Earth architecture follows a blackbox/whitebox approach: it first describes the major aim, the main functionalities, and the interfaces of a component as seen by its users (blackbox), and then provides details on the inner solutions, e.g., sub-components and implementations (whitebox). The envisioned target group of this documentation includes software developers and architects as well as service providers.

The architecture documentation will be updated regularly. The first version of the NFDI4Earth architecture documentation is published on Zenodo: Degbelo, A., Frickenhaus, S., Grieb, J., Hachinger, S., Henzen, C., Klammer, R., Müller, C., Munke, J., Nüst, D., Purr, C., Weiland, C., & Wellmann, A. (2023). NFDI4Earth Software Architecture. Zenodo. https://doi.org/10.5281/zenodo.8333959 (Authors in alphabetical order)

Contact: Christin Henzen or the NFDI4Earth Architecture Team with your questions and comments.

Introduction and Goals

The mission of NFDI4Earth is to address the digital needs of the Earth System Sciences (ESS) for more FAIRness and Openness in ESS research. We develop several services and concepts within NFDI4Earth and reuse/integrate existing services when suitable. By doing so, we enable researchers, data experts, and software specialists to discover, access, and analyse relevant Earth data and related publications or tools.

NFDI4Earth supports the following (main) use cases with common services:
1. Discover and explore Earth data sources
2. Support data publication and data curation
3. Solve a research data management problem
4. Create and publish information products, e.g., as services

The architecture of the NFDI4Earth describes the different services built to make resources from the Earth System Sciences (ESS) findable, accessible, interoperable and reusable, as well as the requirements for interfaces enabling their interaction.

In NFDI4Earth, we follow the service definition used in the joint statement of NFDI consortia on basic services:

A service in NFDI is understood as a technical-organisational solution, which typically includes storage and computing services, software, processes, and workflows, as well as the necessary personnel support for different service desks.

The service portfolio is described in Section Solution Strategy.

Quality Goals

In this section, we describe quality goals, a term we use synonymously with architecture goals with a long-term perspective. As the NFDI4Earth architecture evolves, we plan to regularly re-evaluate the prioritization of the quality goals. Following ISO 25010:2011 on software product quality, we consider the following quality goals for the NFDI4Earth software architecture:

Priority: 1
Goal: Functional suitability
Description: Degree to which the architecture provides functions that meet stated and implied needs when used under specified conditions.

Priority: 2
Goal: Maintainability
Description: Degree of effectiveness and efficiency with which the architecture can be modified to improve it, correct it, or adapt it to changes in the environment and in requirements.

Priority: 3
Goal: Usability
Description: Degree to which a component can be used by specified users to achieve specified goals with effectiveness, efficiency, and satisfaction in a specified context of use.

Further goals


Stakeholders

We envision the following main roles for using the NFDI4Earth Software Architecture Documentation:

Role: Internal developers of NFDI4Earth services
Expectations: Find descriptions and specifications of other NFDI4Earth services and integrate these services within their own projects.

Role: External developers
Expectations: Find specifications on how to use existing NFDI4Earth services and descriptions of how to add services to the NFDI4Earth infrastructure.

Further roles and details may be defined by products/components in the building block view.

Architecture Constraints

The following requirements constrain the design and implementation decisions and processes for the NFDI4Earth Software Architecture:

Constraint: NFDI4Earth Proposal
Type: organisational, strategic, technical
Explanation: The proposal provides the context and aims of the NFDI4Earth software architecture (see https://doi.org/10.5281/zenodo.571894).

Constraint: NFDI integration and interoperability
Type: technical, strategic
Explanation: The NFDI4Earth software architecture must fit with relevant activities of the overall NFDI, e.g., with basic service initiatives.

Constraint: International integration and interoperability
Type: technical, strategic
Explanation: The NFDI4Earth software architecture will be embedded in international infrastructures.

Constraint: Developer expertise, research interests, and availability
Type: organisational, technical
Explanation: The expertise, research interests, and availability of a distributed software developer team affect the software project management as well as the technology decisions.

Constraint: Architecture team
Type: organisational
Explanation: Software decisions for NFDI4Earth-developed services are made by the NFDI4Earth architecture team, on suggestion by and in agreement with the measure leads of the relevant component.

Constraint: Developer team
Type: organisational, conventions
Explanation: NFDI4Earth cross-product implementations, guidance, and conventions will be provided by the software developer team.

Constraint: FAIR principles
Type: technical
Explanation: NFDI4Earth services should support the implementation of the FAIR principles.

Constraint: Programming languages
Type: technical
Explanation: Services will be developed in an established programming language, e.g., JavaScript, Java, Python, C#, HTML, and follow the basic structures of a sustainable software project, e.g., testing, dependency management, documentation, and internationalisation.

Constraint: Free and Open Source Software
Type: technical, strategic
Explanation: NFDI4Earth-developed services must be provided as Free and Open Source Software (FOSS). Reused open-source solutions must be well established, documented, and maintained, to ensure their long-term applicability whenever possible.

Constraint: Open standards/specifications
Type: technical
Explanation: NFDI4Earth services will reuse existing (preferably) or define open standards or specifications for all interfaces, where interfaces are relevant for the service.

Constraint: Loosely coupled services
Type: technical
Explanation: NFDI4Earth services must be loosely coupled to allow replacing software solutions, e.g., as agile adaptations to user needs and changing requirements or to newly developed NFDI-wide solutions.

Constraint: Software repository
Type: organisational
Explanation: The source code of the NFDI4Earth-developed services must be managed in a software repository that allows contributions from (relevant) NFDI4Earth participants.

Constraint: Hosting at TU Dresden
Type: technical, organisational
Explanation: The NFDI4Earth services will be hosted in the TU Dresden Enterprise Cloud and maintained following the respective guidelines/regulations (https://git.rwth-aachen.de/nfdi4earth/softwaretoolsarchitecture/devguide).

Constraint: Containerization
Type: technical
Explanation: NFDI4Earth services will run in virtual containers whenever possible.

Types: technical constraints, organisational constraints, strategic constraints, and conventions (e.g., programming or versioning guidelines, documentation or naming conventions).

System Scope and Context

Today, there are numerous scattered and heterogeneous services supporting RDM in ESS. Some of them are project-based and do not have a long-term perspective or are not openly available to all researchers. In addition, the implementation of RDM concepts, FAIR principles, and related concepts differs among researchers, institutions, and disciplines. NFDI4Earth contributes to aligning existing and emerging RDM services along FAIR and Openness principles and to working towards a unifying and long-term perspective for services. Thus, NFDI4Earth targets the consolidation and harmonization of research data-related services in ESS and the linking of these services into the NFDI4Earth software architecture (see https://doi.org/10.5281/zenodo.571894).

The development of the NFDI4Earth architecture is supported and influenced by several external drivers. For instance, the architecture reuses/integrates well-known and accepted existing services, components, and resources from the ESS community and infrastructure providers. Moreover, the NFDI4Earth architecture benefits from and contributes to the activities of the national research data infrastructure (NFDI) by 1) consuming/integrating NFDI basic services and 2) developing strategies for reusing/integrating services across NFDI consortia. The following aspects are beyond the scope of the NFDI4Earth architecture:

Solution Strategy

NFDI4Earth aims to reuse existing services or parts of the services to build a sustainable infrastructure for the ESS. Moreover, we target identifying and collaborating with sustainable services that act as data sources whenever appropriate and applicable. The NFDI4Earth software architecture is, thus, organized along a service portfolio, including community services and NFDI4Earth-developed services addressing researchers' needs on the previously mentioned overall use cases.

Community services are either disciplinary services offered by NFDI4Earth partners or discipline-agnostic/multidisciplinary services offered by trustworthy providers, both open for community usage.

NFDI4Earth-developed services serve as openly available central support backbone services designed to address ESS researchers' needs in research data management and developed by a distributed software developer team from different NFDI4Earth partner institutions.

Community Services

Information on community services was collected using a systematic approach. Taking the different types of provided resources into account, we either assessed data sources for the specific resource types or performed a landscaping process with manual steps, e.g., conducting interviews with the service providers where no suitable data source existed.
Data sources offer the harvesting of existing information via application programming interfaces (APIs). In NFDI4Earth, we harvest information on repositories and archives, data services, datasets, software, documents, standards and specifications, education and training materials.
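The harvesting and mapping steps described above can be sketched as follows. This is a minimal illustration only: the endpoint layout, query parameters, and field names are assumptions for the sketch, not the actual NFDI4Earth schema or any specific source API.

```python
# Hypothetical sketch of a harvesting step: prepare a request for one page of
# source records and map each record to a common metadata schema. All field
# names are illustrative, not the actual NFDI4Earth schema.
import json
from urllib.request import Request

def build_harvest_request(base_url: str, page: int = 1) -> Request:
    """Prepare (but do not send) an HTTP request for one page of records."""
    return Request(f"{base_url}?page={page}", headers={"Accept": "application/json"})

def map_record(source_record: dict) -> dict:
    """Map one source record to the common (illustrative) target schema."""
    return {
        "title": source_record.get("name", ""),
        "description": source_record.get("abstract", ""),
        "keywords": source_record.get("tags", []),
        "source_id": source_record["id"],
    }

sample = {"id": "ds-42", "name": "Sea surface temperature", "tags": ["ocean"]}
print(json.dumps(map_record(sample)))
```

In a real pipeline, the mapped records would then be written to the data management system; error handling, paging, and schema validation are omitted here.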

Characteristics and scope: The identified data source is selected to publish NFDI4Earth-developed resources.
Actions: NFDI4Earth implements the publishing process, curates the metadata, and harvests the source.
Example(s): NFDI4Earth Zenodo Community

Characteristics and scope: The identified data source provides relevant (meta)data for the ESS community. (Meta)data is managed and curated via the data source.
Actions: NFDI4Earth develops a metadata mapping in close collaboration with the data source and harvests the source.
Example(s): DLR EOC Web Services; GDI-DE Geodata Catalog

Characteristics and scope: The identified data source provides relevant (meta)data for the research community. (Meta)data is managed and curated as a community effort. There is no need for specific ESS adaptations. By providing and updating information, NFDI4Earth contributes to the overall research community.
Actions: NFDI4Earth recommends managing metadata in the related data source and assists if needed. Moreover, NFDI4Earth implements a proper harvester and provides an NFDI4Earth-developed workaround for resource providers who are not (yet) able to manage their metadata there, e.g., with manually curated and managed information.
Example(s): Research Organisation Registry (ROR) for the resource type organisations; Wikidata for NFDI4Earth member organisations; RDA Metadata Standards Catalog and Digital Curation Centre (DCC) for standards

Characteristics and scope: The identified data source provides relevant (meta)data for the research community. (Meta)data is managed and curated as a community effort. Metadata provided by the data source needs regular updates and/or qualified information to meet researchers' requirements. By registering new resources in the data source, NFDI4Earth contributes to the overall research community, e.g., by facilitating multidisciplinary use cases.
Actions: NFDI4Earth recommends managing metadata in the related data source and assists if needed. Moreover, NFDI4Earth and the data source align strategies on updating and improving metadata for the registered services. NFDI4Earth and the data source collaborate on the development of metadata schemas that meet the ESS community's requirements. NFDI4Earth thus utilizes the data source to manage metadata and provide a sustainable service and API for metadata harvesting.
Example(s): Registry of Research Data Repositories (re3data) for the resource type repository and archive

Characteristics and scope: The identified data source provides relevant (meta)data for the ESS community and acts as an aggregator by harvesting relevant metadata from distributed sources. Moreover, NFDI4Earth provides additional community services that act as potential data sources for harvesting and should be integrated.
Actions: NFDI4Earth recommends managing metadata in the harvested data sources and assists if needed. NFDI4Earth and the data source align implementation strategies, co-develop solutions, such as harvesters, and provide common recommendations, e.g., on metadata schemas and APIs.
Example(s): Helmholtz Earth and Environment DataHub for the resource type dataset

The full list of harvested data sources is available at: https://knowledgehub.nfdi4earth.de/ and will be continuously adapted with respect to the following main goals:

NFDI Basic Services

Basic services provide specific community services. Following the general service definition as described in the Introduction, they additionally

Detailed information is available here: https://base4nfdi.de/process/basic-services

Service: IAM4NFDI (https://base4nfdi.de/projects/iam4nfdi)
Integration: NFDI4Earth needs authentication and authorization in the EduTrain learning management system to track learners' progress and preferences. We thus use the NFDI-AAI-provided AcademicID solution (see https://doc.nfdi-aai.de/). The integration is implemented as an incubator in the IAM4NFDI project (https://incubators.nfdi-aai.de).

Service: PID4NFDI (https://base4nfdi.de/projects/pid4nfdi)
Integration: NFDI4Earth actively contributes to the PID basic service as a principal investigator and by providing use cases and derived requirements.

Service: nfdi.software (https://base4nfdi.de/projects/nfdi-software)
Integration: NFDI4Earth actively contributes to the basic service as a principal investigator and by providing the Research Software Directory.

Service: DMP4NFDI (https://base4nfdi.de/projects/dmp4nfdi)
Integration: NFDI4Earth contributes a use case to integrate existing services in RDMO.

NFDI4Earth-developed Services

NFDI4Earth provides five central support services. The Knowledge Hub is the backend service that manages structured and interlinked metadata. It stores all metadata in RDF (Resource Description Framework) and is accessible through a SPARQL API. Through the use of the data management middleware Cordra that stores all manually-created and harvested metadata, the architecture supports the management and provision of FAIR digital objects. The OneStop4All provides a single entry point to find and explore ESS resources. The communication between the OneStop4All and the Knowledge Hub happens through the Knowledge Hub's SPARQL API and the OneStop4All search index.
The Living Handbook stores and manages web-based, interactive articles about topics relevant to the community as well as the documentation of the NFDI4Earth. It provides an editorial workflow to generate high-quality content for users and ensures that credit is given in order to attract authors. Living Handbook articles are stored in the NFDI4Earth GitLab, harvested through its API, and made visible in the OneStop4All.
The EduTrain learning resources are stored in an instance of the Open edX learning management system (LMS) and are also harvested through its API.
The User Support Network is a distributed helpdesk based on a ticket system (Znuny) hosted at TU Dresden. The network is closely linked to helpdesks of the community services, e.g., the Helmholtz Earth and Environment DataHub.

The following overview lists the chosen approaches and concepts per product:

Product: Knowledge Hub (KH)
Approach/concept: Question-based approach

Product: OneStop4All (OS4A)
Approach/concept: User-centred design and scenario-based approach

Product: Living Handbook (LHB)
Approach/concept: Scenario-based approach

Product: User Support Network (USN)
Approach/concept: Help desk concept based on longstanding practical experience

Product: Education and Training Materials and Services (EduTrain)
Approach/concept: Focus group, requirement analysis

NFDI4Earth-funded Community Software

The Pilots are small projects that provide solutions for specific needs of the ESS community. They are managed in the NFDI4Earth GitLab, made discoverable through the OneStop4All, and integrated into the NFDI4Earth architecture whenever possible (see Requirements Overview).
The Incubator projects are small blue-sky projects on innovative ESS RDM approaches with an experimental character. They are made discoverable through the OneStop4All, and their source code and descriptions are published for reuse by the ESS community.

NFDI4Earth Virtual Research Environment

NFDI4Earth provides a virtual research environment for the development, piloting and operation of NFDI4Earth software and services.

Our NFDI4Earth helpdesk (helpdesk@nfdi4earth.de) supports the related tasks.

Building Block View

Whitebox Overall System

The NFDI4Earth software architecture provides two services that serve as entry points to several linked NFDI4Earth and community-developed services: the OneStop4All as a human-readable, interactive user interface and the Knowledge Hub as a machine-readable interface.

Whitebox of the NFDI4Earth architecture

Knowledge Hub

The blackbox description is based on the Knowledge Hub concept one-pager and concept deliverable (https://doi.org/10.5281/zenodo.7950859; https://doi.org/10.5281/zenodo.7583596).

The NFDI4Earth Knowledge Hub serves as one major backend service of NFDI4Earth. It integrates metadata about all NFDI4Earth resources and is accessed via an API.

Problem Research products from the Earth System Sciences (ESS) are increasingly difficult to find. There is a need for tools that automate their discovery. ‘Research products’ is used here as a catch-all term that includes 1) datasets, 2) services, 3) tools, 4) vocabularies, 5) reports, 6) scientific papers, and 7) peer reviews.
Innovations Structured and interlinked metadata for ESS resources produced in NFDI4Earth or relevant for the NFDI4Earth. These ESS resources can be any research product listed above, an article of the Living Handbook, or an educational material from the EduTrain. We use RDF (Resource Description Framework) as an encoding format.

Structured and interlinked metadata for ESS resources hosted by NFDI4Earth partners

NFDI4Earth label - compiled based on the available metadata - as an indicator of the extent to which services are FAIR, and in particular, the degree of interoperability of the services
Users Consumers: people who have basic skills in programmatic data access (i.e., they are able to program a short snippet of code in a programming language to retrieve data).

Producers: these create/edit metadata for the Knowledge Hub. They may have programming skills (in which case they create/edit metadata via the API of the Knowledge Hub) or have no programming skills (in which case they do the creation/editing via a user interface).
Interface(s) SPARQL API
Unit of adoption Individuals (e.g., data scientists);
Organizations (indirectly, as a by-product of the adoption by individuals)

The NFDI4Earth Knowledge Hub is available at: https://knowledgehub.nfdi4earth.de
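As an illustration of programmatic access via the SPARQL API, the following sketch prepares (but does not send) a SPARQL-over-HTTP request. The endpoint path /sparql and the use of Dublin Core terms in the query are assumptions for the sketch, not taken from the Knowledge Hub documentation or the NFDI4Earth ontology.

```python
# Illustrative sketch: prepare a SPARQL SELECT request for the Knowledge Hub
# API. The endpoint path and the queried property are assumptions.
from urllib.parse import urlencode
from urllib.request import Request

ENDPOINT = "https://knowledgehub.nfdi4earth.de/sparql"  # assumed path

query = """
SELECT ?resource ?title WHERE {
  ?resource <http://purl.org/dc/terms/title> ?title .
} LIMIT 10
"""

def build_sparql_request(endpoint: str, query: str) -> Request:
    """Prepare a SPARQL-over-HTTP GET request with a JSON results header."""
    return Request(
        f"{endpoint}?{urlencode({'query': query})}",
        headers={"Accept": "application/sparql-results+json"},
    )

req = build_sparql_request(ENDPOINT, query)
print(req.full_url[:60])
```

Sending the request with urllib or any HTTP client would return a SPARQL results document; consult the Knowledge Hub documentation for the actual endpoint and vocabulary.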

OneStop4All

The blackbox description is based on the OneStop4All concept one-pager https://doi.org/10.5281/zenodo.7583596.

The NFDI4Earth OneStop4All is the primary visual and user-friendly NFDI4Earth access point.

Challenges Research products from the ESS community are diverse and increasingly difficult to find. There is thus a need for platforms that efficiently organize the access to ESS resources, in particular quality-assured resources. These platforms should be:

User-friendly and easy-to-use, taking specific user characteristics and needs into account
Flexible enough to integrate future RDM services (e.g., address multidisciplinary use cases with other NFDIs, link to EOSC services).
Innovations Central search on NFDI4Earth resources and distributed sources, including relevant governmental, research and other open data sources

Innovative user interfaces to explore the linked ESS resources that adapt to the needs of different user groups

Intelligent functionality to connect Living Handbook information for registered resources

Seamless transition from machine-based to human-based support

A community tool fostering the sharing of high-quality information and resources
Users We envision the following types of primary users:
Users who are looking for ESS research and ESS RDM information, e.g., events, networks
Users who are looking for support, e.g., on NFDI4Earth tools or on how to use NFDI4Earth services
Users who want to offer information/research products
Users who want to provide feedback on the content
Interface(s) User interface
Unit of adoption Individuals

The NFDI4Earth OneStop4All is available at: https://onestop4all.nfdi4earth.de/

Whitebox NFDI4Earth Products

White Box Knowledge Hub

The NFDI4Earth Knowledge Hub consists of three building blocks to harvest, process, and provide metadata. The pre-processing scripts mainly provide pipelines to harvest data sources or populate manually collected metadata, map metadata to the NFDI4Earth schemas, and add/update the harmonised metadata in the data management system. These scripts are written in Python and provided as open source in the NFDI4Earth GitLab. Through the use of a data management system that stores all manually created and harvested metadata, the NFDI4Earth software architecture supports the management and provision of FAIR digital objects. The triple store holds all metadata as semantically enriched metadata in RDF (Resource Description Framework) and is accessible through a SPARQL API. Both the data management system and the triple store are implemented in NFDI4Earth with open-source software.
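To illustrate the kind of transformation such a pre-processing pipeline performs, the following minimal sketch serialises a harmonised metadata record as RDF in N-Triples form. The subject IRI and the use of Dublin Core properties are placeholders for the sketch, not the actual NFDI4Earth ontology.

```python
# Illustrative sketch: turn a harmonised, flat metadata record into N-Triples.
# Subject and property IRIs are placeholders, not the NFDI4Earth ontology.

def to_ntriples(subject_iri: str, record: dict) -> str:
    """Serialise a flat metadata record as N-Triples with literal objects."""
    lines = []
    for prop_iri, value in record.items():
        escaped = value.replace('"', '\\"')  # escape quotes in literals
        lines.append(f'<{subject_iri}> <{prop_iri}> "{escaped}" .')
    return "\n".join(lines)

record = {
    "http://purl.org/dc/terms/title": "Sea surface temperature",
    "http://purl.org/dc/terms/description": "Example dataset",
}
triples = to_ntriples("https://example.org/dataset/42", record)
print(triples)
```

In the real pipeline, such triples would be loaded into the triple store and exposed through the SPARQL API; typed literals, language tags, and IRI-valued objects are omitted here for brevity.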

Whitebox Knowledge Hub

The NFDI4Earth Ontology is available at: https://nfdi4earth.de/ontology. The Ontology is iteratively developed in the open Knowledge Hub working group. The Knowledge Hub source code is managed in the NFDI4Earth GitLab: https://git.rwth-aachen.de/nfdi4earth/knowledgehub. Developments, e.g., harvester implementations, are coordinated across the products in the NFDI4Earth developer meeting: https://www.nfdi4earth.de/2coordinate/software-developer-team

White Box OneStop4All

The OneStop4All provides the Web frame for all NFDI4Earth user interface (UI) components. It links or embeds the EduTrain learning management system and the Living Handbook user interfaces to achieve user-friendly navigation and a common look and feel across all NFDI4Earth UI components, and it provides access to the User Support Network via a Web form. The OneStop4All is thus not the exclusive but an additional access point for all other NFDI4Earth software components with user interfaces. The central search across all NFDI4Earth resources is the core functionality of the OneStop4All. The OneStop4All is implemented as a custom solution.

The design and implementation strategy are described here: https://doi.org/10.5281/zenodo.10351658 and https://doi.org/10.5281/zenodo.13629130. The OneStop4All source code is managed in the NFDI4Earth GitLab: https://git.rwth-aachen.de/nfdi4earth/onestop4all. Developments are coordinated across the products in the NFDI4Earth developer meeting: https://www.nfdi4earth.de/2coordinate/software-developer-team

Whitebox OneStop4All

EduTrain

The blackbox description is based on the EduTrain concept one-pager https://doi.org/10.5281/zenodo.7583596.

The NFDI4Earth EduTrain provides a comprehensive overview of existing education and training material and will provide FAIR, open, ready-to-use modular course materials that are developed by the EduTrain team based on the community's needs.

Problem A lack of FAIR and open educational resources is one of the biggest obstacles to scientific activities. Although substantial effort has already been put into developing Open Educational Resources (OERs), many issues still exist, e.g., peer-reviewing the content, maintenance responsibility, quality control, management, and lack of funding for the development and maintenance. Another major problem is that most existing FAIR principles and Open Science materials are generic. At the same time, ESS-specific materials that outline adapting the FAIR principles and Open Science concepts are highly needed but mainly missing.
Innovations Development and maintenance of OERs and curriculum based on regular educational needs assessment of the ESS community

Continuous collection and evaluation of existing OERs in research data management tailored for ESS, spatio-temporal data literacy, and spatio-temporal data science

Funding the development of new open-licensed materials to meet the educational needs of the ESS community by publishing calls for educational pilots

Development of target group-specific curricula
Users Scientists, ranging from early-career researchers (Ph.D. students, Post-Docs) to experienced senior scientists and professors
Master students
Bachelor students
Educators and training professionals (e.g., professors, lecturers, teaching assistants)
Interface(s) User interfaces
Unit of adoption Individuals
Higher education institutions
Research centres

Living Handbook

The blackbox description is based on the Living Handbook concept one-pager https://doi.org/10.5281/zenodo.7583596.

The NFDI4Earth Living Handbook provides an interactive Web-based documentation for all aspects related to the NFDI4Earth, its services and outcomes.

Problem Many researchers, societies, funding agencies, companies, authorities, or members of the interested public are not familiar with every aspect of the NFDI4Earth, its services, or ESS research data in general. A core service with overview documents on such topics is hence required. These documents must reflect the various user needs and levels of prior knowledge, i.e., they must provide a flexible granularity, from brief and informal to comprehensive and detailed.
Innovations Structuring and harmonizing all aspects of NFDI4Earth as well as ESS related information from different, also previously unpublished, sources.

Curating and presenting information about the NFDI4Earth as a collection of edited, interlinked, human-readable documents of various types (documentation, report, article, manual, tutorial, op-ed, etc.) that are externally linked with general ESS resources.

Compilation of documents tailored to the different proficiency levels and backgrounds of readers through automatic re-combination and re-arrangement of the documents' elements.
Users Consumers: Users with interest in NFDI4Earth, NFDI or else ESS related information, data, services, concepts, software etc. We expect users with a high variety in their backgrounds and prior knowledge.
Editors/authors: Persons that provide and regularly quality check the LHB contents.
Interface(s) User interface
Unit of adoption The LHB is beneficial to, e.g.:
Researchers, as a manual on how to use NFDI4Earth and related external products and to learn what the scope of the NFDI4Earth and its services is
Scientific and professional societies, as a place to refer their members to as a resource for ESS data-related topics
Funding agencies, to understand how researchers are using and providing ESS data
Authorities, to get and provide information about ESS data
The interested public, as the first stop to find ESS-related information

The NFDI4Earth Living Handbook is managed here: https://git.rwth-aachen.de/nfdi4earth/livinghandbook/livinghandbook

User Support Network

The blackbox description is based on the User Support Network concept one-pager https://doi.org/10.5281/zenodo.7583596.

The NFDI4Earth User Support Network provides distributed, cross-institutional user support based on the existing services of the partner institutions and the upcoming NFDI4Earth innovations.

Challenges Research data services from the Earth System Sciences community are diverse and have until now mainly been directed at smaller communities, e.g., a single institute. We work on a structure in the USN that allows mapping the different resources and accessing them. The USN will also evaluate whether an open community support system (like Stack Overflow) would be of value next to the institutional RDM support of the USN team. For that evaluation, we need a solid idea of what kinds of questions users are asking, which we expect to get by running the ticketing system.
Innovations Single point of access to a national expert pool offering individual support for ESS RDM problems for all phases of the data lifecycle

Collection, harmonization and provision of expert knowledge based on institutional experience, e.g., via Living Handbook

Creation of standard operation procedures (SOPs) for user support
Users We envision the following types of primary users:
Users who are looking for general information, e.g., on NFDI4Earth tools or on how to use NFDI4Earth services
Users who are looking for support in ESS research data management (RDM)
Interface(s) User interface
Unit of adoption Individuals
Research institutions

The User Support Network can be contacted via: https://nfdi4earth.de/helpdesk

Whitebox NFDI4Earth-funded Community Software

Pilots

The blackbox description is based on the Pilots concept one-pager https://doi.org/10.5281/zenodo.7583596.

The NFDI4Earth Earth System Science (ESS) Pilots are small projects from various disciplines of the ESS community, usually lasting one year. Pilots are used to assess and define requirements in other task areas, and promising results will be integrated into the NFDI4Earth infrastructure.

Problem To achieve acceptance and adoption by the community as well as a cultural change, NFDI4Earth must not implement top-down solutions but involve ideas and existing tools from the research community. Different ESS domains face different challenges in interoperability and in the standardization of data, methods, and workflows. Expertise and technologies exist but need further development to meet domain-specific requirements, and they often lack transferability for usage beyond a small user group.
Innovations Agile projects that directly reflect researchers' needs in data management and implement novel solutions for research data management

Bottom-up innovation scouts for other Task Areas of NFDI4Earth

Focus on transferability of results and enhancement of technologies to make use of existing resources and foster community-driven design of NFDI4Earth
Users The target community consists of researchers from the ESS community working on tools that enhance research data management. The solutions implemented by the pilots are targeted at the respective scientific communities.
Interface(s) Depending on the individual pilot proposals
Unit of adoption NFDI4Earth takes up pilots' results into its infrastructure
User communities of different domains that adopt the tools newly developed by pilots

NFDI4Earth Pilots are available at: https://git.rwth-aachen.de/nfdi4earth/pilotsincubatorlab/pilots

Runtime View

This section will be updated at a later stage

Deployment View

Infrastructure Level 1

This section presents the high-level architecture and the key components that form the foundation of our deployment strategy. It gives an overarching view of the core structures, their interactions, and their collective role in the deployment and operation of our system.

System Distribution

Our system is distributed across various virtual machines (VMs), all located in the Lehmann-Zentrum Data Center within the so-called Enterprise Cloud service structure.

Network structure

Diagram of NFDI4Earth's network structure (source)

Important Justifications or Motivations for this Deployment Structure

The decision to utilize this service infrastructure is primarily based on the fact that it is the established system of the Technical University of Dresden (TU Dresden). The Center for Information Services and High Performance Computing (ZIH) was established to provide a robust and reliable infrastructure for the university's various services and applications. By leveraging this established infrastructure, we benefit from the expertise and resources of the ZIH, which facilitates the maintenance and scalability of our applications.

Our system is distributed across multiple virtual machines (VMs), following the guideline from the Center for Information Services and High Performance Computing (ZIH) of having one service per VM. This approach has several advantages.

Efficient Management and Deployment of Applications with Docker and Portainer

General Docker setup

We have decided to deploy all applications as Docker containers due to several advantages:

1. Isolation and Consistency: Containers provide a consistent environment across development, testing, and production phases.

2. Portability: Applications can run seamlessly on different platforms and cloud services.

3. Resource Efficiency: Containers run with minimal overhead and utilize system resources more efficiently, even when deployed on virtual machines.

4. Rapid Deployment and Rollbacks: New versions can be deployed quickly and rolled back if needed.

5. Simplified Dependency Management: All dependencies are bundled within the container.

6. Port Forwarding: Docker simplifies port forwarding, allowing easy mapping of host ports to container ports, ensuring external access to services.

7. Persistent Storage with Volumes: Docker volumes enable persistent data storage independent of the container lifecycle, ensuring data durability and easy sharing between containers.
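Points 6 and 7 can be made concrete with a short sketch. The image and names below are purely illustrative and not part of our actual deployment:

```shell
# Hypothetical sketch: run a service container with host-to-container
# port forwarding and a named volume for persistent data.
docker volume create demo_data

# -p maps host port 8080 to container port 80 (external access);
# -v mounts the named volume so data survives container recreation.
docker run -d --name demo-service --restart unless-stopped \
  -p 8080:80 -v demo_data:/var/lib/demo nginx:alpine
```

Recreating `demo-service` (e.g., to roll out a new image version) leaves the contents of `demo_data` untouched, which is the durability property described above.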

Portainer

We have chosen to utilize Portainer (in Community Edition) for managing our Docker containers because it meets our needs effectively: it is free and open-source, and it benefits from active community support and ongoing development, ensuring regular updates and new features. Portainer's capabilities are particularly advantageous in centralizing our container management interface.

Portainer serves as a unified UI deployed centrally within our designated virtual machine. From here, it seamlessly integrates and centrally manages all Docker hosts (VMs) across different environments. This setup allows us to efficiently oversee and administer our containerized applications from a single interface. Each Docker host is treated as an environment within Portainer, ensuring streamlined operations and enhancing our ability to monitor and maintain our infrastructure effectively.

General Portainer Setup (source)

Portainer and Portainer agents are implemented as Docker containers themselves. Portainer serves as a centralized management interface deployed as a container, while Portainer agents are deployed on each Docker host (VM) to facilitate communication and management tasks between Portainer and the Docker environment.
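Following the Portainer CE documentation, the server and agent containers are typically started as follows; ports, tags, and container names may differ in our concrete setup:

```shell
# Portainer server (central management UI), run on the designated VM:
docker run -d --name portainer --restart=always \
  -p 9443:9443 \
  -v /var/run/docker.sock:/var/run/docker.sock \
  -v portainer_data:/data \
  portainer/portainer-ce:latest

# Portainer agent, run on each managed Docker host (VM); the server
# then registers the host as an environment via port 9001:
docker run -d --name portainer_agent --restart=always \
  -p 9001:9001 \
  -v /var/run/docker.sock:/var/run/docker.sock \
  -v /var/lib/docker/volumes:/var/lib/docker/volumes \
  portainer/agent:latest
```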


Automating Deployment using Ansible

We have chosen to implement our entire deployment process using Ansible, a powerful automation tool that offers several advantages.

Ansible features an agentless architecture, leveraging standard SSH connections for seamless setup and reduced overhead. It ensures idempotency, meaning running the same playbook multiple times achieves a consistent state without unintended changes. Using YAML for playbooks makes Ansible easy to read and write, lowering the barrier to entry and accelerating automation script development.
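The idempotency property mentioned above can be checked directly from the command line. The file names (`inventory.ini`, `site.yml`) are illustrative, not our actual repository layout:

```shell
# First run applies changes; re-running the same playbook reports
# tasks as "ok" instead of "changed" once the desired state is reached.
ansible-playbook -i inventory.ini site.yml

# --check performs a dry run without modifying the hosts;
# --diff shows what would change.
ansible-playbook -i inventory.ini site.yml --check --diff
```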

Potential Drawbacks of Using Ansible

Performance Overhead: Because Ansible uses SSH for communication, it might be slower than agent-based tools when managing a large number of machines simultaneously. This can lead to longer deployment times for extensive infrastructures.

Learning Curve: While Ansible is relatively easy to learn, mastering its more advanced features and best practices can take time and effort, particularly for users new to configuration management and automation.

Complexity in Large Environments: Managing very large and complex infrastructures with Ansible can become challenging, requiring careful organization of playbooks, roles, and inventories to maintain readability and manageability.

Maintenance & Documentation

To manage our Ansible scripts comprehensively, we have centralized them within a GitLab project for documentation and version control.

The Ansible deployment scripts, playbooks, and documentation are accessible only to authenticated users.


Handling secrets

Given the distributed nature of our developers and facilities, we have implemented a secure handling of host_vars in Ansible. This approach ensures that access to sensitive information, such as secrets and credentials, is restricted. Not everyone has access to all host_vars, ensuring that each team member only sees and manages the variables relevant to their role and permissions. This strategy enhances security and confidentiality while facilitating efficient collaboration across our diverse team and infrastructure setup.

Diagram about handling of Ansible host_vars (source)
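One common way to implement such restricted host_vars (not necessarily our exact mechanism; paths are illustrative) is to keep secrets in per-host vault files encrypted with Ansible Vault:

```shell
# Encrypt the secrets file for a single host; only users who hold the
# vault password can read or edit it.
ansible-vault encrypt host_vars/example-vm/vault.yml

# Authorized users decrypt transparently at playbook run time:
ansible-playbook -i inventory.ini site.yml --ask-vault-pass
```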

Version Control

In collaborative software development, especially within large teams, using a version control system (VCS) is essential. A VCS allows multiple developers to work on the same codebase simultaneously, track changes, and maintain a complete history of modifications. This enables seamless collaboration, as team members can work independently on different features or bug fixes without interfering with each other's work. It also facilitates code reviews, debugging, and project management by providing a clear record of who made which changes and why.

Using Git

Git is particularly suited for this purpose due to its distributed nature. Unlike centralized version control systems, Git allows every developer to have a complete local copy of the repository, including its full history. This means that developers can work offline and commit changes locally, which can later be synchronized with the central repository. Git's branching and merging capabilities are also highly advanced, making it easier to manage different development streams and integrate changes from various team members. Its speed, efficiency, and wide adoption make Git a robust choice for version control in collaborative environments.

Centralized Management with GitLab

In addition to using a distributed version control system like Git, having a centralized instance for managing repositories is crucial for coordinated development efforts. A central instance ensures that there is a single source of truth for the project's codebase, facilitating easier integration, continuous integration/continuous deployment (CI/CD), and unified access control. This centralization helps in maintaining consistency, security, and reliability across the development process.

GitLab is a good choice for this central instance due to its comprehensive suite of tools designed for modern software development. We therefore use the GitLab instance provided by RWTH Aachen, which offers several advantages for our needs. As a research institution, it is important for us to utilize platforms that are hosted within the academic and publicly funded sector rather than commercial solutions. The RWTH Aachen instance supports seamless integration with GitHub for authentication, providing a convenient login method while aligning with our requirement for an open, non-commercial environment.

GitLab

Groups

We have set up various groups within the RWTH Aachen GitLab instance, each aligned with specific software components of our project, to ensure organized and efficient collaboration.

This data has been exported on 2025-01-19

Members
Anna Brauer Auriol Degbelo Balthasar Teuscher Björn Saß (Hereon) Carsten Keßler Christin Henzen Claudiamuellerklein Daniel Nüst Fabian Gans Farzaneh-prime FreyaThiessen Hachi1 Hezel2000 Ira Gerloff Ivonne Anders Jan Schulte Jie Xu Johannes Munke Jonas Eberle (DLR) Jonas Grieb Jonas Kuppler Kemeng Liu MacPingu Marie Ryan Markus Konkol Philipp S. Sommer Ralf Klammer Sandhya Rajendran Sibylle Haßler Simeon Wetzel St3ff3nBusch Thomas Rose Tim Schäfer Timm Schultz Tom Niers Udo Feuerhake Veronika Grupp Yomna Eid arnevogt awellmann-lrz knenovsky langerines mharis111 mwfinkel rechristonga schmidt4earth stefanUHH vkakar
NFDI4Earth X X - - - X - X - - - - - - - X - - - X - - - - X - X X - - - - X - X - - - - - - - - - - - - -
NFDI4Earth / Architecture X X - - - X - X - - - - - - - X - - - X - - - - X - X X - - - - X - X - - - - - - - - - - - - -
NFDI4Earth / CrossTopics X X - - - X - X - - - - - - - X - - - X - - - - X - X X - - - - X - X - - - - - - - - - - - - -
NFDI4Earth / CrossTopics / DataXplorers_Hackathon_Results X X - - - X - X - - - - - - - X - - - X - - - - X - X X - - - - X - X - - - - - - - - - - - - -
NFDI4Earth / EduTrain X X - - X X - X - X - - - - - X - - - X X X - - X - X X - - - - X - X - - X - - - - - - X - - -
NFDI4Earth / EduTrain / content X X X - X X - X - X - - - - - X - - - X X X - - X - X X - - - - X - X - - X - - - - - - X - - -
NFDI4Earth / EduTrain / content / intern X X X - X X - X - X - - - - - X - - - X X X - - X - X X - - - - X - X - - X - - - - - - X - - -
NFDI4Earth / EduTrain / content / public X X X - X X - X - X - - - - - X - - - X X X - - X - X X - - - - X - X - - X - - - - - - X - - -
NFDI4Earth / KnowledgeHub X X - - - X X X - - X - - - - X - X - X - X - - X - X X - - X - X - X - - - - X - - X - X - - -
NFDI4Earth / LivingHandbook X X - - - X - X - - - - X - - X X - - X - - - - X - X X - - - X X - X X - - - - - - - - - - - -
NFDI4Earth / LivingHandbook / Editorial Board X X - - - X X X - - - - X X X X X - - X - - - - X - X X - - - X X - X X - - - - - X - X - X X X
NFDI4Earth / OneStop4All X X - - - X - X - - - - - - X X - - - X - - - X X - X X X X - - X - X - - - X - - - - - - - - -
NFDI4Earth / PilotsIncubatorAndGroups X X - - - X - X X - - - - - - X - X - X - - - - X - X X - - - - X - X X - - - - - - - - - - - -
NFDI4Earth / PilotsIncubatorAndGroups / Incubator X X - - - X - X X - - - - - - X - X - X - - - - X - X X - - - - X - X X - - - - - - - - - - - -
NFDI4Earth / PilotsIncubatorAndGroups / Pilots X X - - - X - X X - - - - - - X - X - X - - - - X - X X - - - - X - X X X - - - X - - - - - - -
NFDI4Earth / PilotsIncubatorAndGroups / Pilots / CAPICE X X - X - X - X X - - X - - - X - X X X - - X - X X X X - - - - X X X X X - - - X - - - - - - -

Overview of members in GitLab groups


Domains

Structure and subdomains

Our infrastructure relies on three distinct domain types to manage and access our services effectively:

n4e.geo.tu-dresden.de: This internal domain serves as the operational backbone for our services within the TU Dresden network:

Subdomains for n4e.geo.tu-dresden.de

nfdi4earth.de: Acting as a public-facing domain alias, nfdi4earth.de promotes our services to external users by redirecting to our internal domain.

test.n4e.geo.tu-dresden.de: Reserved for testing purposes, this domain is accessible only within the TU Dresden network, ensuring secure development and evaluation environments.

Subdomains for test.n4e.geo.tu-dresden.de

SSL Certificates

SSL certificates are uniformly generated using Sectigo, ACME, and Certbot. Since early 2023, TU Dresden has relied on Sectigo as its SSL certificate provider; Sectigo serves as the Certificate Authority (CA) issuing trusted certificates. ACME (Automated Certificate Management Environment) is the protocol for automated certificate issuance and management, while Certbot, an open-source ACME client, uses this protocol to obtain and install the certificates from the CA. This combination ensures efficient and secure SSL certificate management for TU Dresden's various services and domains.

Furthermore, the SSL certificates are regularly checked for renewal. This ongoing process is managed through systemd timers, which automate the monitoring and renewal tasks. Specifically, the renewal of SSL certificates is scheduled to occur 30 days before their expiration date. The status of these timers can be checked on the virtual machines by running: systemctl list-timers | grep certbot
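On a VM, the renewal automation can be inspected and exercised with two commands (run with root privileges; output shown on the VMs below):

```shell
# Show the systemd timer that periodically triggers certbot:
systemctl list-timers | grep certbot

# Simulate a renewal against the CA's staging environment without
# touching the production certificates:
certbot renew --dry-run
```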

Infrastructure Level 2

This section provides a detailed look at the architecture, focusing specifically on virtual machines (VMs) and Docker containers. It provides an in-depth examination of these components and their configurations.

We have dedicated virtual machines (VMs) for both the test and production instances of our main products. This separation ensures that new features and updates can be thoroughly tested in an isolated environment before being deployed to the production instance. The advantages of this setup include increased stability and reliability of the production environment, reduced risk of downtime or service disruptions, and the ability to identify and resolve issues in the test instance without affecting end-users.

In addition, we also maintain two VMs that serve a supporting and aggregating role. These VMs host various smaller auxiliary features and services, enhancing the overall functionality. By centralizing these additional services, we ensure streamlined management, easier maintenance, and improved integration with our primary products.

OneStop4All Deployment Overview

The OneStop4All service is a crucial component of the NFDI4Earth project, designed to assist researchers in finding and accessing earth system-related resources and services (see also blackbox & whitebox descriptions). Developed by 52°North in collaboration with NFDI4Earth, this platform utilizes modern technologies to provide a user-friendly interface for searching and managing geo-related research resources.

Technical Implementation

The technical implementation is based on the Open Pioneer Trails framework, which utilizes React, Chakra UI, Vite, and pnpm. This framework offers a modern development environment with efficient build tools and a flexible runtime.

It is composed of three Dockerized components:

Frontend

Delivers the user interface for accessing and managing resources. The frontend is built upon the Open Pioneer Trails framework developed by 52°North, which leverages modern technologies including React, Chakra UI, Vite, and pnpm, ensuring a robust and user-friendly platform for geo-related research data management.

Index

Indexes data from KnowledgeHub into Apache Solr, facilitating efficient search operations.

Harvester

This component systematically retrieves information from the KnowledgeHub's TripleStore using SPARQL queries. It prepares this extracted data for integration into the Index component. This process significantly enhances the platform's ability to deliver fast and efficient data retrieval for users accessing the frontend interface.
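As a sketch of such a retrieval step: a SPARQL SELECT query can be posted to a Fuseki endpoint over HTTP. The endpoint URL and query below are hypothetical, not the harvester's actual configuration:

```shell
# Illustrative SPARQL request against a Fuseki triple store; curl sends
# the form-encoded query via POST and requests a JSON result set.
curl -s -H "Accept: application/sparql-results+json" \
  --data-urlencode "query=SELECT ?s ?p ?o WHERE { ?s ?p ?o } LIMIT 10" \
  https://knowledgehub.example.org/sparql
```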

Deployment

Like all major services, the OneStop4All is deployed on two dedicated virtual machines (VMs) to separate the testing and production environments (see also System Distribution). In line with the main deployment strategy, the setup leverages Docker for containerization, providing consistent and isolated environments for each component of the application. Docker Compose is used for orchestration, which simplifies the management of multi-container applications by defining and running them together. This setup provides streamlined operations, efficient resource management, and scalability to accommodate growing needs.
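In day-to-day operation, such a Compose-based deployment boils down to a few commands; the file and service names here are illustrative:

```shell
# Start all components defined in the Compose file in the background:
docker compose -f docker-compose.yml up -d

# Check the status of the running containers:
docker compose ps

# Follow the logs of a single service, e.g., the frontend:
docker compose logs -f frontend

# Roll out updated images with minimal downtime:
docker compose pull && docker compose up -d
```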

VM - OneStop4All (Productive)

System infos

This data has been exported on 2025-01-20

Docker container

This data has been exported on 2025-01-20

SSL Certificate info

Certificate Name: onestop4all.n4e.geo.tu-dresden.de

Certificate Name: onestop4all.nfdi4earth.de

Certificate Name: wildcard.onestop4all.n4e.geo.tu-dresden.de

Certificate Name: wildcard.onestop4all.nfdi4earth.de

Output of systemctl list-timers | grep certbot:

Mon 2025-01-20 03:15:22 CET 2h 39min left Sun 2025-01-19 22:20:55 CET 2h 15min ago certbot.timer certbot.service

This data has been exported on 2025-01-20

VM - OneStop4All (Test)

System infos

This data has been exported on 2025-01-20

Docker setup

This data has been exported on 2025-01-20

SSL Certificate info

Certificate Name: onestop4all.test.n4e.geo.tu-dresden.de

Certificate Name: wildcard.onestop4all.test.n4e.geo.tu-dresden.de

Output of systemctl list-timers | grep certbot:

Mon 2025-01-20 01:42:00 CET 1h 5min left Sun 2025-01-19 15:14:08 CET 9h ago snap.certbot.renew.timer snap.certbot.renew.service

This data has been exported on 2025-01-20

KnowledgeHub Deployment Overview

The KnowledgeHub is the central backend for data storage, leveraging Linked Open Data technologies. See the blackbox and whitebox descriptions for further details.

Technical Implementation

The KnowledgeHub is composed of three main components:

Harvesting Scripts and Pipelines

Developed in Python, these scripts are responsible for collecting and processing data from various sources.

Middleware

Cordra is used for managing the ingestion and updates of the triple store, providing a robust interface for data operations.

Triple Store

Jena Fuseki is utilized as the triple store, offering a powerful and scalable solution for storing and querying RDF data.

Deployment

Like all major services, the KnowledgeHub is deployed on two dedicated virtual machines (VMs) to separate the testing and production environments (see also System Distribution). In line with the main deployment strategy, the setup leverages Docker for containerization, providing consistent and isolated environments for each component of the application. Docker Compose is used for orchestration, which simplifies the management of multi-container applications by defining and running them together. This setup provides streamlined operations, efficient resource management, and scalability to accommodate growing needs.

Update cycles

The harvesting scripts are executed regularly according to the schedule depicted in the enclosed graphic. This automation is achieved through the use of Celery and RabbitMQ. Celery is a distributed task queue that enables the asynchronous execution of tasks, ensuring that the harvesting scripts run at the designated intervals. RabbitMQ acts as the message broker, facilitating communication between the Celery workers and the task queue. This setup ensures that the harvesting processes are efficiently managed and executed in a timely manner.

Schedule for update cycles
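A setup like the one described above is typically started as follows. The Celery application name ("harvester") is illustrative, not our actual module name:

```shell
# Start RabbitMQ as the message broker (here as a Docker container):
docker run -d --name broker -p 5672:5672 rabbitmq:3

# Celery workers consume and execute harvesting tasks from the queue:
celery -A harvester worker --loglevel=INFO

# Celery beat publishes tasks according to the configured schedule:
celery -A harvester beat --loglevel=INFO
```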

VM - KnowledgeHub (Productive)

System infos

This data has been exported on 2025-01-20

Docker setup

This data has been exported on 2025-01-20

SSL Certificate info

Certificate Name: knowledgehub.n4e.geo.tu-dresden.de

Certificate Name: knowledgehub.nfdi4earth.de

Certificate Name: wildcard.knowledgehub.n4e.geo.tu-dresden.de

Certificate Name: wildcard.knowledgehub.nfdi4earth.de

Output of systemctl list-timers | grep certbot:

Mon 2025-01-20 08:31:00 CET 7h left Sun 2025-01-19 21:08:14 CET 3h 28min ago snap.certbot.renew.timer snap.certbot.renew.service

This data has been exported on 2025-01-20

VM - KnowledgeHub (Test)

System infos

This data has been exported on 2025-01-20

Docker setup

This data has been exported on 2025-01-20

SSL Certificate info

Certificate Name: knowledgehub.test.n4e.geo.tu-dresden.de

Certificate Name: wildcard.knowledgehub.test.n4e.geo.tu-dresden.de

Output of systemctl list-timers | grep certbot:

Mon 2025-01-20 06:10:00 CET 5h 33min left Sun 2025-01-19 13:41:23 CET 10h ago snap.certbot.renew.timer snap.certbot.renew.service

This data has been exported on 2025-01-20

EduTrain Deployment Overview

The central portal for the EduTrain service is a Learning Management System (LMS) implemented using Open edX. Open edX is a robust, open-source platform designed to create, deliver, and manage online courses.

Both instances (test and production) of Open edX are maintained using Tutor, a command-line tool that simplifies the deployment and management of Open edX. Tutor uses Docker containers to encapsulate all necessary components, ensuring a consistent and reproducible environment. This setup includes containers for the LMS, content management, database, and other services, enabling easy scaling, updates, and maintenance of the EduTrain service.
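The Tutor workflow sketched here follows the general Tutor documentation; exact versions and commands in our deployment may differ (older Tutor releases use `tutor local quickstart` instead of `launch`):

```shell
# Install Tutor with all optional dependencies:
pip install "tutor[full]"

# Interactively configure and start a local Open edX platform;
# Tutor creates and runs all required Docker containers:
tutor local launch

# Inspect the running containers and apply configuration changes:
tutor local status
tutor config save && tutor local restart
```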

General maintenance