CAPICE - Combined Analysis and Publication of Ice Sheet data

Scientific domain modeling, such as ice sheet modeling, relies on extensive input data that is often the output of other models. For example, the upper boundary of an ice sheet model domain may be constrained by snowfall and surface temperature, which global circulation models can provide. In addition, changes in the geometry of the model domain due to calving of icebergs can be determined from satellite imagery (e.g., [@Loebel2024]). There are several other scenarios where domain models depend on external data. Such data is typically distributed across multiple computing facilities, such as HPC systems, cloud services, and data publishers. This distribution is natural, because application-specific requirements differ between systems, access to all systems is not guaranteed, and data duplication should be avoided. Scientific domain modeling therefore involves not only model development and model experiments, but also substantial data preprocessing and data management.

The goal of this pilot project was to develop an exemplary infrastructure and workflow that allows simulations to be run across different computing infrastructures without manual intervention by the user, and that prepares the publication of the results. The exemplary scientific software used was the open-source "Ice Sheet System Model (ISSM)" ([@Larour2012]). ISSM is an established and widely used ice sheet model; it was used, for example, for the sea level projections that contributed to the latest IPCC reports ([@Goelzer2020]). The software solution that enables the distributed simulation approach is the "Data Analytics Software Framework (DASF)" ([@Eggert2022]). DASF is a remote procedure call wrapper library and acts as a message broker, allowing the user to send a message from a local machine to a remote system, where it can be interpreted, for example, to run a simulation. In this way, the user can trigger various processes, such as transferring data from one remote system to another via a data storage service, preprocessing and analyzing the required data, and submitting compute job scripts, e.g., to the job scheduler Slurm.
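To make the last point concrete, the following minimal sketch illustrates how a backend, after interpreting such a message, could generate a batch script and submit it to Slurm. The job name, partition, resources, script names, and the `run_simulation.py` command line are placeholders, not part of the CAPICE code base.

```python
import subprocess
from pathlib import Path

# Illustrative sketch only: after interpreting a request received via the
# message broker, a backend could write a Slurm batch script and submit it.
# Job name, partition, resources, and the srun command line are placeholders.
JOB_TEMPLATE = """#!/bin/bash
#SBATCH --job-name=capice-issm
#SBATCH --partition=compute
#SBATCH --nodes=1
#SBATCH --time=01:00:00

srun python run_simulation.py {input_file}
"""


def submit_job(input_file: str) -> str:
    """Write a job script for the given input file and submit it via sbatch."""
    script = Path("capice_job.sh")
    script.write_text(JOB_TEMPLATE.format(input_file=input_file))
    # sbatch prints e.g. "Submitted batch job 123456" on success
    result = subprocess.run(["sbatch", str(script)],
                            capture_output=True, text=True, check=True)
    return result.stdout.strip()
```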

Media 1: Principle of a workflow between different computing infrastructures using the CAPICE project. In a first step, the user calls the CAPICE data API to trigger the upload of a data file from the data infrastructure to the data storage service 1). The request is passed by the DASF message broker to the data infrastructure 2). The specified data is then uploaded from the data infrastructure to the data storage service through the CAPICE data backend 3). In a second step, the CAPICE compute API is called to download the data file from the data storage service to the computing infrastructure 1). The DASF message broker passes the request to the computing infrastructure 2), where the data is downloaded from the data storage service by calling the CAPICE compute backend 3). In a third step, the CAPICE compute API is called to process the previously downloaded file on the computing infrastructure 1). Again, the request is passed by the DASF message broker to the computing infrastructure 2). There, the file is processed, for example using ISSM. Afterwards, the result is uploaded to the data storage service 3), from where the user can download it 4).

Results

Implemented Solution

To illustrate the ability of DASF to run scientific simulations on distributed systems, a sample ISSM simulation is implemented in such a way that the simulation is run at the "Leibniz-Rechenzentrum (LRZ)" in Munich and the required data is provided by the "Deutsches Klimarechenzentrum (DKRZ)" in Hamburg. The implemented software solution is, however, independent of these particular systems; installation and deployment are possible on any infrastructure, such as HPC systems, compute clouds, and local or virtual machines. The implemented example is based on one of the tutorial cases provided by the ISSM development group. The code base can be found in the project repository on [github.com]1; the [required input data]2 ([@Nowicki2013]) and [further information]3 are provided on the ISSM website https://issm.jpl.nasa.gov/. While this example is relatively simple and primarily serves an educational purpose, key concepts and functionalities have been implemented, and the example provides a basis for further development.

Media 1 shows the basic setup we used for piloting. Two remote systems (a data infrastructure and a computing infrastructure) each run a site-specific backend component of DASF, which connects them to the DASF message broker. For this project, the message broker is hosted at the Helmholtz-Zentrum Hereon. DASF provides an application programming interface (API) for these backends. With this API, the user sends a message to the message broker; the message is passed to the backend and triggers a defined workflow on the remote system. In this way, the user only interacts with the DASF message broker, and no direct interaction with the remote systems is required while the workflow is running. Data transfer between the systems is handled by a data storage service that all systems can access. In the following, the workflow for the provided example is summarized.
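The dispatch principle behind these backends can be sketched in a few lines of Python. The registry, the `register` decorator, and the message format below are illustrative placeholders; the actual DASF backend handles registration and message handling through its own API.

```python
import json

# Minimal sketch of the backend principle: functions are registered under a
# name, and an incoming request (function name plus keyword arguments) is
# dispatched to the matching function. All names here are illustrative.
REGISTRY = {}


def register(func):
    """Expose a function to incoming requests under its name."""
    REGISTRY[func.__name__] = func
    return func


@register
def upload_input_data(file_name: str) -> str:
    """Placeholder for a data backend function: upload a file, return a link."""
    return f"https://storage.example.org/download/{file_name}"


def handle_request(message: str) -> str:
    """Interpret a JSON request message and call the requested function."""
    request = json.loads(message)
    func = REGISTRY[request["func"]]
    result = func(**request.get("kwargs", {}))
    return json.dumps({"result": result})


if __name__ == "__main__":
    # A request as it might arrive from the message broker.
    msg = json.dumps({"func": "upload_input_data",
                      "kwargs": {"file_name": "Greenland_5km_dev1.2.nc"}})
    print(handle_request(msg))
```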

Input data for a simulation are stored on the data infrastructure. The user calls the CAPICE data API to trigger the upload of the input data to the data storage service, which provides a link to download the input data (blue workflow in Media 1). In a second step, the user calls the CAPICE compute API, providing the previously generated download link, to trigger the download of the input data at the computing infrastructure (orange workflow in Media 1). The user then calls the CAPICE compute API to trigger several ISSM commands that use the input data to build the model, which is saved to an ISSM model file. Finally, the simulation is run on the computing infrastructure: the backend automatically generates a job script and submits it to the scheduler. After the simulation has finished, the model result is saved and uploaded to the storage service, and an email notification with a link to download the simulation results is sent to the user and project partners (purple workflow in Media 1).
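From the user's perspective, this workflow boils down to three consecutive API calls. The sketch below uses stub functions whose names and signatures are hypothetical stand-ins for the CAPICE data and compute APIs; each stub only documents the step it represents.

```python
# Hypothetical stand-ins for the CAPICE data and compute API calls; the real
# APIs differ in names and signatures. Each stub documents one workflow step.


def data_api_upload(file_name: str) -> str:
    """Blue workflow: upload the input file from the data infrastructure to
    the storage service and return a download link."""
    return f"https://storage.example.org/download/{file_name}"


def compute_api_download(link: str) -> str:
    """Orange workflow: download the input file from the storage service to
    the computing infrastructure and return its local path."""
    return "/scratch/capice/" + link.rsplit("/", 1)[-1]


def compute_api_run(input_path: str) -> str:
    """Purple workflow: build the ISSM model, submit the batch job, upload
    the result, and return a download link for it."""
    return "https://storage.example.org/download/result.nc"


if __name__ == "__main__":
    link = data_api_upload("Greenland_5km_dev1.2.nc")
    local_path = compute_api_download(link)
    print("Result available at:", compute_api_run(local_path))
```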

Media 2: Principle of how ISSM is called on the computing infrastructure using the CAPICE compute backend. The compute backend receives a request from the DASF message broker to call an ISSM function with certain arguments. This is implemented in the backend using a wrapper function, which generates a configuration file specifying the ISSM function to be called and its argument values. In a second step, the generic run script is executed; it reads the information from the configuration file and uses the ISSM Python interface to execute the requested ISSM function with the given argument values. In this way, the executed code remains static.
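A minimal sketch of such a wrapper function is shown below. It writes the requested function name and argument values to a configuration file and then executes the generic run script; the file names, configuration keys, and run script path are placeholders, not the actual CAPICE implementation.

```python
import json
import subprocess
from pathlib import Path

# Sketch of the wrapper principle in Media 2: the wrapper never calls ISSM
# directly. It records the request in a configuration file and then runs the
# static generic run script. All names below are placeholders.


def run_issm_function(module: str, function: str,
                      run_script: str = "generic_run.py", **kwargs) -> None:
    """Write the request to a configuration file and execute the run script."""
    config = {"module": module, "function": function, "arguments": kwargs}
    config_path = Path("issm_request.json")
    config_path.write_text(json.dumps(config, indent=2))
    # Only the configuration file changes between requests;
    # the run script itself stays static.
    subprocess.run(["python", run_script, str(config_path)], check=True)
```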

This way, the user does not need to access the remote systems directly, and no local installation of ISSM is required. All functionality needed to run the example is provided by the CAPICE compute backend running on the computing infrastructure and exposed through the CAPICE compute API. However, the API adds a further abstraction layer on top of the ISSM interface: only the functions needed to run the presented example are implemented, and they may not offer the full functionality of their ISSM counterparts.

One of the requirements for running the DASF backend on remote infrastructure is that the software that is finally executed on the system must be static, so that it can be validated by a checksum. Therefore, the execution of ISSM is implemented using a generic run script that calls the ISSM Python interface. The principle is shown in Media 2. In a first step, the CAPICE compute backend receives a request from the DASF message broker containing the ISSM function to be executed and all arguments to be passed to this function. The backend then generates a configuration file containing this information and executes the generic run script. Depending on the requested ISSM function, the run script is executed either locally or as a batch job scheduled on compute nodes. Local execution can, of course, only be used for very simple functionalities such as mere data conversion or the various model definition steps, since running actual HPC calculations on login nodes is prohibited on HPC systems. The run script imports the information stored in the configuration file and executes the requested ISSM function using the ISSM Python interface. While this concept introduces another layer of abstraction between the DASF backend and the ice sheet model code, it guarantees that the executed code, the generic run script, will not be modified by the user, so code security measures can easily be applied. The concept also allows interfaces to be implemented for other model codes: ISSM includes not only a Python interface but also a MATLAB interface, and if the MATLAB interface is preferred, a generic MATLAB execution script for ISSM could be implemented in the same way. The same is true for other domain model codes.
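The counterpart, the generic run script, could look like the following sketch. It only reads the configuration file and dispatches to the requested function; here `importlib` and `getattr` stand in for resolving a function from the ISSM Python interface, and the configuration keys match the hypothetical wrapper sketched above.

```python
#!/usr/bin/env python
"""Sketch of a static, generic run script: read the configuration file and
dispatch to the requested function. The configuration keys and the dispatch
via importlib/getattr are illustrative, not the actual CAPICE code."""
import importlib
import json
import sys


def main(config_file: str) -> None:
    with open(config_file) as fh:
        config = json.load(fh)

    # Resolve the requested function, e.g. from a module of the ISSM Python
    # interface (assumed to be available on the PYTHONPATH).
    module = importlib.import_module(config["module"])
    func = getattr(module, config["function"])

    # Execute the function with the argument values from the configuration.
    result = func(**config["arguments"])
    print("Finished", config["function"], "->", result)


if __name__ == "__main__":
    main(sys.argv[1])
```

For functions that need compute nodes, the same invocation of the run script would be wrapped in a batch script and submitted to the scheduler instead of being executed locally.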

Data and Software Availability

In principle, the installation and deployment of the example are independent of the computing system. Here, we refer to the computing infrastructure and the data infrastructure to distinguish between the system on which ISSM is executed and the system on which the data is stored. A working ISSM installation is required on the computing infrastructure.

The “Ice Sheet System Model (ISSM)” ([@Larour2012]) is available at https://github.com/ISSMteam/ISSM and is published under the BSD 3-Clause License. Further information about its installation can be found on the respective [website]4.

To run the example, a data infrastructure running the CAPICE data backend, a computing infrastructure running the CAPICE compute backend, and a DASF message broker are required. Such infrastructures can be any computing system.

Additionally, access to a data storage service such as OpenStack Swift or Nextcloud is needed. In principle, all components can run on the same machine. If any of the components are to be deployed on an HPC system, communication with and guidance from the data center administrators is required.
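As an illustration of the storage service interaction, the sketch below uploads a file to a Nextcloud instance via its WebDAV endpoint using the `requests` library; the host, user name, credentials, and paths are placeholders, and an OpenStack Swift setup would use a different client.

```python
import requests

# Placeholder host, user, and credentials for a Nextcloud WebDAV upload.
NEXTCLOUD_URL = "https://cloud.example.org/remote.php/dav/files/capice"
AUTH = ("capice", "app-password")


def upload(local_path: str, remote_name: str) -> str:
    """Upload a local file to the storage service and return its URL."""
    url = f"{NEXTCLOUD_URL}/{remote_name}"
    with open(local_path, "rb") as fh:
        response = requests.put(url, data=fh, auth=AUTH)
    response.raise_for_status()
    return url


if __name__ == "__main__":
    print(upload("Greenland_5km_dev1.2.nc", "Greenland_5km_dev1.2.nc"))
```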

The source code of the data backend and the compute backend is available at https://git.rwth-aachen.de/nfdi4earth/pilotsincubatorlab/pilots/capice/capice-data-backend and https://git.rwth-aachen.de/nfdi4earth/pilotsincubatorlab/pilots/capice/capice-compute-backend, respectively.

The repositories provide guidance for installation and use. The example setup based on the ISSM tutorial case is available at https://git.rwth-aachen.de/nfdi4earth/pilotsincubatorlab/pilots/capice/capice-greenland-example. The README of that repository includes a step-by-step discussion of the example code, including the download of the needed input files, the data transfer between the different infrastructures and the data storage service, the model setup, and the model execution.

References


  1. https://github.com/ISSMteam/ISSM/tree/main/examples/Greenland 

  2. https://issm.jpl.nasa.gov/files/examples/Greenland_5km_dev1.2.nc 

  3. https://issm.jpl.nasa.gov/documentation/tutorials/greenland/ 

  4. https://issm.jpl.nasa.gov/