Empowering Earth System Science through Julia for Optimized Processing and Statistical Driver Attribution in Big Data

The availability of remote sensing information about the Earth has increased exponentially over recent decades. This trend is expected to continue as new satellite missions deliver new remote sensing products. To keep pace with the amount of information gathered by the various sensors orbiting the Earth, it is essential to develop tools that allow scientists and students to easily manipulate and operate on spatio-temporal gridded data. Ideally, this new generation of tools should be able to run on cloud platforms, and interact with data stored there, rather than on individual computers. In this pilot, we developed the YAXArraysToolbox package [1], a new Julia package built on YAXArrays.jl that serves as an entry point to high-performance computing. The front end of YAXArraysToolbox is divided into two modules. The first (Basic Operations, Media 1) contains a set of basic operations for processing and visualizing large gridded data efficiently. The second (Spatio-Temporal Analyses, Media 2) contains a set of tools for data-driven attribution, e.g., using the space-for-time concept, and for spatio-temporal data partitioning for cross-validation analyses, based on [2], [3], and [4]. We anticipate that the package will benefit various domains within Earth system data science, as well as the Julia user community, by providing a user-friendly environment. Advanced users can also use the implemented analyses to perform data-driven attribution with machine learning techniques and semi-empirical modeling. We intend to extend the package with additional functions useful for statistical driver attribution in large datasets, including forward feature selection based on regularized regression.
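To make the idea of spatio-temporal data partitioning more concrete, the following is a minimal sketch of grouped fold assignment in the spirit of the target-oriented validation discussed by Meyer et al. (2018): all samples from the same location (or the same time step) are kept in the same fold, so a model is always evaluated on locations or times it has not seen. It is written in plain Python purely for illustration and does not use or mirror the YAXArraysToolbox API; the function name `leave_group_out_folds`, the `station` and `year` fields, and the sample data are all hypothetical.

```python
from collections import defaultdict


def leave_group_out_folds(samples, key, k):
    """Assign sample indices to k folds so that samples sharing the same
    value of `key` (e.g. a location or a time step) stay in the same fold."""
    groups = sorted({s[key] for s in samples})
    # Distribute the groups over the folds in round-robin order.
    fold_of_group = {g: i % k for i, g in enumerate(groups)}
    folds = defaultdict(list)
    for idx, s in enumerate(samples):
        folds[fold_of_group[s[key]]].append(idx)
    return [folds[i] for i in range(k)]


# Hypothetical samples: one record per (station, year) observation.
samples = [
    {"station": st, "year": yr}
    for st in ("A", "B", "C", "D")
    for yr in (2019, 2020, 2021)
]

# Leave-location-out: each fold holds out whole stations.
location_folds = leave_group_out_folds(samples, "station", k=2)

# Leave-time-out: each fold holds out whole years.
time_folds = leave_group_out_folds(samples, "year", k=3)
```

Each fold can then serve once as the held-out test set while the remaining folds are used for training. Holding out whole locations or whole time steps, rather than random samples, avoids scoring a model on data that are strongly correlated with its training data.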

Media 1: YAXArraysToolbox package. Basic operations module. Structure and functions.

Media 2: YAXArraysToolbox package. Spatio-temporal analysis module.
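
As a rough illustration of the space-for-time concept used for attribution, the sketch below estimates how a variable would change if one land cover class were fully replaced by another, using only the spatial variation among neighbouring pixels in a single window. It is a deliberately simplified toy under stated assumptions (two classes whose fractions sum to one, a linear mixing model, noise-free synthetic values) and is neither the procedure of Duveiller et al. (2018) nor the implementation in YAXArraysToolbox; the function name `space_for_time_delta` and all numbers are hypothetical.

```python
def space_for_time_delta(fractions_a, values):
    """Estimate the change in `values` for a full transition from class A
    to class B within one spatial window.

    Fits values = beta_a * f_a + beta_b * (1 - f_a) by ordinary least
    squares over the pixels of the window and returns beta_b - beta_a.
    """
    # Accumulate the normal equations for the design matrix [f_a, f_b].
    saa = sab = sbb = say = sby = 0.0
    for fa, y in zip(fractions_a, values):
        fb = 1.0 - fa
        saa += fa * fa
        sab += fa * fb
        sbb += fb * fb
        say += fa * y
        sby += fb * y
    det = saa * sbb - sab * sab
    if abs(det) < 1e-12:
        # All pixels have the same composition, so the classes
        # cannot be separated from spatial variation alone.
        raise ValueError("window has no variation in land cover composition")
    # Solve the 2x2 system with Cramer's rule.
    beta_a = (say * sbb - sby * sab) / det
    beta_b = (sby * saa - say * sab) / det
    return beta_b - beta_a


# Hypothetical window: forest fraction per pixel and a synthetic variable
# generated as 20.0 * f_forest + 23.0 * (1 - f_forest).
forest_fraction = [1.0, 0.75, 0.5, 0.25, 0.0]
variable = [20.0 * f + 23.0 * (1.0 - f) for f in forest_fraction]

# Estimated change for a complete forest-to-other-class transition;
# approximately 3.0 for this noise-free example.
delta = space_for_time_delta(forest_fraction, variable)
```

With real data the estimate would be repeated for many windows and would carry uncertainty from noise, from additional land cover classes, and from environmental gradients within the window, none of which this toy example addresses.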

Resources

References


  1. Daniel E. Pabon-Moreno, Gregory Duveiller, Markus Reichstein, Fabian Gans, Felix Cremer, and Alexander Winkler. dpabon/YAXArraysToolbox.jl: Zenodo publication. May 2023. doi:10.5281/zenodo.7989936. This work has been funded by the German Research Foundation (DFG) through the project NFDI4Earth (TA1 M1.1, DFG project no. 460036893, https://www.nfdi4earth.de/) within the German National Research Data Infrastructure (NFDI, https://www.nfdi.de/).

  2. Gregory Duveiller, Josh Hooker, and Alessandro Cescatti. The mark of vegetation change on Earth's surface energy balance. Nature Communications, 9(1):679, Feb 2018. doi:10.1038/s41467-017-02810-8

  3. Hanna Meyer, Christoph Reudenbach, Tomislav Hengl, Marwan Katurji, and Thomas Nauss. Improving performance of spatio-temporal machine learning models using forward feature selection and target-oriented validation. Environmental Modelling & Software, 101:1–9, 2018. doi:10.1016/j.envsoft.2017.12.001

  4. Hanna Meyer and Edzer Pebesma. Predicting into unknown space? Estimating the area of applicability of spatial prediction models. Methods in Ecology and Evolution, 12(9):1620–1633, 2021. doi:10.1111/2041-210X.13650