Skip to content

Introduction to handling raster time series in Julia, Python and R

Introduction

This tutorial will showcase how to work with raster data efficiently. The analysis will be shown in Julia, Python and R to showcase the similarities and differences in handling raster data in these ecosystems.

In this tutorial we are going to use the COSMO REA reanalyis near surface air temperature data. The data is an reanalysis dataset on a 6km by 6km grid. We are going to use the monthly average values, but the data is also avialable with an hourly or daily temporal resolution.

Time series analysis

Loading of necessary packages

First we load the relevant packages in the different languages for working with raster and vector data and also the packages for plotting.

    using YAXArrays # This package is for handling large raster data 
    using NetCDF # This package is for opening NetCDF files 
    using GLMakie # Plotting package with a focus on interactivity
    using DimensionalData # Package to handle named and labeled arrays
    using Rasters
    using Downloads: download # Standard Library to handle downloads to get the data
    using Glob: glob
    using NCDatasets
    using GeoInterface: GeoInterface as GI # Package for handling Geospatial data
    using GADM # Package for loading state borders
    library("rnaturalearth")
    library("rnaturalearthdata")
    library( "sf")
    library( "raster")
    library("stars")
    library("lubridate")
    import geopandas #Load GeoJSON data into #pandas dataframes
    import rasterio #Handle reprojection of geospatial data
    import xarray as xr #Handle I/O and compute for n-dimensional array data

Now we download the airtemperature data for 1995 to the data/ folder. In the first part of the tutorial we are going to only use the data from one single year, and later on we are going to combine the datasets from different years together. The data will only be downloaded if it is not yet available on the local computer.

url = "http://esgf1.dkrz.de/thredds/fileServer/cosmo-rea/reanalysis/EUR-6km/DWD/ECMWF-ERAINT/REA6/r1i1p1f1/COSMO/v1/mon/atmos/tas/v20230314/tas_EUR-6km_ECMWF-ERAINT_REA6_r1i1p1f1_COSMO_v1_mon_199601-199612.nc"
filename = split(url, "/")[end]
mkpath("data/")
p = joinpath(@__DIR__, "data", filename)
if !isfile(p)
download(url, p)
end
setwd("data/")
destfile <- getwd()

# URL of DKRZ folder containing COSMO-REA Dataset files
baseurl <- "http://esgf1.dkrz.de/thredds/fileServer/cosmo-rea/reanalysis/EUR-6km/DWD/ECMWF-ERAINT/REA6/r1i1p1f1/COSMO/v1/mon/atmos/tas/v20230314/tas_EUR-6km_ECMWF-ERAINT_REA6_r1i1p1f1_COSMO_v1_mon_"
urlList <- "0"
l1 <- format(seq(as.Date("1995-01-01"), as.Date("2018-01-01"), by="year"), format = "%Y%m")
l2 <- format(seq(as.Date("1995-12-01"), as.Date("2018-12-01"), by="year"), format = "%Y%m")

for (i in 1:length(l1)) {
x <- paste0(baseurl,l1[i],"-",l2[i],".nc")
# Use the following line to download the data to the destfile folder
#download.file(x, paste0(destfile,"\\", basename(x)), mode = "wb") #download files
urlList[i] <- x
i = i + 1
}

Opening the data and first map

Now we are going to open the raster data as a datacube and plot a first overview map.

r = Raster(p, key="tas", lazy=true)
# TODO: should we use GeoMakie for plotting?
# TODO: This should work without the set see Rasters # 
r = set(r, :rlat=>Y, :rlon=>X)
# To get an overview we could use
#Rasters.rplot(r)
# Or to plot the data of January separately
heatmap(r[Ti(Near(DateTime(1995,1,15)))])

┌ Warning: Found resolution in the theme when creating a Scene. The resolution keyword for Scenes and Figures has been deprecated. Use Figure(; size = ... or Scene(; size = ...) instead, which better reflects that this is a unitless size and not a pixel resolution. The key could also come from set_theme! calls or related theming functions. └ @ Makie ~/.julia/packages/Makie/fyNiH/src/scenes.jl:220

file = paste0(destfile, "/", basename(urlList[1]))
m = read_mdim(file, "tas", curvilinear = c("longitude", "latitude"))
plot(m[,,,1], downsample = 5, axes = TRUE, reset = FALSE) # downsample to save time
maps::map(add = TRUE)
""# | hide_line

import os
from pathlib import Path
print(os.getcwd())
dirpath = os.path.join("/home/fcremer/Documents/NFDI4Earth/lhbarticles", "data", "*.nc")
cosmo_rea6 = xr.open_mfdataset(dirpath, engine='netcdf4')["tas"]

/home/fcremer/Documents/NFDI4Earth/lhbarticles/data

Coordinate reference system of the data

The data is in a rotated latitude longitude grid. This rotation helps to reduce the spatial distortions of the data because of the projection. For an introduction into the concepts of Here we construct a projection string from the metadata of the dataset so that we can use this projection information latter on for converting between different coordinate reference systems.

In Julia we construct a Proj representation of the coordinate reference system so that we can convert the vector data that we are going to use later on for subsetting the dataset into the CRS of the raster data.

```julia
ds = open_dataset(p)
olonp = 180 + ds.rotated_latitude_longitude.properties["grid_north_pole_longitude"]
olatp = ds.rotated_latitude_longitude.properties["grid_north_pole_latitude"]

proj = "+proj=ob_tran +o_proj=latlon +o_lon_p=0 +o_lat_p=$olatp +lon_0=$olonp"
```

 "+proj=ob_tran +o_proj=latlon +o_lon_p=0 +o_lat_p=39.25 +lon_0=18.0"

In R we are warping the raster data from the rotated longitude latitude grid to an unrotated longitude latitude grid.

if  (isFALSE(file.exists("out2.nc"))) {
gdal_utils("warp", file, "out2.nc")
}
m2 = read_mdim("out2.nc")                                 # now has time over bands, caused by "warp"
plot(m2, axes = TRUE, reset = FALSE)                      # fast: regular grid
maps::map(add = TRUE)
# Add time dimension and crs to re-gridded dataset
t = st_get_dimension_values(m, "time")
merge(m2, name = "time") |>
st_set_dimensions("time", values = t) |>
setNames("tas") -> m3
m3 <- st_set_crs(m3, st_crs(4326))
m3
┌ Warning: RCall.jl: downsample set to 1
└ @ RCall ~/.julia/packages/RCall/YrsKg/src/io.jl:172

RObject{VecSxp}
stars object with 3 dimensions and 1 attribute
attribute(s), summary of first 1e+05 cells:
            Min.  1st Qu.   Median     Mean  3rd Qu.   Max.  NA's
tas [K] 280.5498 285.1855 286.6836 286.7361 287.6787 294.75 89615
dimension(s):
    from   to offset   delta refsys                    values x/y
lon     1 1469 -44.75 0.07483 WGS 84                      NULL [x]
lat     1  678   21.9 0.07483 WGS 84                      NULL [y]
time    1   12     NA      NA   Date 1995-01-16,...,1995-12-16
_proj_str = "+proj=ob_tran +o_proj=latlon +o_lon=90 +o_lat_p=39.25 +lon_0=18.0"
crs = rasterio.crs.CRS.from_proj4(_proj_str)
cosmo_rea6 = cosmo_rea6.rio.write_crs(crs)
#cosmo_rea6 = cosmo_rea6.rio.reproject("EPSG:4326")["tas"].chunk({"time":1})

Restricting to an area of interest

Now we load the polygon data for the border of Germany to restrict our data to the bounding box of Germany.

We use the GADM package to load the boundary polygon for Germany. Then we reproject the polygon to the rotated longitude latitude grid.

using GADM
using ArchGDAL:ArchGDAL as AG
using GLMakie
using GeoInterfaceMakie: GeoInterfaceMakie
GeoInterfaceMakie.@enable AG.AbstractGeometry

deu = GADM.get("DEU")
projdeu = AG.reproject(only(deu.geom), ProjString(AG.toPROJ4(AG.getspatialref(deu.geom[1]))), ProjString(proj))
# Should work like that
#projdeu = AG.reproject(deu, ProjString(proj))
bbox = GI.extent(projdeu)
rger = r[bbox]
heatmap(rger[Ti(Near(DateTime(1995,1,15)))])
plot!(projdeu)
┌ Warning: Found `resolution` in the theme when creating a `Scene`. The `resolution` keyword for `Scene`s and `Figure`s has been deprecated. Use `Figure(; size = ...` or `Scene(; size = ...)` instead, which better reflects that this is a unitless size and not a pixel resolution. The key could also come from `set_theme!` calls or related theming functions.
└ @ Makie ~/.julia/packages/Makie/fyNiH/src/scenes.jl:220

Poly{Tuple{GeometryBasics.MultiPolygon{2, Float64, GeometryBasics.Polygon{2, Float64, Point2{Float64}, GeometryBasics.LineString{2, Float64, Point2{Float64}, Base.ReinterpretArray{GeometryBasics.Line{2, Float64}, 1, Tuple{Point2{Float64}, Point2{Float64}}, GeometryBasics.TupleView{Tuple{Point2{Float64}, Point2{Float64}}, 2, 1, Vector{Point2{Float64}}}, false}}, Vector{GeometryBasics.LineString{2, Float64, Point2{Float64}, Base.ReinterpretArray{GeometryBasics.Line{2, Float64}, 1, Tuple{Point2{Float64}, Point2{Float64}}, GeometryBasics.TupleView{Tuple{Point2{Float64}, Point2{Float64}}, 2, 1, Vector{Point2{Float64}}}, false}}}}, Vector{GeometryBasics.Polygon{2, Float64, Point2{Float64}, GeometryBasics.LineString{2, Float64, Point2{Float64}, Base.ReinterpretArray{GeometryBasics.Line{2, Float64}, 1, Tuple{Point2{Float64}, Point2{Float64}}, GeometryBasics.TupleView{Tuple{Point2{Float64}, Point2{Float64}}, 2, 1, Vector{Point2{Float64}}}, false}}, Vector{GeometryBasics.LineString{2, Float64, Point2{Float64}, Base.ReinterpretArray{GeometryBasics.Line{2, Float64}, 1, Tuple{Point2{Float64}, Point2{Float64}}, GeometryBasics.TupleView{Tuple{Point2{Float64}, Point2{Float64}}, 2, 1, Vector{Point2{Float64}}}, false}}}}}}}}
R""" #| hide_line#jrn
# Subset for Germany
germany <- ne_countries(type = "countries", country = "Germany", scale = "medium", returnclass = "sf")
rger <- m3[germany]
plot(rger)

Split the time series into two seasons

Now we split the time series data into two datasets by season.

Hereby we define the summer as the time between the spring and autumn equinox. Since we are using monthly data in this example, we define summer as April to September. We can define the winter as every month that is not included in the summer dataset.

summer = rger[Ti=Date(1996,4,1)..Date(1996, 10,1)]
winter = rger[Ti=Not(Date(1996, 4,1)..Date(1996,10,1))]
104×142×6 Raster{Union{Missing, Float32},3} tas with dimensions: 
X Sampled{Float64} Float64[-7.557552218296486, -7.502551037136502, …, -1.9474317399783132, -1.8924305588183294] ForwardOrdered Explicit Intervals{Center},
Y Sampled{Float64} Float64[-3.1625523990153646, -3.107551183040055, …, 4.537617837528135, 4.592619053503444] ForwardOrdered Explicit Intervals{Center},
Ti Sampled{DateTime} DateTime[1996-01-16T12:00:00, …, 1996-12-16T12:00:00] ForwardOrdered Explicit Intervals{Center}
extent: Extent(X = (-7.585052808876476, -1.8649299682383376), Y = (-3.1900530070030193, 4.620119661491099), Ti = (DateTime("1996-01-01T00:00:00"), DateTime("1997-01-01T00:00:00")))missingval: missingparent:
[:, :, 1]
            -3.16255   -3.10755  …    4.48262    4.53762    4.59262
-7.55755  272.92     273.601       275.57     275.613    275.659
-7.50255  273.192    273.271       275.499    275.545    275.595
-7.44755  273.245    273.018       275.43     275.478    275.527
-7.39255  273.229    272.829       275.362    275.411    275.461
-7.33755  273.27     272.859    …  275.294    275.342    275.392
-7.28255  273.295    273.223       275.225    275.271    275.319
-7.22755  273.122    273.2         275.155    275.199    275.245
⋮                              ⋱               ⋮        
-2.27744  270.596    271.563       272.977    272.934    272.877
-2.22244  270.089    272.091    …  272.983    272.937    272.869
-2.16744  270.009    271.892       272.99     272.953    272.905
-2.11244  269.83     271.159       272.989    272.972    272.968
-2.05743  269.084    270.396       272.967    272.965    272.987
-2.00243  268.513    269.486       272.925    272.939    272.992
-1.94743  268.209    268.963    …  272.858    272.897    272.996
-1.89243  268.287    268.819       272.444    272.764    273.012
[and 5 more slices...]
# Split dataset into two by season
# Definition: Summer (April - September) & Winter (October - March)
a = seq(from = 4, to = 288, by = 12)
b = seq(from = 9, to = 288, by = 12)
winter <- rger[,,,4:9]

c = seq(from = 1, to = 288, by = 12)
d = seq(from = 3, to = 288, by = 12)
e = seq(from = 10, to = 288, by = 12)
f = seq(from = 12, to = 288, by = 12)
summer <- c(rger[,,,1:3], rger[,,,10:12])
RObject{VecSxp}
stars object with 3 dimensions and 1 attribute
attribute(s):
            Min. 1st Qu.  Median     Mean  3rd Qu.     Max.  NA's
tas [K] 267.0615 272.964 276.252 276.6505 278.1038 288.6487 28236
dimension(s):
    from  to offset   delta refsys                    values x/y
lon   677 799 -44.75 0.07483 WGS 84                      NULL [x]
lat   340 444   21.9 0.07483 WGS 84                      NULL [y]
time    1   6     NA      NA   Date 1995-01-16,...,1995-12-16
def is_summer(month):
    return (month > 3) & (month < 10)

def is_winter(month):
    return (month <= 3) | (month >= 10)

cosmo_rea6_summer = cosmo_rea6.sel(time=is_summer(cosmo_rea6['time.month']))
cosmo_rea6_winter = cosmo_rea6.sel(time=is_winter(cosmo_rea6['time.month']))

Now we compute the standard deviation and the mean of the time series for the "summer" and the "winter" dataset.

summer = rger[Ti=Date(1996,4,1)..Date(1996, 10,1)]
winter = rger[Ti=Not(Date(1996, 4,1)..Date(1996,10,1))]
using Statistics
summermean = mapslices(mean, summer, dims=Ti)
wintermean = mapslices(mean, winter, dims=Ti)
winterstd = mapslices(std, winter, dims=Ti)
plot(summermean[:,:,1])
┌ Warning: Found `resolution` in the theme when creating a `Scene`. The `resolution` keyword for `Scene`s and `Figure`s has been deprecated. Use `Figure(; size = ...` or `Scene(; size = ...)` instead, which better reflects that this is a unitless size and not a pixel resolution. The key could also come from `set_theme!` calls or related theming functions.
└ @ Makie ~/.julia/packages/Makie/fyNiH/src/scenes.jl:220

summer_mean_tas <- aggregate(summer, by = "year", FUN = mean)
plot(summer_mean_tas)

    cosmo_rea6_winter_mean = cosmo_rea6_winter.groupby("time.year").mean(dim="time",skipna=True)
    cosmo_rea6_summer_mean = cosmo_rea6_summer.groupby("time.year").mean(dim="time",skipna=True)

    cosmo_rea6_winter_std = cosmo_rea6_winter.groupby("time.year").std(dim="time",skipna=True)
    cosmo_rea6_summer_std = cosmo_rea6_summer.groupby("time.year").std(dim="time",skipna=True)
    #cosmo_rea6_winter_mean.sel(year=1995).plot()