Introduction to handling raster time series in Julia, Python and R
Introduction
This tutorial will showcase how to work with raster data efficiently. The analysis will be shown in Julia, Python and R to showcase the similarities and differences in handling raster data in these ecosystems.
In this tutorial we are going to use the COSMO REA reanalyis near surface air temperature data. The data is an reanalysis dataset on a 6km by 6km grid. We are going to use the monthly average values, but the data is also avialable with an hourly or daily temporal resolution.
Time series analysis
Loading of necessary packages
First we load the relevant packages in the different languages for working with raster and vector data and also the packages for plotting.
using YAXArrays # This package is for handling large raster data
using NetCDF # This package is for opening NetCDF files
using GLMakie # Plotting package with a focus on interactivity
using DimensionalData # Package to handle named and labeled arrays
using Rasters
using Downloads: download # Standard Library to handle downloads to get the data
using Glob: glob
using NCDatasets
using GeoInterface: GeoInterface as GI # Package for handling Geospatial data
using GADM # Package for loading state borders
library("rnaturalearth")
library("rnaturalearthdata")
library( "sf")
library( "raster")
library("stars")
library("lubridate")
import geopandas #Load GeoJSON data into #pandas dataframes
import rasterio #Handle reprojection of geospatial data
import xarray as xr #Handle I/O and compute for n-dimensional array data
Now we download the airtemperature data for 1995 to the data/
folder.
In the first part of the tutorial we are going to only use the data from
one single year, and later on we are going to combine the datasets from
different years together. The data will only be downloaded if it is not
yet available on the local computer.
url = "http://esgf1.dkrz.de/thredds/fileServer/cosmo-rea/reanalysis/EUR-6km/DWD/ECMWF-ERAINT/REA6/r1i1p1f1/COSMO/v1/mon/atmos/tas/v20230314/tas_EUR-6km_ECMWF-ERAINT_REA6_r1i1p1f1_COSMO_v1_mon_199601-199612.nc"
filename = split(url, "/")[end]
mkpath("data/")
p = joinpath(@__DIR__, "data", filename)
if !isfile(p)
download(url, p)
end
setwd("data/")
destfile <- getwd()
# URL of DKRZ folder containing COSMO-REA Dataset files
baseurl <- "http://esgf1.dkrz.de/thredds/fileServer/cosmo-rea/reanalysis/EUR-6km/DWD/ECMWF-ERAINT/REA6/r1i1p1f1/COSMO/v1/mon/atmos/tas/v20230314/tas_EUR-6km_ECMWF-ERAINT_REA6_r1i1p1f1_COSMO_v1_mon_"
urlList <- "0"
l1 <- format(seq(as.Date("1995-01-01"), as.Date("2018-01-01"), by="year"), format = "%Y%m")
l2 <- format(seq(as.Date("1995-12-01"), as.Date("2018-12-01"), by="year"), format = "%Y%m")
for (i in 1:length(l1)) {
x <- paste0(baseurl,l1[i],"-",l2[i],".nc")
# Use the following line to download the data to the destfile folder
#download.file(x, paste0(destfile,"\\", basename(x)), mode = "wb") #download files
urlList[i] <- x
i = i + 1
}
Opening the data and first map
Now we are going to open the raster data as a datacube and plot a first overview map.
r = Raster(p, key="tas", lazy=true)
# TODO: should we use GeoMakie for plotting?
# TODO: This should work without the set see Rasters #
r = set(r, :rlat=>Y, :rlon=>X)
# To get an overview we could use
#Rasters.rplot(r)
# Or to plot the data of January separately
heatmap(r[Ti(Near(DateTime(1995,1,15)))])
┌ Warning: Found resolution
in the theme when creating a Scene
. The resolution
keyword for Scene
s and Figure
s has been deprecated. Use Figure(; size = ...
or Scene(; size = ...)
instead, which better reflects that this is a unitless size and not a pixel resolution. The key could also come from set_theme!
calls or related theming functions.
└ @ Makie ~/.julia/packages/Makie/fyNiH/src/scenes.jl:220
file = paste0(destfile, "/", basename(urlList[1]))
m = read_mdim(file, "tas", curvilinear = c("longitude", "latitude"))
plot(m[,,,1], downsample = 5, axes = TRUE, reset = FALSE) # downsample to save time
maps::map(add = TRUE)
""# | hide_line
import os
from pathlib import Path
print(os.getcwd())
dirpath = os.path.join("/home/fcremer/Documents/NFDI4Earth/lhbarticles", "data", "*.nc")
cosmo_rea6 = xr.open_mfdataset(dirpath, engine='netcdf4')["tas"]
/home/fcremer/Documents/NFDI4Earth/lhbarticles/data
Coordinate reference system of the data
The data is in a rotated latitude longitude grid. This rotation helps to reduce the spatial distortions of the data because of the projection. For an introduction into the concepts of Here we construct a projection string from the metadata of the dataset so that we can use this projection information latter on for converting between different coordinate reference systems.
In Julia we construct a Proj representation of the coordinate reference system so that we can convert the vector data that we are going to use later on for subsetting the dataset into the CRS of the raster data.
```julia
ds = open_dataset(p)
olonp = 180 + ds.rotated_latitude_longitude.properties["grid_north_pole_longitude"]
olatp = ds.rotated_latitude_longitude.properties["grid_north_pole_latitude"]
proj = "+proj=ob_tran +o_proj=latlon +o_lon_p=0 +o_lat_p=$olatp +lon_0=$olonp"
```
"+proj=ob_tran +o_proj=latlon +o_lon_p=0 +o_lat_p=39.25 +lon_0=18.0"
In R we are warping the raster data from the rotated longitude latitude grid to an unrotated longitude latitude grid.
if (isFALSE(file.exists("out2.nc"))) {
gdal_utils("warp", file, "out2.nc")
}
m2 = read_mdim("out2.nc") # now has time over bands, caused by "warp"
plot(m2, axes = TRUE, reset = FALSE) # fast: regular grid
maps::map(add = TRUE)
# Add time dimension and crs to re-gridded dataset
t = st_get_dimension_values(m, "time")
merge(m2, name = "time") |>
st_set_dimensions("time", values = t) |>
setNames("tas") -> m3
m3 <- st_set_crs(m3, st_crs(4326))
m3
┌ Warning: RCall.jl: downsample set to 1
└ @ RCall ~/.julia/packages/RCall/YrsKg/src/io.jl:172
RObject{VecSxp}
stars object with 3 dimensions and 1 attribute
attribute(s), summary of first 1e+05 cells:
Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
tas [K] 280.5498 285.1855 286.6836 286.7361 287.6787 294.75 89615
dimension(s):
from to offset delta refsys values x/y
lon 1 1469 -44.75 0.07483 WGS 84 NULL [x]
lat 1 678 21.9 0.07483 WGS 84 NULL [y]
time 1 12 NA NA Date 1995-01-16,...,1995-12-16
_proj_str = "+proj=ob_tran +o_proj=latlon +o_lon=90 +o_lat_p=39.25 +lon_0=18.0"
crs = rasterio.crs.CRS.from_proj4(_proj_str)
cosmo_rea6 = cosmo_rea6.rio.write_crs(crs)
#cosmo_rea6 = cosmo_rea6.rio.reproject("EPSG:4326")["tas"].chunk({"time":1})
Restricting to an area of interest
Now we load the polygon data for the border of Germany to restrict our data to the bounding box of Germany.
We use the GADM package to load the boundary polygon for Germany. Then we reproject the polygon to the rotated longitude latitude grid.
using GADM
using ArchGDAL:ArchGDAL as AG
using GLMakie
using GeoInterfaceMakie: GeoInterfaceMakie
GeoInterfaceMakie.@enable AG.AbstractGeometry
deu = GADM.get("DEU")
projdeu = AG.reproject(only(deu.geom), ProjString(AG.toPROJ4(AG.getspatialref(deu.geom[1]))), ProjString(proj))
# Should work like that
#projdeu = AG.reproject(deu, ProjString(proj))
bbox = GI.extent(projdeu)
rger = r[bbox]
heatmap(rger[Ti(Near(DateTime(1995,1,15)))])
plot!(projdeu)
┌ Warning: Found `resolution` in the theme when creating a `Scene`. The `resolution` keyword for `Scene`s and `Figure`s has been deprecated. Use `Figure(; size = ...` or `Scene(; size = ...)` instead, which better reflects that this is a unitless size and not a pixel resolution. The key could also come from `set_theme!` calls or related theming functions.
└ @ Makie ~/.julia/packages/Makie/fyNiH/src/scenes.jl:220
Poly{Tuple{GeometryBasics.MultiPolygon{2, Float64, GeometryBasics.Polygon{2, Float64, Point2{Float64}, GeometryBasics.LineString{2, Float64, Point2{Float64}, Base.ReinterpretArray{GeometryBasics.Line{2, Float64}, 1, Tuple{Point2{Float64}, Point2{Float64}}, GeometryBasics.TupleView{Tuple{Point2{Float64}, Point2{Float64}}, 2, 1, Vector{Point2{Float64}}}, false}}, Vector{GeometryBasics.LineString{2, Float64, Point2{Float64}, Base.ReinterpretArray{GeometryBasics.Line{2, Float64}, 1, Tuple{Point2{Float64}, Point2{Float64}}, GeometryBasics.TupleView{Tuple{Point2{Float64}, Point2{Float64}}, 2, 1, Vector{Point2{Float64}}}, false}}}}, Vector{GeometryBasics.Polygon{2, Float64, Point2{Float64}, GeometryBasics.LineString{2, Float64, Point2{Float64}, Base.ReinterpretArray{GeometryBasics.Line{2, Float64}, 1, Tuple{Point2{Float64}, Point2{Float64}}, GeometryBasics.TupleView{Tuple{Point2{Float64}, Point2{Float64}}, 2, 1, Vector{Point2{Float64}}}, false}}, Vector{GeometryBasics.LineString{2, Float64, Point2{Float64}, Base.ReinterpretArray{GeometryBasics.Line{2, Float64}, 1, Tuple{Point2{Float64}, Point2{Float64}}, GeometryBasics.TupleView{Tuple{Point2{Float64}, Point2{Float64}}, 2, 1, Vector{Point2{Float64}}}, false}}}}}}}}
R""" #| hide_line#jrn
# Subset for Germany
germany <- ne_countries(type = "countries", country = "Germany", scale = "medium", returnclass = "sf")
rger <- m3[germany]
plot(rger)
Split the time series into two seasons
Now we split the time series data into two datasets by season.
Hereby we define the summer as the time between the spring and autumn equinox. Since we are using monthly data in this example, we define summer as April to September. We can define the winter as every month that is not included in the summer dataset.
summer = rger[Ti=Date(1996,4,1)..Date(1996, 10,1)]
winter = rger[Ti=Not(Date(1996, 4,1)..Date(1996,10,1))]
104×142×6 Raster{Union{Missing, Float32},3} tas with dimensions:
X Sampled{Float64} Float64[-7.557552218296486, -7.502551037136502, …, -1.9474317399783132, -1.8924305588183294] ForwardOrdered Explicit Intervals{Center},
Y Sampled{Float64} Float64[-3.1625523990153646, -3.107551183040055, …, 4.537617837528135, 4.592619053503444] ForwardOrdered Explicit Intervals{Center},
Ti Sampled{DateTime} DateTime[1996-01-16T12:00:00, …, 1996-12-16T12:00:00] ForwardOrdered Explicit Intervals{Center}
extent: Extent(X = (-7.585052808876476, -1.8649299682383376), Y = (-3.1900530070030193, 4.620119661491099), Ti = (DateTime("1996-01-01T00:00:00"), DateTime("1997-01-01T00:00:00")))missingval: missingparent:
[:, :, 1]
-3.16255 -3.10755 … 4.48262 4.53762 4.59262
-7.55755 272.92 273.601 275.57 275.613 275.659
-7.50255 273.192 273.271 275.499 275.545 275.595
-7.44755 273.245 273.018 275.43 275.478 275.527
-7.39255 273.229 272.829 275.362 275.411 275.461
-7.33755 273.27 272.859 … 275.294 275.342 275.392
-7.28255 273.295 273.223 275.225 275.271 275.319
-7.22755 273.122 273.2 275.155 275.199 275.245
⋮ ⋱ ⋮
-2.27744 270.596 271.563 272.977 272.934 272.877
-2.22244 270.089 272.091 … 272.983 272.937 272.869
-2.16744 270.009 271.892 272.99 272.953 272.905
-2.11244 269.83 271.159 272.989 272.972 272.968
-2.05743 269.084 270.396 272.967 272.965 272.987
-2.00243 268.513 269.486 272.925 272.939 272.992
-1.94743 268.209 268.963 … 272.858 272.897 272.996
-1.89243 268.287 268.819 272.444 272.764 273.012
[and 5 more slices...]
# Split dataset into two by season
# Definition: Summer (April - September) & Winter (October - March)
a = seq(from = 4, to = 288, by = 12)
b = seq(from = 9, to = 288, by = 12)
winter <- rger[,,,4:9]
c = seq(from = 1, to = 288, by = 12)
d = seq(from = 3, to = 288, by = 12)
e = seq(from = 10, to = 288, by = 12)
f = seq(from = 12, to = 288, by = 12)
summer <- c(rger[,,,1:3], rger[,,,10:12])
RObject{VecSxp}
stars object with 3 dimensions and 1 attribute
attribute(s):
Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
tas [K] 267.0615 272.964 276.252 276.6505 278.1038 288.6487 28236
dimension(s):
from to offset delta refsys values x/y
lon 677 799 -44.75 0.07483 WGS 84 NULL [x]
lat 340 444 21.9 0.07483 WGS 84 NULL [y]
time 1 6 NA NA Date 1995-01-16,...,1995-12-16
def is_summer(month):
return (month > 3) & (month < 10)
def is_winter(month):
return (month <= 3) | (month >= 10)
cosmo_rea6_summer = cosmo_rea6.sel(time=is_summer(cosmo_rea6['time.month']))
cosmo_rea6_winter = cosmo_rea6.sel(time=is_winter(cosmo_rea6['time.month']))
Now we compute the standard deviation and the mean of the time series for the "summer" and the "winter" dataset.
summer = rger[Ti=Date(1996,4,1)..Date(1996, 10,1)]
winter = rger[Ti=Not(Date(1996, 4,1)..Date(1996,10,1))]
using Statistics
summermean = mapslices(mean, summer, dims=Ti)
wintermean = mapslices(mean, winter, dims=Ti)
winterstd = mapslices(std, winter, dims=Ti)
plot(summermean[:,:,1])
┌ Warning: Found `resolution` in the theme when creating a `Scene`. The `resolution` keyword for `Scene`s and `Figure`s has been deprecated. Use `Figure(; size = ...` or `Scene(; size = ...)` instead, which better reflects that this is a unitless size and not a pixel resolution. The key could also come from `set_theme!` calls or related theming functions.
└ @ Makie ~/.julia/packages/Makie/fyNiH/src/scenes.jl:220
summer_mean_tas <- aggregate(summer, by = "year", FUN = mean)
plot(summer_mean_tas)
cosmo_rea6_winter_mean = cosmo_rea6_winter.groupby("time.year").mean(dim="time",skipna=True)
cosmo_rea6_summer_mean = cosmo_rea6_summer.groupby("time.year").mean(dim="time",skipna=True)
cosmo_rea6_winter_std = cosmo_rea6_winter.groupby("time.year").std(dim="time",skipna=True)
cosmo_rea6_summer_std = cosmo_rea6_summer.groupby("time.year").std(dim="time",skipna=True)
#cosmo_rea6_winter_mean.sel(year=1995).plot()