10 minutes to… OGGM-shop as a data provider for your workflows#

In this notebook we illustrate how the OGGM shop can act as a data provider for glacier modelling or machine learning workflows.

We use preprocessed glacier directories that include most datasets available in the OGGM shop. These datasets go beyond what is required for the standard OGGM workflow and include additional information such as velocity, thickness estimates, and other glacier attributes that can be useful for data-driven approaches (e.g. machine learning).

Tags: beginner, shop, workflow

Preprocessed directories with additional products#

We are going to use the South Glacier example taken from the ITMIX experiment. It is a small (5.6 km2) glacier in Alaska.

## Libs
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import xarray as xr
import salem

# OGGM
import oggm.cfg as cfg
from oggm import utils, workflow, tasks, graphics
# Initialize OGGM and set up the default run parameters
cfg.initialize(logging_level='WARNING')
cfg.PARAMS['use_multiprocessing'] = False
# Local working directory (where OGGM will write its output)
cfg.PATHS['working_dir'] = utils.gettempdir('OGGM_Toy_Thickness_Model')
# We use the preprocessed directories with additional data: "W5E5_w_data"
# Note: the older 2023.3 preprocessed gdirs had 0-values instead of NaN for the
# Millan (2022) data; this was corrected in the 2025.6 gdirs.
base_url = 'https://cluster.klima.uni-bremen.de/~oggm/gdirs/oggm_v1.6/L3-L5_files/2025.6/elev_bands_w_data/W5E5/per_glacier/'
gdirs = workflow.init_glacier_directories(['RGI60-01.16195'], from_prepro_level=3, prepro_base_url=base_url, prepro_border=10)
2026-03-09 23:08:42: oggm.cfg: Reading default parameters from the OGGM `params.cfg` configuration file.
2026-03-09 23:08:42: oggm.cfg: Multiprocessing switched OFF according to the parameter file.
2026-03-09 23:08:42: oggm.cfg: Multiprocessing: using all available processors (N=4)
2026-03-09 23:08:42: oggm.workflow: init_glacier_directories from prepro level 3 on 1 glaciers.
2026-03-09 23:08:42: oggm.workflow: Execute entity tasks [gdir_from_prepro] on 1 glaciers
# Pick our glacier
gdir = gdirs[0]
gdir
<oggm.GlacierDirectory>
  RGI id: RGI60-01.16195
  Region: 01: Alaska
  Subregion: 01-05: St Elias Mtns                   
  Glacier type: Glacier
  Terminus type: Land-terminating
  Status: Glacier or ice cap
  Area: 5.655 km2
  Lon, Lat: (-139.131, 60.822)
  Grid (nx, ny): (105, 120)
  Grid (dx, dy): (43.0, -43.0)

OGGM-Shop datasets#

We are using here the preprocessed glacier directories (_w_data) that contain more data than the default ones:

with xr.open_dataset(gdir.get_filepath('gridded_data')) as ds:
    ds = ds.load()
# List all variables
ds
<xarray.Dataset> Size: 593kB
Dimensions:                  (y: 120, x: 105)
Coordinates:
  * y                        (y) float32 480B 6.747e+06 6.747e+06 ... 6.742e+06
  * x                        (x) float32 420B 5.992e+05 5.993e+05 ... 6.037e+05
Data variables: (12/14)
    topo                     (y, x) float32 50kB 2.501e+03 ... 2.144e+03
    topo_smoothed            (y, x) float32 50kB 2.507e+03 ... 2.166e+03
    topo_valid_mask          (y, x) int8 13kB 1 1 1 1 1 1 1 1 ... 1 1 1 1 1 1 1
    glacier_mask             (y, x) int8 13kB 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0
    glacier_ext              (y, x) int8 13kB 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0
    consensus_ice_thickness  (y, x) float32 50kB nan nan nan nan ... nan nan nan
    ...                       ...
    itslive_vy               (y, x) float32 50kB 26.01 22.5 ... 0.2009 0.1959
    millan_ice_thickness     (y, x) float32 50kB 181.9 184.7 186.1 ... 0.0 0.0
    millan_v                 (y, x) float32 50kB 42.7 41.17 39.01 ... nan nan
    millan_vx                (y, x) float32 50kB -27.83 -27.46 ... nan nan
    millan_vy                (y, x) float32 50kB 32.39 30.67 26.88 ... nan nan
    hugonnet_dhdt            (y, x) float32 50kB -0.3079 -0.2949 ... -0.003762
Attributes:
    author:         OGGM
    author_info:    Open Global Glacier Model
    pyproj_srs:     +proj=utm +zone=7 +datum=WGS84 +units=m +no_defs
    max_h_dem:      3165.046
    min_h_dem:      1816.4698
    max_h_glacier:  3165.046
    min_h_glacier:  1971.5496

That’s already quite a lot! We have access to a bunch of data for this glacier, so let’s have a look. We prepare the map first:

smap = ds.salem.get_map(countries=False)
smap.set_shapefile(gdir.read_shapefile('outlines'))
smap.set_topography(ds.topo.data);

ITS_LIVE velocity data#

Let’s start with the ITS_LIVE velocity data. OGGM provides data from ITS_LIVE v2, which is a composite average over the 2014-2022 period:

# get the velocity data
u = ds.itslive_vx.where(ds.glacier_mask)
v = ds.itslive_vy.where(ds.glacier_mask)
ws = ds.itslive_v.where(ds.glacier_mask)

The .where(ds.glacier_mask) command masks out (sets to NaN) the data outside of the glacier outline.
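As a minimal illustration of what this masking does (a toy numpy array, not the glacier data), `np.where` is the plain-numpy equivalent of xarray's `.where`:

```python
import numpy as np

# Toy 2x3 "grid" and a boolean "glacier mask"
arr = np.arange(6.0).reshape(2, 3)
mask = np.array([[True, False, True],
                 [False, True, True]])

# Keep values where the mask is True, set the rest to NaN
masked = np.where(mask, arr, np.nan)
print(masked)
```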

# get the axes ready
f, ax = plt.subplots(figsize=(9, 9))

# Quiver only every N grid point
us = u[1::3, 1::3]
vs = v[1::3, 1::3]

smap.set_data(ws)
smap.set_cmap('Blues')
smap.plot(ax=ax)
smap.append_colorbar(ax=ax, label='ice velocity (m yr$^{-1}$)')

# transform their coordinates to the map reference system and plot the arrows
xx, yy = smap.grid.transform(us.x.values, us.y.values, crs=gdir.grid.proj)
xx, yy = np.meshgrid(xx, yy)
qu = ax.quiver(xx, yy, us.values, vs.values)
qk = ax.quiverkey(qu, 0.82, 0.97, 10, '10 m yr$^{-1}$',
                  labelpos='E', coordinates='axes')
ax.set_title('ITS-LIVE velocity');

Millan 2022 velocity data#

Millan et al. (2022) published global velocities as well, which are more of a snapshot of 2016-2017:

# get the velocity data
u = ds.millan_vx.where(ds.glacier_mask)
v = ds.millan_vy.where(ds.glacier_mask)
ws = ds.millan_v.where(ds.glacier_mask)
# get the axes ready
f, ax = plt.subplots(figsize=(9, 9))

# Quiver only every N grid point
us = u[1::3, 1::3]
vs = v[1::3, 1::3]

smap.set_data(ws)
smap.set_cmap('Blues')
smap.plot(ax=ax)
smap.append_colorbar(ax=ax, label='ice velocity (m yr$^{-1}$)')

# transform their coordinates to the map reference system and plot the arrows
xx, yy = smap.grid.transform(us.x.values, us.y.values, crs=gdir.grid.proj)
xx, yy = np.meshgrid(xx, yy)
qu = ax.quiver(xx, yy, us.values, vs.values)
qk = ax.quiverkey(qu, 0.82, 0.97, 10, '10 m yr$^{-1}$',
                  labelpos='E', coordinates='axes')
ax.set_title('Millan 2022 velocity');

Thickness data from Farinotti 2019 and Millan 2022#

We also provide ice thickness data from Farinotti et al. (2019) and Millan et al. (2022) in the same gridded format.

# get the axes ready
f, ax = plt.subplots(figsize=(9, 9))
smap.set_cmap('viridis')
smap.set_data(ds.consensus_ice_thickness)
smap.plot(ax=ax)
smap.append_colorbar(ax=ax, label='ice thickness (m)')
ax.set_title('Farinotti 2019 thickness');
# get the axes ready
f, ax = plt.subplots(figsize=(9, 9))
smap.set_cmap('viridis')
smap.set_data(ds.millan_ice_thickness.where(ds.glacier_mask))
smap.plot(ax=ax)
smap.append_colorbar(ax=ax, label='ice thickness (m)')
ax.set_title('Millan 2022 thickness');

Hugonnet dh/dt maps#

Hugonnet et al. (2021) provide maps of glacier surface elevation change (2000-2020). We also reproject them to the glacier directory grid and provide them:
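As a side note, a mean dh/dt can be turned into a rough specific mass balance estimate by multiplying with a density conversion factor (850 kg m⁻³ is the value used by Hugonnet et al., 2021). A minimal numpy sketch with toy values, not the glacier data:

```python
import numpy as np

dhdt = np.array([-0.5, -0.3, 0.1, -0.8])  # toy dh/dt values (m yr-1)
rho = 850.0                               # density conversion factor (kg m-3)

# Specific mass balance in kg m-2 yr-1 (numerically equal to mm w.e. yr-1)
smb = dhdt.mean() * rho
print(f'{smb:.1f} kg m-2 yr-1')
```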

# get the axes ready
f, ax = plt.subplots(figsize=(9, 9))

smap.set_data(ds.hugonnet_dhdt.where(ds.glacier_mask))
smap.set_cmap('RdBu')
smap.set_plot_params(vmin=-3, vmax=3)
smap.plot(ax=ax)
smap.append_colorbar(ax=ax, label='dh/dt')

ax.set_title('Hugonnet dh/dt');

GlaThiDa ice thickness#

With v1.6.3, OGGM can fetch GlaThiDa point thickness data for glaciers which have it:

df_gtd = pd.read_csv(gdir.get_filepath('glathida_data'))
df_gtd
survey_id date elevation_date latitude longitude elevation thickness thickness_uncertainty flag rgi_id x_proj y_proj i_grid j_grid ij_grid
0 161 2011-01-01 2007-01-01 60.825257 -139.155974 NaN 114 13.0 NaN RGI60-01.16195 600273.997908 6.744733e+06 25 52 0025_0052
1 161 2011-01-01 2007-01-01 60.825336 -139.155234 NaN 124 6.0 NaN RGI60-01.16195 600314.002333 6.744743e+06 26 51 0026_0051
2 161 2011-01-01 2007-01-01 60.825184 -139.155243 NaN 124 6.0 NaN RGI60-01.16195 600314.001755 6.744726e+06 26 52 0026_0052
3 161 2011-01-01 2007-01-01 60.825031 -139.155215 NaN 124 6.0 NaN RGI60-01.16195 600315.997741 6.744709e+06 26 52 0026_0052
4 161 2011-01-01 2007-01-01 60.824931 -139.155129 NaN 119 6.0 NaN RGI60-01.16195 600320.997450 6.744698e+06 26 52 0026_0052
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
3401 161 2011-01-01 2007-01-01 60.803228 -139.122337 NaN 20 3.0 NaN RGI60-01.16195 602172.999100 6.742332e+06 69 107 0069_0107
3402 161 2011-01-01 2007-01-01 60.803199 -139.122210 NaN 20 3.0 NaN RGI60-01.16195 602179.999968 6.742329e+06 69 108 0069_0108
3403 161 2011-01-01 2007-01-01 60.803081 -139.122125 NaN 19 3.0 NaN RGI60-01.16195 602185.000440 6.742316e+06 69 108 0069_0108
3404 161 2011-01-01 2007-01-01 60.802984 -139.122204 NaN 18 3.0 NaN RGI60-01.16195 602180.997158 6.742305e+06 69 108 0069_0108
3405 161 2011-01-01 2007-01-01 60.803052 -139.123891 NaN 22 3.0 NaN RGI60-01.16195 602088.997810 6.742310e+06 67 108 0067_0108

3406 rows × 15 columns
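The `i_grid`/`j_grid` columns locate each point on the glacier grid. Conceptually (the actual computation is done by OGGM/salem), this is a rounding of the projected coordinates against the grid origin and spacing. A sketch with illustrative numbers close to the table above (the origin `x0` is a toy value):

```python
import numpy as np

x0, dx = 599_200.0, 43.0                   # toy grid origin and spacing (m)
x_proj = np.array([600_274.0, 600_314.0])  # projected point coordinates (m)

# Nearest grid column index for each point
i_grid = np.round((x_proj - x0) / dx).astype(int)
print(i_grid)
```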

ax = df_gtd.plot.scatter(
    x='x_proj',
    y='y_proj',
    c='thickness',
    cmap='viridis',
    colorbar=True,
)
ax.set_aspect('equal')

Some additional gridded attributes#

Let’s also add some attributes that OGGM can compute for us:

# Tested tasks
task_list = [
    tasks.gridded_attributes,
    tasks.gridded_mb_attributes,
]
for task in task_list:
    workflow.execute_entity_task(task, gdirs)
2026-03-09 23:08:45: oggm.workflow: Execute entity tasks [gridded_attributes] on 1 glaciers
2026-03-09 23:08:45: oggm.workflow: Execute entity tasks [gridded_mb_attributes] on 1 glaciers

Let’s open the gridded data file again with xarray:

with xr.open_dataset(gdir.get_filepath('gridded_data')) as ds:
    ds = ds.load()
# List all variables
ds
<xarray.Dataset> Size: 971kB
Dimensions:                  (y: 120, x: 105)
Coordinates:
  * y                        (y) float32 480B 6.747e+06 6.747e+06 ... 6.742e+06
  * x                        (x) float32 420B 5.992e+05 5.993e+05 ... 6.037e+05
Data variables: (12/23)
    topo                     (y, x) float32 50kB 2.501e+03 ... 2.144e+03
    topo_smoothed            (y, x) float32 50kB 2.507e+03 ... 2.166e+03
    topo_valid_mask          (y, x) int8 13kB 1 1 1 1 1 1 1 1 ... 1 1 1 1 1 1 1
    glacier_mask             (y, x) int8 13kB 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0
    glacier_ext              (y, x) int8 13kB 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0
    consensus_ice_thickness  (y, x) float32 50kB nan nan nan nan ... nan nan nan
    ...                       ...
    aspect                   (y, x) float32 50kB 5.822 5.722 ... 1.647 1.738
    slope_factor             (y, x) float32 50kB 3.872 3.872 ... 2.051 2.514
    dis_from_border          (y, x) float32 50kB 1.686e+03 ... 1.55e+03
    catchment_area           (y, x) float32 50kB nan nan nan nan ... nan nan nan
    lin_mb_above_z           (y, x) float32 50kB nan nan nan nan ... nan nan nan
    oggm_mb_above_z          (y, x) float32 50kB nan nan nan nan ... nan nan nan
Attributes:
    author:         OGGM
    author_info:    Open Global Glacier Model
    pyproj_srs:     +proj=utm +zone=7 +datum=WGS84 +units=m +no_defs
    max_h_dem:      3165.046
    min_h_dem:      1816.4698
    max_h_glacier:  3165.046
    min_h_glacier:  1971.5496

The file contains several new variables with their description. For example:

ds.oggm_mb_above_z
<xarray.DataArray 'oggm_mb_above_z' (y: 120, x: 105)> Size: 50kB
array([[nan, nan, nan, ..., nan, nan, nan],
       [nan, nan, nan, ..., nan, nan, nan],
       [nan, nan, nan, ..., nan, nan, nan],
       ...,
       [nan, nan, nan, ..., nan, nan, nan],
       [nan, nan, nan, ..., nan, nan, nan],
       [nan, nan, nan, ..., nan, nan, nan]],
      shape=(120, 105), dtype=float32)
Coordinates:
  * y        (y) float32 480B 6.747e+06 6.747e+06 ... 6.742e+06 6.742e+06
  * x        (x) float32 420B 5.992e+05 5.993e+05 ... 6.036e+05 6.037e+05
Attributes:
    units:        kg/year
    long_name:    MB above point from OGGM MB model, without catchments
    description:  Mass balance cumulated above the altitude of thepoint, henc...

Let’s plot a few of them (we show how to plot them with xarray and with oggm):

ds.slope.plot()
plt.axis('equal');
f, (ax1, ax2) = plt.subplots(1, 2, figsize=(14, 6))
graphics.plot_raster(gdir, var_name='aspect', cmap='twilight', ax=ax1)
graphics.plot_raster(gdir, var_name='oggm_mb_above_z', ax=ax2)
In a few lines of code, we have used OGGM to generate or deliver a bunch of data for this glacier. A similar workflow can be used on ALL of them! With this, we hope to facilitate access to data for many other models or machine learning workflows.

Not convinced yet? Let’s spend 10 more minutes to apply a (very simple) machine learning workflow#

Retrieve these attributes at point locations#

For this glacier, we have ice thickness observations from GlaThiDa. Let’s make a table out of it:

df = df_gtd[['thickness', 'x_proj', 'y_proj', 'i_grid', 'j_grid', 'ij_grid']].copy()
# check that there are no NaN values in the data (otherwise, we would remove them)
assert not df.isna().any().any()

Plot the data again:

geom = gdir.read_shapefile('outlines')
f, ax = plt.subplots()
df.plot.scatter(x='x_proj', y='y_proj', c='thickness', cmap='viridis', s=10, ax=ax)
geom.plot(ax=ax, facecolor='none', edgecolor='k');

Method 1: interpolation#

The measurement points of this dataset are dense and close to each other. There are plenty of them:

len(df)
3406

Here, we will keep them all and interpolate the variables of interest at the point locations. We use xarray for this:

vns = ['topo',
       'slope',
       'slope_factor',
       'aspect',
       'dis_from_border',
       'catchment_area',
       'lin_mb_above_z',
       'oggm_mb_above_z',
       'millan_v',
       'itslive_v',
       'hugonnet_dhdt',
       ]
# Interpolate (bilinear)
for vn in vns:
    df[vn] = ds[vn].interp(x=('z', df.x_proj), y=('z', df.y_proj))
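In one dimension, this interpolation step is just `np.interp`; xarray's `.interp` performs the bilinear 2-D equivalent on the glacier grid. A toy sketch with made-up values:

```python
import numpy as np

grid_x = np.array([0.0, 43.0, 86.0])      # grid coordinates (m)
grid_v = np.array([100.0, 150.0, 120.0])  # a variable defined on the grid

pt_x = np.array([21.5, 64.5])             # measurement locations
pt_v = np.interp(pt_x, grid_x, grid_v)    # linear interpolation at the points
print(pt_v)
```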

We remove the rows with NaN values (mostly from the Millan velocities) to have a fair comparison:

df = df.dropna()

Let’s see how these variables can explain the measured ice thickness:

f, (ax1, ax2, ax3, ax4) = plt.subplots(1, 4, figsize=(16, 4))
df.plot.scatter(x='dis_from_border', y='thickness', ax=ax1); ax1.set_title('dis_from_border')
df.plot.scatter(x='slope', y='thickness', ax=ax2); ax2.set_title('slope')
df.plot.scatter(x='oggm_mb_above_z', y='thickness', ax=ax3); ax3.set_title('oggm_mb_above_z');
df.plot.scatter(x='hugonnet_dhdt', y='thickness', ax=ax4); ax4.set_title('hugonnet_dhdt');

There is a negative correlation with slope (as expected), a positive correlation with the mass-flux (oggm_mb_above_z), and a slight positive correlation with the distance from the glacier boundaries. There is also some correlation with ice velocity, but not a strong one:

f, (ax1, ax2) = plt.subplots(1, 2, figsize=(12, 5))
df.plot.scatter(x='millan_v', y='thickness', ax=ax1); ax1.set_title('millan_v')
df.plot.scatter(x='itslive_v', y='thickness', ax=ax2); ax2.set_title('itslive_v');
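The visual impression from these scatter plots can be quantified with pandas. A toy sketch (synthetic data, not the glacier measurements) of what `df.corr()` computes:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
slope = rng.uniform(0.05, 0.6, 200)                     # synthetic slopes
thickness = 200 - 250 * slope + rng.normal(0, 10, 200)  # thicker where flatter

toy = pd.DataFrame({'slope': slope, 'thickness': thickness})
r = toy.corr().loc['slope', 'thickness']  # Pearson correlation coefficient
print(round(r, 2))
```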

Method 2: aggregated per grid point#

There are so many points that much of the information obtained by OGGM is interpolated and therefore brings little new information to a statistical model. A way to deal with this is to aggregate all the measurement points per grid point and average them. Let’s do this:

df_agg = df.copy()
df_agg = df_agg.groupby('ij_grid').mean()
# groupby().mean() does not preserve ints: convert the grid indices back
df_agg['i'] = df_agg['i_grid'].astype(int)
df_agg['j'] = df_agg['j_grid'].astype(int)
len(df_agg)
977
# Select
for vn in vns:
    df_agg[vn] = ds[vn].isel(x=('z', df_agg['i']), y=('z', df_agg['j']))
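The aggregation step itself is a plain pandas `groupby`. A toy sketch with three measurements falling in two grid cells:

```python
import pandas as pd

pts = pd.DataFrame({
    'ij_grid': ['0025_0052', '0025_0052', '0026_0051'],  # grid-cell key
    'thickness': [114.0, 120.0, 124.0],
})

# One averaged value per grid cell
agg = pts.groupby('ij_grid').mean()
print(agg)
```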

We now have 9 times fewer points, but the main features of the data remain unchanged:

f, (ax1, ax2, ax3, ax4) = plt.subplots(1, 4, figsize=(16, 4))
df_agg.plot.scatter(x='dis_from_border', y='thickness', ax=ax1); ax1.set_title('dis_from_border')
df_agg.plot.scatter(x='slope', y='thickness', ax=ax2); ax2.set_title('slope')
df_agg.plot.scatter(x='oggm_mb_above_z', y='thickness', ax=ax3); ax3.set_title('oggm_mb_above_z');
df_agg.plot.scatter(x='hugonnet_dhdt', y='thickness', ax=ax4); ax3.set_title('hugonnet_dhdt');

Build a statistical model#

Let’s use scikit-learn to build a very simple linear model of ice-thickness. First, we have to acknowledge that there is a correlation between many of the explanatory variables we will use:

import seaborn as sns
plt.figure(figsize=(10, 8))
sns.heatmap(df.corr(), cmap='RdBu');

This is a problem for linear regression models, which cannot deal well with correlated explanatory variables. We have to do a so-called “feature selection”, i.e. keep only the variables that are independently important for explaining the outcome.
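To see the problem concretely, here is a small numpy sketch (synthetic data): two almost identical predictors make the least-squares problem ill-conditioned, so while their sum is well determined, the individual coefficients are unstable:

```python
import numpy as np

rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
x2 = x1 + 1e-3 * rng.normal(size=200)  # almost perfectly correlated with x1
y = 2.0 * x1 + 0.1 * rng.normal(size=200)

X = np.column_stack([x1, x2])
coefs, *_ = np.linalg.lstsq(X, y, rcond=None)

# The sum of the coefficients is well determined (~2); the split between them is not
print(coefs, coefs.sum())
print(np.linalg.cond(X))  # large condition number = ill-conditioned problem
```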

For the sake of simplicity, let’s use the Lasso method to help us out:

import warnings
warnings.filterwarnings("ignore")  # sklearn sends a lot of warnings
# Scikit learn (can be installed e.g. via conda install -c anaconda scikit-learn)
from sklearn.linear_model import LassoCV
from sklearn.model_selection import KFold
# Prepare our data
df = df.dropna()
# Variable to model
target = df['thickness']
# Predictors: drop the target and the coordinate/index columns
# Normalize them in order to be able to compare the coefficients
data = df.drop(['thickness', 'x_proj', 'y_proj', 'i_grid', 'j_grid', 'ij_grid'], axis=1).copy()
data_mean = data.mean()
data_std = data.std()
data = (data - data_mean) / data_std
# Use cross-validation to select the regularization parameter
lasso_cv = LassoCV(cv=5, random_state=0)
lasso_cv.fit(data.values, target.values)
LassoCV(cv=5, random_state=0)

We now have a statistical model trained on the full dataset. Let’s see how it compares to the calibration data:

odf = df.copy()
odf['thick_predicted'] = lasso_cv.predict(data.values)
f, ax = plt.subplots(figsize=(6, 6))
odf.plot.scatter(x='thickness', y='thick_predicted', ax=ax)
plt.xlim([-25, 220])
plt.ylim([-25, 220]);

Not too bad. It is doing OK for low thicknesses, but high thicknesses are strongly underestimated. Which explanatory variables did the model choose as being the most important?

predictors = pd.Series(lasso_cv.coef_, index=data.columns)
predictors
topo              -26.349444
slope              -5.317532
slope_factor        8.545106
aspect             -8.275107
dis_from_border     6.134154
catchment_area    -37.155245
lin_mb_above_z     18.518913
oggm_mb_above_z     0.000000
millan_v           -2.277471
itslive_v          -0.982519
hugonnet_dhdt      -6.787095
dtype: float64

This is interesting. If we look at the correlation of the error with the variables of importance, we see that the error correlates strongly with the spatial variables:

odf['error'] = np.abs(odf['thick_predicted'] - odf['thickness'])
odf.corr()['error'].loc[predictors.loc[predictors != 0].index]
topo               0.337517
slope             -0.187259
slope_factor       0.223775
aspect            -0.251635
dis_from_border    0.039196
catchment_area    -0.357454
lin_mb_above_z     0.248110
millan_v           0.101737
itslive_v          0.119478
hugonnet_dhdt      0.109385
Name: error, dtype: float64

Furthermore, the model is not very robust. Let’s use cross-validation to check our model parameters:

k_fold = KFold(4, random_state=0, shuffle=True)
for k, (train, test) in enumerate(k_fold.split(data.values, target.values)):
    lasso_cv.fit(data.values[train], target.values[train])
    print("[fold {0}] alpha: {1:.5f}, score (r^2): {2:.5f}".
          format(k, lasso_cv.alpha_, lasso_cv.score(data.values[test], target.values[test])))
[fold 0] alpha: 0.02783, score (r^2): 0.84413
[fold 1] alpha: 0.02795, score (r^2): 0.83290
[fold 2] alpha: 0.02859, score (r^2): 0.82298
[fold 3] alpha: 0.02785, score (r^2): 0.83720

The fact that the hyperparameter alpha and the score change that much between folds is a sign that the model isn’t very robust.

Our model is not great, but… let’s apply it#

In order to apply the model to our entire glacier, we have to get the explanatory variables from the gridded dataset again:

# Add variables we are missing
lon, lat = gdir.grid.ll_coordinates
ds['lon'] = (('y', 'x'), lon)
ds['lat'] = (('y', 'x'), lat)

# Generate our dataset
pred_data = pd.DataFrame()
for vn in data.columns:
    # only take glacier grid points
    # where the Millan velocity is not NaN
    pred_data[vn] = ds[vn].data[(ds.glacier_mask == 1) & (~np.isnan(ds.millan_v))]

# Normalize using the same normalization constants
pred_data = (pred_data - data_mean) / data_std

# Apply the model
pred_data['thick'] = lasso_cv.predict(pred_data.values)

# For the sake of physics, clip negative thickness values...
pred_data['thick'] = np.clip(pred_data['thick'], 0, None)
# Back to 2d and in xarray
var = ds[vn].data * np.nan
var[(ds.glacier_mask == 1) & (~np.isnan(ds.millan_v))] = pred_data['thick']
ds['linear_model_thick'] = (('y', 'x'), var)
ds['linear_model_thick'].attrs['description'] = 'Predicted thickness'
ds['linear_model_thick'].attrs['units'] = 'm'
ds['linear_model_thick'].plot();
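The “back to 2d” pattern above (writing a 1-D vector of predictions into a 2-D field through a boolean mask) in a minimal toy form:

```python
import numpy as np

field = np.full((3, 3), np.nan)    # empty 2-D field
mask = np.zeros((3, 3), dtype=bool)
mask[1, :] = True                  # pretend the middle row is the glacier

# Boolean assignment fills the masked positions in row-major order
field[mask] = np.array([10.0, 20.0, 30.0])
print(field)
```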

Take home points#

  • OGGM provides preprocessed data from different sources on the same grid

  • we have shown how to compute gridded attributes from OGGM glaciers such as slope, aspect, or catchments

  • we used two methods to extract these data at point locations: with interpolation or with aggregated averages on each grid point

  • as an application example, we trained a linear regression model to predict the ice thickness of this glacier at unseen locations

The model we developed was quite bad, and we used quite lousy statistics. If I had more time to make it better, I would:

  • make a pre-selection of meaningful predictors to avoid discontinuities

  • use a non-linear model

  • use cross-validation to better assess the true skill of the model

What’s next?#

Bonus: how does the statistical model compare to OGGM “out-of-the-box” and other thickness products?#

# Write our thickness estimates back to disk
ds.to_netcdf(gdir.get_filepath('gridded_data'))
# Distribute OGGM thickness using default values only
workflow.execute_entity_task(tasks.distribute_thickness_per_altitude, gdirs);
2026-03-09 23:08:49: oggm.workflow: Execute entity tasks [distribute_thickness_per_altitude] on 1 glaciers
with xr.open_dataset(gdir.get_filepath('gridded_data')) as ds:
    ds = ds.load()
df_agg['oggm_thick'] = ds.distributed_thickness.isel(x=('z', df_agg['i']), y=('z', df_agg['j']))
df_agg['linear_model_thick'] = ds.linear_model_thick.isel(x=('z', df_agg['i']), y=('z', df_agg['j']))
df_agg['millan_thick'] = ds.millan_ice_thickness.isel(x=('z', df_agg['i']), y=('z', df_agg['j']))
df_agg['consensus_thick'] = ds.consensus_ice_thickness.isel(x=('z', df_agg['i']), y=('z', df_agg['j']))
f, ((ax1, ax2), (ax3, ax4)) = plt.subplots(
    2, 2, figsize=(12, 10), constrained_layout=True
)
vmin = 0
vmax = ds[['linear_model_thick', 'distributed_thickness',
           'millan_ice_thickness', 'consensus_ice_thickness']].to_dataframe().max().max() * 1.05


im1 = ds['linear_model_thick'].plot(ax=ax1, vmin=vmin, vmax=vmax, add_colorbar=False)
ds['distributed_thickness'].plot(ax=ax2, vmin=vmin, vmax=vmax, add_colorbar=False)
ds['millan_ice_thickness'].where(ds.glacier_mask).plot(ax=ax3, vmin=vmin, vmax=vmax, add_colorbar=False)
ds['consensus_ice_thickness'].plot(ax=ax4, vmin=vmin, vmax=vmax, add_colorbar=False)

ax1.set_title('Statistical model')
ax2.set_title('OGGM')
ax3.set_title('Millan 2022')
ax4.set_title('Farinotti 2019')
cbar = f.colorbar(im1, ax=[ax1, ax2, ax3, ax4], shrink=0.8, location='right', pad=0.02)
cbar.set_label("Ice thickness (m)");
# check that there are no NaN values
assert not df_agg.isna().any().any()
f, ((ax1, ax2), (ax3, ax4)) = plt.subplots(2, 2, figsize=(12, 10))
df_agg.plot.scatter(x='thickness', y='linear_model_thick', ax=ax1)
ax1.set_xlim([-25, 220]); ax1.set_ylim([-25, 220]); ax1.set_title('Statistical model')
df_agg.plot.scatter(x='thickness', y='oggm_thick', ax=ax2)
ax2.set_xlim([-25, 220]); ax2.set_ylim([-25, 220]); ax2.set_title('OGGM')
df_agg.plot.scatter(x='thickness', y='millan_thick', ax=ax3)
ax3.set_xlim([-25, 220]); ax3.set_ylim([-25, 220]); ax3.set_title('Millan 2022')
df_agg.plot.scatter(x='thickness', y='consensus_thick', ax=ax4)
ax4.set_xlim([-25, 220]); ax4.set_ylim([-25, 220]); ax4.set_title('Farinotti 2019');