Ingest gridded products such as ice velocity into OGGM

Ingest gridded products such as ice velocity into OGGM#

After running our OGGM experiments we often want to compare the model output to other gridded observations or maybe we want to use additional data sets that are not currently in the OGGM shop to calibrate parameters in the model (e.g. Glen A creep parameter, sliding parameter or the calving constant of proportionality). If you are looking on ways or ideas on how to do this, you are in the right tutorial!

In OGGM, a local map projection is defined for each glacier entity in the RGI inventory following the methods described in Maussion and others (2019). The model uses a Transverse Mercator projection centred on the glacier. A lot of data sets, especially those from Polar regions can have a different projections and if we are not careful, we would be making mistakes when we compare them with our model output or when we use such data sets to constrain our model experiments.

New in OGGM 1.6! We now offer preprocess directories where the data is available already. Visit OGGM as an accelerator for modelling and machine learning for more info.

First lets import the modules we need:

import xarray as xr
import numpy as np
import matplotlib.pyplot as plt
import salem

from oggm import cfg, utils, workflow, tasks, graphics
cfg.initialize(logging_level='WARNING')

2025-04-16 10:40:22: oggm.cfg: Reading default parameters from the OGGM `params.cfg` configuration file.
2025-04-16 10:40:22: oggm.cfg: Multiprocessing switched OFF according to the parameter file.
2025-04-16 10:40:22: oggm.cfg: Multiprocessing: using all available processors (N=4)

cfg.PATHS['working_dir'] = utils.gettempdir(dirname='OGGM-shop-on-Flowlines', reset=True)

Lets define the glaciers for the run#

rgi_ids = ['RGI60-14.06794']  # Baltoro

# The RGI version to use
# Size of the map around the glacier.
prepro_border = 80
# Degree of processing level. This is OGGM specific and for the shop 1 is the one you want
from_prepro_level = 3
# URL of the preprocessed gdirs
# we use elevation bands flowlines here
base_url = 'https://cluster.klima.uni-bremen.de/~oggm/gdirs/oggm_v1.6/L3-L5_files/2023.3/elev_bands/W5E5'
gdirs = workflow.init_glacier_directories(rgi_ids,
                                          from_prepro_level=from_prepro_level,
                                          prepro_base_url=base_url,
                                          prepro_border=prepro_border)

2025-04-16 10:40:23: oggm.workflow: init_glacier_directories from prepro level 3 on 1 glaciers.
2025-04-16 10:40:23: oggm.workflow: Execute entity tasks [gdir_from_prepro] on 1 glaciers

gdir = gdirs[0]

graphics.plot_googlemap(gdir, figsize=(8, 7))

../../_images/3fd00dcae57b66a3d26b139a6290f7d999d5a65c582d7ec9035d93c1b03cc5cd.png

Add data from OGGM-Shop: bed topography data#

Additionally to the data produced by the model, the OGGM-Shop counts with routines that will automatically download and reproject other useful data sets into the glacier projection (For more information also check out this notebook). This data will be stored under the file described above.

OGGM can now download data from the Farinotti et al., (2019) consensus estimate and reproject it to the glacier directories map:

from oggm.shop import bedtopo

workflow.execute_entity_task(bedtopo.add_consensus_thickness, gdirs);

2025-04-16 10:40:33: oggm.workflow: Execute entity tasks [add_consensus_thickness] on 1 glaciers

with xr.open_dataset(gdir.get_filepath('gridded_data')) as ds:
    ds = ds.load()

the cell below might take a while… be patient

# plot the salem map background, make countries in grey
smap = ds.salem.get_map(countries=False)
smap.set_shapefile(gdir.read_shapefile('outlines'))
smap.set_topography(ds.topo.data);

f, ax = plt.subplots(figsize=(9, 9))
smap.set_data(ds.consensus_ice_thickness)
smap.set_cmap('Blues')
smap.plot(ax=ax)
smap.append_colorbar(ax=ax, label='ice thickness (m)');

../../_images/7e72f262e98c1e584cb4d4862f7c31b04773a66c7257b8793e30445f9a131c8c.png

Bin the data in `gridded_data` into OGGM elevation bands#

Now that we have added new data to the gridded_data file, we can bin the data sets into the same elevation bands as OGGM by recomputing the elevation bands flowlines:

tasks.elevation_band_flowline(gdir,
                              bin_variables=['consensus_ice_thickness', 
                                             'millan_vx',
                                             'millan_vy'],
                              preserve_totals=[True, False, False]  # I"m actually not sure if preserving totals is meaningful with velocities - likely not
                              # NOTE: we could bin variables according to max() as well!
                              )

2025-04-16 10:40:50: oggm.core.centerlines: RGI60-14.06794: var `millan_vx` not found in gridded_data.
2025-04-16 10:40:50: oggm.core.centerlines: RGI60-14.06794: var `millan_vy` not found in gridded_data.
2025-04-16 10:40:50: oggm.core.centerlines: ValueError occurred during task elevation_band_flowline on RGI60-14.06794: Cannot broadcast len 3 to desired length: 1.

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[29], line 1
----> 1 tasks.elevation_band_flowline(gdir,
      2                               bin_variables=['consensus_ice_thickness', 
      3                                              'millan_vx',
      4                                              'millan_vy'],
      5                               preserve_totals=[True, False, False]  # I"m actually not sure if preserving totals is meaningful with velocities - likely not
      6                               # NOTE: we could bin variables according to max() as well!
      7                               )

File /usr/local/pyenv/versions/3.11.11/lib/python3.11/site-packages/oggm/utils/_workflow.py:496, in entity_task.__call__.<locals>._entity_task(gdir, reset, print_log, return_value, continue_on_error, add_to_log_file, **kwargs)
    494     signal.alarm(cfg.PARAMS['task_timeout'])
    495 ex_t = time.time()
--> 496 out = task_func(gdir, **kwargs)
    497 ex_t = time.time() - ex_t
    498 if cfg.PARAMS['task_timeout'] > 0:

File /usr/local/pyenv/versions/3.11.11/lib/python3.11/site-packages/oggm/core/centerlines.py:2239, in elevation_band_flowline(gdir, bin_variables, preserve_totals)
   2236         out_totals.append(data_sum * gdir.grid.dx ** 2)
   2237         out_vars.append(data[glacier_mask])
-> 2239 preserve_totals = utils.tolist(preserve_totals, length=len(bin_variables))
   2241 # Slope
   2242 sy, sx = np.gradient(topo, gdir.grid.dx)

File /usr/local/pyenv/versions/3.11.11/lib/python3.11/site-packages/oggm/utils/_funcs.py:169, in tolist(arg, length)
    167         pass
    168     else:
--> 169         raise ValueError('Cannot broadcast len {} '.format(len(arg)) +
    170                          'to desired length: {}.'.format(length))
    172 return arg

ValueError: Cannot broadcast len 3 to desired length: 1.

This created a csv in the glacier directory folder with the data binned to it:

import pandas as pd
df = pd.read_csv(gdir.get_filepath('elevation_band_flowline'), index_col=0)
df

	area	mean_elevation	slope	bin_elevation	dx	width
dis_along_flowline
60.171679	40000.0	8345.080078	0.244307	8355.0	120.343358	332.382282
140.402840	80000.0	8235.237305	0.642076	8235.0	40.118964	1994.069438
173.939057	40000.0	8197.490234	0.838839	8205.0	26.953469	1484.039045
202.715848	40000.0	8150.462402	0.775496	8145.0	30.600113	1307.184709
236.362148	40000.0	8075.439453	0.685385	8085.0	36.692489	1090.141376
...	...	...	...	...	...	...
42309.175061	760000.0	3525.279541	0.104130	3525.0	287.058303	2647.545781
42584.145386	400000.0	3495.227295	0.113628	3495.0	262.882346	1521.593236
42815.229714	200000.0	3472.789551	0.149415	3465.0	199.286311	1003.581227
43010.617627	200000.0	3441.432861	0.155403	3435.0	191.489514	1044.443613
43225.272023	120000.0	3407.177490	0.125483	3405.0	237.819278	504.584831

157 rows × 6 columns

df['obs_velocity'] = (df['millan_vx']**2 + df['millan_vy']**2)**0.5
df['bed_h'] = df['mean_elevation'] - df['consensus_ice_thickness']

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
File /usr/local/pyenv/versions/3.11.11/lib/python3.11/site-packages/pandas/core/indexes/base.py:3805, in Index.get_loc(self, key)
   3804 try:
-> 3805     return self._engine.get_loc(casted_key)
   3806 except KeyError as err:

File index.pyx:167, in pandas._libs.index.IndexEngine.get_loc()

File index.pyx:196, in pandas._libs.index.IndexEngine.get_loc()

File pandas/_libs/hashtable_class_helper.pxi:7081, in pandas._libs.hashtable.PyObjectHashTable.get_item()

File pandas/_libs/hashtable_class_helper.pxi:7089, in pandas._libs.hashtable.PyObjectHashTable.get_item()

KeyError: 'millan_vx'

The above exception was the direct cause of the following exception:

KeyError                                  Traceback (most recent call last)
Cell In[31], line 1
----> 1 df['obs_velocity'] = (df['millan_vx']**2 + df['millan_vy']**2)**0.5
      2 df['bed_h'] = df['mean_elevation'] - df['consensus_ice_thickness']

File /usr/local/pyenv/versions/3.11.11/lib/python3.11/site-packages/pandas/core/frame.py:4102, in DataFrame.__getitem__(self, key)
   4100 if self.columns.nlevels > 1:
   4101     return self._getitem_multilevel(key)
-> 4102 indexer = self.columns.get_loc(key)
   4103 if is_integer(indexer):
   4104     indexer = [indexer]

File /usr/local/pyenv/versions/3.11.11/lib/python3.11/site-packages/pandas/core/indexes/base.py:3812, in Index.get_loc(self, key)
   3807     if isinstance(casted_key, slice) or (
   3808         isinstance(casted_key, abc.Iterable)
   3809         and any(isinstance(x, slice) for x in casted_key)
   3810     ):
   3811         raise InvalidIndexError(key)
-> 3812     raise KeyError(key) from err
   3813 except TypeError:
   3814     # If we have a listlike key, _check_indexing_error will raise
   3815     #  InvalidIndexError. Otherwise we fall through and re-raise
   3816     #  the TypeError.
   3817     self._check_indexing_error(key)

KeyError: 'millan_vx'

df[['mean_elevation', 'bed_h', 'obs_velocity']].plot(figsize=(10, 4), secondary_y='obs_velocity');

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
Cell In[32], line 1
----> 1 df[['mean_elevation', 'bed_h', 'obs_velocity']].plot(figsize=(10, 4), secondary_y='obs_velocity');

File /usr/local/pyenv/versions/3.11.11/lib/python3.11/site-packages/pandas/core/frame.py:4108, in DataFrame.__getitem__(self, key)
   4106     if is_iterator(key):
   4107         key = list(key)
-> 4108     indexer = self.columns._get_indexer_strict(key, "columns")[1]
   4110 # take() does not accept boolean indexers
   4111 if getattr(indexer, "dtype", None) == bool:

File /usr/local/pyenv/versions/3.11.11/lib/python3.11/site-packages/pandas/core/indexes/base.py:6200, in Index._get_indexer_strict(self, key, axis_name)
   6197 else:
   6198     keyarr, indexer, new_indexer = self._reindex_non_unique(keyarr)
-> 6200 self._raise_if_missing(keyarr, indexer, axis_name)
   6202 keyarr = self.take(indexer)
   6203 if isinstance(key, Index):
   6204     # GH 42790 - Preserve name from an Index

File /usr/local/pyenv/versions/3.11.11/lib/python3.11/site-packages/pandas/core/indexes/base.py:6252, in Index._raise_if_missing(self, key, indexer, axis_name)
   6249     raise KeyError(f"None of [{key}] are in the [{axis_name}]")
   6251 not_found = list(ensure_index(key)[missing_mask.nonzero()[0]].unique())
-> 6252 raise KeyError(f"{not_found} not in index")

KeyError: "['bed_h', 'obs_velocity'] not in index"

The problem with this file is that it does not have a regular spacing. The numerical model needs regular spacing, which is why OGGM does this:

# This takes the csv file and prepares new 'inversion_flowlines.pkl' and created a new csv file with regular spacing
tasks.fixed_dx_elevation_band_flowline(gdir,
                                       bin_variables=['consensus_ice_thickness',
                                                      'millan_vx', 
                                                      'millan_vy'],
                                       preserve_totals=[True, False, False]
                                      )

2025-04-16 10:40:51: oggm.core.centerlines: RGI60-14.06794: var `consensus_ice_thickness` not found in gridded_data.
2025-04-16 10:40:51: oggm.core.centerlines: RGI60-14.06794: var `millan_vx` not found in gridded_data.
2025-04-16 10:40:51: oggm.core.centerlines: RGI60-14.06794: var `millan_vy` not found in gridded_data.
2025-04-16 10:40:51: oggm.core.centerlines: ValueError occurred during task fixed_dx_elevation_band_flowline on RGI60-14.06794: Cannot broadcast len 3 to desired length: 0.

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[33], line 2
      1 # This takes the csv file and prepares new 'inversion_flowlines.pkl' and created a new csv file with regular spacing
----> 2 tasks.fixed_dx_elevation_band_flowline(gdir,
      3                                        bin_variables=['consensus_ice_thickness',
      4                                                       'millan_vx', 
      5                                                       'millan_vy'],
      6                                        preserve_totals=[True, False, False]
      7                                       )

File /usr/local/pyenv/versions/3.11.11/lib/python3.11/site-packages/oggm/utils/_workflow.py:496, in entity_task.__call__.<locals>._entity_task(gdir, reset, print_log, return_value, continue_on_error, add_to_log_file, **kwargs)
    494     signal.alarm(cfg.PARAMS['task_timeout'])
    495 ex_t = time.time()
--> 496 out = task_func(gdir, **kwargs)
    497 ex_t = time.time() - ex_t
    498 if cfg.PARAMS['task_timeout'] > 0:

File /usr/local/pyenv/versions/3.11.11/lib/python3.11/site-packages/oggm/core/centerlines.py:2418, in fixed_dx_elevation_band_flowline(gdir, bin_variables, preserve_totals)
   2414         log.warning('{}: var `{}` not found in gridded_data.'
   2415                     ''.format(gdir.rgi_id, var))
   2416 bin_variables = keep
-> 2418 preserve_totals = utils.tolist(preserve_totals,
   2419                                length=len(bin_variables))
   2420 odf = pd.DataFrame(index=dis_along_flowline)
   2421 odf.index.name = 'dis_along_flowline'

File /usr/local/pyenv/versions/3.11.11/lib/python3.11/site-packages/oggm/utils/_funcs.py:169, in tolist(arg, length)
    167         pass
    168     else:
--> 169         raise ValueError('Cannot broadcast len {} '.format(len(arg)) +
    170                          'to desired length: {}.'.format(length))
    172 return arg

ValueError: Cannot broadcast len 3 to desired length: 0.

df_regular = pd.read_csv(gdir.get_filepath('elevation_band_flowline', filesuffix='_fixed_dx'), index_col=0)
df_regular

---------------------------------------------------------------------------
FileNotFoundError                         Traceback (most recent call last)
Cell In[34], line 1
----> 1 df_regular = pd.read_csv(gdir.get_filepath('elevation_band_flowline', filesuffix='_fixed_dx'), index_col=0)
df_regular

File /usr/local/pyenv/versions/3.11.11/lib/python3.11/site-packages/pandas/io/parsers/readers.py:1026, in read_csv(filepath_or_buffer, sep, delimiter, header, names, index_col, usecols, dtype, engine, converters, true_values, false_values, skipinitialspace, skiprows, skipfooter, nrows, na_values, keep_default_na, na_filter, verbose, skip_blank_lines, parse_dates, infer_datetime_format, keep_date_col, date_parser, date_format, dayfirst, cache_dates, iterator, chunksize, compression, thousands, decimal, lineterminator, quotechar, quoting, doublequote, escapechar, comment, encoding, encoding_errors, dialect, on_bad_lines, delim_whitespace, low_memory, memory_map, float_precision, storage_options, dtype_backend)
kwds_defaults = _refine_defaults_read(
   dialect,
   delimiter,
   (...)   1022     dtype_backend=dtype_backend,
)
kwds.update(kwds_defaults)
-> 1026 return _read(filepath_or_buffer, kwds)

File /usr/local/pyenv/versions/3.11.11/lib/python3.11/site-packages/pandas/io/parsers/readers.py:620, in _read(filepath_or_buffer, kwds)
_validate_names(kwds.get("names", None))
# Create the parser.
--> 620 parser = TextFileReader(filepath_or_buffer, **kwds)
if chunksize or iterator:
   return parser

File /usr/local/pyenv/versions/3.11.11/lib/python3.11/site-packages/pandas/io/parsers/readers.py:1620, in TextFileReader.__init__(self, f, engine, **kwds)
   self.options["has_index_names"] = kwds["has_index_names"]
self.handles: IOHandles | None = None
-> 1620 self._engine = self._make_engine(f, self.engine)

File /usr/local/pyenv/versions/3.11.11/lib/python3.11/site-packages/pandas/io/parsers/readers.py:1880, in TextFileReader._make_engine(self, f, engine)
   if "b" not in mode:
       mode += "b"
-> 1880 self.handles = get_handle(
   f,
   mode,
   encoding=self.options.get("encoding", None),
   compression=self.options.get("compression", None),
   memory_map=self.options.get("memory_map", False),
   is_text=is_text,
   errors=self.options.get("encoding_errors", "strict"),
   storage_options=self.options.get("storage_options", None),
)
assert self.handles is not None
f = self.handles.handle

File /usr/local/pyenv/versions/3.11.11/lib/python3.11/site-packages/pandas/io/common.py:873, in get_handle(path_or_buf, mode, encoding, compression, memory_map, is_text, errors, storage_options)
elif isinstance(handle, str):
   # Check whether the filename is to be opened in binary mode.
   # Binary mode does not support 'encoding' and 'newline'.
   if ioargs.encoding and "b" not in ioargs.mode:
       # Encoding
--> 873         handle = open(
           handle,
           ioargs.mode,
           encoding=ioargs.encoding,
           errors=errors,
           newline="",
       )
   else:
       # Binary mode
       handle = open(handle, ioargs.mode)

FileNotFoundError: [Errno 2] No such file or directory: '/tmp/OGGM/OGGM-shop-on-Flowlines/per_glacier/RGI60-14/RGI60-14.06/RGI60-14.06794/elevation_band_flowline_fixed_dx.csv'

The other variables have disappeared for saving space, but I think it would be nicer to have them here as well. We can grab them from the inversion flowlines (this could be better handled in OGGM):

fl = gdir.read_pickle('inversion_flowlines')[0]
df_regular['mean_elevation'] = fl.surface_h

---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
Cell In[35], line 2
      1 fl = gdir.read_pickle('inversion_flowlines')[0]
----> 2 df_regular['mean_elevation'] = fl.surface_h

NameError: name 'df_regular' is not defined

df_regular['obs_velocity'] = (df_regular['millan_vx']**2 + df_regular['millan_vy']**2)**0.5
df_regular['consensus_bed_h'] = df_regular['mean_elevation'] - df_regular['consensus_ice_thickness']

---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
Cell In[36], line 1
----> 1 df_regular['obs_velocity'] = (df_regular['millan_vx']**2 + df_regular['millan_vy']**2)**0.5
      2 df_regular['consensus_bed_h'] = df_regular['mean_elevation'] - df_regular['consensus_ice_thickness']

NameError: name 'df_regular' is not defined

df_regular[['mean_elevation', 'consensus_bed_h', 'obs_velocity']].plot(figsize=(10, 4),
                                                                       secondary_y='obs_velocity');

---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
Cell In[37], line 1
----> 1 df_regular[['mean_elevation', 'consensus_bed_h', 'obs_velocity']].plot(figsize=(10, 4),
      2                                                                        secondary_y='obs_velocity');

NameError: name 'df_regular' is not defined

OK so more or less the same but this time on a regular grid. Note that these now have the same length as the OGGM inversion flowlines, i.e. one can do stuff such as comparing what OGGM has done for the inversion:

inv = gdir.read_pickle('inversion_output')[0]

df_regular['OGGM_bed'] = df_regular['mean_elevation'] - inv['thick']

---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
Cell In[39], line 1
----> 1 df_regular['OGGM_bed'] = df_regular['mean_elevation'] - inv['thick']

NameError: name 'df_regular' is not defined

df_regular[['mean_elevation', 'consensus_bed_h', 'OGGM_bed']].plot();

---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
Cell In[40], line 1
----> 1 df_regular[['mean_elevation', 'consensus_bed_h', 'OGGM_bed']].plot();

NameError: name 'df_regular' is not defined

Compare velocities: at inversion#

OGGM already inverted the ice thickness based on some optimisation such as matching the regional totals. Which parameters did we use?

d = gdir.get_diagnostics()
d

{'dem_source': 'NASADEM',
 'flowline_type': 'elevation_band',
 'apparent_mb_from_any_mb_residual': 56.10000000000032,
 'inversion_glen_a': 6.42968271645714e-24,
 'inversion_fs': 0}

# OK set them so that we are consistent
cfg.PARAMS['inversion_glen_a'] = d['inversion_glen_a']
cfg.PARAMS['inversion_fs'] = d['inversion_fs']

2025-04-16 10:40:52: oggm.cfg: PARAMS['inversion_glen_a'] changed from `2.4e-24` to `6.42968271645714e-24`.

# Since we overwrote the inversion flowlines we have to do stuff again
tasks.mb_calibration_from_geodetic_mb(gdir,
                                      informed_threestep=True,
                                      overwrite_gdir=True)
tasks.apparent_mb_from_any_mb(gdir)
tasks.prepare_for_inversion(gdir)
tasks.mass_conservation_inversion(gdir)
tasks.compute_inversion_velocities(gdir)

inv = gdir.read_pickle('inversion_output')[0]

df_regular['OGGM_inversion_velocity'] = inv['u_surface']

---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
Cell In[45], line 1
----> 1 df_regular['OGGM_inversion_velocity'] = inv['u_surface']

NameError: name 'df_regular' is not defined

df_regular[['obs_velocity', 'OGGM_inversion_velocity']].plot();

---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
Cell In[46], line 1
----> 1 df_regular[['obs_velocity', 'OGGM_inversion_velocity']].plot();

NameError: name 'df_regular' is not defined

The velocities are higher, because:

OGGM is at equilibrium
the comparison between the bulk velocities of OGGM and the gridded observed ones is not straightforward.

What’s next?#

return to the OGGM documentation
back to the table of contents

Ingest gridded products such as ice velocity into OGGM

Contents

Ingest gridded products such as ice velocity into OGGM#

Lets define the glaciers for the run#

The `gridded_data` file in the glacier directory#

Merging the `gridded_data` files of multiple glacier directories#

Add data from OGGM-Shop: bed topography data#

OGGM-Shop: velocities#

Bin the data in `gridded_data` into OGGM elevation bands#

Compare velocities: at inversion#

Velocities during the run#

What’s next?#

Ingest gridded products such as ice velocity into OGGM

Contents

Ingest gridded products such as ice velocity into OGGM#

Lets define the glaciers for the run#

The gridded_data file in the glacier directory#

Merging the gridded_data files of multiple glacier directories#

Add data from OGGM-Shop: bed topography data#

OGGM-Shop: velocities#

Bin the data in gridded_data into OGGM elevation bands#

Compare velocities: at inversion#

Velocities during the run#

What’s next?#

The `gridded_data` file in the glacier directory#

Merging the `gridded_data` files of multiple glacier directories#

Bin the data in `gridded_data` into OGGM elevation bands#