Storing glacier directories for later use

“Glacier directories” are the fundamental data structure used by OGGM. They make it possible to share data between runs, between the OGGM developers and users, and between users themselves.

Glacier directories can also be confusing at times, and can contain a high number of files, making them hard to move between clusters or computers. This notebook explains how these directories are structured and how to store them for moving and later use.

The main use-cases documented by this notebook are:

  • pre-process a number of glacier directories

  • stop working, and then re-start again from the same location

  • stop working, store them and copy them to another storage, or move them to another machine

  • re-start from them on another machine / instance

# Libs
import os
import shutil

# Locals
import oggm.cfg as cfg
from oggm import utils, workflow, tasks, DEFAULT_BASE_URL

The structure of the working directory

Let’s open a new workflow for two glaciers:

# Initialize OGGM and set up the default run parameters
cfg.initialize(logging_level='WARNING')
rgi_version = '62'
cfg.PARAMS['border'] = 80

# Local working directory (where OGGM will write its output)
WORKING_DIR = utils.gettempdir('oggm_gdirs_wd', reset=True)
cfg.PATHS['working_dir'] = WORKING_DIR

# RGI glaciers: Hintereisferner and Kesselwandferner
# (a plain list of IDs is enough when starting from pre-processed directories)
rgi_ids = ['RGI60-11.00897', 'RGI60-11.00787']

# Go - get the pre-processed glacier directories
base_url = ('https://cluster.klima.uni-bremen.de/~oggm/gdirs/oggm_v1.6/'
            'L3-L5_files/2023.3/elev_bands/W5E5/')
gdirs = workflow.init_glacier_directories(rgi_ids, from_prepro_level=3, prepro_base_url=base_url)
2026-03-05 21:05:00: oggm.cfg: Reading default parameters from the OGGM `params.cfg` configuration file.
2026-03-05 21:05:00: oggm.cfg: Multiprocessing switched OFF according to the parameter file.
2026-03-05 21:05:00: oggm.cfg: Multiprocessing: using all available processors (N=4)

Note that in OGGM v1.6 you have to explicitly indicate the URL you want to start from. Here we use pre-processed directories with elevation band flowlines, calibrated with W5E5 climate data. In the future, other pre-processed directories might exist, and you can use them by changing the base_url.

OGGM downloaded the pre-processed directories, stored the tar files in your cache, and extracted them in your working directory. But how is this working directory structured? Let’s have a look:

def file_tree_print(prepro_dir=False):
    # Just a utility function to show the dir structure and selected files
    print("cfg.PATHS['working_dir']/")
    tab = '  '
    for dirname, dirnames, filenames in os.walk(cfg.PATHS['working_dir']):
        for subdirname in dirnames:
            print(tab + subdirname + '/')
        for filename in filenames:
            if '.tar' in filename and 'RGI' in filename:
                print(tab + filename)
        tab += '  '
file_tree_print()
cfg.PATHS['working_dir']/

OK, so from the WORKING_DIR, OGGM always creates a per_glacier folder where the glacier directories are stored. To avoid cluttering a single folder (and for other reasons which become apparent later), the directories are organised by region (here RGI60-11) and then in folders containing up to 1000 glaciers (here RGI60-11.00, i.e. for IDs RGI60-11.00000 to RGI60-11.00999).
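The layout described above can be written as a small path function. This is only a sketch of the naming scheme: `gdir_path` is a hypothetical helper, not part of OGGM, and in practice you should ask the glacier directory itself where its files are (e.g. with `gdirs[0].get_filepath(...)`).

```python
import os


def gdir_path(working_dir, rgi_id):
    """Hypothetical helper: path of a glacier directory in the layout described above."""
    region = rgi_id[:8]   # e.g. 'RGI60-11'    -> regional folder
    group = rgi_id[:11]   # e.g. 'RGI60-11.00' -> folder of up to 1000 glaciers
    return os.path.join(working_dir, 'per_glacier', region, group, rgi_id)


for rid in ['RGI60-11.00897', 'RGI60-11.00787']:
    print(gdir_path('WORKING_DIR', rid))
```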

Our files are located in the final folders of this tree (not shown in the tree). For example:

gdirs[0].get_filepath('dem').replace(WORKING_DIR, 'WORKING_DIR')

Let’s add some steps to our workflow, for example a spinup run that we would like to store for later:

# Run
workflow.execute_entity_task(tasks.run_from_climate_data, gdirs, 
                             output_filesuffix='_spinup',  # to use the files as input later on
                             );

Stop there and restart from the same spot

The glacier directories are on disk, and won’t move away. This means that the next time you open OGGM, from this notebook or another script, you can start from them again. The only steps you have to take are:

  • set the working directory to the one you want to start from

  • initialize the glacier directories without arguments (or, faster, with the list of IDs)

See for example:

# Set the working dir correctly
cfg.PATHS['working_dir'] = utils.gettempdir('oggm_gdirs_wd')

# Go - re-open the pre-processed glacier directories from what's there
gdirs = workflow.init_glacier_directories()
2026-03-05 21:05:01: oggm.workflow: init_glacier_directories by parsing all available folders (this takes time: if possible, provide rgidf instead).

The step above can be quite slow, because OGGM has to parse quite some information from the directories. It is better to start from the list of glaciers you want to work with:

# Go - re-open the pre-processed glacier directories from what's there but with the list of glaciers
gdirs = workflow.init_glacier_directories(rgi_ids)
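Without a list of IDs, the glacier directories first have to be found on disk, which is part of why the call without arguments is slower. The sketch below shows such a discovery with `os.walk`, assuming the three-level `per_glacier/<region>/<group>/<glacier>` layout described earlier. `find_glacier_dirs` is hypothetical and only collects folder names; OGGM additionally has to parse information stored in each directory.

```python
import os
import tempfile


def find_glacier_dirs(working_dir):
    """Hypothetical helper: list the glacier IDs found under working_dir/per_glacier."""
    base = os.path.join(working_dir, 'per_glacier')
    ids = []
    for dirname, subdirs, _ in os.walk(base):
        rel = os.path.relpath(dirname, base)
        # Assumed layout per_glacier/<region>/<group>/<glacier>: glaciers are 3 levels down
        if rel.count(os.sep) == 2:
            ids.append(os.path.basename(dirname))
            subdirs[:] = []  # no need to look inside a glacier directory
    return sorted(ids)


# Usage on a throwaway working directory with two empty glacier folders
wd = tempfile.mkdtemp()
for rid in ['RGI60-11.00897', 'RGI60-11.00787']:
    os.makedirs(os.path.join(wd, 'per_glacier', rid[:8], rid[:11], rid))
print(find_glacier_dirs(wd))  # ['RGI60-11.00787', 'RGI60-11.00897']
```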

!!!CAREFUL!!! Do not start from a pre-processed level (or from a tar file), or your local directories (which may contain new data) will be overwritten: workflow.init_glacier_directories(rgi_ids, from_prepro_level=3, prepro_base_url=base_url) will always start from the fresh, pre-processed state.

Store the single glacier directories into tar files

The gdir_to_tar task compresses each single glacier directory into the same folder by default (but you can also put the compressed files somewhere else, e.g. in a folder in your $home):

utils.gdir_to_tar?
workflow.execute_entity_task(utils.gdir_to_tar, gdirs, delete=False);
file_tree_print()
cfg.PATHS['working_dir']/
2026-03-05 21:05:01: oggm.workflow: Called execute_entity_task on 0 glaciers. Returning...
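Conceptually, each glacier directory ends up as one compressed archive stored next to it, and `delete` decides whether the original folder is removed afterwards. The sketch below illustrates this with Python's `tarfile`; `dir_to_tar` is a hypothetical helper, and the `<RGI ID>.tar.gz` naming is an assumption for illustration rather than a description of OGGM's exact output.

```python
import os
import shutil
import tarfile
import tempfile


def dir_to_tar(path, delete=False):
    """Hypothetical helper: compress a directory into '<path>.tar.gz' next to it."""
    tar_path = path + '.tar.gz'  # assumed naming, for illustration only
    with tarfile.open(tar_path, 'w:gz') as tf:
        # arcname stores the files under the folder name only (no absolute path)
        tf.add(path, arcname=os.path.basename(path))
    if delete:
        shutil.rmtree(path)  # only reached once the archive was written
    return tar_path


# Usage on a throwaway folder standing in for a glacier directory
tmp = tempfile.mkdtemp()
gdir = os.path.join(tmp, 'RGI60-11.00897')
os.makedirs(gdir)
with open(os.path.join(gdir, 'dem.tif'), 'w') as f:
    f.write('not a real DEM')
out = dir_to_tar(gdir, delete=True)
print(os.listdir(tmp))  # ['RGI60-11.00897.tar.gz']
```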

Most of the time, you will actually want to delete the original directories because they are not needed for this run anymore:

workflow.execute_entity_task(utils.gdir_to_tar, gdirs, delete=True);
file_tree_print()
cfg.PATHS['working_dir']/
2026-03-05 21:05:01: oggm.workflow: Called execute_entity_task on 0 glaciers. Returning...

Now the original directories are gone, and the gdirs objects are useless (attempting to do anything with them will lead to an error).

Since they are already available in the correct file structure, however, OGGM will know how to reconstruct them from the tar files if asked to:

gdirs = workflow.init_glacier_directories(rgi_ids, from_tar=True, delete_tar=True)
file_tree_print()
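Reconstructing a directory from its tar file essentially means extracting the archive back into the folder that holds it (`delete_tar=True` above also removes the archive afterwards). A stdlib sketch under an assumed `<RGI ID>.tar.gz` naming; `tar_to_dir` is hypothetical, not an OGGM function.

```python
import os
import shutil
import tarfile
import tempfile


def tar_to_dir(tar_path, delete_tar=False):
    """Hypothetical helper: extract '<name>.tar.gz' into the folder that holds it."""
    parent = os.path.dirname(tar_path)
    # The 'data' filter (available in newer Pythons) rejects unsafe paths in the archive
    kwargs = {'filter': 'data'} if hasattr(tarfile, 'data_filter') else {}
    with tarfile.open(tar_path, 'r:gz') as tf:
        tf.extractall(parent, **kwargs)
    if delete_tar:
        os.remove(tar_path)
    return os.path.join(parent, os.path.basename(tar_path)[:-len('.tar.gz')])


# Usage: build a small archive first, delete the original folder, then restore it
tmp = tempfile.mkdtemp()
src = os.path.join(tmp, 'RGI60-11.00897')
os.makedirs(src)
with open(os.path.join(src, 'dem.tif'), 'w') as f:
    f.write('not a real DEM')
with tarfile.open(src + '.tar.gz', 'w:gz') as tf:
    tf.add(src, arcname='RGI60-11.00897')
shutil.rmtree(src)

restored = tar_to_dir(src + '.tar.gz', delete_tar=True)
print(os.listdir(restored))  # ['dem.tif']
```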

These directories are now ready to be used again! To summarize: thanks to this first step, you already reduced the number of files to move around from N x M to N, where N is the number of glaciers and M the number of files in each glacier directory.
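The counting argument can be checked on a toy tree. With N = 3 glacier directories of M = 5 files each (made-up numbers and file names), replacing every directory by one archive leaves 3 files instead of 15:

```python
import os
import shutil
import tarfile
import tempfile


def count_files(root):
    """Count all files below root."""
    return sum(len(files) for _, _, files in os.walk(root))


N, M = 3, 5  # toy numbers: N glacier directories with M files each
base = tempfile.mkdtemp()
for i in range(N):
    gdir = os.path.join(base, f'RGI60-11.{i:05d}')
    os.makedirs(gdir)
    for j in range(M):
        with open(os.path.join(gdir, f'file_{j}.txt'), 'w') as f:
            f.write('x')
before = count_files(base)

# Replace each glacier directory by a single archive (like delete=True)
for name in sorted(os.listdir(base)):
    path = os.path.join(base, name)
    with tarfile.open(path + '.tar.gz', 'w:gz') as tf:
        tf.add(path, arcname=name)
    shutil.rmtree(path)
after = count_files(base)
print(before, after)  # 15 3
```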

You can now move this working directory somewhere else, and in another OGGM run instance, simply start from them as shown above.

Bundle of directories

It turned out that the file structure above was a bit cumbersome to use, in particular for glacier directories that we wanted to share online. For this, we found it more convenient to bundle the directories into groups of 1000 glaciers. Fortunately, this is easy to do:

utils.base_dir_to_tar?
# Tar the individual ones first
workflow.execute_entity_task(utils.gdir_to_tar, gdirs, delete=True);
# Then tar the bundles
utils.base_dir_to_tar(WORKING_DIR, delete=True)
file_tree_print()
cfg.PATHS['working_dir']/
2026-03-05 21:05:01: oggm.workflow: Called execute_entity_task on 0 glaciers. Returning...
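A minimal sketch of the bundling idea, assuming each 1000-glacier folder (e.g. `RGI60-11.00`) already contains the per-glacier archives and is packed into a single `<group>.tar` next to it. `bundle_groups` and this naming are assumptions for illustration, not the documented behaviour of `base_dir_to_tar`.

```python
import os
import shutil
import tarfile
import tempfile


def bundle_groups(region_dir, delete=True):
    """Hypothetical helper: pack each 1000-glacier folder (e.g. RGI60-11.00)
    of region_dir into one '<group>.tar' bundle (assumed naming)."""
    bundles = []
    for group in sorted(os.listdir(region_dir)):
        group_dir = os.path.join(region_dir, group)
        if not os.path.isdir(group_dir):
            continue
        bundle = group_dir + '.tar'
        # Plain 'w' (no gzip): the per-glacier archives inside are already compressed
        with tarfile.open(bundle, 'w') as tf:
            tf.add(group_dir, arcname=group)
        if delete:
            shutil.rmtree(group_dir)
        bundles.append(bundle)
    return bundles


# Usage: a fake region folder with placeholder per-glacier archives in two groups
region = os.path.join(tempfile.mkdtemp(), 'RGI60-11')
for rid in ['RGI60-11.00787', 'RGI60-11.00897', 'RGI60-11.01450']:
    os.makedirs(os.path.join(region, rid[:11]), exist_ok=True)
    with open(os.path.join(region, rid[:11], rid + '.tar.gz'), 'wb') as f:
        f.write(b'placeholder')
print([os.path.basename(b) for b in bundle_groups(region)])
# ['RGI60-11.00.tar', 'RGI60-11.01.tar']
```

Writing the bundle without gzip is a choice of this sketch: compressing archives that are already compressed would mostly cost time.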

Now, the glacier directories are bundled in files at an even higher level. This is even more convenient to move around (fewer files), but it is not a mandatory step. The nice part about this bundling is that you can still select individual glaciers, as we will see in the next section. In the meantime, you can do:

gdirs = workflow.init_glacier_directories(rgi_ids, from_tar=True)
file_tree_print()

Which did the trick! Note that the bundled tar files are never deleted. This is why they are useful for another purpose explained in the next section: creating your own “pre-processed directories”.
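Selecting an individual glacier from a bundle does not require unpacking the whole bundle: a tar archive can be opened and a single member read. The sketch assumes bundles laid out as `<group>/<RGI ID>.tar.gz` inside `<group>.tar`; both `extract_one` and that layout are hypothetical.

```python
import io
import os
import tarfile
import tempfile


def extract_one(bundle_path, rgi_id, dest):
    """Hypothetical helper: copy a single '<rgi_id>.tar.gz' out of a group bundle."""
    # Assumed member name, e.g. 'RGI60-11.00/RGI60-11.00897.tar.gz'
    member = f'{rgi_id[:11]}/{rgi_id}.tar.gz'
    with tarfile.open(bundle_path, 'r') as tf:
        data = tf.extractfile(member).read()  # only this member's content is read
    out = os.path.join(dest, rgi_id + '.tar.gz')
    with open(out, 'wb') as f:
        f.write(data)
    return out


# Usage: build a two-glacier bundle from in-memory placeholder payloads, then pick one
tmp = tempfile.mkdtemp()
bundle = os.path.join(tmp, 'RGI60-11.00.tar')
with tarfile.open(bundle, 'w') as tf:
    for rid in ['RGI60-11.00787', 'RGI60-11.00897']:
        payload = f'archive of {rid}'.encode()
        info = tarfile.TarInfo(name=f'{rid[:11]}/{rid}.tar.gz')
        info.size = len(payload)
        tf.addfile(info, io.BytesIO(payload))

out = extract_one(bundle, 'RGI60-11.00897', tmp)
print(open(out, 'rb').read())  # b'archive of RGI60-11.00897'
```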

Self-made pre-processed directories for “restart” workflows

This workflow is the one used by OGGM to prepare the pre-processed directories that many of you are using. It is a variant of the workflow above, the only difference being that the directories are re-started from tar files located outside the working directory:

# Where to put the compressed dirs
PREPRO_DIR = utils.get_temp_dir('prepro_dir')
if os.path.exists(PREPRO_DIR):
    shutil.rmtree(PREPRO_DIR)

# Lets start from a clean state
# Beware! If you use `reset=True` in `utils.mkdir`, ALL DATA in this folder will be deleted! Use with caution!
utils.mkdir(WORKING_DIR, reset=True)
gdirs = workflow.init_glacier_directories(rgi_ids, from_prepro_level=3, prepro_base_url=base_url)

# Then tar the gdirs and bundle
workflow.execute_entity_task(utils.gdir_to_tar, gdirs, delete=True)
utils.base_dir_to_tar(delete=True)

# Copy the outcome in a new directory: scratch folder, new machine, etc.
shutil.copytree(os.path.join(WORKING_DIR, 'per_glacier'), PREPRO_DIR);

OK, so this PREPRO_DIR directory is where the files will stay for longer now. You can start from there whenever you wish with:

# Lets start from a clean state
utils.mkdir(WORKING_DIR, reset=True)
# This needs https://github.com/OGGM/oggm/pull/1158 to work
# It uses the files you prepared beforehand to start the dirs
gdirs = workflow.init_glacier_directories(rgi_ids, from_tar=PREPRO_DIR)
file_tree_print()
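Since PREPRO_DIR lives outside the working directory, it may be worth checking that it contains a bundle for every glacier you plan to restart. A minimal sketch, assuming PREPRO_DIR is a copy of `per_glacier` with bundles stored as `<region>/<group>.tar` (e.g. `RGI60-11/RGI60-11.00.tar`); `missing_bundles` is hypothetical and the bundle naming is an assumption.

```python
import os
import tempfile


def missing_bundles(prepro_dir, rgi_ids):
    """Hypothetical helper: IDs whose bundle '<region>/<group>.tar' (assumed naming)
    is not present in prepro_dir."""
    missing = []
    for rid in rgi_ids:
        bundle = os.path.join(prepro_dir, rid[:8], rid[:11] + '.tar')
        if not os.path.isfile(bundle):
            missing.append(rid)
    return missing


# Usage: a fake PREPRO_DIR holding only the RGI60-11.00 bundle
prepro = tempfile.mkdtemp()
os.makedirs(os.path.join(prepro, 'RGI60-11'))
open(os.path.join(prepro, 'RGI60-11', 'RGI60-11.00.tar'), 'wb').close()
print(missing_bundles(prepro, ['RGI60-11.00897', 'RGI60-11.00787', 'RGI60-11.01450']))
# ['RGI60-11.01450']
```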

What’s next?