Merge, analyse and visualize OGGM GCM runs#
In this notebook we want to show:
How to merge the output of different OGGM GCM projections into one dataset
How to calculate regional values (e.g. volume) using the merged dataset
How to use HoloViews and Panel to visualize the outcome. This part uses advanced plotting capabilities which are not necessary to understand the rest of the notebook.
This notebook is intended to explain the postprocessing steps, rather than the OGGM workflow itself. Therefore some code (especially conducting the GCM projection runs) does not have many explanations. If you are more interested in these steps you should check out the notebook Run OGGM with GCM data.
GCM projection runs#
The first step is to conduct the GCM projection runs. We choose two different glaciers by their rgi_ids and conduct the GCM projections. Again, if you do not understand all of the following code, you should check out the Run OGGM with GCM data notebook.
# Libs
from time import gmtime, strftime
import geopandas as gpd
import shapely.geometry as shpg
import xarray as xr
import numpy as np
# Plotting
import holoviews as hv
import panel as pn
hv.extension('bokeh')
pn.extension()
# Locals
from oggm import utils, workflow, tasks
import oggm.cfg as cfg
from oggm.shop import gcm_climate
Pre-processed directories#
# Initialize OGGM and set up the default run parameters
cfg.initialize(logging_level='WARNING')
# change border around the individual glaciers
cfg.PARAMS['border'] = 40
# Use Multiprocessing
cfg.PARAMS['use_multiprocessing'] = True
# For hydro output
cfg.PARAMS['store_model_geometry'] = True
# Local working directory (where OGGM will write its output)
cfg.PATHS['working_dir'] = utils.gettempdir('OGGM_merge_gcm_runs', reset=True)
# RGI glaciers: Ngojumba and Khumbu
rgi_ids = utils.get_rgi_glacier_entities(['RGI60-15.03473', 'RGI60-15.03733'])
# Go - get the pre-processed glacier directories
gdirs = workflow.init_glacier_directories(rgi_ids, from_prepro_level=5)
2023-03-07 12:44:26: oggm.cfg: Reading default parameters from the OGGM `params.cfg` configuration file.
2023-03-07 12:44:26: oggm.cfg: Multiprocessing switched OFF according to the parameter file.
2023-03-07 12:44:26: oggm.cfg: Multiprocessing: using all available processors (N=2)
2023-03-07 12:44:26: oggm.cfg: Multiprocessing switched ON after user settings.
2023-03-07 12:44:26: oggm.cfg: PARAMS['store_model_geometry'] changed from `False` to `True`.
2023-03-07 12:44:33: oggm.workflow: init_glacier_directories from prepro level 5 on 2 glaciers.
2023-03-07 12:44:33: oggm.workflow: Execute entity tasks [gdir_from_prepro] on 2 glaciers
Download and process GCM data#
Here we define all the CMIP5 GCM models and the RCP scenarios we want to use. Here you can get an overview of all available options. As you can see in the overview, the rcp60 scenario is only available for two of the three GCMs! We deal with this issue by including a try/except in the code below.
# define the GCM models you want to use
all_GCM = ['CCSM4',
'CNRM-CM5',
'CSIRO-Mk3-6-0',
]
# define the rcp scenarios to use
all_rcp = ['rcp26',
'rcp45',
'rcp60',
'rcp85']
# download locations for precipitation and temperature
bp = 'https://cluster.klima.uni-bremen.de/~oggm/cmip5-ng/pr/pr_mon_{}_{}_r1i1p1_g025.nc'
bt = 'https://cluster.klima.uni-bremen.de/~oggm/cmip5-ng/tas/tas_mon_{}_{}_r1i1p1_g025.nc'
for GCM in all_GCM:
    for rcp in all_rcp:
        # Download the files
        ft = utils.file_downloader(bt.format(GCM, rcp))
        fp = utils.file_downloader(bp.format(GCM, rcp))
        try:
            # bias correct them
            workflow.execute_entity_task(gcm_climate.process_cmip_data, gdirs,
                                         filesuffix='_{}_{}'.format(GCM, rcp),  # recognize the climate file for later
                                         fpath_temp=ft,  # temperature projections
                                         fpath_precip=fp,  # precip projections
                                         );
        except ValueError:
            # if a certain scenario is not available for a GCM we land here
            # and we indicate this by printing a message so the user knows
            # this scenario is missing
            print('No ' + GCM + ' run with scenario ' + rcp + ' available!')
Actual projection runs#
Now we conduct the actual projection runs, again handling the case that no data is available for a certain (GCM, RCP) combination with a try/except.
for GCM in all_GCM:
    for rcp in all_rcp:
        rid = '_{}_{}'.format(GCM, rcp)
        try:  # check if the GCM, RCP combination exists
            workflow.execute_entity_task(tasks.run_with_hydro, gdirs,
                                         run_task=tasks.run_from_climate_data,
                                         ys=2020,  # start year of our projection runs
                                         climate_filename='gcm_data',  # use gcm_data, not climate_historical
                                         climate_input_filesuffix=rid,  # use the chosen GCM and scenario
                                         init_model_filesuffix='_historical',  # this is important! Start from the 2020 glacier state
                                         output_filesuffix=rid,  # the filesuffix of the resulting file, so we can find it later
                                         store_monthly_hydro=True,
                                         );
        except FileNotFoundError:
            # if a certain scenario is not available for a GCM we land here
            # and we indicate this by printing a message so the user knows
            # this scenario is missing
            print('No ' + GCM + ' run with scenario ' + rcp + ' available!')
Merge datasets together#
Now that we have conducted the projection runs, we can start merging all outputs into one dataset. The individual files can be opened by providing the output_filesuffix of the projection runs as the input_filesuffix. Let us have a look at one resulting dataset!
file_id = '_{}_{}'.format(all_GCM[0], all_rcp[0])
ds = utils.compile_run_output(gdirs, input_filesuffix=file_id)
ds
We see that as a result we get an xarray.Dataset. We also see that this dataset has three dimensions: time, rgi_id and month_2d. When we look at the Data variables (click to expand) we can see that each of these variables is given on a certain combination of the dimensions. For example, for the volume we see the dimensions (time, rgi_id), which indicates that the underlying data is separated by each year and each glacier. Let’s have a look at the volume in 2030 and 2040 at the glacier ‘RGI60-15.03473’ (Ngojumba):
ds['volume'].loc[{'time': [2030, 2040], 'rgi_id': ['RGI60-15.03473']}]
A more complete overview of how to access the data of an xarray Dataset can be found here.
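As a side note, the same selection can equivalently be written with .sel(), shown here as a minimal sketch:
# equivalent to the .loc selection above, using .sel() keyword arguments
ds['volume'].sel(time=[2030, 2040], rgi_id=['RGI60-15.03473'])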
This is quite useful, but unfortunately we have one xarray Dataset for each (GCM, RCP) pair. For easier calculation and comparison between different GCMs and RCPs, it would be desirable to combine all individual Datasets into one Dataset with the additional dimensions (GCM, RCP). Fortunately, xarray provides different functions to merge Datasets (here you can find an overview).
In our case, we want to merge all individual datasets with two additional coordinates, GCM and RCP. Additional coordinates can be added with the .coords[] method, and you can also include a description with .coords[].attrs['description']. For example, let us add GCM as a new coordinate:
ds.coords['GCM'] = all_GCM[0]
ds.coords['GCM'].attrs['description'] = 'Global Circulation Model'
ds
You can see that GCM is now added as a coordinate, but when you look at the Data variables you see that this new coordinate is not yet used by any of them. So we must add this new coordinate to the Data variables using the .expand_dims() method:
ds = ds.expand_dims('GCM')
ds
Now we see that GCM was added to the Dimensions and that all Data variables now use this new coordinate. The same can be done with RCP. As a standalone, this is not very useful. But if we add these coordinates to all datasets, it becomes quite handy for merging. Therefore we now open all datasets and add the two coordinates GCM and RCP to each one:
Add new coordinates#
ds_all = []  # in this list all datasets are going to be stored, with the additional coordinates GCM and RCP
creation_date = strftime("%Y-%m-%d %H:%M:%S", gmtime())  # here add the current time for info
for GCM in all_GCM:  # loop through all GCMs
    for RCP in all_rcp:  # loop through all RCPs
        try:  # check if the GCM, RCP combination exists
            rid = '_{}_{}'.format(GCM, RCP)  # put together the same filesuffix which was used during the projection runs
            ds_tmp = utils.compile_run_output(gdirs, input_filesuffix=rid)  # open one model run
            ds_tmp.coords['GCM'] = GCM  # add GCM as a coordinate
            ds_tmp.coords['GCM'].attrs['description'] = 'used Global Circulation Model'  # add a description for GCM
            ds_tmp = ds_tmp.expand_dims("GCM")  # add GCM as a dimension to all Data variables
            ds_tmp.coords['RCP'] = RCP  # add RCP as a coordinate
            ds_tmp.coords['RCP'].attrs['description'] = 'used Representative Concentration Pathway'  # add a description for RCP
            ds_tmp = ds_tmp.expand_dims("RCP")  # add RCP as a dimension to all Data variables
            ds_tmp.attrs['creation_date'] = creation_date  # also add today's date for info
            ds_all.append(ds_tmp)  # add the dataset with extra coordinates to our final ds_all list
        except RuntimeError as err:  # here we land if an error occurred
            if str(err) == 'Found no valid glaciers!':  # this is the error message if the GCM, RCP combination does not exist
                print(f'No data for GCM {GCM} with RCP {RCP} found!')  # print a descriptive message
            else:
                raise RuntimeError(err)  # if another error occurred we just raise it
Actual merging#
ds_all now contains all outputs with the additionally added coordinates. Now we can merge them with the very convenient xarray function xr.combine_by_coords():
ds_merged = xr.combine_by_coords(ds_all,
fill_value=np.nan) # define how the missing GCM, RCP combinations should be filled
ds_merged
Great! You see that the result is one Dataset, but we can still identify the individual runs by the two additionally added coordinates GCM and RCP. If you want to save this new Dataset for later analysis, you can for example use .to_netcdf() (here you can find an overview of xarray writing options):
ds_merged.to_netcdf('merged_GCM_projections.nc')
And to open it again later and automatically close the file after reading you can use:
with xr.open_dataset('merged_GCM_projections.nc') as ds:
    ds_merged = ds.load()  # .load() keeps the data available in memory after the file is closed
Calculate regional values#
Now we have our merged dataset and can start with the analysis. For this, xarray offers a large number of mathematical functions which can be used directly on our dataset. Here you can find a great overview of what is possible.
As an example, we want to calculate the mean and std of the total regional glacier volume for the different scenarios of our merged dataset. Here our added coordinates come in very handy.
Let us start by calculating the total regional glacier volume using .sum():
ds_total_volume = ds_merged['volume'].sum(dim='rgi_id', # over which dimension the sum should be taken, here we want to sum up over all glacier ids
skipna=True, # ignore nan values
keep_attrs=True) # keep the variable descriptions
ds_total_volume
For this calculation, we told xarray over which dimension we want to sum up (dim='rgi_id', i.e. all glaciers), that missing values should be ignored (skipna=True, as some GCM/RCP combinations are not available) and that the attributes (the descriptions of the variables) should be preserved (keep_attrs=True). You can see that the result has three dimensions left, (GCM, time, RCP), and that the dimension rgi_id disappeared as we have summed up all values along it.
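A quick sanity check, as a small sketch, is to look at the remaining dimensions directly:
# the glacier dimension 'rgi_id' is gone after the sum
print(ds_total_volume.dims)  # e.g. ('GCM', 'RCP', 'time'); the exact order may vary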
Likewise, we can now calculate the mean of all GCM runs:
ds_total_volume_mean = ds_total_volume.mean(dim='GCM', # over which dimension the mean should be calculated
skipna=True, # ignore nan values
keep_attrs=True) # keep all variable descriptions
ds_total_volume_mean
And you see that we are now left with two dimensions, (RCP, time). This means we have now calculated the mean total volume for all RCP scenarios along the projection period. The standard deviation (std) can be calculated in the same way as the mean, as shown in the sketch below.
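For completeness, here is a sketch of the corresponding std calculation, mirroring the mean above:
ds_total_volume_std = ds_total_volume.std(dim='GCM',  # over which dimension the std should be calculated
                                          skipna=True,  # ignore nan values
                                          keep_attrs=True)  # keep all variable descriptions
ds_total_volume_std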
If you want to calculate the total mean and std values for different variables it is convenient to define a small function:
def calculate_total_mean_and_std(ds, variable):
mean = ds[variable].sum(dim='rgi_id', # first sum up over all glaciers
skipna=True,
keep_attrs=True,
).mean(dim='GCM', # afterwards calculate the mean of all GCMs
skipna=True,
keep_attrs=True,
)
std = ds[variable].sum(dim='rgi_id', # first sum up over all glaciers
skipna=True,
keep_attrs=True,
).std(dim='GCM', # afterwards calculate the std of all GCMs
skipna=True,
keep_attrs=True,
)
return mean, std
This function takes a dataset ds and a variable name variable for which the mean and the std should be calculated. It will come in handy in the next section, which deals with the visualisation of the calculated values.
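As a quick usage sketch (area, like volume, is part of the compiled output):
# mean and spread of the total regional glacier area across all GCMs, per RCP and year
area_mean, area_std = calculate_total_mean_and_std(ds_merged, 'area')
area_mean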
Visualisation#
Now that we can calculate the regional values of our projection runs, it is time to visualize them. In the following, we show one way to visualize the data using tools from the HoloViz framework (namely HoloViews and Panel). For an introduction to HoloViz, you can have a look at the Small overview of HoloViz capability of data exploration notebook.
First, we create a single curve for a single RCP scenario:
# calculate mean and std with the previously defined function
total_volume_mean, total_volume_std = calculate_total_mean_and_std(ds_merged, 'volume')
# select only one RCP scenario
total_volume_mean_rcp85 = total_volume_mean.loc[{'RCP': 'rcp85'}]
# plot a curve
x = total_volume_mean_rcp85.coords['time']
y = total_volume_mean_rcp85
hv.Curve((x, y),
kdims=x.name,
vdims=y.name,
).opts(xlabel=x.attrs['description'],
ylabel=f"{y.attrs['description']} in {y.attrs['unit']}")
We used a HoloViews Curve and defined kdims and vdims. This definition is not so important for a single plot, but as soon as we start to compose different plots, all axes that share the same kdims or vdims are connected. This means, for example, that whenever you zoom in on one plot, all other plots zoom in as well. Further, you can see that we have defined xlabel and ylabel using the variable descriptions of the dataset; this is why it was useful to set keep_attrs=True when we calculated the total values (see above).
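To illustrate this linking with a minimal sketch (not part of the original workflow), two curves sharing the kdims 'time' stay synchronised when composed into a layout:
# zooming or panning in one panel also updates the other,
# because both curves share the key dimension 'time'
curve_a = hv.Curve((x, total_volume_mean.loc[{'RCP': 'rcp26'}]), kdims='time', vdims='volume')
curve_b = hv.Curve((x, total_volume_mean.loc[{'RCP': 'rcp85'}]), kdims='time', vdims='volume')
curve_a + curve_b  # a layout with linked x-axes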
As a next step, we can add the std as a shaded area (HoloViews Area) and wrap the mean curve plus shaded area for a single scenario in a new function:
Create single mean curve with std area#
def get_single_curve(mean, std, rcp):
mean_use = mean.loc[{'RCP': rcp}] # read out the mean of the RCP to plot
std_use = std.loc[{'RCP': rcp}] # read out the std of the RCP to plot
time = mean.coords['time'] # get the time for the x axis
return (hv.Area((time, # plot std as an area
mean_use + std_use, # upper boundary of the area
mean_use - std_use), # lower boundary of the area
vdims=[mean_use.name, 'y2'], # vdims for both boundaries
kdims='time',
label=rcp,
).opts(alpha=0.2,
line_color=None,
) *
hv.Curve((time, mean_use),
vdims=mean_use.name,
kdims='time',
label=rcp,
)
).opts(width=400, # width of the total plot
height=400, # height of the total plot
xlabel=time.attrs['description'],
ylabel=f"{mean_use.attrs['description']} in {mean_use.attrs['unit']}",
)
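For instance, as a quick sketch, a single scenario can now be plotted with:
get_single_curve(total_volume_mean, total_volume_std, 'rcp26')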
Overlay different scenarios#
We can put the single curves of the different scenarios together in a HoloViews HoloMap, which is comparable to a dictionary. This can then easily be used to create a nice overlay of all curves.
def overlay_scenarios(ds, variable):
hmap = hv.HoloMap(kdims='Scenarios') # create a HoloMap
mean, std = calculate_total_mean_and_std(ds, variable) # calculate mean and std for all RCPs using our previously defined function
for rcp in all_rcp:
        hmap[rcp] = get_single_curve(mean, std, rcp)  # add a curve for each RCP to the HoloMap, using the RCP as the key (if you compare it to a dictionary)
return hmap.overlay().opts(title=variable) # create an overlay of all curves
Show different variables in one figure and save as html file#
Now that we have defined how our plots should look, we can compose the different variables we want to explore into one figure. To do so, we can use Panel Column and Row to customize the plot layout:
all_plots = pn.Column(pn.Row(overlay_scenarios(ds_merged, 'volume'),
overlay_scenarios(ds_merged, 'area'),
),
overlay_scenarios(ds_merged, 'melt_on_glacier'))
all_plots
When you start exploring the plots by dragging them around or zooming in (using the tools of the toolboxes in the upper right corner of each plot) you see that the x-axes are connected. This makes it very convenient for example to look at different periods for all variables interactively.
You can also open the plots in a new browser tab by using
all_plots.show()
or save them as an html file for sharing:
plots_to_save = pn.panel(all_plots)
plots_to_save.save('GCM_runs.html', embed=True)
What’s next?#
return to the OGGM documentation
back to the table of contents