Use catalogs with assets containing multiple variables#
By default, intake-esm assumes that each data asset (file) contains a single variable (e.g. temperature, precipitation, etc.). If your data files contain multiple variables, intake-esm requires the following:
- the variable_column of the catalog must contain iterables (list, tuple, set) of values (e.g. ['temperature', 'precipitation']).
- the user must provide converters with appropriate functions for parsing values in the variable_column (and/or any other column with iterables) into iterables when loading the catalog.
There are two ways to do this with the open_esm_datastore function: either pass the converter functions directly through the read_kwargs argument, or specify the columns in the columns_with_iterables parameter. The latter is a shortcut for the former. Both are demonstrated below.
Inspect the catalog#
In the example below, we are going to use the following catalog to demonstrate how to work with multi-variable assets:
# Look at the catalog on disk
!cat multi-variable-catalog.csv
experiment,case,component,stream,variable,member_id,path,time_range
CTRL,b.e11.B1850C5CN.f09_g16.005,ocn,pop.h,"['SHF', 'REGION_MASK', 'ANGLE', 'DXU', 'KMT', 'NO2', 'O2']",5,../../../tests/sample_data/cesm-multi-variables/b.e11.B1850C5CN.f09_g16.005.pop.h.SHF-NO2-O2.050001-050012.nc,050001-050012
CTRL,b.e11.B1850C5CN.f09_g16.005,ocn,pop.h,"['SHF', 'REGION_MASK', 'ANGLE', 'DXU', 'KMT', 'NO2', 'O2']",5,../../../tests/sample_data/cesm-multi-variables/b.e11.B1850C5CN.f09_g16.005.pop.h.SHF-NO2-O2.050101-050112.nc,050101-050112
CTRL,b.e11.B1850C5CN.f09_g16.005,ocn,pop.h,"['SHF', 'REGION_MASK', 'ANGLE', 'DXU', 'KMT', 'NO2', 'PO4']",5,../../../tests/sample_data/cesm-multi-variables/b.e11.B1850C5CN.f09_g16.005.pop.h.SHF-NO2-PO4.050001-050012.nc,050001-050012
CTRL,b.e11.B1850C5CN.f09_g16.005,ocn,pop.h,"['SHF', 'REGION_MASK', 'ANGLE', 'DXU', 'KMT', 'NO2', 'PO4']",5,../../../tests/sample_data/cesm-multi-variables/b.e11.B1850C5CN.f09_g16.005.pop.h.SHF-NO2-PO4.050101-050112.nc,050101-050112
CTRL,b.e11.B1850C5CN.f09_g16.005,ocn,pop.h,"['SHF', 'REGION_MASK', 'ANGLE', 'DXU', 'KMT', 'TEMP', 'SiO3']",5,../../../tests/sample_data/cesm-multi-variables/b.e11.B1850C5CN.f09_g16.005.pop.h.SHF-TEMP-SiO3.050001-050012.nc,050001-050012
As you can see, the variable column contains a list of variables, and this list
was serialized as a string:
"['SHF', 'REGION_MASK', 'ANGLE', 'DXU', 'KMT', 'NO2', 'O2']".
Load the catalog#
import intake
import ast
import dask
# Make sure this is single-threaded
dask.config.set(scheduler='single-threaded')
cat = intake.open_esm_datastore(
    "multi-variable-catalog.json",
    read_kwargs={"converters": {"variable": ast.literal_eval}},
)
cat
sample-multi-variable-cesm1-lens catalog with 1 dataset(s) from 5 asset(s):
| | unique |
|---|---|
| experiment | 1 |
| case | 1 |
| component | 1 |
| stream | 1 |
| variable | 10 |
| member_id | 1 |
| path | 5 |
| time_range | 2 |
| derived_variable | 0 |
To confirm that intake-esm has loaded the catalog correctly, we can inspect the .has_multiple_variable_assets property:
cat.esmcat.has_multiple_variable_assets
True
Alternatively, we can specify the variable column name in the columns_with_iterables parameter:
cat = intake.open_esm_datastore(
    "multi-variable-catalog.json",
    columns_with_iterables=["variable"],
)
cat.esmcat.has_multiple_variable_assets
True
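To make the relationship between the two options concrete, here is a minimal standard-library sketch. It assumes that listing a column in columns_with_iterables amounts to registering ast.literal_eval as the converter for that column; the helper converters_for and the inlined two-column CSV are hypothetical and only illustrate the idea, they are not intake-esm's own code.

```python
import ast
import csv
import io


def converters_for(columns):
    """Hypothetical helper: map each iterable column to ast.literal_eval."""
    return {column: ast.literal_eval for column in columns}


# Two columns of one catalog row, inlined so the sketch is self-contained.
csv_text = """stream,variable
pop.h,"['SHF', 'NO2', 'O2']"
"""

converters = converters_for(["variable"])
rows = []
for row in csv.DictReader(io.StringIO(csv_text)):
    # Apply a converter where one is registered; leave other columns as strings.
    rows.append({key: converters.get(key, str)(value) for key, value in row.items()})

print(type(rows[0]["variable"]).__name__)  # list
print(rows[0]["variable"])  # ['SHF', 'NO2', 'O2']
```

Without the converter, the variable cell would stay a single string that merely looks like a list.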
Search for datasets#
The search functionality works in the same way as for single-variable catalogs:
cat_subset = cat.search(variable=["O2", "SiO3"])
cat_subset.df
| | experiment | case | component | stream | variable | member_id | path | time_range |
|---|---|---|---|---|---|---|---|---|
| 0 | CTRL | b.e11.B1850C5CN.f09_g16.005 | ocn | pop.h | (SHF, REGION_MASK, ANGLE, DXU, KMT, NO2, O2) | 5 | ../../../tests/sample_data/cesm-multi-variable... | 050001-050012 |
| 1 | CTRL | b.e11.B1850C5CN.f09_g16.005 | ocn | pop.h | (SHF, REGION_MASK, ANGLE, DXU, KMT, NO2, O2) | 5 | ../../../tests/sample_data/cesm-multi-variable... | 050101-050112 |
| 2 | CTRL | b.e11.B1850C5CN.f09_g16.005 | ocn | pop.h | (SHF, REGION_MASK, ANGLE, DXU, KMT, TEMP, SiO3) | 5 | ../../../tests/sample_data/cesm-multi-variable... | 050001-050012 |
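Judging from the result above, an asset is selected when its variable column contains at least one of the requested names. The following plain-Python sketch reproduces that selection for the five rows of the sample catalog; the predicate matches is hypothetical and is not how intake-esm implements search.

```python
# Variable tuples for the five assets in the sample catalog, in file order.
assets = [
    ("SHF", "REGION_MASK", "ANGLE", "DXU", "KMT", "NO2", "O2"),
    ("SHF", "REGION_MASK", "ANGLE", "DXU", "KMT", "NO2", "O2"),
    ("SHF", "REGION_MASK", "ANGLE", "DXU", "KMT", "NO2", "PO4"),
    ("SHF", "REGION_MASK", "ANGLE", "DXU", "KMT", "NO2", "PO4"),
    ("SHF", "REGION_MASK", "ANGLE", "DXU", "KMT", "TEMP", "SiO3"),
]


def matches(variables, requested):
    """Hypothetical predicate: True if the asset holds any requested variable."""
    return not set(requested).isdisjoint(variables)


requested = ["O2", "SiO3"]
selected = [index for index, variables in enumerate(assets) if matches(variables, requested)]
print(selected)  # [0, 1, 4]
```

The positions printed here refer to rows of the original CSV file, whereas the table above shows a renumbered index.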
Interactively search the catalog#
We can also use the interactive attribute of a catalog to interactively search the catalog. This will not save any searches, but allows you to explore the catalog in a quick and intuitive way.
cat.interactive
Load assets into xarray datasets#
When loading the data files into xarray datasets, intake-esm will load only the data variables that were requested. For example, if a data file contains ten data variables and the user requests two of them, intake-esm will load the two requested variables plus the necessary coordinate information.
dsets = cat_subset.to_dataset_dict()
dsets
--> The keys in the returned dictionary of datasets are constructed as follows:
'component.experiment.stream'
/home/docs/checkouts/readthedocs.org/user_builds/intake-esm/checkouts/latest/intake_esm/source.py:308: FutureWarning: In a future version of xarray the default value for join will change from join='outer' to join='exact'. This change will result in the following ValueError: cannot be aligned with join='exact' because index/labels/sizes are not equal along these coordinates (dimensions): 'time' ('time',) The recommendation is to set join explicitly for this case.
self._ds = xr.combine_by_coords(
/home/docs/checkouts/readthedocs.org/user_builds/intake-esm/checkouts/latest/intake_esm/source.py:308: FutureWarning: In a future version of xarray the default value for compat will change from compat='no_conflicts' to compat='override'. This is likely to lead to different results when combining overlapping variables with the same name. To opt in to new defaults and get rid of these warnings now use `set_options(use_new_combine_kwarg_defaults=True) or set compat explicitly.
self._ds = xr.combine_by_coords(
{'ocn.CTRL.pop.h': <xarray.Dataset> Size: 1kB
Dimensions: (time: 24, member_id: 1, nlat: 2, nlon: 2)
Coordinates: (12/36)
* time (time) object 192B 0500-02-01 00:00:00 ... 0502-02-01...
* member_id (member_id) object 8B '5'
TLAT (nlat, nlon) float64 32B dask.array<chunksize=(2, 2), meta=np.ndarray>
TLONG (nlat, nlon) float64 32B dask.array<chunksize=(2, 2), meta=np.ndarray>
ULAT (nlat, nlon) float64 32B dask.array<chunksize=(2, 2), meta=np.ndarray>
ULONG (nlat, nlon) float64 32B dask.array<chunksize=(2, 2), meta=np.ndarray>
... ...
salt_to_ppt float64 8B 1e+03
sea_ice_salinity float64 8B 4.0
sflux_factor float64 8B 0.1
sound float64 8B 1.5e+05
stefan_boltzmann float64 8B 5.67e-08
vonkar float64 8B 0.4
Dimensions without coordinates: nlat, nlon
Data variables:
O2 (member_id, time, nlat, nlon) float32 384B dask.array<chunksize=(1, 12, 2, 2), meta=np.ndarray>
SiO3 (member_id, time, nlat, nlon) float32 384B dask.array<chunksize=(1, 12, 2, 2), meta=np.ndarray>
Attributes: (12/23)
title: b.e11.B1850C5CN.f09_g16.005
history: Fri Oct 11 01:05:51 2013: /glade/apps/op...
Conventions: CF-1.0; http://www.cgd.ucar.edu/cms/eato...
contents: Diagnostic and Prognostic Variables
source: CCSM POP2, the CCSM Ocean Component
revision: $Id: tavg.F90 41939 2012-11-14 16:37:23Z...
... ...
intake_esm_attrs:stream: pop.h
intake_esm_attrs:member_id: 5
intake_esm_attrs:_data_format_: netcdf
intake_esm_attrs:path: ../../../tests/sample_data/cesm-multi-va...
intake_esm_attrs:time_range: 050001-050012
intake_esm_dataset_key: ocn.CTRL.pop.h}
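One way to picture why only O2 and SiO3 appear under "Data variables" is the rough sketch below. It is plain Python rather than intake-esm's loading code, and the asset labels are illustrative stand-ins for the file paths.

```python
requested = ["O2", "SiO3"]

# Variables listed for two of the matching assets (labels are illustrative).
assets = {
    "asset_with_O2": ["SHF", "REGION_MASK", "ANGLE", "DXU", "KMT", "NO2", "O2"],
    "asset_with_SiO3": ["SHF", "REGION_MASK", "ANGLE", "DXU", "KMT", "TEMP", "SiO3"],
}

# Keep, per asset, only the variables that were requested.
to_load = {
    label: [name for name in variables if name in requested]
    for label, variables in assets.items()
}
print(to_load)

# The combined dataset then exposes the union of what was kept.
data_variables = sorted({name for names in to_load.values() for name in names})
print(data_variables)  # ['O2', 'SiO3']
```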
Why does intake.open_esm_datastore need the columns_with_iterables parameter?#
Why does intake.open_esm_datastore need the columns_with_iterables argument when we can achieve the same functionality with just read_kwargs? Intake facilitates writing YAML descriptions of catalogs that can be opened with intake.open_catalog. These YAML descriptions include the information required to open the catalog: things like the catalog driver (intake_esm.core.esm_datastore in our case) and the arguments to pass to the driver to open the catalog. They can be included as entries in other catalogs, enabling features like catalog nesting. However, intake does not support Python function arguments like those we provided to read_kwargs above, so if we want a functional intake YAML description of an intake-esm catalog with multi-variable assets, we need to use the columns_with_iterables argument instead. You can return an intake YAML description of an esm_datastore as follows:
cat.name = "my-esm-catalog"
print(cat.yaml())
sources:
  my-esm-catalog:
    args:
      columns_with_iterables:
      - variable
      obj: multi-variable-catalog.json
    description: ''
    driver: intake_esm.core.esm_datastore
    metadata: {}
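The underlying constraint is that a text description can only hold plain data. The sketch below illustrates this with the standard-library json module as a stand-in for YAML (an assumption made to keep the example dependency-free; intake's own serialization may behave differently in detail): a dictionary holding a function cannot be written out, while a list of column names round-trips unchanged.

```python
import ast
import json

# Arguments that contain a function object cannot be written out as plain data.
with_function = {"read_kwargs": {"converters": {"variable": ast.literal_eval}}}
try:
    json.dumps(with_function)
    serializable = True
except TypeError:
    serializable = False
print(serializable)  # False

# A list of column names is plain data and round-trips without loss.
with_columns = {"columns_with_iterables": ["variable"]}
restored = json.loads(json.dumps(with_columns))
print(restored == with_columns)  # True
```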