Use catalogs with assets containing multiple variables#

By default, intake-esm assumes that each data asset (file) contains a single variable (e.g. temperature or precipitation). If your data files contain multiple variables, intake-esm requires the following:

the variable_column of the catalog must contain iterables (list, tuple, set) of values (e.g. ['temperature', 'precipitation']).
the user must provide converters with appropriate functions for parsing values in the variable_column (and/or any other column with iterables) into iterables when loading the catalog. There are two ways to do this with the open_esm_datastore function: either pass the converter functions directly through the read_kwargs argument, or list the columns in the columns_with_iterables parameter. The latter is a shortcut for the former. Both are demonstrated below.

Inspect the catalog#

In the example below, we are going to use the following catalog to demonstrate how to work with multi-variable assets:

# Look at the catalog on disk
!cat multi-variable-catalog.csv

experiment,case,component,stream,variable,member_id,path,time_range
CTRL,b.e11.B1850C5CN.f09_g16.005,ocn,pop.h,"['SHF', 'REGION_MASK', 'ANGLE', 'DXU', 'KMT', 'NO2', 'O2']",5,../../../tests/sample_data/cesm-multi-variables/b.e11.B1850C5CN.f09_g16.005.pop.h.SHF-NO2-O2.050001-050012.nc,050001-050012
CTRL,b.e11.B1850C5CN.f09_g16.005,ocn,pop.h,"['SHF', 'REGION_MASK', 'ANGLE', 'DXU', 'KMT', 'NO2', 'O2']",5,../../../tests/sample_data/cesm-multi-variables/b.e11.B1850C5CN.f09_g16.005.pop.h.SHF-NO2-O2.050101-050112.nc,050101-050112
CTRL,b.e11.B1850C5CN.f09_g16.005,ocn,pop.h,"['SHF', 'REGION_MASK', 'ANGLE', 'DXU', 'KMT', 'NO2', 'PO4']",5,../../../tests/sample_data/cesm-multi-variables/b.e11.B1850C5CN.f09_g16.005.pop.h.SHF-NO2-PO4.050001-050012.nc,050001-050012
CTRL,b.e11.B1850C5CN.f09_g16.005,ocn,pop.h,"['SHF', 'REGION_MASK', 'ANGLE', 'DXU', 'KMT', 'NO2', 'PO4']",5,../../../tests/sample_data/cesm-multi-variables/b.e11.B1850C5CN.f09_g16.005.pop.h.SHF-NO2-PO4.050101-050112.nc,050101-050112
CTRL,b.e11.B1850C5CN.f09_g16.005,ocn,pop.h,"['SHF', 'REGION_MASK', 'ANGLE', 'DXU', 'KMT', 'TEMP', 'SiO3']",5,../../../tests/sample_data/cesm-multi-variables/b.e11.B1850C5CN.f09_g16.005.pop.h.SHF-TEMP-SiO3.050001-050012.nc,050001-050012

As you can see, the variable column contains a list of variables, and this list was serialized as a string: "['SHF', 'REGION_MASK', 'ANGLE', 'DXU', 'KMT', 'NO2', 'O2']".
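To make the serialization concrete, here is a minimal standard-library sketch (not intake-esm code) showing that such a cell is read back as a plain string until a converter such as ast.literal_eval is applied. The two-column excerpt and the file name file-a.nc are made up for illustration.

```python
import ast
import csv
import io

# Hypothetical two-column excerpt in the same shape as the catalog above.
raw = io.StringIO(
    'variable,path\n'
    '"[\'SHF\', \'NO2\', \'O2\']",file-a.nc\n'
)

row = next(csv.DictReader(raw))

# Straight from the CSV reader, the cell is still one string.
print(type(row["variable"]))

# ast.literal_eval safely evaluates the string as a Python literal,
# turning it back into a real list.
variables = ast.literal_eval(row["variable"])
print(variables)
```

Without this conversion step, a membership test such as "O2" in row["variable"] would be a substring check on text rather than a lookup in a list of variable names.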

Load the catalog#

import intake
import ast
import dask

# Make sure this is single-threaded
dask.config.set(scheduler='single-threaded')

cat = intake.open_esm_datastore(
    "multi-variable-catalog.json",
    read_kwargs={"converters": {"variable": ast.literal_eval}},
)
cat

sample-multi-variable-cesm1-lens catalog with 1 dataset(s) from 5 asset(s):

	unique
experiment	1
case	1
component	1
stream	1
variable	10
member_id	1
path	5
time_range	2
derived_variable	0

To confirm that intake-esm has loaded the catalog correctly, we can inspect the .has_multiple_variable_assets property:

cat.esmcat.has_multiple_variable_assets

True

Alternatively, we can specify the variable column name in the columns_with_iterables parameter:

cat = intake.open_esm_datastore(
    "multi-variable-catalog.json",
    columns_with_iterables=["variable"],
)
cat.esmcat.has_multiple_variable_assets

True
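The relationship between the two approaches can be sketched as follows. This is an illustration that assumes the shortcut simply builds one converter per listed column, as the description above suggests; converters_for is a hypothetical helper, and the actual intake-esm implementation may differ.

```python
import ast


def converters_for(columns):
    """Hypothetical helper: map each listed column to ast.literal_eval."""
    return {column: ast.literal_eval for column in columns}


# Equivalent in spirit to read_kwargs={"converters": {"variable": ast.literal_eval}}
converters = converters_for(["variable"])

# Apply the converters to one raw row, leaving all other columns untouched.
raw_row = {"variable": "['SHF', 'NO2', 'O2']", "member_id": "5"}
parsed_row = {
    column: converters.get(column, lambda value: value)(value)
    for column, value in raw_row.items()
}
print(parsed_row)
```

Only the listed column is parsed into a list; member_id stays a string because no converter was registered for it.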

Search for datasets#

The search functionality works in the same way:

cat_subset = cat.search(variable=["O2", "SiO3"])
cat_subset.df

	experiment	case	component	stream	variable	member_id	path	time_range
0	CTRL	b.e11.B1850C5CN.f09_g16.005	ocn	pop.h	(SHF, REGION_MASK, ANGLE, DXU, KMT, NO2, O2)	5	../../../tests/sample_data/cesm-multi-variable...	050001-050012
1	CTRL	b.e11.B1850C5CN.f09_g16.005	ocn	pop.h	(SHF, REGION_MASK, ANGLE, DXU, KMT, NO2, O2)	5	../../../tests/sample_data/cesm-multi-variable...	050101-050112
2	CTRL	b.e11.B1850C5CN.f09_g16.005	ocn	pop.h	(SHF, REGION_MASK, ANGLE, DXU, KMT, TEMP, SiO3)	5	../../../tests/sample_data/cesm-multi-variable...	050001-050012
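The result above is consistent with a "contains any of the requested values" match on the iterable column. The sketch below reproduces that selection in plain Python for the five assets of the example catalog. It is only an illustration of the observed behaviour, not intake-esm's search implementation, and matches_any is a hypothetical helper.

```python
# Variable tuples of the five assets, in the order they appear in the CSV.
common = ("SHF", "REGION_MASK", "ANGLE", "DXU", "KMT")
assets = [
    common + ("NO2", "O2"),
    common + ("NO2", "O2"),
    common + ("NO2", "PO4"),
    common + ("NO2", "PO4"),
    common + ("TEMP", "SiO3"),
]


def matches_any(asset_variables, requested):
    """Hypothetical helper: True if the asset holds at least one requested variable."""
    return bool(set(asset_variables) & set(requested))


requested = ["O2", "SiO3"]
matching_rows = [
    index for index, variables in enumerate(assets)
    if matches_any(variables, requested)
]
print(matching_rows)
```

The positions refer to rows of the original CSV. The three matching assets are the same ones listed in cat_subset.df, where the index has been renumbered from 0 to 2.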

Interactively search the catalog#

We can also use the interactive attribute of a catalog to interactively search the catalog. This will not save any searches, but allows you to explore the catalog in a quick and intuitive way.

cat.interactive


Load assets into xarray datasets#

When loading the data files into xarray datasets, intake-esm loads only the data variables that were requested. For example, if a data file contains ten data variables and the user requests two of them, intake-esm will load the two requested variables plus the necessary coordinate information.

dsets = cat_subset.to_dataset_dict()
dsets

--> The keys in the returned dictionary of datasets are constructed as follows:
	'component.experiment.stream'

100.00% [1/1 00:00<00:00]

/home/docs/checkouts/readthedocs.org/user_builds/intake-esm/checkouts/stable/intake_esm/source.py:308: FutureWarning: In a future version of xarray the default value for join will change from join='outer' to join='exact'. This change will result in the following ValueError: cannot be aligned with join='exact' because index/labels/sizes are not equal along these coordinates (dimensions): 'time' ('time',) The recommendation is to set join explicitly for this case.
  self._ds = xr.combine_by_coords(
/home/docs/checkouts/readthedocs.org/user_builds/intake-esm/checkouts/stable/intake_esm/source.py:308: FutureWarning: In a future version of xarray the default value for compat will change from compat='no_conflicts' to compat='override'. This is likely to lead to different results when combining overlapping variables with the same name. To opt in to new defaults and get rid of these warnings now use `set_options(use_new_combine_kwarg_defaults=True) or set compat explicitly.
  self._ds = xr.combine_by_coords(

{'ocn.CTRL.pop.h': <xarray.Dataset> Size: 1kB
 Dimensions:             (time: 24, member_id: 1, nlat: 2, nlon: 2)
 Coordinates: (12/36)
   * time                (time) object 192B 0500-02-01 00:00:00 ... 0502-02-01...
   * member_id           (member_id) object 8B '5'
     T0_Kelvin           float64 8B 273.1
     cp_air              float64 8B 1.005e+03
     cp_sw               float64 8B 3.996e+07
     days_in_norm_year   float64 8B 365.0
     ...                  ...
     stefan_boltzmann    float64 8B 5.67e-08
     vonkar              float64 8B 0.4
     TLAT                (nlat, nlon) float64 32B dask.array<chunksize=(2, 2), meta=np.ndarray>
     TLONG               (nlat, nlon) float64 32B dask.array<chunksize=(2, 2), meta=np.ndarray>
     ULAT                (nlat, nlon) float64 32B dask.array<chunksize=(2, 2), meta=np.ndarray>
     ULONG               (nlat, nlon) float64 32B dask.array<chunksize=(2, 2), meta=np.ndarray>
 Dimensions without coordinates: nlat, nlon
 Data variables:
     O2                  (member_id, time, nlat, nlon) float32 384B dask.array<chunksize=(1, 12, 2, 2), meta=np.ndarray>
     SiO3                (member_id, time, nlat, nlon) float32 384B dask.array<chunksize=(1, 12, 2, 2), meta=np.ndarray>
 Attributes: (12/23)
     title:                           b.e11.B1850C5CN.f09_g16.005
     history:                         Fri Oct 11 01:05:51 2013: /glade/apps/op...
     Conventions:                     CF-1.0; http://www.cgd.ucar.edu/cms/eato...
     contents:                        Diagnostic and Prognostic Variables
     source:                          CCSM POP2, the CCSM Ocean Component
     revision:                        $Id: tavg.F90 41939 2012-11-14 16:37:23Z...
     ...                              ...
     intake_esm_attrs:stream:         pop.h
     intake_esm_attrs:member_id:      5
     intake_esm_attrs:_data_format_:  netcdf
     intake_esm_attrs:path:           ../../../tests/sample_data/cesm-multi-va...
     intake_esm_attrs:time_range:     050001-050012
     intake_esm_dataset_key:          ocn.CTRL.pop.h}
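The merged dataset contains only O2 and SiO3 as data variables, even though each file holds seven. The bookkeeping behind that can be sketched in plain Python as below. This only illustrates which variables are kept per asset, under the assumption that the request is intersected with each asset's variable list; the asset labels are shortened, hypothetical names, and none of this is intake-esm's loading code.

```python
requested = {"O2", "SiO3"}

# Variable tuples of the two distinct kinds of asset matched by the search
# (labels are shortened, hypothetical names).
matched_assets = {
    "SHF-NO2-O2": ("SHF", "REGION_MASK", "ANGLE", "DXU", "KMT", "NO2", "O2"),
    "SHF-TEMP-SiO3": ("SHF", "REGION_MASK", "ANGLE", "DXU", "KMT", "TEMP", "SiO3"),
}

# For each asset, keep only the variables that were actually requested.
to_load = {
    label: [name for name in variables if name in requested]
    for label, variables in matched_assets.items()
}
print(to_load)
```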

Why does `intake.open_esm_datastore` need the `columns_with_iterables` parameter?#

Why does intake.open_esm_datastore need the columns_with_iterables argument when read_kwargs achieves the same result? Intake supports writing YAML descriptions of catalogs that can be opened with intake.open_catalog. These descriptions include the information required to open the catalog, such as the catalog driver (intake_esm.core.esm_datastore in our case) and the arguments to pass to that driver. They can also be included as entries in other catalogs, which enables features like catalog nesting. However, intake does not support Python functions as arguments in these descriptions, such as the converter we passed through read_kwargs above. A working intake YAML description of an intake-esm catalog with multi-variable assets therefore has to use the columns_with_iterables argument instead. You can return an intake YAML description of an esm_datastore as follows:

cat.name = "my-esm-catalog"
print(cat.yaml())

sources:
  my-esm-catalog:
    args:
      columns_with_iterables:
      - variable
      obj: multi-variable-catalog.json
    description: ''
    driver: intake_esm.core.esm_datastore
    metadata: {}
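The underlying limitation is that a text description can hold a list of column names but not a Python function object. The sketch below illustrates this with the standard-library json module as a stand-in for YAML. It is an analogy only and does not use intake's own serialization code.

```python
import ast
import json

# Arguments that reference a function object, as in the read_kwargs approach.
with_function = {"read_kwargs": {"converters": {"variable": ast.literal_eval}}}

# Arguments that only contain plain data, as in the columns_with_iterables approach.
with_names = {"columns_with_iterables": ["variable"]}

# Plain data round-trips through a text format without loss.
print(json.dumps(with_names))

# A function object has no plain-text representation, so serialization fails.
try:
    json.dumps(with_function)
    serializable = True
except TypeError as error:
    serializable = False
    print(f"cannot serialize: {error}")
```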