Use catalogs with assets containing multiple variables

By default, intake-esm assumes that each data asset (file) contains a single variable (e.g. temperature or precipitation). If your data files contain multiple variables, intake-esm requires the following:

  • the variable_column of the catalog must contain iterables (list, tuple, set) of values (e.g. ['temperature', 'precipitation']).

  • the user must provide a converters dictionary whose functions parse the serialized values in the variable_column (and in any other column that holds iterables) back into iterables when the catalog is loaded. This dictionary is passed via the read_csv_kwargs argument of the open_esm_datastore function.
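
As a rough illustration of what such a converter does, the sketch below applies a converters dictionary to the rows of a small in-memory CSV using only the standard library. It does not use intake-esm or pandas; the column names, the sample row, and the `apply_converters` helper are hypothetical and only mimic the idea of a per-column parsing function.

```python
import ast
import csv
import io

# Illustrative CSV: the "variable" column holds a list serialized as a string.
raw_csv = io.StringIO('''stream,variable
pop.h,"['SHF', 'O2']"
''')

# One parsing function per column that needs converting.
converters = {"variable": ast.literal_eval}


def apply_converters(row, converters):
    """Hypothetical helper: return a copy of row with converters applied."""
    return {
        column: converters[column](value) if column in converters else value
        for column, value in row.items()
    }


rows = [apply_converters(row, converters) for row in csv.DictReader(raw_csv)]

# Without the converter this value would be the string "['SHF', 'O2']".
print(type(rows[0]["variable"]).__name__, rows[0]["variable"])
```

`ast.literal_eval` is a reasonable choice here because it only evaluates Python literals, so a malformed or unexpected cell raises an error instead of executing code.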

Inspect the catalog

In the example below, we are going to use the following catalog to demonstrate how to work with multi-variable assets:

# Look at the catalog on disk
!cat multi-variable-catalog.csv
CTRL,b.e11.B1850C5CN.f09_g16.005,ocn,pop.h,"['SHF', 'REGION_MASK', 'ANGLE', 'DXU', 'KMT', 'NO2', 'O2']",5,../../../tests/sample_data/cesm-multi-variables/,050001-050012
CTRL,b.e11.B1850C5CN.f09_g16.005,ocn,pop.h,"['SHF', 'REGION_MASK', 'ANGLE', 'DXU', 'KMT', 'NO2', 'O2']",5,../../../tests/sample_data/cesm-multi-variables/,050101-050112
CTRL,b.e11.B1850C5CN.f09_g16.005,ocn,pop.h,"['SHF', 'REGION_MASK', 'ANGLE', 'DXU', 'KMT', 'NO2', 'PO4']",5,../../../tests/sample_data/cesm-multi-variables/,050001-050012
CTRL,b.e11.B1850C5CN.f09_g16.005,ocn,pop.h,"['SHF', 'REGION_MASK', 'ANGLE', 'DXU', 'KMT', 'NO2', 'PO4']",5,../../../tests/sample_data/cesm-multi-variables/,050101-050112
CTRL,b.e11.B1850C5CN.f09_g16.005,ocn,pop.h,"['SHF', 'REGION_MASK', 'ANGLE', 'DXU', 'KMT', 'TEMP', 'SiO3']",5,../../../tests/sample_data/cesm-multi-variables/,050001-050012

As you can see, the variable column contains a list of variables, and this list was serialized as a string: "['SHF', 'REGION_MASK', 'ANGLE', 'DXU', 'KMT', 'NO2', 'O2']".
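
To show how a list ends up in this form, here is a minimal standard-library sketch that writes a list of variable names into a CSV field and reads it back. The two-column layout is a simplification chosen for illustration, not the layout of the catalog above.

```python
import ast
import csv
import io

variables = ["SHF", "REGION_MASK", "ANGLE", "DXU", "KMT", "NO2", "O2"]

buffer = io.StringIO()
writer = csv.writer(buffer)
# str() of a list gives "['SHF', 'REGION_MASK', ...]"; csv.writer wraps the
# field in double quotes because it contains commas.
writer.writerow(["pop.h", str(variables)])

serialized_line = buffer.getvalue().strip()
print(serialized_line)

# Read back, the field is a plain string until it is parsed explicitly.
field = next(csv.reader(io.StringIO(serialized_line)))[1]
restored = ast.literal_eval(field)
print(type(field).__name__, type(restored).__name__)
```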

Load the catalog

import intake
import ast
import dask

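# Make sure this is single-threaded
dask.config.set(scheduler="single-threaded")

cat = intake.open_esm_datastore(
    # Assumed path to the catalog's JSON description, alongside the CSV shown above
    "multi-variable-catalog.json",
    # Parse the serialized lists in the "variable" column back into Python lists
    read_csv_kwargs={"converters": {"variable": ast.literal_eval}},
)
cat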

sample-multi-variable-cesm1-lens catalog with 1 dataset(s) from 5 asset(s):

experiment 1
case 1
component 1
stream 1
variable 10
member_id 1
path 5
time_range 2
derived_variable 0

To confirm that intake-esm has loaded the catalog correctly, we can inspect the .has_multiple_variable_assets property:

cat.esmcat.has_multiple_variable_assets

Search for datasets

The search functionality works in the same way:

cat_subset = cat.search(variable=["O2", "SiO3"])
cat_subset.df
experiment case component stream variable member_id path time_range
0 CTRL b.e11.B1850C5CN.f09_g16.005 ocn pop.h [SHF, REGION_MASK, ANGLE, DXU, KMT, NO2, O2] 5 ../../../tests/sample_data/cesm-multi-variable... 050001-050012
1 CTRL b.e11.B1850C5CN.f09_g16.005 ocn pop.h [SHF, REGION_MASK, ANGLE, DXU, KMT, NO2, O2] 5 ../../../tests/sample_data/cesm-multi-variable... 050101-050112
2 CTRL b.e11.B1850C5CN.f09_g16.005 ocn pop.h [SHF, REGION_MASK, ANGLE, DXU, KMT, TEMP, SiO3] 5 ../../../tests/sample_data/cesm-multi-variable... 050001-050012
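
The result above is consistent with a row matching when its variable list contains at least one of the requested names. The sketch below reproduces that selection in plain Python. It is an assumption about the matching rule rather than intake-esm's actual implementation, and the `matches` helper is hypothetical.

```python
# Variable lists of the five assets in the catalog shown earlier.
catalog_variables = [
    ["SHF", "REGION_MASK", "ANGLE", "DXU", "KMT", "NO2", "O2"],
    ["SHF", "REGION_MASK", "ANGLE", "DXU", "KMT", "NO2", "O2"],
    ["SHF", "REGION_MASK", "ANGLE", "DXU", "KMT", "NO2", "PO4"],
    ["SHF", "REGION_MASK", "ANGLE", "DXU", "KMT", "NO2", "PO4"],
    ["SHF", "REGION_MASK", "ANGLE", "DXU", "KMT", "TEMP", "SiO3"],
]


def matches(asset_variables, requested):
    """Hypothetical rule: True if the asset holds any requested variable."""
    return not set(requested).isdisjoint(asset_variables)


requested = ["O2", "SiO3"]
selected = [
    index
    for index, asset_variables in enumerate(catalog_variables)
    if matches(asset_variables, requested)
]

# Positions of the matching rows in the original five-row catalog.
print(selected)
```

Under this rule the two PO4 assets are dropped, which agrees with the three rows returned by the search.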

Load assets into xarray datasets

When loading the data files into xarray datasets, intake-esm loads only the data variables that were requested. For example, if a data file contains ten data variables and the user requests two of them, intake-esm loads those two variables plus the necessary coordinate information.
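
The selection described above can be pictured as a per-asset filter: for each file, keep only the requested names that the file actually contains. The following is a simplified sketch of that idea in plain Python, not intake-esm's code, and `variables_to_load` is a hypothetical helper.

```python
def variables_to_load(asset_variables, requested):
    """Hypothetical helper: requested names that this asset contains."""
    return [name for name in asset_variables if name in requested]


requested = {"O2", "SiO3"}

# Variable lists taken from two of the catalog rows shown earlier.
asset_with_o2 = ["SHF", "REGION_MASK", "ANGLE", "DXU", "KMT", "NO2", "O2"]
asset_with_sio3 = ["SHF", "REGION_MASK", "ANGLE", "DXU", "KMT", "TEMP", "SiO3"]

print(variables_to_load(asset_with_o2, requested))
print(variables_to_load(asset_with_sio3, requested))
```

Each asset contributes only its own requested variable, and the remaining data variables in the file (SHF, KMT, and so on) are left out of the data variables of the resulting dataset.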

dsets = cat_subset.to_dataset_dict()
--> The keys in the returned dictionary of datasets are constructed as follows:
100.00% [1/1 00:00<00:00]
{'ocn.CTRL.pop.h': <xarray.Dataset>
 Dimensions:             (time: 24, member_id: 1, nlat: 2, nlon: 2)
 Coordinates: (12/36)
   * time                (time) object 0500-02-01 00:00:00 ... 0502-02-01 00:0...
   * member_id           (member_id) int64 5
     T0_Kelvin           float64 273.1
     TLAT                (nlat, nlon) float64 dask.array<chunksize=(2, 2), meta=np.ndarray>
     TLONG               (nlat, nlon) float64 dask.array<chunksize=(2, 2), meta=np.ndarray>
     ULAT                (nlat, nlon) float64 dask.array<chunksize=(2, 2), meta=np.ndarray>
     ...                  ...
     salt_to_ppt         float64 1e+03
     sea_ice_salinity    float64 4.0
     sflux_factor        float64 0.1
     sound               float64 1.5e+05
     stefan_boltzmann    float64 5.67e-08
     vonkar              float64 0.4
 Dimensions without coordinates: nlat, nlon
 Data variables:
     O2                  (member_id, time, nlat, nlon) float32 dask.array<chunksize=(1, 12, 2, 2), meta=np.ndarray>
     SiO3                (member_id, time, nlat, nlon) float32 dask.array<chunksize=(1, 24, 2, 2), meta=np.ndarray>
 Attributes: (12/23)
     title:                           b.e11.B1850C5CN.f09_g16.005
     history:                         Fri Oct 11 01:05:51 2013: /glade/apps/op...
     Conventions:                     CF-1.0;
     contents:                        Diagnostic and Prognostic Variables
     source:                          CCSM POP2, the CCSM Ocean Component
     revision:                        $Id: tavg.F90 41939 2012-11-14 16:37:23Z...
     ...                              ...
     intake_esm_attrs:stream:         pop.h
     intake_esm_attrs:member_id:      5
     intake_esm_attrs:_data_format_:  netcdf
     intake_esm_attrs:path:           ../../../tests/sample_data/cesm-multi-va...
     intake_esm_attrs:time_range:     050001-050012
     intake_esm_dataset_key:          ocn.CTRL.pop.h}

import intake_esm  # just to display version information

intake_esm.show_versions()

cftime: 1.6.2
dask: 2023.3.1
fastprogress: 1.0.3
fsspec: 2023.3.0
gcsfs: 2023.3.0
intake: 0.6.8
intake_esm: 2022.9.18.post30+dirty
netCDF4: 1.6.3
pandas: 1.5.3
requests: 2.28.2
s3fs: 2023.3.0
xarray: 2023.2.0
zarr: 2.14.2