In addition to catalogs of data assets (files) in time-series (single-variable) format, intake-esm supports catalogs with data assets in time-slice (history) format and/or files with multiple variables. For intake-esm to work properly with multi-variable assets, two conditions must hold. First, the variable_column of the catalog must contain iterables (list, tuple, set) of values. Second, the user must specify a dictionary of functions for converting values in certain columns into iterables. This is done via the csv_kwargs argument.
In the example below, we use the following catalog to demonstrate how to work with multi-variable assets:
# Look at the catalog on disk
!cat multi-variable-catalog.csv
experiment,case,component,stream,variable,member_id,path,time_range
CTRL,b.e11.B1850C5CN.f09_g16.005,ocn,pop.h,"['SHF', 'REGION_MASK', 'ANGLE', 'DXU', 'KMT', 'NO2', 'O2']",5,../../../tests/sample_data/cesm-multi-variables/b.e11.B1850C5CN.f09_g16.005.pop.h.SHF-NO2-O2.050001-050012.nc,050001-050012
CTRL,b.e11.B1850C5CN.f09_g16.005,ocn,pop.h,"['SHF', 'REGION_MASK', 'ANGLE', 'DXU', 'KMT', 'NO2', 'O2']",5,../../../tests/sample_data/cesm-multi-variables/b.e11.B1850C5CN.f09_g16.005.pop.h.SHF-NO2-O2.050101-050112.nc,050101-050112
CTRL,b.e11.B1850C5CN.f09_g16.005,ocn,pop.h,"['SHF', 'REGION_MASK', 'ANGLE', 'DXU', 'KMT', 'NO2', 'PO4']",5,../../../tests/sample_data/cesm-multi-variables/b.e11.B1850C5CN.f09_g16.005.pop.h.SHF-NO2-PO4.050001-050012.nc,050001-050012
CTRL,b.e11.B1850C5CN.f09_g16.005,ocn,pop.h,"['SHF', 'REGION_MASK', 'ANGLE', 'DXU', 'KMT', 'NO2', 'PO4']",5,../../../tests/sample_data/cesm-multi-variables/b.e11.B1850C5CN.f09_g16.005.pop.h.SHF-NO2-PO4.050101-050112.nc,050101-050112
CTRL,b.e11.B1850C5CN.f09_g16.005,ocn,pop.h,"['SHF', 'REGION_MASK', 'ANGLE', 'DXU', 'KMT', 'TEMP', 'SiO3']",5,../../../tests/sample_data/cesm-multi-variables/b.e11.B1850C5CN.f09_g16.005.pop.h.SHF-TEMP-SiO3.050001-050012.nc,050001-050012
As you can see, the variable column contains a list of variables, and this list was serialized as a string: "['SHF', 'REGION_MASK', 'ANGLE', 'DXU', 'KMT', 'NO2', 'O2']".
To load a catalog with multi-variable files, we must pass additional information to open_esm_datastore via the csv_kwargs argument. We specify a dictionary of functions for converting the values in the variable column into iterables. Here we use the literal_eval function from the standard ast module:
import ast
import intake
col = intake.open_esm_datastore(
    "multi-variable-collection.json",
    csv_kwargs={"converters": {"variable": ast.literal_eval}},
)
col
sample-multi-variable-cesm1-lens catalog with 1 dataset(s) from 5 asset(s):
col.df.head()
The in-memory representation of the catalog now stores each entry of the variable column as a tuple of values. To confirm that intake-esm has registered this catalog as containing multi-variable assets, we can check the ._multiple_variable_assets property:
col._multiple_variable_assets
True
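To see what the converter does to a single cell, independent of intake-esm, the following minimal sketch parses two rows copied from the catalog above with the standard csv module and applies ast.literal_eval to the variable field. Trimming the rows to three columns is our simplification. The sketch illustrates the conversion only: it is not how intake-esm reads the file, and it stops at the list returned by literal_eval rather than the tuples held in the in-memory catalog.

```python
import ast
import csv
import io

# Two rows copied from the catalog above, trimmed to three columns for brevity
# (the trimming is an assumption made for this sketch).
raw_csv = """experiment,variable,time_range
CTRL,"['SHF', 'REGION_MASK', 'ANGLE', 'DXU', 'KMT', 'NO2', 'O2']",050001-050012
CTRL,"['SHF', 'REGION_MASK', 'ANGLE', 'DXU', 'KMT', 'TEMP', 'SiO3']",050001-050012
"""

rows = list(csv.DictReader(io.StringIO(raw_csv)))

# Straight off disk, the cell is a single string, not a list.
raw_value = rows[0]["variable"]
print(type(raw_value).__name__)

# literal_eval evaluates the Python literal inside the string and returns a list.
parsed = [ast.literal_eval(row["variable"]) for row in rows]
print(type(parsed[0]).__name__, parsed[0][-1])
```

literal_eval only accepts Python literals (strings, numbers, lists, tuples, dicts, sets, booleans, None), so a cell containing arbitrary code raises an error instead of being executed.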
The search functionality works in the same way as it does for single-variable catalogs:
col_subset = col.search(variable=["O2", "SiO3"])
col_subset.df
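As a rough mental model of a query against a tuple-valued column, the sketch below keeps every asset whose variable tuple shares at least one name with the request. This is not intake-esm's implementation: the any-match rule is an assumption, chosen because it is consistent with the dataset shown further down containing both O2 and SiO3, and search_by_variable and the assets list are hypothetical names. The variable tuples and time ranges are copied from the catalog above.

```python
# Variables shared by every asset in the catalog above.
common = ("SHF", "REGION_MASK", "ANGLE", "DXU", "KMT")

# Simplified stand-in for the catalog: one dict per asset, keeping only the
# variable tuple and the time range.
assets = [
    {"variable": common + ("NO2", "O2"), "time_range": "050001-050012"},
    {"variable": common + ("NO2", "O2"), "time_range": "050101-050112"},
    {"variable": common + ("NO2", "PO4"), "time_range": "050001-050012"},
    {"variable": common + ("NO2", "PO4"), "time_range": "050101-050112"},
    {"variable": common + ("TEMP", "SiO3"), "time_range": "050001-050012"},
]


def search_by_variable(assets, wanted):
    """Keep assets whose variable tuple shares at least one name with `wanted`."""
    wanted = set(wanted)
    return [asset for asset in assets if wanted.intersection(asset["variable"])]


subset = search_by_variable(assets, ["O2", "SiO3"])
print(len(subset))
print([asset["time_range"] for asset in subset])
```

Under this assumption the two PO4 files are dropped and the two O2 files plus the single SiO3 file remain.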
Loading data assets into xarray datasets works in the same way too:
col_subset.to_dataset_dict(cdf_kwargs={})
--> The keys in the returned dictionary of datasets are constructed as follows: 'component.experiment.stream'
{'ocn.CTRL.pop.h': <xarray.Dataset>
 Dimensions:    (member_id: 1, nlat: 2, nlon: 2, time: 24)
 Coordinates:
   * time       (time) object 0500-02-01 00:00:00 ... 0502-02-01 00:00:00
     TLAT       (nlat, nlon) float64 dask.array<chunksize=(2, 2), meta=np.ndarray>
     TLONG      (nlat, nlon) float64 dask.array<chunksize=(2, 2), meta=np.ndarray>
     ULAT       (nlat, nlon) float64 dask.array<chunksize=(2, 2), meta=np.ndarray>
     ULONG      (nlat, nlon) float64 dask.array<chunksize=(2, 2), meta=np.ndarray>
   * member_id  (member_id) int64 5
 Dimensions without coordinates: nlat, nlon
 Data variables:
     O2         (member_id, time, nlat, nlon) float32 dask.array<chunksize=(1, 12, 2, 2), meta=np.ndarray>
     SiO3       (member_id, time, nlat, nlon) float32 dask.array<chunksize=(1, 24, 2, 2), meta=np.ndarray>
 Attributes:
     contents:                  Diagnostic and Prognostic Variables
     nsteps_total:              1953500
     tavg_sum:                  2678400.0
     calendar:                  All years have exactly 365 days.
     tavg_sum_qflux:            2678400.0
     source:                    CCSM POP2, the CCSM Ocean Component
     NCO:                       4.3.4
     title:                     b.e11.B1850C5CN.f09_g16.005
     Conventions:               CF-1.0; http://www.cgd.ucar.edu/cms/eaton/netc...
     cell_methods:              cell_methods = time: mean ==> the variable val...
     history:                   Fri Oct 11 01:05:51 2013: /glade/apps/opt/nco/...
     intake_esm_varname:        O2\nSiO3
     nco_openmp_thread_number:  1
     revision:                  $Id: tavg.F90 41939 2012-11-14 16:37:23Z mlevy...
     start_time:                This dataset was created on 2013-05-28 at 02:4...
     intake_esm_dataset_key:    ocn.CTRL.pop.h}
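The key pattern 'component.experiment.stream' reported in the message above can be reproduced by hand. The sketch below joins the corresponding column values of the selected assets with a dot. It is an illustration of the naming rule only, not intake-esm's own code, and the names asset and key_columns are hypothetical.

```python
# Column values shared by the selected assets, taken from the catalog above.
asset = {"component": "ocn", "experiment": "CTRL", "stream": "pop.h"}

# Column order as given in the message: 'component.experiment.stream'.
key_columns = ["component", "experiment", "stream"]

# Join the values with a dot to form the dictionary key.
key = ".".join(asset[column] for column in key_columns)
print(key)  # ocn.CTRL.pop.h
```

Because the stream value pop.h itself contains a dot, a key cannot be split back into its three parts reliably on "." alone.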