API Reference#

This page provides an auto-generated summary of intake-esm’s API. For more details and examples, refer to the relevant chapters in the main part of the documentation.

ESM Datastore (intake.open_esm_datastore)#

class intake_esm.core.esm_datastore(*args, **kwargs)[source]

An intake plugin for parsing an ESM (Earth System Model) Catalog and loading assets (netCDF files and/or Zarr stores) into xarray datasets. The in-memory representation for the catalog is a Pandas DataFrame.

Parameters
  • obj (str, dict) – If string, this must be a path or URL to an ESM catalog JSON file. If dict, this must be a dict representation of an ESM catalog. This dict must have two keys: ‘esmcat’ and ‘df’. The ‘esmcat’ key must be a dict representation of the ESM catalog and the ‘df’ key must be a Pandas DataFrame containing content that would otherwise be in a CSV file.

  • sep (str, optional) – Delimiter to use when constructing a key for a query, by default ‘.’

  • registry (DerivedVariableRegistry, optional) – Registry of derived variables to use, by default None. If not provided, uses the default registry.

  • read_csv_kwargs (dict, optional) – Additional keyword arguments passed through to the read_csv() function.

  • storage_options (dict, optional) – fsspec parameters passed to the backend file system, such as Google Cloud Storage or Amazon Web Services S3.

  • intake_kwargs (dict, optional) – Additional keyword arguments are passed through to the Catalog base class.

Examples

At import time, this plugin is available in intake’s registry as esm_datastore and can be accessed with intake.open_esm_datastore():

>>> import intake
>>> url = "https://storage.googleapis.com/cmip6/pangeo-cmip6.json"
>>> cat = intake.open_esm_datastore(url)
>>> cat.df.head()
activity_id institution_id source_id experiment_id  ... variable_id grid_label                                             zstore dcpp_init_year
0  AerChemMIP            BCC  BCC-ESM1        ssp370  ...          pr         gn  gs://cmip6/AerChemMIP/BCC/BCC-ESM1/ssp370/r1i1...            NaN
1  AerChemMIP            BCC  BCC-ESM1        ssp370  ...        prsn         gn  gs://cmip6/AerChemMIP/BCC/BCC-ESM1/ssp370/r1i1...            NaN
2  AerChemMIP            BCC  BCC-ESM1        ssp370  ...         tas         gn  gs://cmip6/AerChemMIP/BCC/BCC-ESM1/ssp370/r1i1...            NaN
3  AerChemMIP            BCC  BCC-ESM1        ssp370  ...      tasmax         gn  gs://cmip6/AerChemMIP/BCC/BCC-ESM1/ssp370/r1i1...            NaN
4  AerChemMIP            BCC  BCC-ESM1        ssp370  ...      tasmin         gn  gs://cmip6/AerChemMIP/BCC/BCC-ESM1/ssp370/r1i1...            NaN
__getitem__(key)[source]

This method takes a key argument and returns a data source corresponding to the assets (files) that will be aggregated into a single xarray dataset.

Parameters

key (str) – key to use for catalog entry lookup

Returns

intake_esm.source.ESMDataSource – A data source by name (key)

Raises

KeyError – if key is not found.

Examples

>>> cat = intake.open_esm_datastore("mycatalog.json")
>>> data_source = cat["AerChemMIP.BCC.BCC-ESM1.piClim-control.AERmon.gn"]
keys()[source]

Get keys for the catalog entries

Returns

list – keys for the catalog entries
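
Examples

A minimal sketch using the sample catalog shown under keys_info() below; the keys are built from the catalog’s groupby attributes, so the exact values depend on the catalog:

>>> import intake
>>> cat = intake.open_esm_datastore("./tests/sample-catalogs/cesm1-lens-netcdf.json")
>>> cat.keys()
['ocn.20C.pop.h', 'ocn.CTRL.pop.h', 'ocn.RCP85.pop.h']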

keys_info()[source]

Get keys for the catalog entries and their metadata

Returns

pandas.DataFrame – keys for the catalog entries and their metadata

Examples

>>> import intake
>>> cat = intake.open_esm_datastore("./tests/sample-catalogs/cesm1-lens-netcdf.json")
>>> cat.keys_info()
                component experiment stream
key
ocn.20C.pop.h         ocn        20C  pop.h
ocn.CTRL.pop.h        ocn       CTRL  pop.h
ocn.RCP85.pop.h       ocn      RCP85  pop.h
nunique()[source]

Count distinct observations across dataframe columns in the catalog.

Examples

>>> import intake
>>> cat = intake.open_esm_datastore("pangeo-cmip6.json")
>>> cat.nunique()
activity_id          10
institution_id       23
source_id            48
experiment_id        29
member_id            86
table_id             19
variable_id         187
grid_label            7
zstore            27437
dcpp_init_year       59
dtype: int64
search(require_all_on=None, **query)[source]

Search for entries in the catalog.

Parameters
  • require_all_on (list, str, optional) – A dataframe column or a list of dataframe columns across which all entries must satisfy the query criteria. If None, return entries that fulfill any of the criteria specified in the query, by default None.

  • **query – keyword arguments corresponding to user’s query to execute against the dataframe.

Returns

cat (esm_datastore) – A new Catalog with a subset of the entries in this Catalog.

Examples

>>> import intake
>>> cat = intake.open_esm_datastore("pangeo-cmip6.json")
>>> cat.df.head(3)
activity_id institution_id source_id  ... grid_label                                             zstore dcpp_init_year
0  AerChemMIP            BCC  BCC-ESM1  ...         gn  gs://cmip6/AerChemMIP/BCC/BCC-ESM1/ssp370/r1i1...            NaN
1  AerChemMIP            BCC  BCC-ESM1  ...         gn  gs://cmip6/AerChemMIP/BCC/BCC-ESM1/ssp370/r1i1...            NaN
2  AerChemMIP            BCC  BCC-ESM1  ...         gn  gs://cmip6/AerChemMIP/BCC/BCC-ESM1/ssp370/r1i1...            NaN
>>> sub_cat = cat.search(
...     source_id=["BCC-CSM2-MR", "CNRM-CM6-1", "CNRM-ESM2-1"],
...     experiment_id=["historical", "ssp585"],
...     variable_id="pr",
...     table_id="Amon",
...     grid_label="gn",
... )
>>> sub_cat.df.head(3)
    activity_id institution_id    source_id  ... grid_label                                             zstore dcpp_init_year
260        CMIP            BCC  BCC-CSM2-MR  ...         gn  gs://cmip6/CMIP/BCC/BCC-CSM2-MR/historical/r1i...            NaN
346        CMIP            BCC  BCC-CSM2-MR  ...         gn  gs://cmip6/CMIP/BCC/BCC-CSM2-MR/historical/r2i...            NaN
401        CMIP            BCC  BCC-CSM2-MR  ...         gn  gs://cmip6/CMIP/BCC/BCC-CSM2-MR/historical/r3i...            NaN

The search() method also accepts compiled regular expression objects from re.compile() as patterns.

>>> import re
>>> # Let's search for variables containing "Frac" in their name
>>> pat = re.compile(r"Frac")  # Define a regular expression
>>> sub_cat = cat.search(variable_id=pat)
>>> sub_cat.df.head().variable_id
0     residualFrac
1    landCoverFrac
2    landCoverFrac
3     residualFrac
4    landCoverFrac
serialize(name, directory=None, catalog_type='dict', to_csv_kwargs=None, json_dump_kwargs=None, storage_options=None)[source]

Serialize catalog to corresponding json and csv files.

Parameters
  • name (str) – name to use when creating ESM catalog json file and csv catalog.

  • directory (str, PathLike, default None) – The path to the local directory. If None, use the current directory

  • catalog_type (str, default 'dict') – Whether to save the catalog table as a dictionary in the JSON file or as a separate CSV file. Valid options are 'dict' and 'file'.

  • to_csv_kwargs (dict, optional) – Additional keyword arguments passed through to the to_csv() method.

  • json_dump_kwargs (dict, optional) – Additional keyword arguments passed through to the dump() function.

  • storage_options (dict) – fsspec parameters passed to the backend file system, such as Google Cloud Storage or Amazon Web Services S3.

Notes

Large catalogs can result in large JSON files. To keep the JSON file size manageable, call with catalog_type='file' to save the catalog as a separate CSV file.

Examples

>>> import intake
>>> cat = intake.open_esm_datastore("pangeo-cmip6.json")
>>> cat_subset = cat.search(
...     source_id="BCC-ESM1",
...     grid_label="gn",
...     table_id="Amon",
...     experiment_id="historical",
... )
>>> cat_subset.serialize(name="cmip6_bcc_esm1", catalog_type="file")
to_dask(**kwargs)[source]

Convert result to an xarray dataset.

This is only possible if the search returned exactly one result.

Parameters

kwargs (dict) – Parameters forwarded to to_dataset_dict().

Returns

Dataset
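
Examples

A minimal sketch, assuming the query narrows the catalog down to exactly one dataset (the identifiers are illustrative):

>>> import intake
>>> cat = intake.open_esm_datastore("pangeo-cmip6.json")
>>> ds = cat.search(
...     source_id="BCC-CSM2-MR",
...     experiment_id="historical",
...     table_id="Amon",
...     variable_id="pr",
...     grid_label="gn",
...     member_id="r1i1p1f1",
... ).to_dask()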

to_dataset_dict(xarray_open_kwargs=None, xarray_combine_by_coords_kwargs=None, preprocess=None, storage_options=None, progressbar=None, aggregate=None, skip_on_error=False, **kwargs)[source]

Load catalog entries into a dictionary of xarray datasets.

Column values, dataset keys and requested variables are added as global attributes on the returned datasets. The names of these attributes can be customized with intake_esm.utils.set_options.

Parameters
  • xarray_open_kwargs (dict) – Keyword arguments to pass to the open_dataset() function.

  • xarray_combine_by_coords_kwargs (dict) – Keyword arguments to pass to the combine_by_coords() function.

  • preprocess (callable, optional) – If provided, call this function on each dataset prior to aggregation.

  • storage_options (dict, optional) – fsspec parameters passed to the backend file system, such as Google Cloud Storage or Amazon Web Services S3.

  • progressbar (bool) – If True, print a progress bar to standard error (stderr) while loading assets into the dataset.

  • aggregate (bool, optional) – If False, no aggregation will be done.

  • skip_on_error (bool, optional) – If True, skip datasets that cannot be loaded and/or variables that cannot be derived.

Returns

dsets (dict) – A dictionary of xarray Datasets.

Examples

>>> import intake
>>> cat = intake.open_esm_datastore("glade-cmip6.json")
>>> sub_cat = cat.search(
...     source_id=["BCC-CSM2-MR", "CNRM-CM6-1", "CNRM-ESM2-1"],
...     experiment_id=["historical", "ssp585"],
...     variable_id="pr",
...     table_id="Amon",
...     grid_label="gn",
... )
>>> dsets = sub_cat.to_dataset_dict()
>>> dsets.keys()
dict_keys(['CMIP.BCC.BCC-CSM2-MR.historical.Amon.gn', 'ScenarioMIP.BCC.BCC-CSM2-MR.ssp585.Amon.gn'])
>>> dsets["CMIP.BCC.BCC-CSM2-MR.historical.Amon.gn"]
<xarray.Dataset>
Dimensions:    (bnds: 2, lat: 160, lon: 320, member_id: 3, time: 1980)
Coordinates:
* lon        (lon) float64 0.0 1.125 2.25 3.375 ... 355.5 356.6 357.8 358.9
* lat        (lat) float64 -89.14 -88.03 -86.91 -85.79 ... 86.91 88.03 89.14
* time       (time) object 1850-01-16 12:00:00 ... 2014-12-16 12:00:00
* member_id  (member_id) <U8 'r1i1p1f1' 'r2i1p1f1' 'r3i1p1f1'
Dimensions without coordinates: bnds
Data variables:
    lat_bnds   (lat, bnds) float64 dask.array<chunksize=(160, 2), meta=np.ndarray>
    lon_bnds   (lon, bnds) float64 dask.array<chunksize=(320, 2), meta=np.ndarray>
    time_bnds  (time, bnds) object dask.array<chunksize=(1980, 2), meta=np.ndarray>
    pr         (member_id, time, lat, lon) float32 dask.array<chunksize=(1, 600, 160, 320), meta=np.ndarray>
to_datatree(xarray_open_kwargs=None, xarray_combine_by_coords_kwargs=None, preprocess=None, storage_options=None, progressbar=None, aggregate=None, skip_on_error=False, **kwargs)[source]

Load catalog entries into a tree of xarray datasets.

Parameters
  • xarray_open_kwargs (dict) – Keyword arguments to pass to the open_dataset() function.

  • xarray_combine_by_coords_kwargs (dict) – Keyword arguments to pass to the combine_by_coords() function.

  • preprocess (callable, optional) – If provided, call this function on each dataset prior to aggregation.

  • storage_options (dict, optional) – fsspec parameters passed to the backend file system, such as Google Cloud Storage or Amazon Web Services S3.

  • progressbar (bool) – If True, print a progress bar to standard error (stderr) while loading assets into the dataset.

  • aggregate (bool, optional) – If False, no aggregation will be done.

  • skip_on_error (bool, optional) – If True, skip datasets that cannot be loaded and/or variables that cannot be derived.

Returns

dsets (DataTree) – A tree of xarray Datasets.

Examples

>>> import intake
>>> cat = intake.open_esm_datastore("glade-cmip6.json")
>>> sub_cat = cat.search(
...     source_id=["BCC-CSM2-MR", "CNRM-CM6-1", "CNRM-ESM2-1"],
...     experiment_id=["historical", "ssp585"],
...     variable_id="pr",
...     table_id="Amon",
...     grid_label="gn",
... )
>>> dsets = sub_cat.to_datatree()
>>> dsets["CMIP/BCC.BCC-CSM2-MR/historical/Amon/gn"].ds
<xarray.Dataset>
Dimensions:    (bnds: 2, lat: 160, lon: 320, member_id: 3, time: 1980)
Coordinates:
* lon        (lon) float64 0.0 1.125 2.25 3.375 ... 355.5 356.6 357.8 358.9
* lat        (lat) float64 -89.14 -88.03 -86.91 -85.79 ... 86.91 88.03 89.14
* time       (time) object 1850-01-16 12:00:00 ... 2014-12-16 12:00:00
* member_id  (member_id) <U8 'r1i1p1f1' 'r2i1p1f1' 'r3i1p1f1'
Dimensions without coordinates: bnds
Data variables:
    lat_bnds   (lat, bnds) float64 dask.array<chunksize=(160, 2), meta=np.ndarray>
    lon_bnds   (lon, bnds) float64 dask.array<chunksize=(320, 2), meta=np.ndarray>
    time_bnds  (time, bnds) object dask.array<chunksize=(1980, 2), meta=np.ndarray>
    pr         (member_id, time, lat, lon) float32 dask.array<chunksize=(1, 600, 160, 320), meta=np.ndarray>
unique()[source]

Return unique values for given columns in the catalog.
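
Examples

A minimal sketch; unique() returns a pandas Series whose entries are the unique values found in each column (output omitted):

>>> import intake
>>> cat = intake.open_esm_datastore("pangeo-cmip6.json")
>>> uniques = cat.unique()
>>> uniques["experiment_id"]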

property df

Return pandas DataFrame.

property key_template

Return string template used to create catalog entry keys

Returns

str – string template used to create catalog entry keys
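
Examples

A sketch for the CMIP6 catalog used above; the template joins the groupby attributes with the sep character, matching keys such as 'CMIP.BCC.BCC-CSM2-MR.historical.Amon.gn':

>>> cat.key_template
'activity_id.institution_id.source_id.experiment_id.table_id.grid_label'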

ESM DataSource#

class intake_esm.source.ESMDataSource(*args, **kwargs)[source]
__init__(key, records, variable_column_name, path_column_name, data_format, format_column_name, *, aggregations=None, requested_variables=None, preprocess=None, storage_options=None, xarray_open_kwargs=None, xarray_combine_by_coords_kwargs=None, intake_kwargs=None)[source]

An intake-compatible data source for ESM data.

Parameters
  • key (str) – The key of the data source.

  • records (list of dict) – A list of records, each of which is a dictionary mapping column names to values.

  • variable_column_name (str) – The column name of the variable name.

  • path_column_name (str) – The column name of the path.

  • data_format (DataFormat) – The data format of the data.

  • aggregations (list of Aggregation, optional) – A list of aggregations to apply to the data.

  • requested_variables (list of str, optional) – A list of variables to load.

  • preprocess (callable, optional) – A preprocessing function to apply to the data.

  • storage_options (dict, optional) – fsspec parameters passed to the backend file system, such as Google Cloud Storage or Amazon Web Services S3.

  • xarray_open_kwargs (dict, optional) – Keyword arguments to pass to the open_dataset() function.

  • xarray_combine_by_coords_kwargs (dict, optional) – Keyword arguments to pass to the combine_by_coords() function.

  • intake_kwargs (dict, optional) – Additional keyword arguments are passed through to the DataSource base class.

close()[source]

Delete open files from memory

to_dask()[source]

Return xarray object (which will have chunks)
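
Examples

ESMDataSource objects are not normally constructed by hand; they are returned by esm_datastore.__getitem__(). A minimal sketch (the catalog and key are illustrative):

>>> import intake
>>> cat = intake.open_esm_datastore("mycatalog.json")
>>> source = cat["AerChemMIP.BCC.BCC-ESM1.piClim-control.AERmon.gn"]
>>> ds = source.to_dask()  # aggregated assets as a single chunked xarray dataset
>>> source.close()  # release open file handles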

ESM Catalog#

pydantic model intake_esm.cat.ESMCatalogModel[source]#

Pydantic model for the ESM data catalog defined in https://git.io/JBWoW

Show JSON schema
{
   "title": "ESMCatalogModel",
   "description": "Pydantic model for the ESM data catalog defined in https://git.io/JBWoW",
   "type": "object",
   "properties": {
      "esmcat_version": {
         "title": "Esmcat Version",
         "type": "string"
      },
      "attributes": {
         "title": "Attributes",
         "type": "array",
         "items": {
            "$ref": "#/definitions/Attribute"
         }
      },
      "assets": {
         "$ref": "#/definitions/Assets"
      },
      "aggregation_control": {
         "$ref": "#/definitions/AggregationControl"
      },
      "id": {
         "title": "Id",
         "default": "",
         "type": "string"
      },
      "catalog_dict": {
         "title": "Catalog Dict",
         "type": "array",
         "items": {
            "type": "object"
         }
      },
      "catalog_file": {
         "title": "Catalog File",
         "type": "string"
      },
      "description": {
         "title": "Description",
         "type": "string"
      },
      "title": {
         "title": "Title",
         "type": "string"
      },
      "last_updated": {
         "title": "Last Updated",
         "anyOf": [
            {
               "type": "string",
               "format": "date-time"
            },
            {
               "type": "string",
               "format": "date"
            }
         ]
      }
   },
   "required": [
      "esmcat_version",
      "attributes",
      "assets",
      "aggregation_control"
   ],
   "definitions": {
      "Attribute": {
         "title": "Attribute",
         "type": "object",
         "properties": {
            "column_name": {
               "title": "Column Name",
               "type": "string"
            },
            "vocabulary": {
               "title": "Vocabulary",
               "default": "",
               "type": "string"
            }
         },
         "required": [
            "column_name"
         ]
      },
      "DataFormat": {
         "title": "DataFormat",
         "description": "An enumeration.",
         "enum": [
            "netcdf",
            "zarr",
            "reference",
            "<class 'intake_esm.cat.DataFormat.Config'>"
         ],
         "type": "string"
      },
      "Assets": {
         "title": "Assets",
         "type": "object",
         "properties": {
            "column_name": {
               "title": "Column Name",
               "type": "string"
            },
            "format": {
               "$ref": "#/definitions/DataFormat"
            },
            "format_column_name": {
               "title": "Format Column Name",
               "type": "string"
            }
         },
         "required": [
            "column_name"
         ]
      },
      "AggregationType": {
         "title": "AggregationType",
         "description": "An enumeration.",
         "enum": [
            "join_new",
            "join_existing",
            "union",
            "<class 'intake_esm.cat.AggregationType.Config'>"
         ],
         "type": "string"
      },
      "Aggregation": {
         "title": "Aggregation",
         "type": "object",
         "properties": {
            "type": {
               "$ref": "#/definitions/AggregationType"
            },
            "attribute_name": {
               "title": "Attribute Name",
               "type": "string"
            },
            "options": {
               "title": "Options",
               "default": {},
               "type": "object"
            }
         },
         "required": [
            "type",
            "attribute_name"
         ]
      },
      "AggregationControl": {
         "title": "AggregationControl",
         "type": "object",
         "properties": {
            "variable_column_name": {
               "title": "Variable Column Name",
               "type": "string"
            },
            "groupby_attrs": {
               "title": "Groupby Attrs",
               "type": "array",
               "items": {
                  "type": "string"
               }
            },
            "aggregations": {
               "title": "Aggregations",
               "default": [],
               "type": "array",
               "items": {
                  "$ref": "#/definitions/Aggregation"
               }
            }
         },
         "required": [
            "variable_column_name",
            "groupby_attrs"
         ]
      }
   }
}

Config
  • arbitrary_types_allowed: bool = True

  • underscore_attrs_are_private: bool = True

  • validate_all: bool = True

  • validate_assignment: bool = True

Fields
field aggregation_control [Required]#
Validated by validate_catalog
field assets [Required]#
Validated by validate_catalog
field attributes [Required]#
Validated by validate_catalog
field catalog_dict = None#
Validated by validate_catalog
field catalog_file = None#
Validated by validate_catalog
field description = None#
Validated by validate_catalog
field esmcat_version [Required]#
Validated by validate_catalog
field id = ''#
Validated by validate_catalog
field last_updated = None#
Validated by validate_catalog
field title = None#
Validated by validate_catalog
classmethod from_dict(data)[source]#

Construct the catalog model from its dict representation (a dict with ‘esmcat’ and ‘df’ keys, as described for esm_datastore above).

classmethod load(json_file, storage_options=None, read_csv_kwargs=None)[source]#

Load the catalog from a file.

Parameters
  • json_file (str or pathlib.Path) – The path to the JSON file containing the catalog.

  • storage_options (dict) – fsspec parameters passed to the backend file system, such as Google Cloud Storage or Amazon Web Services S3.

  • read_csv_kwargs (dict) – Additional keyword arguments passed through to the read_csv() function.
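
Examples

A minimal sketch of loading the model directly (the file name is illustrative):

>>> from intake_esm.cat import ESMCatalogModel
>>> esmcat = ESMCatalogModel.load("pangeo-cmip6.json")
>>> esmcat.df.head()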

nunique()[source]#

Return a series of the number of unique values for each column in the catalog.

save(name, *, directory=None, catalog_type='dict', to_csv_kwargs=None, json_dump_kwargs=None, storage_options=None)[source]#

Save the catalog to a file.

Parameters
  • name (str) – The name of the file to save the catalog to.

  • directory (str) – The directory or cloud storage bucket to save the catalog to. If None, use the current directory.

  • catalog_type (str) – The type of catalog to save. Whether to save the catalog table as a dictionary in the JSON file or as a separate CSV file. Valid options are ‘dict’ and ‘file’.

  • to_csv_kwargs (dict, optional) – Additional keyword arguments passed through to the to_csv() method.

  • json_dump_kwargs (dict, optional) – Additional keyword arguments passed through to the dump() function.

  • storage_options (dict) – fsspec parameters passed to the backend file system, such as Google Cloud Storage or Amazon Web Services S3.

Notes

Large catalogs can result in large JSON files. To keep the JSON file size manageable, call with catalog_type='file' to save the catalog as a separate CSV file.
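
Examples

A minimal sketch, continuing from the load() example above (the name is illustrative):

>>> esmcat.save("cmip6-subset", catalog_type="file")  # writes a JSON file plus a separate CSV file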

search(*, query, require_all_on=None)[source]#

Search for entries in the catalog.

Parameters
  • query (dict) – A dictionary of query parameters to execute against the dataframe.

  • require_all_on (list, str, optional) – A dataframe column or a list of dataframe columns across which all entries must satisfy the query criteria. If None, return entries that fulfill any of the criteria specified in the query, by default None.

Returns

catalog (ESMCatalogModel) – A new catalog with the entries satisfying the query criteria.
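
Examples

A minimal sketch; unlike esm_datastore.search(), the query is passed as a single dict (the column names are illustrative):

>>> from intake_esm.cat import ESMCatalogModel
>>> esmcat = ESMCatalogModel.load("pangeo-cmip6.json")
>>> results = esmcat.search(query={"variable_id": "pr", "experiment_id": ["historical", "ssp585"]})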

unique()[source]#

Return a series of unique values for each column in the catalog.

validator validate_catalog  »  all fields[source]#
property columns_with_iterables#

Return a set of columns that have iterables.

property df#

Return the dataframe.

property grouped#
property has_multiple_variable_assets#

Return True if the catalog has multiple variable assets.

Query Model#

pydantic model intake_esm.cat.QueryModel[source]#

A Pydantic model to represent a query to be executed against a catalog.

Show JSON schema
{
   "title": "QueryModel",
   "description": "A Pydantic model to represent a query to be executed against a catalog.",
   "type": "object",
   "properties": {
      "query": {
         "title": "Query",
         "type": "object",
         "additionalProperties": {
            "anyOf": [
               {},
               {
                  "type": "array",
                  "items": {}
               }
            ]
         }
      },
      "columns": {
         "title": "Columns",
         "type": "array",
         "items": {
            "type": "string"
         }
      },
      "require_all_on": {
         "title": "Require All On",
         "anyOf": [
            {
               "type": "string"
            },
            {
               "type": "array",
               "items": {}
            }
         ]
      }
   },
   "required": [
      "query",
      "columns"
   ]
}

Config
  • validate_all: bool = True

  • validate_assignment: bool = True

Fields
field columns [Required]#
Validated by validate_query
field query [Required]#
Validated by validate_query
field require_all_on = None#
Validated by validate_query
validator validate_query  »  all fields[source]#
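
Examples

A minimal sketch of constructing a query directly; validate_query checks the query keys against the given columns (the values are illustrative):

>>> from intake_esm.cat import QueryModel
>>> q = QueryModel(
...     query={"variable_id": "pr"},
...     columns=["activity_id", "experiment_id", "variable_id"],
... )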

Derived Variable Registry#

class intake_esm.derived.DerivedVariableRegistry[source]

Registry of derived variables

__init__()
classmethod load(name, package=None)[source]

Load a DerivedVariableRegistry from a Python module/file

Parameters
  • name (str) – The name of the module to load the DerivedVariableRegistry from.

  • package (str, optional) – The package to load the module from. This argument is required when performing a relative import. It specifies the package to use as the anchor point from which to resolve the relative import to an absolute import.

Returns

DerivedVariableRegistry – A DerivedVariableRegistry loaded from the Python module.

Notes

If you have a folder /home/foo/pythonfiles and you want to load a registry defined in registry.py in that directory, make sure the folder is on sys.path (for example, by adding it to $PYTHONPATH) before calling this function.

>>> import sys
>>> sys.path.insert(0, "/home/foo/pythonfiles")
>>> from intake_esm.derived import DerivedVariableRegistry
>>> registry = DerivedVariableRegistry.load("registry")
search(variable)[source]

Search for a derived variable by name or list of names

Parameters

variable (typing.Union[str, typing.List[str]]) – The name of the variable, or a list of names, to search for.

Returns

DerivedVariableRegistry – A DerivedVariableRegistry with the found variables.

update_datasets(*, datasets, variable_key_name, skip_on_error=False)[source]

Given a dictionary of datasets, return a dictionary of datasets with the derived variables added.

Parameters
  • datasets (typing.Dict[str, xr.Dataset]) – A dictionary of datasets to apply the derived variables to.

  • variable_key_name (str) – The name of the variable key used in the derived variable query

  • skip_on_error (bool, optional) – If True, skip variables that fail variable derivation.

Returns

typing.Dict[str, xr.Dataset] – A dictionary of datasets with the derived variables applied.

register[source]

Register a derived variable

Parameters
  • func (typing.Callable) – The function to apply to the dependent variables.

  • variable (str) – The name of the variable to derive.

  • query (typing.Dict[str, typing.Union[typing.Any, typing.List[typing.Any]]]) – The query to use to retrieve dependent variables required to derive variable.

  • prefer_derived (bool, optional (default=False)) – Specify whether to compute this variable on datasets that already contain a variable of the same name. Default (False) is to leave the existing variable.

Returns

typing.Callable – The function that was registered.
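
Examples

A minimal sketch of the decorator form; the variable name, query column, and conversion are illustrative:

>>> from intake_esm.derived import DerivedVariableRegistry
>>> dvr = DerivedVariableRegistry()
>>> @dvr.register(variable="tas_degC", query={"variable_id": ["tas"]})
... def tas_degC(ds):
...     ds["tas_degC"] = ds["tas"] - 273.15  # convert from Kelvin to degrees Celsius
...     return ds

The populated registry can then be passed to intake.open_esm_datastore() via its registry parameter, so that to_dataset_dict() derives the variable when loading.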

Derived Variable#

class intake_esm.derived.DerivedVariable(*, func, variable, query, prefer_derived)[source]
dependent_variables(variable_key_name)[source]

Return a list of dependent variables for a given variable

Options for dataset attributes#

class intake_esm.utils.set_options(**kwargs)[source]

Set options for intake_esm in a controlled context.

Currently-supported options:

  • attrs_prefix: The prefix to use in the names of attributes constructed from the catalog’s columns when returning xarray Datasets. Default: intake_esm_attrs.

  • dataset_key: Name of the global attribute in which to store the dataset’s key. Default: intake_esm_dataset_key.

  • vars_key: Name of the global attribute in which to store the list of requested variables when opening a dataset. Default: intake_esm_vars.

Examples

You can use set_options either as a context manager:

>>> import intake
>>> import intake_esm
>>> cat = intake.open_esm_datastore('catalog.json')
>>> with intake_esm.set_options(attrs_prefix='cat'):
...     out = cat.to_dataset_dict()
...

Or to set global options:

>>> intake_esm.set_options(attrs_prefix='cat', vars_key='cat_vars')