# Overview

Intake-esm is a data cataloging utility built on top of intake, pandas, and
xarray. Intake-esm aims to facilitate:

- the discovery of Earth’s climate and weather datasets.
- the ingestion of these datasets into xarray dataset containers.

Its basic usage is shown below. To begin, let's import `intake`:


In [None]:
import intake

## Loading a catalog


At import time, the intake-esm plugin is registered in intake’s registry as
`esm_datastore` and can be accessed with the `intake.open_esm_datastore()`
function. For demonstration purposes, we are going to use the catalog for the
Community Earth System Model Large Ensemble (CESM LENS) dataset, which is
publicly available in Amazon S3.

```{note}
You can learn more about the CESM LENS dataset in AWS S3 [here](https://registry.opendata.aws/ncar-cesm-lens/).
```


You can load data from an
[ESM Catalog](https://github.com/NCAR/esm-collection-spec) by providing the URL
to a valid ESM Catalog:


In [None]:
catalog_url = "https://ncar-cesm-lens.s3-us-west-2.amazonaws.com/catalogs/aws-cesm1-le.json"
col = intake.open_esm_datastore(catalog_url)
col

The summary above tells us that this catalog contains over 400 data assets. We
can get more information on the individual data assets contained in the catalog
by inspecting the underlying dataframe, which is created when the catalog is
initialized:


In [None]:
col.df.head()

## Finding unique entries for individual columns

To get the unique values for given columns in the catalog, intake-esm provides a
{py:meth}`~intake_esm.core.esm_datastore.unique` method. This method returns a
dictionary containing the count and the unique values for each requested column:


In [None]:
col.unique(columns=["component", "frequency", "experiment"])
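
To make the shape of that result concrete, here is a minimal pandas sketch of a per-column summary. It is an illustration under stated assumptions, not intake-esm's implementation: `toy_df` and `summarize_unique` are hypothetical names, and the toy rows only stand in for the real `col.df`.

```python
import pandas as pd

# Hypothetical toy catalog; the real one is available as `col.df`.
toy_df = pd.DataFrame(
    {
        "component": ["atm", "atm", "lnd", "ocn"],
        "frequency": ["daily", "monthly", "monthly", "monthly"],
        "experiment": ["20C", "20C", "HIST", "RCP85"],
    }
)


def summarize_unique(df, columns):
    """Return {column: {"count": n, "values": [...]}} for each requested column."""
    summary = {}
    for name in columns:
        # unique() keeps values in order of first appearance.
        values = df[name].dropna().unique().tolist()
        summary[name] = {"count": len(values), "values": values}
    return summary


unique_summary = summarize_unique(toy_df, ["component", "frequency"])
print(unique_summary)
```

Each requested column gets its own entry, so the summary can be scanned quickly before building a search query.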

## Search

The {py:meth}`~intake_esm.core.esm_datastore.search` method allows the user to
perform a query on a catalog using keyword arguments. The keyword argument names
must be the names of the columns in the catalog. The search method returns a
subset of the catalog with all the entries that match the provided query.

### Exact Match Keywords

By default, the {py:meth}`~intake_esm.core.esm_datastore.search` method looks
for exact matches:


In [None]:
col_subset = col.search(
    component=["ice_nh", "lnd"],
    frequency=["monthly"],
    experiment=["20C", "HIST"],
)
col_subset.df
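
Conceptually, an exact-match query of this kind keeps the rows whose value in every queried column is one of the requested values. The sketch below reproduces that logic with plain pandas on a hypothetical toy table. `exact_match` and `toy_df` are illustrative names, not part of intake-esm, and the library's own implementation may differ.

```python
import pandas as pd

# Hypothetical toy catalog standing in for `col.df`.
toy_df = pd.DataFrame(
    {
        "component": ["atm", "ice_nh", "lnd", "lnd", "ocn"],
        "frequency": ["monthly", "monthly", "daily", "monthly", "monthly"],
        "experiment": ["20C", "20C", "20C", "HIST", "RCP85"],
    }
)


def exact_match(df, **query):
    """Keep rows where every queried column holds one of the requested values."""
    mask = pd.Series(True, index=df.index)
    for column, wanted in query.items():
        if isinstance(wanted, str):
            wanted = [wanted]  # allow a single value as well as a list
        mask &= df[column].isin(wanted)
    return df[mask]


subset = exact_match(
    toy_df,
    component=["ice_nh", "lnd"],
    frequency=["monthly"],
    experiment=["20C", "HIST"],
)
print(subset)
```

Values within one keyword are combined with "or", while the keywords themselves are combined with "and", which is why the daily `lnd` row is dropped here.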

### Substring matches


As pointed out earlier, the search method looks for exact matches by default.
However, with the use of wildcards and/or regular expressions, we can find all
items with a particular substring in a given column:


In [None]:
# Find all entries with `wind` in their variable long_name
col.search(long_name="wind*").df

In [None]:
# Find all entries whose variable long name starts with `wind`
col.search(long_name="^wind").df
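
When a pattern is interpreted as a regular expression, small differences in the pattern change what is matched. The standard-library sketch below shows this on a few hypothetical long names. It only demonstrates Python's `re` semantics; how intake-esm evaluates a given pattern may depend on the installed version, so treat the correspondence as an assumption.

```python
import re

# Hypothetical long names, chosen only to illustrate the patterns.
long_names = ["wind stress", "zonal wind", "winter snow depth", "sea ice thickness"]


def matching(pattern, names):
    """Return the names in which the regular expression matches anywhere."""
    regex = re.compile(pattern)
    return [name for name in names if regex.search(name)]


anywhere = matching("wind", long_names)    # substring match
at_start = matching("^wind", long_names)   # anchored at the start of the string
with_star = matching("wind*", long_names)  # "win" followed by zero or more "d"

print(anywhere)
print(at_start)
print(with_star)
```

In regular-expression syntax `*` quantifies the preceding character rather than acting as a shell-style wildcard, so `wind*` also matches names that contain only `win`.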

## Loading datasets


Intake-esm implements convenience utilities for loading the query results into
higher-level xarray datasets. The logic for merging/concatenating the query
results into higher-level xarray datasets is provided in the input JSON file and
is available under the `.aggregation_info` property:


In [None]:
col.aggregation_info

In [None]:
col.aggregation_info.aggregations

In [None]:
# Dataframe columns used to determine groups of compatible datasets.
col.aggregation_info.groupby_attrs # or col.groupby_attrs

In [None]:
# List of columns used to merge/concatenate multiple compatible datasets into a single dataset.
col.aggregation_info.agg_columns # or col.agg_columns
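
The role of the group-by attributes can be sketched with plain pandas: rows that share the same values in those columns end up in the same group, and each group is a candidate for one combined dataset. The example assumes the group-by columns are `component`, `experiment`, and `frequency`, and that group keys are the column values joined with `.`, which is consistent with keys such as `lnd.20C.monthly`. The toy rows and the `OTHER_VAR` and `SOME_ICE_VAR` names are hypothetical.

```python
import pandas as pd

# Hypothetical toy catalog; "OTHER_VAR" and "SOME_ICE_VAR" are placeholder variable names.
toy_df = pd.DataFrame(
    {
        "component": ["lnd", "lnd", "lnd", "ice_nh"],
        "experiment": ["20C", "20C", "HIST", "20C"],
        "frequency": ["monthly", "monthly", "monthly", "monthly"],
        "variable": ["SNOW", "OTHER_VAR", "SNOW", "SOME_ICE_VAR"],
    }
)

# Assumed group-by columns for this illustration.
groupby_attrs = ["component", "experiment", "frequency"]

# Build one entry per group, keyed by the joined column values.
groups = {
    ".".join(key): group["variable"].tolist()
    for key, group in toy_df.groupby(groupby_attrs)
}
print(groups)
```

Here the two `lnd`/`20C`/`monthly` rows fall into one group, so their variables would be candidates for merging into a single dataset.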

To load data assets into xarray datasets, we need to use the
{py:meth}`~intake_esm.core.esm_datastore.to_dataset_dict` method. As the name
suggests, this method returns a dictionary of aggregated xarray datasets.


In [None]:
dset_dicts = col_subset.to_dataset_dict(zarr_kwargs={"consolidated": True})

In [None]:
list(dset_dicts.keys())

We can access a particular dataset as follows:


In [None]:
ds = dset_dicts["lnd.20C.monthly"]
print(ds)

Let’s create a quick plot for a slice of the data:


In [None]:
ds.SNOW.isel(time=0, member_id=range(1, 24, 4)).plot(col="member_id", col_wrap=3, robust=True)
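
Because `isel` selects by integer position rather than by label, it can help to check which positions `range(1, 24, 4)` covers and how many panels that implies. The short sketch below uses only the standard library. The panel layout arithmetic assumes one panel per selected member, wrapped at three columns as in the call above.

```python
import math

# Positions along `member_id` selected by range(1, 24, 4).
member_positions = list(range(1, 24, 4))

# One panel per selected member, wrapped after 3 columns.
n_panels = len(member_positions)
n_rows = math.ceil(n_panels / 3)

print(member_positions, n_panels, n_rows)
```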

In [None]:
import intake_esm # just to display version information

intake_esm.show_versions()