Filter a catalog by substring and/or regular expression
Exact match keywords
import intake
url = "https://ncar-cesm-lens.s3-us-west-2.amazonaws.com/catalogs/aws-cesm1-le.json"
cat = intake.open_esm_datastore(url)
cat
aws-cesm1-le catalog with 56 dataset(s) from 442 asset(s):
| | unique |
---|---|
variable | 78 |
long_name | 75 |
component | 5 |
experiment | 4 |
frequency | 6 |
vertical_levels | 3 |
spatial_domain | 5 |
units | 25 |
start_time | 12 |
end_time | 13 |
path | 427 |
derived_variable | 0 |
cat.df.head()
| | variable | long_name | component | experiment | frequency | vertical_levels | spatial_domain | units | start_time | end_time | path |
---|---|---|---|---|---|---|---|---|---|---|---|
0 | FLNS | net longwave flux at surface | atm | 20C | daily | 1.0 | global | W/m2 | 1920-01-01 12:00:00 | 2005-12-31 12:00:00 | s3://ncar-cesm-lens/atm/daily/cesmLE-20C-FLNS.... |
1 | FLNSC | clearsky net longwave flux at surface | atm | 20C | daily | 1.0 | global | W/m2 | 1920-01-01 12:00:00 | 2005-12-31 12:00:00 | s3://ncar-cesm-lens/atm/daily/cesmLE-20C-FLNSC... |
2 | FLUT | upwelling longwave flux at top of model | atm | 20C | daily | 1.0 | global | W/m2 | 1920-01-01 12:00:00 | 2005-12-31 12:00:00 | s3://ncar-cesm-lens/atm/daily/cesmLE-20C-FLUT.... |
3 | FSNS | net solar flux at surface | atm | 20C | daily | 1.0 | global | W/m2 | 1920-01-01 12:00:00 | 2005-12-31 12:00:00 | s3://ncar-cesm-lens/atm/daily/cesmLE-20C-FSNS.... |
4 | FSNSC | clearsky net solar flux at surface | atm | 20C | daily | 1.0 | global | W/m2 | 1920-01-01 12:00:00 | 2005-12-31 12:00:00 | s3://ncar-cesm-lens/atm/daily/cesmLE-20C-FSNSC... |
By default, the search() method looks for exact matches and is case sensitive:
cat.search(experiment="20C", long_name="wind")
aws-cesm1-le catalog with 0 dataset(s) from 0 asset(s):
| | unique |
---|---|
variable | 0 |
long_name | 0 |
component | 0 |
experiment | 0 |
frequency | 0 |
vertical_levels | 0 |
spatial_domain | 0 |
units | 0 |
start_time | 0 |
end_time | 0 |
path | 0 |
derived_variable | 0 |
As you can see, the example above returns an empty catalog, because no 20C entry has a long_name exactly equal to "wind".
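The exact-match behavior can be illustrated without network access. The sketch below is a simplified, hypothetical stand-in, not intake-esm's implementation: the `ROWS` list (a few rows copied from the catalog preview above) and the `exact_search` helper are illustrative names. It keeps a row only if every queried column equals the requested value, so conditions on different columns are combined with AND and string comparison is case sensitive.

```python
# Hypothetical in-memory stand-in for a few rows of cat.df (values copied from
# the catalog preview above). This is NOT how intake-esm implements search().
ROWS = [
    {"variable": "FLNS", "experiment": "20C", "long_name": "net longwave flux at surface"},
    {"variable": "FSNS", "experiment": "20C", "long_name": "net solar flux at surface"},
    {"variable": "TAUX", "experiment": "20C", "long_name": "windstress in grid-x direction"},
]


def exact_search(rows, **query):
    """Hypothetical helper: keep rows whose every queried column equals the value.

    Conditions on different columns are combined with AND, and == on strings
    is case sensitive.
    """
    return [row for row in rows if all(row.get(col) == val for col, val in query.items())]


# "wind" is not an exact long_name, so nothing matches.
print(len(exact_search(ROWS, experiment="20C", long_name="wind")))  # 0
# Case matters: "flns" is not equal to "FLNS".
print(len(exact_search(ROWS, variable="flns")))  # 0
print(len(exact_search(ROWS, variable="FLNS")))  # 1
```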
Substring matches
In some cases, you may not know the exact term to look for. For such cases, intake-esm supports substring matching: using wildcards and/or regular expressions, we can find all entries whose value in a given column contains a particular substring. Let’s search for all entries that satisfy both of these conditions:
- experiment = ‘20C’
- the variable long name contains wind
cat.search(experiment="20C", long_name="wind*")
aws-cesm1-le catalog with 4 dataset(s) from 10 asset(s):
| | unique |
---|---|
variable | 8 |
long_name | 8 |
component | 2 |
experiment | 1 |
frequency | 3 |
vertical_levels | 2 |
spatial_domain | 2 |
units | 3 |
start_time | 3 |
end_time | 3 |
path | 10 |
derived_variable | 0 |
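Search values such as "wind*" act as regular expressions matched anywhere in the string. Assuming Python regular-expression semantics (consistent with the results on this page, although the sketch below uses only the standard `re` module and does not call intake-esm), "wind*" strictly means "win" followed by zero or more "d" characters, so it also matches any long name that merely contains "win". The list below uses long names from this page plus one hypothetical string to show the difference from a literal "wind" pattern.

```python
import re

# Long names copied from this page, plus one HYPOTHETICAL string
# ("winter temperature") added only to show how the pattern behaves.
long_names = [
    "net longwave flux at surface",
    "windstress in grid-x direction",
    "windstress**2 in grid-y direction",
    "winter temperature",  # hypothetical, not from the catalog
]

pattern = "wind*"  # regex: "win" followed by zero or more "d" characters

# re.search looks for the pattern anywhere in the string (substring semantics).
matches = [name for name in long_names if re.search(pattern, name)]
print(matches)
# ['windstress in grid-x direction', 'windstress**2 in grid-y direction', 'winter temperature']

# The literal pattern "wind" is the stricter "contains wind" test.
strict = [name for name in long_names if re.search("wind", name)]
print(strict)
# ['windstress in grid-x direction', 'windstress**2 in grid-y direction']
```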
Now, let’s search for all entries that satisfy both of these conditions:
- experiment = ‘20C’
- the variable long name starts with wind
cat_subset = cat.search(experiment="20C", long_name="^wind")
cat_subset
aws-cesm1-le catalog with 1 dataset(s) from 4 asset(s):
| | unique |
---|---|
variable | 4 |
long_name | 4 |
component | 1 |
experiment | 1 |
frequency | 1 |
vertical_levels | 1 |
spatial_domain | 1 |
units | 2 |
start_time | 1 |
end_time | 1 |
path | 4 |
derived_variable | 0 |
cat_subset.df
| | variable | long_name | component | experiment | frequency | vertical_levels | spatial_domain | units | start_time | end_time | path |
---|---|---|---|---|---|---|---|---|---|---|---|
0 | TAUX | windstress in grid-x direction | ocn | 20C | monthly | 1.0 | global_ocean | dyne/centimeter^2 | 1920-01-16 12:00:00 | 2005-12-16 12:00:00 | s3://ncar-cesm-lens/ocn/monthly/cesmLE-20C-TAU... |
1 | TAUX2 | windstress**2 in grid-x direction | ocn | 20C | monthly | 1.0 | global_ocean | dyne^2/centimeter^4 | 1920-01-16 12:00:00 | 2005-12-16 12:00:00 | s3://ncar-cesm-lens/ocn/monthly/cesmLE-20C-TAU... |
2 | TAUY | windstress in grid-y direction | ocn | 20C | monthly | 1.0 | global_ocean | dyne/centimeter^2 | 1920-01-16 12:00:00 | 2005-12-16 12:00:00 | s3://ncar-cesm-lens/ocn/monthly/cesmLE-20C-TAU... |
3 | TAUY2 | windstress**2 in grid-y direction | ocn | 20C | monthly | 1.0 | global_ocean | dyne^2/centimeter^4 | 1920-01-16 12:00:00 | 2005-12-16 12:00:00 | s3://ncar-cesm-lens/ocn/monthly/cesmLE-20C-TAU... |
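The `^` anchor is what turns a substring search into a starts-with search. As a minimal sketch, again using Python's `re` module rather than intake-esm itself, the snippet below compares anchored and unanchored patterns on the long names shown above plus one hypothetical name in which "wind" appears mid-string. It also shows that regex matching in `re` stays case sensitive unless a flag such as `re.IGNORECASE` is passed.

```python
import re

# Long names from the cat_subset output above, plus one HYPOTHETICAL name
# (not from the catalog) in which "wind" appears mid-string.
long_names = [
    "windstress in grid-x direction",
    "windstress**2 in grid-y direction",
    "surface wind speed",  # hypothetical
]

# "^" anchors the pattern to the start of the string.
starts_with = [name for name in long_names if re.search("^wind", name)]
# Without the anchor, "wind" may occur anywhere in the string.
contains = [name for name in long_names if re.search("wind", name)]

print(starts_with)  # ['windstress in grid-x direction', 'windstress**2 in grid-y direction']
print(contains)     # all three names

# Matching in re is case sensitive by default; re.IGNORECASE relaxes that.
print(re.search("^wind", "Windstress") is None)                     # True
print(re.search("^wind", "Windstress", re.IGNORECASE) is not None)  # True
```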
import intake_esm # just to display version information
intake_esm.show_versions()
INSTALLED VERSIONS
------------------
cftime: 1.6.4
dask: 2025.1.0
fastprogress: 1.0.3
fsspec: 2025.2.0
gcsfs: 2025.2.0
intake: 2.0.8
intake_esm: 2024.2.6.post33+g8be31d9.d20250204
netCDF4: 1.7.2
pandas: 2.2.3
requests: 2.32.3
s3fs: 2025.2.0
xarray: 2025.1.2
zarr: 3.0.2