Frequently Asked Questions
How do I create my own catalog?
To create your own data catalog, we recommend using the ecgtools package. The package provides a set of tools for harvesting metadata information from files and creating intake-esm compatible catalogs.
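For orientation, the sketch below shows roughly what an intake-esm compatible catalog consists of: a CSV file with one row per data asset, and a JSON descriptor that points at it. It does not use ecgtools, which automates the metadata harvesting; it writes the two files by hand with the Python standard library. The function name, the column names, and the descriptor values are hypothetical, and the descriptor keys follow our reading of the ESM catalog specification, so check them against the spec before relying on them.

```python
import csv
import json
from pathlib import Path


def write_minimal_catalog(out_dir, name, records):
    """Write <name>.csv and <name>.json into out_dir and return the JSON path.

    `records` is a list of dicts that share the same keys and include a
    "path" key (hypothetical column layout, chosen for this example).
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    columns = list(records[0].keys())

    # One row per asset (file or store).
    csv_path = out_dir / f"{name}.csv"
    with csv_path.open("w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=columns)
        writer.writeheader()
        writer.writerows(records)

    # Descriptor keys as we understand the ESM catalog specification.
    descriptor = {
        "esmcat_version": "0.1.0",
        "id": name,
        "description": "Hypothetical example catalog",
        # Stored relative to the JSON file (assumed to be resolved by the reader).
        "catalog_file": csv_path.name,
        "attributes": [
            {"column_name": c, "vocabulary": ""} for c in columns if c != "path"
        ],
        "assets": {"column_name": "path", "format": "netcdf"},
        "aggregation_control": {
            "variable_column_name": "variable",
            "groupby_attrs": ["experiment"],
            "aggregations": [{"type": "union", "attribute_name": "variable"}],
        },
    }
    json_path = out_dir / f"{name}.json"
    json_path.write_text(json.dumps(descriptor, indent=2))
    return json_path
```

Calling `write_minimal_catalog("my-catalog", "demo", records)` with records that carry `path`, `variable`, and `experiment` keys produces `demo.csv` and `demo.json` side by side. For real datasets, ecgtools remains the recommended route, since it extracts these columns from the files themselves.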
Is there a list of existing catalogs?
The table below is an incomplete list of existing catalogs. Please feel free to add to this list or raise an issue on GitHub.
CMIP6-GLADE
Description: CMIP6 data accessible on NCAR’s GLADE disk storage system
Platform: NCAR-GLADE
Catalog path or url: /glade/collections/cmip/catalog/intake-esm-datastore/catalogs/glade-cmip6.json
Data Format: netCDF
Documentation Page: https://pcmdi.llnl.gov/CMIP6/Guide/dataUsers.html
CMIP6-CESM2-Timeseries
Description: CESM2 raw output (non-cmorized) that went into CMIP6 data
Platform: NCAR-CAMPAIGN
Catalog path or url: /glade/collections/cmip/catalog/intake-esm-datastore/catalogs/campaign-cesm2-cmip6-timeseries.json
Data Format: netCDF
Documentation Page: http://www.cesm.ucar.edu/models/cesm2/
CMIP5-GLADE
Description: CMIP5 data accessible on NCAR’s GLADE disk storage system
Platform: NCAR-GLADE
Catalog path or url: /glade/collections/cmip/catalog/intake-esm-datastore/catalogs/glade-cmip5.json
Data Format: netCDF
Documentation Page: https://pcmdi.llnl.gov/mips/cmip5/guide.html
CESM1-LENS-AWS
Description: CESM1 Large Ensemble data publicly available on Amazon S3
Platform: AWS S3 (us-west-2 region)
Catalog path or url: https://raw.githubusercontent.com/NCAR/cesm-lens-aws/master/intake-catalogs/aws-cesm1-le.json
Data Format: Zarr
Documentation Page: https://doi.org/10.26024/wt24-5j82
CESM1-LENS-GLADE
Description: CESM1 Large Ensemble data stored on NCAR’s GLADE disk storage system
Platform: NCAR-GLADE
Catalog path or url: /glade/collections/cmip/catalog/intake-esm-datastore/catalogs/glade-cesm1-le.json
Data Format: netCDF
Documentation Page: https://doi.org/10.5065/d6j101d1
CESM2-LE-GLADE
Description: ESM collection for the CESM2 LENS data stored on GLADE in /glade/campaign/cgd/cesm/CESM2-LE/timeseries
Platform: NCAR-GLADE
Catalog path or url: /glade/collections/cmip/catalog/intake-esm-datastore/catalogs/glade-cesm2-le.json
Data Format: netCDF
Documentation Page: https://www.cesm.ucar.edu/projects/community-projects/LENS2/
CMIP6-GCP
Description: CMIP6 Zarr data residing in Pangeo’s Google Storage
Platform: Google Cloud Platform
Catalog path or url: https://storage.googleapis.com/cmip6/pangeo-cmip6.json
Data Format: Zarr
Documentation Page: https://pcmdi.llnl.gov/CMIP6/Guide/dataUsers.html
CMIP6-MISTRAL
Description: CMIP6 data accessible on DKRZ’s MISTRAL disk storage system
Platform: DKRZ (German Climate Computing Centre)-MISTRAL
Catalog path or url: /work/ik1017/Catalogs/mistral-cmip6.json
Data Format: netCDF
Documentation Page: https://pcmdi.llnl.gov/CMIP6/Guide/dataUsers.html
CMIP5-MISTRAL
Description: CMIP5 data accessible on DKRZ’s MISTRAL disk storage system
Platform: DKRZ (German Climate Computing Centre)-MISTRAL
Catalog path or url: /work/ik1017/Catalogs/mistral-cmip5.json
Data Format: netCDF
Documentation Page: https://pcmdi.llnl.gov/mips/cmip5/guide.html
MiKlip-MISTRAL
Description: Data from MiKlip projects at the Max Planck Institute for Meteorology (MPI-M)
Platform: DKRZ (German Climate Computing Centre)-MISTRAL
Catalog path or url: /work/ik1017/Catalogs/mistral-miklip.json
Data Format: netCDF
Documentation Page: https://www.fona-miklip.de/
MPI-GE-MISTRAL
Description: Max Planck Institute Grand Ensemble cmorized to CMIP5 standards
Platform: DKRZ (German Climate Computing Centre)-MISTRAL
Catalog path or url: /work/ik1017/Catalogs/mistral-MPI-GE.json
Data Format: netCDF
Documentation Page: https://doi.org/10/gf3kgt
CMIP6-LDEO-OpenDAP
Description: CMIP6 data accessible via Hyrax OpenDAP Server at Lamont-Doherty Earth Observatory
Platform: LDEO-OpenDAP
Catalog path or url: http://haden.ldeo.columbia.edu/catalogs/hyrax_cmip6.json
Data Format: netCDF
Documentation Page: https://pcmdi.llnl.gov/CMIP6/Guide/dataUsers.html
Note
Some of these catalogs are also stored in the intake-esm-datastore GitHub repository at https://github.com/NCAR/intake-esm-datastore/tree/master/catalogs
Why do I get a segmentation fault when I try to open a dataset?
This is a known issue when trying to load datasets with xarray using the netcdf4 backend. The issue is related to thread safety in the underlying netcdf-c library.
By default, Intake-ESM attempts to open multiple datasets by creating and executing delayed dask tasks, in order to maximise performance. However, this can lead to segmentation faults. If you are using a dask client, you may experience this issue as dask workers dying.
To avoid this issue, you can either pass threaded=False to .to_dask(), .to_dataset_dict(), or .to_datatree() to disable the use of delayed dask tasks for that function call, or set the environment variable ITK_ESM_THREADING="False" so that, by default, datasets are opened eagerly without using dask tasks. Either approach should prevent the segmentation fault from occurring.
Note that if ITK_ESM_THREADING="False" is set, passing threaded=True to .to_dask(), .to_dataset_dict(), or .to_datatree() will override the default behaviour and use dask tasks.
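The two workarounds can be sketched as follows. The helper names, the catalog path, and the search keywords are placeholders, and the `intake` import is kept inside the function so that the block can be read and run without a catalog at hand. Setting the environment variable before `intake` is imported is a cautious assumption about when the variable is read, not something stated above.

```python
import os


def disable_threading_by_default():
    """Make eager (non-dask) dataset opening the default for this process."""
    # Assumption: set this before importing intake, in case the variable
    # is read at import time.
    os.environ["ITK_ESM_THREADING"] = "False"


def open_without_dask_tasks(catalog_path, **query):
    """Open the datasets matching `query` without delayed dask tasks.

    `catalog_path` and the `query` keywords are placeholders; use a catalog
    and search columns that exist on your system.
    """
    import intake  # requires intake-esm to be installed

    cat = intake.open_esm_datastore(catalog_path)
    subset = cat.search(**query)
    # Per-call override: applies only to this call.
    return subset.to_dataset_dict(threaded=False)
```

The per-call form is the narrower change, since it leaves the default behaviour untouched elsewhere; the environment variable is more convenient when every dataset opening in a script or job triggers the fault.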