Frequently Asked Questions

How do I create my own catalog?

To create your own data catalog, we recommend using the ecgtools package. The package provides a set of tools for harvesting metadata information from files and creating intake-esm compatible catalogs.

Is there a list of existing catalogs?

The following is an incomplete list of existing catalogs. Please feel free to add to this list or raise an issue on GitHub.

CMIP6-GLADE

  • Description: CMIP6 data accessible on NCAR’s GLADE disk storage system

  • Platform: NCAR-GLADE

  • Catalog path or url: /glade/collections/cmip/catalog/intake-esm-datastore/catalogs/glade-cmip6.json

  • Data Format: netCDF

  • Documentation Page: https://pcmdi.llnl.gov/CMIP6/Guide/dataUsers.html

CMIP6-CESM2-Timeseries

  • Description: CESM2 raw output (non-cmorized) that went into CMIP6 data

  • Platform: NCAR-CAMPAIGN

  • Catalog path or url: /glade/collections/cmip/catalog/intake-esm-datastore/catalogs/campaign-cesm2-cmip6-timeseries.json

  • Data Format: netCDF

  • Documentation Page: http://www.cesm.ucar.edu/models/cesm2/

CMIP5-GLADE

  • Description: CMIP5 data accessible on NCAR’s GLADE disk storage system

  • Platform: NCAR-GLADE

  • Catalog path or url: /glade/collections/cmip/catalog/intake-esm-datastore/catalogs/glade-cmip5.json

  • Data Format: netCDF

  • Documentation Page: https://pcmdi.llnl.gov/mips/cmip5/guide.html

CESM1-LENS-AWS

  • Description: CESM1 Large Ensemble data publicly available on Amazon S3

  • Platform: AWS S3 (us-west-2 region)

  • Catalog path or url: https://raw.githubusercontent.com/NCAR/cesm-lens-aws/master/intake-catalogs/aws-cesm1-le.json

  • Data Format: Zarr

  • Documentation Page: https://doi.org/10.26024/wt24-5j82

CESM1-LENS-GLADE

  • Description: CESM1 Large Ensemble data stored on NCAR’s GLADE disk storage system

  • Platform: NCAR-GLADE

  • Catalog path or url: /glade/collections/cmip/catalog/intake-esm-datastore/catalogs/glade-cesm1-le.json

  • Data Format: netCDF

  • Documentation Page: https://doi.org/10.5065/d6j101d1

CESM2-LE-GLADE

  • Description: ESM collection for the CESM2 LENS data stored on GLADE in /glade/campaign/cgd/cesm/CESM2-LE/timeseries

  • Platform: NCAR-GLADE

  • Catalog path or url: /glade/collections/cmip/catalog/intake-esm-datastore/catalogs/glade-cesm2-le.json

  • Data Format: netCDF

  • Documentation Page: https://www.cesm.ucar.edu/projects/community-projects/LENS2/

CMIP6-GCP

  • Description: CMIP6 Zarr data residing in Pangeo’s Google Storage

  • Platform: Google Cloud Platform

  • Catalog path or url: https://storage.googleapis.com/cmip6/pangeo-cmip6.json

  • Data Format: Zarr

  • Documentation Page: https://pcmdi.llnl.gov/CMIP6/Guide/dataUsers.html

CMIP6-MISTRAL

  • Description: CMIP6 data accessible on DKRZ’s MISTRAL disk storage system

  • Platform: DKRZ (German Climate Computing Centre)-MISTRAL

  • Catalog path or url: /work/ik1017/Catalogs/mistral-cmip6.json

  • Data Format: netCDF

  • Documentation Page: https://pcmdi.llnl.gov/CMIP6/Guide/dataUsers.html

CMIP5-MISTRAL

  • Description: CMIP5 data accessible on DKRZ’s MISTRAL disk storage system

  • Platform: DKRZ (German Climate Computing Centre)-MISTRAL

  • Catalog path or url: /work/ik1017/Catalogs/mistral-cmip5.json

  • Data Format: netCDF

  • Documentation Page: https://pcmdi.llnl.gov/mips/cmip5/guide.html

MiKlip-MISTRAL

  • Description: Data from MiKlip projects at the Max Planck Institute for Meteorology (MPI-M)

  • Platform: DKRZ (German Climate Computing Centre)-MISTRAL

  • Catalog path or url: /work/ik1017/Catalogs/mistral-miklip.json

  • Data Format: netCDF

  • Documentation Page: https://www.fona-miklip.de/

MPI-GE-MISTRAL

  • Description: Max Planck Institute Grand Ensemble, cmorized to CMIP5 standards

  • Platform: DKRZ (German Climate Computing Centre)-MISTRAL

  • Catalog path or url: /work/ik1017/Catalogs/mistral-MPI-GE.json

  • Data Format: netCDF

  • Documentation Page: https://doi.org/10/gf3kgt

CMIP6-LDEO-OpenDAP

  • Description: CMIP6 data accessible via Hyrax OpenDAP Server at Lamont-Doherty Earth Observatory

  • Platform: LDEO-OpenDAP

  • Catalog path or url: http://haden.ldeo.columbia.edu/catalogs/hyrax_cmip6.json

  • Data Format: netCDF

  • Documentation Page: https://pcmdi.llnl.gov/CMIP6/Guide/dataUsers.html

Note

Some of these catalogs are also stored in the intake-esm-datastore GitHub repository at https://github.com/NCAR/intake-esm-datastore/tree/master/catalogs
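Any "Catalog path or url" value listed above can be passed to intake-esm to open the corresponding catalog. The following is a minimal sketch, assuming intake and intake-esm are installed. The helper name open_catalog and the constant CMIP6_GCP_CATALOG are hypothetical and exist only for illustration; the URL is copied from the CMIP6-GCP entry.

```python
# Catalog URL copied from the CMIP6-GCP entry in the list above.
CMIP6_GCP_CATALOG = "https://storage.googleapis.com/cmip6/pangeo-cmip6.json"


def open_catalog(catalog_path_or_url=CMIP6_GCP_CATALOG):
    """Hypothetical helper: open an intake-esm catalog from a path or URL."""
    # Imported lazily so that defining this helper does not itself
    # require intake and intake-esm to be installed.
    import intake

    return intake.open_esm_datastore(catalog_path_or_url)


# Example usage (a remote catalog needs network access; a /glade or /work
# path only resolves on the corresponding platform):
#     cat = open_catalog()
#     print(cat)
```

Catalogs given as file system paths, such as the GLADE and MISTRAL entries, are only reachable from the platform named in their "Platform" field.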

Why do I get a segmentation fault when I try to open a dataset?

This is a known issue when trying to load datasets with xarray using the netcdf4 backend. The issue is related to thread safety in the underlying netcdf-c library.

By default, Intake-ESM attempts to open multiple datasets by creating and executing delayed dask tasks, in order to maximise performance. However, this can lead to segmentation faults. If you are using a dask client, you may experience this issue as dask workers dying.

To avoid this issue, you can either pass threaded=False to .to_dask(), .to_dataset_dict(), or .to_datatree() to disable the use of delayed dask tasks for that function call only, or set the environment variable ITK_ESM_THREADING="False" so that, by default, datasets are opened eagerly without using dask tasks. Either approach should prevent the segmentation fault from occurring.

Note that if ITK_ESM_THREADING="False" is set, passing threaded=True to .to_dask(), .to_dataset_dict(), or .to_datatree() will override the default behaviour and use dask tasks for that call.
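The two workarounds described above can be sketched as follows. This is an illustration under stated assumptions, not a definitive recipe: the helper name load_without_dask_tasks is hypothetical, and catalog stands for any already opened intake-esm catalog (or search result) that provides .to_dataset_dict(). The text above does not say when the environment variable is read, so setting it at the top of the script, before intake-esm is imported, is the conservative choice.

```python
import os

# Option 1: make eager (non-dask) dataset opening the session default.
# Set early, before importing intake-esm, as a precaution.
os.environ["ITK_ESM_THREADING"] = "False"


def load_without_dask_tasks(catalog):
    """Hypothetical helper: open all datasets in a catalog eagerly."""
    # Option 2: threaded=False disables delayed dask tasks for this
    # call only, regardless of the environment variable.
    return catalog.to_dataset_dict(threaded=False)
```

The same threaded keyword can be passed to .to_dask() and .to_datatree() in place of .to_dataset_dict().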

My Dask Workers all die when I try to open a dataset - how can I fix this?

See the segmentation fault section above.