Changelog#

v2022.9.18#

(full changelog)

New features added#

Bugs fixed#

  • FIX: Update default catalog location and tests #525 (@mgrover1)

  • FIX: Add fixes to allow reading kerchunk catalog #485 (@mgrover1)

Maintenance and upkeep improvements#

Documentation improvements#

Other merged PRs#

Contributors to this release#

(GitHub contributors page for this release)

@andersy005 | @aulemahal | @d70-t | @dependabot | @jukent | @MackenzieBlanusa | @mgrover1 | @pre-commit-ci | @RondeauG | @wachsylon

v2021.8.17#

(full changelog)

Enhancements made#

Maintenance and upkeep improvements#

Documentation improvements#

Other merged PRs#

Contributors to this release#

(GitHub contributors page for this release)

@andersy005 | @dependabot | @mgrover1 | @pre-commit-ci

v2021.1.15#

(full changelog)

Bug Fixes#

Breaking Changes#

Internal Changes#

Documentation#

Contributors to this release#

(GitHub contributors page for this release)

@aaronspring | @andersy005 | @jbusecke

v2020.12.18#

(full changelog)

Bug Fixes#

  • ๐Ÿ› Disable _requested_variables for single variable assets #306 (@andersy005)

Internal Changes#

Contributors to this release#

(GitHub contributors page for this release)

@andersy005 | @dcherian | @jbusecke | @naomi-henderson | @Recalculate

v2020.11.4#

Features#

Breaking Changes#

Bug Fixes#

Documentation#

Internal Changes#

Contributors to this release#

(GitHub contributors page for this release)

@andersy005 | @dcherian | @jbusecke | @jukent | @sherimickelson

v2020.8.15#

Features#

Documentation#

  • Update the to_dataset_dict() docstring to explain how the cdf_kwargs argument is used with regard to chunking (GH#278) @bonnland

Internal Changes#

Contributors to this release#

(GitHub contributors page for this release)

@andersy005 | @bonnland | @dcherian | @jeffdlb | @jukent | @kmpaul | @markusritschel | @martindurant | @matt-long

v2020.6.11#

Features#

Documentation#

Internal Changes#

Contributors to this release#

(GitHub contributors page for this release)

@andersy005 | @jhamman | @martindurant

v2020.5.21#

Features#

Contributors to this release#

(GitHub contributors page for this release)

@andersy005 | @bonnland | @dcherian | @jbusecke | @jeffdlb | @kmpaul | @markusritschel

v2020.5.01#

Features#

  • Add html representation for the catalog object. (GH#229) @andersy005

  • Move logic for assets aggregation into ESMGroupDataSource() and add a few basic dict-like methods (keys(), len(), getitem(), contains()) to the catalog object. (GH#194) @andersy005 & @jhamman & @kmpaul

  • Support columns with iterables in unique() and nunique(). (GH#223) @andersy005
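The dict-like methods from GH#194 can be pictured with a toy container that groups asset records under string keys. This is a minimal sketch rather than intake-esm's code: the `ToyCatalog` class, the grouping columns, and the `.`-joined key format are assumptions made for illustration.

```python
from collections import defaultdict


class ToyCatalog:
    """Hypothetical catalog that groups asset records under string keys."""

    def __init__(self, records, groupby_attrs, sep="."):
        groups = defaultdict(list)
        for record in records:
            # One key per unique combination of the grouping columns.
            key = sep.join(str(record[attr]) for attr in groupby_attrs)
            groups[key].append(record)
        # Store a plain dict so lookups of unknown keys raise KeyError.
        self._groups = dict(groups)

    def keys(self):
        return list(self._groups)

    def __len__(self):
        return len(self._groups)

    def __getitem__(self, key):
        return self._groups[key]

    def __contains__(self, key):
        return key in self._groups


records = [
    {"source_id": "CESM2", "experiment_id": "historical", "path": "a.nc"},
    {"source_id": "CESM2", "experiment_id": "historical", "path": "b.nc"},
    {"source_id": "CESM2", "experiment_id": "ssp585", "path": "c.nc"},
]
cat = ToyCatalog(records, ["source_id", "experiment_id"])
print(len(cat))               # 2
print(cat.keys())             # ['CESM2.historical', 'CESM2.ssp585']
print("CESM2.ssp585" in cat)  # True
```

Each key stands for one group of assets that would be aggregated into a single dataset; len(), membership tests, and item access then behave as they do on a plain dict.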

Bug Fixes#

  • Revert to using concurrent.futures to address failures caused by dask's distributed scheduler. (GH#225) & (GH#226)

Internal Changes#

Contributors to this release#

(GitHub contributors page for this release)

@andersy005 | @bonnland | @dcherian | @jbusecke | @jhamman | @kmpaul | @sherimickelson

v2020.3.16#

Features#

Bug Fixes#

Internal Changes#

Contributors to this release#

(GitHub contributors page for this release)

@andersy005 | @bonnland | @dcherian | @jbusecke | @jhamman | @kmpaul

v2019.12.13#

Features#

  • Add optional preprocess argument to to_dataset_dict() (GH#155) @matt-long

  • Allow users to disable dataset aggregations by passing aggregate=False to to_dataset_dict() (GH#164) @matt-long

  • Avoid manipulating dataset coordinates by using data_vars=varname when concatenating datasets via xarray.concat() (GH#174) @andersy005

  • Support loading netCDF assets from openDAP endpoints (GH#176) @andersy005

  • Add serialize() method to serialize collection/catalog (GH#179) @andersy005

  • Allow passing extra storage options to the backend file system via to_dataset_dict() (GH#180) @bonnland

  • Provide informational messages to the user via the logging module (GH#186) @andersy005

Bug Fixes#

  • Remove the caching option (GH#158) @matt-long

  • Preserve encoding when aggregating datasets (GH#161) @matt-long

  • Sort aggregations to make sure intake_esm.merge_util.join_existing is always done before intake_esm.merge_util.join_new (GH#171) @andersy005
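The ordering rule from GH#171 can be expressed as a stable sort over aggregation entries. This sketch assumes each aggregation is a dict with a "type" key, in the style of the ESM collection specification; the `sort_aggregations` helper and the priority table are hypothetical. Only the relative order of join_existing and join_new comes from the entry above; placing other types such as "union" first is an assumption of the sketch.

```python
# Hypothetical priorities: lower values run first. Types not listed
# (for example "union") get priority 0, an assumption of this sketch.
_PRIORITY = {"join_existing": 1, "join_new": 2}


def sort_aggregations(aggregations):
    """Return aggregations ordered so join_existing precedes join_new."""
    # sorted() is stable, so entries with equal priority keep their input order.
    return sorted(aggregations, key=lambda agg: _PRIORITY.get(agg["type"], 0))


aggregations = [
    {"type": "join_new", "attribute_name": "member_id"},
    {"type": "union", "attribute_name": "variable_id"},
    {"type": "join_existing", "attribute_name": "time_range", "options": {"dim": "time"}},
]
print([agg["type"] for agg in sort_aggregations(aggregations)])
# ['union', 'join_existing', 'join_new']
```

Concatenating along an existing dimension (such as time across files) first means each member is complete before a new dimension is stacked on top of it.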

Documentation#

Internal Changes#

Contributors to this release#

(GitHub contributors page for this release)

@andersy005 | @bonnland | @dcherian | @jbusecke | @jhamman | @matt-long | @naomi-henderson | @Recalculate | @sebasblancogonz

v2019.10.15#

Features#

Breaking changes#

  • Replaced intake_esm.core.esm_metadatastore with intake_esm.core.esm_datastore, see the API reference for more details.

  • intake-esm won't build collection catalogs anymore. intake-esm now expects an ESM collection JSON file as input. This JSON should conform to the Earth System Model Collection specification.
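Since collections now arrive as JSON rather than being built by intake-esm, a quick structural check can catch malformed inputs early. This is a minimal sketch: the field names follow the ESM Collection Specification as we understand it (the specification itself is authoritative), while the `missing_keys` helper and all values in the sample document are hypothetical.

```python
import json

# Sample collection description; field names follow the ESM Collection
# Specification as we understand it, values are made up for illustration.
COLLECTION_JSON = """
{
  "esmcat_version": "0.1.0",
  "id": "sample-collection",
  "description": "Illustrative collection with made-up values",
  "catalog_file": "sample-collection.csv",
  "attributes": [
    {"column_name": "source_id", "vocabulary": ""},
    {"column_name": "experiment_id", "vocabulary": ""}
  ],
  "assets": {"column_name": "path", "format": "netcdf"}
}
"""

# Top-level keys this sketch treats as required.
REQUIRED_KEYS = {"esmcat_version", "id", "description", "catalog_file", "attributes", "assets"}


def missing_keys(text):
    """Hypothetical helper: list required top-level keys absent from the JSON."""
    collection = json.loads(text)
    return sorted(REQUIRED_KEYS - set(collection))


print(missing_keys(COLLECTION_JSON))  # []
```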

Contributors to this release#

(GitHub contributors page for this release)

@aaronspring | @andersy005 | @bonnland | @dcherian | @n-henderson | @naomi-henderson | @rabernat

v2019.8.23#

Features#

  • Add mistral data holdings to intake-esm-datastore (GH#133) @aaronspring

  • Add support for NA-CORDEX data holdings. (GH#115) @jukent

  • Replace .csv with netCDF as the serialization format when saving the built collection to disk. With netCDF, we can record very useful information in the global attributes of the netCDF dataset. (GH#119) @andersy005

  • Add string representation of ESMMetadataStoreCatalog object (GH#122) @andersy005

  • Automatically build missing collections by calling esm_metadatastore(collection_name="GLADE-CMIP5") when the specified collection is part of the curated collections in intake-esm-datastore. (GH#124) @andersy005

    
    In [1]: import intake
    
    In [2]: col = intake.open_esm_metadatastore(collection_name="GLADE-CMIP5")
    
    In [3]: # if "GLADE-CMIP5" collection isn't built already, the above is equivalent to:
    
    In [4]: col = intake.open_esm_metadatastore(collection_input_definition="GLADE-CMIP5")
    
  • Revert back to using official DRS attributes when building CMIP5 and CMIP6 collections. (GH#126) @andersy005

  • Add .df property for interfacing with the built collection via a DataFrame, to maintain backwards compatibility. (GH#127) @andersy005

  • Add unique() and nunique() methods for summarizing the unique values in a collection and their counts. (GH#128) @andersy005

    
    In [1]: import intake
    
    In [2]: col = intake.open_esm_metadatastore(collection_name="GLADE-CMIP5")
    
    In [3]: col
    Out[3]:
    GLADE-CMIP5 collection catalogue with 615853 entries:
              > 3 resource(s)
              > 1 resource_type(s)
              > 1 direct_access(s)
              > 1 activity(s)
              > 218 ensemble_member(s)
              > 51 experiment(s)
              > 312093 file_basename(s)
              > 615853 file_fullpath(s)
              > 6 frequency(s)
              > 25 institute(s)
              > 15 mip_table(s)
              > 53 model(s)
              > 7 modeling_realm(s)
              > 3 product(s)
              > 9121 temporal_subset(s)
              > 454 variable(s)
              > 489 version(s)
    
    In [4]: col.nunique()
    
    resource 3
    resource_type 1
    direct_access 1
    activity 1
    ensemble_member 218
    experiment 51
    file_basename 312093
    file_fullpath 615853
    frequency 6
    institute 25
    mip_table 15
    model 53
    modeling_realm 7
    product 3
    temporal_subset 9121
    variable 454
    version 489
    dtype: int64
    
    In [5]: col.unique(columns=['frequency', 'modeling_realm'])
    
    {'frequency': {'count': 6, 'values': ['mon', 'day', '6hr', 'yr', '3hr', 'fx']},
    'modeling_realm': {'count': 7, 'values': ['atmos', 'land', 'ocean', 'seaIce', 'ocnBgchem',
    'landIce', 'aerosol']}}
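What nunique() and unique() report can be reproduced on a small table. This sketch uses a list of dicts in place of the DataFrame the real methods work on, and the `nunique`/`unique` functions below are hypothetical pure-Python analogues that mimic the output format shown above; the order of values in the real output may differ.

```python
def nunique(records):
    """Count distinct values per column (hypothetical pure-Python analogue)."""
    columns = records[0].keys() if records else []
    return {col: len({r[col] for r in records}) for col in columns}


def unique(records, columns):
    """Distinct values per column, in first-seen order, with their count."""
    summary = {}
    for col in columns:
        # dict.fromkeys de-duplicates while preserving first-seen order.
        values = list(dict.fromkeys(r[col] for r in records))
        summary[col] = {"count": len(values), "values": values}
    return summary


records = [
    {"frequency": "mon", "modeling_realm": "atmos"},
    {"frequency": "day", "modeling_realm": "atmos"},
    {"frequency": "mon", "modeling_realm": "ocean"},
]
print(nunique(records))  # {'frequency': 2, 'modeling_realm': 2}
print(unique(records, ["frequency"]))
# {'frequency': {'count': 2, 'values': ['mon', 'day']}}
```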
    
    

Bug Fixes#

  • For CMIP6, extract grid_label from directory path instead of file name. (GH#127) @andersy005
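In the CMIP6 DRS directory template, `.../<variable_id>/<grid_label>/<version>/<filename>`, the grid label sits at a fixed depth, directly above the version directory, whereas in file names it may or may not be followed by a time-range segment. A minimal sketch, assuming paths follow that template; the `grid_label_from_path` helper and the example path prefix are hypothetical.

```python
from pathlib import PurePosixPath


def grid_label_from_path(path):
    """Return the grid_label directory of a CMIP6 DRS-style file path.

    Assumes the layout .../<variable_id>/<grid_label>/<version>/<filename>,
    so grid_label is the third path component counting from the end.
    """
    parts = PurePosixPath(path).parts
    if len(parts) < 3:
        raise ValueError(f"path too short for CMIP6 DRS layout: {path}")
    return parts[-3]


# Illustrative path; the leading /glade/collections/cmip prefix is made up.
example = (
    "/glade/collections/cmip/CMIP6/CMIP/NCAR/CESM2/historical/r1i1p1f1/"
    "Amon/tas/gn/v20190308/tas_Amon_CESM2_historical_r1i1p1f1_gn_185001-201412.nc"
)
print(grid_label_from_path(example))  # gn
```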

Contributors to this release#

(GitHub contributors page for this release)

v2019.8.5#

Features#

  • Support building collections using inputs from intake-esm-datastore repository. (GH#79) @andersy005

  • Ensure that requested files are available locally before loading data into xarray datasets. (GH#82) @andersy005 and @matt-long

  • Split collection definitions out of config. (GH#83) @matt-long

  • Add intake-esm-builder, a CLI tool for building collections from the command line. (GH#89) @andersy005

  • Add support for CESM-LENS data holdings residing in AWS S3. (GH#98) @andersy005

  • Sort the collection upon creation according to order-by-columns, and pass urlpath through the stack for use in parsing collection filenames (GH#100) @pbranson
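Sorting a collection by several order-by columns amounts to a lexicographic sort on a tuple key. A minimal sketch, assuming each collection entry is a dict; the column names in `ORDER_BY_COLUMNS` and the `sort_collection` helper are hypothetical.

```python
from operator import itemgetter

# Hypothetical order-by columns, chosen only for illustration.
ORDER_BY_COLUMNS = ("experiment", "ensemble_member", "variable")


def sort_collection(records, order_by=ORDER_BY_COLUMNS):
    """Return records sorted lexicographically by the order-by columns."""
    return sorted(records, key=itemgetter(*order_by))


records = [
    {"experiment": "rcp85", "ensemble_member": "r1i1p1", "variable": "tas"},
    {"experiment": "historical", "ensemble_member": "r2i1p1", "variable": "pr"},
    {"experiment": "historical", "ensemble_member": "r1i1p1", "variable": "tas"},
]
for record in sort_collection(records):
    print(record["experiment"], record["ensemble_member"], record["variable"])
# historical r1i1p1 tas
# historical r2i1p1 pr
# rcp85 r1i1p1 tas
```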

Bug Fixes#

Internal Changes#

  • Refactor existing functionality to make intake-esm robust and extensible. (GH#77) @andersy005

  • Add aggregate._override_coords function to override dim coordinates except time in case there's a floating-point precision difference. (GH#108) @andersy005

  • Fix CESM-LE ice component peculiarities that caused intake-esm to load data improperly. The fix splits the ice component variables into two separate components:

    • ice_sh: for southern hemisphere

    • ice_nh: for northern hemisphere

    (GH#114) @andersy005
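The idea behind _override_coords can be illustrated without xarray: when non-time coordinates differ only by floating-point noise, copying the reference values onto the other datasets lets exact-equality alignment succeed. In this sketch each "dataset" is a plain dict of coordinate name to list of floats, a hypothetical stand-in for xarray objects; the `override_coords` name, the tolerances, and the choice to raise on larger mismatches are assumptions, not intake-esm's behavior.

```python
import math


def override_coords(datasets, time_dim="time", rel_tol=1e-9, abs_tol=1e-12):
    """Copy non-time coordinates from the first dataset onto the others.

    Coordinates are overridden only when they match the reference within
    floating-point tolerance; larger mismatches raise ValueError.
    """
    reference = datasets[0]
    result = [reference]
    for ds in datasets[1:]:
        fixed = dict(ds)
        for name, ref_values in reference.items():
            if name == time_dim or name not in ds:
                continue  # time is concatenated, never overridden
            values = ds[name]
            if len(values) != len(ref_values) or not all(
                math.isclose(a, b, rel_tol=rel_tol, abs_tol=abs_tol)
                for a, b in zip(values, ref_values)
            ):
                raise ValueError(f"coordinate {name!r} differs beyond tolerance")
            fixed[name] = list(ref_values)
        result.append(fixed)
    return result


a = {"time": [0.0, 1.0], "lat": [0.1 + 0.2, 1.0]}  # lat[0] is 0.30000000000000004
b = {"time": [2.0, 3.0], "lat": [0.3, 1.0]}
out = override_coords([a, b])
print(out[1]["lat"] == a["lat"])  # True
print(out[1]["time"])             # [2.0, 3.0]
```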

Contributors to this release#

(GitHub contributors page for this release)

v2019.5.11#

Features#

Contributors to this release#

(GitHub contributors page for this release)

v2019.4.26#

Features#

Bug Fixes#

Contributors to this release#

(GitHub contributors page for this release)

v2019.2.28#

Features#

Bug Fixes#

  • Fix a bug in catalog building and move exclude_dirs to locations (GH#33) @matt-long

Internal Changes#