Changelog¶

Intake-esm v2020.12.18¶

(full changelog)

Bug Fixes¶

🐛 Disable _requested_variables for single variable assets #306 (@andersy005)

Internal Changes¶

Update changelog in preparation for new release #307 (@andersy005)
Use github-activity to update list of contributors #302 (@andersy005)
Add nbqa & Update prettier commit hooks #300 (@andersy005)
Update pre-commit and GH actions #299 (@andersy005)

Contributors to this release¶

(GitHub contributors page for this release)

@andersy005 | @dcherian | @jbusecke | @naomi-henderson | @Recalculate

Intake-esm v2020.11.4¶

Features¶

✨ Support multiple variable assets/files. (GH#287) @andersy005
✨ Add utility function for printing version information. (GH#284) @andersy005

Breaking Changes¶

💥 Remove unnecessary logging bits. (GH#297) @andersy005

Bug Fixes¶

✔️ Fix test failures. (GH#280) @andersy005
Fix TypeError bug in .search() method when using wildcard and regular expressions. (GH#285) @andersy005
Use file-like objects when dealing with netCDF in the cloud. (GH#292) @andersy005

Documentation¶

📚 Fix ReadtheDocs documentation builds. (GH#286) @andersy005
📚 Migrate docs from reStructuredText to Markdown via myst-parser. (GH#296) @andersy005
🔨 Refactor documentation contents & add new notebooks. (GH#298) @andersy005

Internal Changes¶

Fix import errors due to intake/intake#526. (GH#282) @andersy005
Migrate CI from CircleCI to GitHub Actions. (GH#283) @andersy005
Use mamba to speed up CI testing. (GH#293) @andersy005
Enable dependabot updates. (GH#294) @andersy005
Test against Python 3.9. (GH#295) @andersy005

Contributors to this release¶

(GitHub contributors page for this release)

@andersy005 | @dcherian | @jbusecke | @jukent | @sherimickelson

Intake-esm v2020.8.15¶

Features¶

Support regular expression objects in search() (GH#236) @andersy005
Support wildcard expressions in search() (GH#259) @andersy005
Expose attributes used when aggregating/combining datasets (GH#268) @andersy005
Support turning aggregations off (GH#269) @andersy005
Improve error messages (GH#270) @andersy005
Expose aggregations options passed to xarray during datasets aggregation (GH#272) @andersy005
Reset _entries dict after updating aggregations (GH#274) @andersy005
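To illustrate what wildcard and regular-expression queries mean in practice, here is a minimal standard-library sketch. It is not intake-esm's implementation: the `matches` and `search` helpers, the sample rows, and the use of `fnmatch` for wildcards are all assumptions made for illustration, and the library's own matching rules may differ.

```python
import fnmatch
import re

def matches(value, pattern):
    """Return True if `value` satisfies `pattern`.

    A compiled regular expression is applied with `search`; any other
    pattern is treated as a shell-style wildcard string (a string with no
    wildcard characters therefore behaves as an exact match).
    """
    if isinstance(pattern, re.Pattern):
        return pattern.search(value) is not None
    return fnmatch.fnmatchcase(value, pattern)

def search(rows, **query):
    """Keep rows where every queried column matches at least one pattern."""
    normalized = {
        column: [patterns] if isinstance(patterns, (str, re.Pattern)) else list(patterns)
        for column, patterns in query.items()
    }
    return [
        row
        for row in rows
        if all(
            any(matches(row[column], p) for p in patterns)
            for column, patterns in normalized.items()
        )
    ]

# Hypothetical catalog rows, for illustration only.
rows = [
    {"variable": "tas", "experiment": "historical"},
    {"variable": "tasmax", "experiment": "ssp585"},
    {"variable": "pr", "experiment": "ssp245"},
]

wildcard_hits = search(rows, variable="tas*")
regex_hits = search(rows, experiment=re.compile(r"^ssp\d+$"))
```

Here `wildcard_hits` holds the `tas` and `tasmax` rows, and `regex_hits` holds the two `ssp` rows.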

Documentation¶

Update to_dataset_dict() docstring to inform users how the cdf_kwargs argument is used with regard to chunking (GH#278) @bonnland

Internal Changes¶

Update pre-commit hooks & GitHub actions (GH#260) @andersy005
Update badges (GH#258) @andersy005
Update upstream environment (GH#263) @andersy005
Refactor search functionality into a standalone module (GH#267) @andersy005
Fix dask/concurrent.futures parallelism (GH#271) @andersy005
Increase test coverage to ~100% (GH#273) @andersy005
Bump minimum required versions (GH#275) @andersy005

Contributors to this release¶

(GitHub contributors page for this release)

Intake-esm v2020.6.11¶

Features¶

Add df property setter (GH#247) @andersy005

Documentation¶

Use Pandas sphinx theme (GH#244) @andersy005
Update documentation tutorial (GH#252) @andersy005 & @charlesbluca

Internal Changes¶

Fix anti-patterns and other bug risks (GH#251) @andersy005
Sync with intake’s Entry unification (GH#249) @andersy005

Contributors to this release¶

(GitHub contributors page for this release)

@andersy005 | @jhamman | @martindurant

Intake-esm v2020.5.21¶

Features¶

Provide informative message/warnings from empty queries. (GH#235) @andersy005
Replace tqdm progressbar with fastprogress. (GH#238) @andersy005
Add catalog_file attribute to esm_datastore class. (GH#240) @andersy005

Contributors to this release¶

(GitHub contributors page for this release)

Intake-esm v2020.5.01¶

Features¶

Add html representation for the catalog object. (GH#229) @andersy005
Move logic for assets aggregation into ESMGroupDataSource() and add a few basic dict-like methods (keys(), __len__(), __getitem__(), __contains__()) to the catalog object. (GH#194) @andersy005 & @jhamman & @kmpaul
Support columns with iterables in unique() and nunique(). (GH#223) @andersy005
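As a rough picture of what supporting iterable-valued columns in unique() and nunique() involves, the sketch below flattens such cells before collecting distinct values. The `unique_values` and `nunique_values` helpers and the sample rows are hypothetical and are not taken from intake-esm.

```python
def unique_values(rows, column):
    """Collect distinct values of `column`, flattening list/tuple/set cells.

    First-seen order is preserved by using a dict as an ordered set.
    """
    seen = {}
    for row in rows:
        value = row[column]
        # Scalars (including strings) are wrapped so they can be iterated
        # alongside cells that already hold several values.
        items = value if isinstance(value, (list, tuple, set, frozenset)) else [value]
        for item in items:
            seen.setdefault(item, None)
    return list(seen)

def nunique_values(rows, column):
    """Count the distinct values found by `unique_values`."""
    return len(unique_values(rows, column))

# Hypothetical rows where one asset may hold several variables.
rows = [
    {"variable": ["tas", "pr"]},
    {"variable": ("pr", "psl")},
    {"variable": "tas"},
]

values = unique_values(rows, "variable")
count = nunique_values(rows, "variable")
```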

Bug Fixes¶

Revert back to using concurrent.futures to address failures due to dask’s distributed scheduler. (GH#225) & (GH#226)

Internal Changes¶

Increase test coverage. (GH#222) @andersy005

Contributors to this release¶

(GitHub contributors page for this release)

Intake-esm v2020.3.16¶

Features¶

Support single file catalogs. (GH#195) @bonnland
Add progressbar argument to to_dataset_dict(). This allows the user to override the default progressbar value used during the class instantiation. (GH#204) @andersy005
Enhanced search: enforce query criteria via the require_all_on argument of the search() method. (GH#202) & (GH#207) & (GH#209) @andersy005 & @jbusecke
Support relative paths for catalog files. (GH#208) @andersy005
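The idea behind require_all_on can be sketched in plain Python: after the usual filtering, rows are grouped on the given columns and a group is kept only if it covers every requested value. This is a minimal sketch under that reading of the feature; the `search_require_all` helper and the sample rows are hypothetical, and the library's actual behavior may differ in detail.

```python
from collections import defaultdict

def search_require_all(rows, require_all_on, **query):
    """Filter `rows` by `query`, then keep only complete groups.

    Rows are grouped by the columns in `require_all_on`; a group survives
    only if, for every queried column, it contains all requested values.
    """
    query = {c: [v] if isinstance(v, str) else list(v) for c, v in query.items()}

    # Step 1: ordinary filtering (a row matches if each queried column
    # holds one of the requested values).
    matched = [r for r in rows if all(r[c] in vals for c, vals in query.items())]

    # Step 2: group the matches on the `require_all_on` columns.
    groups = defaultdict(list)
    for r in matched:
        groups[tuple(r[c] for c in require_all_on)].append(r)

    # Step 3: drop groups that miss any requested value.
    kept = []
    for members in groups.values():
        if all({m[c] for m in members} >= set(vals) for c, vals in query.items()):
            kept.extend(members)
    return kept

# Hypothetical rows: model "A" provides both variables, model "B" only one.
rows = [
    {"model": "A", "variable": "tas"},
    {"model": "A", "variable": "pr"},
    {"model": "B", "variable": "tas"},
]

result = search_require_all(rows, ["model"], variable=["tas", "pr"])
```

In this example only the rows for model "A" remain, because model "B" lacks `pr`.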

Bug Fixes¶

Use raw path if protocol is None. (GH#210) @andersy005

Internal Changes¶

GitHub Action to publish package to PyPI on release. (GH#190) @andersy005
Remove unnecessary inheritance. (GH#193) @andersy005
Update linting GitHub action to run on all pull requests. (GH#196) @andersy005

Contributors to this release¶

(GitHub contributors page for this release)

Intake-esm v2019.12.13¶

Features¶

Add optional preprocess argument to to_dataset_dict() (GH#155) @matt-long
Allow users to disable dataset aggregations by passing aggregate=False to to_dataset_dict() (GH#164) @matt-long
Avoid manipulating dataset coordinates by using data_vars=varname when concatenating datasets via xarray.concat() (GH#174) @andersy005
Support loading netCDF assets from OPeNDAP endpoints (GH#176) @andersy005
Add serialize() method to serialize collection/catalog (GH#179) @andersy005
Allow passing extra storage options to the backend file system via to_dataset_dict() (GH#180) @bonnland
Provide informational messages to the user via the logging module (GH#186) @andersy005

Bug Fixes¶

Remove the caching option (GH#158) @matt-long
Preserve encoding when aggregating datasets (GH#161) @matt-long
Sort aggregations to make sure intake_esm.merge_util.join_existing is always done before intake_esm.merge_util.join_new (GH#171) @andersy005
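The ordering fix above can be pictured as a stable sort keyed on the aggregation type. The snippet is a sketch only: the `sort_aggregations` helper, the `PRIORITY` table, and the sample entries are hypothetical, and placing other aggregation types last is an arbitrary choice made here, not something the changelog states.

```python
# Lower number means "apply earlier". Only the relative order of
# join_existing and join_new comes from the changelog entry.
PRIORITY = {"join_existing": 0, "join_new": 1}

def sort_aggregations(aggregations):
    """Return aggregations with join_existing entries ahead of join_new.

    `sorted` is stable, so entries of the same type keep their original
    relative order. Types not listed in PRIORITY are placed last
    (an assumption of this sketch).
    """
    return sorted(aggregations, key=lambda agg: PRIORITY.get(agg["type"], 2))

# Hypothetical aggregation entries.
aggregations = [
    {"type": "join_new", "attribute_name": "member_id"},
    {"type": "union", "attribute_name": "variable"},
    {"type": "join_existing", "attribute_name": "time_range"},
]

ordered = sort_aggregations(aggregations)
```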

Documentation¶

Add example for preprocessing function (GH#168) @jbusecke
Add FAQ style document to documentation (GH#182) & (GH#177) @andersy005 & @jhamman

Internal Changes¶

Simplify group loading by using concurrent.futures (GH#185) @andersy005

Contributors to this release¶

(GitHub contributors page for this release)

Intake-esm v2019.10.15¶

Features¶

Rewrite intake-esm’s core based on the Earth System Model Collection specification (esm-collection-spec) (GH#135) @andersy005, @matt-long, @rabernat

Breaking changes¶

Replaced intake_esm.core.esm_metadatastore with intake_esm.core.esm_datastore, see the API reference for more details.
intake-esm won’t build collection catalogs anymore. intake-esm now expects an ESM collection JSON file as input. This JSON should conform to the Earth System Model Collection specification.
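For orientation, the sketch below writes out a small collection description of the kind this release expects as input. The field layout follows the esm-collection-spec as best recalled here, and every value (the id, file name, and column names) is hypothetical, so the specification itself remains the reference for the exact schema.

```python
import json

# Illustrative collection description. Field names follow the
# esm-collection-spec layout as recalled; all values are hypothetical.
collection = {
    "esmcat_version": "0.1.0",
    "id": "sample-collection",
    "description": "Hypothetical example collection",
    # CSV catalog with one row per asset (file).
    "catalog_file": "sample-collection.csv",
    # Columns of the CSV that describe each asset.
    "attributes": [
        {"column_name": "experiment"},
        {"column_name": "variable"},
    ],
    # Column holding the asset location, and the asset format.
    "assets": {"column_name": "path", "format": "netcdf"},
    # How assets are grouped and combined into datasets.
    "aggregation_control": {
        "variable_column_name": "variable",
        "groupby_attrs": ["experiment"],
        "aggregations": [
            {"type": "union", "attribute_name": "variable"},
        ],
    },
}

# Serialize to JSON text and read it back to confirm it round-trips.
text = json.dumps(collection, indent=2)
loaded = json.loads(text)
```

With intake-esm installed, a JSON file like this would typically be passed to `intake.open_esm_datastore()`; that step is omitted here so the sketch runs with the standard library alone.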

Contributors to this release¶

(GitHub contributors page for this release)

Intake-esm v2019.8.23¶

Features¶

Add mistral data holdings to intake-esm-datastore (GH#133) @aaronspring
Add support for NA-CORDEX data holdings. (GH#115) @jukent
Replace .csv with netCDF as serialization format when saving the built collection to disk. With netCDF, we can record very useful information into the global attributes of the netCDF dataset. (GH#119) @andersy005
Add string representation of ESMMetadataStoreCatalog object (GH#122) @andersy005

Automatically build missing collections by calling esm_metadatastore(collection_name="GLADE-CMIP5") when the specified collection is part of the curated collections in intake-esm-datastore. (GH#124) @andersy005

In [1]: import intake

In [2]: col = intake.open_esm_metadatastore(collection_name="GLADE-CMIP5")

In [3]: # if "GLADE-CMIP5" collection isn't built already, the above is equivalent to:

In [4]: col = intake.open_esm_metadatastore(collection_input_definition="GLADE-CMIP5")

Revert back to using official DRS attributes when building CMIP5 and CMIP6 collections. (GH#126) @andersy005
Add .df property for interfacing with the built collection via a dataframe, to maintain backwards compatibility. (GH#127) @andersy005

Add unique() and nunique() methods for summarizing count and unique values in a collection. (GH#128) @andersy005

In [1]: import intake

In [2]: col = intake.open_esm_metadatastore(collection_name="GLADE-CMIP5")

In [3]: col
Out[3]: GLADE-CMIP5 collection catalogue with 615853 entries:
          > 3 resource(s)
          > 1 resource_type(s)
          > 1 direct_access(s)
          > 1 activity(s)
          > 218 ensemble_member(s)
          > 51 experiment(s)
          > 312093 file_basename(s)
          > 615853 file_fullpath(s)
          > 6 frequency(s)
          > 25 institute(s)
          > 15 mip_table(s)
          > 53 model(s)
          > 7 modeling_realm(s)
          > 3 product(s)
          > 9121 temporal_subset(s)
          > 454 variable(s)
          > 489 version(s)

In [4]: col.nunique()

resource                3
resource_type           1
direct_access           1
activity                1
ensemble_member       218
experiment             51
file_basename      312093
file_fullpath      615853
frequency               6
institute              25
mip_table              15
model                  53
modeling_realm          7
product                 3
temporal_subset      9121
variable              454
version               489
dtype: int64

In [5]: col.unique(columns=['frequency', 'modeling_realm'])

{'frequency': {'count': 6, 'values': ['mon', 'day', '6hr', 'yr', '3hr', 'fx']},
 'modeling_realm': {'count': 7, 'values': ['atmos', 'land', 'ocean', 'seaIce', 'ocnBgchem',
                                           'landIce', 'aerosol']}}

Bug Fixes¶

For CMIP6, extract grid_label from directory path instead of file name. (GH#127) @andersy005

Contributors to this release¶

(GitHub contributors page for this release)

Intake-esm v2019.8.5¶

Features¶

Support building collections using inputs from intake-esm-datastore repository. (GH#79) @andersy005
Ensure that requested files are available locally before loading data into xarray datasets. (GH#82) @andersy005 and @matt-long
Split collection definitions out of config. (GH#83) @matt-long
Add intake-esm-builder, a CLI tool for building collection from the command line. (GH#89) @andersy005
Add support for CESM-LENS data holdings residing in AWS S3. (GH#98) @andersy005
Sort collection upon creation according to order-by-columns, pass urlpath through stack for use in parsing collection filenames (GH#100) @pbranson

Bug Fixes¶

Fix bug in _list_files_hsi() to return list instead of filter object. (GH#81) @matt-long and @andersy005
Fix cesm._get_file_attrs to break out of the loop when the longest stream is matched. (GH#80) @matt-long
Restore non_dim_coords to data variables all the time. (GH#90) @andersy005
Fix bug in intake_esm/cesm.py that caused intake-esm to exclude hourly (1hr, 6hr, etc.) CESM-LE data. (GH#110) @andersy005
Fix bugs in intake_esm/cmip.py that caused improper regular expression matching for table_id and grid_label. (GH#113) & (GH#111) @naomi-henderson and @andersy005

Internal Changes¶

Refactor existing functionality to make intake-esm robust and extensible. (GH#77) @andersy005
Add aggregate._override_coords function to override dimension coordinates, except time, in case there are floating-point precision differences. (GH#108) @andersy005
Fix CESM-LE ice component peculiarities that caused intake-esm to load data improperly. The fix separates variables for ice component into two separate components:
- ice_sh: for southern hemisphere
- ice_nh: for northern hemisphere
(GH#114) @andersy005

Contributors to this release¶

(GitHub contributors page for this release)

Intake-esm v2019.5.11¶

Features¶

Add implementation for The Gridded Meteorological Ensemble Tool (GMET) data holdings (GH#61) @andersy005
Allow users to specify exclude_dirs for CMIP collections (GH#63) & (GH#62) @andersy005
Keep CMIP6 tracking_id in merge_keys (GH#67) @andersy005
Add implementation for ERA5 datasets (GH#68) @andersy005

Contributors to this release¶

(GitHub contributors page for this release)

Intake-esm v2019.4.26¶

Features¶

Add implementations for CMIPCollection and CMIPSource (GH#38) @andersy005
Add support for CMIP6 data (GH#46) @andersy005
Add implementation for The Max Planck Institute Grand Ensemble (MPI-GE) data holdings (GH#52) & (GH#51) @aaronspring and @andersy005
Return dictionary of datasets all the time for consistency (GH#56) @andersy005

Bug Fixes¶

Include multiple netcdf files in same subdirectory (GH#55) & (GH#54) @naomi-henderson and @andersy005

Contributors to this release¶

(GitHub contributors page for this release)

Intake-esm v2019.2.28¶

Features¶

Allow CMIP integration (GH#35) @andersy005

Bug Fixes¶

Fix bug on build catalog and move exclude_dirs to locations (GH#33) @matt-long

Internal Changes¶

Change Logger, update dev-environment dependencies, and formatting fix in input.yml (GH#31) @matt-long
Update CircleCI workflow (GH#32) @andersy005
Rename package from intake-cesm to intake-esm (GH#34) @andersy005
