Changelog#
Unreleased#
New features added#
Add opendap as a possible data format. #570 (@aulemahal)
v2022.9.18#
New features added#
Add keys_info() method #515 (@andersy005)
Discard catalog_file attribute after a search #514 (@andersy005)
ENH: Add .to_datatree() method and remove to collection #512 (@mgrover1)
FIX: Add fixes to allow reading kerchunk catalog #485 (@mgrover1)
Allow saving catalog via fsspec protocols #469 (@andersy005)
Change import to have configurable prefixes via set_options #460 (@aulemahal)
Add get_available_cats() method to tutorial.py #458 (@jukent)
Support for the grid mapping attribute and variable #449 (@RondeauG)
Subset derived variable registry only with used derived variables #446 (@aulemahal)
Add last_updated to the ESMCatalogModel #442 (@andersy005)
Derived Catalog: test for all needed variables and skip if existing #441 (@aulemahal)
Support iterable columns with require_all_on #435 (@aulemahal)
Improve search with derived variables #428 (@aulemahal)
Expose pd.DataFrame.to_csv and json.dump keyword arguments #421 (@andersy005)
Search returns same class as self - allowing subclassing #417 (@aulemahal)
Added detection of '*' to enable opening of multi-file datasets. #395 (@andersy005)
Add skip_on_error option when loading datasets #390 (@andersy005)
Add query to derived variables #389 (@andersy005)
Add method for loading registry from a Python module #386 (@andersy005)
Add functionality for derived variables #379 (@andersy005)
Use pydantic.validate_arguments decorator to validate individual functions #377 (@andersy005)
Ensure multi variable catalogs are parsed properly #375 (@andersy005)
Fix __repr__ and __repr_html__ #374 (@andersy005)
Fix catalog serialization #373 (@andersy005)
Add ESMDataSource #372 (@andersy005)
Add Query Model #370 (@andersy005)
Use ESMCatModel Pydantic model #368 (@andersy005)
Update pydantic models #367 (@andersy005)
Bugs fixed#
Maintenance and upkeep improvements#
Update isort configuration to use profile=black #528 (@andersy005)
upgrade dependencies #517 (@andersy005)
use micromamba in CI #500 (@andersy005)
Drop support for Python 3.7 #499 (@andersy005)
Bump styfle/cancel-workflow-action from 0.9.1 to 0.10.0 #486 (@dependabot)
Bump codecov/codecov-action from 2.1.0 to 3.0.0 #472 (@dependabot)
[pre-commit.ci] pre-commit autoupdate #467 (@pre-commit-ci)
[pre-commit.ci] pre-commit autoupdate #466 (@pre-commit-ci)
[pre-commit.ci] pre-commit autoupdate #463 (@pre-commit-ci)
[pre-commit.ci] pre-commit autoupdate #459 (@pre-commit-ci)
Bump actions/setup-python from 2 to 3 #451 (@dependabot)
Bump actions/checkout from 2 to 3 #450 (@dependabot)
[pre-commit.ci] pre-commit autoupdate #438 (@pre-commit-ci)
Add pyupgrade to pre-commit hooks #433 (@andersy005)
Pin importlib-metadata to 2.0 #422 (@andersy005)
[pre-commit.ci] pre-commit autoupdate #410 (@pre-commit-ci)
[pre-commit.ci] pre-commit autoupdate #402 (@pre-commit-ci)
[pre-commit.ci] pre-commit autoupdate #396 (@pre-commit-ci)
[pre-commit.ci] pre-commit autoupdate #392 (@pre-commit-ci)
Upgrade setup/build requirements #383 (@andersy005)
Update Pull Request Template #376 (@andersy005)
[pre-commit.ci] pre-commit autoupdate #371 (@pre-commit-ci)
Update pre-commit hooks #369 (@andersy005)
[pre-commit.ci] pre-commit autoupdate #366 (@pre-commit-ci)
Bump codecov/codecov-action from 2.0.3 to 2.1.0 #365 (@dependabot)
Documentation improvements#
Update how to guides #523 (@andersy005)
Reorganize the documentation #521 (@andersy005)
add sphinx-design to list of extensions #505 (@andersy005)
Docs improvement: use sphinx-design tabs #504 (@andersy005)
updating google-cmip6 cat to have more datasets #464 (@jukent)
replace references to collection with catalog #457 (@jukent)
Reorganize docs #413 (@andersy005)
Switch over to the furo theme #412 (@andersy005)
Use mamba when building docs on Readthedocs #408 (@andersy005)
Other merged PRs#
Check if all values in a groupby column are NaN or not NaN #526 (@andersy005)
[pre-commit.ci] pre-commit autoupdate #519 (@pre-commit-ci)
[pre-commit.ci] pre-commit autoupdate #516 (@pre-commit-ci)
Ensure global attributes added by intake-esm are compatible with netCDF and Zarr #509 (@andersy005)
ensure storage_options are passed to data loader #508 (@andersy005)
[pre-commit.ci] pre-commit autoupdate #507 (@pre-commit-ci)
[pre-commit.ci] pre-commit autoupdate #497 (@pre-commit-ci)
[pre-commit.ci] pre-commit autoupdate #494 (@pre-commit-ci)
Bump pypa/gh-action-pypi-publish from 1.5.0 to 1.5.1 #493 (@dependabot)
[pre-commit.ci] pre-commit autoupdate #492 (@pre-commit-ci)
[pre-commit.ci] pre-commit autoupdate #489 (@pre-commit-ci)
[pre-commit.ci] pre-commit autoupdate #488 (@pre-commit-ci)
[pre-commit.ci] pre-commit autoupdate #484 (@pre-commit-ci)
[pre-commit.ci] pre-commit autoupdate #483 (@pre-commit-ci)
Bump actions/setup-python from 3 to 4 #482 (@dependabot)
[pre-commit.ci] pre-commit autoupdate #476 (@pre-commit-ci)
Bump codecov/codecov-action from 3.0.0 to 3.1.0 #475 (@dependabot)
[pre-commit.ci] pre-commit autoupdate #473 (@pre-commit-ci)
Ensure storage_options are passed to esmcat.save() #471 (@andersy005)
Ensure fsspec storage options are propagated to xr.open_dataset #453 (@jukent)
Remove unused @df.setter #440 (@andersy005)
Fix dependency version conflict #432 (@andersy005)
Bump pypa/gh-action-pypi-publish from 1.4.2 to 1.5.0 #431 (@dependabot)
[pre-commit.ci] pre-commit autoupdate #430 (@pre-commit-ci)
Fix CI failures: don't turn warnings into errors #429 (@andersy005)
Support mixed data formats #416 (@aulemahal)
[pre-commit.ci] pre-commit autoupdate #411 (@pre-commit-ci)
Exclude buggy xarray versions #406 (@andersy005)
Add Python 3.10 to CI #400 (@andersy005)
Remove linting workflow: use pre-commit.ci #399 (@andersy005)
ESMCat pydantic Model: make id optional #398 (@andersy005)
Properly check whether dataframe is empty #394 (@andersy005)
Fix .nunique() and .unique() methods #391 (@andersy005)
Rename _types module to cat #381 (@andersy005)
[pre-commit.ci] pre-commit autoupdate #364 (@pre-commit-ci)
[pre-commit.ci] pre-commit autoupdate #363 (@pre-commit-ci)
Bump codecov/codecov-action from 2.0.2 to 2.0.3 #362 (@dependabot)
Update changelog in prep for new release #358 (@andersy005)
Contributors to this release#
(GitHub contributors page for this release)
@andersy005 | @aulemahal | @d70-t | @dependabot | @jukent | @MackenzieBlanusa | @mgrover1 | @pre-commit-ci | @RondeauG | @wachsylon
v2021.8.17#
Enhancements made#
Add pydantic models to facilitate data validation #347 (@andersy005)
Maintenance and upkeep improvements#
[pre-commit.ci] pre-commit autoupdate #355 (@pre-commit-ci)
skip cmip6_preprocessing tests for the time being #354 (@andersy005)
Bump styfle/cancel-workflow-action from 0.9.0 to 0.9.1 #348 (@dependabot)
Update pre-commit hooks #346 (@andersy005)
Bump codecov/codecov-action from 1 to 2.0.2 #345 (@dependabot)
Disable workflows on Forks #342 (@andersy005)
Add missing test dependency #340 (@andersy005)
Code refactoring #338 (@andersy005)
Bump pre-commit/action from v2.0.2 to v2.0.3 #337 (@dependabot)
Bump styfle/cancel-workflow-action from 0.8.0 to 0.9.0 #334 (@dependabot)
Bump pre-commit/action from v2.0.0 to v2.0.2 #333 (@dependabot)
Bump styfle/cancel-workflow-action from 0.7.0 to 0.8.0 #322 (@dependabot)
Fix CI #321 (@andersy005)
Fix Tests: Use a publicly available s3 object #318 (@andersy005)
Bump styfle/cancel-workflow-action from 0.6.0 to 0.7.0 #316 (@dependabot)
Documentation improvements#
Docs: Execute all notebooks #341 (@andersy005)
Enable comments in docs via sphinx-comments #326 (@andersy005)
Other merged PRs#
Contributors to this release#
v2021.1.15#
Bug Fixes#
Fix memory error when computing unique values #313 (@andersy005)
Breaking Changes#
Drop support for Python 3.6 #311 (@andersy005)
Internal Changes#
Upgrade dependencies & pin versions in CI environment #314 (@andersy005)
Fix failing upstream-dev CI #310 (@andersy005)
Documentation#
Update MPI catalogs for MISTRAL #308 (@aaronspring)
Contributors to this release#
v2020.12.18#
Bug Fixes#
Disable _requested_variables for single variable assets #306 (@andersy005)
Internal Changes#
Update changelog in preparation for new release #307 (@andersy005)
Use github-activity to update list of contributors #302 (@andersy005)
Add nbqa & Update prettier commit hooks #300 (@andersy005)
Update pre-commit and GH actions #299 (@andersy005)
Contributors to this release#
(GitHub contributors page for this release)
@andersy005 | @dcherian | @jbusecke | @naomi-henderson | @Recalculate
v2020.11.4#
Features#
Support multiple variable assets/files. (GH#287) @andersy005
Add utility function for printing version information. (GH#284) @andersy005
Breaking Changes#
Remove unnecessary logging bits. (GH#297) @andersy005
Bug Fixes#
Fix test failures. (GH#280) @andersy005
Fix TypeError bug in .search() method when using wildcard and regular expressions. (GH#285) @andersy005
Use file like object when dealing with netcdf in the cloud. (GH#292) @andersy005
Documentation#
Fix ReadtheDocs documentation builds. (GH#286) @andersy005
Migrate docs from restructured text to markdown via myst-parsers. (GH#296) @andersy005
Refactor documentation contents & add new notebooks. (GH#298) @andersy005
Internal Changes#
Fix import errors due to intake/intake#526. (GH#282) @andersy005
Migrate CI from CircleCI to GitHub Actions. (GH#283) @andersy005
Use mamba to speed up CI testing. (GH#293) @andersy005
Enable dependabot updates. (GH#294) @andersy005
Test against Python 3.9. (GH#295) @andersy005
Contributors to this release#
(GitHub contributors page for this release)
@andersy005 | @dcherian | @jbusecke | @jukent | @sherimickelson
v2020.8.15#
Features#
Support regular expression objects in search() (GH#236) @andersy005
Support wildcard expressions in search() (GH#259) @andersy005
Expose attributes used when aggregating/combining datasets (GH#268) @andersy005
Support turning aggregations off (GH#269) @andersy005
Improve error messages (GH#270) @andersy005
Expose aggregations options passed to xarray during datasets aggregation (GH#272) @andersy005
Reset _entries dict after updating aggregations (GH#274) @andersy005
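As a rough illustration of the two search styles listed above (regular expression objects and wildcard expressions), the sketch below filters a list of records in plain Python. It is not intake-esm's implementation: the rows, the column names, and the helper functions matches and search are hypothetical and only show the general idea.

```python
import re
from fnmatch import fnmatchcase

# Hypothetical catalog rows; column names are illustrative only.
ROWS = [
    {"variable": "tas", "experiment": "historical"},
    {"variable": "tasmax", "experiment": "historical"},
    {"variable": "pr", "experiment": "ssp585"},
]


def matches(value, pattern):
    """Return True if `value` satisfies `pattern`.

    `pattern` may be a compiled regular expression or a string. Strings are
    treated as shell-style wildcards, so a plain string without wildcard
    characters must match exactly.
    """
    if isinstance(pattern, re.Pattern):
        return pattern.search(value) is not None
    return fnmatchcase(value, pattern)


def search(rows, **query):
    """Keep the rows where every queried column matches its pattern."""
    return [
        row
        for row in rows
        if all(matches(row[col], pat) for col, pat in query.items())
    ]


wildcard_hits = search(ROWS, variable="tas*")
regex_hits = search(ROWS, variable=re.compile(r"^(tas|pr)$"))
```

Here the wildcard query keeps the tas and tasmax rows, while the compiled regular expression keeps the tas and pr rows.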
Documentation#
Internal Changes#
Update pre-commit hooks & GitHub actions (GH#260) @andersy005
Update badges (GH#258) @andersy005
Update upstream environment (GH#263) @andersy005
Refactor search functionality into a standalone module (GH#267) @andersy005
Fix dask/concurrent.futures parallelism (GH#271) @andersy005
Increase test coverage to ~100% (GH#273) @andersy005
Bump minimum required versions (GH#275) @andersy005
Contributors to this release#
(GitHub contributors page for this release)
@andersy005 | @bonnland | @dcherian | @jeffdlb | @jukent | @kmpaul | @markusritschel | @martindurant | @matt-long
v2020.6.11#
Features#
Add df property setter (GH#247) @andersy005
Documentation#
Use Pandas sphinx theme (GH#244) @andersy005
Update documentation tutorial (GH#252) @andersy005 & @charlesbluca
Internal Changes#
Fix anti-patterns and other bug risks (GH#251) @andersy005
Sync with intake's Entry unification (GH#249) @andersy005
Contributors to this release#
v2020.5.21#
Features#
Provide informative message/warnings from empty queries. (GH#235) @andersy005
Replace tqdm progressbar with fastprogress. (GH#238) @andersy005
Add catalog_file attribute to esm_datastore class. (GH#240) @andersy005
Contributors to this release#
(GitHub contributors page for this release)
@andersy005 | @bonnland | @dcherian | @jbusecke | @jeffdlb | @kmpaul | @markusritschel
v2020.5.01#
Features#
Add html representation for the catalog object. (GH#229) @andersy005
Move logic for assets aggregation into ESMGroupDataSource() and add a few basic dict-like methods (keys(), len(), getitem(), contains()) to the catalog object. (GH#194) @andersy005 & @jhamman & @kmpaul
Support columns with iterables in unique() and nunique(). (GH#223) @andersy005
Bug Fixes#
Internal Changes#
Increase test coverage. (GH#222) @andersy005
Contributors to this release#
(GitHub contributors page for this release)
@andersy005 | @bonnland | @dcherian | @jbusecke | @jhamman | @kmpaul | @sherimickelson
v2020.3.16#
Features#
Add progressbar argument to to_dataset_dict(). This allows the user to override the default progressbar value used during the class instantiation. (GH#204) @andersy005
Enhanced search: enforce query criteria via the require_all_on argument of the search() method. (GH#202) & (GH#207) & (GH#209) @andersy005 & @jbusecke
Support relative paths for catalog files. (GH#208) @andersy005
Bug Fixes#
Use raw path if protocol is None. (GH#210) @andersy005
Internal Changes#
GitHub Action to publish package to PyPI on release. (GH#190) @andersy005
Remove unnecessary inheritance. (GH#193) @andersy005
Update linting GitHub action to run on all pull requests. (GH#196) @andersy005
Contributors to this release#
(GitHub contributors page for this release)
@andersy005 | @bonnland | @dcherian | @jbusecke | @jhamman | @kmpaul
v2019.12.13#
Features#
Add optional preprocess argument to to_dataset_dict() (GH#155) @matt-long
Allow users to disable dataset aggregations by passing aggregate=False to to_dataset_dict() (GH#164) @matt-long
Avoid manipulating dataset coordinates by using data_vars=varname when concatenating datasets via xarray.concat() (GH#174) @andersy005
Support loading netCDF assets from openDAP endpoints (GH#176) @andersy005
Add serialize() method to serialize collection/catalog (GH#179) @andersy005
Allow passing extra storage options to the backend file system via to_dataset_dict() (GH#180) @bonnland
Provide informational messages to the user via Logging module (GH#186) @andersy005
Bug Fixes#
Remove the caching option (GH#158) @matt-long
Preserve encoding when aggregating datasets (GH#161) @matt-long
Sort aggregations to make sure intake_esm.merge_util.join_existing is always done before intake_esm.merge_util.join_new (GH#171) @andersy005
Documentation#
Internal Changes#
Simplify group loading by using concurrent.futures (GH#185) @andersy005
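The general pattern of loading independent groups with concurrent.futures can be sketched as follows. This is an illustration of the standard-library technique only, not the code from the pull request: load_group is a stand-in loader, and the group keys and file paths are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def load_group(key, paths):
    """Stand-in loader: a real one would open and combine the files."""
    return key, len(paths)


def load_all(groups, max_workers=4):
    """Load every group concurrently and return a dict keyed by group name."""
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        futures = [
            executor.submit(load_group, key, paths)
            for key, paths in groups.items()
        ]
        for future in as_completed(futures):
            key, value = future.result()  # re-raises any loader exception
            results[key] = value
    return results


# Hypothetical group keys and file paths.
GROUPS = {"atm.monthly": ["a.nc", "b.nc"], "ocn.monthly": ["c.nc"]}
loaded = load_all(GROUPS)
```

Threads suit this kind of work when the loader is dominated by file or network I/O; results are collected as each future completes, so the order of completion does not matter.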
Contributors to this release#
(GitHub contributors page for this release)
@andersy005 | @bonnland | @dcherian | @jbusecke | @jhamman | @matt-long | @naomi-henderson | @Recalculate | @sebasblancogonz
v2019.10.15#
Features#
Rewrite intake-esm's core based on the esm-collection-spec Earth System Model Collection specification (GH#135) @andersy005, @matt-long, @rabernat
Breaking changes#
Replaced intake_esm.core.esm_metadatastore with intake_esm.core.esm_datastore; see the API reference for more details.
intake-esm won't build collection catalogs anymore. intake-esm now expects an ESM collection JSON file as input. This JSON should conform to the Earth System Model Collection specification.
Contributors to this release#
(GitHub contributors page for this release)
@aaronspring | @andersy005 | @bonnland | @dcherian | @n-henderson | @naomi-henderson | @rabernat
v2019.8.23#
Features#
Add mistral data holdings to intake-esm-datastore (GH#133) @aaronspring
Replace .csv with netCDF as serialization format when saving the built collection to disk. With netCDF, we can record very useful information into the global attributes of the netCDF dataset. (GH#119) @andersy005
Add string representation of ESMMetadataStoreCatalog object (GH#122) @andersy005
Automatically build missing collections by calling esm_metadatastore(collection_name="GLADE-CMIP5") when the specified collection is part of the curated collections in intake-esm-datastore. (GH#124) @andersy005
    In [1]: import intake
    In [2]: col = intake.open_esm_metadatastore(collection_name="GLADE-CMIP5")
    In [3]: # if "GLADE-CMIP5" collection isn't built already, the above is equivalent to:
    In [4]: col = intake.open_esm_metadatastore(collection_input_definition="GLADE-CMIP5")
Revert back to using official DRS attributes when building CMIP5 and CMIP6 collections. (GH#126) @andersy005
Add .df property for interfacing with the built collection via dataframe to maintain backwards compatibility. (GH#127) @andersy005
Add unique() and nunique() methods for summarizing count and unique values in a collection. (GH#128) @andersy005
    In [1]: import intake
    In [2]: col = intake.open_esm_metadatastore(collection_name="GLADE-CMIP5")
    In [3]: col
    Out[3]: GLADE-CMIP5 collection catalogue with 615853 entries:
        > 3 resource(s)
        > 1 resource_type(s)
        > 1 direct_access(s)
        > 1 activity(s)
        > 218 ensemble_member(s)
        > 51 experiment(s)
        > 312093 file_basename(s)
        > 615853 file_fullpath(s)
        > 6 frequency(s)
        > 25 institute(s)
        > 15 mip_table(s)
        > 53 model(s)
        > 7 modeling_realm(s)
        > 3 product(s)
        > 9121 temporal_subset(s)
        > 454 variable(s)
        > 489 version(s)
    In [4]: col.nunique()
    resource                3
    resource_type           1
    direct_access           1
    activity                1
    ensemble_member       218
    experiment             51
    file_basename      312093
    file_fullpath      615853
    frequency               6
    institute              25
    mip_table              15
    model                  53
    modeling_realm          7
    product                 3
    temporal_subset      9121
    variable              454
    version               489
    dtype: int64
    In [5]: col.unique(columns=['frequency', 'modeling_realm'])
    {'frequency': {'count': 6, 'values': ['mon', 'day', '6hr', 'yr', '3hr', 'fx']},
     'modeling_realm': {'count': 7, 'values': ['atmos', 'land', 'ocean', 'seaIce', 'ocnBgchem', 'landIce', 'aerosol']}}
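The count-and-values summary shown in the session above can be reproduced on plain Python records. This is a minimal sketch rather than the library's implementation: the rows and the standalone unique and nunique functions are assumptions made for illustration.

```python
# Hypothetical records standing in for rows of a built collection.
ROWS = [
    {"frequency": "mon", "modeling_realm": "atmos"},
    {"frequency": "day", "modeling_realm": "atmos"},
    {"frequency": "mon", "modeling_realm": "ocean"},
]


def unique(rows, columns):
    """Summarize distinct values per column as {'count': n, 'values': [...]}."""
    summary = {}
    for col in columns:
        values = []
        for row in rows:
            # First-seen order is an arbitrary choice made for this sketch.
            if row[col] not in values:
                values.append(row[col])
        summary[col] = {"count": len(values), "values": values}
    return summary


def nunique(rows, columns):
    """Return only the number of distinct values per column."""
    return {col: info["count"] for col, info in unique(rows, columns).items()}


summary = unique(ROWS, ["frequency", "modeling_realm"])
counts = nunique(ROWS, ["frequency", "modeling_realm"])
```

With these three rows, both columns report two distinct values, and summary lists them in the order they first appear.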
Bug Fixes#
For CMIP6, extract grid_label from directory path instead of file name. (GH#127) @andersy005
Contributors to this release#
v2019.8.5#
Features#
Support building collections using inputs from intake-esm-datastore repository. (GH#79) @andersy005
Ensure that requested files are available locally before loading data into xarray datasets. (GH#82) @andersy005 and @matt-long
Split collection definitions out of config. (GH#83) @matt-long
Add intake-esm-builder, a CLI tool for building collection from the command line. (GH#89) @andersy005
Add support for CESM-LENS data holdings residing in AWS S3. (GH#98) @andersy005
Sort collection upon creation according to order-by-columns, pass urlpath through stack for use in parsing collection filenames (GH#100) @pbranson
Bug Fixes#
Fix bug in _list_files_hsi() to return list instead of filter object. (GH#81) @matt-long and @andersy005
cesm._get_file_attrs fixed to break loop when longest stream is matched. (GH#80) @matt-long
Restore non_dim_coords to data variables all the time. (GH#90) @andersy005
Fix bug in intake_esm/cesm.py that caused intake-esm to exclude hourly (1hr, 6hr, etc.) CESM-LE data. (GH#110) @andersy005
Fix bugs in intake_esm/cmip.py that caused improper regular expression matching for table_id and grid_label. (GH#113) & (GH#111) @naomi-henderson and @andersy005
Internal Changes#
Refactor existing functionality to make intake-esm robust and extensible. (GH#77) @andersy005
Add aggregate._override_coords function to override dim coordinates except time in case there's floating point precision difference. (GH#108) @andersy005
Fix CESM-LE ice component peculiarities that caused intake-esm to load data improperly. The fix separates variables for ice component into two separate components:
ice_sh: for southern hemisphere
ice_nh: for northern hemisphere
Contributors to this release#
v2019.5.11#
Features#
Add implementation for The Gridded Meteorological Ensemble Tool (GMET) data holdings (GH#61) @andersy005
Allow users to specify exclude_dirs for CMIP collections (GH#63) & (GH#62) @andersy005
Keep CMIP6 tracking_id in merge_keys (GH#67) @andersy005
Add implementation for ERA5 datasets (GH#68) @andersy005
Contributors to this release#
v2019.4.26#
Features#
Add implementations for CMIPCollection and CMIPSource (GH#38) @andersy005
Add support for CMIP6 data (GH#46) @andersy005
Add implementation for The Max Planck Institute Grand Ensemble (MPI-GE) data holdings (GH#52) & (GH#51) @aaronspring and @andersy005
Return dictionary of datasets all the time for consistency (GH#56) @andersy005
Bug Fixes#
Include multiple netcdf files in same subdirectory (GH#55) & (GH#54) @naomi-henderson and @andersy005
Contributors to this release#
v2019.2.28#
Features#
Allow CMIP integration (GH#35) @andersy005
Bug Fixes#
Fix bug on build catalog and move exclude_dirs to locations (GH#33) @matt-long
Internal Changes#
Change Logger, update dev-environment dependencies, and formatting fix in input.yml (GH#31) @matt-long
Update CircleCI workflow (GH#32) @andersy005
Rename package from intake-cesm to intake-esm (GH#34) @andersy005