(full changelog)
🐛 Disable _requested_variables for single variable assets #306 (@andersy005)
Update changelog in preparation for new release #307 (@andersy005)
Use github-activity to update list of contributors #302 (@andersy005)
Add nbqa & Update prettier commit hooks #300 (@andersy005)
Update pre-commit and GH actions #299 (@andersy005)
(GitHub contributors page for this release)
@andersy005 | @dcherian | @jbusecke | @naomi-henderson | @Recalculate
✨ Support multiple variable assets/files. (GH#287) @andersy005
✨ Add utility function for printing version information. (GH#284) @andersy005
💥 Remove unnecessary logging bits. (GH#297) @andersy005
✔️ Fix test failures. (GH#280) @andersy005
Fix TypeError bug in .search() method when using wildcard and regular expressions. (GH#285) @andersy005
Use file like object when dealing with netcdf in the cloud. (GH#292) @andersy005
📚 Fix ReadtheDocs documentation builds. (GH#286) @andersy005
📚 Migrate docs from restructured text to markdown via myst-parsers. (GH#296) @andersy005
🔨 Refactor documentation contents & add new notebooks. (GH#298) @andersy005
Fix import errors due to intake/intake#526. (GH#282) @andersy005
Migrate CI from CircleCI to GitHub Actions. (GH#283) @andersy005
Use mamba to speed up CI testing. (GH#293) @andersy005
Enable dependabot updates. (GH#294) @andersy005
Test against Python 3.9. (GH#295) @andersy005
@andersy005 | @dcherian | @jbusecke | @jukent | @sherimickelson
Support regular expression objects in search() (GH#236) @andersy005
Support wildcard expressions in search() (GH#259) @andersy005
Expose attributes used when aggregating/combining datasets (GH#268) @andersy005
Support turning aggregations off (GH#269) @andersy005
Improve error messages (GH#270) @andersy005
Expose aggregations options passed to xarray during datasets aggregation (GH#272) @andersy005
Reset _entries dict after updating aggregations (GH#274) @andersy005
Update the to_dataset_dict() docstring to explain how the cdf_kwargs argument is used with regard to chunking (GH#278) @bonnland
Update pre-commit hooks & GitHub actions (GH#260) @andersy005
Update badges (GH#258) @andersy005
Update upstream environment (GH#263) @andersy005
Refactor search functionality into a standalone module (GH#267) @andersy005
Fix dask/concurrent.futures parallelism (GH#271) @andersy005
Increase test coverage to ~100% (GH#273) @andersy005
Bump minimum required versions (GH#275) @andersy005
@andersy005 | @bonnland | @dcherian | @jeffdlb | @jukent | @kmpaul | @markusritschel | @martindurant | @matt-long
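The search() entries in this release (regular expression objects in GH#236, wildcard expressions in GH#259) can be pictured with a small standard-library sketch. It is not intake-esm's implementation: the `matches` and `search` helpers, the row layout, and the column names are all assumptions made for illustration.

```python
import fnmatch
import re


def matches(value, pattern):
    """Return True if `value` satisfies `pattern` (hypothetical helper)."""
    if isinstance(pattern, re.Pattern):
        # Compiled regular expression objects are searched directly.
        return pattern.search(value) is not None
    if any(ch in pattern for ch in "*?["):
        # Shell-style wildcards, matched case-sensitively.
        return fnmatch.fnmatchcase(value, pattern)
    # Plain strings must match exactly.
    return value == pattern


def search(rows, **query):
    """Keep rows in which every queried column matches at least one pattern."""
    selected = []
    for row in rows:
        keep = True
        for column, patterns in query.items():
            if not isinstance(patterns, list):
                patterns = [patterns]
            if not any(matches(row[column], p) for p in patterns):
                keep = False
                break
        if keep:
            selected.append(row)
    return selected


# Toy catalog rows; the column names are placeholders.
rows = [
    {"source_id": "CESM2", "variable_id": "tas"},
    {"source_id": "CESM2-WACCM", "variable_id": "pr"},
    {"source_id": "GFDL-ESM4", "variable_id": "tas"},
]

wildcard_hits = search(rows, source_id="CESM*")
regex_hits = search(rows, source_id=re.compile(r"^CESM2$"), variable_id=["tas", "pr"])
```

Here the wildcard query keeps both CESM rows, while the anchored regular expression keeps only the exact `CESM2` entry.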
Add df property setter (GH#247) @andersy005
Use Pandas sphinx theme (GH#244) @andersy005
Update documentation tutorial (GH#252) @andersy005 & @charlesbluca
Fix anti-patterns and other bug risks (GH#251) @andersy005
Sync with intake’s Entry unification (GH#249) @andersy005
@andersy005 | @jhamman | @martindurant
Provide informative message/warnings from empty queries. (GH#235) @andersy005
Replace tqdm progressbar with fastprogress. (GH#238) @andersy005
Add catalog_file attribute to esm_datastore class. (GH#240) @andersy005
@andersy005 | @bonnland | @dcherian | @jbusecke | @jeffdlb | @kmpaul | @markusritschel
Add html representation for the catalog object. (GH#229) @andersy005
Move logic for assets aggregation into ESMGroupDataSource() and add a few basic dict-like methods (keys(), len(), getitem(), contains()) to the catalog object. (GH#194) @andersy005 & @jhamman & @kmpaul
Support columns with iterables in unique() and nunique(). (GH#223) @andersy005
Revert back to using concurrent.futures to address failures due to dask’s distributed scheduler. (GH#225) & (GH#226)
Increase test coverage. (GH#222) @andersy005
@andersy005 | @bonnland | @dcherian | @jbusecke | @jhamman | @kmpaul | @sherimickelson
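As a rough illustration of what "columns with iterables" means for unique() and nunique() (GH#223), the sketch below flattens list or tuple cells before counting. It is an assumption-laden stand-in written with plain Python dicts, not the library's code; the helper names mirror the methods only for readability, and the column names are placeholders.

```python
def unique(rows, columns):
    """Summarize distinct values per column, flattening list/tuple cells."""
    summary = {}
    for column in columns:
        values = []
        for row in rows:
            cell = row[column]
            # Treat a list or tuple cell as several values, anything else as one.
            items = cell if isinstance(cell, (list, tuple)) else [cell]
            for item in items:
                if item not in values:
                    values.append(item)
        summary[column] = {"count": len(values), "values": values}
    return summary


def nunique(rows, columns):
    """Return only the distinct-value counts per column."""
    return {column: info["count"] for column, info in unique(rows, columns).items()}


# Toy rows: one column holds tuples, mimicking multi-variable assets.
rows = [
    {"member_id": "r1i1p1f1", "variable_id": ("tas", "pr")},
    {"member_id": "r2i1p1f1", "variable_id": ("pr", "huss")},
    {"member_id": "r1i1p1f1", "variable_id": "tas"},
]

summary = unique(rows, ["member_id", "variable_id"])
counts = nunique(rows, ["member_id", "variable_id"])
```

The returned dictionary follows the `{'count': ..., 'values': [...]}` shape shown in the unique() output elsewhere in this changelog.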
Support single file catalogs. (GH#195) @bonnland
Add progressbar argument to to_dataset_dict(). This allows the user to override the default progressbar value used during the class instantiation. (GH#204) @andersy005
Enhanced search: enforce query criteria via the require_all_on argument of the search() method. (GH#202) & (GH#207) & (GH#209) @andersy005 & @jbusecke
Support relative paths for catalog files. (GH#208) @andersy005
Use raw path if protocol is None. (GH#210) @andersy005
GitHub Action to publish package to PyPI on release. (GH#190) @andersy005
Remove unnecessary inheritance. (GH#193) @andersy005
Update linting GitHub action to run on all pull requests. (GH#196) @andersy005
@andersy005 | @bonnland | @dcherian | @jbusecke | @jhamman | @kmpaul
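The idea behind require_all_on in this release can be sketched as follows: after the ordinary filter, rows are grouped by the require_all_on column and a group is kept only if it covers every requested value. This is a simplified model using plain dicts, assuming a single require_all_on column and list-valued queries; it is not how intake-esm implements the feature.

```python
from collections import defaultdict


def search(rows, require_all_on=None, **query):
    """Filter rows; optionally keep only groups that satisfy the whole query."""
    matched = [
        row for row in rows
        if all(row[col] in wanted for col, wanted in query.items())
    ]
    if require_all_on is None:
        return matched

    # Group the matches by the column that must satisfy all criteria.
    groups = defaultdict(list)
    for row in matched:
        groups[row[require_all_on]].append(row)

    kept = []
    for members in groups.values():
        complete = all(
            {m[col] for m in members} >= set(wanted)
            for col, wanted in query.items()
            if col != require_all_on
        )
        if complete:
            kept.extend(members)
    return kept


# Toy rows with placeholder model and experiment names.
rows = [
    {"source_id": "ModelA", "experiment_id": "historical"},
    {"source_id": "ModelA", "experiment_id": "ssp585"},
    {"source_id": "ModelB", "experiment_id": "historical"},
]

loose = search(rows, experiment_id=["historical", "ssp585"])
strict = search(rows, require_all_on="source_id", experiment_id=["historical", "ssp585"])
```

Without require_all_on all three rows match; with it, `ModelB` is dropped because it lacks the `ssp585` experiment.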
Add optional preprocess argument to to_dataset_dict() (GH#155) @matt-long
Allow users to disable dataset aggregations by passing aggregate=False to to_dataset_dict() (GH#164) @matt-long
Avoid manipulating dataset coordinates by using data_vars=varname when concatenating datasets via xarray.concat() (GH#174) @andersy005
Support loading netCDF assets from openDAP endpoints (GH#176) @andersy005
Add serialize() method to serialize collection/catalog (GH#179) @andersy005
Allow passing extra storage options to the backend file system via to_dataset_dict() (GH#180) @bonnland
Provide informational messages to the user via Logging module (GH#186) @andersy005
Remove the caching option (GH#158) @matt-long
Preserve encoding when aggregating datasets (GH#161) @matt-long
Sort aggregations to make sure intake_esm.merge_util.join_existing is always done before intake_esm.merge_util.join_new (GH#171) @andersy005
Add example for preprocessing function (GH#168) @jbusecke
Add FAQ style document to documentation (GH#182) & (GH#177) @andersy005 & @jhamman
Simplify group loading by using concurrent.futures (GH#185) @andersy005
@andersy005 | @bonnland | @dcherian | @jbusecke | @jhamman | @matt-long | @naomi-henderson | @Recalculate | @sebasblancogonz
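The aggregation ordering fix in this release (GH#171) amounts to a stable sort on the aggregation type. The sketch below shows one way to express that; the `AGGREGATION_ORDER` table and `sort_aggregations` helper are hypothetical, the attribute names are placeholders, and placing `union` last is purely an assumption of this example rather than something the changelog states.

```python
# Lower numbers run first. Only "join_existing before join_new" comes from
# the changelog entry; the position of "union" is an assumption.
AGGREGATION_ORDER = {"join_existing": 0, "join_new": 1, "union": 2}


def sort_aggregations(aggregations):
    """Return aggregations ordered so join_existing precedes join_new."""
    # sorted() is stable, so entries of the same type keep their relative order.
    return sorted(aggregations, key=lambda agg: AGGREGATION_ORDER[agg["type"]])


aggregations = [
    {"type": "union", "attribute_name": "variable_id"},
    {"type": "join_new", "attribute_name": "member_id"},
    {"type": "join_existing", "attribute_name": "time_range"},
]

ordered = [agg["type"] for agg in sort_aggregations(aggregations)]
```

Running join_existing first means files are concatenated along a dimension they already share before a new dimension is introduced.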
Rewrite intake-esm’s core based on the Earth System Model Collection specification (esm-collection-spec). (GH#135) @andersy005, @matt-long, @rabernat
Replaced intake_esm.core.esm_metadatastore with intake_esm.core.esm_datastore; see the API reference for more details.
intake-esm won’t build collection catalogs anymore. intake-esm now expects an ESM collection JSON file as input. This JSON should conform to the Earth System Model Collection specification.
@aaronspring | @andersy005 | @bonnland | @dcherian | @n-henderson | @naomi-henderson | @rabernat
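For readers unfamiliar with the ESM collection JSON that intake-esm expects from this release onward, here is a minimal sketch of such a file together with a presence check. The field names are given as recalled from the esm-collection-spec and should be verified against the specification itself; the collection id, file names, and the `missing_fields` helper are placeholders invented for this example.

```python
import json

# Top-level fields assumed to be required by the esm-collection-spec.
REQUIRED_FIELDS = (
    "esmcat_version",
    "id",
    "description",
    "catalog_file",
    "attributes",
    "assets",
)


def missing_fields(collection):
    """List the assumed-required fields absent from a collection dict."""
    return [field for field in REQUIRED_FIELDS if field not in collection]


# Placeholder collection; none of these values refer to a real catalog.
collection = json.loads(
    """
    {
      "esmcat_version": "0.1.0",
      "id": "example-collection",
      "description": "Placeholder collection used only for this sketch.",
      "catalog_file": "example-catalog.csv",
      "attributes": [{"column_name": "variable"}, {"column_name": "experiment"}],
      "assets": {"column_name": "path", "format": "netcdf"}
    }
    """
)

problems = missing_fields(collection)
```

An empty `problems` list only means the assumed top-level keys are present; it does not validate the file against the full specification.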
Add mistral data holdings to intake-esm-datastore (GH#133) @aaronspring
Add support for NA-CORDEX data holdings. (GH#115) @jukent
Replace .csv with netCDF as serialization format when saving the built collection to disk. With netCDF, we can record very useful information into the global attributes of the netCDF dataset. (GH#119) @andersy005
Add string representation of the ESMMetadataStoreCatalog object. (GH#122) @andersy005
Automatically build missing collections when calling esm_metadatastore(collection_name="GLADE-CMIP5"), provided the specified collection is part of the curated collections in intake-esm-datastore. (GH#124) @andersy005
    In [1]: import intake
    In [2]: col = intake.open_esm_metadatastore(collection_name="GLADE-CMIP5")
    In [3]: # if "GLADE-CMIP5" collection isn't built already, the above is equivalent to:
    In [4]: col = intake.open_esm_metadatastore(collection_input_definition="GLADE-CMIP5")
Revert back to using official DRS attributes when building CMIP5 and CMIP6 collections. (GH#126) @andersy005
Add .df property for interfacing with the built collection via a dataframe, to maintain backwards compatibility. (GH#127) @andersy005
Add unique() and nunique() methods for summarizing count and unique values in a collection. (GH#128) @andersy005
    In [1]: import intake
    In [2]: col = intake.open_esm_metadatastore(collection_name="GLADE-CMIP5")
    In [3]: col
    Out[3]: GLADE-CMIP5 collection catalogue with 615853 entries:
            > 3 resource(s)
            > 1 resource_type(s)
            > 1 direct_access(s)
            > 1 activity(s)
            > 218 ensemble_member(s)
            > 51 experiment(s)
            > 312093 file_basename(s)
            > 615853 file_fullpath(s)
            > 6 frequency(s)
            > 25 institute(s)
            > 15 mip_table(s)
            > 53 model(s)
            > 7 modeling_realm(s)
            > 3 product(s)
            > 9121 temporal_subset(s)
            > 454 variable(s)
            > 489 version(s)
    In [4]: col.nunique()
    resource                3
    resource_type           1
    direct_access           1
    activity                1
    ensemble_member       218
    experiment             51
    file_basename      312093
    file_fullpath      615853
    frequency               6
    institute              25
    mip_table              15
    model                  53
    modeling_realm          7
    product                 3
    temporal_subset      9121
    variable              454
    version               489
    dtype: int64
    In [5]: col.unique(columns=['frequency', 'modeling_realm'])
    {'frequency': {'count': 6, 'values': ['mon', 'day', '6hr', 'yr', '3hr', 'fx']},
     'modeling_realm': {'count': 7, 'values': ['atmos', 'land', 'ocean', 'seaIce', 'ocnBgchem', 'landIce', 'aerosol']}}
For CMIP6, extract grid_label from directory path instead of file name. (GH#127) @andersy005
Support building collections using inputs from intake-esm-datastore repository. (GH#79) @andersy005
Ensure that requested files are available locally before loading data into xarray datasets. (GH#82) @andersy005 and @matt-long
Split collection definitions out of config. (GH#83) @matt-long
Add intake-esm-builder, a CLI tool for building collection from the command line. (GH#89) @andersy005
Add support for CESM-LENS data holdings residing in AWS S3. (GH#98) @andersy005
Sort collection upon creation according to order-by-columns, pass urlpath through stack for use in parsing collection filenames (GH#100) @pbranson
Fix bug in _list_files_hsi() to return list instead of filter object. (GH#81) @matt-long and @andersy005
Fix cesm._get_file_attrs to break the loop when the longest stream is matched. (GH#80) @matt-long
Restore non_dim_coords to data variables all the time. (GH#90) @andersy005
Fix bug in intake_esm/cesm.py that caused intake-esm to exclude hourly (1hr, 6hr, etc.) CESM-LE data. (GH#110) @andersy005
Fix bugs in intake_esm/cmip.py that caused improper regular expression matching for table_id and grid_label. (GH#113) & (GH#111) @naomi-henderson and @andersy005
Refactor existing functionality to make intake-esm robust and extensible. (GH#77) @andersy005
Add aggregate._override_coords function to override dim coordinates except time in case there’s floating point precision difference. (GH#108) @andersy005
Fix CESM-LE ice component peculiarities that caused intake-esm to load data improperly. The fix separates variables for the ice component into two separate components: ice_sh for the southern hemisphere and ice_nh for the northern hemisphere. (GH#114) @andersy005
Add implementation for The Gridded Meteorological Ensemble Tool (GMET) data holdings (GH#61) @andersy005
Allow users to specify exclude_dirs for CMIP collections (GH#63) & (GH#62) @andersy005
Keep CMIP6 tracking_id in merge_keys (GH#67) @andersy005
Add implementation for ERA5 datasets (GH#68) @andersy005
Add implementations for CMIPCollection and CMIPSource (GH#38) @andersy005
Add support for CMIP6 data (GH#46) @andersy005
Add implementation for The Max Planck Institute Grand Ensemble (MPI-GE) data holdings (GH#52) & (GH#51) @aaronspring and @andersy005
Return dictionary of datasets all the time for consistency (GH#56) @andersy005
Include multiple netcdf files in same subdirectory (GH#55) & (GH#54) @naomi-henderson and @andersy005
Allow CMIP integration (GH#35) @andersy005
Fix bug on build catalog and move exclude_dirs to locations (GH#33) @matt-long
Change Logger, update dev-environment dependencies, and formatting fix in input.yml (GH#31) @matt-long
Update CircleCI workflow (GH#32) @andersy005
Rename package from intake-cesm to intake-esm (GH#34) @andersy005