{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Working with multi-variable assets\n", "\n", "In addition to catalogs of data assets (files) in time-series (single-variable)\n", "format, intake-esm supports catalogs with data assets in time-slice (history)\n", "format and/or files with multiple variables. For intake-esm to properly work\n", "with multi-variable assets,\n", "\n", "- the `variable_column` of the catalog must contain iterables (list, tuple, set)\n", " of values.\n", "- the user must specifiy a dictionary of functions for converting values in\n", " certain columns into iterables. This is done via the `csv_kwargs` argument.\n", "\n", "In the example below, we are are going to use the following catalog to\n", "demonstrate how to work with multi-variable assets:\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Look at the catalog on disk\n", "!cat multi-variable-catalog.csv" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "As you can see, the variable column contains a list of varibles, and this list\n", "was serialized as a string:\n", "`\"['SHF', 'REGION_MASK', 'ANGLE', 'DXU', 'KMT', 'NO2', 'O2']\"`.\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Loading a catalog\n", "\n", "To load a catalog with multiple variable files, we must pass additional\n", "information to `open_esm_datastore` via the `csv_kwargs` argument. We are going\n", "to specify a dictionary of functions for converting values in `variable` column\n", "into iterables. We use the `literal_eval` function from the standard `ast`\n", "module:\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import ast\n", "\n", "import intake" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "col = intake.open_esm_datastore(\n", " \"multi-variable-collection.json\",\n", " csv_kwargs={\"converters\": {\"variable\": ast.literal_eval}},\n", ")\n", "col" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "col.df.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The in-memory representation of the catalog contains `variable` with tuple of\n", "values. To confirm that intake-esm has registered this catalog with multiple\n", "variable assets, we can the `._multiple_variable_assets` property:\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "col._multiple_variable_assets" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Searching\n", "\n", "The search functionatilty works in the same way:\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "col_subset = col.search(variable=[\"O2\", \"SiO3\"])\n", "col_subset.df" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Loading assets into xarray datasets\n", "\n", "Loading data assets into xarray datasets works in the same way too:\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "col_subset.to_dataset_dict(cdf_kwargs={})" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import intake_esm # just to display version information\n", "\n", "intake_esm.show_versions()" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.8" }, "widgets": { "application/vnd.jupyter.widget-state+json": { "state": {}, "version_major": 2, "version_minor": 0 } } }, "nbformat": 4, "nbformat_minor": 4 }