{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Manipulating DataFrame (in-memory catalog)\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import warnings\n", "\n", "warnings.filterwarnings(\"ignore\")\n", "import intake" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The in-memory representation of an Earth System Model (ESM) catalog is a pandas\n", "dataframe, and is accessible via the `.df` property:\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "url = \"https://storage.googleapis.com/cmip6/pangeo-cmip6.json\"\n", "col = intake.open_esm_datastore(url)\n", "col.df.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In this notebook we will go through some examples showing how to manipulate this\n", "dataframe outside of intake-esm.\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Use Case 1: Complex Search Queries\n", "\n", "Let's say we are interested in datasets with the following attributes:\n", "\n", "- `experiment_id=[\"historical\"]`\n", "- `table_id=\"Amon\"`\n", "- `variable_id=\"tas\"`\n", "- `source_id=['TaiESM1', 'AWI-CM-1-1-MR', 'AWI-ESM-1-1-LR', 'BCC-CSM2-MR', 'BCC-ESM1', 'CAMS-CSM1-0', 'CAS-ESM2-0', 'UKESM1-0-LL']`\n", "\n", "In addition to these attributes, **we are interested in the first ensemble\n", "member (member_id) of each model (source_id) only**.\n", "\n", "This can be achieved in three steps:\n", "\n", "### Step 1: Run a query against the catalog\n", "\n", "We first narrow the catalog down to the attributes listed above:\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "col_subset = col.search(\n", " experiment_id=[\"historical\"],\n", " table_id=\"Amon\",\n", " variable_id=\"tas\",\n", " source_id=[\n", " \"TaiESM1\",\n", " \"AWI-CM-1-1-MR\",\n", " \"AWI-ESM-1-1-LR\",\n", " \"BCC-CSM2-MR\",\n", " \"BCC-ESM1\",\n", " \"CAMS-CSM1-0\",\n", " 
\"CAS-ESM2-0\",\n", " \"UKESM1-0-LL\",\n", " ],\n", ")\n", "col_subset" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Step 2: Select the first `member_id` for each `source_id`\n", "\n", "The subsetted catalog contains the following number of unique `member_id`\n", "values per `source_id`:\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "col_subset.df.groupby(\"source_id\")[\"member_id\"].nunique()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To get the first `member_id` for each `source_id`, we group the dataframe by\n", "`source_id` and use the `.first()` method, which returns the first non-null\n", "entry of each column within each group:\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "grouped = col_subset.df.groupby([\"source_id\"])\n", "df = grouped.first().reset_index()\n", "\n", "# Confirm that we have one ensemble member per source_id\n", "\n", "df.groupby(\"source_id\")[\"member_id\"].nunique()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Step 3: Attach the new dataframe to our catalog object\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "col_subset.df = df\n", "col_subset" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "dsets = col_subset.to_dataset_dict(zarr_kwargs={\"consolidated\": True})\n", "list(dsets)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "print(dsets[\"CMIP.CAS.CAS-ESM2-0.historical.Amon.gn\"])" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import intake_esm # just to display version information\n", "\n", "intake_esm.show_versions()" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 
}, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.8" }, "widgets": { "application/vnd.jupyter.widget-state+json": { "state": {}, "version_major": 2, "version_minor": 0 } } }, "nbformat": 4, "nbformat_minor": 4 }