# maelstrom-downscaling-ap5
A <a href="https://climetlab.readthedocs.io">CliMetLab</a> dataset plugin for the datasets
used in application 5 of the <a href="https://maelstrom-eurohpc.eu/">MAELSTROM project</a>.
Features
--------
This README briefly describes the datasets provided for statistical downscaling of
meteorological fields, the target of
<a href="https://www.maelstrom-eurohpc.eu/article?topic=improved-local-weather-predictions">application 5 (AP5) within the scope of MAELSTROM</a>.
Two different datasets, named Tier-1 and Tier-2 in the following, can be downloaded with this `CliMetLab` plugin from the <a href="https://aws.amazon.com/s3/?nc1=h_ls">AWS S3 bucket</a>
provided by ECMWF. Both datasets are distributed under the
<a href="https://git.ecmwf.int/projects/MLFET/repos/maelstrom-downscaling-ap5/browse/LICENSE">Apache License, version 2.0</a>
and are thus open-access.
## Using climetlab to access the data
The `CliMetLab` Python package allows easy access to the data with a few lines of code. <br>
The following examples demonstrate how to obtain the two provided datasets.
A more detailed description of both datasets is provided afterwards.
### Download the Tier-1 data
The training data of the Tier-1 dataset can be downloaded as follows:
```
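# Install the plugin first. In a shell:
#   pip install climetlab climetlab_maelstrom_downscaling
# (in a Jupyter notebook, prefix the command with "!")
import climetlab as cml

# Load the training subset of the Tier-1 dataset and convert it to xarray
ds = cml.load_dataset("maelstrom-downscaling", dataset="training")
ds.to_xarray()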
```
By changing the `dataset` argument to `"validation"` or `"testing"`, the validation and testing data can be retrieved.
Furthermore, an augmented variant of the dataset is available, which can be downloaded by appending
the suffix `_augmented` to the value of the `dataset` argument.
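The naming scheme described above can be wrapped in a small helper. This is a minimal sketch: the function names `subset_name` and `load_tier1` are hypothetical and not part of the plugin, and it assumes that the `_augmented` suffix applies to all three subsets, as the plural wording above suggests.

```python
SPLITS = ("training", "validation", "testing")


def subset_name(split, augmented=False):
    """Build the value passed as the `dataset` argument (hypothetical helper)."""
    if split not in SPLITS:
        raise ValueError(f"unknown split {split!r}; expected one of {SPLITS}")
    return f"{split}_augmented" if augmented else split


def load_tier1(split, augmented=False):
    """Load one Tier-1 subset as an xarray.Dataset; downloads data on first use."""
    # Imported lazily so that subset_name() can be used without climetlab installed
    import climetlab as cml

    ds = cml.load_dataset("maelstrom-downscaling", dataset=subset_name(split, augmented))
    return ds.to_xarray()
```

For example, `load_tier1("validation")` would request the validation subset, and `load_tier1("training", augmented=True)` the augmented training subset.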
### Download the Tier-2 data
The Tier-2 dataset can be downloaded by changing the first argument of `cml.load_dataset`.
For example, the following code snippet downloads the training dataset:
```python
ds = cml.load_dataset("maelstrom-downscaling-tier2", dataset="training")
```
Note that the training dataset comprises about 250 GB of data, so the download can take from several minutes to several hours, depending on the internet connection.
Due to the large size of the dataset, no augmented variant is provided.
### Saving the data on disk
By default, `CliMetLab` only caches the data. To save the data persistently in the user's filesystem,
`persist=True` must be passed to the `to_xarray` method. In addition, the directory under which the file(s) will be saved must be passed via `data_dir`.
The following example saves the large Tier-2 training dataset.
```
ds = cml.load_dataset("maelstrom-downscaling-tier2", dataset="training")
ds.to_xarray(persist=True, data_dir="/my/local/path")
```
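Since the Tier-2 training data amounts to about 250 GB, it can be worth checking the free space of the target filesystem before starting the download. The following is a minimal sketch using only the Python standard library; the helper name `has_free_space` is hypothetical and not part of `CliMetLab`, and the path `"."` stands in for the intended `data_dir`.

```python
import shutil


def has_free_space(path, required_gb):
    """Return True if the filesystem holding `path` has at least `required_gb` GB free."""
    free_bytes = shutil.disk_usage(path).free
    return free_bytes >= required_gb * 1e9


# Check the current directory before requesting the Tier-2 training data
if not has_free_space(".", 250):
    print("Less than 250 GB free here; choose another data_dir before downloading.")
```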
### Tutorial for the Tier-1 dataset
A tutorial is available in the form of a <a href="https://git.ecmwf.int/projects/MLFET/repos/maelstrom-downscaling-ap5/browse/notebooks/demo_downscaling_dataset.ipynb">Jupyter Notebook</a>.
In this notebook, the Tier-1 dataset is used to train a simple U-Net for downscaling, adapted from [1].
## Dataset description
Within the <a href="https://maelstrom-eurohpc.eu/">MAELSTROM</a> project, two different datasets are provided that are
used to construct statistical downscaling tasks with deep neural networks.
The first dataset, the Tier-1 dataset, serves as the starting point and provides the data for a pure downscaling task
similar to the super-resolution task in computer vision. <br>
The Tier-2 dataset provides the data for a real downscaling task in meteorology where the super-resolution task
is complemented by bias correction. <br>
Both datasets will be described in more detail in the following.
### The Tier-1 dataset
The Tier-1 dataset contains 2m temperature and surface elevation data obtained from the IFS HRES model at its initialization times 00 and 12 UTC between 2016 and 2020.
The data is restricted to the months of the summer half-year (April to September, inclusive).
Spatially, the data is limited to a domain covering Central Europe, including complex topography, with 128x96 grid points in the zonal and meridional directions.
For convenience, the data has been remapped onto a regular spherical grid with a spacing (dx) of 0.1°.
<br><br>
Since only one set of model data is used, the Tier-1 dataset constitutes an artificial downscaling task
where the input data is coarsened and the downscaling model is trained to reverse this coarsening process.
This makes the downscaling task very similar to the super-resolution task in computer vision,
since no systematic bias has to be removed between the input and the target data. Note that this (simplified)
downscaling task has also been the subject of other studies on statistical downscaling with deep neural networks,
e.g. [1] or [2].
For the target variable of the Tier-1 dataset, the 2m temperature `t2m_tar`, the corresponding coarsened input data is produced by the
following preprocessing chain:<br>
First, the data is conservatively remapped onto a coarse grid with dx = 0.8°. This step effectively removes fine-grained information from
the data. Second, the data is interpolated back (naively) onto the high-resolution grid (with dx = 0.1°) via bilinear interpolation. Note that this step does
*not* recover the information lost in step 1. Finally, to obtain energetic consistency, all calculations are performed using the dry static energy,
which is a pure function of the temperature and the elevation.<br><br>
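The preprocessing chain can be illustrated with a short NumPy sketch. It is not the code used to build the dataset and rests on several assumptions: a plain 8x8 block mean stands in for the conservative remapping (area weights are ignored), the dry static energy is taken as s = c_p T + g z, the temperature is recovered using the equally smoothed elevation, and all function names are hypothetical. The synthetic fields only mimic the 128x96 domain.

```python
import numpy as np

CP = 1004.0  # specific heat of dry air at constant pressure [J kg-1 K-1]
G = 9.81     # gravitational acceleration [m s-2]


def block_mean(field, factor):
    """Coarsen a 2D field by averaging non-overlapping factor x factor blocks."""
    ny, nx = field.shape
    return field.reshape(ny // factor, factor, nx // factor, factor).mean(axis=(1, 3))


def bilinear_upsample(coarse, factor):
    """Interpolate a coarse field back to the fine grid, linearly along x and then y."""
    ny_c, nx_c = coarse.shape
    # Cell-centre positions of the coarse cells, expressed in fine-grid index units
    yc = (np.arange(ny_c) + 0.5) * factor - 0.5
    xc = (np.arange(nx_c) + 0.5) * factor - 0.5
    yf = np.arange(ny_c * factor)
    xf = np.arange(nx_c * factor)
    # np.interp holds the edge values constant outside the coarse cell centres
    rows = np.stack([np.interp(xf, xc, row) for row in coarse])
    return np.stack([np.interp(yf, yc, col) for col in rows.T], axis=1)


def coarsened_input(t2m, z, factor=8):
    """Coarsen and re-interpolate the 2m temperature via the dry static energy."""
    s = CP * t2m + G * z                                         # dry static energy
    s_smooth = bilinear_upsample(block_mean(s, factor), factor)  # steps 1 and 2
    z_smooth = bilinear_upsample(block_mean(z, factor), factor)  # assumption, see above
    return (s_smooth - G * z_smooth) / CP                        # back to temperature


# Synthetic example on a 96 x 128 (lat x lon) grid: 0.1 deg coarsened to 0.8 deg
rng = np.random.default_rng(0)
z = rng.uniform(0.0, 3000.0, size=(96, 128))  # elevation in metres
t2m = 288.0 - 0.0065 * z                      # temperature following a fixed lapse rate
t2m_in = coarsened_input(t2m, z)
```

The factor of 8 corresponds to the ratio of the two grid spacings (0.8° / 0.1°), so the 96x128 field is reduced to 12x16 cells before being interpolated back.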
The dataset is subdivided into subsets for training, validation and testing. The training subset comprises the data from 2016 to 2019,
while the validation and testing subsets consist of monthly data from 2020.
### The Tier-2 dataset
The Tier-2 dataset provides data for a real downscaling task where coarse-grained ERA5 reanalysis data [3]
are downscaled onto the high-resolution grid of the COSMO REA6 dataset [4].
Since data from two different models are now used, where COSMO REA6 provides added value
over complex terrain due to its higher spatial resolution, the downscaling task is now twofold:
The data has to be super-resolved and bias-corrected.
Here, we still target the 2m temperature as with the Tier-1 dataset, but include more predictor variables:
- 2m temperature
- temperature from model levels 137, 135, 131, 127, 122 and 115
- surface pressure
- surface latent and sensible heat fluxes
- 10m horizontal wind (u,v)
- boundary layer height
The surface topography from the ERA5 and the COSMO REA6 datasets are also included as static predictor variables.
As a necessary prerequisite, both reanalysis datasets had to be brought onto a common grid.
To this end, the ERA5 data has been remapped onto the rotated-pole grid used by the COSMO model [5].
With a grid spacing of 0.275° in rotated coordinates, the ERA5 data is
five times coarser than the target data, the COSMO REA6 data (dx = 0.055°).
As for the Tier-1 dataset, the ERA5 data is bilinearly interpolated onto the high-resolution target
grid to serve as input for the downscaling neural networks.
Currently, the target domain of the Tier-2 dataset comprises 120x96 grid points (with dx=0.055°)
covering large parts of Central Europe to include complex terrain of the Alps and the German low mountain range.
Hourly data is provided for the years between 1995 and 2018. By default, the years 1995-2016 constitute the
training dataset, while 2017 and 2018 are used for validation and testing, respectively.
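The default split by year can be written down as a small lookup. The function name `tier2_split` is hypothetical and only restates the year ranges given above; it is not part of the plugin.

```python
def tier2_split(year):
    """Return the default Tier-2 subset that a given year belongs to."""
    if 1995 <= year <= 2016:
        return "training"
    if year == 2017:
        return "validation"
    if year == 2018:
        return "testing"
    raise ValueError(f"year {year} is outside the Tier-2 period 1995-2018")
```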
This dataset constitutes the final dataset used in Application 5 of the MAELSTROM project as described in this [report](https://www.maelstrom-eurohpc.eu/content/docs/uploads/doc50.pdf).
## References
**[1]** Sha, Yingkai, et al. "Deep-learning-based gridded downscaling of surface meteorological variables in complex
terrain. Part I: Daily maximum and minimum 2-m temperature."
Journal of Applied Meteorology and Climatology 59.12 (2020): 2057-2073.
<a href="https://doi.org/10.1175/JAMC-D-20-0057.1">DOI</a>.<br>
**[2]** Leinonen, Jussi, et al. "Stochastic Super-Resolution for Downscaling Time-Evolving Atmospheric Fields
With a Generative Adversarial Network." IEEE Transactions on Geoscience and Remote Sensing 59.9 (2021): 7211-7223.
<a href="https://doi.org/10.1109/TGRS.2020.3032790">DOI</a>.<br>
**[3]** Hersbach, Hans, et al. "The ERA5 global reanalysis." Quarterly Journal of the Royal Meteorological Society
146.730 (2020): 1999-2049.
<a href="https://doi.org/10.1002/qj.3803">DOI</a>.<br>
**[4]** Bollmeyer, Christoph, et al. "Towards a high‐resolution regional reanalysis for the European CORDEX domain."
Quarterly Journal of the Royal Meteorological Society 141.686 (2015): 1-15.
<a href="https://doi.org/10.1002/qj.2486">DOI</a>.<br>
**[5]** Hans-Ertel-Center for Weather Research - Climate Monitoring and Diagnostics.
COSMO Regional Reanalysis - COSMO-REA6.
<a href="https://reanalysis.meteo.uni-bonn.de/?COSMO-REA6">Link</a>.<br>
Raw data
{
"_id": null,
"home_page": "https://git.ecmwf.int/projects/MLFET/repos/maelstrom-downscaling-ap5/",
"name": "climetlab-maelstrom-downscaling",
"maintainer": null,
"docs_url": null,
"requires_python": null,
"maintainer_email": null,
"keywords": "meteorology",
"author": "Michael Langguth, Bing Gong",
"author_email": "m.langguth@fz-juelich.de",
"download_url": "https://files.pythonhosted.org/packages/21/f0/3e7b64ce10c512aa18c9e43cd8c21685c1616ab7e66dbc2c46fe6b2a0786/climetlab_maelstrom_downscaling-0.4.0.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": "Apache License Version 2.0",
"summary": "A dataset plugin for climetlab for the dataset maelstrom-downscaling.",
"version": "0.4.0",
"project_urls": {
"Homepage": "https://git.ecmwf.int/projects/MLFET/repos/maelstrom-downscaling-ap5/"
},
"split_keywords": [
"meteorology"
],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "10e782c7d78125cca0ca3164a10ce7492de99570463c4625311f58c68fea9d9c",
"md5": "90184419ebf14aaae55a33ace787aa1c",
"sha256": "594d39f65bded44dea31d394feb9191774e4b749c9e74a345c6cf71d5d0d8605"
},
"downloads": -1,
"filename": "climetlab_maelstrom_downscaling-0.4.0-py3-none-any.whl",
"has_sig": false,
"md5_digest": "90184419ebf14aaae55a33ace787aa1c",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": null,
"size": 15411,
"upload_time": "2024-05-27T12:16:50",
"upload_time_iso_8601": "2024-05-27T12:16:50.147723Z",
"url": "https://files.pythonhosted.org/packages/10/e7/82c7d78125cca0ca3164a10ce7492de99570463c4625311f58c68fea9d9c/climetlab_maelstrom_downscaling-0.4.0-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "21f03e7b64ce10c512aa18c9e43cd8c21685c1616ab7e66dbc2c46fe6b2a0786",
"md5": "88300d4f7bbbb56304c987596e2de266",
"sha256": "70cc47aca9f5cbbbbbfdb112e568d610278efd094257f3fb0f37c4ead3879a8b"
},
"downloads": -1,
"filename": "climetlab_maelstrom_downscaling-0.4.0.tar.gz",
"has_sig": false,
"md5_digest": "88300d4f7bbbb56304c987596e2de266",
"packagetype": "sdist",
"python_version": "source",
"requires_python": null,
"size": 16764,
"upload_time": "2024-05-27T12:16:51",
"upload_time_iso_8601": "2024-05-27T12:16:51.922807Z",
"url": "https://files.pythonhosted.org/packages/21/f0/3e7b64ce10c512aa18c9e43cd8c21685c1616ab7e66dbc2c46fe6b2a0786/climetlab_maelstrom_downscaling-0.4.0.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2024-05-27 12:16:51",
"github": false,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"lcname": "climetlab-maelstrom-downscaling"
}