# s2spy: Boost (sub) seasonal forecasting with AI
<img align="right" width="150" alt="Logo" src="https://raw.githubusercontent.com/AI4S2S/s2spy/main/docs/assets/images/ai4s2s_logo.png">
[![github repo badge](https://img.shields.io/badge/github-repo-000.svg?logo=github&labelColor=gray&color=blue)](https://github.com/AI4S2S/ai4s2s)
[![github license badge](https://img.shields.io/github/license/AI4S2S/s2spy)](https://github.com/AI4S2S/s2spy)
[![fair-software badge](https://img.shields.io/badge/fair--software.eu-%E2%97%8F%20%20%E2%97%8F%20%20%E2%97%8F%20%20%E2%97%8F%20%20%E2%97%8B-yellow)](https://fair-software.eu)
[![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.7708338.svg)](https://doi.org/10.5281/zenodo.7708338)
[![Documentation Status](https://readthedocs.org/projects/ai4s2s/badge/?version=latest)](https://ai4s2s.readthedocs.io/en/latest/?badge=latest)
[![build](https://github.com/AI4S2S/s2spy/actions/workflows/build.yml/badge.svg)](https://github.com/AI4S2S/s2spy/actions/workflows/build.yml)
[![codecov](https://codecov.io/gh/AI4S2S/s2spy/graph/badge.svg?token=8HFAXHTTB1)](https://codecov.io/gh/AI4S2S/s2spy)
A high-level Python package integrating expert knowledge and artificial intelligence to boost (sub) seasonal forecasting.
## Why s2spy?
Producing reliable sub-seasonal to seasonal (S2S) forecasts with machine learning techniques remains a challenge. Currently, these data-driven S2S forecasts generally suffer from a lack of trust because of:
- Non-transparent data processing and poorly reproducible scientific outcomes
- Technical pitfalls related to machine learning-based predictability (e.g. overfitting)
- Black-box methods without sufficient explanation
To tackle these challenges, we are building `s2spy`, an open-source, high-level Python package. It provides an interface between artificial intelligence and expert knowledge to boost predictability and physical understanding of S2S processes. By building on efficient data-handling and parallel-computing packages, it can run efficiently across different Big Climate Data platforms. Key components will be explainable AI and causal discovery, which support the classical scientific interplay between theory, hypothesis generation, and data-driven hypothesis testing, enabling knowledge mining from data.
Developing this tool will be a community effort. It helps us achieve trustworthy data-driven forecasts by providing:
- Transparent and reproducible analyses
- Best practices in model verification
- Understanding the sources of predictability
## Installation
[![workflow pypi badge](https://img.shields.io/pypi/v/s2spy.svg?colorB=blue)](https://pypi.python.org/project/s2spy/)
[![supported python versions](https://img.shields.io/pypi/pyversions/s2spy)](https://pypi.python.org/project/s2spy/)
To install the latest release of s2spy, do:
```console
python3 -m pip install s2spy
```
To install the in-development version from the GitHub repository, do:
```console
python3 -m pip install git+https://github.com/AI4S2S/s2spy.git
```
### Configure the package for development and testing
For developing and testing the package, please follow the developer guide, which can be found [here](https://github.com/AI4S2S/s2spy/blob/main/docs/README.dev.md).
## Getting started
`s2spy` provides end-to-end solutions for machine learning (ML) based S2S forecasting.
![workflow](https://raw.githubusercontent.com/AI4S2S/s2spy/main/docs/assets/images/workflow.png)
### Datetime operations & Data processing
In a typical ML-based S2S project, the first step is always data processing. Our calendar-based package, [`lilio`](https://github.com/AI4S2S/lilio), is used for time operations. For instance, suppose a user is looking for predictors of winter climate at seasonal timescales (~180 days). First, a `Calendar` object is created using `daily_calendar`:
```py
>>> import lilio
>>> calendar = lilio.daily_calendar(anchor="11-30", length='180d')
>>> calendar = calendar.map_years(2020, 2021)
>>> calendar.show()
i_interval -1 1
anchor_year
2021 [2021-06-03, 2021-11-30) [2021-11-30, 2022-05-29)
2020 [2020-06-03, 2020-11-30) [2020-11-30, 2021-05-29)
```
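To make the interval boundaries in the output above easier to follow, here is a minimal sketch that reproduces them with the standard library only. It is not how `lilio` is implemented; the helper name `anchored_intervals` and its parameters are hypothetical and exist only for this illustration.

```python
from datetime import date, timedelta


def anchored_intervals(anchor_year, anchor_month=11, anchor_day=30, length_days=180):
    """Return (precursor, target) half-open intervals around the anchor date.

    Hypothetical helper for illustration; not part of lilio.
    """
    anchor = date(anchor_year, anchor_month, anchor_day)
    step = timedelta(days=length_days)
    precursor = (anchor - step, anchor)  # corresponds to i_interval = -1
    target = (anchor, anchor + step)     # corresponds to i_interval = 1
    return precursor, target


for year in (2021, 2020):
    precursor, target = anchored_intervals(year)
    print(year, [d.isoformat() for d in precursor], [d.isoformat() for d in target])
```

Each interval is half-open, `[start, end)`, so the anchor date belongs to the target interval and not to the precursor interval.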
Now, the user can load the data `input_data` (e.g. a `pandas` `DataFrame`) and resample it to the timescales configured in the calendar:
```py
>>> calendar = calendar.map_to_data(input_data)
>>> bins = lilio.resample(calendar, input_data)
>>> bins
anchor_year i_interval interval mean_data target
0 2020 -1 [2020-06-03, 2020-11-30) 275.5 True
1 2020 1 [2020-11-30, 2021-05-29) 95.5 False
2 2021 -1 [2021-06-03, 2021-11-30) 640.5 True
3 2021 1 [2021-11-30, 2022-05-29) 460.5 False
```
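Conceptually, resampling to a calendar means aggregating all samples that fall inside each half-open interval. The sketch below illustrates that idea with the standard library on a synthetic daily series; the series, the helper `interval_mean`, and the resulting numbers are assumptions made for this example and do not correspond to the `input_data` used above or to `lilio`'s internals.

```python
from datetime import date, timedelta
from statistics import mean

# Synthetic daily series (assumption): the value is the number of days since 2020-01-01.
start = date(2020, 1, 1)
daily = {start + timedelta(days=i): float(i) for i in range(900)}


def interval_mean(series, left, right):
    """Mean of all values whose date lies in the half-open interval [left, right)."""
    return mean(value for day, value in series.items() if left <= day < right)


anchor = date(2020, 11, 30)
length = timedelta(days=180)

precursor_mean = interval_mean(daily, anchor - length, anchor)
target_mean = interval_mean(daily, anchor, anchor + length)
print(precursor_mean, target_mean)
```

Because the intervals do not overlap, every daily sample contributes to at most one bin.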
Depending on how the data is prepared, different types of calendars can be used. For more information, see [Lilio's documentation](https://lilio.readthedocs.io/en/latest/notebooks/calendar_shorthands.html).
### Cross-validation
Lilio can also generate train/test splits and perform cross-validation. To do that, a splitter such as `ShuffleSplit` is imported from `sklearn.model_selection` and used to split the resampled data:
```py
import lilio
from sklearn.model_selection import ShuffleSplit
splitter = ShuffleSplit(n_splits=3)
lilio.traintest.split_groups(splitter, bins)
```
All splitter classes from `scikit-learn` are supported; a list is available [here](https://scikit-learn.org/stable/modules/classes.html#splitter-classes). Users should follow the `scikit-learn` documentation on how to use a different splitter class.
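As a rough illustration of group-wise splitting, the sketch below shuffles whole anchor years into train and test sets, so that all intervals of one year stay together. It assumes that splits are made along anchor years, as the name `split_groups` suggests; the function `shuffle_split_by_year` and its parameters are hypothetical and only mimic the idea behind `ShuffleSplit` with the standard library.

```python
import random


def shuffle_split_by_year(anchor_years, n_splits=3, test_fraction=0.25, seed=0):
    """Yield-style helper returning (train_years, test_years) pairs.

    Hypothetical illustration; lilio and scikit-learn may behave differently.
    """
    rng = random.Random(seed)
    years = sorted(set(anchor_years))
    n_test = max(1, round(len(years) * test_fraction))
    splits = []
    for _ in range(n_splits):
        shuffled = years[:]
        rng.shuffle(shuffled)
        # The first n_test shuffled years form the test set, the rest the training set.
        splits.append((sorted(shuffled[n_test:]), sorted(shuffled[:n_test])))
    return splits


for train, test in shuffle_split_by_year(range(2010, 2022)):
    print("train:", train, "test:", test)
```

Keeping each anchor year in exactly one of the two sets avoids leaking information between the precursor and target intervals of the same year.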
### Dimensionality reduction
With `s2spy`, we can perform dimensionality reduction on data. For instance, to perform [Response Guided Dimensionality Reduction (RGDR)](https://www.nature.com/articles/s41612-022-00237-7), we configure the RGDR operator and fit it to a precursor field. The fitted operator can then be used to transform the data into the reduced clusters:
```py
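from s2spy import RGDR
# Configure the operator: clustering radius, significance level, minimum cluster area
rgdr = RGDR(eps_km=600, alpha=0.05, min_area_km2=3000**2)
rgdr.fit(precursor_field, target_timeseries)  # find clusters related to the target
clustered_data = rgdr.transform(precursor_field)  # reduce the field to the clusters
_ = rgdr.plot_clusters(precursor_field, target_timeseries, lag=1)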
```
![clusters](https://raw.githubusercontent.com/AI4S2S/s2spy/main/docs/assets/images/rgdr_clusters.png)
For more information about `precursor_field` and `target_timeseries`, check the complete example in [this notebook](https://github.com/AI4S2S/s2spy/blob/main/docs/notebooks/tutorial_RGDR.ipynb).
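The core idea of response-guided reduction can be sketched in a few lines: correlate every grid cell of the precursor field with the target, keep the cells that are strongly related, and aggregate them into a single feature. The example below is a heavily simplified illustration under stated assumptions, not the `s2spy` implementation: it uses a plain correlation threshold where RGDR uses a significance level (`alpha`), and it skips the spatial clustering and minimum-area filtering controlled by `eps_km` and `min_area_km2`. All names and data in it are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    std_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    std_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (std_x * std_y)


# Toy data (assumption): a target series and a "field" of three grid cells.
target = [1.0, 2.0, 3.0, 4.0, 5.0]
field = {
    "cell_a": [1.1, 2.0, 2.9, 4.2, 5.0],  # closely follows the target
    "cell_b": [0.9, 2.1, 3.2, 3.9, 5.1],  # closely follows the target
    "cell_c": [3.0, 1.0, 4.0, 1.0, 3.0],  # unrelated to the target
}

# Keep only the cells whose correlation with the target is strong enough.
threshold = 0.9
selected = [cell for cell, series in field.items()
            if abs(pearson(series, target)) >= threshold]

# Aggregate the selected cells into one time series (unweighted mean).
feature = [sum(field[cell][t] for cell in selected) / len(selected)
           for t in range(len(target))]
print(selected, feature)
```

In this toy setup the whole field collapses into one feature; with spatial clustering, each cluster of neighbouring cells would yield its own feature.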
Currently, `s2spy` supports [dimensionality reduction approaches](https://scikit-learn.org/stable/modules/classes.html#module-sklearn.cluster) from `scikit-learn`.
## Tutorials
`s2spy` supports operations that are common in machine learning pipelines for sub-seasonal to seasonal forecasting research. Tutorials covering the supported methods and functionalities are listed in [notebooks](https://github.com/AI4S2S/s2spy/tree/main/docs/notebooks). To run these notebooks, users need to install [`JupyterLab`](https://jupyter.org/). More details about each method can be found in the [API reference documentation](https://ai4s2s.readthedocs.io/en/latest/autoapi/index.html).
## Advanced use cases
You can achieve more by integrating `s2spy` and `lilio` into your data-driven S2S forecast workflow! We have a [cookbook](https://github.com/AI4S2S/cookbook) with recipes for complex machine-learning-based forecasting use cases. These examples show how `s2spy` and `lilio` can facilitate your workflow.
## Documentation
[![Documentation Status](https://readthedocs.org/projects/ai4s2s/badge/?version=latest)](https://ai4s2s.readthedocs.io/en/latest/?badge=latest)
For detailed information on using the `s2spy` package, visit the [documentation page](https://ai4s2s.readthedocs.io/en/latest/) hosted on Read the Docs.
## Contributing
If you want to contribute to the development of s2spy,
have a look at the [contribution guidelines](docs/CONTRIBUTING.md).
## How to cite us
[![RSD](https://img.shields.io/badge/rsd-s2spy-00a3e3.svg)](https://research-software-directory.org/software/s2spy)
[![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.7708338.svg)](https://doi.org/10.5281/zenodo.7708338)
Please use the Zenodo DOI to cite this package if you use it in your research.
## Acknowledgements
This package was developed by the Netherlands eScience Center and Vrije Universiteit Amsterdam. Development was supported by the Netherlands eScience Center under grant number NLESC.OEC.2021.005.
This package was created with [Cookiecutter](https://github.com/audreyr/cookiecutter) and the [NLeSC/python-template](https://github.com/NLeSC/python-template).