<div align="center">
<img src="https://github.com/rvandewater/ReciPies/blob/development/docs/figures/recipies_logo.svg?raw=true"
alt="recipies logo" height="300">
</div>
# ReciPies 🥧
[CI](https://github.com/rvandewater/ReciPies/actions/workflows/ci.yml) | [License](LICENSE) | [PyPI](https://pypi.python.org/pypi/recipies/) | [Downloads](https://pepy.tech/project/recipies) | [arXiv](http://arxiv.org/abs/2306.05109)

The ReciPies package is a preprocessing framework for [Polars](https://github.com/pola-rs/polars)
and [Pandas](https://github.com/pandas-dev/pandas) dataframes, with the backend chosen by the user.
Its design is inspired by the R package [recipes](https://recipes.tidymodels.org/).
ReciPies provides extensible operations for imputation, feature generation and extraction,
scaling, and encoding.
It operates on DataFrame objects from the chosen backend.
## Installation
You can install ReciPies from PyPI using pip:
```
pip install recipies
```
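To check which version of the `recipies` distribution is installed, you can query the package metadata from Python. This is a small sketch using only the standard library (`importlib.metadata`, Python 3.8+); the helper name `installed_recipies_version` is hypothetical and not part of ReciPies.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_recipies_version() -> Optional[str]:
    """Return the installed version of the `recipies` distribution, or None."""
    try:
        # Query by the distribution name used on PyPI, which is `recipies`.
        return version("recipies")
    except PackageNotFoundError:
        # The distribution is not installed in the active environment.
        return None


print(installed_recipies_version() or "recipies is not installed")
```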
> Note that the package is named `recipies` on PyPI.

To get the latest version, install ReciPies from source:
```
git clone https://github.com/rvandewater/ReciPies.git
cd ReciPies
conda env update -f environment.yml
conda activate ReciPies
pip install -e .
```
> The last command installs the package in editable mode under the name `recipies`.
## Usage
To define preprocessing operations, you first assign _roles_ to the columns of the DataFrame.
Roles group together columns that serve a particular function.
ReciPies then provides several _steps_ that can be applied to the dataset, including historical accumulation,
resampling of the time resolution, a number of imputation methods, and a wrapper for any
[Scikit-learn](https://github.com/scikit-learn/scikit-learn) preprocessing step.
We believe these steps cover the basic preprocessing needs of prepared datasets.
Missing steps can be added by implementing the step interface.
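
The role and step concepts above can be illustrated with a small, library-free sketch. It does **not** use the ReciPies API: the names `Recipe`, `Step`, `StepForwardFill`, `StepStandardize`, `add_step`, `prep`, and `bake` are hypothetical here, and a plain dict of lists stands in for a Polars or Pandas DataFrame. Roles map to column groups and each step targets one role. A real pipeline would, for example, forward-fill within each patient or group rather than over a whole column.

```python
from statistics import mean, pstdev
from typing import Dict, List, Optional

# Hypothetical sketch, NOT the ReciPies API: a "table" maps column names to
# equal-length lists, standing in for a Pandas/Polars DataFrame.
Table = Dict[str, List[Optional[float]]]


def copy_table(table: Table) -> Table:
    """Return a copy of every column so steps never mutate their input."""
    return {name: list(values) for name, values in table.items()}


class Step:
    """Illustrative step interface: learn state in fit(), apply it in transform()."""

    def __init__(self, role: str) -> None:
        self.role = role  # the role whose columns this step operates on

    def fit(self, table: Table, columns: List[str]) -> None:
        """Learn parameters from training data (stateless steps do nothing)."""

    def transform(self, table: Table, columns: List[str]) -> Table:
        raise NotImplementedError


class StepForwardFill(Step):
    """Imputation: replace None with the last observed value in each column."""

    def transform(self, table: Table, columns: List[str]) -> Table:
        out = copy_table(table)
        for col in columns:
            last = None
            for i, value in enumerate(out[col]):
                if value is None:
                    out[col][i] = last
                else:
                    last = value
        return out


class StepStandardize(Step):
    """Scaling: z-score each column using mean/std learned from training data."""

    def fit(self, table: Table, columns: List[str]) -> None:
        self.stats = {}
        for col in columns:
            observed = [v for v in table[col] if v is not None]
            sd = pstdev(observed)
            self.stats[col] = (mean(observed), sd if sd > 0 else 1.0)

    def transform(self, table: Table, columns: List[str]) -> Table:
        out = copy_table(table)
        for col in columns:
            mu, sd = self.stats[col]
            out[col] = [None if v is None else (v - mu) / sd for v in out[col]]
        return out


class Recipe:
    """Maps roles to column groups and applies steps in the order they were added."""

    def __init__(self, roles: Dict[str, List[str]]) -> None:
        self.roles = roles
        self.steps: List[Step] = []

    def add_step(self, step: Step) -> "Recipe":
        self.steps.append(step)
        return self  # allow chaining

    def prep(self, train: Table) -> Table:
        """Fit every step on the training data and return the transformed data."""
        for step in self.steps:
            columns = self.roles[step.role]
            step.fit(train, columns)
            train = step.transform(train, columns)
        return train

    def bake(self, new: Table) -> Table:
        """Apply the already-fitted steps to new data without refitting."""
        for step in self.steps:
            new = step.transform(new, self.roles[step.role])
        return new


train = {"heart_rate": [80.0, None, 100.0], "label": [0, 0, 1]}
recipe = Recipe({"predictor": ["heart_rate"], "outcome": ["label"]})
recipe.add_step(StepForwardFill("predictor")).add_step(StepStandardize("predictor"))
prepped = recipe.prep(train)
baked = recipe.bake({"heart_rate": [90.0, None], "label": [0, 1]})
```

Keeping fitting and applying separate mirrors the `prep`/`bake` split of the R recipes package: statistics such as the mean are learned on training data only and then reused on validation or test data, which avoids leaking information from those splits.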
## 📄 Paper
If you use this code in your research, please cite the following publication, which uses ReciPies extensively to build a
customisable preprocessing pipeline (a standalone paper is in preparation):
```
@inproceedings{vandewaterYetAnotherICUBenchmark2024,
title = {Yet Another ICU Benchmark: A Flexible Multi-Center Framework for Clinical ML},
shorttitle = {Yet Another ICU Benchmark},
booktitle = {The Twelfth International Conference on Learning Representations},
author = {van de Water, Robin and Schmidt, Hendrik Nils Aurel and Elbers, Paul and Thoral, Patrick and Arnrich, Bert and Rockenschaub, Patrick},
year = {2024},
month = oct,
urldate = {2024-02-19},
langid = {english},
}
```
The paper is also available on arXiv: https://arxiv.org/pdf/2306.05109.pdf