Name | pipda |
Version | 0.13.1 |
home_page | |
Summary | A framework for data piping in python |
upload_time | 2023-10-10 21:28:36 |
maintainer | |
docs_url | None |
author | pwwang |
requires_python | >=3.8,<4.0 |
license | MIT |
keywords | |
VCS | |
bugtrack_url | |
requirements | No requirements were recorded. |
Travis-CI | No Travis. |
coveralls test coverage | No coveralls. |
# pipda
[![Pypi][7]][8] [![Github][9]][10] [![PythonVers][11]][8] [![Codacy][16]][14] [![Codacy coverage][15]][14] ![Docs building][13] ![Building][12]
A framework for data piping in python
Inspired by [siuba][1], [dfply][2], [plydata][3] and [dplython][4], but with simple yet powerful APIs to mimic the `dplyr` and `tidyr` packages in python
[API][17] | [Change Log][18] | [Documentation][19]
## Installation
```shell
pip install -U pipda
```
## Usage
### Verbs
- A verb is pipeable (able to be called like `data >> verb(...)`)
- A verb is dispatchable by the type of its first argument
- A verb evaluates other arguments using the first one
- A verb passes its context down to its arguments unless they specify their own
```python
import pandas as pd
from pipda import (
    register_verb,
    register_func,
    register_operator,
    evaluate_expr,
    Operator,
    Symbolic,
    Context
)

f = Symbolic()

df = pd.DataFrame({
    'x': [0, 1, 2, 3],
    'y': ['zero', 'one', 'two', 'three']
})

df
#    x      y
# 0  0   zero
# 1  1    one
# 2  2    two
# 3  3  three

@register_verb(pd.DataFrame)
def head(data, n=5):
    return data.head(n)

df >> head(2)
#    x     y
# 0  0  zero
# 1  1   one

@register_verb(pd.DataFrame, context=Context.EVAL)
def mutate(data, **kwargs):
    data = data.copy()
    for key, val in kwargs.items():
        data[key] = val
    return data

df >> mutate(z=1)
#    x      y  z
# 0  0   zero  1
# 1  1    one  1
# 2  2    two  1
# 3  3  three  1

df >> mutate(z=f.x)
#    x      y  z
# 0  0   zero  0
# 1  1    one  1
# 2  2    two  2
# 3  3  three  3
```
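Dispatching a verb on the type of its first argument follows the same idea as the standard library's `functools.singledispatch`. The sketch below uses only the standard library, not pipda, to show that idea; the list and dict implementations are hypothetical stand-ins for real data types.

```python
from functools import singledispatch


@singledispatch
def head(data, n=5):
    # Fallback for types with no registered implementation.
    raise NotImplementedError(
        f"head() is not implemented for {type(data).__name__}"
    )


@head.register(list)
def _head_list(data, n=5):
    # Lists: keep the first n items.
    return data[:n]


@head.register(dict)
def _head_dict(data, n=5):
    # Column-oriented dicts: keep the first n values of every column.
    return {key: values[:n] for key, values in data.items()}


rows = head([10, 20, 30, 40], 2)
columns = head({"x": [0, 1, 2, 3], "y": ["zero", "one", "two", "three"]}, 2)
```

The implementation is chosen from the runtime type of `data` alone, so the same call works for both kinds of input.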
### Functions used as verb arguments
```python
# A verb can be used as an argument passed to another verb
# dep=True makes the `data` argument invisible to the caller
@register_verb(pd.DataFrame, context=Context.EVAL, dep=True)
def if_else(data, cond, true, false):
    cond.loc[cond.isin([True]), ] = true
    cond.loc[cond.isin([False]), ] = false
    return cond

# The function is then also a singledispatch generic function

df >> mutate(z=if_else(f.x>1, 20, 10))
#    x      y   z
# 0  0   zero  10
# 1  1    one  10
# 2  2    two  20
# 3  3  three  20
```
```python
# function without data argument
@register_func
def length(strings):
    return [len(s) for s in strings]

df >> mutate(z=length(f.y))
#    x      y  z
# 0  0   zero  4
# 1  1    one  3
# 2  2    two  3
# 3  3  three  5
```
### Context
The context defines how a reference (`f.A`, `f['A']`, `f.A.B`) is evaluated.
```python
@register_verb(pd.DataFrame, context=Context.SELECT)
def select(df, *columns):
    return df[list(columns)]

df >> select(f.x, f.y)
#    x      y
# 0  0   zero
# 1  1    one
# 2  2    two
# 3  3  three
```
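Judging from the `select` example, where `df[list(columns)]` receives values usable as column labels, a reference under `Context.SELECT` stands for the column name, while under `Context.EVAL` it stands for the column's data. The following is a minimal sketch of that distinction; `ToyContext`, `Reference` and `evaluate` are hypothetical names for illustration, not pipda's API.

```python
from enum import Enum


class ToyContext(Enum):
    SELECT = "select"  # a reference stands for the column name
    EVAL = "eval"      # a reference stands for the column's data


class Reference:
    """Hypothetical reference that only remembers a column name."""

    def __init__(self, name):
        self.name = name


def evaluate(ref, data, context):
    """Resolve a reference against `data` under the given context."""
    if context is ToyContext.SELECT:
        return ref.name
    return data[ref.name]


data = {"x": [0, 1, 2, 3], "y": ["zero", "one", "two", "three"]}
ref = Reference("x")
as_name = evaluate(ref, data, ToyContext.SELECT)
as_data = evaluate(ref, data, ToyContext.EVAL)
```

Here `as_name` is the label `"x"`, which a selecting verb can use to index the data, and `as_data` is the column itself, which a computing verb can assign or transform.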
## How it works
```R
data %>% verb(arg1, ..., key1=kwarg1, ...)
```
The above is a typical `dplyr`/`tidyr` data piping syntax.
The counterpart python syntax we expect is:
```python
data >> verb(arg1, ..., key1=kwarg1, ...)
```
To implement that, we need to defer the execution of the `verb` by turning it into a `Verb` object, which holds all the information needed to execute the function later. The `Verb` object is not executed until the `data` is piped in. This relies on the [`executing`][5] package, which lets us locate the AST node where the function is called and so determine whether it is being called in piping mode.
If an argument refers to a column of the data and that column is involved in later computation, then it also needs to be deferred. For example, with `dplyr` in `R`:
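As a rough illustration of the deferral, the sketch below uses Python's reflected right-shift hook, `__rrshift__`, so that a recorded call runs only once data arrives on the left of `>>`. It is a toy under stated assumptions: the `Verb` class and `toy_register_verb` decorator here are hypothetical, always defer, and do not reproduce the AST inspection done with `executing`.

```python
class Verb:
    """Toy stand-in for a deferred verb call (not pipda's actual class)."""

    def __init__(self, func, args, kwargs):
        self.func = func
        self.args = args
        self.kwargs = kwargs

    def __rrshift__(self, data):
        # Python falls back to `verb.__rrshift__(data)` for `data >> verb`
        # when `data` does not implement `>>` for this operand.
        return self.func(data, *self.args, **self.kwargs)


def toy_register_verb(func):
    """Hypothetical decorator: calling the verb only records its arguments."""
    def deferred(*args, **kwargs):
        return Verb(func, args, kwargs)
    return deferred


@toy_register_verb
def head(data, n=5):
    return data[:n]


pending = head(2)                 # nothing runs yet, only a Verb is built
result = [0, 1, 2, 3] >> pending  # the list is piped in, head runs now
```

Because the toy always defers, a direct call such as `head(data, 2)` would not work here; telling the two call styles apart is the part the source attributes to `executing`.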
```R
data %>% mutate(z=a)
```
adds a column named `z` with the data from column `a`.
In python, we want to do the same with:
```python
data >> mutate(z=f.a)
```
where `f.a` is a `Reference` object that carries the column information without fetching the data, even though Python evaluates it immediately.
The trick here is `f`. Like other packages, we introduce a `Symbolic` object, which connects the parts of the argument and turns the whole argument into an `Expression` object. This object holds the execution information, which we can use later when piping is detected.
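One way to picture this is a small expression tree built through operator overloading. The sketch below is an assumption-laden toy, not pipda's implementation: the class names mirror the ones mentioned above, but `Call`, the `evaluate` method and the choice of supported operators are made up for illustration.

```python
import operator


class Expression:
    """Toy deferred expression; subclasses implement evaluate(data)."""

    def __gt__(self, other):
        return Call(operator.gt, self, other)

    def __add__(self, other):
        return Call(operator.add, self, other)


class Reference(Expression):
    """Remembers a column name without touching any data."""

    def __init__(self, name):
        self.name = name

    def evaluate(self, data):
        return data[self.name]


class Call(Expression):
    """Remembers a function and its (possibly deferred) operands."""

    def __init__(self, func, *operands):
        self.func = func
        self.operands = operands

    def evaluate(self, data):
        # Resolve nested expressions first, pass plain values through.
        values = [
            op.evaluate(data) if isinstance(op, Expression) else op
            for op in self.operands
        ]
        return self.func(*values)


class Symbolic:
    """Entry point: attribute or item access yields a Reference."""

    def __getattr__(self, name):
        return Reference(name)

    def __getitem__(self, name):
        return Reference(name)


f = Symbolic()
expr = f.a + 1 > f["b"]  # builds a tree, computes nothing
row = {"a": 2, "b": 2}
outcome = expr.evaluate(row)
```

Python does evaluate `f.a + 1 > f["b"]` immediately, but the result is only a description of the computation; the actual comparison happens when `evaluate` receives data.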
## Documentation
[https://pwwang.github.io/pipda/][19]
See also [datar][6] for real-world usage.
[1]: https://github.com/machow/siuba
[2]: https://github.com/kieferk/dfply
[3]: https://github.com/has2k1/plydata
[4]: https://github.com/dodger487/dplython
[5]: https://github.com/alexmojaki/executing
[6]: https://github.com/pwwang/datar
[7]: https://img.shields.io/pypi/v/pipda?style=flat-square
[8]: https://pypi.org/project/pipda/
[9]: https://img.shields.io/github/v/tag/pwwang/pipda?style=flat-square
[10]: https://github.com/pwwang/pipda
[11]: https://img.shields.io/pypi/pyversions/pipda?style=flat-square
[12]: https://img.shields.io/github/actions/workflow/status/pwwang/pipda/build.yml?label=CI&style=flat-square
[13]: https://img.shields.io/github/actions/workflow/status/pwwang/pipda/docs.yml?label=docs&style=flat-square
[14]: https://app.codacy.com/gh/pwwang/pipda/dashboard
[15]: https://img.shields.io/codacy/coverage/75d312da24c94bdda5923627fc311a99?style=flat-square
[16]: https://img.shields.io/codacy/grade/75d312da24c94bdda5923627fc311a99?style=flat-square
[17]: https://pwwang.github.io/pipda/api/pipda/
[18]: https://pwwang.github.io/pipda/CHANGELOG/
[19]: https://pwwang.github.io/pipda/
Raw data
{
"_id": null,
"home_page": "",
"name": "pipda",
"maintainer": "",
"docs_url": null,
"requires_python": ">=3.8,<4.0",
"maintainer_email": "",
"keywords": "",
"author": "pwwang",
"author_email": "pwwang@pwwang.com",
"download_url": "https://files.pythonhosted.org/packages/72/ef/51772bad9cb991011efcd3d99a4f052e5563da9db8439f5279e5aa8bb1fd/pipda-0.13.1.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": "MIT",
"summary": "A framework for data piping in python",
"version": "0.13.1",
"project_urls": null,
"split_keywords": [],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "768f10431c73e0e31d84e3e71264389787fe7e3cf6b3678e57684862af7d4f01",
"md5": "f8e3e40956b581743af088b386b353c1",
"sha256": "9e9046ac507ad03ced7b63e09e2468bdc2c863c01d44233c5502b4f450461893"
},
"downloads": -1,
"filename": "pipda-0.13.1-py3-none-any.whl",
"has_sig": false,
"md5_digest": "f8e3e40956b581743af088b386b353c1",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.8,<4.0",
"size": 20843,
"upload_time": "2023-10-10T21:28:34",
"upload_time_iso_8601": "2023-10-10T21:28:34.775627Z",
"url": "https://files.pythonhosted.org/packages/76/8f/10431c73e0e31d84e3e71264389787fe7e3cf6b3678e57684862af7d4f01/pipda-0.13.1-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "72ef51772bad9cb991011efcd3d99a4f052e5563da9db8439f5279e5aa8bb1fd",
"md5": "5fd4c4c67137650662f0b4adee69b6f0",
"sha256": "56420cbb285a085db385a37ad267f59ba090ec1e901eb122132bd64ad5f515f9"
},
"downloads": -1,
"filename": "pipda-0.13.1.tar.gz",
"has_sig": false,
"md5_digest": "5fd4c4c67137650662f0b4adee69b6f0",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.8,<4.0",
"size": 18572,
"upload_time": "2023-10-10T21:28:36",
"upload_time_iso_8601": "2023-10-10T21:28:36.206815Z",
"url": "https://files.pythonhosted.org/packages/72/ef/51772bad9cb991011efcd3d99a4f052e5563da9db8439f5279e5aa8bb1fd/pipda-0.13.1.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2023-10-10 21:28:36",
"github": false,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"lcname": "pipda"
}