# pipda

- Version: 0.13.1
- Author: pwwang
- License: MIT
- Requires Python: >=3.8,<4.0
- Uploaded to PyPI: 2023-10-10 21:28:36

[![Pypi][7]][8] [![Github][9]][10] [![PythonVers][11]][8] [![Codacy][16]][14] [![Codacy coverage][15]][14] ![Docs building][13] ![Building][12]

A framework for data piping in Python.

Inspired by [siuba][1], [dfply][2], [plydata][3] and [dplython][4], but with simple yet powerful APIs that mimic the R packages `dplyr` and `tidyr` in Python.

[API][17] | [Change Log][18] | [Documentation][19]

## Installation

```shell
pip install -U pipda
```

## Usage

### Verbs

- A verb is pipeable (it can be called like `data >> verb(...)`)
- A verb dispatches on the type of its first argument
- A verb evaluates its other arguments against the first one
- A verb passes its context down to arguments that do not specify their own

```python
import pandas as pd
from pipda import (
    register_verb,
    register_func,
    Symbolic,
    Context
)

f = Symbolic()

df = pd.DataFrame({
    'x': [0, 1, 2, 3],
    'y': ['zero', 'one', 'two', 'three']
})

df

#    x      y
# 0  0   zero
# 1  1    one
# 2  2    two
# 3  3  three

@register_verb(pd.DataFrame)
def head(data, n=5):
    return data.head(n)

df >> head(2)
#    x     y
# 0  0  zero
# 1  1   one

@register_verb(pd.DataFrame, context=Context.EVAL)
def mutate(data, **kwargs):
    data = data.copy()
    for key, val in kwargs.items():
        data[key] = val
    return data

df >> mutate(z=1)
#    x      y  z
# 0  0   zero  1
# 1  1    one  1
# 2  2    two  1
# 3  3  three  1

df >> mutate(z=f.x)
#    x      y  z
# 0  0   zero  0
# 1  1    one  1
# 2  2    two  2
# 3  3  three  3
```
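Dispatching on the type of the first argument is the same idea as Python's standard `functools.singledispatch`. The sketch below does not use pipda at all: it assumes a hypothetical `head` that accepts either a `dict` of columns or a `list` of rows, only to illustrate how one name can carry a separate implementation per data type.

```python
# Toy sketch (not pipda's code): `head` here is a hypothetical,
# stdlib-only function used to illustrate dispatch on the first argument.
from functools import singledispatch


@singledispatch
def head(data, n=5):
    # Fallback for types with no registered implementation
    raise NotImplementedError(f"head() is not implemented for {type(data).__name__}")


@head.register
def _(data: dict, n=5):
    # A dict of columns: truncate every column to its first n values
    return {key: values[:n] for key, values in data.items()}


@head.register
def _(data: list, n=5):
    # A list of rows: keep the first n rows
    return data[:n]


columns = {"x": [0, 1, 2, 3], "y": ["zero", "one", "two", "three"]}
rows = [(0, "zero"), (1, "one"), (2, "two"), (3, "three")]

print(head(columns, 2))  # {'x': [0, 1], 'y': ['zero', 'one']}
print(head(rows, 2))     # [(0, 'zero'), (1, 'one')]
```

Registering a new type only adds an implementation; existing callers of `head` do not change.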

### Functions used as verb arguments

```python
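# A verb can also be used as an argument to another verb.
# dep=True makes the `data` argument invisible when calling;
# it is filled in by the verb that receives the call.
@register_verb(pd.DataFrame, context=Context.EVAL, dep=True)
def if_else(data, cond, true, false):
    # Build the result in a fresh Series so that the first assignment
    # cannot change which rows the second condition selects
    out = pd.Series(false, index=cond.index)
    out[cond] = true
    return out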

# The function is then also a singledispatch generic function

df >> mutate(z=if_else(f.x>1, 20, 10))
#    x      y   z
# 0  0   zero  10
# 1  1    one  10
# 2  2    two  20
# 3  3  three  20
```
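How a dependent verb can be called without its data can be sketched with plain Python: calling it only records the arguments, and the enclosing verb later injects its own data. `Dependent`, `dependent` and this toy `mutate` over a dict of columns are hypothetical names, not pipda's implementation, and this `if_else` takes a column name and a threshold in place of the `f.x > 1` expression.

```python
# Toy sketch (not pipda's code): Dependent, dependent and mutate are
# hypothetical stand-ins showing how the data argument can be supplied later.
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Tuple


@dataclass
class Dependent:
    """A call that is still waiting for its data argument."""
    func: Callable[..., Any]
    args: Tuple[Any, ...]

    def evaluate(self, data: Dict[str, List[Any]]) -> Any:
        # The data is injected as the first argument only at evaluation time
        return self.func(data, *self.args)


def dependent(func: Callable[..., Any]) -> Callable[..., Dependent]:
    # Calling the decorated function records its arguments instead of running it
    def wrapper(*args: Any) -> Dependent:
        return Dependent(func, args)
    return wrapper


@dependent
def if_else(data, column, threshold, true, false):
    # Pick `true` where the column value exceeds the threshold, else `false`
    return [true if value > threshold else false for value in data[column]]


def mutate(data: Dict[str, List[Any]], **kwargs: Any) -> Dict[str, List[Any]]:
    out = dict(data)
    for key, val in kwargs.items():
        # The enclosing verb passes its own data down to dependent arguments
        out[key] = val.evaluate(data) if isinstance(val, Dependent) else val
    return out


df = {"x": [0, 1, 2, 3], "y": ["zero", "one", "two", "three"]}
print(mutate(df, z=if_else("x", 1, 20, 10))["z"])  # [10, 10, 20, 20]
```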

```python
# A function that does not take the data as its first argument
@register_func
def length(strings):
    return [len(s) for s in strings]

df >> mutate(z=length(f.y))
#    x      y  z
# 0  0   zero  4
# 1  1    one  3
# 2  2    two  3
# 3  3  three  5
```

### Context

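The context defines how a reference (`f.A`, `f['A']`, `f.A.B`) is evaluated. In the example below, `Context.SELECT` makes `f.x` evaluate to the column name `'x'`, whereas under `Context.EVAL` (used by `mutate` above) it evaluates to the column's data.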

```python
@register_verb(pd.DataFrame, context=Context.SELECT)
def select(df, *columns):
    return df[list(columns)]

df >> select(f.x, f.y)
#    x      y
# 0  0   zero
# 1  1    one
# 2  2    two
# 3  3  three
```
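The effect of the two contexts can be sketched with a toy reference whose evaluation depends on a context flag: under a select-like context it resolves to the column name, under an eval-like context to the column's values. `ToyContext`, `Ref` and these dict-based `select`/`mutate` are hypothetical names for illustration; pipda's own context handling covers more cases than this.

```python
# Toy sketch (not pipda's code): ToyContext, Ref, select and mutate are
# hypothetical names illustrating context-dependent evaluation.
from enum import Enum
from typing import Any, Dict, List


class ToyContext(Enum):
    SELECT = "select"  # references resolve to column names
    EVAL = "eval"      # references resolve to column values


class Ref:
    """Stand-in for a reference such as f.x."""

    def __init__(self, name: str) -> None:
        self.name = name

    def evaluate(self, data: Dict[str, List[Any]], context: ToyContext) -> Any:
        if context is ToyContext.SELECT:
            return self.name
        return data[self.name]


def select(data, *columns):
    # A select-like verb evaluates its arguments in the SELECT context
    names = [col.evaluate(data, ToyContext.SELECT) for col in columns]
    return {name: data[name] for name in names}


def mutate(data, **kwargs):
    # A mutate-like verb evaluates its arguments in the EVAL context
    out = dict(data)
    for key, val in kwargs.items():
        out[key] = val.evaluate(data, ToyContext.EVAL) if isinstance(val, Ref) else val
    return out


df = {"x": [0, 1, 2, 3], "y": ["zero", "one", "two", "three"]}
print(select(df, Ref("y")))         # {'y': ['zero', 'one', 'two', 'three']}
print(mutate(df, z=Ref("x"))["z"])  # [0, 1, 2, 3]
```

The same `Ref("x")` object yields different values depending only on which verb evaluates it.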

## How it works

```R
data %>% verb(arg1, ..., key1=kwarg1, ...)
```

The above is a typical `dplyr`/`tidyr` data piping syntax.

The Python counterpart we aim for is:

```python
data >> verb(arg1, ..., key1=kwarg1, ...)
```

To implement this, we defer the execution of `verb` by turning the call into a `Verb` object that holds everything needed to run the function later. The `Verb` object is not executed until `data` is piped in. This is made possible by the [`executing`][5] package, which lets us find the AST node where the function is called, so we can determine whether the function is called in piping mode.
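The deferral described above can be sketched without `executing`: calling a verb returns a pending object that records its arguments, and Python's reflected operator `__rrshift__` runs it once data appears on the left of `>>`. This simplified version always defers, so unlike pipda it cannot also be called directly as `verb(data, ...)`; telling the two call styles apart is the job the text assigns to `executing`. `PendingVerb` and `verb` are hypothetical names.

```python
# Toy sketch (not pipda's code): PendingVerb and verb are hypothetical
# names; this version always defers and never inspects the call site.
import functools
from typing import Any, Callable


class PendingVerb:
    """Record of a verb call that is waiting for its data."""

    def __init__(self, func: Callable[..., Any], args: tuple, kwargs: dict) -> None:
        self.func = func
        self.args = args
        self.kwargs = kwargs

    def __rrshift__(self, data: Any) -> Any:
        # `data >> pending` lands here because the left operand
        # (here a list) does not implement `>>` for this type
        return self.func(data, *self.args, **self.kwargs)


def verb(func: Callable[..., Any]) -> Callable[..., PendingVerb]:
    @functools.wraps(func)
    def wrapper(*args: Any, **kwargs: Any) -> PendingVerb:
        # Defer: remember the arguments, do not call func yet
        return PendingVerb(func, args, kwargs)
    return wrapper


@verb
def head(data, n=5):
    return data[:n]


rows = [(0, "zero"), (1, "one"), (2, "two"), (3, "three")]
pending = head(2)       # nothing has run yet
print(rows >> pending)  # [(0, 'zero'), (1, 'one')]
```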

If an argument refers to a column of the data that is involved in a later computation, then it also needs to be deferred. For example, with `dplyr` in `R`:

```R
data %>% mutate(z=a)
```

adds a column named `z` containing the data from column `a`.

In Python, we want to do the same with:

```python
data >> mutate(z=f.a)
```

where `f.a` is a `Reference` object that carries the column information without fetching any data, even though Python evaluates the expression immediately.

The trick here is `f`. As in other packages, we introduce a `Symbolic` object, which connects the parts of an argument and turns the whole argument into an `Expression` object. This object holds the execution information, which is used later once piping is detected.
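The mechanism can be sketched with a few operator overloads: attribute access on a `Symbolic` returns a `Reference`, and comparison or arithmetic on a reference returns an `Expression` node that stores the operator and operands instead of computing anything. The class names mirror the prose above, but this is a simplified illustration over a dict of lists, not pipda's implementation; `BinaryOp` is a hypothetical helper.

```python
# Toy sketch (not pipda's code): Symbolic, Reference, Expression and the
# hypothetical BinaryOp here only mimic the idea of deferred expressions.
import operator
from typing import Any, Callable, Dict, List

Data = Dict[str, List[Any]]


def _evaluate(obj: Any, data: Data) -> Any:
    # Plain values (such as the 1 in f.x > 1) evaluate to themselves
    return obj.evaluate(data) if isinstance(obj, Expression) else obj


class Expression:
    """Base class for deferred computations, evaluated once data arrives."""

    def __gt__(self, other: Any) -> "Expression":
        return BinaryOp(operator.gt, self, other)

    def __add__(self, other: Any) -> "Expression":
        return BinaryOp(operator.add, self, other)

    def evaluate(self, data: Data) -> List[Any]:
        raise NotImplementedError


class Reference(Expression):
    """A column reference such as f.x; it stores only the column name."""

    def __init__(self, name: str) -> None:
        self.name = name

    def evaluate(self, data: Data) -> List[Any]:
        return data[self.name]


class BinaryOp(Expression):
    """An operator applied to an Expression and another operand."""

    def __init__(self, op: Callable[[Any, Any], Any], left: Expression, right: Any) -> None:
        self.op = op
        self.left = left
        self.right = right

    def evaluate(self, data: Data) -> List[Any]:
        left = self.left.evaluate(data)  # always an Expression, so a column
        right = _evaluate(self.right, data)
        if not isinstance(right, list):
            # Broadcast a scalar operand across the column
            right = [right] * len(left)
        return [self.op(a, b) for a, b in zip(left, right)]


class Symbolic:
    def __getattr__(self, name: str) -> Reference:
        # f.x builds a Reference to column "x" without touching any data
        return Reference(name)


f = Symbolic()
expr = f.x > 1                   # an Expression; nothing is computed yet
df = {"x": [0, 1, 2, 3]}
print(expr.evaluate(df))         # [False, False, True, True]
print((f.x + f.x).evaluate(df))  # [0, 2, 4, 6]
```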

## Documentation

[https://pwwang.github.io/pipda/][19]

See also [datar][6] for real-world usage.

[1]: https://github.com/machow/siuba
[2]: https://github.com/kieferk/dfply
[3]: https://github.com/has2k1/plydata
[4]: https://github.com/dodger487/dplython
[5]: https://github.com/alexmojaki/executing
[6]: https://github.com/pwwang/datar
[7]: https://img.shields.io/pypi/v/pipda?style=flat-square
[8]: https://pypi.org/project/pipda/
[9]: https://img.shields.io/github/v/tag/pwwang/pipda?style=flat-square
[10]: https://github.com/pwwang/pipda
[11]: https://img.shields.io/pypi/pyversions/pipda?style=flat-square
[12]: https://img.shields.io/github/actions/workflow/status/pwwang/pipda/build.yml?label=CI&style=flat-square
[13]: https://img.shields.io/github/actions/workflow/status/pwwang/pipda/docs.yml?label=docs&style=flat-square
[14]: https://app.codacy.com/gh/pwwang/pipda/dashboard
[15]: https://img.shields.io/codacy/coverage/75d312da24c94bdda5923627fc311a99?style=flat-square
[16]: https://img.shields.io/codacy/grade/75d312da24c94bdda5923627fc311a99?style=flat-square
[17]: https://pwwang.github.io/pipda/api/pipda/
[18]: https://pwwang.github.io/pipda/CHANGELOG/
[19]: https://pwwang.github.io/pipda/

            
