nanbi


- name: nanbi
- version: 0.0.1
- summary: A framework that allows the definition of data transformations in a composable way, agnostic of data processing engine.
- upload_time: 2023-03-13 07:25:29
- docs_url: None
- requires_python: >=3.8
- license: MIT License, Copyright (c) 2022 Eduardo Emery
- keywords: data, transformation, pandas, sql, spark
- requirements: No requirements were recorded.
- Travis-CI: No Travis.
- coveralls test coverage: No coveralls.

# Nanbi


>*Nanbiquara*: speech of smart people, of clever people
>- Translated from the [Tupi Guarani Illustrated Dictionary](https://www.dicionariotupiguarani.com.br/dicionario/nanbiquara/)

Nanbi is a framework that lets you define data transformations in a composable way, independent of the data processing engine (Pandas, MySQL, Spark, etc.).
- Its syntax is *SQL-like*, inspired by the PySpark and Scala Spark approaches
- It lets you define a set of data transformations in a more composable way than SQL, which improves readability, especially for complex queries
- It lets you execute your data transformation definitions on multiple engines (Pandas, MySQL, Spark, etc.) without changing the definitions themselves
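
To make the engine-agnostic idea concrete, here is a minimal sketch in plain Python. It is not Nanbi's implementation: the `Col`, `Add`, `evaluate`, and `to_sql` names are hypothetical and exist only for this illustration. The point is that a transformation can be stored as a small expression tree and then interpreted by different backends.

```python
from dataclasses import dataclass
from typing import Dict, List, Union


@dataclass(frozen=True)
class Col:
    """Reference to a column by name (hypothetical, not Nanbi's class)."""
    name: str

    def __add__(self, other: "Expr") -> "Add":
        return Add(self, other)


@dataclass(frozen=True)
class Add:
    """Sum of two sub-expressions (hypothetical, not Nanbi's class)."""
    left: "Expr"
    right: "Expr"

    def __add__(self, other: "Expr") -> "Add":
        return Add(self, other)


Expr = Union[Col, Add]


def evaluate(expr: Expr, table: Dict[str, List[int]]) -> List[int]:
    """In-memory 'engine': compute the expression column by column."""
    if isinstance(expr, Col):
        return list(table[expr.name])
    left = evaluate(expr.left, table)
    right = evaluate(expr.right, table)
    return [a + b for a, b in zip(left, right)]


def to_sql(expr: Expr) -> str:
    """SQL 'engine': render the same expression as a SQL fragment."""
    if isinstance(expr, Col):
        return expr.name
    return f"({to_sql(expr.left)} + {to_sql(expr.right)})"


# The definition is written once and handed to either backend.
formula = Col("col_a") + Col("col_b")
table = {"col_a": [50, 50, 20, 20, 10], "col_b": [51, 31, 21, 11, 51]}

print(evaluate(formula, table))  # [101, 81, 41, 31, 61]
print(to_sql(formula))           # (col_a + col_b)
```

Because the definition only describes *what* to compute, supporting another engine in this sketch means adding one more interpreter function, not rewriting the formula.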

>Nanbi is currently in the early stages of development and is not yet ready for a version 1 release. So far, Pandas is the only supported engine.
>
>Please get in touch if you are interested in using Nanbi in your work or personal projects. Feature requests are welcome.


## Setup

>While the library isn't published on PyPI:
1. Clone the repo
2. Create a symlink to the repo
- TODO(eemery): Add installation details once package gets published in PyPI

## Getting Started

### 1. Creating a DataFrame

Nanbi uses the concept of a `DataFrame` to represent a table and its annotations (or metadata). Currently, Nanbi supports the creation of DataFrames from Pandas DataFrames and CSV files (using Pandas behind the scenes).

**From a Pandas DataFrame**

```python
import pandas as pd
import nanbi.connectors.pandas as nb

pandas_df = pd.DataFrame({"num_a": [10, 50, 20, 50, 20],
                          "num_b": [41, 51, 21, 31, 11]})

df = nb.from_data_frame(pandas_df)
```

**From a CSV file (with Pandas)**

```python
import nanbi.connectors.pandas as nb

df = nb.from_csv("path/to/my-file.csv")
```

**Viewing your imported data**

To visualize the imported or created data, just use the `.display()` method:

```python
import nanbi.connectors.pandas as nb

df = nb.from_csv("path/to/my-file.csv")

df.display()
```

The output will be a Pandas DataFrame, for example:

```
  col_a col_b
0 50    51
1 50    31
2 20    21
3 20    11
4 10    51
```

### 2. Enriching tables (`.with_column()`)

Nanbi's goal is to let you define data transformations that enrich your table with derived data in a composable way. One of the main ways to achieve this is the `.with_column()` method, which creates a new column in your table according to the transformation formula you give it. For example:

```python
import nanbi.connectors.pandas as nb

df = nb.from_csv("path/to/my-file.csv")

# NOTE: `col` must also be imported from Nanbi; its import path is not shown in this README.
enriched_df = df.with_column("result", col("col_a") + col("col_b"))

enriched_df.display()
```

The output will be a Pandas DataFrame in the form of:

```
  col_a col_b result
0 50    51    101
1 50    31    81
2 20    21    41
3 20    11    31
4 10    51    61
```
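
The numbers in the table above can be checked with a small, self-contained sketch. It does not use Nanbi or Pandas: the `with_column` function below is a hypothetical stand-in that operates on a plain dict of lists, and it assumes (as the `enriched_df = df.with_column(...)` assignment suggests) that the operation returns a new table instead of modifying its input.

```python
from typing import Callable, Dict, List

Table = Dict[str, List[int]]
Row = Dict[str, int]


def with_column(table: Table, name: str, formula: Callable[[Row], int]) -> Table:
    """Return a copy of `table` with one extra column computed per row."""
    n_rows = len(next(iter(table.values()))) if table else 0
    # Build one dict per row so the formula can look columns up by name.
    rows = [{key: values[i] for key, values in table.items()} for i in range(n_rows)]
    # Copy the existing columns so the input table is never mutated.
    enriched = {key: list(values) for key, values in table.items()}
    enriched[name] = [formula(row) for row in rows]
    return enriched


source = {"col_a": [50, 50, 20, 20, 10], "col_b": [51, 31, 21, 11, 51]}
enriched = with_column(source, "result", lambda row: row["col_a"] + row["col_b"])

print(enriched["result"])  # [101, 81, 41, 31, 61]
print("result" in source)  # False: the input table is left untouched
```

Returning a new table on every call is also what makes it possible to chain several such calls one after another.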

#### Chaining Transformations

One improvement to the code above is to chain the transformations. The same code could be written as:

```python
import nanbi.connectors.pandas as nb

# Parentheses let the method chain span multiple lines.
df = (nb.from_csv("path/to/my-file.csv")
        .with_column("result", col("col_a") + col("col_b")))

df.display()
```

#### Improving Transformations Readability and Reusability

Another improvement, especially when transformations get complex, is to move the formula definition (i.e., `col("col_a") + col("col_b")`) to its own variable. In the code above, this would look like:

```python
import nanbi.connectors.pandas as nb

my_complex_formula = col("col_a") + col("col_b")

df = (nb.from_csv("path/to/my-file.csv")
        .with_column("result", my_complex_formula))

df.display()
```
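
The benefit of naming a formula becomes clearer once formulas are built from other formulas and reused on more than one table. The sketch below illustrates that idea with plain Python functions. The `col`, `add`, and `apply` helpers are hypothetical and defined here only for the example; they are not Nanbi's API.

```python
from typing import Callable, Dict, List

Row = Dict[str, int]
Formula = Callable[[Row], int]


def col(name: str) -> Formula:
    """Formula that reads one column from a row."""
    return lambda row: row[name]


def add(left: Formula, right: Formula) -> Formula:
    """Formula that sums the results of two other formulas."""
    return lambda row: left(row) + right(row)


def apply(formula: Formula, rows: List[Row]) -> List[int]:
    """Evaluate a formula on every row of a table."""
    return [formula(row) for row in rows]


# Named building blocks, defined once and composed into larger formulas.
total = add(col("col_a"), col("col_b"))
double_total = add(total, total)

# The same formulas are reused on two unrelated tables.
first = [{"col_a": 50, "col_b": 51}, {"col_a": 20, "col_b": 21}]
second = [{"col_a": 1, "col_b": 2}]

print(apply(total, first))          # [101, 41]
print(apply(double_total, first))   # [202, 82]
print(apply(double_total, second))  # [6]
```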

            

Raw data

            {
    "_id": null,
    "home_page": "",
    "name": "nanbi",
    "maintainer": "",
    "docs_url": null,
    "requires_python": ">=3.8",
    "maintainer_email": "",
    "keywords": "data,transformation,pandas,SQL,Spark",
    "author": "",
    "author_email": "Eduardo Emery <emeryecs@gmail.com>",
    "download_url": "https://files.pythonhosted.org/packages/4e/87/548143dfe0f9d524f5f57f020683f3eb4e0378c1b5abc8b1b0d3ead8746f/nanbi-0.0.1.tar.gz",
    "platform": null,
    "description": "# Nanbi\n\n\n>*Nanbiquara*: speech of smart people, of clever people\n>- Translated from the [Tupi Guarani Illustrated Dictionary](https://www.dicionariotupiguarani.com.br/dicionario/nanbiquara/)\n\nNanbi is a framework that allows you to define data transformations in a composable way, agnostic of data processing engine (Pandas,  mySQL, Spark etc).\n- Its syntax is *sql-like*, inspired by PySpark and Scala-Spark approaches\n- It allows you to define a set of data transformations in a more composable way than SQL, for example, allowing for better readability specially on complex queries\n- It allows you to execute your data transformations definitions in multiple engines (Pandas, mySQL, Spark etc) without having to change the data transformation definition\n\n>Nanbi is right now under the initial stages of development. It's not fully ready for a version 1. So far, there is no compatibility with engines other than Pandas.\n>\n>Please get in touch if you have interest in using Nanbi on your work or personal project. Feature requests are welcome.\n\n\n## Setup\n\n>While the library isn't published in PyPI\n1. Clone the repo\n2. Create a symlink to the repo\n- TODO(eemery): Add installation details once package gets published in PyPI\n\n## Getting Started\n\n### 1. Creating a DataFrame\n\nNanbi uses the concept of a `DataFrame` to represent a table and its annotations (or metadata). 
Currently, Nanbi supports the creation of DataFrames from Pandas DataFrames and CSV files (using Pandas behind the scenes).\n\n**From a Pandas DataFrame**\n\n```python\nimport pandas as pd\nimport nanbi.connectors.pandas as nb\n\npandas_df = pd.DataFrame({\"num_a\": [10, 50, 20, 50, 20],\n                          \"num_b\": [41, 51, 21, 31, 11]})\n\ndf = nb.from_data_frame(pandas_df)\n```\n\n**From a CSV file (with Pandas)**\n\n```python\nimport nanbi.connectors.pandas as nb\n\ndf = nb.from_csv(\"path/to/my-file.csv\")\n```\n\n**Viewing your imported data**\n\nTo visualize the imported or created data, just use the `.display()` method:\n\n```python\nimport nanbi.connectors.pandas as nb\n\ndf = nb.from_csv(\"path/to/my-file.csv\")\n\ndf.display()\n```\n\nThe output will be a Pandas DataFrame, for example:\n\n```\n  col_a col_b\n0 50    51\n1 50    31\n2 20    21\n3 20    11\n4 10    51\n```\n\n### 2. Enriching tables (`.with_columns()`)\n\nNanbi goal is to allow you to define data transformations to enrich your table with derived data in a composable way. One of the main ways that you can achieve this, is by the use of the `.with_column()` method. It creates a new column in your table according to the transformation formula you gave it. For example:\n\n```python\nimport nanbi.connectors.pandas as nb\n\ndf = nb.from_csv(\"path/to/my-file.csv\")\n\nenriched_df = df.with_column(\"result\", col(\"col_a\") + col(\"col_a\"))\n\nenriched_df.display()\n```\n\nThe output will be a Pandas DataFrame in the form of:\n\n```\n  col_a col_b result\n0 50    51    101\n1 50    31    81\n2 20    21    41\n3 20    11    31\n4 10    51    61\n```\n\n#### Chaining Transformations\n\nOne improvement that we can make to the code above is to take advantage of chaining transformations. 
We could have written the above code like:\n\n```python\nimport nanbi.connectors.pandas as nb\n\ndf = nb.from_csv(\"path/to/my-file.csv\")\n       .with_column(\"result\", col(\"col_a\") + col(\"col_a\"))\n\ndf.display()\n```\n\n#### Improving Transformations Readability and Reusability\n\nAnother improvement that can be done, specially when transformations get complex, is to move the formula definition (i.e., `col(\"col_a\") + col(\"col_a\")`) to its own variable. In the code above, this would look like:\n\n```python\nimport nanbi.connectors.pandas as nb\n\nmy_complex_formula = col(\"col_a\") + col(\"col_a\")\n\ndf = nb.from_csv(\"path/to/my-file.csv\")\n       .with_column(\"result\", my_complex_formula)\n\ndf.display()\n```\n",
    "bugtrack_url": null,
    "license": "MIT License  Copyright (c) 2022 Eduardo Emery  Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the \"Software\"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:  The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.  THE SOFTWARE IS PROVIDED \"AS IS\", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. ",
    "summary": "A framework that allows the definition of data transformations in a composable way, agnostic of data processing engine.",
    "version": "0.0.1",
    "split_keywords": [
        "data",
        "transformation",
        "pandas",
        "sql",
        "spark"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "112f033487113bd30ca32624f2cb065c53670724043e5e6932eb6d0715755522",
                "md5": "00bffb7798562c4b2c5508d149cf47c8",
                "sha256": "42a7906c115ab9f612c5f5e596f20fae2ac42e9046ec0cee0af86097b79b776f"
            },
            "downloads": -1,
            "filename": "nanbi-0.0.1-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "00bffb7798562c4b2c5508d149cf47c8",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.8",
            "size": 11848,
            "upload_time": "2023-03-13T07:25:27",
            "upload_time_iso_8601": "2023-03-13T07:25:27.237855Z",
            "url": "https://files.pythonhosted.org/packages/11/2f/033487113bd30ca32624f2cb065c53670724043e5e6932eb6d0715755522/nanbi-0.0.1-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "4e87548143dfe0f9d524f5f57f020683f3eb4e0378c1b5abc8b1b0d3ead8746f",
                "md5": "d402818491797801e30e0a6445ebcd6e",
                "sha256": "537d714885c8a33cb54a1edc21d85601c34b7fd5da8121510aa6d074c4675f6e"
            },
            "downloads": -1,
            "filename": "nanbi-0.0.1.tar.gz",
            "has_sig": false,
            "md5_digest": "d402818491797801e30e0a6445ebcd6e",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.8",
            "size": 11188,
            "upload_time": "2023-03-13T07:25:29",
            "upload_time_iso_8601": "2023-03-13T07:25:29.368660Z",
            "url": "https://files.pythonhosted.org/packages/4e/87/548143dfe0f9d524f5f57f020683f3eb4e0378c1b5abc8b1b0d3ead8746f/nanbi-0.0.1.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2023-03-13 07:25:29",
    "github": false,
    "gitlab": false,
    "bitbucket": false,
    "lcname": "nanbi"
}
        