daffy

Name: daffy
Version: 0.7.0
Home page: https://github.com/fourkind/daffy
Summary: Function decorators for Pandas Dataframe column name and data type validation
Upload time: 2023-12-07 13:33:17
Author: Janne Sinivirta
Requires Python: >=3.8.1,<4.0.0
License: MIT
Keywords: pandas, dataframe, typing, validation
# DAFFY DataFrame Column Validator
[![PyPI](https://img.shields.io/pypi/v/daffy)](https://pypi.org/project/daffy/)
![PyPI - Python Version](https://img.shields.io/pypi/pyversions/daffy)
![test](https://github.com/fourkind/daffy/workflows/test/badge.svg)
[![codecov](https://codecov.io/gh/fourkind/daffy/branch/master/graph/badge.svg?token=2FYBMT65A6)](https://codecov.io/gh/fourkind/daffy)
[![Code style: black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/psf/black)

## Description 

In projects using Pandas, it's very common to have functions that take Pandas DataFrames as input or produce them as output.
It's hard to figure out quickly what these DataFrames contain. This library offers simple decorators to annotate your functions
so that they document themselves, and that documentation is kept up to date by validating the input and output at runtime.

For example,

```python
@df_in(columns=["Brand", "Price"])     # the function expects a DataFrame as input parameter with columns Brand and Price
@df_out(columns=["Brand", "Price"])    # the function will return a DataFrame with columns Brand and Price
def filter_cars(car_df):
    # before this code is executed, the input DataFrame is validated according to the above decorator
    # filter some cars..
    return filtered_cars_df
```
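Under the hood, such a decorator wraps the function and inspects the frame before the body runs. The following is a minimal, self-contained sketch of the idea, not daffy's actual implementation; `df_in_sketch` and the `SimpleNamespace` stand-in frame are illustrative, and any object with a `.columns` attribute (such as a real pandas DataFrame) would work:

```python
import functools
from types import SimpleNamespace

def df_in_sketch(columns):
    """Minimal stand-in for an @df_in-style check on the first argument."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(df, *args, **kwargs):
            # Validate before the wrapped function body executes.
            for col in columns:
                assert col in df.columns, (
                    f"Column {col} missing from DataFrame. "
                    f"Got columns: {list(df.columns)}"
                )
            return func(df, *args, **kwargs)
        return wrapper
    return decorator

# Stand-in frame: only the `.columns` attribute matters for this sketch.
cars = SimpleNamespace(columns=["Brand", "Price"])

@df_in_sketch(columns=["Brand", "Price"])
def filter_cars(car_df):
    return car_df
```

A frame with the expected columns passes through unchanged, while a frame missing `Price` triggers the `AssertionError` before the function body runs.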

## Table of Contents
* [Installation](#installation)
* [Usage](#usage)
* [Contributing](#contributing)
* [Development](#development)
* [License](#license)
* [Changelog](#changelog)

## Installation

Install with your favorite Python dependency manager, for example:

```sh
pip install daffy
```

## Usage 

Start by importing the needed decorators:

```python
from daffy import df_in, df_out
```

To check a DataFrame input to a function, annotate the function with `@df_in`. For example, the following function expects to get
a DataFrame with columns `Brand` and `Price`:

```python
@df_in(columns=["Brand", "Price"])
def process_cars(car_df):
    # do stuff with cars
    ...
```

If your function takes multiple arguments, specify the parameter to be checked with its `name`:

```python
@df_in(name="car_df", columns=["Brand", "Price"])
def process_cars(year, style, car_df):
    # do stuff with cars
    ...
```

You can also check the columns of multiple arguments if you specify their names:

```python
@df_in(name="car_df", columns=["Brand", "Price"])
@df_in(name="brand_df", columns=["Brand", "BrandName"])
def process_cars(car_df, brand_df):
    # do stuff with cars
    ...
```
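Looking up a parameter by name, whether it was passed positionally or as a keyword, can be done with `inspect.signature`. This is a hedged sketch of that mechanism, not daffy's source; `df_in_by_name` and the `SimpleNamespace` stand-in frame are illustrative:

```python
import functools
import inspect
from types import SimpleNamespace

def df_in_by_name(name, columns):
    """Sketch: validate the parameter called `name`, however it was passed."""
    def decorator(func):
        sig = inspect.signature(func)
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            # bind() maps positional and keyword arguments onto parameter names.
            bound = sig.bind(*args, **kwargs)
            df = bound.arguments[name]
            for col in columns:
                assert col in df.columns, (
                    f"Column {col} missing from DataFrame. "
                    f"Got columns: {list(df.columns)}"
                )
            return func(*args, **kwargs)
        return wrapper
    return decorator

@df_in_by_name("car_df", ["Brand", "Price"])
def process_cars(year, style, car_df):
    return len(car_df.columns)

cars = SimpleNamespace(columns=["Brand", "Price"])
```

The same check fires whether the frame arrives as `process_cars(2023, "suv", cars)` or `process_cars(2023, style="suv", car_df=cars)`.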

To check that a function returns a DataFrame with specific columns, use the `@df_out` decorator:

```python
@df_out(columns=["Brand", "Price"])
def get_all_cars():
    # get those cars
    return all_cars_df
```

In case one of the listed columns is missing from the DataFrame, a helpful assertion error is raised:

```python
AssertionError("Column Price missing from DataFrame. Got columns: ['Brand']")
```

To check both input and output, just use both annotations on the same function:

```python
@df_in(columns=["Brand", "Price"])
@df_out(columns=["Brand", "Price"])
def filter_cars(car_df):
    # filter some cars
    return filtered_cars_df
```

If you also want to check the data types of each column, you can replace the column array:

```python
columns=["Brand", "Price"]
```

with a dict:

```python
columns={"Brand": "object", "Price": "int64"}
```

This will not only check that the specified columns are present in the DataFrame but also that their `dtype` is the expected one.
In case of a wrong `dtype`, an error message similar to the following will explain the mismatch:

```
AssertionError("Column Price has wrong dtype. Was int64, expected float64")
```
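The dict form adds a dtype comparison on top of the presence check. This is a minimal sketch of that logic under stated assumptions, not daffy's code: on a real pandas DataFrame, `str(df.dtypes[col])` yields strings like `"int64"`; the `SimpleNamespace` stand-in below mimics just enough of that interface:

```python
from types import SimpleNamespace

def check_dtypes(df, columns):
    """Sketch: verify both presence and dtype for each column in the dict."""
    for col, expected in columns.items():
        assert col in df.columns, (
            f"Column {col} missing from DataFrame. Got columns: {list(df.columns)}"
        )
        actual = str(df.dtypes[col])
        assert actual == expected, (
            f"Column {col} has wrong dtype. Was {actual}, expected {expected}"
        )

# Stand-in frame with a dtypes mapping, as a real DataFrame would expose.
cars = SimpleNamespace(
    columns=["Brand", "Price"],
    dtypes={"Brand": "object", "Price": "int64"},
)
```

`check_dtypes(cars, {"Brand": "object", "Price": "int64"})` passes silently, while expecting `float64` for `Price` raises the mismatch error shown above.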

You can enable strict mode for both `@df_in` and `@df_out`. This will raise an error if the DataFrame contains columns
not defined in the annotation:

```python
@df_in(columns=["Brand"], strict=True)
def process_cars(car_df):
    # do stuff with cars
    ...
```

will raise an error when `car_df` contains columns `["Brand", "Price"]`:

```
AssertionError: DataFrame contained unexpected column(s): Price
```
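Strict mode is essentially the inverse check: instead of asking whether each expected column is present, it asks whether any present column is unexpected. A hedged sketch of that check (illustrative `check_strict` helper, not daffy's API):

```python
from types import SimpleNamespace

def check_strict(df, expected):
    """Sketch: in strict mode, any column not named in the annotation fails."""
    extra = [c for c in df.columns if c not in expected]
    assert not extra, (
        f"DataFrame contained unexpected column(s): {', '.join(extra)}"
    )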

To quickly check what the incoming and outgoing dataframes contain, you can add a `@df_log` annotation to the function. For
example adding `@df_log` to the above `filter_cars` function will product log lines:

```
Function filter_cars parameters contained a DataFrame: columns: ['Brand', 'Price']
Function filter_cars returned a DataFrame: columns: ['Brand', 'Price']
```

or with `@df_log(include_dtypes=True)` you get:

```
Function filter_cars parameters contained a DataFrame: columns: ['Brand', 'Price'] with dtypes ['object', 'int64']
Function filter_cars returned a DataFrame: columns: ['Brand', 'Price'] with dtypes ['object', 'int64']
```

## Contributing

Contributions are accepted. Include tests in PR's.

## Development

To run the tests, clone the repository, install dependencies with Poetry and run tests with PyTest:

```sh
poetry install
poetry shell
pytest
```

To enable linting on each commit, run `pre-commit install`. After that, your every commit will be checked with `isort`, `black` and `flake8`.

## License

MIT

## Changelog

### 0.7.0

- Support Pandas 2.x
- Drop support for Python 3.7 and 3.8
- Build and test with Python 3.12 also

### 0.6.0

- Make checking columns of multiple function parameters work also with positional arguments (thanks @latvanii)

### 0.5.0

- Added `strict` parameter for `@df_in` and `@df_out`

### 0.4.2

- Added docstrings for the decorators
- Fix import of `@df_log`

### 0.4.1

- Add `include_dtypes` parameter for `@df_log`.
- Fix handling of empty signature with `@df_in`.

### 0.4.0

- Added `@df_log` for logging.
- Improved assertion messages.

### 0.3.0

- Added type hints.

### 0.2.1

- Added Pypi classifiers. 

### 0.2.0

- Fixed decorator usage.
- Added functools wraps.

### 0.1.0

- Initial release.


            

Raw data

            {
    "_id": null,
    "home_page": "https://github.com/fourkind/daffy",
    "name": "daffy",
    "maintainer": "",
    "docs_url": null,
    "requires_python": ">=3.8.1,<4.0.0",
    "maintainer_email": "",
    "keywords": "pandas,dataframe,typing,validation",
    "author": "Janne Sinivirta",
    "author_email": "janne.sinivirta@gmail.com",
    "download_url": "https://files.pythonhosted.org/packages/1f/55/f5292bc354ba2e9ac2e08cdca19c47839c53938dd38c497d90159680fb34/daffy-0.7.0.tar.gz",
    "platform": null,
    "description": "# DAFFY DataFrame Column Validator\n[![PyPI](https://img.shields.io/pypi/v/daffy)](https://pypi.org/project/daffy/)\n![PyPI - Python Version](https://img.shields.io/pypi/pyversions/daffy)\n![test](https://github.com/fourkind/daffy/workflows/test/badge.svg)\n[![codecov](https://codecov.io/gh/fourkind/daffy/branch/master/graph/badge.svg?token=2FYBMT65A6)](https://codecov.io/gh/fourkind/daffy)\n[![Code style: black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/psf/black)\n\n## Description \n\nIn projects using Pandas, it's very common to have functions that take Pandas DataFrames as input or produce them as output.\nIt's hard to figure out quickly what these DataFrames contain. This library offers simple decorators to annotate your functions\nso that they document themselves and that documentation is kept up-to-date by validating the input and output on runtime.\n\nFor example,\n\n```python\n@df_in(columns=[\"Brand\", \"Price\"])     # the function expects a DataFrame as input parameter with columns Brand and Price\n@df_out(columns=[\"Brand\", \"Price\"])    # the function will return a DataFrame with columns Brand and Price\ndef filter_cars(car_df):\n    # before this code is executed, the input DataFrame is validated according to the above decorator\n    # filter some cars..\n    return filtered_cars_df\n```\n\n## Table of Contents\n* [Installation](#installation)\n* [Usage](#usage)\n* [Contributing](#contributing)\n* [Tests](#tests)\n* [License](#license)\n* [Changelog](#changelog)\n\n## Installation\n\nInstall with your favorite Python dependency manager like\n\n```sh\npip install daffy\n```\n\n## Usage \n\nStart by importing the needed decorators:\n\n```python\nfrom daffy import df_in, df_out\n```\n\nTo check a DataFrame input to a function, annotate the function with `@df_in`. 
For example the following function expects to get\na DataFrame with columns `Brand` and `Price`:\n\n```python\n@df_in(columns=[\"Brand\", \"Price\"])\ndef process_cars(car_df):\n    # do stuff with cars\n```\n\nIf your function takes multiple arguments, specify the field to be checked with it's `name`:\n\n```python\n@df_in(name=\"car_df\", columns=[\"Brand\", \"Price\"])\ndef process_cars(year, style, car_df):\n    # do stuff with cars\n```\n\nYou can also check columns of multiple arguments if you specify the names\n```python\n@df_in(name=\"car_df\", columns=[\"Brand\", \"Price\"])\n@df_in(name=\"brand_df\", columns=[\"Brand\", \"BrandName\"])\ndef process_cars(car_df, brand_df):\n    # do stuff with cars\n```\n\nTo check that a function returns a DataFrame with specific columns, use `@df_out` decorator:\n\n```python\n@df_out(columns=[\"Brand\", \"Price\"])\ndef get_all_cars():\n    # get those cars\n    return all_cars_df\n```\n\nIn case one of the listed columns is missing from the DataFrame, a helpful assertion error is thrown:\n\n```python\nAssertionError(\"Column Price missing from DataFrame. Got columns: ['Brand']\")\n```\n\nTo check both input and output, just use both annotations on the same function:\n\n```python\n@df_in(columns=[\"Brand\", \"Price\"])\n@df_out(columns=[\"Brand\", \"Price\"])\ndef filter_cars(car_df):\n    # filter some cars\n    return filtered_cars_df\n```\n\nIf you want to also check the data types of each column, you can replace the column array:\n\n```python\ncolumns=[\"Brand\", \"Price\"]\n```\n\nwith a dict:\n\n```python\ncolumns={\"Brand\": \"object\", \"Price\": \"int64\"}\n```\n\nThis will not only check that the specified columns are found from the DataFrame but also that their `dtype` is the expected.\nIn case of a wrong `dtype`, an error message similar to following will explain the mismatch:\n\n```\nAssertionError(\"Column Price has wrong dtype. 
Was int64, expected float64\")\n```\n\nYou can enable strict-mode for both `@df_in` and `@df_out`. This will raise an error if the DataFrame contains columns\nnot defined in the annotation:\n\n```python\n@df_in(columns=[\"Brand\"], strict=True)\ndef process_cars(car_df):\n    # do stuff with cars\n```\n\nwill, when `car_df` contains columns `[\"Brand\", \"Price\"]` raise an error:\n\n```\nAssertionError: DataFrame contained unexpected column(s): Price\n```\n\nTo quickly check what the incoming and outgoing dataframes contain, you can add a `@df_log` annotation to the function. For\nexample adding `@df_log` to the above `filter_cars` function will product log lines:\n\n```\nFunction filter_cars parameters contained a DataFrame: columns: ['Brand', 'Price']\nFunction filter_cars returned a DataFrame: columns: ['Brand', 'Price']\n```\n\nor with `@df_log(include_dtypes=True)` you get:\n\n```\nFunction filter_cars parameters contained a DataFrame: columns: ['Brand', 'Price'] with dtypes ['object', 'int64']\nFunction filter_cars returned a DataFrame: columns: ['Brand', 'Price'] with dtypes ['object', 'int64']\n```\n\n## Contributing\n\nContributions are accepted. Include tests in PR's.\n\n## Development\n\nTo run the tests, clone the repository, install dependencies with Poetry and run tests with PyTest:\n\n```sh\npoetry install\npoetry shell\npytest\n```\n\nTo enable linting on each commit, run `pre-commit install`. 
After that, your every commit will be checked with `isort`, `black` and `flake8`.\n\n## License\n\nMIT\n\n## Changelog\n\n### 0.7.0\n\n- Support Pandas 2.x\n- Drop support for Python 3.7 and 3.8\n- Build and test with Python 3.12 also\n\n### 0.6.0\n\n- Make checking columns of multiple function parameters work also with positional arguments (thanks @latvanii)\n\n### 0.5.0\n\n- Added `strict` parameter for `@df_in` and `@df_out`\n\n### 0.4.2\n\n- Added docstrings for the decorators\n- Fix import of `@df_log`\n\n### 0.4.1\n\n- Add `include_dtypes` parameter for `@df_log`.\n- Fix handling of empty signature with `@df_in`.\n\n### 0.4.0\n\n- Added `@df_log` for logging.\n- Improved assertion messages.\n\n### 0.3.0\n\n- Added type hints.\n\n### 0.2.1\n\n- Added Pypi classifiers. \n\n### 0.2.0\n\n- Fixed decorator usage.\n- Added functools wraps.\n\n### 0.1.0\n\n- Initial release.\n\n",
    "bugtrack_url": null,
    "license": "MIT",
    "summary": "Function decorators for Pandas Dataframe column name and data type validation",
    "version": "0.7.0",
    "project_urls": {
        "Homepage": "https://github.com/fourkind/daffy",
        "Repository": "https://github.com/fourkind/daffy.git"
    },
    "split_keywords": [
        "pandas",
        "dataframe",
        "typing",
        "validation"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "8781d70672b0028a0a4cc1317a3d7c57f246612053084b5022ab64e622805c77",
                "md5": "e199e210a54dce8e90e4533168bf73e9",
                "sha256": "7bf8a78df6b96aa26226589767b366b103ae61c5806ce4cd8dd2031a2991118c"
            },
            "downloads": -1,
            "filename": "daffy-0.7.0-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "e199e210a54dce8e90e4533168bf73e9",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.8.1,<4.0.0",
            "size": 6770,
            "upload_time": "2023-12-07T13:33:16",
            "upload_time_iso_8601": "2023-12-07T13:33:16.262776Z",
            "url": "https://files.pythonhosted.org/packages/87/81/d70672b0028a0a4cc1317a3d7c57f246612053084b5022ab64e622805c77/daffy-0.7.0-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "1f55f5292bc354ba2e9ac2e08cdca19c47839c53938dd38c497d90159680fb34",
                "md5": "115536b32d7ee51fad7cad43d318d0bf",
                "sha256": "a7fdbf6608a55b2a607255e4f7d1d6890e1fc1db5abd56dcb1497ae3ce430443"
            },
            "downloads": -1,
            "filename": "daffy-0.7.0.tar.gz",
            "has_sig": false,
            "md5_digest": "115536b32d7ee51fad7cad43d318d0bf",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.8.1,<4.0.0",
            "size": 6322,
            "upload_time": "2023-12-07T13:33:17",
            "upload_time_iso_8601": "2023-12-07T13:33:17.918516Z",
            "url": "https://files.pythonhosted.org/packages/1f/55/f5292bc354ba2e9ac2e08cdca19c47839c53938dd38c497d90159680fb34/daffy-0.7.0.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2023-12-07 13:33:17",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "fourkind",
    "github_project": "daffy",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": true,
    "lcname": "daffy"
}
        
Elapsed time: 0.16400s