# pandas-validity

- **Name:** pandas-validity
- **Version:** 0.1.1
- **Home page:** https://github.com/ohmycoffe/pandas-validity
- **Summary:** Validation library for Pandas Dataframe
- **Upload time:** 2023-10-18 22:27:43
- **Author:** ohmycoffe
- **Requires Python:** >=3.9,<4.0
- **License:** MIT
- **Keywords:** pandas, dataframe, validation

[![PyPI - Version](https://img.shields.io/pypi/v/pandas-validity)](https://pypi.org/project/pandas-validity/)
![PyPI - Python Version](https://img.shields.io/pypi/pyversions/pandas-validity)
[![Test and lint](https://github.com/ohmycoffe/pandas-validity/actions/workflows/test.yml/badge.svg?branch=main)](https://github.com/ohmycoffe/pandas-validity/actions/workflows/test.yml?query=branch%3Amain)
[![codecov](https://codecov.io/gh/ohmycoffe/pandas-validity/graph/badge.svg?token=4K6RV6E9JX)](https://codecov.io/gh/ohmycoffe/pandas-validity)
[![Code style: black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/psf/black)
[![Checked with mypy](https://www.mypy-lang.org/static/mypy_badge.svg)](https://mypy-lang.org/)
[![Poetry](https://img.shields.io/endpoint?url=https://python-poetry.org/badge/v0.json)](https://python-poetry.org/)
![PyPI - License](https://img.shields.io/pypi/l/pandas-validity)

## What is it?

**pandas-validity** is a Python library for validating pandas DataFrames. It provides a `DataFrameValidator` class that works as a context manager: inside the `with` block you can run multiple checks, and any errors they find are collected rather than raised immediately. When the block exits, the validator raises a single `ValidationErrorsGroup` exception that summarizes all collected errors.
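
To make the collect-then-raise idea concrete, here is a minimal, standard-library-only sketch of that pattern. It is not the library's implementation: `CollectedErrors`, `CollectingValidator`, its two check methods, and `run_checks` are hypothetical names invented for this illustration, and it validates a plain dict rather than a DataFrame.

```python
class CollectedErrors(Exception):
    """Hypothetical stand-in for a grouped validation error."""

    def __init__(self, errors):
        super().__init__(f"Validation errors found: {len(errors)}.")
        self.errors = list(errors)


class CollectingValidator:
    """Toy validator: checks record failures instead of raising them."""

    def __init__(self, row):
        self.row = row
        self._errors = []

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        # Raise the summary only if the block itself finished cleanly.
        if exc_type is None and self._errors:
            raise CollectedErrors(self._errors)
        return False

    def has_keys(self, keys):
        missing = [k for k in keys if k not in self.row]
        if missing:
            self._errors.append(ValueError(f"missing keys: {missing}"))

    def has_no_none(self):
        empty = [k for k, v in self.row.items() if v is None]
        if empty:
            self._errors.append(ValueError(f"keys with None: {empty}"))


def run_checks(row):
    """Run both checks and return the collected error messages."""
    try:
        with CollectingValidator(row) as validator:
            validator.has_keys(["a", "b", "c"])
            validator.has_no_none()
    except CollectedErrors as group:
        return [str(error) for error in group.errors]
    return []
```

Calling `run_checks({"a": 1, "b": None})` reports both problems at once instead of stopping at the first one, which is the main benefit of deferring the raise until the block exits.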

## Installation

Install the latest released version from the [Python Package Index (PyPI)](https://pypi.org/project/pandas-validity) with pip:

```sh
pip install pandas-validity
```

### Development Installation

**Prerequisites**: [poetry](https://python-poetry.org/) for environment management 

The source code is hosted on GitHub at [ohmycoffe/pandas-validity](https://github.com/ohmycoffe/pandas-validity). To get the development version, clone the repository:

```shell
git clone git@github.com:ohmycoffe/pandas-validity.git
```

To install the project and development dependencies:

```shell
make install 
```

To run tests:

```shell
make test 
```

To view all possible commands, use:

```shell
make help
```

## Usage

```python
import datetime

import pandas as pd
from pandas_validity import DataFrameValidator

# Create a sample DataFrame
df = pd.DataFrame(
    {
        "A": [1, 2, 3],
        "B": ["a", None, "c"],
        "C": [2.3, 4.5, 9.2],
        "D": [
            datetime.datetime(2023, 1, 1, 1),
            datetime.datetime(2023, 1, 1, 2),
            datetime.datetime(2023, 1, 1, 3),
        ],
    }
)

# Define the expected columns and the column-to-data-type mapping
expected_columns = ["A", "B", "C", "E"]
data_types_mapping = {
    "A": "float",
    "D": "datetime",
}

# Use DataFrameValidator for validation
with DataFrameValidator(df) as validator:
    validator.is_empty()
    validator.has_required_columns(expected_columns)
    validator.has_no_redundant_columns(expected_columns)
    validator.has_valid_data_types(data_types_mapping)
    validator.has_no_missing_data()
```

**Output:**

```shell
Error occurred: (<class 'pandas_validity.exceptions.ValidationError'>) The dataframe has missing columns: ['E']
Error occurred: (<class 'pandas_validity.exceptions.ValidationError'>) The dataframe has redundant columns: ['D']
Error occurred: (<class 'pandas_validity.exceptions.ValidationError'>) Column 'A' has an invalid data type: 'int64'
Error occurred: (<class 'pandas_validity.exceptions.ValidationError'>) Found 1 missing value: [{'index': 1, 'column': 'B', 'value': None}]
  + Exception Group Traceback (most recent call last):
...
  | pandas_validity.exceptions.ValidationErrorsGroup: Validation errors found: 4. (4 sub-exceptions)
  +-+---------------- 1 ----------------
    | pandas_validity.exceptions.ValidationError: The dataframe has missing columns: ['E']
    +---------------- 2 ----------------
    | pandas_validity.exceptions.ValidationError: The dataframe has redundant columns: ['D']
    +---------------- 3 ----------------
    | pandas_validity.exceptions.ValidationError: Column 'A' has an invalid data type: 'int64'
    +---------------- 4 ----------------
    | pandas_validity.exceptions.ValidationError: Found 1 missing value: [{'index': 1, 'column': 'B', 'value': None}]
    +------------------------------------
```
---

The library supports the following data type specifications for validation:
- predefined names: `"str"`, `"int"`, `"float"`, `"datetime"`, `"bool"`
- any `Callable` that accepts a dtype object and returns a boolean indicating whether the type is valid, for example `pd.api.types.is_string_dtype`
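
To show what such a callable looks like, the sketch below applies dtype predicates directly to column dtypes using pandas alone. The helper `is_numeric_non_bool` and the `checks` mapping are hypothetical names for this illustration; based on the description above, a mapping of this shape could presumably be passed to `has_valid_data_types`, but the snippet does not call the library.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3], "C": [2.3, 4.5, 9.2]})


def is_numeric_non_bool(dtype) -> bool:
    """Hypothetical custom check: numeric, but not boolean."""
    return pd.api.types.is_numeric_dtype(dtype) and not pd.api.types.is_bool_dtype(dtype)


# Each value takes a dtype object and returns True or False.
checks = {
    "A": pd.api.types.is_integer_dtype,
    "C": is_numeric_non_bool,
}

results = {column: check(df[column].dtype) for column, check in checks.items()}

# A built-in predicate can also reject a column: "A" holds integers, not floats.
a_is_float = pd.api.types.is_float_dtype(df["A"].dtype)
```

Writing the check as a function of the dtype, rather than of the column values, keeps it cheap to evaluate and lets you reuse the predicates that pandas already ships in `pd.api.types`.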


## License
This project is licensed under the terms of the [MIT](LICENSE) license.

            
