pandera


- Name: pandera
- Version: 0.21.1
- Home page: https://github.com/pandera-dev/pandera
- Summary: A light-weight and flexible data validation and testing tool for statistical data objects.
- Upload time: 2024-12-04 00:47:31
- Author: Niels Bantilan
- Requires Python: >=3.7
- License: MIT
- Keywords: pandas, validation, data-structures

<br>
<div align="center"><a href="https://www.union.ai/pandera"><img src="docs/source/_static/pandera-banner.png" width="400"></a></div>

<h1 align="center">
  The Open-source Framework for Precision Data Testing
</h1>

<p align="center">
  📊 🔎 ✅
</p>

<p align="center">
  <i>Data validation for scientists, engineers, and analysts seeking correctness.</i>
</p>

<br>


[![CI Build](https://img.shields.io/github/actions/workflow/status/unionai-oss/pandera/ci-tests.yml?branch=main&label=tests&style=for-the-badge)](https://github.com/unionai-oss/pandera/actions/workflows/ci-tests.yml?query=branch%3Amain)
[![Documentation Status](https://readthedocs.org/projects/pandera/badge/?version=stable&style=for-the-badge)](https://pandera.readthedocs.io/en/stable/?badge=stable)
[![PyPI version shields.io](https://img.shields.io/pypi/v/pandera.svg?style=for-the-badge)](https://pypi.org/project/pandera/)
[![PyPI license](https://img.shields.io/pypi/l/pandera.svg?style=for-the-badge)](https://pypi.python.org/pypi/)
[![pyOpenSci](https://go.union.ai/pandera-pyopensci-badge)](https://github.com/pyOpenSci/software-review/issues/12)
[![Project Status: Active – The project has reached a stable, usable state and is being actively developed.](https://img.shields.io/badge/repo%20status-Active-Green?style=for-the-badge)](https://www.repostatus.org/#active)
[![Documentation Status](https://readthedocs.org/projects/pandera/badge/?version=latest&style=for-the-badge)](https://pandera.readthedocs.io/en/latest/?badge=latest)
[![codecov](https://img.shields.io/codecov/c/github/unionai-oss/pandera?style=for-the-badge)](https://codecov.io/gh/unionai-oss/pandera)
[![PyPI pyversions](https://img.shields.io/pypi/pyversions/pandera.svg?style=for-the-badge)](https://pypi.python.org/pypi/pandera/)
[![DOI](https://img.shields.io/badge/DOI-10.5281/zenodo.3385265-blue?style=for-the-badge)](https://doi.org/10.5281/zenodo.3385265)
[![asv](http://img.shields.io/badge/benchmarked%20by-asv-green.svg?style=for-the-badge)](https://pandera-dev.github.io/pandera-asv-logs/)
[![Monthly Downloads](https://img.shields.io/pypi/dm/pandera?style=for-the-badge&color=blue)](https://pepy.tech/project/pandera)
[![Total Downloads](https://img.shields.io/pepy/dt/pandera?style=for-the-badge&color=blue)](https://pepy.tech/project/pandera)
[![Conda Downloads](https://img.shields.io/conda/dn/conda-forge/pandera?style=for-the-badge)](https://anaconda.org/conda-forge/pandera)
[![Discord](https://img.shields.io/badge/discord-chat-purple?color=%235765F2&label=discord&logo=discord&style=for-the-badge)](https://discord.gg/vyanhWuaKB)

`pandera` is a [Union.ai](https://union.ai/blog-post/pandera-joins-union-ai) open
source project that provides a flexible and expressive API for performing data
validation on dataframe-like objects to make data processing pipelines more readable and robust.

Dataframes contain information that `pandera` explicitly validates at runtime.
This is useful in production-critical or reproducible research settings. With
`pandera`, you can:

1. Define a schema once and use it to validate
   [different dataframe types](https://pandera.readthedocs.io/en/stable/supported_libraries.html)
   including [pandas](http://pandas.pydata.org), [polars](https://docs.pola.rs/),
   [dask](https://dask.org), [modin](https://modin.readthedocs.io/),
   and [pyspark](https://spark.apache.org/docs/3.2.0/api/python/user_guide/pandas_on_spark/index.html).
1. [Check](https://pandera.readthedocs.io/en/stable/checks.html) the types and
   properties of columns in a `DataFrame` or values in a `Series`.
1. Perform more complex statistical validation like
   [hypothesis testing](https://pandera.readthedocs.io/en/stable/hypothesis.html#hypothesis).
1. [Parse](https://pandera.readthedocs.io/en/stable/parsers.html) data to standardize
   the preprocessing steps needed to produce valid data.
1. Seamlessly integrate with existing data analysis/processing pipelines
   via [function decorators](https://pandera.readthedocs.io/en/stable/decorators.html#decorators).
1. Define dataframe models with the
   [class-based API](https://pandera.readthedocs.io/en/stable/dataframe_models.html#dataframe-models)
   with pydantic-style syntax and validate dataframes using the typing syntax.
1. [Synthesize data](https://pandera.readthedocs.io/en/stable/data_synthesis_strategies.html#data-synthesis-strategies)
   from schema objects for property-based testing with pandas data structures.
1. [Lazily Validate](https://pandera.readthedocs.io/en/stable/lazy_validation.html)
   dataframes so that all validation checks are executed before raising an error.
1. [Integrate](https://pandera.readthedocs.io/en/stable/integrations.html) with
   a rich ecosystem of python tools like [pydantic](https://pydantic-docs.helpmanual.io),
   [fastapi](https://fastapi.tiangolo.com/), and [mypy](http://mypy-lang.org/).

## Documentation

The official documentation is hosted here: https://pandera.readthedocs.io


## Install

Using pip:

```
pip install pandera
```

Using conda:

```
conda install -c conda-forge pandera
```

### Extras

Installing additional functionality:

<details>

<summary><i>pip</i></summary>

```bash
pip install 'pandera[hypotheses]' # hypothesis checks
pip install 'pandera[io]'         # yaml/script schema io utilities
pip install 'pandera[strategies]' # data synthesis strategies
pip install 'pandera[mypy]'       # enable static type-linting of pandas
pip install 'pandera[fastapi]'    # fastapi integration
pip install 'pandera[dask]'       # validate dask dataframes
pip install 'pandera[pyspark]'    # validate pyspark dataframes
pip install 'pandera[modin]'      # validate modin dataframes
pip install 'pandera[modin-ray]'  # validate modin dataframes with ray
pip install 'pandera[modin-dask]' # validate modin dataframes with dask
pip install 'pandera[geopandas]'  # validate geopandas geodataframes
pip install 'pandera[polars]'     # validate polars dataframes
```

</details>

<details>

<summary><i>conda</i></summary>

```bash
conda install -c conda-forge pandera-hypotheses  # hypothesis checks
conda install -c conda-forge pandera-io          # yaml/script schema io utilities
conda install -c conda-forge pandera-strategies  # data synthesis strategies
conda install -c conda-forge pandera-mypy        # enable static type-linting of pandas
conda install -c conda-forge pandera-fastapi     # fastapi integration
conda install -c conda-forge pandera-dask        # validate dask dataframes
conda install -c conda-forge pandera-pyspark     # validate pyspark dataframes
conda install -c conda-forge pandera-modin       # validate modin dataframes
conda install -c conda-forge pandera-modin-ray   # validate modin dataframes with ray
conda install -c conda-forge pandera-modin-dask  # validate modin dataframes with dask
conda install -c conda-forge pandera-geopandas   # validate geopandas geodataframes
conda install -c conda-forge pandera-polars      # validate polars dataframes
```

</details>

## Quick Start

```python
import pandas as pd
import pandera as pa


# data to validate
df = pd.DataFrame({
    "column1": [1, 4, 0, 10, 9],
    "column2": [-1.3, -1.4, -2.9, -10.1, -20.4],
    "column3": ["value_1", "value_2", "value_3", "value_2", "value_1"]
})

# define schema
schema = pa.DataFrameSchema({
    "column1": pa.Column(int, checks=pa.Check.le(10)),
    "column2": pa.Column(float, checks=pa.Check.lt(-1.2)),
    "column3": pa.Column(str, checks=[
        pa.Check.str_startswith("value_"),
        # define custom checks as functions that take a Series as input and
        # output a boolean or a boolean Series
        pa.Check(lambda s: s.str.split("_", expand=True).shape[1] == 2)
    ]),
})

validated_df = schema(df)
print(validated_df)

#     column1  column2  column3
#  0        1     -1.3  value_1
#  1        4     -1.4  value_2
#  2        0     -2.9  value_3
#  3       10    -10.1  value_2
#  4        9    -20.4  value_1
```

## DataFrame Model

`pandera` also provides an alternative API for expressing schemas inspired
by [dataclasses](https://docs.python.org/3/library/dataclasses.html) and
[pydantic](https://pydantic-docs.helpmanual.io/). The equivalent `DataFrameModel`
for the above `DataFrameSchema` would be:


```python
import pandera as pa
from pandera.typing import Series

class Schema(pa.DataFrameModel):

    column1: int = pa.Field(le=10)
    column2: float = pa.Field(lt=-1.2)
    column3: str = pa.Field(str_startswith="value_")

    @pa.check("column3")
    def column_3_check(cls, series: Series[str]) -> Series[bool]:
        """Check that values have two elements after being split with '_'"""
        return series.str.split("_", expand=True).shape[1] == 2

Schema.validate(df)
```

## Development Installation

```
git clone https://github.com/pandera-dev/pandera.git
cd pandera
export PYTHON_VERSION=...  # specify desired python version
pip install -r dev/requirements-${PYTHON_VERSION}.txt
pip install -e .
```

## Tests

```
pip install pytest
pytest tests
```

## Contributing to pandera [![GitHub contributors](https://img.shields.io/github/contributors/pandera-dev/pandera.svg?style=for-the-badge)](https://github.com/pandera-dev/pandera/graphs/contributors)

All contributions, bug reports, bug fixes, documentation improvements,
enhancements and ideas are welcome.

A detailed overview on how to contribute can be found in the
[contributing guide](https://github.com/pandera-dev/pandera/blob/main/.github/CONTRIBUTING.md)
on GitHub.

## Issues

Go [here](https://github.com/pandera-dev/pandera/issues) to submit feature
requests or bug reports.

## Need Help?

There are many ways of getting help with your questions. You can ask a question
on the [GitHub Discussions](https://github.com/pandera-dev/pandera/discussions/categories/q-a)
page or reach out to the maintainers and pandera community on
[Discord](https://discord.gg/vyanhWuaKB).

## Why `pandera`?

- [dataframe-centric data types](https://pandera.readthedocs.io/en/stable/dtypes.html),
  [column nullability](https://pandera.readthedocs.io/en/stable/dataframe_schemas.html#null-values-in-columns),
  and [uniqueness](https://pandera.readthedocs.io/en/stable/dataframe_schemas.html#validating-the-joint-uniqueness-of-columns)
  are first-class concepts.
- Define [dataframe models](https://pandera.readthedocs.io/en/stable/schema_models.html) with the class-based API with
  [pydantic](https://pydantic-docs.helpmanual.io/)-style syntax and validate dataframes using the typing syntax.
- `check_input` and `check_output` [decorators](https://pandera.readthedocs.io/en/stable/decorators.html#decorators-for-pipeline-integration)
  enable seamless integration with existing code.
- [`Check`s](https://pandera.readthedocs.io/en/stable/checks.html) provide flexibility and performance by giving access to the `pandas`
  API by design, and offer built-in checks for common data tests.
- The [`Hypothesis`](https://pandera.readthedocs.io/en/stable/hypothesis.html) class provides a tidy-first interface for statistical hypothesis
  testing.
- `Check`s and `Hypothesis` objects support both [tidy and wide data validation](https://pandera.readthedocs.io/en/stable/checks.html#wide-checks).
- Use schemas as generative contracts to [synthesize data](https://pandera.readthedocs.io/en/stable/data_synthesis_strategies.html) for unit testing.
- [Schema inference](https://pandera.readthedocs.io/en/stable/schema_inference.html) allows you to bootstrap schemas from data.

## How to Cite

If you use `pandera` in the context of academic or industry research, please
consider citing the **paper** and/or **software package**.

### [Paper](https://conference.scipy.org/proceedings/scipy2020/niels_bantilan.html)

```
@InProceedings{ niels_bantilan-proc-scipy-2020,
  author    = { {N}iels {B}antilan },
  title     = { pandera: {S}tatistical {D}ata {V}alidation of {P}andas {D}ataframes },
  booktitle = { {P}roceedings of the 19th {P}ython in {S}cience {C}onference },
  pages     = { 116 - 124 },
  year      = { 2020 },
  editor    = { {M}eghann {A}garwal and {C}hris {C}alloway and {D}illon {N}iederhut and {D}avid {S}hupe },
  doi       = { 10.25080/Majora-342d178e-010 }
}
```

### Software Package

[![DOI](https://img.shields.io/badge/DOI-10.5281/zenodo.3385265-blue?style=for-the-badge)](https://doi.org/10.5281/zenodo.3385265)


## License and Credits

`pandera` is licensed under the [MIT license](license.txt) and is written and
maintained by Niels Bantilan (niels@union.ai).

            

        