Pandas Type Checks
==================
[![Build Status](https://dev.azure.com/martin-zuber/pandas-type-checks/_apis/build/status/mzuber.pandas-type-checks?branchName=main)](https://dev.azure.com/martin-zuber/pandas-type-checks/_build/latest?definitionId=1&branchName=main)
[![Quality Gate Status](https://sonarcloud.io/api/project_badges/measure?project=mzuber_pandas-type-checks&metric=alert_status)](https://sonarcloud.io/summary/new_code?id=mzuber_pandas-type-checks)
[![Coverage](https://sonarcloud.io/api/project_badges/measure?project=mzuber_pandas-type-checks&metric=coverage)](https://sonarcloud.io/summary/new_code?id=mzuber_pandas-type-checks)
[![PyPI Version](https://img.shields.io/pypi/v/pandas-type-checks)](https://pypi.org/project/pandas-type-checks/)
[![PyPI Wheel](https://img.shields.io/pypi/wheel/pandas-type-checks)](https://pypi.org/project/pandas-type-checks/)
A Python library providing means for structural type checking of Pandas data frames and series:
- A decorator `pandas_type_check` for specifying and checking the structure of Pandas `DataFrame` and `Series`
  arguments and return values of a function.
- Support for "non-strict" type checking. In this mode, data frames may contain columns that are not part of the type
  specification against which they are checked. In that sense, non-strict type checking allows a form of structural
  subtyping for data frames.
- Configuration options to either raise exceptions for type errors or log them instead.
- Configuration option to globally enable or disable the type checks. This allows users to enable type checking
  only in selected environments, such as testing.
This library focuses on providing utilities to check the structure (i.e. columns and their types) of Pandas data frames
and series arguments and return values of functions. For checking individual data frame and series values, including
formulating more sophisticated constraints on column values, [Pandera](https://github.com/unionai-oss/pandera) is a
great alternative.
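The difference between non-strict and strict checking can be illustrated with a minimal sketch. It is not the library's implementation: the function `structure_errors` and its messages are hypothetical, and column types are represented as plain dtype names so that the example runs without Pandas installed.

```python
from typing import Dict, List


def structure_errors(actual: Dict[str, str],
                     expected: Dict[str, str],
                     strict: bool = False) -> List[str]:
    """Compare column-name -> dtype-name mappings and return error messages.

    Hypothetical helper for illustration only; not part of pandas-type-checks.
    """
    errors = []
    for column, dtype in expected.items():
        if column not in actual:
            errors.append(f"Missing column: '{column}'")
        elif actual[column] != dtype:
            errors.append(
                f"Expected type '{dtype}' for column '{column}' "
                f"but found type '{actual[column]}'"
            )
    if strict:
        # Strict mode additionally rejects columns outside the specification.
        for column in actual:
            if column not in expected:
                errors.append(f"Unexpected column: '{column}'")
    return errors


spec = {'B': 'int64', 'C': 'bool'}
frame_dtypes = {'A': 'float64', 'B': 'int64', 'C': 'bool'}

# Non-strict: the extra column 'A' is accepted (structural subtyping).
print(structure_errors(frame_dtypes, spec))               # → []
# Strict: the extra column 'A' is reported.
print(structure_errors(frame_dtypes, spec, strict=True))  # → ["Unexpected column: 'A'"]
```

For a real data frame `df`, a comparable mapping could be obtained with `df.dtypes.astype(str).to_dict()`.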
Installation
------------
Packages for all released versions are available at the
[Python Package Index (PyPI)](https://pypi.org/project/pandas-type-checks) and can be installed with `pip`:
```
pip install pandas-type-checks
```
The library can also be installed with support for additional functionality:
```
pip install pandas-type-checks[pandera] # Support for Pandera data frame and series schemas
```
Usage Example
-------------
The function `filter_rows_and_remove_column` is annotated with type check hints for the Pandas `DataFrame` and `Series`
arguments and return value of the function:
```python
import pandas as pd
import numpy as np
import pandas_type_checks as pd_types

@pd_types.pandas_type_check(
    pd_types.DataFrameArgument('data', {
        'A': np.dtype('float64'),
        'B': np.dtype('int64'),
        'C': np.dtype('bool')
    }),
    pd_types.SeriesArgument('filter_values', 'int64'),
    pd_types.DataFrameReturnValue({
        'B': np.dtype('int64'),
        'C': np.dtype('bool')
    })
)
def filter_rows_and_remove_column(data: pd.DataFrame, filter_values: pd.Series) -> pd.DataFrame:
    return data[data['B'].isin(filter_values.values)].drop('A', axis=1)
```
Calling `filter_rows_and_remove_column` with a `filter_values` series of the wrong type raises a `TypeError`
exception with a detailed type error message:
```python
test_data = pd.DataFrame({
    'A': pd.Series(1, index=list(range(4)), dtype='float64'),
    'B': np.array([1, 2, 3, 4], dtype='int64'),
    'C': np.array([True] * 4, dtype='bool')
})
test_filter_values_with_wrong_type = pd.Series([3, 4], dtype='int32')

filter_rows_and_remove_column(test_data, test_filter_values_with_wrong_type)
```
```
TypeError: Pandas type error in function 'filter_rows_and_remove_column'
Type error in argument 'filter_values':
    Expected Series of type 'int64' but found type 'int32'
```
Calling `filter_rows_and_remove_column` with a data frame that has a wrong column type and a missing column
raises a `TypeError` exception with a detailed type error message:
```python
test_data_with_wrong_type_and_missing_column = pd.DataFrame({
    'A': pd.Series(1, index=list(range(4)), dtype='float64'),
    'B': np.array([1, 2, 3, 4], dtype='int32')
})
test_filter_values = pd.Series([3, 4], dtype='int64')

filter_rows_and_remove_column(test_data_with_wrong_type_and_missing_column, test_filter_values)
```
```
TypeError: Pandas type error in function 'filter_rows_and_remove_column'
Type error in argument 'data':
    Expected type 'int64' for column 'B' but found type 'int32'
    Missing column in DataFrame: 'C'
Type error in return value:
    Expected type 'int64' for column 'B' but found type 'int32'
    Missing column in DataFrame: 'C'
```
Configuration
-------------
The global configuration object `pandas_type_checks.config` can be used to configure the behavior of the library:
- `config.enable_type_checks` (`bool`): Flag for enabling or disabling type checks for the specified arguments and
  return values. This flag can be used to globally enable or disable the type checker in certain environments.
  Default: `True`
- `config.strict_type_checks` (`bool`): Flag for strict type check mode. If strict type checking is enabled, data frames
  cannot contain columns that are not part of the type specification against which they are checked. With strict mode
  disabled, such extra columns are accepted, which allows a form of structural subtyping for data frames.
  Default: `False`
- `config.log_type_errors` (`bool`): Flag indicating that type errors for Pandas data frames or series should be
  logged instead of raising a `TypeError` exception. Type errors are logged with log level `ERROR`.
  Default: `False`
- `config.logger` (`logging.Logger`): Logger used for logging type errors when the `log_type_errors` flag is enabled.
  If no logger is specified via the configuration, a built-in default logger is used.
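The options above could be applied in one place at application start-up. The following is a minimal sketch under stated assumptions: it assumes the four options are plain writable attributes of the configuration object, and the names `env_flag`, `configure_type_checks`, the environment variable `PANDAS_TYPE_CHECKS`, and the logger name are hypothetical. A stand-in object is used so the sketch runs without the library installed.

```python
import logging
import os
from types import SimpleNamespace


def env_flag(name: str, default: bool = False) -> bool:
    """Interpret an environment variable as a boolean flag."""
    value = os.environ.get(name)
    if value is None:
        return default
    return value.strip().lower() in {"1", "true", "yes", "on"}


def configure_type_checks(config) -> None:
    """Set the four documented options on a configuration object.

    Assumption: the options are writable attributes of the object.
    """
    # PANDAS_TYPE_CHECKS is a hypothetical variable name chosen for this sketch;
    # type checks stay disabled unless it is set to a truthy value.
    config.enable_type_checks = env_flag("PANDAS_TYPE_CHECKS", default=False)
    # Reject data frames with columns outside the specification.
    config.strict_type_checks = True
    # Log type errors at level ERROR instead of raising TypeError.
    config.log_type_errors = True
    config.logger = logging.getLogger("my_app.type_checks")


# Stand-in for the library's configuration object.
demo_config = SimpleNamespace()
os.environ["PANDAS_TYPE_CHECKS"] = "true"
configure_type_checks(demo_config)
print(demo_config.enable_type_checks, demo_config.strict_type_checks)
```

With the library installed, the same function would presumably be called as `configure_type_checks(pandas_type_checks.config)`, for instance from a test fixture, so that type checks run only in the test environment.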
Pandera Support
---------------
This library can be installed with additional support for [Pandera](https://github.com/unionai-oss/pandera):
```
pip install pandas-type-checks[pandera]
```
In this case, Pandera [DataFrameSchema](https://pandera.readthedocs.io/en/stable/reference/generated/pandera.schemas.DataFrameSchema.html)
and [SeriesSchema](https://pandera.readthedocs.io/en/stable/reference/generated/pandera.schemas.SeriesSchema.html)
objects can be used as type specifications for data frame and series arguments and return values:
```python
import pandas as pd
import pandera as pa
import numpy as np
import pandas_type_checks as pd_types

@pd_types.pandas_type_check(
    pd_types.DataFrameArgument('data',
                               pa.DataFrameSchema({
                                   'A': pa.Column(np.dtype('float64'), checks=pa.Check.le(10.0)),
                                   'B': pa.Column(np.dtype('int64'), checks=pa.Check.lt(2)),
                                   'C': pa.Column(np.dtype('bool'))
                               })),
    pd_types.SeriesArgument('filter_values', 'int64'),
    pd_types.DataFrameReturnValue({
        'B': np.dtype('int64'),
        'C': np.dtype('bool')
    })
)
def filter_rows_and_remove_column(data: pd.DataFrame, filter_values: pd.Series) -> pd.DataFrame:
    return data[data['B'].isin(filter_values.values)].drop('A', axis=1)
```
References
----------
* [Google Python Style Guide](https://google.github.io/styleguide/pyguide.html)
* [Python Packaging User Guide](https://packaging.python.org/en/latest/)
Raw data
{
"_id": null,
"home_page": null,
"name": "pandas-type-checks",
"maintainer": null,
"docs_url": null,
"requires_python": ">=3.8",
"maintainer_email": null,
"keywords": "Pandas, type check",
"author": null,
"author_email": "Martin Zuber <martin.zuber@sap.com>",
"download_url": "https://files.pythonhosted.org/packages/f3/98/e50baa275200cd86bbaa6eb761de96a23bfd2d5de6686727ec89098e2157/pandas_type_checks-1.1.3.tar.gz",
"platform": "any",
"bugtrack_url": null,
"license": "BSD 3-Clause License",
"summary": "Structural type checking for Pandas data frames.",
"version": "1.1.3",
"project_urls": {
"Source Code": "https://github.com/mzuber/pandas-type-checks"
},
"split_keywords": [
"pandas",
" type check"
],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "ece992b77a8d9e2a83e6f074b659d9794def1649d14b271dcbe3145f8def2f5a",
"md5": "dd7266664c7facca57cc3272de5b5247",
"sha256": "05101d590f7f2feac9109d2967a32a740747531af1f5b8f7bcea1d3cd1aeeed7"
},
"downloads": -1,
"filename": "pandas_type_checks-1.1.3-py3-none-any.whl",
"has_sig": false,
"md5_digest": "dd7266664c7facca57cc3272de5b5247",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.8",
"size": 11081,
"upload_time": "2024-08-23T06:44:23",
"upload_time_iso_8601": "2024-08-23T06:44:23.848970Z",
"url": "https://files.pythonhosted.org/packages/ec/e9/92b77a8d9e2a83e6f074b659d9794def1649d14b271dcbe3145f8def2f5a/pandas_type_checks-1.1.3-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "f398e50baa275200cd86bbaa6eb761de96a23bfd2d5de6686727ec89098e2157",
"md5": "a5aa8fcd91cd85e137893bb84791586d",
"sha256": "02127fb0b85caf681eb31e1293bf110c558888abf39c3891ceb2bfaca0e50fee"
},
"downloads": -1,
"filename": "pandas_type_checks-1.1.3.tar.gz",
"has_sig": false,
"md5_digest": "a5aa8fcd91cd85e137893bb84791586d",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.8",
"size": 14391,
"upload_time": "2024-08-23T06:44:25",
"upload_time_iso_8601": "2024-08-23T06:44:25.219705Z",
"url": "https://files.pythonhosted.org/packages/f3/98/e50baa275200cd86bbaa6eb761de96a23bfd2d5de6686727ec89098e2157/pandas_type_checks-1.1.3.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2024-08-23 06:44:25",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "mzuber",
"github_project": "pandas-type-checks",
"travis_ci": false,
"coveralls": false,
"github_actions": false,
"requirements": [],
"tox": true,
"lcname": "pandas-type-checks"
}