================================================================
Strictly Typed Pandas: static type checking of pandas DataFrames
================================================================
I love Pandas! But in production code I’m always a bit wary when I see:

.. code-block:: python

    import pandas as pd

    def foo(df: pd.DataFrame) -> pd.DataFrame:
        # do stuff
        return df

Because… How do I know which columns are supposed to be in `df`?

Using `strictly_typed_pandas`, we can be more explicit about what these data should look like.

.. code-block:: python

    from strictly_typed_pandas import DataSet

    class Schema:
        id: int
        name: str

    def foo(df: DataSet[Schema]) -> DataSet[Schema]:
        # do stuff
        return df

Where `DataSet`:
    * is a subclass of `pd.DataFrame` and hence has the same functionality as `DataFrame`.
    * validates upon initialization whether the data adheres to the provided schema.
    * is immutable, so its schema cannot be changed using in-place modifications.

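
To make the validation step more concrete, here is a minimal standard-library sketch of checking column data against the annotations of a schema class. It is not how `strictly_typed_pandas` is implemented: the helper `validate_columns` is a hypothetical name introduced for illustration, and it works on a plain dict of lists rather than on a `DataFrame`.

.. code-block:: python

    from typing import Any, Dict, List, get_type_hints


    class Schema:
        id: int
        name: str


    def validate_columns(schema: type, data: Dict[str, List[Any]]) -> None:
        """Hypothetical helper: raise TypeError if `data` does not match `schema`."""
        expected = get_type_hints(schema)  # {"id": int, "name": str}
        missing = set(expected) - set(data)
        unexpected = set(data) - set(expected)
        if missing or unexpected:
            raise TypeError(
                f"column mismatch: missing={sorted(missing)}, "
                f"unexpected={sorted(unexpected)}"
            )
        # Check every value against the annotated type of its column.
        for column, expected_type in expected.items():
            for value in data[column]:
                if not isinstance(value, expected_type):
                    raise TypeError(
                        f"column {column!r}: expected {expected_type.__name__}, "
                        f"got {type(value).__name__}"
                    )


    # Data that matches the schema passes silently.
    validate_columns(Schema, {"id": [1, 2], "name": ["John", "Jane"]})

    # A missing column is reported as soon as the data is checked.
    try:
        validate_columns(Schema, {"id": [1, 2]})
    except TypeError as error:
        print(error)

Running a check like this once, at construction time, is what allows later code to rely on the schema without re-checking it, provided the object cannot be modified in place afterwards.
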
The `DataSet[Schema]` annotations are compatible with:
    * `mypy` for type checking at lint time (i.e. while you write your code).
    * `typeguard` (<v3.0) for type checking at run time (i.e. while you run your unit tests).

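
The difference between the two kinds of checking can be illustrated with a small standard-library sketch. The generic class `Table` below is a hypothetical stand-in for a schema-parameterised container and is not part of `strictly_typed_pandas`; the schemas `Person` and `Order` are likewise made up for the example.

.. code-block:: python

    from typing import Generic, List, TypeVar

    SchemaT = TypeVar("SchemaT")


    class Table(Generic[SchemaT]):
        """Hypothetical stand-in for a schema-parameterised container."""

        def __init__(self, rows: List[dict]) -> None:
            self.rows = rows


    class Person:
        id: int
        name: str


    class Order:
        id: int
        amount: float


    def count_people(table: Table[Person]) -> int:
        return len(table.rows)


    people: Table[Person] = Table([{"id": 1, "name": "John"}, {"id": 2, "name": "Jane"}])
    orders: Table[Order] = Table([{"id": 1, "amount": 9.5}])

    print(count_people(people))

    # A static checker such as mypy is expected to reject the call below,
    # because Table[Order] is not compatible with Table[Person].
    # Plain Python does not check annotations at run time, so without a
    # run-time checker the call would execute without complaint.
    # count_people(orders)

This is why the two tools complement each other: the static checker catches mismatched schemas that are visible in the annotations, while a run-time checker covers the cases that only show up when the code actually runs.
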
To get the most out of `strictly_typed_pandas`, be sure to:
    * set up `mypy` in your IDE.
    * run your unit tests with `pytest --stp-typeguard-packages=foo.bar` (where `foo.bar` is your package name).

Installation
============

.. code-block:: bash

    pip install strictly-typed-pandas


Documentation
=============

For example notebooks and API documentation, please see our `ReadTheDocs <https://strictly-typed-pandas.readthedocs.io/>`_.

FAQ
===
| **Do you know of something similar for pyspark?**
| Yes! Check out our package `typedspark <https://github.com/kaiko-ai/typedspark/>`_.
|
| **Why use Python if you want static typing?**
| There are just so many good packages for data science in Python. Rather than sacrificing all of that by moving to a different language, I'd like to make the Pythonverse a little bit better.
|
| **I found a bug! What should I do?**
| Great! Contact me and I'll look into it.
|
| **I have a great idea to improve strictly_typed_pandas! How can we make this work?**
| Awesome, drop me a line!
Raw data
{
"_id": null,
"home_page": "https://github.com/nanne-aben/strictly_typed_pandas",
"name": "strictly-typed-pandas",
"maintainer": null,
"docs_url": null,
"requires_python": ">=3.8.0",
"maintainer_email": null,
"keywords": "typing type checking pandas mypy linting",
"author": "Nanne Aben",
"author_email": "nanne.aben@gmail.com",
"download_url": "https://files.pythonhosted.org/packages/3b/33/bdbb81a51963119a4a96db29f36c4396636181287d96ab6130332ff32b09/strictly_typed_pandas-0.3.5.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": "MIT",
"summary": "Static type checking of pandas DataFrames",
"version": "0.3.5",
"project_urls": {
"Homepage": "https://github.com/nanne-aben/strictly_typed_pandas"
},
"split_keywords": [
"typing",
"type",
"checking",
"pandas",
"mypy",
"linting"
],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "55a1cbf0c2e9c0646da6e6c02e97cc1d2227ab5839f4ad3893f0224d2e059304",
"md5": "e82091ad1d8057faa21f098e0baa2546",
"sha256": "60c12e714bbb37fcc8ba9abc97b175bf0754586dc9b3ae566ab589f4070dfae0"
},
"downloads": -1,
"filename": "strictly_typed_pandas-0.3.5-py3-none-any.whl",
"has_sig": false,
"md5_digest": "e82091ad1d8057faa21f098e0baa2546",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.8.0",
"size": 25389,
"upload_time": "2024-11-15T18:14:50",
"upload_time_iso_8601": "2024-11-15T18:14:50.793751Z",
"url": "https://files.pythonhosted.org/packages/55/a1/cbf0c2e9c0646da6e6c02e97cc1d2227ab5839f4ad3893f0224d2e059304/strictly_typed_pandas-0.3.5-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "3b33bdbb81a51963119a4a96db29f36c4396636181287d96ab6130332ff32b09",
"md5": "95eb6b5a502911cdd7c28b7e2eef3f14",
"sha256": "6fc4a9dabf4c2fa9626de7cfe17967f6457458763b3bbbc8c832f88dc4161b08"
},
"downloads": -1,
"filename": "strictly_typed_pandas-0.3.5.tar.gz",
"has_sig": false,
"md5_digest": "95eb6b5a502911cdd7c28b7e2eef3f14",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.8.0",
"size": 24444,
"upload_time": "2024-11-15T18:14:51",
"upload_time_iso_8601": "2024-11-15T18:14:51.771207Z",
"url": "https://files.pythonhosted.org/packages/3b/33/bdbb81a51963119a4a96db29f36c4396636181287d96ab6130332ff32b09/strictly_typed_pandas-0.3.5.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2024-11-15 18:14:51",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "nanne-aben",
"github_project": "strictly_typed_pandas",
"travis_ci": false,
"coveralls": false,
"github_actions": true,
"requirements": [],
"tox": true,
"lcname": "strictly-typed-pandas"
}