================================================================
Strictly Typed Pandas: static type checking of pandas DataFrames
================================================================
I love Pandas! But in production code I’m always a bit wary when I see:

.. code-block:: python

    import pandas as pd

    def foo(df: pd.DataFrame) -> pd.DataFrame:
        # do stuff
        return df

Because… How do I know which columns are supposed to be in `df`?

Using `strictly_typed_pandas`, we can be more explicit about what these data should look like.

.. code-block:: python

    from strictly_typed_pandas import DataSet

    class Schema:
        id: int
        name: str

    def foo(df: DataSet[Schema]) -> DataSet[Schema]:
        # do stuff
        return df

Where `DataSet`:

* is a subclass of `pd.DataFrame`, so it offers the same functionality as a regular `DataFrame`.
* validates that the data adheres to the provided schema when it is initialized.
* is immutable, so its schema cannot be changed through in-place modifications.

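
As a rough illustration of the validation and immutability points, the sketch below builds a `DataSet[Schema]` from a plain dict and then tries two things that should fail. It assumes that a schema mismatch raises a `TypeError` and that an in-place modification raises a `NotImplementedError`; the sample values and the round trip through `to_dataframe()` are illustrative choices, so check them against the version you have installed.

.. code-block:: python

    import pandas as pd
    from strictly_typed_pandas import DataSet


    class Schema:
        id: int
        name: str


    # Construction validates the columns and dtypes against Schema.
    ds = DataSet[Schema]({"id": [1, 2, 3], "name": ["John", "Jane", "Jack"]})
    is_dataframe = isinstance(ds, pd.DataFrame)  # DataSet subclasses pd.DataFrame

    # A missing column is expected to be rejected at construction time
    # (assumption: the error type is TypeError).
    try:
        DataSet[Schema]({"id": [1, 2, 3]})
        missing_column_rejected = False
    except TypeError:
        missing_column_rejected = True

    # In-place modification is expected to be blocked
    # (assumption: the error type is NotImplementedError).
    try:
        ds["id"] = [4, 5, 6]
        inplace_rejected = False
    except NotImplementedError:
        inplace_rejected = True

    # To change the data, go through a plain DataFrame and re-validate.
    plain = ds.to_dataframe()
    updated = DataSet[Schema](plain.assign(id=plain["id"] + 10))

Re-wrapping the result in `DataSet[Schema]` runs the schema check again, so the new object carries the same guarantee as the original.
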
The `DataSet[Schema]` annotations are compatible with:

* `mypy` for type checking at lint time (i.e. while you write your code).
* `typeguard` (<v3.0) for type checking at run time (i.e. while you run your unit tests).

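
To make the `mypy` point a bit more concrete, here is a small sketch with two hypothetical schemas, `Person` and `Job`, and a hypothetical function `count_people`. The second call passes a `DataSet[Job]` where a `DataSet[Person]` is annotated, which `mypy` should flag as an incompatible argument type; the exact message is not reproduced here.

.. code-block:: python

    from strictly_typed_pandas import DataSet


    class Person:
        id: int
        name: str


    class Job:
        id: int
        title: str


    def count_people(df: DataSet[Person]) -> int:
        # Only the annotation matters for this example.
        return len(df)


    people = DataSet[Person]({"id": [1, 2], "name": ["John", "Jane"]})
    jobs = DataSet[Job]({"id": [1, 2], "title": ["Engineer", "Analyst"]})

    n_people = count_people(people)  # matches the annotation

    # DataSet[Job] is not DataSet[Person], so a static checker should
    # complain here. Plain Python does not enforce annotations, so
    # without typeguard enabled the call still runs.
    n_jobs = count_people(jobs)

Because the mismatched call runs without complaint in plain Python, the static check (or `typeguard` in your tests) is what actually catches it.
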
To get the most out of `strictly_typed_pandas`, be sure to:

* set up `mypy` in your IDE.
* run your unit tests with `pytest --stp-typeguard-packages=foo.bar` (where `foo.bar` is your package name).

Installation
============

.. code-block:: bash

    pip install strictly-typed-pandas

Documentation
=============

For example notebooks and API documentation, please see our `ReadTheDocs <https://strictly-typed-pandas.readthedocs.io/>`_.

FAQ
===
| **Do you know of something similar for pyspark?**
| Yes! Check out our package `typedspark <https://github.com/kaiko-ai/typedspark/>`_.
|
| **Why use Python if you want static typing?**
| There are just so many good packages for data science in Python. Rather than sacrificing all of that by moving to a different language, I'd like to make the Pythonverse a little bit better.
|
| **I found a bug! What should I do?**
| Great! Contact me and I'll look into it.
|
| **I have a great idea to improve strictly_typed_pandas! How can we make this work?**
| Awesome, drop me a line!
Raw data
{
"_id": null,
"home_page": "https://github.com/nanne-aben/strictly_typed_pandas",
"name": "strictly-typed-pandas",
"maintainer": "",
"docs_url": null,
"requires_python": ">=3.8.0",
"maintainer_email": "",
"keywords": "typing type checking pandas mypy linting",
"author": "Nanne Aben",
"author_email": "nanne.aben@gmail.com",
"download_url": "https://files.pythonhosted.org/packages/f7/0e/a09ba21fa23020cc3a7a8ec9d225d6d0c984264c71b2d11064749a5810c5/strictly_typed_pandas-0.2.2.tar.gz",
"platform": null,
"description": "================================================================\nStrictly Typed Pandas: static type checking of pandas DataFrames\n================================================================\n\nI love Pandas! But in production code I\u2019m always a bit wary when I see:\n\n.. code-block:: python\n\n import pandas as pd\n\n def foo(df: pd.DataFrame) -> pd.DataFrame:\n # do stuff\n return df\n\nBecause\u2026 How do I know which columns are supposed to be in `df`?\n\nUsing `strictly_typed_pandas`, we can be more explicit about what these data should look like.\n\n.. code-block:: python\n\n from strictly_typed_pandas import DataSet\n\n class Schema:\n id: int\n name: str\n\n def foo(df: DataSet[Schema]) -> DataSet[Schema]:\n # do stuff\n return df\n\nWhere `DataSet`:\n * is a subclass of `pd.DataFrame` and hence has the same functionality as `DataFrame`.\n * validates whether the data adheres to the provided schema upon its initialization.\n * is immutable, so its schema cannot be changed using inplace modifications.\n\nThe `DataSet[Schema]` annotations are compatible with:\n * `mypy` for type checking during linting-time (i.e. while you write your code).\n * `typeguard` (<v3.0) for type checking during run-time (i.e. while you run your unit tests).\n\nTo get the most out of `strictly_typed_pandas`, be sure to:\n * set up `mypy` in your IDE.\n * run your unit tests with `pytest --stp-typeguard-packages=foo.bar` (where `foo.bar` is your package name).\n\nInstallation\n============\n\n.. code-block:: bash\n\n pip install strictly-typed-pandas\n\n\nDocumentation\n=================\nFor example notebooks and API documentation, please see our `ReadTheDocs <https://strictly-typed-pandas.readthedocs.io/>`_.\n\nFAQ\n===\n\n| **Do you know of something similar for pyspark?**\n| Yes! 
Check out our package `typedspark <https://github.com/kaiko-ai/typedspark/>`_.\n|\n| **Why use Python if you want static typing?**\n| There are just so many good packages for data science in Python. Rather than sacrificing all of that by moving to a different language, I'd like to make the Pythonverse a little bit better.\n|\n| **I found a bug! What should I do?**\n| Great! Contact me and I'll look into it.\n|\n| **I have a great idea to improve strictly_typed_pandas! How can we make this work?**\n| Awesome, drop me a line!\n",
"bugtrack_url": null,
"license": "MIT",
"summary": "Static type checking of pandas DataFrames",
"version": "0.2.2",
"project_urls": {
"Homepage": "https://github.com/nanne-aben/strictly_typed_pandas"
},
"split_keywords": [
"typing",
"type",
"checking",
"pandas",
"mypy",
"linting"
],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "4253b7e27f212da27cfa6ef33af36224ede8f0d9dbbdd013700755cb3b39978e",
"md5": "7a4faf11cd712e975fffd873b444af6b",
"sha256": "b383afa5acc0958df01891a50d153ce59d7848dc719d05556152c70ad32de30f"
},
"downloads": -1,
"filename": "strictly_typed_pandas-0.2.2-py3-none-any.whl",
"has_sig": false,
"md5_digest": "7a4faf11cd712e975fffd873b444af6b",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.8.0",
"size": 25352,
"upload_time": "2024-02-26T08:27:40",
"upload_time_iso_8601": "2024-02-26T08:27:40.318417Z",
"url": "https://files.pythonhosted.org/packages/42/53/b7e27f212da27cfa6ef33af36224ede8f0d9dbbdd013700755cb3b39978e/strictly_typed_pandas-0.2.2-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "f70ea09ba21fa23020cc3a7a8ec9d225d6d0c984264c71b2d11064749a5810c5",
"md5": "48f7cc3fb2d5657f7c8dd58a340d7d38",
"sha256": "52465481eb995a789adaf869d501be5ba569a601bfdc21f6e5b0317312838ad2"
},
"downloads": -1,
"filename": "strictly_typed_pandas-0.2.2.tar.gz",
"has_sig": false,
"md5_digest": "48f7cc3fb2d5657f7c8dd58a340d7d38",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.8.0",
"size": 24154,
"upload_time": "2024-02-26T08:27:41",
"upload_time_iso_8601": "2024-02-26T08:27:41.657506Z",
"url": "https://files.pythonhosted.org/packages/f7/0e/a09ba21fa23020cc3a7a8ec9d225d6d0c984264c71b2d11064749a5810c5/strictly_typed_pandas-0.2.2.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2024-02-26 08:27:41",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "nanne-aben",
"github_project": "strictly_typed_pandas",
"travis_ci": false,
"coveralls": false,
"github_actions": true,
"requirements": [],
"tox": true,
"lcname": "strictly-typed-pandas"
}