typedspark

Name: typedspark
Version: 1.4.2
Home page: https://github.com/kaiko-ai/typedspark
Summary: Column-wise type annotations for pyspark DataFrames
Upload time: 2024-04-30 13:20:15
Author: Nanne Aben
Requires Python: >=3.9.0
License: Apache-2.0
Keywords: pyspark, spark, typing, type checking, annotations
Requirements: typing-extensions
# Typedspark: column-wise type annotations for pyspark DataFrames

We love Spark! But in production code we're wary when we see:

```python
from pyspark.sql import DataFrame

def foo(df: DataFrame) -> DataFrame:
    # do stuff
    return df
```

Because… how do we know which columns are supposed to be in ``df``?

Using ``typedspark``, we can be explicit about what the data should look like.

```python
from typedspark import Column, DataSet, Schema
from pyspark.sql.types import LongType, StringType

class Person(Schema):
    id: Column[LongType]
    name: Column[StringType]
    age: Column[LongType]

def foo(df: DataSet[Person]) -> DataSet[Person]:
    # do stuff
    return df
```
The advantages include:

* Improved readability of the code
* Typechecking, both during runtime and linting
* Auto-complete of column names
* Easy refactoring of column names
* Easier unit testing through the generation of empty ``DataSets`` based on their schemas
* Improved documentation of tables

## Documentation
Please see our documentation on [readthedocs](https://typedspark.readthedocs.io/en/latest/index.html).

## Installation

You can install ``typedspark`` from [pypi](https://pypi.org/project/typedspark/) by running:

```bash
pip install typedspark
```
By default, ``typedspark`` does not list ``pyspark`` as a dependency, since many platforms (e.g. Databricks) come with ``pyspark`` preinstalled. If you want to install ``typedspark`` with ``pyspark``, you can run:

```bash
pip install "typedspark[pyspark]"
```

## Demo videos

### IDE demo

https://github.com/kaiko-ai/typedspark/assets/47976799/e6f7fa9c-6d14-4f68-baba-fe3c22f75b67

You can find the corresponding code [here](docs/videos/ide.ipynb).

### Jupyter / Databricks notebooks demo

https://github.com/kaiko-ai/typedspark/assets/47976799/39e157c3-6db0-436a-9e72-44b2062df808

You can find the corresponding code [here](docs/videos/notebook.ipynb).

## FAQ

**I found a bug! What should I do?**<br>
Great! Please make an issue and we'll look into it.

**I have a great idea to improve typedspark! How can we make this work?**<br>
Awesome, please make an issue and let us know!

            
