flake8-pyspark-with-column


Name: flake8-pyspark-with-column
Version: 0.0.2
Summary: A Flake8 plugin to check for PySpark withColumn usage in loops
Upload time: 2024-09-22 12:26:20
Keywords: flake8, linter, pyspark, quality
Requirements: none recorded
# Flake8-pyspark-with-column

A flake8 plugin that detects usage of `withColumn` in a loop or inside `reduce`. From the PySpark documentation of the `withColumn` method:

```
  This method introduces a projection internally.
  Therefore, calling it multiple times, for instance,
  via loops in order to add multiple columns
  can generate big plans which can cause performance issues
  and even StackOverflowException.
  To avoid this, use select() with multiple columns at once.
```

## Rules
This plugin contains the following rules:

- `PSPRK001`: Usage of withColumn in a loop detected
- `PSPRK002`: Usage of withColumn inside reduce is detected
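The README does not describe how the plugin finds these patterns. The following is a minimal, hypothetical sketch of how checks like `PSPRK001` and `PSPRK002` could be built on Python's standard `ast` module. The function names are illustrative and are not the plugin's API, and the sketch is not wired into flake8's plugin interface.

```python
import ast
from typing import List, Tuple

# Statement types treated as loops in this sketch (comprehensions are ignored).
LOOP_NODES = (ast.For, ast.AsyncFor, ast.While)


def _is_with_column_call(node: ast.AST) -> bool:
    """True for any call of the form <expr>.withColumn(...)."""
    return (
        isinstance(node, ast.Call)
        and isinstance(node.func, ast.Attribute)
        and node.func.attr == "withColumn"
    )


def _is_reduce_call(node: ast.AST) -> bool:
    """True for reduce(...) or <module>.reduce(...), e.g. functools.reduce(...)."""
    if not isinstance(node, ast.Call):
        return False
    func = node.func
    if isinstance(func, ast.Name):
        return func.id == "reduce"
    return isinstance(func, ast.Attribute) and func.attr == "reduce"


def find_with_column_issues(source: str) -> List[Tuple[int, str]]:
    """Return sorted (line, code) pairs for withColumn calls inside loops or reduce()."""
    issues: List[Tuple[int, str]] = []

    def visit(node: ast.AST, in_loop: bool, in_reduce: bool) -> None:
        if _is_with_column_call(node):
            # A call that is both in a loop and in reduce() is reported once, as a loop.
            if in_loop:
                issues.append((node.lineno, "PSPRK001"))
            elif in_reduce:
                issues.append((node.lineno, "PSPRK002"))
        # Everything below a loop or a reduce() call inherits that context.
        child_in_loop = in_loop or isinstance(node, LOOP_NODES)
        child_in_reduce = in_reduce or _is_reduce_call(node)
        for child in ast.iter_child_nodes(node):
            visit(child, child_in_loop, child_in_reduce)

    visit(ast.parse(source), False, False)
    return sorted(issues)


SAMPLE = '''
from functools import reduce

for name in names:
    df = df.withColumn(name, F.lit(1))

df = reduce(lambda acc, n: acc.withColumn(n, F.lit(1)), names, df)

df = df.select("*", *[F.lit(1).alias(n) for n in names])
'''

print(find_with_column_issues(SAMPLE))  # [(5, 'PSPRK001'), (7, 'PSPRK002')]
```

The single `select` on the last line of `SAMPLE` produces no finding, which matches the rewrite the PySpark documentation suggests.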

            

Raw data

            {
    "_id": null,
    "home_page": null,
    "name": "flake8-pyspark-with-column",
    "maintainer": null,
    "docs_url": null,
    "requires_python": null,
    "maintainer_email": null,
    "keywords": "flake8, linter, pyspark, quality",
    "author": null,
    "author_email": "Sem Sinchenko <ssinchenko@apache.org>",
    "download_url": "https://files.pythonhosted.org/packages/36/3a/7edfce8cc46f11c588af5e4e5599db311ecc0e307e943a35405dc19f76c7/flake8_pyspark_with_column-0.0.2.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": null,
    "summary": "A Flake8 plugin to check for PySpark withColumn usage in loops",
    "version": "0.0.2",
    "project_urls": {
        "Homepage": "https://github.com/SemyonSinchenko/flake8-pyspark-with-column",
        "Repository": "https://github.com/SemyonSinchenko/flake8-pyspark-with-column.git"
    },
    "split_keywords": [
        "flake8",
        " linter",
        " pyspark",
        " quality"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "6cf5acaf42f53af29b64ea30a60357b7cb370628a1fb9ad7591f0d4acd665f1d",
                "md5": "b7b42657f49c19cd1c0192957ac8474a",
                "sha256": "19cfd8c7b3aab91f0cc68398206255f089be01392af02f78291ead15330078ff"
            },
            "downloads": -1,
            "filename": "flake8_pyspark_with_column-0.0.2-py2.py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "b7b42657f49c19cd1c0192957ac8474a",
            "packagetype": "bdist_wheel",
            "python_version": "py2.py3",
            "requires_python": null,
            "size": 6824,
            "upload_time": "2024-09-22T12:26:19",
            "upload_time_iso_8601": "2024-09-22T12:26:19.512175Z",
            "url": "https://files.pythonhosted.org/packages/6c/f5/acaf42f53af29b64ea30a60357b7cb370628a1fb9ad7591f0d4acd665f1d/flake8_pyspark_with_column-0.0.2-py2.py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "363a7edfce8cc46f11c588af5e4e5599db311ecc0e307e943a35405dc19f76c7",
                "md5": "17f1c9c0f9fd55f628c7fa9ca434b5e0",
                "sha256": "897670411f9ca6858d9f36ba328182895e65a2ea54f23e29212e9cc75f8c0dad"
            },
            "downloads": -1,
            "filename": "flake8_pyspark_with_column-0.0.2.tar.gz",
            "has_sig": false,
            "md5_digest": "17f1c9c0f9fd55f628c7fa9ca434b5e0",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": null,
            "size": 7235,
            "upload_time": "2024-09-22T12:26:20",
            "upload_time_iso_8601": "2024-09-22T12:26:20.917988Z",
            "url": "https://files.pythonhosted.org/packages/36/3a/7edfce8cc46f11c588af5e4e5599db311ecc0e307e943a35405dc19f76c7/flake8_pyspark_with_column-0.0.2.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2024-09-22 12:26:20",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "SemyonSinchenko",
    "github_project": "flake8-pyspark-with-column",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": true,
    "tox": true,
    "lcname": "flake8-pyspark-with-column"
}