# Flake8-pyspark-with-column
A flake8 plugin that detects usage of `withColumn` in a loop or inside `reduce`. The PySpark documentation says the following about the `withColumn` method:
```
This method introduces a projection internally.
Therefore, calling it multiple times, for instance,
via loops in order to add multiple columns
can generate big plans which can cause performance issues
and even StackOverflowException.
To avoid this, use select() with multiple columns at once.
```
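To make the quoted advice concrete, here is a minimal sketch of the pattern this plugin targets and the `select`-based alternative. The function names and the "doubled column" transformation are hypothetical and chosen only for illustration; `df` is assumed to be a `pyspark.sql.DataFrame` and `cols` a list of numeric column names.

```python
def add_doubled_in_loop(df, cols):
    # Anti-pattern: every withColumn call adds one more projection to the plan.
    # `df` is assumed to be a pyspark.sql.DataFrame (assumption for this sketch).
    for c in cols:
        df = df.withColumn(f"{c}_doubled", df[c] * 2)
    return df


def add_doubled_with_select(df, cols):
    # Alternative: build all new columns first, then add them in one select.
    new_cols = [(df[c] * 2).alias(f"{c}_doubled") for c in cols]
    return df.select("*", *new_cols)
```

Both functions should produce the same columns; the second one does it with a single projection regardless of how many columns are added.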
## Rules
This plugin contains the following rules:
- `PSPRK001`: Usage of withColumn in a loop detected
- `PSPRK002`: Usage of withColumn inside reduce detected
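As a rough illustration of how a check like `PSPRK001` could work, the sketch below walks a module's syntax tree with the standard `ast` module and reports `.withColumn(...)` calls that appear inside loops. This is not the plugin's actual implementation; the function name `find_with_column_in_loops` is hypothetical, and the sketch does not cover the `reduce` case.

```python
import ast

# Loop statements to inspect (an assumption of this sketch).
LOOP_NODES = (ast.For, ast.AsyncFor, ast.While)


def find_with_column_in_loops(source):
    """Return sorted line numbers of `.withColumn(...)` calls inside loops."""
    hits = set()
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, LOOP_NODES):
            continue
        # Look at everything nested under the loop statement.
        for inner in ast.walk(node):
            if (
                isinstance(inner, ast.Call)
                and isinstance(inner.func, ast.Attribute)
                and inner.func.attr == "withColumn"
            ):
                hits.add(inner.lineno)
    return sorted(hits)


sample = """
for c in cols:
    df = df.withColumn(c, df[c] * 2)
df = df.withColumn("x", df["a"])
"""
print(find_with_column_in_loops(sample))  # [3]
```

Only the call on line 3 of `sample` is reported; the call after the loop is a single `withColumn` and is left alone.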
Raw data
{
"_id": null,
"home_page": null,
"name": "flake8-pyspark-with-column",
"maintainer": null,
"docs_url": null,
"requires_python": null,
"maintainer_email": null,
"keywords": "flake8, linter, pyspark, quality",
"author": null,
"author_email": "Sem Sinchenko <ssinchenko@apache.org>",
"download_url": "https://files.pythonhosted.org/packages/36/3a/7edfce8cc46f11c588af5e4e5599db311ecc0e307e943a35405dc19f76c7/flake8_pyspark_with_column-0.0.2.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": null,
"summary": "A Flake8 plugin to check for PySpark withColumn usage in loops",
"version": "0.0.2",
"project_urls": {
"Homepage": "https://github.com/SemyonSinchenko/flake8-pyspark-with-column",
"Repository": "https://github.com/SemyonSinchenko/flake8-pyspark-with-column.git"
},
"split_keywords": [
"flake8",
" linter",
" pyspark",
" quality"
],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "6cf5acaf42f53af29b64ea30a60357b7cb370628a1fb9ad7591f0d4acd665f1d",
"md5": "b7b42657f49c19cd1c0192957ac8474a",
"sha256": "19cfd8c7b3aab91f0cc68398206255f089be01392af02f78291ead15330078ff"
},
"downloads": -1,
"filename": "flake8_pyspark_with_column-0.0.2-py2.py3-none-any.whl",
"has_sig": false,
"md5_digest": "b7b42657f49c19cd1c0192957ac8474a",
"packagetype": "bdist_wheel",
"python_version": "py2.py3",
"requires_python": null,
"size": 6824,
"upload_time": "2024-09-22T12:26:19",
"upload_time_iso_8601": "2024-09-22T12:26:19.512175Z",
"url": "https://files.pythonhosted.org/packages/6c/f5/acaf42f53af29b64ea30a60357b7cb370628a1fb9ad7591f0d4acd665f1d/flake8_pyspark_with_column-0.0.2-py2.py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "363a7edfce8cc46f11c588af5e4e5599db311ecc0e307e943a35405dc19f76c7",
"md5": "17f1c9c0f9fd55f628c7fa9ca434b5e0",
"sha256": "897670411f9ca6858d9f36ba328182895e65a2ea54f23e29212e9cc75f8c0dad"
},
"downloads": -1,
"filename": "flake8_pyspark_with_column-0.0.2.tar.gz",
"has_sig": false,
"md5_digest": "17f1c9c0f9fd55f628c7fa9ca434b5e0",
"packagetype": "sdist",
"python_version": "source",
"requires_python": null,
"size": 7235,
"upload_time": "2024-09-22T12:26:20",
"upload_time_iso_8601": "2024-09-22T12:26:20.917988Z",
"url": "https://files.pythonhosted.org/packages/36/3a/7edfce8cc46f11c588af5e4e5599db311ecc0e307e943a35405dc19f76c7/flake8_pyspark_with_column-0.0.2.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2024-09-22 12:26:20",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "SemyonSinchenko",
"github_project": "flake8-pyspark-with-column",
"travis_ci": false,
"coveralls": false,
"github_actions": true,
"tox": true,
"lcname": "flake8-pyspark-with-column"
}