| Field | Value |
|---|---|
| Name | beam-pyspark-runner |
| Version | 0.0.3 |
| Summary | An Apache Beam pipeline runner built on Apache Spark's Python API |
| Homepage | https://github.com/moradology/beam-pyspark-runner |
| Repository | https://github.com/moradology/beam-pyspark-runner.git |
| Author | Nathan Zimmerman (npzimmerman@gmail.com) |
| Maintainer | None |
| Docs URL | None |
| Upload time | 2024-04-23 22:51:20 |
| Requires Python | >=3.7 |
| License | MIT |
| Keywords | virtualenv, dependencies |
| Requirements | No requirements were recorded |
| Travis-CI | No Travis |
| Coveralls test coverage | No coveralls |
# PySpark Apache Beam Runner
## Overview
(WHY? Doesn't Beam ship with a Spark runner?)
This project introduces a custom Apache Beam runner that executes pipelines on PySpark directly.
It is not a 'portability'-framework-compliant runner! It is designed for environments
where a SparkSession is available but a Spark master server is not, e.g. serverless
environments where jobs are triggered without a long-running cluster, sidestepping the
expectations of Beam's default Spark runner.

The other benefit of this strategy is that it keeps the stack as Python-centric as
possible. Pipeline compilation, optimization, and execution planning all happen in
Python (for better or worse). Depending on your needs, this can be a significant
advantage.
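To illustrate the general idea (a conceptual sketch, not this package's actual internals), a runner of this kind walks the Beam pipeline graph and maps each transform onto an equivalent Spark operation: `ParDo` behaves like `flatMap`, and `GroupByKey` like Spark's `groupByKey`. Here plain Python lists stand in for RDDs so the example is self-contained:

```python
# Conceptual sketch only: lists stand in for RDDs to show how Beam-style
# transforms map onto Spark-style operations.

def par_do(elements, fn):
    # Beam ParDo ~ Spark flatMap: fn may emit zero or more outputs per element
    out = []
    for e in elements:
        out.extend(fn(e))
    return out

def group_by_key(pairs):
    # Beam GroupByKey ~ Spark groupByKey: collect values under each key
    groups = {}
    for k, v in pairs:
        groups.setdefault(k, []).append(v)
    return sorted(groups.items())

# A tiny word-count "pipeline" expressed through those primitives
words = ["beam", "spark", "beam"]
pairs = par_do(words, lambda w: [(w, 1)])
counts = [(k, sum(vs)) for k, vs in group_by_key(pairs)]
print(counts)  # [('beam', 2), ('spark', 1)]
```

A real runner performs this translation over the whole pipeline graph before handing the resulting plan to Spark for execution.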
## Features
- **Direct Integration with PySpark**: Runs pipelines on an assumed, already-available SparkSession rather than submitting to a Spark master.
- **Serverless Compatibility**: Ideal for environments without a dedicated Spark master, supporting execution in serverless frameworks.
- **Simplified Setup**: Potentially reduces the complexity of job submission by avoiding the need for port listening on a Spark master.
## Getting Started
### Prerequisites
- Apache Spark
- Apache Beam
- Python 3.8 or later
### Installation
To use this custom runner, `pip install` it as you would any other library:
```bash
pip install beam-pyspark-runner
```
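Once installed, usage presumably follows the standard Beam pattern of passing a runner instance to the pipeline. The import path and class name `PySparkRunner` below are assumptions for illustration only; check the project repository for the actual names.

```python
# Hypothetical usage sketch: the import path and runner class name are
# assumptions, not confirmed by this page.
import apache_beam as beam
from beam_pyspark_runner import PySparkRunner  # assumed name/location

with beam.Pipeline(runner=PySparkRunner()) as p:
    (p
     | "Create" >> beam.Create(["beam", "spark", "beam"])
     | "Pair" >> beam.Map(lambda w: (w, 1))
     | "Count" >> beam.CombinePerKey(sum)
     | "Print" >> beam.Map(print))
```

Because the runner drives an existing SparkSession, no `spark-submit` step or master URL should be required beyond whatever created that session.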