spark-connect-proxy


Name: spark-connect-proxy
Version: 0.0.7
Summary: A reverse proxy server which allows secure connectivity to a Spark Connect server
Upload time: 2024-09-16 15:37:17
Requires Python: >=3.9
Keywords: spark, pyspark, grpc, reverse-proxy, connect, spark-connect
# spark-connect-proxy
A reverse proxy server which allows secure connectivity to a Spark Connect server.

[<img src="https://img.shields.io/badge/GitHub-prmoore77%2Fspark--connect--proxy-blue.svg?logo=Github">](https://github.com/prmoore77/spark-connect-proxy)
[![spark-connect-proxy-ci](https://github.com/prmoore77/spark-connect-proxy/actions/workflows/ci.yml/badge.svg)](https://github.com/prmoore77/spark-connect-proxy/actions/workflows/ci.yml)
[![Supported Python Versions](https://img.shields.io/pypi/pyversions/spark-connect-proxy)](https://pypi.org/project/spark-connect-proxy/)
[![PyPI version](https://badge.fury.io/py/spark-connect-proxy.svg)](https://badge.fury.io/py/spark-connect-proxy)
[![PyPI Downloads](https://img.shields.io/pypi/dm/spark-connect-proxy.svg)](https://pypi.org/project/spark-connect-proxy/)

# Setup (to run locally)

## Install Python package
You can install `spark-connect-proxy` from PyPI or from source.

### Option 1 - from PyPI
```shell
# Create the virtual environment
python3 -m venv .venv

# Activate the virtual environment
. .venv/bin/activate

pip install spark-connect-proxy
```
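
To confirm that the install landed in the active virtual environment, a small check like the one below can help. It is a minimal sketch that uses only the standard library; the helper name `installed_version` is hypothetical and not part of `spark-connect-proxy`.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str = "spark-connect-proxy") -> Optional[str]:
    """Return the installed version of `package`, or None if it is not installed."""
    try:
        return version(package)
    except PackageNotFoundError:
        # The distribution is not visible to the interpreter running this code.
        return None


if __name__ == "__main__":
    found = installed_version()
    print(f"spark-connect-proxy version: {found or 'not installed'}")
```

If this prints `not installed`, the most likely cause is that the virtual environment was not activated before running `pip install`.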

### Option 2 - from source - for development
```shell
git clone https://github.com/prmoore77/spark-connect-proxy

cd spark-connect-proxy

# Create the virtual environment
python3 -m venv .venv

# Activate the virtual environment
. .venv/bin/activate

# Upgrade pip, setuptools, and wheel
pip install --upgrade pip setuptools wheel

# Install Spark Connect Proxy - in editable mode with client and dev dependencies
# (the quotes keep shells such as zsh from treating the brackets as a glob pattern)
pip install --editable ".[client,dev]"
```

### Note
For the following commands: if you are running from source in `--editable` mode (for development purposes), you will need to set the `PYTHONPATH` environment variable as follows:
```shell
export PYTHONPATH=$(pwd)/src
```

### Usage
This repo contains scripts that provision an AWS EMR Spark cluster fronted by a Spark Connect Proxy server, so that you can connect to the cluster securely from a remote machine.

The scripts use the AWS CLI to provision the EMR Spark cluster, so you will need to have the [AWS CLI installed](https://docs.aws.amazon.com/cli/latest/userguide/getting-started-install.html) and configured with your AWS credentials.

You can create a file called `.env` in your local copy of the `scripts` directory with the following contents:
```shell
export AWS_ACCESS_KEY_ID="put value from AWS here"
export AWS_SECRET_ACCESS_KEY="put value from AWS here"
export AWS_SESSION_TOKEN="put value from AWS here"
export AWS_REGION="us-east-2"
```
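
Before provisioning, it can be worth checking that the `.env` file no longer contains the placeholder text. The sketch below parses `export KEY="value"` lines with the standard library and reports keys that are absent or still set to the placeholder. The function names are hypothetical, and it assumes the file holds only simple `export` lines like those shown above. Note that `AWS_SESSION_TOKEN` is only needed when you use temporary credentials, so a report that lists it as missing may be harmless.

```python
import shlex
from pathlib import Path

# Keys and placeholder text taken from the example .env contents above.
EXPECTED_KEYS = (
    "AWS_ACCESS_KEY_ID",
    "AWS_SECRET_ACCESS_KEY",
    "AWS_SESSION_TOKEN",
    "AWS_REGION",
)
PLACEHOLDER = "put value from AWS here"


def parse_env_exports(text: str) -> dict:
    """Parse simple `export KEY="value"` lines into a dict (no variable expansion)."""
    values = {}
    for raw_line in text.splitlines():
        # shlex strips the quotes and, with comments=True, drops `# ...` comments.
        tokens = shlex.split(raw_line, comments=True)
        if tokens and tokens[0] == "export":
            tokens = tokens[1:]
        for token in tokens:
            key, sep, value = token.partition("=")
            if sep and key.isidentifier():
                values[key] = value
    return values


def missing_keys(values: dict) -> list:
    """Return expected keys that are absent, empty, or still the placeholder."""
    return [
        key
        for key in EXPECTED_KEYS
        if not values.get(key) or values[key] == PLACEHOLDER
    ]


if __name__ == "__main__":
    env_file = Path("scripts/.env")
    if env_file.exists():
        print("Keys still to fill in:", missing_keys(parse_env_exports(env_file.read_text())))
```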

To provision the EMR Spark cluster - run the following command from the root directory of this repo:
```shell
scripts/provision_emr_spark_cluster.sh
```

That will output several files:
- file: `tls/ca.crt` - the TLS certificate generated by the EMR Spark cluster. Because the certificate is self-signed, your PySpark client needs this file in order to trust the Spark Connect Proxy server.
- file: `scripts/output/instance_details.txt` - shows the ssh command for connecting to the master node of the EMR Spark cluster.
- file: `scripts/output/spark_connect_proxy_details.log` - shows how to run a PySpark Ibis client example, which connects securely from your local computer to the remote EMR Spark cluster. Example command:
```shell
spark-connect-proxy-ibis-client-example \
  --host ec2-01-01-01-01.us-east-2.compute.amazonaws.com \
  --port 50051 \
  --use-tls \
  --tls-roots tls/ca.crt \
  --token honey.badger.dontcare
```
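
If you would rather connect from your own PySpark code than from the bundled example client, the same host, port, and token can be folded into a Spark Connect connection string of the form `sc://host:port/;key=value`. The sketch below only builds that string; the helper `spark_connect_url` is hypothetical. It assumes the proxy accepts the token in the way PySpark sends the `token` parameter (as a bearer token), which this README does not confirm.

```python
from typing import Optional
from urllib.parse import quote


def spark_connect_url(host: str, port: int, token: Optional[str] = None, use_tls: bool = True) -> str:
    """Build a Spark Connect connection string such as sc://host:port/;use_ssl=true;token=..."""
    params = []
    if use_tls:
        params.append("use_ssl=true")
    if token:
        # Percent-encode the token so characters like ';' or '=' cannot break the parameter list.
        params.append(f"token={quote(token, safe='')}")
    url = f"sc://{host}:{port}/"
    if params:
        url += ";" + ";".join(params)
    return url


if __name__ == "__main__":
    url = spark_connect_url(
        host="ec2-01-01-01-01.us-east-2.compute.amazonaws.com",
        port=50051,
        token="honey.badger.dontcare",
    )
    print(url)
    # With PySpark's Spark Connect client installed, the URL could then be used like this:
    #   from pyspark.sql import SparkSession
    #   spark = SparkSession.builder.remote(url).getOrCreate()
```

The connection string has no parameter for a custom CA file, so trusting the self-signed `tls/ca.crt` needs a separate step. One option that may work is pointing gRPC's `GRPC_DEFAULT_SSL_ROOTS_FILE_PATH` environment variable at `tls/ca.crt` before creating the session; the example client's `--tls-roots` flag may handle this differently.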

### Handy development commands

#### Version management

##### Bump the version of the application - (you must have installed from source with the [dev] extras)
```bash
bumpver update --patch
```

            

Raw data

            {
    "_id": null,
    "home_page": null,
    "name": "spark-connect-proxy",
    "maintainer": null,
    "docs_url": null,
    "requires_python": ">=3.9",
    "maintainer_email": null,
    "keywords": "spark, pyspark, grpc, reverse-proxy, connect, spark-connect",
    "author": null,
    "author_email": "Philip Moore <prmoore77@hotmail.com>",
    "download_url": "https://files.pythonhosted.org/packages/30/64/8051287c11d7cd5f4a2e17eed108cc43b42443edef0c88b6b1785d51f17f/spark_connect_proxy-0.0.7.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": null,
    "summary": "A reverse proxy server which allows secure connectivity to a Spark Connect server",
    "version": "0.0.7",
    "project_urls": {
        "Homepage": "https://github.com/prmoore77/spark-connect-proxy"
    },
    "split_keywords": [
        "spark",
        " pyspark",
        " grpc",
        " reverse-proxy",
        " connect",
        " spark-connect"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "92ac596f59bc7d3187f34a755b7f0a901eeb0277fa26c2380d378d0bebffa01b",
                "md5": "21a60f8e9fd0c8508e38b4fc265100d1",
                "sha256": "4015cadd8641ea479708065ca4d0cfd498cde384de1191962605a04a894ab47c"
            },
            "downloads": -1,
            "filename": "spark_connect_proxy-0.0.7-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "21a60f8e9fd0c8508e38b4fc265100d1",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.9",
            "size": 12140,
            "upload_time": "2024-09-16T15:37:16",
            "upload_time_iso_8601": "2024-09-16T15:37:16.338808Z",
            "url": "https://files.pythonhosted.org/packages/92/ac/596f59bc7d3187f34a755b7f0a901eeb0277fa26c2380d378d0bebffa01b/spark_connect_proxy-0.0.7-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "30648051287c11d7cd5f4a2e17eed108cc43b42443edef0c88b6b1785d51f17f",
                "md5": "663b6e224a96afb6198833cfc0aa3de8",
                "sha256": "1917d20a9868d6d832f674a2e602542437e45b97d5898137ae0a7ae09c8c66a5"
            },
            "downloads": -1,
            "filename": "spark_connect_proxy-0.0.7.tar.gz",
            "has_sig": false,
            "md5_digest": "663b6e224a96afb6198833cfc0aa3de8",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.9",
            "size": 11337,
            "upload_time": "2024-09-16T15:37:17",
            "upload_time_iso_8601": "2024-09-16T15:37:17.712649Z",
            "url": "https://files.pythonhosted.org/packages/30/64/8051287c11d7cd5f4a2e17eed108cc43b42443edef0c88b6b1785d51f17f/spark_connect_proxy-0.0.7.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2024-09-16 15:37:17",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "prmoore77",
    "github_project": "spark-connect-proxy",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": true,
    "lcname": "spark-connect-proxy"
}
        