# spark-connect-proxy
A reverse proxy server which allows secure connectivity to a Spark Connect server
[<img src="https://img.shields.io/badge/GitHub-prmoore77%2Fspark--connect--proxy-blue.svg?logo=Github">](https://github.com/prmoore77/spark-connect-proxy)
[![spark-connect-proxy-ci](https://github.com/prmoore77/spark-connect-proxy/actions/workflows/ci.yml/badge.svg)](https://github.com/prmoore77/spark-connect-proxy/actions/workflows/ci.yml)
[![Supported Python Versions](https://img.shields.io/pypi/pyversions/spark-connect-proxy)](https://pypi.org/project/spark-connect-proxy/)
[![PyPI version](https://badge.fury.io/py/spark-connect-proxy.svg)](https://badge.fury.io/py/spark-connect-proxy)
[![PyPI Downloads](https://img.shields.io/pypi/dm/spark-connect-proxy.svg)](https://pypi.org/project/spark-connect-proxy/)
# Setup (to run locally)
## Install Python package
You can install `spark-connect-proxy` from PyPI or from source.
### Option 1 - from PyPI
```shell
# Create the virtual environment
python3 -m venv .venv
# Activate the virtual environment
. .venv/bin/activate
pip install spark-connect-proxy
```
### Option 2 - from source - for development
```shell
git clone https://github.com/prmoore77/spark-connect-proxy
cd spark-connect-proxy
# Create the virtual environment
python3 -m venv .venv
# Activate the virtual environment
. .venv/bin/activate
# Upgrade pip, setuptools, and wheel
pip install --upgrade pip setuptools wheel
# Install Spark Connect Proxy - in editable mode with client and dev dependencies
pip install --editable .[client,dev]
```
### Note
For the following commands - if you are running from source and using `--editable` mode (for development purposes) - you will need to set the `PYTHONPATH` environment variable as follows:
```shell
export PYTHONPATH=$(pwd)/src
```
### Usage
This repo contains scripts that provision an AWS EMR Spark cluster fronted by a secure Spark Connect Proxy server, so you can connect to the cluster securely from a remote machine.
The scripts use the AWS CLI to provision the EMR Spark cluster - so you will need to have the [AWS CLI installed](https://docs.aws.amazon.com/cli/latest/userguide/getting-started-install.html) and configured with your AWS credentials.
Create a file named `.env` in your local copy of the `scripts` directory with the following contents:
```shell
export AWS_ACCESS_KEY_ID="put value from AWS here"
export AWS_SECRET_ACCESS_KEY="put value from AWS here"
export AWS_SESSION_TOKEN="put value from AWS here"
export AWS_REGION="us-east-2"
```
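The provisioning scripts read these credentials from the environment, so source the `.env` file from your shell before running them. A minimal sketch (it creates a throwaway `.env` just to demonstrate sourcing; in practice you would source the real `scripts/.env` you created above):

```shell
# Demonstration only: write a throwaway .env, then source it so the
# AWS_* variables become available to the provisioning scripts.
env_file="$(mktemp)"
cat > "${env_file}" <<'EOF'
export AWS_REGION="us-east-2"
EOF
source "${env_file}"
echo "${AWS_REGION}"
rm -f "${env_file}"
```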
To provision the EMR Spark cluster - run the following command from the root directory of this repo:
```shell
scripts/provision_emr_spark_cluster.sh
```
That will output several files:
- `tls/ca.crt` - the TLS certificate generated by the EMR Spark cluster - needed for your PySpark client to trust the Spark Connect Proxy server (because the certificate is self-signed)
- `scripts/output/instance_details.txt` - the ssh command for connecting to the master node of the EMR Spark cluster
- `scripts/output/spark_connect_proxy_details.log` - instructions for running a PySpark Ibis client example, which connects securely from your local computer to the remote EMR Spark cluster. Example command:
```shell
spark-connect-proxy-ibis-client-example \
--host ec2-01-01-01-01.us-east-2.compute.amazonaws.com \
--port 50051 \
--use-tls \
--tls-roots tls/ca.crt \
--token honey.badger.dontcare
```
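For a plain PySpark client (without the bundled Ibis example), the same host, port, TLS flag, and token map onto a standard Spark Connect connection string of the form `sc://host:port/;param=value;...`. A minimal sketch - the host and token below are the placeholder values from the example above, and whether the proxy accepts every connection-string parameter is an assumption:

```python
def connect_url(host: str, port: int, token: str, use_ssl: bool = True) -> str:
    """Build a Spark Connect connection string: sc://host:port/;param=value;..."""
    params = []
    if use_ssl:
        params.append("use_ssl=true")
    params.append(f"token={token}")
    return f"sc://{host}:{port}/;" + ";".join(params)


# Placeholder values taken from the example command above.
url = connect_url(
    "ec2-01-01-01-01.us-east-2.compute.amazonaws.com",
    50051,
    "honey.badger.dontcare",
)
print(url)

# With pyspark[connect] installed and the proxy running, you would then use:
# from pyspark.sql import SparkSession
# spark = SparkSession.builder.remote(url).getOrCreate()
```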
### Handy development commands
#### Version management
##### Bump the version of the application - (you must have installed from source with the [dev] extras)
```bash
bumpver update --patch
```