# Dataproc Spark Connect Client
A wrapper around the Apache [Spark Connect](https://spark.apache.org/spark-connect/) client that adds
functionality for communicating with a remote Dataproc Spark cluster over the Spark Connect
protocol without additional setup steps.
## Install
.. code-block:: console
pip install dataproc_spark_connect
## Uninstall
.. code-block:: console
pip uninstall dataproc_spark_connect
## Setup
This client requires permissions to manage [Dataproc sessions and session templates](https://cloud.google.com/dataproc-serverless/docs/concepts/iam).
If you are running the client outside of Google Cloud, you must set the following environment variables (see the sketch after this list):
* `GOOGLE_CLOUD_PROJECT` - The Google Cloud project you use to run Spark workloads.
* `GOOGLE_CLOUD_REGION` - The Compute Engine [region](https://cloud.google.com/compute/docs/regions-zones#available) where you run the Spark workload.
* `GOOGLE_APPLICATION_CREDENTIALS` - The path to your [Application Credentials](https://cloud.google.com/docs/authentication/provide-credentials-adc) file.
* `DATAPROC_SPARK_CONNECT_SESSION_DEFAULT_CONFIG` (Optional) - The location of a session config file, such as `tests/integration/resources/session.textproto`.
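If you are working in a notebook, you can also set these variables from Python before creating the session. A minimal sketch, using placeholder values that you would replace with your own project, region, and credentials path:

.. code-block:: python

    import os

    # Placeholder values for illustration; replace with your own settings.
    os.environ["GOOGLE_CLOUD_PROJECT"] = "my-project"
    os.environ["GOOGLE_CLOUD_REGION"] = "us-central1"
    os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = "/path/to/credentials.json"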
## Usage
1. Install the latest version of Dataproc Python client and Dataproc Spark Connect modules:
.. code-block:: console
pip install google_cloud_dataproc --force-reinstall
pip install dataproc_spark_connect --force-reinstall
2. Add the required import into your PySpark application or notebook:
.. code-block:: python
from google.cloud.dataproc_spark_connect import DataprocSparkSession
3. There are two ways to create a Spark session (a short usage sketch follows this list):
1. Start a Spark session using properties defined in `DATAPROC_SPARK_CONNECT_SESSION_DEFAULT_CONFIG`:
.. code-block:: python
spark = DataprocSparkSession.builder.getOrCreate()
2. Start a Spark session with the following code instead of using a config file:
.. code-block:: python
from google.cloud.dataproc_v1 import SparkConnectConfig
from google.cloud.dataproc_v1 import Session
dataproc_config = Session()
dataproc_config.spark_connect_session = SparkConnectConfig()
dataproc_config.environment_config.execution_config.subnetwork_uri = "<subnet>"
dataproc_config.runtime_config.version = '3.0'
spark = DataprocSparkSession.builder.dataprocConfig(dataproc_config).getOrCreate()
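Once created, the session behaves like a regular Spark Connect `SparkSession`. A minimal sketch of using and then releasing it (the DataFrame contents here are purely illustrative):

.. code-block:: python

    # Run a trivial query against the remote Dataproc session.
    df = spark.createDataFrame([(1, "a"), (2, "b")], ["id", "label"])
    df.show()

    # Stop the session when done to release the remote resources.
    spark.stop()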
## Billing
Because this client runs Spark workloads on Dataproc, your project is billed according to [Dataproc Serverless pricing](https://cloud.google.com/dataproc-serverless/pricing).
This applies even when you run the client from outside Google Cloud, for example from a machine that is not a Compute Engine instance.
## Contributing
### Building and Deploying SDK
1. Install the requirements in a virtual environment.
.. code-block:: console
pip install -r requirements.txt
2. Build the code.
.. code-block:: console
python setup.py sdist bdist_wheel
3. Copy the generated `.whl` file to Cloud Storage. Use the version specified in the `setup.py` file.
.. code-block:: console
    VERSION=<version>
    gsutil cp dist/dataproc_spark_connect-${VERSION}-py2.py3-none-any.whl gs://<your_bucket_name>
4. Download the new SDK on Vertex, then uninstall the old version and install the new one.
.. code-block:: console
%%bash
export VERSION=<version>
gsutil cp gs://<your_bucket_name>/dataproc_spark_connect-${VERSION}-py2.py3-none-any.whl .
yes | pip uninstall dataproc_spark_connect
pip install dataproc_spark_connect-${VERSION}-py2.py3-none-any.whl
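After reinstalling, you may want to confirm which version the notebook kernel picks up (a kernel restart may be needed first). A minimal sketch using only the standard library:

.. code-block:: python

    # Print the installed package version as seen by the current kernel.
    import importlib.metadata

    print(importlib.metadata.version("dataproc-spark-connect"))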