### PySpark sample
This sample project shows how to deploy a PySpark job to Data Platform.
The only prerequisite is to have a Data Platform project already set up,
with a namespace that has enough quota and with this repo connected as a source.
Please find the Data Platform Quickstart guide [here](https://docs.aci.apple.com/spark_kube/getting_started/introduction.html#getting-started).
Once you have a `projectId` and a `namespaceId`, substitute the corresponding values in `rio.yml`.
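
For orientation, a job deployed this way is just a regular PySpark script with a standard entry point. Below is a minimal, hypothetical example; the app name and workload are illustrative and not part of this repo:

```python
from pyspark.sql import SparkSession


def main():
    # Obtain (or create) the SparkSession provided by the cluster.
    spark = SparkSession.builder.appName("pyspark-sample").getOrCreate()

    # Illustrative workload: build a small DataFrame and show it.
    df = spark.range(10).withColumnRenamed("id", "value")
    df.show()

    spark.stop()


if __name__ == "__main__":
    main()
```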
### CI/CD Overview
#### PySpark project
There are 2 pipelines in `rio.yml` that show the possible ways of packaging and deploying your PySpark jobs:
1. `main`: shows how to use a [Buildozer build](https://docs.aci.apple.com/rio/guide-to-rio/build-and-test/buildozer.html#buildozer-builds)
which provides a Python runtime with the version set in the `runtime.txt` file.
2. `sdp-base-image`: shows how to use a [Freestyle build](https://docs.aci.apple.com/rio/guide-to-rio/build-and-test/freestyle.html#freestyle-builds)
with an SDP Base Image that provides Python 3.9. The jobs are deployed only when commits are pushed to the `sdp-base-image` branch.
#### Hybrid PySpark + Gradle multi-project (Java and Scala)
There is another pipeline, `hybrid`, in `rio.yml` that shows how to compile, package, and deploy your hybrid Python + Java/Scala jobs with a [Buildozer build](https://docs.aci.apple.com/rio/guide-to-rio/build-and-test/buildozer.html#buildozer-builds)
using two Builders: [Python](https://docs.aci.apple.com/rio/guide-to-rio/build-and-test/builders/python.html) and [Gradle](https://docs.aci.apple.com/rio/guide-to-rio/build-and-test/builders/java-gradle.html).
The jobs are deployed only when commits are pushed to the `hybrid` branch.
In this case, the Python version is taken from the `runtime.txt` file and the Java version can be customized with the `RUNTIME_JDK_VERSION` env variable in the `rio.yml` file.
Please refer to the previous links to Rio docs for further instructions on how to customize the builders.
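
As a sketch of what such a hybrid job can look like at runtime, a PySpark script can register a UDF implemented in the JAR produced by the Gradle build and call it through Spark SQL. The class name `com.example.udf.Reverse` below is purely hypothetical and not part of this repo:

```python
from pyspark.sql import SparkSession
from pyspark.sql.types import StringType

spark = SparkSession.builder.appName("hybrid-sample").getOrCreate()

# Register a UDF implemented in the Java/Scala artifact built by Gradle.
# Replace the hypothetical class name with the fully qualified name of your own UDF.
spark.udf.registerJavaFunction("reverse", "com.example.udf.Reverse", StringType())

# Call the Java/Scala UDF from PySpark via Spark SQL.
spark.sql("SELECT reverse('hello') AS reversed").show()

spark.stop()
```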
### Splunk logging from an ACI Kube namespace
1. Follow [Kube's docs](https://kube.apple.com/docs/guides/connecting-to-non-kube-services/) to enable connectivity to the Apple Internal and the Apple Datacenters named networks.
2. In your Python scripts, configure logging as follows:
```python
import os
from logging.config import fileConfig
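
# Load the logging configuration file that the platform exposes via this env variable.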
fileConfig(os.getenv('PLATFORM_PYTHON_SPARK_LOG_CONF'))
```
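
Once the configuration is loaded, the standard `logging` module can be used as usual, for example:

```python
import logging

logger = logging.getLogger(__name__)
logger.info("PySpark job started")
```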