vee-ap-generation-pyspark-app


Namevee-ap-generation-pyspark-app JSON
Version 1.0.1 PyPI version JSON
download
home_pagehttps://github.geo.apple.com/yun-hu/vee-ap-generation
SummaryPySpark App on PIE Spark
upload_time2023-01-23 20:47:40
maintainer
docs_urlNone
authorYun Hu
requires_python
licenseApple Internal
keywords pyspark pie spark spark
VCS
bugtrack_url
requirements No requirements were recorded.
Travis-CI No Travis.
coveralls test coverage No coveralls.
            ### PySpark sample

This sample project shows how to deploy a PySpark job to Data Platform.

The only pre-requisite is to have a Data Platform project already setup 
with a namespace with enough quota and with this repo connected as source.

Please find the Data Platform Quickstart guide [here](https://docs.aci.apple.com/spark_kube/getting_started/introduction.html#getting-started).

Once you have a `projectId` and a `namespaceId`, please substitute the values in the `rio.yml`.

### CI/CD Overview

#### PySpark project
There are 2 pipelines in `rio.yml` that show the possible ways of packaging and deploying your PySpark jobs:
1. `main`: shows how to use a [Buildozer build](https://docs.aci.apple.com/rio/guide-to-rio/build-and-test/buildozer.html#buildozer-builds)
which will provide a Python runtime with the version set in `runtime.txt` file.
2. `sdp-base-image`: shows how to use a [Freestyle build](https://docs.aci.apple.com/rio/guide-to-rio/build-and-test/freestyle.html#freestyle-builds)
with an SDP Base Image that will provide Python 3.9. The jobs will be deployed only when commits are pushed to `sdp-base-image` branch.

#### Hybrid PySpark + Gradle multi-project (Java and Scala) 
There is another pipeline `hybrid` in `rio.yml` that show how to compile, package and deploying your hybrid Python + Java/Scala jobs with a [Buildozer build](https://docs.aci.apple.com/rio/guide-to-rio/build-and-test/buildozer.html#buildozer-builds)
with two Builders: [Python](https://docs.aci.apple.com/rio/guide-to-rio/build-and-test/builders/python.html) and [Gradle](https://docs.aci.apple.com/rio/guide-to-rio/build-and-test/builders/java-gradle.html).
The jobs will be deployed only when commits are pushed to `hybrid` branch.

In this case, the Python version is taken from the `runtime.txt` file and the Java version can be customized with the `RUNTIME_JDK_VERSION` env variable in the `rio.yml` file.

Please refer to the previous links to Rio docs for further instructions on how to customize the builders.

### Splunk logging from an ACI Kube namespace
1. Follow [Kube's docs](https://kube.apple.com/docs/guides/connecting-to-non-kube-services/) to enable connectivity to the Apple Internal and the Apple Datacenters named networks.
2. In your Python scripts, configure the logger with the following instruction: 
```python
import os
from logging.config import fileConfig
    
fileConfig(os.getenv('PLATFORM_PYTHON_SPARK_LOG_CONF'))
```



            

Raw data

            {
    "_id": null,
    "home_page": "https://github.geo.apple.com/yun-hu/vee-ap-generation",
    "name": "vee-ap-generation-pyspark-app",
    "maintainer": "",
    "docs_url": null,
    "requires_python": "",
    "maintainer_email": "",
    "keywords": "PySpark,PIE Spark,Spark",
    "author": "Yun Hu",
    "author_email": "yun_hu@apple.com",
    "download_url": "",
    "platform": null,
    "description": "### PySpark sample\n\nThis sample project shows how to deploy a PySpark job to Data Platform.\n\nThe only pre-requisite is to have a Data Platform project already setup \nwith a namespace with enough quota and with this repo connected as source.\n\nPlease find the Data Platform Quickstart guide [here](https://docs.aci.apple.com/spark_kube/getting_started/introduction.html#getting-started).\n\nOnce you have a `projectId` and a `namespaceId`, please substitute the values in the `rio.yml`.\n\n### CI/CD Overview\n\n#### PySpark project\nThere are 2 pipelines in `rio.yml` that show the possible ways of packaging and deploying your PySpark jobs:\n1. `main`: shows how to use a [Buildozer build](https://docs.aci.apple.com/rio/guide-to-rio/build-and-test/buildozer.html#buildozer-builds)\nwhich will provide a Python runtime with the version set in `runtime.txt` file.\n2. `sdp-base-image`: shows how to use a [Freestyle build](https://docs.aci.apple.com/rio/guide-to-rio/build-and-test/freestyle.html#freestyle-builds)\nwith an SDP Base Image that will provide Python 3.9. The jobs will be deployed only when commits are pushed to `sdp-base-image` branch.\n\n#### Hybrid PySpark + Gradle multi-project (Java and Scala) \nThere is another pipeline `hybrid` in `rio.yml` that show how to compile, package and deploying your hybrid Python + Java/Scala jobs with a [Buildozer build](https://docs.aci.apple.com/rio/guide-to-rio/build-and-test/buildozer.html#buildozer-builds)\nwith two Builders: [Python](https://docs.aci.apple.com/rio/guide-to-rio/build-and-test/builders/python.html) and [Gradle](https://docs.aci.apple.com/rio/guide-to-rio/build-and-test/builders/java-gradle.html).\nThe jobs will be deployed only when commits are pushed to `hybrid` branch.\n\nIn this case, the Python version is taken from the `runtime.txt` file and the Java version can be customized with the `RUNTIME_JDK_VERSION` env variable in the `rio.yml` file.\n\nPlease refer to the previous links to Rio docs for further instructions on how to customize the builders.\n\n### Splunk logging from an ACI Kube namespace\n1. Follow [Kube's docs](https://kube.apple.com/docs/guides/connecting-to-non-kube-services/) to enable connectivity to the Apple Internal and the Apple Datacenters named networks.\n2. In your Python scripts, configure the logger with the following instruction: \n```python\nimport os\nfrom logging.config import fileConfig\n    \nfileConfig(os.getenv('PLATFORM_PYTHON_SPARK_LOG_CONF'))\n```\n\n\n",
    "bugtrack_url": null,
    "license": "Apple Internal",
    "summary": "PySpark App on PIE Spark",
    "version": "1.0.1",
    "split_keywords": [
        "pyspark",
        "pie spark",
        "spark"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "5bd583b42028c222330828568ea832f1ad88ba6fa49a5106c759f8ad99c43f1d",
                "md5": "61a69374eb7adfeaf8f32687918f4dbb",
                "sha256": "5adbf5ad0c52fed8aaf6bbcef12ecbbdd227ef437f56f3d8386dfcc063265d98"
            },
            "downloads": -1,
            "filename": "vee_ap_generation_pyspark_app-1.0.1-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "61a69374eb7adfeaf8f32687918f4dbb",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": null,
            "size": 24248,
            "upload_time": "2023-01-23T20:47:40",
            "upload_time_iso_8601": "2023-01-23T20:47:40.626531Z",
            "url": "https://files.pythonhosted.org/packages/5b/d5/83b42028c222330828568ea832f1ad88ba6fa49a5106c759f8ad99c43f1d/vee_ap_generation_pyspark_app-1.0.1-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2023-01-23 20:47:40",
    "github": false,
    "gitlab": false,
    "bitbucket": false,
    "lcname": "vee-ap-generation-pyspark-app"
}
        
Elapsed time: 0.03373s