dataos-pyflare

Name: dataos-pyflare
Version: 0.1.13
Home page: https://bitbucket.org/rubik_/dataos-pyspark-sdk
Summary: PySpark bridge to DataOS
Upload time: 2024-11-06 10:03:40
Author: Modern labs
Requires Python: >=3.7, <4
License: MIT
Keywords: dataos, python flare, pyflare, dataos-pyflare
# Dataos-PyFlare: DataOS SDK for Apache Spark

### What it does:
Dataos-PyFlare is a Python library designed to simplify data operations and interactions with the DataOS platform and Apache Spark. It provides a convenient, efficient way to load, transform, and save data.

It abstracts away the complexity of data movement, so users can focus on data transformations and business logic.

### Features
* **Streamlined Data Operations**: Dataos-PyFlare streamlines data operations by offering a unified interface for data loading, transformation, and storage, reducing development complexity and time.

* **Data Connector Integration**: Seamlessly connect to various data sources, including Google BigQuery, Google Cloud Storage (GCS), Snowflake, Redshift, Pulsar, and more, using the SDK's built-in connectors.

* **Customizable and Extensible**: Dataos-PyFlare allows for easy customization and extension to suit your specific project requirements. It integrates with existing Python libraries and frameworks for data manipulation.

* **Optimized for DataOS**: Dataos-PyFlare is optimized for the DataOS platform, making it an ideal choice for managing and processing data within DataOS environments.

### Steps to install
Before you begin, make sure you have Python 3.7 or later (but below 4) installed on your system.

You can install Dataos-PyFlare and its dependencies using pip:
```
pip install dataos-pyflare
```

Additionally, make sure to have a Spark environment set up with the required configurations for your specific use case.
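The package's `requires_python` constraint (`>=3.7, <4`) can be checked up front before anything else runs; a minimal sketch:

```python
import sys

# Fail fast if the interpreter falls outside the package's
# requires_python range (>=3.7, <4).
if not ((3, 7) <= sys.version_info[:2] < (4, 0)):
    raise RuntimeError(
        f"dataos-pyflare requires Python >=3.7,<4; "
        f"found {sys.version.split()[0]}"
    )
```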

## Getting Started

### Sample Code:
This code snippet demonstrates how to configure a Dataos-PyFlare session to load data from a source, apply transformations, and save the result to a destination.

```python
from pyflare.sdk import load, save, session_builder

# Define your spark conf params here
sparkConf = [("spark.app.name", "Dataos Sdk Spark App"), ("spark.master", "local[*]"), ("spark.executor.memory", "4g"),
             ("spark.jars.packages", "com.google.cloud.spark:spark-bigquery-with-dependencies_2.12:0.25.1,"
                                     "com.google.cloud.bigdataoss:gcs-connector:hadoop3-2.2.17,"
                                    "net.snowflake:spark-snowflake_2.12:2.11.0-spark_3.3")
             ]

# Provide dataos token here
token = "bWF5YW5rLjkxYzZiNDQ3LWM3ZWYLWMzNjk3MzQ1MTQyNw=="

# provide dataos fully qualified domain name
DATAOS_FQDN = "sunny-prawn.dataos.app"

# initialize pyflare session
spark = session_builder.SparkSessionBuilder() \
    .with_spark_conf(sparkConf) \
    .with_user_apikey(token) \
    .with_dataos_fqdn(DATAOS_FQDN) \
    .with_depot(depot_name="icebase", acl="r") \
    .with_depot("sanitysnowflake", "rw") \
    .build_session()

# load() method will read dataset city from the source and return a governed dataframe
df_city = load(name="dataos://icebase:retail/city", format="iceberg")

# perform required transformations as per business logic
df_city = df_city.drop("__metadata")

# save() will write transformed dataset to the sink
save(name="dataos://sanitysnowflake:public/city", mode="overwrite", dataframe=df_city, format="snowflake")
```

### Explanation

1. **Importing Libraries**: We import necessary modules from the pyflare.sdk package.

2. **Spark Configuration**: We define Spark configuration parameters such as the Spark application name, master URL, executor memory, and additional packages required for connectors.
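   The multi-line string concatenation in `sparkConf` is easy to get wrong (a missing comma silently merges two Maven coordinates). One defensive sketch is to keep the connector coordinates in a list and join them:

   ```python
   # Maven coordinates for the connector jars; spark.jars.packages
   # expects a single comma-separated string.
   connector_packages = [
       "com.google.cloud.spark:spark-bigquery-with-dependencies_2.12:0.25.1",
       "com.google.cloud.bigdataoss:gcs-connector:hadoop3-2.2.17",
       "net.snowflake:spark-snowflake_2.12:2.11.0-spark_3.3",
   ]

   sparkConf = [
       ("spark.app.name", "Dataos Sdk Spark App"),
       ("spark.master", "local[*]"),
       ("spark.executor.memory", "4g"),
       ("spark.jars.packages", ",".join(connector_packages)),
   ]
   ```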

3. **DataOS Token and FQDN**: You provide your DataOS token and fully qualified domain name (FQDN) to authenticate and connect to the DataOS platform.
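   Hard-coding the API token in source is convenient for a demo but risky in shared code. A common alternative (the variable names here are hypothetical, not SDK conventions) is to read both values from the environment:

   ```python
   import os

   # DATAOS_API_TOKEN / DATAOS_FQDN are illustrative names you would
   # export yourself before running the job.
   token = os.environ.get("DATAOS_API_TOKEN", "")
   DATAOS_FQDN = os.environ.get("DATAOS_FQDN", "sunny-prawn.dataos.app")
   ```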

4. **PyFlare Session Initialization**: We create a PyFlare session using session_builder.SparkSessionBuilder(). This session will be used for data operations.

5. **Loading Data**: We use the load method to load data from a specified source (dataos://icebase:retail/city) in Iceberg format. The result is a governed DataFrame (df_city).
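   The `dataos://` address encodes a depot, a collection, and a dataset. A small illustrative parser (not part of the SDK, which resolves addresses internally) makes that structure explicit:

   ```python
   from urllib.parse import urlparse

   def parse_dataos_address(address: str) -> tuple:
       """Split dataos://<depot>:<collection>/<dataset> into its parts.

       Illustrative helper only, based on the address shape shown above.
       """
       parsed = urlparse(address)
       depot, _, collection = parsed.netloc.partition(":")
       dataset = parsed.path.lstrip("/")
       return depot, collection, dataset

   print(parse_dataos_address("dataos://icebase:retail/city"))
   ```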

6. **Transformation**: We perform a transformation on the loaded DataFrame by dropping the __metadata column. You can customize this step to fit your business logic.

7. **Saving Data**: Finally, we use the save method to save the transformed DataFrame to a specified destination (dataos://sanitysnowflake:public/city) in Snowflake format.
