# ClarifaiPySpark
## Introduction
This README provides an overview of the Software Development Kit (SDK) under development for integrating Clarifai with Databricks. The SDK's primary use case is to connect Databricks and Clarifai for uploading client datasets, annotating data, and exporting annotations into Spark DataFrames or Delta tables.
![Screenshot 2023-11-17 at 5 21 04 PM](https://github.com/Clarifai/clarifai-pyspark/assets/143642606/7b6bfc6a-19b9-48d7-8013-24e79fc5aacf)
The initial use case for this SDK revolves around three main objectives:
### Uploading Client Datasets into Clarifai App:
The SDK should enable the seamless upload of datasets into the Clarifai application, simplifying the process of data transfer from Databricks to Clarifai.
### Annotate the Data:
The SDK should provide features for data annotation, making it easier for users to add labels and metadata to their datasets within the Clarifai platform.
### Export Annotations to Spark DataFrames/Delta Tables:
The SDK should offer functionality to export annotations and store them in Spark DataFrames or Delta tables, facilitating further data analysis within Databricks.
## Requirements:
* Databricks: Runtime 13.3 or later
* Clarifai: `pip install clarifai`
* Create your [Clarifai account](https://clarifai.com/login)
* Follow the instructions to get your own [Clarifai PAT](https://docs.clarifai.com/clarifai-basics/authentication/personal-access-tokens) (personal access token)
* Protocol Buffers: version 4.24.2, `pip install protobuf==4.24.2`
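
Before running the examples, it can help to confirm that the pinned packages are actually present in the cluster environment. The following is a minimal sketch that uses only the Python standard library; the helper names `installed_version` and `check_requirements` are illustrative and are not part of clarifai-pyspark.

```python
from importlib import metadata

# Pins taken from the Requirements list; None means any version is accepted.
REQUIRED = {"clarifai": None, "protobuf": "4.24.2"}


def installed_version(package):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def check_requirements(required=REQUIRED):
    """Map each problematic package to a short description of the problem."""
    problems = {}
    for package, pinned in required.items():
        found = installed_version(package)
        if found is None:
            problems[package] = "not installed"
        elif pinned is not None and found != pinned:
            problems[package] = f"found {found}, expected {pinned}"
    return problems


if __name__ == "__main__":
    # An empty result means every listed package satisfies its pin.
    for package, problem in check_requirements().items():
        print(f"{package}: {problem}")
```

An empty dictionary from `check_requirements()` means both packages are installed and `protobuf` matches the pinned version.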
## Setup:
Install the package, then initialize the `ClarifaiPySpark` class to begin.
```bash
pip install clarifai-pyspark
```
## Getting Started:
``` python
from clarifaipyspark.client import ClarifaiPySpark
```
Create a Clarifai-PySpark client object to connect to your app on Clarifai. You can then select an existing dataset in your Clarifai app, or create a new one, to upload data to.
``` python
claps_obj = ClarifaiPySpark(user_id=USER_ID, app_id=APP_ID, pat=CLARIFAI_PAT)
dataset_obj = claps_obj.dataset(dataset_id=DATASET_ID)
```
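
The snippet above assumes that `USER_ID`, `APP_ID`, and `CLARIFAI_PAT` are already defined. One way to avoid hard-coding them in a notebook is to read them from environment variables, sketched below. The helper `load_clarifai_credentials` and the variable names `CLARIFAI_USER_ID` and `CLARIFAI_APP_ID` are assumptions made for this illustration, not something the SDK defines.

```python
import os


def load_clarifai_credentials(env=os.environ):
    """Collect the three keyword arguments passed to the client constructor.

    The environment variable names are illustrative; use whatever names
    your deployment defines.
    """
    names = {
        "user_id": "CLARIFAI_USER_ID",
        "app_id": "CLARIFAI_APP_ID",
        "pat": "CLARIFAI_PAT",
    }
    missing = [var for var in names.values() if not env.get(var)]
    if missing:
        raise KeyError(f"Missing environment variables: {', '.join(missing)}")
    return {arg: env[var] for arg, var in names.items()}


# Usage with the client shown above (requires clarifai-pyspark to be installed):
# claps_obj = ClarifaiPySpark(**load_clarifai_credentials())
```

Failing early with the list of missing variables keeps a misconfigured cluster from surfacing later as an authentication error.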
## Examples:
Check out these notebooks for the operations you can perform with the clarifai-pyspark SDK.
| Notebook | **Description** | GitHub |
|----------|--------|---------------- |
| ClarifaiPyspark_Example_NB | An end-to-end notebook that walks through the workflow from data ingestion to exporting annotations | [![GitHub](https://img.shields.io/badge/GitHub-Link-blue?logo=github)](https://github.com/Clarifai/clarifai-pyspark/blob/main/examples/ClarifaiPyspark_Example_NB.ipynb) |
| export_to_df_demo | Explains how to export annotations from a Clarifai app and store them as a DataFrame in Databricks | [![GitHub](https://img.shields.io/badge/GitHub-Link-blue?logo=github)](https://github.com/Clarifai/clarifai-pyspark/blob/main/examples/export_to_df_demo.ipynb) |
## Resources:
If you want to work with workflows and custom models programmatically, the [Clarifai SDK](https://docs.clarifai.com/python-sdk/tutorial) is a good place to start.
See the resources below for more information.
* Docs - [Clarifai Docs](https://docs.clarifai.com)
* Explore our community page - [Clarifai Community](https://clarifai.com/explore)
* Fork and contribute to our SDK here: [![GitHub](https://img.shields.io/badge/GitHub-Link-blue?logo=github)](https://github.com/Clarifai/clarifai-python)
* Reach out to us on Discord: [![Discord](https://img.shields.io/discord/your_server_id?label=Discord&logo=discord&style=flat-square)](https://discord.com/invite/WgUvPK4pVD)