# Labelbox Connector for Snowflake
Access the Labelbox Connector for Snowflake to connect an unstructured dataset to Labelbox, programmatically set up an ontology for labeling, and load the labeled dataset into your Snowflake environment.
Labelbox is the enterprise-grade training data solution with fast, AI-enabled labeling tools, labeling automation, a human workforce, data management, a powerful API for integration, and an SDK for extensibility. Visit [Labelbox](http://labelbox.com/) for more information.
This library is currently in beta. It may contain errors or inaccuracies and may not function as well as commercially released software. Please report any issues or bugs via [GitHub Issues](https://github.com/Labelbox/labelsnow/issues).
## Table of Contents
* [Requirements](#requirements)
* [Installation](#installation)
* [Documentation](#documentation)
* [Authentication](#authentication)
* [Contribution](#contribution)
## Requirements
* [Snowflake account with credentials](https://signup.snowflake.com/)
* [Snowflake SDK](https://pypi.org/project/snowflake-connector-python/)
* [Labelbox account](http://app.labelbox.com/)
* [Generate a Labelbox API key](https://labelbox.com/docs/api/getting-started#create_api_key)
## Installation
Install LabelSnow in your Python environment. The installation also adds the Labelbox SDK, which LabelSnow requires to function. LabelSnow is available via PyPI:
```
pip install labelsnow
```
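To confirm that the install succeeded, a short check like the following can help. It is a minimal sketch that uses only the standard library's `importlib.metadata`; the helper name `installed_version` is hypothetical and not part of LabelSnow, and `labelbox` is assumed to be the distribution name of the Labelbox SDK.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(distribution):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(distribution)
    except PackageNotFoundError:
        return None


# "labelsnow" is the PyPI name used above; "labelbox" is assumed to be the SDK's name.
for name in ("labelsnow", "labelbox"):
    print(name, installed_version(name) or "not installed")
```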
## Documentation
LabelSnow includes several methods to help facilitate your workflow between Snowflake and Labelbox.
1. Create your dataset in Labelbox from your Unstructured Data stage in Snowflake:
```
# 604800 seconds (7 days) is the signed URL expiration time in Snowflake
sf_dataframe = labelsnow.get_snowflake_datarows(snowflake_cursor, "name_of_snowflake_stage", 604800)
my_demo_dataset = labelsnow.create_dataset(labelbox_client=lb_client, snowflake_pandas_dataframe=sf_dataframe, dataset_name="SF Test")
```
Here, `sf_dataframe` is a pandas DataFrame of unstructured data with asset names and asset URLs in two columns, named "external_id" and "row_data" respectively. `labelsnow.create_dataset()` returns a Labelbox Dataset Python object, stored here in `my_demo_dataset`.
| external_id | row_data |
|-------------|--------------------------------------|
| image1.jpg | https://url_to_your_asset/image1.jpg |
| image2.jpg | https://url_to_your_asset/image2.jpg |
| image3.jpg | https://url_to_your_asset/image3.jpg |
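The table above can be mimicked with plain Python records when you want to test downstream code without a Snowflake stage. This is a sketch under assumptions: `build_rows`, `has_required_columns`, and the base URL are hypothetical, and only the two column names come from the description above. If pandas is installed, `pandas.DataFrame(rows)` turns the records into a DataFrame with the same two columns.

```python
REQUIRED_COLUMNS = ("external_id", "row_data")


def build_rows(base_url, file_names):
    """Build one record per asset with the two columns described above."""
    base = base_url.rstrip("/")
    return [
        {"external_id": name, "row_data": f"{base}/{name}"}
        for name in file_names
    ]


def has_required_columns(rows):
    """Check that every record carries exactly the expected column names."""
    return all(set(row) == set(REQUIRED_COLUMNS) for row in rows)


# Placeholder URL and file names, matching the sample table.
rows = build_rows("https://url_to_your_asset", ["image1.jpg", "image2.jpg"])
print(rows[0])
```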
2. Get your annotations from Labelbox as a Pandas DataFrame:
```
bronze_df = labelsnow.get_annotations(lb_client, "insert_project_id_here")
```
3. You can use our flattener to flatten the "Label" JSON column into component columns, or use the silver table method to produce a more queryable table of your labeled assets. Both methods take the bronze table of annotations from the previous step as input:
```
flattened_table = labelsnow.flatten_bronze_table(bronze_df)
queryable_silver_DF = labelsnow.silver_table(bronze_df)
```
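To make the idea of flattening concrete, here is a minimal sketch that turns one nested JSON label into flat, dotted column names. It illustrates the concept only; it is not LabelSnow's implementation, and both the sample label structure and the `flatten` helper are hypothetical.

```python
import json


def flatten(value, prefix=""):
    """Flatten nested dicts and lists into a single dict with dotted keys."""
    if isinstance(value, dict):
        items = value.items()
    elif isinstance(value, list):
        items = enumerate(value)
    else:
        # Scalars end the recursion and are stored under the accumulated key.
        return {prefix: value}
    flat = {}
    for key, child in items:
        child_prefix = f"{prefix}.{key}" if prefix else str(key)
        flat.update(flatten(child, child_prefix))
    return flat


# Hypothetical label payload, not an actual Labelbox export.
label_json = '{"objects": [{"title": "car", "bbox": {"top": 10, "left": 20}}]}'
print(flatten(json.loads(label_json)))
```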
### Depositing your tables into Snowflake
We also include a helper function `put_tables_into_snowflake` that can help you quickly load Pandas tables into Snowflake. It takes in a dictionary of Pandas tables, creates tables, and loads the data.
```
import snowflake.connector

# Map each Snowflake table name to the DataFrame that should populate it.
my_table_payload = {"BRONZE_TABLE": bronze_df,
                    "FLATTENED_BRONZE_TABLE": flattened_table,
                    "SILVER_TABLE": queryable_silver_DF}

# `credentials` stands for wherever you keep your Snowflake login details.
ctx = snowflake.connector.connect(
    user=credentials.user,
    password=credentials.password,
    account=credentials.account,
    warehouse="name_of_warehouse",
    database="SAMPLE_DB",
    schema="PUBLIC"
)

labelsnow.put_tables_into_snowflake(ctx, my_table_payload)
```
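The `credentials` object in the example above is not defined by LabelSnow. One possible way to supply those values, sketched here under assumptions, is to read them from environment variables. The variable names (`SNOWFLAKE_USER` and so on) and the `connection_kwargs` helper are hypothetical; the resulting dictionary uses the same keyword arguments as the `snowflake.connector.connect` call above.

```python
import os

# Hypothetical environment variable names; adapt them to your own setup.
ENV_TO_KWARG = {
    "SNOWFLAKE_USER": "user",
    "SNOWFLAKE_PASSWORD": "password",
    "SNOWFLAKE_ACCOUNT": "account",
}


def connection_kwargs(environ=None, **extra):
    """Collect connection keyword arguments, failing early if one is missing."""
    environ = os.environ if environ is None else environ
    missing = [name for name in ENV_TO_KWARG if not environ.get(name)]
    if missing:
        raise KeyError(f"Missing environment variables: {', '.join(missing)}")
    kwargs = {kwarg: environ[name] for name, kwarg in ENV_TO_KWARG.items()}
    # Non-secret settings such as warehouse or schema are passed in directly.
    kwargs.update(extra)
    return kwargs
```

With the connector installed, `snowflake.connector.connect(**connection_kwargs(warehouse="name_of_warehouse", database="SAMPLE_DB", schema="PUBLIC"))` could then stand in for the hard-coded attributes.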
### How To Get Video Project Annotations
Because Labelbox Video projects can contain multiple videos, use the `get_videoframe_annotations` method to get an array of Pandas DataFrames, one for each video in your project. Each DataFrame contains the frame-by-frame annotations for one video:
```
video_bronze = labelsnow.get_annotations(lb_client, "insert_video_project_id_here") #sample completed video project
video_dataframe_framesets = labelsnow.get_videoframe_annotations(video_bronze, LB_API_KEY)
```
You can use standard Python code to iteratively create your flattened bronze tables and silver tables:
```
import pandas as pd

silver_video_dataframes = {}
video_count = 1
for frameset in video_dataframe_framesets:
    # Build the silver table for this video, then attach the bronze columns.
    silver_table = labelsnow.silver_table(frameset)
    silver_table_with_datarowid = pd.merge(silver_table, video_bronze, how="inner", on=["DataRow ID"])
    # Key each table by a generated name such as VIDEO_DEMO_1.
    video_name = "VIDEO_DEMO_{}".format(video_count)
    silver_video_dataframes[video_name] = silver_table_with_datarowid
    video_count += 1
Then deposit these Pandas DataFrames into Snowflake with `put_tables_into_snowflake`.
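The table names produced by the loop above can also be generated with `enumerate`, and the per-video tables can be combined with your other tables into a single payload dictionary. This is a sketch with placeholder strings standing in for DataFrames; `name_video_tables` is a hypothetical helper, not a LabelSnow function.

```python
def name_video_tables(framesets, prefix="VIDEO_DEMO"):
    """Map each per-video table to a name such as VIDEO_DEMO_1, VIDEO_DEMO_2."""
    return {
        f"{prefix}_{index}": frameset
        for index, frameset in enumerate(framesets, start=1)
    }


# Placeholder values; in practice these would be the silver DataFrames.
video_tables = name_video_tables(["silver_for_video_a", "silver_for_video_b"])
payload = {"BRONZE_TABLE": "bronze_df_placeholder", **video_tables}
print(list(payload))
```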
While using LabelSnow, you will likely also use the Labelbox SDK (e.g. for programmatic ontology creation). These resources will help familiarize you with the Labelbox Python SDK:
* [Visit our docs](https://labelbox.com/docs/python-api) to learn how the SDK works.
* View our [LabelSnow demo code](https://github.com/Labelbox/labelsnow/tree/main/demo) for inspiration.
* View our [API reference](https://labelbox.com/docs/python-api/api-reference).
## Authentication
Labelbox uses API keys to validate requests. You can create and manage API keys on [Labelbox](https://app.labelbox.com/account/api-keys).
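A common way to keep the key out of source code is to read it from an environment variable before creating the Labelbox client. The sketch below assumes a variable named `LABELBOX_API_KEY` and a hypothetical helper `load_api_key`. Passing the result as `labelbox.Client(api_key=...)` reflects the Labelbox SDK's client constructor as we understand it, so check the SDK docs for your version.

```python
import os


def load_api_key(environ=None, variable="LABELBOX_API_KEY"):
    """Return the API key from the environment, or raise a clear error."""
    environ = os.environ if environ is None else environ
    key = environ.get(variable, "").strip()
    if not key:
        raise RuntimeError(f"Set the {variable} environment variable first.")
    return key


# With the Labelbox SDK installed, the key would be used roughly like this:
#   import labelbox
#   lb_client = labelbox.Client(api_key=load_api_key())
```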
## Contribution
Please consult `CONTRIB.md`.
## Provenance
To enhance the software supply chain security of Labelbox's users, as of 0.1.3, every release contains a [SLSA Level 3 Provenance](https://github.com/slsa-framework/slsa-github-generator/blob/main/internal/builders/generic/README.md) document.
This document provides detailed information about the build process, including the repository and branch from which the package was generated.
By using the [SLSA framework's official verifier](https://github.com/slsa-framework/slsa-verifier), you can verify the provenance document to ensure that the package is from a trusted source. Verifying the provenance helps confirm that the package has not been tampered with and was built in a secure environment.
Example of usage for the 1.0.0 release wheel:
```
export VERSION=1.0.0
pip download --no-deps labelsnow==${VERSION}
curl --location -O \
  https://github.com/Labelbox/labelsnow/releases/download/${VERSION}/multiple.intoto.jsonl
slsa-verifier verify-artifact --source-branch main --builder-id 'https://github.com/slsa-framework/slsa-github-generator/.github/workflows/generator_generic_slsa3.yml@refs/tags/v2.0.0' --source-uri "git+https://github.com/Labelbox/labelsnow" --provenance-path multiple.intoto.jsonl ./labelsnow-${VERSION}-py3-none-any.whl
```