labelspark


* Name: labelspark
* Version: 0.7.35
* Home page: https://github.com/Labelbox/LabelSpark.git
* Summary: Labelbox Connector for Databricks
* Upload time: 2023-10-03 18:02:08
* Author: Labelbox
# The Official Labelbox <> Databricks Python Integration

[Labelbox](https://labelbox.com/) enables teams to maximize the value of their unstructured data with its enterprise-grade training data platform. For ML use cases, Labelbox has tools to deploy labelers to annotate data at massive scale, diagnose model performance to prioritize labeling, and plug in existing ML models to speed up labeling. For non-ML use cases, Labelbox has a powerful catalog with auto-computed similarity scores that users can leverage to label large amounts of data in a couple of clicks.

This library was designed to run in a Databricks environment, although it will function in any Spark environment with some modification.

We strongly encourage collaboration: please feel free to fork this repo and tweak the code base to work with your own data, and make pull requests if you have suggestions on how to enhance the overall experience, add new features, or improve general performance.

Please report any issues or bugs via [GitHub Issues](https://github.com/Labelbox/labelspark/issues).

## Table of Contents

* [Requirements](#requirements)
* [Setup](#setup)
* [Example Notebooks](#example-notebooks)

## Requirements

* Databricks: Runtime 10.4 LTS or later
* Apache Spark: 3.1.2 or later
* [Labelbox account](http://app.labelbox.com/)
* [Generate a Labelbox API key](https://docs.labelbox.com/reference/create-api-key)
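You can confirm from a notebook cell that the cluster meets the Spark requirement above before installing anything. The following is a minimal sketch, assuming you pass in `spark.version` (the version string a `SparkSession` exposes); `parse_version` and `meets_minimum` are hypothetical helpers, not part of LabelSpark. Versions are compared as integer tuples because a plain string comparison ranks `"3.10.0"` below `"3.9.0"`.

```python
import re
from typing import Tuple

# Minimum Apache Spark version from the Requirements list.
MIN_SPARK_VERSION: Tuple[int, ...] = (3, 1, 2)


def parse_version(version: str) -> Tuple[int, ...]:
    """Parse the leading numeric part of a version, e.g. '3.3.0' -> (3, 3, 0).

    Trailing tags such as '-SNAPSHOT' are ignored.
    """
    match = re.match(r"\d+(?:\.\d+)*", version.strip())
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def meets_minimum(version: str, minimum: Tuple[int, ...] = MIN_SPARK_VERSION) -> bool:
    """Return True if `version` is greater than or equal to `minimum`."""
    parsed = parse_version(version)
    # Pad both tuples with zeros so that '3.2' compares like '3.2.0'.
    width = max(len(parsed), len(minimum))
    padded = parsed + (0,) * (width - len(parsed))
    target = tuple(minimum) + (0,) * (width - len(minimum))
    return padded >= target
```

In a Databricks notebook this might be called as `assert meets_minimum(spark.version)`. The Databricks Runtime requirement is not checked by this sketch.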

## Setup

Set up LabelSpark with the following lines of code:

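```
%pip install labelspark -q
# Import LabelSpark and create a client authenticated with your Labelbox API key
import labelspark as ls

api_key = ""  # Insert your Labelbox API key here
client = ls.Client(api_key)  # Used by the core functions listed below
```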
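Hard-coding the key in a notebook makes it easy to leak when the notebook is shared. A minimal sketch, assuming the key has been made available to the cluster as an environment variable; the variable name `LABELBOX_API_KEY` and the helper `load_api_key` are hypothetical and used only for this illustration.

```python
import os


def load_api_key(env_var: str = "LABELBOX_API_KEY") -> str:
    """Return the Labelbox API key stored in an environment variable.

    LABELBOX_API_KEY is an assumed variable name used only for this sketch;
    substitute whatever variable your cluster configuration sets.
    """
    key = os.environ.get(env_var, "").strip()
    if not key:
        # Fail early rather than passing an empty string to ls.Client().
        raise RuntimeError(f"environment variable {env_var} is not set or is empty")
    return key
```

The setup cell would then create the client with `client = ls.Client(load_api_key())`. On Databricks, a secret scope read through `dbutils.secrets.get(scope=..., key=...)` is another common way to supply the key.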

Once set up, you can run the following core functions:

- `client.create_data_rows_from_table()`: creates Labelbox data rows (and metadata) from a Spark table DataFrame

- `client.export_to_table()`: exports labels (and metadata) from a given Labelbox project into a Spark DataFrame
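`create_data_rows_from_table()` works on a Spark table in which each row describes one asset. The sketch below prepares such rows in plain Python before they are turned into a DataFrame. It assumes a `row_data` column holding the asset URL and a `global_key` column holding a unique identifier; these names are borrowed from Labelbox data-row fields, so check the Basics notebook for the exact schema LabelSpark expects. `build_rows` and the key-from-file-name rule are hypothetical.

```python
from posixpath import basename
from urllib.parse import urlparse


def build_rows(urls):
    """Turn asset URLs into row dicts with 'row_data' and 'global_key' keys.

    The global key is taken from the file name in the URL path (an assumption
    for this sketch). Duplicates are rejected because Labelbox global keys
    must be unique.
    """
    rows, seen = [], set()
    for url in urls:
        key = basename(urlparse(url).path)
        if not key:
            raise ValueError(f"cannot derive a file name from {url!r}")
        if key in seen:
            raise ValueError(f"duplicate global key {key!r} from {url!r}")
        seen.add(key)
        rows.append({"row_data": url, "global_key": key})
    return rows
```

The resulting list could be loaded with `spark.createDataFrame([Row(**r) for r in rows])` (using `pyspark.sql.Row`) and then passed to `client.create_data_rows_from_table()`.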

## Example Notebooks

### Importing Data

|            Notebook            |  GitHub  |
| ------------------------------ | -------- |
| Basics: Data Rows from URLs            | [![Github](https://img.shields.io/badge/GitHub-100000?logo=github&logoColor=white)](notebooks/intro.ipynb)  | 
| Data Rows with Metadata        | [![Github](https://img.shields.io/badge/GitHub-100000?logo=github&logoColor=white)](notebooks/metadata.ipynb)  | 
| Data Rows with Attachments     | [![Github](https://img.shields.io/badge/GitHub-100000?logo=github&logoColor=white)](notebooks/attachments.ipynb)  | 
| Data Rows with Annotations     | [![Github](https://img.shields.io/badge/GitHub-100000?logo=github&logoColor=white)](notebooks/annotations.ipynb)  | 
| Putting it all Together        | [![Github](https://img.shields.io/badge/GitHub-100000?logo=github&logoColor=white)](notebooks/full-demo.ipynb)  | 
------

### Exporting Data

|            Notebook            |  GitHub  |
| ------------------------------ | -------- |
| Exporting Data to a Spark Table            | [![Github](https://img.shields.io/badge/GitHub-100000?logo=github&logoColor=white)](notebooks/export.ipynb)  |
------
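Exporting to a table means nested label JSON has to be mapped onto flat columns at some point. The sketch below shows one generic way to do that in plain Python, flattening nested dictionaries into dot-separated column names. The `flatten` helper is hypothetical, and the sample record in the note is a simplified illustration, not LabelSpark's actual export schema; see the export notebook for the real columns.

```python
def flatten(record, parent_key="", sep="."):
    """Flatten nested dicts into a single-level dict with joined keys.

    Lists are left as single values (e.g. a list of annotations stays in one
    column), since exploding them into rows is a separate, table-specific choice.
    """
    flat = {}
    for key, value in record.items():
        column = f"{parent_key}{sep}{key}" if parent_key else str(key)
        if isinstance(value, dict) and value:
            # Recurse into non-empty dicts, extending the column prefix.
            flat.update(flatten(value, column, sep))
        else:
            flat[column] = value
    return flat
```

For example, `{"data_row": {"id": "dr1"}, "labels": [...]}` becomes `{"data_row.id": "dr1", "labels": [...]}`. List-valued columns such as annotations would typically be expanded into rows afterwards, for instance with `pyspark.sql.functions.explode`.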

While using LabelSpark, you will likely also use the Labelbox SDK (e.g. for programmatic ontology creation). These resources will help familiarize you with the Labelbox Python SDK: 
* [Visit our docs](https://docs.labelbox.com/reference/install-python-sdk) to learn how the SDK works
* Check out our [notebook examples](https://github.com/Labelbox/labelspark/tree/master/notebooks) to follow along with interactive tutorials
* View the Labelbox [API reference](https://labelbox-python.readthedocs.io/en/latest/).

            

Raw data

            {
    "_id": null,
    "home_page": "https://github.com/Labelbox/LabelSpark.git",
    "name": "labelspark",
    "maintainer": "",
    "docs_url": null,
    "requires_python": "",
    "maintainer_email": "",
    "keywords": "",
    "author": "Labelbox",
    "author_email": "raphael@labelbox.com",
    "download_url": "https://files.pythonhosted.org/packages/05/d9/c76e51787eeb1103031aa873e4076aa971aa9ae78ad881b6a6bcac78aafa/labelspark-0.7.35.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "",
    "summary": "Labelbox Connector for Databricks",
    "version": "0.7.35",
    "project_urls": {
        "Homepage": "https://github.com/Labelbox/LabelSpark.git"
    },
    "split_keywords": [],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "842c98d54f7b5dc00e685ceb178fc77466bc8f3017141e10b2350bc216ce4943",
                "md5": "6073099350bda06c104b1f7219ee8c2d",
                "sha256": "5cd1d23a9f2bc6ddcd4a38a316c7bb6d3d3302d327ce3cc1668ce46729a1a5b2"
            },
            "downloads": -1,
            "filename": "labelspark-0.7.35-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "6073099350bda06c104b1f7219ee8c2d",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": null,
            "size": 31058,
            "upload_time": "2023-10-03T18:02:06",
            "upload_time_iso_8601": "2023-10-03T18:02:06.280048Z",
            "url": "https://files.pythonhosted.org/packages/84/2c/98d54f7b5dc00e685ceb178fc77466bc8f3017141e10b2350bc216ce4943/labelspark-0.7.35-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "05d9c76e51787eeb1103031aa873e4076aa971aa9ae78ad881b6a6bcac78aafa",
                "md5": "62c1f00b07d4404b1b949d45e40657bd",
                "sha256": "185b94505b185e76814713bc322f22b217225d96a0f021c61bdd4ab122057349"
            },
            "downloads": -1,
            "filename": "labelspark-0.7.35.tar.gz",
            "has_sig": false,
            "md5_digest": "62c1f00b07d4404b1b949d45e40657bd",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": null,
            "size": 26932,
            "upload_time": "2023-10-03T18:02:08",
            "upload_time_iso_8601": "2023-10-03T18:02:08.108244Z",
            "url": "https://files.pythonhosted.org/packages/05/d9/c76e51787eeb1103031aa873e4076aa971aa9ae78ad881b6a6bcac78aafa/labelspark-0.7.35.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2023-10-03 18:02:08",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "Labelbox",
    "github_project": "LabelSpark",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": true,
    "lcname": "labelspark"
}
        