label-studio-ml


Name: label-studio-ml
Version: 1.0.9
Home page: https://github.com/heartexlabs/label-studio-ml-backend
Summary: Label Studio ML backend
Upload time: 2023-01-03 15:54:52
Author: Heartex (hello@heartex.com)
Requires Python: >=3.6
Requirements: attr==0.3.1, attrs>=19.2.0, appdirs>=1.4.3, colorama>=0.4.4, Flask==1.1.2, lxml>=4.2.5, Pillow, requests>=2.22.0,<3, label-studio-tools>=0.0.0.dev11, Jinja2==3.0.3, itsdangerous==2.0.1, werkzeug==2.0.2
## What is the Label Studio ML backend?

The Label Studio ML backend is an SDK that lets you wrap your machine learning code and turn it into a web server.
You can then connect that server to a Label Studio instance to perform two tasks:

- Dynamically pre-annotate data based on model inference results
- Retrain or fine-tune a model based on recently annotated data

If you just need to load static pre-annotated data into Label Studio, running an ML backend might be overkill for you. Instead, you can [import preannotated data](https://labelstud.io/guide/predictions.html).

## How it works

1. Get your model code
2. Wrap it with the Label Studio SDK
3. Create a running server script
4. Launch the script
5. Connect Label Studio to the ML backend in the UI


## Quickstart

Follow this example tutorial to run an ML backend with a simple text classifier:

0. Clone the repo
   ```bash
   git clone https://github.com/heartexlabs/label-studio-ml-backend  
   ```

1. Set up the environment

    It is highly recommended to use a `venv`, `virtualenv`, or `conda` Python environment. You can use the same environment as Label Studio. [Read more](https://docs.python.org/3/tutorial/venv.html#creating-virtual-environments) about creating virtual environments with `venv`.
   ```bash
   cd label-studio-ml-backend

   # Install label-studio-ml and its dependencies
   pip install -U -e .

   # Install example dependencies
   pip install -r label_studio_ml/examples/requirements.txt
   ```

2. Initialize an ML backend based on an example script:
   ```bash
   label-studio-ml init my_ml_backend --script label_studio_ml/examples/simple_text_classifier/simple_text_classifier.py
   ```
   This ML backend is an example provided by Label Studio. See [how to create your own ML backend](#create-your-own-ml-backend).

3. Start the ML backend server:
   ```bash
   label-studio-ml start my_ml_backend
   ```

4. Start Label Studio and connect it to the running ML backend on the project settings page. To confirm that the backend is reachable before connecting, see the check below.
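
Before connecting, you can optionally confirm that the backend server is responding. A minimal sketch, assuming the default port `9090` and the standard `/health` endpoint of the generated server (both may differ in your setup):

```bash
# Assumes the backend was started with default settings (port 9090);
# adjust the host/port if you changed them.
curl http://localhost:9090/health
# Expect a small JSON payload indicating the model server is up.
```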

## Create your own ML backend

Follow this tutorial to wrap existing machine learning model code with the Label Studio ML SDK to use it as an ML backend with Label Studio. 

Before you start, determine the following:
1. The expected inputs and outputs for your model. In other words, the type of labeling that your model supports in Label Studio, which informs the [Label Studio labeling config](https://labelstud.io/guide/setup.html#Set-up-the-labeling-interface-for-your-project). For example, text classification labels of "Dog", "Cat", or "Opossum" could be possible inputs and outputs. A sample labeling config for this case is shown after this list.
2. The [prediction format](https://labelstud.io/guide/predictions.html) returned by your ML backend server.
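
For instance, a labeling config for the "Dog"/"Cat"/"Opossum" text classification example could look like the sketch below (the tag names `text` and `animal` and the `$text` data field are illustrative and should match your own project):

```xml
<View>
  <Text name="text" value="$text"/>
  <Choices name="animal" toName="text" choice="single">
    <Choice value="Dog"/>
    <Choice value="Cat"/>
    <Choice value="Opossum"/>
  </Choices>
</View>
```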

This example tutorial outlines how to wrap a simple text classifier based on the [scikit-learn](https://scikit-learn.org/) framework with the Label Studio ML SDK.

Start by creating a class declaration. You can create a Label Studio-compatible ML backend server simply by inheriting from `LabelStudioMLBase`.
```python
from label_studio_ml.model import LabelStudioMLBase

class MyModel(LabelStudioMLBase):
    ...  # __init__, predict() and fit() are filled in below
```

Then, define loaders & initializers in the `__init__` method. 

```python
def __init__(self, **kwargs):
    # don't forget to initialize base class...
    super(MyModel, self).__init__(**kwargs)
    self.model = self.load_my_model()
```

There are special variables provided by the `LabelStudioMLBase` base class:
- `self.parsed_label_config` is a Python dict with the parsed Label Studio project config structure. See [the reference for details](https://github.com/heartexlabs/label-studio/blob/6bcbba7dd056533bfdbc2feab1a6f1e38ce7cf11/label_studio/core/label_config.py#L33). You might want to use this to align your model input/output with the Label Studio labeling configuration;
- `self.label_config` is a raw labeling config string;
- `self.train_output` is a Python dict with the results of the previous model training runs (the output of the `fit()` method described below). Use this if you want to load a previously trained model for subsequent updates, for example for active learning or model fine-tuning; see the sketch after this list.
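
For example, `self.train_output` can be used in `__init__` to decide whether to load a checkpoint produced by an earlier `fit()` call or start from a fresh model. A minimal sketch, assuming `fit()` stores a `'model_path'` entry pointing to a pickled model (these names and the `load_my_model()` helper are illustrative, not part of the SDK):

```python
import os
import pickle

def __init__(self, **kwargs):
    super(MyModel, self).__init__(**kwargs)

    # self.train_output holds whatever the last fit() call returned (or is empty).
    model_path = (self.train_output or {}).get('model_path')
    if model_path and os.path.exists(model_path):
        # Reuse the checkpoint produced by a previous training run.
        with open(model_path, 'rb') as f:
            self.model = pickle.load(f)
    else:
        # No usable training output yet: fall back to a fresh / default model.
        self.model = self.load_my_model()
```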

After you define the loaders, you can define two methods for your model: an inference call and a training call. 

### Inference call

Use an inference call to get pre-annotations from your model on the fly. You must update the existing `predict` method in the example ML backend scripts to make it work for your specific use case. Write your own code to override the `predict(tasks, **kwargs)` method, which takes [JSON-formatted Label Studio tasks](https://labelstud.io/guide/tasks.html#Basic-Label-Studio-JSON-format) and returns predictions in the [format accepted by Label Studio](https://labelstud.io/guide/predictions.html).

**Example**

```python
def predict(self, tasks, **kwargs):
    predictions = []
    # Get annotation tag first, and extract from_name/to_name keys from the labeling config to make predictions
    from_name, schema = list(self.parsed_label_config.items())[0]
    to_name = schema['to_name'][0]
    for task in tasks:
        # for each task, return classification results in the form of "choices" pre-annotations
        predictions.append({
            'result': [{
                'from_name': from_name,
                'to_name': to_name,
                'type': 'choices',
                'value': {'choices': ['My Label']}
            }],
            # optionally you can include prediction scores that you can use to sort the tasks and do active learning
            'score': 0.987
        })
    return predictions
```
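
For the scikit-learn based classifier from this tutorial, the hard-coded `'My Label'` would be replaced by actual model output. A hedged sketch, assuming each task stores its input under `data['text']` and `self.model` is a fitted scikit-learn pipeline exposing `predict_proba()` and `classes_` (these names are illustrative):

```python
import numpy as np

def predict(self, tasks, **kwargs):
    from_name, schema = list(self.parsed_label_config.items())[0]
    to_name = schema['to_name'][0]

    # Assumes the task input lives under data['text']; adjust to your labeling config.
    texts = [task['data']['text'] for task in tasks]
    probabilities = self.model.predict_proba(texts)  # shape: (n_tasks, n_classes)
    best = np.argmax(probabilities, axis=1)

    predictions = []
    for idx, probs in zip(best, probabilities):
        label = str(self.model.classes_[idx])
        predictions.append({
            'result': [{
                'from_name': from_name,
                'to_name': to_name,
                'type': 'choices',
                'value': {'choices': [label]}
            }],
            # Use the model confidence as the prediction score.
            'score': float(probs[idx])
        })
    return predictions
```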


### Training call
Use the training call to update your model with new annotations. You don't need to implement this call if you only want to pre-annotate tasks without retraining the model. If you do want to retrain the model based on annotations from Label Studio, use this method.

Write your own code to override the `fit(completions, workdir=None, **kwargs)` method, which takes [JSON-formatted Label Studio annotations](https://labelstud.io/guide/export.html#Raw-JSON-format-of-completed-labeled-tasks) and returns an arbitrary dict in which you can store information about the trained model.

**Example**
```python
def fit(self, completions, workdir=None, **kwargs):
    # ... do some heavy computations, get your model and store checkpoints and resources
    return {'checkpoints': 'my/model/checkpoints'}  # <-- you can retrieve this dict as self.train_output in the subsequent calls
```
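
A slightly more concrete sketch for the scikit-learn text classifier, assuming each completion carries the input text under `data['text']` and a single "choices" annotation (the exact keys depend on your Label Studio version and labeling config, and the pickle-based persistence is illustrative):

```python
import os
import pickle

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

def fit(self, completions, workdir=None, **kwargs):
    texts, labels = [], []
    for completion in completions:
        # Assumes one "choices" result per labeled task; adjust to your data.
        texts.append(completion['data']['text'])
        labels.append(completion['annotations'][0]['result'][0]['value']['choices'][0])

    # Train a simple TF-IDF + logistic regression pipeline on the new annotations.
    model = make_pipeline(TfidfVectorizer(), LogisticRegression())
    model.fit(texts, labels)

    # Persist the model so the next __init__ call can reload it via self.train_output.
    workdir = workdir or '.'
    model_path = os.path.join(workdir, 'model.pkl')
    with open(model_path, 'wb') as f:
        pickle.dump(model, f)

    return {'model_path': model_path}
```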


After you wrap your model code with the class, define the loaders, and define the methods, you're ready to run your model as an ML backend with Label Studio. 

For other examples of ML backends, refer to the [examples in this repository](label_studio_ml/examples). These examples aren't production-ready, but can help you set up your own code as a Label Studio ML backend.

## Deploy your ML backend to GCP

Before you start:
1. Install [gcloud](https://cloud.google.com/sdk/docs/install)
2. Set up billing for your account if it's not already [activated](https://console.cloud.google.com/project/_/billing/enable)
3. Initialize gcloud by running the following command and logging in through the browser:
```bash
gcloud auth login
```
4. Enable the Cloud Build API
5. Find your GCP project ID
6. (Optional) Set a `GCP_REGION` environment variable with your default region (steps 3-6 are collected into one script after this list)
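
Steps 3-6 can be scripted together; a minimal sketch, with the project ID and region as placeholders you need to replace:

```bash
# Log in to Google Cloud (opens a browser window)
gcloud auth login

# Point gcloud at your project (replace with your real GCP project ID)
gcloud config set project my-gcp-project-id

# Enable the Cloud Build API for that project
gcloud services enable cloudbuild.googleapis.com

# (Optional) default region used by the deployment
export GCP_REGION=us-central1
```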

To start deployment:
1. Create your own ML backend
2. Start deployment to GCP:
```bash
label-studio-ml deploy gcp {ml-backend-local-dir} \
--from={model-python-script} \
--gcp-project-id {gcp-project-id} \
--label-studio-host {https://app.heartex.com} \
--label-studio-api-key {YOUR-LABEL-STUDIO-API-KEY}
```
3. After Label Studio deploys the model, the model endpoint is printed in the console.



            
