mlflow-tritonserver

Name: mlflow-tritonserver
Version: 1.1.0
Summary: Tritonserver Mlflow Deployment
Upload time: 2023-07-14 11:07:37
Requires Python: >=3.7
License: Apache 2.0
Keywords: machine-learning, deep-learning, inference, tritonserver, mlflow
Project URLs: https://github.com/msclock/mlflow-tritonserver.git (source), https://github.com/msclock/mlflow-tritonserver/issues (tracker)
# MLflow Tritonserver

MLflow plugin for deploying your models from MLflow to Triton Inference Server.
Scripts are included for publishing models that follow the Triton model
repository structure to your MLflow Model Registry.

### Supported flavors

MLflow Tritonserver currently supports the following flavors; substitute the
flavor specification in the examples below according to the model to be
deployed.

* onnx
* triton

## Requirements

* MLflow
* Triton Python HTTP client
* Triton Inference Server

## Installation

The plugin can be installed from PyPI with the following command:

```
pip install mlflow_tritonserver
```

## Quick Start

In this documentation, we will use the files in `examples` to showcase how
the plugin interacts with Triton Inference Server. The `onnx_float32_int32_int32`
model in `examples` is a simple model that takes two float32 inputs, INPUT0 and
INPUT1, with shape [-1, 16], and produces two int32 outputs, OUTPUT0 and
OUTPUT1, where OUTPUT0 is the element-wise sum of INPUT0 and INPUT1 and
OUTPUT1 is their element-wise difference.
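
For reference, the model's behavior can be sketched in plain NumPy; the input
values below are arbitrary and only illustrate the expected shapes and
arithmetic:

```
import numpy as np

# Two float32 inputs with shape [-1, 16]; a batch of one row is used here.
input0 = np.random.rand(1, 16).astype(np.float32)
input1 = np.random.rand(1, 16).astype(np.float32)

# What the deployed model is expected to produce for these inputs.
output0 = (input0 + input1).astype(np.int32)  # element-wise sum
output1 = (input0 - input1).astype(np.int32)  # element-wise difference
```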

### Start Triton Inference Server in EXPLICIT mode

The MLflow Triton plugin requires a running Triton server; see the Triton
Inference Server
[documentation](https://github.com/triton-inference-server/server/blob/main/docs/getting_started/quickstart.md)
for how to start one. Note that the server should be run in EXPLICIT mode
(`--model-control-mode=explicit`) so that the plugin's deployment features can
load and unload models.

Once the server has started, the following environment variables must be set so
that the plugin can interact with the server properly:
* `TRITON_URL`: The address of the Triton HTTP endpoint
* `TRITON_MODEL_REPO`: The path to the Triton model repository. It can be an S3
  URI, but keep in mind that the environment variables `AWS_ACCESS_KEY_ID` and
  `AWS_SECRET_ACCESS_KEY` are then required.
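
For example, when using the Python API the variables can be set in-process (a
minimal sketch; the URL and repository path are placeholders for your own
deployment). When using the CLI, export them in your shell instead:

```
import os

# Point the plugin at the running Triton server and its model repository.
os.environ["TRITON_URL"] = "localhost:8000"                      # Triton HTTP endpoint
os.environ["TRITON_MODEL_REPO"] = "/opt/triton/model_repository"

# If TRITON_MODEL_REPO is an S3 URI, credentials are also required:
# os.environ["AWS_ACCESS_KEY_ID"] = "..."
# os.environ["AWS_SECRET_ACCESS_KEY"] = "..."
```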

### Publish models to MLflow

#### ONNX flavor

MLflow's built-in ONNX functionality can be used to publish `onnx` flavor
models to MLflow directly, and the MLflow Tritonserver plugin will prepare the
model in the format expected by Triton. You may also log a
[`config.pbtxt`](https://github.com/triton-inference-server/server/blob/main/docs/protocol/extension_model_configuration.md)
as an additional artifact, which Triton will use to serve the model. Otherwise,
the server should be run with the auto-complete feature enabled
(`--strict-model-config=false`) so that it generates the model configuration
itself.

```
import mlflow.onnx
import onnx

# Load the example ONNX model and log it to MLflow, registering it in the
# Model Registry under the name Triton will serve it as.
model = onnx.load("examples/onnx_float32_int32_int32/1/model.onnx")
mlflow.onnx.log_model(model, "triton", registered_model_name="onnx_float32_int32_int32")
```
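
If you prefer a hand-written configuration over the server's auto-complete
feature, the `config.pbtxt` can be logged in the same run as an additional
artifact. This is only a sketch; the local path of the config file and the
`artifact_path` it is logged under are assumptions, not values prescribed by
the plugin:

```
import mlflow
import mlflow.onnx
import onnx

model = onnx.load("examples/onnx_float32_int32_int32/1/model.onnx")

with mlflow.start_run():
    mlflow.onnx.log_model(model, "triton", registered_model_name="onnx_float32_int32_int32")
    # Hypothetical local config.pbtxt, logged alongside the model artifacts.
    mlflow.log_artifact("path/to/config.pbtxt", artifact_path="triton")
```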

#### Triton flavor

For model frameworks that Triton supports but that are not yet recognized by
the MLflow Tritonserver plugin, the CLI `mlflow_tritonserver_cli` can be used
to publish `triton` flavor models to MLflow. A `triton` flavor model is a
directory containing the model files, following the Triton
[model layout](https://github.com/triton-inference-server/server/blob/main/docs/user_guide/model_repository.md#repository-layout).
Below is an example usage:

```
mlflow_tritonserver_cli --model_name onnx_float32_int32_int32 --model_directory <path-to-the-examples-directory>/onnx_float32_int32_int32 --flavor triton
```
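
For the example above, the directory passed to `--model_directory` follows the
standard Triton repository layout, roughly as shown below (the `config.pbtxt`
can be omitted if the server auto-completes the configuration):

```
onnx_float32_int32_int32/
├── config.pbtxt
└── 1/
    └── model.onnx
```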

### Deploy models tracked in MLflow to Triton

Once a model is published and tracked in MLflow, it can be deployed to Triton
via MLflow's deployments command. The following command will download the model
to Triton's model repository and request Triton to load it.

```
mlflow deployments create -t triton --flavor triton --name onnx_float32_int32_int32 -m models:/onnx_float32_int32_int32/1
```

### Perform inference

After the model is deployed, the following CLI command sends an inference
request to the deployment:

```
mlflow deployments predict -t triton --name onnx_float32_int32_int32 --input-path <path-to-the-examples-directory>/input.json --output-path output.json
```

The inference result will be written to `output.json`, and you can compare it
with the results in `expected_output.json`.
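
A quick way to compare the two files, assuming both are plain JSON documents,
is a direct equality check:

```
import json

with open("output.json") as f_actual, open("expected_output.json") as f_expected:
    actual = json.load(f_actual)
    expected = json.load(f_expected)

print("outputs match" if actual == expected else "outputs differ")
```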

## MLflow Deployments

"MLflow Deployments" is a set of MLflow APIs for deploying MLflow models to
custom serving tools. The MLflow Triton plugin implements the following
deployment functions to support the interaction with Triton server in MLflow.

### Create Deployment

The MLflow deployments create API deploys a model to the Triton target: it
downloads the model to Triton's model repository and requests Triton to load
it.

To create an MLflow deployment using the CLI:

```
mlflow deployments create -t triton --flavor triton --name model_name -m models:/model_name/1
```

To create an MLflow deployment using the Python API:

```
from mlflow.deployments import get_deploy_client
client = get_deploy_client('triton')
client.create_deployment("model_name", "models:/model_name/1", flavor="triton")
```

### Delete Deployment

The MLflow deployments delete API removes an existing deployment from the
Triton target: it removes the model from Triton's model repository and requests
Triton to unload it.

To delete an MLflow deployment using the CLI:

```
mlflow deployments delete -t triton --name model_name
```

To delete an MLflow deployment using the Python API:

```
from mlflow.deployments import get_deploy_client
client = get_deploy_client('triton')
client.delete_deployment("model_name")
```

### Update Deployment

The MLflow deployments update API updates an existing deployment with another
model (version) tracked in MLflow: it overwrites the model in Triton's model
repository and requests Triton to reload it.

To update an MLflow deployment using the CLI:

```
mlflow deployments update -t triton --flavor triton --name model_name -m models:/model_name/2
```

To update an MLflow deployment using the Python API:

```
from mlflow.deployments import get_deploy_client
client = get_deploy_client('triton')
client.update_deployment("model_name", "models:/model_name/2", flavor="triton")
```

### List Deployments

The MLflow deployments list API lists all existing deployments in the Triton target.

To list all MLflow deployments using the CLI:

```
mlflow deployments list -t triton
```

To list all MLflow deployments using the Python API:

```
from mlflow.deployments import get_deploy_client
client = get_deploy_client('triton')
client.list_deployments()
```

### Get Deployment

The MLflow deployments get API returns information about a specific deployment
in the Triton target.

To get a specific MLflow deployment using the CLI:
```
mlflow deployments get -t triton --name model_name
```

To get a specific MLflow deployment using the Python API:
```
from mlflow.deployments import get_deploy_client
client = get_deploy_client('triton')
client.get_deployment("model_name")
```

### Run Inference on Deployments

The MLflow deployments predict API runs inference by preparing and sending a
request to Triton, and it returns Triton's response.

To run inference using the CLI:

```
mlflow deployments predict -t triton --name model_name --input-path input_file --output-path output_file
```

To run inference using the Python API:

```
from mlflow.deployments import get_deploy_client
client = get_deploy_client('triton')
# `inputs` holds the request data for the deployed model (see the sketch below).
client.predict("model_name", inputs)
```
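
The `inputs` argument above is the request payload. As a sketch, assuming the
plugin accepts a mapping of input tensor names to NumPy arrays (check the
plugin source for the exact accepted types), a request against the quick-start
model could look like this:

```
import numpy as np
from mlflow.deployments import get_deploy_client

client = get_deploy_client('triton')

# Hypothetical payload for the onnx_float32_int32_int32 example model.
inputs = {
    "INPUT0": np.random.rand(1, 16).astype(np.float32),
    "INPUT1": np.random.rand(1, 16).astype(np.float32),
}
result = client.predict("onnx_float32_int32_int32", inputs)
```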

            
