nvidia-pytriton


Name: nvidia-pytriton
Version: 0.5.5
Summary: PyTriton - Flask/FastAPI-like interface to simplify Triton's deployment in Python environments.
Upload time: 2024-04-14 20:10:44
Requires Python: <4,>=3.8
License: Apache 2.0
..
    Copyright (c) 2022, NVIDIA CORPORATION. All rights reserved.

    Licensed under the Apache License, Version 2.0 (the "License");
    you may not use this file except in compliance with the License.
    You may obtain a copy of the License at

        http://www.apache.org/licenses/LICENSE-2.0

    Unless required by applicable law or agreed to in writing, software
    distributed under the License is distributed on an "AS IS" BASIS,
    WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    See the License for the specific language governing permissions and
    limitations under the License.

PyTriton
==========

PyTriton - a Flask/FastAPI-like framework designed to streamline
the use of NVIDIA's `Triton Inference Server <https://github.com/triton-inference-server>`_.

For comprehensive guidance on how to deploy your models, optimize performance,
and explore the API, delve into the extensive resources found in our
`documentation <https://triton-inference-server.github.io/pytriton/>`_.

Features at a Glance
--------------------

The distinct capabilities of PyTriton are summarized in the feature matrix:

+------------------------+--------------------------------------------------------------------------------------+
| Feature                | Description                                                                          |
+========================+======================================================================================+
| Native Python support  | You can create any `Python function <https://triton-inference-server.github.io/pytri |
|                        | ton/latest/inference_callables/>`_ and expose it as an HTTP/gRPC API.                |
+------------------------+--------------------------------------------------------------------------------------+
| Framework-agnostic     | You can run any Python code with any framework of your choice, such as: PyTorch,     |
|                        | TensorFlow, or JAX.                                                                  |
+------------------------+--------------------------------------------------------------------------------------+
| Performance            | You can benefit from `dynamic batching <https://triton-inference-server.github.io/py |
| optimization           | triton/latest/inference_callables/decorators/#batch>`_, response cache, model        |
|                        | pipelining, `clusters <https://triton-inference-server.github.io/pytriton/latest/    |
|                        | guides/deploying_in_clusters/>`_, and GPU/CPU inference.                             |
+------------------------+--------------------------------------------------------------------------------------+
| Decorators             | You can use batching `decorators <https://triton-inference-server.github.io/pytriton |
|                        | /latest/inference_callables/decorators/>`_ to handle batching  and other             |
|                        | pre-processing tasks for your inference function.                                    |
+------------------------+--------------------------------------------------------------------------------------+
| Easy `installation     | You can use a simple and familiar interface based on Flask/FastAPI for easy          |
| <https://triton-infer  | installation and `setup <https://triton-inference-server.github.io/pytriton/latest/b |
| ence-server.github.io/ | inding_models/>`_.                                                                   |
| pytriton/latest/instal |                                                                                      |
| lation/>`_ and setup   |                                                                                      |
+------------------------+--------------------------------------------------------------------------------------+
| `Model clients         | You can access high-level model clients for HTTP/gRPC requests with configurable     |
| <https://triton-infer  | options and both synchronous and `asynchronous <https://triton-inference-server.gith |
| ence-server.github.io/ | ub.io/pytriton/latest/clients/#asynciomodelclient>`_  API.                           |
| pytriton/latest/clien  |                                                                                      |
| ts>`_                  |                                                                                      |
+------------------------+--------------------------------------------------------------------------------------+
| Streaming (alpha)      | You can stream partial responses from a model by serving it in a `decoupled mode     |
|                        | <https://triton-inference-server.github.io/pytriton/latest/clients/#decoupledmodelcl |
|                        | ient>`_.                                                                             |
+------------------------+--------------------------------------------------------------------------------------+

Learn more about PyTriton's `architecture <https://triton-inference-server.github.io/pytriton/latest/#architecture>`_.

Prerequisites
-------------

Before proceeding with the installation of PyTriton, ensure your system meets the following criteria:

- **Operating System**: Compatible with glibc version ``2.35`` or higher.

  - Primarily tested on Ubuntu 22.04.
  - Other supported operating systems include Debian 11+, Rocky Linux 9+, and Red Hat UBI 9+.
  - Use ``ldd --version`` to verify your glibc version (a Python-based check is also sketched after this list).

- **Python**: Version ``3.8`` or newer.
- **pip**: Version ``20.3`` or newer.
- **libpython**: Ensure ``libpython3.*.so`` is installed, corresponding to your Python version.
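
If you prefer to verify these requirements from Python rather than the shell, the minimal sketch below uses only the standard library; the `check_prerequisites` helper is illustrative and not part of PyTriton.

.. code-block:: python

    import sys
    import platform
    import sysconfig
    from importlib.metadata import version

    def check_prerequisites():
        # glibc version as reported by the C library (needs >= 2.35).
        libc, libc_version = platform.libc_ver()
        print(f"glibc: {libc} {libc_version}")
        # Python interpreter version (needs >= 3.8).
        print(f"Python: {sys.version_info.major}.{sys.version_info.minor}")
        # pip version (needs >= 20.3).
        print(f"pip: {version('pip')}")
        # Directory where libpython3.*.so is typically installed.
        print(f"LIBDIR: {sysconfig.get_config_var('LIBDIR')}")

    check_prerequisites()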

Install
-------

PyTriton can be installed from pypi.org by running the following command::

    pip install nvidia-pytriton

**Important**: The Triton Inference Server binary is installed as part of the PyTriton package.

Discover more about PyTriton's `installation procedures <https://triton-inference-server.github.io/pytriton/latest/installation/>`_, including Docker usage, prerequisites, and insights into `building binaries from source <https://triton-inference-server.github.io/pytriton/latest/guides/building/>`_ to match your specific Triton server versions.
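
After installation, a quick sanity check is to import the package and print the installed distribution version (a minimal sketch; it only assumes the wheel was installed into the active environment):

.. code-block:: python

    from importlib.metadata import version

    import pytriton  # the import itself confirms the package is importable

    # Prints the installed nvidia-pytriton version, for example 0.5.5.
    print(version("nvidia-pytriton"))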


Quick Start
-----------

The quick start shows how to run a Python model in Triton Inference Server without needing to change the current
working environment. The example uses a simple `Linear` model.

The `infer_fn` function takes a `data` tensor and returns a list with a single output tensor. The `@batch` decorator from the `batching decorators <https://triton-inference-server.github.io/pytriton/latest/inference_callables/decorators/>`_ handles batching for the model.

.. code-block:: python

    import numpy as np
    from pytriton.decorators import batch

    @batch
    def infer_fn(data):
        result = data * np.array([[-1]], dtype=np.float32)  # Process inputs and produce result
        return [result]


In the next step, you can create the binding between the inference callable and Triton Inference Server using the `bind` method from PyTriton. This method takes the model name, the inference callable, the input and output tensors, and an optional model configuration object.

.. code-block:: python

    from pytriton.model_config import Tensor
    from pytriton.triton import Triton
    triton = Triton()
    triton.bind(
        model_name="Linear",
        infer_func=infer_fn,
        inputs=[Tensor(name="data", dtype=np.float32, shape=(-1,)),],
        outputs=[Tensor(name="result", dtype=np.float32, shape=(-1,)),],
    )
    triton.run()
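
The optional model configuration object mentioned above can control, among other things, the maximum batch size used for `dynamic batching <https://triton-inference-server.github.io/pytriton/latest/inference_callables/decorators/#batch>`_. The sketch below is a hedged variant of the same binding: it passes a `ModelConfig` and runs Triton as a context manager, where `serve()` blocks until interrupted (the quick start keeps using the non-blocking `run()` so the client calls below can run in the same process).

.. code-block:: python

    import numpy as np
    from pytriton.model_config import ModelConfig, Tensor
    from pytriton.triton import Triton

    # Hedged variant: the context manager tears the server down automatically,
    # and max_batch_size caps the batches delivered to infer_fn.
    with Triton() as triton:
        triton.bind(
            model_name="Linear",
            infer_func=infer_fn,
            inputs=[Tensor(name="data", dtype=np.float32, shape=(-1,))],
            outputs=[Tensor(name="result", dtype=np.float32, shape=(-1,))],
            config=ModelConfig(max_batch_size=16),
        )
        triton.serve()  # blocks until interrupted, unlike run()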

Finally, you can send an inference query to the model using the `ModelClient` class. The `infer_sample` method takes the input data as a numpy array and returns the output data as a numpy array. You can learn more about the `ModelClient` class in the `clients <https://triton-inference-server.github.io/pytriton/latest/clients/>`_ section.

.. code-block:: python

    from pytriton.client import ModelClient

    client = ModelClient("localhost", "Linear")
    data = np.array([1, 2, ], dtype=np.float32)
    print(client.infer_sample(data=data))
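
As an aside, `ModelClient` also exposes an `infer_batch` method for inputs that carry a leading batch dimension; a short hedged sketch against the same running `Linear` model:

.. code-block:: python

    # Hedged sketch: each row is one sample; the expected result is the negated batch,
    # e.g. {'result': array([[-1., -2.], [-3., -4.]], dtype=float32)}.
    batch = np.array([[1, 2], [3, 4]], dtype=np.float32)
    print(client.infer_batch(data=batch))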

After the inference is done, you can stop the Triton Inference Server and close the client:

.. code-block:: python

    client.close()
    triton.stop()

The output of the `infer_sample` call should be:

.. code-block:: python

    {'result': array([-1., -2.], dtype=float32)}


For the full example, including defining the model and binding it to the Triton server, check out our detailed `Quick Start <https://triton-inference-server.github.io/pytriton/latest/quick_start/>`_ instructions. Get your model up and running, explore how to serve it, and learn how to `invoke it from client applications <https://triton-inference-server.github.io/pytriton/latest/clients/>`_.
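
Beyond the synchronous `ModelClient`, the `clients <https://triton-inference-server.github.io/pytriton/latest/clients/>`_ section also documents an asynchronous client. As a hedged sketch (assuming `AsyncioModelClient` mirrors the synchronous API), querying a running `Linear` model from asyncio code could look like this:

.. code-block:: python

    import asyncio

    import numpy as np
    from pytriton.client import AsyncioModelClient


    async def main():
        # Hedged sketch: the async context manager mirrors ModelClient's lifecycle.
        async with AsyncioModelClient("localhost", "Linear") as client:
            data = np.array([1, 2], dtype=np.float32)
            result = await client.infer_sample(data=data)
            print(result)

    asyncio.run(main())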


The full example code can be found in `examples/linear_random_pytorch <https://github.com/triton-inference-server/pytriton/tree/main/examples/linear_random_pytorch>`_.

Examples
--------

The `examples <https://triton-inference-server.github.io/pytriton/latest/examples/>`_ page showcases various use cases of serving models using PyTriton. This includes simple examples of running models in PyTorch, TensorFlow2, JAX, and plain Python. In addition, more advanced scenarios are covered, such as online learning, multi-node models, and deployment on Kubernetes using PyTriton. Each example is accompanied by instructions on how to build and run it. Discover more about utilizing PyTriton by exploring our examples.


Links
-------

* `Source <https://github.com/triton-inference-server/pytriton>`_
* `Issues <https://github.com/triton-inference-server/pytriton/issues>`_
* `Changelog <https://github.com/triton-inference-server/pytriton/blob/main/CHANGELOG.md>`_
* `Known Issues <https://github.com/triton-inference-server/pytriton/blob/main/docs/known_issues.md>`_
* `Contributing <https://github.com/triton-inference-server/pytriton/blob/main/CONTRIBUTING.md>`_

            
