# HuggingFace runtime for MLServer
This package provides an MLServer runtime compatible with HuggingFace Transformers.
## Usage
You can install the runtime, alongside `mlserver`, as:
```bash
pip install mlserver mlserver-huggingface
```
For further information on how to use MLServer with HuggingFace, see this
[worked-out example](../../docs/examples/huggingface/README.md).
## Content Types
The HuggingFace runtime will always decode the input request using its own
built-in codec.
Therefore, [content type annotations](../../docs/user-guide/content-type) at
the request level will **be ignored**.
Note that this **doesn't include [input-level content
type](../../docs/user-guide/content-type#Codecs) annotations**, which will be
respected as usual.
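As a rough illustration of the distinction, the sketch below builds an inference request that carries an annotation at both levels. The `build_request` helper is hypothetical, and the `content_type` values (`np` and `str`) are assumptions chosen for illustration; the input name `text_inputs` is borrowed from the inference example later in this document.

```python
import json


def build_request(texts):
    """Build an inference request annotated at request level and input level."""
    return {
        # Request-level annotation: ignored by the HuggingFace runtime.
        "parameters": {"content_type": "np"},
        "inputs": [
            {
                "name": "text_inputs",
                "shape": [len(texts)],
                "datatype": "BYTES",
                "data": list(texts),
                # Input-level annotation: respected as usual.
                "parameters": {"content_type": "str"},
            }
        ],
    }


print(json.dumps(build_request(["Tell me a story:"]), indent=2))
```

Only the annotation nested inside the input entry would be expected to influence decoding.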
## Settings
The HuggingFace runtime exposes a couple of extra parameters which can be used to
customise how the runtime behaves.
These settings can be added under the `parameters.extra` section of your
`model-settings.json` file, e.g.
```{code-block} json
---
emphasize-lines: 5-8
---
{
  "name": "qa",
  "implementation": "mlserver_huggingface.HuggingFaceRuntime",
  "parameters": {
    "extra": {
      "task": "question-answering",
      "optimum_model": true
    }
  }
}
```
````{note}
These settings can also be injected through environment variables prefixed with `MLSERVER_MODEL_HUGGINGFACE_`, e.g.
```bash
MLSERVER_MODEL_HUGGINGFACE_TASK="question-answering"
MLSERVER_MODEL_HUGGINGFACE_OPTIMUM_MODEL=true
```
````
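The naming convention can be sketched in a few lines of Python. This is only an illustration of how a prefixed variable name corresponds to a setting name, not MLServer's actual loading code; the `extra_settings_from_env` helper is hypothetical, and the values stay as plain strings here, whereas the runtime is responsible for parsing them into typed settings.

```python
PREFIX = "MLSERVER_MODEL_HUGGINGFACE_"


def extra_settings_from_env(environ):
    """Collect prefixed variables into a dict keyed by lower-cased setting name."""
    return {
        key[len(PREFIX):].lower(): value
        for key, value in environ.items()
        if key.startswith(PREFIX)
    }


# A stand-in for os.environ, so the example does not depend on the real environment.
example_env = {
    "MLSERVER_MODEL_HUGGINGFACE_TASK": "question-answering",
    "MLSERVER_MODEL_HUGGINGFACE_OPTIMUM_MODEL": "true",
    "HOME": "/root",
}

print(extra_settings_from_env(example_env))
```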
### Loading models
#### Local models
It is possible to load a local model into a HuggingFace pipeline by specifying the model artefact folder path in `parameters.uri` in `model-settings.json`.
#### HuggingFace models
Models in the HuggingFace hub can be loaded by specifying their name in `parameters.extra.pretrained_model` in `model-settings.json`.
````{note}
If `parameters.extra.pretrained_model` is specified, it takes precedence over `parameters.uri`.
````
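The two loading options can be compared side by side with a small sketch that generates the corresponding `model-settings.json` content. The helper names, the model name `distilgpt2`, the `text-generation` task and the local path `./artefacts/my-model` are all assumptions made for illustration; only the `parameters.uri` and `parameters.extra.pretrained_model` fields come from the text above.

```python
import json

RUNTIME = "mlserver_huggingface.HuggingFaceRuntime"


def local_model_settings(name, task, uri):
    """Settings for a model loaded from a local artefact folder."""
    return {
        "name": name,
        "implementation": RUNTIME,
        "parameters": {"uri": uri, "extra": {"task": task}},
    }


def hub_model_settings(name, task, pretrained_model):
    """Settings for a model loaded by name from the HuggingFace hub."""
    return {
        "name": name,
        "implementation": RUNTIME,
        "parameters": {
            "extra": {"task": task, "pretrained_model": pretrained_model}
        },
    }


# Hypothetical values; replace them with your own model and task.
local = local_model_settings("text-gen", "text-generation", "./artefacts/my-model")
hub = hub_model_settings("text-gen", "text-generation", "distilgpt2")

print(json.dumps(local, indent=2))
print(json.dumps(hub, indent=2))
```

Either dictionary could be written to `model-settings.json` with `json.dump`. If both fields were present in the same file, `pretrained_model` would win, as noted above.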
### Model inference
Model inference is performed by a HuggingFace pipeline, which lets you run inference on a batch of inputs. Extra inference keyword arguments can be passed in the `parameters.extra` field of the inference request, e.g.
```{code-block} json
{
  "inputs": [
    {
      "name": "text_inputs",
      "shape": [2],
      "datatype": "BYTES",
      "data": ["My kitten's name is JoJo,", "Tell me a story:"]
    }
  ],
  "parameters": {
    "extra": {"max_new_tokens": 200, "return_full_text": false}
  }
}
```
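A request like the one above could be sent from Python along the following lines. This is a minimal sketch using only the standard library: the `build_payload` and `infer` helpers, the model name `text-gen` and the address `http://localhost:8080` are assumptions, and the endpoint path follows the V2 inference protocol that MLServer serves.

```python
import json
import urllib.request


def build_payload(prompts, **inference_kwargs):
    """Build an inference request with extra pipeline kwargs under parameters.extra."""
    return {
        "inputs": [
            {
                "name": "text_inputs",
                "shape": [len(prompts)],
                "datatype": "BYTES",
                "data": list(prompts),
            }
        ],
        "parameters": {"extra": inference_kwargs},
    }


def infer(payload, model_name="text-gen", base_url="http://localhost:8080"):
    """POST the payload to a running MLServer instance (assumed address and model name)."""
    request = urllib.request.Request(
        f"{base_url}/v2/models/{model_name}/infer",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())


payload = build_payload(
    ["My kitten's name is JoJo,", "Tell me a story:"],
    max_new_tokens=200,
    return_full_text=False,
)
print(json.dumps(payload, indent=2))
```

With a server running, calling `infer(payload)` would return the decoded JSON response; the call is left out of the script so that it can be run without a live server.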
### Reference
You can find the full reference of the accepted extra settings for the
HuggingFace runtime below:
```{eval-rst}
.. autopydantic_settings:: mlserver_huggingface.settings.HuggingFaceSettings
```
Raw data
{
"_id": null,
"home_page": "",
"name": "mlserver-huggingface-striveworks",
"maintainer": "",
"docs_url": null,
"requires_python": ">=3.8.1,<3.12",
"maintainer_email": "",
"keywords": "",
"author": "Seldon Technologies Ltd.",
"author_email": "hello@seldon.io",
"download_url": "https://files.pythonhosted.org/packages/e2/0f/ed8b7bbed00fc75147acf680b93dc9c041db940f5ca3dfc27688a0513d4c/mlserver_huggingface_striveworks-1.4.0.dev3.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": "Apache-2.0",
"summary": "HuggingFace runtime for MLServer",
"version": "1.4.0.dev3",
"project_urls": null,
"split_keywords": [],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "ae4cdca72cc4c921e13741cc5a0906142453af431f408b03e9604dd8397836c5",
"md5": "ccaf5ea5410d6fe997de4d22e4feaa9f",
"sha256": "fe8d7488082647c351e0afb49e38db25d2e9a8485028f346545e4a97aec0cbd7"
},
"downloads": -1,
"filename": "mlserver_huggingface_striveworks-1.4.0.dev3-py3-none-any.whl",
"has_sig": false,
"md5_digest": "ccaf5ea5410d6fe997de4d22e4feaa9f",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.8.1,<3.12",
"size": 22250,
"upload_time": "2024-02-08T22:30:38",
"upload_time_iso_8601": "2024-02-08T22:30:38.710274Z",
"url": "https://files.pythonhosted.org/packages/ae/4c/dca72cc4c921e13741cc5a0906142453af431f408b03e9604dd8397836c5/mlserver_huggingface_striveworks-1.4.0.dev3-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "e20fed8b7bbed00fc75147acf680b93dc9c041db940f5ca3dfc27688a0513d4c",
"md5": "116efcc0885759756aea63fe716ad491",
"sha256": "8376a60201e3f664a0e59ce3eb83e226dbaa94043840a3b64a67ceadb90b49fc"
},
"downloads": -1,
"filename": "mlserver_huggingface_striveworks-1.4.0.dev3.tar.gz",
"has_sig": false,
"md5_digest": "116efcc0885759756aea63fe716ad491",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.8.1,<3.12",
"size": 16278,
"upload_time": "2024-02-08T22:30:40",
"upload_time_iso_8601": "2024-02-08T22:30:40.059165Z",
"url": "https://files.pythonhosted.org/packages/e2/0f/ed8b7bbed00fc75147acf680b93dc9c041db940f5ca3dfc27688a0513d4c/mlserver_huggingface_striveworks-1.4.0.dev3.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2024-02-08 22:30:40",
"github": false,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"lcname": "mlserver-huggingface-striveworks"
}