<picture>
<source media="(prefers-color-scheme: dark)" srcset="https://github.com/bentoml/BentoML/assets/489344/d3e6c95d-d224-49a5-9cff-0789f094e127">
<source media="(prefers-color-scheme: light)" srcset="https://github.com/bentoml/BentoML/assets/489344/de4da660-6aeb-4e5a-bf76-b7177435444d">
<img alt="BentoML: Unified Model Serving Framework" src="https://github.com/bentoml/BentoML/assets/489344/de4da660-6aeb-4e5a-bf76-b7177435444d" width="370" style="max-width: 100%;">
</picture>
## Unified Model Serving Framework
🍱 Build model inference APIs and multi-model serving systems with any open-source or custom AI models. 👉 [Join our Slack community!](https://l.bentoml.com/join-slack)
## What is BentoML?
BentoML is a Python library for building online serving systems optimized for AI apps and model inference.
- **🍱 Easily build APIs for Any AI/ML Model.** Turn any model inference script into a REST API server with just a few lines of code and standard Python type hints.
- **🐳 Docker Containers made simple.** No more dependency hell! Manage your environments, dependencies and model versions with a simple config file. BentoML automatically generates Docker images, ensures reproducibility, and simplifies how you deploy to different environments.
- **🧭 Maximize CPU/GPU utilization.** Build high-performance inference APIs using built-in serving optimizations such as dynamic batching, model parallelism, multi-stage pipelines, and multi-model inference-graph orchestration.
- **👩‍💻 Fully customizable.** Easily implement your own APIs or task queues with custom business logic, model inference, and multi-model composition. Supports any ML framework, modality, and inference runtime.
- **🚀 Ready for Production.** Develop, run and debug locally. Seamlessly deploy to production with Docker containers or [BentoCloud](https://www.bentoml.com/).
## Getting started
Install BentoML:
```bash
# Requires Python≥3.9
pip install -U bentoml
```
Define APIs in a `service.py` file.
```python
from __future__ import annotations

import bentoml

@bentoml.service(
    resources={"cpu": "4"}
)
class Summarization:
    def __init__(self) -> None:
        import torch
        from transformers import pipeline

        device = "cuda" if torch.cuda.is_available() else "cpu"
        self.pipeline = pipeline('summarization', device=device)

    @bentoml.api(batchable=True)
    def summarize(self, texts: list[str]) -> list[str]:
        results = self.pipeline(texts)
        return [item['summary_text'] for item in results]
```
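
The `batchable=True` flag marks `summarize` as accepting a list of inputs, so inputs from several requests can be merged into one model call. The sketch below illustrates that idea in plain Python. It is not BentoML's implementation: `merge_requests`, `split_results`, `fake_summarize`, and the `max_batch_size` argument are hypothetical names used only for illustration.

```python
from __future__ import annotations


def merge_requests(
    requests: list[list[str]], max_batch_size: int
) -> tuple[list[str], list[int]]:
    """Flatten per-request inputs into one batch, remembering each request's size."""
    batch: list[str] = []
    sizes: list[int] = []
    for texts in requests:
        if len(batch) + len(texts) > max_batch_size:
            break  # leave the remaining requests for the next batch
        batch.extend(texts)
        sizes.append(len(texts))
    return batch, sizes


def split_results(results: list[str], sizes: list[int]) -> list[list[str]]:
    """Slice the batched output back into one result list per request."""
    out: list[list[str]] = []
    start = 0
    for size in sizes:
        out.append(results[start:start + size])
        start += size
    return out


def fake_summarize(texts: list[str]) -> list[str]:
    """Stand-in for the model call (hypothetical); truncates each text."""
    return [text[:10] for text in texts]


# Two pending requests: one with a single text, one with two texts.
pending = [["first long document"], ["second document", "third document"]]
batch, sizes = merge_requests(pending, max_batch_size=8)
per_request = split_results(fake_summarize(batch), sizes)
```

The model runs once on three texts, and each caller still receives only its own results.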
Run the service code locally (serving at http://localhost:3000 by default):
```bash
pip install torch transformers # additional dependencies for local run
bentoml serve
```
Now you can run inference from your browser at http://localhost:3000 or with a Python script:
```python
import bentoml
with bentoml.SyncHTTPClient('http://localhost:3000') as client:
    summarized_text: str = client.summarize([bentoml.__doc__])[0]
    print(f"Result: {summarized_text}")
```
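
The endpoint can also be called without the BentoML client. The sketch below assumes the service exposes `summarize` as a `POST /summarize` route that accepts a JSON object keyed by the parameter name (`texts`). `build_summarize_request` is a hypothetical helper, and only the standard library is used.

```python
import json
import urllib.request


def build_summarize_request(base_url: str, texts: list[str]) -> urllib.request.Request:
    """Build a POST request for the summarize endpoint (route name is assumed)."""
    payload = json.dumps({"texts": texts}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/summarize",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_summarize_request(
    "http://localhost:3000", ["Some long text to summarize."]
)

# Sending the request requires the service from service.py to be running:
# with urllib.request.urlopen(request) as response:
#     print(json.loads(response.read()))
```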
### Deploying your first Bento
To deploy your BentoML Service code, configure the [runtime environment](https://docs.bentoml.com/en/latest/build-with-bentoml/runtime-environment.html) in `service.py`.
```diff
+ my_image = bentoml.images.PythonImage(python_version="3.11") \
+     .requirements_file("requirements.txt")

  @bentoml.service(
+     image=my_image,
      resources={"cpu": "4"},
      traffic={"timeout": 30},
  )
```
Then, choose one of the following ways for deployment:
<details>
<summary>🐳 Docker Container</summary>
Run `bentoml build` to package the necessary code, models, and dependency configs into a Bento, the standardized deployable artifact in BentoML:
```bash
bentoml build
```
Ensure [Docker](https://docs.docker.com/) is running. Generate a Docker container image for deployment:
```bash
bentoml containerize summarization:latest
```
Run the generated image:
```bash
docker run --rm -p 3000:3000 summarization:latest
```
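
The three steps above can be chained in a small script. This is a sketch under assumptions: it shells out to the same `bentoml` and `docker` commands shown above, `docker_deploy_commands` and `run_all` are hypothetical helpers, and the `summarization:latest` tag is the one used in this example.

```python
import subprocess


def docker_deploy_commands(bento_tag: str, port: int = 3000) -> list[list[str]]:
    """Return the build, containerize, and run commands in order."""
    return [
        ["bentoml", "build"],
        ["bentoml", "containerize", bento_tag],
        ["docker", "run", "--rm", "-p", f"{port}:{port}", bento_tag],
    ]


def run_all(commands: list[list[str]]) -> None:
    """Run each command in sequence, stopping at the first failure."""
    for command in commands:
        subprocess.run(command, check=True)


commands = docker_deploy_commands("summarization:latest")
# run_all(commands)  # uncomment with BentoML installed and Docker running
```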
</details>
<details>
<summary>☁️ BentoCloud</summary>
[BentoCloud](https://www.bentoml.com/) provides compute infrastructure for rapid and reliable GenAI adoption. It speeds up your BentoML development process by leveraging cloud compute resources, and simplifies how you deploy, scale, and operate BentoML in production.
[Sign up for BentoCloud](https://cloud.bentoml.com/signup) for personal access; for enterprise use cases, [contact our team](https://www.bentoml.com/contact).
```bash
# After signup, run the following command to create an API token:
bentoml cloud login
# Deploy from current directory:
bentoml deploy
```

</details>
For detailed explanations, read the [Hello World example](https://docs.bentoml.com/en/latest/get-started/hello-world.html).
## Examples
- LLMs: [Llama 3.2](https://github.com/bentoml/BentoVLLM/tree/main/llama3.2-90b-instruct), [Mixtral](https://github.com/bentoml/BentoVLLM/tree/main/mixtral-8x7b-instruct), [Solar](https://github.com/bentoml/BentoVLLM/tree/main/solar-10.7b-instruct), and [Mistral](https://github.com/bentoml/BentoVLLM/tree/main/mistral-7b-instruct).
- Image Generation: [Stable Diffusion 3 Medium](https://github.com/bentoml/BentoDiffusion/tree/main/sd3-medium), [Stable Video Diffusion](https://github.com/bentoml/BentoDiffusion/tree/main/svd), [Stable Diffusion XL Turbo](https://github.com/bentoml/BentoDiffusion/tree/main/sdxl-turbo), [ControlNet](https://github.com/bentoml/BentoDiffusion/tree/main/controlnet), and [LCM LoRAs](https://github.com/bentoml/BentoDiffusion/tree/main/lcm).
- Embeddings: [SentenceTransformers](https://github.com/bentoml/BentoSentenceTransformers) and [ColPali](https://github.com/bentoml/BentoColPali)
- Audio: [ChatTTS](https://github.com/bentoml/BentoChatTTS), [XTTS](https://github.com/bentoml/BentoXTTS), [WhisperX](https://github.com/bentoml/BentoWhisperX), [Bark](https://github.com/bentoml/BentoBark)
- Computer Vision: [YOLO](https://github.com/bentoml/BentoYolo) and [ResNet](https://github.com/bentoml/BentoResnet)
- Advanced examples: [Function calling](https://github.com/bentoml/BentoFunctionCalling), [LangGraph](https://github.com/bentoml/BentoLangGraph), [CrewAI](https://github.com/bentoml/BentoCrewAI)
Check out the [full list](https://docs.bentoml.com/en/latest/examples/overview.html) for more sample code and usage.
## Advanced topics
- [Model composition](https://docs.bentoml.com/en/latest/get-started/model-composition.html)
- [Workers and model parallelization](https://docs.bentoml.com/en/latest/build-with-bentoml/parallelize-requests.html)
- [Adaptive batching](https://docs.bentoml.com/en/latest/get-started/adaptive-batching.html)
- [GPU inference](https://docs.bentoml.com/en/latest/build-with-bentoml/gpu-inference.html)
- [Distributed serving systems](https://docs.bentoml.com/en/latest/build-with-bentoml/distributed-services.html)
- [Concurrency and autoscaling](https://docs.bentoml.com/en/latest/scale-with-bentocloud/scaling/autoscaling.html)
- [Model loading and Model Store](https://docs.bentoml.com/en/latest/build-with-bentoml/model-loading-and-management.html)
- [Observability](https://docs.bentoml.com/en/latest/build-with-bentoml/observability/index.html)
- [BentoCloud deployment](https://docs.bentoml.com/en/latest/get-started/cloud-deployment.html)
See [Documentation](https://docs.bentoml.com) for more tutorials and guides.
## Community
Get involved and join our [Community Slack 💬](https://l.bentoml.com/join-slack), where thousands of AI/ML engineers help each other, contribute to the project, and talk about building AI products.
To report a bug or request a feature, use
[GitHub Issues](https://github.com/bentoml/BentoML/issues/new/choose).
### Contributing
There are many ways to contribute to the project:
- Report bugs and "Thumbs up" on [issues](https://github.com/bentoml/BentoML/issues) that are relevant to you.
- Investigate [issues](https://github.com/bentoml/BentoML/issues) and review other developers' [pull requests](https://github.com/bentoml/BentoML/pulls).
- Contribute code or [documentation](https://docs.bentoml.com/en/latest/index.html) to the project by submitting a GitHub pull request.
- Check out the [Contributing Guide](https://github.com/bentoml/BentoML/blob/main/CONTRIBUTING.md) and [Development Guide](https://github.com/bentoml/BentoML/blob/main/DEVELOPMENT.md) to learn more.
- Share your feedback and discuss roadmap plans in the `#bentoml-contributors` channel [here](https://l.bentoml.com/join-slack).
Thanks to all of our amazing contributors!
<a href="https://github.com/bentoml/BentoML/graphs/contributors">
<img src="https://contrib.rocks/image?repo=bentoml/BentoML" />
</a>
### Usage tracking and feedback
The BentoML framework collects anonymous usage data that helps our community improve the product. Only BentoML's internal API calls are reported. This excludes any sensitive information, such as user code, model data, model names, or stack traces. Here's the [code](https://github.com/bentoml/BentoML/blob/main/src/bentoml/_internal/utils/analytics/usage_stats.py) used for usage tracking. You can opt out of usage tracking with the `--do-not-track` CLI option:
```bash
bentoml [command] --do-not-track
```
Or by setting the environment variable:
```bash
export BENTOML_DO_NOT_TRACK=True
```
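
When BentoML is launched from a Python script, the same variable can be passed to the child process. A minimal sketch, assuming the variable is read by the `bentoml` process at startup. `env_without_tracking` is a hypothetical helper:

```python
import os


def env_without_tracking(base_env=None) -> dict:
    """Copy an environment mapping and add the opt-out variable."""
    env = dict(os.environ if base_env is None else base_env)
    env["BENTOML_DO_NOT_TRACK"] = "True"
    return env


env = env_without_tracking({"PATH": "/usr/bin"})

# Pass the mapping to the child process, for example:
# import subprocess
# subprocess.run(["bentoml", "serve"], env=env_without_tracking(), check=True)
```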
### License
[Apache License 2.0](https://github.com/bentoml/BentoML/blob/main/LICENSE)