<picture>
<source media="(prefers-color-scheme: dark)" srcset="docs/_static/RoboML_DARK.png">
<source media="(prefers-color-scheme: light)" srcset="docs/_static/RoboML_LIGHT.png">
<img alt="RoboML Logo" src="docs/_static/RoboML_LIGHT.png" width="40%">
</picture>
<br/>
[![中文版本][cn-badge]][cn-url]
[![PyPI][pypi-badge]][pypi-url]
[![MIT licensed][mit-badge]][mit-url]
[![Python Version][python-badge]][python-url]
[cn-badge]: https://img.shields.io/badge/文档-中文-blue.svg
[cn-url]: docs/README.zh-CN.md
[pypi-badge]: https://img.shields.io/pypi/v/roboml.svg
[pypi-url]: https://pypi.org/project/roboml/
[mit-badge]: https://img.shields.io/pypi/l/roboml.svg
[mit-url]: https://github.com/automatika-robotics/roboml/blob/main/LICENSE
[python-badge]: https://img.shields.io/pypi/pyversions/roboml.svg
[python-url]: https://www.python.org/downloads/
RoboML is an aggregator package for quickly deploying open-source ML models for robots. It supports three main use cases:
- **Rapid deployment of general-purpose models:** Wraps around popular ML libraries like 🤗 [**Transformers**](https://github.com/huggingface/transformers), allowing fast deployment of models through scalable server endpoints.
- **Deploy detection models with tracking:** Supports deployment of all detection models in [**MMDetection**](https://github.com/open-mmlab/mmdetection) with optional tracking integration.
- **Aggregate robot-specific models from the robotics community:** Intended as a platform for community-contributed multimodal models, usable in planning and control, especially with ROS components. See [EmbodiedAgents](https://automatika-robotics.github.io/embodied-agents).
## Models And Wrappers
| **Model Class** | **Description** | **Default Checkpoint / Resource** | **Key Init Parameters** |
| ------------------ | --------------------------------------------------------------------------------------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ | -------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `TransformersLLM` | General-purpose large language model (LLM) from [🤗 Transformers](https://github.com/huggingface/transformers) | [`microsoft/Phi-3-mini-4k-instruct`](https://huggingface.co/models?other=LLM) | `name`, `checkpoint`, `quantization`, `init_timeout` |
| `TransformersMLLM` | Multimodal vision-language model (MLLM) from [🤗 Transformers](https://github.com/huggingface/transformers) | [`HuggingFaceM4/idefics2-8b`](https://huggingface.co/models?pipeline_tag=image-text-to-text) | `name`, `checkpoint`, `quantization`, `init_timeout` |
| `RoboBrain2` | Embodied planning + multimodal reasoning via [RoboBrain 2.0](https://github.com/FlagOpen/RoboBrain2.0) | [`BAAI/RoboBrain2.0-7B`](https://huggingface.co/collections/BAAI/robobrain20-6841eeb1df55c207a4ea0036) | `name`, `checkpoint`, `init_timeout` |
| `Whisper` | Multilingual speech-to-text (ASR) from [OpenAI Whisper](https://openai.com/index/whisper) | `small.en` ([checkpoint list](https://github.com/SYSTRAN/faster-whisper/blob/d3bfd0a305eb9d97c08047c82149c1998cc90fcb/faster_whisper/transcribe.py#L606)) | `name`, `checkpoint`, `compute_type`, `init_timeout` |
| `SpeechT5` | Text-to-speech model from [Microsoft SpeechT5](https://github.com/microsoft/SpeechT5) | `microsoft/speecht5_tts` | `name`, `checkpoint`, `voice`, `init_timeout` |
| `Bark` | Text-to-speech model from [SunoAI Bark](https://github.com/suno-ai/bark) | [`suno/bark-small`](https://huggingface.co/collections/suno/bark-6502bdd89a612aa33a111bae), [voice options](https://suno-ai.notion.site/8b8e8749ed514b0cbf3f699013548683?v=bc67cff786b04b50b3ceb756fd05f68c) | `name`, `checkpoint`, `voice`, `attn_implementation`, `init_timeout` |
| `MeloTTS` | Multilingual text-to-speech via [MeloTTS](https://github.com/myshell-ai/MeloTTS/blob/main/docs/install.md#python-api) | `EN`, `EN-US` | `name`, `language`, `speaker_id`, `init_timeout` |
| `VisionModel` | Detection + tracking via [MMDetection](https://github.com/open-mmlab/mmdetection) | [`dino-4scale_r50_8xb2-12e_coco`](https://github.com/open-mmlab/mmdetection?tab=readme-ov-file#overview-of-benchmark-and-model-zoo) | `name`, `checkpoint`, `setup_trackers`, `cache_dir`, `tracking_distance_function`, `tracking_distance_threshold`, `deploy_tensorrt`, `_num_trackers`, `init_timeout` |
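The parameters above are passed when a model is initialized on a running RoboML server. As a purely illustrative sketch (the endpoint path and payload schema here are assumptions, not the confirmed API), initializing a `TransformersLLM` over HTTP might look like:

```python
import requests

# Hypothetical sketch: the endpoint path and payload fields are illustrative
# assumptions based on the init parameters in the table above.
payload = {
    "name": "my_llm",                                  # unique name for this model instance
    "checkpoint": "microsoft/Phi-3-mini-4k-instruct",  # default checkpoint from the table
    "quantization": "4bit",                            # assumed quantization option
    "init_timeout": 600,                               # seconds to wait for initialization
}
response = requests.post("http://localhost:8000/initialize_model", json=payload)
response.raise_for_status()
```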
## Installation
RoboML has been tested on Ubuntu 20.04 and later. A GPU with CUDA 12.1+ is recommended. If you encounter issues, please [open an issue](https://github.com/automatika-robotics/roboml/issues).
```bash
pip install roboml
```
### From Source
```bash
git clone https://github.com/automatika-robotics/roboml.git && cd roboml
virtualenv venv && source venv/bin/activate
pip install pip-tools
pip install .
```
## Vision Model Support
To use detection and tracking features via MMDetection:
- Install RoboML with the vision extras:
```bash
pip install "roboml[vision]"
```
- Install `mmcv` using the appropriate CUDA and PyTorch versions as described in [their docs](https://mmcv.readthedocs.io/en/latest/get_started/installation.html). Example for PyTorch 2.1 with CUDA 12.1:
```bash
pip install mmcv==2.1.0 -f https://download.openmmlab.com/mmcv/dist/cu121/torch2.1/index.html
```
- Install `mmdetection`:
```bash
git clone https://github.com/open-mmlab/mmdetection.git
cd mmdetection
pip install -v -e .
```
- If `ffmpeg` or `libGL` is missing:
```bash
sudo apt-get update && sudo apt-get install -y ffmpeg libsm6 libxext6
```
### TensorRT-Based Model Deployment
RoboML vision models can optionally be accelerated with NVIDIA TensorRT on Linux x86_64 systems. For setup, follow the [TensorRT installation guide](https://docs.nvidia.com/deeplearning/tensorrt/install-guide/index.html).
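TensorRT deployment for a `VisionModel` is toggled via the `deploy_tensorrt` init parameter listed in the table above. A hypothetical payload, following the same assumed schema as the earlier sketch:

```python
import requests

# Hypothetical sketch: endpoint path and payload schema are assumptions,
# not the confirmed RoboML API.
payload = {
    "name": "object_detector",
    "checkpoint": "dino-4scale_r50_8xb2-12e_coco",  # default MMDetection checkpoint
    "deploy_tensorrt": True,                        # request TensorRT-accelerated deployment
    "init_timeout": 600,
}
requests.post("http://localhost:8000/initialize_model", json=payload).raise_for_status()
```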
## Docker Build (Recommended)
Building with Docker is the recommended way to run RoboML, and is especially encouraged on NVIDIA Jetson boards.
- Install [Docker Engine](https://docs.docker.com/engine/install/) (or Docker Desktop)
- Install the [NVIDIA Container Toolkit](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html)
```bash
git clone https://github.com/automatika-robotics/roboml.git && cd roboml
# Build container image
docker build --tag=automatika:roboml .
# For Jetson boards:
docker build --tag=automatika:roboml -f Dockerfile.Jetson .
# Run HTTP server
docker run --runtime=nvidia --gpus all --rm -p 8000:8000 automatika:roboml roboml
# Or run RESP server
docker run --runtime=nvidia --gpus all --rm -p 6379:6379 automatika:roboml roboml-resp
```
- (Optional) Mount your cache directory so downloaded model checkpoints persist across container runs, e.g.:
```bash
docker run --runtime=nvidia --gpus all --rm -p 8000:8000 \
  -v ~/.cache:/root/.cache automatika:roboml roboml
```
## Servers
RoboML uses [Ray Serve](https://docs.ray.io/en/latest/serve/index.html) to host models as scalable apps across various environments.
### WebSocket Endpoint
WebSocket endpoints are exposed for streaming use cases (e.g., STT/TTS).
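A minimal client sketch using the `websockets` package; the route below is a hypothetical example, since actual routes depend on the `name` given to the model at initialization:

```python
import asyncio
import websockets  # pip install websockets

async def transcribe(path: str) -> None:
    # Hypothetical route: actual WebSocket routes depend on the model's `name`.
    uri = "ws://localhost:8000/speech_to_text/ws"
    async with websockets.connect(uri) as ws:
        with open(path, "rb") as f:
            await ws.send(f.read())  # stream raw audio bytes to the server
        print(await ws.recv())       # receive the transcription result

asyncio.run(transcribe("speech.wav"))
```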
### Experimental RESP Server
For ultra-low latency in robotics, RoboML also includes a RESP-based server compatible with any Redis client.
RESP (see [spec](https://github.com/antirez/RESP3)) is a lightweight, binary-safe protocol. Combined with `msgpack` instead of JSON, it enables very fast I/O, ideal for binary data like images, audio, or video.
This work is inspired by [@hansonkd](https://github.com/hansonkd)’s [Tino project](https://github.com/hansonkd/Tino).
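A minimal client sketch using `redis-py` and `msgpack`; the command name and payload schema below are illustrative assumptions, since actual commands depend on the models initialized on the server:

```python
import msgpack
import redis  # any Redis client can speak RESP

# Connect to a server started with `roboml-resp` (port 6379, as in the Docker example).
client = redis.Redis(host="localhost", port=6379)

# Hypothetical command name and payload schema.
request = msgpack.packb({"query": "What is the capital of France?"})
raw_reply = client.execute_command("my_llm.inference", request)
print(msgpack.unpackb(raw_reply))
```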
## Usage
Run the HTTP server:
```bash
roboml
```
Run the RESP server:
```bash
roboml-resp
```
Example usage in ROS clients is documented in [EmbodiedAgents](https://automatika-robotics.github.io/embodied-agents).
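For a quick smoke test without ROS, a plain HTTP call works against the server started with `roboml`; the route below is a hypothetical example, since routes are generated from the model `name` used at initialization:

```python
import requests

# Hypothetical route and payload; actual routes depend on the model's `name`.
response = requests.post(
    "http://localhost:8000/my_llm/inference",
    json={"query": "What objects do you see?"},
    timeout=60,
)
response.raise_for_status()
print(response.json())
```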
## Running Tests
Install dev dependencies:
```bash
pip install ".[dev]"
```
Run tests from the project root:
```bash
python -m pytest
```
## Copyright
Unless otherwise specified, all code is © 2024 Automatika Robotics.
RoboML is released under the MIT License. See [LICENSE](LICENSE) for details.
## Contributions
RoboML is developed in collaboration between [Automatika Robotics](https://automatikarobotics.com/) and [Inria](https://inria.fr/). Community contributions are welcome!