# Turn detector plugin for LiveKit Agents
This plugin introduces end-of-turn detection for LiveKit Agents using custom models to determine when a user has finished speaking. It supports both built-in models and external inference servers.
Traditional voice agents use VAD (voice activity detection) for end-of-turn detection. However, VAD models lack language understanding, often causing false positives where the agent interrupts the user before they finish speaking.
By leveraging language models specifically trained for this task, this plugin offers a more accurate and robust method of end-of-turn detection.
## Features
- **Built-in Models**: English and multilingual models that run locally
- **External Server Support**: Use custom models via OpenAI-compatible APIs or NVIDIA Triton Inference Server
- **Flexible Backends**: Choose between local inference or remote servers based on your needs
- **Async/Await**: Fully asynchronous implementation for optimal performance
See [https://docs.livekit.io/agents/build/turns/turn-detector/](https://docs.livekit.io/agents/build/turns/turn-detector/) for more information.
## Installation
### Basic Installation
```bash
pip install livekit-plugins-external-turn-detector
```
## Usage
### Built-in Models
#### English model
The English model is the smaller of the two models. It requires ~200MB of RAM and completes inference in ~10ms.
```python
from livekit.agents import AgentSession
from livekit.plugins.turn_detector.english import EnglishModel

session = AgentSession(
    ...
    turn_detection=EnglishModel(),
)
```
#### Multilingual model
We've trained a separate multilingual model that supports the following languages: `English, French, Spanish, German, Italian, Portuguese, Dutch, Chinese, Japanese, Korean, Indonesian, Russian, Turkish`
The multilingual model requires ~400MB of RAM and completes inference in ~25ms.
```python
from livekit.agents import AgentSession
from livekit.plugins.turn_detector.multilingual import MultilingualModel

session = AgentSession(
    ...
    turn_detection=MultilingualModel(),
)
```
### External Server Models
For custom models or when you need to offload inference to a dedicated server, you can use external backends.
#### Using NVIDIA Triton Inference Server
For high-performance inference with custom models:
```python
from livekit.agents import AgentSession
from livekit.plugins.turn_detector.external import ExternalModel

turn_detector = ExternalModel(
    triton_url="localhost:7001",  # your Triton server's gRPC endpoint
    triton_model="ensemble",      # your model's name in Triton
    tokenizer="dangvansam/Qwen3-0.6B-turn-detection-en",
    temperature=0.1,
    max_tokens=20,
)

session = AgentSession(
    ...
    turn_detection=turn_detector,
)
```
#### Triton Server Configuration
Your Triton server should expose a model with the following interface:
**Inputs:**
- `text_input` (BYTES): Input prompt
- `max_tokens` (INT32): Max tokens to generate
- `temperature` (FP32): Sampling temperature
- Additional generation parameters as needed
**Outputs:**
- `text_output` (BYTES): Generated text ("end" or "continue")
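For reference, you can exercise this contract directly with NVIDIA's `tritonclient` package (`pip install tritonclient[grpc]`). The sketch below is illustrative only: the endpoint, model name, and prompt text are placeholder assumptions, not values shipped with the plugin.

```python
# Illustrative sketch: send a single end-of-turn request straight to Triton.
# The URL, model name, and prompt are assumptions for demonstration.
import numpy as np
import tritonclient.grpc as grpcclient

client = grpcclient.InferenceServerClient(url="localhost:7001")

# Triton BYTES tensors are numpy object arrays of strings.
text = np.array(["<chat transcript formatted as the model's prompt>"], dtype=object)
max_tokens = np.array([20], dtype=np.int32)
temperature = np.array([0.1], dtype=np.float32)

inputs = [
    grpcclient.InferInput("text_input", [1], "BYTES"),
    grpcclient.InferInput("max_tokens", [1], "INT32"),
    grpcclient.InferInput("temperature", [1], "FP32"),
]
inputs[0].set_data_from_numpy(text)
inputs[1].set_data_from_numpy(max_tokens)
inputs[2].set_data_from_numpy(temperature)

result = client.infer(model_name="ensemble", inputs=inputs)
print(result.as_numpy("text_output"))  # e.g. [b"end"] or [b"continue"]
```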
### Usage with RealtimeModel
The turn detector can be used even with speech-to-speech models such as OpenAI's Realtime API. You'll need to provide a separate STT to ensure our model has access to the text content.
```python
from livekit.agents import AgentSession
from livekit.plugins import deepgram, openai
from livekit.plugins.turn_detector.multilingual import MultilingualModel

session = AgentSession(
    ...
    stt=deepgram.STT(model="nova-3", language="multi"),
    llm=openai.realtime.RealtimeModel(),
    turn_detection=MultilingualModel(),
)
```
## Running your agent
This plugin requires model files. Before starting your agent for the first time, or when building Docker images for deployment, run the following command to download the model files:
```bash
python my_agent.py download-files
```
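The `download-files` subcommand comes from the LiveKit Agents CLI, so your script must hand control to `cli.run_app` for it to be available. A minimal `my_agent.py` might look like the sketch below; the entrypoint body is a placeholder, only the CLI wiring matters here.

```python
# my_agent.py -- minimal sketch; `python my_agent.py download-files`
# pre-fetches model weights before the first run or inside a Docker build.
from livekit.agents import AgentSession, JobContext, WorkerOptions, cli
from livekit.plugins.turn_detector.english import EnglishModel


async def entrypoint(ctx: JobContext):
    await ctx.connect()
    session = AgentSession(
        ...  # stt, llm, tts, and other session configuration
        turn_detection=EnglishModel(),
    )
    ...  # start the session with your agent


if __name__ == "__main__":
    cli.run_app(WorkerOptions(entrypoint_fnc=entrypoint))
```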
## Model system requirements
### Built-in Models
The built-in end-of-turn models are optimized to run on CPUs with modest system requirements. They are designed to run on the same server hosting your agents.
- **English model**: ~200MB RAM, ~10ms inference time
- **Multilingual model**: ~400MB RAM, ~25ms inference time
- Both models run within a shared inference server, supporting multiple concurrent sessions
### External Models
When using external backends, system requirements depend on your chosen configuration:
#### Triton Inference Server
- Server requirements depend on your model size and configuration
- Supports GPU acceleration for faster inference
- Can handle high-throughput scenarios with proper scaling
- Recommended for production deployments with custom models
## License
The plugin source code is licensed under the Apache-2.0 license.
The end-of-turn model is licensed under the [LiveKit Model License](https://huggingface.co/livekit/turn-detector/blob/main/LICENSE).