livekit-plugins-external-turn-detector

Name: livekit-plugins-external-turn-detector
Version: 1.2.8
Summary: End of utterance detection for LiveKit Agents
Upload time: 2025-09-08 07:27:32
Requires Python: >=3.9.0
Keywords: audio, livekit, realtime, video, webrtc
Author email: Sam Dang <dangvansam98@gmail.com>
Project URLs: Documentation https://docs.livekit.io · Source https://github.com/livekit/agents · Website https://livekit.io/
Home page, maintainer, license: not recorded
Requirements: no requirements were recorded

# Turn detector plugin for LiveKit Agents

This plugin introduces end-of-turn detection for LiveKit Agents using custom models to determine when a user has finished speaking. It supports both built-in models and external inference servers.

Traditional voice agents use VAD (voice activity detection) for end-of-turn detection. However, VAD models lack language understanding, often causing false positives where the agent interrupts the user before they finish speaking.

By leveraging language models trained specifically for this task, this plugin offers a more accurate and robust way to detect when a user's turn has ended.

## Features

- **Built-in Models**: English and multilingual models that run locally
- **External Server Support**: Use custom models via OpenAI-compatible APIs or NVIDIA Triton Inference Server
- **Flexible Backends**: Choose between local inference or remote servers based on your needs
- **Async/Await**: Fully asynchronous implementation for optimal performance

See [https://docs.livekit.io/agents/build/turns/turn-detector/](https://docs.livekit.io/agents/build/turns/turn-detector/) for more information.

## Installation

### Basic Installation
```bash
pip install livekit-plugins-external-turn-detector
```

## Usage

### Built-in Models

#### English model

The English model is the smaller of the two. It requires ~200MB of RAM and completes inference in ~10ms.

```python
from livekit.plugins.turn_detector.english import EnglishModel

session = AgentSession(
    ...
    turn_detection=EnglishModel(),
)
```
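
For context, here is a minimal sketch of how the turn detector fits into a full pipeline. The deepgram, openai, cartesia, and silero plugins shown here are illustrative assumptions rather than requirements of this plugin; any `AgentSession`-compatible plugins work the same way.

```python
# Minimal end-to-end sketch; the STT/LLM/TTS/VAD plugin choices are
# illustrative assumptions, not requirements of the turn detector.
from livekit import agents
from livekit.agents import Agent, AgentSession
from livekit.plugins import cartesia, deepgram, openai, silero
from livekit.plugins.turn_detector.english import EnglishModel


async def entrypoint(ctx: agents.JobContext):
    session = AgentSession(
        vad=silero.VAD.load(),             # VAD still detects silence; the model decides whether the turn is over
        stt=deepgram.STT(model="nova-3"),  # the turn detector reads the STT transcript
        llm=openai.LLM(model="gpt-4o-mini"),
        tts=cartesia.TTS(),
        turn_detection=EnglishModel(),
    )
    await session.start(room=ctx.room, agent=Agent(instructions="You are a helpful voice assistant."))
    await ctx.connect()
```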

#### Multilingual model

We've trained a separate multilingual model that supports the following languages: `English, French, Spanish, German, Italian, Portuguese, Dutch, Chinese, Japanese, Korean, Indonesian, Russian, Turkish`

The multilingual model requires ~400MB of RAM and completes inference in ~25ms.

```python
from livekit.plugins.turn_detector.multilingual import MultilingualModel

session = AgentSession(
    ...
    turn_detection=MultilingualModel(),
)
```

### External Server Models

For custom models or when you need to offload inference to a dedicated server, you can use external backends.

#### Using NVIDIA Triton Inference Server

For high-performance inference with custom models:

```python
from livekit.plugins.turn_detector.external import ExternalModel

turn_detector = ExternalModel(
    triton_url="localhost:7001",  # Your Triton server gRPC endpoint
    triton_model="ensemble",      # Your model name in Triton
    tokenizer="dangvansam/Qwen3-0.6B-turn-detection-en",
    temperature=0.1,
    max_tokens=20,
)

session = AgentSession(
    ...
    turn_detection=turn_detector,
)
```

#### Triton Server Configuration

Your Triton server should serve a model with the following interface (a standalone client sketch for sanity-checking it follows the lists):

**Inputs:**
- `text_input` (BYTES): Input prompt
- `max_tokens` (INT32): Max tokens to generate  
- `temperature` (FP32): Sampling temperature
- Additional generation parameters as needed

**Outputs:**
- `text_output` (BYTES): Generated text ("end" or "continue")
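
To sanity-check that a server implements this contract before wiring it into an agent, you can call it directly with the Triton gRPC client. The endpoint, model name, tensor shapes, and prompt below are placeholder assumptions; adjust them to match your deployment and `config.pbtxt`.

```python
# Standalone check of the Triton contract described above. The endpoint,
# model name, shapes, and prompt are placeholder assumptions.
import numpy as np
import tritonclient.grpc as grpcclient

client = grpcclient.InferenceServerClient(url="localhost:7001")

text_input = grpcclient.InferInput("text_input", [1], "BYTES")
text_input.set_data_from_numpy(np.array([b"<chat context rendered as a prompt>"], dtype=np.object_))

max_tokens = grpcclient.InferInput("max_tokens", [1], "INT32")
max_tokens.set_data_from_numpy(np.array([20], dtype=np.int32))

temperature = grpcclient.InferInput("temperature", [1], "FP32")
temperature.set_data_from_numpy(np.array([0.1], dtype=np.float32))

result = client.infer(
    model_name="ensemble",
    inputs=[text_input, max_tokens, temperature],
    outputs=[grpcclient.InferRequestedOutput("text_output")],
)
print(result.as_numpy("text_output"))  # expect something like [b"end"] or [b"continue"]
```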

### Usage with RealtimeModel

The turn detector can be used even with speech-to-speech models such as OpenAI's Realtime API. You'll need to provide a separate STT to ensure our model has access to the text content.

```python
session = AgentSession(
    ...
    stt=deepgram.STT(model="nova-3", language="multi"),
    llm=openai.realtime.RealtimeModel(),
    turn_detection=MultilingualModel(),
)
```

## Running your agent

This plugin requires model files. Before starting your agent for the first time, or when building Docker images for deployment, run the following command to download the model files:

```bash
python my_agent.py download-files
```
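
The `download-files` subcommand comes from the LiveKit Agents CLI wrapper. A hypothetical `my_agent.py` skeleton that exposes it might look like the sketch below; the entrypoint body is elided.

```python
# Hypothetical my_agent.py skeleton. Wrapping the worker in cli.run_app is
# what exposes subcommands such as `download-files`, alongside `dev`/`start`.
from livekit import agents


async def entrypoint(ctx: agents.JobContext):
    # build and start your AgentSession here, as shown in the Usage section
    ...


if __name__ == "__main__":
    agents.cli.run_app(agents.WorkerOptions(entrypoint_fnc=entrypoint))
```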

## Model system requirements

### Built-in Models

The built-in end-of-turn models are optimized to run on CPUs with modest system requirements. They are designed to run on the same server hosting your agents.

- **English model**: ~200MB RAM, ~10ms inference time
- **Multilingual model**: ~400MB RAM, ~25ms inference time
- Both models run within a shared inference server, supporting multiple concurrent sessions

### External Models

When using external backends, system requirements depend on your chosen configuration:

#### Triton Inference Server
- Server requirements depend on your model size and configuration
- Supports GPU acceleration for faster inference
- Can handle high-throughput scenarios with proper scaling
- Recommended for production deployments with custom models

## License

The plugin source code is licensed under the Apache-2.0 license.

The end-of-turn model is licensed under the [LiveKit Model License](https://huggingface.co/livekit/turn-detector/blob/main/LICENSE).

            
