vision-agents-plugins-gemini


Name: vision-agents-plugins-gemini
Version: 0.1.6
Summary: Google Gemini LLM integration for Vision Agents
Upload time: 2025-10-16 15:53:33
Requires Python: >=3.10
Keywords: ai, llm, agents, gemini, google, voice agents
Requirements: No requirements were recorded.

## Gemini Live Speech-to-Speech Plugin

Google Gemini Live Speech-to-Speech (STS) plugin for GetStream. It connects a realtime Gemini Live session to a Stream video call so your assistant can speak and listen in the same call.

### Installation

```bash
pip install vision-agents-plugins-gemini
```

### Requirements

- **Python**: 3.10+
- **Dependencies**: `getstream[webrtc]`, `getstream-plugins-common`, `google-genai`
- **API key**: `GOOGLE_API_KEY` or `GEMINI_API_KEY` set in your environment

### Quick Start

Below is a minimal example that attaches the Gemini Live output audio track to a Stream call and streams microphone audio into Gemini. The assistant will speak back into the call, and you can also send text messages to the assistant.

```python
import asyncio
import os

from getstream import Stream
from getstream.plugins.gemini.live import GeminiLive
from getstream.video import rtc
from getstream.video.rtc.track_util import PcmData


async def main():
    # Ensure your key is set: export GOOGLE_API_KEY=... (or GEMINI_API_KEY)
    gemini = GeminiLive(
        api_key=os.getenv("GOOGLE_API_KEY") or os.getenv("GEMINI_API_KEY"),
        model="gemini-live-2.5-flash-preview",
    )

    client = Stream.from_env()
    call = client.video.call("default", "your-call-id")

    async with await rtc.join(call, user_id="assistant-bot") as connection:
        # Route Gemini's synthesized speech back into the call
        await connection.add_tracks(audio=gemini.output_track)

        # Forward microphone PCM frames to Gemini in realtime
        @connection.on("audio")
        async def on_audio(pcm: PcmData):
            await gemini.send_audio_pcm(pcm, target_rate=48000)

        # Optionally send a kick-off text message
        await gemini.send_text("Give a short greeting to the participants.")

        # Keep the session running until cancelled, then release the Gemini session
        try:
            while True:
                await asyncio.sleep(1)
        finally:
            await gemini.close()


if __name__ == "__main__":
    asyncio.run(main())
```

Optional: forward remote participant video frames to Gemini for multimodal context:

```python
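# Forward remote video frames to Gemini (optional).
# Register this handler inside the `async with` block of the Quick Start,
# where `connection` and `gemini` are in scope.
@connection.on("track_added")
async def _on_track_added(track_id, kind, user):
    if kind == "video" and connection.subscriber_pc:
        track = connection.subscriber_pc.add_track_subscriber(track_id)
        if track:
            # Forward frames at 1 fps, the documented default
            await gemini.start_video_sender(track, fps=1)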
```

For a full runnable example, see `examples/gemini_live/main.py`.

### Features

- **Bidirectional audio**: Streams microphone PCM to Gemini, and plays Gemini speech into the call using `output_track`.
- **Video frame forwarding**: Sends remote participant video frames to Gemini Live for multimodal understanding. Use `start_video_sender` with a remote `MediaStreamTrack`.
- **Text messages**: Use `send_text` to add text turns directly to the conversation.
- **Barge-in (interruptions)**: When the user starts speaking, current playback is interrupted so Gemini can focus on the new input. Playback automatically resumes after a brief silence.
- **Auto resampling**: `send_audio_pcm` will resample input frames to the target rate when needed.
- **Events**: Subscribe to `"audio"` for synthesized audio chunks and `"text"` for assistant text.
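
To make the auto-resampling feature concrete, here is a minimal sketch of what converting frames to a target rate can look like. It is not the plugin's implementation (the README does not say which algorithm `send_audio_pcm` uses); `resample_linear` is a hypothetical helper that applies plain linear interpolation to mono 16-bit samples held in a Python list.

```python
def resample_linear(samples: list[int], src_rate: int, dst_rate: int) -> list[int]:
    """Resample mono 16-bit PCM with linear interpolation (illustrative only)."""
    if src_rate <= 0 or dst_rate <= 0:
        raise ValueError("sample rates must be positive")
    if src_rate == dst_rate or len(samples) < 2:
        return list(samples)

    out_len = len(samples) * dst_rate // src_rate
    step = src_rate / dst_rate  # source samples advanced per output sample
    out = []
    for i in range(out_len):
        pos = i * step
        left = int(pos)
        right = min(left + 1, len(samples) - 1)  # hold the last sample at the edge
        frac = pos - left
        value = samples[left] * (1.0 - frac) + samples[right] * frac
        # Clamp to the signed 16-bit range
        out.append(max(-32768, min(32767, round(value))))
    return out


# Upsample a tiny 16 kHz frame to 48 kHz: three output samples per input sample
frame_48k = resample_linear([0, 300], 16000, 48000)
```

Linear interpolation is cheap but does no anti-alias filtering, so a production resampler would normally use a proper filter; the sketch only shows the rate-conversion idea.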

### API Overview

- **`GeminiLive(api_key: str | None = None, model: str = "gemini-live-2.5-flash-preview", config: LiveConnectConfigDict | None = None)`**: Create a new Gemini Live session. If `api_key` is not provided, the plugin reads `GOOGLE_API_KEY` or `GEMINI_API_KEY` from the environment.
- **`output_track`**: An `AudioStreamTrack` you can publish in your call via `add_tracks(audio=...)`.
- **`await send_text(text: str)`**: Send a user text message to the current turn.
- **`await send_audio_pcm(pcm: PcmData, target_rate: int = 48000)`**: Stream PCM frames to Gemini. Frames are converted to the required format and resampled if necessary.
- **`await wait_until_ready(timeout: float | None = None) -> bool`**: Wait until the underlying live session is connected.
- **`await interrupt_playback()` / `resume_playback()`**: Manually stop or resume synthesized audio playback. Useful if you want to manage barge-in behavior yourself.
- **`await start_video_sender(track: MediaStreamTrack, fps: int = 1)`**: Start forwarding video frames from a remote `MediaStreamTrack` to Gemini Live at the given frame rate.
- **`await stop_video_sender()`**: Stop the background video sender task, if running.
- **`await close()`**: Close the session and background tasks.
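
The methods above can be combined into a small readiness check before the first message is sent. The sketch below is hedged: `greet_when_ready` is a hypothetical helper, it accepts any object exposing `wait_until_ready` and `send_text` as listed in this overview, and it assumes that a `False` return from `wait_until_ready` means the session did not connect within the timeout.

```python
import asyncio


async def greet_when_ready(session, greeting: str, timeout: float = 10.0) -> bool:
    """Send `greeting` only after the session reports it is connected.

    `session` is expected to look like a GeminiLive instance
    (assumption: any object with the same two coroutine methods works).
    Returns True if the greeting was sent, False if the session was not ready.
    """
    ready = await session.wait_until_ready(timeout=timeout)
    if not ready:
        return False
    await session.send_text(greeting)
    return True
```

Inside the Quick Start's `main()`, this could replace the direct `send_text` call, for example `await greet_when_ready(gemini, "Give a short greeting to the participants.")`.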

### Environment Variables

- **`GOOGLE_API_KEY` / `GEMINI_API_KEY`**: Gemini API key. One must be set.
- **`GEMINI_LIVE_MODEL`**: Optional override for the model name if you need a different variant.
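
If you prefer to resolve these settings yourself and fail early when the key is missing, a minimal sketch follows. `resolve_gemini_settings` is a hypothetical helper, and checking `GOOGLE_API_KEY` before `GEMINI_API_KEY` is an assumption; the README does not state which variable takes precedence when both are set.

```python
import os

# Default model name taken from the API overview
DEFAULT_MODEL = "gemini-live-2.5-flash-preview"


def resolve_gemini_settings(env=None) -> tuple[str, str]:
    """Return (api_key, model) from environment-style mappings.

    Assumption: GOOGLE_API_KEY is consulted before GEMINI_API_KEY.
    """
    env = os.environ if env is None else env
    api_key = env.get("GOOGLE_API_KEY") or env.get("GEMINI_API_KEY")
    if not api_key:
        raise RuntimeError("Set GOOGLE_API_KEY or GEMINI_API_KEY")
    model = env.get("GEMINI_LIVE_MODEL") or DEFAULT_MODEL
    return api_key, model
```

The returned values can then be passed explicitly, as in `GeminiLive(api_key=api_key, model=model)`.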

### Notes on Interruptions

- **How it works**: The plugin detects user speech activity in incoming PCM and interrupts any ongoing playback. After a short period of silence, playback is enabled again so the assistant can speak.
- **Why it matters**: This enables natural barge-in experiences, where users can cut off the assistant mid-sentence and ask follow-up questions.
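
The interruption logic described above can be sketched as a small state machine. This is an illustration under stated assumptions, not the plugin's code: the README does not say how speech activity is detected, so the sketch uses a simple RMS energy threshold, and `BargeInGate`, `threshold`, and `silence_s` are hypothetical names and values.

```python
import math


def rms(samples: list[int]) -> float:
    """Root-mean-square energy of a frame of 16-bit samples."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


class BargeInGate:
    """Tracks whether assistant playback should currently be allowed."""

    def __init__(self, threshold: float = 500.0, silence_s: float = 0.6):
        self.threshold = threshold      # energy above this counts as speech (assumed value)
        self.silence_s = silence_s      # quiet time required before playback resumes
        self.playback_enabled = True
        self._last_speech = None        # timestamp of the most recent speech frame

    def feed(self, samples: list[int], now: float) -> bool:
        """Process one incoming frame; return True if playback is allowed."""
        if rms(samples) >= self.threshold:
            # User is speaking: interrupt playback and restart the silence timer
            self._last_speech = now
            self.playback_enabled = False
        elif self._last_speech is not None and now - self._last_speech >= self.silence_s:
            # Enough silence has passed: let the assistant speak again
            self.playback_enabled = True
            self._last_speech = None
        return self.playback_enabled
```

If you manage barge-in yourself, the transitions of `playback_enabled` are the points where you would call `interrupt_playback()` and `resume_playback()` from the API overview.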

### Troubleshooting

- **No audio playback**: Ensure you publish `output_track` to your call and that the other participants are subscribed to the assistant’s audio.
- **No responses**: Verify `GOOGLE_API_KEY`/`GEMINI_API_KEY` is set and has access to the chosen model. Try a different model via `model=`.
- **Sample-rate issues**: Use `send_audio_pcm(..., target_rate=48000)` to normalize input frames.

            

Raw data

{
    "_id": null,
    "home_page": null,
    "name": "vision-agents-plugins-gemini",
    "maintainer": null,
    "docs_url": null,
    "requires_python": ">=3.10",
    "maintainer_email": null,
    "keywords": "AI, LLM, agents, gemini, google, voice agents",
    "author": null,
    "author_email": null,
    "download_url": "https://files.pythonhosted.org/packages/97/0b/2adacea236c17eb9b01cf23aa59d0740f43e3b969a592b68b8f7638016a8/vision_agents_plugins_gemini-0.1.6.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": null,
    "summary": "Google Gemini LLM integration for Vision Agents",
    "version": "0.1.6",
    "project_urls": {
        "Documentation": "https://visionagents.ai/",
        "Source": "https://github.com/GetStream/Vision-Agents",
        "Website": "https://visionagents.ai/"
    },
    "split_keywords": [
        "ai",
        " llm",
        " agents",
        " gemini",
        " google",
        " voice agents"
    ],
    "urls": [
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "b4f749471fae4494f8d06deb7803675b88c8812424a70bcd056c36d1f3820eed",
                "md5": "f1e96fb800bdd6dd04f091ca5df45269",
                "sha256": "764a09e7d8d9df9a399d85a698730cac5724b058b32af12d68ef8a912e025044"
            },
            "downloads": -1,
            "filename": "vision_agents_plugins_gemini-0.1.6-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "f1e96fb800bdd6dd04f091ca5df45269",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.10",
            "size": 13980,
            "upload_time": "2025-10-16T15:53:32",
            "upload_time_iso_8601": "2025-10-16T15:53:32.898314Z",
            "url": "https://files.pythonhosted.org/packages/b4/f7/49471fae4494f8d06deb7803675b88c8812424a70bcd056c36d1f3820eed/vision_agents_plugins_gemini-0.1.6-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "970b2adacea236c17eb9b01cf23aa59d0740f43e3b969a592b68b8f7638016a8",
                "md5": "a7d145edd14f0a7995cf911ac4468c36",
                "sha256": "a6b1f7accb6cda755cdef1b42394c5140089d2609c7007ecf95a4826bc00c22b"
            },
            "downloads": -1,
            "filename": "vision_agents_plugins_gemini-0.1.6.tar.gz",
            "has_sig": false,
            "md5_digest": "a7d145edd14f0a7995cf911ac4468c36",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.10",
            "size": 12258,
            "upload_time": "2025-10-16T15:53:33",
            "upload_time_iso_8601": "2025-10-16T15:53:33.628287Z",
            "url": "https://files.pythonhosted.org/packages/97/0b/2adacea236c17eb9b01cf23aa59d0740f43e3b969a592b68b8f7638016a8/vision_agents_plugins_gemini-0.1.6.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2025-10-16 15:53:33",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "GetStream",
    "github_project": "Vision-Agents",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": true,
    "lcname": "vision-agents-plugins-gemini"
}
        