## Gemini Live Speech-to-Speech Plugin
Google Gemini Live Speech-to-Speech (STS) plugin for GetStream. It connects a realtime Gemini Live session to a Stream video call so your assistant can speak and listen in the same call.
### Installation
```bash
pip install getstream-plugins-gemini
```
### Requirements
- **Python**: 3.10+
- **Dependencies**: `getstream[webrtc]`, `getstream-plugins-common`, `google-genai`
- **API key**: `GOOGLE_API_KEY` or `GEMINI_API_KEY` set in your environment
### Quick Start
Below is a minimal example that attaches the Gemini Live output audio track to a Stream call and streams microphone audio into Gemini. The assistant will speak back into the call, and you can also send text messages to the assistant.
```python
import asyncio
import os

from getstream import Stream
from getstream.plugins.gemini.live import GeminiLive
from getstream.video import rtc
from getstream.video.rtc.track_util import PcmData


async def main():
    # Ensure your key is set: export GOOGLE_API_KEY=... (or GEMINI_API_KEY)
    gemini = GeminiLive(
        api_key=os.getenv("GOOGLE_API_KEY"),
        model="gemini-live-2.5-flash-preview",
    )

    client = Stream.from_env()
    call = client.video.call("default", "your-call-id")

    async with await rtc.join(call, user_id="assistant-bot") as connection:
        # Route Gemini's synthesized speech back into the call
        await connection.add_tracks(audio=gemini.output_track)

        # Forward microphone PCM frames to Gemini in realtime
        @connection.on("audio")
        async def on_audio(pcm: PcmData):
            await gemini.send_audio_pcm(pcm, target_rate=48000)

        # Optionally send a kick-off text message
        await gemini.send_text("Give a short greeting to the participants.")

        # Keep the session running
        while True:
            await asyncio.sleep(1)


if __name__ == "__main__":
    asyncio.run(main())
```
Optional: forward remote participant video frames to Gemini for multimodal context:
```python
# Forward remote video frames to Gemini (optional)
@connection.on("track_added")
async def _on_track_added(track_id, kind, user):
    if kind == "video" and connection.subscriber_pc:
        track = connection.subscriber_pc.add_track_subscriber(track_id)
        if track:
            await gemini._watch_video_track(track)
```
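The snippet above uses the internal `_watch_video_track` helper; the documented public entry point is `start_video_sender` (see the API Overview below). A roughly equivalent sketch, assuming the same `track_added` payload shape, would be:

```python
# Sketch using the public video-sender API instead of the internal helper.
@connection.on("track_added")
async def on_track_added(track_id, kind, user):
    if kind == "video" and connection.subscriber_pc:
        track = connection.subscriber_pc.add_track_subscriber(track_id)
        if track:
            # Forward roughly one frame per second to Gemini Live
            await gemini.start_video_sender(track, fps=1)
```

Call `await gemini.stop_video_sender()` when the remote track ends or before closing the session.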
For a full runnable example, see `examples/gemini_live/main.py`.
### Features
- **Bidirectional audio**: Streams microphone PCM to Gemini, and plays Gemini speech into the call using `output_track`.
- **Video frame forwarding**: Sends remote participant video frames to Gemini Live for multimodal understanding. Use `start_video_sender` with a remote `MediaStreamTrack`.
- **Text messages**: Use `send_text` to add text turns directly to the conversation.
- **Barge-in (interruptions)**: When the user starts speaking, current playback is interrupted so Gemini can focus on the new input. Playback automatically resumes after brief silence.
- **Auto resampling**: `send_audio_pcm` will resample input frames to the target rate when needed.
- **Events**: Subscribe to `"audio"` for synthesized audio chunks and `"text"` for assistant text (see the sketch after this list).
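A minimal sketch of subscribing to those events, assuming the plugin exposes the same decorator-style `.on()` registration as the RTC connection shown in the Quick Start:

```python
# Event subscription sketch; the .on() decorator style and payload shapes are
# assumptions based on the connection API in the Quick Start.
@gemini.on("audio")
async def on_assistant_audio(chunk):
    # Synthesized audio chunk from Gemini; it is already routed to
    # output_track, but you could also record or meter it here.
    ...

@gemini.on("text")
async def on_assistant_text(text: str):
    print(f"Assistant: {text}")
```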
### API Overview
- **`GeminiLive(api_key: str | None = None, model: str = "gemini-live-2.5-flash-preview", config: LiveConnectConfigDict | None = None)`**: Create a new Gemini Live session. If `api_key` is not provided, the plugin reads `GOOGLE_API_KEY` or `GEMINI_API_KEY` from the environment.
- **`output_track`**: An `AudioStreamTrack` you can publish in your call via `add_tracks(audio=...)`.
- **`await send_text(text: str)`**: Send a user text message to the current turn.
- **`await send_audio_pcm(pcm: PcmData, target_rate: int = 48000)`**: Stream PCM frames to Gemini. Frames are converted to the required format and resampled if necessary.
- **`await wait_until_ready(timeout: float | None = None) -> bool`**: Wait until the underlying live session is connected (see the lifecycle sketch after this list).
- **`await interrupt_playback()` / `resume_playback()`**: Manually stop or resume synthesized audio playback. Useful if you want to manage barge-in behavior yourself.
- **`await start_video_sender(track: MediaStreamTrack, fps: int = 1)`**: Start forwarding video frames from a remote `MediaStreamTrack` to Gemini Live at the given frame rate.
- **`await stop_video_sender()`**: Stop the background video sender task, if running.
- **`await close()`**: Close the session and background tasks.
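Putting a few of these together, a minimal lifecycle sketch (the method names come from the list above; the overall flow mirrors the Quick Start and is otherwise an assumption):

```python
import asyncio
import os

from getstream.plugins.gemini.live import GeminiLive


async def short_session():
    # Whether the session connects on construction or only after the first
    # send is an assumption; wait_until_ready covers both cases.
    gemini = GeminiLive(api_key=os.getenv("GOOGLE_API_KEY"))
    try:
        # Block until the live session is connected, or bail out
        if not await gemini.wait_until_ready(timeout=10.0):
            raise RuntimeError("Gemini Live session did not become ready")

        # Add a text turn to the conversation
        await gemini.send_text("Summarize today's agenda in one sentence.")

        # Give the assistant a moment to respond before shutting down
        await asyncio.sleep(5)
    finally:
        # Always close the session and its background tasks
        await gemini.close()


asyncio.run(short_session())
```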
### Environment Variables
- **`GOOGLE_API_KEY` / `GEMINI_API_KEY`**: Gemini API key. One must be set.
- **`GEMINI_LIVE_MODEL`**: Optional override for the model name if you need a different variant (see the sketch below).
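If you prefer to apply the override explicitly rather than relying on the plugin to read the variable itself, a minimal sketch:

```python
import os

from getstream.plugins.gemini.live import GeminiLive

# Fall back to the default preview model when GEMINI_LIVE_MODEL is unset.
model = os.getenv("GEMINI_LIVE_MODEL", "gemini-live-2.5-flash-preview")

# api_key is omitted, so the plugin falls back to GOOGLE_API_KEY / GEMINI_API_KEY.
gemini = GeminiLive(model=model)
```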
### Notes on Interruptions
- **How it works**: The plugin detects user speech activity in incoming PCM and interrupts any ongoing playback. After a short period of silence, playback is enabled again so the assistant can speak (a manual-control sketch follows this list).
- **Why it matters**: This enables natural barge-in experiences, where users can cut off the assistant mid-sentence and ask follow-up questions.
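If you want to manage barge-in yourself (see `interrupt_playback` / `resume_playback` in the API Overview), a simplified sketch inside the Quick Start's audio handler might look like the following. The energy-based speech check and the `pcm.samples` layout are assumptions, and a real implementation would wait for a short silence window before resuming, as the built-in behavior does:

```python
import numpy as np

SPEECH_RMS_THRESHOLD = 500.0  # tune for your input; not a plugin constant


@connection.on("audio")
async def on_audio(pcm: PcmData):
    # Naive energy check; assumes pcm.samples is an int16 sample array.
    samples = np.asarray(pcm.samples, dtype=np.float32)
    rms = float(np.sqrt(np.mean(samples ** 2))) if samples.size else 0.0

    if rms > SPEECH_RMS_THRESHOLD:
        # Caller is speaking: cut off the assistant's current playback
        await gemini.interrupt_playback()
    else:
        # Quiet frame: allow synthesized audio again
        gemini.resume_playback()  # add `await` if this is a coroutine in your version

    # Still forward the frame to Gemini
    await gemini.send_audio_pcm(pcm, target_rate=48000)
```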
### Troubleshooting
- **No audio playback**: Ensure you publish `output_track` to your call and the call is subscribed to the assistant’s audio.
- **No responses**: Verify `GOOGLE_API_KEY`/`GEMINI_API_KEY` is set and has access to the chosen model. Try a different model via `model=`.
- **Sample-rate issues**: Use `send_audio_pcm(..., target_rate=48000)` to normalize input frames.