# LiveKit Plugins Google
Agent Framework plugin for services from Google Cloud. Currently supports Google's [Speech-to-Text](https://cloud.google.com/speech-to-text) and [Text-to-Speech](https://cloud.google.com/text-to-speech) APIs, as well as Gemini Multimodal Live.
## Installation
```bash
pip install livekit-plugins-google
```
## Prerequisites
You'll need a Google Cloud account and the appropriate credentials. Credentials can be passed to the plugin directly or supplied via Application Default Credentials, as described in [How Application Default Credentials works](https://cloud.google.com/docs/authentication/application-default-credentials).
To use the STT and TTS APIs, you'll need to enable the corresponding services for your Google Cloud project:
- Cloud Speech-to-Text API
- Cloud Text-to-Speech API
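Both services can be enabled from the Cloud Console or with the `gcloud` CLI. A minimal setup sketch, assuming the `gcloud` CLI is installed and a project is already selected:

```shell
# Enable the Speech-to-Text and Text-to-Speech APIs for the active project
gcloud services enable speech.googleapis.com texttospeech.googleapis.com

# Set up Application Default Credentials for local development
gcloud auth application-default login
```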
## Gemini Multimodal Live
Gemini Multimodal Live can be used with the `MultimodalAgent` class. See `examples/multimodal_agent/gemini_agent.py` for a complete example.
### Live Video Input (experimental)
You can push video frames into your Gemini Multimodal Live session alongside the audio that the `MultimodalAgent` handles automatically. The basic approach: subscribe to the video track, open a `VideoStream`, sample frames at a suitable rate, and push them into the `RealtimeSession`:
```python
import asyncio

from livekit import rtc
from livekit.agents import AutoSubscribe
from livekit.agents.multimodal import MultimodalAgent
from livekit.plugins import google

# Make sure you subscribe to audio and video tracks
await ctx.connect(auto_subscribe=AutoSubscribe.SUBSCRIBE_ALL)

# Create your RealtimeModel and store a reference
model = google.beta.realtime.RealtimeModel(
    # ...
)

# Create your MultimodalAgent as usual
agent = MultimodalAgent(
    model=model,
    # ...
)

# Async task that samples frames from the video track and pushes them to Gemini
async def _process_video_track(track: rtc.Track):
    video_stream = rtc.VideoStream(track)
    last_frame_time = 0.0

    async for event in video_stream:
        current_time = asyncio.get_event_loop().time()

        # Sample at 1 FPS
        if current_time - last_frame_time < 1.0:
            continue
        last_frame_time = current_time

        # Push the frame into the RealtimeSession
        model.sessions[0].push_video(event.frame)

    await video_stream.aclose()

# Subscribe to new video tracks and process them as they arrive
@ctx.room.on("track_subscribed")
def _on_track_subscribed(track: rtc.Track, pub, participant):
    if track.kind == rtc.TrackKind.KIND_VIDEO:
        asyncio.create_task(_process_video_track(track))
```
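The 1 FPS sampling above is plain time-based throttling: a frame passes only if at least one interval has elapsed since the last accepted frame. A minimal, dependency-free sketch of the same gating logic (the class and names are illustrative, not part of the plugin):

```python
class FrameThrottle:
    """Gate that admits at most one frame per `interval` seconds."""

    def __init__(self, interval: float = 1.0):
        self.interval = interval
        self._last = float("-inf")  # so the first frame always passes

    def allow(self, now: float) -> bool:
        """Return True if a frame arriving at time `now` should be forwarded."""
        if now - self._last < self.interval:
            return False
        self._last = now
        return True


# Timestamps 0.0s, 0.5s, 1.0s, 2.5s: only 0.0, 1.0 and 2.5 get through
throttle = FrameThrottle(interval=1.0)
passed = [t for t in [0.0, 0.5, 1.0, 2.5] if throttle.allow(t)]
# passed == [0.0, 1.0, 2.5]
```

Raising the interval lowers bandwidth and token usage at the cost of responsiveness to visual changes; 1 FPS is a reasonable starting point for conversational use.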