livekit-plugins-google


- Name: livekit-plugins-google
- Version: 0.11.3
- Home page: https://github.com/livekit/agents
- Summary: Agent Framework plugin for services from Google Cloud
- Upload time: 2025-04-07 13:45:17
- Requires Python: >=3.9.0
- License: Apache-2.0
- Keywords: webrtc, realtime, audio, video, livekit

# LiveKit Plugins Google

Agent Framework plugin for services from Google Cloud. It supports Google's [Speech-to-Text](https://cloud.google.com/speech-to-text) and Text-to-Speech APIs, as well as Gemini Multimodal Live.

## Installation

```bash
pip install livekit-plugins-google
```

## Pre-requisites

You need a Google Cloud account and valid credentials for it. Credentials can be passed directly or supplied through Application Default Credentials, as described in [How Application Default Credentials works](https://cloud.google.com/docs/authentication/application-default-credentials).
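
When relying on Application Default Credentials, it can help to check which credential file would be picked up before starting an agent. The following is a minimal sketch using only the standard library. It mirrors the first two steps of the documented ADC lookup order (the `GOOGLE_APPLICATION_CREDENTIALS` environment variable, then the user credential file written by the gcloud CLI) and ignores the attached-service-account step. `find_adc_file` is a hypothetical helper, not part of this plugin, and the well-known path shown is the Linux/macOS location.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def find_adc_file(
    env: Mapping[str, str] = os.environ,
    home: Optional[Path] = None,
) -> Optional[Path]:
    """Return the credential file ADC would likely use, or None.

    Hypothetical diagnostic helper: it only checks that a file exists
    and does not load or validate the credentials.
    """
    # 1. An explicit path in GOOGLE_APPLICATION_CREDENTIALS takes priority.
    #    A path that does not point to a file is reported as None here.
    explicit = env.get("GOOGLE_APPLICATION_CREDENTIALS")
    if explicit:
        path = Path(explicit)
        return path if path.is_file() else None

    # 2. User credentials created by `gcloud auth application-default login`
    #    (assumed Linux/macOS location).
    base = home if home is not None else Path.home()
    well_known = base / ".config" / "gcloud" / "application_default_credentials.json"
    return well_known if well_known.is_file() else None
```

Calling `find_adc_file()` with no arguments inspects the current environment. A result of `None` suggests passing credentials directly instead, unless the code runs on Google Cloud with an attached service account.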

To use the STT and TTS APIs, enable the corresponding services in your Google Cloud project:

- Cloud Speech-to-Text API
- Cloud Text-to-Speech API


## Gemini Multimodal Live

Gemini Multimodal Live can be used with the `MultimodalAgent` class. See `examples/multimodal_agent/gemini_agent.py` for an example.

### Live Video Input (experimental)

You can push video frames to your Gemini Multimodal Live session alongside the audio, which the `MultimodalAgent` handles automatically. The basic approach is to subscribe to the video track, create a video stream, sample frames at a suitable frame rate, and push them into the `RealtimeSession`:

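```python
import asyncio

from livekit.agents import AutoSubscribe
from livekit.agents.multimodal import MultimodalAgent
from livekit.plugins import google
from livekit.rtc import Track, TrackKind, VideoStream

# The code below runs inside your agent entrypoint, where `ctx` is the job context.

# Make sure you subscribe to both audio and video tracks
await ctx.connect(auto_subscribe=AutoSubscribe.SUBSCRIBE_ALL)

# Create your RealtimeModel and keep a reference to it
model = google.beta.realtime.RealtimeModel(
    # ...
)

# Create your MultimodalAgent as usual
agent = MultimodalAgent(
    model=model,
    # ...
)

# Read frames from the video track and push them to Gemini
async def _process_video_track(track: Track):
    video_stream = VideoStream(track)
    last_frame_time = 0.0
    try:
        async for event in video_stream:
            current_time = asyncio.get_running_loop().time()

            # Sample at 1 FPS: skip frames that arrive less than 1 s after the last one sent
            if current_time - last_frame_time < 1.0:
                continue

            last_frame_time = current_time

            # Push the frame into the RealtimeSession
            # (assumes the agent has already started a session)
            model.sessions[0].push_video(event.frame)
    finally:
        # Close the stream even if the loop exits with an error
        await video_stream.aclose()

# Keep references to running tasks so they are not garbage-collected
video_tasks = set()

# Subscribe to new tracks and process them
@ctx.room.on("track_subscribed")
def _on_track_subscribed(track: Track, pub, participant):
    if track.kind == TrackKind.KIND_VIDEO:
        task = asyncio.create_task(_process_video_track(track))
        video_tasks.add(task)
        task.add_done_callback(video_tasks.discard)
```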




            
