# VLMS - Video Intelligence SDK
**Event-based video intelligence with 98% cost reduction**
Multi-source video processing SDK with intelligent frame selection, motion tracking, and VLM-powered analysis. Built for production use with RTSP, ONVIF, UDP, and WebRTC sources, with more connectors on the way.
> **Note:** `pip install vlm-sdk` installs the SDK components (connectors, preprocessors, providers). The FastAPI service depends on additional packages; install them separately if you plan to run the API.
[Python 3.10+](https://www.python.org/downloads/)
[License: Apache-2.0](https://www.apache.org/licenses/LICENSE-2.0)
---
## 🌟 Features
### Core SDK (`vlm`)
- **🎯 Event-based processing**: Only analyze frames with motion/activity (98% cost reduction vs frame-by-frame)
- **📹 Multi-source connectors**: RTSP, ONVIF, UDP, WebRTC, File
- **🤖 RT-DETR + ByteTrack**: Real-time object detection and motion tracking
- **🧠 Provider-agnostic VLM**: Gemini, OpenAI, Claude (via env config)
- **🎨 Advanced analysis**: Timestamps, object detection, bounding boxes, range queries
### Production API (`api`)
- **⚡ FastAPI REST API**: Multi-stream video intelligence over a standard REST interface
- **📡 Server-Sent Events (SSE)**: Real-time event streaming
- **🔐 Authentication**: API key-based auth with rate limiting
- **📊 Monitoring**: Health checks, metrics, stream management
- **🔧 Configurable**: Environment-based provider selection
---
## 🚀 Quick Start
### Installation
```bash
# Install from PyPI
pip install vlm-sdk
# Or install from source
git clone https://github.com/observee-ai/vlm-sdk.git
cd vlm-sdk
pip install -e .
```
### SDK Usage
```python
from vlm.preprocessors import DetectorPreprocessor
from vlm.connectors import RTSPConnector
from vlm.providers.gemini import GeminiVideoService
import asyncio
# Initialize components
connector = RTSPConnector("rtsp://camera.local/stream1")
preprocessor = DetectorPreprocessor(
    confidence_threshold=0.6,
    track_objects=["person", "car"],
    min_duration=2.0  # Only events longer than 2 seconds
)
gemini = GeminiVideoService(api_key="your-gemini-key")
# Process stream
async def process():
    for frame in connector.stream_frames():
        result = preprocessor.process_frame(frame.data, frame.timestamp)

        if result['status'] == 'completed':
            # Event detected! Analyze with VLM
            upload = await gemini.upload_file(result['clip_path'])
            analysis = await gemini.query_video_with_file(
                upload['name'],
                "Describe the activity in this video"
            )
            print(f"Analysis: {analysis['response']}")
asyncio.run(process())
```
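The same loop also works on recorded footage via the file connector listed under Features. The sketch below is illustrative only: the `FileConnector` class name and its constructor argument are assumptions, so check the package for the exact import and signature.
```python
import asyncio

from vlm.preprocessors import DetectorPreprocessor
from vlm.providers.gemini import GeminiVideoService
# Assumed import: the features list mentions a File connector, but the exact
# class name and module path may differ in your installed version.
from vlm.connectors import FileConnector

connector = FileConnector("recordings/backyard.mp4")  # assumed constructor signature
preprocessor = DetectorPreprocessor(
    confidence_threshold=0.6,
    track_objects=["person"],
    min_duration=2.0,
)
gemini = GeminiVideoService(api_key="your-gemini-key")

async def process_file():
    # Same event-based pattern as the RTSP example above
    for frame in connector.stream_frames():
        result = preprocessor.process_frame(frame.data, frame.timestamp)
        if result['status'] == 'completed':
            upload = await gemini.upload_file(result['clip_path'])
            analysis = await gemini.query_video_with_file(
                upload['name'],
                "Summarize what happens in this clip"
            )
            print(analysis['response'])

asyncio.run(process_file())
```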
### API Server
```bash
# Set environment variables
export ADMIN_API_KEY=your-secret-key
export GEMINI_API_KEY=your-gemini-key
export VLM_PROVIDER=gemini # or openai, anthropic
# Install SDK (from repo checkout)
pip install -e .
# Install API dependencies (required for running api.main)
pip install fastapi "uvicorn[standard]" pydantic python-dotenv
# or install everything we ship in Docker
pip install -r requirements.txt
# Run server
python -m api.main
# Server starts at http://localhost:8000
```
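Once the server is running, a quick readiness check from Python can confirm it is reachable. This is a minimal sketch against the health endpoint listed under API Endpoints below; whether that endpoint requires the admin key is not documented here, so the header is included as a cautious assumption.
```python
import requests

BASE_URL = "http://localhost:8000"

resp = requests.get(
    f"{BASE_URL}/v1/streams/health",
    headers={"X-Admin-API-Key": "your-secret-key"},  # may not be required for health checks
    timeout=5,
)
print(resp.status_code, resp.text)
```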
**Create a stream:**
```bash
curl -X POST http://localhost:8000/v1/streams/create \
-H "X-Admin-API-Key: your-secret-key" \
-H "X-VLM-API-Key: your-gemini-key" \
-H "Content-Type: application/json" \
-d '{
"source_type": "rtsp",
"source_url": "rtsp://camera.local/stream1",
"config": {
"username": "admin",
"password": "password",
"profile": "security",
"min_duration": 2.0
},
"analysis": {
"enabled": true,
"mode": "basic",
"prompt": "Describe any activity or movement"
}
}'
```
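The same request from Python, mirroring the curl call above. The response schema is not shown in this README, so the sketch simply prints whatever JSON the server returns (it is expected to include the new stream's id).
```python
import requests

payload = {
    "source_type": "rtsp",
    "source_url": "rtsp://camera.local/stream1",
    "config": {
        "username": "admin",
        "password": "password",
        "profile": "security",
        "min_duration": 2.0,
    },
    "analysis": {
        "enabled": True,
        "mode": "basic",
        "prompt": "Describe any activity or movement",
    },
}

resp = requests.post(
    "http://localhost:8000/v1/streams/create",
    headers={
        "X-Admin-API-Key": "your-secret-key",
        "X-VLM-API-Key": "your-gemini-key",
    },
    json=payload,  # also sets Content-Type: application/json
    timeout=30,
)
resp.raise_for_status()
print(resp.json())
```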
**Listen to events (SSE):**
```bash
curl -N http://localhost:8000/v1/streams/{stream_id}/events \
-H "X-Admin-API-Key: your-secret-key"
```
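To consume the event stream from Python instead of curl, a minimal SSE reader looks like the sketch below. Event payloads are printed as raw `data:` lines because their exact format is not documented in this README.
```python
import requests

stream_id = "your-stream-id"  # returned by the create call
url = f"http://localhost:8000/v1/streams/{stream_id}/events"

with requests.get(
    url,
    headers={"X-Admin-API-Key": "your-secret-key"},
    stream=True,        # keep the HTTP connection open
    timeout=(5, None),  # connect timeout only; SSE reads stay open indefinitely
) as resp:
    resp.raise_for_status()
    for line in resp.iter_lines(decode_unicode=True):
        if line and line.startswith("data:"):
            print(line[len("data:"):].strip())
```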
---
## 📖 Documentation
### Environment Variables
```bash
# Required
ADMIN_API_KEY=your-admin-key # API authentication
# VLM Provider (choose one)
VLM_PROVIDER=gemini # gemini, openai, or anthropic
GEMINI_API_KEY=your-gemini-key # If using Gemini
OPENAI_API_KEY=your-openai-key # If using OpenAI
ANTHROPIC_API_KEY=your-anthropic-key # If using Claude
# Optional: Rate Limiting
RATE_LIMIT_REQUESTS=100 # Requests per window
RATE_LIMIT_WINDOW=60 # Time window (seconds)
```
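A small preflight check can catch misconfiguration before the server starts. This sketch only illustrates the relationships between the variables above; it is not code from the `api` package.
```python
import os

provider = os.getenv("VLM_PROVIDER", "gemini").lower()

# Map each provider to the API-key variable it needs (per the list above).
required_key = {
    "gemini": "GEMINI_API_KEY",
    "openai": "OPENAI_API_KEY",
    "anthropic": "ANTHROPIC_API_KEY",
}.get(provider)

if required_key is None:
    raise SystemExit(f"Unsupported VLM_PROVIDER: {provider!r}")
if not os.getenv(required_key):
    raise SystemExit(f"{required_key} must be set when VLM_PROVIDER={provider}")
if not os.getenv("ADMIN_API_KEY"):
    raise SystemExit("ADMIN_API_KEY is required for API authentication")

print(f"Environment looks valid for provider {provider!r}")
```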
### Analysis Modes
**Basic** - Simple video description
```json
{
  "analysis": {
    "mode": "basic",
    "prompt": "Describe the activity"
  }
}
```
**Timestamps** - Find specific moments
```json
{
  "analysis": {
    "mode": "timestamps",
    "find_timestamps": {
      "query": "when does someone wave",
      "find_all": true,
      "confidence_threshold": 0.7
    }
  }
}
```
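To use timestamps mode end to end, place the block above in the `analysis` field of the create request. A hedged Python sketch, reusing the create endpoint and headers from Quick Start:
```python
import requests

payload = {
    "source_type": "rtsp",
    "source_url": "rtsp://camera.local/stream1",
    "config": {"min_duration": 2.0},
    "analysis": {
        "enabled": True,
        "mode": "timestamps",
        "find_timestamps": {
            "query": "when does someone wave",
            "find_all": True,
            "confidence_threshold": 0.7,
        },
    },
}

resp = requests.post(
    "http://localhost:8000/v1/streams/create",
    headers={
        "X-Admin-API-Key": "your-secret-key",
        "X-VLM-API-Key": "your-gemini-key",
    },
    json=payload,
    timeout=30,
)
print(resp.json())
```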
### Supported Connectors
| Connector | Description | Config |
|-----------|-------------|--------|
| **RTSP** | IP camera streams | `username`, `password`, `transport` (tcp/udp) |
| **ONVIF** | Auto-discovery + PTZ | `username`, `password`, `profile_index` |
| **UDP** | UDP video receiver | `host`, `port`, `buffer_size` |
| **WebRTC** | Browser streams | `signaling_url`, `ice_servers` |
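The keys in the Config column map onto the `config` object of the create request. The sketch below shows plausible `config` blocks assembled from that column; all values are placeholders, and which fields are required is not specified here.
```python
# Placeholder config blocks built from the table above; values are examples only.
rtsp_config = {"username": "admin", "password": "password", "transport": "tcp"}
onvif_config = {"username": "admin", "password": "password", "profile_index": 0}
udp_config = {"host": "0.0.0.0", "port": 5000, "buffer_size": 65536}
webrtc_config = {
    "signaling_url": "wss://signaling.example.com/ws",
    "ice_servers": [{"urls": "stun:stun.l.google.com:19302"}],
}
```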
### API Endpoints
```
POST   /v1/streams/create            Create stream
GET    /v1/streams/{id}/events       SSE event stream
GET    /v1/streams/{id}              Get status
DELETE /v1/streams/{id}              Stop stream
GET    /v1/streams                   List all streams
GET    /v1/streams/discover/onvif    Discover cameras
GET    /v1/streams/health            Health check
```
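Routine stream management from Python, using the endpoints above. Only the admin-key header is assumed to be required for these calls.
```python
import requests

BASE_URL = "http://localhost:8000"
HEADERS = {"X-Admin-API-Key": "your-secret-key"}

def list_streams():
    return requests.get(f"{BASE_URL}/v1/streams", headers=HEADERS, timeout=10).json()

def get_status(stream_id: str):
    return requests.get(f"{BASE_URL}/v1/streams/{stream_id}", headers=HEADERS, timeout=10).json()

def stop_stream(stream_id: str):
    return requests.delete(f"{BASE_URL}/v1/streams/{stream_id}", headers=HEADERS, timeout=10).json()

print(list_streams())
```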
---
## 🏗️ Architecture
```
┌─────────────┐
│  Connector  │ (RTSP/ONVIF/UDP/WebRTC/File)
└──────┬──────┘
       │ Frames
       ▼
┌─────────────┐
│   RT-DETR   │ (Object detection + motion tracking)
└──────┬──────┘
       │ Events (only motion/activity)
       ▼
┌─────────────┐
│ Event Buffer│ (Collects frames during events)
└──────┬──────┘
       │ Complete Events
       ├──────────────┐
       │              │
       ▼              ▼
┌───────────┐    ┌──────────┐
│  Storage  │    │   VLM    │ (Gemini/Qwen/Observee VLM)
└───────────┘    └────┬─────┘
                      │
                      ▼
              ┌───────────────┐
              │ SSE / Webhooks│
              └───────────────┘
```
**Key Innovation**: Event-based processing analyzes only frames with detected motion/activity, reducing VLM API calls by 98% compared to frame-by-frame analysis.
---
## 🔧 Development
```bash
# Clone repository
git clone https://github.com/observee-ai/vlm-sdk.git
cd vlm-sdk
# Install with dev dependencies
pip install -e ".[dev]"
# Include API stack if you plan to run the server locally
pip install -r requirements.txt
# Run tests
pytest tests/
# Format code
black vlm/ api/
ruff check vlm/ api/
# Run API server (development)
uvicorn api.main:app --reload
```
---
## 🎯 Use Cases
- **🏢 Security & Surveillance**: 24/7 perimeter monitoring with motion alerts
- **🏪 Retail Analytics**: Customer counting, queue analysis, behavior tracking
- **🚗 Traffic Monitoring**: Vehicle counting, flow analysis, incident detection
- **🏠 Smart Home**: Activity monitoring, intrusion detection
- **🏭 Industrial**: Safety compliance, equipment monitoring
---
## 📊 Cost Comparison
| Approach | Frames/Hour | VLM API Calls | Cost Reduction |
|----------|-------------|---------------|----------------|
| **Frame-by-frame** | 54,000 (15 FPS) | 54,000 | Baseline |
| **Event-based (VLMS)** | 54,000 | ~1,000 | **98%** ✅ |
*Example: 1-hour 15 FPS stream with 5-10 motion events*
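The numbers in the table follow directly from the frame rate. A quick back-of-envelope check, treating the ~1,000 event-based calls as given by the scenario above:
```python
fps = 15
seconds_per_hour = 3600

frames_per_hour = fps * seconds_per_hour   # 54,000 frames at 15 FPS
event_based_calls = 1_000                  # ~1,000 VLM calls from the table

reduction = 1 - event_based_calls / frames_per_hour
print(frames_per_hour)                     # 54000
print(f"cost reduction ≈ {reduction:.1%}") # ≈ 98.1%
```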
---
## 🤝 Contributing
Contributions welcome! Please see [CONTRIBUTING.md](CONTRIBUTING.md) for guidelines.
1. Fork the repository
2. Create a feature branch (`git checkout -b feature/amazing-feature`)
3. Commit your changes (`git commit -m 'Add amazing feature'`)
4. Push to the branch (`git push origin feature/amazing-feature`)
5. Open a Pull Request
---
## 📄 License
**Apache-2.0** – Permissive license suitable for commercial and open-source use.
See [LICENSE](LICENSE) for the complete text. Commercial support is available on request.
---
## 🙏 Acknowledgments
- **Ultralytics RT-DETR**: Object detection and tracking
- **FastAPI**: Modern Python web framework
- **Google Gemini**: Video understanding API
- **ByteTrack**: Multi-object tracking algorithm
---
**Built with ❤️ for efficient video intelligence in SF**