# 🎬 FrameWise
*AI-powered video search and Q&A for tutorial content*
[Python 3.9+](https://www.python.org/downloads/)
[License: MIT](https://opensource.org/licenses/MIT)
Transform tutorial videos into searchable, interactive content. Find exact moments by meaning, not just keywords.
## Quick Start
```bash
pip install framewise
# Or install with LLM features
pip install framewise[llm]
```
### Basic Usage
```python
from framewise import TranscriptExtractor, FrameExtractor, FrameWiseEmbedder, FrameWiseVectorStore
# 1. Process video
transcript = TranscriptExtractor().extract("tutorial.mp4")
frames = FrameExtractor().extract("tutorial.mp4", transcript)
# 2. Create searchable index
embedder = FrameWiseEmbedder()
embeddings = embedder.embed_frames_batch(frames)
store = FrameWiseVectorStore()
store.create_table(embeddings)
# 3. Search
results = store.search_by_text("How do I export?", embedder, limit=3)
for r in results:
    print(f"{r['timestamp']}s: {r['text']}")
```
### With LLM Q&A (Optional)
```python
from framewise import FrameWiseQA
# Requires: pip install framewise[llm]
# Set ANTHROPIC_API_KEY environment variable
qa = FrameWiseQA(vector_store=store, embedder=embedder)
response = qa.ask("How do I get started?")
print(response['answer']) # Natural language answer with frame references
```
## Features
- 🎙️ **Transcript Extraction** - Whisper-powered speech-to-text
- 🖼️ **Smart Frame Extraction** - Captures key visual moments
- 🧠 **Multimodal Embeddings** - CLIP + Sentence Transformers
- 🔍 **Semantic Search** - Find by meaning, not keywords
- 🤖 **LLM Q&A** - Optional Claude integration (requires API key)
## How It Works
```
Video → Transcript (Whisper) → Frames (OpenCV) → Embeddings (CLIP) → Search (LanceDB) → [Optional] Q&A (Claude)
```
**Core Pipeline** (no API keys needed):
1. Extract audio transcript with timestamps
2. Capture key frames at important moments
3. Generate multimodal embeddings
4. Search by semantic similarity
**Optional LLM Layer**:
- Add natural language Q&A with Claude
- Requires `pip install framewise[llm]` and API key
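The steps above map one-to-one onto the classes from Basic Usage. As a compact reference, here is a minimal sketch that wraps the core pipeline into a single helper; the helper name `index_video` is ours, not part of the library:

```python
from framewise import TranscriptExtractor, FrameExtractor, FrameWiseEmbedder, FrameWiseVectorStore

def index_video(video_path):
    """Run the core pipeline (steps 1-4) on a single video."""
    transcript = TranscriptExtractor().extract(video_path)     # 1. transcript with timestamps
    frames = FrameExtractor().extract(video_path, transcript)  # 2. key frames
    embedder = FrameWiseEmbedder()
    embeddings = embedder.embed_frames_batch(frames)           # 3. multimodal embeddings
    store = FrameWiseVectorStore()
    store.create_table(embeddings)                             # 4. searchable index
    return store, embedder

store, embedder = index_video("tutorial.mp4")
results = store.search_by_text("How do I export?", embedder, limit=3)
```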
## Installation
### Requirements
- Python 3.9–3.13
- ffmpeg (for video processing)
```bash
# macOS
brew install ffmpeg
# Ubuntu/Debian
apt-get install ffmpeg
```
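If video processing fails, first check that ffmpeg is actually on your `PATH`. A quick check using only the standard library (not a FrameWise API):

```python
import shutil

# Fail early with a clear message if ffmpeg is missing
if shutil.which("ffmpeg") is None:
    raise RuntimeError("ffmpeg not found on PATH - install it before processing videos")
```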
### Install FrameWise
```bash
# Core features only
pip install framewise
# With LLM Q&A support
pip install framewise[llm]
# From source
git clone https://github.com/mesmaeili73/framewise.git
cd framewise
pip install -e .
```
## Configuration
### Frame Extraction
```python
extractor = FrameExtractor(
    strategy="hybrid",  # "scene", "transcript", or "hybrid"
    max_frames_per_video=20,
    scene_threshold=0.3,
    quality_threshold=0.5
)
```
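The configured extractor is used exactly like the default one in Basic Usage; a short sketch (assuming, as `embed_frames_batch` implies, that `extract` returns a list-like collection of frames):

```python
from framewise import TranscriptExtractor

transcript = TranscriptExtractor().extract("tutorial.mp4")
frames = extractor.extract("tutorial.mp4", transcript)  # extractor from the snippet above
print(f"Kept {len(frames)} frames")
```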
### Embeddings
```python
embedder = FrameWiseEmbedder(
    text_model="all-MiniLM-L6-v2",
    vision_model="openai/clip-vit-base-patch32",
    device="cuda"  # or "cpu"
)
```
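The `device` argument follows the usual PyTorch convention, so it can be chosen at runtime. A small sketch, assuming PyTorch is available (it ships as a dependency of CLIP and Sentence Transformers) and that the model arguments keep their defaults as in Basic Usage:

```python
import torch
from framewise import FrameWiseEmbedder

# Prefer the GPU when one is visible, otherwise fall back to CPU
device = "cuda" if torch.cuda.is_available() else "cpu"
embedder = FrameWiseEmbedder(device=device)
```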
### LLM Q&A (Optional)
```python
# Set the ANTHROPIC_API_KEY environment variable, e.g. in your shell:
#   export ANTHROPIC_API_KEY=your_key_here

# Or pass the key directly
qa = FrameWiseQA(
    vector_store=store,
    embedder=embedder,
    model="claude-3-5-sonnet-20241022",
    api_key="your_key_here"
)
```
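If you prefer not to hard-code the key, a common pattern is to read it from the environment and fail early when it is missing. This is plain Python, nothing FrameWise-specific; `store` and `embedder` come from the earlier pipeline:

```python
import os
from framewise import FrameWiseQA

api_key = os.environ.get("ANTHROPIC_API_KEY")
if not api_key:
    raise RuntimeError("Set ANTHROPIC_API_KEY before using the LLM Q&A layer")

qa = FrameWiseQA(vector_store=store, embedder=embedder, api_key=api_key)
```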
## Current Limitations (V1)
- **Vector Store**: LanceDB only (local storage)
- **LLM Provider**: Claude/Anthropic only
- **Embedding Models**: Fixed (CLIP + Sentence Transformers)
Future versions will support multiple vector stores (Qdrant, Elasticsearch), LLM providers (OpenAI, VertexAI), and configurable embedding models.
## Examples
See the [`examples/`](examples/) directory for complete examples:
- `extract_transcript.py` - Basic transcript extraction
- `extract_frames.py` - Frame extraction strategies
- `complete_pipeline.py` - Full end-to-end workflow
## Use Cases
- **Product Teams**: Build AI assistants for tutorial libraries
- **EdTech**: Make educational videos searchable
- **Documentation**: Create interactive video knowledge bases
## Performance
For 50 videos (5 min each):
- **Processing**: ~15 min (GPU) to ~90 min (CPU)
- **Search**: <50ms per query
- **Q&A**: ~2-3 seconds (with LLM)
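These numbers depend heavily on hardware and model sizes, so measure on your own corpus. A minimal latency check against an existing index (`store` and `embedder` from Basic Usage; the queries are illustrative):

```python
import time

queries = ["How do I export?", "Where are the settings?", "How do I get started?"]
start = time.perf_counter()
for q in queries:
    store.search_by_text(q, embedder, limit=3)
avg_ms = (time.perf_counter() - start) * 1000 / len(queries)
print(f"average search latency: {avg_ms:.1f} ms")
```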
## Contributing
Contributions welcome! This is an open-source project.
## License
MIT License - see [LICENSE](LICENSE) file
## Built With
- [OpenAI Whisper](https://github.com/openai/whisper) - Speech recognition
- [CLIP](https://github.com/openai/CLIP) - Vision-language embeddings
- [Sentence Transformers](https://www.sbert.net/) - Text embeddings
- [LanceDB](https://lancedb.com/) - Vector database
- [LangChain](https://www.langchain.com/) - LLM orchestration
- [Anthropic Claude](https://www.anthropic.com/) - Language model
---
*FrameWise: See the right frame at the right time* 🎬