# 🎬 FrameWise
*AI-powered video search and Q&A for tutorial content*
[Python 3.9+](https://www.python.org/downloads/)
[License: MIT](https://opensource.org/licenses/MIT)
Transform tutorial videos into searchable, interactive content. Find exact moments by meaning, not just keywords.
## Quick Start
```bash
pip install framewise
# Or install with LLM features
pip install framewise[llm]
```
### Basic Usage
```python
from framewise import TranscriptExtractor, FrameExtractor, FrameWiseEmbedder, FrameWiseVectorStore
# 1. Process video
transcript = TranscriptExtractor().extract("tutorial.mp4")
frames = FrameExtractor().extract("tutorial.mp4", transcript)
# 2. Create searchable index
embedder = FrameWiseEmbedder()
embeddings = embedder.embed_frames_batch(frames)
store = FrameWiseVectorStore()
store.create_table(embeddings)
# 3. Search
results = store.search_by_text("How do I export?", embedder, limit=3)
for r in results:
    print(f"{r['timestamp']}s: {r['text']}")
```
### With LLM Q&A (Optional)
```python
from framewise import FrameWiseQA
# Requires: pip install framewise[llm]
# Set ANTHROPIC_API_KEY environment variable
qa = FrameWiseQA(vector_store=store, embedder=embedder)
response = qa.ask("How do I get started?")
print(response['answer']) # Natural language answer with frame references
```
## Features
- 🎙️ **Transcript Extraction** - Whisper-powered speech-to-text
- 🖼️ **Smart Frame Extraction** - Captures key visual moments
- 🧠 **Multimodal Embeddings** - CLIP + Sentence Transformers
- 🔍 **Semantic Search** - Find by meaning, not keywords
- 🤖 **LLM Q&A** - Optional Claude integration (requires API key)
## How It Works
```
Video → Transcript (Whisper) → Frames (OpenCV) → Embeddings (CLIP) → Search (LanceDB) → [Optional] Q&A (Claude)
```
**Core Pipeline** (no API keys needed):
1. Extract audio transcript with timestamps
2. Capture key frames at important moments
3. Generate multimodal embeddings
4. Search by semantic similarity
**Optional LLM Layer**:
- Add natural language Q&A with Claude
- Requires `pip install framewise[llm]` and API key
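The steps above map one-to-one onto the classes from Basic Usage. As a compact reference, here is a minimal sketch that wraps the core pipeline into a single helper; the helper name `index_video` is ours, not part of the library:

```python
from framewise import TranscriptExtractor, FrameExtractor, FrameWiseEmbedder, FrameWiseVectorStore

def index_video(video_path):
    """Run the core pipeline (steps 1-4) on a single video."""
    transcript = TranscriptExtractor().extract(video_path)     # 1. transcript with timestamps
    frames = FrameExtractor().extract(video_path, transcript)  # 2. key frames
    embedder = FrameWiseEmbedder()
    embeddings = embedder.embed_frames_batch(frames)           # 3. multimodal embeddings
    store = FrameWiseVectorStore()
    store.create_table(embeddings)                             # 4. searchable index
    return store, embedder

store, embedder = index_video("tutorial.mp4")
results = store.search_by_text("How do I export?", embedder, limit=3)
```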
## Installation
### Requirements
- Python 3.9–3.13
- ffmpeg (for video processing)
```bash
# macOS
brew install ffmpeg
# Ubuntu/Debian
apt-get install ffmpeg
```
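If video processing fails, first check that ffmpeg is actually on your `PATH`. A quick check using only the standard library (not a FrameWise API):

```python
import shutil

# Fail early with a clear message if ffmpeg is missing
if shutil.which("ffmpeg") is None:
    raise RuntimeError("ffmpeg not found on PATH - install it before processing videos")
```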
### Install FrameWise
```bash
# Core features only
pip install framewise
# With LLM Q&A support
pip install framewise[llm]
# From source
git clone https://github.com/mesmaeili73/framewise.git
cd framewise
pip install -e .
```
## Configuration
### Frame Extraction
```python
extractor = FrameExtractor(
    strategy="hybrid",  # "scene", "transcript", or "hybrid"
    max_frames_per_video=20,
    scene_threshold=0.3,
    quality_threshold=0.5
)
```
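The configured extractor is used exactly like the default one in Basic Usage; a short sketch (assuming, as `embed_frames_batch` implies, that `extract` returns a list-like collection of frames):

```python
from framewise import TranscriptExtractor

transcript = TranscriptExtractor().extract("tutorial.mp4")
frames = extractor.extract("tutorial.mp4", transcript)  # extractor from the snippet above
print(f"Kept {len(frames)} frames")
```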
### Embeddings
```python
embedder = FrameWiseEmbedder(
    text_model="all-MiniLM-L6-v2",
    vision_model="openai/clip-vit-base-patch32",
    device="cuda"  # or "cpu"
)
```
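The `device` argument follows the usual PyTorch convention, so it can be chosen at runtime. A small sketch, assuming PyTorch is available (it ships as a dependency of CLIP and Sentence Transformers) and that the model arguments keep their defaults as in Basic Usage:

```python
import torch
from framewise import FrameWiseEmbedder

# Prefer the GPU when one is visible, otherwise fall back to CPU
device = "cuda" if torch.cuda.is_available() else "cpu"
embedder = FrameWiseEmbedder(device=device)
```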
### LLM Q&A (Optional)
```python
# Set the ANTHROPIC_API_KEY environment variable, e.g. in your shell:
#   export ANTHROPIC_API_KEY=your_key_here

# Or pass the key directly
qa = FrameWiseQA(
    vector_store=store,
    embedder=embedder,
    model="claude-3-5-sonnet-20241022",
    api_key="your_key_here"
)
```
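If you prefer not to hard-code the key, a common pattern is to read it from the environment and fail early when it is missing. This is plain Python, nothing FrameWise-specific; `store` and `embedder` come from the earlier pipeline:

```python
import os
from framewise import FrameWiseQA

api_key = os.environ.get("ANTHROPIC_API_KEY")
if not api_key:
    raise RuntimeError("Set ANTHROPIC_API_KEY before using the LLM Q&A layer")

qa = FrameWiseQA(vector_store=store, embedder=embedder, api_key=api_key)
```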
## Current Limitations (V1)
- **Vector Store**: LanceDB only (local storage)
- **LLM Provider**: Claude/Anthropic only
- **Embedding Models**: Fixed (CLIP + Sentence Transformers)
Future versions will support multiple vector stores (Qdrant, Elasticsearch), LLM providers (OpenAI, VertexAI), and configurable embedding models.
## Examples
See the [`examples/`](examples/) directory for complete examples:
- `extract_transcript.py` - Basic transcript extraction
- `extract_frames.py` - Frame extraction strategies
- `complete_pipeline.py` - Full end-to-end workflow
## Use Cases
- **Product Teams**: Build AI assistants for tutorial libraries
- **EdTech**: Make educational videos searchable
- **Documentation**: Create interactive video knowledge bases
## Performance
For 50 videos (5 min each):
- **Processing**: ~15 min (GPU) to ~90 min (CPU)
- **Search**: <50ms per query
- **Q&A**: ~2-3 seconds (with LLM)
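These numbers depend heavily on hardware and model sizes, so measure on your own corpus. A minimal latency check against an existing index (`store` and `embedder` from Basic Usage; the queries are illustrative):

```python
import time

queries = ["How do I export?", "Where are the settings?", "How do I get started?"]
start = time.perf_counter()
for q in queries:
    store.search_by_text(q, embedder, limit=3)
avg_ms = (time.perf_counter() - start) * 1000 / len(queries)
print(f"average search latency: {avg_ms:.1f} ms")
```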
## Contributing
Contributions welcome! This is an open-source project.
## License
MIT License - see [LICENSE](LICENSE) file
## Built With
- [OpenAI Whisper](https://github.com/openai/whisper) - Speech recognition
- [CLIP](https://github.com/openai/CLIP) - Vision-language embeddings
- [Sentence Transformers](https://www.sbert.net/) - Text embeddings
- [LanceDB](https://lancedb.com/) - Vector database
- [LangChain](https://www.langchain.com/) - LLM orchestration
- [Anthropic Claude](https://www.anthropic.com/) - Language model
---
*FrameWise: See the right frame at the right time* 🎬