VideoRAC

Name: VideoRAC
Version: 0.2.7
Summary: Video Retrieval-Augmented Chunking and Q&A Generation Toolkit
Upload time: 2025-11-03 05:44:37
Requires Python: >=3.9
License: CC BY 4.0
Keywords: video, rag, chunking, qa, clip, ssim, yt-dlp, multimodal_rag
Requirements: opencv-python, tqdm, yt-dlp, youtube-transcript-api, transformers, scikit-image, scipy, langchain, colorlog, torch
Homepage: https://prismaticlab.github.io/Video-RAC/
Repository: https://github.com/PrismaticLab/Video-RAC
Issues: https://github.com/PrismaticLab/Video-RAC/issues
            <div align="center">

# 🪄🎓 **VideoRAC**: *Retrieval-Adaptive Chunking for Lecture Video RAG*

</div>

<div align="center">

<img src="https://github.com/PrismaticLab/Video-RAC/blob/main/docs/assets/logo.png?raw=true" alt="VideoRAC Logo" width="300"/>

### ๐Ÿ›๏ธ *Official CSICC 2025 Implementation*

#### "Adaptive Chunking for VideoRAG Pipelines with a Newly Gathered Bilingual Educational Dataset"

*(Presented at the 30th International Computer Society of Iran Computer Conference โ€” CSICC 2025)*

[![Paper](https://img.shields.io/badge/Paper-CSICC%202025-blue)](https://ieeexplore.ieee.org/document/10967455)
[![Dataset](https://img.shields.io/badge/Dataset-EduViQA-orange)](https://huggingface.co/datasets/UIAIC/EduViQA)
[![Python](https://img.shields.io/badge/Python-3.9+-green.svg)](https://www.python.org/downloads/)
[![License: CC BY 4.0](https://img.shields.io/badge/License-CC%20BY%204.0-lightgrey.svg)](LICENSE)

</div>

---

## 📊 Project Pipeline

<div align="center">

<img src="https://github.com/PrismaticLab/Video-RAC/blob/main/docs/assets/fig-2.png?raw=true" alt="VideoRAC Pipeline" width="900"/>

</div>

---

## 📖 Overview

**VideoRAC** (Video Retrieval-Adaptive Chunking) provides a comprehensive framework for multimodal retrieval-augmented generation (RAG) in educational videos. This toolkit integrates **visual-semantic chunking**, **entropy-based keyframe selection**, and **LLM-driven question generation** to enable effective multimodal retrieval.

This repository is the **official implementation** of the CSICC 2025 paper by *Hemmat et al.*

> **Hemmat, A., Vadaei, K., Shirian, M., Heydari, M.H., Fatemi, A.**
> *“Adaptive Chunking for VideoRAG Pipelines with a Newly Gathered Bilingual Educational Dataset.”*
> *Proceedings of the 30th International Computer Society of Iran Computer Conference (CSICC 2025), University of Isfahan.*

---

## 🧠 Research Background

This framework underpins the **EduViQA bilingual dataset**, designed for evaluating lecture-based RAG systems in both Persian and English. The dataset and code form a unified ecosystem for multimodal question generation and retrieval evaluation.

**Key Contributions:**

* 🎥 Adaptive Hybrid Chunking — Combines CLIP cosine similarity with SSIM-based visual comparison (see the sketch after this list).
* 🧮 Entropy-Based Keyframe Selection — Extracts high-information frames for retrieval (also sketched below).
* 🗣️ Transcript–Frame Alignment — Synchronizes ASR transcripts with visual semantics.
* 🔍 Multimodal Retrieval — Integrates visual and textual embeddings for RAG.
* 🧠 Benchmark Dataset — 20 bilingual educational videos with 50 QA pairs each.
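
To make the first two contributions concrete, here is a minimal, hypothetical sketch of a hybrid CLIP+SSIM similarity score and an entropy-based frame score. The `hybrid_similarity` and `frame_entropy` helpers are illustrative names of our own, and the `alpha`/`threshold` defaults simply mirror the usage example below; VideoRAC's actual `HybridChunker` internals may differ.

```python
# Illustrative sketch only; not VideoRAC's internal implementation.
import numpy as np
import torch
import torch.nn.functional as F
from skimage.color import rgb2gray
from skimage.metrics import structural_similarity as ssim
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def hybrid_similarity(frame_a, frame_b, alpha=0.6):
    """Blend CLIP cosine similarity with SSIM for two HxWx3 uint8 RGB frames."""
    inputs = processor(images=[frame_a, frame_b], return_tensors="pt")
    with torch.no_grad():
        emb = model.get_image_features(**inputs)
    clip_sim = F.cosine_similarity(emb[0], emb[1], dim=0).item()
    ssim_val = ssim(rgb2gray(frame_a), rgb2gray(frame_b), data_range=1.0)
    return alpha * clip_sim + (1.0 - alpha) * ssim_val

def frame_entropy(frame):
    """Shannon entropy of the grayscale histogram; higher means more visual detail."""
    gray = (rgb2gray(frame) * 255).astype(np.uint8)
    hist, _ = np.histogram(gray, bins=256, range=(0, 256))
    p = hist[hist > 0] / hist.sum()
    return float(-np.sum(p * np.log2(p)))

# A chunk boundary falls where hybrid similarity to the previous sampled frame
# drops below a threshold; within a chunk, the highest-entropy frame is a
# natural keyframe candidate.
```

In the package these knobs surface as `HybridChunker`'s `alpha`, `threshold_embedding`, `threshold_ssim`, and `interval` parameters (see the usage example below).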

---

## โš™๏ธ Installation

```bash
pip install VideoRAC
```
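
Installing from PyPI pulls in the full dependency stack declared by the package, including `opencv-python`, `yt-dlp`, `youtube-transcript-api`, `transformers`, `scikit-image`, `scipy`, `langchain`, `colorlog`, `tqdm`, and `torch`, so expect a sizeable first-time download (mostly from `torch`).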

---

## 🚀 Usage Example

### 1️⃣ Hybrid Chunking

```python
from VideoRAC.Modules import HybridChunker

chunker = HybridChunker(
    clip_model='openai/clip-vit-base-patch32',
    alpha=0.6,
    threshold_embedding=0.85,
    threshold_ssim=0.8,
    interval=1,
)
chunks, timestamps, duration = chunker.chunk("lecture.mp4")
chunker.evaluate()
```
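
Reading the parameter names at face value (the package documentation is authoritative): `alpha` weights the CLIP embedding signal against SSIM in the hybrid score, `threshold_embedding` and `threshold_ssim` are the per-signal boundary cutoffs, and `interval` is the frame-sampling step. `chunk()` returns the detected chunks together with their timestamps and the overall video duration.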

### 2๏ธโƒฃ Q&A Generation

```python
from VideoRAC.Modules import VideoQAGenerator
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def my_llm_fn(messages):
    """Adapter: send chat messages to GPT-4o and return the reply text."""
    response = client.chat.completions.create(model="gpt-4o", messages=messages)
    return response.choices[0].message.content

urls = ["https://www.youtube.com/watch?v=2uYu8nMR5O4"]
qa = VideoQAGenerator(video_urls=urls, llm_fn=my_llm_fn)
qa.process_videos()
```
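
Note that `my_llm_fn` is just an adapter: judging from this example, `VideoQAGenerator` accepts any callable that takes a list of chat messages and returns the reply text, so a different provider or a local model can be swapped in. The OpenAI client shown here requires the `OPENAI_API_KEY` environment variable to be set.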

---

## 📈 Results Summary (CSICC 2025)

| Method                   | AR       | CR       | F        | Notes                        |
| ------------------------ | -------- | -------- | -------- | ---------------------------- |
| **VideoRAC (CLIP+SSIM)** | **0.87** | **0.82** | **0.91** | Best performance overall     |
| CLIP-only                | 0.80     | 0.75     | 0.83     | Weaker temporal segmentation |
| Simple Slicing           | 0.72     | 0.67     | 0.76     | Time-based only              |

> Evaluated using RAGAS metrics: *Answer Relevance (AR)*, *Context Relevance (CR)*, and *Faithfulness (F)*.

---

## 🧾 License

Licensed under **Creative Commons Attribution 4.0 International (CC BY 4.0)**.

You may share and adapt this work with attribution. Please cite our paper when using VideoRAC or EduViQA:

```bibtex
@INPROCEEDINGS{10967455,
  author={Hemmat, Arshia and Vadaei, Kianoosh and Shirian, Melika and Heydari, Mohammad Hassan and Fatemi, Afsaneh},
  booktitle={2025 29th International Computer Conference, Computer Society of Iran (CSICC)}, 
  title={Adaptive Chunking for VideoRAG Pipelines with a Newly Gathered Bilingual Educational Dataset}, 
  year={2025},
  volume={},
  number={},
  pages={1-7},
  keywords={Measurement;Visualization;Large language models;Pipelines;Retrieval augmented generation;Education;Question answering (information retrieval);Multilingual;Standards;Context modeling;Video QA;Datasets Preparation;Academic Question Answering;Multilingual},
  doi={10.1109/CSICC65765.2025.10967455}}
```

---

## 👥 Authors

**University of Isfahan — Department of Computer Engineering**

* **Kianoosh Vadaei** — [kia.vadaei@gmail.com](mailto:kia.vadaei@gmail.com)
* **Melika Shirian** — [mel.shirian@gmail.com](mailto:mel.shirian@gmail.com)
* **Arshia Hemmat** — [amirarshia.hemmat@kellogg.ox.ac.uk](mailto:amirarshia.hemmat@kellogg.ox.ac.uk)
* **Mohammad Hassan Heydari** — [heidary0081@gmail.com](mailto:heidary0081@gmail.com)
* **Afsaneh Fatemi** — [a.fatemi@eng.ui.ac.ir](mailto:a.fatemi@eng.ui.ac.ir)

---

<div align="center">

**⭐ Official CSICC 2025 Implementation — Give it a star if you use it in your research! ⭐**
*Made with ❤️ at University of Isfahan*

</div>

            
