<div align="center">
# 🪄🎓 **VideoRAC**: *Retrieval-Adaptive Chunking for Lecture Video RAG*
</div>
<div align="center">
<img src="https://github.com/PrismaticLab/Video-RAC/blob/main/docs/assets/logo.png?raw=true" alt="VideoRAC Logo" width="300"/>
### 🏛️ *Official CSICC 2025 Implementation*
#### "Adaptive Chunking for VideoRAG Pipelines with a Newly Gathered Bilingual Educational Dataset"
*(Presented at the 29th International Computer Conference, Computer Society of Iran — CSICC 2025)*
[📄 Paper (IEEE Xplore)](https://ieeexplore.ieee.org/document/10967455)
[🤗 EduViQA Dataset](https://huggingface.co/datasets/UIAIC/EduViQA)
[🐍 Python ≥ 3.9](https://www.python.org/downloads/)
[📜 License: CC BY 4.0](LICENSE)
</div>
---
## 📊 Project Pipeline
<div align="center">
<img src="https://github.com/PrismaticLab/Video-RAC/blob/main/docs/assets/fig-2.png?raw=true" alt="VideoRAC Pipeline" width="900"/>
</div>
---
## 📖 Overview
**VideoRAC** (Video Retrieval-Adaptive Chunking) provides a comprehensive framework for multimodal retrieval-augmented generation (RAG) in educational videos. This toolkit integrates **visual-semantic chunking**, **entropy-based keyframe selection**, and **LLM-driven question generation** to enable effective multimodal retrieval.
This repository is the **official implementation** of the CSICC 2025 paper by *Hemmat et al.*
> **Hemmat, A., Vadaei, K., Shirian, M., Heydari, M.H., Fatemi, A.**
> *"Adaptive Chunking for VideoRAG Pipelines with a Newly Gathered Bilingual Educational Dataset."*
> *Proceedings of the 29th International Computer Conference, Computer Society of Iran (CSICC 2025), University of Isfahan.*
---
## 🧠 Research Background
This framework underpins the **EduViQA bilingual dataset**, designed for evaluating lecture-based RAG systems in both Persian and English. The dataset and code form a unified ecosystem for multimodal question generation and retrieval evaluation.
**Key Contributions:**
* 🎥 Adaptive Hybrid Chunking — combines CLIP cosine similarity with SSIM-based visual comparison (sketched below).
* 🧮 Entropy-Based Keyframe Selection — extracts high-information frames for retrieval (see the sketch after this list).
* 🗣️ Transcript–Frame Alignment — synchronizes ASR transcripts with visual semantics.
* 🔍 Multimodal Retrieval — integrates visual and textual embeddings for RAG.
* 🧠 Benchmark Dataset — 20 bilingual educational videos with 50 QA pairs each.
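
The two core signals are easy to sketch. The snippet below is a minimal illustration of the ideas, not the package's internal API: consecutive frames are compared with a weighted mix of CLIP cosine similarity and SSIM, a chunk boundary is placed where the mixed score drops, and within each chunk the highest-entropy frame is kept as the keyframe. Function names and the exact combination rule are illustrative assumptions.

```python
# Illustrative sketch (not VideoRAC's internal API): hybrid CLIP+SSIM
# similarity for boundary detection and histogram entropy for keyframes.
import numpy as np
from skimage.metrics import structural_similarity as ssim


def hybrid_similarity(clip_a, clip_b, gray_a, gray_b, alpha=0.6):
    """Weighted mix of CLIP cosine similarity and SSIM for two frames."""
    cos = float(np.dot(clip_a, clip_b) /
                (np.linalg.norm(clip_a) * np.linalg.norm(clip_b)))
    return alpha * cos + (1 - alpha) * ssim(gray_a, gray_b)


def frame_entropy(gray):
    """Shannon entropy of the grayscale histogram; higher = more information."""
    hist, _ = np.histogram(gray, bins=256, range=(0, 256))
    p = hist[hist > 0] / hist.sum()
    return float(-np.sum(p * np.log2(p)))


# A chunk boundary falls where hybrid_similarity dips below a threshold;
# within each chunk, the frame maximizing frame_entropy is the keyframe.
```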
---
## ⚙️ Installation
```bash
pip install VideoRAC
```
---
## 🚀 Usage Example
### 1️⃣ Hybrid Chunking
```python
from VideoRAC.Modules import HybridChunker

chunker = HybridChunker(
    clip_model='openai/clip-vit-base-patch32',  # CLIP backbone for frame embeddings
    alpha=0.6,                   # weighting between CLIP and SSIM similarity
    threshold_embedding=0.85,    # chunk-boundary threshold on CLIP similarity
    threshold_ssim=0.8,          # chunk-boundary threshold on SSIM
    interval=1,                  # frame sampling interval
)

# Segment the lecture into visually coherent chunks.
chunks, timestamps, duration = chunker.chunk("lecture.mp4")
chunker.evaluate()
```
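
The exact return types are not documented here, so the inspection below rests on an assumption: that `timestamps` holds a `(start, end)` pair in seconds for each chunk.

```python
# Assumed shapes for illustration: `timestamps` as (start, end) seconds per chunk.
for i, (start, end) in enumerate(timestamps):
    print(f"chunk {i}: {start:.1f}s -> {end:.1f}s")
print(f"video duration: {duration:.1f}s")
```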
### 2️⃣ Q&A Generation
```python
from openai import OpenAI

from VideoRAC.Modules import VideoQAGenerator

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def my_llm_fn(messages):
    """Map a list of chat messages to the model's text reply."""
    response = client.chat.completions.create(model="gpt-4o", messages=messages)
    return response.choices[0].message.content


urls = ["https://www.youtube.com/watch?v=2uYu8nMR5O4"]
qa = VideoQAGenerator(video_urls=urls, llm_fn=my_llm_fn)
qa.process_videos()
```
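
Because `llm_fn` is just a callable from a chat-message list to a reply string, any backend can be plugged in. Below is a hedged sketch using the `transformers` chat pipeline (already a dependency of this package) instead of the OpenAI API; the model name is an arbitrary placeholder, not a project recommendation.

```python
# Sketch of a local llm_fn via the transformers chat pipeline; the model
# name is an arbitrary placeholder.
from transformers import pipeline

chat = pipeline("text-generation", model="Qwen/Qwen2.5-1.5B-Instruct")

def local_llm_fn(messages):
    # The pipeline accepts [{"role": ..., "content": ...}] messages and
    # returns the conversation with the assistant turn appended.
    out = chat(messages, max_new_tokens=512)
    return out[0]["generated_text"][-1]["content"]
```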
---
## 📈 Results Summary (CSICC 2025)
| Method | AR | CR | F | Notes |
| ------------------------ | -------- | -------- | -------- | ---------------------------- |
| **VideoRAC (CLIP+SSIM)** | **0.87** | **0.82** | **0.91** | Best performance overall |
| CLIP-only | 0.80 | 0.75 | 0.83 | Weaker temporal segmentation |
| Simple Slicing | 0.72 | 0.67 | 0.76 | Time-based only |
> Evaluated using RAGAS metrics: *Answer Relevance (AR)*, *Context Relevance (CR)*, and *Faithfulness (F)*.
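
For context, here is a minimal sketch of how such scores are produced with the `ragas` library. This assumes the ragas 0.x evaluation API (metric objects and dataset column names changed in later releases), so treat the imports and columns as assumptions.

```python
# Hedged sketch of a RAGAS evaluation, assuming the ragas 0.x API.
from datasets import Dataset
from ragas import evaluate
from ragas.metrics import answer_relevancy, context_relevancy, faithfulness

data = Dataset.from_dict({
    "question": ["What does adaptive chunking do?"],
    "answer": ["It places chunk boundaries where visual similarity drops."],
    "contexts": [["Adaptive chunking mixes CLIP and SSIM similarity ..."]],
})

print(evaluate(data, metrics=[answer_relevancy, context_relevancy, faithfulness]))
```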
---
## 🧾 License
Licensed under **Creative Commons Attribution 4.0 International (CC BY 4.0)**.
You may share and adapt this work with attribution. Please cite our paper when using VideoRAC or EduViQA:
```bibtex
@INPROCEEDINGS{10967455,
  author    = {Hemmat, Arshia and Vadaei, Kianoosh and Shirian, Melika and Heydari, Mohammad Hassan and Fatemi, Afsaneh},
  booktitle = {2025 29th International Computer Conference, Computer Society of Iran (CSICC)},
  title     = {Adaptive Chunking for VideoRAG Pipelines with a Newly Gathered Bilingual Educational Dataset},
  year      = {2025},
  pages     = {1-7},
  doi       = {10.1109/CSICC65765.2025.10967455}
}
```
---
## 👥 Authors
**University of Isfahan — Department of Computer Engineering**
* **Kianoosh Vadaei** — [kia.vadaei@gmail.com](mailto:kia.vadaei@gmail.com)
* **Melika Shirian** — [mel.shirian@gmail.com](mailto:mel.shirian@gmail.com)
* **Arshia Hemmat** — [amirarshia.hemmat@kellogg.ox.ac.uk](mailto:amirarshia.hemmat@kellogg.ox.ac.uk)
* **Mohammad Hassan Heydari** — [heidary0081@gmail.com](mailto:heidary0081@gmail.com)
* **Afsaneh Fatemi** — [a.fatemi@eng.ui.ac.ir](mailto:a.fatemi@eng.ui.ac.ir)
---
<div align="center">
**⭐ Official CSICC 2025 Implementation — Give it a star if you use it in your research! ⭐**
*Made with ❤️ at University of Isfahan*
</div>