# **Offline YouTube Video Search Application**
This application allows users to **extract transcripts from YouTube videos**, **upload their own video/audio files**, **create searchable vector databases**, and **perform semantic searches** using a **Gradio web interface** or **command-line interface (CLI)**. It's powered by `faster-whisper` for transcription, `FAISS` for vector search, and `sentence-transformers` for text embeddings.
---
## **Features**
- Extract transcripts from individual videos, playlists, and entire channels.
- **Upload your own video or audio files for processing.**
- Automatically detect playlists, channels, and individual video links.
- Automatically download video thumbnails.
- Store transcripts and create a searchable vector database.
- Perform semantic searches on video content.
- Supports **Gradio web interface** and **CLI** for flexible usage.
- Easily add more videos or your own files to the dataset.
---
## **Web Interface**
### **Add Videos Tab**
- **Enter playlist, channel, and/or video URLs (comma-separated).**
- **Upload your own video/audio files.**
- **Option to process entire channels when a channel URL is provided.**
- **Option to keep videos stored locally or not.**
<img width="628" alt="Screenshot 2024-11-01 at 11 14 22 AM" src="https://github.com/user-attachments/assets/00807fa3-ac86-4940-a72a-60fa267577d0">
### **Search Tab**
- **Enter your search query to find relevant snippets.**
- **View top relevant videos with thumbnails and play local videos if available.**
- **View detailed results with timestamps and direct links.**
<img width="635" alt="Screenshot 2024-11-01 at 11 18 01 AM" src="https://github.com/user-attachments/assets/c2c21482-dbf6-4515-b1ca-d1bf650e3c48">
<img width="635" alt="Screenshot 2024-11-01 at 12 05 34 PM" src="https://github.com/user-attachments/assets/eb881286-a827-410b-a484-641b78ea1e0e">
---
## **Installation**
![PyPI Downloads](https://static.pepy.tech/badge/offlineyoutube)
Ensure you have Python installed (>= 3.8). Then, install the required dependencies:
```bash
pip install offlineyoutube
```
---
## **Usage**
The app provides **two ways to interact**:
1. **Gradio Web Interface**
2. **Command-Line Interface (CLI)**
### **1. Running the Gradio Web Interface**
Launch the web interface:
```bash
offlineyoutube ui
```
or simply:
```bash
offlineyoutube
```
Then, open the URL (usually `http://127.0.0.1:7860`) in your browser.
#### **Gradio Interface Tabs:**
- **Add Videos:**
- Enter playlist URLs, channel URLs, and/or individual video URLs (comma-separated).
- **Upload your own video or audio files for processing.**
- **Option to process entire YouTube channels when a channel URL is provided.**
- **Option to keep videos stored locally or not.**
- The app will automatically detect whether each link is a playlist, channel, or a video.
- Videos and uploaded files will be transcribed, and the database will be updated with the content.
- **Search:**
- Enter search queries to find relevant snippets from the video transcripts.
- Results are ranked based on semantic similarity and include video thumbnails.
- **If local videos are available, you can play them directly in the interface.**
---
### **2. Command-Line Interface (CLI)**
The CLI provides more flexibility for programmatic use.
#### **Commands Overview**
Use the `--help` command to view available commands and examples:
```bash
offlineyoutube --help
```
**Output:**
```
usage: offlineyoutube [-h] {add,search,ui} ...
YouTube Video Search Application
positional arguments:
{add,search,ui} Available commands
add Add videos to the database
search Search the video database
ui Run the Gradio web interface
optional arguments:
-h, --help Show this help message and exit
Examples:
# Add videos from a playlist and keep videos locally
offlineyoutube add --input "https://www.youtube.com/playlist?list=YOUR_PLAYLIST_ID" --keep_videos
# Add specific videos without keeping videos locally
offlineyoutube add --input "https://www.youtube.com/watch?v=VIDEO_ID1,https://www.youtube.com/watch?v=VIDEO_ID2"
# Add videos from a channel (process entire channel)
offlineyoutube add --input "https://www.youtube.com/channel/CHANNEL_ID" --process_channel
# Search the database with a query
offlineyoutube search --query "Your search query" --top_k 5
# Run the Gradio web interface
offlineyoutube ui
```
---
### **Examples of CLI Usage**
#### **1. Adding Videos**
- **Add Playlists and Videos:**
```bash
offlineyoutube add --input "https://www.youtube.com/playlist?list=YOUR_PLAYLIST_ID,https://www.youtube.com/watch?v=VIDEO_ID"
```
- **Add Specific Videos Without Keeping Them Locally:**
```bash
offlineyoutube add --input "https://www.youtube.com/watch?v=dQw4w9WgXcQ,https://www.youtube.com/watch?v=9bZkp7q19f0"
```
- **Add Videos from a Channel (Process Entire Channel):**
```bash
offlineyoutube add --input "https://www.youtube.com/channel/CHANNEL_ID" --process_channel
```
- **Add Videos and Keep Videos Stored Locally:**
```bash
offlineyoutube add --input "https://www.youtube.com/watch?v=VIDEO_ID" --keep_videos
```
#### **2. Searching the Database**
- **Perform a Search:**
```bash
offlineyoutube search --query "machine learning tutorials" --top_k 5
```
---
### **How It Works**
1. **Adding Videos and Uploaded Files:**
- The app accepts a list of links and automatically detects whether each link is a playlist, channel, or an individual video.
- **You can upload your own video or audio files for processing.**
- It downloads video audio (or uses uploaded files) and transcribes it using `faster-whisper`.
- Thumbnails are downloaded and saved locally.
- The transcript data is saved in `datasets/transcript_dataset.csv`.
- A vector database is updated using FAISS with embeddings generated by `sentence-transformers`.
2. **Incremental Updating:**
- Videos and uploaded files are processed one by one, and the dataset and vector database are updated incrementally.
- This ensures efficient processing, especially when dealing with large datasets.
3. **Searching the Database:**
- When a query is entered, the app computes its embedding and searches the FAISS index for relevant video snippets.
- The top results are displayed with thumbnails, titles, and links to the videos.
- **If local videos are available, you can play them directly in the interface.**
---
### **FAQ**
#### **1. How do I add multiple playlists, channels, and videos at once?**
Simply provide a comma-separated list of URLs, and the app will automatically detect and process each link:
```bash
offlineyoutube add --input "https://www.youtube.com/playlist?list=PLAYLIST_ID1,https://www.youtube.com/watch?v=VIDEO_ID,https://www.youtube.com/channel/CHANNEL_ID"
```
If you want to process entire channels, make sure to include the `--process_channel` flag:
```bash
offlineyoutube add --input "https://www.youtube.com/channel/CHANNEL_ID" --process_channel
```
#### **2. How can I upload my own video or audio files for processing?**
In the Gradio web interface, navigate to the **Add Videos** tab. Use the **"Upload your own video/audio files"** option to upload one or multiple files. The app will process these files and add them to the database.
#### **3. Why aren’t new videos or uploaded files showing up in search results?**
Ensure that the videos or files have been fully processed and that the vector database has been updated. The app handles this automatically, but processing may take time for large videos, playlists, or channels.
#### **4. How do I prevent videos from being stored locally?**
By default, the app keeps videos stored locally. To change this behavior, use the `--keep_videos` flag and set it to `False`:
```bash
offlineyoutube add --input "VIDEO_OR_PLAYLIST_URL" --keep_videos False
```
In the Gradio interface, uncheck the **"Keep videos stored locally"** option in the **Add Videos** tab.
#### **5. Can I process entire YouTube channels?**
Yes! Use the `--process_channel` flag when adding videos via the CLI:
```bash
offlineyoutube add --input "https://www.youtube.com/channel/CHANNEL_ID" --process_channel
```
In the Gradio interface, check the **"Process entire channel when a channel URL is provided"** option in the **Add Videos** tab.
#### **6. Can I search the database without launching the Gradio interface?**
Yes! Use the `search` command via the CLI:
```bash
offlineyoutube search --query "Your query" --top_k 5
```
---
### **Project Structure**
```
.
├── app.py # Main application script (Gradio + CLI)
├── functions.py # Helper functions for transcription, FAISS, etc.
├── datasets/
│ ├── transcript_dataset.csv # CSV file storing transcripts
│ └── vector_index.faiss # FAISS vector index
├── thumbnails/ # Folder for storing video thumbnails
├── videos/ # Folder for storing downloaded videos (if keep_videos is True)
├── tmp/ # Temporary folder for videos (if keep_videos is False)
├── uploaded_files/ # Folder for storing uploaded files
```
---
### **Known Limitations**
- **Processing Time:** Transcribing videos and generating embeddings can be time-consuming, especially for long videos, large playlists, or channels.
- **Storage Requirements:** Keeping videos stored locally will require additional disk space. Use the `--keep_videos False` option if storage is a concern.
- **Large Datasets:** As the dataset grows, querying may take longer. Consider optimizing the FAISS index for very large datasets.
---
### **Contributing**
Feel free to fork the repository, open issues, or submit pull requests if you'd like to contribute to this project.
---
### **License**
This project is licensed under the MIT License. See the LICENSE file for details.
---
### **Acknowledgments**
- **faster-whisper** for fast transcription.
- **FAISS** for efficient vector search.
- **Gradio** for the interactive web interface.
- **yt-dlp** for downloading video content.
---
Raw data
{
"_id": null,
"home_page": "https://github.com/DrewThomasson/offlineYoutube",
"name": "offlineyoutube",
"maintainer": null,
"docs_url": null,
"requires_python": ">=3.8",
"maintainer_email": null,
"keywords": null,
"author": "Andrew Phillip Thomasson",
"author_email": "drew.thomasson100@gmail.com",
"download_url": "https://files.pythonhosted.org/packages/ff/92/a23d0267697d1a8024e0cbd3eaf5296e263979892e9c9e0444f69a540e44/offlineyoutube-2.1.9.tar.gz",
"platform": null,
"description": "# **Offline YouTube Video Search Application**\n\nThis application allows users to **extract transcripts from YouTube videos**, **upload their own video/audio files**, **create searchable vector databases**, and **perform semantic searches** using a **Gradio web interface** or **command-line interface (CLI)**. It's powered by `faster-whisper` for transcription, `FAISS` for vector search, and `sentence-transformers` for text embeddings.\n\n---\n\n## **Features**\n\n- Extract transcripts from individual videos, playlists, and entire channels.\n- **Upload your own video or audio files for processing.**\n- Automatically detect playlists, channels, and individual video links.\n- Automatically download video thumbnails.\n- Store transcripts and create a searchable vector database.\n- Perform semantic searches on video content.\n- Supports **Gradio web interface** and **CLI** for flexible usage.\n- Easily add more videos or your own files to the dataset.\n\n---\n\n## **Web Interface**\n\n### **Add Videos Tab**\n\n- **Enter playlist, channel, and/or video URLs (comma-separated).**\n- **Upload your own video/audio files.**\n- **Option to process entire channels when a channel URL is provided.**\n- **Option to keep videos stored locally or not.**\n\n<img width=\"628\" alt=\"Screenshot 2024-11-01 at 11 14 22\u202fAM\" src=\"https://github.com/user-attachments/assets/00807fa3-ac86-4940-a72a-60fa267577d0\">\n\n### **Search Tab**\n\n- **Enter your search query to find relevant snippets.**\n- **View top relevant videos with thumbnails and play local videos if available.**\n- **View detailed results with timestamps and direct links.**\n\n<img width=\"635\" alt=\"Screenshot 2024-11-01 at 11 18 01\u202fAM\" src=\"https://github.com/user-attachments/assets/c2c21482-dbf6-4515-b1ca-d1bf650e3c48\">\n<img width=\"635\" alt=\"Screenshot 2024-11-01 at 12 05 34\u202fPM\" src=\"https://github.com/user-attachments/assets/eb881286-a827-410b-a484-641b78ea1e0e\">\n\n---\n\n## **Installation**\n![PyPI Downloads](https://static.pepy.tech/badge/offlineyoutube)\n\nEnsure you have Python installed (>= 3.8). Then, install the required dependencies:\n\n```bash\npip install offlineyoutube\n```\n\n---\n\n## **Usage**\n\nThe app provides **two ways to interact**: \n1. **Gradio Web Interface** \n2. **Command-Line Interface (CLI)**\n\n### **1. Running the Gradio Web Interface**\n\nLaunch the web interface:\n\n```bash\nofflineyoutube ui\n```\n\nor simply:\n\n```bash\nofflineyoutube\n```\n\nThen, open the URL (usually `http://127.0.0.1:7860`) in your browser.\n\n#### **Gradio Interface Tabs:**\n\n- **Add Videos:** \n - Enter playlist URLs, channel URLs, and/or individual video URLs (comma-separated).\n - **Upload your own video or audio files for processing.**\n - **Option to process entire YouTube channels when a channel URL is provided.**\n - **Option to keep videos stored locally or not.**\n - The app will automatically detect whether each link is a playlist, channel, or a video.\n - Videos and uploaded files will be transcribed, and the database will be updated with the content.\n \n- **Search:** \n - Enter search queries to find relevant snippets from the video transcripts.\n - Results are ranked based on semantic similarity and include video thumbnails.\n - **If local videos are available, you can play them directly in the interface.**\n\n---\n\n### **2. Command-Line Interface (CLI)**\n\nThe CLI provides more flexibility for programmatic use.\n\n#### **Commands Overview**\n\nUse the `--help` command to view available commands and examples:\n\n```bash\nofflineyoutube --help\n```\n\n**Output:**\n\n```\nusage: offlineyoutube [-h] {add,search,ui} ...\n\nYouTube Video Search Application\n\npositional arguments:\n {add,search,ui} Available commands\n add Add videos to the database\n search Search the video database\n ui Run the Gradio web interface\n\noptional arguments:\n -h, --help Show this help message and exit\n\nExamples:\n # Add videos from a playlist and keep videos locally\n offlineyoutube add --input \"https://www.youtube.com/playlist?list=YOUR_PLAYLIST_ID\" --keep_videos\n\n # Add specific videos without keeping videos locally\n offlineyoutube add --input \"https://www.youtube.com/watch?v=VIDEO_ID1,https://www.youtube.com/watch?v=VIDEO_ID2\"\n\n # Add videos from a channel (process entire channel)\n offlineyoutube add --input \"https://www.youtube.com/channel/CHANNEL_ID\" --process_channel\n\n # Search the database with a query\n offlineyoutube search --query \"Your search query\" --top_k 5\n\n # Run the Gradio web interface\n offlineyoutube ui\n```\n\n---\n\n### **Examples of CLI Usage**\n\n#### **1. Adding Videos**\n\n- **Add Playlists and Videos:**\n\n ```bash\n offlineyoutube add --input \"https://www.youtube.com/playlist?list=YOUR_PLAYLIST_ID,https://www.youtube.com/watch?v=VIDEO_ID\"\n ```\n\n- **Add Specific Videos Without Keeping Them Locally:**\n\n ```bash\n offlineyoutube add --input \"https://www.youtube.com/watch?v=dQw4w9WgXcQ,https://www.youtube.com/watch?v=9bZkp7q19f0\"\n ```\n\n- **Add Videos from a Channel (Process Entire Channel):**\n\n ```bash\n offlineyoutube add --input \"https://www.youtube.com/channel/CHANNEL_ID\" --process_channel\n ```\n\n- **Add Videos and Keep Videos Stored Locally:**\n\n ```bash\n offlineyoutube add --input \"https://www.youtube.com/watch?v=VIDEO_ID\" --keep_videos\n ```\n\n#### **2. Searching the Database**\n\n- **Perform a Search:**\n\n ```bash\n offlineyoutube search --query \"machine learning tutorials\" --top_k 5\n ```\n\n---\n\n### **How It Works**\n\n1. **Adding Videos and Uploaded Files:**\n - The app accepts a list of links and automatically detects whether each link is a playlist, channel, or an individual video.\n - **You can upload your own video or audio files for processing.**\n - It downloads video audio (or uses uploaded files) and transcribes it using `faster-whisper`.\n - Thumbnails are downloaded and saved locally.\n - The transcript data is saved in `datasets/transcript_dataset.csv`.\n - A vector database is updated using FAISS with embeddings generated by `sentence-transformers`.\n\n2. **Incremental Updating:**\n - Videos and uploaded files are processed one by one, and the dataset and vector database are updated incrementally.\n - This ensures efficient processing, especially when dealing with large datasets.\n\n3. **Searching the Database:**\n - When a query is entered, the app computes its embedding and searches the FAISS index for relevant video snippets.\n - The top results are displayed with thumbnails, titles, and links to the videos.\n - **If local videos are available, you can play them directly in the interface.**\n\n---\n\n### **FAQ**\n\n#### **1. How do I add multiple playlists, channels, and videos at once?**\n\nSimply provide a comma-separated list of URLs, and the app will automatically detect and process each link:\n\n```bash\nofflineyoutube add --input \"https://www.youtube.com/playlist?list=PLAYLIST_ID1,https://www.youtube.com/watch?v=VIDEO_ID,https://www.youtube.com/channel/CHANNEL_ID\"\n```\n\nIf you want to process entire channels, make sure to include the `--process_channel` flag:\n\n```bash\nofflineyoutube add --input \"https://www.youtube.com/channel/CHANNEL_ID\" --process_channel\n```\n\n#### **2. How can I upload my own video or audio files for processing?**\n\nIn the Gradio web interface, navigate to the **Add Videos** tab. Use the **\"Upload your own video/audio files\"** option to upload one or multiple files. The app will process these files and add them to the database.\n\n#### **3. Why aren\u2019t new videos or uploaded files showing up in search results?**\n\nEnsure that the videos or files have been fully processed and that the vector database has been updated. The app handles this automatically, but processing may take time for large videos, playlists, or channels.\n\n#### **4. How do I prevent videos from being stored locally?**\n\nBy default, the app keeps videos stored locally. To change this behavior, use the `--keep_videos` flag and set it to `False`:\n\n```bash\nofflineyoutube add --input \"VIDEO_OR_PLAYLIST_URL\" --keep_videos False\n```\n\nIn the Gradio interface, uncheck the **\"Keep videos stored locally\"** option in the **Add Videos** tab.\n\n#### **5. Can I process entire YouTube channels?**\n\nYes! Use the `--process_channel` flag when adding videos via the CLI:\n\n```bash\nofflineyoutube add --input \"https://www.youtube.com/channel/CHANNEL_ID\" --process_channel\n```\n\nIn the Gradio interface, check the **\"Process entire channel when a channel URL is provided\"** option in the **Add Videos** tab.\n\n#### **6. Can I search the database without launching the Gradio interface?**\n\nYes! Use the `search` command via the CLI:\n\n```bash\nofflineyoutube search --query \"Your query\" --top_k 5\n```\n\n---\n\n### **Project Structure**\n\n```\n.\n\u251c\u2500\u2500 app.py # Main application script (Gradio + CLI)\n\u251c\u2500\u2500 functions.py # Helper functions for transcription, FAISS, etc.\n\u251c\u2500\u2500 datasets/\n\u2502 \u251c\u2500\u2500 transcript_dataset.csv # CSV file storing transcripts\n\u2502 \u2514\u2500\u2500 vector_index.faiss # FAISS vector index\n\u251c\u2500\u2500 thumbnails/ # Folder for storing video thumbnails\n\u251c\u2500\u2500 videos/ # Folder for storing downloaded videos (if keep_videos is True)\n\u251c\u2500\u2500 tmp/ # Temporary folder for videos (if keep_videos is False)\n\u251c\u2500\u2500 uploaded_files/ # Folder for storing uploaded files\n```\n\n---\n\n### **Known Limitations**\n\n- **Processing Time:** Transcribing videos and generating embeddings can be time-consuming, especially for long videos, large playlists, or channels.\n- **Storage Requirements:** Keeping videos stored locally will require additional disk space. Use the `--keep_videos False` option if storage is a concern.\n- **Large Datasets:** As the dataset grows, querying may take longer. Consider optimizing the FAISS index for very large datasets.\n\n---\n\n### **Contributing**\n\nFeel free to fork the repository, open issues, or submit pull requests if you'd like to contribute to this project.\n\n---\n\n### **License**\n\nThis project is licensed under the MIT License. See the LICENSE file for details.\n\n---\n\n### **Acknowledgments**\n\n- **faster-whisper** for fast transcription.\n- **FAISS** for efficient vector search.\n- **Gradio** for the interactive web interface.\n- **yt-dlp** for downloading video content.\n\n---\n",
"bugtrack_url": null,
"license": null,
"summary": "A YouTube video search and management tool with a Gradio interface",
"version": "2.1.9",
"project_urls": {
"Homepage": "https://github.com/DrewThomasson/offlineYoutube"
},
"split_keywords": [],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "10c51ce18004acedb7d4b48d3e8783d365c6e74d8d3cbc682c249d84e61dea3c",
"md5": "e2e472027c4e5c37306f9983eb354be5",
"sha256": "dfa45b1b1178f73731b4349ca0d08d5b5d29ebca1267c5797afa5ec1c61b0c89"
},
"downloads": -1,
"filename": "offlineyoutube-2.1.9-py3-none-any.whl",
"has_sig": false,
"md5_digest": "e2e472027c4e5c37306f9983eb354be5",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.8",
"size": 16494,
"upload_time": "2024-11-07T18:11:05",
"upload_time_iso_8601": "2024-11-07T18:11:05.278284Z",
"url": "https://files.pythonhosted.org/packages/10/c5/1ce18004acedb7d4b48d3e8783d365c6e74d8d3cbc682c249d84e61dea3c/offlineyoutube-2.1.9-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "ff92a23d0267697d1a8024e0cbd3eaf5296e263979892e9c9e0444f69a540e44",
"md5": "2d9bcb7d9b33a7bd21f6d8e805ab6c85",
"sha256": "8f793d4891c28db19149ca14a88a11de7313eebf5d39adfd8ae3614b33ab8e8b"
},
"downloads": -1,
"filename": "offlineyoutube-2.1.9.tar.gz",
"has_sig": false,
"md5_digest": "2d9bcb7d9b33a7bd21f6d8e805ab6c85",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.8",
"size": 18359,
"upload_time": "2024-11-07T18:11:06",
"upload_time_iso_8601": "2024-11-07T18:11:06.834626Z",
"url": "https://files.pythonhosted.org/packages/ff/92/a23d0267697d1a8024e0cbd3eaf5296e263979892e9c9e0444f69a540e44/offlineyoutube-2.1.9.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2024-11-07 18:11:06",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "DrewThomasson",
"github_project": "offlineYoutube",
"travis_ci": false,
"coveralls": false,
"github_actions": false,
"requirements": [],
"lcname": "offlineyoutube"
}