# YouTube Video to Text Markdown Converter
`yt-video-text-md` is a Python package that retrieves YouTube video transcripts/subtitles and saves them as Markdown files. It works on individual videos and on entire playlists. It uses `youtube-transcript-api` to extract existing subtitles directly and falls back to `whisper` for audio-to-text conversion when no transcript is available.
## Features
- **Playlist and Video Support:** Extracts subtitles from both individual videos and entire playlists.
- **Fallback Mechanism:** Utilizes `whisper` to transcribe audio if subtitles are not available.
- **Markdown Formatting:** Outputs transcripts in Markdown format with video titles as headers.
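To make the Markdown formatting feature concrete, here is a minimal sketch of how transcript segments could be rendered under a title header. It is not the package's actual implementation: the function name `transcript_to_markdown` is hypothetical, the use of a level-1 heading is an assumption, and the segment shape (dicts with a `text` key) mirrors what `youtube-transcript-api` 0.6.x returns from `get_transcript`.

```python
def transcript_to_markdown(title, segments):
    """Render a video title and transcript segments as one Markdown section.

    `segments` is assumed to be a list of dicts with a "text" key,
    similar to what youtube-transcript-api returns for a video.
    """
    # Normalize whitespace inside each segment and skip empty ones
    cleaned = (" ".join(seg["text"].split()) for seg in segments)
    body = " ".join(text for text in cleaned if text)
    # Heading level is an assumption; the README only says titles become headers
    return f"# {title}\n\n{body}\n"


if __name__ == "__main__":
    demo_segments = [
        {"text": "hello and welcome", "start": 0.0, "duration": 1.5},
        {"text": "to the\nchannel", "start": 1.5, "duration": 2.0},
    ]
    print(transcript_to_markdown("Intro", demo_segments))
```

For a playlist, the same function could be called once per video and the sections concatenated into a single file.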
## Installation
### Via pip
To install the latest version directly from the GitHub repository, use:
```bash
pip install git+https://github.com/kothiyarajesh/yt-video-text-md.git
```
### Building from Source
1. Clone the repository:
```bash
git clone https://github.com/kothiyarajesh/yt-video-text-md.git
```
2. Navigate to the project directory:
```bash
cd yt-video-text-md
```
3. Install the package:
```bash
python setup.py install
```
4. Install the dependencies manually:
```bash
pip install -r requirements.txt
```
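After installing, it can be useful to confirm that the dependencies are actually present. The sketch below uses the standard library's `importlib.metadata` to report which distributions are missing. The package names are taken from the project's published requirements; the helper name `missing_packages` is hypothetical and not part of `yt-video-text-md`.

```python
from importlib import metadata

# Distribution names from the project's published requirements
REQUIRED = [
    "yt-dlp",
    "whisper",
    "aiofiles",
    "youtube-transcript-api",
    "pytube",
    "tqdm",
]


def missing_packages(names):
    """Return the distribution names from `names` that are not installed."""
    missing = []
    for name in names:
        try:
            metadata.version(name)
        except metadata.PackageNotFoundError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    absent = missing_packages(REQUIRED)
    print("Missing:", ", ".join(absent) if absent else "none")
```

This only checks that each distribution is installed, not that its version matches the pinned one.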
## Usage
### Python Script
Here's a simple example of how to use the `yt-video-text-md` library in a Python script:
```python
from yt_video_text_md import YTVideoTextMD

# URL of the YouTube video or playlist to process
video_url = "https://www.youtube.com/watch?v=pzo13OPXZS4"

# Directory where the output Markdown file will be saved
output_directory = "."

# Default name for the generated Markdown file
markdown_file_name = "yt_video_2_text_md_"

# Directory for temporary audio files (used only if a transcript is not available)
temporary_audio_directory = "/tmp"

# Create an instance of YTVideoTextMD with the specified parameters
YTVideoTextMD(
    url=video_url,
    output_dir=output_directory,
    default_md_file_name=markdown_file_name,
    audio_output_dir=temporary_audio_directory,
)
```
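The `url` argument accepts either a video URL or a playlist URL. How the package tells the two apart is not documented here; the following is one plausible sketch using only the standard library. The function name `classify_youtube_url` and its return convention are assumptions made for illustration.

```python
from urllib.parse import urlparse, parse_qs


def classify_youtube_url(url):
    """Return ("playlist", id), ("video", id), or ("unknown", None).

    This sketch inspects only the path and query string; it does not
    verify that the host is really YouTube for /watch and /playlist URLs.
    """
    parsed = urlparse(url)
    query = parse_qs(parsed.query)
    if parsed.path == "/playlist" and "list" in query:
        return "playlist", query["list"][0]
    if parsed.path == "/watch" and "v" in query:
        return "video", query["v"][0]
    # Short links carry the video ID in the path
    if parsed.netloc.endswith("youtu.be") and parsed.path.strip("/"):
        return "video", parsed.path.strip("/")
    return "unknown", None


if __name__ == "__main__":
    print(classify_youtube_url("https://www.youtube.com/watch?v=pzo13OPXZS4"))
```

A caller could use the result to decide whether to process one video or iterate over every video in a playlist.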
### Command-Line Interface
You can also use the package from the command line:
```bash
yt-video-text-md -u "https://www.youtube.com/playlist?list=PLMrJAkhIeNNQV7wi9r7Kut8liLFMWQOXn" -d "." -f "playlist_video_" -ad "/tmp"
```
**Options:**
- `-u` or `--url`: URL of the YouTube video or playlist.
- `-d` or `--output-dir`: Directory where the output Markdown file will be saved.
- `-f` or `--file-name`: Name for the generated Markdown file.
- `-ad` or `--audio-dir`: Directory where temporary audio files will be stored (used only if a transcript is not available).
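For readers who want to wrap or reproduce this interface, the options above map naturally onto `argparse`. The sketch below is an illustration, not the package's actual parser: the flag names come from the list above, while making `--url` required and the default values shown are assumptions borrowed from the Python example.

```python
import argparse


def build_parser():
    """Build a parser that mirrors the documented command-line options."""
    parser = argparse.ArgumentParser(prog="yt-video-text-md")
    parser.add_argument("-u", "--url", required=True,
                        help="URL of the YouTube video or playlist")
    # Defaults below are assumptions, not confirmed package behavior
    parser.add_argument("-d", "--output-dir", default=".",
                        help="Directory for the output Markdown file")
    parser.add_argument("-f", "--file-name", default="yt_video_2_text_md_",
                        help="Name for the generated Markdown file")
    parser.add_argument("-ad", "--audio-dir", default="/tmp",
                        help="Directory for temporary audio files")
    return parser


if __name__ == "__main__":
    sample = ["-u", "https://www.youtube.com/watch?v=pzo13OPXZS4", "-ad", "/tmp"]
    print(build_parser().parse_args(sample))
```

`argparse` stores `--output-dir`, `--file-name`, and `--audio-dir` as `output_dir`, `file_name`, and `audio_dir` on the parsed namespace.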
## Notes
- **Dependencies:** This package requires Python 3.10 or later and relies on `yt-dlp`, `whisper`, `aiofiles`, `youtube-transcript-api`, `pytube`, and `tqdm`. Make sure all of them are installed, for example via `requirements.txt`.
- **Audio Extraction:** If a video has no transcript available, the script downloads the video, extracts the audio, and converts it to text with `whisper`. This requires a stable internet connection and can be resource-intensive, especially for long videos.
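The fallback described in the audio extraction note can be sketched as a small function that tries subtitles first and transcribes audio only when that fails. This is a simplified illustration under stated assumptions, not the package's code: `fetch_subtitles` and `transcribe_audio` are hypothetical callables that would wrap `youtube-transcript-api` and the download-plus-`whisper` step respectively.

```python
def get_transcript_text(video_id, fetch_subtitles, transcribe_audio):
    """Return (text, source) using subtitles when possible, audio otherwise.

    Both callables are hypothetical stand-ins that take a video ID and
    return transcript text.
    """
    try:
        text = fetch_subtitles(video_id)
    except Exception:
        # Broad catch for the sketch: any subtitle failure triggers the fallback
        text = None
    if text:
        return text, "subtitles"
    # The expensive path runs only when no subtitles were obtained
    return transcribe_audio(video_id), "whisper"


if __name__ == "__main__":
    def no_subtitles(video_id):
        raise LookupError("no transcript for " + video_id)

    def fake_transcriber(video_id):
        return "text transcribed from audio"

    print(get_transcript_text("pzo13OPXZS4", no_subtitles, fake_transcriber))
```

Passing the two steps in as callables keeps the slow audio path easy to swap out or mock when testing.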
## License
This project is licensed under the MIT License. See the [LICENSE](LICENSE) file for details.
Raw data
{
"_id": null,
"home_page": "https://github.com/kothiyarajesh/yt-video-text-md",
"name": "yt-video-text-md",
"maintainer": null,
"docs_url": null,
"requires_python": ">=3.10",
"maintainer_email": null,
"keywords": null,
"author": "Rajesh Kothiya",
"author_email": "rkahir2222@gmail.com",
"download_url": "https://files.pythonhosted.org/packages/00/b8/86ef05c6cc843f8977f0a0179a335d7d1e7e6532eac1a9f9514356dde969/yt_video_text_md-0.1.0.tar.gz",
"platform": null,
"description": "\n# YouTube Video to Text Markdown Converter\n\n`yt-video-text-md` is a Python package designed to retrieve and convert YouTube video transcripts/subtitles into Markdown files. This tool is particularly useful for extracting text from entire playlists or individual videos. It leverages the `youtube-transcript-api` for direct subtitle extraction and `whisper` for audio-to-text conversion when transcripts are unavailable.\n\n## Features\n\n- **Playlist and Video Support:** Extracts subtitles from both individual videos and entire playlists.\n- **Fallback Mechanism:** Utilizes `whisper` to transcribe audio if subtitles are not available.\n- **Markdown Formatting:** Outputs transcripts in Markdown format with video titles as headers.\n\n## Installation\n\n### Via pip\n\nTo install the latest version directly from the GitHub repository, use:\n\n```bash\npip install git+https://github.com/kothiyarajesh/yt-video-text-md.git\n```\n\n### Building from Source\n\n1. Clone the repository:\n\n ```bash\n git clone https://github.com/kothiyarajesh/yt-video-text-md.git\n ```\n\n2. Navigate to the project directory:\n\n ```bash\n cd yt-video-text-md\n ```\n\n3. Install the package:\n\n ```bash\n python setup.py install\n ```\n\n4. 
If installing from source, make sure to install the dependencies manually:\n\n ```bash\n pip install -r requirements.txt\n ```\n\n## Usage\n\n### Python Script\n\nHere's a simple example of how to use the `yt-video-text-md` library in a Python script:\n\n```python\nfrom yt_video_text_md import YTVideoTextMD\n\n# Define the URL of the YouTube video or playlist you want to process\nvideo_url = \"https://www.youtube.com/watch?v=pzo13OPXZS4\"\n\n# Specify the directory where the output Markdown file will be saved\noutput_directory = \".\"\n\n# Set the default name for the generated Markdown file\nmarkdown_file_name = \"yt_video_2_text_md_\"\n\n# Define the directory where temporary audio files will be stored (Used only if a transcript is not available)\ntemporary_audio_directory = \"/tmp\"\n\n# Create an instance of YTVideoTextMD with the specified parameters\nYTVideoTextMD(\n url=video_url,\n output_dir=output_directory,\n default_md_file_name=markdown_file_name,\n audio_output_dir=temporary_audio_directory\n)\n```\n\n### Command-Line Interface\n\nYou can also use the package from the command line:\n\n```bash\nyt-video-text-md -u \"https://www.youtube.com/playlist?list=PLMrJAkhIeNNQV7wi9r7Kut8liLFMWQOXn\" -d \".\" -f \"playlist_video_\" -ad \"/tmp\"\n```\n\n**Options:**\n- `-u` or `--url`: URL of the YouTube video or playlist.\n- `-d` or `--output-dir`: Directory where the output Markdown file will be saved.\n- `-f` or `--file-name`: Name for the generated Markdown file.\n- `-ad` or `--audio-dir`: Directory where temporary audio files will be stored (used only if a transcript is not available).\n\n## Notes\n\n- **Dependencies:** This package relies on several external libraries. Ensure all dependencies are installed for optimal functionality.\n- **Audio Extraction:** If a video does not have an available transcript, the script will download the video, extract the audio, and convert it to text. 
This process requires a stable internet connection and may be resource-intensive, especially for long videos.\n\n## License\n\nThis project is licensed under the MIT License. See the [LICENSE](LICENSE) file for details.\n",
"bugtrack_url": null,
"license": null,
"summary": "Fetch YouTube video transcripts and save them to markdown files.",
"version": "0.1.0",
"project_urls": {
"Homepage": "https://github.com/kothiyarajesh/yt-video-text-md"
},
"split_keywords": [],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "6280d124b3e75d37ef8c91ff1f2ead5996b792ca08bc45f722aaf2270cf3ebd4",
"md5": "fa2b1af9c491629951baf18aeece1587",
"sha256": "8540f50634659f2e73db191725a1ed7df6a5934645c7c215e2fb2caeaca7433a"
},
"downloads": -1,
"filename": "yt_video_text_md-0.1.0-py3-none-any.whl",
"has_sig": false,
"md5_digest": "fa2b1af9c491629951baf18aeece1587",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.10",
"size": 8710,
"upload_time": "2024-08-23T09:09:47",
"upload_time_iso_8601": "2024-08-23T09:09:47.310823Z",
"url": "https://files.pythonhosted.org/packages/62/80/d124b3e75d37ef8c91ff1f2ead5996b792ca08bc45f722aaf2270cf3ebd4/yt_video_text_md-0.1.0-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "00b886ef05c6cc843f8977f0a0179a335d7d1e7e6532eac1a9f9514356dde969",
"md5": "c1fbbb5c1f119109f45635a518c027f9",
"sha256": "5c40d46c74c8e793bd41f6e5c712d6624c825a27ef5ebf395af030fd0b8e587f"
},
"downloads": -1,
"filename": "yt_video_text_md-0.1.0.tar.gz",
"has_sig": false,
"md5_digest": "c1fbbb5c1f119109f45635a518c027f9",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.10",
"size": 8232,
"upload_time": "2024-08-23T09:23:06",
"upload_time_iso_8601": "2024-08-23T09:23:06.579872Z",
"url": "https://files.pythonhosted.org/packages/00/b8/86ef05c6cc843f8977f0a0179a335d7d1e7e6532eac1a9f9514356dde969/yt_video_text_md-0.1.0.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2024-08-23 09:23:06",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "kothiyarajesh",
"github_project": "yt-video-text-md",
"travis_ci": false,
"coveralls": false,
"github_actions": false,
"requirements": [
{
"name": "yt-dlp",
"specs": [
[
"==",
"2024.8.6"
]
]
},
{
"name": "whisper",
"specs": [
[
"==",
"1.1.10"
]
]
},
{
"name": "aiofiles",
"specs": [
[
"==",
"24.1.0"
]
]
},
{
"name": "youtube-transcript-api",
"specs": [
[
"==",
"0.6.2"
]
]
},
{
"name": "pytube",
"specs": [
[
"==",
"15.0.0"
]
]
},
{
"name": "tqdm",
"specs": [
[
"==",
"4.66.5"
]
]
}
],
"lcname": "yt-video-text-md"
}