yt-video-text-md

| Field | Value |
| --- | --- |
| Name | yt-video-text-md |
| Version | 0.1.0 |
| home_page | https://github.com/kothiyarajesh/yt-video-text-md |
| Summary | Fetch YouTube video transcripts and save them to markdown files. |
| upload_time | 2024-08-23 09:23:06 |
| maintainer | None |
| docs_url | None |
| author | Rajesh Kothiya |
| requires_python | >=3.10 |
| license | None |
| keywords | |
| VCS | |
| bugtrack_url | |
| requirements | yt-dlp, whisper, aiofiles, youtube-transcript-api, pytube, tqdm |
| Travis-CI | No Travis. |
| coveralls test coverage | No coveralls. |

            
# YouTube Video to Text Markdown Converter

`yt-video-text-md` is a Python package designed to retrieve and convert YouTube video transcripts/subtitles into Markdown files. This tool is particularly useful for extracting text from entire playlists or individual videos. It leverages the `youtube-transcript-api` for direct subtitle extraction and `whisper` for audio-to-text conversion when transcripts are unavailable.

## Features

- **Playlist and Video Support:** Extracts subtitles from both individual videos and entire playlists.
- **Fallback Mechanism:** Utilizes `whisper` to transcribe audio if subtitles are not available.
- **Markdown Formatting:** Outputs transcripts in Markdown format with video titles as headers.
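
The Markdown formatting feature can be pictured with a small sketch. It assumes transcript segments arrive as dictionaries with a `text` key (the shape returned by `youtube-transcript-api` 0.6.x). The function name `render_markdown`, the `##` header level, and the single-paragraph layout are illustrative assumptions, not the package's actual code.

```python
def render_markdown(title: str, segments: list[dict]) -> str:
    """Render a transcript as Markdown with the video title as a header."""
    # Collapse stray whitespace and newlines inside each caption fragment,
    # skip empty fragments, and join the rest into one paragraph.
    body = " ".join(
        " ".join(seg["text"].split()) for seg in segments if seg["text"].strip()
    )
    # Assumption: a level-2 header per video, so a playlist can share one file.
    return f"## {title}\n\n{body}\n"


segments = [
    {"text": "Hello and welcome", "start": 0.0, "duration": 1.5},
    {"text": "to the\nfirst lesson.", "start": 1.5, "duration": 2.0},
]
print(render_markdown("Lesson 1", segments))
```

Joining fragments into a paragraph drops the timing information; a variant that keeps timestamps would emit one line per segment instead.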

## Installation

### Via pip

To install the latest version directly from the GitHub repository, use:

```bash
pip install git+https://github.com/kothiyarajesh/yt-video-text-md.git
```

### Building from Source

1. Clone the repository:

    ```bash
    git clone https://github.com/kothiyarajesh/yt-video-text-md.git
    ```

2. Navigate to the project directory:

    ```bash
    cd yt-video-text-md
    ```

3. Install the package:

    ```bash
    python setup.py install
    ```

4. Install the dependencies manually:

    ```bash
    pip install -r requirements.txt
    ```
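
After installing, it may be worth confirming that the environment matches the pinned dependencies. The sketch below uses the standard library's `importlib.metadata` and the version pins listed in the package metadata for 0.1.0; the helper name `check_pins` is hypothetical, and the comparison is a plain string match, so a differently normalized version string would be reported as a mismatch.

```python
from importlib.metadata import PackageNotFoundError, version

# Pins copied from the package metadata for version 0.1.0.
PINNED = {
    "yt-dlp": "2024.8.6",
    "whisper": "1.1.10",
    "aiofiles": "24.1.0",
    "youtube-transcript-api": "0.6.2",
    "pytube": "15.0.0",
    "tqdm": "4.66.5",
}


def check_pins(pins: dict[str, str]) -> dict[str, str]:
    """Return {name: problem} for every package that is missing or off-pin."""
    problems = {}
    for name, wanted in pins.items():
        try:
            found = version(name)
        except PackageNotFoundError:
            problems[name] = "not installed"
            continue
        # Simple string comparison; no version normalization is attempted.
        if found != wanted:
            problems[name] = f"installed {found}, pinned {wanted}"
    return problems


for name, problem in check_pins(PINNED).items():
    print(f"{name}: {problem}")
```

An empty output means every pinned package was found at the expected version.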

## Usage

### Python Script

Here's a simple example of how to use the `yt-video-text-md` library in a Python script:

```python
from yt_video_text_md import YTVideoTextMD

# Define the URL of the YouTube video or playlist you want to process
video_url = "https://www.youtube.com/watch?v=pzo13OPXZS4"

# Specify the directory where the output Markdown file will be saved
output_directory = "."

# Set the default name for the generated Markdown file
markdown_file_name = "yt_video_2_text_md_"

# Define the directory where temporary audio files will be stored (Used only if a transcript is not available)
temporary_audio_directory = "/tmp"

# Create an instance of YTVideoTextMD with the specified parameters
YTVideoTextMD(
    url=video_url,
    output_dir=output_directory,
    default_md_file_name=markdown_file_name,
    audio_output_dir=temporary_audio_directory
)
```
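
The `url` parameter accepts either a single video or a playlist. How the package tells the two apart is not documented here; one plausible approach, shown purely as an assumption, is to inspect the URL path and query string with the standard library. The function name `classify_youtube_url` is hypothetical.

```python
from urllib.parse import parse_qs, urlparse


def classify_youtube_url(url: str) -> str:
    """Return "playlist", "video", or "unknown" for a YouTube URL."""
    parsed = urlparse(url)
    query = parse_qs(parsed.query)
    # Playlist pages look like /playlist?list=<id>.
    if parsed.path == "/playlist" and "list" in query:
        return "playlist"
    # Single videos look like /watch?v=<id>.
    if parsed.path == "/watch" and "v" in query:
        return "video"
    return "unknown"


print(classify_youtube_url("https://www.youtube.com/watch?v=pzo13OPXZS4"))
print(
    classify_youtube_url(
        "https://www.youtube.com/playlist?list=PLMrJAkhIeNNQV7wi9r7Kut8liLFMWQOXn"
    )
)
```

This sketch deliberately ignores short links and other URL shapes, which would fall into the "unknown" branch.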

### Command-Line Interface

You can also use the package from the command line:

```bash
yt-video-text-md -u "https://www.youtube.com/playlist?list=PLMrJAkhIeNNQV7wi9r7Kut8liLFMWQOXn" -d "." -f "playlist_video_" -ad "/tmp"
```

**Options:**
- `-u` or `--url`: URL of the YouTube video or playlist.
- `-d` or `--output-dir`: Directory where the output Markdown file will be saved.
- `-f` or `--file-name`: Name for the generated Markdown file.
- `-ad` or `--audio-dir`: Directory where temporary audio files will be stored (used only if a transcript is not available).
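
For readers who want to see how these four options map onto parsed values, here is a minimal `argparse` sketch that mirrors the documented flags. It is not the package's actual CLI code: the defaults, the `required=True` on `--url`, and the help strings are assumptions made for illustration.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser that mirrors the documented command-line options."""
    parser = argparse.ArgumentParser(prog="yt-video-text-md")
    parser.add_argument("-u", "--url", required=True,
                        help="URL of the YouTube video or playlist")
    # Defaults below are assumptions, borrowed from the usage examples.
    parser.add_argument("-d", "--output-dir", default=".",
                        help="directory for the output Markdown file")
    parser.add_argument("-f", "--file-name", default="yt_video_2_text_md_",
                        help="name for the generated Markdown file")
    parser.add_argument("-ad", "--audio-dir", default="/tmp",
                        help="directory for temporary audio files")
    return parser


args = build_parser().parse_args(
    ["-u", "https://www.youtube.com/watch?v=pzo13OPXZS4", "-f", "lesson_", "-ad", "/tmp/audio"]
)
print(args.url, args.output_dir, args.file_name, args.audio_dir)
```

`argparse` derives the attribute names (`output_dir`, `file_name`, `audio_dir`) from the long option strings, replacing hyphens with underscores.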

## Notes

- **Dependencies:** This package relies on `yt-dlp`, `whisper`, `aiofiles`, `youtube-transcript-api`, `pytube`, and `tqdm`. Make sure all of them are installed before running it.
- **Audio Extraction:** If a video does not have an available transcript, the script will download the video, extract the audio, and convert it to text. This process requires a stable internet connection and may be resource-intensive, especially for long videos.
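
The transcript-first, audio-second behavior described in the note above can be sketched as a small function that takes both strategies as callables. Everything here is hypothetical (`get_text`, `TranscriptUnavailable`, and the stub functions); in real use the first callable would wrap the subtitle lookup and the second would wrap the download and speech-to-text step.

```python
from typing import Callable


class TranscriptUnavailable(Exception):
    """Raised by a fetcher when a video has no usable transcript."""


def get_text(
    video_id: str,
    fetch_transcript: Callable[[str], str],
    transcribe_audio: Callable[[str], str],
) -> tuple[str, str]:
    """Return (text, source), preferring published captions over speech-to-text."""
    try:
        return fetch_transcript(video_id), "transcript"
    except TranscriptUnavailable:
        # Only fall back to the slow, resource-heavy path when captions are missing.
        return transcribe_audio(video_id), "audio"


# Stubs standing in for the real network and model calls.
def no_captions(video_id: str) -> str:
    raise TranscriptUnavailable(video_id)


def fake_speech_to_text(video_id: str) -> str:
    return f"(transcribed audio of {video_id})"


print(get_text("abc123", no_captions, fake_speech_to_text))
```

Catching one specific exception, rather than every error, keeps unrelated failures such as network outages from silently triggering an expensive audio download.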

## License

This project is licensed under the MIT License. See the [LICENSE](LICENSE) file for details.

Raw data

{
    "_id": null,
    "home_page": "https://github.com/kothiyarajesh/yt-video-text-md",
    "name": "yt-video-text-md",
    "maintainer": null,
    "docs_url": null,
    "requires_python": ">=3.10",
    "maintainer_email": null,
    "keywords": null,
    "author": "Rajesh Kothiya",
    "author_email": "rkahir2222@gmail.com",
    "download_url": "https://files.pythonhosted.org/packages/00/b8/86ef05c6cc843f8977f0a0179a335d7d1e7e6532eac1a9f9514356dde969/yt_video_text_md-0.1.0.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": null,
    "summary": "Fetch YouTube video transcripts and save them to markdown files.",
    "version": "0.1.0",
    "project_urls": {
        "Homepage": "https://github.com/kothiyarajesh/yt-video-text-md"
    },
    "split_keywords": [],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "6280d124b3e75d37ef8c91ff1f2ead5996b792ca08bc45f722aaf2270cf3ebd4",
                "md5": "fa2b1af9c491629951baf18aeece1587",
                "sha256": "8540f50634659f2e73db191725a1ed7df6a5934645c7c215e2fb2caeaca7433a"
            },
            "downloads": -1,
            "filename": "yt_video_text_md-0.1.0-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "fa2b1af9c491629951baf18aeece1587",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.10",
            "size": 8710,
            "upload_time": "2024-08-23T09:09:47",
            "upload_time_iso_8601": "2024-08-23T09:09:47.310823Z",
            "url": "https://files.pythonhosted.org/packages/62/80/d124b3e75d37ef8c91ff1f2ead5996b792ca08bc45f722aaf2270cf3ebd4/yt_video_text_md-0.1.0-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "00b886ef05c6cc843f8977f0a0179a335d7d1e7e6532eac1a9f9514356dde969",
                "md5": "c1fbbb5c1f119109f45635a518c027f9",
                "sha256": "5c40d46c74c8e793bd41f6e5c712d6624c825a27ef5ebf395af030fd0b8e587f"
            },
            "downloads": -1,
            "filename": "yt_video_text_md-0.1.0.tar.gz",
            "has_sig": false,
            "md5_digest": "c1fbbb5c1f119109f45635a518c027f9",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.10",
            "size": 8232,
            "upload_time": "2024-08-23T09:23:06",
            "upload_time_iso_8601": "2024-08-23T09:23:06.579872Z",
            "url": "https://files.pythonhosted.org/packages/00/b8/86ef05c6cc843f8977f0a0179a335d7d1e7e6532eac1a9f9514356dde969/yt_video_text_md-0.1.0.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2024-08-23 09:23:06",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "kothiyarajesh",
    "github_project": "yt-video-text-md",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": false,
    "requirements": [
        {
            "name": "yt-dlp",
            "specs": [
                [
                    "==",
                    "2024.8.6"
                ]
            ]
        },
        {
            "name": "whisper",
            "specs": [
                [
                    "==",
                    "1.1.10"
                ]
            ]
        },
        {
            "name": "aiofiles",
            "specs": [
                [
                    "==",
                    "24.1.0"
                ]
            ]
        },
        {
            "name": "youtube-transcript-api",
            "specs": [
                [
                    "==",
                    "0.6.2"
                ]
            ]
        },
        {
            "name": "pytube",
            "specs": [
                [
                    "==",
                    "15.0.0"
                ]
            ]
        },
        {
            "name": "tqdm",
            "specs": [
                [
                    "==",
                    "4.66.5"
                ]
            ]
        }
    ],
    "lcname": "yt-video-text-md"
}
