# yt2doc

![Header Image](header-image.png)

yt2doc transcribes online videos and audio into readable Markdown documents.

Supported video/audio sources:
* YouTube
* Apple Podcasts
* Twitter

yt2doc is meant to work fully locally, without invoking any external API. The OpenAI SDK dependency is required solely to interact with a local LLM server such as [Ollama](https://github.com/ollama/ollama).
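To illustrate the point, here is a minimal sketch (not yt2doc's actual code) of how the OpenAI SDK can talk to a local Ollama server simply by overriding its base URL; the model name is a placeholder for whatever you have pulled locally:

```
from openai import OpenAI

# A minimal sketch, not yt2doc's internals: the OpenAI SDK pointed at a
# local Ollama server. Ollama ignores the API key, but the SDK requires one.
client = OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")
response = client.chat.completions.create(
    model="gemma2:9b",  # any model you have pulled into Ollama
    messages=[{"role": "user", "content": "Segment this transcript by topic: ..."}],
)
print(response.choices[0].message.content)
```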

Check out some [examples](./examples/) generated by yt2doc.

## Why

There are many existing projects that transcribe YouTube videos with Whisper and its variants, but most of them aim to generate subtitles, and I had not found one that prioritises readability. Whisper does not generate line breaks in its transcription, so transcribing a 20-minute video without any post-processing gives you one huge block of text, with no line breaks or topic segmentation. This project transcribes videos with that post-processing.

## Installation

### Prerequisites

[ffmpeg](https://www.ffmpeg.org/) is required to run yt2doc.

If you are on macOS:

```
brew install ffmpeg
```

If you are on Debian/Ubuntu:
```
sudo apt install ffmpeg
```

If you are on Windows, follow the instructions on the ffmpeg [website](https://ffmpeg.org/download.html#build-windows). If you have Scoop installed:

```
scoop install ffmpeg
```
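Either way, you can verify that ffmpeg is available on your `PATH` with:

```
ffmpeg -version
```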

### Install yt2doc

Install with [pipx](https://github.com/pypa/pipx):

```
pipx install yt2doc
```

Or install with [uv](https://github.com/astral-sh/uv):
```
uv tool install yt2doc
```

#### ⚠️ Known issue with Python 3.13 on macOS

If you are on macOS and running Python 3.13, you may hit a dependency issue caused by the upstream PyTorch dependency. See issue [#46](https://github.com/shun-liang/yt2doc/issues/46).

A quick workaround is to install with Python 3.12:

```
pipx install --python 3.12 yt2doc
```

### Upgrade

If you have already installed yt2doc but would like to upgrade to a later version:

```
pipx upgrade yt2doc
```

or with `uv`:

```
uv tool upgrade yt2doc
```

## Usage

Show the help message:

```
yt2doc --help
```

### Transcribe a video from YouTube or Twitter

To transcribe a video (on YouTube or Twitter) into a document:

```
yt2doc --video <video-url>
```

To save your transcription:

```
yt2doc --video <video-url> -o some_dir/transcription.md
```

### Transcribe a YouTube playlist

To transcribe all videos from a YouTube playlist:

```
yt2doc --playlist <playlist-url> -o some_dir
```

### Chapter unchaptered videos

(LLM server, e.g. [Ollama](https://github.com/ollama/ollama), required.) If the video is not chaptered, you can chapter it and add a heading to each chapter:

```
yt2doc --video <video-url> --segment-unchaptered --llm-model <model-name>
```

Among smaller models, `gemma2:9b`, `llama3.1:8b`, and `qwen2.5:7b` work reasonably well.
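If you use Ollama, pull the model first, e.g.:

```
ollama pull gemma2:9b
```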

By default, yt2doc talks to Ollama at `http://localhost:11434/v1` to segment the text by topic. You can point yt2doc at Ollama on a different address or port, at a different (OpenAI-compatible) LLM server (e.g. [vLLM](https://github.com/vllm-project/vllm), [mistral.rs](https://github.com/EricLBuehler/mistral.rs)), or even at OpenAI itself:

```
yt2doc --video <video-url> --segment-unchaptered --llm-server <llm-server-url> --llm-api-key <llm-server-api-key> --llm-model <model-name>
```
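For example, a local vLLM server on its default port might be addressed like this (the model name and API key are placeholders for whatever your server is configured with):

```
yt2doc --video <video-url> --segment-unchaptered --llm-server http://localhost:8000/v1 --llm-api-key not-needed --llm-model Qwen/Qwen2.5-7B-Instruct
```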

### Transcribe Apple Podcasts

To transcribe a podcast episode on Apple Podcasts:

```
yt2doc --audio <apple-podcasts-episode-url> --segment-unchaptered --llm-model <model-name>
```

### Whisper configuration

By default, yt2doc uses [faster-whisper](https://github.com/SYSTRAN/faster-whisper) as the transcription backend. You can run yt2doc with a different faster-whisper configuration (model size, device, compute type, etc.):

```
yt2doc --video <video-url> --whisper-model <model-name> --whisper-device <cpu|cuda|auto> --whisper-compute-type <compute_type>
```

For the meaning and choices of `--whisper-model`, `--whisper-device` and `--whisper-compute-type`, please refer to this [comment](https://github.com/SYSTRAN/faster-whisper/blob/v1.0.3/faster_whisper/transcribe.py#L101-L127) of faster-whisper.
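For instance, to run the `large-v3` model on a CUDA GPU with `float16` precision (all values from faster-whisper's documented choices):

```
yt2doc --video <video-url> --whisper-model large-v3 --whisper-device cuda --whisper-compute-type float16
```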


If you are running yt2doc on Apple Silicon, [whisper.cpp](https://github.com/ggerganov/whisper.cpp) gives much faster performance, as it supports the Apple GPU. A (somewhat hacky) whisper.cpp backend is supported:

```
yt2doc --video <video-url> --whisper-backend whisper_cpp --whisper-cpp-executable <path-to-whisper-cpp-executable> --whisper-cpp-model <path-to-whisper-cpp-model>
```

See https://github.com/shun-liang/yt2doc/issues/15 for more info on whisper.cpp integration.
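As an illustration, assuming you have cloned and built whisper.cpp in the current directory (the executable and helper-script names may differ between whisper.cpp versions):

```
# download a ggml model into ./models using whisper.cpp's helper script
./models/download-ggml-model.sh base.en

yt2doc --video <video-url> --whisper-backend whisper_cpp --whisper-cpp-executable ./main --whisper-cpp-model ./models/ggml-base.en.bin
```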


### Text segmentation configuration

yt2doc uses [Segment Any Text (SaT)](https://github.com/segment-any-text/wtpsplit) to segment the transcript into sentences and paragraphs. You can change the SaT model:
```
yt2doc --video <video-url> --sat-model <sat-model>
```

List of available SaT models [here](https://github.com/segment-any-text/wtpsplit?tab=readme-ov-file#available-models).
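For example, to use the small three-layer model from that list:

```
yt2doc --video <video-url> --sat-model sat-3l-sm
```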


### Timestamping paragraphs

Paragraphs in the generated Markdown can be timestamped with:
```
yt2doc --video <video-url> --timestamp-paragraphs
```

### Add table of contents

A table of contents of all chapters can be added with:
```
yt2doc --video <video-url>  --segment-unchaptered --llm-model <llm-model> --add-table-of-contents
```

### Ignore chapters from source

Sometimes the chaptering of the video/audio at the source does not segment the content the way you would like. You can ask yt2doc to ignore the source chaptering with:

```
yt2doc --video <video-url> --ignore-chapters --segment-unchaptered --llm-model <model-name>
```

### Extra options to yt-dlp

If you need to pass extra options to yt-dlp, you can set `--yt-dlp-extra-opts` to the string representation of a Python dictionary of key/value pairs, such as:

```
yt2doc --video <video-url> --yt-dlp-extra-opts '{"quiet": False}'
```
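Another example is passing a cookies file through to yt-dlp (assuming the `cookiefile` option is still supported by your yt-dlp version):

```
yt2doc --video <video-url> --yt-dlp-extra-opts '{"cookiefile": "cookies.txt"}'
```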

The list of possible keys supported by yt-dlp (as a library, not as a CLI tool) is documented in its source code and may change at any time. As of version [2024.12.06](https://github.com/yt-dlp/yt-dlp/releases/tag/2024.12.06), the yt-dlp options are documented [here](https://github.com/yt-dlp/yt-dlp/blob/2024.12.06/yt_dlp/YoutubeDL.py#L212-L491).


### Run in Docker

To run yt2doc in Docker, first pull the image from GHCR:

```
docker pull ghcr.io/shun-liang/yt2doc
```

Then just run:

```
docker run ghcr.io/shun-liang/yt2doc --video <video-url>
```

If you are running Ollama (or any LLM server) locally and you want to segment an unchaptered video/audio, you need to use the [host](https://docs.docker.com/engine/network/drivers/host/) network driver. Also, if you want to save the document to the host filesystem, you need to [mount](https://docs.docker.com/engine/storage/bind-mounts/) a host directory into the Docker container. For example, if you run Ollama at `http://localhost:11434` on the host, and you want yt2doc to write to `<directory-on-host>` on the host filesystem, then:

```
docker run --network="host" --mount type=bind,source=<directory-on-host>,target=/app ghcr.io/shun-liang/yt2doc --video <video-url> --segment-unchaptered --llm-server http://host.docker.internal:11434/v1 --llm-model <llm-model> -o .
```
            
