| Field | Value |
|---|---|
| Name | yt2doc |
| Version | 0.3.2 |
| home_page | None |
| Summary | Transcribe any YouTube video into a structural Markdown document |
| upload_time | 2024-12-12 13:55:54 |
| maintainer | None |
| docs_url | None |
| author | None |
| requires_python | <3.13,>=3.10 |
| license | None |
| keywords | None |
| VCS | None |
| bugtrack_url | None |
| requirements | No requirements were recorded. |
| Travis-CI | No Travis. |
| coveralls test coverage | No coveralls. |
# yt2doc

yt2doc transcribes online videos and audio into readable Markdown documents.
Supported video/audio sources:
* YouTube
* Apple Podcasts
* Twitter
yt2doc is meant to work fully locally, without invoking any external API. The OpenAI SDK dependency is required solely to interact with a local LLM server such as [Ollama](https://github.com/ollama/ollama).
Check out some [examples](./examples/) generated by yt2doc.
## Why
There are many existing projects that transcribe YouTube videos with Whisper and its variants, but most of them aim to generate subtitles, and I had not found one that prioritises readability. Whisper does not generate line breaks in its transcription, so transcribing a 20-minute video without any post-processing gives you one huge block of text, with no line breaks or topic segmentation. This project aims to transcribe videos with that post-processing.
## Installation
### Prerequisites
[ffmpeg](https://www.ffmpeg.org/) is required to run yt2doc.
If you are running macOS:
```
brew install ffmpeg
```
If you are on Debian/Ubuntu:
```
sudo apt install ffmpeg
```
If you are on Windows, follow the instructions on the ffmpeg [website](https://ffmpeg.org/download.html#build-windows). If you have installed Scoop on Windows:
```
scoop install ffmpeg
```
### Install yt2doc
Install with [pipx](https://github.com/pypa/pipx):
```
pipx install yt2doc
```
Or install with [uv](https://github.com/astral-sh/uv):
```
uv tool install yt2doc
```
#### ⚠️ Known issue with Python 3.13 on macOS
If you are on macOS and running Python 3.13, you may hit a dependency issue coming from the upstream PyTorch dependency. See issue [#46](https://github.com/shun-liang/yt2doc/issues/46).
A quick workaround is to install with Python 3.12:
```
pipx install --python 3.12 yt2doc
```
### Upgrade
If you have already installed yt2doc but would like to upgrade to a later version:
```
pipx upgrade yt2doc
```
or with `uv`:
```
uv tool upgrade yt2doc
```
## Usage
Get help information:
```
yt2doc --help
```
### Transcribe a video from YouTube or Twitter
To transcribe a video (on YouTube or Twitter) into a document:
```
yt2doc --video <video-url>
```
To save your transcription:
```
yt2doc --video <video-url> -o some_dir/transcription.md
```
### Transcribe a YouTube playlist
To transcribe all videos from a YouTube playlist:
```
yt2doc --playlist <playlist-url> -o some_dir
```
### Chapter unchaptered videos
(LLM server e.g. [Ollama](https://github.com/ollama/ollama) required) If the video is not chaptered, you can chapter it and add headings to each chapter:
```
yt2doc --video <video-url> --segment-unchaptered --llm-model <model-name>
```
Among smaller models, `gemma2:9b`, `llama3.1:8b`, and `qwen2.5:7b` work reasonably well.
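For example, a minimal invocation with `gemma2:9b` (assuming the model has already been pulled into your local Ollama):
```
yt2doc --video <video-url> --segment-unchaptered --llm-model gemma2:9b
```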
By default, yt2doc talks to Ollama at `http://localhost:11434/v1` to segment the text by topic. You can point yt2doc at Ollama on a different address or port, at a different (OpenAI-compatible) LLM server (e.g. [vLLM](https://github.com/vllm-project/vllm), [mistral.rs](https://github.com/EricLBuehler/mistral.rs)), or even at OpenAI itself, by
```
yt2doc --video <video-url> --segment-unchaptered --llm-server <llm-server-url> --llm-api-key <llm-server-api-key> --llm-model <model-name>
```
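As an illustration, the following sketch points yt2doc at OpenAI's hosted API; `https://api.openai.com/v1` is OpenAI's standard endpoint, but the model name here is just an example:
```
yt2doc --video <video-url> --segment-unchaptered --llm-server https://api.openai.com/v1 --llm-api-key <your-openai-api-key> --llm-model gpt-4o-mini
```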
### Transcribe Apple Podcasts
To transcribe a podcast episode on Apple Podcasts:
```
yt2doc --audio <apple-podcasts-episode-url> --segment-unchaptered --llm-model <model-name>
```
### Whisper configuration
By default, yt2doc uses [faster-whisper](https://github.com/SYSTRAN/faster-whisper) as the transcription backend. You can run yt2doc with different faster-whisper configurations (model size, device, compute type, etc.):
```
yt2doc --video <video-url> --whisper-model <model-name> --whisper-device <cpu|cuda|auto> --whisper-compute-type <compute_type>
```
For the meaning and choices of `--whisper-model`, `--whisper-device` and `--whisper-compute-type`, please refer to this [comment](https://github.com/SYSTRAN/faster-whisper/blob/v1.0.3/faster_whisper/transcribe.py#L101-L127) in faster-whisper.
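For instance, one plausible configuration for a machine with an NVIDIA GPU (the model name and compute type below are taken from faster-whisper's documented options):
```
yt2doc --video <video-url> --whisper-model large-v3 --whisper-device cuda --whisper-compute-type float16
```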
If you are running yt2doc on Apple Silicon, [whisper.cpp](https://github.com/ggerganov/whisper.cpp) gives much faster performance, as it supports the Apple GPU. (Somewhat hacky) support for whisper.cpp has been implemented:
```
yt2doc --video <video-url> --whisper-backend whisper_cpp --whisper-cpp-executable <path-to-whisper-cpp-executable> --whisper-cpp-model <path-to-whisper-cpp-model>
```
See https://github.com/shun-liang/yt2doc/issues/15 for more info on whisper.cpp integration.
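As a sketch, assuming you have built whisper.cpp and downloaded a ggml model (both paths below are hypothetical and depend on your local setup):
```
yt2doc --video <video-url> --whisper-backend whisper_cpp --whisper-cpp-executable ./whisper.cpp/main --whisper-cpp-model ./whisper.cpp/models/ggml-base.en.bin
```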
### Text segmentation configuration
yt2doc uses [Segment Any Text (SaT)](https://github.com/segment-any-text/wtpsplit) to segment the transcript into sentences and paragraphs. You can change the SaT model:
```
yt2doc --video <video-url> --sat-model <sat-model>
```
List of available SaT models [here](https://github.com/segment-any-text/wtpsplit?tab=readme-ov-file#available-models).
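For example, to switch to the smaller `sat-3l-sm` model (one of the models on that list):
```
yt2doc --video <video-url> --sat-model sat-3l-sm
```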
### Timestamping paragraphs
Paragraphs in the generated Markdown can be timestamped by
```
yt2doc --video <video-url> --timestamp-paragraphs
```
### Add table of contents
A table of contents of all chapters can be added by
```
yt2doc --video <video-url> --segment-unchaptered --llm-model <llm-model> --add-table-of-contents
```
### Ignore chapters from source
Sometimes, the chaptering of the video/audio at the source does not segment the content in a way you are happy with. You can ask yt2doc to ignore the source chaptering by
```
yt2doc --video <video-url> --ignore-chapters --segment-unchaptered --llm-model <model-name>
```
### Extra options to yt-dlp
If you need to pass extra options to yt-dlp, you can specify `--yt-dlp-extra-opts` with a string representation of a Python dictionary of key-value pairs, such as
```
yt2doc --video <video-url> --yt-dlp-extra-opts '{"quiet": False}'
```
The list of possible keys supported by yt-dlp (as a library, not as a CLI tool) is documented in its source code and may change at any time. As of version [2024.12.06](https://github.com/yt-dlp/yt-dlp/releases/tag/2024.12.06), the yt-dlp options are documented [here](https://github.com/yt-dlp/yt-dlp/blob/2024.12.06/yt_dlp/YoutubeDL.py#L212-L491).
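As another illustration, `cookiefile` is one of the yt-dlp options documented there; the file path below is hypothetical:
```
yt2doc --video <video-url> --yt-dlp-extra-opts '{"cookiefile": "cookies.txt"}'
```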
### Run in Docker
To run yt2doc in Docker, first pull the image from GHCR:
```
docker pull ghcr.io/shun-liang/yt2doc
```
Then just run:
```
docker run ghcr.io/shun-liang/yt2doc --video <video-url>
```
If you are running Ollama (or any LLM server) locally and you want to segment an unchaptered video/audio, you need to use the [host](https://docs.docker.com/engine/network/drivers/host/) network driver. Also, if you want to save the document to the host filesystem, you need to [mount](https://docs.docker.com/engine/storage/bind-mounts/) a host directory into the Docker container. For example, if you run Ollama at `http://localhost:11434` on the host, and you want yt2doc to write to `<directory-on-host>` on the host filesystem, then
```
docker run --network="host" --mount type=bind,source=<directory-on-host>,target=/app ghcr.io/shun-liang/yt2doc --video <video-url> --segment-unchaptered --llm-server http://host.docker.internal:11434/v1 --llm-model <llm-model> -o .
```
## Raw data
```
{
  "_id": null,
  "home_page": null,
  "name": "yt2doc",
  "maintainer": null,
  "docs_url": null,
  "requires_python": "<3.13,>=3.10",
  "maintainer_email": null,
  "keywords": null,
  "author": null,
  "author_email": null,
  "download_url": "https://files.pythonhosted.org/packages/56/db/cca9d57c33e0d608e0bd1ebc5c2a20925f43d8d1a0319b7f7933c00f0a2c/yt2doc-0.3.2.tar.gz",
  "platform": null,
"description": "# yt2doc\n\n\n\nyt2doc transcribes videos & audios online into readable Markdown documents.\n\nSupported video/audio sources:\n* YouTube\n* Apple Podcasts\n* Twitter\n\nyt2doc is meant to work fully locally, without invoking any external API. The OpenAI SDK dependency is required solely to interact with a local LLM server such as [Ollama](https://github.com/ollama/ollama).\n\nCheck out some [examples](./examples/) generated by yt2doc.\n\n## Why\n\nThere have been many existing projects that transcribe YouTube videos with Whisper and its variants, but most of them aimed to generate subtitles, while I had not found one that priortises readability. Whisper does not generate line break in its transcription, so transcribing a 20 mins long video without any post processing would give you a huge piece of text, without any line break or topic segmentation. This project aims to transcribe videos with that post processing. \n\n## Installation\n\n### Prerequisites\n\n[ffmepg](https://www.ffmpeg.org/) is required to run yt2doc.\n\nIf you are running MacOS:\n\n```\nbrew install ffmpeg\n```\n\nIf you are on Debian/Ubuntu:\n```\nsudo apt install ffmpeg\n```\n\nIf you are on Windows, follow the instruction on the ffmpeg [website](https://ffmpeg.org/download.html#build-windows). If you have installed Scoop on Windows:\n\n```\nscoop install ffmpeg\n```\n\n### Install yt2doc\n\nInstall with [pipx](https://github.com/pypa/pipx):\n\n```\npipx install yt2doc\n```\n\nOr install with [uv](https://github.com/astral-sh/uv):\n```\nuv tool install yt2doc\n```\n\n#### \u26a0\ufe0f Know issue of Python 3.13 on MacOS\n\nIf you are on MacOS and running Python 3.13, you may face a dependency issue that is from the upstream PyTorch dependency. See issue [#46](https://github.com/shun-liang/yt2doc/issues/46).\n\nA quick workaround will be\n\n```\npipx install --python 3.12 yt2doc\n```\n\n### Upgrade\n\nIf you have already installed yt2doc but would like to upgrade to a later version:\n\n```\npipx upgrade yt2doc\n```\n\nor with `uv`:\n\n```\nuv tool upgrade yt2doc\n```\n\n## Usage\n\nGet helping information:\n\n```\nyt2doc --help\n```\n\n### Transcribe Video from Youtube or Twitter\n\nTo transcribe a video (on YouTube or Twitter) into a document:\n\n```\nyt2doc --video <video-url>\n```\n\nTo save your transcription:\n\n```\nyt2doc --video <video-url> -o some_dir/transcription.md\n```\n\n### Transcribe a YouTube playlist\n\nTo transcribe all videos from a YouTube playlist:\n\n```\nyt2doc --playlist <playlist-url> -o some_dir\n```\n\n### Chapter unchaptered videos\n\n(LLM server e.g. [Ollama](https://github.com/ollama/ollama) required) If the video is not chaptered, you can chapter it and add headings to each chapter:\n\n```\nyt2doc --video <video-url> --segment-unchaptered --llm-model <model-name>\n```\n\nAmong smaller size models, `gemma2:9b`, `llama3.1:8b`, and `qwen 2.5:7b` work reasonably well.\n\nBy default, yt2doc talks to Ollama at `http://localhost:11434/v1` to segment the text by topic. You can run yt2doc to interact with Ollama at a different address or port, a different (OpenAI-compatible) LLM server (e.g. 
[vLLM](https://github.com/vllm-project/vllm), [mistral.rs](https://github.com/EricLBuehler/mistral.rs)), or even OpenAI itself, by\n\n```\nyt2doc --video <video-url> --segment-unchaptered --llm-server <llm-server-url> --llm-api-key <llm-server-api-key> --llm-model <model-name>\n```\n\n### Transcribe Apple Podcasts\n\nTo transcribe a podcast episode on Apple Podcasts:\n\n```\nyt2doc --audio <apple-podcasts-episode-url> --segment-unchaptered --llm-model <model-name>\n```\n\n### Whisper configuration\n\nBy default, yt2doc uses [faster-whisper](https://github.com/SYSTRAN/faster-whisper) as transcription backend. You can run yt2doc with different faster-whisper configs (model size, device, compute type etc):\n\n```\nyt2doc --video <video-url> --whisper-model <model-name> --whisper-device <cpu|cuda|auto> --whisper-compute-type <compute_type>\n```\n\nFor the meaning and choices of `--whisper-model`, `--whisper-device` and `--whisper-compute-type`, please refer to this [comment](https://github.com/SYSTRAN/faster-whisper/blob/v1.0.3/faster_whisper/transcribe.py#L101-L127) of faster-whisper.\n\n\nIf you are running yt2doc on Apple Silicon, [whisper.cpp](https://github.com/ggerganov/whisper.cpp) gives much faster performance as it supports the Apple GPU. (A hacky) Support for whisper.cpp has been implemented:\n\n```\nyt2doc --video --whisper-backend whisper_cpp --whisper-cpp-executable <path-to-whisper-cpp-executable> --whisper-cpp-model <path-to-whisper-cpp-model>\n```\n\nSee https://github.com/shun-liang/yt2doc/issues/15 for more info on whisper.cpp integration.\n\n\n### Text segmentation configuration\n\nyt2doc uses [Segment Any Text (SaT)](https://github.com/segment-any-text/wtpsplit) to segment the transcript into sentences and paragraphs. You can change the SaT model:\n```\nyt2doc --video <video-url> --sat-model <sat-model>\n```\n\nList of available SaT models [here](https://github.com/segment-any-text/wtpsplit?tab=readme-ov-file#available-models).\n\n\n### Timestamping paragraphs\n\nParagraphs in the generated Markdown can be timestamped by\n```\nyt2doc --video <video-url> --timestamp-paragraphs\n```\n\n### Add table of contents\n\nA table of contents of all chapters can be added by\n```\nyt2doc --video <video-url> --segment-unchaptered --llm-model <llm-model> --add-table-of-contents\n```\n\n### Ignore chapters from source\n\nSometimes, the chaptering of the video/audio at the source does not segment the content in the way you are happy about. You can ask yt2doc to ignore the source chaptering by\n\n```\nyt2doc --video <video-url> --ignore-chapters --segment-unchaptered --llm-model <model-name>\n```\n\n### Extra options to yt-dlp\n\nIf you need to specify extra options to yt-dlp, you can specify `--yt-dlp-extra-opts` with a string representation of a Python dictionary of the key and value pairs, such as\n\n```\nyt2doc --video <video-url> --yt-dlp-extra-opts '{\"quiet\": False}'\n```\n\nThe list of possible keys supported by yt-dlp (as a library, not as a cli tool) is documented in the source code and may change any time. 
As of version [2024.12.06](https://github.com/yt-dlp/yt-dlp/releases/tag/2024.12.06) the yt-dlp options are documented [here](https://github.com/yt-dlp/yt-dlp/blob/2024.12.06/yt_dlp/YoutubeDL.py#L212-L491).\n\n\n### Run in Docker\n\nTo run yt2doc in Docker, first pull the image from ghcr:\n\n```\ndocker pull ghcr.io/shun-liang/yt2doc\n```\n\nThen just run:\n\n```\ndocker run ghcr.io/shun-liang/yt2doc --video <video-url>\n```\n\nIf you are running Ollama (or any LLM server) locally and you want to segment the unchapter video/audio, you need to use the [host](https://docs.docker.com/engine/network/drivers/host/) network driver. Also, if you want to save the document to the host filesystem, you need [mount](https://docs.docker.com/engine/storage/bind-mounts/) a host directory to the Docker container. For example, if you run Ollam at `http://localhost:11434` on host, and you want yt2doc to write to `<directory-on-host>` on the host filesystem, then\n\n```\ndocker run --network=\"host\" --mount type=bind,source=<directory-on-host>,target=/app ghcr.io/shun-liang/yt2doc --video <video-url> --segment-unchaptered --llm-server http://host.docker.internal:11434/v1 --llm-model <llm-model> -o .\n```",
"bugtrack_url": null,
"license": null,
"summary": "Transcribe any YouTube video into a structural Markdown document",
"version": "0.3.2",
"project_urls": null,
"split_keywords": [],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "de07c21675152a51ee676b80b4d9cd8a07341464f0386df176e7151012c5e5ab",
"md5": "359d8c042a4d309ac7a6e6241683a77d",
"sha256": "0481922d35227237f8fd859acc5cf8a1a920b17b3314ef81cfaf594123195325"
},
"downloads": -1,
"filename": "yt2doc-0.3.2-py3-none-any.whl",
"has_sig": false,
"md5_digest": "359d8c042a4d309ac7a6e6241683a77d",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": "<3.13,>=3.10",
"size": 26017,
"upload_time": "2024-12-12T13:55:51",
"upload_time_iso_8601": "2024-12-12T13:55:51.523545Z",
"url": "https://files.pythonhosted.org/packages/de/07/c21675152a51ee676b80b4d9cd8a07341464f0386df176e7151012c5e5ab/yt2doc-0.3.2-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "56dbcca9d57c33e0d608e0bd1ebc5c2a20925f43d8d1a0319b7f7933c00f0a2c",
"md5": "d36012294c46cfe6c3bb51082472b943",
"sha256": "3d395b2cfb29c0496409e30dfddbd15c54b6ece2c188926c57641ef2d6c71e09"
},
"downloads": -1,
"filename": "yt2doc-0.3.2.tar.gz",
"has_sig": false,
"md5_digest": "d36012294c46cfe6c3bb51082472b943",
"packagetype": "sdist",
"python_version": "source",
"requires_python": "<3.13,>=3.10",
"size": 3547976,
"upload_time": "2024-12-12T13:55:54",
"upload_time_iso_8601": "2024-12-12T13:55:54.325643Z",
"url": "https://files.pythonhosted.org/packages/56/db/cca9d57c33e0d608e0bd1ebc5c2a20925f43d8d1a0319b7f7933c00f0a2c/yt2doc-0.3.2.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2024-12-12 13:55:54",
"github": false,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"lcname": "yt2doc"
}