# TubeScribe (ytx) — YouTube Transcriber (Whisper / Metal via whisper.cpp)
CLI that downloads YouTube audio and produces transcripts and captions using:
- Local Whisper (faster-whisper / CTranslate2)
- Whisper.cpp (Metal acceleration on Apple Silicon)
Repository: https://github.com/prateekjain24/TubeScribe
Managed with venv+pip (recommended) or uv, using the `src` layout.

## Features
- One command: URL → audio → normalized WAV → transcript JSON + SRT captions
- Engines: `whisper` (faster-whisper) and `whispercpp` (Metal via whisper.cpp)
- Rich progress for download + transcription
- Deterministic JSON (orjson) and SRT line wrapping

## Requirements
- Python >= 3.11
- FFmpeg installed and on PATH
  - Check: `ffmpeg -version`
  - macOS: `brew install ffmpeg`
  - Ubuntu/Debian: `sudo apt-get update && sudo apt-get install -y ffmpeg`
  - Fedora: `sudo dnf install -y ffmpeg`
  - Arch: `sudo pacman -S ffmpeg`
  - Windows: `winget install Gyan.FFmpeg` or `choco install ffmpeg`
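
FFmpeg must be resolvable on PATH before transcription. A minimal pre-flight check you could run yourself (independent of `ytx health`, which performs its own checks):

```python
import shutil
import subprocess

def ffmpeg_available() -> bool:
    """Return True if an ffmpeg binary is on PATH and runs."""
    path = shutil.which("ffmpeg")
    if path is None:
        return False
    # `ffmpeg -version` exits 0 when the binary is functional.
    return subprocess.run([path, "-version"], capture_output=True).returncode == 0

if __name__ == "__main__":
    print("ffmpeg found" if ffmpeg_available() else "ffmpeg missing: install it first")
```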

## Install (dev)
- Option A: venv + pip (recommended)
  - `cd ytx && python3.11 -m venv .venv && source .venv/bin/activate`
  - `python -m pip install -U pip setuptools wheel`
  - `python -m pip install -e .`
  - `ytx --help`
- Option B: uv
  - `cd ytx && uv sync`
  - `uv run ytx --help`

## Running locally without installing
- From the repo root:
  - `export PYTHONPATH="$(pwd)/ytx/src"`
  - `cd ytx && python3 -m ytx.cli --help`
  - Example: `python3 -m ytx.cli summarize-file 0jpcFxY_38k.json --write`

Note: Avoid running the `ytx` console script from inside the `ytx/` folder; the local directory may shadow the installed package. Use the module form shown above, or run from the repo root.

## Usage (CLI)
- Whisper (CPU by default):
  - `ytx transcribe <url> --engine whisper --model small`
- Whisper (larger model):
  - `ytx transcribe <url> --engine whisper --model large-v3-turbo`
- Gemini (best-effort timestamps):
  - `ytx transcribe <url> --engine gemini --timestamps chunked --fallback`
- Chapters + summaries:
  - `ytx transcribe <url> --by-chapter --parallel-chapters --chapter-overlap 2.0 --summarize-chapters --summarize`
- Engine options and timestamp policy:
  - `ytx transcribe <url> --engine-opts '{"utterances":true}' --timestamps native`
- Output dir:
  - `ytx transcribe <url> --output-dir ./artifacts`
- Verbose logging:
  - `ytx --verbose transcribe <url> --engine whisper`
- Health check:
  - `ytx health` (ffmpeg, API key presence, network)
- Summarize an existing transcript JSON:
  - `ytx summarize-file /path/to/<video_id>.json --write`
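
These commands compose well in scripts. A minimal batch sketch driving the CLI from Python, using only the flags shown above (the URL list is a placeholder):

```python
import subprocess

# Placeholder URLs; replace with the videos you want transcribed.
URLS = [
    "https://www.youtube.com/watch?v=VIDEO_ID_1",
    "https://www.youtube.com/watch?v=VIDEO_ID_2",
]

for url in URLS:
    cmd = [
        "ytx", "transcribe", url,
        "--engine", "whisper",
        "--model", "small",
        "--output-dir", "./artifacts",
    ]
    print("Running:", " ".join(cmd))
    # check=True stops the batch on the first failing video.
    subprocess.run(cmd, check=True)
```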

## Metal (Apple Silicon) via whisper.cpp
- Build whisper.cpp with Metal: `make -j METAL=1`
- Download a GGUF/GGML model (e.g., large-v3-turbo)
- Run with the whisper.cpp engine by passing a model file path:
  - `uv run ytx transcribe <url> --engine whispercpp --model /path/to/gguf-large-v3-turbo.bin`
- Auto-prefer whisper.cpp when `device=metal` (if the `whisper.cpp` binary is available):
  - Set the env var `YTX_WHISPERCPP_BIN` to the `main` binary path, and provide a model path as above
- Tuning (env or .env):
  - `YTX_WHISPERCPP_NGL` (GPU layers, default 35), `YTX_WHISPERCPP_THREADS` (CPU threads)
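
As a rough illustration of how these tuning variables might be consumed (not the actual ytx implementation; the thread fallback is an assumption):

```python
import os

def whispercpp_tuning() -> dict:
    """Resolve whisper.cpp tuning knobs from the environment (illustrative only)."""
    return {
        # Documented default of 35 GPU layers.
        "ngl": int(os.environ.get("YTX_WHISPERCPP_NGL", "35")),
        # Assumed fallback: one thread per CPU core.
        "threads": int(os.environ.get("YTX_WHISPERCPP_THREADS", str(os.cpu_count() or 4))),
        # Path to the whisper.cpp `main` binary and the GGUF/GGML model file.
        "binary": os.environ.get("YTX_WHISPERCPP_BIN"),
        "model_path": os.environ.get("YTX_WHISPERCPP_MODEL_PATH"),
    }

print(whispercpp_tuning())
```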

## Outputs
- JSON (`<video_id>.json`): TranscriptDoc
  - keys: `video_id, source_url, title, duration, language, engine, model, created_at, segments[], chapters?, summary?`
  - segment: `{id, start, end, text, confidence?}` (times in seconds)
- SRT (`<video_id>.srt`): line-wrapped captions (2 lines max)
- Cache artifacts (under the XDG cache root): `meta.json`, `summary.json`, transcript and captions.
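
A minimal sketch of reading a transcript JSON, assuming only the keys listed above (the path is a placeholder):

```python
import json
from pathlib import Path

def dump_transcript(path: str) -> None:
    """Print one line per segment from a TranscriptDoc JSON file."""
    doc = json.loads(Path(path).read_text(encoding="utf-8"))
    print(f"{doc['title']} ({doc['video_id']}) - engine={doc['engine']}, model={doc['model']}")
    for seg in doc["segments"]:
        # start/end are in seconds; confidence may be absent.
        print(f"[{seg['start']:8.2f} -> {seg['end']:8.2f}] {seg['text']}")

# Example (placeholder path):
# dump_transcript("./artifacts/0jpcFxY_38k.json")
```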

## Configuration (.env)
- Copy `.env.example` → `.env`, then adjust:
  - `GEMINI_API_KEY` (for Gemini)
  - `YTX_ENGINE` (default `whisper`), `WHISPER_MODEL` (e.g., `large-v3-turbo`)
  - `YTX_WHISPERCPP_BIN` and `YTX_WHISPERCPP_MODEL_PATH` for whisper.cpp
  - Optional: `YTX_CACHE_DIR`, `YTX_OUTPUT_DIR`, `YTX_ENGINE_OPTS` (JSON), and timeouts (`YTX_NETWORK_TIMEOUT`, etc.)
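
An illustrative `.env` using only the variables listed above; every value here is a placeholder:

```
# Cloud engine credentials (only needed for --engine gemini)
GEMINI_API_KEY=your-api-key-here

# Default engine and model
YTX_ENGINE=whisper
WHISPER_MODEL=large-v3-turbo

# whisper.cpp (Metal) settings
YTX_WHISPERCPP_BIN=/path/to/whisper.cpp/main
YTX_WHISPERCPP_MODEL_PATH=/path/to/gguf-large-v3-turbo.bin
YTX_WHISPERCPP_NGL=35
YTX_WHISPERCPP_THREADS=8

# Optional locations and engine options
YTX_CACHE_DIR=/path/to/cache
YTX_OUTPUT_DIR=./artifacts
YTX_ENGINE_OPTS={"utterances": true}
```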

## Restricted videos & cookies
- Some videos are age- or region-restricted, or private. The downloader supports cookies, but the corresponding CLI flags are not wired up yet.
- Workarounds: run yt-dlp manually, or use the Python API (pass `cookies_from_browser` / `cookies_file` to the downloader).
- Error messages suggest using cookies when a restriction is detected.
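
A minimal sketch of the manual workaround, driving yt-dlp directly rather than ytx's internal downloader (browser name, cookie file, and URL are placeholders):

```python
from yt_dlp import YoutubeDL

URL = "https://www.youtube.com/watch?v=VIDEO_ID"

ydl_opts = {
    # Reuse cookies from a local browser profile for restricted videos...
    "cookiesfrombrowser": ("chrome",),
    # ...or point at an exported Netscape-format cookies file instead:
    # "cookiefile": "cookies.txt",
    "format": "bestaudio/best",
    "outtmpl": "%(id)s.%(ext)s",
}

with YoutubeDL(ydl_opts) as ydl:
    ydl.download([URL])
```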

## Performance Tips
- faster-whisper: `compute_type=auto` resolves to `int8` on CPU, `float16` on CUDA.
- Model sizing: start with `small`/`medium`; use `large-v3(-turbo)` for best quality.
- Metal (whisper.cpp): tune `-ngl` (30–40 typical on M-series) and threads to maximize throughput.
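
For context on the compute-type tip, here is how model size and compute type are chosen when using faster-whisper directly (standalone library use, not ytx's engine wrapper; the audio path is a placeholder):

```python
from faster_whisper import WhisperModel

# compute_type="auto" picks int8 on CPU and float16 on CUDA; set it explicitly to override.
model = WhisperModel("small", device="cpu", compute_type="int8")

segments, info = model.transcribe("audio.wav", beam_size=5)
print(f"Detected language: {info.language} (p={info.language_probability:.2f})")
for seg in segments:
    print(f"[{seg.start:.2f} -> {seg.end:.2f}] {seg.text}")
```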

## Development
- Structure: code in `src/ytx/`, CLI in `src/ytx/cli.py`, engines in `src/ytx/engines/`, exporters in `src/ytx/exporters/`.
- Tests: `pytest -q` (add tests under `ytx/tests/`).
- Lint/format (if configured): `ruff check .` / `ruff format .`.

## Roadmap
- Add VTT/TXT exporters, format selection (`--formats json,srt,vtt,txt`)
- OpenAI/Deepgram/ElevenLabs engines via shared cloud base
- More resilient chunking/alignment; diarization options where supported
- CI + tests; docs polish; performance tuning