tubescribe

Name: tubescribe
Version: 0.3.4
Summary: CLI to transcribe YouTube audio via Whisper (local) or Gemini (cloud)
Upload time: 2025-09-08 08:06:33
Requires-Python: >=3.11
Keywords: caption, cli, speech-to-text, srt, transcription, whisper, youtube
# TubeScribe (ytx) — YouTube Transcriber (Whisper / Metal via whisper.cpp)

CLI that downloads YouTube audio and produces transcripts and captions using:

- Local Whisper (faster-whisper / CTranslate2)
- Whisper.cpp (Metal acceleration on Apple Silicon)

Repository: https://github.com/prateekjain24/TubeScribe

Managed with venv+pip (recommended) or uv, using the `src` layout.

Features

- One command: URL → audio → normalized WAV → transcript JSON + SRT captions
- Engines: `whisper` (faster-whisper) and `whispercpp` (Metal via whisper.cpp)
- Rich progress for download + transcription
- Deterministic JSON (orjson) and SRT line wrapping

Requirements

- Python >= 3.11
- FFmpeg installed and on PATH
  - Check: `ffmpeg -version`
  - macOS: `brew install ffmpeg`
  - Ubuntu/Debian: `sudo apt-get update && sudo apt-get install -y ffmpeg`
  - Fedora: `sudo dnf install -y ffmpeg`
  - Arch: `sudo pacman -S ffmpeg`
  - Windows: `winget install Gyan.FFmpeg` or `choco install ffmpeg`
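
The PATH check can also be scripted. A minimal Python sketch (the `ffmpeg_available` helper is illustrative, not part of ytx):

```python
import shutil
import subprocess

def ffmpeg_available() -> bool:
    """Return True if an ffmpeg binary is on PATH and runs."""
    path = shutil.which("ffmpeg")
    if path is None:
        return False
    # `ffmpeg -version` exits 0 on a healthy install.
    result = subprocess.run([path, "-version"], capture_output=True)
    return result.returncode == 0
```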

Install (dev)

- Option A: venv + pip (recommended)
  - `cd ytx && python3.11 -m venv .venv && source .venv/bin/activate`
  - `python -m pip install -U pip setuptools wheel`
  - `python -m pip install -e .`
  - `ytx --help`
- Option B: uv
  - `cd ytx && uv sync`
  - `uv run ytx --help`

Running locally without installing

- From repo root:
  - `export PYTHONPATH="$(pwd)/ytx/src"`
  - `cd ytx && python3 -m ytx.cli --help`
  - Example: `python3 -m ytx.cli summarize-file 0jpcFxY_38k.json --write`

Note: Avoid running the `ytx` console script from inside the `ytx/` folder; the local `ytx/` directory can shadow the installed package on `sys.path`. Use the module form (`python -m ytx.cli`) or run from the repo root.

Usage (CLI)

- Whisper (CPU by default):
  - `ytx transcribe <url> --engine whisper --model small`
- Whisper (larger model):
  - `ytx transcribe <url> --engine whisper --model large-v3-turbo`
- Gemini (best‑effort timestamps):
  - `ytx transcribe <url> --engine gemini --timestamps chunked --fallback`
- Chapters + summaries:
  - `ytx transcribe <url> --by-chapter --parallel-chapters --chapter-overlap 2.0 --summarize-chapters --summarize`
- Engine options and timestamp policy:
  - `ytx transcribe <url> --engine-opts '{"utterances":true}' --timestamps native`
- Output dir:
  - `ytx transcribe <url> --output-dir ./artifacts`
- Verbose logging:
  - `ytx --verbose transcribe <url> --engine whisper`
- Health check:
  - `ytx health` (ffmpeg, API key presence, network)
- Summarize an existing transcript JSON:
  - `ytx summarize-file /path/to/<video_id>.json --write`

Metal (Apple Silicon) via whisper.cpp

- Build whisper.cpp with Metal: `make -j METAL=1`
- Download a GGUF/GGML model (e.g., large-v3-turbo)
- Run with whisper.cpp engine by passing a model file path:
  - `uv run ytx transcribe <url> --engine whispercpp --model /path/to/gguf-large-v3-turbo.bin`
- Auto-prefer whisper.cpp when `device=metal` (if `whisper.cpp` binary is available):
  - Set env `YTX_WHISPERCPP_BIN` to the `main` binary path, and provide a model path as above
- Tuning (env or .env):
  - `YTX_WHISPERCPP_NGL` (GPU layers, default 35), `YTX_WHISPERCPP_THREADS` (CPU threads)
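
A rough sketch of how such an invocation could be assembled from these env vars (`build_whispercpp_cmd` is hypothetical; `-m`, `-f`, `-ngl`, and `-t` are whisper.cpp's standard CLI flags):

```python
import os

def build_whispercpp_cmd(audio_wav: str, model_path: str) -> list[str]:
    """Assemble a whisper.cpp command line from the YTX_* env vars."""
    binary = os.environ.get("YTX_WHISPERCPP_BIN", "main")
    ngl = os.environ.get("YTX_WHISPERCPP_NGL", "35")        # layers offloaded to the GPU
    threads = os.environ.get("YTX_WHISPERCPP_THREADS", "4")  # CPU threads (assumed default)
    return [binary, "-m", model_path, "-f", audio_wav, "-ngl", ngl, "-t", threads]
```

The resulting list can be passed to `subprocess.run`.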

Outputs

- JSON (`<video_id>.json`): TranscriptDoc
  - keys: `video_id, source_url, title, duration, language, engine, model, created_at, segments[], chapters?, summary?`
  - segment: `{id, start, end, text, confidence?}` (times in seconds)
- SRT (`<video_id>.srt`): line-wrapped captions (2 lines max)
- Cache artifacts (under XDG cache root): `meta.json`, `summary.json`, transcript and captions.
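
Given the segment schema above, an SRT cue can be rendered roughly as follows. This is a sketch, not ytx's exporter; the 42-character wrap width is an assumption (only the 2-line cap is documented):

```python
import textwrap

def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp: HH:MM:SS,mmm."""
    ms = round(seconds * 1000)
    h, rem = divmod(ms, 3_600_000)
    m, rem = divmod(rem, 60_000)
    s, ms = divmod(rem, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

def segment_to_srt(seg: dict) -> str:
    """Render one segment {id, start, end, text} as an SRT cue (max 2 lines)."""
    lines = textwrap.wrap(seg["text"], width=42)[:2]
    return (f"{seg['id'] + 1}\n"
            f"{srt_timestamp(seg['start'])} --> {srt_timestamp(seg['end'])}\n"
            + "\n".join(lines) + "\n")
```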

Configuration (.env)

- Copy `.env.example` → `.env`, then adjust:
  - `GEMINI_API_KEY` (for Gemini)
  - `YTX_ENGINE` (default `whisper`), `WHISPER_MODEL` (e.g., `large-v3-turbo`)
  - `YTX_WHISPERCPP_BIN` and `YTX_WHISPERCPP_MODEL_PATH` for whisper.cpp
  - Optional: `YTX_CACHE_DIR`, `YTX_OUTPUT_DIR`, `YTX_ENGINE_OPTS` (JSON), and timeouts (`YTX_NETWORK_TIMEOUT`, etc.)
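
A rough sketch of reading these settings with fallbacks (the `load_settings` helper and the `small` model default are illustrative, not ytx internals):

```python
import json
import os

def load_settings(env=os.environ) -> dict:
    """Collect the documented YTX settings, falling back to defaults."""
    return {
        "engine": env.get("YTX_ENGINE", "whisper"),          # documented default
        "model": env.get("WHISPER_MODEL", "small"),           # assumed default
        "engine_opts": json.loads(env.get("YTX_ENGINE_OPTS", "{}")),
        "output_dir": env.get("YTX_OUTPUT_DIR"),              # None -> built-in default
    }
```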

Restricted videos & cookies

- Some videos are age/region restricted or private. The downloader supports cookies, but CLI flags are not yet wired.
- Workarounds: run yt-dlp manually, or use the Python API (pass `cookies_from_browser` / `cookies_file` to downloader).
- Error messages suggest cookies usage when restrictions are detected.

Performance Tips

- faster‑whisper: `compute_type=auto` resolves to `int8` on CPU, `float16` on CUDA.
- Model sizing: start with `small`/`medium`; use `large-v3(-turbo)` for best quality.
- Metal (whisper.cpp): tune `-ngl` (30–40 typical on M‑series) and threads to maximize throughput.
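
The `compute_type=auto` resolution above can be expressed as a small helper (hypothetical; it mirrors the documented behavior, not faster-whisper's actual internals):

```python
def resolve_compute_type(device: str, compute_type: str = "auto") -> str:
    """Resolve 'auto' the way described above: int8 on CPU, float16 on CUDA."""
    if compute_type != "auto":
        return compute_type  # explicit choice wins
    return "float16" if device == "cuda" else "int8"
```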

Development

- Structure: code in `src/ytx/`, CLI in `src/ytx/cli.py`, engines in `src/ytx/engines/`, exporters in `src/ytx/exporters/`.
- Tests: `pytest -q` (add tests under `ytx/tests/`).
- Lint/format (if configured): `ruff check .` / `ruff format .`.

Roadmap

- Add VTT/TXT exporters, format selection (`--formats json,srt,vtt,txt`)
- OpenAI/Deepgram/ElevenLabs engines via shared cloud base
- More resilient chunking/alignment; diarization options where supported
- CI + tests; docs polish; performance tuning

            

Raw data

{
    "_id": null,
    "home_page": null,
    "name": "tubescribe",
    "maintainer": null,
    "docs_url": null,
    "requires_python": ">=3.11",
    "maintainer_email": null,
    "keywords": "caption, cli, speech-to-text, srt, transcription, whisper, youtube",
    "author": null,
    "author_email": "Prateek <19404752+prateekjain24@users.noreply.github.com>",
    "download_url": "https://files.pythonhosted.org/packages/74/ef/1dde36a06725c26102659a6626aa59b46583569b5fe2da06971a0155178c/tubescribe-0.3.4.tar.gz",
    "platform": null,
    "description": "(identical to the README shown above)",
    "bugtrack_url": null,
    "license": null,
    "summary": "CLI to transcribe YouTube audio via Whisper (local) or Gemini (cloud)",
    "version": "0.3.4",
    "project_urls": {
        "Homepage": "https://github.com/prateekjain24/TubeScribe",
        "Issues": "https://github.com/prateekjain24/TubeScribe/issues",
        "Repository": "https://github.com/prateekjain24/TubeScribe"
    },
    "split_keywords": [
        "caption",
        " cli",
        " speech-to-text",
        " srt",
        " transcription",
        " whisper",
        " youtube"
    ],
    "urls": [
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "9464b9d22ee3906f8ec67fae93a5455b472f23264f2f4105e36f137ed5b80762",
                "md5": "67c1686529980620c30d157b81fe70f8",
                "sha256": "24e9e2d2ab2be2e0a4cee9d4f19be426a9305b17fbffd3b2fb61130524f3eccd"
            },
            "downloads": -1,
            "filename": "tubescribe-0.3.4-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "67c1686529980620c30d157b81fe70f8",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.11",
            "size": 63983,
            "upload_time": "2025-09-08T08:06:28",
            "upload_time_iso_8601": "2025-09-08T08:06:28.353319Z",
            "url": "https://files.pythonhosted.org/packages/94/64/b9d22ee3906f8ec67fae93a5455b472f23264f2f4105e36f137ed5b80762/tubescribe-0.3.4-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "74ef1dde36a06725c26102659a6626aa59b46583569b5fe2da06971a0155178c",
                "md5": "3c47567be18717900ac74afcfa8e68ce",
                "sha256": "e09e76d2a4a334b03b814364c922cbff2111fa6bff135428bed3bc4ca70bd104"
            },
            "downloads": -1,
            "filename": "tubescribe-0.3.4.tar.gz",
            "has_sig": false,
            "md5_digest": "3c47567be18717900ac74afcfa8e68ce",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.11",
            "size": 47400,
            "upload_time": "2025-09-08T08:06:33",
            "upload_time_iso_8601": "2025-09-08T08:06:33.514654Z",
            "url": "https://files.pythonhosted.org/packages/74/ef/1dde36a06725c26102659a6626aa59b46583569b5fe2da06971a0155178c/tubescribe-0.3.4.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2025-09-08 08:06:33",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "prateekjain24",
    "github_project": "TubeScribe",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": true,
    "lcname": "tubescribe"
}
        