yt-pls

Name	yt-pls
Version	0.0.1b1
home_page	None
Summary	Youtube Summariser with LlamaIndex
upload_time	2025-09-03 16:55:46
maintainer	None
docs_url	None
author	Nelson Ngai
requires_python	>=3.13
license	None
keywords	youtube summary llamaindex genai
VCS
bugtrack_url
requirements	No requirements were recorded.

<h1 align="center">yt-summary</h1>

<p align="center">
    <a href="https://www.python.org/downloads/release/python-3131/">
        <img src="https://img.shields.io/badge/python-3.13-blue.svg" alt="Python 3.13">
    </a>
    <a href="https://github.com/astral-sh/ty">
        <img src="https://img.shields.io/endpoint?url=https://raw.githubusercontent.com/astral-sh/ty/main/assets/badge/v0.json">
    </a>
    <a href="https://github.com/astral-sh/ruff">
        <img src="https://img.shields.io/endpoint?url=https://raw.githubusercontent.com/astral-sh/ruff/main/assets/badge/v2.json" alt="Linting: Ruff">
    </a>
    <a href="LICENSE">
        <img alt="License" src="https://img.shields.io/static/v1?logo=MIT&color=Blue&message=MIT&label=License"/>
    </a>
</p>

<p align="center">
A tool to generate YouTube video summaries with LlamaIndex
</p>

> The project was initially named `yt-summary`, but PyPI rejected that name for
> being too similar to an existing project. The package and CLI tool are
> therefore published as `yt-pls` (YouTube please).

## Install

### pip

```bash
pip install yt-pls
```

If you want to use the Gemini and/or Claude models, you need to install the optional dependencies:
```bash
# Quote the extras so shells such as zsh do not expand the brackets
pip install "yt-pls[google]"
pip install "yt-pls[anthropic]"

# or
pip install "yt-pls[all]"
```

### Global
You can install the package globally with `pipx`, or run it in a temporary environment with `uvx`:
```bash
pipx install yt-pls
```

```bash
uvx yt-pls
```

### Development
Install [uv](https://github.com/astral-sh/uv) and run:
```bash
git clone https://github.com/nelnn/yt-summary.git
cd yt-summary
make install
source .venv/bin/activate
```

## CLI
Export your API key(s) as environment variables:
```bash
export OPENAI_API_KEY="sk-..."
export GOOGLE_API_KEY="..."
export ANTHROPIC_API_KEY="..."
```

Then run:
```bash
yt-pls [url/video_id]
```

You can specify the LLM provider and model:
```bash
yt-pls --provider openai --model gpt-5-mini-2025-08-07 [url/video_id]
```

You can also specify the summariser type:
```bash
yt-pls --summariser refined [url/video_id]
```

To save the output to a file, for example in markdown:
```bash
yt-pls [url/video_id] > summary.md
```
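
If you would rather drive the CLI from a Python script, the options above can be assembled into an argument list. This is only a sketch: `build_yt_pls_command` and `save_summary` are hypothetical helpers, not part of the package, and they assume `yt-pls` is on your `PATH` and writes the summary to standard output, as the redirect example suggests.

```python
import subprocess


def build_yt_pls_command(video, provider=None, model=None, summariser=None):
    """Assemble the argument list for a yt-pls call (hypothetical helper)."""
    command = ["yt-pls"]
    if provider is not None:
        command += ["--provider", provider]
    if model is not None:
        command += ["--model", model]
    if summariser is not None:
        command += ["--summariser", summariser]
    # The URL or video ID goes last, as in the examples above.
    command.append(video)
    return command


def save_summary(video, output_path, **options):
    """Run yt-pls and write its standard output to output_path."""
    command = build_yt_pls_command(video, **options)
    result = subprocess.run(command, capture_output=True, text=True, check=True)
    with open(output_path, "w", encoding="utf-8") as handle:
        handle.write(result.stdout)
```

Calling `save_summary("abc123", "summary.md", summariser="refined")` would then mirror the shell redirect shown above.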

## API
You can also use yt-summary as a library:
```python
import asyncio

from yt_summary.run import get_youtube_summary

# get_youtube_summary is a coroutine, so run it inside an event loop.
summary = asyncio.run(get_youtube_summary("https://www.youtube.com/watch?v=abc123"))
```

Remember to export your LLM API key(s) as environment variables before calling
the function. Alternatively, you can save them in a `.env` file in your project
root directory.
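
A quick pre-flight check can catch a missing key before any request is made. The sketch below is not part of the package: `has_api_key` is a hypothetical helper, the dictionary keys are labels chosen for this example, and the variable names are the ones listed in the CLI section. It only inspects the process environment, so a key stored solely in a `.env` file will not be seen unless something has already loaded that file.

```python
import os

# Environment variable names taken from the CLI section; the dictionary keys
# are labels chosen for this example only.
API_KEY_VARIABLES = {
    "openai": "OPENAI_API_KEY",
    "google": "GOOGLE_API_KEY",
    "anthropic": "ANTHROPIC_API_KEY",
}


def has_api_key(provider, environ=None):
    """Return True if the variable for the provider is set and non-empty."""
    environ = os.environ if environ is None else environ
    return bool(environ.get(API_KEY_VARIABLES[provider]))


if not has_api_key("openai"):
    print("OPENAI_API_KEY is not set in this process environment")
```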


### Transcript Extractor
To fetch the transcript and metadata:
```python
import asyncio

from yt_summary.extractors import TranscriptExtractor

url = "url or video id"
transcript_extractor = TranscriptExtractor()
# fetch is a coroutine, so run it inside an event loop.
transcript_raw = asyncio.run(transcript_extractor.fetch(url))
```

This will return a Pydantic model `YoutubeTranscriptRaw` which looks like:
```python
YoutubeTranscriptRaw(
    metadata=YoutubeMetadata(
        video_id="abc123",
        title="Video Title",
        author="Author Name",
        channel_id="channel123",
        video_url="https://www.youtube.com/watch?v=abc123",
        channel_url="https://www.youtube.com/channel/channel123",
        thumbnail_url="https://i.ytimg.com/vi/abc123/hqdefault.jpg",
        is_generated=True,
        language="English (auto-generated)",
        language_code="en",
    ),
    text="[00:00 (0s)] First sentence. Second sentence. [00:10 (10s)] Third sentence...",
)
```
> **Note**: The extractor is a thin wrapper around
> [youtube-transcript-api](https://github.com/jdepoix/youtube-transcript-api).
> `fetch` accepts the same parameters as `YouTubeTranscriptApi.fetch`, but the
> transcript snippets are stitched into a single string with timestamps
> embedded in the text. You can also pass proxy settings to
> `TranscriptExtractor` as you would to `YouTubeTranscriptApi`.
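
If you need the individual segments back, the embedded timestamps can be split out again with a regular expression. This is a minimal sketch that assumes every marker follows the `[MM:SS (Ns)]` shape shown in the example output above; `split_by_timestamp` is a hypothetical helper, not part of the package.

```python
import re

# Matches markers such as "[00:10 (10s)]" and captures the offset in seconds.
TIMESTAMP_PATTERN = re.compile(r"\[[\d:]+ \((\d+)s\)\]")


def split_by_timestamp(text):
    """Return (start_seconds, segment_text) pairs from a stitched transcript."""
    matches = list(TIMESTAMP_PATTERN.finditer(text))
    segments = []
    for index, match in enumerate(matches):
        # A segment runs from the end of its marker to the start of the next one.
        end = matches[index + 1].start() if index + 1 < len(matches) else len(text)
        segments.append((int(match.group(1)), text[match.end():end].strip()))
    return segments


example = "[00:00 (0s)] First sentence. Second sentence. [00:10 (10s)] Third sentence."
print(split_by_timestamp(example))
# [(0, 'First sentence. Second sentence.'), (10, 'Third sentence.')]
```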

### Summariser
There are currently two summariser implementations: `CompactSummariser` and
`RefinedSummariser`.

`CompactSummariser` lists the metadata, a high-level summary and
Q&A with timestamps. It uses the `DocumentSummaryIndex` from LlamaIndex.

`RefinedSummariser` produces the same output by splitting the transcript into
chunks, summarising each chunk, and then consolidating the results. It makes
multiple asynchronous calls to the LLM, so be aware of the rate limits of your
chosen LLM provider.
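
To illustrate the chunk-then-consolidate pattern and why rate limits matter, here is a simplified sketch. It is not the package's implementation: `fake_llm_summary` stands in for a real LLM call, the character-based chunking and the `max_concurrency` parameter are assumptions made for the example, and the final merge is a plain join rather than another LLM request.

```python
import asyncio


def chunk_text(text, chunk_size):
    """Split text into consecutive pieces of at most chunk_size characters."""
    return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]


async def fake_llm_summary(chunk):
    """Stand-in for an LLM call; it just returns the first three words."""
    await asyncio.sleep(0)
    return " ".join(chunk.split()[:3])


async def summarise_in_chunks(text, chunk_size=40, max_concurrency=2):
    # The semaphore caps how many calls are in flight at once, which is one
    # simple way to stay under a provider's rate limit.
    semaphore = asyncio.Semaphore(max_concurrency)

    async def limited(chunk):
        async with semaphore:
            return await fake_llm_summary(chunk)

    chunks = chunk_text(text, chunk_size)
    # gather returns results in the same order as the chunks.
    partials = await asyncio.gather(*(limited(chunk) for chunk in chunks))
    # A real consolidation step would ask the LLM to merge these summaries.
    return " | ".join(partials)


print(asyncio.run(summarise_in_chunks("one two three four five six", chunk_size=14)))
```

Lowering `max_concurrency` trades speed for fewer simultaneous requests, which is the knob to reach for if a provider starts rejecting calls.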

For example, to generate a summary with `CompactSummariser`:
```python
import asyncio

from yt_summary.extractors import TranscriptExtractor
from yt_summary.schemas import LLMModel, LLMProvidersEnum
from yt_summary.summarisers import CompactSummariser


async def main() -> None:
    url = "url or video id"
    # Fetch the transcript and metadata first.
    transcript_extractor = TranscriptExtractor()
    transcript_raw = await transcript_extractor.fetch(url)
    # Configure the summariser with the provider and model to use.
    summariser = CompactSummariser(
        llm=LLMModel(
            provider=LLMProvidersEnum.OPENAI,
            model="gpt-5-mini-2025-08-07",
        )
    )
    summary = await summariser.summarise(transcript_raw)
    print(summary)


asyncio.run(main())
```

> The repository is under development, so expect breaking changes.
> Suggestions and contributions are welcome!

Raw data

{
    "_id": null,
    "home_page": null,
    "name": "yt-pls",
    "maintainer": null,
    "docs_url": null,
    "requires_python": ">=3.13",
    "maintainer_email": null,
    "keywords": "youtube, summary, llamaindex, genai",
    "author": "Nelson Ngai",
    "author_email": "Nelson Ngai <nelsonn.xyz@gmail.com>",
    "download_url": "https://files.pythonhosted.org/packages/f5/3a/babc03cf7a529bc3fb50023fc61cf660761d9385d450337f4f2b369f35ae/yt_pls-0.0.1b1.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": null,
    "summary": "Youtube Summariser with LlamaIndex",
    "version": "0.0.1b1",
    "project_urls": {
        "Repository": "https://github.com/nelnn/yt-pls"
    },
    "split_keywords": [
        "youtube",
        " summary",
        " llamaindex",
        " genai"
    ],
    "urls": [
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "64d08ce7a08c5430c0e25be053cbec8cdd883699fcd59ba2214fd3ddf6a16fbc",
                "md5": "9a5d8e886d2978aa41f9ec179aae83f9",
                "sha256": "00a375c5175cd4d9189610ec7b811f02938f85806e13c0c00045d0512194f7e4"
            },
            "downloads": -1,
            "filename": "yt_pls-0.0.1b1-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "9a5d8e886d2978aa41f9ec179aae83f9",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.13",
            "size": 24456,
            "upload_time": "2025-09-03T16:55:46",
            "upload_time_iso_8601": "2025-09-03T16:55:46.006005Z",
            "url": "https://files.pythonhosted.org/packages/64/d0/8ce7a08c5430c0e25be053cbec8cdd883699fcd59ba2214fd3ddf6a16fbc/yt_pls-0.0.1b1-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "f53ababc03cf7a529bc3fb50023fc61cf660761d9385d450337f4f2b369f35ae",
                "md5": "8859c746bd74a4e7479c191ace4737a3",
                "sha256": "74f114ef6fb603d370692443a63acdb5f0d12a5b4a46c4c98122e94761e3201f"
            },
            "downloads": -1,
            "filename": "yt_pls-0.0.1b1.tar.gz",
            "has_sig": false,
            "md5_digest": "8859c746bd74a4e7479c191ace4737a3",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.13",
            "size": 16514,
            "upload_time": "2025-09-03T16:55:46",
            "upload_time_iso_8601": "2025-09-03T16:55:46.959160Z",
            "url": "https://files.pythonhosted.org/packages/f5/3a/babc03cf7a529bc3fb50023fc61cf660761d9385d450337f4f2b369f35ae/yt_pls-0.0.1b1.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2025-09-03 16:55:46",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "nelnn",
    "github_project": "yt-pls",
    "github_not_found": true,
    "lcname": "yt-pls"
}
