ytfetcher


Name: ytfetcher
Version: 0.4.0
Home page: https://github.com/kaya70875/ytfetcher
Summary: YTFetcher lets you fetch YouTube transcripts in bulk with metadata like titles, publish dates, and thumbnails. Great for ML, NLP, and dataset generation.
Upload time: 2025-08-10 12:13:17
Maintainer: None
Docs URL: None
Author: Ahmet Kaya
Requires Python: <3.14,>=3.11
License: MIT License Copyright (c) 2025 Ahmet Kaya Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY...
Keywords: yt transcript youtube cli dataset scraping python
Requirements: No requirements were recorded.
Travis-CI: No Travis.
Coveralls test coverage: No coveralls.
# YTFetcher
[![codecov](https://codecov.io/gh/kaya70875/ytfetcher/branch/main/graph/badge.svg)](https://codecov.io/gh/kaya70875/ytfetcher)
[![PyPI version](https://img.shields.io/pypi/v/ytfetcher)](https://pypi.org/project/ytfetcher/)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)

> ⚡ Effortlessly fetch and convert YouTube transcripts for ML, research, or personal use.

**YTFetcher** is a Python tool for fetching YouTube video transcripts in bulk, along with rich metadata like titles, publish dates, and descriptions. Ideal for building NLP datasets, search indexes, or powering content analysis apps.

---

## 📚 Table of Contents

- [Features](#features)
- [Installation](#installation)
- [Basic Usage](#basic-usage)
- [Fetching With Custom Video IDs](#fetching-with-custom-video-ids)
- [Exporting](#exporting)
- [Other Methods](#other-methods)
- [Proxy Configuration](#proxy-configuration)
- [Advanced HTTP Configuration](#advanced-http-configuration-optional)
- [CLI](#cli)
- [Contributing](#contributing)
- [Running Tests](#running-tests)
- [Related Projects](#related-projects)
- [License](#license)

---

## Features

- Fetch full transcripts from a YouTube channel or from a list of video IDs.
- Get video metadata: title, description, thumbnails, published date.
- Async support for high performance.
- Export fetched data as txt, csv or json.
- CLI support.

---

## Installation

The recommended way to install the package is with pip:

```bash
pip install ytfetcher
```

## Basic Usage

YTFetcher uses the **YouTube Data API v3** to get channel details and video IDs, so you need to create an API key in the [Google Cloud Console](https://console.cloud.google.com/apis/api/youtube.googleapis.com).

Keep in mind that the **YouTube Data API v3** enforces a daily quota, but for basic usage the quota is generally not a concern.
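To see why quota is rarely an issue, here is a rough back-of-the-envelope estimate. It is a sketch under stated assumptions, not a description of how ytfetcher actually calls the API: it assumes one call to resolve the channel, plus one listing call and one metadata call per page of up to 50 videos, each costing 1 unit, measured against Google's default allocation of 10,000 units per day. The function `estimate_quota_units` is hypothetical.

```python
import math

DEFAULT_DAILY_QUOTA = 10_000  # Google's default daily allocation per project


def estimate_quota_units(
    num_videos: int,
    page_size: int = 50,      # assumption: IDs and metadata fetched 50 per call
    list_call_cost: int = 1,  # assumption: every call costs 1 unit
) -> int:
    """Hypothetical estimate of quota units for fetching one channel.

    Assumes 1 call to resolve the channel, then, per page of videos, one
    listing call (video IDs) and one metadata call (snippets).
    """
    pages = math.ceil(num_videos / page_size) if num_videos > 0 else 0
    calls = 1 + 2 * pages
    return calls * list_call_cost


if __name__ == "__main__":
    units = estimate_quota_units(200)
    print(f"~{units} of {DEFAULT_DAILY_QUOTA} daily units")
```

If the library turned out to use a more expensive endpoint such as `search.list` (100 units per call in Google's quota table), raise `list_call_cost` accordingly; the estimate grows fast in that case.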

Here is how to fetch transcripts and metadata (title, description, publish date, etc.) for a single channel with the `from_channel` method:

```python
from ytfetcher import YTFetcher
from ytfetcher import ChannelData  # or: from ytfetcher.models import ChannelData
import asyncio

fetcher = YTFetcher.from_channel(
    api_key='your-youtubev3-api-key', 
    channel_handle="TheOffice", 
    max_results=2)

async def get_channel_data() -> list[ChannelData]:
    channel_data = await fetcher.fetch_youtube_data()
    return channel_data

if __name__ == '__main__':
    data = asyncio.run(get_channel_data())
    print(data)
```

---

This returns a list of `ChannelData` objects. Here is what it looks like:

```python
[
ChannelData(
    video_id='video1',
    transcripts=[
        Transcript(
            text="Hey there",
            start=0.0,
            duration=1.54
        ),
        Transcript(
            text="Happy coding!",
            start=1.56,
            duration=4.46
        )
    ],
    metadata=Snippet(
        title='VideoTitle',
        description='VideoDescription',
        publishedAt='02.04.2025',
        channelId='id123',
        thumbnails=Thumbnails(
            default=Thumbnail(
                url='thumbnail_url',
                width=124,
                height=124
            )
        )
    )
),
# Other ChannelData objects...
]
```
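Because transcripts arrive as a list of timed segments, a common next step for NLP work is to flatten them into plain text. A minimal sketch, assuming segments expose `text`, `start` and `duration` as in the repr above. The `Transcript` dataclass here is a local stand-in so the example runs without ytfetcher installed, and `transcript_to_text` / `spoken_span` are hypothetical helpers.

```python
from dataclasses import dataclass


# Local stand-in for the Transcript objects shown above (hypothetical; the
# real class ships with ytfetcher and may carry more fields).
@dataclass
class Transcript:
    text: str
    start: float
    duration: float


def transcript_to_text(segments: list[Transcript]) -> str:
    """Join non-empty segment texts into a single space-separated string."""
    return " ".join(seg.text.strip() for seg in segments if seg.text.strip())


def spoken_span(segments: list[Transcript]) -> float:
    """Seconds from the earliest segment start to the latest segment end."""
    if not segments:
        return 0.0
    end = max(s.start + s.duration for s in segments)
    return end - min(s.start for s in segments)


if __name__ == "__main__":
    segs = [Transcript("Hey there", 0.0, 1.54), Transcript("Happy coding!", 1.56, 4.46)]
    print(transcript_to_text(segs))  # Hey there Happy coding!
    print(round(spoken_span(segs), 2))  # 6.02
```

With real results you would call `transcript_to_text(item.transcripts)` for each `ChannelData` item, assuming its segments expose `.text` as shown in the repr.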

## Fetching With Custom Video IDs

You can also initialize `YTFetcher` with your own list of video IDs using the `from_video_ids` method.

```python
from ytfetcher import YTFetcher
import asyncio

fetcher = YTFetcher.from_video_ids(
    api_key='your-youtubev3-api-key', 
    video_ids=['video1', 'video2', 'video3'])  # Initialize from a list of video IDs instead of a channel.

# The rest is the same as in Basic Usage ...
```

## Exporting

To export data, use the `Exporter` class. It can export `ChannelData` as **csv**, **json**, or **txt**.

```python
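import asyncio

from ytfetcher.services import Exporter

# Assumes `fetcher` was created as shown in Basic Usage.
async def main() -> None:
    # `await` is only valid inside a coroutine, so the fetch lives in main().
    channel_data = await fetcher.fetch_youtube_data()

    exporter = Exporter(
        channel_data=channel_data,
        allowed_metadata_list=['title', 'publishedAt'],   # Metadata fields to keep
        timing=True,                                      # Include transcript start/duration
        filename='my_export',                             # Base filename
        output_dir='./exports'                            # Optional export directory
    )

    exporter.export_as_json()  # or .export_as_txt(), .export_as_csv()

if __name__ == '__main__':
    asyncio.run(main())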
```
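The two `Exporter` options above are easiest to understand by example. The sketch below is a hypothetical, dict-based illustration of what `allowed_metadata_list` and `timing` are described as controlling (which metadata keys survive, and whether `start`/`duration` are kept). It is not `Exporter`'s implementation, the real file layout may differ, and `build_export_records` is a made-up name.

```python
import json

# Made-up sample shaped like the ChannelData repr shown earlier.
SAMPLE = [
    {
        "video_id": "video1",
        "metadata": {"title": "VideoTitle", "description": "VideoDescription",
                     "publishedAt": "02.04.2025", "channelId": "id123"},
        "transcripts": [{"text": "Hey there", "start": 0.0, "duration": 1.54}],
    }
]


def build_export_records(channel_data: list[dict], allowed_metadata: list[str],
                         timing: bool) -> list[dict]:
    """Keep only allowed metadata keys; drop start/duration when timing is False."""
    records = []
    for item in channel_data:
        meta = {k: v for k, v in item["metadata"].items() if k in allowed_metadata}
        segments = [
            dict(seg) if timing else {"text": seg["text"]}
            for seg in item["transcripts"]
        ]
        records.append({"video_id": item["video_id"], "metadata": meta,
                        "transcripts": segments})
    return records


if __name__ == "__main__":
    records = build_export_records(SAMPLE, ["title", "publishedAt"], timing=False)
    print(json.dumps(records, indent=2))
```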

## Other Methods

You can also fetch only transcripts or only metadata using the `fetch_transcripts` and `fetch_snippets` methods.

### Fetch Transcripts

```python
import asyncio

from ytfetcher import YTFetcher, VideoTranscript

fetcher = YTFetcher.from_channel(
    api_key='your-youtubev3-api-key', 
    channel_handle="TheOffice", 
    max_results=2)

async def get_transcript_data() -> list[VideoTranscript]:
    transcript_data = await fetcher.fetch_transcripts()
    return transcript_data

if __name__ == '__main__':
    data = asyncio.run(get_transcript_data())
    print(data)

```

### Fetch Snippets

```python
from ytfetcher import YTFetcher, VideoMetadata

# Init ytfetcher ...

def get_metadata() -> list[VideoMetadata]:
    metadata = fetcher.fetch_snippets()
    return metadata

if __name__ == '__main__':
    print(get_metadata())

```

## Proxy Configuration

`YTFetcher` supports proxy usage for fetching YouTube transcripts by leveraging the built-in proxy configuration support from [youtube-transcript-api](https://pypi.org/project/youtube-transcript-api/).

To configure proxies, pass a proxy config object from `ytfetcher.config` directly to `YTFetcher`:

```python
from ytfetcher import YTFetcher
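from ytfetcher.config import GenericProxyConfig, WebshareProxyConfig

# Choose ONE proxy config; the two classes are alternatives.
proxy_config = WebshareProxyConfig(
    proxy_username="<USERNAME>",
    proxy_password="<PASSWORD>",
)
# Or route traffic through your own proxy instead:
# proxy_config = GenericProxyConfig(
#     http_url="http://user:pass@host:port",
#     https_url="https://user:pass@host:port",
# )

fetcher = YTFetcher.from_channel(
    api_key="your-api-key",
    channel_handle="TheOffice",
    max_results=3,
    proxy_config=proxy_config
)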
```

For more details on proxy configuration, see the official `youtube-transcript-api` documentation.

## Advanced HTTP Configuration (Optional)

You can pass a custom timeout or headers (e.g., user-agent) to `YTFetcher` using `HTTPConfig`:

```python
from ytfetcher import YTFetcher
from ytfetcher.config import HTTPConfig

custom_config = HTTPConfig(
    timeout=4.0,
    headers={"User-Agent": "ytfetcher/1.0"}  # Changing headers is not recommended unless you have a specific reason.
)

fetcher = YTFetcher.from_channel(
    api_key="your-key",
    channel_handle="TheOffice",
    max_results=10,
    http_config=custom_config
)
```

## CLI

### Basic Usage

```bash
ytfetcher from_channel --api-key <API_KEY> -c <CHANNEL_HANDLE> -m <MAX_RESULTS> -f <FORMAT>
```

Basic usage example:

```bash
ytfetcher from_channel --api-key <API_KEY> -c "channelname" -m 20 -f json
```

### Output Example

```json
[
  {
    "video_id": "abc123",
    "metadata": {
      "title": "Video Title",
      "description": "Video Description",
      "publishedAt": "2023-07-01T12:00:00Z"
    },
    "transcripts": [
      {"text": "Welcome!", "start": 0.0, "duration": 1.2}
    ]
  }
]
```
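Once exported, the JSON can be loaded back with the standard library. A small sketch assuming the structure shown above (a list of objects with `video_id`, `metadata` and `transcripts`); `load_corpus` is a hypothetical helper, and in practice you would pass it the contents of your exported file rather than the inline sample.

```python
import json

# Inline copy of the CLI output example; normally you would read the exported file.
SAMPLE = """[
  {
    "video_id": "abc123",
    "metadata": {"title": "Video Title", "description": "Video Description",
                 "publishedAt": "2023-07-01T12:00:00Z"},
    "transcripts": [{"text": "Welcome!", "start": 0.0, "duration": 1.2}]
  }
]"""


def load_corpus(raw: str) -> dict[str, str]:
    """Map each video_id to its transcript segments joined into one string."""
    records = json.loads(raw)
    return {
        rec["video_id"]: " ".join(seg["text"] for seg in rec.get("transcripts", []))
        for rec in records
    }


if __name__ == "__main__":
    print(load_corpus(SAMPLE))  # {'abc123': 'Welcome!'}
```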

### Setting API Key Globally In CLI

You can save your API key once with the `ytfetcher config` command so you don't have to pass it every time you use the CLI.

```bash
ytfetcher config <YOUR_API_KEY>
```

After that, you can omit the `--api-key` argument:

```bash
ytfetcher from_channel -c ChannelName
```

### Using Webshare Proxy

```bash
ytfetcher from_channel --api-key <API_KEY> -c "channel" -f json \
  --webshare-proxy-username "<USERNAME>" \
  --webshare-proxy-password "<PASSWORD>"

```

### Using Custom Proxy

```bash
ytfetcher from_channel --api-key <API_KEY> -c "channel" -f json \
  --http-proxy "http://user:pass@host:port" \
  --https-proxy "https://user:pass@host:port"

```

### Using Custom HTTP Config
```bash
ytfetcher from_channel --api-key <API_KEY> -c "channel" \
  --http-timeout 4.2 \
  --http-headers "{'key': 'value'}" # Wrap the whole dict in double quotes and use single quotes inside it.
```
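Why the quoting matters: the shell removes the outer double quotes, so the program receives the literal text `{'key': 'value'}`, which is a valid Python dict literal. The sketch below uses `shlex.split` to mimic the shell's tokenization and `ast.literal_eval` to parse the result. This is only an assumption about how such a value could be parsed; ytfetcher's actual parser is not documented here, and `parse_headers_arg` is a hypothetical name.

```python
import ast
import shlex


def parse_headers_arg(raw: str) -> dict[str, str]:
    """Parse a dict literal such as "{'key': 'value'}" into a headers dict.

    Hypothetical parser for illustration; ytfetcher's real one may differ.
    """
    value = ast.literal_eval(raw)  # evaluates literals only, never arbitrary code
    if not isinstance(value, dict) or not all(
        isinstance(k, str) and isinstance(v, str) for k, v in value.items()
    ):
        raise ValueError("--http-headers must be a dict of string keys and values")
    return value


if __name__ == "__main__":
    # shlex.split tokenizes like a POSIX shell: the outer double quotes are
    # removed and the single quotes inside them are kept verbatim.
    argv = shlex.split("--http-headers \"{'key': 'value'}\"")
    print(argv)  # ['--http-headers', "{'key': 'value'}"]
    print(parse_headers_arg(argv[1]))  # {'key': 'value'}
```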

### Fetching by Video IDs

```bash
ytfetcher from_video_ids --api-key <API_KEY> -v video_id1 video_id2 ... -f json
```

---

## Contributing

To install this project locally:

```bash
git clone https://github.com/kaya70875/ytfetcher.git
cd ytfetcher
poetry install
```

## Running Tests

```bash
poetry run pytest
```

## Related Projects

- [youtube-transcript-api](https://github.com/jdepoix/youtube-transcript-api)

## License

This project is licensed under the MIT License — see the [LICENSE](./LICENSE) file for details.

            

Raw data

            {
    "_id": null,
    "home_page": "https://github.com/kaya70875/ytfetcher",
    "name": "ytfetcher",
    "maintainer": null,
    "docs_url": null,
    "requires_python": "<3.14,>=3.11",
    "maintainer_email": null,
    "keywords": "yt, transcript, youtube, cli, dataset, scraping, python",
    "author": "Ahmet Kaya",
    "author_email": "kaya70875@gmail.com",
    "download_url": "https://files.pythonhosted.org/packages/13/fd/fd14047a7c592a9232d931ff6a32d22a16a9eae162efa37dda4decbf91b3/ytfetcher-0.4.0.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "MIT License\n\nCopyright (c) 2025 Ahmet Kaya\n\nPermission is hereby granted, free of charge, to any person obtaining a copy\nof this software and associated documentation files (the \"Software\"), to deal\nin the Software without restriction, including without limitation the rights\nto use, copy, modify, merge, publish, distribute, sublicense, and/or sell\ncopies of the Software, and to permit persons to whom the Software is\nfurnished to do so, subject to the following conditions:\n\nThe above copyright notice and this permission notice shall be included in\nall copies or substantial portions of the Software.\n\nTHE SOFTWARE IS PROVIDED \"AS IS\", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR\nIMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,\nFITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE\nAUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER\nLIABILITY...\n",
    "summary": "YTFetcher lets you fetch YouTube transcripts in bulk with metadata like titles, publish dates, and thumbnails. Great for ML, NLP, and dataset generation.",
    "version": "0.4.0",
    "project_urls": {
        "Documentation": "https://github.com/kaya70875/ytfetcher#readme",
        "Homepage": "https://github.com/kaya70875/ytfetcher",
        "Repository": "https://github.com/kaya70875/ytfetcher"
    },
    "split_keywords": [
        "yt",
        " transcript",
        " youtube",
        " cli",
        " dataset",
        " scraping",
        " python"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "a9923af1fec81965eef09eacdc0f3de16ebc4ff5d9de5e2a85f59afee447fa69",
                "md5": "94a6dafd14aa179ede8f94c8155a98db",
                "sha256": "c39ad510f42070f8995031786cd31d070d7dcbee1beeee087ccf259292c0e64d"
            },
            "downloads": -1,
            "filename": "ytfetcher-0.4.0-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "94a6dafd14aa179ede8f94c8155a98db",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": "<3.14,>=3.11",
            "size": 18844,
            "upload_time": "2025-08-10T12:13:16",
            "upload_time_iso_8601": "2025-08-10T12:13:16.108065Z",
            "url": "https://files.pythonhosted.org/packages/a9/92/3af1fec81965eef09eacdc0f3de16ebc4ff5d9de5e2a85f59afee447fa69/ytfetcher-0.4.0-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "13fdfd14047a7c592a9232d931ff6a32d22a16a9eae162efa37dda4decbf91b3",
                "md5": "b5c964e51a5b9dab804ba0c8e560d2a1",
                "sha256": "f4bd663f8c7cd130a7cde7e2a9baab0a0155bae17bd92c251d91890e84f3cc79"
            },
            "downloads": -1,
            "filename": "ytfetcher-0.4.0.tar.gz",
            "has_sig": false,
            "md5_digest": "b5c964e51a5b9dab804ba0c8e560d2a1",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": "<3.14,>=3.11",
            "size": 16322,
            "upload_time": "2025-08-10T12:13:17",
            "upload_time_iso_8601": "2025-08-10T12:13:17.369073Z",
            "url": "https://files.pythonhosted.org/packages/13/fd/fd14047a7c592a9232d931ff6a32d22a16a9eae162efa37dda4decbf91b3/ytfetcher-0.4.0.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2025-08-10 12:13:17",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "kaya70875",
    "github_project": "ytfetcher",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": true,
    "lcname": "ytfetcher"
}
        
Elapsed time: 1.19440s