Name | ytfetcher |
Version | 0.4.0 |
home_page | https://github.com/kaya70875/ytfetcher |
Summary | YTFetcher lets you fetch YouTube transcripts in bulk with metadata like titles, publish dates, and thumbnails. Great for ML, NLP, and dataset generation. |
upload_time | 2025-08-10 12:13:17 |
maintainer | None |
docs_url | None |
author | Ahmet Kaya |
requires_python | <3.14,>=3.11 |
license | MIT License
Copyright (c) 2025 Ahmet Kaya
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in
all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY...
|
keywords | yt, transcript, youtube, cli, dataset, scraping, python |
# YTFetcher
> ⚡ Effortlessly fetch and convert YouTube transcripts for ML, research, or personal use.
**YTFetcher** is a Python tool for fetching YouTube video transcripts in bulk, along with rich metadata like titles, publish dates, and descriptions. Ideal for building NLP datasets, search indexes, or powering content analysis apps.
---
## 📚 Table of Contents
- [Features](#features)
- [Installation](#installation)
- [Basic Usage](#basic-usage)
- [Fetching With Custom Video IDs](#fetching-with-custom-video-ids)
- [Exporting](#exporting)
- [Proxy Configuration](#proxy-configuration)
- [Advanced HTTP Configuration](#advanced-http-configuration-optional)
- [CLI](#cli)
- [Contributing](#contributing)
- [Running Tests](#running-tests)
- [Related Projects](#related-projects)
- [License](#license)
---
## Features
- Fetch full transcripts from a YouTube channel.
- Get video metadata: title, description, thumbnails, published date.
- Async support for high performance.
- Export fetched data as txt, csv or json.
- CLI support.
---
## Installation
It is recommended to install this package by using pip:
```bash
pip install ytfetcher
```
## Basic Usage
YTFetcher uses the **YouTube Data API v3** to get channel details and video IDs, so you need to create an API key in the [Google Cloud Console](https://console.cloud.google.com/apis/api/youtube.googleapis.com).
Keep in mind that the **YouTube Data API v3** enforces a quota limit, although for basic usage the quota is generally not a concern.
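Since the API key is a secret, you may prefer not to hardcode it in scripts. The following is a minimal sketch of reading it from an environment variable; the helper `load_api_key` and the variable name `YOUTUBE_API_KEY` are assumptions made for illustration and are not part of YTFetcher.

```python
import os


def load_api_key(env=os.environ, name: str = "YOUTUBE_API_KEY") -> str:
    """Return the API key from the given mapping, or raise a clear error."""
    key = env.get(name, "").strip()
    if not key:
        raise RuntimeError(f"Set the {name} environment variable to your API key")
    return key


# Demonstration with an explicit mapping instead of the real environment.
api_key = load_api_key({"YOUTUBE_API_KEY": "your-youtubev3-api-key"})
print(api_key)
```

The returned string can then be passed as the `api_key` argument wherever the examples in this README use a literal key.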
Here is how you can get transcripts and metadata such as the title, description, and publish date from a single channel with the `from_channel` method:
```python
from ytfetcher import YTFetcher
from ytfetcher import ChannelData  # Or: from ytfetcher.models import ChannelData
import asyncio

fetcher = YTFetcher.from_channel(
    api_key='your-youtubev3-api-key',
    channel_handle="TheOffice",
    max_results=2)

async def get_channel_data() -> list[ChannelData]:
    # Fetch transcripts and metadata for the channel's videos.
    channel_data = await fetcher.fetch_youtube_data()
    return channel_data

if __name__ == '__main__':
    data = asyncio.run(get_channel_data())
    print(data)
```
---
This returns a list of `ChannelData` objects. Here is what it looks like:
```python
[
    ChannelData(
        video_id='video1',
        transcripts=[
            Transcript(
                text="Hey there",
                start=0.0,
                duration=1.54
            ),
            Transcript(
                text="Happy coding!",
                start=1.56,
                duration=4.46
            )
        ],
        metadata=Snippet(
            title='VideoTitle',
            description='VideoDescription',
            publishedAt='02.04.2025',
            channelId='id123',
            thumbnails=Thumbnails(
                default=Thumbnail(
                    url='thumbnail_url',
                    width=124,
                    height=124
                )
            )
        )
    ),
    # Other ChannelData objects...
]
```
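To illustrate how this structure can be consumed, here is a minimal sketch that joins the transcript segments of each video into a single string. The `Transcript` and `ChannelData` dataclasses below are simplified stand-ins that mirror only the fields shown above; they are not the library's actual model classes, and `metadata` is left out for brevity. The helper `full_text` is likewise hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Transcript:
    """Stand-in for one transcript segment (fields as shown above)."""
    text: str
    start: float
    duration: float


@dataclass
class ChannelData:
    """Stand-in for one video's result; metadata is omitted here."""
    video_id: str
    transcripts: list[Transcript] = field(default_factory=list)


def full_text(item: ChannelData) -> str:
    """Join all segment texts of one video, in order, separated by spaces."""
    return " ".join(segment.text for segment in item.transcripts)


sample = [
    ChannelData(
        video_id="video1",
        transcripts=[
            Transcript(text="Hey there", start=0.0, duration=1.54),
            Transcript(text="Happy coding!", start=1.56, duration=4.46),
        ],
    )
]

# Map each video ID to its full transcript text.
texts = {item.video_id: full_text(item) for item in sample}
print(texts)
```

With real results, the same attribute access (`item.video_id`, `item.transcripts`, `segment.text`) should apply as long as the objects expose the fields shown in the example output.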
## Fetching With Custom Video IDs
You can also initialize `YTFetcher` with custom video IDs using the `from_video_ids` method.
```python
from ytfetcher import YTFetcher
import asyncio

# Initialize YTFetcher from explicit video IDs instead of a channel handle.
fetcher = YTFetcher.from_video_ids(
    api_key='your-youtubev3-api-key',
    video_ids=['video1', 'video2', 'video3'])

# The rest is the same as in Basic Usage ...
```
## Exporting
To export data, use the `Exporter` class. It exports `ChannelData` in **csv**, **json**, or **txt** format.
```python
import asyncio
from ytfetcher.services import Exporter

# `fetcher` is a YTFetcher instance created as shown in Basic Usage.
# `await` is only valid inside a coroutine, so run the fetch with asyncio.run().
channel_data = asyncio.run(fetcher.fetch_youtube_data())

exporter = Exporter(
    channel_data=channel_data,
    allowed_metadata_list=['title', 'publishedAt'],  # You can customize this
    timing=True,  # Include transcript start/duration
    filename='my_export',  # Base filename
    output_dir='./exports'  # Optional export directory
)

exporter.export_as_json()  # or .export_as_txt(), .export_as_csv()
```
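As a rough illustration of what the `allowed_metadata_list` and `timing` options plausibly control, the sketch below filters a plain dictionary record in the same spirit. This is an assumption-based stand-in written with the standard library only; `build_record` is hypothetical and does not reproduce the `Exporter` implementation.

```python
import json


def build_record(video: dict, allowed_metadata: list[str], timing: bool) -> dict:
    """Keep only the allowed metadata keys; drop start/duration if timing is off."""
    metadata = {k: v for k, v in video["metadata"].items() if k in allowed_metadata}
    if timing:
        transcripts = video["transcripts"]
    else:
        transcripts = [{"text": t["text"]} for t in video["transcripts"]]
    return {
        "video_id": video["video_id"],
        "metadata": metadata,
        "transcripts": transcripts,
    }


video = {
    "video_id": "video1",
    "metadata": {
        "title": "VideoTitle",
        "description": "VideoDescription",
        "publishedAt": "02.04.2025",
    },
    "transcripts": [{"text": "Hey there", "start": 0.0, "duration": 1.54}],
}

record = build_record(video, allowed_metadata=["title", "publishedAt"], timing=False)
print(json.dumps(record, indent=2))
```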
## Other Methods
You can also fetch only the transcripts or only the metadata using the `fetch_transcripts` and `fetch_snippets` methods.
### Fetch Transcripts
```python
import asyncio
from ytfetcher import YTFetcher
from ytfetcher import VideoTranscript

fetcher = YTFetcher.from_channel(
    api_key='your-youtubev3-api-key',
    channel_handle="TheOffice",
    max_results=2)

async def get_transcript_data() -> list[VideoTranscript]:
    # Fetch only the transcripts, without video metadata.
    transcript_data = await fetcher.fetch_transcripts()
    return transcript_data

if __name__ == '__main__':
    data = asyncio.run(get_transcript_data())
    print(data)
```
### Fetch Snippets
```python
from ytfetcher import VideoMetadata

# Init ytfetcher ...

def get_metadata() -> list[VideoMetadata]:
    # Fetch only the metadata (snippets), without transcripts.
    metadata = fetcher.fetch_snippets()
    return metadata

if __name__ == '__main__':
    data = get_metadata()
    print(data)
```
## Proxy Configuration
`YTFetcher` supports proxy usage for fetching YouTube transcripts by leveraging the built-in proxy configuration support from [youtube-transcript-api](https://pypi.org/project/youtube-transcript-api/).
To configure proxies, pass a proxy config object from `ytfetcher.config` directly to `YTFetcher`:
```python
from ytfetcher import YTFetcher
from ytfetcher.config import GenericProxyConfig, WebshareProxyConfig

fetcher = YTFetcher.from_channel(
    api_key="your-api-key",
    channel_handle="TheOffice",
    max_results=3,
    # Pass exactly one proxy config object, constructed with your proxy settings:
    # either GenericProxyConfig(...) or WebshareProxyConfig(...).
    proxy_config=GenericProxyConfig()
)
```
For more information about proxy configuration, see the official `youtube-transcript-api` documentation.
## Advanced HTTP Configuration (Optional)
You can pass a custom timeout or headers (e.g., user-agent) to `YTFetcher` using `HTTPConfig`:
```python
from ytfetcher import YTFetcher
from ytfetcher.config import HTTPConfig

custom_config = HTTPConfig(
    timeout=4.0,
    # Changing the headers is not recommended unless you have a good reason.
    headers={"User-Agent": "ytfetcher/1.0"}
)

fetcher = YTFetcher.from_channel(
    api_key="your-key",
    channel_handle="TheOffice",
    max_results=10,
    http_config=custom_config
)
```
## CLI
### Basic Usage
```bash
ytfetcher from_channel --api-key <API_KEY> -c <CHANNEL_HANDLE> -m <MAX_RESULTS> -f <FORMAT>
```
Basic usage example:
```bash
ytfetcher from_channel --api-key <API_KEY> -c "channelname" -m 20 -f json
```
### Output Example
```json
[
{
"video_id": "abc123",
"metadata": {
"title": "Video Title",
"description": "Video Description",
"publishedAt": "2023-07-01T12:00:00Z"
},
"transcripts": [
{"text": "Welcome!", "start": 0.0, "duration": 1.2}
]
}
]
```
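A file with this shape is easy to post-process. The sketch below loads the example output with the standard `json` module and flattens each video into one row, which is a common first step when building a dataset. The row layout (`video_id`, `title`, `text`, `seconds`) is an arbitrary choice for illustration.

```python
import json

# The example output from above, embedded as a string for a self-contained demo.
exported = """
[
  {
    "video_id": "abc123",
    "metadata": {
      "title": "Video Title",
      "description": "Video Description",
      "publishedAt": "2023-07-01T12:00:00Z"
    },
    "transcripts": [
      {"text": "Welcome!", "start": 0.0, "duration": 1.2}
    ]
  }
]
"""

videos = json.loads(exported)

# One row per video: joined transcript text plus the summed segment durations.
rows = [
    {
        "video_id": video["video_id"],
        "title": video["metadata"]["title"],
        "text": " ".join(seg["text"] for seg in video["transcripts"]),
        "seconds": sum(seg["duration"] for seg in video["transcripts"]),
    }
    for video in videos
]
print(rows)
```

When reading a real export, replace the embedded string with the contents of the exported file.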
### Setting API Key Globally In CLI
You can save your API key once with the `ytfetcher config` command and use it globally, without having to pass it every time you use the CLI.
```bash
ytfetcher config <YOUR_API_KEY>
```
Now you can run commands without passing the API key argument:
```bash
ytfetcher from_channel -c ChannelName
```
### Using Webshare Proxy
```bash
ytfetcher from_channel --api-key <API_KEY> -c "channel" -f json \
--webshare-proxy-username "<USERNAME>" \
--webshare-proxy-password "<PASSWORD>"
```
### Using Custom Proxy
```bash
ytfetcher from_channel --api-key <API_KEY> -c "channel" -f json \
--http-proxy "http://user:pass@host:port" \
--https-proxy "https://user:pass@host:port"
```
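Proxy URLs in the `scheme://user:pass@host:port` form break if the username or password contains characters such as `@` or `:`. A minimal sketch of assembling such a URL with percent-encoded credentials follows; `build_proxy_url` is a hypothetical helper and the host name is a placeholder.

```python
from urllib.parse import quote


def build_proxy_url(scheme: str, user: str, password: str, host: str, port: int) -> str:
    """Assemble scheme://user:pass@host:port with percent-encoded credentials."""
    safe_user = quote(user, safe="")
    safe_password = quote(password, safe="")
    return f"{scheme}://{safe_user}:{safe_password}@{host}:{port}"


proxy_url = build_proxy_url("http", "user", "p@ss:word", "proxy.example.com", 8080)
print(proxy_url)
```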
### Using Custom HTTP Config
```bash
ytfetcher from_channel --api-key <API_KEY> -c "channel" \
  --http-timeout 4.2 \
  --http-headers "{'key': 'value'}" # The value must be wrapped in double quotes, with single quotes inside.
```
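The quoting rule makes more sense once you see what the program receives: the shell strips the outer double quotes and passes `{'key': 'value'}` as a single string. How YTFetcher parses that string is not stated here; the sketch below only demonstrates that the value is a valid Python dictionary literal (readable with `ast.literal_eval`) but not valid JSON, which would require double quotes around keys and values.

```python
import ast
import json

# What the program receives after the shell strips the outer double quotes.
raw = "{'key': 'value'}"

# A Python dict literal can be parsed safely with ast.literal_eval.
headers = ast.literal_eval(raw)
print(headers)

# The same string is not valid JSON because of the single quotes.
try:
    json.loads(raw)
    valid_json = True
except json.JSONDecodeError:
    valid_json = False
print(valid_json)
```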
### Fetching by Video IDs
```bash
ytfetcher from_video_ids --api-key <API_KEY> -v video_id1 video_id2 ... -f json
```
---
## Contributing
To install this project locally:
```bash
git clone https://github.com/kaya70875/ytfetcher.git
cd ytfetcher
poetry install
```
## Running Tests
```bash
poetry run pytest
```
## Related Projects
- [youtube-transcript-api](https://github.com/jdepoix/youtube-transcript-api)
## License
This project is licensed under the MIT License — see the [LICENSE](./LICENSE) file for details.
Raw data
{
"_id": null,
"home_page": "https://github.com/kaya70875/ytfetcher",
"name": "ytfetcher",
"maintainer": null,
"docs_url": null,
"requires_python": "<3.14,>=3.11",
"maintainer_email": null,
"keywords": "yt, transcript, youtube, cli, dataset, scraping, python",
"author": "Ahmet Kaya",
"author_email": "kaya70875@gmail.com",
"download_url": "https://files.pythonhosted.org/packages/13/fd/fd14047a7c592a9232d931ff6a32d22a16a9eae162efa37dda4decbf91b3/ytfetcher-0.4.0.tar.gz",
"platform": null,
"description": "# YTFetcher\n[](https://codecov.io/gh/kaya70875/ytfetcher)\n[](https://pypi.org/project/ytfetcher/)\n[](https://opensource.org/licenses/MIT)\n\n> \u26a1 Effortlessly fetch and convert YouTube transcripts for ML, research, or personal use.\n\n**YTFetcher** is a Python tool for fetching YouTube video transcripts in bulk, along with rich metadata like titles, publish dates, and descriptions. Ideal for building NLP datasets, search indexes, or powering content analysis apps.\n\n---\n\n## \ud83d\udcda Table of Contents\n\n- [Features](#features)\n- [Installation](#installation)\n- [Basic Usage](#basic-usage)\n- [Fetching With Custom Video IDs](#fetching-with-custom-video-ids)\n- [Exporting](#exporting)\n- [Proxy Configuration](#proxy-configuration)\n- [Advanced HTTP Configuration](#advanced-http-configuration-optional)\n- [CLI](#cli)\n- [Contributing](#contributing)\n- [Running Tests](#running-tests)\n- [Related Projects](#related-projects)\n- [License](#license)\n\n---\n\n## Features\n\n- Fetch full transcripts from a YouTube channel.\n- Get video metadata: title, description, thumbnails, published date.\n- Async support for high performance.\n- Export fetched data as txt, csv or json.\n- CLI support.\n\n---\n\n## Installation\n\nIt is recommended to install this package by using pip:\n\n```bash\npip install ytfetcher\n```\n\n## Basic Usage\n\nYtfetcher uses **YoutubeV3 API** to get channel details and video id's so you have to create your API key from Google Cloud Console [In here](https://console.cloud.google.com/apis/api/youtube.googleapis.com).\n\nAlso keep in mind that you have a quota limit for **YoutubeV3 API**, but for basic usage quota isn't generally a concern.\n\nHere how you can get transcripts and metadata informations like channel name, description, publishedDate etc. 
from a single channel with `from_channel` method:\n\n```python\nfrom ytfetcher import YTFetcher\nfrom ytfetcher import ChannelData # Or ytfetcher.models import ChannelData\nimport asyncio\n\nfetcher = YTFetcher.from_channel(\n api_key='your-youtubev3-api-key', \n channel_handle=\"TheOffice\", \n max_results=2)\n\nasync def get_channel_data() -> list[ChannelData]:\n channel_data = await fetcher.fetch_youtube_data()\n return channel_data\n\nif __name__ == '__main__':\n data = asyncio.run(get_channel_data())\n print(data)\n```\n\n---\n\nThis will return a list of `ChannelData`. Here's how it's looks like:\n\n```python\n[\nChannelData(\n video_id='video1',\n transcripts=[\n Transcript(\n text=\"Hey there\",\n start=0.0,\n duration=1.54\n ),\n Transcript(\n text=\"Happy coding!\",\n start=1.56,\n duration=4.46\n )\n ]\n metadata=Snippet(\n title='VideoTitle',\n description='VideoDescription',\n publishedAt='02.04.2025',\n channelId='id123',\n thumbnails=Thumbnails(\n default=Thumbnail(\n url:'thumbnail_url',\n width: 124,\n height: 124\n )\n )\n )\n),\n# Other ChannelData objects...\n]\n```\n\n## Fetching With Custom Video IDs\n\nYou can also initialize `ytfetcher` with custom video id's using `from_video_ids` method.\n\n```python\nfrom ytfetcher import YTFetcher\nimport asyncio\n\nfetcher = YTFetcher.from_video_ids(\n api_key='your-youtubev3-api-key', \n video_ids=['video1', 'video2', 'video3']) # Here we initialized ytfetcher with from_video_ids method.\n\n# Rest is same ...\n```\n\n## Exporting\n\nTo export data you can use `Exporter` class. 
Exporter allows you to export `ChannelData` with formats like **csv**, **json** or **txt**.\n\n```python\nfrom ytfetcher.services import Exporter\n\nchannel_data = await fetcher.fetch_youtube_data()\n\nexporter = Exporter(\n channel_data=channel_data,\n allowed_metadata_list=['title', 'publishedAt'], # You can customize this\n timing=True, # Include transcript start/duration\n filename='my_export', # Base filename\n output_dir='./exports' # Optional export directory\n)\n\nexporter.export_as_json() # or .export_as_txt(), .export_as_csv()\n\n```\n\n## Other Methods\n\nYou can also fetch only transcript data or metadata with video ID's using `fetch_transcripts` and `fetch_snippets` methods.\n\n### Fetch Transcripts\n\n```python\nfrom ytfetcher import VideoTranscript\n\nfetcher = YTFetcher.from_channel(\n api_key='your-youtubev3-api-key', \n channel_handle=\"TheOffice\", \n max_results=2)\n\nasync def get_transcript_data() -> list[VideoTranscript]:\n transcript_data = await fetcher.fetch_transcripts()\n return transcript_data\n\nif __name__ == '__main__':\n data = asyncio.run(get_transcript_data())\n print(data)\n\n```\n\n### Fetch Snippets\n\n```python\nfrom ytfetcher import VideoMetadata\n\n# Init ytfetcher ...\n\ndef get_metadata() -> list[VideoMetadata]:\n metadata = fetcher.fetch_snippets()\n return metadata\n\nif __name__ == '__main__':\n get_metadata()\n\n```\n\n## Proxy Configuration\n\n`YTFetcher` supports proxy usage for fetching YouTube transcripts by leveraging the built-in proxy configuration support from [youtube-transcript-api](https://pypi.org/project/youtube-transcript-api/).\n\nTo configure proxies, you can pass a proxy config object from `ytfecher.config` directly to `YTFetcher`:\n\n```python\nfrom ytfetcher import YTFetcher\nfrom ytfetcher.config import GenericProxyConfig, WebshareProxyConfig\n\nfetcher = YTFetcher.from_channel(\n api_key=\"your-api-key\",\n channel_handle=\"TheOffice\",\n max_results=3,\n proxy_config=GenericProxyConfig() | 
WebshareProxyConfig()\n)\n```\n\nFor more information about proxy configuration please check official `youtube-transcript-api` documents.\n\n## Advanced HTTP Configuration (Optional)\n\nYou can pass a custom timeout or headers (e.g., user-agent) to `YTFetcher` using `HTTPConfig`:\n\n```python\nfrom ytfetcher import YTFetcher\nfrom ytfetcher.config import HTTPConfig\n\ncustom_config = HTTPConfig(\n timeout=4.0,\n headers={\"User-Agent\": \"ytfetcher/1.0\"} # Doesn't recommended to change this unless you have a strong headers.\n)\n\nfetcher = YTFetcher.from_channel(\n api_key=\"your-key\",\n channel_handle=\"TheOffice\",\n max_results=10,\n http_config=custom_config\n)\n```\n\n## CLI\n\n### Basic Usage\n\n```bash\nytfetcher from_channel --api-key <API_KEY> -c <CHANNEL_HANDLE> -m <MAX_RESULTS> -f <FORMAT>\n```\n\nBasic usage example:\n\n```bash\nytfetcher from_channel --api-key <API_KEY> -c \"channelname\" -m 20 -f json\n```\n\n### Output Example\n\n```json\n[\n {\n \"video_id\": \"abc123\",\n \"metadata\": {\n \"title\": \"Video Title\",\n \"description\": \"Video Description\",\n \"publishedAt\": \"2023-07-01T12:00:00Z\"\n },\n \"transcripts\": [\n {\"text\": \"Welcome!\", \"start\": 0.0, \"duration\": 1.2}\n ]\n }\n]\n```\n\n### Setting API Key Globally In CLI\n\nYou can save your api key once with `ytfetcher config` command and use it globally without needing to write everytime while using CLI.\n\n```bash\nytfetcher config <YOUR_API_KEY>\n```\n\nNow you can basically say without passing API key argument.\n\n```bash\nytfetcher from_channel -c ChannelName\n```\n\n### Using Webshare Proxy\n\n```bash\nytfetcher from_channel --api-key <API_KEY> -c \"channel\" -f json \\\n --webshare-proxy-username \"<USERNAME>\" \\\n --webshare-proxy-password \"<PASSWORD>\"\n\n```\n\n### Using Custom Proxy\n\n```bash\nytfetcher from_channel --api-key <API_KEY> -c \"channel\" -f json \\\n --http-proxy \"http://user:pass@host:port\" \\\n --https-proxy 
\"https://user:pass@host:port\"\n\n```\n\n### Using Custom HTTP Config\n```bash\nytfetcher from_channel --api-key <API_KEY> -c \"channel\" \\\n --http-timeout 4.2 \\\n --http-headers \"{'key': 'value'}\" # Must be exact wrapper with double quotes with following single quotes.\n```\n\n### Fetching by Video IDs\n\n```bash\nytfetcher from_video_ids --api-key <API_KEY> -v video_id1 video_id2 ... -f json\n```\n\n---\n\n## Contributing\n\nTo insall this project locally:\n\n```bash\ngit clone https://github.com/kaya70875/ytfetcher.git\ncd ytfetcher\npoetry install\n```\n\n## Running Tests\n\n```bash\npoetry run pytest\n```\n\n## Related Projects\n\n- [youtube-transcript-api](https://github.com/jdepoix/youtube-transcript-api)\n\n## License\n\nThis project is licensed under the MIT License \u2014 see the [LICENSE](./LICENSE) file for details.\n",
"bugtrack_url": null,
"license": "MIT License\n\nCopyright (c) 2025 Ahmet Kaya\n\nPermission is hereby granted, free of charge, to any person obtaining a copy\nof this software and associated documentation files (the \"Software\"), to deal\nin the Software without restriction, including without limitation the rights\nto use, copy, modify, merge, publish, distribute, sublicense, and/or sell\ncopies of the Software, and to permit persons to whom the Software is\nfurnished to do so, subject to the following conditions:\n\nThe above copyright notice and this permission notice shall be included in\nall copies or substantial portions of the Software.\n\nTHE SOFTWARE IS PROVIDED \"AS IS\", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR\nIMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,\nFITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE\nAUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER\nLIABILITY...\n",
"summary": "YTFetcher lets you fetch YouTube transcripts in bulk with metadata like titles, publish dates, and thumbnails. Great for ML, NLP, and dataset generation.",
"version": "0.4.0",
"project_urls": {
"Documentation": "https://github.com/kaya70875/ytfetcher#readme",
"Homepage": "https://github.com/kaya70875/ytfetcher",
"Repository": "https://github.com/kaya70875/ytfetcher"
},
"split_keywords": [
"yt",
" transcript",
" youtube",
" cli",
" dataset",
" scraping",
" python"
],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "a9923af1fec81965eef09eacdc0f3de16ebc4ff5d9de5e2a85f59afee447fa69",
"md5": "94a6dafd14aa179ede8f94c8155a98db",
"sha256": "c39ad510f42070f8995031786cd31d070d7dcbee1beeee087ccf259292c0e64d"
},
"downloads": -1,
"filename": "ytfetcher-0.4.0-py3-none-any.whl",
"has_sig": false,
"md5_digest": "94a6dafd14aa179ede8f94c8155a98db",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": "<3.14,>=3.11",
"size": 18844,
"upload_time": "2025-08-10T12:13:16",
"upload_time_iso_8601": "2025-08-10T12:13:16.108065Z",
"url": "https://files.pythonhosted.org/packages/a9/92/3af1fec81965eef09eacdc0f3de16ebc4ff5d9de5e2a85f59afee447fa69/ytfetcher-0.4.0-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "13fdfd14047a7c592a9232d931ff6a32d22a16a9eae162efa37dda4decbf91b3",
"md5": "b5c964e51a5b9dab804ba0c8e560d2a1",
"sha256": "f4bd663f8c7cd130a7cde7e2a9baab0a0155bae17bd92c251d91890e84f3cc79"
},
"downloads": -1,
"filename": "ytfetcher-0.4.0.tar.gz",
"has_sig": false,
"md5_digest": "b5c964e51a5b9dab804ba0c8e560d2a1",
"packagetype": "sdist",
"python_version": "source",
"requires_python": "<3.14,>=3.11",
"size": 16322,
"upload_time": "2025-08-10T12:13:17",
"upload_time_iso_8601": "2025-08-10T12:13:17.369073Z",
"url": "https://files.pythonhosted.org/packages/13/fd/fd14047a7c592a9232d931ff6a32d22a16a9eae162efa37dda4decbf91b3/ytfetcher-0.4.0.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2025-08-10 12:13:17",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "kaya70875",
"github_project": "ytfetcher",
"travis_ci": false,
"coveralls": false,
"github_actions": true,
"lcname": "ytfetcher"
}