Name | ytfetcher |
Version | 1.3.1 |
home_page | https://github.com/kaya70875/ytfetcher |
Summary | YTFetcher lets you fetch YouTube transcripts in bulk with metadata like titles, publish dates, and thumbnails. Great for ML, NLP, and dataset generation. |
upload_time | 2025-10-18 10:16:07 |
maintainer | None |
docs_url | None |
author | Ahmet Kaya |
requires_python | <3.14,>=3.11 |
license | MIT License
Copyright (c) 2025 Ahmet Kaya
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in
all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY...
|
keywords |
yt
transcripts
youtube
cli
dataset
scraping
python
youtube-transcripts
|
requirements |
No requirements were recorded.
|
# YTFetcher
[Coverage](https://codecov.io/gh/kaya70875/ytfetcher)
[Downloads](https://pepy.tech/projects/ytfetcher)
[PyPI](https://pypi.org/project/ytfetcher/)
[Documentation](https://ytfetcher.readthedocs.io/en/latest/?badge=latest)
[License: MIT](https://opensource.org/licenses/MIT)
> ⚡ Turn hours of YouTube videos into clean, structured text in minutes.
A Python tool for quickly fetching thousands of videos from a YouTube channel, along with structured transcripts and additional metadata. Export the data as CSV, TXT, or JSON.
---
## 📚 Table of Contents
- [Installation](#installation)
- [Quick CLI Usage](#quick-cli-usage)
- [CLI Overview](#cli-overview)
- [Features](#features)
- [Basic Usage (Python API)](#basic-usage-python-api)
- [Using Different Fetchers](#using-different-fetchers)
- [Retrieve Different Languages](#retreive-different-languages)
- [Exporting](#exporting)
- [Other Methods](#other-methods)
- [Proxy Configuration](#proxy-configuration)
- [Advanced HTTP Configuration (Optional)](#advanced-http-configuration-optional)
- [CLI (Advanced)](#cli-advanced)
- [Contributing](#contributing)
- [Running Tests](#running-tests)
- [Related Projects](#related-projects)
- [License](#license)
---
## Installation
Install from PyPI:
```bash
pip install ytfetcher
```
---
## Quick CLI Usage
Fetch 50 video transcripts + metadata from a channel and save as JSON:
```bash
ytfetcher from_channel -c TheOffice -m 50 -f json
```
---
## CLI Overview
YTFetcher comes with a simple CLI so you can fetch data directly from your terminal.
```bash
ytfetcher -h
```
```bash
usage: ytfetcher [-h] {from_channel,from_video_ids} ...

Fetch YouTube transcripts for a channel

positional arguments:
  {from_channel,from_video_ids}
    from_channel      Fetch data from channel handle with max_results.
    from_playlist_id  Fetch data from a specific playlist id.
    from_video_ids    Fetch data from your custom video ids.

options:
  -h, --help          show this help message and exit
```
---
## Features
- Fetch full **transcripts** from a YouTube channel.
- Get video **metadata: title, description, thumbnails, published date**.
- Async support for **high performance**.
- **Export** fetched data as txt, csv or json.
- **CLI** support.
---
## Basic Usage (Python API)
**Note:** When specifying the channel, provide the exact **channel handle** without the `@` symbol. Do not pass the channel URL or the display name.
For example, use `TheOffice` instead of `@TheOffice` or `https://www.youtube.com/c/TheOffice`.
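If your input comes from users or spreadsheets, it may arrive as `@TheOffice` or as a handle URL. The sketch below normalizes such values before they are passed to `from_channel`. Note that `normalize_handle` is a hypothetical helper, not part of `ytfetcher`, and it only understands URLs that contain an `@handle` path segment; other URL forms are rejected rather than guessed.

```python
from urllib.parse import urlparse


def normalize_handle(value: str) -> str:
    """Return a bare channel handle from '@Name', '.../@Name', or 'Name'.

    Hypothetical helper for illustration; not part of ytfetcher.
    """
    value = value.strip()
    if "://" in value:
        # Look for a path segment of the form '@handle'.
        segments = [part for part in urlparse(value).path.split("/") if part]
        handles = [part for part in segments if part.startswith("@")]
        if not handles:
            raise ValueError(f"no @handle segment found in URL: {value!r}")
        value = handles[0]
    # Drop the leading '@' if present.
    return value.lstrip("@")


print(normalize_handle("@TheOffice"))
print(normalize_handle("https://www.youtube.com/@TheOffice/videos"))
print(normalize_handle("TheOffice"))
```

All three calls print `TheOffice`.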
Here's how to get transcripts and metadata such as channel name, description, and published date from a single channel with the `from_channel` method:
```python
import asyncio

from ytfetcher import YTFetcher
from ytfetcher.models.channel import ChannelData

# Limit the fetch to two videos from the channel.
fetcher = YTFetcher.from_channel(
    channel_handle="TheOffice",
    max_results=2
)

async def get_channel_data() -> list[ChannelData]:
    # Fetch transcripts and metadata together.
    channel_data = await fetcher.fetch_youtube_data()
    return channel_data

if __name__ == '__main__':
    data = asyncio.run(get_channel_data())
    print(data)
```
---
This will return a list of `ChannelData` with metadata in `DLSnippet` objects:
```python
[
    ChannelData(
        video_id='video1',
        transcripts=[
            Transcript(
                text="Hey there",
                start=0.0,
                duration=1.54
            ),
            Transcript(
                text="Happy coding!",
                start=1.56,
                duration=4.46
            )
        ],
        metadata=DLSnippet(
            title='VideoTitle',
            description='VideoDescription',
            url='https://youtu.be/video1',
            duration=120,
            view_count=1000,
            thumbnails=[{'url': 'thumbnail_url'}]
        )
    ),
    # Other ChannelData objects...
]
```
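For dataset work you often want one plain-text string per video rather than timed segments. The following is a minimal sketch of that step. `Segment` is a stand-in dataclass that only mirrors the `text`, `start`, and `duration` fields shown above; it is not the library's `Transcript` class, and the helper names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """Stand-in with the same fields as the transcript entries shown above."""
    text: str
    start: float
    duration: float


def to_plain_text(segments: list[Segment]) -> str:
    """Join segment texts in playback order into one string."""
    ordered = sorted(segments, key=lambda s: s.start)
    return " ".join(s.text.strip() for s in ordered)


def end_time(segments: list[Segment]) -> float:
    """Return the time (in seconds) at which the last segment finishes."""
    return max((s.start + s.duration for s in segments), default=0.0)


segments = [
    Segment(text="Hey there", start=0.0, duration=1.54),
    Segment(text="Happy coding!", start=1.56, duration=4.46),
]
print(to_plain_text(segments))  # Hey there Happy coding!
print(end_time(segments))
```

If the real objects expose the same attribute names, the two functions should work on them unchanged, since they rely only on attribute access.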
---
## Using Different Fetchers
`YTFetcher` also supports other fetchers, so you can fetch by `channel_handle`, from custom `video_ids`, or from a `playlist_id`.
### Fetching from Playlist ID
Here's how you can fetch bulk transcripts from a specific `playlist_id` using `ytfetcher`.
```python
import asyncio

from ytfetcher import YTFetcher

fetcher = YTFetcher.from_playlist_id(
    playlist_id="playlistid1254"
)

# The rest is the same as in the basic example ...
```
### Fetching With Custom Video IDs
Initialize `ytfetcher` with custom video IDs using the `from_video_ids` method:
```python
import asyncio

from ytfetcher import YTFetcher

fetcher = YTFetcher.from_video_ids(
    video_ids=['video1', 'video2', 'video3']
)

# The rest is the same as in the basic example ...
```
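`from_video_ids` expects bare IDs, while collected links are usually full URLs. As a rough sketch, the hypothetical helper below (not part of `ytfetcher`) pulls the ID out of the two most common URL shapes, `youtu.be/<id>` and `youtube.com/watch?v=<id>`, and returns `None` for anything else.

```python
from typing import Optional
from urllib.parse import parse_qs, urlparse


def extract_video_id(url: str) -> Optional[str]:
    """Return the video ID from a youtu.be or youtube.com/watch URL, else None."""
    parsed = urlparse(url)
    host = parsed.netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    if host == "youtu.be":
        # Short links carry the ID as the path.
        return parsed.path.lstrip("/") or None
    if host in {"youtube.com", "m.youtube.com"} and parsed.path == "/watch":
        # Watch links carry the ID in the 'v' query parameter.
        values = parse_qs(parsed.query).get("v")
        return values[0] if values else None
    return None


urls = [
    "https://youtu.be/video1",
    "https://www.youtube.com/watch?v=video2&t=10",
    "https://example.com/not-a-video",
]
video_ids = [vid for vid in map(extract_video_id, urls) if vid is not None]
print(video_ids)  # ['video1', 'video2']
```

The resulting list can then be passed as `video_ids=` in the call shown above.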
---
## Retrieve Different Languages
Use the `languages` parameter to choose which transcript languages to retrieve, in order of preference. The default is `en`.
```python
fetcher = YTFetcher.from_video_ids(video_ids=video_ids, languages=["tr", "en"])
```
The CLI accepts the same setting through `--languages`:
```bash
ytfetcher from_channel -c TheOffice -m 50 -f csv --print --languages tr en
```
`ytfetcher` first tries to fetch the `Turkish` transcript. If it's not available, it falls back to `English`.
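The fallback order described here can be pictured with a small, library-independent sketch. `pick_language` is a hypothetical function that illustrates "first preferred language that is available wins"; it is not how `ytfetcher` is implemented internally.

```python
from typing import Iterable, Optional, Sequence


def pick_language(preferred: Sequence[str], available: Iterable[str]) -> Optional[str]:
    """Return the first language code in `preferred` that is also available."""
    available_set = set(available)
    for code in preferred:
        if code in available_set:
            return code
    # None of the preferred languages exist for this video.
    return None


print(pick_language(["tr", "en"], ["tr", "en"]))  # tr
print(pick_language(["tr", "en"], ["en", "de"]))  # en
print(pick_language(["tr", "en"], ["de"]))        # None
```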
---
## Exporting
Use the `Exporter` class to export `ChannelData` in **csv**, **json**, or **txt**:
```python
import asyncio

from ytfetcher.services import Exporter

# `fetcher` is the YTFetcher instance created in the earlier examples.
channel_data = asyncio.run(fetcher.fetch_youtube_data())

exporter = Exporter(
    channel_data=channel_data,
    allowed_metadata_list=['title'],  # You can customize this
    timing=True,                      # Include transcript start/duration
    filename='my_export',             # Base filename
    output_dir='./exports'            # Optional output directory
)

exporter.export_as_json()  # or .export_as_txt(), .export_as_csv()
```
### Exporting With CLI
On the CLI, export flags let you drop `timings` and choose which `metadata` fields to keep:
```bash
ytfetcher from_channel -c TheOffice -m 20 -f json --no-timing --metadata title description
```
This will **exclude** `timings` from transcripts and keep only `title` and `description` as metadata.
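To make the effect of these two options concrete, here is a sketch that applies the same idea to a plain dictionary. The record layout and the `shape_record` function are assumptions made for illustration; the actual schema written by the exporter may differ.

```python
import json


def shape_record(record: dict, metadata_keys: list[str], timing: bool) -> dict:
    """Keep only the chosen metadata keys and optionally drop timing fields."""
    metadata = {k: v for k, v in record["metadata"].items() if k in metadata_keys}
    if timing:
        transcripts = [dict(t) for t in record["transcripts"]]
    else:
        # Without timing, keep only the spoken text of each segment.
        transcripts = [{"text": t["text"]} for t in record["transcripts"]]
    return {
        "video_id": record["video_id"],
        "transcripts": transcripts,
        "metadata": metadata,
    }


# Illustrative record using the field names from the example output above.
record = {
    "video_id": "video1",
    "transcripts": [{"text": "Hey there", "start": 0.0, "duration": 1.54}],
    "metadata": {
        "title": "VideoTitle",
        "description": "VideoDescription",
        "view_count": 1000,
    },
}

shaped = shape_record(record, ["title", "description"], timing=False)
print(json.dumps(shaped, indent=2))
```

Here `view_count` is dropped from the metadata and each transcript entry keeps only its `text`.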
---
## Other Methods
You can also fetch only transcript data or only metadata (with video IDs) using `fetch_transcripts` and `fetch_snippets`.
### Fetch Transcripts
```python
import asyncio
from ytfetcher import YTFetcher

fetcher = YTFetcher.from_channel(channel_handle="TheOffice", max_results=2)

async def get_transcript_data():
    return await fetcher.fetch_transcripts()

data = asyncio.run(get_transcript_data())
print(data)
```
### Fetch Snippets
```python
# Reuses the `fetcher` created in the previous example.
async def get_snippets():
    return await fetcher.fetch_snippets()

data = asyncio.run(get_snippets())
print(data)
```
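Because both methods are coroutines, they can in principle be awaited together with `asyncio.gather`. The sketch below shows only the pattern, using two stand-in coroutines in place of `fetcher.fetch_transcripts()` and `fetcher.fetch_snippets()`. Whether it is safe or faster to run both calls concurrently on one fetcher is not stated in this README, so treat that as an assumption to verify.

```python
import asyncio


async def fake_fetch_transcripts() -> list[str]:
    """Stand-in coroutine; a real call would be fetcher.fetch_transcripts()."""
    await asyncio.sleep(0.01)
    return ["transcript-for-video1"]


async def fake_fetch_snippets() -> list[str]:
    """Stand-in coroutine; a real call would be fetcher.fetch_snippets()."""
    await asyncio.sleep(0.01)
    return ["snippet-for-video1"]


async def fetch_both() -> tuple[list[str], list[str]]:
    # gather() runs both coroutines concurrently and preserves argument order.
    transcripts, snippets = await asyncio.gather(
        fake_fetch_transcripts(),
        fake_fetch_snippets(),
    )
    return transcripts, snippets


transcripts, snippets = asyncio.run(fetch_both())
print(transcripts, snippets)
```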
---
## Proxy Configuration
`YTFetcher` supports proxy usage for fetching YouTube transcripts:
```python
from ytfetcher import YTFetcher
from ytfetcher.config import GenericProxyConfig

fetcher = YTFetcher.from_channel(
    channel_handle="TheOffice",
    max_results=3,
    proxy_config=GenericProxyConfig()
)
```
---
## Advanced HTTP Configuration (Optional)
`YTFetcher` already sends custom headers that mimic real browser behavior. To change them, pass a custom `HTTPConfig` instance:
```python
from ytfetcher import YTFetcher
from ytfetcher.config import HTTPConfig

custom_config = HTTPConfig(
    timeout=4.0,
    headers={"User-Agent": "ytfetcher/1.0"}
)

fetcher = YTFetcher.from_channel(
    channel_handle="TheOffice",
    max_results=10,
    http_config=custom_config
)
```
---
## CLI (Advanced)
### Basic Usage
```bash
ytfetcher from_channel -c <CHANNEL_HANDLE> -m <MAX_RESULTS> -f <FORMAT>
```
### Fetching by Video IDs
```bash
ytfetcher from_video_ids -v video_id1 video_id2 ... -f json
```
### Fetching From Playlist Id
```bash
ytfetcher from_playlist_id -p playlistid123 -f csv -m 25
```
### Using Webshare Proxy
```bash
ytfetcher from_channel -c <CHANNEL_HANDLE> -f json --webshare-proxy-username "<USERNAME>" --webshare-proxy-password "<PASSWORD>"
```
### Using Custom Proxy
```bash
ytfetcher from_channel -c <CHANNEL_HANDLE> -f json --http-proxy "http://user:pass@host:port" --https-proxy "https://user:pass@host:port"
```
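Proxy credentials that contain characters such as `@` or `:` will break a URL of the form `http://user:pass@host:port` unless they are percent-encoded. A small sketch of building such a URL safely follows. `build_proxy_url` is a hypothetical helper, and the host and credentials are placeholders.

```python
from urllib.parse import quote


def build_proxy_url(scheme: str, user: str, password: str, host: str, port: int) -> str:
    """Build scheme://user:pass@host:port with percent-encoded credentials."""
    # safe='' makes quote() encode every reserved character, including '@' and ':'.
    user_enc = quote(user, safe="")
    password_enc = quote(password, safe="")
    return f"{scheme}://{user_enc}:{password_enc}@{host}:{port}"


print(build_proxy_url("http", "user", "p@ss:word", "proxy.example.com", 8080))
# http://user:p%40ss%3Aword@proxy.example.com:8080
```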
### Using Custom HTTP Config
```bash
ytfetcher from_channel -c <CHANNEL_HANDLE> --http-timeout 4.2 --http-headers "{'key': 'value'}"
```
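When driving the CLI from a script, building the argument list in Python avoids shell-quoting problems with the headers value. The sketch below uses only the flags shown above. `build_command` is a hypothetical helper, and it assumes the CLI accepts the `repr()` form of a dict, which matches the `"{'key': 'value'}"` shape in the example but is otherwise unverified.

```python
def build_command(handle: str, timeout: float, headers: dict) -> list[str]:
    """Return an argv list for `ytfetcher from_channel` with custom HTTP settings."""
    return [
        "ytfetcher", "from_channel",
        "-c", handle,
        "--http-timeout", str(timeout),
        # repr() of a dict yields the single-quoted form used in the example above.
        "--http-headers", repr(headers),
    ]


cmd = build_command("TheOffice", 4.2, {"key": "value"})
print(cmd)
# To execute it: subprocess.run(cmd, check=True)
```

Passing the list directly to `subprocess.run` means no shell is involved, so the quotes inside the headers string need no escaping.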
---
## Contributing
```bash
git clone https://github.com/kaya70875/ytfetcher.git
cd ytfetcher
poetry install
```
---
## Running Tests
```bash
poetry run pytest
```
---
## Related Projects
- [youtube-transcript-api](https://github.com/jdepoix/youtube-transcript-api)
---
## License
This project is licensed under the MIT License — see the [LICENSE](./LICENSE) file for details.
---
⭐ If you find this useful, please star the repo or open an issue with feedback!
Raw data
{
"_id": null,
"home_page": "https://github.com/kaya70875/ytfetcher",
"name": "ytfetcher",
"maintainer": null,
"docs_url": null,
"requires_python": "<3.14,>=3.11",
"maintainer_email": null,
"keywords": "yt, transcripts, youtube, cli, dataset, scraping, python, youtube-transcripts",
"author": "Ahmet Kaya",
"author_email": "kaya70875@gmail.com",
"download_url": "https://files.pythonhosted.org/packages/b3/94/1f26d1a10073f5e6d91a802213a1db4d9d15f589a68abb3bf8874793b0fb/ytfetcher-1.3.1.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": "MIT License\n\nCopyright (c) 2025 Ahmet Kaya\n\nPermission is hereby granted, free of charge, to any person obtaining a copy\nof this software and associated documentation files (the \"Software\"), to deal\nin the Software without restriction, including without limitation the rights\nto use, copy, modify, merge, publish, distribute, sublicense, and/or sell\ncopies of the Software, and to permit persons to whom the Software is\nfurnished to do so, subject to the following conditions:\n\nThe above copyright notice and this permission notice shall be included in\nall copies or substantial portions of the Software.\n\nTHE SOFTWARE IS PROVIDED \"AS IS\", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR\nIMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,\nFITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE\nAUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER\nLIABILITY...\n",
"summary": "YTFetcher lets you fetch YouTube transcripts in bulk with metadata like titles, publish dates, and thumbnails. Great for ML, NLP, and dataset generation.",
"version": "1.3.1",
"project_urls": {
"Documentation": "https://github.com/kaya70875/ytfetcher#readme",
"Homepage": "https://github.com/kaya70875/ytfetcher",
"Repository": "https://github.com/kaya70875/ytfetcher"
},
"split_keywords": [
"yt",
" transcripts",
" youtube",
" cli",
" dataset",
" scraping",
" python",
" youtube-transcripts"
],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "daa186c9e353ea3cbd2766c913261453c3112d4f0743fbc4e4b593bd97bb458e",
"md5": "7a23fb727d84435463f5a321add0d251",
"sha256": "e23df9a1b86dfa402ce80864e29d24bfbfc7ef163f4f4234c0a9ce8a9e88e1bd"
},
"downloads": -1,
"filename": "ytfetcher-1.3.1-py3-none-any.whl",
"has_sig": false,
"md5_digest": "7a23fb727d84435463f5a321add0d251",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": "<3.14,>=3.11",
"size": 19095,
"upload_time": "2025-10-18T10:16:05",
"upload_time_iso_8601": "2025-10-18T10:16:05.621874Z",
"url": "https://files.pythonhosted.org/packages/da/a1/86c9e353ea3cbd2766c913261453c3112d4f0743fbc4e4b593bd97bb458e/ytfetcher-1.3.1-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "b3941f26d1a10073f5e6d91a802213a1db4d9d15f589a68abb3bf8874793b0fb",
"md5": "6ae78a91d158b09995b8c9e885be3d62",
"sha256": "3bbf7ca8e22f172d9c8bb80284acd0ae2952a08c6918bfe37c76a59c58dd29e9"
},
"downloads": -1,
"filename": "ytfetcher-1.3.1.tar.gz",
"has_sig": false,
"md5_digest": "6ae78a91d158b09995b8c9e885be3d62",
"packagetype": "sdist",
"python_version": "source",
"requires_python": "<3.14,>=3.11",
"size": 17091,
"upload_time": "2025-10-18T10:16:07",
"upload_time_iso_8601": "2025-10-18T10:16:07.245696Z",
"url": "https://files.pythonhosted.org/packages/b3/94/1f26d1a10073f5e6d91a802213a1db4d9d15f589a68abb3bf8874793b0fb/ytfetcher-1.3.1.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2025-10-18 10:16:07",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "kaya70875",
"github_project": "ytfetcher",
"travis_ci": false,
"coveralls": false,
"github_actions": true,
"lcname": "ytfetcher"
}