# yt-dlp-transcripts
A powerful Python tool for extracting video information and transcripts from YouTube videos, playlists, channels, and channel playlists. Built on top of `yt-dlp` and `youtube-transcript-api`.
## Features
- 📹 **Single Video Processing** - Extract metadata and transcripts from individual YouTube videos
- 📚 **Playlist Support** - Process entire playlists with progress tracking
- 📺 **Channel Videos** - Download information from all videos on a channel
- 🗂️ **Channel Playlists** - Process all playlists from a channel
- 🔄 **Resume Capability** - Automatically skip already processed videos
- 🎯 **Auto-Detection** - Automatically detect URL type (video/playlist/channel)
- 📊 **Rich Metadata** - Extract title, description, upload date, duration, view count, and more
- 📝 **Transcript Extraction** - Get full video transcripts when available
- 💾 **CSV Export** - Save all data in easily accessible CSV format
## Installation
### Via pip
```bash
pip install yt-dlp-transcripts
# As a command-line tool (after pip install)
yt-dlp-transcripts -u "https://www.youtube.com/watch?v=VIDEO_ID" -o video.csv
```
### To run from source
```bash
git clone https://github.com/LinuxIsCool/yt-dlp-transcripts.git
cd yt-dlp-transcripts
poetry install
# Run inside the Poetry-managed environment (works whether or not `poetry shell` is available)
poetry run python -m yt_dlp_transcripts -u "https://www.youtube.com/watch?v=VIDEO_ID" -o video.csv
```
## Usage
```bash
yt-dlp-transcripts -u "https://www.youtube.com/watch?v=VIDEO_ID" -o output.csv
yt-dlp-transcripts -u "https://www.youtube.com/playlist?list=PLAYLIST_ID" -o output.csv
yt-dlp-transcripts -u "https://www.youtube.com/@channelname/videos" -o output.csv
yt-dlp-transcripts -u "https://www.youtube.com/@channelname/playlists" -o output.csv
```
### Options
| Option | Short | Description | Example |
|--------|-------|-------------|---------|
| `--url` | `-u` | YouTube URL (auto-detects type) | `https://youtube.com/...` |
| `--output` | `-o` | Output CSV file path | `output.csv` |
| `--help` | | Show help message | (flag, no value) |
## Output Format
The tool exports data to CSV with the following fields:
### Common Fields
- `video_id` - YouTube video ID
- `title` - Video title
- `url` - Video URL
- `description` - Video description
- `transcript` - Full video transcript (when available)
- `upload_date` - Upload date (YYYYMMDD format)
- `duration` - Video duration in seconds
- `view_count` - Number of views
- `channel` - Channel name
- `channel_id` - Channel ID
### Additional Fields for Playlists
- `playlist_name` - Name of the source playlist
- `playlist_url` - URL of the source playlist
### Additional Fields for Channel Videos
- `channel_source_url` - URL of the channel page
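
The fields above can be loaded back into Python with the standard `csv` module. The following is a minimal sketch, not part of the package: the sample row is made up, `load_rows` is a hypothetical helper, and it assumes that a missing transcript is written as an empty cell, which this README does not confirm.

```python
import csv
import io
from datetime import datetime

# Hypothetical sample row using column names from the field list above.
# A real export would be opened with:
#   open("output.csv", newline="", encoding="utf-8")
SAMPLE_CSV = (
    "video_id,title,url,upload_date,duration,view_count,transcript\n"
    "abc123,Example talk,https://www.youtube.com/watch?v=abc123,20240115,754,1200,\n"
)


def load_rows(handle):
    """Parse exported rows, converting the date and numeric columns."""
    rows = []
    for row in csv.DictReader(handle):
        # upload_date is documented as YYYYMMDD.
        row["upload_date"] = datetime.strptime(row["upload_date"], "%Y%m%d").date()
        row["duration"] = int(row["duration"])      # seconds
        row["view_count"] = int(row["view_count"])
        # Assumption: an empty cell means no transcript was available.
        row["has_transcript"] = bool(row["transcript"].strip())
        rows.append(row)
    return rows


rows = load_rows(io.StringIO(SAMPLE_CSV))
print(rows[0]["title"], rows[0]["upload_date"], rows[0]["has_transcript"])
```

Long transcripts can exceed the `csv` module's default per-field size limit; if reading fails with a field-size error, `csv.field_size_limit()` can be called with a larger value before parsing.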
## Examples
### Research and Analysis
```bash
# Analyze a conference talk playlist
yt-dlp-transcripts -u "https://www.youtube.com/playlist?list=PLconf2024" -o conference_talks.csv
# Extract all videos from an educational channel
yt-dlp-transcripts -u "https://www.youtube.com/@3blue1brown/videos" -o math_videos.csv
```
### Content Creation
```bash
# Get transcripts from your competitor's channel
yt-dlp-transcripts -u "https://www.youtube.com/@competitor/videos" -o competitor_analysis.csv
# Archive your own channel's content
yt-dlp-transcripts -u "https://www.youtube.com/@yourchannel/videos" -o my_backup.csv
```
### Academic Research
```bash
# Collect lecture series for analysis
yt-dlp-transcripts -u "https://www.youtube.com/playlist?list=PLlecture" -o lectures.csv
# Get transcripts from multiple related playlists
yt-dlp-transcripts -u "https://www.youtube.com/@university/playlists" -o all_courses.csv
```
### Python API Usage
```python
from yt_dlp_transcripts import (
    get_video_info,
    process_single_video,
    process_playlist,
    process_channel,
    detect_url_type
)
# Get video information as dictionary
video_data = get_video_info("https://www.youtube.com/watch?v=VIDEO_ID")
print(video_data['title'])
print(video_data['transcript'])
print(video_data['duration'])
# Process content and save to CSV
process_single_video("https://www.youtube.com/watch?v=VIDEO_ID", "output.csv")
process_playlist("https://www.youtube.com/playlist?list=PLAYLIST_ID", "output.csv")
process_channel("https://www.youtube.com/@channel/videos", "output.csv", mode='videos')
# Auto-detect URL type
url_type = detect_url_type("https://www.youtube.com/watch?v=VIDEO_ID") # Returns: 'video'
```
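
To illustrate what URL auto-detection involves, here is a rough sketch of how the URL shapes shown in the Usage section could be told apart with the standard library. This is not the package's implementation: `classify_youtube_url` is a hypothetical function, and every label other than `'video'` is an assumption made for this example.

```python
from urllib.parse import parse_qs, urlparse


def classify_youtube_url(url: str) -> str:
    """Hypothetical classifier; the labels are illustrative only."""
    parsed = urlparse(url)
    query = parse_qs(parsed.query)
    path = parsed.path.rstrip("/")

    # Watch pages carry the video ID in the "v" query parameter;
    # youtu.be short links carry it in the path.
    if parsed.netloc == "youtu.be" or (path == "/watch" and "v" in query):
        return "video"
    if path == "/playlist" and "list" in query:
        return "playlist"
    # Handle-style channel URLs start with "/@".
    if path.startswith("/@") and path.endswith("/playlists"):
        return "channel_playlists"
    if path.startswith("/@"):
        return "channel_videos"
    return "unknown"


for example in (
    "https://www.youtube.com/watch?v=VIDEO_ID",
    "https://www.youtube.com/playlist?list=PLAYLIST_ID",
    "https://www.youtube.com/@channelname/videos",
    "https://www.youtube.com/@channelname/playlists",
):
    print(example, "->", classify_youtube_url(example))
```

The sketch only covers the URL forms listed in this README; legacy forms such as `/channel/ID` or `/c/name` would fall through to `"unknown"`.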
## Features in Detail
### Resume Capability
The tool automatically tracks processed videos and skips them on subsequent runs. This allows you to:
- Interrupt and resume large downloads
- Update your dataset with only new videos
- Avoid redundant API calls
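
One common way to get this behavior is to read the `video_id` column of the existing output file and skip any ID already present. The sketch below shows that idea under stated assumptions: `already_processed` and `pending` are hypothetical helpers, the IDs are made up, and the tool's actual tracking mechanism may differ.

```python
import csv
import os
import tempfile


def already_processed(csv_path: str) -> set:
    """Return the video_id values already present in an output CSV."""
    if not os.path.exists(csv_path):
        return set()
    with open(csv_path, newline="", encoding="utf-8") as handle:
        return {row["video_id"] for row in csv.DictReader(handle)}


def pending(video_ids, csv_path):
    """Keep only the IDs that are not yet in the CSV, preserving order."""
    done = already_processed(csv_path)
    return [vid for vid in video_ids if vid not in done]


# Demonstration with a throwaway file and made-up IDs.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "output.csv")
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=["video_id", "title"])
        writer.writeheader()
        writer.writerow({"video_id": "aaa", "title": "First"})
    remaining = pending(["aaa", "bbb", "ccc"], path)
    print(remaining)
```

Using the output file itself as the record of progress means no separate state file has to be kept in sync, at the cost of re-reading the CSV at the start of each run.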
### Progress Tracking
When processing multiple videos, the tool shows:
- Current video number and total count
- Video title being processed
- Success/skip status for each video
### Error Handling
- Gracefully handles missing transcripts
- Continues processing even if individual videos fail
- Provides clear error messages for troubleshooting
### Rate Limiting
The tool respects YouTube's rate limits. If you encounter HTTP 429 (Too Many Requests) errors:
- The tool continues processing and collects whatever metadata is available
- Transcripts may be unavailable while rate limiting is in effect
- Consider adding delays between runs or processing in smaller batches
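
The batching advice can be applied from your own script. The following is a minimal sketch, assuming you already have a list of URLs and a callable that processes one URL; `process_in_batches`, its parameters, and the default values are hypothetical and not part of the package.

```python
import time


def process_in_batches(urls, handler, batch_size=25, pause_seconds=30.0):
    """Call handler(url) for each URL, sleeping between batches.

    Failures are collected and returned instead of aborting the run.
    """
    failures = []
    for start in range(0, len(urls), batch_size):
        if start > 0:
            # Pause before every batch except the first.
            time.sleep(pause_seconds)
        for url in urls[start:start + batch_size]:
            try:
                handler(url)
            except Exception as exc:  # keep going; report afterwards
                failures.append((url, exc))
    return failures


# Demonstration with a stub handler and no real network access.
processed = []
failed = process_in_batches(
    ["url-1", "url-2", "url-3"],
    processed.append,
    batch_size=2,
    pause_seconds=0.0,
)
print(processed, failed)
```

In practice the handler could wrap one of the functions from the Python API section, for example `process_single_video`, provided it appends to an existing CSV rather than overwriting it; that behavior is not confirmed here and should be checked first.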
## Requirements
- Python 3.9+
- yt-dlp
- youtube-transcript-api
- click
## Limitations
- **Transcript Availability**: Not all videos have transcripts available
- **Rate Limiting**: YouTube may rate-limit requests when processing large datasets
- **Private Videos**: Cannot access private or age-restricted content without authentication
- **API Changes**: YouTube's API may change, affecting functionality
## Contributing
Contributions are welcome! Please feel free to submit a Pull Request.
1. Fork the repository
2. Create your feature branch (`git checkout -b feature/AmazingFeature`)
3. Commit your changes (`git commit -m 'Add some AmazingFeature'`)
4. Push to the branch (`git push origin feature/AmazingFeature`)
5. Open a Pull Request
## Running Tests
```bash
pytest
```
## License
This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.
## Acknowledgments
- Built on top of [yt-dlp](https://github.com/yt-dlp/yt-dlp)
- Transcript extraction via [youtube-transcript-api](https://github.com/jdepoix/youtube-transcript-api)
- CLI powered by [click](https://click.palletsprojects.com/)
## Author
[Shawn Anderson](https://linuxiscool.xyz)
## Support
If you encounter any issues or have questions:
- Open an issue on [GitHub](https://github.com/LinuxIsCool/yt-dlp-transcripts/issues)
- Check existing issues for solutions
- Provide detailed error messages and URLs (when possible) for debugging
## Changelog
### v0.1.0 (2025-08-28)
- Initial release
- Support for videos, playlists, channels, and channel playlists
- Auto-detection of URL types
- Resume capability for interrupted downloads
- CSV export with comprehensive metadata