# transcript-kit
**AI-powered YouTube transcript processor** - Clean, tag, and organize video transcripts automatically.
[License: MIT](https://opensource.org/licenses/MIT)
[Python 3.8+](https://www.python.org/downloads/)
**Author:** Kevin Callens
---
## Features
- 🎬 **Download** YouTube subtitles automatically
- 🤖 **AI-powered cleaning** - Fixes speech-to-text errors, creates proper paragraphs
- 🏷️ **Smart tagging** - Automatically tags transcripts with relevant topics
- 📁 **Organized storage** - Clean file naming with dates and tags
- 🔧 **Configurable** - Customize AI model, storage location, and processing options
- 🌍 **Cross-platform** - Works on Linux, macOS, and Windows
---
## Quick Start
### Installation
```bash
pip3 install transcript-kit
```
Or using pipx (recommended):
```bash
pipx install transcript-kit
```
**That's it!** All dependencies (including yt-dlp) install automatically.
### Setup
Run the interactive setup wizard:
```bash
transcript-kit setup
```
This will:
- Prompt for your OpenRouter API key
- Configure AI model preferences
- Set up storage locations
- Optionally add starter tags
### Process a Video
```bash
transcript-kit process "https://www.youtube.com/watch?v=VIDEO_ID"
```
Your cleaned transcript is saved to `~/Documents/transcript-kit/` by default.
---
## Requirements
- **Python 3.8+**
- All other dependencies install automatically with `pip3 install transcript-kit`
---
## Configuration
### Getting an API Key
1. Visit [OpenRouter](https://openrouter.ai/keys)
2. Sign up for an account
3. Generate an API key
4. Run `transcript-kit setup` and enter your key
**Note:** transcript-kit uses OpenRouter which provides access to various AI models. Some models are free, others are pay-per-use.
### Configuration File
Config is stored at:
- **Linux/macOS:** `~/.config/transcript-kit/config.yaml`
- **Windows:** `%APPDATA%/transcript-kit/config.yaml`
See `examples/config.yaml.example` for all available options.
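For scripts that need to locate the config file, the two documented locations can be resolved roughly as below. This is a minimal sketch: `default_config_path` is a hypothetical helper, not part of transcript-kit's public API, and the fallback used when `%APPDATA%` is unset is an assumption.

```python
import os
import sys
from pathlib import Path


def default_config_path(platform: str = sys.platform, env=None) -> Path:
    """Return the config path documented above for the given platform.

    Hypothetical helper for illustration only.
    """
    env = os.environ if env is None else env
    if platform.startswith("win"):
        # Assumption: fall back to the usual Roaming folder if APPDATA is unset.
        base = Path(env.get("APPDATA", Path.home() / "AppData" / "Roaming"))
    else:
        base = Path.home() / ".config"
    return base / "transcript-kit" / "config.yaml"


print(default_config_path())
```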
### Environment Variables
You can also configure via environment variables:
```bash
export OPENROUTER_API_KEY="your-key-here"
export TRANSCRIPT_KIT_AI_MODEL="openai/gpt-4o-mini"
export TRANSCRIPT_KIT_DATA_DIR="$HOME/Documents/my-transcripts"
```
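One way to picture how these variables could interact with the config file is the sketch below. The variable names come from this README, but the config keys (`api_key`, `model`, `data_dir`), the `storage` section, and the rule that environment variables win over the file are all assumptions made for illustration; check `transcript-kit config` for the real behaviour.

```python
import os

# Environment variable -> (section, key). The section and key names are
# assumptions; only the variable names are taken from the README.
ENV_OVERRIDES = {
    "OPENROUTER_API_KEY": ("ai", "api_key"),
    "TRANSCRIPT_KIT_AI_MODEL": ("ai", "model"),
    "TRANSCRIPT_KIT_DATA_DIR": ("storage", "data_dir"),
}


def apply_env_overrides(config, env=None):
    """Return a copy of config with any set environment variables applied."""
    env = os.environ if env is None else env
    # Copy each section so the caller's dictionary is left untouched.
    merged = {section: dict(values) for section, values in config.items()}
    for var, (section, key) in ENV_OVERRIDES.items():
        value = env.get(var)
        if not value:
            continue
        if key == "data_dir":
            # A quoted "~" is not expanded by the shell, so expand it here.
            value = os.path.expanduser(value)
        merged.setdefault(section, {})[key] = value
    return merged
```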
---
## Usage
### Commands
```bash
# Interactive setup wizard
transcript-kit setup
# Process a YouTube video
transcript-kit process "https://youtube.com/watch?v=xxx"
# Process with custom model
transcript-kit process "URL" --model anthropic/claude-3-haiku
# Process without tagging
transcript-kit process "URL" --no-tag
# Process to custom directory
transcript-kit process "URL" --output ~/custom/path
# Show current configuration
transcript-kit config
# Show configuration with API key visible
transcript-kit config --show-secrets
# Show version
transcript-kit --version
# Get help
transcript-kit --help
```
### Output Format
Transcripts are saved as:
```
YYYY-MM-DD-video-title-[tag1,tag2].txt
```
Example:
```
2025-11-03-how-ai-works-[AI,Education].txt
```
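The naming pattern can be reproduced with a few lines of Python, which is handy when renaming or searching saved transcripts. This is a sketch only: `slugify` and `transcript_filename` are hypothetical names, and the exact slug rules transcript-kit applies (for example to punctuation or non-ASCII titles) may differ.

```python
import re
from datetime import date
from typing import List


def slugify(title: str) -> str:
    """Lowercase the title and join its ASCII alphanumeric runs with hyphens."""
    return "-".join(re.findall(r"[a-z0-9]+", title.lower()))


def transcript_filename(day: date, title: str, tags: List[str]) -> str:
    """Build a name of the form YYYY-MM-DD-video-title-[tag1,tag2].txt."""
    return f"{day.isoformat()}-{slugify(title)}-[{','.join(tags)}].txt"


print(transcript_filename(date(2025, 11, 3), "How AI Works", ["AI", "Education"]))
# 2025-11-03-how-ai-works-[AI,Education].txt
```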
File contents:
```markdown
# How AI Works
**Date**: 2025-11-03
**Tags**: AI, Education
**Context**: This video explains the fundamentals of artificial intelligence...
---
[Cleaned transcript with proper paragraphs and fixed errors...]
```
---
## How It Works
1. **Download** - Uses yt-dlp to fetch YouTube subtitles (.srt format)
2. **Analyze** - AI analyzes the video title and content to understand context
3. **Tag** - Assigns 1-2 relevant tags (reuses existing tags when possible)
4. **Clean** - Processes transcript in chunks to:
   - Fix speech-to-text errors
   - Add proper punctuation
   - Create readable paragraphs
   - Maintain original meaning
5. **Save** - Organized file with metadata and cleaned content
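The download and chunking steps can be sketched as follows: strip the `.srt` cue numbers and timestamps, then split the remaining text into word-based chunks. Both function names are hypothetical, and the real implementation may split on different boundaries (for example sentences); the `8000` default mirrors the `chunk_size` setting shown under Processing Options.

```python
import re
from typing import Iterator, List

# Matches an SRT timing line such as "00:00:01,000 --> 00:00:04,000".
TIMESTAMP = re.compile(
    r"\d{2}:\d{2}:\d{2}[,.]\d{3}\s*-->\s*\d{2}:\d{2}:\d{2}[,.]\d{3}"
)


def srt_to_text(srt: str) -> str:
    """Drop cue numbers, timing lines, and blank lines; keep the spoken text.

    Caveat: a caption line consisting only of digits is dropped as well.
    """
    kept: List[str] = []
    for line in srt.splitlines():
        line = line.strip()
        if not line or line.isdigit() or TIMESTAMP.match(line):
            continue
        kept.append(line)
    return " ".join(kept)


def chunk_words(text: str, chunk_size: int = 8000) -> Iterator[str]:
    """Yield consecutive chunks of at most chunk_size words."""
    words = text.split()
    for start in range(0, len(words), chunk_size):
        yield " ".join(words[start:start + chunk_size])
```

Each chunk would then be sent to the model separately and the cleaned results joined back together in order.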
---
## Tag System
### How Tags Work
- Starts with **empty tag database** (or optional starter tags)
- AI assigns **1-2 tags maximum** per transcript
- **Prefers existing tags** to maintain consistency
- **Creates new tags** only when necessary
- All tags saved to `tags-database.txt`
### Example Tag Evolution
```
First video: [AI]
Second video: [Marketing]
Third video: [AI, Marketing] ← Uses existing tags
Fourth video: [Tutorial] ← Creates new tag when needed
```
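The "prefer existing tags" behaviour can be approximated as below. This is a sketch under assumptions: in transcript-kit the preference is expressed to the AI model, whereas this hypothetical `resolve_tags` helper enforces it after the fact with a case-insensitive match and a hard cap.

```python
from typing import Iterable, List


def resolve_tags(suggested: Iterable[str], known: List[str], max_tags: int = 2) -> List[str]:
    """Map suggested tags onto known ones and cap the result at max_tags.

    New tags are appended to `known` so later transcripts can reuse them.
    """
    by_lower = {tag.lower(): tag for tag in known}
    chosen: List[str] = []
    for tag in suggested:
        if len(chosen) >= max_tags:
            break
        tag = tag.strip()
        if not tag:
            continue
        canonical = by_lower.get(tag.lower())
        if canonical is None:
            # No existing tag matches, so register this one as new.
            known.append(tag)
            by_lower[tag.lower()] = tag
            canonical = tag
        if canonical not in chosen:
            chosen.append(canonical)
    return chosen


known_tags: List[str] = []
print(resolve_tags(["AI"], known_tags))               # ['AI']
print(resolve_tags(["ai", "marketing"], known_tags))  # ['AI', 'marketing']
```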
---
## Advanced Configuration
### AI Models
Popular models (via OpenRouter):
- `openai/gpt-oss-20b` - Fast, good quality (default)
- `openai/gpt-4o-mini` - Higher quality, moderate cost
- `anthropic/claude-3-haiku` - Fast Claude model
### Processing Options
Edit `~/.config/transcript-kit/config.yaml`:
```yaml
ai:
  chunk_size: 8000          # Words per API call
  max_retries: 3            # Retry attempts on error
processing:
  analyze_context: true     # Context analysis before processing
  auto_tag: true            # Automatic tagging
tags:
  max_tags: 2               # Maximum tags per transcript
  starter_tags: []          # Optional starter tags
```
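If your config file sets only some of these keys, a loader would typically layer it over built-in defaults. The sketch below shows one such merge; the `DEFAULTS` values are copied from the example above, while `merge_config` and the assumption that transcript-kit fills in missing keys this way are illustrative only.

```python
import copy

# Values copied from the YAML example above.
DEFAULTS = {
    "ai": {"chunk_size": 8000, "max_retries": 3},
    "processing": {"analyze_context": True, "auto_tag": True},
    "tags": {"max_tags": 2, "starter_tags": []},
}


def merge_config(defaults: dict, overrides: dict) -> dict:
    """Return defaults updated with overrides, merging nested sections."""
    merged = copy.deepcopy(defaults)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Merge section by section so unspecified keys keep their defaults.
            merged[key] = merge_config(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


config = merge_config(DEFAULTS, {"ai": {"chunk_size": 4000}})
print(config["ai"])  # {'chunk_size': 4000, 'max_retries': 3}
```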
---
## Troubleshooting
### "yt-dlp not found"
This shouldn't happen since yt-dlp is automatically installed with transcript-kit.
If you see this error, try reinstalling:
```bash
pip3 install --force-reinstall transcript-kit
```
### "API key not configured"
Run the setup wizard:
```bash
transcript-kit setup
```
Or set environment variable:
```bash
export OPENROUTER_API_KEY="your-key-here"
```
### "No subtitles found"
The video may not have auto-generated subtitles. Try a different video or check if subtitles are available on YouTube.
### "Configuration error"
Verify your config file:
```bash
transcript-kit config
```
Re-run setup if needed:
```bash
transcript-kit setup
```
---
## Security
### API Key Protection
- **NEVER** commit your `config.yaml` to git
- Config file permissions are set to `0600` (owner read/write only)
- API keys are never logged, and `transcript-kit config` hides them unless you pass `--show-secrets`
- Use `.env` files for additional security (also gitignored)
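If you create or copy a config file by hand, you can apply the same `0600` permissions yourself. The sketch below uses only the standard library; `write_private` is a hypothetical helper, not transcript-kit's own code. On Windows, `os.chmod` only toggles the read-only flag, so the mode has no real effect there.

```python
import os


def write_private(path: str, text: str) -> None:
    """Create or overwrite path so only the owner can read or write it."""
    # The mode given to os.open is applied only when the file is created.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        # Tighten permissions on a file that already existed as well.
        os.chmod(path, 0o600)
        handle.write(text)
```

Calling `write_private(os.path.expanduser("~/.config/transcript-kit/config.yaml"), contents)` would then leave the file readable by your user only, provided the directory already exists.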
### Example Files
Safe to commit:
- ✅ `examples/config.yaml.example`
- ✅ `.env.example`
**NEVER** commit:
- ❌ `~/.config/transcript-kit/config.yaml`
- ❌ `.env`
---
## Development
### Local Installation
```bash
git clone https://github.com/kevincallens/transcript-kit.git
cd transcript-kit
pip3 install -e ".[dev]"
```
### Running Tests
```bash
pytest
```
### Code Formatting
```bash
black src/
ruff check src/
```
---
## Contributing
Contributions are welcome! Please:
1. Fork the repository
2. Create a feature branch
3. Make your changes
4. Add tests if applicable
5. Submit a pull request
---
## License
MIT License - see [LICENSE](LICENSE) file for details
Copyright (c) 2025 Kevin Callens
---
## Acknowledgments
- Uses [yt-dlp](https://github.com/yt-dlp/yt-dlp) for downloading subtitles
- Powered by [OpenRouter](https://openrouter.ai) for AI processing
- Built with [Click](https://click.palletsprojects.com/) for the CLI
---
## Links
- **Repository:** https://github.com/kevincallens/transcript-kit
- **Issues:** https://github.com/kevincallens/transcript-kit/issues
- **OpenRouter:** https://openrouter.ai