transcript-kit

Name	transcript-kit JSON
Version	0.1.1 JSON
	download
home_page	None
Summary	AI-powered YouTube transcript processor
upload_time	2025-11-03 04:16:45
maintainer	None
docs_url	None
author	Kevin Callens
requires_python	>=3.8
license	MIT
keywords	youtube transcript ai openrouter subtitles
VCS
bugtrack_url
requirements	No requirements were recorded.
Travis-CI	No Travis.
coveralls test coverage	No coveralls.

            # transcript-kit

**AI-powered YouTube transcript processor** - Clean, tag, and organize video transcripts automatically.

[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
[![Python 3.8+](https://img.shields.io/badge/python-3.8+-blue.svg)](https://www.python.org/downloads/)

**Author:** Kevin Callens

---

## Features

- 🎬 **Download** YouTube subtitles automatically
- 🤖 **AI-powered cleaning** - Fixes speech-to-text errors, creates proper paragraphs
- 🏷️ **Smart tagging** - Automatically tags transcripts with relevant topics
- 📁 **Organized storage** - Clean file naming with dates and tags
- 🔧 **Configurable** - Customize AI model, storage location, and processing options
- 🌍 **Cross-platform** - Works on Linux, macOS, and Windows

---

## Quick Start

### Installation

```bash
pip3 install transcript-kit
```

Or using pipx (recommended):

```bash
pipx install transcript-kit
```

**That's it!** All dependencies (including yt-dlp) install automatically.

### Setup

Run the interactive setup wizard:

```bash
transcript-kit setup
```

This will:
- Prompt for your OpenRouter API key
- Configure AI model preferences
- Set up storage locations
- Optionally add starter tags

### Process a Video

```bash
transcript-kit process "https://www.youtube.com/watch?v=VIDEO_ID"
```

That's it! Your cleaned transcript will be saved to `~/Documents/transcript-kit/`

---

## Requirements

- **Python 3.8+**
- All other dependencies install automatically with `pip3 install transcript-kit`

---

## Configuration

### Getting an API Key

1. Visit [OpenRouter](https://openrouter.ai/keys)
2. Sign up for an account
3. Generate an API key
4. Run `transcript-kit setup` and enter your key

**Note:** transcript-kit uses OpenRouter which provides access to various AI models. Some models are free, others are pay-per-use.

### Configuration File

Config is stored at:
- **Linux/macOS:** `~/.config/transcript-kit/config.yaml`
- **Windows:** `%APPDATA%/transcript-kit/config.yaml`

See `examples/config.yaml.example` for all available options.

### Environment Variables

You can also configure via environment variables:

```bash
export OPENROUTER_API_KEY="your-key-here"
export TRANSCRIPT_KIT_AI_MODEL="openai/gpt-4o-mini"
export TRANSCRIPT_KIT_DATA_DIR="~/Documents/my-transcripts"
```

---

## Usage

### Commands

```bash
# Interactive setup wizard
transcript-kit setup

# Process a YouTube video
transcript-kit process "https://youtube.com/watch?v=xxx"

# Process with custom model
transcript-kit process "URL" --model anthropic/claude-3-haiku

# Process without tagging
transcript-kit process "URL" --no-tag

# Process to custom directory
transcript-kit process "URL" --output ~/custom/path

# Show current configuration
transcript-kit config

# Show configuration with API key visible
transcript-kit config --show-secrets

# Show version
transcript-kit --version

# Get help
transcript-kit --help
```

### Output Format

Transcripts are saved as:

```
YYYY-MM-DD-video-title-[tag1,tag2].txt
```

Example:
```
2025-11-03-how-ai-works-[AI,Education].txt
```

File contents:
```markdown
# How AI Works

**Date**: 2025-11-03
**Tags**: AI, Education
**Context**: This video explains the fundamentals of artificial intelligence...

---

[Cleaned transcript with proper paragraphs and fixed errors...]
```

---

## How It Works

1. **Download** - Uses yt-dlp to fetch YouTube subtitles (.srt format)
2. **Analyze** - AI analyzes the video title and content to understand context
3. **Tag** - Assigns 1-2 relevant tags (reuses existing tags when possible)
4. **Clean** - Processes transcript in chunks to:
   - Fix speech-to-text errors
   - Add proper punctuation
   - Create readable paragraphs
   - Maintain original meaning
5. **Save** - Organized file with metadata and cleaned content

---

## Tag System

### How Tags Work

- Starts with **empty tag database** (or optional starter tags)
- AI assigns **1-2 tags maximum** per transcript
- **Prefers existing tags** to maintain consistency
- **Creates new tags** only when necessary
- All tags saved to `tags-database.txt`

### Example Tag Evolution

```
First video:  [AI]
Second video: [Marketing]
Third video:  [AI, Marketing]  ← Uses existing tags
Fourth video: [Tutorial]       ← Creates new tag when needed
```

---

## Advanced Configuration

### AI Models

Popular models (via OpenRouter):
- `openai/gpt-oss-20b` - Fast, good quality (default)
- `openai/gpt-4o-mini` - Higher quality, moderate cost
- `anthropic/claude-3-haiku` - Fast Claude model

### Processing Options

Edit `~/.config/transcript-kit/config.yaml`:

```yaml
ai:
  chunk_size: 8000          # Words per API call
  max_retries: 3            # Retry attempts on error

processing:
  analyze_context: true     # Context analysis before processing
  auto_tag: true            # Automatic tagging

tags:
  max_tags: 2               # Maximum tags per transcript
  starter_tags: []          # Optional starter tags
```

---

## Troubleshooting

### "yt-dlp not found"

This shouldn't happen since yt-dlp is automatically installed with transcript-kit.

If you see this error, try reinstalling:
```bash
pip3 install --force-reinstall transcript-kit
```

### "API key not configured"

Run the setup wizard:
```bash
transcript-kit setup
```

Or set environment variable:
```bash
export OPENROUTER_API_KEY="your-key-here"
```

### "No subtitles found"

The video may not have auto-generated subtitles. Try a different video or check if subtitles are available on YouTube.

### "Configuration error"

Verify your config file:
```bash
transcript-kit config
```

Re-run setup if needed:
```bash
transcript-kit setup
```

---

## Security

### API Key Protection

- **NEVER** commit your `config.yaml` to git
- Config file permissions are set to `0600` (owner read/write only)
- API keys are never logged or printed
- Use `.env` files for additional security (also gitignored)

### Example Files

Safe to commit:
- ✅ `examples/config.yaml.example`
- ✅ `.env.example`

**NEVER** commit:
- ❌ `~/.config/transcript-kit/config.yaml`
- ❌ `.env`

---

## Development

### Local Installation

```bash
git clone https://github.com/kevincallens/transcript-kit.git
cd transcript-kit
pip3 install -e ".[dev]"
```

### Running Tests

```bash
pytest
```

### Code Formatting

```bash
black src/
ruff check src/
```

---

## Contributing

Contributions are welcome! Please:

1. Fork the repository
2. Create a feature branch
3. Make your changes
4. Add tests if applicable
5. Submit a pull request

---

## License

MIT License - see [LICENSE](LICENSE) file for details

Copyright (c) 2025 Kevin Callens

---

## Acknowledgments

- Uses [yt-dlp](https://github.com/yt-dlp/yt-dlp) for downloading subtitles
- Powered by [OpenRouter](https://openrouter.ai) for AI processing
- Built with [Click](https://click.palletsprojects.com/) for the CLI

---

## Links

- **Repository:** https://github.com/kevincallens/transcript-kit
- **Issues:** https://github.com/kevincallens/transcript-kit/issues
- **OpenRouter:** https://openrouter.ai

Raw data

            {
    "_id": null,
    "home_page": null,
    "name": "transcript-kit",
    "maintainer": null,
    "docs_url": null,
    "requires_python": ">=3.8",
    "maintainer_email": null,
    "keywords": "youtube, transcript, ai, openrouter, subtitles",
    "author": "Kevin Callens",
    "author_email": null,
    "download_url": "https://files.pythonhosted.org/packages/ef/e8/73f97520f8657238eef0125e8fcd4a54861532bc204fad26c89c7308d222/transcript_kit-0.1.1.tar.gz",
    "platform": null,
    "description": "# transcript-kit\n\n**AI-powered YouTube transcript processor** - Clean, tag, and organize video transcripts automatically.\n\n[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)\n[![Python 3.8+](https://img.shields.io/badge/python-3.8+-blue.svg)](https://www.python.org/downloads/)\n\n**Author:** Kevin Callens\n\n---\n\n## Features\n\n- \ud83c\udfac **Download** YouTube subtitles automatically\n- \ud83e\udd16 **AI-powered cleaning** - Fixes speech-to-text errors, creates proper paragraphs\n- \ud83c\udff7\ufe0f **Smart tagging** - Automatically tags transcripts with relevant topics\n- \ud83d\udcc1 **Organized storage** - Clean file naming with dates and tags\n- \ud83d\udd27 **Configurable** - Customize AI model, storage location, and processing options\n- \ud83c\udf0d **Cross-platform** - Works on Linux, macOS, and Windows\n\n---\n\n## Quick Start\n\n### Installation\n\n```bash\npip3 install transcript-kit\n```\n\nOr using pipx (recommended):\n\n```bash\npipx install transcript-kit\n```\n\n**That's it!** All dependencies (including yt-dlp) install automatically.\n\n### Setup\n\nRun the interactive setup wizard:\n\n```bash\ntranscript-kit setup\n```\n\nThis will:\n- Prompt for your OpenRouter API key\n- Configure AI model preferences\n- Set up storage locations\n- Optionally add starter tags\n\n### Process a Video\n\n```bash\ntranscript-kit process \"https://www.youtube.com/watch?v=VIDEO_ID\"\n```\n\nThat's it! Your cleaned transcript will be saved to `~/Documents/transcript-kit/`\n\n---\n\n## Requirements\n\n- **Python 3.8+**\n- All other dependencies install automatically with `pip3 install transcript-kit`\n\n---\n\n## Configuration\n\n### Getting an API Key\n\n1. Visit [OpenRouter](https://openrouter.ai/keys)\n2. Sign up for an account\n3. Generate an API key\n4. Run `transcript-kit setup` and enter your key\n\n**Note:** transcript-kit uses OpenRouter which provides access to various AI models. Some models are free, others are pay-per-use.\n\n### Configuration File\n\nConfig is stored at:\n- **Linux/macOS:** `~/.config/transcript-kit/config.yaml`\n- **Windows:** `%APPDATA%/transcript-kit/config.yaml`\n\nSee `examples/config.yaml.example` for all available options.\n\n### Environment Variables\n\nYou can also configure via environment variables:\n\n```bash\nexport OPENROUTER_API_KEY=\"your-key-here\"\nexport TRANSCRIPT_KIT_AI_MODEL=\"openai/gpt-4o-mini\"\nexport TRANSCRIPT_KIT_DATA_DIR=\"~/Documents/my-transcripts\"\n```\n\n---\n\n## Usage\n\n### Commands\n\n```bash\n# Interactive setup wizard\ntranscript-kit setup\n\n# Process a YouTube video\ntranscript-kit process \"https://youtube.com/watch?v=xxx\"\n\n# Process with custom model\ntranscript-kit process \"URL\" --model anthropic/claude-3-haiku\n\n# Process without tagging\ntranscript-kit process \"URL\" --no-tag\n\n# Process to custom directory\ntranscript-kit process \"URL\" --output ~/custom/path\n\n# Show current configuration\ntranscript-kit config\n\n# Show configuration with API key visible\ntranscript-kit config --show-secrets\n\n# Show version\ntranscript-kit --version\n\n# Get help\ntranscript-kit --help\n```\n\n### Output Format\n\nTranscripts are saved as:\n\n```\nYYYY-MM-DD-video-title-[tag1,tag2].txt\n```\n\nExample:\n```\n2025-11-03-how-ai-works-[AI,Education].txt\n```\n\nFile contents:\n```markdown\n# How AI Works\n\n**Date**: 2025-11-03\n**Tags**: AI, Education\n**Context**: This video explains the fundamentals of artificial intelligence...\n\n---\n\n[Cleaned transcript with proper paragraphs and fixed errors...]\n```\n\n---\n\n## How It Works\n\n1. **Download** - Uses yt-dlp to fetch YouTube subtitles (.srt format)\n2. **Analyze** - AI analyzes the video title and content to understand context\n3. **Tag** - Assigns 1-2 relevant tags (reuses existing tags when possible)\n4. **Clean** - Processes transcript in chunks to:\n   - Fix speech-to-text errors\n   - Add proper punctuation\n   - Create readable paragraphs\n   - Maintain original meaning\n5. **Save** - Organized file with metadata and cleaned content\n\n---\n\n## Tag System\n\n### How Tags Work\n\n- Starts with **empty tag database** (or optional starter tags)\n- AI assigns **1-2 tags maximum** per transcript\n- **Prefers existing tags** to maintain consistency\n- **Creates new tags** only when necessary\n- All tags saved to `tags-database.txt`\n\n### Example Tag Evolution\n\n```\nFirst video:  [AI]\nSecond video: [Marketing]\nThird video:  [AI, Marketing]  \u2190 Uses existing tags\nFourth video: [Tutorial]       \u2190 Creates new tag when needed\n```\n\n---\n\n## Advanced Configuration\n\n### AI Models\n\nPopular models (via OpenRouter):\n- `openai/gpt-oss-20b` - Fast, good quality (default)\n- `openai/gpt-4o-mini` - Higher quality, moderate cost\n- `anthropic/claude-3-haiku` - Fast Claude model\n\n### Processing Options\n\nEdit `~/.config/transcript-kit/config.yaml`:\n\n```yaml\nai:\n  chunk_size: 8000          # Words per API call\n  max_retries: 3            # Retry attempts on error\n\nprocessing:\n  analyze_context: true     # Context analysis before processing\n  auto_tag: true            # Automatic tagging\n\ntags:\n  max_tags: 2               # Maximum tags per transcript\n  starter_tags: []          # Optional starter tags\n```\n\n---\n\n## Troubleshooting\n\n### \"yt-dlp not found\"\n\nThis shouldn't happen since yt-dlp is automatically installed with transcript-kit.\n\nIf you see this error, try reinstalling:\n```bash\npip3 install --force-reinstall transcript-kit\n```\n\n### \"API key not configured\"\n\nRun the setup wizard:\n```bash\ntranscript-kit setup\n```\n\nOr set environment variable:\n```bash\nexport OPENROUTER_API_KEY=\"your-key-here\"\n```\n\n### \"No subtitles found\"\n\nThe video may not have auto-generated subtitles. Try a different video or check if subtitles are available on YouTube.\n\n### \"Configuration error\"\n\nVerify your config file:\n```bash\ntranscript-kit config\n```\n\nRe-run setup if needed:\n```bash\ntranscript-kit setup\n```\n\n---\n\n## Security\n\n### API Key Protection\n\n- **NEVER** commit your `config.yaml` to git\n- Config file permissions are set to `0600` (owner read/write only)\n- API keys are never logged or printed\n- Use `.env` files for additional security (also gitignored)\n\n### Example Files\n\nSafe to commit:\n- \u2705 `examples/config.yaml.example`\n- \u2705 `.env.example`\n\n**NEVER** commit:\n- \u274c `~/.config/transcript-kit/config.yaml`\n- \u274c `.env`\n\n---\n\n## Development\n\n### Local Installation\n\n```bash\ngit clone https://github.com/kevincallens/transcript-kit.git\ncd transcript-kit\npip3 install -e \".[dev]\"\n```\n\n### Running Tests\n\n```bash\npytest\n```\n\n### Code Formatting\n\n```bash\nblack src/\nruff check src/\n```\n\n---\n\n## Contributing\n\nContributions are welcome! Please:\n\n1. Fork the repository\n2. Create a feature branch\n3. Make your changes\n4. Add tests if applicable\n5. Submit a pull request\n\n---\n\n## License\n\nMIT License - see [LICENSE](LICENSE) file for details\n\nCopyright (c) 2025 Kevin Callens\n\n---\n\n## Acknowledgments\n\n- Uses [yt-dlp](https://github.com/yt-dlp/yt-dlp) for downloading subtitles\n- Powered by [OpenRouter](https://openrouter.ai) for AI processing\n- Built with [Click](https://click.palletsprojects.com/) for the CLI\n\n---\n\n## Links\n\n- **Repository:** https://github.com/kevincallens/transcript-kit\n- **Issues:** https://github.com/kevincallens/transcript-kit/issues\n- **OpenRouter:** https://openrouter.ai\n",
    "bugtrack_url": null,
    "license": "MIT",
    "summary": "AI-powered YouTube transcript processor",
    "version": "0.1.1",
    "project_urls": {
        "Homepage": "https://github.com/kevincallens/transcript-kit",
        "Issues": "https://github.com/kevincallens/transcript-kit/issues",
        "Repository": "https://github.com/kevincallens/transcript-kit"
    },
    "split_keywords": [
        "youtube",
        " transcript",
        " ai",
        " openrouter",
        " subtitles"
    ],
    "urls": [
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "d081f3821065cb533016fc7626e8d13e5c0e9edafc1c3dd1b834c1508cbe9d6f",
                "md5": "2f966d5ae0724547f7c63d1cb11c1283",
                "sha256": "1347c892e5d3fb01a3b2bc28b6ac3b412d0375837bec706069aaa2230bfe172a"
            },
            "downloads": -1,
            "filename": "transcript_kit-0.1.1-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "2f966d5ae0724547f7c63d1cb11c1283",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.8",
            "size": 17955,
            "upload_time": "2025-11-03T04:16:44",
            "upload_time_iso_8601": "2025-11-03T04:16:44.158205Z",
            "url": "https://files.pythonhosted.org/packages/d0/81/f3821065cb533016fc7626e8d13e5c0e9edafc1c3dd1b834c1508cbe9d6f/transcript_kit-0.1.1-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "efe873f97520f8657238eef0125e8fcd4a54861532bc204fad26c89c7308d222",
                "md5": "5189127d0b3706eee8d4b16df7c11106",
                "sha256": "914f22283f862a9fc6688a185da87c2ef5e05f8fd8cbef6be2cd5b404efd405b"
            },
            "downloads": -1,
            "filename": "transcript_kit-0.1.1.tar.gz",
            "has_sig": false,
            "md5_digest": "5189127d0b3706eee8d4b16df7c11106",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.8",
            "size": 18570,
            "upload_time": "2025-11-03T04:16:45",
            "upload_time_iso_8601": "2025-11-03T04:16:45.603562Z",
            "url": "https://files.pythonhosted.org/packages/ef/e8/73f97520f8657238eef0125e8fcd4a54861532bc204fad26c89c7308d222/transcript_kit-0.1.1.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2025-11-03 04:16:45",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "kevincallens",
    "github_project": "transcript-kit",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": false,
    "lcname": "transcript-kit"
}

Kevin Callens