yt-ts-extract


Name: yt-ts-extract
Version: 1.0.0
Summary: Extract transcripts from YouTube videos with multi-language support and various output formats
Upload time: 2025-08-24 11:23:57
Requires Python: >=3.8
License: MIT License, Copyright (c) 2025 yt-ts-extract
Keywords: automation, captions, cli, extract, multilanguage, srt, subtitles, transcript, video, youtube

# yt-ts-extract

[![PyPI version](https://badge.fury.io/py/yt-ts-extract.svg)](https://badge.fury.io/py/yt-ts-extract)
[![Python Support](https://img.shields.io/pypi/pyversions/yt-ts-extract.svg)](https://pypi.org/project/yt-ts-extract/)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)

A robust Python library and CLI tool for extracting YouTube video transcripts with multi-language support and proxy rotation capabilities.

## ✨ Key Features

- **Extract transcripts** from YouTube videos via video ID
- **26+ language support** (English, Spanish, French, German, Japanese, Arabic, Chinese, etc.)
- **Multiple output formats**: plain text, SRT subtitles, timestamped segments (JSON), statistics (JSON)
- **Batch processing** for multiple videos
- **Anti-bot protection**: Android client implementation bypasses detection
- **Proxy rotation**: Multiple proxy support with automatic rotation strategies
- **Both CLI and Python library** interfaces

## 🚀 Installation

```bash
# Install from PyPI
pip install yt-ts-extract

# Or install in development mode
git clone https://github.com/sinjab/yt-ts-extract.git
cd yt-ts-extract
pip install -e .
```

## 📖 Quick Start

### Command Line Interface

```bash
# Basic transcript extraction
yt-transcript fR9ClX0egTc

# Export as SRT subtitles
yt-transcript -f srt -o video.srt fR9ClX0egTc

# List available languages
yt-transcript --list-languages fR9ClX0egTc

# Batch process multiple videos
yt-transcript --batch ids.txt --output-dir ./transcripts/

# Get help
yt-transcript --help
```

### Python Library

```python
from yt_ts_extract import (
    get_transcript,
    get_transcript_text,
    get_available_languages,
    YouTubeTranscriptExtractor,
)
from yt_ts_extract.utils import export_to_srt, get_transcript_stats

# Quick transcript extraction
transcript = get_transcript("fR9ClX0egTc")
print(f"Segments: {len(transcript)}")

# Export to SRT
srt_text = export_to_srt(transcript)
with open("video.srt", "w", encoding="utf-8") as f:
    f.write(srt_text)

# Plain text and languages
text = get_transcript_text("fR9ClX0egTc")
langs = get_available_languages("fR9ClX0egTc")
print(f"Languages available: {[lang['code'] for lang in langs]}")

# Using the class directly
extractor = YouTubeTranscriptExtractor(
    timeout=20,
    max_retries=5,
    backoff_factor=1.0,
    min_delay=1.5
)
segments = extractor.get_transcript("fR9ClX0egTc", language="en")
stats = get_transcript_stats(segments)
print(stats)
```
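
The functions above take a bare video ID rather than a URL. If your input is a link, a small helper along these lines can extract the ID first. This is a sketch, not part of `yt-ts-extract`: `video_id_from_url` is a hypothetical name, and the 11-character ID pattern is an assumption based on how YouTube IDs look today.

```python
import re
from urllib.parse import parse_qs, urlparse

# Assumption: video IDs are 11 characters from the URL-safe base64 alphabet.
_ID_RE = re.compile(r"[A-Za-z0-9_-]{11}")


def video_id_from_url(value: str) -> str:
    """Return the video ID from a YouTube URL, or the value itself if it is already an ID."""
    value = value.strip()
    if _ID_RE.fullmatch(value):
        return value

    parsed = urlparse(value)
    host = (parsed.hostname or "").lower()
    candidate = ""
    if host == "youtu.be":
        # Short links carry the ID as the first path component.
        candidate = parsed.path.lstrip("/").split("/")[0]
    elif host == "youtube.com" or host.endswith(".youtube.com"):
        if parsed.path == "/watch":
            candidate = parse_qs(parsed.query).get("v", [""])[0]
        elif parsed.path.startswith(("/embed/", "/shorts/")):
            candidate = parsed.path.split("/")[2]

    if not _ID_RE.fullmatch(candidate):
        raise ValueError(f"no video ID found in {value!r}")
    return candidate
```

The result can then be passed to `get_transcript` or `YouTubeTranscriptExtractor.get_transcript` as shown above.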

## 🎛️ CLI Options

```bash
yt-transcript [OPTIONS] VIDEO_ID

Options:
  -f, --format [text|srt|segments|stats]  Output format (default: text)
  -o, --output PATH                       Save output to file
  -l, --language TEXT                     Language code (e.g., 'en', 'es', 'fr')
  --list-languages                        Show available languages for video
  --batch PATH                            Process video IDs from file (one per line)
  --output-dir PATH                       Directory for batch output files
  --search TEXT                           Search for specific text in transcript
  --examples                              Show usage examples
  --timeout FLOAT                         Per-request timeout in seconds (default: 30)
  --retries INT                           Max HTTP retries on failure (default: 3)
  --backoff FLOAT                         Exponential backoff factor (default: 0.75)
  --min-delay FLOAT                       Minimum delay between requests (default: 2)
  --proxy TEXT                            Proxy URL (e.g., "http://user:pass@host:port")
  --proxy-list PATH                       Proxy list file for rotation
  --rotation-strategy [random|round_robin|least_used]  Proxy rotation strategy (default: random)
  --health-check                          Perform health check on all proxies before starting
  --help                                  Show this message and exit
```

### Network Tuning Examples

```bash
# Increase retries and timeout
yt-transcript fR9ClX0egTc --retries 5 --timeout 45

# Reduce delay for faster runs (use responsibly)
yt-transcript fR9ClX0egTc --min-delay 1.0 --backoff 0.5

# Single proxy
yt-transcript fR9ClX0egTc --proxy "http://user:pass@host:port"

# Proxy rotation with health check
yt-transcript fR9ClX0egTc --proxy-list proxies.txt --health-check

# Batch processing with proxy rotation
yt-transcript --batch ids.txt --proxy-list proxies.txt --output-dir transcripts/
```

## 🔄 Proxy Support

### Single Proxy

```bash
# HTTP proxy with authentication
yt-transcript --proxy "http://username:password@proxy-host:8080" fR9ClX0egTc

# HTTPS proxy
yt-transcript --proxy "https://proxy-host:8443" fR9ClX0egTc

# SOCKS5 proxy
yt-transcript --proxy "socks5://user:pass@proxy-host:1080" fR9ClX0egTc
```

### Proxy Rotation

Load multiple proxies from a file and automatically rotate between them:

```bash
# Basic proxy rotation
yt-transcript --proxy-list proxies.txt fR9ClX0egTc

# With rotation strategy
yt-transcript --proxy-list proxies.txt --rotation-strategy round_robin fR9ClX0egTc

# With health check
yt-transcript --proxy-list proxies.txt --health-check fR9ClX0egTc
```

**Proxy List File Format** (`proxies.txt`):
```
Address Port Username Password
23.95.150.145 6114 mhzbhrwb yj2veiaafrbu
198.23.239.134 6540 mhzbhrwb yj2veiaafrbu
45.38.107.97 6014 mhzbhrwb yj2veiaafrbu
64.137.96.74 6641 mhzbhrwb yj2veiaafrbu
216.10.27.159 6837 mhzbhrwb yj2veiaafrbu
136.0.207.84 6661 mhzbhrwb yj2veiaafrbu
```
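
To make the column layout concrete, here is a minimal sketch of how such a file could be turned into proxy URLs of the form accepted by `--proxy`. It is not the library's parser: `parse_proxy_lines` is a hypothetical helper, and the `http` scheme, the header-row check, and the support for two-column rows without credentials are all assumptions.

```python
from urllib.parse import quote


def parse_proxy_lines(lines, scheme="http"):
    """Convert 'address port [username password]' rows into proxy URLs."""
    urls = []
    for raw in lines:
        parts = raw.split()
        # Skip blank lines and the 'Address Port Username Password' header row.
        if not parts or parts[0].lower() == "address":
            continue
        if len(parts) == 2:
            host, port = parts
            urls.append(f"{scheme}://{host}:{port}")
        elif len(parts) == 4:
            host, port, user, password = parts
            # Percent-encode credentials so characters like '@' or ':' stay unambiguous.
            auth = f"{quote(user, safe='')}:{quote(password, safe='')}"
            urls.append(f"{scheme}://{auth}@{host}:{port}")
        else:
            raise ValueError(f"unexpected proxy row: {raw!r}")
    return urls
```

For example, the row `192.0.2.10 8080 alice s3cret` becomes `http://alice:s3cret@192.0.2.10:8080`.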

**Rotation Strategies:**
- `random`: Random proxy selection (default)
- `round_robin`: Cycle through proxies in order
- `least_used`: Select least recently used proxy
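
The three strategies can be illustrated with a few lines of plain Python. This is a sketch of the selection logic only, under the assumption that `least_used` means "least recently used" as described above; `RotationSketch` is a hypothetical class and does not mirror `ProxyManager` internals.

```python
import random
from itertools import count


class RotationSketch:
    """Pick the next proxy using one of the three strategy names from this README."""

    STRATEGIES = ("random", "round_robin", "least_used")

    def __init__(self, proxies, strategy="random", rng=None):
        if not proxies:
            raise ValueError("at least one proxy is required")
        if strategy not in self.STRATEGIES:
            raise ValueError(f"unknown strategy: {strategy!r}")
        self.proxies = list(proxies)
        self.strategy = strategy
        self._rng = rng or random.Random()
        self._cursor = 0
        self._tick = count()
        # -1 marks a proxy that has never been used.
        self._last_used = {proxy: -1 for proxy in self.proxies}

    def next_proxy(self):
        if self.strategy == "random":
            choice = self._rng.choice(self.proxies)
        elif self.strategy == "round_robin":
            choice = self.proxies[self._cursor % len(self.proxies)]
            self._cursor += 1
        else:
            # least_used: the proxy with the oldest last-use tick wins.
            choice = min(self.proxies, key=self._last_used.__getitem__)
        self._last_used[choice] = next(self._tick)
        return choice
```

With `round_robin` and proxies `a`, `b`, `c`, successive calls return `a`, `b`, `c`, `a`, and so on.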

### Python Proxy Usage

```python
from yt_ts_extract import (
    ProxyManager,
    YouTubeTranscriptExtractor,
    get_transcript_with_proxy_rotation,
)

# Single proxy
extractor = YouTubeTranscriptExtractor(
    proxy="http://user:pass@host:port",
    timeout=30,
    max_retries=3,
)

# Proxy rotation
proxy_manager = ProxyManager.from_file("proxies.txt", rotation_strategy="round_robin")
extractor = YouTubeTranscriptExtractor(
    proxy_manager=proxy_manager,
    timeout=30,
    max_retries=3,
)

# Convenience function with proxy rotation
transcript = get_transcript_with_proxy_rotation("fR9ClX0egTc", "proxies.txt")
```

**Proxy Best Practices:**
- Use `--health-check` to verify proxy connectivity before processing
- Failed proxies are automatically deactivated and reactivated after cooldown
- Each proxy respects minimum delay between requests
- Monitor proxy health with `extractor.get_proxy_stats()`

## 📊 Output Formats

### 1. Plain Text (`text`)
```
Hello everyone and welcome to this tutorial.
In this video we'll be covering the basics of...
```

### 2. SRT Subtitles (`srt`)
```
1
00:00:00,000 --> 00:00:03,200
Hello everyone and welcome to this tutorial.

2
00:00:03,200 --> 00:00:07,840
In this video we'll be covering the basics of...
```

### 3. Timestamped Segments (`segments`)
```json
[
  {
    "start": 0.0,
    "end": 3.2,
    "duration": 3.2,
    "text": "Hello everyone and welcome to this tutorial."
  },
  {
    "start": 3.2,
    "end": 7.84,
    "duration": 4.64,
    "text": "In this video we'll be covering the basics of..."
  }
]
```
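
The relationship between the `segments` and `srt` formats is mostly a matter of timestamp formatting. The sketch below shows one way to derive SRT text from segments shaped like the JSON above. The library ships its own `export_to_srt`; the helper names here (`format_srt_time`, `segments_to_srt`) are hypothetical and only illustrate the conversion.

```python
def format_srt_time(seconds: float) -> str:
    """Format seconds as an SRT timestamp, HH:MM:SS,mmm."""
    millis = round(seconds * 1000)
    hours, millis = divmod(millis, 3_600_000)
    minutes, millis = divmod(millis, 60_000)
    secs, millis = divmod(millis, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def segments_to_srt(segments) -> str:
    """Render segments with 'start', 'end' and 'text' keys as numbered SRT cues."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        start = format_srt_time(seg["start"])
        end = format_srt_time(seg["end"])
        blocks.append(f"{index}\n{start} --> {end}\n{seg['text']}\n")
    # A blank line separates consecutive cues.
    return "\n".join(blocks)
```

Applied to the two sample segments, this yields the cues shown in the SRT example.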

### 4. Statistics (`stats`)
```json
{
  "total_segments": 245,
  "total_duration": 1823.4,
  "word_count": 2156,
  "average_words_per_segment": 8.8,
  "languages_available": ["en", "es", "fr", "de"]
}
```
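
For readers who want to see where such numbers could come from, here is a minimal sketch that derives the first four fields from a list of segments. How the library actually computes them is not documented here, so treat the definitions (whitespace-separated words, duration taken from the latest `end`) as assumptions; `summarize_segments` is a hypothetical name.

```python
def summarize_segments(segments):
    """Compute summary figures using the key names from the sample output above."""
    count = len(segments)
    # Assumption: a "word" is any whitespace-separated token.
    word_count = sum(len(seg["text"].split()) for seg in segments)
    # Assumption: total duration is the end time of the last-ending segment.
    total_duration = max((seg["end"] for seg in segments), default=0.0)
    average = round(word_count / count, 1) if count else 0.0
    return {
        "total_segments": count,
        "total_duration": round(total_duration, 2),
        "word_count": word_count,
        "average_words_per_segment": average,
    }
```

The sample figures above are consistent with this definition of the average: 2156 words over 245 segments rounds to 8.8.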

## 🌍 Language Support

Supports 26+ languages with automatic detection. Commonly used codes include:

| Language | Code | Language | Code |
|----------|------|----------|------|
| English | `en` | Spanish | `es` |
| French | `fr` | German | `de` |
| Italian | `it` | Portuguese | `pt` |
| Russian | `ru` | Japanese | `ja` |
| Korean | `ko` | Chinese (Simplified) | `zh-Hans` |
| Chinese (Traditional) | `zh-Hant` | Arabic | `ar` |
| Hindi | `hi` | Dutch | `nl` |
| Polish | `pl` | Turkish | `tr` |

Use `--list-languages` to see available languages for any video.
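
This README does not spell out what happens when the requested language is not available for a video. If you want explicit control, one plausible policy is to pick from the listed codes yourself before requesting a transcript. The sketch below is an assumption about a reasonable fallback order (exact match, same base language, default, first available), not the library's behavior; `pick_language` is a hypothetical helper.

```python
def pick_language(available, preferred, default="en"):
    """Choose a language code from `available`, falling back step by step."""
    by_lower = {code.lower(): code for code in available}
    for wanted in (preferred, default):
        wanted = wanted.lower()
        # 1. Exact match, ignoring case.
        if wanted in by_lower:
            return by_lower[wanted]
        # 2. Same base language, e.g. 'zh' matches 'zh-Hans'.
        base = wanted.split("-")[0]
        for lower, code in by_lower.items():
            if lower.split("-")[0] == base:
                return code
    # 3. Nothing matched: take the first listed code, if any.
    return available[0] if available else None
```

The list of codes could come from `get_available_languages`, as in the Quick Start example.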

## 🔧 Advanced Usage

### Batch Processing

Create an `ids.txt` file (one video ID per line):
```
fR9ClX0egTc
9bZkp7q19f0
wIwCTQZ_xFE
```

Process all videos:
```bash
yt-transcript --batch ids.txt --format srt --output-dir ./subtitles/
```
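
If you build `ids.txt` programmatically, it can help to validate and de-duplicate the IDs before handing the file to `--batch`. The following is a sketch with a hypothetical helper, `read_video_ids`; the 11-character ID pattern is an assumption about the current YouTube ID format, and the CLI may apply different checks.

```python
import re

# Assumption: video IDs are 11 characters from the URL-safe base64 alphabet.
_VIDEO_ID = re.compile(r"[A-Za-z0-9_-]{11}")


def read_video_ids(text):
    """Split file contents into (valid_ids, rejected_lines), keeping first-seen order."""
    valid, rejected, seen = [], [], set()
    for line in text.splitlines():
        candidate = line.strip()
        if not candidate:
            continue  # ignore blank lines
        if not _VIDEO_ID.fullmatch(candidate):
            rejected.append(candidate)
        elif candidate not in seen:
            seen.add(candidate)
            valid.append(candidate)
    return valid, rejected
```

Writing `"\n".join(valid)` back to disk gives a clean one-ID-per-line file.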

### Search Within Transcripts

```bash
# Find mentions of specific topics
yt-transcript --search "machine learning" VIDEO_ID
```
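
The same kind of search can be done in Python on a list of segments. This sketch assumes a simple case-insensitive substring match per segment; how `--search` matches internally is not documented here, and `search_segments` is a hypothetical helper.

```python
def search_segments(segments, query):
    """Return (start_seconds, text) pairs for segments containing the query."""
    needle = query.casefold()
    return [
        (seg["start"], seg["text"])
        for seg in segments
        if needle in seg["text"].casefold()
    ]
```

Because matching is per segment, a phrase that is split across two consecutive segments will not be found by this approach.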

### Advanced Python Features

```python
from yt_ts_extract import YouTubeTranscriptExtractor
from yt_ts_extract.utils import get_transcript_stats

extractor = YouTubeTranscriptExtractor()

# Get timestamped segments for an ID
segments = extractor.get_transcript("fR9ClX0egTc", language="en")
for seg in segments[:5]:
    print(f"{seg['start']:.1f}s: {seg['text']}")

# Get statistics about the transcript
stats = get_transcript_stats(segments)
print(f"Duration: {stats['duration_seconds']:.1f} seconds")
print(f"Word count: {stats['word_count']} words")
```

## 🏗️ Technical Architecture

### Android Client Implementation
The extractor uses Android YouTube client headers to bypass anti-bot measures:

```http
User-Agent: com.google.android.youtube/20.10.38 (Linux; U; Android 14) gzip
X-YouTube-Client-Name: 3
X-YouTube-Client-Version: 20.10.38
Content-Type: application/json
```

### Dual XML Parser System
- **Legacy format**: Direct XML transcript data
- **Current format**: API-based JSON responses with embedded XML
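
As an illustration of the XML side, the sketch below parses a transcript document into the segment shape used elsewhere in this README. The element and attribute names (`<text start="..." dur="...">`) are an assumption about what the XML looks like; `parse_transcript_xml` is a hypothetical function, not the library's parser.

```python
import html
import xml.etree.ElementTree as ET


def parse_transcript_xml(xml_text):
    """Parse <text start=".." dur="..">..</text> elements into segment dicts."""
    root = ET.fromstring(xml_text)
    segments = []
    for node in root.iter("text"):
        start = float(node.get("start", 0.0))
        duration = float(node.get("dur", 0.0))
        # Captions are sometimes entity-encoded twice, so unescape once more.
        text = html.unescape("".join(node.itertext())).strip()
        if not text:
            continue
        segments.append(
            {
                "start": start,
                "end": round(start + duration, 3),
                "duration": duration,
                "text": text,
            }
        )
    return segments
```

When the XML arrives embedded in a JSON response, the same function can be applied to the extracted string.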

### Proxy Architecture
- **Rotation strategies**: random, round_robin, least_used
- **Health monitoring**: Automatic health checks and failed proxy deactivation
- **Recovery**: Reactivation after cooldown periods
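
The deactivate-then-recover cycle can be sketched in a few lines. The failure threshold, the cooldown length, and the class name `ProxyHealthSketch` are all assumptions made for illustration; the library's actual thresholds and bookkeeping may differ.

```python
import time


class ProxyHealthSketch:
    """Track failures per proxy and sideline a proxy for a cooldown period."""

    def __init__(self, max_failures=3, cooldown=300.0, clock=time.monotonic):
        self.max_failures = max_failures
        self.cooldown = cooldown
        self._clock = clock  # injectable so the behavior can be tested without waiting
        self._failures = {}
        self._disabled_until = {}

    def record_failure(self, proxy):
        self._failures[proxy] = self._failures.get(proxy, 0) + 1
        if self._failures[proxy] >= self.max_failures:
            # Too many failures: deactivate until the cooldown has passed.
            self._disabled_until[proxy] = self._clock() + self.cooldown

    def record_success(self, proxy):
        self._failures.pop(proxy, None)
        self._disabled_until.pop(proxy, None)

    def is_active(self, proxy):
        until = self._disabled_until.get(proxy)
        if until is None:
            return True
        if self._clock() >= until:
            # Cooldown elapsed: reactivate with a clean failure count.
            self.record_success(proxy)
            return True
        return False
```

A rotation strategy would then consider only proxies for which `is_active` returns `True`.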

### Error Handling & Recovery
- **Exponential backoff**: Prevents overwhelming servers during failures
- **Retry mechanisms**: Configurable retry logic with circuit breaking
- **Graceful degradation**: Falls back to alternative extraction methods
- **Rate limiting**: Built-in delays prevent IP-based blocking
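
To show how the `--retries`, `--backoff` and `--min-delay` settings could interact, here is a minimal retry loop. The delay formula (`backoff_factor * 2 ** attempt`, floored at `min_delay`) is an assumption, not the library's documented formula, and `backoff_delays` and `call_with_retries` are hypothetical helpers. The defaults mirror the CLI defaults listed above.

```python
import time


def backoff_delays(max_retries, backoff_factor=0.75, min_delay=2.0):
    """Delays before each retry: exponential growth, never below min_delay."""
    return [max(min_delay, backoff_factor * 2 ** attempt) for attempt in range(max_retries)]


def call_with_retries(func, max_retries=3, backoff_factor=0.75, min_delay=2.0,
                      retry_on=(Exception,), sleep=time.sleep):
    """Call func(), retrying up to max_retries times with growing pauses."""
    delays = backoff_delays(max_retries, backoff_factor, min_delay)
    for attempt in range(max_retries + 1):
        try:
            return func()
        except retry_on:
            if attempt == max_retries:
                raise  # retries exhausted: surface the last error
            sleep(delays[attempt])
```

In real use, `retry_on` would be narrowed to network-related exceptions so that programming errors are not retried.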

## 🧪 Testing

```bash
# Run all tests
uv run pytest

# Run with coverage
uv run pytest --cov=yt_ts_extract --cov-report=term-missing

# Run specific test suites
uv run pytest tests/test_proxy_manager.py -v
uv run pytest tests/test_e2e_proxy.py -v
```

### Test Categories
- **Unit tests**: Individual component testing
- **Integration tests**: CLI and API integration testing  
- **E2E tests**: Full workflow testing with real YouTube videos
- **Proxy tests**: Proxy rotation and health check testing
- **Network resilience**: Timeout and retry behavior testing

## 🤝 Contributing

1. Fork the repository
2. Create a feature branch: `git checkout -b feature/amazing-feature`
3. Make your changes and add tests
4. Run the test suite: `uv run pytest`
5. Submit a pull request

### Development Setup
```bash
git clone https://github.com/sinjab/yt-ts-extract.git
cd yt-ts-extract
uv sync  # Install dependencies
uv run pytest  # Run tests
```

## 📄 License

This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.

## 🆘 Support

- **Issues**: [GitHub Issues](https://github.com/sinjab/yt-ts-extract/issues)
- **Documentation**: This README and inline code documentation
- **Examples**: Check the `examples/` directory for usage patterns

---

**Made with ❤️ for the developer community. Happy transcript extracting!**
            

Raw data

            {
    "_id": null,
    "home_page": null,
    "name": "yt-ts-extract",
    "maintainer": null,
    "docs_url": null,
    "requires_python": ">=3.8",
    "maintainer_email": null,
    "keywords": "automation, captions, cli, extract, multilanguage, srt, subtitles, transcript, video, youtube",
    "author": null,
    "author_email": "Khaldoon Sinjab <sinjab@gmail.com>",
    "download_url": "https://files.pythonhosted.org/packages/5b/50/aa40da1178ed298d79a2daf940369798aedb5c70123e6aca2500c06f2968/yt_ts_extract-1.0.0.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "MIT License  Copyright (c) 2025 yt-ts-extract  Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the \"Software\"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:  The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.  THE SOFTWARE IS PROVIDED \"AS IS\", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.",
    "summary": "Extract transcripts from YouTube videos with multi-language support and various output formats",
    "version": "1.0.0",
    "project_urls": {
        "Documentation": "https://github.com/sinjab/yt-ts-extract#readme",
        "Homepage": "https://github.com/sinjab/yt-ts-extract",
        "Issues": "https://github.com/sinjab/yt-ts-extract/issues",
        "Repository": "https://github.com/sinjab/yt-ts-extract"
    },
    "split_keywords": [
        "automation",
        " captions",
        " cli",
        " extract",
        " multilanguage",
        " srt",
        " subtitles",
        " transcript",
        " video",
        " youtube"
    ],
    "urls": [
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "9271786d412d6374b294c85fa77349fa351069035abc4085ccf7d2be81a6737a",
                "md5": "bc06f83224298d52c9bf0b96b1aefcc4",
                "sha256": "ef4747edba10b629a1d1a1471532e7b0de97b331b5b201df25759c4facb027df"
            },
            "downloads": -1,
            "filename": "yt_ts_extract-1.0.0-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "bc06f83224298d52c9bf0b96b1aefcc4",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.8",
            "size": 28017,
            "upload_time": "2025-08-24T11:23:55",
            "upload_time_iso_8601": "2025-08-24T11:23:55.820202Z",
            "url": "https://files.pythonhosted.org/packages/92/71/786d412d6374b294c85fa77349fa351069035abc4085ccf7d2be81a6737a/yt_ts_extract-1.0.0-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "5b50aa40da1178ed298d79a2daf940369798aedb5c70123e6aca2500c06f2968",
                "md5": "00526744d5770acb92ee27d619d84424",
                "sha256": "6df9f8000f76bc89868694584c5a23c6db54a34a85488d2e3adec22096d83ee5"
            },
            "downloads": -1,
            "filename": "yt_ts_extract-1.0.0.tar.gz",
            "has_sig": false,
            "md5_digest": "00526744d5770acb92ee27d619d84424",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.8",
            "size": 45906,
            "upload_time": "2025-08-24T11:23:57",
            "upload_time_iso_8601": "2025-08-24T11:23:57.296806Z",
            "url": "https://files.pythonhosted.org/packages/5b/50/aa40da1178ed298d79a2daf940369798aedb5c70123e6aca2500c06f2968/yt_ts_extract-1.0.0.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2025-08-24 11:23:57",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "sinjab",
    "github_project": "yt-ts-extract#readme",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": true,
    "lcname": "yt-ts-extract"
}
        