parakeet-stream

| Field | Value |
|-------|-------|
| Name | parakeet-stream |
| Version | 0.5.0 |
| Summary | Simple, powerful streaming transcription for Python using NVIDIA's Parakeet TDT 0.6b |
| Upload time | 2025-10-13 02:11:01 |
| Author | Maxime Rivest |
| Requires Python | <3.14,>=3.9 |
| License | MIT |
| Keywords | asr nemo nvidia parakeet speech-recognition streaming transcription |

# Parakeet Stream

**Simple, powerful streaming transcription for Python using NVIDIA's Parakeet TDT 0.6b**

A modern Python library with a beautiful REPL-friendly API for audio transcription, featuring instant quality tuning, live microphone support, and rich interactive displays.

## ✨ Features

- 🎯 **Simple & Intuitive** - Beautiful API designed for interactive use
- 🎨 **Rich Displays** - Gorgeous output in Python REPL, IPython, and Jupyter notebooks
- ⚡ **Instant Quality Tuning** - Switch between 6 quality presets without reloading the model
- 🎤 **Live Transcription** - Real-time microphone transcription with one line of code
- 🌊 **Streaming Support** - Process audio in chunks with configurable latency
- 💻 **CPU Optimized** - Efficient inference on CPU (GPU optional)
- 🌍 **25 Languages** - Automatic language detection
- 📦 **Batch Processing** - Transcribe multiple files efficiently
- ⏱️ **Timestamps** - Optional word-level timestamps

## 🚀 Installation

### Quick Install

```bash
# Install with pip
pip install git+https://github.com/maximerivest/parakeet-stream.git

# Or with uv (recommended)
uv pip install git+https://github.com/maximerivest/parakeet-stream.git

# With microphone support
pip install "parakeet-stream[microphone] @ git+https://github.com/maximerivest/parakeet-stream.git"
```

### Install from Source

```bash
git clone https://github.com/maximerivest/parakeet-stream.git
cd parakeet-stream

# Install with uv
uv pip install -e .

# Or with pip
pip install -e .

# With microphone support
uv pip install -e ".[microphone]"
```

### Requirements

- Python 3.9-3.13
- 2GB+ RAM (4GB+ recommended)
- Any modern CPU (GPU optional)

**Note**: Python 3.13 support requires `ml-dtypes>=0.5.0` which is automatically installed as a dependency.

## 📖 Quick Start

### Basic Transcription

```python
from parakeet_stream import Parakeet

# Initialize (loads model with clean progress bar)
pk = Parakeet()

# Transcribe an audio file
result = pk.transcribe("audio.wav")
print(result.text)
```

The model loads immediately on initialization with a clean progress bar (no verbose logging). The first run takes 3-5 minutes because it downloads ~600MB from Hugging Face; subsequent runs load from cache in about 5 seconds.

### Live Microphone Transcription

```python
from parakeet_stream import Parakeet

# Initialize transcriber
pk = Parakeet()

# Start live transcription (silent mode - no console output)
live = pk.listen()

# Speak into microphone...
# Transcription happens silently in background

# Access transcript
print(live.text)  # Get current text
print(live.transcript.stats)  # Get statistics

# Stop and get results
live.stop()
print(live.transcript.text)

# Verbose mode - prints transcriptions to console
live = pk.listen(verbose=True)
# [2.5s] Hello world
# [4.6s] This is a test
```

### Quality/Latency Tuning

Switch between quality presets instantly - **no model reload needed**!

```python
from parakeet_stream import Parakeet

pk = Parakeet()

# Try different quality levels (no reload!)
pk.with_quality('max').transcribe("audio.wav")      # ●●●●● (15s latency)
pk.with_quality('high').transcribe("audio.wav")     # ●●●●○ (10s latency)
pk.with_quality('good').transcribe("audio.wav")     # ●●●○○ (4s latency)
pk.with_quality('low').transcribe("audio.wav")      # ●●○○○ (2s latency)
pk.with_quality('realtime').transcribe("audio.wav") # ●○○○○ (1s latency)

# Or use preset names
pk.with_config('balanced').transcribe("audio.wav")
pk.with_config('low_latency').transcribe("audio.wav")
```

### Streaming Transcription

Process long audio files in chunks:

```python
from parakeet_stream import Parakeet

pk = Parakeet()

# Stream transcription results as they become available
for chunk in pk.stream("long_audio.wav"):
    print(f"[{chunk.timestamp_start:.1f}s]: {chunk.text}")
    if chunk.is_final:
        print(f"✓ Final: {chunk.text}")
```
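
To make "in chunks" concrete, here is a minimal sketch of how a long recording could be divided into chunks that each carry some surrounding context. It is an illustration only, not the library's implementation: `Window` and `plan_windows` are hypothetical names, and the default values are the `balanced` preset numbers from the Configuration Guide below.

```python
from dataclasses import dataclass


@dataclass
class Window:
    """One decoding window in seconds (hypothetical helper, not a library type)."""
    context_start: float  # where the left context begins
    chunk_start: float    # start of the audio being transcribed
    chunk_end: float      # end of the audio being transcribed
    context_end: float    # where the right context (lookahead) ends


def plan_windows(duration, chunk_secs=2.0, left_context_secs=10.0, right_context_secs=2.0):
    """Split `duration` seconds of audio into chunks, clipping context to the file."""
    windows = []
    start = 0.0
    while start < duration:
        end = min(start + chunk_secs, duration)
        windows.append(Window(
            context_start=max(0.0, start - left_context_secs),
            chunk_start=start,
            chunk_end=end,
            context_end=min(duration, end + right_context_secs),
        ))
        start = end
    return windows


for window in plan_windows(duration=7.0):
    print(window)
```

Each chunk can only be finalized once its right context has arrived, which is why lookahead adds to latency while left context does not.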

### Microphone Features

```python
from parakeet_stream import Parakeet, Microphone

pk = Parakeet()

# Test ALL microphones automatically (recommended!)
results = Microphone.test_all(pk)
# Shows test phrase for you to read
# Tests each microphone with the same phrase
# Ranks by quality and recommends best one
# You can play back any recording: results[0].clip.play()

# Use the best microphone
best_mic = results[0].microphone
live = pk.listen(microphone=best_mic)

# Or manually discover and test
mics = Microphone.discover()
for mic in mics:
    print(mic)
# 🎤 Microphone 0: Built-in Microphone
# 🎤 Microphone 1: USB Microphone

# Test a specific microphone
mic = Microphone(device=1)
test_result = mic.test(pk)
# Shows random test phrase
# Records, transcribes, and evaluates quality
# Returns detailed metrics: match score, confidence, audio level

# Record audio
clip = mic.record(duration=5.0)
clip.play()  # Playback
clip.save("recording.wav")  # Save to file
```

### Batch Processing

```python
from parakeet_stream import Parakeet

pk = Parakeet()

# Transcribe multiple files with progress bar
audio_files = ["file1.wav", "file2.wav", "file3.wav"]
results = pk.transcribe_batch(audio_files, show_progress=True)

for file, result in zip(audio_files, results):
    print(f"{file}: {result.text}")
```

## 🎛️ Configuration Guide

### Quality Presets

Parakeet Stream includes 6 carefully tuned presets for different use cases:

| Preset | Quality | Latency | Use Case |
|--------|---------|---------|----------|
| `maximum_quality` | ●●●●● | ~15s | Offline transcription, highest accuracy |
| `high_quality` | ●●●●○ | ~10s | Long audio files, near-perfect quality |
| `balanced` | ●●●○○ | ~4s | **Default** - Great quality, acceptable latency |
| `low_latency` | ●●○○○ | ~2s | Interactive applications |
| `realtime` | ●○○○○ | ~1s | Live conversations, minimal delay |
| `ultra_realtime` | ●○○○○ | ~0.3s | Experimental ultra-low latency |

```python
from parakeet_stream import Parakeet

# Use preset at initialization
pk = Parakeet(config='balanced')

# Or change on the fly (no reload!)
pk.with_config('high_quality')

# Access preset information
from parakeet_stream import ConfigPresets

print(ConfigPresets.list())
# ['maximum_quality', 'high_quality', 'balanced', 'low_latency', 'realtime', 'ultra_realtime']

print(ConfigPresets.BALANCED)
# balanced:
#   Chunk: 2.0s | Left: 10.0s | Right: 2.0s
#   Latency: ~4.0s | Quality: ●●●○○
```
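
If you start from a latency budget rather than a preset name, the table above can be turned into a small lookup. This is a sketch under stated assumptions: `PRESET_LATENCY` simply copies the approximate latencies from the table, and `pick_preset` is a hypothetical helper, not part of parakeet-stream.

```python
# Approximate latencies in seconds, copied from the preset table above.
PRESET_LATENCY = {
    "maximum_quality": 15.0,
    "high_quality": 10.0,
    "balanced": 4.0,
    "low_latency": 2.0,
    "realtime": 1.0,
    "ultra_realtime": 0.3,
}


def pick_preset(max_latency_secs):
    """Return the highest-latency (highest-quality) preset that fits the budget."""
    candidates = {
        name: latency
        for name, latency in PRESET_LATENCY.items()
        if latency <= max_latency_secs
    }
    if not candidates:
        raise ValueError(f"no preset has latency <= {max_latency_secs}s")
    return max(candidates, key=candidates.get)


print(pick_preset(5.0))   # balanced
print(pick_preset(1.5))   # realtime
```

The returned name can then be passed to `pk.with_config(...)`.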

### Custom Parameters

Fine-tune parameters for specific needs:

```python
from parakeet_stream import Parakeet

pk = Parakeet()

# Adjust individual parameters
pk.with_params(
    chunk_secs=3.0,           # Process in 3-second chunks
    left_context_secs=15.0,   # More context for better quality
    right_context_secs=1.5    # Less lookahead for lower latency
)

result = pk.transcribe("audio.wav")
```

**Understanding Parameters:**

- **chunk_secs**: Size of each processing chunk (affects latency)
- **left_context_secs**: Context from previous audio (improves quality)
- **right_context_secs**: Context from future audio (affects latency)

**Latency Formula**: `latency = chunk_secs + right_context_secs`
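
As a quick check of the formula, the snippet below evaluates it for the `balanced` preset values and for the custom parameters shown above. `theoretical_latency` is a hypothetical helper written for this example, not a library function.

```python
def theoretical_latency(chunk_secs, right_context_secs):
    """Latency formula from this README: chunk size plus lookahead, in seconds."""
    return chunk_secs + right_context_secs


# Balanced preset: 2.0s chunk, 2.0s right context.
print(theoretical_latency(2.0, 2.0))   # 4.0

# Custom parameters from the example above: 3.0s chunk, 1.5s right context.
print(theoretical_latency(3.0, 1.5))   # 4.5
```

Note that `left_context_secs` does not appear in the formula: past audio is already available, so adding more of it costs compute but not waiting time.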

### Device Selection

```python
from parakeet_stream import Parakeet

# CPU (default) - works everywhere
pk = Parakeet(device="cpu")

# NVIDIA GPU - 5-10x faster
pk = Parakeet(device="cuda")

# Apple Silicon (M1/M2/M3/M4)
pk = Parakeet(device="mps")
```

### Lazy Loading

By default, models load immediately (eager loading). For advanced use cases:

```python
from parakeet_stream import Parakeet

# Delay model loading
pk = Parakeet(lazy=True)

# Model loads on first use
result = pk.transcribe("audio.wav")

# Or load manually
pk.load()
```

## 🎨 Rich REPL Experience

Parakeet Stream provides beautiful displays in interactive environments:

### Python REPL

```python
>>> from parakeet_stream import Parakeet
>>> pk = Parakeet()

Loading nvidia/parakeet-tdt-0.6b-v3 on cpu...
Loading model:  20%|████████          | 1/5
Moving to device:  40%|████████████████          | 2/5
Configuring streaming:  60%|████████████████████████          | 3/5
Setting up decoder:  80%|████████████████████████████████          | 4/5
Computing context: 100%|████████████████████████████████████████| 5/5
✓ Ready! (nvidia/parakeet-tdt-0.6b-v3 on cpu)

>>> pk
Parakeet(model='nvidia/parakeet-tdt-0.6b-v3', device='cpu', config='balanced', status='ready')
```

### IPython

```python
In [1]: from parakeet_stream import Parakeet
In [2]: pk = Parakeet()
In [3]: pk
Out[3]:
Parakeet(model='nvidia/parakeet-tdt-0.6b-v3', device='cpu')
  Quality: ●●●○○ (balanced)
  Latency: ~4.0s
  Status: ✓ Ready

In [4]: result = pk.transcribe("audio.wav")
In [5]: result
Out[5]:
📝 This is a sample transcription
   Confidence: 95% ●●●●●
   Duration: 5.2s
```

### Jupyter Notebooks

Results display as styled HTML tables with rich formatting.

### Explore Configuration

```python
>>> from parakeet_stream import ConfigPresets
>>> ConfigPresets.list()
['maximum_quality', 'high_quality', 'balanced', 'low_latency', 'realtime', 'ultra_realtime']

>>> ConfigPresets.BALANCED
AudioConfig(name='balanced', latency=4.0s, quality=●●●○○)

>>> print(ConfigPresets.list_with_details())
Available Configuration Presets:

  balanced:
    Chunk: 2.0s | Left: 10.0s | Right: 2.0s
    Latency: ~4.0s | Quality: ●●●○○

  high_quality:
    Chunk: 5.0s | Left: 10.0s | Right: 5.0s
    Latency: ~10.0s | Quality: ●●●●○
  ...
```

## 🎤 Microphone Quality Testing

Not sure which microphone to use? Test them all automatically!

### Test All Microphones

```python
from parakeet_stream import Parakeet, Microphone

pk = Parakeet()

# Automatically test all microphones
results = Microphone.test_all(pk)
```

**What it does:**
1. Discovers all available microphones
2. Shows you a test phrase to read
3. Records from each microphone (same phrase for fair comparison)
4. Transcribes and evaluates quality
5. Detects silent/broken microphones
6. Ranks by quality score (transcription accuracy + confidence)
7. Recommends the best one

**Output:**
```
============================================================
🎤 MICROPHONE QUALITY TEST
============================================================

🔍 Discovering microphones...
✓ Found 3 microphone(s):
   1. Built-in Microphone (device 0)
   2. USB Microphone (device 1)
   3. Bluetooth Headset (device 2)

📝 Test phrase (same for all microphones):

   "Speech recognition technology continues to improve every year"

We'll now test each microphone. Press Enter to start...

... tests each mic ...

============================================================
📊 RESULTS SUMMARY
============================================================

Ranking (Best to Worst):

1. ✓ USB Microphone
   Device: 1
   Quality: [████████████████    ] 82.3%
   Match:   85.0%
   Confidence: 92% ●●●●●
   Audio Level: 0.0523
   Transcribed: "speech recognition technology continues to improve..."

2. ✓ Built-in Microphone
   Device: 0
   Quality: [███████████         ] 65.4%
   Match:   70.0%
   Confidence: 85% ●●●●○
   Audio Level: 0.0312

3. ✗ Bluetooth Headset
   Device: 2
   Quality: [                    ] 0.0%
   Match:   0.0%
   Audio Level: 0.0001
   ⚠️  No audio detected

────────────────────────────────────────────────────────
🏆 RECOMMENDATION
────────────────────────────────────────────────────────

Best microphone: USB Microphone
Device index: 1
Quality score: 82.3%

To use this microphone:
>>> mic = Microphone(device=1)
>>> live = pk.listen(microphone=mic)

============================================================
Tip: You can replay any recording:
>>> results[0].clip.play()  # Play best mic's recording
============================================================
```
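
The README does not spell out how the match score is computed. As a rough illustration of the idea, the sketch below compares the expected phrase with each transcription word by word using the standard library's `difflib` and ranks microphones by that score. The metric, the `normalize` and `match_score` helpers, and the sample transcriptions are all assumptions made for this example; the library's actual scoring may differ.

```python
import re
from difflib import SequenceMatcher


def normalize(text):
    """Lowercase the text and keep only its words, so punctuation is ignored."""
    return re.findall(r"[\w']+", text.lower())


def match_score(expected, transcribed):
    """Word-level similarity in [0, 1] (illustrative, not the library's metric)."""
    return SequenceMatcher(None, normalize(expected), normalize(transcribed)).ratio()


expected = "Speech recognition technology continues to improve every year"

# Made-up transcriptions for the three devices in the sample output above.
attempts = {
    "USB Microphone": "speech recognition technology continues to improve every year",
    "Built-in Microphone": "speech recognition technology continues to prove every ear",
    "Bluetooth Headset": "",
}

ranked = sorted(attempts, key=lambda name: match_score(expected, attempts[name]), reverse=True)
for name in ranked:
    print(f"{name}: {match_score(expected, attempts[name]):.2f}")
```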

### Access Test Results

```python
# Get results
results = Microphone.test_all(pk)

# Use best microphone
best = results[0]
print(f"Best: {best.microphone.name}")
print(f"Quality: {best.quality_score:.1%}")

# Play back recordings
best.clip.play()

# See what was transcribed
print(f"Expected: {best.expected_text}")
print(f"Got: {best.transcribed_text}")

# Check metrics
print(f"Match: {best.match_score:.1%}")
print(f"Confidence: {best.confidence:.1%}")
print(f"Audio level (RMS): {best.rms_level:.4f}")

# Start live transcription with best mic
live = pk.listen(microphone=best.microphone)
```

### Test Single Microphone

```python
pk = Parakeet()
mic = Microphone(device=1)

# Test with random phrase
result = mic.test(pk, duration=5.0)
# Shows phrase, records, transcribes, evaluates

# Test with specific phrase
result = mic.test(pk, phrase="Hello world", duration=3.0)

# Skip playback (faster)
result = mic.test(pk, playback=False)
```

## 🎯 Live Transcription Deep Dive

### Basic Usage

```python
from parakeet_stream import Parakeet

pk = Parakeet()

# Silent mode (default) - no console output
live = pk.listen()

# Transcription runs in background
# Check current transcript
print(live.text)

# Get statistics
print(live.transcript.stats)
# {'segments': 15, 'duration': 45.2, 'words': 234, 'avg_confidence': 0.94}

# Control the transcription session
live.pause()   # Pause transcription
live.resume()  # Resume transcription
live.stop()    # Stop completely

# Verbose mode - prints to console
live = pk.listen(verbose=True)
# 🎤 Listening on: Built-in Microphone
#    (Press Ctrl+C or call .stop() to end)
# [2.5s] Hello world
# [4.6s] This is a test
```

### Save to File

```python
pk = Parakeet()

# Transcription automatically saved to file
live = pk.listen(output="transcript.txt")

# Stop and save complete transcript
live.stop()
live.transcript.save("transcript.json")  # Save with metadata
```

### Custom Microphone

```python
from parakeet_stream import Parakeet, Microphone

# Use specific microphone
mic = Microphone(device=1)  # USB microphone

pk = Parakeet()
live = pk.listen(microphone=mic)
```

### Access Segments

```python
live = pk.listen()

# Wait for some transcription...

# Get all segments
for segment in live.transcript.segments:
    print(f"[{segment.start_time:.1f}s - {segment.end_time:.1f}s] {segment.text}")

# Get last 5 segments
recent = live.transcript.tail(5)

# Get first 5 segments
beginning = live.transcript.head(5)
```

## 📚 API Reference

### Parakeet

Main interface for transcription.

```python
Parakeet(
    model_name: str = "nvidia/parakeet-tdt-0.6b-v3",
    device: str = "cpu",
    config: Union[str, AudioConfig] = "balanced",
    lazy: bool = False
)
```

**Methods:**

- `transcribe(audio, timestamps=False)` → `TranscriptResult`
  - Transcribe audio file or array

- `stream(audio)` → `Generator[StreamChunk]`
  - Stream transcription results as chunks

- `transcribe_batch(audio_files, timestamps=False, show_progress=True)` → `List[TranscriptResult]`
  - Batch transcribe multiple files

- `listen(microphone=None, output=None, chunk_duration=None, verbose=False)` → `LiveTranscriber`
  - Start live microphone transcription (silent by default)

**Configuration Methods (Chainable):**

- `with_config(config)` → `Parakeet`
  - Set configuration preset or custom AudioConfig

- `with_quality(level)` → `Parakeet`
  - Set quality level: 'max', 'high', 'good', 'low', 'realtime'

- `with_latency(level)` → `Parakeet`
  - Set latency level: 'high', 'medium', 'low', 'realtime'

- `with_params(chunk_secs=None, left_context_secs=None, right_context_secs=None)` → `Parakeet`
  - Set custom parameters

**Properties:**

- `config` - Current AudioConfig
- `configs` - Access to ConfigPresets

### TranscriptResult

Rich result object from transcription.

**Attributes:**
- `text` (str) - Transcribed text
- `confidence` (float) - Confidence score (0.0-1.0)
- `duration` (float) - Audio duration in seconds
- `timestamps` (List[dict]) - Word-level timestamps (if enabled)
- `word_count` (int) - Number of words
- `has_timestamps` (bool) - Whether timestamps are available

### LiveTranscriber

Background live transcription manager.

Runs silently by default - transcription happens in background without console output.
Use `verbose=True` to print transcriptions to console.

**Methods:**

- `start()` - Start transcription (called automatically by `pk.listen()`)
- `pause()` - Pause transcription
- `resume()` - Resume transcription
- `stop()` - Stop transcription

**Properties:**

- `text` (str) - Current full transcript
- `transcript` (TranscriptBuffer) - Buffer with all segments
- `is_running` (bool) - Whether currently running
- `is_paused` (bool) - Whether currently paused
- `elapsed` (float) - Elapsed time in seconds
- `verbose` (bool) - Whether console output is enabled

### TranscriptBuffer

Thread-safe buffer for live transcription segments.

**Methods:**

- `append(segment)` - Add segment
- `save(path)` - Save to JSON file
- `head(n=5)` - Get first n segments
- `tail(n=5)` - Get last n segments

**Properties:**

- `text` (str) - Full text (all segments joined)
- `segments` (List[Segment]) - All segments
- `stats` (dict) - Statistics (segments, duration, words, avg_confidence)
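
To show what "thread-safe" means for a buffer like this, here is a minimal sketch that guards a segment list with a lock and derives the same kind of statistics. It is not the library's `TranscriptBuffer`: `SimpleTranscriptBuffer` is a hypothetical class, and the `Segment` fields follow the examples in this README, with `confidence` assumed.

```python
import threading
from dataclasses import dataclass


@dataclass
class Segment:
    """Hypothetical segment shape; `confidence` is an assumed field."""
    text: str
    start_time: float
    end_time: float
    confidence: float


class SimpleTranscriptBuffer:
    """Lock-protected list of segments (a sketch, not the library class)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._segments = []

    def append(self, segment):
        with self._lock:
            self._segments.append(segment)

    def head(self, n=5):
        with self._lock:
            return self._segments[:n]

    def tail(self, n=5):
        with self._lock:
            return self._segments[-n:] if n > 0 else []

    @property
    def text(self):
        with self._lock:
            return " ".join(s.text for s in self._segments)

    @property
    def stats(self):
        # Copy under the lock, then compute without blocking writers.
        with self._lock:
            segs = list(self._segments)
        words = sum(len(s.text.split()) for s in segs)
        duration = segs[-1].end_time if segs else 0.0
        avg_conf = sum(s.confidence for s in segs) / len(segs) if segs else 0.0
        return {"segments": len(segs), "duration": duration,
                "words": words, "avg_confidence": avg_conf}


buf = SimpleTranscriptBuffer()
buf.append(Segment("Hello world", 0.0, 2.5, 0.9))
buf.append(Segment("This is a test", 2.5, 4.6, 0.8))
print(buf.text)
print(buf.stats)
```

The lock matters because the background transcription thread appends segments while your code reads `text` or `stats` from the main thread.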

### Microphone

Microphone input manager with quality testing.

```python
Microphone(device=None, sample_rate=16000)
```

**Class Methods:**

- `discover()` → `List[Microphone]`
  - Discover all available microphones

- `test_all(transcriber, duration=5.0, playback=False)` → `List[MicrophoneTestResult]`
  - Test all microphones and rank by quality (recommended!)

**Methods:**

- `record(duration=3.0)` → `AudioClip`
  - Record audio for specified duration

- `test(transcriber, duration=5.0, phrase=None, playback=True)` → `MicrophoneTestResult`
  - Test microphone quality with transcription
  - Shows test phrase for user to read
  - Returns detailed quality metrics

**Properties:**

- `name` (str) - Device name
- `channels` (int) - Number of input channels

### MicrophoneTestResult

Result from microphone quality test.

**Attributes:**

- `microphone` (Microphone) - The tested microphone
- `clip` (AudioClip) - Recorded audio (can replay with `.clip.play()`)
- `expected_text` (str) - Text user was supposed to say
- `transcribed_text` (str) - What was actually transcribed
- `confidence` (float) - Transcription confidence score
- `has_audio` (bool) - Whether audio was detected (not silent)
- `rms_level` (float) - Audio level (higher = louder)
- `match_score` (float) - How well transcription matches (0-1)
- `quality_score` (float) - Overall quality (0-1)
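
For readers unfamiliar with RMS levels, the sketch below computes one for a list of float samples and uses it to flag silence. The helper names and the 0.001 threshold are assumptions chosen for illustration (the sample output above treats a level of 0.0001 as "no audio detected"); the library's own threshold is not documented here.

```python
import math


def rms_level(samples):
    """Root-mean-square amplitude of float samples in [-1.0, 1.0]."""
    if not samples:
        return 0.0
    return math.sqrt(sum(x * x for x in samples) / len(samples))


def has_audio(samples, threshold=0.001):
    """Treat a recording as silent when its RMS falls below `threshold` (assumed value)."""
    return rms_level(samples) >= threshold


# One second at 16 kHz: a 440 Hz tone at amplitude 0.05, and pure silence.
tone = [0.05 * math.sin(2 * math.pi * 440 * n / 16000) for n in range(16000)]
silence = [0.0] * 16000

print(f"{rms_level(tone):.4f}", has_audio(tone))
print(f"{rms_level(silence):.4f}", has_audio(silence))
```

For a sine wave the RMS is the amplitude divided by the square root of 2, so a quiet but working microphone still lands well above the silence threshold.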

### AudioClip

Recorded audio wrapper.

**Methods:**

- `play()` - Play audio through default device
- `save(path)` - Save to WAV file
- `to_tensor()` - Convert to PyTorch tensor

**Properties:**

- `duration` (float) - Duration in seconds
- `num_samples` (int) - Number of samples
- `data` (np.ndarray) - Audio data array
- `sample_rate` (int) - Sample rate in Hz

### ConfigPresets

Pre-configured quality/latency presets.

**Presets:**

- `MAXIMUM_QUALITY` - Best quality (15s latency)
- `HIGH_QUALITY` - High quality (10s latency)
- `BALANCED` - Balanced (4s latency) - **Default**
- `LOW_LATENCY` - Low latency (2s latency)
- `REALTIME` - Real-time (1s latency)
- `ULTRA_REALTIME` - Ultra real-time (0.3s latency)

**Methods:**

- `get(name)` → `AudioConfig` - Get preset by name
- `list()` → `List[str]` - List all preset names
- `list_with_details()` → `str` - Formatted list with details
- `by_quality(level)` → `AudioConfig` - Get by quality level
- `by_latency(level)` → `AudioConfig` - Get by latency level

### AudioConfig

Custom audio configuration.

```python
AudioConfig(
    name: str,
    chunk_secs: float,
    left_context_secs: float,
    right_context_secs: float
)
```

**Properties:**

- `latency` (float) - Theoretical latency in seconds
- `quality_score` (int) - Quality rating (1-5)
- `quality_indicator` (str) - Visual indicator (●●●○○)

## 📂 Examples

The `examples/` directory contains complete working examples:

### Available Examples

- **simple_transcribe.py** - Basic file transcription
- **streaming_transcribe.py** - Streaming with custom configuration
- **batch_transcribe.py** - Batch processing multiple files
- **test_microphones.py** - 🎤 **Test all microphones and find the best one**
- **microphone_simple.py** - Simple microphone recording
- **stream_microphone.py** - Full-featured live transcription
- **benchmark.py** - Compare configurations and benchmark performance

### Running Examples

```bash
# Test all microphones (recommended first step!)
python examples/test_microphones.py

# Simple transcription
python examples/simple_transcribe.py

# Live microphone (Ctrl+C to stop)
python examples/stream_microphone.py

# Save transcript to file
python examples/stream_microphone.py --output transcript.txt

# Use different quality preset
python examples/stream_microphone.py --config low_latency

# Benchmark different configurations
python examples/benchmark.py --audio audio.wav --benchmark
```

## 🌍 Supported Languages

The model automatically detects and transcribes in **25 European languages**:

Bulgarian, Croatian, Czech, Danish, Dutch, English, Estonian, Finnish, French, German, Greek, Hungarian, Italian, Latvian, Lithuanian, Maltese, Polish, Portuguese, Romanian, Russian, Slovak, Slovenian, Spanish, Swedish, Ukrainian

## 🚀 Performance

### Speed

- **CPU**: ~2-3x real-time on modern CPUs (transcribe 1 hour in 20-30 minutes)
- **GPU**: ~10x real-time on NVIDIA GPUs (transcribe 1 hour in 6 minutes)
- **Apple Silicon**: ~3-5x real-time on M1/M2/M3/M4
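
The "Nx real-time" figures translate into wall-clock time by simple division. The helper below is a hypothetical one written for this example, and it reproduces the numbers quoted above.

```python
def processing_minutes(audio_minutes, realtime_factor):
    """Minutes of compute needed to transcribe `audio_minutes` of audio."""
    return audio_minutes / realtime_factor


for label, factor in [("CPU (2x)", 2), ("CPU (3x)", 3), ("GPU (10x)", 10)]:
    print(f"{label}: {processing_minutes(60, factor):.0f} min per hour of audio")
```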

### Memory

- **CPU**: 2-4GB RAM
- **GPU**: 2-4GB RAM + 2GB VRAM
- **Model Size**: ~600MB download

### First Run

Model downloads from HuggingFace on first run (~600MB). Subsequent runs load from cache (~3-5 seconds).

## 🛠️ Development

### Setup Development Environment

```bash
# Clone repository
git clone https://github.com/maximerivest/parakeet-stream.git
cd parakeet-stream

# Install with dev dependencies
uv pip install -e ".[dev]"

# Install with microphone support
uv pip install -e ".[dev,microphone]"
```

### Running Tests

```bash
# Run all tests
pytest

# Run with coverage
pytest --cov=parakeet_stream --cov-report=html

# Run specific test file
pytest tests/test_parakeet.py

# Run specific test
pytest tests/test_parakeet.py::test_transcribe

# Run verbose
pytest -v
```

### Code Quality

```bash
# Format code
black parakeet_stream/

# Lint code
ruff check parakeet_stream/

# Type checking (if using mypy)
mypy parakeet_stream/
```

## 🐛 Troubleshooting

### Installation Issues

**Build errors during installation:**

```bash
# Install build dependencies first
pip install "Cython>=0.29.0" "numpy>=1.20.0"

# Then install the package
pip install -e .
```

**Python 3.13 compatibility:**

The package automatically installs `ml-dtypes>=0.5.0` for Python 3.13 support.

### Microphone Issues

**Linux (Ubuntu/Debian):**

```bash
sudo apt-get install portaudio19-dev
pip install sounddevice --force-reinstall
```

**Linux (Fedora/RHEL):**

```bash
sudo dnf install portaudio-devel
pip install sounddevice --force-reinstall
```

**macOS:**

```bash
brew install portaudio
pip install sounddevice --force-reinstall
```

**Test microphone:**

```python
from parakeet_stream import Microphone

# List available microphones
mics = Microphone.discover()
for mic in mics:
    print(mic)

# Test specific microphone
mic = Microphone(device=0)
clip = mic.record(2.0)
clip.play()
```

### Performance Issues

**Slow transcription:**

- Use GPU if available: `Parakeet(device="cuda")`
- Use a lower-quality preset: `pk.with_config('low_latency')`
- Close other applications to free RAM
- Check CPU usage - transcription is CPU-intensive

**High memory usage:**

- Use `lazy=True` for delayed loading
- Process files in smaller batches
- Reduce context window sizes with `pk.with_params()`

**Model download fails:**

```bash
# Set HuggingFace cache directory
export HF_HOME=/path/to/cache

# Or use offline mode (requires cached model)
export HF_HUB_OFFLINE=1
```

### Common Errors

**`RuntimeError: Model not loaded`:**

If you created the instance with `lazy=True` and see this error, call `pk.load()` explicitly before transcribing.

**`ImportError: sounddevice is required`:**

Install microphone dependencies:
```bash
pip install "parakeet-stream[microphone]"
```

**Audio format errors:**

Ensure audio is 16kHz mono WAV. Convert with:
```bash
ffmpeg -i input.mp3 -ar 16000 -ac 1 output.wav
```
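
Before converting, you can check whether a file already meets the 16kHz mono requirement. The sketch below reads the WAV header with Python's standard `wave` module; `check_wav` and `write_silence` are hypothetical helpers for this example, and the demo writes two short silent files so it runs without any existing audio.

```python
import os
import tempfile
import wave


def check_wav(path, expected_rate=16000, expected_channels=1):
    """Return a list of problems found in the WAV header (empty if none)."""
    problems = []
    with wave.open(path, "rb") as wav:
        if wav.getframerate() != expected_rate:
            problems.append(f"sample rate is {wav.getframerate()} Hz, expected {expected_rate}")
        if wav.getnchannels() != expected_channels:
            problems.append(f"{wav.getnchannels()} channels, expected {expected_channels}")
    return problems


def write_silence(path, rate, channels, seconds=0.1):
    """Write a short silent 16-bit PCM WAV file for the demo."""
    with wave.open(path, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(2)
        wav.setframerate(rate)
        wav.writeframes(b"\x00\x00" * channels * int(rate * seconds))


with tempfile.TemporaryDirectory() as tmp:
    good = os.path.join(tmp, "good.wav")
    bad = os.path.join(tmp, "bad.wav")
    write_silence(good, 16000, 1)
    write_silence(bad, 44100, 2)
    good_problems = check_wav(good)
    bad_problems = check_wav(bad)

print(good_problems)
print(bad_problems)
```

Note that `wave` only reads uncompressed PCM WAV files; for other formats, run the `ffmpeg` conversion above first.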

## 📄 License

MIT License - See LICENSE file for details.

This library uses NVIDIA's Parakeet TDT model, which is licensed under [CC-BY-4.0](https://creativecommons.org/licenses/by/4.0/).

## 🙏 Acknowledgments

- Built on [NVIDIA NeMo](https://github.com/NVIDIA/NeMo)
- Uses [Parakeet TDT 0.6b v3](https://huggingface.co/nvidia/parakeet-tdt-0.6b-v3) model
- Inspired by NVIDIA's streaming inference examples

## 📖 Citation

If you use this library in your research, please cite the Parakeet model:

```bibtex
@misc{parakeet-tdt-0.6b-v3,
  title={Parakeet TDT 0.6B V3},
  author={NVIDIA},
  year={2025},
  url={https://huggingface.co/nvidia/parakeet-tdt-0.6b-v3}
}
```

## 🤝 Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

### How to Contribute

1. Fork the repository
2. Create a feature branch (`git checkout -b feature/amazing-feature`)
3. Make your changes
4. Run tests (`pytest`)
5. Commit your changes (`git commit -m 'Add amazing feature'`)
6. Push to the branch (`git push origin feature/amazing-feature`)
7. Open a Pull Request

## 🛠️ CLI Tools

Parakeet Stream includes production-ready CLI tools for server and client deployment.

### Server CLI

Install and run the transcription server:

```bash
# Run server directly with uvx (no installation needed)
uvx --from parakeet-stream parakeet-server run --host 0.0.0.0 --port 8765 --device cuda

# Or install as systemd service for production (requires sudo)
uvx --from parakeet-stream parakeet-server install

# Check service status
sudo systemctl status parakeet-server
sudo journalctl -u parakeet-server -f  # View logs
```

**Server options:**
- `--host`: Host to bind to (default: 0.0.0.0)
- `--port`: Port to listen on (default: 8765)
- `--device`: Device to use (cpu, cuda, mps)
- `--config`: Quality preset (low_latency, balanced, high_quality)
- `--chunk-secs`: Audio chunk size in seconds
- `--left-context-secs`: Left context window
- `--right-context-secs`: Right context window

### Client CLI (Hotkey Transcription)

System-wide hotkey transcription that works anywhere:

```bash
# Run client with uvx (installs dependencies automatically)
uvx --from 'parakeet-stream[hotkey]' parakeet-client run \
  --server ws://192.168.1.100:8765 \
  --auto-paste

# Or install as user systemd service (autostart on login)
uvx --from 'parakeet-stream[hotkey]' parakeet-client install

# Check service status
systemctl --user status parakeet-hotkey
```

**Client features:**
- Press **Alt+W** to start/stop recording
- Transcription copied to clipboard automatically
- Optional auto-paste with smart terminal detection (Ctrl+Shift+V for terminals, Ctrl+V for apps)
- Transcription shown in system status bar (requires `panelstatus`)
- Works system-wide in any application

**Client requirements:**
- Linux with X11 (requires `xdotool` for auto-paste)
- `pynput`, `panelstatus`, `pyperclip` (installed automatically with `[hotkey]` extras)

### Installation as Tools

For persistent installation:

```bash
# Install server tool
uv tool install 'parakeet-stream[server]'

# Install client tool with hotkey dependencies
uv tool install 'parakeet-stream[hotkey]'

# Now use commands directly
parakeet-server run --device cuda
parakeet-client run --server ws://localhost:8765
```

## 💬 Support

- **Documentation**: This README and inline code documentation
- **Issues**: [GitHub Issues](https://github.com/maximerivest/parakeet-stream/issues)
- **Discussions**: [GitHub Discussions](https://github.com/maximerivest/parakeet-stream/discussions)

---

**Made with ❤️ for the speech recognition community**

Parakeet(device=\"cpu\")\n\n# NVIDIA GPU - 5-10x faster\npk = Parakeet(device=\"cuda\")\n\n# Apple Silicon (M1/M2/M3/M4)\npk = Parakeet(device=\"mps\")\n```\n\n### Lazy Loading\n\nBy default, models load immediately (eager loading). For advanced use cases:\n\n```python\nfrom parakeet_stream import Parakeet\n\n# Delay model loading\npk = Parakeet(lazy=True)\n\n# Model loads on first use\nresult = pk.transcribe(\"audio.wav\")\n\n# Or load manually\npk.load()\n```\n\n## \ud83c\udfa8 Rich REPL Experience\n\nParakeet Stream provides beautiful displays in interactive environments:\n\n### Python REPL\n\n```python\n>>> from parakeet_stream import Parakeet\n>>> pk = Parakeet()\n\nLoading nvidia/parakeet-tdt-0.6b-v3 on cpu...\nLoading model:  20%|\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588          | 1/5\nMoving to device:  40%|\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588          | 2/5\nConfiguring streaming:  60%|\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588          | 3/5\nSetting up decoder:  80%|\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588          | 4/5\nComputing context: 100%|\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588| 5/5\n\u2713 Ready! 
(nvidia/parakeet-tdt-0.6b-v3 on cpu)\n\n>>> pk\nParakeet(model='nvidia/parakeet-tdt-0.6b-v3', device='cpu', config='balanced', status='ready')\n```\n\n### IPython\n\n```python\nIn [1]: from parakeet_stream import Parakeet\nIn [2]: pk = Parakeet()\nIn [3]: pk\nOut[3]:\nParakeet(model='nvidia/parakeet-tdt-0.6b-v3', device='cpu')\n  Quality: \u25cf\u25cf\u25cf\u25cb\u25cb (balanced)\n  Latency: ~4.0s\n  Status: \u2713 Ready\n\nIn [4]: result = pk.transcribe(\"audio.wav\")\nIn [5]: result\nOut[5]:\n\ud83d\udcdd This is a sample transcription\n   Confidence: 95% \u25cf\u25cf\u25cf\u25cf\u25cf\n   Duration: 5.2s\n```\n\n### Jupyter Notebooks\n\nResults display as styled HTML tables with rich formatting.\n\n### Explore Configuration\n\n```python\n>>> from parakeet_stream import ConfigPresets\n>>> ConfigPresets.list()\n['maximum_quality', 'high_quality', 'balanced', 'low_latency', 'realtime', 'ultra_realtime']\n\n>>> ConfigPresets.BALANCED\nAudioConfig(name='balanced', latency=4.0s, quality=\u25cf\u25cf\u25cf\u25cb\u25cb)\n\n>>> print(ConfigPresets.list_with_details())\nAvailable Configuration Presets:\n\n  balanced:\n    Chunk: 2.0s | Left: 10.0s | Right: 2.0s\n    Latency: ~4.0s | Quality: \u25cf\u25cf\u25cf\u25cb\u25cb\n\n  high_quality:\n    Chunk: 5.0s | Left: 10.0s | Right: 5.0s\n    Latency: ~10.0s | Quality: \u25cf\u25cf\u25cf\u25cf\u25cb\n  ...\n```\n\n## \ud83c\udfa4 Microphone Quality Testing\n\nNot sure which microphone to use? Test them all automatically!\n\n### Test All Microphones\n\n```python\nfrom parakeet_stream import Parakeet, Microphone\n\npk = Parakeet()\n\n# Automatically test all microphones\nresults = Microphone.test_all(pk)\n```\n\n**What it does:**\n1. Discovers all available microphones\n2. Shows you a test phrase to read\n3. Records from each microphone (same phrase for fair comparison)\n4. Transcribes and evaluates quality\n5. Detects silent/broken microphones\n6. Ranks by quality score (transcription accuracy + confidence)\n7. 
Recommends the best one\n\n**Output:**\n```\n============================================================\n\ud83c\udfa4 MICROPHONE QUALITY TEST\n============================================================\n\n\ud83d\udd0d Discovering microphones...\n\u2713 Found 3 microphone(s):\n   1. Built-in Microphone (device 0)\n   2. USB Microphone (device 1)\n   3. Bluetooth Headset (device 2)\n\n\ud83d\udcdd Test phrase (same for all microphones):\n\n   \"Speech recognition technology continues to improve every year\"\n\nWe'll now test each microphone. Press Enter to start...\n\n... tests each mic ...\n\n============================================================\n\ud83d\udcca RESULTS SUMMARY\n============================================================\n\nRanking (Best to Worst):\n\n1. \u2713 USB Microphone\n   Device: 1\n   Quality: [\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588    ] 82.3%\n   Match:   85.0%\n   Confidence: 92% \u25cf\u25cf\u25cf\u25cf\u25cf\n   Audio Level: 0.0523\n   Transcribed: \"speech recognition technology continues to improve...\"\n\n2. \u2713 Built-in Microphone\n   Device: 0\n   Quality: [\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588         ] 65.4%\n   Match:   70.0%\n   Confidence: 85% \u25cf\u25cf\u25cf\u25cf\u25cb\n   Audio Level: 0.0312\n\n3. 
\u2717 Bluetooth Headset\n   Device: 2\n   Quality: [                    ] 0.0%\n   Match:   0.0%\n   Audio Level: 0.0001\n   \u26a0\ufe0f  No audio detected\n\n\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\n\ud83c\udfc6 RECOMMENDATION\n\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\n\nBest microphone: USB Microphone\nDevice index: 1\nQuality score: 82.3%\n\nTo use this microphone:\n>>> mic = Microphone(device=1)\n>>> live = pk.listen(microphone=mic)\n\n============================================================\nTip: You can replay any recording:\n>>> results[0].clip.play()  # Play best mic's recording\n============================================================\n```\n\n### Access Test Results\n\n```python\n# Get results\nresults = Microphone.test_all(pk)\n\n# Use best microphone\nbest = results[0]\nprint(f\"Best: {best.microphone.name}\")\nprint(f\"Quality: {best.quality_score:.1%}\")\n\n# Play back recordings\nbest.clip.play()\n\n# See what was transcribed\nprint(f\"Expected: {best.expected_text}\")\nprint(f\"Got: {best.transcribed_text}\")\n\n# Check metrics\nprint(f\"Match: {best.match_score:.1%}\")\nprint(f\"Confidence: {best.confidence:.1%}\")\nprint(f\"Audio level (RMS): {best.rms_level:.4f}\")\n\n# Start live transcription with best mic\nlive = pk.listen(microphone=best.microphone)\n```\n\n### Test Single Microphone\n\n```python\npk = Parakeet()\nmic = Microphone(device=1)\n\n# Test 
with random phrase\nresult = mic.test(pk, duration=5.0)\n# Shows phrase, records, transcribes, evaluates\n\n# Test with specific phrase\nresult = mic.test(pk, phrase=\"Hello world\", duration=3.0)\n\n# Skip playback (faster)\nresult = mic.test(pk, playback=False)\n```\n\n## \ud83c\udfaf Live Transcription Deep Dive\n\n### Basic Usage\n\n```python\nfrom parakeet_stream import Parakeet\n\npk = Parakeet()\n\n# Silent mode (default) - no console output\nlive = pk.listen()\n\n# Transcription runs in background\n# Check current transcript\nprint(live.text)\n\n# Get statistics\nprint(live.transcript.stats)\n# {'segments': 15, 'duration': 45.2, 'words': 234, 'avg_confidence': 0.94}\n\n# Control playback\nlive.pause()   # Pause transcription\nlive.resume()  # Resume transcription\nlive.stop()    # Stop completely\n\n# Verbose mode - prints to console\nlive = pk.listen(verbose=True)\n# \ud83c\udfa4 Listening on: Built-in Microphone\n#    (Press Ctrl+C or call .stop() to end)\n# [2.5s] Hello world\n# [4.6s] This is a test\n```\n\n### Save to File\n\n```python\npk = Parakeet()\n\n# Transcription automatically saved to file\nlive = pk.listen(output=\"transcript.txt\")\n\n# Stop and save complete transcript\nlive.stop()\nlive.transcript.save(\"transcript.json\")  # Save with metadata\n```\n\n### Custom Microphone\n\n```python\nfrom parakeet_stream import Parakeet, Microphone\n\n# Use specific microphone\nmic = Microphone(device=1)  # USB microphone\n\npk = Parakeet()\nlive = pk.listen(microphone=mic)\n```\n\n### Access Segments\n\n```python\nlive = pk.listen()\n\n# Wait for some transcription...\n\n# Get all segments\nfor segment in live.transcript.segments:\n    print(f\"[{segment.start_time:.1f}s - {segment.end_time:.1f}s] {segment.text}\")\n\n# Get last 5 segments\nrecent = live.transcript.tail(5)\n\n# Get first 5 segments\nbeginning = live.transcript.head(5)\n```\n\n## \ud83d\udcda API Reference\n\n### Parakeet\n\nMain interface for transcription.\n\n```python\nParakeet(\n   
 model_name: str = \"nvidia/parakeet-tdt-0.6b-v3\",\n    device: str = \"cpu\",\n    config: Union[str, AudioConfig] = \"balanced\",\n    lazy: bool = False\n)\n```\n\n**Methods:**\n\n- `transcribe(audio, timestamps=False)` \u2192 `TranscriptResult`\n  - Transcribe audio file or array\n\n- `stream(audio)` \u2192 `Generator[StreamChunk]`\n  - Stream transcription results as chunks\n\n- `transcribe_batch(audio_files, timestamps=False, show_progress=True)` \u2192 `List[TranscriptResult]`\n  - Batch transcribe multiple files\n\n- `listen(microphone=None, output=None, chunk_duration=None, verbose=False)` \u2192 `LiveTranscriber`\n  - Start live microphone transcription (silent by default)\n\n**Configuration Methods (Chainable):**\n\n- `with_config(config)` \u2192 `Parakeet`\n  - Set configuration preset or custom AudioConfig\n\n- `with_quality(level)` \u2192 `Parakeet`\n  - Set quality level: 'max', 'high', 'good', 'low', 'realtime'\n\n- `with_latency(level)` \u2192 `Parakeet`\n  - Set latency level: 'high', 'medium', 'low', 'realtime'\n\n- `with_params(chunk_secs=None, left_context_secs=None, right_context_secs=None)` \u2192 `Parakeet`\n  - Set custom parameters\n\n**Properties:**\n\n- `config` - Current AudioConfig\n- `configs` - Access to ConfigPresets\n\n### TranscriptResult\n\nRich result object from transcription.\n\n**Attributes:**\n- `text` (str) - Transcribed text\n- `confidence` (float) - Confidence score (0.0-1.0)\n- `duration` (float) - Audio duration in seconds\n- `timestamps` (List[dict]) - Word-level timestamps (if enabled)\n- `word_count` (int) - Number of words\n- `has_timestamps` (bool) - Whether timestamps are available\n\n### LiveTranscriber\n\nBackground live transcription manager.\n\nRuns silently by default - transcription happens in background without console output.\nUse `verbose=True` to print transcriptions to console.\n\n**Methods:**\n\n- `start()` - Start transcription (called automatically by `pk.listen()`)\n- `pause()` - Pause 
transcription\n- `resume()` - Resume transcription\n- `stop()` - Stop transcription\n\n**Properties:**\n\n- `text` (str) - Current full transcript\n- `transcript` (TranscriptBuffer) - Buffer with all segments\n- `is_running` (bool) - Whether currently running\n- `is_paused` (bool) - Whether currently paused\n- `elapsed` (float) - Elapsed time in seconds\n- `verbose` (bool) - Whether console output is enabled\n\n### TranscriptBuffer\n\nThread-safe buffer for live transcription segments.\n\n**Methods:**\n\n- `append(segment)` - Add segment\n- `save(path)` - Save to JSON file\n- `head(n=5)` - Get first n segments\n- `tail(n=5)` - Get last n segments\n\n**Properties:**\n\n- `text` (str) - Full text (all segments joined)\n- `segments` (List[Segment]) - All segments\n- `stats` (dict) - Statistics (segments, duration, words, avg_confidence)\n\n### Microphone\n\nMicrophone input manager with quality testing.\n\n```python\nMicrophone(device=None, sample_rate=16000)\n```\n\n**Class Methods:**\n\n- `discover()` \u2192 `List[Microphone]`\n  - Discover all available microphones\n\n- `test_all(transcriber, duration=5.0, playback=False)` \u2192 `List[MicrophoneTestResult]`\n  - Test all microphones and rank by quality (recommended!)\n\n**Methods:**\n\n- `record(duration=3.0)` \u2192 `AudioClip`\n  - Record audio for specified duration\n\n- `test(transcriber, duration=5.0, phrase=None, playback=True)` \u2192 `MicrophoneTestResult`\n  - Test microphone quality with transcription\n  - Shows test phrase for user to read\n  - Returns detailed quality metrics\n\n**Properties:**\n\n- `name` (str) - Device name\n- `channels` (int) - Number of input channels\n\n### MicrophoneTestResult\n\nResult from microphone quality test.\n\n**Attributes:**\n\n- `microphone` (Microphone) - The tested microphone\n- `clip` (AudioClip) - Recorded audio (can replay with `.clip.play()`)\n- `expected_text` (str) - Text user was supposed to say\n- `transcribed_text` (str) - What was actually transcribed\n- 
`confidence` (float) - Transcription confidence score\n- `has_audio` (bool) - Whether audio was detected (not silent)\n- `rms_level` (float) - Audio level (higher = louder)\n- `match_score` (float) - How well transcription matches (0-1)\n- `quality_score` (float) - Overall quality (0-1)\n\n### AudioClip\n\nRecorded audio wrapper.\n\n**Methods:**\n\n- `play()` - Play audio through default device\n- `save(path)` - Save to WAV file\n- `to_tensor()` - Convert to PyTorch tensor\n\n**Properties:**\n\n- `duration` (float) - Duration in seconds\n- `num_samples` (int) - Number of samples\n- `data` (np.ndarray) - Audio data array\n- `sample_rate` (int) - Sample rate in Hz\n\n### ConfigPresets\n\nPre-configured quality/latency presets.\n\n**Presets:**\n\n- `MAXIMUM_QUALITY` - Best quality (15s latency)\n- `HIGH_QUALITY` - High quality (10s latency)\n- `BALANCED` - Balanced (4s latency) - **Default**\n- `LOW_LATENCY` - Low latency (2s latency)\n- `REALTIME` - Real-time (1s latency)\n- `ULTRA_REALTIME` - Ultra real-time (0.3s latency)\n\n**Methods:**\n\n- `get(name)` \u2192 `AudioConfig` - Get preset by name\n- `list()` \u2192 `List[str]` - List all preset names\n- `list_with_details()` \u2192 `str` - Formatted list with details\n- `by_quality(level)` \u2192 `AudioConfig` - Get by quality level\n- `by_latency(level)` \u2192 `AudioConfig` - Get by latency level\n\n### AudioConfig\n\nCustom audio configuration.\n\n```python\nAudioConfig(\n    name: str,\n    chunk_secs: float,\n    left_context_secs: float,\n    right_context_secs: float\n)\n```\n\n**Properties:**\n\n- `latency` (float) - Theoretical latency in seconds\n- `quality_score` (int) - Quality rating (1-5)\n- `quality_indicator` (str) - Visual indicator (\u25cf\u25cf\u25cf\u25cb\u25cb)\n\n## \ud83d\udcc2 Examples\n\nThe `examples/` directory contains complete working examples:\n\n### Available Examples\n\n- **simple_transcribe.py** - Basic file transcription\n- **streaming_transcribe.py** - Streaming with custom 
configuration\n- **batch_transcribe.py** - Batch processing multiple files\n- **test_microphones.py** - \ud83c\udfa4 **Test all microphones and find the best one**\n- **microphone_simple.py** - Simple microphone recording\n- **stream_microphone.py** - Full-featured live transcription\n- **benchmark.py** - Compare configurations and benchmark performance\n\n### Running Examples\n\n```bash\n# Test all microphones (recommended first step!)\npython examples/test_microphones.py\n\n# Simple transcription\npython examples/simple_transcribe.py\n\n# Live microphone (Ctrl+C to stop)\npython examples/stream_microphone.py\n\n# Save transcript to file\npython examples/stream_microphone.py --output transcript.txt\n\n# Use different quality preset\npython examples/stream_microphone.py --config low_latency\n\n# Benchmark different configurations\npython examples/benchmark.py --audio audio.wav --benchmark\n```\n\n## \ud83c\udf0d Supported Languages\n\nThe model automatically detects and transcribes in **25 European languages**:\n\nBulgarian, Croatian, Czech, Danish, Dutch, English, Estonian, Finnish, French, German, Greek, Hungarian, Italian, Latvian, Lithuanian, Maltese, Polish, Portuguese, Romanian, Russian, Slovak, Slovenian, Spanish, Swedish, Ukrainian\n\n## \ud83d\ude80 Performance\n\n### Speed\n\n- **CPU**: ~2-3x real-time on modern CPUs (transcribe 1 hour in 20-30 minutes)\n- **GPU**: ~10x real-time on NVIDIA GPUs (transcribe 1 hour in 6 minutes)\n- **Apple Silicon**: ~3-5x real-time on M1/M2/M3/M4\n\n### Memory\n\n- **CPU**: 2-4GB RAM\n- **GPU**: 2-4GB RAM + 2GB VRAM\n- **Model Size**: ~600MB download\n\n### First Run\n\nModel downloads from HuggingFace on first run (~600MB). 
Subsequent runs load from cache (~3-5 seconds).\n\n## \ud83d\udee0\ufe0f Development\n\n### Setup Development Environment\n\n```bash\n# Clone repository\ngit clone https://github.com/maximerivest/parakeet-stream.git\ncd parakeet-stream\n\n# Install with dev dependencies\nuv pip install -e \".[dev]\"\n\n# Install with microphone support\nuv pip install -e \".[dev,microphone]\"\n```\n\n### Running Tests\n\n```bash\n# Run all tests\npytest\n\n# Run with coverage\npytest --cov=parakeet_stream --cov-report=html\n\n# Run specific test file\npytest tests/test_parakeet.py\n\n# Run specific test\npytest tests/test_parakeet.py::test_transcribe\n\n# Run verbose\npytest -v\n```\n\n### Code Quality\n\n```bash\n# Format code\nblack parakeet_stream/\n\n# Lint code\nruff check parakeet_stream/\n\n# Type checking (if using mypy)\nmypy parakeet_stream/\n```\n\n## \ud83d\udc1b Troubleshooting\n\n### Installation Issues\n\n**Build errors during installation:**\n\n```bash\n# Install build dependencies first\npip install \"Cython>=0.29.0\" \"numpy>=1.20.0\"\n\n# Then install the package\npip install -e .\n```\n\n**Python 3.13 compatibility:**\n\nThe package automatically installs `ml-dtypes>=0.5.0` for Python 3.13 support.\n\n### Microphone Issues\n\n**Linux (Ubuntu/Debian):**\n\n```bash\nsudo apt-get install portaudio19-dev\npip install sounddevice --force-reinstall\n```\n\n**Linux (Fedora/RHEL):**\n\n```bash\nsudo dnf install portaudio-devel\npip install sounddevice --force-reinstall\n```\n\n**macOS:**\n\n```bash\nbrew install portaudio\npip install sounddevice --force-reinstall\n```\n\n**Test microphone:**\n\n```python\nfrom parakeet_stream import Microphone\n\n# List available microphones\nmics = Microphone.discover()\nfor mic in mics:\n    print(mic)\n\n# Test specific microphone\nmic = Microphone(device=0)\nclip = mic.record(2.0)\nclip.play()\n```\n\n### Performance Issues\n\n**Slow transcription:**\n\n- Use GPU if available: `Parakeet(device=\"cuda\")`\n- Use lower quality 
preset: `pk.with_config('low_latency')`\n- Close other applications to free RAM\n- Check CPU usage - transcription is CPU-intensive\n\n**High memory usage:**\n\n- Use `lazy=True` for delayed loading\n- Process files in smaller batches\n- Reduce context window sizes with `pk.with_params()`\n\n**Model download fails:**\n\n```bash\n# Set HuggingFace cache directory\nexport HF_HOME=/path/to/cache\n\n# Or use offline mode (requires cached model)\nexport HF_HUB_OFFLINE=1\n```\n\n### Common Errors\n\n**`RuntimeError: Model not loaded`:**\n\nIf using `lazy=True`, call `pk.load()` before transcribing.\n\n**`ImportError: sounddevice is required`:**\n\nInstall microphone dependencies:\n```bash\npip install \"parakeet-stream[microphone]\"\n```\n\n**Audio format errors:**\n\nEnsure audio is 16kHz mono WAV. Convert with:\n```bash\nffmpeg -i input.mp3 -ar 16000 -ac 1 output.wav\n```\n\n## \ud83d\udcc4 License\n\nMIT License - See LICENSE file for details.\n\nThis library uses NVIDIA's Parakeet TDT model, which is licensed under [CC-BY-4.0](https://creativecommons.org/licenses/by/4.0/).\n\n## \ud83d\ude4f Acknowledgments\n\n- Built on [NVIDIA NeMo](https://github.com/NVIDIA/NeMo)\n- Uses [Parakeet TDT 0.6b v3](https://huggingface.co/nvidia/parakeet-tdt-0.6b-v3) model\n- Inspired by NVIDIA's streaming inference examples\n\n## \ud83d\udcd6 Citation\n\nIf you use this library in your research, please cite the Parakeet model:\n\n```bibtex\n@misc{parakeet-tdt-0.6b-v3,\n  title={Parakeet TDT 0.6B V3},\n  author={NVIDIA},\n  year={2025},\n  url={https://huggingface.co/nvidia/parakeet-tdt-0.6b-v3}\n}\n```\n\n## \ud83e\udd1d Contributing\n\nContributions are welcome! Please feel free to submit a Pull Request.\n\n### How to Contribute\n\n1. Fork the repository\n2. Create a feature branch (`git checkout -b feature/amazing-feature`)\n3. Make your changes\n4. Run tests (`pytest`)\n5. Commit your changes (`git commit -m 'Add amazing feature'`)\n6. 
Push to the branch (`git push origin feature/amazing-feature`)\n7. Open a Pull Request\n\n## \ud83d\udee0\ufe0f CLI Tools\n\nParakeet Stream includes production-ready CLI tools for server and client deployment.\n\n### Server CLI\n\nInstall and run the transcription server:\n\n```bash\n# Run server directly with uvx (no installation needed)\nuvx --from parakeet-stream parakeet-server run --host 0.0.0.0 --port 8765 --device cuda\n\n# Or install as systemd service for production (requires sudo)\nuvx --from parakeet-stream parakeet-server install\n\n# Check service status\nsudo systemctl status parakeet-server\nsudo journalctl -u parakeet-server -f  # View logs\n```\n\n**Server options:**\n- `--host`: Host to bind to (default: 0.0.0.0)\n- `--port`: Port to listen on (default: 8765)\n- `--device`: Device to use (cpu, cuda, mps)\n- `--config`: Quality preset (low_latency, balanced, high_quality)\n- `--chunk-secs`: Audio chunk size in seconds\n- `--left-context-secs`: Left context window\n- `--right-context-secs`: Right context window\n\n### Client CLI (Hotkey Transcription)\n\nSystem-wide hotkey transcription that works anywhere:\n\n```bash\n# Run client with uvx (installs dependencies automatically)\nuvx --from 'parakeet-stream[hotkey]' parakeet-client run \\\n  --server ws://192.168.1.100:8765 \\\n  --auto-paste\n\n# Or install as user systemd service (autostart on login)\nuvx --from 'parakeet-stream[hotkey]' parakeet-client install\n\n# Check service status\nsystemctl --user status parakeet-hotkey\n```\n\n**Client features:**\n- Press **Alt+W** to start/stop recording\n- Transcription copied to clipboard automatically\n- Optional auto-paste with smart terminal detection (Ctrl+Shift+V for terminals, Ctrl+V for apps)\n- Transcription shown in system status bar (requires `panelstatus`)\n- Works system-wide in any application\n\n**Client requirements:**\n- Linux with X11 (requires `xdotool` for auto-paste)\n- `pynput`, `panelstatus`, `pyperclip` (installed automatically 
with `[hotkey]` extras)\n\n### Installation as Tools\n\nFor persistent installation:\n\n```bash\n# Install server tool\nuv tool install 'parakeet-stream[server]'\n\n# Install client tool with hotkey dependencies\nuv tool install 'parakeet-stream[hotkey]'\n\n# Now use commands directly\nparakeet-server run --device cuda\nparakeet-client run --server ws://localhost:8765\n```\n\n## \ud83d\udcac Support\n\n- **Documentation**: This README and inline code documentation\n- **Issues**: [GitHub Issues](https://github.com/maximerivest/parakeet-stream/issues)\n- **Discussions**: [GitHub Discussions](https://github.com/maximerivest/parakeet-stream/discussions)\n\n---\n\n**Made with \u2764\ufe0f for the speech recognition community**\n",
    "bugtrack_url": null,
    "license": "MIT",
    "summary": "Simple, powerful streaming transcription for Python using NVIDIA's Parakeet TDT 0.6b",
    "version": "0.5.0",
    "project_urls": {
        "Documentation": "https://github.com/maximerivest/parakeet-stream/blob/main/README.md",
        "Homepage": "https://github.com/maximerivest/parakeet-stream",
        "Repository": "https://github.com/maximerivest/parakeet-stream"
    },
    "split_keywords": [
        "asr",
        " nemo",
        " nvidia",
        " parakeet",
        " speech-recognition",
        " streaming",
        " transcription"
    ],
    "urls": [
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "5d6bbdfdf19ebb244a17f641f22c92b4c0d3925efbd18a515a9528bed729b88d",
                "md5": "916540628ffb10fe82ff69627ede1373",
                "sha256": "e43438534d45a65b464e8c41987c8cf2843be2ff51e2b55856fe90996fce18c2"
            },
            "downloads": -1,
            "filename": "parakeet_stream-0.5.0-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "916540628ffb10fe82ff69627ede1373",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": "<3.14,>=3.9",
            "size": 59016,
            "upload_time": "2025-10-13T02:10:59",
            "upload_time_iso_8601": "2025-10-13T02:10:59.555537Z",
            "url": "https://files.pythonhosted.org/packages/5d/6b/bdfdf19ebb244a17f641f22c92b4c0d3925efbd18a515a9528bed729b88d/parakeet_stream-0.5.0-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "cd06dc0d17db7170267464ff041c22e6aef223006c96563fb81f534384f062aa",
                "md5": "117cf0cd553f5973179d130e550ec763",
                "sha256": "e7bda4b2e06b66ae799edb004438fa68470a41113a6a58b63ef76adc35d6ebfe"
            },
            "downloads": -1,
            "filename": "parakeet_stream-0.5.0.tar.gz",
            "has_sig": false,
            "md5_digest": "117cf0cd553f5973179d130e550ec763",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": "<3.14,>=3.9",
            "size": 477143,
            "upload_time": "2025-10-13T02:11:01",
            "upload_time_iso_8601": "2025-10-13T02:11:01.076855Z",
            "url": "https://files.pythonhosted.org/packages/cd/06/dc0d17db7170267464ff041c22e6aef223006c96563fb81f534384f062aa/parakeet_stream-0.5.0.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2025-10-13 02:11:01",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "maximerivest",
    "github_project": "parakeet-stream",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": false,
    "lcname": "parakeet-stream"
}
