# Parakeet Stream
**Simple, powerful streaming transcription for Python using NVIDIA's Parakeet TDT 0.6b**
A modern Python library with a beautiful REPL-friendly API for audio transcription, featuring instant quality tuning, live microphone support, and rich interactive displays.
## ✨ Features
- 🎯 **Simple & Intuitive** - Beautiful API designed for interactive use
- 🎨 **Rich Displays** - Gorgeous output in Python REPL, IPython, and Jupyter notebooks
- ⚡ **Instant Quality Tuning** - Switch between 6 quality presets without reloading model
- 🎤 **Live Transcription** - Real-time microphone transcription with one line of code
- 🌊 **Streaming Support** - Process audio in chunks with configurable latency
- 💻 **CPU Optimized** - Efficient inference on CPU (GPU optional)
- 🌍 **25 Languages** - Automatic language detection
- 📦 **Batch Processing** - Transcribe multiple files efficiently
- ⏱️ **Timestamps** - Optional word-level timestamps
## 🚀 Installation
### Quick Install
```bash
# Install with pip
pip install git+https://github.com/maximerivest/parakeet-stream.git
# Or with uv (recommended)
uv pip install git+https://github.com/maximerivest/parakeet-stream.git
# With microphone support
pip install "parakeet-stream[microphone] @ git+https://github.com/maximerivest/parakeet-stream.git"
```
### Install from Source
```bash
git clone https://github.com/maximerivest/parakeet-stream.git
cd parakeet-stream
# Install with uv
uv pip install -e .
# Or with pip
pip install -e .
# With microphone support
uv pip install -e ".[microphone]"
```
### Requirements
- Python 3.9-3.13
- 2GB+ RAM (4GB+ recommended)
- Any modern CPU (GPU optional)
**Note**: Python 3.13 support requires `ml-dtypes>=0.5.0`, which is installed automatically as a dependency.
## 📖 Quick Start
### Basic Transcription
```python
from parakeet_stream import Parakeet
# Initialize (loads model with clean progress bar)
pk = Parakeet()
# Transcribe an audio file
result = pk.transcribe("audio.wav")
print(result.text)
```
The model loads immediately on initialization and shows a compact progress bar instead of verbose logs. The first run takes 3-5 minutes because it downloads ~600MB from Hugging Face; subsequent runs load from the local cache in ~5 seconds.
### Live Microphone Transcription
```python
from parakeet_stream import Parakeet
# Initialize transcriber
pk = Parakeet()
# Start live transcription (silent mode - no console output)
live = pk.listen()
# Speak into microphone...
# Transcription happens silently in background
# Access transcript
print(live.text) # Get current text
print(live.transcript.stats) # Get statistics
# Stop and get results
live.stop()
print(live.transcript.text)
# Verbose mode - prints transcriptions to console
live = pk.listen(verbose=True)
# [2.5s] Hello world
# [4.6s] This is a test
```
### Quality/Latency Tuning
Switch between quality presets instantly - **no model reload needed**!
```python
from parakeet_stream import Parakeet
pk = Parakeet()
# Try different quality levels (no reload!)
pk.with_quality('max').transcribe("audio.wav") # ●●●●● (15s latency)
pk.with_quality('high').transcribe("audio.wav") # ●●●●○ (10s latency)
pk.with_quality('good').transcribe("audio.wav") # ●●●○○ (4s latency)
pk.with_quality('low').transcribe("audio.wav") # ●●○○○ (2s latency)
pk.with_quality('realtime').transcribe("audio.wav") # ●○○○○ (1s latency)
# Or use preset names
pk.with_config('balanced').transcribe("audio.wav")
pk.with_config('low_latency').transcribe("audio.wav")
```
### Streaming Transcription
Process long audio files in chunks:
```python
from parakeet_stream import Parakeet
pk = Parakeet()
# Stream transcription results as they become available
for chunk in pk.stream("long_audio.wav"):
    print(f"[{chunk.timestamp_start:.1f}s]: {chunk.text}")
    if chunk.is_final:
        print(f"✓ Final: {chunk.text}")
```
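
The loop above prints both interim and final chunks. If you only want the settled transcript, one option is to keep just the chunks flagged `is_final`. This sketch runs without the model: `FakeChunk` and `fake_stream()` are hypothetical stand-ins that model only the `text`, `timestamp_start`, and `is_final` attributes used above, and the interim-then-final ordering they produce is an assumption.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, List


@dataclass
class FakeChunk:
    # Hypothetical stand-in for the objects yielded by pk.stream()
    text: str
    timestamp_start: float
    is_final: bool


def fake_stream() -> Iterator[FakeChunk]:
    # Assumed pattern: an interim guess followed by its finalized version
    yield FakeChunk("hello wor", 0.0, False)
    yield FakeChunk("hello world", 0.0, True)
    yield FakeChunk("this is", 2.0, False)
    yield FakeChunk("this is a test", 2.0, True)


def collect_final_text(chunks: Iterable[FakeChunk]) -> str:
    # Keep only finalized chunks and join them in arrival order
    finals: List[str] = [c.text for c in chunks if c.is_final]
    return " ".join(finals)


print(collect_final_text(fake_stream()))
```

With the real library you would pass `pk.stream("long_audio.wav")` instead of `fake_stream()`.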
### Microphone Features
```python
from parakeet_stream import Parakeet, Microphone
pk = Parakeet()
# Test ALL microphones automatically (recommended!)
results = Microphone.test_all(pk)
# Shows test phrase for you to read
# Tests each microphone with the same phrase
# Ranks by quality and recommends best one
# You can play back any recording: results[0].clip.play()
# Use the best microphone
best_mic = results[0].microphone
live = pk.listen(microphone=best_mic)
# Or manually discover and test
mics = Microphone.discover()
for mic in mics:
    print(mic)
# 🎤 Microphone 0: Built-in Microphone
# 🎤 Microphone 1: USB Microphone
# Test a specific microphone
mic = Microphone(device=1)
test_result = mic.test(pk)
# Shows random test phrase
# Records, transcribes, and evaluates quality
# Returns detailed metrics: match score, confidence, audio level
# Record audio
clip = mic.record(duration=5.0)
clip.play() # Playback
clip.save("recording.wav") # Save to file
```
### Batch Processing
```python
from parakeet_stream import Parakeet
pk = Parakeet()
# Transcribe multiple files with progress bar
audio_files = ["file1.wav", "file2.wav", "file3.wav"]
results = pk.transcribe_batch(audio_files, show_progress=True)
for file, result in zip(audio_files, results):
    print(f"{file}: {result.text}")
```
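
Rather than listing files by hand, you can build `audio_files` from a directory. A minimal sketch using only `pathlib`; `collect_audio_files`, the `recordings` directory, and the `*.wav` pattern are placeholders, not part of Parakeet Stream.

```python
from pathlib import Path
from typing import List


def collect_audio_files(directory: str, pattern: str = "*.wav") -> List[str]:
    # Sorted so batch results line up with a predictable file order
    return [str(p) for p in sorted(Path(directory).glob(pattern)) if p.is_file()]


# audio_files = collect_audio_files("recordings")
# results = pk.transcribe_batch(audio_files, show_progress=True)
```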
## 🎛️ Configuration Guide
### Quality Presets
Parakeet Stream includes 6 carefully tuned presets for different use cases:
| Preset | Quality | Latency | Use Case |
|--------|---------|---------|----------|
| `maximum_quality` | ●●●●● | ~15s | Offline transcription, highest accuracy |
| `high_quality` | ●●●●○ | ~10s | Long audio files, near-perfect quality |
| `balanced` | ●●●○○ | ~4s | **Default** - Great quality, acceptable latency |
| `low_latency` | ●●○○○ | ~2s | Interactive applications |
| `realtime` | ●○○○○ | ~1s | Live conversations, minimal delay |
| `ultra_realtime` | ●○○○○ | ~0.3s | Experimental ultra-low latency |
```python
from parakeet_stream import Parakeet
# Use preset at initialization
pk = Parakeet(config='balanced')
# Or change on the fly (no reload!)
pk.with_config('high_quality')
# Access preset information
from parakeet_stream import ConfigPresets
print(ConfigPresets.list())
# ['maximum_quality', 'high_quality', 'balanced', 'low_latency', 'realtime', 'ultra_realtime']
print(ConfigPresets.BALANCED)
# balanced:
# Chunk: 2.0s | Left: 10.0s | Right: 2.0s
# Latency: ~4.0s | Quality: ●●●○○
```
### Custom Parameters
Fine-tune parameters for specific needs:
```python
from parakeet_stream import Parakeet
pk = Parakeet()
# Adjust individual parameters
pk.with_params(
    chunk_secs=3.0,          # Process in 3-second chunks
    left_context_secs=15.0,  # More context for better quality
    right_context_secs=1.5   # Less lookahead for lower latency
)
result = pk.transcribe("audio.wav")
```
**Understanding Parameters:**
- **chunk_secs**: Size of each processing chunk (affects latency)
- **left_context_secs**: Context from previous audio (improves quality)
- **right_context_secs**: Context from future audio (affects latency)
**Latency Formula**: `latency = chunk_secs + right_context_secs`
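
To make the three parameters concrete, here is a sketch of how a chunked streaming loop could lay its windows over a recording, and why the latency formula holds. It illustrates the general chunk-plus-context technique under our own assumptions; `plan_windows` and `Window` are hypothetical names, not Parakeet Stream's internals.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Window:
    context_start: float  # where left context begins (clipped at 0)
    chunk_start: float    # first second of audio this step emits text for
    chunk_end: float      # end of that audio
    context_end: float    # where right context (lookahead) ends (clipped at the end)


def plan_windows(total_secs: float, chunk_secs: float,
                 left_context_secs: float, right_context_secs: float) -> List[Window]:
    # Step through the recording one chunk at a time; each step also sees
    # up to left_context_secs before it and right_context_secs after it.
    if chunk_secs <= 0:
        raise ValueError("chunk_secs must be positive")
    windows: List[Window] = []
    start = 0.0
    while start < total_secs:
        end = min(start + chunk_secs, total_secs)
        windows.append(Window(
            context_start=max(0.0, start - left_context_secs),
            chunk_start=start,
            chunk_end=end,
            context_end=min(total_secs, end + right_context_secs),
        ))
        start = end
    return windows


# 'balanced' values (chunk 2.0s, left 10.0s, right 2.0s) on a 12-second recording
windows = plan_windows(12.0, chunk_secs=2.0, left_context_secs=10.0, right_context_secs=2.0)
for w in windows:
    print(w)

# A chunk can only be finalized once its lookahead has arrived, so the wait from
# chunk start to the end of its right context is chunk_secs + right_context_secs.
middle = windows[2]
print(middle.context_end - middle.chunk_start)  # 4.0
```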
### Device Selection
```python
from parakeet_stream import Parakeet
# CPU (default) - works everywhere
pk = Parakeet(device="cpu")
# NVIDIA GPU - 5-10x faster
pk = Parakeet(device="cuda")
# Apple Silicon (M1/M2/M3/M4)
pk = Parakeet(device="mps")
```
### Lazy Loading
By default, the model loads immediately (eager loading). For advanced use cases, you can defer loading until it is needed:
```python
from parakeet_stream import Parakeet
# Delay model loading
pk = Parakeet(lazy=True)
# Model loads on first use
result = pk.transcribe("audio.wav")
# Or load manually
pk.load()
```
## 🎨 Rich REPL Experience
Parakeet Stream provides beautiful displays in interactive environments:
### Python REPL
```python
>>> from parakeet_stream import Parakeet
>>> pk = Parakeet()
Loading nvidia/parakeet-tdt-0.6b-v3 on cpu...
Loading model: 20%|████████ | 1/5
Moving to device: 40%|████████████████ | 2/5
Configuring streaming: 60%|████████████████████████ | 3/5
Setting up decoder: 80%|████████████████████████████████ | 4/5
Computing context: 100%|████████████████████████████████████████| 5/5
✓ Ready! (nvidia/parakeet-tdt-0.6b-v3 on cpu)
>>> pk
Parakeet(model='nvidia/parakeet-tdt-0.6b-v3', device='cpu', config='balanced', status='ready')
```
### IPython
```python
In [1]: from parakeet_stream import Parakeet
In [2]: pk = Parakeet()
In [3]: pk
Out[3]:
Parakeet(model='nvidia/parakeet-tdt-0.6b-v3', device='cpu')
Quality: ●●●○○ (balanced)
Latency: ~4.0s
Status: ✓ Ready
In [4]: result = pk.transcribe("audio.wav")
In [5]: result
Out[5]:
📝 This is a sample transcription
Confidence: 95% ●●●●●
Duration: 5.2s
```
### Jupyter Notebooks
Results display as styled HTML tables with rich formatting.
### Explore Configuration
```python
>>> from parakeet_stream import ConfigPresets
>>> ConfigPresets.list()
['maximum_quality', 'high_quality', 'balanced', 'low_latency', 'realtime', 'ultra_realtime']
>>> ConfigPresets.BALANCED
AudioConfig(name='balanced', latency=4.0s, quality=●●●○○)
>>> print(ConfigPresets.list_with_details())
Available Configuration Presets:
balanced:
Chunk: 2.0s | Left: 10.0s | Right: 2.0s
Latency: ~4.0s | Quality: ●●●○○
high_quality:
Chunk: 5.0s | Left: 10.0s | Right: 5.0s
Latency: ~10.0s | Quality: ●●●●○
...
```
## 🎤 Microphone Quality Testing
Not sure which microphone to use? Test them all automatically!
### Test All Microphones
```python
from parakeet_stream import Parakeet, Microphone
pk = Parakeet()
# Automatically test all microphones
results = Microphone.test_all(pk)
```
**What it does:**
1. Discovers all available microphones
2. Shows you a test phrase to read
3. Records from each microphone (same phrase for fair comparison)
4. Transcribes and evaluates quality
5. Detects silent/broken microphones
6. Ranks by quality score (transcription accuracy + confidence)
7. Recommends the best one
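
The steps above can be approximated in plain Python. In this sketch the match score is a word-level `difflib.SequenceMatcher` ratio, a microphone whose RMS falls below a hypothetical `SILENCE_RMS` threshold scores zero, and the quality score uses made-up weights. Parakeet Stream's actual scoring formula is not documented here, so this will not reproduce the percentages in the sample output below.

```python
import difflib
from dataclasses import dataclass
from typing import List

SILENCE_RMS = 0.001       # hypothetical "no audio detected" threshold
MATCH_WEIGHT = 0.7        # hypothetical weights, not the library's
CONFIDENCE_WEIGHT = 0.3


def normalize(text: str) -> List[str]:
    # Lowercase and replace punctuation with spaces so "Hello, world." matches "hello world"
    cleaned = "".join(ch.lower() if ch.isalnum() or ch.isspace() else " " for ch in text)
    return cleaned.split()


def match_score(expected: str, transcribed: str) -> float:
    # Similarity of the two word sequences, from 0.0 to 1.0
    return difflib.SequenceMatcher(None, normalize(expected), normalize(transcribed)).ratio()


@dataclass
class MicResult:
    name: str
    transcribed: str
    confidence: float
    rms_level: float


def quality_score(result: MicResult, expected: str) -> float:
    if result.rms_level < SILENCE_RMS:
        return 0.0  # silent or broken microphone
    return (MATCH_WEIGHT * match_score(expected, result.transcribed)
            + CONFIDENCE_WEIGHT * result.confidence)


def rank(results: List[MicResult], expected: str) -> List[MicResult]:
    # Best microphone first
    return sorted(results, key=lambda r: quality_score(r, expected), reverse=True)
```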
**Output:**
```
============================================================
🎤 MICROPHONE QUALITY TEST
============================================================
🔍 Discovering microphones...
✓ Found 3 microphone(s):
1. Built-in Microphone (device 0)
2. USB Microphone (device 1)
3. Bluetooth Headset (device 2)
📝 Test phrase (same for all microphones):
"Speech recognition technology continues to improve every year"
We'll now test each microphone. Press Enter to start...
... tests each mic ...
============================================================
📊 RESULTS SUMMARY
============================================================
Ranking (Best to Worst):
1. ✓ USB Microphone
Device: 1
Quality: [████████████████ ] 82.3%
Match: 85.0%
Confidence: 92% ●●●●●
Audio Level: 0.0523
Transcribed: "speech recognition technology continues to improve..."
2. ✓ Built-in Microphone
Device: 0
Quality: [███████████ ] 65.4%
Match: 70.0%
Confidence: 85% ●●●●○
Audio Level: 0.0312
3. ✗ Bluetooth Headset
Device: 2
Quality: [ ] 0.0%
Match: 0.0%
Audio Level: 0.0001
⚠️ No audio detected
────────────────────────────────────────────────────────
🏆 RECOMMENDATION
────────────────────────────────────────────────────────
Best microphone: USB Microphone
Device index: 1
Quality score: 82.3%
To use this microphone:
>>> mic = Microphone(device=1)
>>> live = pk.listen(microphone=mic)
============================================================
Tip: You can replay any recording:
>>> results[0].clip.play() # Play best mic's recording
============================================================
```
### Access Test Results
```python
# Get results
results = Microphone.test_all(pk)
# Use best microphone
best = results[0]
print(f"Best: {best.microphone.name}")
print(f"Quality: {best.quality_score:.1%}")
# Play back recordings
best.clip.play()
# See what was transcribed
print(f"Expected: {best.expected_text}")
print(f"Got: {best.transcribed_text}")
# Check metrics
print(f"Match: {best.match_score:.1%}")
print(f"Confidence: {best.confidence:.1%}")
print(f"Audio level (RMS): {best.rms_level:.4f}")
# Start live transcription with best mic
live = pk.listen(microphone=best.microphone)
```
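
`rms_level` is a root-mean-square amplitude: the square root of the mean of the squared samples, so louder input gives a larger value and silence gives 0. A standard-library sketch, assuming float samples in the range [-1, 1] (the library's exact normalization is not documented here):

```python
import math
from typing import Sequence


def rms(samples: Sequence[float]) -> float:
    # Square each sample, average, then take the square root.
    # (For a NumPy array the equivalent is np.sqrt(np.mean(samples ** 2)).)
    if len(samples) == 0:
        return 0.0
    return math.sqrt(sum(x * x for x in samples) / len(samples))


print(rms([0.5, -0.5, 0.5, -0.5]))  # 0.5
print(rms([0.0] * 1600))            # 0.0, i.e. silence
```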
### Test Single Microphone
```python
pk = Parakeet()
mic = Microphone(device=1)
# Test with random phrase
result = mic.test(pk, duration=5.0)
# Shows phrase, records, transcribes, evaluates
# Test with specific phrase
result = mic.test(pk, phrase="Hello world", duration=3.0)
# Skip playback (faster)
result = mic.test(pk, playback=False)
```
## 🎯 Live Transcription Deep Dive
### Basic Usage
```python
from parakeet_stream import Parakeet
pk = Parakeet()
# Silent mode (default) - no console output
live = pk.listen()
# Transcription runs in background
# Check current transcript
print(live.text)
# Get statistics
print(live.transcript.stats)
# {'segments': 15, 'duration': 45.2, 'words': 234, 'avg_confidence': 0.94}
# Control playback
live.pause() # Pause transcription
live.resume() # Resume transcription
live.stop() # Stop completely
# Verbose mode - prints to console
live = pk.listen(verbose=True)
# 🎤 Listening on: Built-in Microphone
# (Press Ctrl+C or call .stop() to end)
# [2.5s] Hello world
# [4.6s] This is a test
```
### Save to File
```python
pk = Parakeet()
# Transcription automatically saved to file
live = pk.listen(output="transcript.txt")
# Stop and save complete transcript
live.stop()
live.transcript.save("transcript.json") # Save with metadata
```
### Custom Microphone
```python
from parakeet_stream import Parakeet, Microphone
# Use specific microphone
mic = Microphone(device=1) # USB microphone
pk = Parakeet()
live = pk.listen(microphone=mic)
```
### Access Segments
```python
live = pk.listen()
# Wait for some transcription...
# Get all segments
for segment in live.transcript.segments:
    print(f"[{segment.start_time:.1f}s - {segment.end_time:.1f}s] {segment.text}")
# Get last 5 segments
recent = live.transcript.tail(5)
# Get first 5 segments
beginning = live.transcript.head(5)
```
## 📚 API Reference
### Parakeet
Main interface for transcription.
```python
Parakeet(
model_name: str = "nvidia/parakeet-tdt-0.6b-v3",
device: str = "cpu",
config: Union[str, AudioConfig] = "balanced",
lazy: bool = False
)
```
**Methods:**
- `transcribe(audio, timestamps=False)` → `TranscriptResult`
- Transcribe audio file or array
- `stream(audio)` → `Generator[StreamChunk]`
- Stream transcription results as chunks
- `transcribe_batch(audio_files, timestamps=False, show_progress=True)` → `List[TranscriptResult]`
- Batch transcribe multiple files
- `listen(microphone=None, output=None, chunk_duration=None, verbose=False)` → `LiveTranscriber`
- Start live microphone transcription (silent by default)
**Configuration Methods (Chainable):**
- `with_config(config)` → `Parakeet`
- Set configuration preset or custom AudioConfig
- `with_quality(level)` → `Parakeet`
- Set quality level: 'max', 'high', 'good', 'low', 'realtime'
- `with_latency(level)` → `Parakeet`
- Set latency level: 'high', 'medium', 'low', 'realtime'
- `with_params(chunk_secs=None, left_context_secs=None, right_context_secs=None)` → `Parakeet`
- Set custom parameters
**Properties:**
- `config` - Current AudioConfig
- `configs` - Access to ConfigPresets
### TranscriptResult
Rich result object from transcription.
**Attributes:**
- `text` (str) - Transcribed text
- `confidence` (float) - Confidence score (0.0-1.0)
- `duration` (float) - Audio duration in seconds
- `timestamps` (List[dict]) - Word-level timestamps (if enabled)
- `word_count` (int) - Number of words
- `has_timestamps` (bool) - Whether timestamps are available
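
The derived attributes can be pictured with a small stand-in dataclass. `TranscriptResultSketch` is hypothetical, the `"word"`/`"start"`/`"end"` keys for timestamp entries are an assumption, and `confidence_dots` only approximates the `●●●●○` indicator from the IPython output earlier in this README by rounding confidence × 5.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Union

Timestamp = Dict[str, Union[str, float]]  # assumed keys: "word", "start", "end"


@dataclass
class TranscriptResultSketch:
    text: str
    confidence: float
    duration: float
    timestamps: List[Timestamp] = field(default_factory=list)

    @property
    def word_count(self) -> int:
        return len(self.text.split())

    @property
    def has_timestamps(self) -> bool:
        return bool(self.timestamps)


def confidence_dots(confidence: float, total: int = 5) -> str:
    # Approximation: filled dots = round(confidence * total), clamped to [0, total]
    filled = max(0, min(total, round(confidence * total)))
    return "●" * filled + "○" * (total - filled)


r = TranscriptResultSketch("This is a sample transcription", 0.95, 5.2)
print(r.word_count, r.has_timestamps, confidence_dots(r.confidence))
```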
### LiveTranscriber
Background live transcription manager.
Runs silently by default - transcription happens in background without console output.
Use `verbose=True` to print transcriptions to console.
**Methods:**
- `start()` - Start transcription (called automatically by `pk.listen()`)
- `pause()` - Pause transcription
- `resume()` - Resume transcription
- `stop()` - Stop transcription
**Properties:**
- `text` (str) - Current full transcript
- `transcript` (TranscriptBuffer) - Buffer with all segments
- `is_running` (bool) - Whether currently running
- `is_paused` (bool) - Whether currently paused
- `elapsed` (float) - Elapsed time in seconds
- `verbose` (bool) - Whether console output is enabled
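
`pause()`, `resume()`, and `stop()` follow a common pattern for background workers. The sketch below shows one way to build that pattern with `threading.Event`; `BackgroundWorker` and its `step` callback are hypothetical and not Parakeet Stream's implementation.

```python
import threading
import time
from typing import Callable, Optional


class BackgroundWorker:
    """Hypothetical worker with the same start/pause/resume/stop shape."""

    def __init__(self, step: Callable[[], None], interval: float = 0.05) -> None:
        self._step = step                  # one unit of work, e.g. one audio chunk
        self._interval = interval
        self._stop = threading.Event()
        self._active = threading.Event()   # set = running, cleared = paused
        self._thread: Optional[threading.Thread] = None

    def start(self) -> None:
        self._stop.clear()
        self._active.set()
        self._thread = threading.Thread(target=self._loop, daemon=True)
        self._thread.start()

    def _loop(self) -> None:
        while not self._stop.is_set():
            # While paused, wait in short slices so that stop() is still noticed
            if not self._active.wait(timeout=self._interval):
                continue
            if self._stop.is_set():
                break
            self._step()
            self._stop.wait(self._interval)  # sleep, but wake immediately on stop()

    def pause(self) -> None:
        self._active.clear()

    def resume(self) -> None:
        self._active.set()

    def stop(self) -> None:
        self._stop.set()
        if self._thread is not None:
            self._thread.join()

    @property
    def is_running(self) -> bool:
        return self._thread is not None and self._thread.is_alive()

    @property
    def is_paused(self) -> bool:
        return self.is_running and not self._active.is_set()
```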
### TranscriptBuffer
Thread-safe buffer for live transcription segments.
**Methods:**
- `append(segment)` - Add segment
- `save(path)` - Save to JSON file
- `head(n=5)` - Get first n segments
- `tail(n=5)` - Get last n segments
**Properties:**
- `text` (str) - Full text (all segments joined)
- `segments` (List[Segment]) - All segments
- `stats` (dict) - Statistics (segments, duration, words, avg_confidence)
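
A thread-safe buffer like this is typically a list guarded by a lock. The sketch mirrors the documented methods with a hypothetical `Segment` type. The JSON layout written by `save` is made up, and how the real `stats` computes `duration` is not documented; here it is assumed to be the end time of the last segment.

```python
import json
import threading
from dataclasses import asdict, dataclass
from typing import Dict, List


@dataclass
class Segment:
    text: str
    start_time: float
    end_time: float
    confidence: float


class TranscriptBufferSketch:
    def __init__(self) -> None:
        self._segments: List[Segment] = []
        self._lock = threading.Lock()  # the transcription thread appends while callers read

    def append(self, segment: Segment) -> None:
        with self._lock:
            self._segments.append(segment)

    @property
    def segments(self) -> List[Segment]:
        with self._lock:
            return list(self._segments)  # copy, so callers never see a list mid-update

    def head(self, n: int = 5) -> List[Segment]:
        return self.segments[:n]

    def tail(self, n: int = 5) -> List[Segment]:
        return self.segments[-n:] if n > 0 else []  # guard: [-0:] would return everything

    @property
    def text(self) -> str:
        return " ".join(s.text for s in self.segments)

    @property
    def stats(self) -> Dict[str, float]:
        segs = self.segments
        return {
            "segments": len(segs),
            "duration": segs[-1].end_time if segs else 0.0,  # assumption
            "words": sum(len(s.text.split()) for s in segs),
            "avg_confidence": sum(s.confidence for s in segs) / len(segs) if segs else 0.0,
        }

    def save(self, path: str) -> None:
        with open(path, "w", encoding="utf-8") as f:
            json.dump({"segments": [asdict(s) for s in self.segments]}, f, indent=2)
```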
### Microphone
Microphone input manager with quality testing.
```python
Microphone(device=None, sample_rate=16000)
```
**Class Methods:**
- `discover()` → `List[Microphone]`
- Discover all available microphones
- `test_all(transcriber, duration=5.0, playback=False)` → `List[MicrophoneTestResult]`
- Test all microphones and rank by quality (recommended!)
**Methods:**
- `record(duration=3.0)` → `AudioClip`
- Record audio for specified duration
- `test(transcriber, duration=5.0, phrase=None, playback=True)` → `MicrophoneTestResult`
- Test microphone quality with transcription
- Shows test phrase for user to read
- Returns detailed quality metrics
**Properties:**
- `name` (str) - Device name
- `channels` (int) - Number of input channels
### MicrophoneTestResult
Result from microphone quality test.
**Attributes:**
- `microphone` (Microphone) - The tested microphone
- `clip` (AudioClip) - Recorded audio (can replay with `.clip.play()`)
- `expected_text` (str) - Text user was supposed to say
- `transcribed_text` (str) - What was actually transcribed
- `confidence` (float) - Transcription confidence score
- `has_audio` (bool) - Whether audio was detected (not silent)
- `rms_level` (float) - Audio level (higher = louder)
- `match_score` (float) - How well transcription matches (0-1)
- `quality_score` (float) - Overall quality (0-1)
### AudioClip
Recorded audio wrapper.
**Methods:**
- `play()` - Play audio through default device
- `save(path)` - Save to WAV file
- `to_tensor()` - Convert to PyTorch tensor
**Properties:**
- `duration` (float) - Duration in seconds
- `num_samples` (int) - Number of samples
- `data` (np.ndarray) - Audio data array
- `sample_rate` (int) - Sample rate in Hz
### ConfigPresets
Pre-configured quality/latency presets.
**Presets:**
- `MAXIMUM_QUALITY` - Best quality (15s latency)
- `HIGH_QUALITY` - High quality (10s latency)
- `BALANCED` - Balanced (4s latency) - **Default**
- `LOW_LATENCY` - Low latency (2s latency)
- `REALTIME` - Real-time (1s latency)
- `ULTRA_REALTIME` - Ultra real-time (0.3s latency)
**Methods:**
- `get(name)` → `AudioConfig` - Get preset by name
- `list()` → `List[str]` - List all preset names
- `list_with_details()` → `str` - Formatted list with details
- `by_quality(level)` → `AudioConfig` - Get by quality level
- `by_latency(level)` → `AudioConfig` - Get by latency level
### AudioConfig
Custom audio configuration.
```python
AudioConfig(
name: str,
chunk_secs: float,
left_context_secs: float,
right_context_secs: float
)
```
**Properties:**
- `latency` (float) - Theoretical latency in seconds
- `quality_score` (int) - Quality rating (1-5)
- `quality_indicator` (str) - Visual indicator (●●●○○)
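
The documented properties can be modelled as a frozen dataclass. `AudioConfigSketch` is hypothetical: `latency` follows the formula in the Configuration Guide, while the real `quality_score` derivation is not documented, so here it is passed in explicitly.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AudioConfigSketch:
    name: str
    chunk_secs: float
    left_context_secs: float
    right_context_secs: float
    quality_score: int = 3  # 1-5; passed in because the real derivation is undocumented

    @property
    def latency(self) -> float:
        # latency = chunk_secs + right_context_secs
        return self.chunk_secs + self.right_context_secs

    @property
    def quality_indicator(self) -> str:
        return "●" * self.quality_score + "○" * (5 - self.quality_score)


balanced = AudioConfigSketch("balanced", 2.0, 10.0, 2.0, quality_score=3)
print(f"{balanced.name}: latency ~{balanced.latency}s, quality {balanced.quality_indicator}")
```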
## 📂 Examples
The `examples/` directory contains complete working examples:
### Available Examples
- **simple_transcribe.py** - Basic file transcription
- **streaming_transcribe.py** - Streaming with custom configuration
- **batch_transcribe.py** - Batch processing multiple files
- **test_microphones.py** - 🎤 **Test all microphones and find the best one**
- **microphone_simple.py** - Simple microphone recording
- **stream_microphone.py** - Full-featured live transcription
- **benchmark.py** - Compare configurations and benchmark performance
### Running Examples
```bash
# Test all microphones (recommended first step!)
python examples/test_microphones.py
# Simple transcription
python examples/simple_transcribe.py
# Live microphone (Ctrl+C to stop)
python examples/stream_microphone.py
# Save transcript to file
python examples/stream_microphone.py --output transcript.txt
# Use different quality preset
python examples/stream_microphone.py --config low_latency
# Benchmark different configurations
python examples/benchmark.py --audio audio.wav --benchmark
```
## 🌍 Supported Languages
The model automatically detects and transcribes in **25 European languages**:
Bulgarian, Croatian, Czech, Danish, Dutch, English, Estonian, Finnish, French, German, Greek, Hungarian, Italian, Latvian, Lithuanian, Maltese, Polish, Portuguese, Romanian, Russian, Slovak, Slovenian, Spanish, Swedish, Ukrainian
## 🚀 Performance
### Speed
- **CPU**: ~2-3x real-time on modern CPUs (transcribe 1 hour in 20-30 minutes)
- **GPU**: ~10x real-time on NVIDIA GPUs (transcribe 1 hour in 6 minutes)
- **Apple Silicon**: ~3-5x real-time on M1/M2/M3/M4
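
"N× real-time" means N seconds of audio are processed per second of wall-clock time, so processing time is the audio length divided by the speed-up. A quick check of the figures above:

```python
def processing_minutes(audio_minutes: float, speedup: float) -> float:
    # N x real-time: N minutes of audio are processed per wall-clock minute
    return audio_minutes / speedup


for label, speedup in [("CPU at 2x", 2.0), ("CPU at 3x", 3.0),
                       ("Apple Silicon at 5x", 5.0), ("GPU at 10x", 10.0)]:
    print(f"{label}: 1 hour of audio in ~{processing_minutes(60, speedup):.0f} min")
```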
### Memory
- **CPU**: 2-4GB RAM
- **GPU**: 2-4GB RAM + 2GB VRAM
- **Model Size**: ~600MB download
### First Run
The model (~600MB) is downloaded from Hugging Face on the first run. Subsequent runs load it from the local cache (~3-5 seconds).
## 🛠️ Development
### Setup Development Environment
```bash
# Clone repository
git clone https://github.com/maximerivest/parakeet-stream.git
cd parakeet-stream
# Install with dev dependencies
uv pip install -e ".[dev]"
# Install with microphone support
uv pip install -e ".[dev,microphone]"
```
### Running Tests
```bash
# Run all tests
pytest
# Run with coverage
pytest --cov=parakeet_stream --cov-report=html
# Run specific test file
pytest tests/test_parakeet.py
# Run specific test
pytest tests/test_parakeet.py::test_transcribe
# Run verbose
pytest -v
```
### Code Quality
```bash
# Format code
black parakeet_stream/
# Lint code
ruff check parakeet_stream/
# Type checking (if using mypy)
mypy parakeet_stream/
```
## 🐛 Troubleshooting
### Installation Issues
**Build errors during installation:**
```bash
# Install build dependencies first
pip install "Cython>=0.29.0" "numpy>=1.20.0"
# Then install the package
pip install -e .
```
**Python 3.13 compatibility:**
The package automatically installs `ml-dtypes>=0.5.0` for Python 3.13 support.
### Microphone Issues
**Linux (Ubuntu/Debian):**
```bash
sudo apt-get install portaudio19-dev
pip install sounddevice --force-reinstall
```
**Linux (Fedora/RHEL):**
```bash
sudo dnf install portaudio-devel
pip install sounddevice --force-reinstall
```
**macOS:**
```bash
brew install portaudio
pip install sounddevice --force-reinstall
```
**Test microphone:**
```python
from parakeet_stream import Microphone
# List available microphones
mics = Microphone.discover()
for mic in mics:
    print(mic)
# Test specific microphone
mic = Microphone(device=0)
clip = mic.record(2.0)
clip.play()
```
### Performance Issues
**Slow transcription:**
- Use GPU if available: `Parakeet(device="cuda")`
- Use lower quality preset: `pk.with_config('low_latency')`
- Close other applications to free RAM
- Check CPU usage - transcription is CPU-intensive
**High memory usage:**
- Use `lazy=True` for delayed loading
- Process files in smaller batches
- Reduce context window sizes with `pk.with_params()`
**Model download fails:**
```bash
# Set HuggingFace cache directory
export HF_HOME=/path/to/cache
# Or use offline mode (requires cached model)
export HF_HUB_OFFLINE=1
```
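
If you start Parakeet Stream from a Python script rather than a shell, the same variables can be set through `os.environ`. A sketch, assuming they must be set before the Hugging Face Hub client is first imported (it reads them at import time); the cache path is a placeholder.

```python
import os

# Set these before importing parakeet_stream (or anything that pulls in
# huggingface_hub), because the Hub client reads them on first import.
os.environ["HF_HOME"] = "/path/to/cache"  # placeholder cache location
os.environ["HF_HUB_OFFLINE"] = "1"        # only if the model is already cached

# from parakeet_stream import Parakeet
# pk = Parakeet()
```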
### Common Errors
**`RuntimeError: Model not loaded`:**
If using `lazy=True`, call `pk.load()` before transcribing.
**`ImportError: sounddevice is required`:**
Install microphone dependencies:
```bash
pip install "parakeet-stream[microphone]"
```
**Audio format errors:**
Ensure audio is 16kHz mono WAV. Convert with:
```bash
ffmpeg -i input.mp3 -ar 16000 -ac 1 output.wav
```
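
Before transcribing, you can check whether a WAV file already matches that format with the standard-library `wave` module, which reads uncompressed PCM WAV only. `check_wav` is a hypothetical helper, not part of Parakeet Stream.

```python
import wave
from typing import List


def check_wav(path: str, rate: int = 16000, channels: int = 1) -> List[str]:
    """Return a list of problems; an empty list means the file looks OK."""
    problems: List[str] = []
    with wave.open(path, "rb") as wav:
        if wav.getframerate() != rate:
            problems.append(f"sample rate is {wav.getframerate()} Hz, expected {rate} Hz")
        if wav.getnchannels() != channels:
            problems.append(f"{wav.getnchannels()} channels, expected {channels}")
    return problems


# issues = check_wav("audio.wav")
# if issues:
#     print("Convert with ffmpeg first:", "; ".join(issues))
```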
## 📄 License
MIT License - See LICENSE file for details.
This library uses NVIDIA's Parakeet TDT model, which is licensed under [CC-BY-4.0](https://creativecommons.org/licenses/by/4.0/).
## 🙏 Acknowledgments
- Built on [NVIDIA NeMo](https://github.com/NVIDIA/NeMo)
- Uses [Parakeet TDT 0.6b v3](https://huggingface.co/nvidia/parakeet-tdt-0.6b-v3) model
- Inspired by NVIDIA's streaming inference examples
## 📖 Citation
If you use this library in your research, please cite the Parakeet model:
```bibtex
@misc{parakeet-tdt-0.6b-v3,
title={Parakeet TDT 0.6B V3},
author={NVIDIA},
year={2025},
url={https://huggingface.co/nvidia/parakeet-tdt-0.6b-v3}
}
```
## 🤝 Contributing
Contributions are welcome! Please feel free to submit a Pull Request.
### How to Contribute
1. Fork the repository
2. Create a feature branch (`git checkout -b feature/amazing-feature`)
3. Make your changes
4. Run tests (`pytest`)
5. Commit your changes (`git commit -m 'Add amazing feature'`)
6. Push to the branch (`git push origin feature/amazing-feature`)
7. Open a Pull Request
## 🛠️ CLI Tools
Parakeet Stream includes production-ready CLI tools for server and client deployment.
### Server CLI
Install and run the transcription server:
```bash
# Run server directly with uvx (no installation needed)
uvx --from parakeet-stream parakeet-server run --host 0.0.0.0 --port 8765 --device cuda
# Or install as systemd service for production (requires sudo)
uvx --from parakeet-stream parakeet-server install
# Check service status
sudo systemctl status parakeet-server
sudo journalctl -u parakeet-server -f # View logs
```
**Server options:**
- `--host`: Host to bind to (default: 0.0.0.0)
- `--port`: Port to listen on (default: 8765)
- `--device`: Device to use (cpu, cuda, mps)
- `--config`: Quality preset (low_latency, balanced, high_quality)
- `--chunk-secs`: Audio chunk size in seconds
- `--left-context-secs`: Left context window
- `--right-context-secs`: Right context window
### Client CLI (Hotkey Transcription)
System-wide hotkey transcription that works anywhere:
```bash
# Run client with uvx (installs dependencies automatically)
uvx --from 'parakeet-stream[hotkey]' parakeet-client run \
--server ws://192.168.1.100:8765 \
--auto-paste
# Or install as user systemd service (autostart on login)
uvx --from 'parakeet-stream[hotkey]' parakeet-client install
# Check service status
systemctl --user status parakeet-hotkey
```
**Client features:**
- Press **Alt+W** to start/stop recording
- Transcription copied to clipboard automatically
- Optional auto-paste with smart terminal detection (Ctrl+Shift+V for terminals, Ctrl+V for apps)
- Transcription shown in system status bar (requires `panelstatus`)
- Works system-wide in any application
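
The terminal detection boils down to choosing a different paste shortcut for the focused window. A minimal sketch of that decision, assuming the focused window's class name has already been obtained; `paste_keys` and the set of terminal class names are a hypothetical sample, not the client's actual logic.

```python
TERMINAL_CLASSES = {  # hypothetical sample of terminal window classes
    "gnome-terminal-server",
    "konsole",
    "alacritty",
    "kitty",
    "xterm",
}


def paste_keys(window_class: str) -> str:
    # Terminals usually reserve Ctrl+V, so they get Ctrl+Shift+V instead
    if window_class.lower() in TERMINAL_CLASSES:
        return "ctrl+shift+v"
    return "ctrl+v"


# The result uses the key-combination syntax accepted by `xdotool key`, e.g.
# subprocess.run(["xdotool", "key", paste_keys(window_class)], check=True)
```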
**Client requirements:**
- Linux with X11 (requires `xdotool` for auto-paste)
- `pynput`, `panelstatus`, `pyperclip` (installed automatically with `[hotkey]` extras)
### Installation as Tools
For persistent installation:
```bash
# Install server tool
uv tool install 'parakeet-stream[server]'
# Install client tool with hotkey dependencies
uv tool install 'parakeet-stream[hotkey]'
# Now use commands directly
parakeet-server run --device cuda
parakeet-client run --server ws://localhost:8765
```
## 💬 Support
- **Documentation**: This README and inline code documentation
- **Issues**: [GitHub Issues](https://github.com/maximerivest/parakeet-stream/issues)
- **Discussions**: [GitHub Discussions](https://github.com/maximerivest/parakeet-stream/discussions)
---
**Made with ❤️ for the speech recognition community**
(nvidia/parakeet-tdt-0.6b-v3 on cpu)\n\n>>> pk\nParakeet(model='nvidia/parakeet-tdt-0.6b-v3', device='cpu', config='balanced', status='ready')\n```\n\n### IPython\n\n```python\nIn [1]: from parakeet_stream import Parakeet\nIn [2]: pk = Parakeet()\nIn [3]: pk\nOut[3]:\nParakeet(model='nvidia/parakeet-tdt-0.6b-v3', device='cpu')\n Quality: \u25cf\u25cf\u25cf\u25cb\u25cb (balanced)\n Latency: ~4.0s\n Status: \u2713 Ready\n\nIn [4]: result = pk.transcribe(\"audio.wav\")\nIn [5]: result\nOut[5]:\n\ud83d\udcdd This is a sample transcription\n Confidence: 95% \u25cf\u25cf\u25cf\u25cf\u25cf\n Duration: 5.2s\n```\n\n### Jupyter Notebooks\n\nResults display as styled HTML tables with rich formatting.\n\n### Explore Configuration\n\n```python\n>>> from parakeet_stream import ConfigPresets\n>>> ConfigPresets.list()\n['maximum_quality', 'high_quality', 'balanced', 'low_latency', 'realtime', 'ultra_realtime']\n\n>>> ConfigPresets.BALANCED\nAudioConfig(name='balanced', latency=4.0s, quality=\u25cf\u25cf\u25cf\u25cb\u25cb)\n\n>>> print(ConfigPresets.list_with_details())\nAvailable Configuration Presets:\n\n balanced:\n Chunk: 2.0s | Left: 10.0s | Right: 2.0s\n Latency: ~4.0s | Quality: \u25cf\u25cf\u25cf\u25cb\u25cb\n\n high_quality:\n Chunk: 5.0s | Left: 10.0s | Right: 5.0s\n Latency: ~10.0s | Quality: \u25cf\u25cf\u25cf\u25cf\u25cb\n ...\n```\n\n## \ud83c\udfa4 Microphone Quality Testing\n\nNot sure which microphone to use? Test them all automatically!\n\n### Test All Microphones\n\n```python\nfrom parakeet_stream import Parakeet, Microphone\n\npk = Parakeet()\n\n# Automatically test all microphones\nresults = Microphone.test_all(pk)\n```\n\n**What it does:**\n1. Discovers all available microphones\n2. Shows you a test phrase to read\n3. Records from each microphone (same phrase for fair comparison)\n4. Transcribes and evaluates quality\n5. Detects silent/broken microphones\n6. Ranks by quality score (transcription accuracy + confidence)\n7. 
Recommends the best one\n\n**Output:**\n```\n============================================================\n\ud83c\udfa4 MICROPHONE QUALITY TEST\n============================================================\n\n\ud83d\udd0d Discovering microphones...\n\u2713 Found 3 microphone(s):\n 1. Built-in Microphone (device 0)\n 2. USB Microphone (device 1)\n 3. Bluetooth Headset (device 2)\n\n\ud83d\udcdd Test phrase (same for all microphones):\n\n \"Speech recognition technology continues to improve every year\"\n\nWe'll now test each microphone. Press Enter to start...\n\n... tests each mic ...\n\n============================================================\n\ud83d\udcca RESULTS SUMMARY\n============================================================\n\nRanking (Best to Worst):\n\n1. \u2713 USB Microphone\n Device: 1\n Quality: [\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588 ] 82.3%\n Match: 85.0%\n Confidence: 92% \u25cf\u25cf\u25cf\u25cf\u25cf\n Audio Level: 0.0523\n Transcribed: \"speech recognition technology continues to improve...\"\n\n2. \u2713 Built-in Microphone\n Device: 0\n Quality: [\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588 ] 65.4%\n Match: 70.0%\n Confidence: 85% \u25cf\u25cf\u25cf\u25cf\u25cb\n Audio Level: 0.0312\n\n3. 
\u2717 Bluetooth Headset\n Device: 2\n Quality: [ ] 0.0%\n Match: 0.0%\n Audio Level: 0.0001\n \u26a0\ufe0f No audio detected\n\n\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\n\ud83c\udfc6 RECOMMENDATION\n\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\n\nBest microphone: USB Microphone\nDevice index: 1\nQuality score: 82.3%\n\nTo use this microphone:\n>>> mic = Microphone(device=1)\n>>> live = pk.listen(microphone=mic)\n\n============================================================\nTip: You can replay any recording:\n>>> results[0].clip.play() # Play best mic's recording\n============================================================\n```\n\n### Access Test Results\n\n```python\n# Get results\nresults = Microphone.test_all(pk)\n\n# Use best microphone\nbest = results[0]\nprint(f\"Best: {best.microphone.name}\")\nprint(f\"Quality: {best.quality_score:.1%}\")\n\n# Play back recordings\nbest.clip.play()\n\n# See what was transcribed\nprint(f\"Expected: {best.expected_text}\")\nprint(f\"Got: {best.transcribed_text}\")\n\n# Check metrics\nprint(f\"Match: {best.match_score:.1%}\")\nprint(f\"Confidence: {best.confidence:.1%}\")\nprint(f\"Audio level (RMS): {best.rms_level:.4f}\")\n\n# Start live transcription with best mic\nlive = pk.listen(microphone=best.microphone)\n```\n\n### Test Single Microphone\n\n```python\npk = Parakeet()\nmic = Microphone(device=1)\n\n# Test with random phrase\nresult = 
mic.test(pk, duration=5.0)\n# Shows phrase, records, transcribes, evaluates\n\n# Test with specific phrase\nresult = mic.test(pk, phrase=\"Hello world\", duration=3.0)\n\n# Skip playback (faster)\nresult = mic.test(pk, playback=False)\n```\n\n## \ud83c\udfaf Live Transcription Deep Dive\n\n### Basic Usage\n\n```python\nfrom parakeet_stream import Parakeet\n\npk = Parakeet()\n\n# Silent mode (default) - no console output\nlive = pk.listen()\n\n# Transcription runs in background\n# Check current transcript\nprint(live.text)\n\n# Get statistics\nprint(live.transcript.stats)\n# {'segments': 15, 'duration': 45.2, 'words': 234, 'avg_confidence': 0.94}\n\n# Control transcription\nlive.pause() # Pause transcription\nlive.resume() # Resume transcription\nlive.stop() # Stop completely\n\n# Verbose mode - prints to console\nlive = pk.listen(verbose=True)\n# \ud83c\udfa4 Listening on: Built-in Microphone\n# (Press Ctrl+C or call .stop() to end)\n# [2.5s] Hello world\n# [4.6s] This is a test\n```\n\n### Save to File\n\n```python\nfrom parakeet_stream import Parakeet\n\npk = Parakeet()\n\n# Transcription automatically saved to file\nlive = pk.listen(output=\"transcript.txt\")\n\n# Stop and save complete transcript\nlive.stop()\nlive.transcript.save(\"transcript.json\") # Save with metadata\n```\n\n### Custom Microphone\n\n```python\nfrom parakeet_stream import Parakeet, Microphone\n\n# Use specific microphone\nmic = Microphone(device=1) # USB microphone\n\npk = Parakeet()\nlive = pk.listen(microphone=mic)\n```\n\n### Access Segments\n\n```python\nfrom parakeet_stream import Parakeet\n\npk = Parakeet()\nlive = pk.listen()\n\n# Wait for some transcription...\n\n# Get all segments\nfor segment in live.transcript.segments:\n    print(f\"[{segment.start_time:.1f}s - {segment.end_time:.1f}s] {segment.text}\")\n\n# Get last 5 segments\nrecent = live.transcript.tail(5)\n\n# Get first 5 segments\nbeginning = live.transcript.head(5)\n```\n\n## \ud83d\udcda API Reference\n\n### Parakeet\n\nMain interface for transcription.\n\n```python\nParakeet(\n model_name: str = 
\"nvidia/parakeet-tdt-0.6b-v3\",\n device: str = \"cpu\",\n config: Union[str, AudioConfig] = \"balanced\",\n lazy: bool = False\n)\n```\n\n**Methods:**\n\n- `transcribe(audio, timestamps=False)` \u2192 `TranscriptResult`\n - Transcribe audio file or array\n\n- `stream(audio)` \u2192 `Generator[StreamChunk]`\n - Stream transcription results as chunks\n\n- `transcribe_batch(audio_files, timestamps=False, show_progress=True)` \u2192 `List[TranscriptResult]`\n - Batch transcribe multiple files\n\n- `listen(microphone=None, output=None, chunk_duration=None, verbose=False)` \u2192 `LiveTranscriber`\n - Start live microphone transcription (silent by default)\n\n**Configuration Methods (Chainable):**\n\n- `with_config(config)` \u2192 `Parakeet`\n - Set configuration preset or custom AudioConfig\n\n- `with_quality(level)` \u2192 `Parakeet`\n - Set quality level: 'max', 'high', 'good', 'low', 'realtime'\n\n- `with_latency(level)` \u2192 `Parakeet`\n - Set latency level: 'high', 'medium', 'low', 'realtime'\n\n- `with_params(chunk_secs=None, left_context_secs=None, right_context_secs=None)` \u2192 `Parakeet`\n - Set custom parameters\n\n**Properties:**\n\n- `config` - Current AudioConfig\n- `configs` - Access to ConfigPresets\n\n### TranscriptResult\n\nRich result object from transcription.\n\n**Attributes:**\n- `text` (str) - Transcribed text\n- `confidence` (float) - Confidence score (0.0-1.0)\n- `duration` (float) - Audio duration in seconds\n- `timestamps` (List[dict]) - Word-level timestamps (if enabled)\n- `word_count` (int) - Number of words\n- `has_timestamps` (bool) - Whether timestamps are available\n\n### LiveTranscriber\n\nBackground live transcription manager.\n\nRuns silently by default - transcription happens in background without console output.\nUse `verbose=True` to print transcriptions to console.\n\n**Methods:**\n\n- `start()` - Start transcription (called automatically by `pk.listen()`)\n- `pause()` - Pause transcription\n- `resume()` - Resume 
transcription\n- `stop()` - Stop transcription\n\n**Properties:**\n\n- `text` (str) - Current full transcript\n- `transcript` (TranscriptBuffer) - Buffer with all segments\n- `is_running` (bool) - Whether currently running\n- `is_paused` (bool) - Whether currently paused\n- `elapsed` (float) - Elapsed time in seconds\n- `verbose` (bool) - Whether console output is enabled\n\n### TranscriptBuffer\n\nThread-safe buffer for live transcription segments.\n\n**Methods:**\n\n- `append(segment)` - Add segment\n- `save(path)` - Save to JSON file\n- `head(n=5)` - Get first n segments\n- `tail(n=5)` - Get last n segments\n\n**Properties:**\n\n- `text` (str) - Full text (all segments joined)\n- `segments` (List[Segment]) - All segments\n- `stats` (dict) - Statistics (segments, duration, words, avg_confidence)\n\n### Microphone\n\nMicrophone input manager with quality testing.\n\n```python\nMicrophone(device=None, sample_rate=16000)\n```\n\n**Class Methods:**\n\n- `discover()` \u2192 `List[Microphone]`\n - Discover all available microphones\n\n- `test_all(transcriber, duration=5.0, playback=False)` \u2192 `List[MicrophoneTestResult]`\n - Test all microphones and rank by quality (recommended!)\n\n**Methods:**\n\n- `record(duration=3.0)` \u2192 `AudioClip`\n - Record audio for specified duration\n\n- `test(transcriber, duration=5.0, phrase=None, playback=True)` \u2192 `MicrophoneTestResult`\n - Test microphone quality with transcription\n - Shows test phrase for user to read\n - Returns detailed quality metrics\n\n**Properties:**\n\n- `name` (str) - Device name\n- `channels` (int) - Number of input channels\n\n### MicrophoneTestResult\n\nResult from microphone quality test.\n\n**Attributes:**\n\n- `microphone` (Microphone) - The tested microphone\n- `clip` (AudioClip) - Recorded audio (can replay with `.clip.play()`)\n- `expected_text` (str) - Text user was supposed to say\n- `transcribed_text` (str) - What was actually transcribed\n- `confidence` (float) - Transcription 
confidence score\n- `has_audio` (bool) - Whether audio was detected (not silent)\n- `rms_level` (float) - Audio level (higher = louder)\n- `match_score` (float) - How well transcription matches (0-1)\n- `quality_score` (float) - Overall quality (0-1)\n\n### AudioClip\n\nRecorded audio wrapper.\n\n**Methods:**\n\n- `play()` - Play audio through default device\n- `save(path)` - Save to WAV file\n- `to_tensor()` - Convert to PyTorch tensor\n\n**Properties:**\n\n- `duration` (float) - Duration in seconds\n- `num_samples` (int) - Number of samples\n- `data` (np.ndarray) - Audio data array\n- `sample_rate` (int) - Sample rate in Hz\n\n### ConfigPresets\n\nPre-configured quality/latency presets.\n\n**Presets:**\n\n- `MAXIMUM_QUALITY` - Best quality (15s latency)\n- `HIGH_QUALITY` - High quality (10s latency)\n- `BALANCED` - Balanced (4s latency) - **Default**\n- `LOW_LATENCY` - Low latency (2s latency)\n- `REALTIME` - Real-time (1s latency)\n- `ULTRA_REALTIME` - Ultra real-time (0.3s latency)\n\n**Methods:**\n\n- `get(name)` \u2192 `AudioConfig` - Get preset by name\n- `list()` \u2192 `List[str]` - List all preset names\n- `list_with_details()` \u2192 `str` - Formatted list with details\n- `by_quality(level)` \u2192 `AudioConfig` - Get by quality level\n- `by_latency(level)` \u2192 `AudioConfig` - Get by latency level\n\n### AudioConfig\n\nCustom audio configuration.\n\n```python\nAudioConfig(\n name: str,\n chunk_secs: float,\n left_context_secs: float,\n right_context_secs: float\n)\n```\n\n**Properties:**\n\n- `latency` (float) - Theoretical latency in seconds\n- `quality_score` (int) - Quality rating (1-5)\n- `quality_indicator` (str) - Visual indicator (\u25cf\u25cf\u25cf\u25cb\u25cb)\n\n## \ud83d\udcc2 Examples\n\nThe `examples/` directory contains complete working examples:\n\n### Available Examples\n\n- **simple_transcribe.py** - Basic file transcription\n- **streaming_transcribe.py** - Streaming with custom configuration\n- **batch_transcribe.py** - Batch 
processing multiple files\n- **test_microphones.py** - \ud83c\udfa4 **Test all microphones and find the best one**\n- **microphone_simple.py** - Simple microphone recording\n- **stream_microphone.py** - Full-featured live transcription\n- **benchmark.py** - Compare configurations and benchmark performance\n\n### Running Examples\n\n```bash\n# Test all microphones (recommended first step!)\npython examples/test_microphones.py\n\n# Simple transcription\npython examples/simple_transcribe.py\n\n# Live microphone (Ctrl+C to stop)\npython examples/stream_microphone.py\n\n# Save transcript to file\npython examples/stream_microphone.py --output transcript.txt\n\n# Use different quality preset\npython examples/stream_microphone.py --config low_latency\n\n# Benchmark different configurations\npython examples/benchmark.py --audio audio.wav --benchmark\n```\n\n## \ud83c\udf0d Supported Languages\n\nThe model automatically detects and transcribes in **25 European languages**:\n\nBulgarian, Croatian, Czech, Danish, Dutch, English, Estonian, Finnish, French, German, Greek, Hungarian, Italian, Latvian, Lithuanian, Maltese, Polish, Portuguese, Romanian, Russian, Slovak, Slovenian, Spanish, Swedish, Ukrainian\n\n## \ud83d\ude80 Performance\n\n### Speed\n\n- **CPU**: ~2-3x real-time on modern CPUs (transcribe 1 hour in 20-30 minutes)\n- **GPU**: ~10x real-time on NVIDIA GPUs (transcribe 1 hour in 6 minutes)\n- **Apple Silicon**: ~3-5x real-time on M1/M2/M3/M4\n\n### Memory\n\n- **CPU**: 2-4GB RAM\n- **GPU**: 2-4GB RAM + 2GB VRAM\n- **Model Size**: ~600MB download\n\n### First Run\n\nModel downloads from HuggingFace on first run (~600MB). 
Subsequent runs load from cache (~3-5 seconds).\n\n## \ud83d\udee0\ufe0f Development\n\n### Setup Development Environment\n\n```bash\n# Clone repository\ngit clone https://github.com/maximerivest/parakeet-stream.git\ncd parakeet-stream\n\n# Install with dev dependencies\nuv pip install -e \".[dev]\"\n\n# Install with microphone support\nuv pip install -e \".[dev,microphone]\"\n```\n\n### Running Tests\n\n```bash\n# Run all tests\npytest\n\n# Run with coverage\npytest --cov=parakeet_stream --cov-report=html\n\n# Run specific test file\npytest tests/test_parakeet.py\n\n# Run specific test\npytest tests/test_parakeet.py::test_transcribe\n\n# Run verbose\npytest -v\n```\n\n### Code Quality\n\n```bash\n# Format code\nblack parakeet_stream/\n\n# Lint code\nruff check parakeet_stream/\n\n# Type checking (if using mypy)\nmypy parakeet_stream/\n```\n\n## \ud83d\udc1b Troubleshooting\n\n### Installation Issues\n\n**Build errors during installation:**\n\n```bash\n# Install build dependencies first\npip install \"Cython>=0.29.0\" \"numpy>=1.20.0\"\n\n# Then install the package\npip install -e .\n```\n\n**Python 3.13 compatibility:**\n\nThe package automatically installs `ml-dtypes>=0.5.0` for Python 3.13 support.\n\n### Microphone Issues\n\n**Linux (Ubuntu/Debian):**\n\n```bash\nsudo apt-get install portaudio19-dev\npip install sounddevice --force-reinstall\n```\n\n**Linux (Fedora/RHEL):**\n\n```bash\nsudo dnf install portaudio-devel\npip install sounddevice --force-reinstall\n```\n\n**macOS:**\n\n```bash\nbrew install portaudio\npip install sounddevice --force-reinstall\n```\n\n**Test microphone:**\n\n```python\nfrom parakeet_stream import Microphone\n\n# List available microphones\nmics = Microphone.discover()\nfor mic in mics:\n print(mic)\n\n# Test specific microphone\nmic = Microphone(device=0)\nclip = mic.record(2.0)\nclip.play()\n```\n\n### Performance Issues\n\n**Slow transcription:**\n\n- Use GPU if available: `Parakeet(device=\"cuda\")`\n- Use lower quality preset: 
`pk.with_config('low_latency')`\n- Close other applications to free RAM\n- Check CPU usage - transcription is CPU-intensive\n\n**High memory usage:**\n\n- Use `lazy=True` for delayed loading\n- Process files in smaller batches\n- Reduce context window sizes with `pk.with_params()`\n\n**Model download fails:**\n\n```bash\n# Set HuggingFace cache directory\nexport HF_HOME=/path/to/cache\n\n# Or use offline mode (requires cached model)\nexport HF_HUB_OFFLINE=1\n```\n\n### Common Errors\n\n**`RuntimeError: Model not loaded`:**\n\nIf using `lazy=True`, call `pk.load()` before transcribing.\n\n**`ImportError: sounddevice is required`:**\n\nInstall microphone dependencies:\n```bash\npip install \"parakeet-stream[microphone]\"\n```\n\n**Audio format errors:**\n\nEnsure audio is 16kHz mono WAV. Convert with:\n```bash\nffmpeg -i input.mp3 -ar 16000 -ac 1 output.wav\n```\n\n## \ud83d\udcc4 License\n\nMIT License - See LICENSE file for details.\n\nThis library uses NVIDIA's Parakeet TDT model, which is licensed under [CC-BY-4.0](https://creativecommons.org/licenses/by/4.0/).\n\n## \ud83d\ude4f Acknowledgments\n\n- Built on [NVIDIA NeMo](https://github.com/NVIDIA/NeMo)\n- Uses [Parakeet TDT 0.6b v3](https://huggingface.co/nvidia/parakeet-tdt-0.6b-v3) model\n- Inspired by NVIDIA's streaming inference examples\n\n## \ud83d\udcd6 Citation\n\nIf you use this library in your research, please cite the Parakeet model:\n\n```bibtex\n@misc{parakeet-tdt-0.6b-v3,\n title={Parakeet TDT 0.6B V3},\n author={NVIDIA},\n year={2025},\n url={https://huggingface.co/nvidia/parakeet-tdt-0.6b-v3}\n}\n```\n\n## \ud83e\udd1d Contributing\n\nContributions are welcome! Please feel free to submit a Pull Request.\n\n### How to Contribute\n\n1. Fork the repository\n2. Create a feature branch (`git checkout -b feature/amazing-feature`)\n3. Make your changes\n4. Run tests (`pytest`)\n5. Commit your changes (`git commit -m 'Add amazing feature'`)\n6. 
Push to the branch (`git push origin feature/amazing-feature`)\n7. Open a Pull Request\n\n## \ud83d\udee0\ufe0f CLI Tools\n\nParakeet Stream includes production-ready CLI tools for server and client deployment.\n\n### Server CLI\n\nInstall and run the transcription server:\n\n```bash\n# Run server directly with uvx (no installation needed)\nuvx --from parakeet-stream parakeet-server run --host 0.0.0.0 --port 8765 --device cuda\n\n# Or install as systemd service for production (requires sudo)\nuvx --from parakeet-stream parakeet-server install\n\n# Check service status\nsudo systemctl status parakeet-server\nsudo journalctl -u parakeet-server -f # View logs\n```\n\n**Server options:**\n- `--host`: Host to bind to (default: 0.0.0.0)\n- `--port`: Port to listen on (default: 8765)\n- `--device`: Device to use (cpu, cuda, mps)\n- `--config`: Quality preset (low_latency, balanced, high_quality)\n- `--chunk-secs`: Audio chunk size in seconds\n- `--left-context-secs`: Left context window\n- `--right-context-secs`: Right context window\n\n### Client CLI (Hotkey Transcription)\n\nSystem-wide hotkey transcription that works anywhere:\n\n```bash\n# Run client with uvx (installs dependencies automatically)\nuvx --from 'parakeet-stream[hotkey]' parakeet-client run \\\n --server ws://192.168.1.100:8765 \\\n --auto-paste\n\n# Or install as user systemd service (autostart on login)\nuvx --from 'parakeet-stream[hotkey]' parakeet-client install\n\n# Check service status\nsystemctl --user status parakeet-hotkey\n```\n\n**Client features:**\n- Press **Alt+W** to start/stop recording\n- Transcription copied to clipboard automatically\n- Optional auto-paste with smart terminal detection (Ctrl+Shift+V for terminals, Ctrl+V for apps)\n- Transcription shown in system status bar (requires `panelstatus`)\n- Works system-wide in any application\n\n**Client requirements:**\n- Linux with X11 (requires `xdotool` for auto-paste)\n- `pynput`, `panelstatus`, `pyperclip` (installed automatically with 
`[hotkey]` extras)\n\n### Installation as Tools\n\nFor persistent installation:\n\n```bash\n# Install server tool\nuv tool install 'parakeet-stream[server]'\n\n# Install client tool with hotkey dependencies\nuv tool install 'parakeet-stream[hotkey]'\n\n# Now use commands directly\nparakeet-server run --device cuda\nparakeet-client run --server ws://localhost:8765\n```\n\n## \ud83d\udcac Support\n\n- **Documentation**: This README and inline code documentation\n- **Issues**: [GitHub Issues](https://github.com/maximerivest/parakeet-stream/issues)\n- **Discussions**: [GitHub Discussions](https://github.com/maximerivest/parakeet-stream/discussions)\n\n---\n\n**Made with \u2764\ufe0f for the speech recognition community**\n",
"bugtrack_url": null,
"license": "MIT",
"summary": "Simple, powerful streaming transcription for Python using NVIDIA's Parakeet TDT 0.6b",
"version": "0.5.0",
"project_urls": {
"Documentation": "https://github.com/maximerivest/parakeet-stream/blob/main/README.md",
"Homepage": "https://github.com/maximerivest/parakeet-stream",
"Repository": "https://github.com/maximerivest/parakeet-stream"
},
"split_keywords": [
"asr",
" nemo",
" nvidia",
" parakeet",
" speech-recognition",
" streaming",
" transcription"
],
"urls": [
{
"comment_text": null,
"digests": {
"blake2b_256": "5d6bbdfdf19ebb244a17f641f22c92b4c0d3925efbd18a515a9528bed729b88d",
"md5": "916540628ffb10fe82ff69627ede1373",
"sha256": "e43438534d45a65b464e8c41987c8cf2843be2ff51e2b55856fe90996fce18c2"
},
"downloads": -1,
"filename": "parakeet_stream-0.5.0-py3-none-any.whl",
"has_sig": false,
"md5_digest": "916540628ffb10fe82ff69627ede1373",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": "<3.14,>=3.9",
"size": 59016,
"upload_time": "2025-10-13T02:10:59",
"upload_time_iso_8601": "2025-10-13T02:10:59.555537Z",
"url": "https://files.pythonhosted.org/packages/5d/6b/bdfdf19ebb244a17f641f22c92b4c0d3925efbd18a515a9528bed729b88d/parakeet_stream-0.5.0-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": null,
"digests": {
"blake2b_256": "cd06dc0d17db7170267464ff041c22e6aef223006c96563fb81f534384f062aa",
"md5": "117cf0cd553f5973179d130e550ec763",
"sha256": "e7bda4b2e06b66ae799edb004438fa68470a41113a6a58b63ef76adc35d6ebfe"
},
"downloads": -1,
"filename": "parakeet_stream-0.5.0.tar.gz",
"has_sig": false,
"md5_digest": "117cf0cd553f5973179d130e550ec763",
"packagetype": "sdist",
"python_version": "source",
"requires_python": "<3.14,>=3.9",
"size": 477143,
"upload_time": "2025-10-13T02:11:01",
"upload_time_iso_8601": "2025-10-13T02:11:01.076855Z",
"url": "https://files.pythonhosted.org/packages/cd/06/dc0d17db7170267464ff041c22e6aef223006c96563fb81f534384f062aa/parakeet_stream-0.5.0.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2025-10-13 02:11:01",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "maximerivest",
"github_project": "parakeet-stream",
"travis_ci": false,
"coveralls": false,
"github_actions": false,
"lcname": "parakeet-stream"
}