| Field | Value |
|-------|-------|
| Name | abstractvoice |
| Version | 0.5.1 |
| Summary | A modular Python library for voice interactions with AI systems |
| Author email | Laurent-Philippe Albou <contact@abstractcore.ai> |
| Requires Python | >=3.8 |
| Upload time | 2025-10-21 21:41:21 |
| Download | [abstractvoice-0.5.1.tar.gz](https://files.pythonhosted.org/packages/6b/e1/b97321ef76f57ece215c47905ed0f52092a79600af29b836cc0f6bb2b970/abstractvoice-0.5.1.tar.gz) |
# AbstractVoice
A modular Python library for voice interactions with AI systems, providing text-to-speech (TTS) and speech-to-text (STT) capabilities with interrupt handling.
While we provide CLI and web examples, AbstractVoice is designed to be integrated into other projects.
## Features
- **High-Quality TTS**: Best-in-class speech synthesis with the VITS model
  - Natural prosody and intonation
  - Adjustable speed without pitch distortion (using librosa time-stretching)
  - Multiple quality levels (VITS best, fast_pitch fallback)
  - Automatic fallback if espeak-ng is not installed
- **Cross-Platform**: Works on macOS, Linux, and Windows
  - Best quality: install espeak-ng (easy on all platforms)
  - Fallback mode: works without any system dependencies
- **Speech-to-Text**: Accurate voice recognition using OpenAI's Whisper
- **Voice Activity Detection**: Efficient speech detection using WebRTC VAD (sketched just after this list)
- **Interrupt Handling**: Stop TTS by speaking or using stop commands
- **Modular Design**: Easily integrate with any text generation system
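As a concrete picture of the VAD feature above, here is a minimal sketch using the `webrtcvad` package directly (the same detector AbstractVoice builds on). It is illustrative only, not AbstractVoice's internal code, and assumes 16 kHz, 16-bit mono PCM input:

```python
# Minimal WebRTC VAD sketch; webrtcvad only accepts 10/20/30 ms frames.
import webrtcvad

vad = webrtcvad.Vad(2)  # aggressiveness 0-3; higher rejects more non-speech
SAMPLE_RATE = 16000
FRAME_MS = 30
FRAME_BYTES = int(SAMPLE_RATE * FRAME_MS / 1000) * 2  # 2 bytes per 16-bit sample

def is_speech(frame: bytes) -> bool:
    """Return True if one 30 ms PCM frame contains speech."""
    assert len(frame) == FRAME_BYTES, "frame must be exactly 30 ms of audio"
    return vad.is_speech(frame, SAMPLE_RATE)
```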
Note: *the bundled LLM access is rudimentary; abstractvoice is provided mainly as an example and demonstrator. For a better integration, use this library's functionality directly in combination with [AbstractCore](https://github.com/lpalbou/AbstractCore)*.
## Installation
AbstractVoice is designed to **work everywhere, out of the box** with automatic quality upgrades.
### 🚀 Quick Start (Recommended)
```bash
# One command installation - works on all systems
pip install abstractvoice[all]
# Verify it works
python -c "from abstractvoice import VoiceManager; print('✅ Ready to go!')"
```
**That's it!** AbstractVoice automatically:
- ✅ **Works everywhere** - Uses reliable models that run on any system
- ✅ **Auto-upgrades quality** - Detects when better models are available
- ✅ **No system dependencies required** - Pure Python installation
- ✅ **Optional quality boost** - Install `espeak-ng` for premium voices
### Installation Options
```bash
# Minimal (just 2 dependencies)
pip install abstractvoice
# Add features as needed
pip install abstractvoice[tts] # Text-to-speech
pip install abstractvoice[stt] # Speech-to-text
pip install abstractvoice[all] # Everything (recommended)
# Language-specific
pip install abstractvoice[fr] # French with all features
pip install abstractvoice[de] # German with all features
```
### Optional Quality Upgrade
For the **absolute best voice quality**, install espeak-ng:
```bash
# macOS
brew install espeak-ng
# Linux
sudo apt-get install espeak-ng
# Windows
conda install espeak-ng
```
AbstractVoice automatically detects espeak-ng and upgrades to premium quality voices when available.
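The detection step can be pictured with a few lines of standard-library Python. This is a hedged sketch of the documented behavior (VITS when espeak-ng is present, fast_pitch otherwise), not AbstractVoice's actual internals:

```python
# Sketch: choose a TTS model based on whether espeak-ng is on the PATH.
import shutil

def pick_tts_model() -> str:
    if shutil.which("espeak-ng"):                # espeak-ng installed
        return "tts_models/en/ljspeech/vits"     # premium quality
    return "tts_models/en/ljspeech/fast_pitch"   # dependency-free fallback

print(f"Selected TTS model: {pick_tts_model()}")
```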
## Quick Start
### ⚡ Instant TTS (v0.5.0+)
```python
from abstractvoice import VoiceManager
# Initialize voice manager - works immediately with included dependencies
vm = VoiceManager()
# Text-to-speech works right away!
vm.speak("Hello! TTS works out of the box!")
# Language switching with automatic model download
vm.set_language('fr')
vm.speak("Bonjour! Le français fonctionne aussi!")
```
**That's it!** AbstractVoice v0.5.0+ automatically:
- ✅ Includes essential TTS dependencies in base installation
- ✅ Downloads models automatically when switching languages/voices
- ✅ Works immediately after `pip install abstractvoice`
- ✅ No silent failures - clear error messages if download fails
- ✅ No complex configuration needed
### 🌍 Multi-Language Support (Auto-Download in v0.5.0+)
```python
# Simply switch language - downloads model automatically if needed!
vm.set_language('fr')
vm.speak("Bonjour! Je parle français maintenant.")
# Switch to German - no manual download needed
vm.set_language('de')
vm.speak("Hallo! Ich spreche jetzt Deutsch.")
# Spanish, Italian also supported
vm.set_language('es')
vm.speak("¡Hola! Hablo español ahora.")
# If download fails, you'll get clear error messages with instructions
# Example: "❌ Cannot switch to French: Model download failed"
# " Try: abstractvoice download-models --language fr"
```
**New in v0.5.0:** No more manual `download_model()` calls! Language switching handles downloads automatically.
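If you want to handle a failed download programmatically rather than just surface the message, you can wrap the switch in a try/except. The exact exception type AbstractVoice raises is an assumption here; the docs only promise a clear error message:

```python
# Hedged sketch: guard a language switch against a failed model download.
from abstractvoice import VoiceManager

vm = VoiceManager()
try:
    vm.set_language('fr')   # downloads the French model if missing
    vm.speak("Bonjour!")
except Exception as e:      # network issues, disk space, etc. (assumed type)
    print(f"Could not switch language: {e}")
    print("Try: abstractvoice download-models --language fr")
```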
### 🔧 Check System Status
```python
from abstractvoice import is_ready, get_status, list_models
import json
# Quick readiness check
ready = is_ready()
print(f"TTS ready: {ready}")
# Get detailed status
status = json.loads(get_status())
print(f"Models cached: {status['total_cached']}")
print(f"Offline ready: {status['ready_for_offline']}")
# List all available models
models = json.loads(list_models())
for lang, voices in models.items():
    print(f"{lang}: {len(voices)} voices available")
```
### 🎤 Speech-to-Text with Callbacks

```python
# Speech-to-text with callbacks
def on_transcription(text):
    print(f"You said: {text}")
    # Process the transcription
    vm.speak(f"I heard you say: {text}")

def on_stop():
    print("Stopping voice interaction")
# Start listening
vm.listen(on_transcription, on_stop)
# The voice manager will automatically pause listening when speaking
# and resume when done to prevent feedback loops
```
## Additional Examples
### Language-Specific Usage
```python
# French voice
vm_fr = VoiceManager(language='fr')
vm_fr.speak("Bonjour! Je peux parler français.")
# Spanish voice
vm_es = VoiceManager(language='es')
vm_es.speak("¡Hola! Puedo hablar español.")
# Dynamic language switching
vm.set_language('fr') # Switch to French
vm.set_language('en') # Switch back to English
```
### Advanced Configuration
```python
from abstractvoice import VoiceManager
# Custom TTS model selection
vm = VoiceManager(
    language='en',
    tts_model='tts_models/en/ljspeech/fast_pitch',  # Specific model
    whisper_model='base',  # Larger Whisper model for better accuracy
    debug_mode=True
)
# Speed control
vm.set_speed(1.5) # 1.5x speed
vm.speak("This text will be spoken faster.")
# Model switching at runtime
vm.set_tts_model('tts_models/en/ljspeech/vits') # Switch to VITS
vm.set_whisper('small') # Switch to larger Whisper model
```
### Error Handling and Graceful Degradation
AbstractVoice is designed to provide helpful error messages and fall back gracefully:
```python
# If you install just the basic package
# pip install abstractvoice
from abstractvoice import VoiceManager # This works fine
try:
    vm = VoiceManager()  # This will fail with a helpful message
except ImportError as e:
    print(e)
    # Output: "TTS functionality requires optional dependencies. Install with:
    #   pip install abstractvoice[tts]   # For TTS only
    #   pip install abstractvoice[all]   # For all features"
# Missing espeak-ng automatically falls back to compatible models
# Missing dependencies show clear installation instructions
# All errors are graceful with helpful guidance
```
## CLI and Web Examples
AbstractVoice includes example applications to demonstrate its capabilities:
### Using AbstractVoice from the Command Line
The easiest way to get started is to use AbstractVoice directly from your shell:
```bash
# Start AbstractVoice in voice mode (TTS ON, STT ON)
abstractvoice
# → Automatically uses VITS if espeak-ng installed (best quality)
# → Falls back to fast_pitch if espeak-ng not found
# Or start with custom settings
abstractvoice --model gemma3:latest --whisper base
# Start in text-only mode (TTS enabled, listening disabled)
abstractvoice --no-listening
```
Once started, you can interact with the AI using voice or text. Use `/help` to see all available commands.
**Note**: AbstractVoice automatically selects the best available TTS model. For best quality, install espeak-ng (see Installation section above).
### Integrating AbstractVoice in Your Python Project
Here's a simple example of how to integrate AbstractVoice into your own application:
```python
from abstractvoice import VoiceManager
import time
# Initialize voice manager
voice_manager = VoiceManager(debug_mode=False)
# Text to speech
voice_manager.speak("Hello, I am an AI assistant. How can I help you today?")
# Wait for speech to complete
while voice_manager.is_speaking():
    time.sleep(0.1)

# Speech to text with callback
def on_transcription(text):
    print(f"User said: {text}")
    if text.lower() != "stop":
        # Process with your text generation system
        response = f"You said: {text}"
        voice_manager.speak(response)

# Start voice recognition
voice_manager.listen(on_transcription)

# Wait for user to say "stop" or press Ctrl+C
try:
    while voice_manager.is_listening():
        time.sleep(0.1)
except KeyboardInterrupt:
    pass
# Clean up
voice_manager.cleanup()
```
## Running Examples
The package includes several examples that demonstrate different ways to use AbstractVoice.
### Voice Mode (Default)
If installed globally, you can launch AbstractVoice directly in voice mode:
```bash
# Start AbstractVoice in voice mode (TTS ON, STT ON)
abstractvoice
# With options
abstractvoice --debug --whisper base --model gemma3:latest --api http://localhost:11434/api/chat
```
**Command line options:**
- `--debug` - Enable debug mode with detailed logging
- `--api <url>` - URL of the Ollama API (default: http://localhost:11434/api/chat)
- `--model <name>` - Ollama model to use (default: granite3.3:2b)
  - Examples: cogito:3b, phi4-mini:latest, qwen2.5:latest, gemma3:latest, etc.
- `--whisper <model>` - Whisper model to use (default: tiny)
  - Options: tiny, base, small, medium, large
- `--no-listening` - Disable speech-to-text (listening); TTS still works
  - **Note**: This creates a "TTS-only" mode where you type and the AI speaks back
- `--system <prompt>` - Custom system prompt
### 🎯 Complete CLI Interface (v0.3.0+)
AbstractVoice provides a unified command interface for all functionality:
```bash
# Voice mode (default)
abstractvoice # Interactive voice mode with AI
abstractvoice --model cogito:3b # With custom Ollama model
abstractvoice --language fr # French voice mode
# Examples and utilities
abstractvoice cli # CLI REPL for text interaction
abstractvoice web # Web API server
abstractvoice simple # Simple TTS/STT demonstration
abstractvoice check-deps # Check dependency compatibility
abstractvoice help # Show available commands
# Get help
abstractvoice --help # Complete help with all options
```
**All functionality through one command!** No more confusion between different entry points.
### Command-Line REPL
```bash
# Run the CLI example (TTS ON, STT OFF)
abstractvoice cli
# With debug mode
abstractvoice cli --debug
# With specific language
abstractvoice cli --language fr
```
#### REPL Commands
All commands must start with `/` except `stop`:
**Basic Commands:**
- `/exit`, `/q`, `/quit` - Exit REPL
- `/clear` - Clear conversation history
- `/help` - Show help information
- `stop` - Stop voice mode or TTS (voice command, no `/` needed)
**Voice & Audio:**
- `/tts on|off` - Toggle text-to-speech
- `/voice <mode>` - Voice input modes:
  - `off` - Disable voice input
  - `full` - Continuous listening, interrupts TTS on speech detection
  - `wait` - Pause listening while speaking (recommended, reduces self-interruption)
  - `stop` - Only stop on the 'stop' keyword (planned)
  - `ptt` - Push-to-talk mode (planned)
- `/speed <number>` - Set TTS speed (0.5-2.0, default: 1.0, **pitch preserved**)
- `/tts_model <model>` - Switch TTS model:
  - `vits` - **Best quality** (requires espeak-ng)
  - `fast_pitch` - Good quality (works everywhere)
  - `glow-tts` - Alternative (similar quality to fast_pitch)
  - `tacotron2-DDC` - Legacy (slower, lower quality)
- `/whisper <model>` - Switch Whisper model (tiny|base|small|medium|large)
- `/stop` - Stop voice mode or TTS playback
- `/pause` - Pause current TTS playback (can be resumed)
- `/resume` - Resume paused TTS playback
**LLM Configuration:**
- `/model <name>` - Change LLM model (e.g., `/model gemma3:latest`)
- `/system <prompt>` - Set system prompt (e.g., `/system You are a helpful coding assistant`)
- `/temperature <val>` - Set temperature (0.0-2.0, default: 0.7)
- `/max_tokens <num>` - Set max tokens (default: 4096)
**Chat Management:**
- `/save <filename>` - Save chat history (e.g., `/save conversation`)
- `/load <filename>` - Load chat history (e.g., `/load conversation`)
- `/tokens` - Display token usage statistics
**Sending Messages:**
- `<message>` - Any text without `/` prefix is sent to the LLM
**Note**: Commands without `/` (except `stop`) are sent to the LLM as regular messages.
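To make the routing convention concrete, here is a toy dispatcher. This is not the library's implementation, just the documented rules: a bare `stop` is special, a leading `/` marks a command, and anything else goes to the LLM.

```python
# Toy sketch of the REPL routing convention (not AbstractVoice's code).
def route(line: str) -> str:
    line = line.strip()
    if line == "stop":
        return "voice-stop"  # the one command without a slash
    if line.startswith("/"):
        command, _, args = line[1:].partition(" ")
        return f"command:{command} args:{args!r}"
    return "send-to-llm"

print(route("/speed 1.2"))   # command:speed args:'1.2'
print(route("stop"))         # voice-stop
print(route("hello there"))  # send-to-llm
```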
### Web API
```bash
# Run the web API example
abstractvoice web
# With different host and port
abstractvoice web --host 0.0.0.0 --port 8000
```
You can also run a simplified version that doesn't load the full models:
```bash
# Run the web API with simulation mode
abstractvoice web --simulate
```
#### Troubleshooting Web API
If you encounter issues with the web API:
1. **404 Not Found**: Make sure you're accessing the correct endpoints (e.g., `/api/test`, `/api/tts`; see the request sketch after this list)
2. **Connection Issues**: Ensure no other service is using the port
3. **Model Loading Errors**: Try running with `--simulate` flag to test without loading models
4. **Dependencies**: Ensure all required packages are installed:
   ```bash
   pip install flask soundfile numpy requests
   ```
5. **Test with a simple Flask script**:
   ```python
   from flask import Flask
   app = Flask(__name__)

   @app.route('/')
   def home():
       return "Flask works!"

   app.run(host='127.0.0.1', port=5000)
   ```
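To sanity-check a running server from Python (see item 1), you can probe the test endpoint with `requests`. The URL below assumes the host and port from the example above (`--host 0.0.0.0 --port 8000`); only the HTTP status is checked, since the response body format is not documented:

```python
# Hedged sketch: probe the example web API's test endpoint.
import requests

resp = requests.get("http://127.0.0.1:8000/api/test", timeout=5)  # adjust host/port
print(f"/api/test -> HTTP {resp.status_code}")
```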
### Simple Demo
```bash
# Run the simple example
abstractvoice simple
```
## Documentation
### 📚 Documentation Overview
- **[README.md](README.md)** - This file: User guide, API reference, and examples
- **[CONTRIBUTING.md](CONTRIBUTING.md)** - Contribution guidelines and development setup
- **[CHANGELOG.md](CHANGELOG.md)** - Version history and release notes
- **[docs/](docs/)** - Technical documentation for developers
### 🎯 Quick Navigation
- **Getting Started**: [Installation](#installation) and [Quick Start](#quick-start)
- **Pause/Resume Control**: [TTS Control](#quick-reference-tts-control) section
- **Integration Examples**: [Integration Guide](#integration-guide-for-third-party-applications)
- **Technical Details**: [docs/architecture.md](docs/architecture.md) - How immediate pause/resume works
- **Development**: [CONTRIBUTING.md](CONTRIBUTING.md) - Setup and guidelines
## Component Overview
### VoiceManager
The main class that coordinates TTS and STT functionality:
```python
from abstractvoice import VoiceManager
import time
# Simple initialization (automatic model selection)
# - Uses VITS if espeak-ng is installed (best quality)
# - Falls back to fast_pitch if espeak-ng is missing
manager = VoiceManager()
# Or specify a model explicitly
manager = VoiceManager(
    tts_model="tts_models/en/ljspeech/vits",          # Best quality (needs espeak-ng)
    # tts_model="tts_models/en/ljspeech/fast_pitch",  # Good (works everywhere)
    whisper_model="tiny",
    debug_mode=False
)
# === TTS (Text-to-Speech) ===
# Basic speech synthesis
manager.speak("Hello world")
# With speed control (pitch preserved via time-stretching!)
manager.speak("This is 20% faster", speed=1.2)
manager.speak("This is half speed", speed=0.5)
# Check if speaking
if manager.is_speaking():
    manager.stop_speaking()

# Pause and resume TTS (IMMEDIATE response)
manager.speak("This is a long sentence that can be paused and resumed immediately")
time.sleep(1)
success = manager.pause_speaking()   # Pause IMMEDIATELY (~20ms response)
if success:
    print("TTS paused immediately")

time.sleep(2)
success = manager.resume_speaking()  # Resume IMMEDIATELY from exact position
if success:
    print("TTS resumed from exact position")

# Check pause status
if manager.is_paused():
    manager.resume_speaking()
# Change TTS speed globally
manager.set_speed(1.3) # All subsequent speech will be 30% faster
# Change TTS model dynamically
manager.set_tts_model("tts_models/en/ljspeech/glow-tts")
# Available TTS models (quality ranking):
# - "tts_models/en/ljspeech/vits" (BEST quality, requires espeak-ng)
# - "tts_models/en/ljspeech/fast_pitch" (fallback, works everywhere)
# - "tts_models/en/ljspeech/glow-tts" (alternative fallback)
# - "tts_models/en/ljspeech/tacotron2-DDC" (legacy)
# === Audio Lifecycle Callbacks (v0.5.1+) ===
# NEW: Precise audio timing callbacks for visual status indicators
def on_synthesis_start():
    print("🔴 Synthesis started - show thinking animation")

def on_audio_start():
    print("🔵 Audio started - show speaking animation")

def on_audio_pause():
    print("⏸️ Audio paused - show paused animation")

def on_audio_resume():
    print("▶️ Audio resumed - continue speaking animation")

def on_audio_end():
    print("🟢 Audio ended - show ready animation")

def on_synthesis_end():
    print("✅ Synthesis complete")
# Wire up callbacks
manager.tts_engine.on_playback_start = on_synthesis_start # Existing (synthesis phase)
manager.tts_engine.on_playback_end = on_synthesis_end # Existing (synthesis phase)
manager.on_audio_start = on_audio_start # NEW (actual audio playback)
manager.on_audio_end = on_audio_end # NEW (actual audio playback)
manager.on_audio_pause = on_audio_pause # NEW (pause events)
manager.on_audio_resume = on_audio_resume # NEW (resume events)
# Perfect for system tray icons, UI animations, or coordinating multiple audio streams
# === STT (Speech-to-Text) ===
def on_transcription(text):
    print(f"You said: {text}")
manager.listen(on_transcription, on_stop=None)
manager.stop_listening()
manager.is_listening()
# Change Whisper model
manager.set_whisper("base") # tiny, base, small, medium, large
# === Voice Modes ===
# Control how voice recognition behaves during TTS
manager.set_voice_mode("wait") # Pause listening while speaking (recommended)
manager.set_voice_mode("full") # Keep listening, interrupt on speech
manager.set_voice_mode("off") # Disable voice recognition
# === VAD (Voice Activity Detection) ===
manager.change_vad_aggressiveness(2) # 0-3, higher = more aggressive
# === Cleanup ===
manager.cleanup()
```
### TTSEngine
Handles text-to-speech synthesis:
```python
from abstractvoice.tts import TTSEngine
# Initialize with fast_pitch model (default, no external dependencies)
tts = TTSEngine(
    model_name="tts_models/en/ljspeech/fast_pitch",
    debug_mode=False,
    streaming=True  # Enable progressive playback for long text
)
# Speak with speed control (pitch preserved via time-stretching)
text = "Hello from the TTS engine"
tts.speak(text, speed=1.2, callback=None)  # 20% faster, same pitch
# Immediate pause and resume control
success = tts.pause() # Pause IMMEDIATELY (~20ms response)
success = tts.resume() # Resume IMMEDIATELY from exact position
is_paused = tts.is_paused() # Check if currently paused
tts.stop() # Stop completely (cannot resume)
tts.is_active() # Check if active
```
**Important Note on Speed Parameter:**
- The speed parameter now uses proper time-stretching via librosa (illustrated below)
- Changing speed does NOT affect pitch anymore
- Range: 0.5 (half speed) to 2.0 (double speed)
- Example: `speed=1.3` makes speech 30% faster while preserving natural pitch
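The pitch-preserving behavior is easiest to see with librosa itself. This illustrative sketch (not AbstractVoice's code) stretches a pure tone: the duration shrinks, the frequency stays put:

```python
# Sketch: time-stretching changes duration, not pitch.
import numpy as np
import librosa

sr = 22050
t = np.linspace(0, 1.0, sr, endpoint=False)
tone = 0.5 * np.sin(2 * np.pi * 440 * t)               # 1 s of A4 (440 Hz)

faster = librosa.effects.time_stretch(tone, rate=1.3)  # 30% faster
print(f"{len(tone) / sr:.3f} s -> {len(faster) / sr:.3f} s")  # ~1.000 s -> ~0.769 s
# The stretched waveform still peaks at ~440 Hz; only the duration changed.
```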
### VoiceRecognizer
Manages speech recognition with VAD:
```python
from abstractvoice.recognition import VoiceRecognizer
def on_transcription(text):
    print(f"Transcribed: {text}")

def on_stop():
    print("Stop command detected")

recognizer = VoiceRecognizer(transcription_callback=on_transcription,
                             stop_callback=on_stop,
                             whisper_model="tiny",
                             debug_mode=False)
recognizer.start(tts_interrupt_callback=None)
recognizer.stop()
recognizer.change_whisper_model("base")
recognizer.change_vad_aggressiveness(2)
```
## Quick Reference: TTS Control
### Pause and Resume TTS
**Professional-grade pause/resume control** with immediate response and no terminal interference.
**In CLI/REPL:**
```bash
/pause # Pause current TTS playback IMMEDIATELY
/resume # Resume paused TTS playback IMMEDIATELY
/stop # Stop TTS completely (cannot resume)
```
**Programmatic Usage:**
#### Basic Pause/Resume
```python
from abstractvoice import VoiceManager
import time
vm = VoiceManager()
# Start speech
vm.speak("This is a long sentence that demonstrates immediate pause and resume functionality.")
# Pause immediately (takes effect within ~20ms)
time.sleep(1)
result = vm.pause_speaking()
if result:
    print("✓ TTS paused immediately")

# Resume immediately (takes effect within ~20ms)
time.sleep(2)
result = vm.resume_speaking()
if result:
    print("✓ TTS resumed immediately")
```
#### Advanced Control with Status Checking
```python
from abstractvoice import VoiceManager
import time
vm = VoiceManager()
# Start long speech
vm.speak("This is a very long text that will be used to demonstrate the advanced pause and resume control features.")
# Wait and pause
time.sleep(1.5)
if vm.is_speaking():
    vm.pause_speaking()
    print("Speech paused")

# Check pause status
if vm.is_paused():
    print("Confirmed: TTS is paused")
    time.sleep(2)

    # Resume from exact position
    vm.resume_speaking()
    print("Speech resumed from exact position")

# Wait for completion
while vm.is_speaking():
    time.sleep(0.1)
print("Speech completed")
```
#### Interactive Control Example
```python
from abstractvoice import VoiceManager
import threading
import time
vm = VoiceManager()
def control_speech():
    """Interactive control in a separate thread."""
    time.sleep(2)
    print("Pausing speech...")
    vm.pause_speaking()

    time.sleep(3)
    print("Resuming speech...")
    vm.resume_speaking()
# Start long speech
long_text = """
This is a comprehensive demonstration of AbstractVoice's immediate pause and resume functionality.
The system uses non-blocking audio streaming with callback-based control.
You can pause and resume at any time with immediate response.
The audio continues from the exact position where it was paused.
"""
# Start control thread
control_thread = threading.Thread(target=control_speech, daemon=True)
control_thread.start()
# Start speech (non-blocking)
vm.speak(long_text)
# Wait for completion
while vm.is_speaking() or vm.is_paused():
    time.sleep(0.1)
vm.cleanup()
```
#### Error Handling
```python
from abstractvoice import VoiceManager
vm = VoiceManager()
# Start speech
vm.speak("Testing pause/resume with error handling")
# Safe pause with error handling
try:
    if vm.is_speaking():
        success = vm.pause_speaking()
        if success:
            print("Successfully paused")
        else:
            print("No active speech to pause")

    # Safe resume with error handling
    if vm.is_paused():
        success = vm.resume_speaking()
        if success:
            print("Successfully resumed")
        else:
            print("Was not paused or playback completed")

except Exception as e:
    print(f"Error controlling TTS: {e}")
```
**Key Features:**
- **⚡ Immediate Response**: Pause/resume takes effect within ~20ms
- **🎯 Exact Position**: Resumes from precise audio position (no repetition)
- **🖥️ No Terminal Interference**: Uses OutputStream callbacks, never blocks terminal
- **🔒 Thread-Safe**: Safe to call from any thread or callback
- **📊 Reliable Status**: `is_paused()` and `is_speaking()` always accurate
- **🔄 Seamless Streaming**: Works with ongoing text synthesis
**How it works** (a minimal sketch follows this list):
- Uses `sounddevice.OutputStream` with callback function
- Pause immediately outputs silence in next audio callback (~20ms)
- Resume immediately continues audio output from exact position
- No blocking `sd.stop()` calls that interfere with terminal I/O
- Thread-safe with proper locking mechanisms
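The mechanism above can be sketched in a few lines of `sounddevice`. This is a simplified illustration, not AbstractVoice's actual code: the callback writes silence while paused and otherwise continues from the saved sample position.

```python
# Sketch: a pausable sounddevice OutputStream callback.
import threading
import numpy as np
import sounddevice as sd

audio = np.zeros(22050 * 5, dtype="float32")  # placeholder: 5 s of audio
pos = 0
paused = threading.Event()
lock = threading.Lock()

def callback(outdata, frames, time_info, status):
    global pos
    with lock:
        if paused.is_set():
            outdata.fill(0)                    # emit silence, keep position
            return
        chunk = audio[pos:pos + frames]
        outdata[:len(chunk), 0] = chunk        # play the next block
        outdata[len(chunk):, 0] = 0            # zero-pad the final block
        pos += frames
        if pos >= len(audio):
            raise sd.CallbackStop              # done; stops the stream

stream = sd.OutputStream(samplerate=22050, channels=1, callback=callback)
stream.start()
sd.sleep(1000)   # play 1 s
paused.set()     # pause takes effect on the next audio block (ms-scale)
sd.sleep(1000)
paused.clear()   # resume continues from the saved position
sd.sleep(4000)
stream.close()
```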
## Quick Reference: Speed & Model Control
### Changing TTS Speed
**In CLI/REPL:**
```bash
/speed 1.2 # 20% faster, pitch preserved
/speed 0.8 # 20% slower, pitch preserved
```
**Programmatically:**
```python
from abstractvoice import VoiceManager
vm = VoiceManager()
# Method 1: Set global speed
vm.set_speed(1.3) # All speech will be 30% faster
vm.speak("This will be 30% faster")
# Method 2: Per-speech speed
vm.speak("This is 50% faster", speed=1.5)
vm.speak("This is normal speed", speed=1.0)
vm.speak("This is half speed", speed=0.5)
# Get current speed
current = vm.get_speed() # Returns 1.3 from set_speed() above
```
### Changing TTS Model
**In CLI/REPL:**
```bash
/tts_model vits # Best quality (needs espeak-ng)
/tts_model fast_pitch # Good quality (works everywhere)
/tts_model glow-tts # Alternative model
/tts_model tacotron2-DDC # Legacy model
```
**Programmatically:**
```python
from abstractvoice import VoiceManager
# Method 1: Set at initialization
vm = VoiceManager(tts_model="tts_models/en/ljspeech/glow-tts")
# Method 2: Change dynamically at runtime
vm.set_tts_model("tts_models/en/ljspeech/fast_pitch")
vm.speak("Using fast_pitch now")
vm.set_tts_model("tts_models/en/ljspeech/glow-tts")
vm.speak("Using glow-tts now")
# Available models (quality ranking):
models = [
    "tts_models/en/ljspeech/vits",           # BEST (requires espeak-ng)
    "tts_models/en/ljspeech/fast_pitch",     # Good (works everywhere)
    "tts_models/en/ljspeech/glow-tts",       # Alternative fallback
    "tts_models/en/ljspeech/tacotron2-DDC"   # Legacy
]
```
### Complete Example: Experiment with Settings
```python
from abstractvoice import VoiceManager
import time
vm = VoiceManager()
# Test different models (vits requires espeak-ng)
for model in ["vits", "fast_pitch", "glow-tts", "tacotron2-DDC"]:
    full_name = f"tts_models/en/ljspeech/{model}"
    vm.set_tts_model(full_name)

    # Test different speeds with each model
    for speed in [0.8, 1.0, 1.2]:
        vm.speak(f"Testing {model} at {speed}x speed", speed=speed)
        while vm.is_speaking():
            time.sleep(0.1)
```
## Integration Guide for Third-Party Applications
AbstractVoice is designed as a lightweight, modular library for easy integration into your applications. This guide covers everything you need to know.
### Quick Start: Basic Integration
```python
from abstractvoice import VoiceManager
# 1. Initialize (automatic best-quality model selection)
vm = VoiceManager()
# 2. Text-to-Speech
vm.speak("Hello from my app!")
# 3. Speech-to-Text with callback
def handle_speech(text):
    print(f"User said: {text}")
    # Process text in your app...
vm.listen(on_transcription=handle_speech)
```
### Model Selection: Automatic vs Explicit
**Automatic (Recommended):**
```python
# Automatically uses best available model
vm = VoiceManager()
# → Uses VITS if espeak-ng installed (best quality)
# → Falls back to fast_pitch if espeak-ng missing
```
**Explicit:**
```python
# Force a specific model (bypasses auto-detection)
vm = VoiceManager(tts_model="tts_models/en/ljspeech/fast_pitch")
# Or change dynamically at runtime
vm.set_tts_model("tts_models/en/ljspeech/vits")
```
### Voice Quality Levels
| Model | Quality | Speed | Requirements |
|-------|---------|-------|--------------|
| **vits** | ⭐⭐⭐⭐⭐ Excellent | Fast | espeak-ng |
| **fast_pitch** | ⭐⭐⭐ Good | Fast | None |
| **glow-tts** | ⭐⭐⭐ Good | Fast | None |
| **tacotron2-DDC** | ⭐⭐ Fair | Slow | None |
### Customization Options
```python
from abstractvoice import VoiceManager
vm = VoiceManager(
    # TTS Configuration
    tts_model="tts_models/en/ljspeech/vits",  # Model to use

    # STT Configuration
    whisper_model="base",  # tiny, base, small, medium, large

    # Debugging
    debug_mode=True  # Enable detailed logging
)
# Runtime customization
vm.set_speed(1.2) # Adjust TTS speed (0.5-2.0)
vm.set_tts_model("...") # Change TTS model
vm.set_whisper("small") # Change STT model
vm.set_voice_mode("wait") # wait, full, or off
vm.change_vad_aggressiveness(2) # VAD sensitivity (0-3)
```
### Integration Patterns
#### Pattern 1: TTS Only (No Voice Input)
```python
vm = VoiceManager()
# Speak with different speeds
vm.speak("Normal speed")
vm.speak("Fast speech", speed=1.5)
vm.speak("Slow speech", speed=0.7)
# Control playback with immediate response
if vm.is_speaking():
    success = vm.pause_speaking()  # Pause IMMEDIATELY (~20ms)
    if success:
        print("Speech paused immediately")
    # or
    vm.stop_speaking()  # Stop completely (cannot resume)

# Resume from exact position
if vm.is_paused():
    success = vm.resume_speaking()  # Resume IMMEDIATELY (~20ms)
    if success:
        print("Speech resumed from exact position")
```
#### Pattern 2: STT Only (No Text-to-Speech)
```python
vm = VoiceManager()
def process_speech(text):
    # Send to your backend, save to DB, etc.
    your_app.process(text)
vm.listen(on_transcription=process_speech)
```
#### Pattern 3: Full Voice Interaction
```python
vm = VoiceManager()
def on_speech(text):
    response = your_llm.generate(text)
    vm.speak(response)

def on_stop():
    print("User said stop")
    vm.cleanup()

vm.listen(
    on_transcription=on_speech,
    on_stop=on_stop
)
```
### Error Handling
```python
try:
    vm = VoiceManager()
    vm.speak("Test")
except Exception as e:
    print(f"TTS Error: {e}")
    # Handle missing dependencies, etc.

# Check model availability
try:
    vm.set_tts_model("tts_models/en/ljspeech/vits")
    print("VITS available")
except Exception:
    print("VITS not available, using fallback")
    vm.set_tts_model("tts_models/en/ljspeech/fast_pitch")
```
### Threading and Async Support
AbstractVoice handles threading internally for TTS and STT:
```python
# TTS is non-blocking
vm.speak("Long text...") # Returns immediately
# Your code continues while speech plays
# Check status
if vm.is_speaking():
    print("Still speaking...")

# Wait for completion
while vm.is_speaking():
    time.sleep(0.1)
# STT runs in background thread
vm.listen(on_transcription=callback) # Returns immediately
# Callbacks fire on background thread
```
### Cleanup and Resource Management
```python
# Always cleanup when done
vm.cleanup()
# Or use context manager pattern
from contextlib import contextmanager
@contextmanager
def voice_manager():
    vm = VoiceManager()
    try:
        yield vm
    finally:
        vm.cleanup()

# Usage
with voice_manager() as vm:
    vm.speak("Hello")
```
### Configuration for Different Environments
**Development (fast iteration):**
```python
vm = VoiceManager(
    tts_model="tts_models/en/ljspeech/fast_pitch",  # Fast
    whisper_model="tiny",                           # Fast STT
    debug_mode=True
)
```
**Production (best quality):**
```python
vm = VoiceManager(
    tts_model="tts_models/en/ljspeech/vits",  # Best quality
    whisper_model="base",                     # Good accuracy
    debug_mode=False
)
```
**Embedded/Resource-Constrained:**
```python
vm = VoiceManager(
    tts_model="tts_models/en/ljspeech/fast_pitch",  # Lower memory
    whisper_model="tiny",                           # Smallest model
    debug_mode=False
)
```
## Integration with Text Generation Systems
AbstractVoice is designed to be a lightweight, modular library that you can easily integrate into your own applications. Here are complete examples for common use cases:
### Example 1: Voice-Enabled Chatbot with Ollama
```python
from abstractvoice import VoiceManager
import requests
import time
# Initialize voice manager
voice_manager = VoiceManager()
# Function to call Ollama API
def generate_text(prompt):
    response = requests.post("http://localhost:11434/api/chat", json={
        "model": "granite3.3:2b",
        "messages": [{"role": "user", "content": prompt}],
        "stream": False
    })
    return response.json()["message"]["content"]

# Callback for speech recognition
def on_transcription(text):
    if text.lower() == "stop":
        return
    print(f"User: {text}")

    # Generate response
    response = generate_text(text)
    print(f"AI: {response}")

    # Speak response
    voice_manager.speak(response)

# Start listening
voice_manager.listen(on_transcription)

# Keep running until interrupted
try:
    while voice_manager.is_listening():
        time.sleep(0.1)
except KeyboardInterrupt:
    voice_manager.cleanup()
```
### Example 2: Voice-Enabled Assistant with OpenAI
```python
from abstractvoice import VoiceManager
import openai
import time
# Initialize
voice_manager = VoiceManager()
openai.api_key = "your-api-key"
def on_transcription(text):
    print(f"User: {text}")

    # Get response from OpenAI
    response = openai.ChatCompletion.create(
        model="gpt-4",
        messages=[{"role": "user", "content": text}]
    )
    ai_response = response.choices[0].message.content
    print(f"AI: {ai_response}")

    # Speak the response
    voice_manager.speak(ai_response)
# Start voice interaction
voice_manager.listen(on_transcription)
# Keep running
try:
    while voice_manager.is_listening():
        time.sleep(0.1)
except KeyboardInterrupt:
    voice_manager.cleanup()
```
### Example 3: Text-to-Speech Only (No Voice Input)
```python
from abstractvoice import VoiceManager
import time
# Initialize voice manager
voice_manager = VoiceManager()
# Simple text-to-speech
voice_manager.speak("Hello! This is a test of the text to speech system.")
# Wait for speech to finish
while voice_manager.is_speaking():
    time.sleep(0.1)

# Adjust speed
voice_manager.set_speed(1.5)
voice_manager.speak("This speech is 50% faster.")
while voice_manager.is_speaking():
    time.sleep(0.1)
# Cleanup
voice_manager.cleanup()
```
### Example 4: Speech-to-Text Only (No TTS)
```python
from abstractvoice import VoiceManager
import time
voice_manager = VoiceManager()
def on_transcription(text):
    print(f"Transcribed: {text}")
    # Do something with the transcribed text
    # e.g., save to file, send to API, etc.
# Start listening
voice_manager.listen(on_transcription)
# Keep running
try:
    while voice_manager.is_listening():
        time.sleep(0.1)
except KeyboardInterrupt:
    voice_manager.cleanup()
```
### Key Integration Points
**VoiceManager Configuration:**
```python
# Full configuration example
voice_manager = VoiceManager(
    tts_model="tts_models/en/ljspeech/fast_pitch",  # Default (no external deps)
    whisper_model="base",  # Whisper STT model (tiny, base, small, medium, large)
    debug_mode=True        # Enable debug logging
)
# Alternative TTS models (all pure Python, cross-platform):
# - "tts_models/en/ljspeech/fast_pitch" - Default (fast, good quality)
# - "tts_models/en/ljspeech/glow-tts" - Alternative (similar quality)
# - "tts_models/en/ljspeech/tacotron2-DDC" - Legacy (older, slower)
# Set voice mode (full, wait, off)
voice_manager.set_voice_mode("wait") # Recommended to avoid self-interruption
# Adjust settings (speed now preserves pitch!)
voice_manager.set_speed(1.2) # TTS speed (default is 1.0, range 0.5-2.0)
voice_manager.change_vad_aggressiveness(2) # VAD sensitivity (0-3)
```
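If your deployment environment is known up front, the VAD sensitivity can be chosen from a small profile table. The 0-3 scale is documented above; the environment-to-level mapping here is an illustrative assumption:

```python
# Sketch: pick a VAD aggressiveness for the expected noise level.
NOISE_PROFILES = {
    "quiet_office": 1,  # permissive, catches soft speech
    "home": 2,          # balanced default
    "cafe": 3,          # aggressive, rejects background chatter
}

voice_manager.change_vad_aggressiveness(NOISE_PROFILES["cafe"])
```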
**Callback Functions:**
```python
def on_transcription(text):
    """Called when speech is transcribed"""
    print(f"User said: {text}")
    # Your custom logic here

def on_stop():
    """Called when user says 'stop'"""
    print("Stopping voice mode")
    # Your cleanup logic here

voice_manager.listen(
    on_transcription=on_transcription,
    on_stop=on_stop
)
```
## 💻 CLI Commands (v0.4.0+)
AbstractVoice provides powerful CLI commands for model management and voice interactions.
### Model Management
```bash
# Download essential model for offline use (recommended first step)
abstractvoice download-models
# Download models for specific languages
abstractvoice download-models --language fr # French
abstractvoice download-models --language de # German
abstractvoice download-models --language it # Italian
abstractvoice download-models --language es # Spanish
# Download specific model by name
abstractvoice download-models --model tts_models/fr/css10/vits
# Download all available models (large download!)
abstractvoice download-models --all
# Check current cache status
abstractvoice download-models --status
# Clear model cache
abstractvoice download-models --clear
```
### Voice Interface
```bash
# Start voice interface (default)
abstractvoice
# Start CLI REPL with specific language
abstractvoice cli --language fr
# Start with specific model
abstractvoice --model granite3.3:2b --language de
# Run simple example
abstractvoice simple
# Check dependencies
abstractvoice check-deps
```
### CLI Voice Commands
In the CLI REPL, use these commands (v0.5.0+):
```bash
# List all available voices with download status
/setvoice
# Automatically download and set specific voice (NEW in v0.5.0!)
/setvoice fr.css10_vits # Downloads French CSS10 if needed
/setvoice de.thorsten_vits # Downloads German Thorsten if needed
/setvoice it.mai_male_vits # Downloads Italian Male if needed
/setvoice en.jenny # Downloads Jenny voice if needed
# Change language (automatically downloads models if needed - NEW!)
/language fr # Switches to French, downloads if needed
/language de # Switches to German, downloads if needed
/language es # Switches to Spanish, downloads if needed
# Voice controls
/pause # Pause current speech
/resume # Resume speech
/stop # Stop speech
# Exit
/exit
```
**New in v0.5.0:** Language and voice commands now automatically download missing models with progress indicators. No more silent failures!
## Perspectives
This is a test project that I designed with examples for Ollama, but I will adapt the examples and abstractvoice to work with any LLM provider (Anthropic, OpenAI, etc.).
The next iteration will leverage [AbstractCore](https://www.abstractcore.ai) directly to handle everything related to LLMs, their providers, models, and configurations.
## License and Acknowledgments
AbstractVoice is licensed under the [MIT License](LICENSE).
This project depends on several open-source libraries and models, each with their own licenses. Please see [ACKNOWLEDGMENTS.md](ACKNOWLEDGMENTS.md) for a detailed list of dependencies and their respective licenses.
Some dependencies, particularly certain TTS models, may have non-commercial use restrictions. If you plan to use AbstractVoice in a commercial application, please ensure you are using models that permit commercial use or obtain appropriate licenses.
Raw data
{
"_id": null,
"home_page": null,
"name": "abstractvoice",
"maintainer": null,
"docs_url": null,
"requires_python": ">=3.8",
"maintainer_email": null,
"keywords": null,
"author": null,
"author_email": "Laurent-Philippe Albou <contact@abstractcore.ai>",
"download_url": "https://files.pythonhosted.org/packages/6b/e1/b97321ef76f57ece215c47905ed0f52092a79600af29b836cc0f6bb2b970/abstractvoice-0.5.1.tar.gz",
"platform": null,
"description": "# AbstractVoice\n\n[](https://pypi.org/project/abstractvoice/)\n[](https://pypi.org/project/abstractvoice/)\n[](https://github.com/lpalbou/abstractvoice/blob/main/LICENSE)\n[](https://github.com/lpalbou/abstractvoice/stargazers)\n\n\nA modular Python library for voice interactions with AI systems, providing text-to-speech (TTS) and speech-to-text (STT) capabilities with interrupt handling.\n\nWhile we provide CLI and WEB examples, AbstractVoice is designed to be integrated in other projects.\n\n## Features\n\n- **High-Quality TTS**: Best-in-class speech synthesis with VITS model\n - Natural prosody and intonation\n - Adjustable speed without pitch distortion (using librosa time-stretching)\n - Multiple quality levels (VITS best, fast_pitch fallback)\n - Automatic fallback if espeak-ng not installed\n- **Cross-Platform**: Works on macOS, Linux, and Windows\n - Best quality: Install espeak-ng (easy on all platforms)\n - Fallback mode: Works without any system dependencies\n- **Speech-to-Text**: Accurate voice recognition using OpenAI's Whisper\n- **Voice Activity Detection**: Efficient speech detection using WebRTC VAD\n- **Interrupt Handling**: Stop TTS by speaking or using stop commands\n- **Modular Design**: Easily integrate with any text generation system\n\nNote : *the LLM access is rudimentary and abstractvoice is provided more as an example and demonstrator. A better integration is to use the functionalities of this library and use them directly in combination with [AbstractCore](https://github.com/lpalbou/AbstractCore)*.\n\n## Installation\n\nAbstractVoice is designed to **work everywhere, out of the box** with automatic quality upgrades.\n\n### \ud83d\ude80 Quick Start (Recommended)\n\n```bash\n# One command installation - works on all systems\npip install abstractvoice[all]\n\n# Verify it works\npython -c \"from abstractvoice import VoiceManager; print('\u2705 Ready to go!')\"\n```\n\n**That's it!** AbstractVoice automatically:\n- \u2705 **Works everywhere** - Uses reliable models that run on any system\n- \u2705 **Auto-upgrades quality** - Detects when better models are available\n- \u2705 **No system dependencies required** - Pure Python installation\n- \u2705 **Optional quality boost** - Install `espeak-ng` for premium voices\n\n### Installation Options\n\n```bash\n# Minimal (just 2 dependencies)\npip install abstractvoice\n\n# Add features as needed\npip install abstractvoice[tts] # Text-to-speech\npip install abstractvoice[stt] # Speech-to-text\npip install abstractvoice[all] # Everything (recommended)\n\n# Language-specific\npip install abstractvoice[fr] # French with all features\npip install abstractvoice[de] # German with all features\n```\n\n### Optional Quality Upgrade\n\nFor the **absolute best voice quality**, install espeak-ng:\n\n```bash\n# macOS\nbrew install espeak-ng\n\n# Linux\nsudo apt-get install espeak-ng\n\n# Windows\nconda install espeak-ng\n```\n\nAbstractVoice automatically detects espeak-ng and upgrades to premium quality voices when available.\n\n## Quick Start\n\n### \u26a1 Instant TTS (v0.5.0+)\n\n```python\nfrom abstractvoice import VoiceManager\n\n# Initialize voice manager - works immediately with included dependencies\nvm = VoiceManager()\n\n# Text-to-speech works right away!\nvm.speak(\"Hello! TTS works out of the box!\")\n\n# Language switching with automatic model download\nvm.set_language('fr')\nvm.speak(\"Bonjour! 
Le fran\u00e7ais fonctionne aussi!\")\n```\n\n**That's it!** AbstractVoice v0.5.0+ automatically:\n- \u2705 Includes essential TTS dependencies in base installation\n- \u2705 Downloads models automatically when switching languages/voices\n- \u2705 Works immediately after `pip install abstractvoice`\n- \u2705 No silent failures - clear error messages if download fails\n- \u2705 No complex configuration needed\n\n### \ud83c\udf0d Multi-Language Support (Auto-Download in v0.5.0+)\n\n```python\n# Simply switch language - downloads model automatically if needed!\nvm.set_language('fr')\nvm.speak(\"Bonjour! Je parle fran\u00e7ais maintenant.\")\n\n# Switch to German - no manual download needed\nvm.set_language('de')\nvm.speak(\"Hallo! Ich spreche jetzt Deutsch.\")\n\n# Spanish, Italian also supported\nvm.set_language('es')\nvm.speak(\"\u00a1Hola! Hablo espa\u00f1ol ahora.\")\n\n# If download fails, you'll get clear error messages with instructions\n# Example: \"\u274c Cannot switch to French: Model download failed\"\n# \" Try: abstractvoice download-models --language fr\"\n```\n\n**New in v0.5.0:** No more manual `download_model()` calls! Language switching handles downloads automatically.\n\n### \ud83d\udd27 Check System Status\n\n```python\nfrom abstractvoice import is_ready, get_status, list_models\nimport json\n\n# Quick readiness check\nready = is_ready()\nprint(f\"TTS ready: {ready}\")\n\n# Get detailed status\nstatus = json.loads(get_status())\nprint(f\"Models cached: {status['total_cached']}\")\nprint(f\"Offline ready: {status['ready_for_offline']}\")\n\n# List all available models\nmodels = json.loads(list_models())\nfor lang, voices in models.items():\n print(f\"{lang}: {len(voices)} voices available\")\n```\n\n# Speech-to-text with callbacks\ndef on_transcription(text):\n print(f\"You said: {text}\")\n # Process the transcription\n vm.speak(f\"I heard you say: {text}\")\n\ndef on_stop():\n print(\"Stopping voice interaction\")\n\n# Start listening\nvm.listen(on_transcription, on_stop)\n\n# The voice manager will automatically pause listening when speaking\n# and resume when done to prevent feedback loops\n```\n\n## Additional Examples\n\n### Language-Specific Usage\n\n```python\n# French voice\nvm_fr = VoiceManager(language='fr')\nvm_fr.speak(\"Bonjour! Je peux parler fran\u00e7ais.\")\n\n# Spanish voice\nvm_es = VoiceManager(language='es')\nvm_es.speak(\"\u00a1Hola! 
Puedo hablar espa\u00f1ol.\")\n\n# Dynamic language switching\nvm.set_language('fr') # Switch to French\nvm.set_language('en') # Switch back to English\n```\n\n### Advanced Configuration\n\n```python\nfrom abstractvoice import VoiceManager\n\n# Custom TTS model selection\nvm = VoiceManager(\n language='en',\n tts_model='tts_models/en/ljspeech/fast_pitch', # Specific model\n whisper_model='base', # Larger Whisper model for better accuracy\n debug_mode=True\n)\n\n# Speed control\nvm.set_speed(1.5) # 1.5x speed\nvm.speak(\"This text will be spoken faster.\")\n\n# Model switching at runtime\nvm.set_tts_model('tts_models/en/ljspeech/vits') # Switch to VITS\nvm.set_whisper('small') # Switch to larger Whisper model\n```\n\n### Error Handling and Graceful Degradation\n\nAbstractVoice is designed to provide helpful error messages and fallback gracefully:\n\n```python\n# If you install just the basic package\n# pip install abstractvoice\n\nfrom abstractvoice import VoiceManager # This works fine\n\ntry:\n vm = VoiceManager() # This will fail with helpful message\nexcept ImportError as e:\n print(e)\n # Output: \"TTS functionality requires optional dependencies. Install with:\n # pip install abstractvoice[tts] # For TTS only\n # pip install abstractvoice[all] # For all features\"\n\n# Missing espeak-ng automatically falls back to compatible models\n# Missing dependencies show clear installation instructions\n# All errors are graceful with helpful guidance\n```\n\n## CLI and Web Examples\n\nAbstractVoice includes example applications to demonstrate its capabilities:\n\n### Using AbstractVoice from the Command Line\n\nThe easiest way to get started is to use AbstractVoice directly from your shell:\n\n```bash\n# Start AbstractVoice in voice mode (TTS ON, STT ON)\nabstractvoice\n# \u2192 Automatically uses VITS if espeak-ng installed (best quality)\n# \u2192 Falls back to fast_pitch if espeak-ng not found\n\n# Or start with custom settings\nabstractvoice --model gemma3:latest --whisper base\n\n# Start in text-only mode (TTS enabled, listening disabled)\nabstractvoice --no-listening\n```\n\nOnce started, you can interact with the AI using voice or text. Use `/help` to see all available commands.\n\n**Note**: AbstractVoice automatically selects the best available TTS model. For best quality, install espeak-ng (see Installation section above).\n\n### Integrating AbstractVoice in Your Python Project\n\nHere's a simple example of how to integrate AbstractVoice into your own application:\n\n```python\nfrom abstractvoice import VoiceManager\nimport time\n\n# Initialize voice manager\nvoice_manager = VoiceManager(debug_mode=False)\n\n# Text to speech\nvoice_manager.speak(\"Hello, I am an AI assistant. 
How can I help you today?\")\n\n# Wait for speech to complete\nwhile voice_manager.is_speaking():\n time.sleep(0.1)\n\n# Speech to text with callback\ndef on_transcription(text):\n print(f\"User said: {text}\")\n if text.lower() != \"stop\":\n # Process with your text generation system\n response = f\"You said: {text}\"\n voice_manager.speak(response)\n\n# Start voice recognition\nvoice_manager.listen(on_transcription)\n\n# Wait for user to say \"stop\" or press Ctrl+C\ntry:\n while voice_manager.is_listening():\n time.sleep(0.1)\nexcept KeyboardInterrupt:\n pass\n\n# Clean up\nvoice_manager.cleanup()\n```\n\n## Running Examples\n\nThe package includes several examples that demonstrate different ways to use AbstractVoice.\n\n### Voice Mode (Default)\n\nIf installed globally, you can launch AbstractVoice directly in voice mode:\n\n```bash\n# Start AbstractVoice in voice mode (TTS ON, STT ON)\nabstractvoice\n\n# With options\nabstractvoice --debug --whisper base --model gemma3:latest --api http://localhost:11434/api/chat\n```\n\n**Command line options:**\n- `--debug` - Enable debug mode with detailed logging\n- `--api <url>` - URL of the Ollama API (default: http://localhost:11434/api/chat)\n- `--model <name>` - Ollama model to use (default: granite3.3:2b)\n - Examples: cogito:3b, phi4-mini:latest, qwen2.5:latest, gemma3:latest, etc.\n- `--whisper <model>` - Whisper model to use (default: tiny)\n - Options: tiny, base, small, medium, large\n- `--no-listening` - Disable speech-to-text (listening), TTS still works\n - **Note**: This creates a \"TTS-only\" mode where you type and the AI speaks back\n- `--system <prompt>` - Custom system prompt\n\n### \ud83c\udfaf Complete CLI Interface (v0.3.0+)\n\nAbstractVoice provides a unified command interface for all functionality:\n\n```bash\n# Voice mode (default)\nabstractvoice # Interactive voice mode with AI\nabstractvoice --model cogito:3b # With custom Ollama model\nabstractvoice --language fr # French voice mode\n\n# Examples and utilities\nabstractvoice cli # CLI REPL for text interaction\nabstractvoice web # Web API server\nabstractvoice simple # Simple TTS/STT demonstration\nabstractvoice check-deps # Check dependency compatibility\nabstractvoice help # Show available commands\n\n# Get help\nabstractvoice --help # Complete help with all options\n```\n\n**All functionality through one command!** No more confusion between different entry points.\n\n### Command-Line REPL\n\n```bash\n# Run the CLI example (TTS ON, STT OFF)\nabstractvoice cli\n\n# With debug mode\nabstractvoice cli --debug\n\n# With specific language\nabstractvoice cli --language fr\n```\n\n#### REPL Commands\n\nAll commands must start with `/` except `stop`:\n\n**Basic Commands:**\n- `/exit`, `/q`, `/quit` - Exit REPL\n- `/clear` - Clear conversation history\n- `/help` - Show help information\n- `stop` - Stop voice mode or TTS (voice command, no `/` needed)\n\n**Voice & Audio:**\n- `/tts on|off` - Toggle text-to-speech\n- `/voice <mode>` - Voice input modes:\n - `off` - Disable voice input\n - `full` - Continuous listening, interrupts TTS on speech detection\n - `wait` - Pause listening while speaking (recommended, reduces self-interruption)\n - `stop` - Only stop on 'stop' keyword (planned)\n - `ptt` - Push-to-talk mode (planned)\n- `/speed <number>` - Set TTS speed (0.5-2.0, default: 1.0, **pitch preserved**)\n- `/tts_model <model>` - Switch TTS model:\n - `vits` - **Best quality** (requires espeak-ng)\n - `fast_pitch` - Good quality (works everywhere)\n - `glow-tts` - 
Alternative (similar quality to fast_pitch)\n - `tacotron2-DDC` - Legacy (slower, lower quality)\n- `/whisper <model>` - Switch Whisper model (tiny|base|small|medium|large)\n- `/stop` - Stop voice mode or TTS playback\n- `/pause` - Pause current TTS playback (can be resumed)\n- `/resume` - Resume paused TTS playback\n\n**LLM Configuration:**\n- `/model <name>` - Change LLM model (e.g., `/model gemma3:latest`)\n- `/system <prompt>` - Set system prompt (e.g., `/system You are a helpful coding assistant`)\n- `/temperature <val>` - Set temperature (0.0-2.0, default: 0.7)\n- `/max_tokens <num>` - Set max tokens (default: 4096)\n\n**Chat Management:**\n- `/save <filename>` - Save chat history (e.g., `/save conversation`)\n- `/load <filename>` - Load chat history (e.g., `/load conversation`)\n- `/tokens` - Display token usage statistics\n\n**Sending Messages:**\n- `<message>` - Any text without `/` prefix is sent to the LLM\n\n**Note**: Commands without `/` (except `stop`) are sent to the LLM as regular messages.\n\n### Web API\n\n```bash\n# Run the web API example\nabstractvoice web\n\n# With different host and port\nabstractvoice web --host 0.0.0.0 --port 8000\n```\n\nYou can also run a simplified version that doesn't load the full models:\n\n```bash\n# Run the web API with simulation mode\nabstractvoice web --simulate\n```\n\n#### Troubleshooting Web API\n\nIf you encounter issues with the web API:\n\n1. **404 Not Found**: Make sure you're accessing the correct endpoints (e.g., `/api/test`, `/api/tts`)\n2. **Connection Issues**: Ensure no other service is using the port\n3. **Model Loading Errors**: Try running with `--simulate` flag to test without loading models\n4. **Dependencies**: Ensure all required packages are installed:\n ```bash\n pip install flask soundfile numpy requests\n ```\n5. 
**Test with a simple Flask script**:\n ```python\n from flask import Flask\n app = Flask(__name__)\n @app.route('/')\n def home():\n return \"Flask works!\"\n app.run(host='127.0.0.1', port=5000)\n ```\n\n### Simple Demo\n\n```bash\n# Run the simple example\nabstractvoice simple\n```\n\n## Documentation\n\n### \ud83d\udcda Documentation Overview\n\n- **[README.md](README.md)** - This file: User guide, API reference, and examples\n- **[CONTRIBUTING.md](CONTRIBUTING.md)** - Contribution guidelines and development setup\n- **[CHANGELOG.md](CHANGELOG.md)** - Version history and release notes\n- **[docs/](docs/)** - Technical documentation for developers\n\n### \ud83c\udfaf Quick Navigation\n\n- **Getting Started**: [Installation](#installation) and [Quick Start](#quick-start)\n- **Pause/Resume Control**: [TTS Control](#quick-reference-tts-control) section\n- **Integration Examples**: [Integration Guide](#integration-guide-for-third-party-applications)\n- **Technical Details**: [docs/architecture.md](docs/architecture.md) - How immediate pause/resume works\n- **Development**: [CONTRIBUTING.md](CONTRIBUTING.md) - Setup and guidelines\n\n## Component Overview\n\n### VoiceManager\n\nThe main class that coordinates TTS and STT functionality:\n\n```python\nfrom abstractvoice import VoiceManager\n\n# Simple initialization (automatic model selection)\n# - Uses VITS if espeak-ng is installed (best quality)\n# - Falls back to fast_pitch if espeak-ng is missing\nmanager = VoiceManager()\n\n# Or specify a model explicitly\nmanager = VoiceManager(\n tts_model=\"tts_models/en/ljspeech/vits\", # Best quality (needs espeak-ng)\n # tts_model=\"tts_models/en/ljspeech/fast_pitch\", # Good (works everywhere)\n whisper_model=\"tiny\",\n debug_mode=False\n)\n\n# === TTS (Text-to-Speech) ===\n\n# Basic speech synthesis\nmanager.speak(\"Hello world\")\n\n# With speed control (pitch preserved via time-stretching!)\nmanager.speak(\"This is 20% faster\", speed=1.2)\nmanager.speak(\"This is half speed\", speed=0.5)\n\n# Check if speaking\nif manager.is_speaking():\n manager.stop_speaking()\n\n# Pause and resume TTS (IMMEDIATE response)\nmanager.speak(\"This is a long sentence that can be paused and resumed immediately\")\ntime.sleep(1)\nsuccess = manager.pause_speaking() # Pause IMMEDIATELY (~20ms response)\nif success:\n print(\"TTS paused immediately\")\n\ntime.sleep(2)\nsuccess = manager.resume_speaking() # Resume IMMEDIATELY from exact position\nif success:\n print(\"TTS resumed from exact position\")\n\n# Check pause status\nif manager.is_paused():\n manager.resume_speaking()\n\n# Change TTS speed globally\nmanager.set_speed(1.3) # All subsequent speech will be 30% faster\n\n# Change TTS model dynamically\nmanager.set_tts_model(\"tts_models/en/ljspeech/glow-tts\")\n\n# Available TTS models (quality ranking):\n# - \"tts_models/en/ljspeech/vits\" (BEST quality, requires espeak-ng)\n# - \"tts_models/en/ljspeech/fast_pitch\" (fallback, works everywhere)\n# - \"tts_models/en/ljspeech/glow-tts\" (alternative fallback)\n# - \"tts_models/en/ljspeech/tacotron2-DDC\" (legacy)\n\n# === Audio Lifecycle Callbacks (v0.5.1+) ===\n\n# NEW: Precise audio timing callbacks for visual status indicators\ndef on_synthesis_start():\n print(\"\ud83d\udd34 Synthesis started - show thinking animation\")\n\ndef on_audio_start():\n print(\"\ud83d\udd35 Audio started - show speaking animation\")\n\ndef on_audio_pause():\n print(\"\u23f8\ufe0f Audio paused - show paused animation\")\n\ndef on_audio_resume():\n print(\"\u25b6\ufe0f Audio resumed 
- continue speaking animation\")\n\ndef on_audio_end():\n print(\"\ud83d\udfe2 Audio ended - show ready animation\")\n\ndef on_synthesis_end():\n print(\"\u2705 Synthesis complete\")\n\n# Wire up callbacks\nmanager.tts_engine.on_playback_start = on_synthesis_start # Existing (synthesis phase)\nmanager.tts_engine.on_playback_end = on_synthesis_end # Existing (synthesis phase)\nmanager.on_audio_start = on_audio_start # NEW (actual audio playback)\nmanager.on_audio_end = on_audio_end # NEW (actual audio playback)\nmanager.on_audio_pause = on_audio_pause # NEW (pause events)\nmanager.on_audio_resume = on_audio_resume # NEW (resume events)\n\n# Perfect for system tray icons, UI animations, or coordinating multiple audio streams\n\n# === STT (Speech-to-Text) ===\n\ndef on_transcription(text):\n print(f\"You said: {text}\")\n\nmanager.listen(on_transcription, on_stop=None)\nmanager.stop_listening()\nmanager.is_listening()\n\n# Change Whisper model\nmanager.set_whisper(\"base\") # tiny, base, small, medium, large\n\n# === Voice Modes ===\n\n# Control how voice recognition behaves during TTS\nmanager.set_voice_mode(\"wait\") # Pause listening while speaking (recommended)\nmanager.set_voice_mode(\"full\") # Keep listening, interrupt on speech\nmanager.set_voice_mode(\"off\") # Disable voice recognition\n\n# === VAD (Voice Activity Detection) ===\n\nmanager.change_vad_aggressiveness(2) # 0-3, higher = more aggressive\n\n# === Cleanup ===\n\nmanager.cleanup()\n```\n\n### TTSEngine\n\nHandles text-to-speech synthesis:\n\n```python\nfrom abstractvoice.tts import TTSEngine\n\n# Initialize with fast_pitch model (default, no external dependencies)\ntts = TTSEngine(\n model_name=\"tts_models/en/ljspeech/fast_pitch\",\n debug_mode=False,\n streaming=True # Enable progressive playback for long text\n)\n\n# Speak with speed control (pitch preserved via time-stretching)\ntts.speak(text, speed=1.2, callback=None) # 20% faster, same pitch\n\n# Immediate pause and resume control\nsuccess = tts.pause() # Pause IMMEDIATELY (~20ms response)\nsuccess = tts.resume() # Resume IMMEDIATELY from exact position\nis_paused = tts.is_paused() # Check if currently paused\n\ntts.stop() # Stop completely (cannot resume)\ntts.is_active() # Check if active\n```\n\n**Important Note on Speed Parameter:**\n- The speed parameter now uses proper time-stretching (via librosa)\n- Changing speed does NOT affect pitch anymore\n- Range: 0.5 (half speed) to 2.0 (double speed)\n- Example: `speed=1.3` makes speech 30% faster while preserving natural pitch\n\n### VoiceRecognizer\n\nManages speech recognition with VAD:\n\n```python\nfrom abstractvoice.recognition import VoiceRecognizer\n\ndef on_transcription(text):\n print(f\"Transcribed: {text}\")\n\ndef on_stop():\n print(\"Stop command detected\")\n\nrecognizer = VoiceRecognizer(transcription_callback=on_transcription,\n stop_callback=on_stop, \n whisper_model=\"tiny\",\n debug_mode=False)\nrecognizer.start(tts_interrupt_callback=None)\nrecognizer.stop()\nrecognizer.change_whisper_model(\"base\")\nrecognizer.change_vad_aggressiveness(2)\n```\n\n## Quick Reference: TTS Control\n\n### Pause and Resume TTS\n\n**Professional-grade pause/resume control** with immediate response and no terminal interference.\n\n**In CLI/REPL:**\n```bash\n/pause # Pause current TTS playback IMMEDIATELY\n/resume # Resume paused TTS playback IMMEDIATELY \n/stop # Stop TTS completely (cannot resume)\n```\n\n**Programmatic Usage:**\n\n#### Basic Pause/Resume\n```python\nfrom abstractvoice import VoiceManager\nimport 
### VoiceRecognizer

Manages speech recognition with VAD:

```python
from abstractvoice.recognition import VoiceRecognizer

def on_transcription(text):
    print(f"Transcribed: {text}")

def on_stop():
    print("Stop command detected")

recognizer = VoiceRecognizer(
    transcription_callback=on_transcription,
    stop_callback=on_stop,
    whisper_model="tiny",
    debug_mode=False
)
recognizer.start(tts_interrupt_callback=None)
recognizer.stop()
recognizer.change_whisper_model("base")
recognizer.change_vad_aggressiveness(2)
```

## Quick Reference: TTS Control

### Pause and Resume TTS

**Professional-grade pause/resume control** with immediate response and no terminal interference.

**In CLI/REPL:**
```bash
/pause   # Pause current TTS playback IMMEDIATELY
/resume  # Resume paused TTS playback IMMEDIATELY
/stop    # Stop TTS completely (cannot resume)
```

**Programmatic Usage:**

#### Basic Pause/Resume
```python
from abstractvoice import VoiceManager
import time

vm = VoiceManager()

# Start speech
vm.speak("This is a long sentence that demonstrates immediate pause and resume functionality.")

# Pause immediately (takes effect within ~20ms)
time.sleep(1)
result = vm.pause_speaking()
if result:
    print("✓ TTS paused immediately")

# Resume immediately (takes effect within ~20ms)
time.sleep(2)
result = vm.resume_speaking()
if result:
    print("✓ TTS resumed immediately")
```

#### Advanced Control with Status Checking
```python
from abstractvoice import VoiceManager
import time

vm = VoiceManager()

# Start long speech
vm.speak("This is a very long text that will be used to demonstrate the advanced pause and resume control features.")

# Wait and pause
time.sleep(1.5)
if vm.is_speaking():
    vm.pause_speaking()
    print("Speech paused")

# Check pause status
if vm.is_paused():
    print("Confirmed: TTS is paused")
    time.sleep(2)

    # Resume from exact position
    vm.resume_speaking()
    print("Speech resumed from exact position")

# Wait for completion
while vm.is_speaking():
    time.sleep(0.1)
print("Speech completed")
```

#### Interactive Control Example
```python
from abstractvoice import VoiceManager
import threading
import time

vm = VoiceManager()

def control_speech():
    """Interactive control in separate thread"""
    time.sleep(2)
    print("Pausing speech...")
    vm.pause_speaking()

    time.sleep(3)
    print("Resuming speech...")
    vm.resume_speaking()

# Start long speech
long_text = """
This is a comprehensive demonstration of AbstractVoice's immediate pause and resume functionality.
The system uses non-blocking audio streaming with callback-based control.
You can pause and resume at any time with immediate response.
The audio continues from the exact position where it was paused.
"""

# Start control thread
control_thread = threading.Thread(target=control_speech, daemon=True)
control_thread.start()

# Start speech (non-blocking)
vm.speak(long_text)

# Wait for completion
while vm.is_speaking() or vm.is_paused():
    time.sleep(0.1)

vm.cleanup()
```

#### Error Handling
```python
from abstractvoice import VoiceManager

vm = VoiceManager()

# Start speech
vm.speak("Testing pause/resume with error handling")

# Safe pause with error handling
try:
    if vm.is_speaking():
        success = vm.pause_speaking()
        if success:
            print("Successfully paused")
        else:
            print("No active speech to pause")

    # Safe resume with error handling
    if vm.is_paused():
        success = vm.resume_speaking()
        if success:
            print("Successfully resumed")
        else:
            print("Was not paused or playback completed")

except Exception as e:
    print(f"Error controlling TTS: {e}")
```

**Key Features:**
- **⚡ Immediate Response**: Pause/resume takes effect within ~20ms
- **🎯 Exact Position**: Resumes from precise audio position (no repetition)
- **🖥️ No Terminal Interference**: Uses OutputStream callbacks, never blocks terminal
- **🔒 Thread-Safe**: Safe to call from any thread or callback
- **📊 Reliable Status**: `is_paused()` and `is_speaking()` always accurate
- **🔄 Seamless Streaming**: Works with ongoing text synthesis

**How it works** (see the sketch below):
- Uses `sounddevice.OutputStream` with callback function
- Pause immediately outputs silence in next audio callback (~20ms)
- Resume immediately continues audio output from exact position
- No blocking `sd.stop()` calls that interfere with terminal I/O
- Thread-safe with proper locking mechanisms
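The following is a minimal, self-contained sketch of that callback idea, not AbstractVoice's actual internals: a `sounddevice.OutputStream` callback that emits silence while a paused flag is set, so pausing never tears down the stream and resuming continues from the exact sample position:

```python
import threading

import numpy as np
import sounddevice as sd

sr = 22050
# Five seconds of a 440 Hz tone as stand-in audio.
audio = 0.2 * np.sin(2 * np.pi * 440 * np.arange(sr * 5) / sr).astype(np.float32)

pos = 0                     # current playback position, in samples
paused = threading.Event()  # set() => paused
lock = threading.Lock()

def callback(outdata, frames, time_info, status):
    global pos
    with lock:
        if paused.is_set():
            outdata.fill(0.0)  # emit silence; position is left untouched
            return
        chunk = audio[pos:pos + frames]
        outdata[:len(chunk), 0] = chunk
        outdata[len(chunk):, 0] = 0.0
        pos += len(chunk)
        if pos >= len(audio):
            raise sd.CallbackStop  # end of audio

stream = sd.OutputStream(samplerate=sr, channels=1, callback=callback)
stream.start()
sd.sleep(1000)
paused.set()    # takes effect on the next callback block (~blocksize/sr seconds)
sd.sleep(1000)
paused.clear()  # resumes from the exact sample where playback stopped
sd.sleep(2000)
stream.stop()
```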
## Quick Reference: Speed & Model Control

### Changing TTS Speed

**In CLI/REPL:**
```bash
/speed 1.2  # 20% faster, pitch preserved
/speed 0.8  # 20% slower, pitch preserved
```

**Programmatically:**
```python
from abstractvoice import VoiceManager

vm = VoiceManager()

# Method 1: Set global speed
vm.set_speed(1.3)  # All speech will be 30% faster
vm.speak("This will be 30% faster")

# Method 2: Per-speech speed
vm.speak("This is 50% faster", speed=1.5)
vm.speak("This is normal speed", speed=1.0)
vm.speak("This is half speed", speed=0.5)

# Get current speed
current = vm.get_speed()  # Returns 1.3 from set_speed() above
```

### Changing TTS Model

**In CLI/REPL:**
```bash
/tts_model vits           # Best quality (needs espeak-ng)
/tts_model fast_pitch     # Good quality (works everywhere)
/tts_model glow-tts       # Alternative model
/tts_model tacotron2-DDC  # Legacy model
```

**Programmatically:**
```python
from abstractvoice import VoiceManager

# Method 1: Set at initialization
vm = VoiceManager(tts_model="tts_models/en/ljspeech/glow-tts")

# Method 2: Change dynamically at runtime
vm.set_tts_model("tts_models/en/ljspeech/fast_pitch")
vm.speak("Using fast_pitch now")

vm.set_tts_model("tts_models/en/ljspeech/glow-tts")
vm.speak("Using glow-tts now")

# Available models (quality ranking):
models = [
    "tts_models/en/ljspeech/vits",          # BEST (requires espeak-ng)
    "tts_models/en/ljspeech/fast_pitch",    # Good (works everywhere)
    "tts_models/en/ljspeech/glow-tts",      # Alternative fallback
    "tts_models/en/ljspeech/tacotron2-DDC"  # Legacy
]
```

### Complete Example: Experiment with Settings

```python
from abstractvoice import VoiceManager
import time

vm = VoiceManager()

# Test different models (vits requires espeak-ng)
for model in ["vits", "fast_pitch", "glow-tts", "tacotron2-DDC"]:
    full_name = f"tts_models/en/ljspeech/{model}"
    vm.set_tts_model(full_name)

    # Test different speeds with each model
    for speed in [0.8, 1.0, 1.2]:
        vm.speak(f"Testing {model} at {speed}x speed", speed=speed)
        while vm.is_speaking():
            time.sleep(0.1)
```

## Integration Guide for Third-Party Applications

AbstractVoice is designed as a lightweight, modular library for easy integration into your applications. This guide covers everything you need to know.

### Quick Start: Basic Integration

```python
from abstractvoice import VoiceManager

# 1. Initialize (automatic best-quality model selection)
vm = VoiceManager()

# 2. Text-to-Speech
vm.speak("Hello from my app!")

# 3. Speech-to-Text with callback
def handle_speech(text):
    print(f"User said: {text}")
    # Process text in your app...

vm.listen(on_transcription=handle_speech)
```

### Model Selection: Automatic vs Explicit

**Automatic (Recommended):**
```python
# Automatically uses best available model
vm = VoiceManager()
# → Uses VITS if espeak-ng installed (best quality)
# → Falls back to fast_pitch if espeak-ng missing
```

**Explicit:**
```python
# Force a specific model (bypasses auto-detection)
vm = VoiceManager(tts_model="tts_models/en/ljspeech/fast_pitch")

# Or change dynamically at runtime
vm.set_tts_model("tts_models/en/ljspeech/vits")
```

### Voice Quality Levels

| Model | Quality | Speed | Requirements |
|-------|---------|-------|--------------|
| **vits** | ⭐⭐⭐⭐⭐ Excellent | Fast | espeak-ng |
| **fast_pitch** | ⭐⭐⭐ Good | Fast | None |
| **glow-tts** | ⭐⭐⭐ Good | Fast | None |
| **tacotron2-DDC** | ⭐⭐ Fair | Slow | None |
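The "Requirements" column above comes down to a single check: whether `espeak-ng` is on the PATH. If you want to mirror the auto-detection explicitly in your own code (this sketch is ours, not AbstractVoice's internal API), a plain `shutil.which` test is enough:

```python
import shutil

from abstractvoice import VoiceManager

# espeak-ng on the PATH is what unlocks the VITS model.
if shutil.which("espeak-ng"):
    vm = VoiceManager(tts_model="tts_models/en/ljspeech/vits")        # best quality
else:
    vm = VoiceManager(tts_model="tts_models/en/ljspeech/fast_pitch")  # portable fallback
```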
### Customization Options

```python
from abstractvoice import VoiceManager

vm = VoiceManager(
    # TTS Configuration
    tts_model="tts_models/en/ljspeech/vits",  # Model to use

    # STT Configuration
    whisper_model="base",  # tiny, base, small, medium, large

    # Debugging
    debug_mode=True  # Enable detailed logging
)

# Runtime customization
vm.set_speed(1.2)                # Adjust TTS speed (0.5-2.0)
vm.set_tts_model("...")          # Change TTS model
vm.set_whisper("small")          # Change STT model
vm.set_voice_mode("wait")        # wait, full, or off
vm.change_vad_aggressiveness(2)  # VAD sensitivity (0-3)
```

### Integration Patterns

#### Pattern 1: TTS Only (No Voice Input)
```python
vm = VoiceManager()

# Speak with different speeds
vm.speak("Normal speed")
vm.speak("Fast speech", speed=1.5)
vm.speak("Slow speech", speed=0.7)

# Control playback with immediate response
if vm.is_speaking():
    success = vm.pause_speaking()  # Pause IMMEDIATELY (~20ms)
    if success:
        print("Speech paused immediately")
    # or
    vm.stop_speaking()  # Stop completely (cannot resume)

# Resume from exact position
if vm.is_paused():
    success = vm.resume_speaking()  # Resume IMMEDIATELY (~20ms)
    if success:
        print("Speech resumed from exact position")
```

#### Pattern 2: STT Only (No Text-to-Speech)
```python
vm = VoiceManager()

def process_speech(text):
    # Send to your backend, save to DB, etc.
    your_app.process(text)

vm.listen(on_transcription=process_speech)
```

#### Pattern 3: Full Voice Interaction
```python
vm = VoiceManager()

def on_speech(text):
    response = your_llm.generate(text)
    vm.speak(response)

def on_stop():
    print("User said stop")
    vm.cleanup()

vm.listen(
    on_transcription=on_speech,
    on_stop=on_stop
)
```

### Error Handling

```python
try:
    vm = VoiceManager()
    vm.speak("Test")
except Exception as e:
    print(f"TTS Error: {e}")
    # Handle missing dependencies, etc.

# Check model availability
try:
    vm.set_tts_model("tts_models/en/ljspeech/vits")
    print("VITS available")
except Exception:
    print("VITS not available, using fallback")
    vm.set_tts_model("tts_models/en/ljspeech/fast_pitch")
```

### Threading and Async Support

AbstractVoice handles threading internally for TTS and STT:

```python
# TTS is non-blocking
vm.speak("Long text...")  # Returns immediately
# Your code continues while speech plays

# Check status
if vm.is_speaking():
    print("Still speaking...")

# Wait for completion
while vm.is_speaking():
    time.sleep(0.1)

# STT runs in background thread
vm.listen(on_transcription=callback)  # Returns immediately
# Callbacks fire on background thread
```
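If your application is asyncio-based, you can bridge those background-thread callbacks into coroutine land with a thread-safe queue. A minimal sketch, assuming only what is stated above (callbacks fire on a background thread); the queue bridge itself is our addition, not an AbstractVoice feature:

```python
import asyncio

from abstractvoice import VoiceManager

async def main():
    loop = asyncio.get_running_loop()
    queue = asyncio.Queue()
    vm = VoiceManager()

    # The STT callback fires on a background thread, so hand text over
    # to the event loop with call_soon_threadsafe.
    def on_transcription(text):
        loop.call_soon_threadsafe(queue.put_nowait, text)

    vm.listen(on_transcription=on_transcription)
    try:
        while True:                   # Ctrl+C to exit
            text = await queue.get()  # await transcriptions without blocking the loop
            print(f"User said: {text}")
    finally:
        vm.cleanup()

asyncio.run(main())
```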
speaking...\")\n\n# Wait for completion\nwhile vm.is_speaking():\n time.sleep(0.1)\n\n# STT runs in background thread\nvm.listen(on_transcription=callback) # Returns immediately\n# Callbacks fire on background thread\n```\n\n### Cleanup and Resource Management\n\n```python\n# Always cleanup when done\nvm.cleanup()\n\n# Or use context manager pattern\nfrom contextlib import contextmanager\n\n@contextmanager\ndef voice_manager():\n vm = VoiceManager()\n try:\n yield vm\n finally:\n vm.cleanup()\n\n# Usage\nwith voice_manager() as vm:\n vm.speak(\"Hello\")\n```\n\n### Configuration for Different Environments\n\n**Development (fast iteration):**\n```python\nvm = VoiceManager(\n tts_model=\"tts_models/en/ljspeech/fast_pitch\", # Fast\n whisper_model=\"tiny\", # Fast STT\n debug_mode=True\n)\n```\n\n**Production (best quality):**\n```python\nvm = VoiceManager(\n tts_model=\"tts_models/en/ljspeech/vits\", # Best quality\n whisper_model=\"base\", # Good accuracy\n debug_mode=False\n)\n```\n\n**Embedded/Resource-Constrained:**\n```python\nvm = VoiceManager(\n tts_model=\"tts_models/en/ljspeech/fast_pitch\", # Lower memory\n whisper_model=\"tiny\", # Smallest model\n debug_mode=False\n)\n```\n\n## Integration with Text Generation Systems\n\nAbstractVoice is designed to be a lightweight, modular library that you can easily integrate into your own applications. Here are complete examples for common use cases:\n\n### Example 1: Voice-Enabled Chatbot with Ollama\n\n```python\nfrom abstractvoice import VoiceManager\nimport requests\nimport time\n\n# Initialize voice manager\nvoice_manager = VoiceManager()\n\n# Function to call Ollama API\ndef generate_text(prompt):\n response = requests.post(\"http://localhost:11434/api/chat\", json={\n \"model\": \"granite3.3:2b\",\n \"messages\": [{\"role\": \"user\", \"content\": prompt}],\n \"stream\": False\n })\n return response.json()[\"message\"][\"content\"]\n\n# Callback for speech recognition\ndef on_transcription(text):\n if text.lower() == \"stop\":\n return\n \n print(f\"User: {text}\")\n \n # Generate response\n response = generate_text(text)\n print(f\"AI: {response}\")\n \n # Speak response\n voice_manager.speak(response)\n\n# Start listening\nvoice_manager.listen(on_transcription)\n\n# Keep running until interrupted\ntry:\n while voice_manager.is_listening():\n time.sleep(0.1)\nexcept KeyboardInterrupt:\n voice_manager.cleanup()\n```\n\n### Example 2: Voice-Enabled Assistant with OpenAI\n\n```python\nfrom abstractvoice import VoiceManager\nimport openai\nimport time\n\n# Initialize\nvoice_manager = VoiceManager()\nopenai.api_key = \"your-api-key\"\n\ndef on_transcription(text):\n print(f\"User: {text}\")\n \n # Get response from OpenAI\n response = openai.ChatCompletion.create(\n model=\"gpt-4\",\n messages=[{\"role\": \"user\", \"content\": text}]\n )\n \n ai_response = response.choices[0].message.content\n print(f\"AI: {ai_response}\")\n \n # Speak the response\n voice_manager.speak(ai_response)\n\n# Start voice interaction\nvoice_manager.listen(on_transcription)\n\n# Keep running\ntry:\n while voice_manager.is_listening():\n time.sleep(0.1)\nexcept KeyboardInterrupt:\n voice_manager.cleanup()\n```\n\n### Example 3: Text-to-Speech Only (No Voice Input)\n\n```python\nfrom abstractvoice import VoiceManager\nimport time\n\n# Initialize voice manager\nvoice_manager = VoiceManager()\n\n# Simple text-to-speech\nvoice_manager.speak(\"Hello! 
### Example 3: Text-to-Speech Only (No Voice Input)

```python
from abstractvoice import VoiceManager
import time

# Initialize voice manager
voice_manager = VoiceManager()

# Simple text-to-speech
voice_manager.speak("Hello! This is a test of the text to speech system.")

# Wait for speech to finish
while voice_manager.is_speaking():
    time.sleep(0.1)

# Adjust speed
voice_manager.set_speed(1.5)
voice_manager.speak("This speech is 50% faster.")

while voice_manager.is_speaking():
    time.sleep(0.1)

# Cleanup
voice_manager.cleanup()
```

### Example 4: Speech-to-Text Only (No TTS)

```python
from abstractvoice import VoiceManager
import time

voice_manager = VoiceManager()

def on_transcription(text):
    print(f"Transcribed: {text}")
    # Do something with the transcribed text
    # e.g., save to file, send to API, etc.

# Start listening
voice_manager.listen(on_transcription)

# Keep running
try:
    while voice_manager.is_listening():
        time.sleep(0.1)
except KeyboardInterrupt:
    voice_manager.cleanup()
```

### Key Integration Points

**VoiceManager Configuration:**
```python
# Full configuration example
voice_manager = VoiceManager(
    tts_model="tts_models/en/ljspeech/fast_pitch",  # Default (no external deps)
    whisper_model="base",  # Whisper STT model (tiny, base, small, medium, large)
    debug_mode=True  # Enable debug logging
)

# Alternative TTS models (all pure Python, cross-platform):
# - "tts_models/en/ljspeech/fast_pitch" - Default (fast, good quality)
# - "tts_models/en/ljspeech/glow-tts" - Alternative (similar quality)
# - "tts_models/en/ljspeech/tacotron2-DDC" - Legacy (older, slower)

# Set voice mode (full, wait, off)
voice_manager.set_voice_mode("wait")  # Recommended to avoid self-interruption

# Adjust settings (speed now preserves pitch!)
voice_manager.set_speed(1.2)  # TTS speed (default is 1.0, range 0.5-2.0)
voice_manager.change_vad_aggressiveness(2)  # VAD sensitivity (0-3)
```

**Callback Functions:**
```python
def on_transcription(text):
    """Called when speech is transcribed"""
    print(f"User said: {text}")
    # Your custom logic here

def on_stop():
    """Called when user says 'stop'"""
    print("Stopping voice mode")
    # Your cleanup logic here

voice_manager.listen(
    on_transcription=on_transcription,
    on_stop=on_stop
)
```

## 💻 CLI Commands (v0.4.0+)

AbstractVoice provides powerful CLI commands for model management and voice interactions.

### Model Management

```bash
# Download essential model for offline use (recommended first step)
abstractvoice download-models

# Download models for specific languages
abstractvoice download-models --language fr  # French
abstractvoice download-models --language de  # German
abstractvoice download-models --language it  # Italian
abstractvoice download-models --language es  # Spanish

# Download specific model by name
abstractvoice download-models --model tts_models/fr/css10/vits

# Download all available models (large download!)
abstractvoice download-models --all

# Check current cache status
abstractvoice download-models --status

# Clear model cache
abstractvoice download-models --clear
```

### Voice Interface

```bash
# Start voice interface (default)
abstractvoice

# Start CLI REPL with specific language
abstractvoice cli --language fr

# Start with specific model
abstractvoice --model granite3.3:2b --language de

# Run simple example
abstractvoice simple

# Check dependencies
abstractvoice check-deps
```

### CLI Voice Commands

In the CLI REPL, use these commands (v0.5.0+):

```bash
# List all available voices with download status
/setvoice

# Automatically download and set specific voice (NEW in v0.5.0!)
/setvoice fr.css10_vits    # Downloads French CSS10 if needed
/setvoice de.thorsten_vits # Downloads German Thorsten if needed
/setvoice it.mai_male_vits # Downloads Italian Male if needed
/setvoice en.jenny         # Downloads Jenny voice if needed

# Change language (automatically downloads models if needed - NEW!)
/language fr  # Switches to French, downloads if needed
/language de  # Switches to German, downloads if needed
/language es  # Switches to Spanish, downloads if needed

# Voice controls
/pause   # Pause current speech
/resume  # Resume speech
/stop    # Stop speech

# Exit
/exit
```
**New in v0.5.0:** Language and voice commands now automatically download missing models with progress indicators. No more silent failures!

## Perspectives

This is a test project; the examples were designed to work with Ollama, but I will adapt both the examples and abstractvoice to work with any LLM provider (Anthropic, OpenAI, etc.).

The next iteration will leverage [AbstractCore](https://www.abstractcore.ai) directly to handle everything related to LLMs: providers, models, and configurations.

## License and Acknowledgments

AbstractVoice is licensed under the [MIT License](LICENSE).

This project depends on several open-source libraries and models, each with their own licenses. Please see [ACKNOWLEDGMENTS.md](ACKNOWLEDGMENTS.md) for a detailed list of dependencies and their respective licenses.

Some dependencies, particularly certain TTS models, may have non-commercial use restrictions. If you plan to use AbstractVoice in a commercial application, please ensure you are using models that permit commercial use or obtain appropriate licenses.
"bugtrack_url": null,
"license": null,
"summary": "A modular Python library for voice interactions with AI systems",
"version": "0.5.1",
"project_urls": {
"Documentation": "https://github.com/lpalbou/abstractvoice#readme",
"Repository": "https://github.com/lpalbou/abstractvoice"
},
"split_keywords": [],
"urls": [
{
"comment_text": null,
"digests": {
"blake2b_256": "c9427470f2ec8d8fe1082d1d0d2509e1dfd69dedacde87e06dd4837c70dff1a9",
"md5": "b8634af2421338d1974b40cc4748729b",
"sha256": "b1aa3b792dc37f3b1c3ee105da61240dca3998752888b458edf9c64ca9d97606"
},
"downloads": -1,
"filename": "abstractvoice-0.5.1-py3-none-any.whl",
"has_sig": false,
"md5_digest": "b8634af2421338d1974b40cc4748729b",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.8",
"size": 69161,
"upload_time": "2025-10-21T21:41:19",
"upload_time_iso_8601": "2025-10-21T21:41:19.005399Z",
"url": "https://files.pythonhosted.org/packages/c9/42/7470f2ec8d8fe1082d1d0d2509e1dfd69dedacde87e06dd4837c70dff1a9/abstractvoice-0.5.1-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": null,
"digests": {
"blake2b_256": "6be1b97321ef76f57ece215c47905ed0f52092a79600af29b836cc0f6bb2b970",
"md5": "f9b6c1e804c59b96b8edc5c93dfab8f4",
"sha256": "f133e93236f183ca80276789efa413cf03abd884de50b03bcaaeae72e47f527f"
},
"downloads": -1,
"filename": "abstractvoice-0.5.1.tar.gz",
"has_sig": false,
"md5_digest": "f9b6c1e804c59b96b8edc5c93dfab8f4",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.8",
"size": 84804,
"upload_time": "2025-10-21T21:41:21",
"upload_time_iso_8601": "2025-10-21T21:41:21.055094Z",
"url": "https://files.pythonhosted.org/packages/6b/e1/b97321ef76f57ece215c47905ed0f52092a79600af29b836cc0f6bb2b970/abstractvoice-0.5.1.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2025-10-21 21:41:21",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "lpalbou",
"github_project": "abstractvoice#readme",
"travis_ci": false,
"coveralls": false,
"github_actions": false,
"requirements": [],
"lcname": "abstractvoice"
}