abstractvoice


Nameabstractvoice JSON
Version 0.5.1 PyPI version JSON
download
home_pageNone
SummaryA modular Python library for voice interactions with AI systems
upload_time2025-10-21 21:41:21
maintainerNone
docs_urlNone
authorNone
requires_python>=3.8
licenseNone
keywords
VCS
bugtrack_url
requirements No requirements were recorded.
Travis-CI No Travis.
coveralls test coverage No coveralls.
            # AbstractVoice

[![PyPI version](https://img.shields.io/pypi/v/abstractvoice.svg)](https://pypi.org/project/abstractvoice/)
[![Python Version](https://img.shields.io/pypi/pyversions/abstractvoice)](https://pypi.org/project/abstractvoice/)
[![license](https://img.shields.io/github/license/lpalbou/AbstractVoice)](https://github.com/lpalbou/abstractvoice/blob/main/LICENSE)
[![GitHub stars](https://img.shields.io/github/stars/lpalbou/abstractvoice?style=social)](https://github.com/lpalbou/abstractvoice/stargazers)


A modular Python library for voice interactions with AI systems, providing text-to-speech (TTS) and speech-to-text (STT) capabilities with interrupt handling.

While we provide CLI and WEB examples, AbstractVoice is designed to be integrated in other projects.

## Features

- **High-Quality TTS**: Best-in-class speech synthesis with VITS model
  - Natural prosody and intonation
  - Adjustable speed without pitch distortion (using librosa time-stretching)
  - Multiple quality levels (VITS best, fast_pitch fallback)
  - Automatic fallback if espeak-ng not installed
- **Cross-Platform**: Works on macOS, Linux, and Windows
  - Best quality: Install espeak-ng (easy on all platforms)
  - Fallback mode: Works without any system dependencies
- **Speech-to-Text**: Accurate voice recognition using OpenAI's Whisper
- **Voice Activity Detection**: Efficient speech detection using WebRTC VAD
- **Interrupt Handling**: Stop TTS by speaking or using stop commands
- **Modular Design**: Easily integrate with any text generation system

Note : *the LLM access is rudimentary and abstractvoice is provided more as an example and demonstrator. A better integration is to use the functionalities of this library and use them directly in combination with [AbstractCore](https://github.com/lpalbou/AbstractCore)*.

## Installation

AbstractVoice is designed to **work everywhere, out of the box** with automatic quality upgrades.

### 🚀 Quick Start (Recommended)

```bash
# One command installation - works on all systems
pip install abstractvoice[all]

# Verify it works
python -c "from abstractvoice import VoiceManager; print('✅ Ready to go!')"
```

**That's it!** AbstractVoice automatically:
- ✅ **Works everywhere** - Uses reliable models that run on any system
- ✅ **Auto-upgrades quality** - Detects when better models are available
- ✅ **No system dependencies required** - Pure Python installation
- ✅ **Optional quality boost** - Install `espeak-ng` for premium voices

### Installation Options

```bash
# Minimal (just 2 dependencies)
pip install abstractvoice

# Add features as needed
pip install abstractvoice[tts]      # Text-to-speech
pip install abstractvoice[stt]      # Speech-to-text
pip install abstractvoice[all]      # Everything (recommended)

# Language-specific
pip install abstractvoice[fr]       # French with all features
pip install abstractvoice[de]       # German with all features
```

### Optional Quality Upgrade

For the **absolute best voice quality**, install espeak-ng:

```bash
# macOS
brew install espeak-ng

# Linux
sudo apt-get install espeak-ng

# Windows
conda install espeak-ng
```

AbstractVoice automatically detects espeak-ng and upgrades to premium quality voices when available.

## Quick Start

### ⚡ Instant TTS (v0.5.0+)

```python
from abstractvoice import VoiceManager

# Initialize voice manager - works immediately with included dependencies
vm = VoiceManager()

# Text-to-speech works right away!
vm.speak("Hello! TTS works out of the box!")

# Language switching with automatic model download
vm.set_language('fr')
vm.speak("Bonjour! Le français fonctionne aussi!")
```

**That's it!** AbstractVoice v0.5.0+ automatically:
- ✅ Includes essential TTS dependencies in base installation
- ✅ Downloads models automatically when switching languages/voices
- ✅ Works immediately after `pip install abstractvoice`
- ✅ No silent failures - clear error messages if download fails
- ✅ No complex configuration needed

### 🌍 Multi-Language Support (Auto-Download in v0.5.0+)

```python
# Simply switch language - downloads model automatically if needed!
vm.set_language('fr')
vm.speak("Bonjour! Je parle français maintenant.")

# Switch to German - no manual download needed
vm.set_language('de')
vm.speak("Hallo! Ich spreche jetzt Deutsch.")

# Spanish, Italian also supported
vm.set_language('es')
vm.speak("¡Hola! Hablo español ahora.")

# If download fails, you'll get clear error messages with instructions
# Example: "❌ Cannot switch to French: Model download failed"
#          "   Try: abstractvoice download-models --language fr"
```

**New in v0.5.0:** No more manual `download_model()` calls! Language switching handles downloads automatically.

### 🔧 Check System Status

```python
from abstractvoice import is_ready, get_status, list_models
import json

# Quick readiness check
ready = is_ready()
print(f"TTS ready: {ready}")

# Get detailed status
status = json.loads(get_status())
print(f"Models cached: {status['total_cached']}")
print(f"Offline ready: {status['ready_for_offline']}")

# List all available models
models = json.loads(list_models())
for lang, voices in models.items():
    print(f"{lang}: {len(voices)} voices available")
```

# Speech-to-text with callbacks
def on_transcription(text):
    print(f"You said: {text}")
    # Process the transcription
    vm.speak(f"I heard you say: {text}")

def on_stop():
    print("Stopping voice interaction")

# Start listening
vm.listen(on_transcription, on_stop)

# The voice manager will automatically pause listening when speaking
# and resume when done to prevent feedback loops
```

## Additional Examples

### Language-Specific Usage

```python
# French voice
vm_fr = VoiceManager(language='fr')
vm_fr.speak("Bonjour! Je peux parler français.")

# Spanish voice
vm_es = VoiceManager(language='es')
vm_es.speak("¡Hola! Puedo hablar español.")

# Dynamic language switching
vm.set_language('fr')  # Switch to French
vm.set_language('en')  # Switch back to English
```

### Advanced Configuration

```python
from abstractvoice import VoiceManager

# Custom TTS model selection
vm = VoiceManager(
    language='en',
    tts_model='tts_models/en/ljspeech/fast_pitch',  # Specific model
    whisper_model='base',  # Larger Whisper model for better accuracy
    debug_mode=True
)

# Speed control
vm.set_speed(1.5)  # 1.5x speed
vm.speak("This text will be spoken faster.")

# Model switching at runtime
vm.set_tts_model('tts_models/en/ljspeech/vits')  # Switch to VITS
vm.set_whisper('small')  # Switch to larger Whisper model
```

### Error Handling and Graceful Degradation

AbstractVoice is designed to provide helpful error messages and fallback gracefully:

```python
# If you install just the basic package
# pip install abstractvoice

from abstractvoice import VoiceManager  # This works fine

try:
    vm = VoiceManager()  # This will fail with helpful message
except ImportError as e:
    print(e)
    # Output: "TTS functionality requires optional dependencies. Install with:
    #          pip install abstractvoice[tts]    # For TTS only
    #          pip install abstractvoice[all]    # For all features"

# Missing espeak-ng automatically falls back to compatible models
# Missing dependencies show clear installation instructions
# All errors are graceful with helpful guidance
```

## CLI and Web Examples

AbstractVoice includes example applications to demonstrate its capabilities:

### Using AbstractVoice from the Command Line

The easiest way to get started is to use AbstractVoice directly from your shell:

```bash
# Start AbstractVoice in voice mode (TTS ON, STT ON)
abstractvoice
# → Automatically uses VITS if espeak-ng installed (best quality)
# → Falls back to fast_pitch if espeak-ng not found

# Or start with custom settings
abstractvoice --model gemma3:latest --whisper base

# Start in text-only mode (TTS enabled, listening disabled)
abstractvoice --no-listening
```

Once started, you can interact with the AI using voice or text. Use `/help` to see all available commands.

**Note**: AbstractVoice automatically selects the best available TTS model. For best quality, install espeak-ng (see Installation section above).

### Integrating AbstractVoice in Your Python Project

Here's a simple example of how to integrate AbstractVoice into your own application:

```python
from abstractvoice import VoiceManager
import time

# Initialize voice manager
voice_manager = VoiceManager(debug_mode=False)

# Text to speech
voice_manager.speak("Hello, I am an AI assistant. How can I help you today?")

# Wait for speech to complete
while voice_manager.is_speaking():
    time.sleep(0.1)

# Speech to text with callback
def on_transcription(text):
    print(f"User said: {text}")
    if text.lower() != "stop":
        # Process with your text generation system
        response = f"You said: {text}"
        voice_manager.speak(response)

# Start voice recognition
voice_manager.listen(on_transcription)

# Wait for user to say "stop" or press Ctrl+C
try:
    while voice_manager.is_listening():
        time.sleep(0.1)
except KeyboardInterrupt:
    pass

# Clean up
voice_manager.cleanup()
```

## Running Examples

The package includes several examples that demonstrate different ways to use AbstractVoice.

### Voice Mode (Default)

If installed globally, you can launch AbstractVoice directly in voice mode:

```bash
# Start AbstractVoice in voice mode (TTS ON, STT ON)
abstractvoice

# With options
abstractvoice --debug --whisper base --model gemma3:latest --api http://localhost:11434/api/chat
```

**Command line options:**
- `--debug` - Enable debug mode with detailed logging
- `--api <url>` - URL of the Ollama API (default: http://localhost:11434/api/chat)
- `--model <name>` - Ollama model to use (default: granite3.3:2b)
  - Examples: cogito:3b, phi4-mini:latest, qwen2.5:latest, gemma3:latest, etc.
- `--whisper <model>` - Whisper model to use (default: tiny)
  - Options: tiny, base, small, medium, large
- `--no-listening` - Disable speech-to-text (listening), TTS still works
  - **Note**: This creates a "TTS-only" mode where you type and the AI speaks back
- `--system <prompt>` - Custom system prompt

### 🎯 Complete CLI Interface (v0.3.0+)

AbstractVoice provides a unified command interface for all functionality:

```bash
# Voice mode (default)
abstractvoice                      # Interactive voice mode with AI
abstractvoice --model cogito:3b    # With custom Ollama model
abstractvoice --language fr        # French voice mode

# Examples and utilities
abstractvoice cli                  # CLI REPL for text interaction
abstractvoice web                  # Web API server
abstractvoice simple               # Simple TTS/STT demonstration
abstractvoice check-deps           # Check dependency compatibility
abstractvoice help                 # Show available commands

# Get help
abstractvoice --help               # Complete help with all options
```

**All functionality through one command!** No more confusion between different entry points.

### Command-Line REPL

```bash
# Run the CLI example (TTS ON, STT OFF)
abstractvoice cli

# With debug mode
abstractvoice cli --debug

# With specific language
abstractvoice cli --language fr
```

#### REPL Commands

All commands must start with `/` except `stop`:

**Basic Commands:**
- `/exit`, `/q`, `/quit` - Exit REPL
- `/clear` - Clear conversation history
- `/help` - Show help information
- `stop` - Stop voice mode or TTS (voice command, no `/` needed)

**Voice & Audio:**
- `/tts on|off` - Toggle text-to-speech
- `/voice <mode>` - Voice input modes:
  - `off` - Disable voice input
  - `full` - Continuous listening, interrupts TTS on speech detection
  - `wait` - Pause listening while speaking (recommended, reduces self-interruption)
  - `stop` - Only stop on 'stop' keyword (planned)
  - `ptt` - Push-to-talk mode (planned)
- `/speed <number>` - Set TTS speed (0.5-2.0, default: 1.0, **pitch preserved**)
- `/tts_model <model>` - Switch TTS model:
  - `vits` - **Best quality** (requires espeak-ng)
  - `fast_pitch` - Good quality (works everywhere)
  - `glow-tts` - Alternative (similar quality to fast_pitch)
  - `tacotron2-DDC` - Legacy (slower, lower quality)
- `/whisper <model>` - Switch Whisper model (tiny|base|small|medium|large)
- `/stop` - Stop voice mode or TTS playback
- `/pause` - Pause current TTS playback (can be resumed)
- `/resume` - Resume paused TTS playback

**LLM Configuration:**
- `/model <name>` - Change LLM model (e.g., `/model gemma3:latest`)
- `/system <prompt>` - Set system prompt (e.g., `/system You are a helpful coding assistant`)
- `/temperature <val>` - Set temperature (0.0-2.0, default: 0.7)
- `/max_tokens <num>` - Set max tokens (default: 4096)

**Chat Management:**
- `/save <filename>` - Save chat history (e.g., `/save conversation`)
- `/load <filename>` - Load chat history (e.g., `/load conversation`)
- `/tokens` - Display token usage statistics

**Sending Messages:**
- `<message>` - Any text without `/` prefix is sent to the LLM

**Note**: Commands without `/` (except `stop`) are sent to the LLM as regular messages.

### Web API

```bash
# Run the web API example
abstractvoice web

# With different host and port
abstractvoice web --host 0.0.0.0 --port 8000
```

You can also run a simplified version that doesn't load the full models:

```bash
# Run the web API with simulation mode
abstractvoice web --simulate
```

#### Troubleshooting Web API

If you encounter issues with the web API:

1. **404 Not Found**: Make sure you're accessing the correct endpoints (e.g., `/api/test`, `/api/tts`)
2. **Connection Issues**: Ensure no other service is using the port
3. **Model Loading Errors**: Try running with `--simulate` flag to test without loading models
4. **Dependencies**: Ensure all required packages are installed:
   ```bash
   pip install flask soundfile numpy requests
   ```
5. **Test with a simple Flask script**:
   ```python
   from flask import Flask
   app = Flask(__name__)
   @app.route('/')
   def home():
       return "Flask works!"
   app.run(host='127.0.0.1', port=5000)
   ```

### Simple Demo

```bash
# Run the simple example
abstractvoice simple
```

## Documentation

### 📚 Documentation Overview

- **[README.md](README.md)** - This file: User guide, API reference, and examples
- **[CONTRIBUTING.md](CONTRIBUTING.md)** - Contribution guidelines and development setup
- **[CHANGELOG.md](CHANGELOG.md)** - Version history and release notes
- **[docs/](docs/)** - Technical documentation for developers

### 🎯 Quick Navigation

- **Getting Started**: [Installation](#installation) and [Quick Start](#quick-start)
- **Pause/Resume Control**: [TTS Control](#quick-reference-tts-control) section
- **Integration Examples**: [Integration Guide](#integration-guide-for-third-party-applications)
- **Technical Details**: [docs/architecture.md](docs/architecture.md) - How immediate pause/resume works
- **Development**: [CONTRIBUTING.md](CONTRIBUTING.md) - Setup and guidelines

## Component Overview

### VoiceManager

The main class that coordinates TTS and STT functionality:

```python
from abstractvoice import VoiceManager

# Simple initialization (automatic model selection)
# - Uses VITS if espeak-ng is installed (best quality)
# - Falls back to fast_pitch if espeak-ng is missing
manager = VoiceManager()

# Or specify a model explicitly
manager = VoiceManager(
    tts_model="tts_models/en/ljspeech/vits",  # Best quality (needs espeak-ng)
    # tts_model="tts_models/en/ljspeech/fast_pitch",  # Good (works everywhere)
    whisper_model="tiny",
    debug_mode=False
)

# === TTS (Text-to-Speech) ===

# Basic speech synthesis
manager.speak("Hello world")

# With speed control (pitch preserved via time-stretching!)
manager.speak("This is 20% faster", speed=1.2)
manager.speak("This is half speed", speed=0.5)

# Check if speaking
if manager.is_speaking():
    manager.stop_speaking()

# Pause and resume TTS (IMMEDIATE response)
manager.speak("This is a long sentence that can be paused and resumed immediately")
time.sleep(1)
success = manager.pause_speaking()  # Pause IMMEDIATELY (~20ms response)
if success:
    print("TTS paused immediately")

time.sleep(2)
success = manager.resume_speaking()  # Resume IMMEDIATELY from exact position
if success:
    print("TTS resumed from exact position")

# Check pause status
if manager.is_paused():
    manager.resume_speaking()

# Change TTS speed globally
manager.set_speed(1.3)  # All subsequent speech will be 30% faster

# Change TTS model dynamically
manager.set_tts_model("tts_models/en/ljspeech/glow-tts")

# Available TTS models (quality ranking):
# - "tts_models/en/ljspeech/vits" (BEST quality, requires espeak-ng)
# - "tts_models/en/ljspeech/fast_pitch" (fallback, works everywhere)
# - "tts_models/en/ljspeech/glow-tts" (alternative fallback)
# - "tts_models/en/ljspeech/tacotron2-DDC" (legacy)

# === Audio Lifecycle Callbacks (v0.5.1+) ===

# NEW: Precise audio timing callbacks for visual status indicators
def on_synthesis_start():
    print("🔴 Synthesis started - show thinking animation")

def on_audio_start():
    print("🔵 Audio started - show speaking animation")

def on_audio_pause():
    print("⏸️ Audio paused - show paused animation")

def on_audio_resume():
    print("▶️ Audio resumed - continue speaking animation")

def on_audio_end():
    print("🟢 Audio ended - show ready animation")

def on_synthesis_end():
    print("✅ Synthesis complete")

# Wire up callbacks
manager.tts_engine.on_playback_start = on_synthesis_start    # Existing (synthesis phase)
manager.tts_engine.on_playback_end = on_synthesis_end        # Existing (synthesis phase)
manager.on_audio_start = on_audio_start                      # NEW (actual audio playback)
manager.on_audio_end = on_audio_end                          # NEW (actual audio playback)
manager.on_audio_pause = on_audio_pause                      # NEW (pause events)
manager.on_audio_resume = on_audio_resume                    # NEW (resume events)

# Perfect for system tray icons, UI animations, or coordinating multiple audio streams

# === STT (Speech-to-Text) ===

def on_transcription(text):
    print(f"You said: {text}")

manager.listen(on_transcription, on_stop=None)
manager.stop_listening()
manager.is_listening()

# Change Whisper model
manager.set_whisper("base")  # tiny, base, small, medium, large

# === Voice Modes ===

# Control how voice recognition behaves during TTS
manager.set_voice_mode("wait")  # Pause listening while speaking (recommended)
manager.set_voice_mode("full")  # Keep listening, interrupt on speech
manager.set_voice_mode("off")   # Disable voice recognition

# === VAD (Voice Activity Detection) ===

manager.change_vad_aggressiveness(2)  # 0-3, higher = more aggressive

# === Cleanup ===

manager.cleanup()
```

### TTSEngine

Handles text-to-speech synthesis:

```python
from abstractvoice.tts import TTSEngine

# Initialize with fast_pitch model (default, no external dependencies)
tts = TTSEngine(
    model_name="tts_models/en/ljspeech/fast_pitch",
    debug_mode=False,
    streaming=True  # Enable progressive playback for long text
)

# Speak with speed control (pitch preserved via time-stretching)
tts.speak(text, speed=1.2, callback=None)  # 20% faster, same pitch

# Immediate pause and resume control
success = tts.pause()      # Pause IMMEDIATELY (~20ms response)
success = tts.resume()     # Resume IMMEDIATELY from exact position
is_paused = tts.is_paused()  # Check if currently paused

tts.stop()       # Stop completely (cannot resume)
tts.is_active()  # Check if active
```

**Important Note on Speed Parameter:**
- The speed parameter now uses proper time-stretching (via librosa)
- Changing speed does NOT affect pitch anymore
- Range: 0.5 (half speed) to 2.0 (double speed)
- Example: `speed=1.3` makes speech 30% faster while preserving natural pitch

### VoiceRecognizer

Manages speech recognition with VAD:

```python
from abstractvoice.recognition import VoiceRecognizer

def on_transcription(text):
    print(f"Transcribed: {text}")

def on_stop():
    print("Stop command detected")

recognizer = VoiceRecognizer(transcription_callback=on_transcription,
                           stop_callback=on_stop, 
                           whisper_model="tiny",
                           debug_mode=False)
recognizer.start(tts_interrupt_callback=None)
recognizer.stop()
recognizer.change_whisper_model("base")
recognizer.change_vad_aggressiveness(2)
```

## Quick Reference: TTS Control

### Pause and Resume TTS

**Professional-grade pause/resume control** with immediate response and no terminal interference.

**In CLI/REPL:**
```bash
/pause    # Pause current TTS playback IMMEDIATELY
/resume   # Resume paused TTS playback IMMEDIATELY  
/stop     # Stop TTS completely (cannot resume)
```

**Programmatic Usage:**

#### Basic Pause/Resume
```python
from abstractvoice import VoiceManager
import time

vm = VoiceManager()

# Start speech
vm.speak("This is a long sentence that demonstrates immediate pause and resume functionality.")

# Pause immediately (takes effect within ~20ms)
time.sleep(1)
result = vm.pause_speaking()
if result:
    print("✓ TTS paused immediately")

# Resume immediately (takes effect within ~20ms)  
time.sleep(2)
result = vm.resume_speaking()
if result:
    print("✓ TTS resumed immediately")
```

#### Advanced Control with Status Checking
```python
from abstractvoice import VoiceManager
import time

vm = VoiceManager()

# Start long speech
vm.speak("This is a very long text that will be used to demonstrate the advanced pause and resume control features.")

# Wait and pause
time.sleep(1.5)
if vm.is_speaking():
    vm.pause_speaking()
    print("Speech paused")

# Check pause status
if vm.is_paused():
    print("Confirmed: TTS is paused")
    time.sleep(2)
    
    # Resume from exact position
    vm.resume_speaking()
    print("Speech resumed from exact position")

# Wait for completion
while vm.is_speaking():
    time.sleep(0.1)
print("Speech completed")
```

#### Interactive Control Example
```python
from abstractvoice import VoiceManager
import threading
import time

vm = VoiceManager()

def control_speech():
    """Interactive control in separate thread"""
    time.sleep(2)
    print("Pausing speech...")
    vm.pause_speaking()
    
    time.sleep(3)
    print("Resuming speech...")
    vm.resume_speaking()

# Start long speech
long_text = """
This is a comprehensive demonstration of AbstractVoice's immediate pause and resume functionality.
The system uses non-blocking audio streaming with callback-based control.
You can pause and resume at any time with immediate response.
The audio continues from the exact position where it was paused.
"""

# Start control thread
control_thread = threading.Thread(target=control_speech, daemon=True)
control_thread.start()

# Start speech (non-blocking)
vm.speak(long_text)

# Wait for completion
while vm.is_speaking() or vm.is_paused():
    time.sleep(0.1)

vm.cleanup()
```

#### Error Handling
```python
from abstractvoice import VoiceManager

vm = VoiceManager()

# Start speech
vm.speak("Testing pause/resume with error handling")

# Safe pause with error handling
try:
    if vm.is_speaking():
        success = vm.pause_speaking()
        if success:
            print("Successfully paused")
        else:
            print("No active speech to pause")
    
    # Safe resume with error handling
    if vm.is_paused():
        success = vm.resume_speaking()
        if success:
            print("Successfully resumed")
        else:
            print("Was not paused or playback completed")
            
except Exception as e:
    print(f"Error controlling TTS: {e}")
```

**Key Features:**
- **⚡ Immediate Response**: Pause/resume takes effect within ~20ms
- **🎯 Exact Position**: Resumes from precise audio position (no repetition)
- **🖥️ No Terminal Interference**: Uses OutputStream callbacks, never blocks terminal
- **🔒 Thread-Safe**: Safe to call from any thread or callback
- **📊 Reliable Status**: `is_paused()` and `is_speaking()` always accurate
- **🔄 Seamless Streaming**: Works with ongoing text synthesis

**How it works:**
- Uses `sounddevice.OutputStream` with callback function
- Pause immediately outputs silence in next audio callback (~20ms)
- Resume immediately continues audio output from exact position
- No blocking `sd.stop()` calls that interfere with terminal I/O
- Thread-safe with proper locking mechanisms

## Quick Reference: Speed & Model Control

### Changing TTS Speed

**In CLI/REPL:**
```bash
/speed 1.2    # 20% faster, pitch preserved
/speed 0.8    # 20% slower, pitch preserved
```

**Programmatically:**
```python
from abstractvoice import VoiceManager

vm = VoiceManager()

# Method 1: Set global speed
vm.set_speed(1.3)  # All speech will be 30% faster
vm.speak("This will be 30% faster")

# Method 2: Per-speech speed
vm.speak("This is 50% faster", speed=1.5)
vm.speak("This is normal speed", speed=1.0)
vm.speak("This is half speed", speed=0.5)

# Get current speed
current = vm.get_speed()  # Returns 1.3 from set_speed() above
```

### Changing TTS Model

**In CLI/REPL:**
```bash
/tts_model vits           # Best quality (needs espeak-ng)
/tts_model fast_pitch     # Good quality (works everywhere)
/tts_model glow-tts       # Alternative model
/tts_model tacotron2-DDC  # Legacy model
```

**Programmatically:**
```python
from abstractvoice import VoiceManager

# Method 1: Set at initialization
vm = VoiceManager(tts_model="tts_models/en/ljspeech/glow-tts")

# Method 2: Change dynamically at runtime
vm.set_tts_model("tts_models/en/ljspeech/fast_pitch")
vm.speak("Using fast_pitch now")

vm.set_tts_model("tts_models/en/ljspeech/glow-tts")
vm.speak("Using glow-tts now")

# Available models (quality ranking):
models = [
    "tts_models/en/ljspeech/vits",          # BEST (requires espeak-ng)
    "tts_models/en/ljspeech/fast_pitch",    # Good (works everywhere)
    "tts_models/en/ljspeech/glow-tts",      # Alternative fallback
    "tts_models/en/ljspeech/tacotron2-DDC"  # Legacy
]
```

### Complete Example: Experiment with Settings

```python
from abstractvoice import VoiceManager
import time

vm = VoiceManager()

# Test different models (vits requires espeak-ng)
for model in ["vits", "fast_pitch", "glow-tts", "tacotron2-DDC"]:
    full_name = f"tts_models/en/ljspeech/{model}"
    vm.set_tts_model(full_name)
    
    # Test different speeds with each model
    for speed in [0.8, 1.0, 1.2]:
        vm.speak(f"Testing {model} at {speed}x speed", speed=speed)
        while vm.is_speaking():
            time.sleep(0.1)
```

## Integration Guide for Third-Party Applications

AbstractVoice is designed as a lightweight, modular library for easy integration into your applications. This guide covers everything you need to know.

### Quick Start: Basic Integration

```python
from abstractvoice import VoiceManager

# 1. Initialize (automatic best-quality model selection)
vm = VoiceManager()

# 2. Text-to-Speech
vm.speak("Hello from my app!")

# 3. Speech-to-Text with callback
def handle_speech(text):
    print(f"User said: {text}")
    # Process text in your app...

vm.listen(on_transcription=handle_speech)
```

### Model Selection: Automatic vs Explicit

**Automatic (Recommended):**
```python
# Automatically uses best available model
vm = VoiceManager()
# → Uses VITS if espeak-ng installed (best quality)
# → Falls back to fast_pitch if espeak-ng missing
```

**Explicit:**
```python
# Force a specific model (bypasses auto-detection)
vm = VoiceManager(tts_model="tts_models/en/ljspeech/fast_pitch")

# Or change dynamically at runtime
vm.set_tts_model("tts_models/en/ljspeech/vits")
```

### Voice Quality Levels

| Model | Quality | Speed | Requirements |
|-------|---------|-------|--------------|
| **vits** | ⭐⭐⭐⭐⭐ Excellent | Fast | espeak-ng |
| **fast_pitch** | ⭐⭐⭐ Good | Fast | None |
| **glow-tts** | ⭐⭐⭐ Good | Fast | None |
| **tacotron2-DDC** | ⭐⭐ Fair | Slow | None |

### Customization Options

```python
from abstractvoice import VoiceManager

vm = VoiceManager(
    # TTS Configuration
    tts_model="tts_models/en/ljspeech/vits",  # Model to use
    
    # STT Configuration  
    whisper_model="base",  # tiny, base, small, medium, large
    
    # Debugging
    debug_mode=True  # Enable detailed logging
)

# Runtime customization
vm.set_speed(1.2)                    # Adjust TTS speed (0.5-2.0)
vm.set_tts_model("...")              # Change TTS model
vm.set_whisper("small")              # Change STT model
vm.set_voice_mode("wait")            # wait, full, or off
vm.change_vad_aggressiveness(2)      # VAD sensitivity (0-3)
```

### Integration Patterns

#### Pattern 1: TTS Only (No Voice Input)
```python
vm = VoiceManager()

# Speak with different speeds
vm.speak("Normal speed")
vm.speak("Fast speech", speed=1.5)
vm.speak("Slow speech", speed=0.7)

# Control playback with immediate response
if vm.is_speaking():
    success = vm.pause_speaking()  # Pause IMMEDIATELY (~20ms)
    if success:
        print("Speech paused immediately")
    # or
    vm.stop_speaking()   # Stop completely (cannot resume)

# Resume from exact position
if vm.is_paused():
    success = vm.resume_speaking()  # Resume IMMEDIATELY (~20ms)
    if success:
        print("Speech resumed from exact position")
```

#### Pattern 2: STT Only (No Text-to-Speech)
```python
vm = VoiceManager()

def process_speech(text):
    # Send to your backend, save to DB, etc.
    your_app.process(text)

vm.listen(on_transcription=process_speech)
```

#### Pattern 3: Full Voice Interaction
```python
vm = VoiceManager()

def on_speech(text):
    response = your_llm.generate(text)
    vm.speak(response)

def on_stop():
    print("User said stop")
    vm.cleanup()

vm.listen(
    on_transcription=on_speech,
    on_stop=on_stop
)
```

### Error Handling

```python
try:
    vm = VoiceManager()
    vm.speak("Test")
except Exception as e:
    print(f"TTS Error: {e}")
    # Handle missing dependencies, etc.

# Check model availability
try:
    vm.set_tts_model("tts_models/en/ljspeech/vits")
    print("VITS available")
except:
    print("VITS not available, using fallback")
    vm.set_tts_model("tts_models/en/ljspeech/fast_pitch")
```

### Threading and Async Support

AbstractVoice handles threading internally for TTS and STT:

```python
# TTS is non-blocking
vm.speak("Long text...")  # Returns immediately
# Your code continues while speech plays

# Check status
if vm.is_speaking():
    print("Still speaking...")

# Wait for completion
while vm.is_speaking():
    time.sleep(0.1)

# STT runs in background thread
vm.listen(on_transcription=callback)  # Returns immediately
# Callbacks fire on background thread
```

### Cleanup and Resource Management

```python
# Always cleanup when done
vm.cleanup()

# Or use context manager pattern
from contextlib import contextmanager

@contextmanager
def voice_manager():
    vm = VoiceManager()
    try:
        yield vm
    finally:
        vm.cleanup()

# Usage
with voice_manager() as vm:
    vm.speak("Hello")
```

### Configuration for Different Environments

**Development (fast iteration):**
```python
vm = VoiceManager(
    tts_model="tts_models/en/ljspeech/fast_pitch",  # Fast
    whisper_model="tiny",  # Fast STT
    debug_mode=True
)
```

**Production (best quality):**
```python
vm = VoiceManager(
    tts_model="tts_models/en/ljspeech/vits",  # Best quality
    whisper_model="base",  # Good accuracy
    debug_mode=False
)
```

**Embedded/Resource-Constrained:**
```python
vm = VoiceManager(
    tts_model="tts_models/en/ljspeech/fast_pitch",  # Lower memory
    whisper_model="tiny",  # Smallest model
    debug_mode=False
)
```

## Integration with Text Generation Systems

AbstractVoice is designed to be a lightweight, modular library that you can easily integrate into your own applications. Here are complete examples for common use cases:

### Example 1: Voice-Enabled Chatbot with Ollama

```python
from abstractvoice import VoiceManager
import requests
import time

# Initialize voice manager
voice_manager = VoiceManager()

# Function to call Ollama API
def generate_text(prompt):
    response = requests.post("http://localhost:11434/api/chat", json={
        "model": "granite3.3:2b",
        "messages": [{"role": "user", "content": prompt}],
        "stream": False
    })
    return response.json()["message"]["content"]

# Callback for speech recognition
def on_transcription(text):
    if text.lower() == "stop":
        return
        
    print(f"User: {text}")
    
    # Generate response
    response = generate_text(text)
    print(f"AI: {response}")
    
    # Speak response
    voice_manager.speak(response)

# Start listening
voice_manager.listen(on_transcription)

# Keep running until interrupted
try:
    while voice_manager.is_listening():
        time.sleep(0.1)
except KeyboardInterrupt:
    voice_manager.cleanup()
```

### Example 2: Voice-Enabled Assistant with OpenAI

```python
from abstractvoice import VoiceManager
import openai
import time

# Initialize
voice_manager = VoiceManager()
openai.api_key = "your-api-key"

def on_transcription(text):
    print(f"User: {text}")
    
    # Get response from OpenAI
    response = openai.ChatCompletion.create(
        model="gpt-4",
        messages=[{"role": "user", "content": text}]
    )
    
    ai_response = response.choices[0].message.content
    print(f"AI: {ai_response}")
    
    # Speak the response
    voice_manager.speak(ai_response)

# Start voice interaction
voice_manager.listen(on_transcription)

# Keep running
try:
    while voice_manager.is_listening():
        time.sleep(0.1)
except KeyboardInterrupt:
    voice_manager.cleanup()
```

### Example 3: Text-to-Speech Only (No Voice Input)

```python
from abstractvoice import VoiceManager
import time

# Initialize voice manager
voice_manager = VoiceManager()

# Simple text-to-speech
voice_manager.speak("Hello! This is a test of the text to speech system.")

# Wait for speech to finish
while voice_manager.is_speaking():
    time.sleep(0.1)

# Adjust speed
voice_manager.set_speed(1.5)
voice_manager.speak("This speech is 50% faster.")

while voice_manager.is_speaking():
    time.sleep(0.1)

# Cleanup
voice_manager.cleanup()
```

### Example 4: Speech-to-Text Only (No TTS)

```python
from abstractvoice import VoiceManager
import time

voice_manager = VoiceManager()

def on_transcription(text):
    print(f"Transcribed: {text}")
    # Do something with the transcribed text
    # e.g., save to file, send to API, etc.

# Start listening
voice_manager.listen(on_transcription)

# Keep running
try:
    while voice_manager.is_listening():
        time.sleep(0.1)
except KeyboardInterrupt:
    voice_manager.cleanup()
```

### Key Integration Points

**VoiceManager Configuration:**
```python
# Full configuration example
voice_manager = VoiceManager(
    tts_model="tts_models/en/ljspeech/fast_pitch",  # Default (no external deps)
    whisper_model="base",  # Whisper STT model (tiny, base, small, medium, large)
    debug_mode=True  # Enable debug logging
)

# Alternative TTS models (all pure Python, cross-platform):
# - "tts_models/en/ljspeech/fast_pitch" - Default (fast, good quality)
# - "tts_models/en/ljspeech/glow-tts" - Alternative (similar quality)
# - "tts_models/en/ljspeech/tacotron2-DDC" - Legacy (older, slower)

# Set voice mode (full, wait, off)
voice_manager.set_voice_mode("wait")  # Recommended to avoid self-interruption

# Adjust settings (speed now preserves pitch!)
voice_manager.set_speed(1.2)  # TTS speed (default is 1.0, range 0.5-2.0)
voice_manager.change_vad_aggressiveness(2)  # VAD sensitivity (0-3)
```

**Callback Functions:**
```python
def on_transcription(text):
    """Called when speech is transcribed"""
    print(f"User said: {text}")
    # Your custom logic here

def on_stop():
    """Called when user says 'stop'"""
    print("Stopping voice mode")
    # Your cleanup logic here

voice_manager.listen(
    on_transcription=on_transcription,
    on_stop=on_stop
)
```

## 💻 CLI Commands (v0.4.0+)

AbstractVoice provides powerful CLI commands for model management and voice interactions.

### Model Management

```bash
# Download essential model for offline use (recommended first step)
abstractvoice download-models

# Download models for specific languages
abstractvoice download-models --language fr    # French
abstractvoice download-models --language de    # German
abstractvoice download-models --language it    # Italian
abstractvoice download-models --language es    # Spanish

# Download specific model by name
abstractvoice download-models --model tts_models/fr/css10/vits

# Download all available models (large download!)
abstractvoice download-models --all

# Check current cache status
abstractvoice download-models --status

# Clear model cache
abstractvoice download-models --clear
```

### Voice Interface

```bash
# Start voice interface (default)
abstractvoice

# Start CLI REPL with specific language
abstractvoice cli --language fr

# Start with specific model
abstractvoice --model granite3.3:2b --language de

# Run simple example
abstractvoice simple

# Check dependencies
abstractvoice check-deps
```

### CLI Voice Commands

In the CLI REPL, use these commands (v0.5.0+):

```bash
# List all available voices with download status
/setvoice

# Automatically download and set specific voice (NEW in v0.5.0!)
/setvoice fr.css10_vits      # Downloads French CSS10 if needed
/setvoice de.thorsten_vits   # Downloads German Thorsten if needed
/setvoice it.mai_male_vits   # Downloads Italian Male if needed
/setvoice en.jenny           # Downloads Jenny voice if needed

# Change language (automatically downloads models if needed - NEW!)
/language fr                 # Switches to French, downloads if needed
/language de                 # Switches to German, downloads if needed
/language es                 # Switches to Spanish, downloads if needed

# Voice controls
/pause                       # Pause current speech
/resume                      # Resume speech
/stop                        # Stop speech

# Exit
/exit
```

**New in v0.5.0:** Language and voice commands now automatically download missing models with progress indicators. No more silent failures!

## Perspectives

This is a test project that I designed with examples to work with Ollama, but I will adapt the examples and abstractvoice to work with any LLM provider (anthropic, openai, etc).

Next iteration will leverage directly [AbstractCore](https://www.abstractcore.ai) to handle everything related to LLM, their providers, models and configurations.

## License and Acknowledgments

AbstractVoice is licensed under the [MIT License](LICENSE).

This project depends on several open-source libraries and models, each with their own licenses. Please see [ACKNOWLEDGMENTS.md](ACKNOWLEDGMENTS.md) for a detailed list of dependencies and their respective licenses.

Some dependencies, particularly certain TTS models, may have non-commercial use restrictions. If you plan to use AbstractVoice in a commercial application, please ensure you are using models that permit commercial use or obtain appropriate licenses. 

            

Raw data

            {
    "_id": null,
    "home_page": null,
    "name": "abstractvoice",
    "maintainer": null,
    "docs_url": null,
    "requires_python": ">=3.8",
    "maintainer_email": null,
    "keywords": null,
    "author": null,
    "author_email": "Laurent-Philippe Albou <contact@abstractcore.ai>",
    "download_url": "https://files.pythonhosted.org/packages/6b/e1/b97321ef76f57ece215c47905ed0f52092a79600af29b836cc0f6bb2b970/abstractvoice-0.5.1.tar.gz",
    "platform": null,
    "description": "# AbstractVoice\n\n[![PyPI version](https://img.shields.io/pypi/v/abstractvoice.svg)](https://pypi.org/project/abstractvoice/)\n[![Python Version](https://img.shields.io/pypi/pyversions/abstractvoice)](https://pypi.org/project/abstractvoice/)\n[![license](https://img.shields.io/github/license/lpalbou/AbstractVoice)](https://github.com/lpalbou/abstractvoice/blob/main/LICENSE)\n[![GitHub stars](https://img.shields.io/github/stars/lpalbou/abstractvoice?style=social)](https://github.com/lpalbou/abstractvoice/stargazers)\n\n\nA modular Python library for voice interactions with AI systems, providing text-to-speech (TTS) and speech-to-text (STT) capabilities with interrupt handling.\n\nWhile we provide CLI and WEB examples, AbstractVoice is designed to be integrated in other projects.\n\n## Features\n\n- **High-Quality TTS**: Best-in-class speech synthesis with VITS model\n  - Natural prosody and intonation\n  - Adjustable speed without pitch distortion (using librosa time-stretching)\n  - Multiple quality levels (VITS best, fast_pitch fallback)\n  - Automatic fallback if espeak-ng not installed\n- **Cross-Platform**: Works on macOS, Linux, and Windows\n  - Best quality: Install espeak-ng (easy on all platforms)\n  - Fallback mode: Works without any system dependencies\n- **Speech-to-Text**: Accurate voice recognition using OpenAI's Whisper\n- **Voice Activity Detection**: Efficient speech detection using WebRTC VAD\n- **Interrupt Handling**: Stop TTS by speaking or using stop commands\n- **Modular Design**: Easily integrate with any text generation system\n\nNote : *the LLM access is rudimentary and abstractvoice is provided more as an example and demonstrator. A better integration is to use the functionalities of this library and use them directly in combination with [AbstractCore](https://github.com/lpalbou/AbstractCore)*.\n\n## Installation\n\nAbstractVoice is designed to **work everywhere, out of the box** with automatic quality upgrades.\n\n### \ud83d\ude80 Quick Start (Recommended)\n\n```bash\n# One command installation - works on all systems\npip install abstractvoice[all]\n\n# Verify it works\npython -c \"from abstractvoice import VoiceManager; print('\u2705 Ready to go!')\"\n```\n\n**That's it!** AbstractVoice automatically:\n- \u2705 **Works everywhere** - Uses reliable models that run on any system\n- \u2705 **Auto-upgrades quality** - Detects when better models are available\n- \u2705 **No system dependencies required** - Pure Python installation\n- \u2705 **Optional quality boost** - Install `espeak-ng` for premium voices\n\n### Installation Options\n\n```bash\n# Minimal (just 2 dependencies)\npip install abstractvoice\n\n# Add features as needed\npip install abstractvoice[tts]      # Text-to-speech\npip install abstractvoice[stt]      # Speech-to-text\npip install abstractvoice[all]      # Everything (recommended)\n\n# Language-specific\npip install abstractvoice[fr]       # French with all features\npip install abstractvoice[de]       # German with all features\n```\n\n### Optional Quality Upgrade\n\nFor the **absolute best voice quality**, install espeak-ng:\n\n```bash\n# macOS\nbrew install espeak-ng\n\n# Linux\nsudo apt-get install espeak-ng\n\n# Windows\nconda install espeak-ng\n```\n\nAbstractVoice automatically detects espeak-ng and upgrades to premium quality voices when available.\n\n## Quick Start\n\n### \u26a1 Instant TTS (v0.5.0+)\n\n```python\nfrom abstractvoice import VoiceManager\n\n# Initialize voice manager - works immediately with included dependencies\nvm = VoiceManager()\n\n# Text-to-speech works right away!\nvm.speak(\"Hello! TTS works out of the box!\")\n\n# Language switching with automatic model download\nvm.set_language('fr')\nvm.speak(\"Bonjour! Le fran\u00e7ais fonctionne aussi!\")\n```\n\n**That's it!** AbstractVoice v0.5.0+ automatically:\n- \u2705 Includes essential TTS dependencies in base installation\n- \u2705 Downloads models automatically when switching languages/voices\n- \u2705 Works immediately after `pip install abstractvoice`\n- \u2705 No silent failures - clear error messages if download fails\n- \u2705 No complex configuration needed\n\n### \ud83c\udf0d Multi-Language Support (Auto-Download in v0.5.0+)\n\n```python\n# Simply switch language - downloads model automatically if needed!\nvm.set_language('fr')\nvm.speak(\"Bonjour! Je parle fran\u00e7ais maintenant.\")\n\n# Switch to German - no manual download needed\nvm.set_language('de')\nvm.speak(\"Hallo! Ich spreche jetzt Deutsch.\")\n\n# Spanish, Italian also supported\nvm.set_language('es')\nvm.speak(\"\u00a1Hola! Hablo espa\u00f1ol ahora.\")\n\n# If download fails, you'll get clear error messages with instructions\n# Example: \"\u274c Cannot switch to French: Model download failed\"\n#          \"   Try: abstractvoice download-models --language fr\"\n```\n\n**New in v0.5.0:** No more manual `download_model()` calls! Language switching handles downloads automatically.\n\n### \ud83d\udd27 Check System Status\n\n```python\nfrom abstractvoice import is_ready, get_status, list_models\nimport json\n\n# Quick readiness check\nready = is_ready()\nprint(f\"TTS ready: {ready}\")\n\n# Get detailed status\nstatus = json.loads(get_status())\nprint(f\"Models cached: {status['total_cached']}\")\nprint(f\"Offline ready: {status['ready_for_offline']}\")\n\n# List all available models\nmodels = json.loads(list_models())\nfor lang, voices in models.items():\n    print(f\"{lang}: {len(voices)} voices available\")\n```\n\n# Speech-to-text with callbacks\ndef on_transcription(text):\n    print(f\"You said: {text}\")\n    # Process the transcription\n    vm.speak(f\"I heard you say: {text}\")\n\ndef on_stop():\n    print(\"Stopping voice interaction\")\n\n# Start listening\nvm.listen(on_transcription, on_stop)\n\n# The voice manager will automatically pause listening when speaking\n# and resume when done to prevent feedback loops\n```\n\n## Additional Examples\n\n### Language-Specific Usage\n\n```python\n# French voice\nvm_fr = VoiceManager(language='fr')\nvm_fr.speak(\"Bonjour! Je peux parler fran\u00e7ais.\")\n\n# Spanish voice\nvm_es = VoiceManager(language='es')\nvm_es.speak(\"\u00a1Hola! Puedo hablar espa\u00f1ol.\")\n\n# Dynamic language switching\nvm.set_language('fr')  # Switch to French\nvm.set_language('en')  # Switch back to English\n```\n\n### Advanced Configuration\n\n```python\nfrom abstractvoice import VoiceManager\n\n# Custom TTS model selection\nvm = VoiceManager(\n    language='en',\n    tts_model='tts_models/en/ljspeech/fast_pitch',  # Specific model\n    whisper_model='base',  # Larger Whisper model for better accuracy\n    debug_mode=True\n)\n\n# Speed control\nvm.set_speed(1.5)  # 1.5x speed\nvm.speak(\"This text will be spoken faster.\")\n\n# Model switching at runtime\nvm.set_tts_model('tts_models/en/ljspeech/vits')  # Switch to VITS\nvm.set_whisper('small')  # Switch to larger Whisper model\n```\n\n### Error Handling and Graceful Degradation\n\nAbstractVoice is designed to provide helpful error messages and fallback gracefully:\n\n```python\n# If you install just the basic package\n# pip install abstractvoice\n\nfrom abstractvoice import VoiceManager  # This works fine\n\ntry:\n    vm = VoiceManager()  # This will fail with helpful message\nexcept ImportError as e:\n    print(e)\n    # Output: \"TTS functionality requires optional dependencies. Install with:\n    #          pip install abstractvoice[tts]    # For TTS only\n    #          pip install abstractvoice[all]    # For all features\"\n\n# Missing espeak-ng automatically falls back to compatible models\n# Missing dependencies show clear installation instructions\n# All errors are graceful with helpful guidance\n```\n\n## CLI and Web Examples\n\nAbstractVoice includes example applications to demonstrate its capabilities:\n\n### Using AbstractVoice from the Command Line\n\nThe easiest way to get started is to use AbstractVoice directly from your shell:\n\n```bash\n# Start AbstractVoice in voice mode (TTS ON, STT ON)\nabstractvoice\n# \u2192 Automatically uses VITS if espeak-ng installed (best quality)\n# \u2192 Falls back to fast_pitch if espeak-ng not found\n\n# Or start with custom settings\nabstractvoice --model gemma3:latest --whisper base\n\n# Start in text-only mode (TTS enabled, listening disabled)\nabstractvoice --no-listening\n```\n\nOnce started, you can interact with the AI using voice or text. Use `/help` to see all available commands.\n\n**Note**: AbstractVoice automatically selects the best available TTS model. For best quality, install espeak-ng (see Installation section above).\n\n### Integrating AbstractVoice in Your Python Project\n\nHere's a simple example of how to integrate AbstractVoice into your own application:\n\n```python\nfrom abstractvoice import VoiceManager\nimport time\n\n# Initialize voice manager\nvoice_manager = VoiceManager(debug_mode=False)\n\n# Text to speech\nvoice_manager.speak(\"Hello, I am an AI assistant. How can I help you today?\")\n\n# Wait for speech to complete\nwhile voice_manager.is_speaking():\n    time.sleep(0.1)\n\n# Speech to text with callback\ndef on_transcription(text):\n    print(f\"User said: {text}\")\n    if text.lower() != \"stop\":\n        # Process with your text generation system\n        response = f\"You said: {text}\"\n        voice_manager.speak(response)\n\n# Start voice recognition\nvoice_manager.listen(on_transcription)\n\n# Wait for user to say \"stop\" or press Ctrl+C\ntry:\n    while voice_manager.is_listening():\n        time.sleep(0.1)\nexcept KeyboardInterrupt:\n    pass\n\n# Clean up\nvoice_manager.cleanup()\n```\n\n## Running Examples\n\nThe package includes several examples that demonstrate different ways to use AbstractVoice.\n\n### Voice Mode (Default)\n\nIf installed globally, you can launch AbstractVoice directly in voice mode:\n\n```bash\n# Start AbstractVoice in voice mode (TTS ON, STT ON)\nabstractvoice\n\n# With options\nabstractvoice --debug --whisper base --model gemma3:latest --api http://localhost:11434/api/chat\n```\n\n**Command line options:**\n- `--debug` - Enable debug mode with detailed logging\n- `--api <url>` - URL of the Ollama API (default: http://localhost:11434/api/chat)\n- `--model <name>` - Ollama model to use (default: granite3.3:2b)\n  - Examples: cogito:3b, phi4-mini:latest, qwen2.5:latest, gemma3:latest, etc.\n- `--whisper <model>` - Whisper model to use (default: tiny)\n  - Options: tiny, base, small, medium, large\n- `--no-listening` - Disable speech-to-text (listening), TTS still works\n  - **Note**: This creates a \"TTS-only\" mode where you type and the AI speaks back\n- `--system <prompt>` - Custom system prompt\n\n### \ud83c\udfaf Complete CLI Interface (v0.3.0+)\n\nAbstractVoice provides a unified command interface for all functionality:\n\n```bash\n# Voice mode (default)\nabstractvoice                      # Interactive voice mode with AI\nabstractvoice --model cogito:3b    # With custom Ollama model\nabstractvoice --language fr        # French voice mode\n\n# Examples and utilities\nabstractvoice cli                  # CLI REPL for text interaction\nabstractvoice web                  # Web API server\nabstractvoice simple               # Simple TTS/STT demonstration\nabstractvoice check-deps           # Check dependency compatibility\nabstractvoice help                 # Show available commands\n\n# Get help\nabstractvoice --help               # Complete help with all options\n```\n\n**All functionality through one command!** No more confusion between different entry points.\n\n### Command-Line REPL\n\n```bash\n# Run the CLI example (TTS ON, STT OFF)\nabstractvoice cli\n\n# With debug mode\nabstractvoice cli --debug\n\n# With specific language\nabstractvoice cli --language fr\n```\n\n#### REPL Commands\n\nAll commands must start with `/` except `stop`:\n\n**Basic Commands:**\n- `/exit`, `/q`, `/quit` - Exit REPL\n- `/clear` - Clear conversation history\n- `/help` - Show help information\n- `stop` - Stop voice mode or TTS (voice command, no `/` needed)\n\n**Voice & Audio:**\n- `/tts on|off` - Toggle text-to-speech\n- `/voice <mode>` - Voice input modes:\n  - `off` - Disable voice input\n  - `full` - Continuous listening, interrupts TTS on speech detection\n  - `wait` - Pause listening while speaking (recommended, reduces self-interruption)\n  - `stop` - Only stop on 'stop' keyword (planned)\n  - `ptt` - Push-to-talk mode (planned)\n- `/speed <number>` - Set TTS speed (0.5-2.0, default: 1.0, **pitch preserved**)\n- `/tts_model <model>` - Switch TTS model:\n  - `vits` - **Best quality** (requires espeak-ng)\n  - `fast_pitch` - Good quality (works everywhere)\n  - `glow-tts` - Alternative (similar quality to fast_pitch)\n  - `tacotron2-DDC` - Legacy (slower, lower quality)\n- `/whisper <model>` - Switch Whisper model (tiny|base|small|medium|large)\n- `/stop` - Stop voice mode or TTS playback\n- `/pause` - Pause current TTS playback (can be resumed)\n- `/resume` - Resume paused TTS playback\n\n**LLM Configuration:**\n- `/model <name>` - Change LLM model (e.g., `/model gemma3:latest`)\n- `/system <prompt>` - Set system prompt (e.g., `/system You are a helpful coding assistant`)\n- `/temperature <val>` - Set temperature (0.0-2.0, default: 0.7)\n- `/max_tokens <num>` - Set max tokens (default: 4096)\n\n**Chat Management:**\n- `/save <filename>` - Save chat history (e.g., `/save conversation`)\n- `/load <filename>` - Load chat history (e.g., `/load conversation`)\n- `/tokens` - Display token usage statistics\n\n**Sending Messages:**\n- `<message>` - Any text without `/` prefix is sent to the LLM\n\n**Note**: Commands without `/` (except `stop`) are sent to the LLM as regular messages.\n\n### Web API\n\n```bash\n# Run the web API example\nabstractvoice web\n\n# With different host and port\nabstractvoice web --host 0.0.0.0 --port 8000\n```\n\nYou can also run a simplified version that doesn't load the full models:\n\n```bash\n# Run the web API with simulation mode\nabstractvoice web --simulate\n```\n\n#### Troubleshooting Web API\n\nIf you encounter issues with the web API:\n\n1. **404 Not Found**: Make sure you're accessing the correct endpoints (e.g., `/api/test`, `/api/tts`)\n2. **Connection Issues**: Ensure no other service is using the port\n3. **Model Loading Errors**: Try running with `--simulate` flag to test without loading models\n4. **Dependencies**: Ensure all required packages are installed:\n   ```bash\n   pip install flask soundfile numpy requests\n   ```\n5. **Test with a simple Flask script**:\n   ```python\n   from flask import Flask\n   app = Flask(__name__)\n   @app.route('/')\n   def home():\n       return \"Flask works!\"\n   app.run(host='127.0.0.1', port=5000)\n   ```\n\n### Simple Demo\n\n```bash\n# Run the simple example\nabstractvoice simple\n```\n\n## Documentation\n\n### \ud83d\udcda Documentation Overview\n\n- **[README.md](README.md)** - This file: User guide, API reference, and examples\n- **[CONTRIBUTING.md](CONTRIBUTING.md)** - Contribution guidelines and development setup\n- **[CHANGELOG.md](CHANGELOG.md)** - Version history and release notes\n- **[docs/](docs/)** - Technical documentation for developers\n\n### \ud83c\udfaf Quick Navigation\n\n- **Getting Started**: [Installation](#installation) and [Quick Start](#quick-start)\n- **Pause/Resume Control**: [TTS Control](#quick-reference-tts-control) section\n- **Integration Examples**: [Integration Guide](#integration-guide-for-third-party-applications)\n- **Technical Details**: [docs/architecture.md](docs/architecture.md) - How immediate pause/resume works\n- **Development**: [CONTRIBUTING.md](CONTRIBUTING.md) - Setup and guidelines\n\n## Component Overview\n\n### VoiceManager\n\nThe main class that coordinates TTS and STT functionality:\n\n```python\nfrom abstractvoice import VoiceManager\n\n# Simple initialization (automatic model selection)\n# - Uses VITS if espeak-ng is installed (best quality)\n# - Falls back to fast_pitch if espeak-ng is missing\nmanager = VoiceManager()\n\n# Or specify a model explicitly\nmanager = VoiceManager(\n    tts_model=\"tts_models/en/ljspeech/vits\",  # Best quality (needs espeak-ng)\n    # tts_model=\"tts_models/en/ljspeech/fast_pitch\",  # Good (works everywhere)\n    whisper_model=\"tiny\",\n    debug_mode=False\n)\n\n# === TTS (Text-to-Speech) ===\n\n# Basic speech synthesis\nmanager.speak(\"Hello world\")\n\n# With speed control (pitch preserved via time-stretching!)\nmanager.speak(\"This is 20% faster\", speed=1.2)\nmanager.speak(\"This is half speed\", speed=0.5)\n\n# Check if speaking\nif manager.is_speaking():\n    manager.stop_speaking()\n\n# Pause and resume TTS (IMMEDIATE response)\nmanager.speak(\"This is a long sentence that can be paused and resumed immediately\")\ntime.sleep(1)\nsuccess = manager.pause_speaking()  # Pause IMMEDIATELY (~20ms response)\nif success:\n    print(\"TTS paused immediately\")\n\ntime.sleep(2)\nsuccess = manager.resume_speaking()  # Resume IMMEDIATELY from exact position\nif success:\n    print(\"TTS resumed from exact position\")\n\n# Check pause status\nif manager.is_paused():\n    manager.resume_speaking()\n\n# Change TTS speed globally\nmanager.set_speed(1.3)  # All subsequent speech will be 30% faster\n\n# Change TTS model dynamically\nmanager.set_tts_model(\"tts_models/en/ljspeech/glow-tts\")\n\n# Available TTS models (quality ranking):\n# - \"tts_models/en/ljspeech/vits\" (BEST quality, requires espeak-ng)\n# - \"tts_models/en/ljspeech/fast_pitch\" (fallback, works everywhere)\n# - \"tts_models/en/ljspeech/glow-tts\" (alternative fallback)\n# - \"tts_models/en/ljspeech/tacotron2-DDC\" (legacy)\n\n# === Audio Lifecycle Callbacks (v0.5.1+) ===\n\n# NEW: Precise audio timing callbacks for visual status indicators\ndef on_synthesis_start():\n    print(\"\ud83d\udd34 Synthesis started - show thinking animation\")\n\ndef on_audio_start():\n    print(\"\ud83d\udd35 Audio started - show speaking animation\")\n\ndef on_audio_pause():\n    print(\"\u23f8\ufe0f Audio paused - show paused animation\")\n\ndef on_audio_resume():\n    print(\"\u25b6\ufe0f Audio resumed - continue speaking animation\")\n\ndef on_audio_end():\n    print(\"\ud83d\udfe2 Audio ended - show ready animation\")\n\ndef on_synthesis_end():\n    print(\"\u2705 Synthesis complete\")\n\n# Wire up callbacks\nmanager.tts_engine.on_playback_start = on_synthesis_start    # Existing (synthesis phase)\nmanager.tts_engine.on_playback_end = on_synthesis_end        # Existing (synthesis phase)\nmanager.on_audio_start = on_audio_start                      # NEW (actual audio playback)\nmanager.on_audio_end = on_audio_end                          # NEW (actual audio playback)\nmanager.on_audio_pause = on_audio_pause                      # NEW (pause events)\nmanager.on_audio_resume = on_audio_resume                    # NEW (resume events)\n\n# Perfect for system tray icons, UI animations, or coordinating multiple audio streams\n\n# === STT (Speech-to-Text) ===\n\ndef on_transcription(text):\n    print(f\"You said: {text}\")\n\nmanager.listen(on_transcription, on_stop=None)\nmanager.stop_listening()\nmanager.is_listening()\n\n# Change Whisper model\nmanager.set_whisper(\"base\")  # tiny, base, small, medium, large\n\n# === Voice Modes ===\n\n# Control how voice recognition behaves during TTS\nmanager.set_voice_mode(\"wait\")  # Pause listening while speaking (recommended)\nmanager.set_voice_mode(\"full\")  # Keep listening, interrupt on speech\nmanager.set_voice_mode(\"off\")   # Disable voice recognition\n\n# === VAD (Voice Activity Detection) ===\n\nmanager.change_vad_aggressiveness(2)  # 0-3, higher = more aggressive\n\n# === Cleanup ===\n\nmanager.cleanup()\n```\n\n### TTSEngine\n\nHandles text-to-speech synthesis:\n\n```python\nfrom abstractvoice.tts import TTSEngine\n\n# Initialize with fast_pitch model (default, no external dependencies)\ntts = TTSEngine(\n    model_name=\"tts_models/en/ljspeech/fast_pitch\",\n    debug_mode=False,\n    streaming=True  # Enable progressive playback for long text\n)\n\n# Speak with speed control (pitch preserved via time-stretching)\ntts.speak(text, speed=1.2, callback=None)  # 20% faster, same pitch\n\n# Immediate pause and resume control\nsuccess = tts.pause()      # Pause IMMEDIATELY (~20ms response)\nsuccess = tts.resume()     # Resume IMMEDIATELY from exact position\nis_paused = tts.is_paused()  # Check if currently paused\n\ntts.stop()       # Stop completely (cannot resume)\ntts.is_active()  # Check if active\n```\n\n**Important Note on Speed Parameter:**\n- The speed parameter now uses proper time-stretching (via librosa)\n- Changing speed does NOT affect pitch anymore\n- Range: 0.5 (half speed) to 2.0 (double speed)\n- Example: `speed=1.3` makes speech 30% faster while preserving natural pitch\n\n### VoiceRecognizer\n\nManages speech recognition with VAD:\n\n```python\nfrom abstractvoice.recognition import VoiceRecognizer\n\ndef on_transcription(text):\n    print(f\"Transcribed: {text}\")\n\ndef on_stop():\n    print(\"Stop command detected\")\n\nrecognizer = VoiceRecognizer(transcription_callback=on_transcription,\n                           stop_callback=on_stop, \n                           whisper_model=\"tiny\",\n                           debug_mode=False)\nrecognizer.start(tts_interrupt_callback=None)\nrecognizer.stop()\nrecognizer.change_whisper_model(\"base\")\nrecognizer.change_vad_aggressiveness(2)\n```\n\n## Quick Reference: TTS Control\n\n### Pause and Resume TTS\n\n**Professional-grade pause/resume control** with immediate response and no terminal interference.\n\n**In CLI/REPL:**\n```bash\n/pause    # Pause current TTS playback IMMEDIATELY\n/resume   # Resume paused TTS playback IMMEDIATELY  \n/stop     # Stop TTS completely (cannot resume)\n```\n\n**Programmatic Usage:**\n\n#### Basic Pause/Resume\n```python\nfrom abstractvoice import VoiceManager\nimport time\n\nvm = VoiceManager()\n\n# Start speech\nvm.speak(\"This is a long sentence that demonstrates immediate pause and resume functionality.\")\n\n# Pause immediately (takes effect within ~20ms)\ntime.sleep(1)\nresult = vm.pause_speaking()\nif result:\n    print(\"\u2713 TTS paused immediately\")\n\n# Resume immediately (takes effect within ~20ms)  \ntime.sleep(2)\nresult = vm.resume_speaking()\nif result:\n    print(\"\u2713 TTS resumed immediately\")\n```\n\n#### Advanced Control with Status Checking\n```python\nfrom abstractvoice import VoiceManager\nimport time\n\nvm = VoiceManager()\n\n# Start long speech\nvm.speak(\"This is a very long text that will be used to demonstrate the advanced pause and resume control features.\")\n\n# Wait and pause\ntime.sleep(1.5)\nif vm.is_speaking():\n    vm.pause_speaking()\n    print(\"Speech paused\")\n\n# Check pause status\nif vm.is_paused():\n    print(\"Confirmed: TTS is paused\")\n    time.sleep(2)\n    \n    # Resume from exact position\n    vm.resume_speaking()\n    print(\"Speech resumed from exact position\")\n\n# Wait for completion\nwhile vm.is_speaking():\n    time.sleep(0.1)\nprint(\"Speech completed\")\n```\n\n#### Interactive Control Example\n```python\nfrom abstractvoice import VoiceManager\nimport threading\nimport time\n\nvm = VoiceManager()\n\ndef control_speech():\n    \"\"\"Interactive control in separate thread\"\"\"\n    time.sleep(2)\n    print(\"Pausing speech...\")\n    vm.pause_speaking()\n    \n    time.sleep(3)\n    print(\"Resuming speech...\")\n    vm.resume_speaking()\n\n# Start long speech\nlong_text = \"\"\"\nThis is a comprehensive demonstration of AbstractVoice's immediate pause and resume functionality.\nThe system uses non-blocking audio streaming with callback-based control.\nYou can pause and resume at any time with immediate response.\nThe audio continues from the exact position where it was paused.\n\"\"\"\n\n# Start control thread\ncontrol_thread = threading.Thread(target=control_speech, daemon=True)\ncontrol_thread.start()\n\n# Start speech (non-blocking)\nvm.speak(long_text)\n\n# Wait for completion\nwhile vm.is_speaking() or vm.is_paused():\n    time.sleep(0.1)\n\nvm.cleanup()\n```\n\n#### Error Handling\n```python\nfrom abstractvoice import VoiceManager\n\nvm = VoiceManager()\n\n# Start speech\nvm.speak(\"Testing pause/resume with error handling\")\n\n# Safe pause with error handling\ntry:\n    if vm.is_speaking():\n        success = vm.pause_speaking()\n        if success:\n            print(\"Successfully paused\")\n        else:\n            print(\"No active speech to pause\")\n    \n    # Safe resume with error handling\n    if vm.is_paused():\n        success = vm.resume_speaking()\n        if success:\n            print(\"Successfully resumed\")\n        else:\n            print(\"Was not paused or playback completed\")\n            \nexcept Exception as e:\n    print(f\"Error controlling TTS: {e}\")\n```\n\n**Key Features:**\n- **\u26a1 Immediate Response**: Pause/resume takes effect within ~20ms\n- **\ud83c\udfaf Exact Position**: Resumes from precise audio position (no repetition)\n- **\ud83d\udda5\ufe0f No Terminal Interference**: Uses OutputStream callbacks, never blocks terminal\n- **\ud83d\udd12 Thread-Safe**: Safe to call from any thread or callback\n- **\ud83d\udcca Reliable Status**: `is_paused()` and `is_speaking()` always accurate\n- **\ud83d\udd04 Seamless Streaming**: Works with ongoing text synthesis\n\n**How it works:**\n- Uses `sounddevice.OutputStream` with callback function\n- Pause immediately outputs silence in next audio callback (~20ms)\n- Resume immediately continues audio output from exact position\n- No blocking `sd.stop()` calls that interfere with terminal I/O\n- Thread-safe with proper locking mechanisms\n\n## Quick Reference: Speed & Model Control\n\n### Changing TTS Speed\n\n**In CLI/REPL:**\n```bash\n/speed 1.2    # 20% faster, pitch preserved\n/speed 0.8    # 20% slower, pitch preserved\n```\n\n**Programmatically:**\n```python\nfrom abstractvoice import VoiceManager\n\nvm = VoiceManager()\n\n# Method 1: Set global speed\nvm.set_speed(1.3)  # All speech will be 30% faster\nvm.speak(\"This will be 30% faster\")\n\n# Method 2: Per-speech speed\nvm.speak(\"This is 50% faster\", speed=1.5)\nvm.speak(\"This is normal speed\", speed=1.0)\nvm.speak(\"This is half speed\", speed=0.5)\n\n# Get current speed\ncurrent = vm.get_speed()  # Returns 1.3 from set_speed() above\n```\n\n### Changing TTS Model\n\n**In CLI/REPL:**\n```bash\n/tts_model vits           # Best quality (needs espeak-ng)\n/tts_model fast_pitch     # Good quality (works everywhere)\n/tts_model glow-tts       # Alternative model\n/tts_model tacotron2-DDC  # Legacy model\n```\n\n**Programmatically:**\n```python\nfrom abstractvoice import VoiceManager\n\n# Method 1: Set at initialization\nvm = VoiceManager(tts_model=\"tts_models/en/ljspeech/glow-tts\")\n\n# Method 2: Change dynamically at runtime\nvm.set_tts_model(\"tts_models/en/ljspeech/fast_pitch\")\nvm.speak(\"Using fast_pitch now\")\n\nvm.set_tts_model(\"tts_models/en/ljspeech/glow-tts\")\nvm.speak(\"Using glow-tts now\")\n\n# Available models (quality ranking):\nmodels = [\n    \"tts_models/en/ljspeech/vits\",          # BEST (requires espeak-ng)\n    \"tts_models/en/ljspeech/fast_pitch\",    # Good (works everywhere)\n    \"tts_models/en/ljspeech/glow-tts\",      # Alternative fallback\n    \"tts_models/en/ljspeech/tacotron2-DDC\"  # Legacy\n]\n```\n\n### Complete Example: Experiment with Settings\n\n```python\nfrom abstractvoice import VoiceManager\nimport time\n\nvm = VoiceManager()\n\n# Test different models (vits requires espeak-ng)\nfor model in [\"vits\", \"fast_pitch\", \"glow-tts\", \"tacotron2-DDC\"]:\n    full_name = f\"tts_models/en/ljspeech/{model}\"\n    vm.set_tts_model(full_name)\n    \n    # Test different speeds with each model\n    for speed in [0.8, 1.0, 1.2]:\n        vm.speak(f\"Testing {model} at {speed}x speed\", speed=speed)\n        while vm.is_speaking():\n            time.sleep(0.1)\n```\n\n## Integration Guide for Third-Party Applications\n\nAbstractVoice is designed as a lightweight, modular library for easy integration into your applications. This guide covers everything you need to know.\n\n### Quick Start: Basic Integration\n\n```python\nfrom abstractvoice import VoiceManager\n\n# 1. Initialize (automatic best-quality model selection)\nvm = VoiceManager()\n\n# 2. Text-to-Speech\nvm.speak(\"Hello from my app!\")\n\n# 3. Speech-to-Text with callback\ndef handle_speech(text):\n    print(f\"User said: {text}\")\n    # Process text in your app...\n\nvm.listen(on_transcription=handle_speech)\n```\n\n### Model Selection: Automatic vs Explicit\n\n**Automatic (Recommended):**\n```python\n# Automatically uses best available model\nvm = VoiceManager()\n# \u2192 Uses VITS if espeak-ng installed (best quality)\n# \u2192 Falls back to fast_pitch if espeak-ng missing\n```\n\n**Explicit:**\n```python\n# Force a specific model (bypasses auto-detection)\nvm = VoiceManager(tts_model=\"tts_models/en/ljspeech/fast_pitch\")\n\n# Or change dynamically at runtime\nvm.set_tts_model(\"tts_models/en/ljspeech/vits\")\n```\n\n### Voice Quality Levels\n\n| Model | Quality | Speed | Requirements |\n|-------|---------|-------|--------------|\n| **vits** | \u2b50\u2b50\u2b50\u2b50\u2b50 Excellent | Fast | espeak-ng |\n| **fast_pitch** | \u2b50\u2b50\u2b50 Good | Fast | None |\n| **glow-tts** | \u2b50\u2b50\u2b50 Good | Fast | None |\n| **tacotron2-DDC** | \u2b50\u2b50 Fair | Slow | None |\n\n### Customization Options\n\n```python\nfrom abstractvoice import VoiceManager\n\nvm = VoiceManager(\n    # TTS Configuration\n    tts_model=\"tts_models/en/ljspeech/vits\",  # Model to use\n    \n    # STT Configuration  \n    whisper_model=\"base\",  # tiny, base, small, medium, large\n    \n    # Debugging\n    debug_mode=True  # Enable detailed logging\n)\n\n# Runtime customization\nvm.set_speed(1.2)                    # Adjust TTS speed (0.5-2.0)\nvm.set_tts_model(\"...\")              # Change TTS model\nvm.set_whisper(\"small\")              # Change STT model\nvm.set_voice_mode(\"wait\")            # wait, full, or off\nvm.change_vad_aggressiveness(2)      # VAD sensitivity (0-3)\n```\n\n### Integration Patterns\n\n#### Pattern 1: TTS Only (No Voice Input)\n```python\nvm = VoiceManager()\n\n# Speak with different speeds\nvm.speak(\"Normal speed\")\nvm.speak(\"Fast speech\", speed=1.5)\nvm.speak(\"Slow speech\", speed=0.7)\n\n# Control playback with immediate response\nif vm.is_speaking():\n    success = vm.pause_speaking()  # Pause IMMEDIATELY (~20ms)\n    if success:\n        print(\"Speech paused immediately\")\n    # or\n    vm.stop_speaking()   # Stop completely (cannot resume)\n\n# Resume from exact position\nif vm.is_paused():\n    success = vm.resume_speaking()  # Resume IMMEDIATELY (~20ms)\n    if success:\n        print(\"Speech resumed from exact position\")\n```\n\n#### Pattern 2: STT Only (No Text-to-Speech)\n```python\nvm = VoiceManager()\n\ndef process_speech(text):\n    # Send to your backend, save to DB, etc.\n    your_app.process(text)\n\nvm.listen(on_transcription=process_speech)\n```\n\n#### Pattern 3: Full Voice Interaction\n```python\nvm = VoiceManager()\n\ndef on_speech(text):\n    response = your_llm.generate(text)\n    vm.speak(response)\n\ndef on_stop():\n    print(\"User said stop\")\n    vm.cleanup()\n\nvm.listen(\n    on_transcription=on_speech,\n    on_stop=on_stop\n)\n```\n\n### Error Handling\n\n```python\ntry:\n    vm = VoiceManager()\n    vm.speak(\"Test\")\nexcept Exception as e:\n    print(f\"TTS Error: {e}\")\n    # Handle missing dependencies, etc.\n\n# Check model availability\ntry:\n    vm.set_tts_model(\"tts_models/en/ljspeech/vits\")\n    print(\"VITS available\")\nexcept:\n    print(\"VITS not available, using fallback\")\n    vm.set_tts_model(\"tts_models/en/ljspeech/fast_pitch\")\n```\n\n### Threading and Async Support\n\nAbstractVoice handles threading internally for TTS and STT:\n\n```python\n# TTS is non-blocking\nvm.speak(\"Long text...\")  # Returns immediately\n# Your code continues while speech plays\n\n# Check status\nif vm.is_speaking():\n    print(\"Still speaking...\")\n\n# Wait for completion\nwhile vm.is_speaking():\n    time.sleep(0.1)\n\n# STT runs in background thread\nvm.listen(on_transcription=callback)  # Returns immediately\n# Callbacks fire on background thread\n```\n\n### Cleanup and Resource Management\n\n```python\n# Always cleanup when done\nvm.cleanup()\n\n# Or use context manager pattern\nfrom contextlib import contextmanager\n\n@contextmanager\ndef voice_manager():\n    vm = VoiceManager()\n    try:\n        yield vm\n    finally:\n        vm.cleanup()\n\n# Usage\nwith voice_manager() as vm:\n    vm.speak(\"Hello\")\n```\n\n### Configuration for Different Environments\n\n**Development (fast iteration):**\n```python\nvm = VoiceManager(\n    tts_model=\"tts_models/en/ljspeech/fast_pitch\",  # Fast\n    whisper_model=\"tiny\",  # Fast STT\n    debug_mode=True\n)\n```\n\n**Production (best quality):**\n```python\nvm = VoiceManager(\n    tts_model=\"tts_models/en/ljspeech/vits\",  # Best quality\n    whisper_model=\"base\",  # Good accuracy\n    debug_mode=False\n)\n```\n\n**Embedded/Resource-Constrained:**\n```python\nvm = VoiceManager(\n    tts_model=\"tts_models/en/ljspeech/fast_pitch\",  # Lower memory\n    whisper_model=\"tiny\",  # Smallest model\n    debug_mode=False\n)\n```\n\n## Integration with Text Generation Systems\n\nAbstractVoice is designed to be a lightweight, modular library that you can easily integrate into your own applications. Here are complete examples for common use cases:\n\n### Example 1: Voice-Enabled Chatbot with Ollama\n\n```python\nfrom abstractvoice import VoiceManager\nimport requests\nimport time\n\n# Initialize voice manager\nvoice_manager = VoiceManager()\n\n# Function to call Ollama API\ndef generate_text(prompt):\n    response = requests.post(\"http://localhost:11434/api/chat\", json={\n        \"model\": \"granite3.3:2b\",\n        \"messages\": [{\"role\": \"user\", \"content\": prompt}],\n        \"stream\": False\n    })\n    return response.json()[\"message\"][\"content\"]\n\n# Callback for speech recognition\ndef on_transcription(text):\n    if text.lower() == \"stop\":\n        return\n        \n    print(f\"User: {text}\")\n    \n    # Generate response\n    response = generate_text(text)\n    print(f\"AI: {response}\")\n    \n    # Speak response\n    voice_manager.speak(response)\n\n# Start listening\nvoice_manager.listen(on_transcription)\n\n# Keep running until interrupted\ntry:\n    while voice_manager.is_listening():\n        time.sleep(0.1)\nexcept KeyboardInterrupt:\n    voice_manager.cleanup()\n```\n\n### Example 2: Voice-Enabled Assistant with OpenAI\n\n```python\nfrom abstractvoice import VoiceManager\nimport openai\nimport time\n\n# Initialize\nvoice_manager = VoiceManager()\nopenai.api_key = \"your-api-key\"\n\ndef on_transcription(text):\n    print(f\"User: {text}\")\n    \n    # Get response from OpenAI\n    response = openai.ChatCompletion.create(\n        model=\"gpt-4\",\n        messages=[{\"role\": \"user\", \"content\": text}]\n    )\n    \n    ai_response = response.choices[0].message.content\n    print(f\"AI: {ai_response}\")\n    \n    # Speak the response\n    voice_manager.speak(ai_response)\n\n# Start voice interaction\nvoice_manager.listen(on_transcription)\n\n# Keep running\ntry:\n    while voice_manager.is_listening():\n        time.sleep(0.1)\nexcept KeyboardInterrupt:\n    voice_manager.cleanup()\n```\n\n### Example 3: Text-to-Speech Only (No Voice Input)\n\n```python\nfrom abstractvoice import VoiceManager\nimport time\n\n# Initialize voice manager\nvoice_manager = VoiceManager()\n\n# Simple text-to-speech\nvoice_manager.speak(\"Hello! This is a test of the text to speech system.\")\n\n# Wait for speech to finish\nwhile voice_manager.is_speaking():\n    time.sleep(0.1)\n\n# Adjust speed\nvoice_manager.set_speed(1.5)\nvoice_manager.speak(\"This speech is 50% faster.\")\n\nwhile voice_manager.is_speaking():\n    time.sleep(0.1)\n\n# Cleanup\nvoice_manager.cleanup()\n```\n\n### Example 4: Speech-to-Text Only (No TTS)\n\n```python\nfrom abstractvoice import VoiceManager\nimport time\n\nvoice_manager = VoiceManager()\n\ndef on_transcription(text):\n    print(f\"Transcribed: {text}\")\n    # Do something with the transcribed text\n    # e.g., save to file, send to API, etc.\n\n# Start listening\nvoice_manager.listen(on_transcription)\n\n# Keep running\ntry:\n    while voice_manager.is_listening():\n        time.sleep(0.1)\nexcept KeyboardInterrupt:\n    voice_manager.cleanup()\n```\n\n### Key Integration Points\n\n**VoiceManager Configuration:**\n```python\n# Full configuration example\nvoice_manager = VoiceManager(\n    tts_model=\"tts_models/en/ljspeech/fast_pitch\",  # Default (no external deps)\n    whisper_model=\"base\",  # Whisper STT model (tiny, base, small, medium, large)\n    debug_mode=True  # Enable debug logging\n)\n\n# Alternative TTS models (all pure Python, cross-platform):\n# - \"tts_models/en/ljspeech/fast_pitch\" - Default (fast, good quality)\n# - \"tts_models/en/ljspeech/glow-tts\" - Alternative (similar quality)\n# - \"tts_models/en/ljspeech/tacotron2-DDC\" - Legacy (older, slower)\n\n# Set voice mode (full, wait, off)\nvoice_manager.set_voice_mode(\"wait\")  # Recommended to avoid self-interruption\n\n# Adjust settings (speed now preserves pitch!)\nvoice_manager.set_speed(1.2)  # TTS speed (default is 1.0, range 0.5-2.0)\nvoice_manager.change_vad_aggressiveness(2)  # VAD sensitivity (0-3)\n```\n\n**Callback Functions:**\n```python\ndef on_transcription(text):\n    \"\"\"Called when speech is transcribed\"\"\"\n    print(f\"User said: {text}\")\n    # Your custom logic here\n\ndef on_stop():\n    \"\"\"Called when user says 'stop'\"\"\"\n    print(\"Stopping voice mode\")\n    # Your cleanup logic here\n\nvoice_manager.listen(\n    on_transcription=on_transcription,\n    on_stop=on_stop\n)\n```\n\n## \ud83d\udcbb CLI Commands (v0.4.0+)\n\nAbstractVoice provides powerful CLI commands for model management and voice interactions.\n\n### Model Management\n\n```bash\n# Download essential model for offline use (recommended first step)\nabstractvoice download-models\n\n# Download models for specific languages\nabstractvoice download-models --language fr    # French\nabstractvoice download-models --language de    # German\nabstractvoice download-models --language it    # Italian\nabstractvoice download-models --language es    # Spanish\n\n# Download specific model by name\nabstractvoice download-models --model tts_models/fr/css10/vits\n\n# Download all available models (large download!)\nabstractvoice download-models --all\n\n# Check current cache status\nabstractvoice download-models --status\n\n# Clear model cache\nabstractvoice download-models --clear\n```\n\n### Voice Interface\n\n```bash\n# Start voice interface (default)\nabstractvoice\n\n# Start CLI REPL with specific language\nabstractvoice cli --language fr\n\n# Start with specific model\nabstractvoice --model granite3.3:2b --language de\n\n# Run simple example\nabstractvoice simple\n\n# Check dependencies\nabstractvoice check-deps\n```\n\n### CLI Voice Commands\n\nIn the CLI REPL, use these commands (v0.5.0+):\n\n```bash\n# List all available voices with download status\n/setvoice\n\n# Automatically download and set specific voice (NEW in v0.5.0!)\n/setvoice fr.css10_vits      # Downloads French CSS10 if needed\n/setvoice de.thorsten_vits   # Downloads German Thorsten if needed\n/setvoice it.mai_male_vits   # Downloads Italian Male if needed\n/setvoice en.jenny           # Downloads Jenny voice if needed\n\n# Change language (automatically downloads models if needed - NEW!)\n/language fr                 # Switches to French, downloads if needed\n/language de                 # Switches to German, downloads if needed\n/language es                 # Switches to Spanish, downloads if needed\n\n# Voice controls\n/pause                       # Pause current speech\n/resume                      # Resume speech\n/stop                        # Stop speech\n\n# Exit\n/exit\n```\n\n**New in v0.5.0:** Language and voice commands now automatically download missing models with progress indicators. No more silent failures!\n\n## Perspectives\n\nThis is a test project that I designed with examples to work with Ollama, but I will adapt the examples and abstractvoice to work with any LLM provider (anthropic, openai, etc).\n\nNext iteration will leverage directly [AbstractCore](https://www.abstractcore.ai) to handle everything related to LLM, their providers, models and configurations.\n\n## License and Acknowledgments\n\nAbstractVoice is licensed under the [MIT License](LICENSE).\n\nThis project depends on several open-source libraries and models, each with their own licenses. Please see [ACKNOWLEDGMENTS.md](ACKNOWLEDGMENTS.md) for a detailed list of dependencies and their respective licenses.\n\nSome dependencies, particularly certain TTS models, may have non-commercial use restrictions. If you plan to use AbstractVoice in a commercial application, please ensure you are using models that permit commercial use or obtain appropriate licenses. \n",
    "bugtrack_url": null,
    "license": null,
    "summary": "A modular Python library for voice interactions with AI systems",
    "version": "0.5.1",
    "project_urls": {
        "Documentation": "https://github.com/lpalbou/abstractvoice#readme",
        "Repository": "https://github.com/lpalbou/abstractvoice"
    },
    "split_keywords": [],
    "urls": [
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "c9427470f2ec8d8fe1082d1d0d2509e1dfd69dedacde87e06dd4837c70dff1a9",
                "md5": "b8634af2421338d1974b40cc4748729b",
                "sha256": "b1aa3b792dc37f3b1c3ee105da61240dca3998752888b458edf9c64ca9d97606"
            },
            "downloads": -1,
            "filename": "abstractvoice-0.5.1-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "b8634af2421338d1974b40cc4748729b",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.8",
            "size": 69161,
            "upload_time": "2025-10-21T21:41:19",
            "upload_time_iso_8601": "2025-10-21T21:41:19.005399Z",
            "url": "https://files.pythonhosted.org/packages/c9/42/7470f2ec8d8fe1082d1d0d2509e1dfd69dedacde87e06dd4837c70dff1a9/abstractvoice-0.5.1-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "6be1b97321ef76f57ece215c47905ed0f52092a79600af29b836cc0f6bb2b970",
                "md5": "f9b6c1e804c59b96b8edc5c93dfab8f4",
                "sha256": "f133e93236f183ca80276789efa413cf03abd884de50b03bcaaeae72e47f527f"
            },
            "downloads": -1,
            "filename": "abstractvoice-0.5.1.tar.gz",
            "has_sig": false,
            "md5_digest": "f9b6c1e804c59b96b8edc5c93dfab8f4",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.8",
            "size": 84804,
            "upload_time": "2025-10-21T21:41:21",
            "upload_time_iso_8601": "2025-10-21T21:41:21.055094Z",
            "url": "https://files.pythonhosted.org/packages/6b/e1/b97321ef76f57ece215c47905ed0f52092a79600af29b836cc0f6bb2b970/abstractvoice-0.5.1.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2025-10-21 21:41:21",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "lpalbou",
    "github_project": "abstractvoice#readme",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": false,
    "requirements": [],
    "lcname": "abstractvoice"
}
        
Elapsed time: 2.56789s