# Vocals SDK Python
[![PyPI version](https://badge.fury.io/py/vocals.svg)](https://badge.fury.io/py/vocals)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
[![GitHub issues](https://img.shields.io/github/issues/hairetsucodes/vocals-sdk-python.svg)](https://github.com/hairetsucodes/vocals-sdk-python/issues)
A Python SDK for voice processing and real-time audio communication with AI assistants. Stream microphone input or audio files to receive live transcription, AI responses, and text-to-speech audio.
**Features both class-based and functional interfaces** for maximum flexibility and ease of use.
## Features
- 🎤 **Real-time microphone streaming** with voice activity detection
- 📁 **Audio file playback** support (WAV format)
- ✨ **Live transcription** with partial and final results
- 🤖 **Streaming AI responses** with real-time text display
- 🔊 **Text-to-speech playback** with automatic audio queueing
- 📊 **Conversation tracking** and session statistics
- 🚀 **Easy setup** with minimal configuration required
- 🔄 **Auto-reconnection** and robust error handling
- 🎛️ **Class-based API** with modern Python patterns
- 🔀 **Context manager support** for automatic cleanup
## Table of Contents
- [Features](#features)
- [Installation](#installation)
- [Quick Start](#quick-start)
- [SDK Modes](#sdk-modes)
- [Advanced Usage](#advanced-usage)
- [Configuration](#configuration)
- [Complete API Reference](#complete-api-reference)
- [Testing Your Setup](#testing-your-setup)
- [CLI Tools](#cli-tools)
- [Error Handling](#error-handling)
- [Troubleshooting](#troubleshooting)
- [Examples](#examples)
- [Contributing](#contributing)
- [Support](#support)
- [License](#license)
## Installation
```bash
pip install vocals
```
### Quick Setup
After installation, use the built-in setup wizard to configure your environment:
```bash
vocals setup
```
Or test your installation:
```bash
vocals test
```
Run a quick demo:
```bash
vocals demo
```
### 🌐 Web UI Demo
**NEW!** Launch an interactive web interface to try the voice assistant:
```bash
vocals demo --ui
```
This will:
- ✅ **Automatically install Gradio** (if not already installed)
- 🚀 **Launch a web interface** in your browser
- 🎤 **Enable real-time voice interaction** with visual feedback
- 📱 **Provide an easy-to-use interface** with buttons and live updates
- 🔊 **Show live transcription and AI responses** in the browser
**Perfect for:**
- 🎯 **Quick demonstrations** and testing
- 👥 **Showing to others** without the command line
- 🖥️ **Visual feedback** and status indicators
- 📊 **Real-time conversation tracking**
The web UI provides the same functionality as the command line demo but with an intuitive graphical interface that's perfect for demonstrations and interactive testing.
### System Requirements
- Python 3.8 or higher
- Working microphone (for microphone streaming)
- Audio output device (for TTS playback)
### Additional Dependencies
The SDK automatically installs all required Python dependencies including `pyaudio`, `sounddevice`, `numpy`, `websockets`, and others.
On some Linux systems, you may need to install system-level audio libraries:
**Ubuntu/Debian:**
```bash
sudo apt-get install portaudio19-dev
```
**Other Linux distributions:**
```bash
# Install portaudio development headers using your package manager
# For example, on CentOS/RHEL: sudo yum install portaudio-devel
```
## Quick Start
### 1. Get Your API Key
Set up your Vocals API key as an environment variable:
```bash
export VOCALS_DEV_API_KEY="your_api_key_here"
```
Or create a `.env` file in your project:
```
VOCALS_DEV_API_KEY=your_api_key_here
```
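If you rely on a `.env` file, make sure it is loaded before the client is created. A minimal sketch, assuming the optional `python-dotenv` package (`pip install python-dotenv`), which the SDK does not install itself:

```python
# python-dotenv is an assumed extra dependency, not part of the SDK
from dotenv import load_dotenv

load_dotenv()  # reads .env from the working directory into os.environ

# VOCALS_DEV_API_KEY is now visible to the SDK through the environment
```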
### 2. Basic Usage
The Vocals SDK provides a modern **class-based API** as the primary interface.
#### Microphone Streaming (Minimal Example)
```python
import asyncio
from vocals import VocalsClient

async def main():
    # Create client instance
    client = VocalsClient()

    # Stream microphone for 10 seconds
    await client.stream_microphone(duration=10.0)

    # Clean up
    await client.disconnect()
    client.cleanup()

if __name__ == "__main__":
    asyncio.run(main())
```
#### Audio File Playback (Minimal Example)
```python
import asyncio
from vocals import VocalsClient

async def main():
    # Create client instance
    client = VocalsClient()

    # Stream audio file
    await client.stream_audio_file("path/to/your/audio.wav")

    # Clean up
    await client.disconnect()
    client.cleanup()

if __name__ == "__main__":
    asyncio.run(main())
```
#### Context Manager Usage (Recommended)
```python
import asyncio
from vocals import VocalsClient

async def main():
    # Use context manager for automatic cleanup
    async with VocalsClient() as client:
        await client.stream_microphone(duration=10.0)

if __name__ == "__main__":
    asyncio.run(main())
```
## SDK Modes
The Vocals SDK supports two usage patterns:
### Default Experience (No Modes)
When you create the client without specifying modes, you get a full auto-contained experience:
```python
# Full experience with automatic handlers, playback, and beautiful console output
client = VocalsClient()
```
**Features:**
- ✅ Automatic transcription display with partial updates
- ✅ Streaming AI response display in real-time
- ✅ Automatic TTS audio playback
- ✅ Speech interruption handling
- ✅ Beautiful console output with emojis
- ✅ Perfect for getting started quickly
### Controlled Experience (With Modes)
When you specify modes, the client becomes passive and you control everything:
```python
# Controlled experience - you handle all logic
client = VocalsClient(modes=['transcription', 'voice_assistant'])
```
**Available Modes:**
- `'transcription'`: Enables transcription-related internal processing
- `'voice_assistant'`: Enables AI response handling and speech interruption
**Features:**
- ✅ No automatic handlers attached
- ✅ No automatic playback
- ✅ You attach your own message handlers
- ✅ You control when to play audio
- ✅ Perfect for custom applications
### Example: Controlled Experience
```python
import asyncio
from vocals import VocalsClient

async def main():
    # Create client with controlled experience
    client = VocalsClient(modes=['transcription', 'voice_assistant'])

    # Custom message handler
    def handle_messages(message):
        if message.type == "transcription" and message.data:
            text = message.data.get("text", "")
            is_partial = message.data.get("is_partial", False)
            if not is_partial:
                print(f"You said: {text}")

        elif message.type == "tts_audio" and message.data:
            text = message.data.get("text", "")
            print(f"AI speaking: {text}")
            # Manually start playback
            asyncio.create_task(client.play_audio())

    # Register your handler
    client.on_message(handle_messages)

    # Stream microphone with context manager
    async with client:
        await client.stream_microphone(
            duration=30.0,
            auto_playback=False  # We control playback
        )

if __name__ == "__main__":
    asyncio.run(main())
```
## Advanced Usage
### Enhanced Microphone Streaming
```python
import asyncio
import logging
from vocals import (
    VocalsClient,
    create_enhanced_message_handler,
    create_default_connection_handler,
    create_default_error_handler,
)

async def main():
    # Configure logging for cleaner output
    logging.getLogger("vocals").setLevel(logging.WARNING)

    # Create client with default full experience
    client = VocalsClient()

    try:
        print("🎤 Starting microphone streaming...")
        print("Speak into your microphone!")

        # Stream microphone with enhanced features
        async with client:
            stats = await client.stream_microphone(
                duration=30.0,            # Record for 30 seconds
                auto_connect=True,        # Auto-connect if needed
                auto_playback=True,       # Auto-play received audio
                verbose=False,            # Client handles display automatically
                stats_tracking=True,      # Track session statistics
                amplitude_threshold=0.01, # Voice activity detection threshold
            )

        # Print session statistics
        print(f"\n📊 Session Statistics:")
        print(f"   • Transcriptions: {stats.get('transcriptions', 0)}")
        print(f"   • AI Responses: {stats.get('responses', 0)}")
        print(f"   • TTS Segments: {stats.get('tts_segments_received', 0)}")

    except Exception as e:
        print(f"Error: {e}")

    await client.disconnect()
    client.cleanup()

if __name__ == "__main__":
    asyncio.run(main())
```
### Conversation Tracking Example
```python
import asyncio
from vocals import (
    VocalsClient,
    create_conversation_tracker,
    create_enhanced_message_handler,
)

async def main():
    # Create client with controlled experience for custom tracking
    client = VocalsClient(modes=['transcription', 'voice_assistant'])
    conversation_tracker = create_conversation_tracker()

    # Custom message handler with conversation tracking
    def tracking_handler(message):
        # Custom display logic
        if message.type == "transcription" and message.data:
            text = message.data.get("text", "")
            is_partial = message.data.get("is_partial", False)
            if not is_partial and text:
                print(f"🎤 You: {text}")

        elif message.type == "llm_response" and message.data:
            response = message.data.get("response", "")
            if response:
                print(f"🤖 AI: {response}")

        elif message.type == "tts_audio" and message.data:
            text = message.data.get("text", "")
            if text:
                print(f"🔊 Playing: {text}")
                # Manually start playback since we're in controlled mode
                asyncio.create_task(client.play_audio())

        # Track conversation based on message type
        if message.type == "transcription" and message.data:
            text = message.data.get("text", "")
            is_partial = message.data.get("is_partial", False)
            if text and not is_partial:
                conversation_tracker["add_transcription"](text, is_partial)

        elif message.type == "llm_response" and message.data:
            response = message.data.get("response", "")
            if response:
                conversation_tracker["add_response"](response)

    # Set up handler
    client.on_message(tracking_handler)

    try:
        # Stream microphone with context manager
        async with client:
            await client.stream_microphone(
                duration=15.0,
                auto_playback=False  # We handle playback manually
            )

        # Print conversation history
        print("\n" + "="*50)
        print("📜 CONVERSATION HISTORY")
        print("="*50)
        conversation_tracker["print_conversation"]()

        # Print conversation statistics
        stats = conversation_tracker["get_stats"]()
        print(f"\n📈 Session lasted {stats['duration']:.1f} seconds")

    except Exception as e:
        print(f"Error: {e}")

if __name__ == "__main__":
    asyncio.run(main())
```
### Infinite Streaming with Signal Handling
```python
import asyncio
import signal
from vocals import VocalsClient

# Global shutdown event
shutdown_event = asyncio.Event()

def setup_signal_handlers():
    """Setup signal handlers for graceful shutdown."""
    def signal_handler(signum, frame):
        if not shutdown_event.is_set():
            print(f"\n📡 Received signal {signum}, shutting down...")
            shutdown_event.set()

    signal.signal(signal.SIGINT, signal_handler)
    signal.signal(signal.SIGTERM, signal_handler)

async def main():
    setup_signal_handlers()

    # Create client
    client = VocalsClient()

    try:
        print("🎤 Starting infinite streaming...")
        print("Press Ctrl+C to stop")

        # Connect to service
        await client.connect()

        # Create streaming task
        async def stream_task():
            await client.stream_microphone(
                duration=0,  # 0 = infinite streaming
                auto_connect=True,
                auto_playback=True,
                verbose=False,
                stats_tracking=True,
            )

        # Run streaming and wait for shutdown
        streaming_task = asyncio.create_task(stream_task())
        shutdown_task = asyncio.create_task(shutdown_event.wait())

        # Wait for shutdown signal
        await shutdown_task

        # Stop recording gracefully
        await client.stop_recording()

    finally:
        # Cancel streaming task
        if 'streaming_task' in locals():
            streaming_task.cancel()
        await client.disconnect()
        client.cleanup()

if __name__ == "__main__":
    asyncio.run(main())
```
### Custom Audio Processing (Alternative to Local Playback)
Instead of playing audio locally, you can process audio segments with custom handlers. This is useful for saving audio to files, sending it to external players, or implementing your own audio processing pipeline:
```python
import asyncio
import base64
from vocals import VocalsClient

async def main():
    """Advanced voice assistant with custom audio processing"""

    # Create client with controlled mode for manual audio handling
    client = VocalsClient(modes=["transcription", "voice_assistant"])

    # Custom state tracking
    conversation_state = {"listening": False, "processing": False, "speaking": False}

    def handle_messages(message):
        """Custom message handler with audio processing control"""

        if message.type == "transcription" and message.data:
            text = message.data.get("text", "")
            is_partial = message.data.get("is_partial", False)

            if is_partial:
                print(f"\r🎤 Listening: {text}...", end="", flush=True)
            else:
                print(f"\n✅ You said: {text}")

        elif message.type == "llm_response_streaming" and message.data:
            token = message.data.get("token", "")
            is_complete = message.data.get("is_complete", False)

            if token:
                print(token, end="", flush=True)
            if is_complete:
                print()  # New line

        elif message.type == "tts_audio" and message.data:
            text = message.data.get("text", "")
            if text and not conversation_state["speaking"]:
                print(f"🔊 AI speaking: {text}")
                conversation_state["speaking"] = True

            # Custom audio processing instead of local playback
            def custom_audio_handler(segment):
                """Process each audio segment with custom logic"""
                print(f"🎵 Processing audio: {segment.text}")

                # Option 1: Save to file
                audio_data = base64.b64decode(segment.audio_data)
                filename = f"audio_{segment.segment_id}.wav"
                with open(filename, "wb") as f:
                    f.write(audio_data)
                print(f"💾 Saved audio to: {filename}")

                # Option 2: Send to external audio player
                # subprocess.run(["ffplay", "-nodisp", "-autoexit", filename])

                # Option 3: Stream to audio device
                # your_audio_device.play(audio_data)

                # Option 4: Convert format
                # converted_audio = convert_audio_format(audio_data, target_format)

                # Option 5: Process with AI/ML
                # audio_features = extract_audio_features(audio_data)
                # emotion_score = analyze_emotion(audio_features)

            # Process all available audio segments
            processed_count = client.process_audio_queue(
                custom_audio_handler,
                consume_all=True
            )
            print(f"✅ Processed {processed_count} audio segments")

        elif message.type == "speech_interruption":
            print("\n🛑 Speech interrupted")
            conversation_state["speaking"] = False

    # Register message handler
    client.on_message(handle_messages)

    # Connection handler
    def handle_connection(state):
        if state.name == "CONNECTED":
            print("✅ Connected to voice assistant")
        elif state.name == "DISCONNECTED":
            print("❌ Disconnected from voice assistant")

    client.on_connection_change(handle_connection)

    try:
        print("🎤 Voice Assistant with Custom Audio Processing")
        print("Audio will be saved to files instead of played locally")
        print("Speak into your microphone...")
        print("Press Ctrl+C to stop")

        # Stream microphone with custom audio handling
        async with client:
            await client.stream_microphone(
                duration=0,           # Infinite recording
                auto_connect=True,    # Auto-connect to service
                auto_playback=False,  # Disable automatic playback - we handle it
                verbose=False,        # Clean output
            )

    except KeyboardInterrupt:
        print("\n👋 Custom audio processing stopped")
    finally:
        await client.disconnect()
        client.cleanup()

if __name__ == "__main__":
    asyncio.run(main())
```
**Key Features of Custom Audio Processing:**
- 🎛️ **Full Control**: Complete control over audio handling instead of automatic playback
- 💾 **Save to Files**: Save audio segments as individual WAV files
- 🔄 **Format Conversion**: Convert audio to different formats before processing
- 🎵 **External Players**: Send audio to external audio players or devices
- 🤖 **AI Processing**: Analyze audio with machine learning models
- 📊 **Audio Analytics**: Extract features, analyze emotion, or process speech patterns
- 🔌 **Integration**: Easily integrate with existing audio pipelines
**Use Cases:**
- Recording conversations for later playback
- Building custom audio players with UI controls
- Streaming audio to multiple devices simultaneously
- Processing audio with AI/ML models for analysis
- Converting audio formats for different platforms
- Creating audio archives or transcription systems
## Configuration
### Environment Variables
```bash
# Required: Your Vocals API key
export VOCALS_DEV_API_KEY="vdev_your_api_key_here"
```
### Audio Configuration
```python
from vocals import VocalsClient, AudioConfig

# Create custom audio configuration
audio_config = AudioConfig(
    sample_rate=24000,   # Sample rate in Hz
    channels=1,          # Number of audio channels
    format="pcm_f32le",  # Audio format
    buffer_size=1024,    # Audio buffer size
)

# Use with client
client = VocalsClient(audio_config=audio_config)
```
### SDK Configuration
```python
from vocals import VocalsClient, get_default_config
# Get default configuration
config = get_default_config()
# Customize configuration
config.max_reconnect_attempts = 5
config.reconnect_delay = 2.0
config.auto_connect = True
config.token_refresh_buffer = 60.0
# Use with client
client = VocalsClient(config=config)
```
## Complete API Reference
The Vocals SDK provides comprehensive control over voice processing, connection management, audio playback, and event handling. Here's a complete reference of all available controls:
**🎛️ Main Control Categories:**
- **SDK Creation & Configuration** - Initialize and configure the SDK
- **Stream Methods** - Control microphone and file streaming
- **Connection Management** - Connect, disconnect, and manage WebSocket connections
- **Audio Playback** - Control TTS audio playback, queueing, and timing
- **Event Handling** - Register handlers for messages, connections, errors, and audio data
- **State Management** - Access real-time state information
- **Device Management** - Manage and test audio devices
**📋 Quick Reference:**
| Control Category | Key Methods | Purpose |
|------------------|-------------|---------|
| **Streaming** | `stream_microphone()`, `stream_audio_file()` | Start voice/audio processing |
| **Connection** | `connect()`, `disconnect()`, `reconnect()` | Manage WebSocket connection |
| **Recording** | `start_recording()`, `stop_recording()` | Control audio input |
| **Playback** | `play_audio()`, `pause_audio()`, `stop_audio()` | Control TTS audio output |
| **Queue** | `clear_queue()`, `add_to_queue()`, `get_audio_queue()` | Manage audio queue |
| **Events** | `on_message()`, `on_connection_change()`, `on_error()` | Handle events |
| **State** | `get_is_connected()`, `get_is_playing()`, `get_recording_state()` | Check current state |
### Core Functions
- `VocalsClient(config?, audio_config?, user_id?, modes?)` - Create client instance
- `get_default_config()` - Get default configuration
- `AudioConfig(...)` - Audio configuration class
#### `VocalsClient()` Constructor
```python
VocalsClient(
    config: Optional[VocalsConfig] = None,
    audio_config: Optional[AudioConfig] = None,
    user_id: Optional[str] = None,
    modes: List[str] = []  # Controls client behavior
)
```
**Parameters:**
- `config`: Client configuration options (connection, logging, etc.)
- `audio_config`: Audio processing configuration (sample rate, channels, etc.)
- `user_id`: Optional user ID for token generation
- `modes`: List of modes to control client behavior
**Modes:**
- `[]` (empty list): **Default Experience** - Full auto-contained behavior with automatic handlers
- `['transcription']`: **Controlled** - Only transcription-related internal processing
- `['voice_assistant']`: **Controlled** - Only AI response handling and speech interruption
- `['transcription', 'voice_assistant']`: **Controlled** - Both features, but no automatic handlers
### Audio Configuration
```python
AudioConfig(
    sample_rate: int = 24000,   # Sample rate in Hz
    channels: int = 1,          # Number of audio channels
    format: str = "pcm_f32le",  # Audio format
    buffer_size: int = 1024,    # Audio buffer size
)
```
### Stream Methods
#### `stream_microphone()` Parameters
```python
await client.stream_microphone(
    duration: float = 30.0,            # Recording duration in seconds (0 for infinite)
    auto_connect: bool = True,         # Whether to automatically connect if not connected
    auto_playback: bool = True,        # Whether to automatically play received audio
    verbose: bool = True,              # Whether to log detailed progress
    stats_tracking: bool = True,       # Whether to track and return statistics
    amplitude_threshold: float = 0.01  # Minimum amplitude to consider as speech
)
```
**Important:** In **Controlled Experience** (with modes), TTS audio is always added to the queue, but `auto_playback=False` prevents automatic playback. You must manually call `client.play_audio()` to play queued audio.
#### `stream_audio_file()` Parameters
```python
await client.stream_audio_file(
    file_path: str,            # Path to the audio file to stream
    chunk_size: int = 1024,    # Size of each chunk to send
    verbose: bool = True,      # Whether to log detailed progress
    auto_connect: bool = True  # Whether to automatically connect if not connected
)
```
### Connection & Recording Methods
```python
await client.connect() # Connect to WebSocket
await client.disconnect() # Disconnect from WebSocket
await client.reconnect() # Reconnect to WebSocket
await client.start_recording() # Start recording
await client.stop_recording() # Stop recording
```
### Audio Playback Methods
```python
await client.play_audio() # Start/resume audio playback
await client.pause_audio() # Pause audio playback
await client.stop_audio() # Stop audio playback
await client.fade_out_audio(duration) # Fade out audio over specified duration
client.clear_queue() # Clear the audio playback queue
client.add_to_queue(segment) # Add audio segment to queue
```
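`fade_out_audio()` is the one playback control not exercised elsewhere in this README. A minimal sketch of a soft-stop pattern, assuming the duration argument is in seconds:

```python
async def soft_stop(client):
    """Fade the current TTS segment out instead of cutting it off."""
    await client.fade_out_audio(0.5)  # assumed: fade duration in seconds
    client.clear_queue()              # drop any segments still waiting
```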
### Event Handlers
```python
client.on_message(handler) # Handle incoming messages
client.on_connection_change(handler) # Handle connection state changes
client.on_error(handler) # Handle errors
client.on_audio_data(handler) # Handle audio data
```
**Handler Functions:**
- `handler(message)` - Message handler receives WebSocket messages
- `handler(connection_state)` - Connection handler receives connection state changes
- `handler(error)` - Error handler receives error objects
- `handler(audio_data)` - Audio data handler receives real-time audio data
### Properties
```python
# Connection properties
client.connection_state # Get current connection state
client.is_connected # Check if connected
client.is_connecting # Check if connecting
# Recording properties
client.recording_state # Get current recording state
client.is_recording # Check if recording
# Playback properties
client.playback_state # Get current playback state
client.is_playing # Check if playing audio
client.audio_queue # Get current audio queue
client.current_segment # Get currently playing segment
client.current_amplitude # Get current audio amplitude
# Token properties
client.token # Get current token
client.token_expires_at # Get token expiration timestamp
```
### Utility Methods
```python
client.set_user_id(user_id) # Set user ID for token generation
client.cleanup() # Clean up resources
client.process_audio_queue(handler) # Process audio queue with custom handler
```
### Utility Functions
These utility functions work with both the class-based and functional APIs:
```python
# Message handlers
create_enhanced_message_handler(
    verbose: bool = True,
    show_transcription: bool = True,
    show_responses: bool = True,
    show_streaming: bool = True,
    show_detection: bool = False
)

# Conversation tracking
create_conversation_tracker()

# Statistics tracking
create_microphone_stats_tracker(verbose: bool = True)

# Connection handlers
create_default_connection_handler(verbose: bool = True)
create_default_error_handler(verbose: bool = True)
```
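As a sketch of how these factories can plug into the event-handler API (an assumed pairing, using only the calls documented in this README):

```python
from vocals import (
    VocalsClient,
    create_enhanced_message_handler,
    create_default_connection_handler,
    create_default_error_handler,
)

client = VocalsClient(modes=["transcription", "voice_assistant"])

# Reuse the prebuilt handlers instead of hand-writing display logic
client.on_message(create_enhanced_message_handler(verbose=False))
client.on_connection_change(create_default_connection_handler())
client.on_error(create_default_error_handler())
```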
### Audio Device Management
```python
# Device management
list_audio_devices() # List available audio devices
get_default_audio_device() # Get default audio device
test_audio_device(device_id, duration) # Test audio device
validate_audio_device(device_id) # Validate audio device
get_audio_device_info(device_id) # Get device information
print_audio_devices() # Print formatted device list
create_audio_device_selector() # Interactive device selector
```
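A short pre-flight device check, assuming these helpers are importable from the top-level `vocals` package and that device IDs match those shown by `vocals devices`:

```python
from vocals import print_audio_devices, validate_audio_device, test_audio_device

# Show a formatted list of every device the SDK can see
print_audio_devices()

device_id = 0  # hypothetical: substitute an ID from the printed list
if validate_audio_device(device_id):
    test_audio_device(device_id, 3)  # record from the device for ~3 seconds
```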
### Auto-Playback Behavior
**Default Experience (no modes):**
- `auto_playback=True` (default): TTS audio plays automatically
- `auto_playback=False`: TTS audio is added to queue but doesn't play automatically
**Controlled Experience (with modes):**
- `auto_playback=True`: TTS audio is added to queue and plays automatically
- `auto_playback=False`: TTS audio is added to queue but requires manual `client.play_audio()` call
**Key Point:** In controlled mode, TTS audio is **always** added to the queue regardless of `auto_playback` setting. The `auto_playback` parameter only controls whether playback starts automatically.
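In practice, the controlled-mode pattern condenses to a few lines; a minimal sketch using only the calls documented above:

```python
import asyncio
from vocals import VocalsClient

async def main():
    # Controlled mode: TTS audio is always queued, regardless of auto_playback
    client = VocalsClient(modes=["transcription", "voice_assistant"])

    def on_message(message):
        if message.type == "tts_audio":
            # The segment is already queued; start playback explicitly
            asyncio.create_task(client.play_audio())

    client.on_message(on_message)

    async with client:
        await client.stream_microphone(duration=10.0, auto_playback=False)

asyncio.run(main())
```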
### Message Types
Common message types you'll receive in handlers:
```python
# Transcription messages
{
    "type": "transcription",
    "data": {
        "text": "Hello world",
        "is_partial": False,
        "segment_id": "abc123"
    }
}

# LLM streaming response
{
    "type": "llm_response_streaming",
    "data": {
        "token": "Hello",
        "accumulated_response": "Hello",
        "is_complete": False,
        "segment_id": "def456"
    }
}

# TTS audio
{
    "type": "tts_audio",
    "data": {
        "text": "Hello there",
        "audio_data": "base64_encoded_wav_data",
        "sample_rate": 24000,
        "segment_id": "ghi789",
        "duration_seconds": 1.5
    }
}

# Speech interruption
{
    "type": "speech_interruption",
    "data": {}
}
```
## Testing Your Setup
After setting up the SDK, you can test all the controls to ensure everything is working properly:
### 1. Test Basic Audio Setup
```bash
# List available audio devices
vocals devices
# Test your microphone
vocals test-device
# Run system diagnostics
vocals diagnose
```
### 2. Test Default Experience
```python
import asyncio
from vocals import VocalsClient

async def test_default():
    """Test default experience with automatic handlers"""
    client = VocalsClient()  # No modes = full automatic experience

    print("🎤 Testing default experience...")
    print("Speak and listen for AI responses...")

    # Test with automatic playback
    async with client:
        await client.stream_microphone(
            duration=15.0,
            auto_playback=True,  # Should auto-play TTS
            verbose=False
        )

    print("✅ Default experience test completed")

asyncio.run(test_default())
```
### 3. Test Controlled Experience
```python
import asyncio
from vocals import VocalsClient

async def test_controlled():
    """Test controlled experience with manual handlers"""
    client = VocalsClient(modes=['transcription', 'voice_assistant'])

    # Track what we receive
    received_messages = []

    def test_handler(message):
        received_messages.append(message.type)
        print(f"✅ Received: {message.type}")

        # Test manual playback control
        if message.type == "tts_audio":
            print("🔊 Manually triggering playback...")
            asyncio.create_task(client.play_audio())

    # Register handler
    client.on_message(test_handler)

    print("🎤 Testing controlled experience...")
    print("Should receive transcription and TTS messages...")

    # Test with manual playback control
    async with client:
        await client.stream_microphone(
            duration=15.0,
            auto_playback=False,  # We control playback manually
            verbose=False
        )

    print(f"📊 Received message types: {set(received_messages)}")

    # Verify we got the expected message types
    expected_types = ["transcription", "tts_audio"]
    for msg_type in expected_types:
        if msg_type in received_messages:
            print(f"✅ {msg_type} messages working")
        else:
            print(f"❌ {msg_type} messages not received")

    print("✅ Controlled experience test completed")

asyncio.run(test_controlled())
```
### 4. Test Audio Playback Controls
```python
import asyncio
from vocals import VocalsClient

async def test_playback_controls():
    """Test all audio playback controls"""
    client = VocalsClient(modes=['transcription', 'voice_assistant'])

    # Test queue management
    print("🎵 Testing audio playback controls...")

    # Check initial state
    print(f"Initial queue size: {len(client.audio_queue)}")
    print(f"Is playing: {client.is_playing}")

    def audio_handler(message):
        if message.type == "tts_audio":
            print(f"🎵 Audio received: {message.data.get('text', '')}")
            print(f"Queue size: {len(client.audio_queue)}")

    client.on_message(audio_handler)

    # Stream and collect audio
    async with client:
        await client.stream_microphone(
            duration=10.0,
            auto_playback=False,  # Don't auto-play
            verbose=False
        )

    # Test manual controls
    queue_size = len(client.audio_queue)
    if queue_size > 0:
        print(f"✅ {queue_size} audio segments in queue")

        print("🎵 Testing play_audio()...")
        await client.play_audio()

        # Wait a moment then test pause
        await asyncio.sleep(1)
        print("⏸️ Testing pause_audio()...")
        await client.pause_audio()

        print("▶️ Testing play_audio() again...")
        await client.play_audio()

        # Test stop
        await asyncio.sleep(1)
        print("⏹️ Testing stop_audio()...")
        await client.stop_audio()

        print("🗑️ Testing clear_queue()...")
        client.clear_queue()
        print(f"Queue size after clear: {len(client.audio_queue)}")

        print("✅ All playback controls working!")
    else:
        print("❌ No audio received to test playback controls")

    await client.disconnect()
    client.cleanup()

asyncio.run(test_playback_controls())
```
### 5. Test All Event Handlers
```python
import asyncio
from vocals import VocalsClient

async def test_event_handlers():
    """Test all event handler types"""
    client = VocalsClient(modes=['transcription', 'voice_assistant'])

    # Track events
    events_received = {
        'messages': 0,
        'connections': 0,
        'errors': 0,
        'audio_data': 0
    }

    def message_handler(message):
        events_received['messages'] += 1
        print(f"📩 Message: {message.type}")

    def connection_handler(state):
        events_received['connections'] += 1
        print(f"🔌 Connection: {state.name}")

    def error_handler(error):
        events_received['errors'] += 1
        print(f"❌ Error: {error.message}")

    def audio_data_handler(audio_data):
        events_received['audio_data'] += 1
        if events_received['audio_data'] % 100 == 0:  # Log every 100th
            print(f"🎤 Audio data chunks: {events_received['audio_data']}")

    # Register all handlers
    client.on_message(message_handler)
    client.on_connection_change(connection_handler)
    client.on_error(error_handler)
    client.on_audio_data(audio_data_handler)

    print("🧪 Testing all event handlers...")

    async with client:
        await client.stream_microphone(
            duration=10.0,
            auto_playback=False,
            verbose=False
        )

    # Report results
    print("\n📊 Event Handler Test Results:")
    for event_type, count in events_received.items():
        status = "✅" if count > 0 else "❌"
        print(f"   {status} {event_type}: {count}")

asyncio.run(test_event_handlers())
```
### 6. Validate All Controls Are Working
Run this comprehensive test to verify everything:
```bash
# Create a test script
cat > test_all_controls.py << 'EOF'
import asyncio
from vocals import VocalsClient

async def comprehensive_test():
    """Comprehensive test of all client controls"""
    print("🧪 Comprehensive Client Control Test")
    print("=" * 50)

    # Test 1: Default mode
    print("\n1️⃣ Testing Default Mode...")
    client1 = VocalsClient()
    async with client1:
        await client1.stream_microphone(duration=5.0, verbose=False)
    print("✅ Default mode test completed")

    # Test 2: Controlled mode
    print("\n2️⃣ Testing Controlled Mode...")
    client2 = VocalsClient(modes=['transcription', 'voice_assistant'])

    message_count = 0

    def counter(message):
        nonlocal message_count
        message_count += 1
        if message.type == "tts_audio":
            asyncio.create_task(client2.play_audio())

    client2.on_message(counter)

    async with client2:
        await client2.stream_microphone(duration=5.0, auto_playback=False, verbose=False)
    print(f"✅ Controlled mode test completed - {message_count} messages")

    # Test 3: All controls
    print("\n3️⃣ Testing Individual Controls...")
    client3 = VocalsClient()

    # Test properties
    print(f"   Connection state: {client3.connection_state.name}")
    print(f"   Is connected: {client3.is_connected}")
    print(f"   Recording state: {client3.recording_state.name}")
    print(f"   Is recording: {client3.is_recording}")
    print(f"   Playback state: {client3.playback_state.name}")
    print(f"   Is playing: {client3.is_playing}")
    print(f"   Queue length: {len(client3.audio_queue)}")
    print(f"   Current amplitude: {client3.current_amplitude}")

    await client3.disconnect()
    client3.cleanup()
    print("✅ All controls test completed")

    print("\n🎉 All tests completed successfully!")

if __name__ == "__main__":
    asyncio.run(comprehensive_test())
EOF
# Run the test
python test_all_controls.py
```
Running this suite end-to-end validates that all of the client controls described above are working properly.
## CLI Tools
The SDK includes powerful command-line tools for setup, testing, and debugging:
### Setup & Configuration
```bash
# Interactive setup wizard
vocals setup
# List available audio devices
vocals devices
# Test a specific audio device
vocals test-device 1 --duration 5
# Generate diagnostic report
vocals diagnose
```
### Development Tools
```bash
# Run all tests
vocals test
# Run a demo session
vocals demo --duration 30 --verbose
# Create project templates
vocals create-template voice_assistant
vocals create-template file_processor
vocals create-template conversation_tracker
vocals create-template advanced_voice_assistant
```
**Available Templates:**
- `voice_assistant`: Simple voice assistant (**Default Experience**)
- `file_processor`: Process audio files (**Default Experience**)
- `conversation_tracker`: Track conversations (**Controlled Experience**)
- `advanced_voice_assistant`: Full control voice assistant (**Controlled Experience**)
All templates use the modern **class-based API** with `VocalsClient`.
### Advanced Features
```bash
# Performance monitoring
vocals demo --duration 60 --stats
# Custom audio device
vocals demo --device 2
# Debug mode
VOCALS_DEBUG_LEVEL=DEBUG vocals demo
```
## Error Handling
The client provides comprehensive error handling:
```python
from vocals import VocalsClient, VocalsError

async def main():
    client = VocalsClient()

    try:
        async with client:
            await client.stream_microphone(duration=10.0)
    except VocalsError as e:
        print(f"Vocals client error: {e}")
    except Exception as e:
        print(f"Unexpected error: {e}")
        # Manual cleanup if context manager fails
        await client.disconnect()
        client.cleanup()

# Alternative without context manager
async def main_manual():
    client = VocalsClient()

    try:
        await client.stream_microphone(duration=10.0)
    except VocalsError as e:
        print(f"Vocals client error: {e}")
    except Exception as e:
        print(f"Unexpected error: {e}")
    finally:
        await client.disconnect()
        client.cleanup()
```
## Troubleshooting
### Common Issues
1. **"API key not found"**
- Set environment variable: `export VOCALS_DEV_API_KEY="your_key"`
- Or create `.env` file with the key
- Ensure the `.env` file is actually loaded (e.g., with `python-dotenv`, as shown in Quick Start)
2. **"Connection failed"**
- Check your internet connection
- Verify API key is valid
- Check WebSocket endpoint is accessible
- Try increasing reconnect attempts in config
3. **"No audio input detected"**
- Check microphone permissions
- Verify microphone is working (use `vocals devices` to list devices)
- Adjust `amplitude_threshold` parameter lower (e.g., 0.005)
- Test with `vocals test-device <id>`
4. **Audio playback issues**
- Ensure speakers/headphones are connected
- Check system audio settings
- Try different audio formats or sample rates in AudioConfig
5. **High latency**
- Check network speed
- Reduce buffer_size in AudioConfig
- Ensure no other apps are using high bandwidth
6. **Dependency errors**
- Run `pip install -r requirements.txt` again
- For Linux: Ensure portaudio is installed
- Try creating a fresh virtual environment
If issues persist, run `vocals diagnose` and share the output when reporting bugs.
### Debug Mode
Enable debug logging to troubleshoot issues:
```python
import logging
# Enable debug logging
logging.basicConfig(level=logging.DEBUG)
# Or for specific modules
logging.getLogger("vocals").setLevel(logging.DEBUG)
```
## Examples
Check out the included examples:
- [`examples/example_microphone_streaming.py`](examples/example_microphone_streaming.py) - Comprehensive microphone streaming examples
- [`examples/example_file_playback.py`](examples/example_file_playback.py) - Audio file playback examples
- [`examples/run_examples.sh`](examples/run_examples.sh) - Script to run examples with proper setup
## Contributing
Contributions are welcome! Please follow these steps:
1. Fork the repository
2. Create a feature branch (`git checkout -b feature/AmazingFeature`)
3. Commit your changes (`git commit -m 'Add some AmazingFeature'`)
4. Push to the branch (`git push origin feature/AmazingFeature`)
5. Open a Pull Request
For major changes, please open an issue first to discuss what you would like to change.
See [CONTRIBUTING.md](CONTRIBUTING.md) for more details.
## Support
For support, documentation, and updates:
- 📖 [Documentation](https://docs.vocals.dev)
- 🐛 [Issues](https://github.com/vocals/vocals-sdk-python/issues)
- 💬 [Support](mailto:support@vocals.dev)
## License
MIT License - see LICENSE file for details.
Raw data
{
"_id": null,
"home_page": "https://github.com/hairetsucodes/vocals-sdk-python",
"name": "vocals",
"maintainer": null,
"docs_url": null,
"requires_python": ">=3.8",
"maintainer_email": null,
"keywords": "vocals, audio, speech, websocket, real-time, voice processing",
"author": "Vocals Team",
"author_email": "support@vocals.dev",
"download_url": "https://files.pythonhosted.org/packages/db/19/580616ee1899cb2156601a9bef458be863b999cbc7d8d5fce8867e5e4445/vocals-1.0.984.tar.gz",
"platform": null,
"description": "# Vocals SDK Python\n\n[](https://badge.fury.io/py/vocals)\n[](https://opensource.org/licenses/MIT)\n[](https://github.com/hairetsucodes/vocals-sdk-python/issues)\n\nA Python SDK for voice processing and real-time audio communication with AI assistants. Stream microphone input or audio files to receive live transcription, AI responses, and text-to-speech audio.\n\n**Features both class-based and functional interfaces** for maximum flexibility and ease of use.\n\n## Features\n\n- \ud83c\udfa4 **Real-time microphone streaming** with voice activity detection\n- \ud83d\udcc1 **Audio file playback** support (WAV format)\n- \u2728 **Live transcription** with partial and final results\n- \ud83e\udd16 **Streaming AI responses** with real-time text display\n- \ud83d\udd0a **Text-to-speech playback** with automatic audio queueing\n- \ud83d\udcca **Conversation tracking** and session statistics\n- \ud83d\ude80 **Easy setup** with minimal configuration required\n- \ud83d\udd04 **Auto-reconnection** and robust error handling\n- \ud83c\udf9b\ufe0f **Class-based API** with modern Python patterns\n- \ud83d\udd00 **Context manager support** for automatic cleanup\n\n## Table of Contents\n\n- [Features](#features)\n- [Installation](#installation)\n- [Quick Start](#quick-start)\n- [SDK Modes](#sdk-modes)\n- [Advanced Usage](#advanced-usage)\n- [Configuration](#configuration)\n- [Complete API Reference](#complete-api-reference)\n- [Testing Your Setup](#testing-your-setup)\n- [CLI Tools](#cli-tools)\n- [Error Handling](#error-handling)\n- [Troubleshooting](#troubleshooting)\n- [Examples](#examples)\n- [Contributing](#contributing)\n- [Support](#support)\n- [License](#license)\n\n## Installation\n\n```bash\npip install vocals\n```\n\n### Quick Setup\n\nAfter installation, use the built-in setup wizard to configure your environment:\n\n```bash\nvocals setup\n```\n\nOr test your installation:\n\n```bash\nvocals test\n```\n\nRun a quick demo:\n\n```bash\nvocals demo\n```\n\n### \ud83c\udf10 Web UI Demo\n\n**NEW!** Launch an interactive web interface to try the voice assistant:\n\n```bash\nvocals demo --ui\n```\n\nThis will:\n\n- \u2705 **Automatically install Gradio** (if not already installed)\n- \ud83d\ude80 **Launch a web interface** in your browser\n- \ud83c\udfa4 **Real-time voice interaction** with visual feedback\n- \ud83d\udcf1 **Easy-to-use interface** with buttons and live updates\n- \ud83d\udd0a **Live transcription and AI responses** in the browser\n\n**Perfect for:**\n\n- \ud83c\udfaf **Quick demonstrations** and testing\n- \ud83d\udc65 **Showing to others** without command line\n- \ud83d\udda5\ufe0f **Visual feedback** and status indicators\n- \ud83d\udcca **Real-time conversation tracking**\n\nThe web UI provides the same functionality as the command line demo but with an intuitive graphical interface that's perfect for demonstrations and interactive testing.\n\n### System Requirements\n\n- Python 3.8 or higher\n- Working microphone (for microphone streaming)\n- Audio output device (for TTS playback)\n\n### Additional Dependencies\n\nThe SDK automatically installs all required Python dependencies including `pyaudio`, `sounddevice`, `numpy`, `websockets`, and others.\n\nOn some Linux systems, you may need to install system-level audio libraries:\n\n**Ubuntu/Debian:**\n\n```bash\nsudo apt-get install portaudio19-dev\n```\n\n**Other Linux distributions:**\n\n```bash\n# Install portaudio development headers using your package manager\n# For example, on CentOS/RHEL: sudo yum install 
portaudio-devel\n```\n\n## Quick Start\n\n### 1. Get Your API Key\n\nSet up your Vocals API key as an environment variable:\n\n```bash\nexport VOCALS_DEV_API_KEY=\"your_api_key_here\"\n```\n\nOr create a `.env` file in your project:\n\n```\nVOCALS_DEV_API_KEY=your_api_key_here\n```\n\n### 2. Basic Usage\n\nThe Vocals SDK provides a modern **class-based API** as the primary interface\n\n#### Microphone Streaming (Minimal Example)\n\n```python\nimport asyncio\nfrom vocals import VocalsClient\n\nasync def main():\n # Create client instance\n client = VocalsClient()\n\n # Stream microphone for 10 seconds\n await client.stream_microphone(duration=10.0)\n\n # Clean up\n await client.disconnect()\n client.cleanup()\n\nif __name__ == \"__main__\":\n asyncio.run(main())\n```\n\n#### Audio File Playback (Minimal Example)\n\n```python\nimport asyncio\nfrom vocals import VocalsClient\n\nasync def main():\n # Create client instance\n client = VocalsClient()\n\n # Stream audio file\n await client.stream_audio_file(\"path/to/your/audio.wav\")\n\n # Clean up\n await client.disconnect()\n client.cleanup()\n\nif __name__ == \"__main__\":\n asyncio.run(main())\n```\n\n#### Context Manager Usage (Recommended)\n\n```python\nimport asyncio\nfrom vocals import VocalsClient\n\nasync def main():\n # Use context manager for automatic cleanup\n async with VocalsClient() as client:\n await client.stream_microphone(duration=10.0)\n\nif __name__ == \"__main__\":\n asyncio.run(main())\n```\n\n## SDK Modes\n\nThe Vocals SDK supports two usage patterns:\n\n### Default Experience (No Modes)\n\nWhen you create the client without specifying modes, you get a full auto-contained experience:\n\n```python\n# Full experience with automatic handlers, playback, and beautiful console output\nclient = VocalsClient()\n```\n\n**Features:**\n\n- \u2705 Automatic transcription display with partial updates\n- \u2705 Streaming AI response display in real-time\n- \u2705 Automatic TTS audio playback\n- \u2705 Speech interruption handling\n- \u2705 Beautiful console output with emojis\n- \u2705 Perfect for getting started quickly\n\n### Controlled Experience (With Modes)\n\nWhen you specify modes, the client becomes passive and you control everything:\n\n```python\n# Controlled experience - you handle all logic\nclient = VocalsClient(modes=['transcription', 'voice_assistant'])\n```\n\n**Available Modes:**\n\n- `'transcription'`: Enables transcription-related internal processing\n- `'voice_assistant'`: Enables AI response handling and speech interruption\n\n**Features:**\n\n- \u2705 No automatic handlers attached\n- \u2705 No automatic playback\n- \u2705 You attach your own message handlers\n- \u2705 You control when to play audio\n- \u2705 Perfect for custom applications\n\n### Example: Controlled Experience\n\n```python\nimport asyncio\nfrom vocals import VocalsClient\n\nasync def main():\n # Create client with controlled experience\n client = VocalsClient(modes=['transcription', 'voice_assistant'])\n\n # Custom message handler\n def handle_messages(message):\n if message.type == \"transcription\" and message.data:\n text = message.data.get(\"text\", \"\")\n is_partial = message.data.get(\"is_partial\", False)\n if not is_partial:\n print(f\"You said: {text}\")\n\n elif message.type == \"tts_audio\" and message.data:\n text = message.data.get(\"text\", \"\")\n print(f\"AI speaking: {text}\")\n # Manually start playback\n asyncio.create_task(client.play_audio())\n\n # Register your handler\n client.on_message(handle_messages)\n\n # Stream 
microphone with context manager\n async with client:\n await client.stream_microphone(\n duration=30.0,\n auto_playback=False # We control playback\n )\n\nif __name__ == \"__main__\":\n asyncio.run(main())\n```\n\n## Advanced Usage\n\n### Enhanced Microphone Streaming\n\n```python\nimport asyncio\nimport logging\nfrom vocals import (\n VocalsClient,\n create_enhanced_message_handler,\n create_default_connection_handler,\n create_default_error_handler,\n)\n\nasync def main():\n # Configure logging for cleaner output\n logging.getLogger(\"vocals\").setLevel(logging.WARNING)\n\n # Create client with default full experience\n client = VocalsClient()\n\n try:\n print(\"\ud83c\udfa4 Starting microphone streaming...\")\n print(\"Speak into your microphone!\")\n\n # Stream microphone with enhanced features\n async with client:\n stats = await client.stream_microphone(\n duration=30.0, # Record for 30 seconds\n auto_connect=True, # Auto-connect if needed\n auto_playback=True, # Auto-play received audio\n verbose=False, # Client handles display automatically\n stats_tracking=True, # Track session statistics\n amplitude_threshold=0.01, # Voice activity detection threshold\n )\n\n # Print session statistics\n print(f\"\\n\ud83d\udcca Session Statistics:\")\n print(f\" \u2022 Transcriptions: {stats.get('transcriptions', 0)}\")\n print(f\" \u2022 AI Responses: {stats.get('responses', 0)}\")\n print(f\" \u2022 TTS Segments: {stats.get('tts_segments_received', 0)}\")\n\n except Exception as e:\n print(f\"Error: {e}\")\n await client.disconnect()\n client.cleanup()\n\nif __name__ == \"__main__\":\n asyncio.run(main())\n```\n\n### Conversation Tracking Example\n\n```python\nimport asyncio\nfrom vocals import (\n VocalsClient,\n create_conversation_tracker,\n create_enhanced_message_handler,\n)\n\nasync def main():\n # Create client with controlled experience for custom tracking\n client = VocalsClient(modes=['transcription', 'voice_assistant'])\n conversation_tracker = create_conversation_tracker()\n\n # Custom message handler with conversation tracking\n def tracking_handler(message):\n # Custom display logic\n if message.type == \"transcription\" and message.data:\n text = message.data.get(\"text\", \"\")\n is_partial = message.data.get(\"is_partial\", False)\n if not is_partial and text:\n print(f\"\ud83c\udfa4 You: {text}\")\n\n elif message.type == \"llm_response\" and message.data:\n response = message.data.get(\"response\", \"\")\n if response:\n print(f\"\ud83e\udd16 AI: {response}\")\n\n elif message.type == \"tts_audio\" and message.data:\n text = message.data.get(\"text\", \"\")\n if text:\n print(f\"\ud83d\udd0a Playing: {text}\")\n # Manually start playback since we're in controlled mode\n asyncio.create_task(client.play_audio())\n\n # Track conversation based on message type\n if message.type == \"transcription\" and message.data:\n text = message.data.get(\"text\", \"\")\n is_partial = message.data.get(\"is_partial\", False)\n if text and not is_partial:\n conversation_tracker[\"add_transcription\"](text, is_partial)\n\n elif message.type == \"llm_response\" and message.data:\n response = message.data.get(\"response\", \"\")\n if response:\n conversation_tracker[\"add_response\"](response)\n\n # Set up handler\n client.on_message(tracking_handler)\n\n try:\n # Stream microphone with context manager\n async with client:\n await client.stream_microphone(\n duration=15.0,\n auto_playback=False # We handle playback manually\n )\n\n # Print conversation history\n print(\"\\n\" + \"=\"*50)\n 
print(\"\ud83d\udcdc CONVERSATION HISTORY\")\n print(\"=\"*50)\n conversation_tracker[\"print_conversation\"]()\n\n # Print conversation statistics\n stats = conversation_tracker[\"get_stats\"]()\n print(f\"\\n\ud83d\udcc8 Session lasted {stats['duration']:.1f} seconds\")\n\n except Exception as e:\n print(f\"Error: {e}\")\n\nif __name__ == \"__main__\":\n asyncio.run(main())\n```\n\n### Infinite Streaming with Signal Handling\n\n```python\nimport asyncio\nimport signal\nfrom vocals import VocalsClient\n\n# Global shutdown event\nshutdown_event = asyncio.Event()\n\ndef setup_signal_handlers():\n \"\"\"Setup signal handlers for graceful shutdown.\"\"\"\n def signal_handler(signum, frame):\n if not shutdown_event.is_set():\n print(f\"\\n\ud83d\udce1 Received signal {signum}, shutting down...\")\n shutdown_event.set()\n\n signal.signal(signal.SIGINT, signal_handler)\n signal.signal(signal.SIGTERM, signal_handler)\n\nasync def main():\n setup_signal_handlers()\n\n # Create client\n client = VocalsClient()\n\n try:\n print(\"\ud83c\udfa4 Starting infinite streaming...\")\n print(\"Press Ctrl+C to stop\")\n\n # Connect to service\n await client.connect()\n\n # Create streaming task\n async def stream_task():\n await client.stream_microphone(\n duration=0, # 0 = infinite streaming\n auto_connect=True,\n auto_playback=True,\n verbose=False,\n stats_tracking=True,\n )\n\n # Run streaming and wait for shutdown\n streaming_task = asyncio.create_task(stream_task())\n shutdown_task = asyncio.create_task(shutdown_event.wait())\n\n # Wait for shutdown signal\n await shutdown_task\n\n # Stop recording gracefully\n await client.stop_recording()\n\n finally:\n # Cancel streaming task\n if 'streaming_task' in locals():\n streaming_task.cancel()\n await client.disconnect()\n client.cleanup()\n\nif __name__ == \"__main__\":\n asyncio.run(main())\n```\n\n### Custom Audio Processing (Alternative to Local Playback)\n\nInstead of playing audio locally, you can process audio segments with custom handlers - perfect for saving audio files, sending to external players, or implementing custom audio processing:\n\n```python\nimport asyncio\nimport base64\nfrom vocals import VocalsClient\n\nasync def main():\n \"\"\"Advanced voice assistant with custom audio processing\"\"\"\n\n # Create client with controlled mode for manual audio handling\n client = VocalsClient(modes=[\"transcription\", \"voice_assistant\"])\n\n # Custom state tracking\n conversation_state = {\"listening\": False, \"processing\": False, \"speaking\": False}\n\n def handle_messages(message):\n \"\"\"Custom message handler with audio processing control\"\"\"\n\n if message.type == \"transcription\" and message.data:\n text = message.data.get(\"text\", \"\")\n is_partial = message.data.get(\"is_partial\", False)\n\n if is_partial:\n print(f\"\\r\ud83c\udfa4 Listening: {text}...\", end=\"\", flush=True)\n else:\n print(f\"\\n\u2705 You said: {text}\")\n\n elif message.type == \"llm_response_streaming\" and message.data:\n token = message.data.get(\"token\", \"\")\n is_complete = message.data.get(\"is_complete\", False)\n\n if token:\n print(token, end=\"\", flush=True)\n if is_complete:\n print() # New line\n\n elif message.type == \"tts_audio\" and message.data:\n text = message.data.get(\"text\", \"\")\n if text and not conversation_state[\"speaking\"]:\n print(f\"\ud83d\udd0a AI speaking: {text}\")\n conversation_state[\"speaking\"] = True\n\n # Custom audio processing instead of local playback\n def custom_audio_handler(segment):\n \"\"\"Process each 
audio segment with custom logic\"\"\"\n print(f\"\ud83c\udfb5 Processing audio: {segment.text}\")\n\n # Option 1: Save to file\n audio_data = base64.b64decode(segment.audio_data)\n filename = f\"audio_{segment.segment_id}.wav\"\n with open(filename, \"wb\") as f:\n f.write(audio_data)\n print(f\"\ud83d\udcbe Saved audio to: {filename}\")\n\n # Option 2: Send to external audio player\n # subprocess.run([\"ffplay\", \"-nodisp\", \"-autoexit\", filename])\n\n # Option 3: Stream to audio device\n # your_audio_device.play(audio_data)\n\n # Option 4: Convert format\n # converted_audio = convert_audio_format(audio_data, target_format)\n\n # Option 5: Process with AI/ML\n # audio_features = extract_audio_features(audio_data)\n # emotion_score = analyze_emotion(audio_features)\n\n # Process all available audio segments\n processed_count = client.process_audio_queue(\n custom_audio_handler,\n consume_all=True\n )\n print(f\"\u2705 Processed {processed_count} audio segments\")\n\n elif message.type == \"speech_interruption\":\n print(\"\\n\ud83d\uded1 Speech interrupted\")\n conversation_state[\"speaking\"] = False\n\n # Register message handler\n client.on_message(handle_messages)\n\n # Connection handler\n def handle_connection(state):\n if state.name == \"CONNECTED\":\n print(\"\u2705 Connected to voice assistant\")\n elif state.name == \"DISCONNECTED\":\n print(\"\u274c Disconnected from voice assistant\")\n\n client.on_connection_change(handle_connection)\n\n try:\n print(\"\ud83c\udfa4 Voice Assistant with Custom Audio Processing\")\n print(\"Audio will be saved to files instead of played locally\")\n print(\"Speak into your microphone...\")\n print(\"Press Ctrl+C to stop\")\n\n # Stream microphone with custom audio handling\n async with client:\n await client.stream_microphone(\n duration=0, # Infinite recording\n auto_connect=True, # Auto-connect to service\n auto_playback=False, # Disable automatic playback - we handle it\n verbose=False, # Clean output\n )\n\n except KeyboardInterrupt:\n print(\"\\n\ud83d\udc4b Custom audio processing stopped\")\n finally:\n await client.disconnect()\n client.cleanup()\n\nif __name__ == \"__main__\":\n asyncio.run(main())\n```\n\n**Key Features of Custom Audio Processing:**\n\n- \ud83c\udf9b\ufe0f **Full Control**: Complete control over audio handling instead of automatic playback\n- \ud83d\udcbe **Save to Files**: Save audio segments as individual WAV files\n- \ud83d\udd04 **Format Conversion**: Convert audio to different formats before processing\n- \ud83c\udfb5 **External Players**: Send audio to external audio players or devices\n- \ud83e\udd16 **AI Processing**: Analyze audio with machine learning models\n- \ud83d\udcca **Audio Analytics**: Extract features, analyze emotion, or process speech patterns\n- \ud83d\udd0c **Integration**: Easily integrate with existing audio pipelines\n\n**Use Cases:**\n\n- Recording conversations for later playback\n- Building custom audio players with UI controls\n- Streaming audio to multiple devices simultaneously\n- Processing audio with AI/ML models for analysis\n- Converting audio formats for different platforms\n- Creating audio archives or transcription systems\n\n## Configuration\n\n### Environment Variables\n\n```bash\n# Required: Your Vocals API key\nexport VOCALS_DEV_API_KEY=\"vdev_your_api_key_here\"\n\n```\n\n### Audio Configuration\n\n```python\nfrom vocals import VocalsClient, AudioConfig\n\n# Create custom audio configuration\naudio_config = AudioConfig(\n sample_rate=24000, # Sample rate in Hz\n 
channels=1, # Number of audio channels\n format=\"pcm_f32le\", # Audio format\n buffer_size=1024, # Audio buffer size\n)\n\n# Use with client\nclient = VocalsClient(audio_config=audio_config)\n```\n\n### SDK Configuration\n\n```python\nfrom vocals import VocalsClient, get_default_config\n\n# Get default configuration\nconfig = get_default_config()\n\n# Customize configuration\nconfig.max_reconnect_attempts = 5\nconfig.reconnect_delay = 2.0\nconfig.auto_connect = True\nconfig.token_refresh_buffer = 60.0\n\n# Use with client\nclient = VocalsClient(config=config)\n```\n\n## Complete API Reference\n\nThe Vocals SDK provides comprehensive control over voice processing, connection management, audio playback, and event handling. Here's a complete reference of all available controls:\n\n**\ud83c\udf9b\ufe0f Main Control Categories:**\n\n- **SDK Creation & Configuration** - Initialize and configure the SDK\n- **Stream Methods** - Control microphone and file streaming\n- **Connection Management** - Connect, disconnect, and manage WebSocket connections\n- **Audio Playback** - Control TTS audio playback, queueing, and timing\n- **Event Handling** - Register handlers for messages, connections, errors, and audio data\n- **State Management** - Access real-time state information\n- **Device Management** - Manage and test audio devices\n\n**\ud83d\udccb Quick Reference:**\n| Control Category | Key Methods | Purpose |\n|------------------|-------------|---------|\n| **Streaming** | `stream_microphone()`, `stream_audio_file()` | Start voice/audio processing |\n| **Connection** | `connect()`, `disconnect()`, `reconnect()` | Manage WebSocket connection |\n| **Recording** | `start_recording()`, `stop_recording()` | Control audio input |\n| **Playback** | `play_audio()`, `pause_audio()`, `stop_audio()` | Control TTS audio output |\n| **Queue** | `clear_queue()`, `add_to_queue()`, `get_audio_queue()` | Manage audio queue |\n| **Events** | `on_message()`, `on_connection_change()`, `on_error()` | Handle events |\n| **State** | `get_is_connected()`, `get_is_playing()`, `get_recording_state()` | Check current state |\n\n### Core Functions\n\n- `VocalsClient(config?, audio_config?, user_id?, modes?)` - Create client instance\n- `get_default_config()` - Get default configuration\n- `AudioConfig(...)` - Audio configuration class\n\n#### `VocalsClient()` Constructor\n\n```python\nVocalsClient(\n config: Optional[VocalsConfig] = None,\n audio_config: Optional[AudioConfig] = None,\n user_id: Optional[str] = None,\n modes: List[str] = [] # Controls client behavior\n)\n```\n\n**Parameters:**\n\n- `config`: Client configuration options (connection, logging, etc.)\n- `audio_config`: Audio processing configuration (sample rate, channels, etc.)\n- `user_id`: Optional user ID for token generation\n- `modes`: List of modes to control client behavior\n\n**Modes:**\n\n- `[]` (empty list): **Default Experience** - Full auto-contained behavior with automatic handlers\n- `['transcription']`: **Controlled** - Only transcription-related internal processing\n- `['voice_assistant']`: **Controlled** - Only AI response handling and speech interruption\n- `['transcription', 'voice_assistant']`: **Controlled** - Both features, but no automatic handlers\n\n### Audio Configuration\n\n```python\nAudioConfig(\n sample_rate: int = 24000, # Sample rate in Hz\n channels: int = 1, # Number of audio channels\n format: str = \"pcm_f32le\", # Audio format\n buffer_size: int = 1024, # Audio buffer size\n)\n```\n\n### Stream Methods\n\n#### 
### Audio Configuration

```python
AudioConfig(
    sample_rate: int = 24000,   # Sample rate in Hz
    channels: int = 1,          # Number of audio channels
    format: str = "pcm_f32le",  # Audio format
    buffer_size: int = 1024,    # Audio buffer size
)
```

### Stream Methods

#### `stream_microphone()` Parameters

```python
await client.stream_microphone(
    duration: float = 30.0,            # Recording duration in seconds (0 for infinite)
    auto_connect: bool = True,         # Whether to automatically connect if not connected
    auto_playback: bool = True,        # Whether to automatically play received audio
    verbose: bool = True,              # Whether to log detailed progress
    stats_tracking: bool = True,       # Whether to track and return statistics
    amplitude_threshold: float = 0.01  # Minimum amplitude to consider as speech
)
```

**Important:** In the **Controlled Experience** (with modes), TTS audio is always added to the queue, but `auto_playback=False` prevents automatic playback. You must manually call `client.play_audio()` to play queued audio.

#### `stream_audio_file()` Parameters

```python
await client.stream_audio_file(
    file_path: str,            # Path to the audio file to stream
    chunk_size: int = 1024,    # Size of each chunk to send
    verbose: bool = True,      # Whether to log detailed progress
    auto_connect: bool = True  # Whether to automatically connect if not connected
)
```
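For instance, a minimal sketch that streams a local WAV file with the defaults above; the file path is a placeholder:

```python
import asyncio

from vocals import VocalsClient

async def main():
    client = VocalsClient()
    try:
        # auto_connect=True (the default) establishes the connection first
        await client.stream_audio_file("path/to/recording.wav")
    finally:
        await client.disconnect()
        client.cleanup()

asyncio.run(main())
```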
### Connection & Recording Methods

```python
await client.connect()          # Connect to WebSocket
await client.disconnect()       # Disconnect from WebSocket
await client.reconnect()        # Reconnect to WebSocket
await client.start_recording()  # Start recording
await client.stop_recording()   # Stop recording
```

### Audio Playback Methods

```python
await client.play_audio()              # Start/resume audio playback
await client.pause_audio()             # Pause audio playback
await client.stop_audio()              # Stop audio playback
await client.fade_out_audio(duration)  # Fade out audio over specified duration
client.clear_queue()                   # Clear the audio playback queue
client.add_to_queue(segment)           # Add audio segment to queue
```

### Event Handlers

```python
client.on_message(handler)            # Handle incoming messages
client.on_connection_change(handler)  # Handle connection state changes
client.on_error(handler)              # Handle errors
client.on_audio_data(handler)         # Handle audio data
```

**Handler Functions:**

- `handler(message)` - Message handler receives WebSocket messages
- `handler(connection_state)` - Connection handler receives connection state changes
- `handler(error)` - Error handler receives error objects
- `handler(audio_data)` - Audio data handler receives real-time audio data

### Properties

```python
# Connection properties
client.connection_state   # Get current connection state
client.is_connected       # Check if connected
client.is_connecting      # Check if connecting

# Recording properties
client.recording_state    # Get current recording state
client.is_recording       # Check if recording

# Playback properties
client.playback_state     # Get current playback state
client.is_playing         # Check if playing audio
client.audio_queue        # Get current audio queue
client.current_segment    # Get currently playing segment
client.current_amplitude  # Get current audio amplitude

# Token properties
client.token              # Get current token
client.token_expires_at   # Get token expiration timestamp
```

### Utility Methods

```python
client.set_user_id(user_id)          # Set user ID for token generation
client.cleanup()                     # Clean up resources
client.process_audio_queue(handler)  # Process audio queue with custom handler
```

### Utility Functions

These utility functions work with both the class-based and functional APIs:

```python
# Message handlers
create_enhanced_message_handler(
    verbose: bool = True,
    show_transcription: bool = True,
    show_responses: bool = True,
    show_streaming: bool = True,
    show_detection: bool = False
)

# Conversation tracking
create_conversation_tracker()

# Statistics tracking
create_microphone_stats_tracker(verbose: bool = True)

# Connection handlers
create_default_connection_handler(verbose: bool = True)
create_default_error_handler(verbose: bool = True)
```

### Audio Device Management

```python
# Device management
list_audio_devices()                    # List available audio devices
get_default_audio_device()              # Get default audio device
test_audio_device(device_id, duration)  # Test audio device
validate_audio_device(device_id)        # Validate audio device
get_audio_device_info(device_id)        # Get device information
print_audio_devices()                   # Print formatted device list
create_audio_device_selector()          # Interactive device selector
```
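A short sketch tying these helpers together to inspect and check a device before streaming. It assumes the helpers are importable from the top-level `vocals` package (as the utility functions above are) and that `validate_audio_device()` returns a truthy value for usable devices; device ID `1` is only an example:

```python
from vocals import print_audio_devices, test_audio_device, validate_audio_device

# Print a formatted list of every device the SDK can see
print_audio_devices()

device_id = 1  # example ID; pick one from the printed list
if validate_audio_device(device_id):
    # Record a short sample to confirm the device actually works
    test_audio_device(device_id, duration=3)
else:
    print(f"Device {device_id} failed validation")
```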
### Auto-Playback Behavior

**Default Experience (no modes):**

- `auto_playback=True` (default): TTS audio plays automatically
- `auto_playback=False`: TTS audio is added to the queue but doesn't play automatically

**Controlled Experience (with modes):**

- `auto_playback=True`: TTS audio is added to the queue and plays automatically
- `auto_playback=False`: TTS audio is added to the queue but requires a manual `client.play_audio()` call

**Key Point:** In controlled mode, TTS audio is **always** added to the queue regardless of the `auto_playback` setting. The `auto_playback` parameter only controls whether playback starts automatically.

### Message Types

Common message types you'll receive in handlers:

```python
# Transcription messages
{
    "type": "transcription",
    "data": {
        "text": "Hello world",
        "is_partial": False,
        "segment_id": "abc123"
    }
}

# LLM streaming response
{
    "type": "llm_response_streaming",
    "data": {
        "token": "Hello",
        "accumulated_response": "Hello",
        "is_complete": False,
        "segment_id": "def456"
    }
}

# TTS audio
{
    "type": "tts_audio",
    "data": {
        "text": "Hello there",
        "audio_data": "base64_encoded_wav_data",
        "sample_rate": 24000,
        "segment_id": "ghi789",
        "duration_seconds": 1.5
    }
}

# Speech interruption
{
    "type": "speech_interruption",
    "data": {}
}
```
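Tying the last two sections together, a minimal sketch of a controlled-mode handler that dispatches on `message.type` and starts playback manually when TTS audio arrives:

```python
import asyncio

from vocals import VocalsClient

client = VocalsClient(modes=["transcription", "voice_assistant"])

def handle_message(message):
    data = message.data or {}
    if message.type == "transcription" and not data.get("is_partial"):
        print(f"📝 You said: {data.get('text', '')}")
    elif message.type == "llm_response_streaming" and data.get("is_complete"):
        print(f"🤖 AI: {data.get('accumulated_response', '')}")
    elif message.type == "tts_audio":
        # Always queued in controlled mode; playback is up to us
        asyncio.create_task(client.play_audio())
    elif message.type == "speech_interruption":
        print("🛑 Interrupted")

client.on_message(handle_message)
```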
## Testing Your Setup

After setting up the SDK, you can test all the controls to ensure everything is working properly:

### 1. Test Basic Audio Setup

```bash
# List available audio devices
vocals devices

# Test your microphone
vocals test-device

# Run system diagnostics
vocals diagnose
```

### 2. Test Default Experience

```python
import asyncio
from vocals import VocalsClient

async def test_default():
    """Test default experience with automatic handlers"""
    client = VocalsClient()  # No modes = full automatic experience

    print("🎤 Testing default experience...")
    print("Speak and listen for AI responses...")

    # Test with automatic playback
    async with client:
        await client.stream_microphone(
            duration=15.0,
            auto_playback=True,  # Should auto-play TTS
            verbose=False
        )

    print("✅ Default experience test completed")

asyncio.run(test_default())
```

### 3. Test Controlled Experience

```python
import asyncio
from vocals import VocalsClient

async def test_controlled():
    """Test controlled experience with manual handlers"""
    client = VocalsClient(modes=['transcription', 'voice_assistant'])

    # Track what we receive
    received_messages = []

    def test_handler(message):
        received_messages.append(message.type)
        print(f"✅ Received: {message.type}")

        # Test manual playback control
        if message.type == "tts_audio":
            print("🔊 Manually triggering playback...")
            asyncio.create_task(client.play_audio())

    # Register handler
    client.on_message(test_handler)

    print("🎤 Testing controlled experience...")
    print("Should receive transcription and TTS messages...")

    # Test with manual playback control
    async with client:
        await client.stream_microphone(
            duration=15.0,
            auto_playback=False,  # We control playback manually
            verbose=False
        )

    print(f"📊 Received message types: {set(received_messages)}")

    # Verify we got the expected message types
    expected_types = ["transcription", "tts_audio"]
    for msg_type in expected_types:
        if msg_type in received_messages:
            print(f"✅ {msg_type} messages working")
        else:
            print(f"❌ {msg_type} messages not received")

    print("✅ Controlled experience test completed")

asyncio.run(test_controlled())
```

### 4. Test Audio Playback Controls

```python
import asyncio
from vocals import VocalsClient

async def test_playback_controls():
    """Test all audio playback controls"""
    client = VocalsClient(modes=['transcription', 'voice_assistant'])

    # Test queue management
    print("🎵 Testing audio playback controls...")

    # Check initial state
    print(f"Initial queue size: {len(client.audio_queue)}")
    print(f"Is playing: {client.is_playing}")

    def audio_handler(message):
        if message.type == "tts_audio":
            print(f"🎵 Audio received: {message.data.get('text', '')}")
            print(f"Queue size: {len(client.audio_queue)}")

    client.on_message(audio_handler)

    # Stream and collect audio
    async with client:
        await client.stream_microphone(
            duration=10.0,
            auto_playback=False,  # Don't auto-play
            verbose=False
        )

    # Test manual controls
    queue_size = len(client.audio_queue)
    if queue_size > 0:
        print(f"✅ {queue_size} audio segments in queue")

        print("🎵 Testing play_audio()...")
        await client.play_audio()

        # Wait a moment then test pause
        await asyncio.sleep(1)
        print("⏸️ Testing pause_audio()...")
        await client.pause_audio()

        print("▶️ Testing play_audio() again...")
        await client.play_audio()

        # Test stop
        await asyncio.sleep(1)
        print("⏹️ Testing stop_audio()...")
        await client.stop_audio()

        print("🗑️ Testing clear_queue()...")
        client.clear_queue()
        print(f"Queue size after clear: {len(client.audio_queue)}")

        print("✅ All playback controls working!")
    else:
        print("❌ No audio received to test playback controls")

    await client.disconnect()
    client.cleanup()

asyncio.run(test_playback_controls())
```

### 5. Test All Event Handlers

```python
import asyncio
from vocals import VocalsClient

async def test_event_handlers():
    """Test all event handler types"""
    client = VocalsClient(modes=['transcription', 'voice_assistant'])

    # Track events
    events_received = {
        'messages': 0,
        'connections': 0,
        'errors': 0,
        'audio_data': 0
    }

    def message_handler(message):
        events_received['messages'] += 1
        print(f"📩 Message: {message.type}")

    def connection_handler(state):
        events_received['connections'] += 1
        print(f"🔌 Connection: {state.name}")

    def error_handler(error):
        events_received['errors'] += 1
        print(f"❌ Error: {error.message}")

    def audio_data_handler(audio_data):
        events_received['audio_data'] += 1
        if events_received['audio_data'] % 100 == 0:  # Log every 100th
            print(f"🎤 Audio data chunks: {events_received['audio_data']}")

    # Register all handlers
    client.on_message(message_handler)
    client.on_connection_change(connection_handler)
    client.on_error(error_handler)
    client.on_audio_data(audio_data_handler)

    print("🧪 Testing all event handlers...")

    async with client:
        await client.stream_microphone(
            duration=10.0,
            auto_playback=False,
            verbose=False
        )

    # Report results
    print("\n📊 Event Handler Test Results:")
    for event_type, count in events_received.items():
        status = "✅" if count > 0 else "❌"
        print(f"  {status} {event_type}: {count}")

asyncio.run(test_event_handlers())
```

### 6. Validate All Controls Are Working

Run this comprehensive test to verify everything:

```bash
# Create a test script
cat > test_all_controls.py << 'EOF'
import asyncio
from vocals import VocalsClient

async def comprehensive_test():
    """Comprehensive test of all client controls"""
    print("🧪 Comprehensive Client Control Test")
    print("=" * 50)

    # Test 1: Default mode
    print("\n1️⃣ Testing Default Mode...")
    client1 = VocalsClient()
    async with client1:
        await client1.stream_microphone(duration=5.0, verbose=False)
    print("✅ Default mode test completed")

    # Test 2: Controlled mode
    print("\n2️⃣ Testing Controlled Mode...")
    client2 = VocalsClient(modes=['transcription', 'voice_assistant'])

    message_count = 0
    def counter(message):
        nonlocal message_count
        message_count += 1
        if message.type == "tts_audio":
            asyncio.create_task(client2.play_audio())

    client2.on_message(counter)
    async with client2:
        await client2.stream_microphone(duration=5.0, auto_playback=False, verbose=False)
    print(f"✅ Controlled mode test completed - {message_count} messages")

    # Test 3: All controls
    print("\n3️⃣ Testing Individual Controls...")
    client3 = VocalsClient()

    # Test properties
    print(f"  Connection state: {client3.connection_state.name}")
    print(f"  Is connected: {client3.is_connected}")
    print(f"  Recording state: {client3.recording_state.name}")
    print(f"  Is recording: {client3.is_recording}")
    print(f"  Playback state: {client3.playback_state.name}")
    print(f"  Is playing: {client3.is_playing}")
    print(f"  Queue length: {len(client3.audio_queue)}")
    print(f"  Current amplitude: {client3.current_amplitude}")

    await client3.disconnect()
    client3.cleanup()
    print("✅ All controls test completed")

    print("\n🎉 All tests completed successfully!")

if __name__ == "__main__":
    asyncio.run(comprehensive_test())
EOF

# Run the test
python test_all_controls.py
```

This test suite validates that all client controls are working properly.
## CLI Tools

The SDK includes powerful command-line tools for setup, testing, and debugging:

### Setup & Configuration

```bash
# Interactive setup wizard
vocals setup

# List available audio devices
vocals devices

# Test a specific audio device
vocals test-device 1 --duration 5

# Generate diagnostic report
vocals diagnose
```

### Development Tools

```bash
# Run all tests
vocals test

# Run a demo session
vocals demo --duration 30 --verbose

# Create project templates
vocals create-template voice_assistant
vocals create-template file_processor
vocals create-template conversation_tracker
vocals create-template advanced_voice_assistant
```

**Available Templates:**

- `voice_assistant`: Simple voice assistant (**Default Experience**)
- `file_processor`: Process audio files (**Default Experience**)
- `conversation_tracker`: Track conversations (**Controlled Experience**)
- `advanced_voice_assistant`: Full control voice assistant (**Controlled Experience**)

All templates use the modern **class-based API** with `VocalsClient`.

### Advanced Features

```bash
# Performance monitoring
vocals demo --duration 60 --stats

# Custom audio device
vocals demo --device 2

# Debug mode
VOCALS_DEBUG_LEVEL=DEBUG vocals demo
```

## Error Handling

The client provides comprehensive error handling:

```python
from vocals import VocalsClient, VocalsError

async def main():
    client = VocalsClient()

    try:
        async with client:
            await client.stream_microphone(duration=10.0)
    except VocalsError as e:
        print(f"Vocals client error: {e}")
    except Exception as e:
        print(f"Unexpected error: {e}")
        # Manual cleanup if context manager fails
        await client.disconnect()
        client.cleanup()

# Alternative without context manager
async def main_manual():
    client = VocalsClient()

    try:
        await client.stream_microphone(duration=10.0)
    except VocalsError as e:
        print(f"Vocals client error: {e}")
    except Exception as e:
        print(f"Unexpected error: {e}")
    finally:
        await client.disconnect()
        client.cleanup()
```

## Troubleshooting

### Common Issues

1. **"API key not found"**

   - Set the environment variable: `export VOCALS_DEV_API_KEY="your_key"`
   - Or create a `.env` file containing the key
   - Ensure the `.env` file is actually loaded (e.g., with `python-dotenv`)

2. **"Connection failed"**

   - Check your internet connection
   - Verify your API key is valid
   - Check that the WebSocket endpoint is accessible
   - Try increasing `max_reconnect_attempts` in the config

3. **"No audio input detected"**

   - Check microphone permissions
   - Verify the microphone works (use `vocals devices` to list devices)
   - Lower the `amplitude_threshold` parameter (e.g., to 0.005); see the sketch after this list
   - Test with `vocals test-device <id>`

4. **Audio playback issues**

   - Ensure speakers/headphones are connected
   - Check system audio settings
   - Try different audio formats or sample rates in `AudioConfig`

5. **High latency**

   - Check network speed
   - Reduce `buffer_size` in `AudioConfig`
   - Ensure no other apps are using high bandwidth

6. **Dependency errors**

   - Reinstall the SDK: `pip install --force-reinstall vocals`
   - On Linux, ensure PortAudio is installed (see Installation)
   - Try creating a fresh virtual environment
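For issue 3 above, quiet microphones often just need a lower speech threshold. A minimal sketch, halving the default `amplitude_threshold` of 0.01:

```python
import asyncio

from vocals import VocalsClient

async def main():
    client = VocalsClient()
    async with client:
        await client.stream_microphone(
            duration=15.0,
            amplitude_threshold=0.005,  # half the 0.01 default, for quiet input
        )

asyncio.run(main())
```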
If issues persist, run `vocals diagnose` and share the output when reporting bugs.

### Debug Mode

Enable debug logging to troubleshoot issues:

```python
import logging

# Enable debug logging
logging.basicConfig(level=logging.DEBUG)

# Or for specific modules
logging.getLogger("vocals").setLevel(logging.DEBUG)
```

## Examples

Check out the included examples:

- [`examples/example_microphone_streaming.py`](examples/example_microphone_streaming.py) - Comprehensive microphone streaming examples
- [`examples/example_file_playback.py`](examples/example_file_playback.py) - Audio file playback examples
- [`examples/run_examples.sh`](examples/run_examples.sh) - Script to run the examples with proper setup

## Contributing

Contributions are welcome! Please follow these steps:

1. Fork the repository
2. Create a feature branch (`git checkout -b feature/AmazingFeature`)
3. Commit your changes (`git commit -m 'Add some AmazingFeature'`)
4. Push to the branch (`git push origin feature/AmazingFeature`)
5. Open a Pull Request

For major changes, please open an issue first to discuss what you would like to change.

See [CONTRIBUTING.md](CONTRIBUTING.md) for more details.

## Support

For support, documentation, and updates:

- 📖 [Documentation](https://docs.vocals.dev)
- 🐛 [Issues](https://github.com/vocals/vocals-sdk-python/issues)
- 💬 [Support](mailto:support@vocals.dev)

## License

MIT License - see the LICENSE file for details.
"bugtrack_url": null,
"license": null,
"summary": "A Python SDK for voice processing and real-time audio communication",
"version": "1.0.984",
"project_urls": {
"Bug Reports": "https://github.com/vocals/vocals-sdk-python/issues",
"Documentation": "https://docs.vocals.dev",
"Homepage": "https://github.com/hairetsucodes/vocals-sdk-python",
"Source": "https://github.com/vocals/vocals-sdk-python"
},
"split_keywords": [
"vocals",
" audio",
" speech",
" websocket",
" real-time",
" voice processing"
],
"urls": [
{
"comment_text": null,
"digests": {
"blake2b_256": "90a95daae33861ca141730e048ac59acc4ad66f5fd6ba6cde5239f2a33664453",
"md5": "cabdc160f95b90df69f07dadbd7375c5",
"sha256": "c351a2c8283242298318f5f568d72184b747cd09d7098db75eb658cf6672dcf6"
},
"downloads": -1,
"filename": "vocals-1.0.984-py3-none-any.whl",
"has_sig": false,
"md5_digest": "cabdc160f95b90df69f07dadbd7375c5",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.8",
"size": 67468,
"upload_time": "2025-07-11T04:52:56",
"upload_time_iso_8601": "2025-07-11T04:52:56.004429Z",
"url": "https://files.pythonhosted.org/packages/90/a9/5daae33861ca141730e048ac59acc4ad66f5fd6ba6cde5239f2a33664453/vocals-1.0.984-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": null,
"digests": {
"blake2b_256": "db19580616ee1899cb2156601a9bef458be863b999cbc7d8d5fce8867e5e4445",
"md5": "783c54db7beda48fef269b79d547b87c",
"sha256": "f74ad21ba0a68ff80e39da723260141a9e3cbd7e7253af54454bef6ae58e70d9"
},
"downloads": -1,
"filename": "vocals-1.0.984.tar.gz",
"has_sig": false,
"md5_digest": "783c54db7beda48fef269b79d547b87c",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.8",
"size": 101230,
"upload_time": "2025-07-11T04:52:57",
"upload_time_iso_8601": "2025-07-11T04:52:57.301680Z",
"url": "https://files.pythonhosted.org/packages/db/19/580616ee1899cb2156601a9bef458be863b999cbc7d8d5fce8867e5e4445/vocals-1.0.984.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2025-07-11 04:52:57",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "hairetsucodes",
"github_project": "vocals-sdk-python",
"travis_ci": false,
"coveralls": false,
"github_actions": false,
"requirements": [
{
"name": "aiohttp",
"specs": [
[
">=",
"3.8.0"
]
]
},
{
"name": "websockets",
"specs": [
[
">=",
"11.0.0"
]
]
},
{
"name": "sounddevice",
"specs": [
[
">=",
"0.4.6"
]
]
},
{
"name": "numpy",
"specs": [
[
">=",
"1.21.0"
]
]
},
{
"name": "PyJWT",
"specs": [
[
">=",
"2.8.0"
]
]
},
{
"name": "python-dotenv",
"specs": [
[
">=",
"1.0.0"
]
]
},
{
"name": "typing-extensions",
"specs": [
[
">=",
"4.0.0"
]
]
},
{
"name": "pyaudio",
"specs": [
[
">=",
"0.2.11"
]
]
},
{
"name": "soundfile",
"specs": [
[
">=",
"0.12.1"
]
]
},
{
"name": "click",
"specs": [
[
">=",
"8.0.0"
]
]
},
{
"name": "psutil",
"specs": [
[
">=",
"5.9.0"
]
]
},
{
"name": "matplotlib",
"specs": [
[
">=",
"3.5.0"
]
]
}
],
"lcname": "vocals"
}