# Vocals SDK Python
[![PyPI version](https://badge.fury.io/py/vocals.svg)](https://badge.fury.io/py/vocals)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
[![GitHub issues](https://img.shields.io/github/issues/hairetsucodes/vocals-sdk-python.svg)](https://github.com/hairetsucodes/vocals-sdk-python/issues)
A Python SDK for voice processing and real-time audio communication with AI assistants. Stream microphone input or audio files to receive live transcription, AI responses, and text-to-speech audio.
**Features both class-based and functional interfaces** for maximum flexibility and ease of use.
## Features
- 🎤 **Real-time microphone streaming** with voice activity detection
- 📁 **Audio file playback** support (WAV format)
- ✨ **Live transcription** with partial and final results
- 🤖 **Streaming AI responses** with real-time text display
- 🔊 **Text-to-speech playback** with automatic audio queueing
- 📊 **Conversation tracking** and session statistics
- 🚀 **Easy setup** with minimal configuration required
- 🔄 **Auto-reconnection** and robust error handling
- 🎛️ **Class-based API** with modern Python patterns
- 🔀 **Context manager support** for automatic cleanup
## Table of Contents
- [Features](#features)
- [Installation](#installation)
- [Quick Start](#quick-start)
- [SDK Modes](#sdk-modes)
- [Advanced Usage](#advanced-usage)
- [Configuration](#configuration)
- [Complete API Reference](#complete-api-reference)
- [Testing Your Setup](#testing-your-setup)
- [CLI Tools](#cli-tools)
- [Error Handling](#error-handling)
- [Troubleshooting](#troubleshooting)
- [Examples](#examples)
- [Contributing](#contributing)
- [Support](#support)
- [License](#license)
## Installation
```bash
pip install vocals
```
### Quick Setup
After installation, use the built-in setup wizard to configure your environment:
```bash
vocals setup
```
Or test your installation:
```bash
vocals test
```
Run a quick demo:
```bash
vocals demo
```
### 🌐 Web UI Demo
**NEW!** Launch an interactive web interface to try the voice assistant:
```bash
vocals demo --ui
```
This will:
- ✅ **Automatically install Gradio** (if not already installed)
- 🚀 **Launch a web interface** in your browser
- 🎤 **Enable real-time voice interaction** with visual feedback
- 📱 **Provide an easy-to-use interface** with buttons and live updates
- 🔊 **Show live transcription and AI responses** in the browser
**Perfect for:**
- 🎯 **Quick demonstrations** and testing
- 👥 **Showing to others** without the command line
- 🖥️ **Visual feedback** and status indicators
- 📊 **Real-time conversation tracking**
The web UI provides the same functionality as the command line demo but with an intuitive graphical interface that's perfect for demonstrations and interactive testing.
### System Requirements
- Python 3.8 or higher
- Working microphone (for microphone streaming)
- Audio output device (for TTS playback)
### Additional Dependencies
The SDK automatically installs all required Python dependencies including `pyaudio`, `sounddevice`, `numpy`, `websockets`, and others.
On some Linux systems, you may need to install system-level audio libraries:
**Ubuntu/Debian:**
```bash
sudo apt-get install portaudio19-dev
```
**Other Linux distributions:**
```bash
# Install portaudio development headers using your package manager
# For example, on CentOS/RHEL: sudo yum install portaudio-devel
```
## Quick Start
### 1. Get Your API Key
Set up your Vocals API key as an environment variable:
```bash
export VOCALS_DEV_API_KEY="your_api_key_here"
```
Or create a `.env` file in your project:
```
VOCALS_DEV_API_KEY=your_api_key_here
```
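If you rely on a `.env` file, make sure it is loaded before the client is created. A minimal sketch, assuming the optional `python-dotenv` package (`pip install python-dotenv`), which the SDK does not install itself:

```python
# python-dotenv is an assumed extra dependency, not part of the SDK
from dotenv import load_dotenv

load_dotenv()  # reads .env from the working directory into os.environ

# VOCALS_DEV_API_KEY is now visible to the SDK through the environment
```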
### 2. Basic Usage
The Vocals SDK provides a modern **class-based API** as the primary interface.
#### Microphone Streaming (Minimal Example)
```python
import asyncio
from vocals import VocalsClient

async def main():
    # Create client instance
    client = VocalsClient()

    # Stream microphone for 10 seconds
    await client.stream_microphone(duration=10.0)

    # Clean up
    await client.disconnect()
    client.cleanup()

if __name__ == "__main__":
    asyncio.run(main())
```
#### Audio File Playback (Minimal Example)
```python
import asyncio
from vocals import VocalsClient

async def main():
    # Create client instance
    client = VocalsClient()

    # Stream audio file
    await client.stream_audio_file("path/to/your/audio.wav")

    # Clean up
    await client.disconnect()
    client.cleanup()

if __name__ == "__main__":
    asyncio.run(main())
```
#### Context Manager Usage (Recommended)
```python
import asyncio
from vocals import VocalsClient

async def main():
    # Use context manager for automatic cleanup
    async with VocalsClient() as client:
        await client.stream_microphone(duration=10.0)

if __name__ == "__main__":
    asyncio.run(main())
```
## SDK Modes
The Vocals SDK supports two usage patterns:
### Default Experience (No Modes)
When you create the client without specifying modes, you get a full auto-contained experience:
```python
# Full experience with automatic handlers, playback, and beautiful console output
client = VocalsClient()
```
**Features:**
- ✅ Automatic transcription display with partial updates
- ✅ Streaming AI response display in real-time
- ✅ Automatic TTS audio playback
- ✅ Speech interruption handling
- ✅ Beautiful console output with emojis
- ✅ Perfect for getting started quickly
### Controlled Experience (With Modes)
When you specify modes, the client becomes passive and you control everything:
```python
# Controlled experience - you handle all logic
client = VocalsClient(modes=['transcription', 'voice_assistant'])
```
**Available Modes:**
- `'transcription'`: Enables transcription-related internal processing
- `'voice_assistant'`: Enables AI response handling and speech interruption
**Features:**
- ✅ No automatic handlers attached
- ✅ No automatic playback
- ✅ You attach your own message handlers
- ✅ You control when to play audio
- ✅ Perfect for custom applications
### Example: Controlled Experience
```python
import asyncio
from vocals import VocalsClient

async def main():
    # Create client with controlled experience
    client = VocalsClient(modes=['transcription', 'voice_assistant'])

    # Custom message handler
    def handle_messages(message):
        if message.type == "transcription" and message.data:
            text = message.data.get("text", "")
            is_partial = message.data.get("is_partial", False)
            if not is_partial:
                print(f"You said: {text}")

        elif message.type == "tts_audio" and message.data:
            text = message.data.get("text", "")
            print(f"AI speaking: {text}")
            # Manually start playback
            asyncio.create_task(client.play_audio())

    # Register your handler
    client.on_message(handle_messages)

    # Stream microphone with context manager
    async with client:
        await client.stream_microphone(
            duration=30.0,
            auto_playback=False  # We control playback
        )

if __name__ == "__main__":
    asyncio.run(main())
```
## Advanced Usage
### Enhanced Microphone Streaming
```python
import asyncio
import logging
from vocals import (
    VocalsClient,
    create_enhanced_message_handler,
    create_default_connection_handler,
    create_default_error_handler,
)

async def main():
    # Configure logging for cleaner output
    logging.getLogger("vocals").setLevel(logging.WARNING)

    # Create client with default full experience
    client = VocalsClient()

    try:
        print("🎤 Starting microphone streaming...")
        print("Speak into your microphone!")

        # Stream microphone with enhanced features
        async with client:
            stats = await client.stream_microphone(
                duration=30.0,            # Record for 30 seconds
                auto_connect=True,        # Auto-connect if needed
                auto_playback=True,       # Auto-play received audio
                verbose=False,            # Client handles display automatically
                stats_tracking=True,      # Track session statistics
                amplitude_threshold=0.01, # Voice activity detection threshold
            )

        # Print session statistics
        print(f"\n📊 Session Statistics:")
        print(f"   • Transcriptions: {stats.get('transcriptions', 0)}")
        print(f"   • AI Responses: {stats.get('responses', 0)}")
        print(f"   • TTS Segments: {stats.get('tts_segments_received', 0)}")

    except Exception as e:
        print(f"Error: {e}")

    await client.disconnect()
    client.cleanup()

if __name__ == "__main__":
    asyncio.run(main())
```
### Conversation Tracking Example
```python
import asyncio
from vocals import (
    VocalsClient,
    create_conversation_tracker,
    create_enhanced_message_handler,
)

async def main():
    # Create client with controlled experience for custom tracking
    client = VocalsClient(modes=['transcription', 'voice_assistant'])
    conversation_tracker = create_conversation_tracker()

    # Custom message handler with conversation tracking
    def tracking_handler(message):
        # Custom display logic
        if message.type == "transcription" and message.data:
            text = message.data.get("text", "")
            is_partial = message.data.get("is_partial", False)
            if not is_partial and text:
                print(f"🎤 You: {text}")

        elif message.type == "llm_response" and message.data:
            response = message.data.get("response", "")
            if response:
                print(f"🤖 AI: {response}")

        elif message.type == "tts_audio" and message.data:
            text = message.data.get("text", "")
            if text:
                print(f"🔊 Playing: {text}")
                # Manually start playback since we're in controlled mode
                asyncio.create_task(client.play_audio())

        # Track conversation based on message type
        if message.type == "transcription" and message.data:
            text = message.data.get("text", "")
            is_partial = message.data.get("is_partial", False)
            if text and not is_partial:
                conversation_tracker["add_transcription"](text, is_partial)

        elif message.type == "llm_response" and message.data:
            response = message.data.get("response", "")
            if response:
                conversation_tracker["add_response"](response)

    # Set up handler
    client.on_message(tracking_handler)

    try:
        # Stream microphone with context manager
        async with client:
            await client.stream_microphone(
                duration=15.0,
                auto_playback=False  # We handle playback manually
            )

        # Print conversation history
        print("\n" + "="*50)
        print("📜 CONVERSATION HISTORY")
        print("="*50)
        conversation_tracker["print_conversation"]()

        # Print conversation statistics
        stats = conversation_tracker["get_stats"]()
        print(f"\n📈 Session lasted {stats['duration']:.1f} seconds")

    except Exception as e:
        print(f"Error: {e}")

if __name__ == "__main__":
    asyncio.run(main())
```
### Infinite Streaming with Signal Handling
```python
import asyncio
import signal
from vocals import VocalsClient

# Global shutdown event
shutdown_event = asyncio.Event()

def setup_signal_handlers():
    """Setup signal handlers for graceful shutdown."""
    def signal_handler(signum, frame):
        if not shutdown_event.is_set():
            print(f"\n📡 Received signal {signum}, shutting down...")
            shutdown_event.set()

    signal.signal(signal.SIGINT, signal_handler)
    signal.signal(signal.SIGTERM, signal_handler)

async def main():
    setup_signal_handlers()

    # Create client
    client = VocalsClient()

    try:
        print("🎤 Starting infinite streaming...")
        print("Press Ctrl+C to stop")

        # Connect to service
        await client.connect()

        # Create streaming task
        async def stream_task():
            await client.stream_microphone(
                duration=0,  # 0 = infinite streaming
                auto_connect=True,
                auto_playback=True,
                verbose=False,
                stats_tracking=True,
            )

        # Run streaming and wait for shutdown
        streaming_task = asyncio.create_task(stream_task())
        shutdown_task = asyncio.create_task(shutdown_event.wait())

        # Wait for shutdown signal
        await shutdown_task

        # Stop recording gracefully
        await client.stop_recording()

    finally:
        # Cancel streaming task
        if 'streaming_task' in locals():
            streaming_task.cancel()
        await client.disconnect()
        client.cleanup()

if __name__ == "__main__":
    asyncio.run(main())
```
### Custom Audio Processing (Alternative to Local Playback)
Instead of playing audio locally, you can process audio segments with custom handlers. This is useful for saving audio to files, sending it to external players, or implementing your own audio processing pipeline:
```python
import asyncio
import base64
from vocals import VocalsClient

async def main():
    """Advanced voice assistant with custom audio processing"""

    # Create client with controlled mode for manual audio handling
    client = VocalsClient(modes=["transcription", "voice_assistant"])

    # Custom state tracking
    conversation_state = {"listening": False, "processing": False, "speaking": False}

    def handle_messages(message):
        """Custom message handler with audio processing control"""

        if message.type == "transcription" and message.data:
            text = message.data.get("text", "")
            is_partial = message.data.get("is_partial", False)

            if is_partial:
                print(f"\r🎤 Listening: {text}...", end="", flush=True)
            else:
                print(f"\n✅ You said: {text}")

        elif message.type == "llm_response_streaming" and message.data:
            token = message.data.get("token", "")
            is_complete = message.data.get("is_complete", False)

            if token:
                print(token, end="", flush=True)
            if is_complete:
                print()  # New line

        elif message.type == "tts_audio" and message.data:
            text = message.data.get("text", "")
            if text and not conversation_state["speaking"]:
                print(f"🔊 AI speaking: {text}")
                conversation_state["speaking"] = True

            # Custom audio processing instead of local playback
            def custom_audio_handler(segment):
                """Process each audio segment with custom logic"""
                print(f"🎵 Processing audio: {segment.text}")

                # Option 1: Save to file
                audio_data = base64.b64decode(segment.audio_data)
                filename = f"audio_{segment.segment_id}.wav"
                with open(filename, "wb") as f:
                    f.write(audio_data)
                print(f"💾 Saved audio to: {filename}")

                # Option 2: Send to external audio player
                # subprocess.run(["ffplay", "-nodisp", "-autoexit", filename])

                # Option 3: Stream to audio device
                # your_audio_device.play(audio_data)

                # Option 4: Convert format
                # converted_audio = convert_audio_format(audio_data, target_format)

                # Option 5: Process with AI/ML
                # audio_features = extract_audio_features(audio_data)
                # emotion_score = analyze_emotion(audio_features)

            # Process all available audio segments
            processed_count = client.process_audio_queue(
                custom_audio_handler,
                consume_all=True
            )
            print(f"✅ Processed {processed_count} audio segments")

        elif message.type == "speech_interruption":
            print("\n🛑 Speech interrupted")
            conversation_state["speaking"] = False

    # Register message handler
    client.on_message(handle_messages)

    # Connection handler
    def handle_connection(state):
        if state.name == "CONNECTED":
            print("✅ Connected to voice assistant")
        elif state.name == "DISCONNECTED":
            print("❌ Disconnected from voice assistant")

    client.on_connection_change(handle_connection)

    try:
        print("🎤 Voice Assistant with Custom Audio Processing")
        print("Audio will be saved to files instead of played locally")
        print("Speak into your microphone...")
        print("Press Ctrl+C to stop")

        # Stream microphone with custom audio handling
        async with client:
            await client.stream_microphone(
                duration=0,           # Infinite recording
                auto_connect=True,    # Auto-connect to service
                auto_playback=False,  # Disable automatic playback - we handle it
                verbose=False,        # Clean output
            )

    except KeyboardInterrupt:
        print("\n👋 Custom audio processing stopped")
    finally:
        await client.disconnect()
        client.cleanup()

if __name__ == "__main__":
    asyncio.run(main())
```
**Key Features of Custom Audio Processing:**
- 🎛️ **Full Control**: Complete control over audio handling instead of automatic playback
- 💾 **Save to Files**: Save audio segments as individual WAV files
- 🔄 **Format Conversion**: Convert audio to different formats before processing
- 🎵 **External Players**: Send audio to external audio players or devices
- 🤖 **AI Processing**: Analyze audio with machine learning models
- 📊 **Audio Analytics**: Extract features, analyze emotion, or process speech patterns
- 🔌 **Integration**: Easily integrate with existing audio pipelines
**Use Cases:**
- Recording conversations for later playback
- Building custom audio players with UI controls
- Streaming audio to multiple devices simultaneously
- Processing audio with AI/ML models for analysis
- Converting audio formats for different platforms
- Creating audio archives or transcription systems
## Configuration
### Environment Variables
```bash
# Required: Your Vocals API key
export VOCALS_DEV_API_KEY="vdev_your_api_key_here"
```
### Audio Configuration
```python
from vocals import VocalsClient, AudioConfig

# Create custom audio configuration
audio_config = AudioConfig(
    sample_rate=24000,   # Sample rate in Hz
    channels=1,          # Number of audio channels
    format="pcm_f32le",  # Audio format
    buffer_size=1024,    # Audio buffer size
)

# Use with client
client = VocalsClient(audio_config=audio_config)
```
### SDK Configuration
```python
from vocals import VocalsClient, get_default_config
# Get default configuration
config = get_default_config()
# Customize configuration
config.max_reconnect_attempts = 5
config.reconnect_delay = 2.0
config.auto_connect = True
config.token_refresh_buffer = 60.0
# Use with client
client = VocalsClient(config=config)
```
## Complete API Reference
The Vocals SDK provides comprehensive control over voice processing, connection management, audio playback, and event handling. Here's a complete reference of all available controls:
**🎛️ Main Control Categories:**
- **SDK Creation & Configuration** - Initialize and configure the SDK
- **Stream Methods** - Control microphone and file streaming
- **Connection Management** - Connect, disconnect, and manage WebSocket connections
- **Audio Playback** - Control TTS audio playback, queueing, and timing
- **Event Handling** - Register handlers for messages, connections, errors, and audio data
- **State Management** - Access real-time state information
- **Device Management** - Manage and test audio devices
**📋 Quick Reference:**
| Control Category | Key Methods | Purpose |
|------------------|-------------|---------|
| **Streaming** | `stream_microphone()`, `stream_audio_file()` | Start voice/audio processing |
| **Connection** | `connect()`, `disconnect()`, `reconnect()` | Manage WebSocket connection |
| **Recording** | `start_recording()`, `stop_recording()` | Control audio input |
| **Playback** | `play_audio()`, `pause_audio()`, `stop_audio()` | Control TTS audio output |
| **Queue** | `clear_queue()`, `add_to_queue()`, `get_audio_queue()` | Manage audio queue |
| **Events** | `on_message()`, `on_connection_change()`, `on_error()` | Handle events |
| **State** | `get_is_connected()`, `get_is_playing()`, `get_recording_state()` | Check current state |
### Core Functions
- `VocalsClient(config?, audio_config?, user_id?, modes?)` - Create client instance
- `get_default_config()` - Get default configuration
- `AudioConfig(...)` - Audio configuration class
#### `VocalsClient()` Constructor
```python
VocalsClient(
    config: Optional[VocalsConfig] = None,
    audio_config: Optional[AudioConfig] = None,
    user_id: Optional[str] = None,
    modes: List[str] = []  # Controls client behavior
)
```
**Parameters:**
- `config`: Client configuration options (connection, logging, etc.)
- `audio_config`: Audio processing configuration (sample rate, channels, etc.)
- `user_id`: Optional user ID for token generation
- `modes`: List of modes to control client behavior
**Modes:**
- `[]` (empty list): **Default Experience** - Full auto-contained behavior with automatic handlers
- `['transcription']`: **Controlled** - Only transcription-related internal processing
- `['voice_assistant']`: **Controlled** - Only AI response handling and speech interruption
- `['transcription', 'voice_assistant']`: **Controlled** - Both features, but no automatic handlers
### Audio Configuration
```python
AudioConfig(
    sample_rate: int = 24000,   # Sample rate in Hz
    channels: int = 1,          # Number of audio channels
    format: str = "pcm_f32le",  # Audio format
    buffer_size: int = 1024,    # Audio buffer size
)
```
### Stream Methods
#### `stream_microphone()` Parameters
```python
await client.stream_microphone(
    duration: float = 30.0,            # Recording duration in seconds (0 for infinite)
    auto_connect: bool = True,         # Whether to automatically connect if not connected
    auto_playback: bool = True,        # Whether to automatically play received audio
    verbose: bool = True,              # Whether to log detailed progress
    stats_tracking: bool = True,       # Whether to track and return statistics
    amplitude_threshold: float = 0.01  # Minimum amplitude to consider as speech
)
```
**Important:** In **Controlled Experience** (with modes), TTS audio is always added to the queue, but `auto_playback=False` prevents automatic playback. You must manually call `client.play_audio()` to play queued audio.
#### `stream_audio_file()` Parameters
```python
await client.stream_audio_file(
    file_path: str,            # Path to the audio file to stream
    chunk_size: int = 1024,    # Size of each chunk to send
    verbose: bool = True,      # Whether to log detailed progress
    auto_connect: bool = True  # Whether to automatically connect if not connected
)
```
### Connection & Recording Methods
```python
await client.connect() # Connect to WebSocket
await client.disconnect() # Disconnect from WebSocket
await client.reconnect() # Reconnect to WebSocket
await client.start_recording() # Start recording
await client.stop_recording() # Stop recording
```
### Audio Playback Methods
```python
await client.play_audio() # Start/resume audio playback
await client.pause_audio() # Pause audio playback
await client.stop_audio() # Stop audio playback
await client.fade_out_audio(duration) # Fade out audio over specified duration
client.clear_queue() # Clear the audio playback queue
client.add_to_queue(segment) # Add audio segment to queue
```
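`fade_out_audio()` is the one playback control not exercised elsewhere in this README. A minimal sketch of a soft-stop pattern, assuming the duration argument is in seconds:

```python
async def soft_stop(client):
    """Fade the current TTS segment out instead of cutting it off."""
    await client.fade_out_audio(0.5)  # assumed: fade duration in seconds
    client.clear_queue()              # drop any segments still waiting
```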
### Event Handlers
```python
client.on_message(handler) # Handle incoming messages
client.on_connection_change(handler) # Handle connection state changes
client.on_error(handler) # Handle errors
client.on_audio_data(handler) # Handle audio data
```
**Handler Functions:**
- `handler(message)` - Message handler receives WebSocket messages
- `handler(connection_state)` - Connection handler receives connection state changes
- `handler(error)` - Error handler receives error objects
- `handler(audio_data)` - Audio data handler receives real-time audio data
### Properties
```python
# Connection properties
client.connection_state # Get current connection state
client.is_connected # Check if connected
client.is_connecting # Check if connecting
# Recording properties
client.recording_state # Get current recording state
client.is_recording # Check if recording
# Playback properties
client.playback_state # Get current playback state
client.is_playing # Check if playing audio
client.audio_queue # Get current audio queue
client.current_segment # Get currently playing segment
client.current_amplitude # Get current audio amplitude
# Token properties
client.token # Get current token
client.token_expires_at # Get token expiration timestamp
```
### Utility Methods
```python
client.set_user_id(user_id) # Set user ID for token generation
client.cleanup() # Clean up resources
client.process_audio_queue(handler) # Process audio queue with custom handler
```
### Utility Functions
These utility functions work with both the class-based and functional APIs:
```python
# Message handlers
create_enhanced_message_handler(
    verbose: bool = True,
    show_transcription: bool = True,
    show_responses: bool = True,
    show_streaming: bool = True,
    show_detection: bool = False
)

# Conversation tracking
create_conversation_tracker()

# Statistics tracking
create_microphone_stats_tracker(verbose: bool = True)

# Connection handlers
create_default_connection_handler(verbose: bool = True)
create_default_error_handler(verbose: bool = True)
```
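As a sketch of how these factories can plug into the event-handler API (an assumed pairing, using only the calls documented in this README):

```python
from vocals import (
    VocalsClient,
    create_enhanced_message_handler,
    create_default_connection_handler,
    create_default_error_handler,
)

client = VocalsClient(modes=["transcription", "voice_assistant"])

# Reuse the prebuilt handlers instead of hand-writing display logic
client.on_message(create_enhanced_message_handler(verbose=False))
client.on_connection_change(create_default_connection_handler())
client.on_error(create_default_error_handler())
```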
### Audio Device Management
```python
# Device management
list_audio_devices() # List available audio devices
get_default_audio_device() # Get default audio device
test_audio_device(device_id, duration) # Test audio device
validate_audio_device(device_id) # Validate audio device
get_audio_device_info(device_id) # Get device information
print_audio_devices() # Print formatted device list
create_audio_device_selector() # Interactive device selector
```
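A short pre-flight device check, assuming these helpers are importable from the top-level `vocals` package and that device IDs match those shown by `vocals devices`:

```python
from vocals import print_audio_devices, validate_audio_device, test_audio_device

# Show a formatted list of every device the SDK can see
print_audio_devices()

device_id = 0  # hypothetical: substitute an ID from the printed list
if validate_audio_device(device_id):
    test_audio_device(device_id, 3)  # record from the device for ~3 seconds
```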
### Auto-Playback Behavior
**Default Experience (no modes):**
- `auto_playback=True` (default): TTS audio plays automatically
- `auto_playback=False`: TTS audio is added to queue but doesn't play automatically
**Controlled Experience (with modes):**
- `auto_playback=True`: TTS audio is added to queue and plays automatically
- `auto_playback=False`: TTS audio is added to queue but requires manual `client.play_audio()` call
**Key Point:** In controlled mode, TTS audio is **always** added to the queue regardless of `auto_playback` setting. The `auto_playback` parameter only controls whether playback starts automatically.
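In practice, the controlled-mode pattern condenses to a few lines; a minimal sketch using only the calls documented above:

```python
import asyncio
from vocals import VocalsClient

async def main():
    # Controlled mode: TTS audio is always queued, regardless of auto_playback
    client = VocalsClient(modes=["transcription", "voice_assistant"])

    def on_message(message):
        if message.type == "tts_audio":
            # The segment is already queued; start playback explicitly
            asyncio.create_task(client.play_audio())

    client.on_message(on_message)

    async with client:
        await client.stream_microphone(duration=10.0, auto_playback=False)

asyncio.run(main())
```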
### Message Types
Common message types you'll receive in handlers:
```python
# Transcription messages
{
    "type": "transcription",
    "data": {
        "text": "Hello world",
        "is_partial": False,
        "segment_id": "abc123"
    }
}

# LLM streaming response
{
    "type": "llm_response_streaming",
    "data": {
        "token": "Hello",
        "accumulated_response": "Hello",
        "is_complete": False,
        "segment_id": "def456"
    }
}

# TTS audio
{
    "type": "tts_audio",
    "data": {
        "text": "Hello there",
        "audio_data": "base64_encoded_wav_data",
        "sample_rate": 24000,
        "segment_id": "ghi789",
        "duration_seconds": 1.5
    }
}

# Speech interruption
{
    "type": "speech_interruption",
    "data": {}
}
```
## Testing Your Setup
After setting up the SDK, you can test all the controls to ensure everything is working properly:
### 1. Test Basic Audio Setup
```bash
# List available audio devices
vocals devices
# Test your microphone
vocals test-device
# Run system diagnostics
vocals diagnose
```
### 2. Test Default Experience
```python
import asyncio
from vocals import VocalsClient

async def test_default():
    """Test default experience with automatic handlers"""
    client = VocalsClient()  # No modes = full automatic experience

    print("🎤 Testing default experience...")
    print("Speak and listen for AI responses...")

    # Test with automatic playback
    async with client:
        await client.stream_microphone(
            duration=15.0,
            auto_playback=True,  # Should auto-play TTS
            verbose=False
        )

    print("✅ Default experience test completed")

asyncio.run(test_default())
```
### 3. Test Controlled Experience
```python
import asyncio
from vocals import VocalsClient

async def test_controlled():
    """Test controlled experience with manual handlers"""
    client = VocalsClient(modes=['transcription', 'voice_assistant'])

    # Track what we receive
    received_messages = []

    def test_handler(message):
        received_messages.append(message.type)
        print(f"✅ Received: {message.type}")

        # Test manual playback control
        if message.type == "tts_audio":
            print("🔊 Manually triggering playback...")
            asyncio.create_task(client.play_audio())

    # Register handler
    client.on_message(test_handler)

    print("🎤 Testing controlled experience...")
    print("Should receive transcription and TTS messages...")

    # Test with manual playback control
    async with client:
        await client.stream_microphone(
            duration=15.0,
            auto_playback=False,  # We control playback manually
            verbose=False
        )

    print(f"📊 Received message types: {set(received_messages)}")

    # Verify we got the expected message types
    expected_types = ["transcription", "tts_audio"]
    for msg_type in expected_types:
        if msg_type in received_messages:
            print(f"✅ {msg_type} messages working")
        else:
            print(f"❌ {msg_type} messages not received")

    print("✅ Controlled experience test completed")

asyncio.run(test_controlled())
```
### 4. Test Audio Playback Controls
```python
import asyncio
from vocals import VocalsClient

async def test_playback_controls():
    """Test all audio playback controls"""
    client = VocalsClient(modes=['transcription', 'voice_assistant'])

    # Test queue management
    print("🎵 Testing audio playback controls...")

    # Check initial state
    print(f"Initial queue size: {len(client.audio_queue)}")
    print(f"Is playing: {client.is_playing}")

    def audio_handler(message):
        if message.type == "tts_audio":
            print(f"🎵 Audio received: {message.data.get('text', '')}")
            print(f"Queue size: {len(client.audio_queue)}")

    client.on_message(audio_handler)

    # Stream and collect audio
    async with client:
        await client.stream_microphone(
            duration=10.0,
            auto_playback=False,  # Don't auto-play
            verbose=False
        )

    # Test manual controls
    queue_size = len(client.audio_queue)
    if queue_size > 0:
        print(f"✅ {queue_size} audio segments in queue")

        print("🎵 Testing play_audio()...")
        await client.play_audio()

        # Wait a moment then test pause
        await asyncio.sleep(1)
        print("⏸️ Testing pause_audio()...")
        await client.pause_audio()

        print("▶️ Testing play_audio() again...")
        await client.play_audio()

        # Test stop
        await asyncio.sleep(1)
        print("⏹️ Testing stop_audio()...")
        await client.stop_audio()

        print("🗑️ Testing clear_queue()...")
        client.clear_queue()
        print(f"Queue size after clear: {len(client.audio_queue)}")

        print("✅ All playback controls working!")
    else:
        print("❌ No audio received to test playback controls")

    await client.disconnect()
    client.cleanup()

asyncio.run(test_playback_controls())
```
### 5. Test All Event Handlers
```python
import asyncio
from vocals import VocalsClient

async def test_event_handlers():
    """Test all event handler types"""
    client = VocalsClient(modes=['transcription', 'voice_assistant'])

    # Track events
    events_received = {
        'messages': 0,
        'connections': 0,
        'errors': 0,
        'audio_data': 0
    }

    def message_handler(message):
        events_received['messages'] += 1
        print(f"📩 Message: {message.type}")

    def connection_handler(state):
        events_received['connections'] += 1
        print(f"🔌 Connection: {state.name}")

    def error_handler(error):
        events_received['errors'] += 1
        print(f"❌ Error: {error.message}")

    def audio_data_handler(audio_data):
        events_received['audio_data'] += 1
        if events_received['audio_data'] % 100 == 0:  # Log every 100th
            print(f"🎤 Audio data chunks: {events_received['audio_data']}")

    # Register all handlers
    client.on_message(message_handler)
    client.on_connection_change(connection_handler)
    client.on_error(error_handler)
    client.on_audio_data(audio_data_handler)

    print("🧪 Testing all event handlers...")

    async with client:
        await client.stream_microphone(
            duration=10.0,
            auto_playback=False,
            verbose=False
        )

    # Report results
    print("\n📊 Event Handler Test Results:")
    for event_type, count in events_received.items():
        status = "✅" if count > 0 else "❌"
        print(f"   {status} {event_type}: {count}")

asyncio.run(test_event_handlers())
```
### 6. Validate All Controls Are Working
Run this comprehensive test to verify everything:
```bash
# Create a test script
cat > test_all_controls.py << 'EOF'
import asyncio
from vocals import VocalsClient

async def comprehensive_test():
    """Comprehensive test of all client controls"""
    print("🧪 Comprehensive Client Control Test")
    print("=" * 50)

    # Test 1: Default mode
    print("\n1️⃣ Testing Default Mode...")
    client1 = VocalsClient()
    async with client1:
        await client1.stream_microphone(duration=5.0, verbose=False)
    print("✅ Default mode test completed")

    # Test 2: Controlled mode
    print("\n2️⃣ Testing Controlled Mode...")
    client2 = VocalsClient(modes=['transcription', 'voice_assistant'])

    message_count = 0

    def counter(message):
        nonlocal message_count
        message_count += 1
        if message.type == "tts_audio":
            asyncio.create_task(client2.play_audio())

    client2.on_message(counter)

    async with client2:
        await client2.stream_microphone(duration=5.0, auto_playback=False, verbose=False)
    print(f"✅ Controlled mode test completed - {message_count} messages")

    # Test 3: All controls
    print("\n3️⃣ Testing Individual Controls...")
    client3 = VocalsClient()

    # Test properties
    print(f"   Connection state: {client3.connection_state.name}")
    print(f"   Is connected: {client3.is_connected}")
    print(f"   Recording state: {client3.recording_state.name}")
    print(f"   Is recording: {client3.is_recording}")
    print(f"   Playback state: {client3.playback_state.name}")
    print(f"   Is playing: {client3.is_playing}")
    print(f"   Queue length: {len(client3.audio_queue)}")
    print(f"   Current amplitude: {client3.current_amplitude}")

    await client3.disconnect()
    client3.cleanup()
    print("✅ All controls test completed")

    print("\n🎉 All tests completed successfully!")

if __name__ == "__main__":
    asyncio.run(comprehensive_test())
EOF
# Run the test
python test_all_controls.py
```
Running this suite end-to-end validates that all of the client controls described above are working properly.
## CLI Tools
The SDK includes powerful command-line tools for setup, testing, and debugging:
### Setup & Configuration
```bash
# Interactive setup wizard
vocals setup
# List available audio devices
vocals devices
# Test a specific audio device
vocals test-device 1 --duration 5
# Generate diagnostic report
vocals diagnose
```
### Development Tools
```bash
# Run all tests
vocals test
# Run a demo session
vocals demo --duration 30 --verbose
# Create project templates
vocals create-template voice_assistant
vocals create-template file_processor
vocals create-template conversation_tracker
vocals create-template advanced_voice_assistant
```
**Available Templates:**
- `voice_assistant`: Simple voice assistant (**Default Experience**)
- `file_processor`: Process audio files (**Default Experience**)
- `conversation_tracker`: Track conversations (**Controlled Experience**)
- `advanced_voice_assistant`: Full control voice assistant (**Controlled Experience**)
All templates use the modern **class-based API** with `VocalsClient`.
### Advanced Features
```bash
# Performance monitoring
vocals demo --duration 60 --stats
# Custom audio device
vocals demo --device 2
# Debug mode
VOCALS_DEBUG_LEVEL=DEBUG vocals demo
```
## Error Handling
The client provides comprehensive error handling:
```python
from vocals import VocalsClient, VocalsError

async def main():
    client = VocalsClient()

    try:
        async with client:
            await client.stream_microphone(duration=10.0)
    except VocalsError as e:
        print(f"Vocals client error: {e}")
    except Exception as e:
        print(f"Unexpected error: {e}")
        # Manual cleanup if context manager fails
        await client.disconnect()
        client.cleanup()

# Alternative without context manager
async def main_manual():
    client = VocalsClient()

    try:
        await client.stream_microphone(duration=10.0)
    except VocalsError as e:
        print(f"Vocals client error: {e}")
    except Exception as e:
        print(f"Unexpected error: {e}")
    finally:
        await client.disconnect()
        client.cleanup()
```
## Troubleshooting
### Common Issues
1. **"API key not found"**
- Set environment variable: `export VOCALS_DEV_API_KEY="your_key"`
- Or create `.env` file with the key
- Ensure the `.env` file is actually loaded (e.g., with `python-dotenv`, as shown in Quick Start)
2. **"Connection failed"**
- Check your internet connection
- Verify API key is valid
- Check WebSocket endpoint is accessible
- Try increasing reconnect attempts in config
3. **"No audio input detected"**
- Check microphone permissions
- Verify microphone is working (use `vocals devices` to list devices)
- Adjust `amplitude_threshold` parameter lower (e.g., 0.005)
- Test with `vocals test-device <id>`
4. **Audio playback issues**
- Ensure speakers/headphones are connected
- Check system audio settings
- Try different audio formats or sample rates in AudioConfig
5. **High latency**
- Check network speed
- Reduce buffer_size in AudioConfig
- Ensure no other apps are using high bandwidth
6. **Dependency errors**
- Run `pip install -r requirements.txt` again
- For Linux: Ensure portaudio is installed
- Try creating a fresh virtual environment
If issues persist, run `vocals diagnose` and share the output when reporting bugs.
### Debug Mode
Enable debug logging to troubleshoot issues:
```python
import logging
# Enable debug logging
logging.basicConfig(level=logging.DEBUG)
# Or for specific modules
logging.getLogger("vocals").setLevel(logging.DEBUG)
```
## Examples
Check out the included examples:
- [`examples/example_microphone_streaming.py`](examples/example_microphone_streaming.py) - Comprehensive microphone streaming examples
- [`examples/example_file_playback.py`](examples/example_file_playback.py) - Audio file playback examples
- [`examples/run_examples.sh`](examples/run_examples.sh) - Script to run examples with proper setup
## Contributing
Contributions are welcome! Please follow these steps:
1. Fork the repository
2. Create a feature branch (`git checkout -b feature/AmazingFeature`)
3. Commit your changes (`git commit -m 'Add some AmazingFeature'`)
4. Push to the branch (`git push origin feature/AmazingFeature`)
5. Open a Pull Request
For major changes, please open an issue first to discuss what you would like to change.
See [CONTRIBUTING.md](CONTRIBUTING.md) for more details.
## Support
For support, documentation, and updates:
- 📖 [Documentation](https://docs.vocals.dev)
- 🐛 [Issues](https://github.com/vocals/vocals-sdk-python/issues)
- 💬 [Support](mailto:support@vocals.dev)
## License
MIT License - see LICENSE file for details.
Raw data
{
"_id": null,
"home_page": "https://github.com/hairetsucodes/vocals-sdk-python",
"name": "vocals",
"maintainer": null,
"docs_url": null,
"requires_python": ">=3.8",
"maintainer_email": null,
"keywords": "vocals, audio, speech, websocket, real-time, voice processing",
"author": "Vocals Team",
"author_email": "support@vocals.dev",
"download_url": "https://files.pythonhosted.org/packages/db/19/580616ee1899cb2156601a9bef458be863b999cbc7d8d5fce8867e5e4445/vocals-1.0.984.tar.gz",
"platform": null,
"description": "# Vocals SDK Python\n\n[](https://badge.fury.io/py/vocals)\n[](https://opensource.org/licenses/MIT)\n[](https://github.com/hairetsucodes/vocals-sdk-python/issues)\n\nA Python SDK for voice processing and real-time audio communication with AI assistants. Stream microphone input or audio files to receive live transcription, AI responses, and text-to-speech audio.\n\n**Features both class-based and functional interfaces** for maximum flexibility and ease of use.\n\n## Features\n\n- \ud83c\udfa4 **Real-time microphone streaming** with voice activity detection\n- \ud83d\udcc1 **Audio file playback** support (WAV format)\n- \u2728 **Live transcription** with partial and final results\n- \ud83e\udd16 **Streaming AI responses** with real-time text display\n- \ud83d\udd0a **Text-to-speech playback** with automatic audio queueing\n- \ud83d\udcca **Conversation tracking** and session statistics\n- \ud83d\ude80 **Easy setup** with minimal configuration required\n- \ud83d\udd04 **Auto-reconnection** and robust error handling\n- \ud83c\udf9b\ufe0f **Class-based API** with modern Python patterns\n- \ud83d\udd00 **Context manager support** for automatic cleanup\n\n## Table of Contents\n\n- [Features](#features)\n- [Installation](#installation)\n- [Quick Start](#quick-start)\n- [SDK Modes](#sdk-modes)\n- [Advanced Usage](#advanced-usage)\n- [Configuration](#configuration)\n- [Complete API Reference](#complete-api-reference)\n- [Testing Your Setup](#testing-your-setup)\n- [CLI Tools](#cli-tools)\n- [Error Handling](#error-handling)\n- [Troubleshooting](#troubleshooting)\n- [Examples](#examples)\n- [Contributing](#contributing)\n- [Support](#support)\n- [License](#license)\n\n## Installation\n\n```bash\npip install vocals\n```\n\n### Quick Setup\n\nAfter installation, use the built-in setup wizard to configure your environment:\n\n```bash\nvocals setup\n```\n\nOr test your installation:\n\n```bash\nvocals test\n```\n\nRun a quick demo:\n\n```bash\nvocals demo\n```\n\n### \ud83c\udf10 Web UI Demo\n\n**NEW!** Launch an interactive web interface to try the voice assistant:\n\n```bash\nvocals demo --ui\n```\n\nThis will:\n\n- \u2705 **Automatically install Gradio** (if not already installed)\n- \ud83d\ude80 **Launch a web interface** in your browser\n- \ud83c\udfa4 **Real-time voice interaction** with visual feedback\n- \ud83d\udcf1 **Easy-to-use interface** with buttons and live updates\n- \ud83d\udd0a **Live transcription and AI responses** in the browser\n\n**Perfect for:**\n\n- \ud83c\udfaf **Quick demonstrations** and testing\n- \ud83d\udc65 **Showing to others** without command line\n- \ud83d\udda5\ufe0f **Visual feedback** and status indicators\n- \ud83d\udcca **Real-time conversation tracking**\n\nThe web UI provides the same functionality as the command line demo but with an intuitive graphical interface that's perfect for demonstrations and interactive testing.\n\n### System Requirements\n\n- Python 3.8 or higher\n- Working microphone (for microphone streaming)\n- Audio output device (for TTS playback)\n\n### Additional Dependencies\n\nThe SDK automatically installs all required Python dependencies including `pyaudio`, `sounddevice`, `numpy`, `websockets`, and others.\n\nOn some Linux systems, you may need to install system-level audio libraries:\n\n**Ubuntu/Debian:**\n\n```bash\nsudo apt-get install portaudio19-dev\n```\n\n**Other Linux distributions:**\n\n```bash\n# Install portaudio development headers using your package manager\n# For example, on CentOS/RHEL: sudo yum install 
portaudio-devel\n```\n\n## Quick Start\n\n### 1. Get Your API Key\n\nSet up your Vocals API key as an environment variable:\n\n```bash\nexport VOCALS_DEV_API_KEY=\"your_api_key_here\"\n```\n\nOr create a `.env` file in your project:\n\n```\nVOCALS_DEV_API_KEY=your_api_key_here\n```\n\n### 2. Basic Usage\n\nThe Vocals SDK provides a modern **class-based API** as the primary interface\n\n#### Microphone Streaming (Minimal Example)\n\n```python\nimport asyncio\nfrom vocals import VocalsClient\n\nasync def main():\n # Create client instance\n client = VocalsClient()\n\n # Stream microphone for 10 seconds\n await client.stream_microphone(duration=10.0)\n\n # Clean up\n await client.disconnect()\n client.cleanup()\n\nif __name__ == \"__main__\":\n asyncio.run(main())\n```\n\n#### Audio File Playback (Minimal Example)\n\n```python\nimport asyncio\nfrom vocals import VocalsClient\n\nasync def main():\n # Create client instance\n client = VocalsClient()\n\n # Stream audio file\n await client.stream_audio_file(\"path/to/your/audio.wav\")\n\n # Clean up\n await client.disconnect()\n client.cleanup()\n\nif __name__ == \"__main__\":\n asyncio.run(main())\n```\n\n#### Context Manager Usage (Recommended)\n\n```python\nimport asyncio\nfrom vocals import VocalsClient\n\nasync def main():\n # Use context manager for automatic cleanup\n async with VocalsClient() as client:\n await client.stream_microphone(duration=10.0)\n\nif __name__ == \"__main__\":\n asyncio.run(main())\n```\n\n## SDK Modes\n\nThe Vocals SDK supports two usage patterns:\n\n### Default Experience (No Modes)\n\nWhen you create the client without specifying modes, you get a full auto-contained experience:\n\n```python\n# Full experience with automatic handlers, playback, and beautiful console output\nclient = VocalsClient()\n```\n\n**Features:**\n\n- \u2705 Automatic transcription display with partial updates\n- \u2705 Streaming AI response display in real-time\n- \u2705 Automatic TTS audio playback\n- \u2705 Speech interruption handling\n- \u2705 Beautiful console output with emojis\n- \u2705 Perfect for getting started quickly\n\n### Controlled Experience (With Modes)\n\nWhen you specify modes, the client becomes passive and you control everything:\n\n```python\n# Controlled experience - you handle all logic\nclient = VocalsClient(modes=['transcription', 'voice_assistant'])\n```\n\n**Available Modes:**\n\n- `'transcription'`: Enables transcription-related internal processing\n- `'voice_assistant'`: Enables AI response handling and speech interruption\n\n**Features:**\n\n- \u2705 No automatic handlers attached\n- \u2705 No automatic playback\n- \u2705 You attach your own message handlers\n- \u2705 You control when to play audio\n- \u2705 Perfect for custom applications\n\n### Example: Controlled Experience\n\n```python\nimport asyncio\nfrom vocals import VocalsClient\n\nasync def main():\n # Create client with controlled experience\n client = VocalsClient(modes=['transcription', 'voice_assistant'])\n\n # Custom message handler\n def handle_messages(message):\n if message.type == \"transcription\" and message.data:\n text = message.data.get(\"text\", \"\")\n is_partial = message.data.get(\"is_partial\", False)\n if not is_partial:\n print(f\"You said: {text}\")\n\n elif message.type == \"tts_audio\" and message.data:\n text = message.data.get(\"text\", \"\")\n print(f\"AI speaking: {text}\")\n # Manually start playback\n asyncio.create_task(client.play_audio())\n\n # Register your handler\n client.on_message(handle_messages)\n\n # Stream 
microphone with context manager\n async with client:\n await client.stream_microphone(\n duration=30.0,\n auto_playback=False # We control playback\n )\n\nif __name__ == \"__main__\":\n asyncio.run(main())\n```\n\n## Advanced Usage\n\n### Enhanced Microphone Streaming\n\n```python\nimport asyncio\nimport logging\nfrom vocals import (\n VocalsClient,\n create_enhanced_message_handler,\n create_default_connection_handler,\n create_default_error_handler,\n)\n\nasync def main():\n # Configure logging for cleaner output\n logging.getLogger(\"vocals\").setLevel(logging.WARNING)\n\n # Create client with default full experience\n client = VocalsClient()\n\n try:\n print(\"\ud83c\udfa4 Starting microphone streaming...\")\n print(\"Speak into your microphone!\")\n\n # Stream microphone with enhanced features\n async with client:\n stats = await client.stream_microphone(\n duration=30.0, # Record for 30 seconds\n auto_connect=True, # Auto-connect if needed\n auto_playback=True, # Auto-play received audio\n verbose=False, # Client handles display automatically\n stats_tracking=True, # Track session statistics\n amplitude_threshold=0.01, # Voice activity detection threshold\n )\n\n # Print session statistics\n print(f\"\\n\ud83d\udcca Session Statistics:\")\n print(f\" \u2022 Transcriptions: {stats.get('transcriptions', 0)}\")\n print(f\" \u2022 AI Responses: {stats.get('responses', 0)}\")\n print(f\" \u2022 TTS Segments: {stats.get('tts_segments_received', 0)}\")\n\n except Exception as e:\n print(f\"Error: {e}\")\n await client.disconnect()\n client.cleanup()\n\nif __name__ == \"__main__\":\n asyncio.run(main())\n```\n\n### Conversation Tracking Example\n\n```python\nimport asyncio\nfrom vocals import (\n VocalsClient,\n create_conversation_tracker,\n create_enhanced_message_handler,\n)\n\nasync def main():\n # Create client with controlled experience for custom tracking\n client = VocalsClient(modes=['transcription', 'voice_assistant'])\n conversation_tracker = create_conversation_tracker()\n\n # Custom message handler with conversation tracking\n def tracking_handler(message):\n # Custom display logic\n if message.type == \"transcription\" and message.data:\n text = message.data.get(\"text\", \"\")\n is_partial = message.data.get(\"is_partial\", False)\n if not is_partial and text:\n print(f\"\ud83c\udfa4 You: {text}\")\n\n elif message.type == \"llm_response\" and message.data:\n response = message.data.get(\"response\", \"\")\n if response:\n print(f\"\ud83e\udd16 AI: {response}\")\n\n elif message.type == \"tts_audio\" and message.data:\n text = message.data.get(\"text\", \"\")\n if text:\n print(f\"\ud83d\udd0a Playing: {text}\")\n # Manually start playback since we're in controlled mode\n asyncio.create_task(client.play_audio())\n\n # Track conversation based on message type\n if message.type == \"transcription\" and message.data:\n text = message.data.get(\"text\", \"\")\n is_partial = message.data.get(\"is_partial\", False)\n if text and not is_partial:\n conversation_tracker[\"add_transcription\"](text, is_partial)\n\n elif message.type == \"llm_response\" and message.data:\n response = message.data.get(\"response\", \"\")\n if response:\n conversation_tracker[\"add_response\"](response)\n\n # Set up handler\n client.on_message(tracking_handler)\n\n try:\n # Stream microphone with context manager\n async with client:\n await client.stream_microphone(\n duration=15.0,\n auto_playback=False # We handle playback manually\n )\n\n # Print conversation history\n print(\"\\n\" + \"=\"*50)\n 
print(\"\ud83d\udcdc CONVERSATION HISTORY\")\n print(\"=\"*50)\n conversation_tracker[\"print_conversation\"]()\n\n # Print conversation statistics\n stats = conversation_tracker[\"get_stats\"]()\n print(f\"\\n\ud83d\udcc8 Session lasted {stats['duration']:.1f} seconds\")\n\n except Exception as e:\n print(f\"Error: {e}\")\n\nif __name__ == \"__main__\":\n asyncio.run(main())\n```\n\n### Infinite Streaming with Signal Handling\n\n```python\nimport asyncio\nimport signal\nfrom vocals import VocalsClient\n\n# Global shutdown event\nshutdown_event = asyncio.Event()\n\ndef setup_signal_handlers():\n \"\"\"Setup signal handlers for graceful shutdown.\"\"\"\n def signal_handler(signum, frame):\n if not shutdown_event.is_set():\n print(f\"\\n\ud83d\udce1 Received signal {signum}, shutting down...\")\n shutdown_event.set()\n\n signal.signal(signal.SIGINT, signal_handler)\n signal.signal(signal.SIGTERM, signal_handler)\n\nasync def main():\n setup_signal_handlers()\n\n # Create client\n client = VocalsClient()\n\n try:\n print(\"\ud83c\udfa4 Starting infinite streaming...\")\n print(\"Press Ctrl+C to stop\")\n\n # Connect to service\n await client.connect()\n\n # Create streaming task\n async def stream_task():\n await client.stream_microphone(\n duration=0, # 0 = infinite streaming\n auto_connect=True,\n auto_playback=True,\n verbose=False,\n stats_tracking=True,\n )\n\n # Run streaming and wait for shutdown\n streaming_task = asyncio.create_task(stream_task())\n shutdown_task = asyncio.create_task(shutdown_event.wait())\n\n # Wait for shutdown signal\n await shutdown_task\n\n # Stop recording gracefully\n await client.stop_recording()\n\n finally:\n # Cancel streaming task\n if 'streaming_task' in locals():\n streaming_task.cancel()\n await client.disconnect()\n client.cleanup()\n\nif __name__ == \"__main__\":\n asyncio.run(main())\n```\n\n### Custom Audio Processing (Alternative to Local Playback)\n\nInstead of playing audio locally, you can process audio segments with custom handlers - perfect for saving audio files, sending to external players, or implementing custom audio processing:\n\n```python\nimport asyncio\nimport base64\nfrom vocals import VocalsClient\n\nasync def main():\n \"\"\"Advanced voice assistant with custom audio processing\"\"\"\n\n # Create client with controlled mode for manual audio handling\n client = VocalsClient(modes=[\"transcription\", \"voice_assistant\"])\n\n # Custom state tracking\n conversation_state = {\"listening\": False, \"processing\": False, \"speaking\": False}\n\n def handle_messages(message):\n \"\"\"Custom message handler with audio processing control\"\"\"\n\n if message.type == \"transcription\" and message.data:\n text = message.data.get(\"text\", \"\")\n is_partial = message.data.get(\"is_partial\", False)\n\n if is_partial:\n print(f\"\\r\ud83c\udfa4 Listening: {text}...\", end=\"\", flush=True)\n else:\n print(f\"\\n\u2705 You said: {text}\")\n\n elif message.type == \"llm_response_streaming\" and message.data:\n token = message.data.get(\"token\", \"\")\n is_complete = message.data.get(\"is_complete\", False)\n\n if token:\n print(token, end=\"\", flush=True)\n if is_complete:\n print() # New line\n\n elif message.type == \"tts_audio\" and message.data:\n text = message.data.get(\"text\", \"\")\n if text and not conversation_state[\"speaking\"]:\n print(f\"\ud83d\udd0a AI speaking: {text}\")\n conversation_state[\"speaking\"] = True\n\n # Custom audio processing instead of local playback\n def custom_audio_handler(segment):\n \"\"\"Process each 
audio segment with custom logic\"\"\"\n print(f\"\ud83c\udfb5 Processing audio: {segment.text}\")\n\n # Option 1: Save to file\n audio_data = base64.b64decode(segment.audio_data)\n filename = f\"audio_{segment.segment_id}.wav\"\n with open(filename, \"wb\") as f:\n f.write(audio_data)\n print(f\"\ud83d\udcbe Saved audio to: {filename}\")\n\n # Option 2: Send to external audio player\n # subprocess.run([\"ffplay\", \"-nodisp\", \"-autoexit\", filename])\n\n # Option 3: Stream to audio device\n # your_audio_device.play(audio_data)\n\n # Option 4: Convert format\n # converted_audio = convert_audio_format(audio_data, target_format)\n\n # Option 5: Process with AI/ML\n # audio_features = extract_audio_features(audio_data)\n # emotion_score = analyze_emotion(audio_features)\n\n # Process all available audio segments\n processed_count = client.process_audio_queue(\n custom_audio_handler,\n consume_all=True\n )\n print(f\"\u2705 Processed {processed_count} audio segments\")\n\n elif message.type == \"speech_interruption\":\n print(\"\\n\ud83d\uded1 Speech interrupted\")\n conversation_state[\"speaking\"] = False\n\n # Register message handler\n client.on_message(handle_messages)\n\n # Connection handler\n def handle_connection(state):\n if state.name == \"CONNECTED\":\n print(\"\u2705 Connected to voice assistant\")\n elif state.name == \"DISCONNECTED\":\n print(\"\u274c Disconnected from voice assistant\")\n\n client.on_connection_change(handle_connection)\n\n try:\n print(\"\ud83c\udfa4 Voice Assistant with Custom Audio Processing\")\n print(\"Audio will be saved to files instead of played locally\")\n print(\"Speak into your microphone...\")\n print(\"Press Ctrl+C to stop\")\n\n # Stream microphone with custom audio handling\n async with client:\n await client.stream_microphone(\n duration=0, # Infinite recording\n auto_connect=True, # Auto-connect to service\n auto_playback=False, # Disable automatic playback - we handle it\n verbose=False, # Clean output\n )\n\n except KeyboardInterrupt:\n print(\"\\n\ud83d\udc4b Custom audio processing stopped\")\n finally:\n await client.disconnect()\n client.cleanup()\n\nif __name__ == \"__main__\":\n asyncio.run(main())\n```\n\n**Key Features of Custom Audio Processing:**\n\n- \ud83c\udf9b\ufe0f **Full Control**: Complete control over audio handling instead of automatic playback\n- \ud83d\udcbe **Save to Files**: Save audio segments as individual WAV files\n- \ud83d\udd04 **Format Conversion**: Convert audio to different formats before processing\n- \ud83c\udfb5 **External Players**: Send audio to external audio players or devices\n- \ud83e\udd16 **AI Processing**: Analyze audio with machine learning models\n- \ud83d\udcca **Audio Analytics**: Extract features, analyze emotion, or process speech patterns\n- \ud83d\udd0c **Integration**: Easily integrate with existing audio pipelines\n\n**Use Cases:**\n\n- Recording conversations for later playback\n- Building custom audio players with UI controls\n- Streaming audio to multiple devices simultaneously\n- Processing audio with AI/ML models for analysis\n- Converting audio formats for different platforms\n- Creating audio archives or transcription systems\n\n## Configuration\n\n### Environment Variables\n\n```bash\n# Required: Your Vocals API key\nexport VOCALS_DEV_API_KEY=\"vdev_your_api_key_here\"\n\n```\n\n### Audio Configuration\n\n```python\nfrom vocals import VocalsClient, AudioConfig\n\n# Create custom audio configuration\naudio_config = AudioConfig(\n sample_rate=24000, # Sample rate in Hz\n 
channels=1, # Number of audio channels\n format=\"pcm_f32le\", # Audio format\n buffer_size=1024, # Audio buffer size\n)\n\n# Use with client\nclient = VocalsClient(audio_config=audio_config)\n```\n\n### SDK Configuration\n\n```python\nfrom vocals import VocalsClient, get_default_config\n\n# Get default configuration\nconfig = get_default_config()\n\n# Customize configuration\nconfig.max_reconnect_attempts = 5\nconfig.reconnect_delay = 2.0\nconfig.auto_connect = True\nconfig.token_refresh_buffer = 60.0\n\n# Use with client\nclient = VocalsClient(config=config)\n```\n\n## Complete API Reference\n\nThe Vocals SDK provides comprehensive control over voice processing, connection management, audio playback, and event handling. Here's a complete reference of all available controls:\n\n**\ud83c\udf9b\ufe0f Main Control Categories:**\n\n- **SDK Creation & Configuration** - Initialize and configure the SDK\n- **Stream Methods** - Control microphone and file streaming\n- **Connection Management** - Connect, disconnect, and manage WebSocket connections\n- **Audio Playback** - Control TTS audio playback, queueing, and timing\n- **Event Handling** - Register handlers for messages, connections, errors, and audio data\n- **State Management** - Access real-time state information\n- **Device Management** - Manage and test audio devices\n\n**\ud83d\udccb Quick Reference:**\n| Control Category | Key Methods | Purpose |\n|------------------|-------------|---------|\n| **Streaming** | `stream_microphone()`, `stream_audio_file()` | Start voice/audio processing |\n| **Connection** | `connect()`, `disconnect()`, `reconnect()` | Manage WebSocket connection |\n| **Recording** | `start_recording()`, `stop_recording()` | Control audio input |\n| **Playback** | `play_audio()`, `pause_audio()`, `stop_audio()` | Control TTS audio output |\n| **Queue** | `clear_queue()`, `add_to_queue()`, `get_audio_queue()` | Manage audio queue |\n| **Events** | `on_message()`, `on_connection_change()`, `on_error()` | Handle events |\n| **State** | `get_is_connected()`, `get_is_playing()`, `get_recording_state()` | Check current state |\n\n### Core Functions\n\n- `VocalsClient(config?, audio_config?, user_id?, modes?)` - Create client instance\n- `get_default_config()` - Get default configuration\n- `AudioConfig(...)` - Audio configuration class\n\n#### `VocalsClient()` Constructor\n\n```python\nVocalsClient(\n config: Optional[VocalsConfig] = None,\n audio_config: Optional[AudioConfig] = None,\n user_id: Optional[str] = None,\n modes: List[str] = [] # Controls client behavior\n)\n```\n\n**Parameters:**\n\n- `config`: Client configuration options (connection, logging, etc.)\n- `audio_config`: Audio processing configuration (sample rate, channels, etc.)\n- `user_id`: Optional user ID for token generation\n- `modes`: List of modes to control client behavior\n\n**Modes:**\n\n- `[]` (empty list): **Default Experience** - Full auto-contained behavior with automatic handlers\n- `['transcription']`: **Controlled** - Only transcription-related internal processing\n- `['voice_assistant']`: **Controlled** - Only AI response handling and speech interruption\n- `['transcription', 'voice_assistant']`: **Controlled** - Both features, but no automatic handlers\n\n### Audio Configuration\n\n```python\nAudioConfig(\n sample_rate: int = 24000, # Sample rate in Hz\n channels: int = 1, # Number of audio channels\n format: str = \"pcm_f32le\", # Audio format\n buffer_size: int = 1024, # Audio buffer size\n)\n```\n\n### Stream Methods\n\n#### 
### Audio Configuration

```python
AudioConfig(
    sample_rate: int = 24000,   # Sample rate in Hz
    channels: int = 1,          # Number of audio channels
    format: str = "pcm_f32le",  # Audio format
    buffer_size: int = 1024,    # Audio buffer size
)
```

### Stream Methods

#### `stream_microphone()` Parameters

```python
await client.stream_microphone(
    duration: float = 30.0,            # Recording duration in seconds (0 for infinite)
    auto_connect: bool = True,         # Whether to automatically connect if not connected
    auto_playback: bool = True,        # Whether to automatically play received audio
    verbose: bool = True,              # Whether to log detailed progress
    stats_tracking: bool = True,       # Whether to track and return statistics
    amplitude_threshold: float = 0.01  # Minimum amplitude to consider as speech
)
```

**Important:** In the **Controlled Experience** (with modes), TTS audio is always added to the queue, but `auto_playback=False` prevents automatic playback. You must manually call `client.play_audio()` to play queued audio.

#### `stream_audio_file()` Parameters

```python
await client.stream_audio_file(
    file_path: str,            # Path to the audio file to stream
    chunk_size: int = 1024,    # Size of each chunk to send
    verbose: bool = True,      # Whether to log detailed progress
    auto_connect: bool = True  # Whether to automatically connect if not connected
)
```
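For instance, a minimal sketch that streams a local WAV file with the defaults above; the file path is a placeholder:

```python
import asyncio

from vocals import VocalsClient

async def main():
    client = VocalsClient()
    try:
        # auto_connect=True (the default) establishes the connection first
        await client.stream_audio_file("path/to/recording.wav")
    finally:
        await client.disconnect()
        client.cleanup()

asyncio.run(main())
```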
### Connection & Recording Methods

```python
await client.connect()          # Connect to WebSocket
await client.disconnect()       # Disconnect from WebSocket
await client.reconnect()        # Reconnect to WebSocket
await client.start_recording()  # Start recording
await client.stop_recording()   # Stop recording
```

### Audio Playback Methods

```python
await client.play_audio()              # Start/resume audio playback
await client.pause_audio()             # Pause audio playback
await client.stop_audio()              # Stop audio playback
await client.fade_out_audio(duration)  # Fade out audio over specified duration
client.clear_queue()                   # Clear the audio playback queue
client.add_to_queue(segment)           # Add audio segment to queue
```

### Event Handlers

```python
client.on_message(handler)            # Handle incoming messages
client.on_connection_change(handler)  # Handle connection state changes
client.on_error(handler)              # Handle errors
client.on_audio_data(handler)         # Handle audio data
```

**Handler Functions:**

- `handler(message)` - Message handler receives WebSocket messages
- `handler(connection_state)` - Connection handler receives connection state changes
- `handler(error)` - Error handler receives error objects
- `handler(audio_data)` - Audio data handler receives real-time audio data

### Properties

```python
# Connection properties
client.connection_state   # Get current connection state
client.is_connected       # Check if connected
client.is_connecting      # Check if connecting

# Recording properties
client.recording_state    # Get current recording state
client.is_recording       # Check if recording

# Playback properties
client.playback_state     # Get current playback state
client.is_playing         # Check if playing audio
client.audio_queue        # Get current audio queue
client.current_segment    # Get currently playing segment
client.current_amplitude  # Get current audio amplitude

# Token properties
client.token              # Get current token
client.token_expires_at   # Get token expiration timestamp
```

### Utility Methods

```python
client.set_user_id(user_id)          # Set user ID for token generation
client.cleanup()                     # Clean up resources
client.process_audio_queue(handler)  # Process audio queue with custom handler
```

### Utility Functions

These utility functions work with both the class-based and functional APIs:

```python
# Message handlers
create_enhanced_message_handler(
    verbose: bool = True,
    show_transcription: bool = True,
    show_responses: bool = True,
    show_streaming: bool = True,
    show_detection: bool = False
)

# Conversation tracking
create_conversation_tracker()

# Statistics tracking
create_microphone_stats_tracker(verbose: bool = True)

# Connection handlers
create_default_connection_handler(verbose: bool = True)
create_default_error_handler(verbose: bool = True)
```

### Audio Device Management

```python
# Device management
list_audio_devices()                    # List available audio devices
get_default_audio_device()              # Get default audio device
test_audio_device(device_id, duration)  # Test audio device
validate_audio_device(device_id)        # Validate audio device
get_audio_device_info(device_id)        # Get device information
print_audio_devices()                   # Print formatted device list
create_audio_device_selector()          # Interactive device selector
```
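A short sketch tying these helpers together to inspect and check a device before streaming. It assumes the helpers are importable from the top-level `vocals` package (as the utility functions above are) and that `validate_audio_device()` returns a truthy value for usable devices; device ID `1` is only an example:

```python
from vocals import print_audio_devices, test_audio_device, validate_audio_device

# Print a formatted list of every device the SDK can see
print_audio_devices()

device_id = 1  # example ID; pick one from the printed list
if validate_audio_device(device_id):
    # Record a short sample to confirm the device actually works
    test_audio_device(device_id, duration=3)
else:
    print(f"Device {device_id} failed validation")
```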
### Auto-Playback Behavior

**Default Experience (no modes):**

- `auto_playback=True` (default): TTS audio plays automatically
- `auto_playback=False`: TTS audio is added to the queue but doesn't play automatically

**Controlled Experience (with modes):**

- `auto_playback=True`: TTS audio is added to the queue and plays automatically
- `auto_playback=False`: TTS audio is added to the queue but requires a manual `client.play_audio()` call

**Key Point:** In controlled mode, TTS audio is **always** added to the queue regardless of the `auto_playback` setting. The `auto_playback` parameter only controls whether playback starts automatically.

### Message Types

Common message types you'll receive in handlers:

```python
# Transcription messages
{
    "type": "transcription",
    "data": {
        "text": "Hello world",
        "is_partial": False,
        "segment_id": "abc123"
    }
}

# LLM streaming response
{
    "type": "llm_response_streaming",
    "data": {
        "token": "Hello",
        "accumulated_response": "Hello",
        "is_complete": False,
        "segment_id": "def456"
    }
}

# TTS audio
{
    "type": "tts_audio",
    "data": {
        "text": "Hello there",
        "audio_data": "base64_encoded_wav_data",
        "sample_rate": 24000,
        "segment_id": "ghi789",
        "duration_seconds": 1.5
    }
}

# Speech interruption
{
    "type": "speech_interruption",
    "data": {}
}
```
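Tying the last two sections together, a minimal sketch of a controlled-mode handler that dispatches on `message.type` and starts playback manually when TTS audio arrives:

```python
import asyncio

from vocals import VocalsClient

client = VocalsClient(modes=["transcription", "voice_assistant"])

def handle_message(message):
    data = message.data or {}
    if message.type == "transcription" and not data.get("is_partial"):
        print(f"📝 You said: {data.get('text', '')}")
    elif message.type == "llm_response_streaming" and data.get("is_complete"):
        print(f"🤖 AI: {data.get('accumulated_response', '')}")
    elif message.type == "tts_audio":
        # Always queued in controlled mode; playback is up to us
        asyncio.create_task(client.play_audio())
    elif message.type == "speech_interruption":
        print("🛑 Interrupted")

client.on_message(handle_message)
```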
## Testing Your Setup

After setting up the SDK, you can test all the controls to ensure everything is working properly:

### 1. Test Basic Audio Setup

```bash
# List available audio devices
vocals devices

# Test your microphone
vocals test-device

# Run system diagnostics
vocals diagnose
```

### 2. Test Default Experience

```python
import asyncio
from vocals import VocalsClient

async def test_default():
    """Test default experience with automatic handlers"""
    client = VocalsClient()  # No modes = full automatic experience

    print("🎤 Testing default experience...")
    print("Speak and listen for AI responses...")

    # Test with automatic playback
    async with client:
        await client.stream_microphone(
            duration=15.0,
            auto_playback=True,  # Should auto-play TTS
            verbose=False
        )

    print("✅ Default experience test completed")

asyncio.run(test_default())
```

### 3. Test Controlled Experience

```python
import asyncio
from vocals import VocalsClient

async def test_controlled():
    """Test controlled experience with manual handlers"""
    client = VocalsClient(modes=['transcription', 'voice_assistant'])

    # Track what we receive
    received_messages = []

    def test_handler(message):
        received_messages.append(message.type)
        print(f"✅ Received: {message.type}")

        # Test manual playback control
        if message.type == "tts_audio":
            print("🔊 Manually triggering playback...")
            asyncio.create_task(client.play_audio())

    # Register handler
    client.on_message(test_handler)

    print("🎤 Testing controlled experience...")
    print("Should receive transcription and TTS messages...")

    # Test with manual playback control
    async with client:
        await client.stream_microphone(
            duration=15.0,
            auto_playback=False,  # We control playback manually
            verbose=False
        )

    print(f"📊 Received message types: {set(received_messages)}")

    # Verify we got the expected message types
    expected_types = ["transcription", "tts_audio"]
    for msg_type in expected_types:
        if msg_type in received_messages:
            print(f"✅ {msg_type} messages working")
        else:
            print(f"❌ {msg_type} messages not received")

    print("✅ Controlled experience test completed")

asyncio.run(test_controlled())
```

### 4. Test Audio Playback Controls

```python
import asyncio
from vocals import VocalsClient

async def test_playback_controls():
    """Test all audio playback controls"""
    client = VocalsClient(modes=['transcription', 'voice_assistant'])

    # Test queue management
    print("🎵 Testing audio playback controls...")

    # Check initial state
    print(f"Initial queue size: {len(client.audio_queue)}")
    print(f"Is playing: {client.is_playing}")

    def audio_handler(message):
        if message.type == "tts_audio":
            print(f"🎵 Audio received: {message.data.get('text', '')}")
            print(f"Queue size: {len(client.audio_queue)}")

    client.on_message(audio_handler)

    # Stream and collect audio
    async with client:
        await client.stream_microphone(
            duration=10.0,
            auto_playback=False,  # Don't auto-play
            verbose=False
        )

    # Test manual controls
    queue_size = len(client.audio_queue)
    if queue_size > 0:
        print(f"✅ {queue_size} audio segments in queue")

        print("🎵 Testing play_audio()...")
        await client.play_audio()

        # Wait a moment then test pause
        await asyncio.sleep(1)
        print("⏸️ Testing pause_audio()...")
        await client.pause_audio()

        print("▶️ Testing play_audio() again...")
        await client.play_audio()

        # Test stop
        await asyncio.sleep(1)
        print("⏹️ Testing stop_audio()...")
        await client.stop_audio()

        print("🗑️ Testing clear_queue()...")
        client.clear_queue()
        print(f"Queue size after clear: {len(client.audio_queue)}")

        print("✅ All playback controls working!")
    else:
        print("❌ No audio received to test playback controls")

    await client.disconnect()
    client.cleanup()

asyncio.run(test_playback_controls())
```

### 5. Test All Event Handlers

```python
import asyncio
from vocals import VocalsClient

async def test_event_handlers():
    """Test all event handler types"""
    client = VocalsClient(modes=['transcription', 'voice_assistant'])

    # Track events
    events_received = {
        'messages': 0,
        'connections': 0,
        'errors': 0,
        'audio_data': 0
    }

    def message_handler(message):
        events_received['messages'] += 1
        print(f"📩 Message: {message.type}")

    def connection_handler(state):
        events_received['connections'] += 1
        print(f"🔌 Connection: {state.name}")

    def error_handler(error):
        events_received['errors'] += 1
        print(f"❌ Error: {error.message}")

    def audio_data_handler(audio_data):
        events_received['audio_data'] += 1
        if events_received['audio_data'] % 100 == 0:  # Log every 100th
            print(f"🎤 Audio data chunks: {events_received['audio_data']}")

    # Register all handlers
    client.on_message(message_handler)
    client.on_connection_change(connection_handler)
    client.on_error(error_handler)
    client.on_audio_data(audio_data_handler)

    print("🧪 Testing all event handlers...")

    async with client:
        await client.stream_microphone(
            duration=10.0,
            auto_playback=False,
            verbose=False
        )

    # Report results
    print("\n📊 Event Handler Test Results:")
    for event_type, count in events_received.items():
        status = "✅" if count > 0 else "❌"
        print(f"  {status} {event_type}: {count}")

asyncio.run(test_event_handlers())
```

### 6. Validate All Controls Are Working

Run this comprehensive test to verify everything:

```bash
# Create a test script
cat > test_all_controls.py << 'EOF'
import asyncio
from vocals import VocalsClient

async def comprehensive_test():
    """Comprehensive test of all client controls"""
    print("🧪 Comprehensive Client Control Test")
    print("=" * 50)

    # Test 1: Default mode
    print("\n1️⃣ Testing Default Mode...")
    client1 = VocalsClient()
    async with client1:
        await client1.stream_microphone(duration=5.0, verbose=False)
    print("✅ Default mode test completed")

    # Test 2: Controlled mode
    print("\n2️⃣ Testing Controlled Mode...")
    client2 = VocalsClient(modes=['transcription', 'voice_assistant'])

    message_count = 0
    def counter(message):
        nonlocal message_count
        message_count += 1
        if message.type == "tts_audio":
            asyncio.create_task(client2.play_audio())

    client2.on_message(counter)
    async with client2:
        await client2.stream_microphone(duration=5.0, auto_playback=False, verbose=False)
    print(f"✅ Controlled mode test completed - {message_count} messages")

    # Test 3: All controls
    print("\n3️⃣ Testing Individual Controls...")
    client3 = VocalsClient()

    # Test properties
    print(f"  Connection state: {client3.connection_state.name}")
    print(f"  Is connected: {client3.is_connected}")
    print(f"  Recording state: {client3.recording_state.name}")
    print(f"  Is recording: {client3.is_recording}")
    print(f"  Playback state: {client3.playback_state.name}")
    print(f"  Is playing: {client3.is_playing}")
    print(f"  Queue length: {len(client3.audio_queue)}")
    print(f"  Current amplitude: {client3.current_amplitude}")

    await client3.disconnect()
    client3.cleanup()
    print("✅ All controls test completed")

    print("\n🎉 All tests completed successfully!")

if __name__ == "__main__":
    asyncio.run(comprehensive_test())
EOF

# Run the test
python test_all_controls.py
```

This test suite validates that all client controls are working properly.
## CLI Tools

The SDK includes powerful command-line tools for setup, testing, and debugging:

### Setup & Configuration

```bash
# Interactive setup wizard
vocals setup

# List available audio devices
vocals devices

# Test a specific audio device
vocals test-device 1 --duration 5

# Generate diagnostic report
vocals diagnose
```

### Development Tools

```bash
# Run all tests
vocals test

# Run a demo session
vocals demo --duration 30 --verbose

# Create project templates
vocals create-template voice_assistant
vocals create-template file_processor
vocals create-template conversation_tracker
vocals create-template advanced_voice_assistant
```

**Available Templates:**

- `voice_assistant`: Simple voice assistant (**Default Experience**)
- `file_processor`: Process audio files (**Default Experience**)
- `conversation_tracker`: Track conversations (**Controlled Experience**)
- `advanced_voice_assistant`: Full control voice assistant (**Controlled Experience**)

All templates use the modern **class-based API** with `VocalsClient`.

### Advanced Features

```bash
# Performance monitoring
vocals demo --duration 60 --stats

# Custom audio device
vocals demo --device 2

# Debug mode
VOCALS_DEBUG_LEVEL=DEBUG vocals demo
```

## Error Handling

The client provides comprehensive error handling:

```python
from vocals import VocalsClient, VocalsError

async def main():
    client = VocalsClient()

    try:
        async with client:
            await client.stream_microphone(duration=10.0)
    except VocalsError as e:
        print(f"Vocals client error: {e}")
    except Exception as e:
        print(f"Unexpected error: {e}")
        # Manual cleanup if context manager fails
        await client.disconnect()
        client.cleanup()

# Alternative without context manager
async def main_manual():
    client = VocalsClient()

    try:
        await client.stream_microphone(duration=10.0)
    except VocalsError as e:
        print(f"Vocals client error: {e}")
    except Exception as e:
        print(f"Unexpected error: {e}")
    finally:
        await client.disconnect()
        client.cleanup()
```

## Troubleshooting

### Common Issues

1. **"API key not found"**

   - Set the environment variable: `export VOCALS_DEV_API_KEY="your_key"`
   - Or create a `.env` file containing the key
   - Ensure the `.env` file is actually loaded (e.g., with `python-dotenv`)

2. **"Connection failed"**

   - Check your internet connection
   - Verify your API key is valid
   - Check that the WebSocket endpoint is accessible
   - Try increasing `max_reconnect_attempts` in the config

3. **"No audio input detected"**

   - Check microphone permissions
   - Verify the microphone works (use `vocals devices` to list devices)
   - Lower the `amplitude_threshold` parameter (e.g., to 0.005); see the sketch after this list
   - Test with `vocals test-device <id>`

4. **Audio playback issues**

   - Ensure speakers/headphones are connected
   - Check system audio settings
   - Try different audio formats or sample rates in `AudioConfig`

5. **High latency**

   - Check network speed
   - Reduce `buffer_size` in `AudioConfig`
   - Ensure no other apps are using high bandwidth

6. **Dependency errors**

   - Reinstall the SDK: `pip install --force-reinstall vocals`
   - On Linux, ensure PortAudio is installed (see Installation)
   - Try creating a fresh virtual environment
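For issue 3 above, quiet microphones often just need a lower speech threshold. A minimal sketch, halving the default `amplitude_threshold` of 0.01:

```python
import asyncio

from vocals import VocalsClient

async def main():
    client = VocalsClient()
    async with client:
        await client.stream_microphone(
            duration=15.0,
            amplitude_threshold=0.005,  # half the 0.01 default, for quiet input
        )

asyncio.run(main())
```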
If issues persist, run `vocals diagnose` and share the output when reporting bugs.

### Debug Mode

Enable debug logging to troubleshoot issues:

```python
import logging

# Enable debug logging
logging.basicConfig(level=logging.DEBUG)

# Or for specific modules
logging.getLogger("vocals").setLevel(logging.DEBUG)
```

## Examples

Check out the included examples:

- [`examples/example_microphone_streaming.py`](examples/example_microphone_streaming.py) - Comprehensive microphone streaming examples
- [`examples/example_file_playback.py`](examples/example_file_playback.py) - Audio file playback examples
- [`examples/run_examples.sh`](examples/run_examples.sh) - Script to run the examples with proper setup

## Contributing

Contributions are welcome! Please follow these steps:

1. Fork the repository
2. Create a feature branch (`git checkout -b feature/AmazingFeature`)
3. Commit your changes (`git commit -m 'Add some AmazingFeature'`)
4. Push to the branch (`git push origin feature/AmazingFeature`)
5. Open a Pull Request

For major changes, please open an issue first to discuss what you would like to change.

See [CONTRIBUTING.md](CONTRIBUTING.md) for more details.

## Support

For support, documentation, and updates:

- 📖 [Documentation](https://docs.vocals.dev)
- 🐛 [Issues](https://github.com/vocals/vocals-sdk-python/issues)
- 💬 [Support](mailto:support@vocals.dev)

## License

MIT License - see the LICENSE file for details.
"bugtrack_url": null,
"license": null,
"summary": "A Python SDK for voice processing and real-time audio communication",
"version": "1.0.984",
"project_urls": {
"Bug Reports": "https://github.com/vocals/vocals-sdk-python/issues",
"Documentation": "https://docs.vocals.dev",
"Homepage": "https://github.com/hairetsucodes/vocals-sdk-python",
"Source": "https://github.com/vocals/vocals-sdk-python"
},
"split_keywords": [
"vocals",
" audio",
" speech",
" websocket",
" real-time",
" voice processing"
],
"urls": [
{
"comment_text": null,
"digests": {
"blake2b_256": "90a95daae33861ca141730e048ac59acc4ad66f5fd6ba6cde5239f2a33664453",
"md5": "cabdc160f95b90df69f07dadbd7375c5",
"sha256": "c351a2c8283242298318f5f568d72184b747cd09d7098db75eb658cf6672dcf6"
},
"downloads": -1,
"filename": "vocals-1.0.984-py3-none-any.whl",
"has_sig": false,
"md5_digest": "cabdc160f95b90df69f07dadbd7375c5",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.8",
"size": 67468,
"upload_time": "2025-07-11T04:52:56",
"upload_time_iso_8601": "2025-07-11T04:52:56.004429Z",
"url": "https://files.pythonhosted.org/packages/90/a9/5daae33861ca141730e048ac59acc4ad66f5fd6ba6cde5239f2a33664453/vocals-1.0.984-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": null,
"digests": {
"blake2b_256": "db19580616ee1899cb2156601a9bef458be863b999cbc7d8d5fce8867e5e4445",
"md5": "783c54db7beda48fef269b79d547b87c",
"sha256": "f74ad21ba0a68ff80e39da723260141a9e3cbd7e7253af54454bef6ae58e70d9"
},
"downloads": -1,
"filename": "vocals-1.0.984.tar.gz",
"has_sig": false,
"md5_digest": "783c54db7beda48fef269b79d547b87c",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.8",
"size": 101230,
"upload_time": "2025-07-11T04:52:57",
"upload_time_iso_8601": "2025-07-11T04:52:57.301680Z",
"url": "https://files.pythonhosted.org/packages/db/19/580616ee1899cb2156601a9bef458be863b999cbc7d8d5fce8867e5e4445/vocals-1.0.984.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2025-07-11 04:52:57",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "hairetsucodes",
"github_project": "vocals-sdk-python",
"travis_ci": false,
"coveralls": false,
"github_actions": false,
"requirements": [
{
"name": "aiohttp",
"specs": [
[
">=",
"3.8.0"
]
]
},
{
"name": "websockets",
"specs": [
[
">=",
"11.0.0"
]
]
},
{
"name": "sounddevice",
"specs": [
[
">=",
"0.4.6"
]
]
},
{
"name": "numpy",
"specs": [
[
">=",
"1.21.0"
]
]
},
{
"name": "PyJWT",
"specs": [
[
">=",
"2.8.0"
]
]
},
{
"name": "python-dotenv",
"specs": [
[
">=",
"1.0.0"
]
]
},
{
"name": "typing-extensions",
"specs": [
[
">=",
"4.0.0"
]
]
},
{
"name": "pyaudio",
"specs": [
[
">=",
"0.2.11"
]
]
},
{
"name": "soundfile",
"specs": [
[
">=",
"0.12.1"
]
]
},
{
"name": "click",
"specs": [
[
">=",
"8.0.0"
]
]
},
{
"name": "psutil",
"specs": [
[
">=",
"5.9.0"
]
]
},
{
"name": "matplotlib",
"specs": [
[
">=",
"3.5.0"
]
]
}
],
"lcname": "vocals"
}