# CosyVoice Python SDK
[MIT License](https://opensource.org/licenses/MIT)
[Code style: Black](https://github.com/psf/black)
**Production-Ready Async TTS Client for CosyVoice Services**
An enterprise-grade asynchronous text-to-speech SDK designed for high-concurrency, low-latency, real-time voice synthesis. Well suited to building intelligent voice interaction applications.
## Overview
CosyVoice Python SDK is an async-first TTS client library that provides:
- **🚀 Real-time Streaming Synthesis**: Text stream input, audio stream output with minimized time-to-first-byte
- **🎭 Custom Voice Management**: Create unique voices from audio samples with zero-shot voice cloning
- **⚡ High-Performance Async Architecture**: Support thousands of concurrent requests with auto-reconnection
- **🔧 Production-Ready**: Complete error handling, monitoring metrics, and load balancing support
- **📡 Multi-Protocol Support**: WebSocket streaming synthesis + HTTP RESTful voice management
- **🎵 Multi-Format Output**: Support WAV, MP3, and raw PCM audio output
## Quick Start
### Installation
```bash
pip install cosyvoice-client
# or using uv
uv add cosyvoice-client
```
### Authentication & Configuration
The SDK supports multiple authentication and configuration methods:
#### Environment Variables (Recommended)
```bash
export COSYVOICE_BASE_URL="https://api.cosyvoice.com"
export COSYVOICE_API_KEY="your_api_key_here"
```
#### Configuration in Code
```python
import cosyvoice
# Method 1: Using connection string
client = await cosyvoice.create_client(
"wss://api.cosyvoice.com",
api_key="your_api_key"
)
# Method 2: Using configuration object
config = cosyvoice.ClientConfig(
base_url="https://api.cosyvoice.com",
api_key="your_api_key",
connection_timeout=30.0
)
client = await cosyvoice.connect_client(config)
```
### Basic Usage Example
```python
import asyncio
import cosyvoice
async def basic_tts_example():
# Connect to CosyVoice service
async with cosyvoice.create_client() as client:
# 1. Create custom speaker
speaker = await client.speaker.create(
prompt_text="Hello, this is my voice sample.",
prompt_audio_path="https://example.com/voice_sample.wav"
)
# 2. Configure synthesis parameters
config = cosyvoice.SynthesisConfig(
speaker_id=speaker.zero_shot_spk_id,
mode=cosyvoice.SynthesisMode.ZERO_SHOT,
speed=1.2,
output_format=cosyvoice.AudioFormat.WAV
)
# 3. Synthesize speech
audio_data = await client.collect_audio(
"Welcome to CosyVoice TTS service!",
config
)
# 4. Save audio file
with open("output.wav", "wb") as f:
f.write(audio_data)
asyncio.run(basic_tts_example())
```
## API Reference
### Authentication
The SDK uses **Bearer Token** authentication for HTTP APIs and **query parameter token** for WebSocket connections.
| Method | HTTP Header | WebSocket URL Parameter |
|--------|-------------|------------------------|
| Bearer Token | `Authorization: Bearer {token}` | `?token={token}` |
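For reference, this is how the two credential styles look on the wire; the SDK applies them automatically, and the WebSocket path shown here is a placeholder.
```python
# How the two auth styles are applied (handled internally by the SDK).
# The token value and WebSocket path below are placeholders.
token = "your_api_key"

# HTTP APIs: Bearer token in the Authorization header
http_headers = {"Authorization": f"Bearer {token}"}

# WebSocket connections: token passed as a URL query parameter
ws_url = f"wss://api.cosyvoice.com/?token={token}"
```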
### Core Interfaces
#### 1. Client Management
##### Create Client Connection
```python
# Create client with auto-configuration from environment
client = await cosyvoice.create_client()
# Create client with explicit configuration
client = await cosyvoice.create_client(
endpoint_url="wss://api.cosyvoice.com",
api_key="your_api_key",
timeout=30.0
)
# Using context manager (recommended)
async with cosyvoice.create_client() as client:
# Use client
pass
```
#### 2. Speaker Management API
##### Create Speaker
Creates a new custom voice from reference audio.
**Request:**
```python
speaker = await client.speaker.create(
prompt_text: str, # Reference text (1-500 chars)
prompt_audio_path: str, # HTTP/HTTPS URL to audio file
zero_shot_spk_id: str = None # Optional custom ID (auto-generated if not provided)
)
```
**Response:**
```python
class SpeakerInfo:
zero_shot_spk_id: str # Speaker unique identifier
prompt_text: str # Reference text
created_at: str # Creation timestamp (ISO format)
audio_url: str # Reference audio URL
```
##### Get Speaker Information
```python
speaker_info = await client.speaker.get_info(speaker_id: str)
```
##### Update Speaker
```python
await client.speaker.update(
speaker_id: str,
prompt_text: str = None, # Optional new reference text
prompt_audio_path: str = None # Optional new reference audio
)
```
##### Delete Speaker
```python
await client.speaker.delete(speaker_id: str)
```
##### List Speakers
```python
speakers = await client.speaker.list(
offset: int = 0, # Pagination offset
limit: int = 50 # Maximum results per page
)
```
##### Check Speaker Existence
```python
exists = await client.speaker.exists(speaker_id: str)
```
#### 3. Speech Synthesis API
##### Synthesis Configuration
```python
config = cosyvoice.SynthesisConfig(
speaker_id: str, # Required: Speaker ID
mode: SynthesisMode = SynthesisMode.ZERO_SHOT, # Synthesis mode
speed: float = 1.0, # Speed multiplier (0.5-3.0)
output_format: AudioFormat = AudioFormat.WAV, # Audio format
sample_rate: int = 22050, # Sample rate (Hz)
instruct_text: str = None, # Instruction text (instruct mode only)
bit_rate: int = 192000, # Bit rate for MP3 (bps)
compression_level: int = 2 # Compression level (0-9)
)
```
**Supported Synthesis Modes:**
- `ZERO_SHOT`: Custom voice cloning mode
- `SFT`: Pre-trained voice mode
- `CROSS_LINGUAL`: Cross-lingual synthesis
- `INSTRUCT`: Natural language instruction mode
**Supported Audio Formats:**
- `WAV`: Uncompressed audio
- `MP3`: Compressed audio with configurable bit rate
- `PCM`: Raw audio data
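The sketch below shows how two of these modes and formats might be expressed as `SynthesisConfig` values; the speaker ID and instruction text are placeholders.
```python
# Hypothetical configurations illustrating the modes and formats above;
# the speaker ID and instruction text are placeholders.
instruct_config = cosyvoice.SynthesisConfig(
    speaker_id="spk_abc123",
    mode=cosyvoice.SynthesisMode.INSTRUCT,
    instruct_text="Speak slowly, in a calm and reassuring tone.",
    output_format=cosyvoice.AudioFormat.WAV
)

mp3_config = cosyvoice.SynthesisConfig(
    speaker_id="spk_abc123",
    mode=cosyvoice.SynthesisMode.ZERO_SHOT,
    output_format=cosyvoice.AudioFormat.MP3,
    bit_rate=128000  # applies to MP3 output only
)
```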
##### Batch Synthesis
Synthesizes the entire text in a single call and returns the complete audio as bytes.
```python
audio_data: bytes = await client.collect_audio(
text: str,
config: SynthesisConfig
)
```
##### Streaming Synthesis
Process audio chunks as they arrive for low-latency playback.
```python
async for result in client.synthesize_text(text: str, config: SynthesisConfig):
# result.audio_data: bytes - Audio chunk data
# result.text_index: int - Text segment index
# result.chunk_index: int - Audio chunk index within text segment
# Process audio chunk immediately for real-time playback
await audio_player.play(result.audio_data)
```
##### Text Stream Synthesis
Synthesize text as it arrives (ideal for LLM integration).
```python
async def text_generator():
# Simulate streaming text from LLM
sentences = ["Hello", "How are you?", "Welcome to our service!"]
for sentence in sentences:
yield sentence
await asyncio.sleep(0.1)
async for result in client.synthesize_stream(text_generator(), config):
await audio_player.play(result.audio_data)
```
##### Quick Synthesis
One-shot synthesis with automatic speaker creation.
```python
audio_data = await client.quick_synthesize(
text: str,
speaker_prompt_text: str,
speaker_audio_file: str,
speed: float = 1.0,
output_file: str = None
)
```
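A call might look like the following; all values here are placeholders.
```python
# Example call with placeholder values: a speaker is created from the
# reference audio, then the text is synthesized with that voice.
audio_data = await client.quick_synthesize(
    "Thanks for calling, how can I help you today?",
    speaker_prompt_text="Hello, this is my voice sample.",
    speaker_audio_file="https://example.com/voice_sample.wav",
    speed=1.1,
    output_file="greeting.wav"  # optional: also write the audio to disk
)
```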
### Data Models
#### SynthesisResult
```python
class SynthesisResult:
audio_data: bytes # Audio chunk data
text_index: int # Text segment index
chunk_index: int # Audio chunk index
session_id: str # Synthesis session ID
metadata: dict # Additional metadata
```
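`collect_audio()` assembles chunks for you; the sketch below shows how the index fields can be used if you consume the stream yourself (assuming `text` and `config` are defined as in the earlier examples).
```python
# Sketch: assembling streamed chunks into one buffer, keyed by the
# (text_index, chunk_index) pair on each SynthesisResult.
chunks: list[tuple[int, int, bytes]] = []

async for result in client.synthesize_text(text, config):
    chunks.append((result.text_index, result.chunk_index, result.audio_data))

chunks.sort(key=lambda c: (c[0], c[1]))  # defensive re-ordering
audio_data = b"".join(data for _, _, data in chunks)
```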
#### Error Handling
```python
# Exception hierarchy
CosyVoiceError # Base exception
├── ConnectionError # Network connection issues
├── AuthenticationError # Authentication failures
├── SpeakerError # Speaker management errors
├── SynthesisError # Speech synthesis errors
├── InvalidStateError # Client state errors
└── ValidationError # Input validation errors
# Error handling example
try:
async with cosyvoice.create_client() as client:
audio = await client.collect_audio("Hello world", config)
except cosyvoice.ConnectionError as e:
print(f"Connection failed: {e}")
except cosyvoice.SpeakerError as e:
print(f"Speaker error: {e}")
except cosyvoice.SynthesisError as e:
print(f"Synthesis error: {e}")
```
## Advanced Usage
### Production Integration Patterns
#### 1. High-Concurrency Server Integration
```python
import asyncio
import cosyvoice
from typing import Dict
import uuid
class ProductionTTSService:
def __init__(self, endpoint: str, api_key: str):
self.endpoint = endpoint
self.api_key = api_key
self.active_sessions: Dict[str, asyncio.Task] = {}
self.session_semaphore = asyncio.Semaphore(1000) # Max concurrent sessions
async def create_session_client(self) -> cosyvoice.StreamClient:
"""Create dedicated client for each session"""
return await cosyvoice.create_client(self.endpoint, api_key=self.api_key)
async def handle_user_request(self, user_id: str, text: str, config: cosyvoice.SynthesisConfig):
"""Handle individual user TTS request"""
async with self.session_semaphore:
session_id = f"{user_id}_{uuid.uuid4().hex[:8]}"
try:
async with self.create_session_client() as client:
# Stream synthesis results
async for result in client.synthesize_text(text, config):
# Send to user immediately (WebSocket/SSE/etc.)
await self.send_to_user(user_id, result.audio_data)
except Exception as e:
await self.handle_error(user_id, e)
finally:
# Cleanup
if session_id in self.active_sessions:
del self.active_sessions[session_id]
# FastAPI integration example
from fastapi import FastAPI, WebSocket
app = FastAPI()
tts_service = ProductionTTSService("wss://api.cosyvoice.com", "your_key")
@app.websocket("/tts/{user_id}")
async def tts_websocket(websocket: WebSocket, user_id: str):
await websocket.accept()
try:
while True:
# Receive TTS request
data = await websocket.receive_json()
config = cosyvoice.SynthesisConfig(
speaker_id=data["speaker_id"],
speed=data.get("speed", 1.0)
)
# Process in background
task = asyncio.create_task(
tts_service.handle_user_request(user_id, data["text"], config)
)
tts_service.active_sessions[f"{user_id}_current"] = task
except Exception as e:
print(f"WebSocket error: {e}")
```
#### 2. LLM + TTS Integration
```python
async def llm_with_voice_response(user_question: str, voice_config: cosyvoice.SynthesisConfig):
"""Stream LLM response directly to voice synthesis"""
async def llm_text_stream():
# Replace with your LLM client (OpenAI, Anthropic, etc.)
async for text_chunk in your_llm_client.stream(user_question):
yield text_chunk
async with cosyvoice.create_client() as tts_client:
# Stream voice synthesis from LLM output
async for audio_result in tts_client.synthesize_stream(llm_text_stream(), voice_config):
# Send audio to user in real-time
await send_audio_to_user(audio_result.audio_data)
```
### Performance Monitoring
```python
import time
from prometheus_client import Counter, Histogram
# Metrics
tts_requests_total = Counter('cosyvoice_requests_total', 'Total TTS requests')
tts_duration_seconds = Histogram('cosyvoice_duration_seconds', 'TTS request duration')
tts_errors_total = Counter('cosyvoice_errors_total', 'Total TTS errors', ['error_type'])
async def monitored_synthesis(client: cosyvoice.StreamClient, text: str, config: cosyvoice.SynthesisConfig):
"""TTS with monitoring metrics"""
tts_requests_total.inc()
start_time = time.time()
try:
with tts_duration_seconds.time():
audio_data = await client.collect_audio(text, config)
return audio_data
except cosyvoice.ConnectionError:
tts_errors_total.labels(error_type='connection').inc()
raise
except cosyvoice.SynthesisError:
tts_errors_total.labels(error_type='synthesis').inc()
raise
```
### Environment Variables Reference
| Variable | Default | Description |
|----------|---------|-------------|
| `COSYVOICE_BASE_URL` | `http://localhost:8080` | Service endpoint URL |
| `COSYVOICE_API_KEY` | None | API authentication key |
| `COSYVOICE_CONNECTION_TIMEOUT` | `30.0` | Connection timeout (seconds) |
| `COSYVOICE_READ_TIMEOUT` | `60.0` | Read timeout (seconds) |
| `COSYVOICE_MAX_RECONNECT_ATTEMPTS` | `3` | Maximum reconnection attempts |
| `COSYVOICE_PING_INTERVAL` | `20.0` | WebSocket ping interval (seconds) |
| `COSYVOICE_PING_TIMEOUT` | `10.0` | WebSocket ping timeout (seconds) |
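If you prefer explicit configuration over implicit environment loading, a `ClientConfig` can be built from these variables by hand; this sketch only uses the fields shown elsewhere in this document.
```python
# Sketch: building a ClientConfig explicitly from the environment variables above.
# create_client() reads these variables automatically; this is the manual form.
import os
import cosyvoice

config = cosyvoice.ClientConfig(
    base_url=os.environ.get("COSYVOICE_BASE_URL", "http://localhost:8080"),
    api_key=os.environ.get("COSYVOICE_API_KEY"),
    connection_timeout=float(os.environ.get("COSYVOICE_CONNECTION_TIMEOUT", "30.0")),
    read_timeout=float(os.environ.get("COSYVOICE_READ_TIMEOUT", "60.0"))
)
client = await cosyvoice.connect_client(config)
```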
### Protocol Specifications
#### WebSocket Protocol
The SDK communicates with CosyVoice servers using a structured WebSocket protocol:
**Message Format:**
```json
{
"header": {
"version": "1.0",
"message_type": "TEXT_REQUEST",
"timestamp": "2024-01-01T12:00:00Z",
"sequence": 1
},
"payload": {
"session_id": "session_123",
"params": {
"text": "Hello world",
"mode": "zero_shot",
"speed": 1.0,
"output_format": "wav"
}
}
}
```
**Message Types:**
- Client → Server: `CONNECT_REQUEST`, `SESSION_REQUEST`, `TEXT_REQUEST`, `SYNTHESIS_END`
- Server → Client: `AUDIO_RESPONSE`, `AUDIO_COMPLETE`, `ERROR_RESPONSE`
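For illustration only, a `TEXT_REQUEST` frame following this format could be built and sent over an already-open connection like so; the SDK constructs and sequences these frames for you.
```python
# Illustrative only: the SDK builds and sends these frames internally.
# `websocket` is assumed to be an open, authenticated connection.
import json
from datetime import datetime, timezone

text_request = {
    "header": {
        "version": "1.0",
        "message_type": "TEXT_REQUEST",
        "timestamp": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
        "sequence": 1
    },
    "payload": {
        "session_id": "session_123",
        "params": {
            "text": "Hello world",
            "mode": "zero_shot",
            "speed": 1.0,
            "output_format": "wav"
        }
    }
}

await websocket.send(json.dumps(text_request))
```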
#### HTTP API Endpoints
**Speaker Management:**
```
POST /v1/speakers # Create speaker
GET /v1/speakers/{id} # Get speaker info
PUT /v1/speakers/{id} # Update speaker
DELETE /v1/speakers/{id} # Delete speaker
GET /v1/speakers # List speakers
```
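Outside the SDK, a speaker can be created by calling these endpoints directly with Bearer authentication; the JSON field names below mirror the SDK parameters and are an assumption about the wire format.
```python
# Sketch of a raw HTTP call to the speaker-creation endpoint (assumes httpx).
# Body field names mirror the SDK parameters; the actual wire format may differ.
import os
import httpx

async def create_speaker_raw(prompt_text: str, prompt_audio_path: str) -> dict:
    async with httpx.AsyncClient(base_url=os.environ["COSYVOICE_BASE_URL"]) as http:
        resp = await http.post(
            "/v1/speakers",
            headers={"Authorization": f"Bearer {os.environ['COSYVOICE_API_KEY']}"},
            json={
                "prompt_text": prompt_text,
                "prompt_audio_path": prompt_audio_path
            }
        )
        resp.raise_for_status()
        return resp.json()
```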
## Development
### Environment Setup
```bash
# Clone repository
git clone https://github.com/cosyvoice/cosyvoice-python.git
cd cosyvoice-python
# Install uv package manager
curl -LsSf https://astral.sh/uv/install.sh | sh
# Install dependencies
uv sync --dev
```
### Running Tests
```bash
# Run all tests
uv run pytest
# Run with coverage
uv run pytest --cov=cosyvoice --cov-report=html
# Run specific test types
uv run pytest -m unit # Unit tests only
uv run pytest -m integration # Integration tests only
uv run pytest -m slow # Network-dependent tests
```
### Code Quality
```bash
# Format code
uv run black cosyvoice tests examples
uv run isort cosyvoice tests examples
# Lint code
uv run ruff check cosyvoice tests examples
uv run ruff check --fix cosyvoice tests examples
# Type checking
uv run mypy cosyvoice
```
### Running Examples
```bash
# Basic synthesis example
uv run python examples/basic_synthesis.py
# Real-time streaming example
uv run python examples/realtime_streaming.py
# Speaker management example
uv run python examples/speaker_management.py
```
## Performance Guidelines
### Latency Optimization
- **TTFB Target**: keep time to first audio byte under 300 ms for a responsive user experience
- **RTF Target**: keep the real-time factor (synthesis time ÷ audio duration) under 0.3; the sketch after this list shows one way to measure both
- **Connection Reuse**: Maintain persistent WebSocket connections
- **Streaming**: Use `synthesize_stream()` for lowest latency
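A minimal measurement sketch, assuming PCM output (16-bit mono) so audio duration can be derived from the byte count:
```python
# Sketch: measuring TTFB and RTF for one streaming request.
# Assumes PCM output, 16-bit mono, so duration = bytes / (sample_rate * 2).
import time

async def measure_latency(client, text, config):
    start = time.monotonic()
    first_chunk_at = None
    total_bytes = 0

    async for result in client.synthesize_text(text, config):
        if first_chunk_at is None:
            first_chunk_at = time.monotonic()  # first audio chunk received
        total_bytes += len(result.audio_data)

    elapsed = time.monotonic() - start
    audio_seconds = total_bytes / (config.sample_rate * 2)
    ttfb = first_chunk_at - start              # time to first byte
    rtf = elapsed / audio_seconds              # real-time factor
    return ttfb, rtf
```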
### Throughput Optimization
- **Connection Pooling**: Pre-create client connections
- **Concurrent Sessions**: Support multiple parallel synthesis requests
- **Batch Processing**: Group small text segments when possible
- **Format Selection**: Use PCM for lowest processing overhead
### Resource Management
- **Memory**: Process audio chunks immediately, avoid accumulation
- **Connections**: Use context managers for automatic cleanup
- **Error Recovery**: Implement exponential backoff for reconnections (a minimal sketch follows)
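A minimal backoff sketch, assuming transient failures surface as `cosyvoice.ConnectionError`; retry counts and delays are illustrative.
```python
# Retry with exponential backoff around a single synthesis call.
import asyncio
import cosyvoice

async def synthesize_with_retry(text, config, max_attempts=4):
    delay = 1.0
    for attempt in range(1, max_attempts + 1):
        try:
            async with cosyvoice.create_client() as client:
                return await client.collect_audio(text, config)
        except cosyvoice.ConnectionError:
            if attempt == max_attempts:
                raise
            await asyncio.sleep(delay)
            delay *= 2  # 1s, 2s, 4s, ...
```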
## Troubleshooting
### Common Issues
1. **Connection Timeout**
```python
# Increase timeout values
config = cosyvoice.ClientConfig(
connection_timeout=60.0,
read_timeout=120.0
)
```
2. **Speaker Not Found**
```python
# Always check speaker existence
if not await client.speaker.exists(speaker_id):
speaker = await client.speaker.create(prompt_text, audio_url)
speaker_id = speaker.zero_shot_spk_id
```
3. **Audio Format Issues**
```python
# PCM format requires explicit WAV conversion for playback
from cosyvoice.utils.audio import write_wav_file
write_wav_file(pcm_data, "output.wav", sample_rate=22050)
```
### Debug Logging
```python
import logging
logging.basicConfig(level=logging.DEBUG)
# Enable detailed WebSocket and HTTP logging
logger = logging.getLogger("cosyvoice")
logger.setLevel(logging.DEBUG)
```
## Support & Community
- **Examples**: Complete integration samples in the `/examples` directory
## License
This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.
## Contributing
We welcome contributions! Please see our [Contributing Guide](CONTRIBUTING.md) for details on how to submit pull requests, report issues, and suggest improvements.