cosyvoice-client

| Field | Value |
|-------|-------|
| Name | cosyvoice-client |
| Version | 1.0.1 |
| Summary | Asynchronous streaming TTS client for CosyVoice |
| License | MIT |
| Requires Python | >=3.10 |
| Keywords | async, speech-synthesis, streaming, tts, websocket |
| Upload time | 2025-09-15 01:20:22 |
| Requirements | None recorded |

# CosyVoice Python SDK

[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
[![Code style: black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/psf/black)

**Production-Ready Async TTS Client for CosyVoice Services**

Enterprise-grade asynchronous text-to-speech SDK designed for high-concurrency, low-latency, real-time voice synthesis, making it well suited to building intelligent voice interaction applications.

## Overview

CosyVoice Python SDK is an async-first TTS client library that provides:

- **🚀 Real-time Streaming Synthesis**: Text stream input, audio stream output with minimized time-to-first-byte
- **🎭 Custom Voice Management**: Create unique voices from audio samples with zero-shot voice cloning
- **⚡ High-Performance Async Architecture**: Support thousands of concurrent requests with auto-reconnection
- **🔧 Production-Ready**: Complete error handling, monitoring metrics, and load balancing support
- **📡 Multi-Protocol Support**: WebSocket streaming synthesis + HTTP RESTful voice management
- **🎵 Multi-Format Output**: Support WAV, MP3, PCM and other audio formats

## Quick Start

### Installation

```bash
pip install cosyvoice-client
# or using uv
uv add cosyvoice-client
```

### Authentication & Configuration

The SDK supports multiple authentication and configuration methods:

#### Environment Variables (Recommended)

```bash
export COSYVOICE_BASE_URL="https://api.cosyvoice.com"
export COSYVOICE_API_KEY="your_api_key_here"
```

#### Configuration in Code

```python
import cosyvoice

# Method 1: Using connection string
client = await cosyvoice.create_client(
    "wss://api.cosyvoice.com",
    api_key="your_api_key"
)

# Method 2: Using configuration object
config = cosyvoice.ClientConfig(
    base_url="https://api.cosyvoice.com",
    api_key="your_api_key",
    connection_timeout=30.0
)
client = await cosyvoice.connect_client(config)
```

### Basic Usage Example

```python
import asyncio
import cosyvoice

async def basic_tts_example():
    # Connect to CosyVoice service
    async with cosyvoice.create_client() as client:

        # 1. Create custom speaker
        speaker = await client.speaker.create(
            prompt_text="Hello, this is my voice sample.",
            prompt_audio_path="https://example.com/voice_sample.wav"
        )

        # 2. Configure synthesis parameters
        config = cosyvoice.SynthesisConfig(
            speaker_id=speaker.zero_shot_spk_id,
            mode=cosyvoice.SynthesisMode.ZERO_SHOT,
            speed=1.2,
            output_format=cosyvoice.AudioFormat.WAV
        )

        # 3. Synthesize speech
        audio_data = await client.collect_audio(
            "Welcome to CosyVoice TTS service!",
            config
        )

        # 4. Save audio file
        with open("output.wav", "wb") as f:
            f.write(audio_data)

asyncio.run(basic_tts_example())
```

## API Reference

### Authentication

The SDK uses **Bearer Token** authentication for HTTP APIs and **query parameter token** for WebSocket connections.

| Method | HTTP Header | WebSocket URL Parameter |
|--------|-------------|------------------------|
| Bearer Token | `Authorization: Bearer {token}` | `?token={token}` |
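
For a rough sense of what this looks like on the wire, the snippet below builds both request forms by hand. It is only a sketch: it assumes the `httpx` and `websockets` packages, a placeholder endpoint and token, and it exercises the `/v1/speakers` listing endpoint documented later in this reference. The SDK manages these details for you.

```python
import asyncio

import httpx
import websockets

API_KEY = "your_api_key"                             # placeholder credential
HTTP_BASE = "https://api.cosyvoice.com"              # placeholder endpoint
WS_URL = f"wss://api.cosyvoice.com?token={API_KEY}"  # token as query parameter

async def raw_auth_sketch():
    # HTTP APIs: Bearer token in the Authorization header
    async with httpx.AsyncClient(base_url=HTTP_BASE) as http:
        resp = await http.get(
            "/v1/speakers",
            headers={"Authorization": f"Bearer {API_KEY}"},
        )
        resp.raise_for_status()

    # WebSocket connections: token passed as a URL query parameter
    async with websockets.connect(WS_URL) as ws:
        await ws.ping()  # keep-alive round trip to confirm the connection

asyncio.run(raw_auth_sketch())
```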

### Core Interfaces

#### 1. Client Management

##### Create Client Connection

```python
# Create client with auto-configuration from environment
client = await cosyvoice.create_client()

# Create client with explicit configuration
client = await cosyvoice.create_client(
    endpoint_url="wss://api.cosyvoice.com",
    api_key="your_api_key",
    timeout=30.0
)

# Using context manager (recommended)
async with cosyvoice.create_client() as client:
    # Use client
    pass
```

#### 2. Speaker Management API

##### Create Speaker

Creates a new custom voice from reference audio.

**Request:**
```python
speaker = await client.speaker.create(
    prompt_text="Hello, this is my voice sample.",       # Reference text (1-500 chars)
    prompt_audio_path="https://example.com/sample.wav",  # HTTP/HTTPS URL to the audio file
    zero_shot_spk_id=None,                               # Optional custom ID (auto-generated if not provided)
)
```

**Response:**
```python
class SpeakerInfo:
    zero_shot_spk_id: str     # Speaker unique identifier
    prompt_text: str          # Reference text
    created_at: str           # Creation timestamp (ISO format)
    audio_url: str            # Reference audio URL
```

##### Get Speaker Information

```python
speaker_info = await client.speaker.get_info(speaker_id)
```

##### Update Speaker

```python
await client.speaker.update(
    speaker_id,
    prompt_text="Updated reference text.",                   # Optional: new reference text
    prompt_audio_path="https://example.com/new_sample.wav",  # Optional: new reference audio URL
)
```

##### Delete Speaker

```python
await client.speaker.delete(speaker_id)
```

##### List Speakers

```python
speakers = await client.speaker.list(
    offset=0,    # Pagination offset (default: 0)
    limit=50,    # Maximum results per page (default: 50)
)
```

##### Check Speaker Existence

```python
exists = await client.speaker.exists(speaker_id)
```

#### 3. Speech Synthesis API

##### Synthesis Configuration

```python
config = cosyvoice.SynthesisConfig(
    speaker_id="my_speaker_id",                # Required: speaker ID
    mode=cosyvoice.SynthesisMode.ZERO_SHOT,    # Synthesis mode (default: ZERO_SHOT)
    speed=1.0,                                 # Speed multiplier, 0.5-3.0 (default: 1.0)
    output_format=cosyvoice.AudioFormat.WAV,   # Audio format (default: WAV)
    sample_rate=22050,                         # Sample rate in Hz (default: 22050)
    instruct_text=None,                        # Instruction text (instruct mode only)
    bit_rate=192000,                           # Bit rate for MP3 in bps (default: 192000)
    compression_level=2,                       # Compression level, 0-9 (default: 2)
)
```

**Supported Synthesis Modes:**
- `ZERO_SHOT`: Custom voice cloning mode
- `SFT`: Pre-trained voice mode
- `CROSS_LINGUAL`: Cross-lingual synthesis
- `INSTRUCT`: Natural language instruction mode

**Supported Audio Formats:**
- `WAV`: Uncompressed audio
- `MP3`: Compressed audio with configurable bit rate (see the configuration sketch after this list)
- `PCM`: Raw audio data
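
For example, combining the `INSTRUCT` mode with MP3 output might look like the configuration below. This is a sketch that reuses only the `SynthesisConfig` parameters documented above; the speaker ID and instruction text are placeholders.

```python
# Hypothetical instruct-mode configuration producing 128 kbps MP3
config = cosyvoice.SynthesisConfig(
    speaker_id="my_speaker_id",               # placeholder speaker ID
    mode=cosyvoice.SynthesisMode.INSTRUCT,    # natural-language instruction mode
    instruct_text="Speak slowly, in a warm and calm tone.",
    output_format=cosyvoice.AudioFormat.MP3,  # compressed output
    bit_rate=128000,                          # MP3 bit rate in bps
    speed=1.0,
)
```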

##### Batch Synthesis

Synthesize entire text at once.

```python
audio_data: bytes = await client.collect_audio(
    text,     # Full text to synthesize
    config,   # SynthesisConfig
)
```

##### Streaming Synthesis

Process audio chunks as they arrive for low-latency playback.

```python
async for result in client.synthesize_text(text, config):
    # result.audio_data: bytes - Audio chunk data
    # result.text_index: int - Text segment index
    # result.chunk_index: int - Audio chunk index within text segment

    # Process audio chunk immediately for real-time playback
    await audio_player.play(result.audio_data)
```

##### Text Stream Synthesis

Synthesize text as it arrives (ideal for LLM integration).

```python
async def text_generator():
    # Simulate streaming text from LLM
    sentences = ["Hello", "How are you?", "Welcome to our service!"]
    for sentence in sentences:
        yield sentence
        await asyncio.sleep(0.1)

async for result in client.synthesize_stream(text_generator(), config):
    await audio_player.play(result.audio_data)
```

##### Quick Synthesis

One-shot synthesis with automatic speaker creation.

```python
audio_data = await client.quick_synthesize(
    text="Welcome to CosyVoice!",                            # Text to synthesize
    speaker_prompt_text="Hello, this is my voice sample.",   # Reference text for the auto-created speaker
    speaker_audio_file="voice_sample.wav",                   # Reference audio file
    speed=1.0,                                               # Speed multiplier (default: 1.0)
    output_file=None,                                        # Optional path to also save the audio
)
```

### Data Models

#### SynthesisResult

```python
class SynthesisResult:
    audio_data: bytes      # Audio chunk data
    text_index: int        # Text segment index
    chunk_index: int       # Audio chunk index
    session_id: str        # Synthesis session ID
    metadata: dict         # Additional metadata
```
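
When consuming the streaming API directly, the `text_index`/`chunk_index` pair is enough to keep audio in order. The helper below is a sketch (not part of the SDK) that groups chunks per text segment using only the fields documented above.

```python
from collections import defaultdict

async def collect_by_segment(client, text, config):
    """Group streamed audio chunks by text segment (illustrative helper)."""
    segments: dict[int, list[bytes]] = defaultdict(list)

    async for result in client.synthesize_text(text, config):
        # Each chunk carries its segment index and chunk index
        segments[result.text_index].append(result.audio_data)

    # Concatenate chunks in arrival order for each text segment
    return {index: b"".join(chunks) for index, chunks in sorted(segments.items())}
```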

#### Error Handling

```python
# Exception hierarchy
CosyVoiceError                 # Base exception
├── ConnectionError            # Network connection issues
├── AuthenticationError        # Authentication failures
├── SpeakerError              # Speaker management errors
├── SynthesisError            # Speech synthesis errors
├── InvalidStateError         # Client state errors
└── ValidationError           # Input validation errors

# Error handling example
try:
    async with cosyvoice.create_client() as client:
        audio = await client.collect_audio("Hello world", config)

except cosyvoice.ConnectionError as e:
    print(f"Connection failed: {e}")
except cosyvoice.SpeakerError as e:
    print(f"Speaker error: {e}")
except cosyvoice.SynthesisError as e:
    print(f"Synthesis error: {e}")
```

## Advanced Usage

### Production Integration Patterns

#### 1. High-Concurrency Server Integration

```python
import asyncio
import cosyvoice
from typing import Dict
import uuid

class ProductionTTSService:
    def __init__(self, endpoint: str, api_key: str):
        self.endpoint = endpoint
        self.api_key = api_key
        self.active_sessions: Dict[str, asyncio.Task] = {}
        self.session_semaphore = asyncio.Semaphore(1000)  # Max concurrent sessions

    async def create_session_client(self) -> cosyvoice.StreamClient:
        """Create dedicated client for each session"""
        return await cosyvoice.create_client(self.endpoint, api_key=self.api_key)

    async def handle_user_request(self, user_id: str, text: str, config: cosyvoice.SynthesisConfig):
        """Handle individual user TTS request"""
        async with self.session_semaphore:
            session_id = f"{user_id}_{uuid.uuid4().hex[:8]}"
            self.active_sessions[session_id] = asyncio.current_task()

            try:
                async with self.create_session_client() as client:
                    # Stream synthesis results
                    async for result in client.synthesize_text(text, config):
                        # Send to user immediately (WebSocket/SSE/etc.)
                        await self.send_to_user(user_id, result.audio_data)

            except Exception as e:
                await self.handle_error(user_id, e)
            finally:
                # Cleanup
                if session_id in self.active_sessions:
                    del self.active_sessions[session_id]

# FastAPI integration example
from fastapi import FastAPI, WebSocket

app = FastAPI()
tts_service = ProductionTTSService("wss://api.cosyvoice.com", "your_key")

@app.websocket("/tts/{user_id}")
async def tts_websocket(websocket: WebSocket, user_id: str):
    await websocket.accept()

    try:
        while True:
            # Receive TTS request
            data = await websocket.receive_json()

            config = cosyvoice.SynthesisConfig(
                speaker_id=data["speaker_id"],
                speed=data.get("speed", 1.0)
            )

            # Process in background
            task = asyncio.create_task(
                tts_service.handle_user_request(user_id, data["text"], config)
            )
            tts_service.active_sessions[f"{user_id}_current"] = task

    except Exception as e:
        print(f"WebSocket error: {e}")
```

#### 2. LLM + TTS Integration

```python
async def llm_with_voice_response(user_question: str, voice_config: cosyvoice.SynthesisConfig):
    """Stream LLM response directly to voice synthesis"""

    async def llm_text_stream():
        # Replace with your LLM client (OpenAI, Anthropic, etc.)
        async for text_chunk in your_llm_client.stream(user_question):
            yield text_chunk

    async with cosyvoice.create_client() as tts_client:
        # Stream voice synthesis from LLM output
        async for audio_result in tts_client.synthesize_stream(llm_text_stream(), voice_config):
            # Send audio to user in real-time
            await send_audio_to_user(audio_result.audio_data)
```


### Performance Monitoring

```python
import time
from prometheus_client import Counter, Histogram

# Metrics
tts_requests_total = Counter('cosyvoice_requests_total', 'Total TTS requests')
tts_duration_seconds = Histogram('cosyvoice_duration_seconds', 'TTS request duration')
tts_errors_total = Counter('cosyvoice_errors_total', 'Total TTS errors', ['error_type'])

async def monitored_synthesis(client: cosyvoice.StreamClient, text: str, config: cosyvoice.SynthesisConfig):
    """TTS with monitoring metrics"""
    tts_requests_total.inc()
    start_time = time.time()

    try:
        with tts_duration_seconds.time():
            audio_data = await client.collect_audio(text, config)
            return audio_data

    except cosyvoice.ConnectionError:
        tts_errors_total.labels(error_type='connection').inc()
        raise
    except cosyvoice.SynthesisError:
        tts_errors_total.labels(error_type='synthesis').inc()
        raise
```

### Environment Variables Reference

| Variable | Default | Description |
|----------|---------|-------------|
| `COSYVOICE_BASE_URL` | `http://localhost:8080` | Service endpoint URL |
| `COSYVOICE_API_KEY` | None | API authentication key |
| `COSYVOICE_CONNECTION_TIMEOUT` | `30.0` | Connection timeout (seconds) |
| `COSYVOICE_READ_TIMEOUT` | `60.0` | Read timeout (seconds) |
| `COSYVOICE_MAX_RECONNECT_ATTEMPTS` | `3` | Maximum reconnection attempts |
| `COSYVOICE_PING_INTERVAL` | `20.0` | WebSocket ping interval (seconds) |
| `COSYVOICE_PING_TIMEOUT` | `10.0` | WebSocket ping timeout (seconds) |
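
For tests or one-off scripts, the same settings can be exported from Python before the client is created; this sketch uses only the variables listed above.

```python
import os

# Set overrides before cosyvoice.create_client() reads its configuration from the environment
os.environ["COSYVOICE_BASE_URL"] = "https://api.cosyvoice.com"
os.environ["COSYVOICE_CONNECTION_TIMEOUT"] = "60.0"
os.environ["COSYVOICE_MAX_RECONNECT_ATTEMPTS"] = "5"
```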

### Protocol Specifications

#### WebSocket Protocol

The SDK communicates with CosyVoice servers using a structured WebSocket protocol:

**Message Format:**
```json
{
  "header": {
    "version": "1.0",
    "message_type": "TEXT_REQUEST",
    "timestamp": "2024-01-01T12:00:00Z",
    "sequence": 1
  },
  "payload": {
    "session_id": "session_123",
    "params": {
      "text": "Hello world",
      "mode": "zero_shot",
      "speed": 1.0,
      "output_format": "wav"
    }
  }
}
```

**Message Types:**
- Client → Server: `CONNECT_REQUEST`, `SESSION_REQUEST`, `TEXT_REQUEST`, `SYNTHESIS_END`
- Server → Client: `AUDIO_RESPONSE`, `AUDIO_COMPLETE`, `ERROR_RESPONSE`
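
The SDK drives this protocol internally. Purely as a point of reference, the sketch below sends a single `TEXT_REQUEST` by hand; it assumes the `websockets` package, a placeholder endpoint, and the token query parameter from the Authentication section.

```python
import asyncio
import json
from datetime import datetime, timezone

import websockets

async def send_text_request():
    uri = "wss://api.cosyvoice.com?token=your_api_key"  # placeholder endpoint and token

    message = {
        "header": {
            "version": "1.0",
            "message_type": "TEXT_REQUEST",
            "timestamp": datetime.now(timezone.utc).isoformat(),
            "sequence": 1,
        },
        "payload": {
            "session_id": "session_123",
            "params": {
                "text": "Hello world",
                "mode": "zero_shot",
                "speed": 1.0,
                "output_format": "wav",
            },
        },
    }

    async with websockets.connect(uri) as ws:
        await ws.send(json.dumps(message))
        reply = await ws.recv()  # expect an AUDIO_RESPONSE or ERROR_RESPONSE frame
        print(type(reply), len(reply))

asyncio.run(send_text_request())
```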

#### HTTP API Endpoints

**Speaker Management:**
```
POST   /v1/speakers              # Create speaker
GET    /v1/speakers/{id}         # Get speaker info
PUT    /v1/speakers/{id}         # Update speaker
DELETE /v1/speakers/{id}         # Delete speaker
GET    /v1/speakers              # List speakers
```
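
For debugging outside the SDK, these endpoints can be exercised directly over HTTP. The sketch below assumes the `httpx` package and that the request body mirrors the SDK's `prompt_text`/`prompt_audio_path` parameters; the actual field names may differ, so treat it only as an illustration of the endpoints and auth scheme.

```python
import asyncio

import httpx

async def speakers_api_sketch():
    headers = {"Authorization": "Bearer your_api_key"}  # placeholder token
    payload = {
        "prompt_text": "Hello, this is my voice sample.",             # assumed field name
        "prompt_audio_path": "https://example.com/voice_sample.wav",  # assumed field name
    }
    async with httpx.AsyncClient(base_url="https://api.cosyvoice.com") as http:
        # POST /v1/speakers -- create a speaker from reference audio
        created = await http.post("/v1/speakers", json=payload, headers=headers)
        created.raise_for_status()

        # GET /v1/speakers -- list existing speakers
        listing = await http.get("/v1/speakers", headers=headers)
        listing.raise_for_status()
        return created.json(), listing.json()

asyncio.run(speakers_api_sketch())
```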

## Development

### Environment Setup

```bash
# Clone repository
git clone https://github.com/cosyvoice/cosyvoice-python.git
cd cosyvoice-python

# Install uv package manager
curl -LsSf https://astral.sh/uv/install.sh | sh

# Install dependencies
uv sync --dev
```

### Running Tests

```bash
# Run all tests
uv run pytest

# Run with coverage
uv run pytest --cov=cosyvoice --cov-report=html

# Run specific test types
uv run pytest -m unit           # Unit tests only
uv run pytest -m integration   # Integration tests only
uv run pytest -m slow          # Network-dependent tests
```

### Code Quality

```bash
# Format code
uv run black cosyvoice tests examples
uv run isort cosyvoice tests examples

# Lint code
uv run ruff check cosyvoice tests examples
uv run ruff check --fix cosyvoice tests examples

# Type checking
uv run mypy cosyvoice
```

### Running Examples

```bash
# Basic synthesis example
uv run python examples/basic_synthesis.py

# Real-time streaming example
uv run python examples/realtime_streaming.py

# Speaker management example
uv run python examples/speaker_management.py
```

## Performance Guidelines

### Latency Optimization

- **TTFB Target**: < 300ms for optimal user experience (see the measurement sketch after this list)
- **RTF Target**: < 0.3 for real-time performance
- **Connection Reuse**: Maintain persistent WebSocket connections
- **Streaming**: Use `synthesize_stream()` for lowest latency
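
A rough way to check these targets against a live endpoint is to time the streaming generator directly. The helper below is a sketch: it uses only the streaming API documented earlier and assumes 16-bit mono PCM output when converting bytes to seconds of audio.

```python
import time

async def measure_latency(client, text, config):
    """Return (TTFB in ms, RTF) for one streaming request (illustrative helper)."""
    start = time.perf_counter()
    first_chunk_at = None
    audio_bytes = 0

    async for result in client.synthesize_text(text, config):
        if first_chunk_at is None:
            first_chunk_at = time.perf_counter()  # time-to-first-byte
        audio_bytes += len(result.audio_data)

    elapsed = time.perf_counter() - start
    # Assumes 16-bit mono PCM at config.sample_rate; adjust for WAV headers or MP3
    audio_seconds = audio_bytes / (config.sample_rate * 2)
    ttfb_ms = (first_chunk_at - start) * 1000
    rtf = elapsed / audio_seconds if audio_seconds else float("inf")
    return ttfb_ms, rtf
```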

### Throughput Optimization

- **Connection Pooling**: Pre-create client connections (see the pool sketch after this list)
- **Concurrent Sessions**: Support multiple parallel synthesis requests
- **Batch Processing**: Group small text segments when possible
- **Format Selection**: Use PCM for lowest processing overhead
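
One way to apply the pooling advice is to pre-create a fixed set of connected clients and hand them out per request. This is a minimal sketch built only on `create_client()` and `collect_audio()`; a production pool would also need health checks and shutdown handling.

```python
import asyncio

import cosyvoice

class ClientPool:
    """Minimal pre-created client pool (illustrative, not part of the SDK)."""

    def __init__(self, size: int = 8):
        self.size = size
        self._pool: asyncio.Queue = asyncio.Queue()

    async def start(self):
        # Pre-create connections so requests never pay the connection cost
        for _ in range(self.size):
            self._pool.put_nowait(await cosyvoice.create_client())

    async def synthesize(self, text, config) -> bytes:
        client = await self._pool.get()    # borrow a connected client
        try:
            return await client.collect_audio(text, config)
        finally:
            self._pool.put_nowait(client)  # return it for reuse
```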

### Resource Management

- **Memory**: Process audio chunks immediately, avoid accumulation
- **Connections**: Use context managers for automatic cleanup
- **Error Recovery**: Implement exponential backoff for reconnections
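
The exponential-backoff recommendation can be wrapped around client creation. The sketch below retries `create_client()` with capped, jittered delays and re-raises after the final attempt.

```python
import asyncio
import random

import cosyvoice

async def connect_with_backoff(max_attempts: int = 5):
    """Create a client, retrying with exponential backoff and jitter (sketch)."""
    for attempt in range(max_attempts):
        try:
            return await cosyvoice.create_client()
        except cosyvoice.ConnectionError:
            if attempt == max_attempts - 1:
                raise
            # 1s, 2s, 4s, ... capped at 30s, plus jitter to avoid synchronized retries
            delay = min(2 ** attempt, 30) + random.uniform(0, 1)
            await asyncio.sleep(delay)
```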

## Troubleshooting

### Common Issues

1. **Connection Timeout**
   ```python
   # Increase timeout values
   config = cosyvoice.ClientConfig(
       connection_timeout=60.0,
       read_timeout=120.0
   )
   ```

2. **Speaker Not Found**
   ```python
   # Always check speaker existence
   if not await client.speaker.exists(speaker_id):
       speaker = await client.speaker.create(prompt_text, audio_url)
       speaker_id = speaker.zero_shot_spk_id
   ```

3. **Audio Format Issues**
   ```python
   # PCM format requires explicit WAV conversion for playback
   from cosyvoice.utils.audio import write_wav_file
   write_wav_file(pcm_data, "output.wav", sample_rate=22050)
   ```

### Debug Logging

```python
import logging
logging.basicConfig(level=logging.DEBUG)

# Enable detailed WebSocket and HTTP logging
logger = logging.getLogger("cosyvoice")
logger.setLevel(logging.DEBUG)
```

## Support & Community

- **Examples**: Complete integration samples in the `/examples` directory
- **Documentation**: https://cosyvoice.github.io/cosyvoice-python
- **Issues**: https://github.com/cosyvoice/cosyvoice-python/issues
- **Repository**: https://github.com/cosyvoice/cosyvoice-python

## License

This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.

## Contributing

We welcome contributions! Please see our [Contributing Guide](CONTRIBUTING.md) for details on how to submit pull requests, report issues, and suggest improvements.



            
