Azure AI VoiceLive client library for Python
============================================

This package provides a **real-time, speech-to-speech** client for Azure AI VoiceLive.
It opens a WebSocket session to stream microphone audio to the service and receive
typed server events (including audio) for responsive, interruptible conversations.

> **Status:** General Availability (GA). This is a stable release suitable for production use.

> **Important:** As of version 1.0.0, this SDK is **async-only**. The synchronous API has been removed to focus exclusively on async patterns. All examples and samples use `async`/`await` syntax.

---

Getting started
---------------

### Prerequisites

- **Python 3.9+**
- An **Azure subscription**
- A **VoiceLive** resource and endpoint
- A working **microphone** and **speakers/headphones** if you run the voice samples

### Install

Install the stable GA version:

```bash
# Base install (core client only)
python -m pip install azure-ai-voicelive

# For asynchronous streaming (uses aiohttp)
python -m pip install "azure-ai-voicelive[aiohttp]"

# For voice samples (includes audio processing)
python -m pip install "azure-ai-voicelive[aiohttp]" pyaudio python-dotenv
```

The SDK provides async-only WebSocket connections using `aiohttp` for optimal performance and reliability.

### Authenticate

You can authenticate with an **API key** or an **Azure Active Directory (AAD) token**.

#### API Key Authentication (Quick Start)

Set environment variables in a `.env` file or directly in your environment:

```bash
# In your .env file or environment variables
AZURE_VOICELIVE_API_KEY="your-api-key"
AZURE_VOICELIVE_ENDPOINT="your-endpoint"
```

Then, use the key in your code:

```python
import asyncio
from azure.core.credentials import AzureKeyCredential
from azure.ai.voicelive.aio import connect

async def main():
    async with connect(
        endpoint="your-endpoint",
        credential=AzureKeyCredential("your-api-key"),
        model="gpt-4o-realtime-preview"
    ) as connection:
        # Your async code here
        pass

asyncio.run(main())
```
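If you keep those values in a `.env` file, here is a minimal sketch using `python-dotenv` (installed above alongside the samples) to load them before connecting:

```python
import asyncio
import os

from dotenv import load_dotenv
from azure.core.credentials import AzureKeyCredential
from azure.ai.voicelive.aio import connect

load_dotenv()  # picks up .env from the current working directory

async def main():
    async with connect(
        endpoint=os.environ["AZURE_VOICELIVE_ENDPOINT"],
        credential=AzureKeyCredential(os.environ["AZURE_VOICELIVE_API_KEY"]),
        model="gpt-4o-realtime-preview",
    ) as connection:
        pass  # your async code here

asyncio.run(main())
```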

#### AAD Token Authentication

For production applications, AAD authentication is recommended:

```python
import asyncio
from azure.identity.aio import DefaultAzureCredential
from azure.ai.voicelive.aio import connect

async def main():
    # Use the credential as an async context manager so its transport is closed
    async with DefaultAzureCredential() as credential:
        async with connect(
            endpoint="your-endpoint",
            credential=credential,
            model="gpt-4o-realtime-preview"
        ) as connection:
            # Your async code here
            pass

asyncio.run(main())
```

---

Key concepts
------------

- **VoiceLiveConnection** – Manages an active async WebSocket connection to the service
- **Session Management** – Configure conversation parameters:
  - **SessionResource** – Update session parameters (voice, formats, VAD) with async methods
  - **RequestSession** – Strongly-typed session configuration
  - **ServerVad** – Configure voice activity detection
  - **AzureStandardVoice** – Configure voice settings
- **Audio Handling**:
  - **InputAudioBufferResource** – Manage audio input to the service with async methods
  - **OutputAudioBufferResource** – Control audio output from the service with async methods
- **Conversation Management**:
  - **ResponseResource** – Create or cancel model responses with async methods
  - **ConversationResource** – Manage conversation items with async methods
- **Error Handling**: 
  - **ConnectionError** – Base exception for WebSocket connection errors
  - **ConnectionClosed** – Raised when WebSocket connection is closed
- **Strongly-Typed Events** – Process service events with type safety:
  - `SESSION_UPDATED`, `RESPONSE_AUDIO_DELTA`, `RESPONSE_DONE`
  - `INPUT_AUDIO_BUFFER_SPEECH_STARTED`, `INPUT_AUDIO_BUFFER_SPEECH_STOPPED`
  - `ERROR`, and more
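
As a rough sketch of how these pieces fit together on an open connection — the method names below follow the realtime API pattern implied by the resources above (`update`, `append`, `create`, `cancel`), so treat the exact signatures as assumptions and check the package reference:

```python
# Hedged sketch only: apart from session.update (shown in the minimal example
# below), the signatures here are assumed rather than verbatim from the SDK.
async def handle_turn(connection, session, mic_chunk: bytes):
    # SessionResource — apply a strongly-typed RequestSession
    await connection.session.update(session=session)

    # InputAudioBufferResource — stream a chunk of microphone audio
    await connection.input_audio_buffer.append(audio=mic_chunk)

    # ResponseResource — request a model response, or cancel one mid-flight
    await connection.response.create()
    await connection.response.cancel()
```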

---

Examples
--------

### Basic Voice Assistant (Featured Sample)

The Basic Voice Assistant sample demonstrates full-featured voice interaction with:

- Real-time speech streaming
- Server-side voice activity detection  
- Interruption handling
- High-quality audio processing

```bash
# Run the basic voice assistant sample
# Requires [aiohttp] for async
python samples/basic_voice_assistant_async.py

# With custom parameters
python samples/basic_voice_assistant_async.py --model gpt-4o-realtime-preview --voice alloy --instructions "You're a helpful assistant"
```

### Minimal example

```python
import asyncio
from azure.core.credentials import AzureKeyCredential
from azure.ai.voicelive.aio import connect
from azure.ai.voicelive.models import (
    RequestSession, Modality, InputAudioFormat, OutputAudioFormat, ServerVad, ServerEventType
)

API_KEY = "your-api-key"
ENDPOINT = "wss://your-endpoint.com/openai/realtime"
MODEL = "gpt-4o-realtime-preview"

async def main():
    async with connect(
        endpoint=ENDPOINT,
        credential=AzureKeyCredential(API_KEY),
        model=MODEL,
    ) as conn:
        session = RequestSession(
            modalities=[Modality.TEXT, Modality.AUDIO],
            instructions="You are a helpful assistant.",
            input_audio_format=InputAudioFormat.PCM16,
            output_audio_format=OutputAudioFormat.PCM16,
            turn_detection=ServerVad(
                threshold=0.5, 
                prefix_padding_ms=300, 
                silence_duration_ms=500
            ),
        )
        await conn.session.update(session=session)

        # Process events
        async for evt in conn:
            print(f"Event: {evt.type}")
            if evt.type == ServerEventType.RESPONSE_DONE:
                break

asyncio.run(main())
```

Available Voice Options
-----------------------

### Azure Neural Voices

```python
# Use Azure Neural voices
from azure.ai.voicelive.models import AzureStandardVoice

voice_config = AzureStandardVoice(
    name="en-US-AvaNeural",  # Or another voice name
    type="azure-standard"
)
```

Popular voices include:

- `en-US-AvaNeural` - Female, natural and professional
- `en-US-JennyNeural` - Female, conversational
- `en-US-GuyNeural` - Male, professional

### OpenAI Voices

```python
# Use OpenAI voices (as string)
voice_config = "alloy"  # Or another OpenAI voice
```

Available OpenAI voices:

- `alloy` - Versatile, neutral
- `echo` - Precise, clear
- `fable` - Animated, expressive
- `onyx` - Deep, authoritative
- `nova` - Warm, conversational
- `shimmer` - Optimistic, friendly
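
Whichever option you choose, pass it as the session's voice when configuring the conversation — a sketch, assuming `RequestSession` exposes a `voice` field (the samples' `--voice` flag suggests it does):

```python
from azure.ai.voicelive.models import AzureStandardVoice, Modality, RequestSession

session = RequestSession(
    modalities=[Modality.TEXT, Modality.AUDIO],
    instructions="You are a helpful assistant.",
    # Typed Azure voice — or a plain string such as "alloy" for OpenAI voices
    voice=AzureStandardVoice(name="en-US-AvaNeural", type="azure-standard"),
)
```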

---

Handling Events
---------------

```python
async for event in connection:
    if event.type == ServerEventType.SESSION_UPDATED:
        print(f"Session ready: {event.session.id}")
        # Start audio capture
        
    elif event.type == ServerEventType.INPUT_AUDIO_BUFFER_SPEECH_STARTED:
        print("User started speaking")
        # Stop playback and cancel any current response
        
    elif event.type == ServerEventType.RESPONSE_AUDIO_DELTA:
        # Play the audio chunk
        audio_bytes = event.delta
        
    elif event.type == ServerEventType.ERROR:
        print(f"Error: {event.error.message}")
```
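
To actually hear the audio, write each `RESPONSE_AUDIO_DELTA` chunk to an output stream. A minimal PyAudio sketch, assuming `event.delta` arrives as raw 16-bit mono PCM at 24 kHz — verify the rate against your session's `output_audio_format`:

```python
import pyaudio
from azure.ai.voicelive.models import ServerEventType

pa = pyaudio.PyAudio()
# 16-bit mono output; the 24 kHz rate is an assumption for PCM16 — adjust it
# to match the format you configured on the session.
stream = pa.open(format=pyaudio.paInt16, channels=1, rate=24000, output=True)

async def play_audio(connection):
    async for event in connection:
        if event.type == ServerEventType.RESPONSE_AUDIO_DELTA:
            stream.write(event.delta)  # blocking write of the raw PCM chunk
```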

---

Troubleshooting
---------------

### Connection Issues

- **WebSocket connection errors (1006/timeout):**  
  Verify `AZURE_VOICELIVE_ENDPOINT`, network rules, and that your credential has access.

- **Missing WebSocket dependencies:**  
  If you see import errors, make sure you have installed the package:

  ```bash
  pip install "azure-ai-voicelive[aiohttp]"
  ```

- **Auth failures:**  
  For API key, double-check `AZURE_VOICELIVE_API_KEY`. For AAD, ensure the identity is authorized.

### Audio Device Issues

- **No microphone/speaker detected:**  
  Check device connections and permissions. On headless CI environments, audio samples can't run.

- **Audio library installation problems:**  
  On Linux/macOS you may need PortAudio:

  ```bash
  # Debian/Ubuntu
  sudo apt-get install -y portaudio19-dev libasound2-dev
  # macOS (Homebrew)
  brew install portaudio
  ```

### Enable Verbose Logging

```python
import logging
logging.basicConfig(level=logging.DEBUG)
```
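
To keep third-party noise down, you can raise the level on this package's logger only — a sketch relying on the standard `logging` hierarchy (Azure SDK loggers follow the module path):

```python
import logging

logging.basicConfig(level=logging.WARNING)  # keep other libraries quiet
logging.getLogger("azure.ai.voicelive").setLevel(logging.DEBUG)  # verbose SDK logs
```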

---

Next steps
----------

1. **Run the featured sample:**
   - Try `samples/basic_voice_assistant_async.py` for a complete voice assistant implementation

2. **Customize your implementation:**
   - Experiment with different voices and parameters
   - Add custom instructions for specialized assistants
   - Integrate with your own audio capture/playback systems

3. **Advanced scenarios:**
   - Add function calling support
   - Implement tool usage
   - Create multi-turn conversations with history

4. **Explore other samples:**
   - Check the `samples/` directory for specialized examples
   - See `samples/README.md` for a full list of samples

---

Contributing
------------

This project follows the Azure SDK guidelines. If you'd like to contribute:

1. Fork the repo and create a feature branch
2. Run linters and tests locally
3. Submit a pull request with a clear description of the change

---

Release notes
-------------

Changelogs are available in the package directory.

---

License
-------

This project is released under the **MIT License**.

# Release History

## 1.0.0 (2025-10-01)

### Features Added

- **Enhanced WebSocket Connection Options**: Significantly improved WebSocket connection configuration with transport-agnostic design:
  - Added new timeout configuration options: `receive_timeout`, `close_timeout`, and `handshake_timeout` for fine-grained control
  - Enhanced `compression` parameter to support both boolean and integer types for advanced zlib window configuration
  - Added `vendor_options` parameter for implementation-specific options passthrough (escape hatch for advanced users)
  - Improved documentation with clearer descriptions for all connection parameters
  - Better support for common aliases from other WebSocket ecosystems (`max_size`, `ping_interval`, etc.)
  - More robust option mapping with proper type conversion and safety checks
- **Enhanced Type Safety**: Improved type safety for content parts with proper enum usage:
  - `InputAudioContentPart`, `InputTextContentPart`, and `OutputTextContentPart` now use `ContentPartType` enum values instead of string literals
  - Better IntelliSense support and compile-time type checking for content part discriminators

### Breaking Changes

- **Improved Naming Conventions**: Updated model and enum names for better clarity and consistency:
  - `OAIVoice` enum renamed to `OpenAIVoiceName` for more descriptive naming
  - `ToolChoiceObject` model renamed to `ToolChoiceSelection` for better semantic meaning
  - `ToolChoiceFunctionObject` model renamed to `ToolChoiceFunctionSelection` for consistency
  - Updated type unions and imports to reflect the new naming conventions
  - Cross-language package mappings updated to maintain compatibility across SDKs
- **Session Model Architecture**: Separated `ResponseSession` and `RequestSession` models for better design clarity:
  - `ResponseSession` no longer inherits from `RequestSession` and now inherits directly from `_Model`
  - All session configuration fields are now explicitly defined in `ResponseSession` instead of being inherited
  - This provides clearer separation of concerns between request and response session configurations
  - May affect type checking and code that relied on the previous inheritance relationship
- **Model Cleanup**: Removed unused `AgentConfig` model and related fields from the public API:
  - `AgentConfig` class has been completely removed from imports and exports
  - `agent` field removed from `ResponseSession` model (including constructor parameter)
  - Updated cross-language package mappings to reflect the removal
- **Model Naming Convention Update**: Renamed `EOUDetection` to `EouDetection` for better naming consistency:
  - Class name changed from `EOUDetection` to `EouDetection` 
  - All inheritance relationships updated: `AzureSemanticDetection`, `AzureSemanticDetectionEn`, and `AzureSemanticDetectionMultilingual` now inherit from `EouDetection`
  - Type annotations updated in `AzureSemanticVad`, `AzureSemanticVadEn`, `AzureSemanticVadMultilingual`, and `ServerVad` classes
  - Import statements and exports updated to reflect the new naming
- **Enhanced Content Part Type Safety**: Content part discriminators now use enum values instead of string literals:
  - `InputAudioContentPart.type` now uses `ContentPartType.INPUT_AUDIO` instead of `"input_audio"`
  - `InputTextContentPart.type` now uses `ContentPartType.INPUT_TEXT` instead of `"input_text"`  
  - `OutputTextContentPart.type` now uses `ContentPartType.TEXT` instead of `"text"`

### Other Changes

- Initial GA release

## 1.0.0b5 (2025-09-26)

### Features Added

- **Enhanced Semantic Detection Type Safety**: Added new `EouThresholdLevel` enum for better type safety in end-of-utterance detection:
  - `LOW` for low sensitivity threshold level
  - `MEDIUM` for medium sensitivity threshold level  
  - `HIGH` for high sensitivity threshold level
  - `DEFAULT` for default sensitivity threshold level
- **Improved Semantic Detection Configuration**: Enhanced semantic detection classes with better type annotations:
  - `threshold_level` parameter now supports both string values and `EouThresholdLevel` enum
  - Cleaner type definitions for `AzureSemanticDetection`, `AzureSemanticDetectionEn`, and `AzureSemanticDetectionMultilingual`
  - Improved documentation for threshold level parameters
- **Comprehensive Unit Test Suite**: Added extensive unit test coverage with 200+ test cases covering:
  - All enum types and their functionality
  - Model creation, validation, and serialization
  - Async connection functionality with proper mocking
  - Client event handling and workflows
  - Voice configuration across all supported types
  - Message handling with content part hierarchy
  - Integration scenarios and real-world usage patterns
  - Recent changes validation and backwards compatibility
- **API Version Update**: Updated to API version `2025-10-01` (from `2025-05-01-preview`)
- **Enhanced Type Safety**: Added new `AzureVoiceType` enum with values for better Azure voice type categorization:
  - `AZURE_CUSTOM` for custom voice configurations
  - `AZURE_STANDARD` for standard voice configurations  
  - `AZURE_PERSONAL` for personal voice configurations
- **Improved Message Handling**: Added `MessageRole` enum for better role type safety in message items
- **Enhanced Model Documentation**: Comprehensive documentation improvements across all models:
  - Added detailed docstrings for model classes and their parameters
  - Enhanced enum value documentation with descriptions
  - Improved type annotations and parameter descriptions
- **Enhanced Semantic Detection**: Added improved configuration options for all semantic detection classes:
  - Added `threshold_level` parameter with options: `"low"`, `"medium"`, `"high"`, `"default"` (recommended over deprecated `threshold`)
  - Added `timeout_ms` parameter for timeout configuration in milliseconds (recommended over deprecated `timeout`)
- **Video Background Support**: Added new `Background` model for video background customization:
  - Support for solid color backgrounds in hex format (e.g., `#00FF00FF`)
  - Support for image URL backgrounds
  - Mutually exclusive color and image URL options
- **Enhanced Video Parameters**: Extended `VideoParams` model with:
  - `background` parameter for configuring video backgrounds using the new `Background` model
  - `gop_size` parameter for Group of Pictures (GOP) size control, affecting compression efficiency and seeking performance
- **Improved Type Safety**: Added `TurnDetectionType` enum for better type safety and IntelliSense support
- **Package Structure Modernization**: Simplified package initialization with namespace package support
- **Enhanced Error Handling**: Added `ConnectionError` and `ConnectionClosed` exception classes to the async API for better WebSocket error management

### Breaking Changes

- **Cross-Language Package Identity Update**: Updated package ID from `VoiceLive` to `VoiceLive.WebSocket` for better cross-language consistency
- **Model Refactoring**: 
  - Renamed `UserContentPart` to `MessageContentPart` for clearer content part hierarchy
  - All message items now require a `content` field with list of `MessageContentPart` objects
  - `OutputTextContentPart` now inherits from `MessageContentPart` instead of being standalone
- **Enhanced Type Safety**: 
  - Azure voice classes now use `AzureVoiceType` enum discriminators instead of string literals
  - Message role discriminators now use `MessageRole` enum values for better type safety
- **Removed Deprecated Parameters**: Completely removed deprecated parameters from semantic detection classes:
  - Removed `threshold` parameter from all semantic detection classes (`AzureSemanticDetection`, `AzureSemanticDetectionEn`, `AzureSemanticDetectionMultilingual`)
  - Removed `timeout` parameter from all semantic detection classes
  - Users must now use `threshold_level` and `timeout_ms` parameters respectively
- **Removed Synchronous API**: Completely removed synchronous WebSocket operations to focus exclusively on async patterns:
  - Removed sync `connect()` function and sync `VoiceLiveConnection` class from main patch implementation
  - Removed sync `basic_voice_assistant.py` sample (only async version remains)
  - Simplified sync patch to minimal structure with empty exports
  - All functionality now available only through async patterns
- **Updated Dependencies**: Modified package dependencies to reflect async-only architecture:
  - Moved `aiohttp>=3.9.0,<4.0.0` from optional to required dependency
  - Removed `websockets` optional dependency as sync API no longer exists
  - Removed optional dependency groups `websockets`, `aiohttp`, and `all-websockets`
- **Model Rename**:
  - Renamed `AudioInputTranscriptionSettings` to `AudioInputTranscriptionOptions` for consistency with naming conventions
  - Renamed `AzureMultilingualSemanticVad` to `AzureSemanticVadMultilingual` for naming consistency with other multilingual variants
- **Enhanced Type Safety**: Turn detection discriminator types now use enum values instead of string literals for better type safety

### Bugs Fixed

- **Serialization Improvements**: Fixed type casting issue in serialization utilities for better enum handling and type safety

### Other Changes

- **Testing Infrastructure**: Added comprehensive unit test suite with extensive coverage:
  - 8 main test files with 200+ individual test methods
  - Tests for all enums, models, async operations, client events, voice configurations, and message handling
  - Integration tests covering real-world scenarios and recent changes
  - Proper mocking for async WebSocket connections
  - Backwards compatibility validation
  - Test coverage for all recent changes and enhancements
- **API Documentation**: Updated API view properties to reflect model structure changes, new enums, and cross-language package identity
- **Documentation Updates**: Comprehensive updates to all markdown documentation:
  - Updated README.md to reflect async-only nature with updated examples and installation instructions
  - Updated samples README.md to remove sync sample references
  - Enhanced BASIC_VOICE_ASSISTANT.md with comprehensive async implementation guide
  - Added MIGRATION_GUIDE.md for users upgrading from previous versions

## 1.0.0b4 (2025-09-19)

### Features Added

- **Personal Voice Models**: Added `PersonalVoiceModels` enum with support for `DragonLatestNeural`, `PhoenixLatestNeural`, and `PhoenixV2Neural` models
- **Enhanced Animation Support**: Added comprehensive server event classes for animation blendshapes and viseme handling:
  - `ServerEventResponseAnimationBlendshapeDelta` and `ServerEventResponseAnimationBlendshapeDone`
  - `ServerEventResponseAnimationVisemeDelta` and `ServerEventResponseAnimationVisemeDone`
- **Audio Timestamp Events**: Added `ServerEventResponseAudioTimestampDelta` and `ServerEventResponseAudioTimestampDone` for better audio timing control
- **Improved Error Handling**: Added `ErrorResponse` class for better error management
- **Enhanced Base Classes**: Added `ConversationItemBase` and `SessionBase` for better code organization and inheritance
- **Token Usage Improvements**: Renamed `Usage` to `TokenUsage` for better clarity
- **Audio Format Improvements**: Reorganized audio format enums with separate `InputAudioFormat` and `OutputAudioFormat` enums for better clarity
- **Enhanced Output Audio Format Support**: Added more granular output audio format options including specific sampling rates (8kHz, 16kHz) for PCM16

### Breaking Changes

- **Model Cleanup**: Removed experimental classes `AzurePlatformVoice`, `LLMVoice`, `AzureSemanticVadServer`, `InputAudio`, `NoTurnDetection`, and `ToolChoiceFunctionObjectFunction`
- **Class Rename**: Renamed `Usage` class to `TokenUsage` for better clarity
- **Enum Reorganization**:
  - Replaced `AudioFormat` enum with separate `InputAudioFormat` and `OutputAudioFormat` enums
  - Removed `Phi4mmVoice` enum
  - Removed `EMOTION` value from `AnimationOutputType` enum
  - Removed `IN_PROGRESS` value from `ItemParamStatus` enum
- **Server Events**: Removed `RESPONSE_EMOTION_HYPOTHESIS` from `ServerEventType` enum

### Other Changes

- **Package Structure**: Simplified package initialization with namespace package support
- **Sample Updates**: Improved basic voice assistant samples
- **Code Optimization**: Streamlined model definitions with significant code reduction
- **API Configuration**: Updated API view properties for better tooling support

## 1.0.0b3 (2025-09-17)

### Features Added

- **Transcription Improvement**: Added phrase list support to improve transcription accuracy
- **New Voice Types**: Added `AzurePlatformVoice` and `LLMVoice` classes
- **Enhanced Speech Detection**: Added `AzureSemanticVadServer` class
- **Improved Function Calling**: Enhanced async function calling sample with better error handling
- **English-Specific Detection**: Added `AzureSemanticDetectionEn` class for optimized English-only semantic end-of-utterance detection
- **English-Specific Voice Activity Detection**: Added `AzureSemanticVadEn` class for enhanced English-only voice activity detection

### Breaking Changes

- **Transcription**: Removed `custom_model` and `enabled` from `AudioInputTranscriptionSettings`.
- **Async Authentication**: Fixed credential handling for async scenarios
- **Model Serialization**: Improved error handling and deserialization

### Other Changes

- **Code Modernization**: Updated type annotations throughout

## 1.0.0b2 (2025-09-10)

### Features Added

- Added support for async function calling

### Bugs Fixed

- Fixed function calling: ensure `FunctionCallOutputItem.output` is properly serialized as a JSON string before sending to the service.

## 1.0.0b1 (2025-08-28)

### Features Added

- Added WebSocket connection support through `connect()`.
- Added `VoiceLiveConnection` for managing WebSocket connections.
- Added models for the Voice Live preview API.
- Added WebSocket-based examples in the samples directory.

### Other Changes

- Initial preview release.

            
