| Name | azure-ai-voicelive |
| Version | 1.0.0 |
| Summary | Microsoft Corporation Azure Ai Voicelive Client Library for Python |
| upload_time | 2025-10-02 18:56:43 |
| home_page | None |
| maintainer | None |
| docs_url | None |
| author | None |
| requires_python | >=3.9 |
| license | None |
| keywords | azure, azure sdk |
| requirements | No requirements were recorded. |
Azure AI VoiceLive client library for Python
============================================
This package provides a **real-time, speech-to-speech** client for Azure AI VoiceLive.
It opens a WebSocket session to stream microphone audio to the service and receive
typed server events (including audio) for responsive, interruptible conversations.
> **Status:** General Availability (GA). This is a stable release suitable for production use.
> **Important:** As of version 1.0.0, this SDK is **async-only**. The synchronous API has been removed to focus exclusively on async patterns. All examples and samples use `async`/`await` syntax.
---
Getting started
---------------
### Prerequisites
- **Python 3.9+**
- An **Azure subscription**
- A **VoiceLive** resource and endpoint
- A working **microphone** and **speakers/headphones** if you run the voice samples
### Install
Install the stable GA version:
```bash
# Base install (core client only)
python -m pip install azure-ai-voicelive
# For asynchronous streaming (uses aiohttp)
python -m pip install "azure-ai-voicelive[aiohttp]"
# For voice samples (includes audio processing)
python -m pip install "azure-ai-voicelive[aiohttp]" pyaudio python-dotenv
```
The SDK provides async-only WebSocket connections using `aiohttp` for optimal performance and reliability.
### Authenticate
You can authenticate with an **API key** or an **Azure Active Directory (AAD) token**.
#### API Key Authentication (Quick Start)
Set environment variables in a `.env` file or directly in your environment:
```bash
# In your .env file or environment variables
AZURE_VOICELIVE_API_KEY="your-api-key"
AZURE_VOICELIVE_ENDPOINT="your-endpoint"
```
Then, use the key in your code:
```python
import asyncio

from azure.core.credentials import AzureKeyCredential
from azure.ai.voicelive.aio import connect

async def main():
    async with connect(
        endpoint="your-endpoint",
        credential=AzureKeyCredential("your-api-key"),
        model="gpt-4o-realtime-preview",
    ) as connection:
        # Your async code here
        pass

asyncio.run(main())
```
#### AAD Token Authentication
For production applications, AAD authentication is recommended:
```python
import asyncio

from azure.identity.aio import DefaultAzureCredential
from azure.ai.voicelive.aio import connect

async def main():
    credential = DefaultAzureCredential()
    async with connect(
        endpoint="your-endpoint",
        credential=credential,
        model="gpt-4o-realtime-preview",
    ) as connection:
        # Your async code here
        pass

asyncio.run(main())
```
---
Key concepts
------------
- **VoiceLiveConnection** – Manages an active async WebSocket connection to the service
- **Session Management** – Configure conversation parameters:
  - **SessionResource** – Update session parameters (voice, formats, VAD) with async methods
  - **RequestSession** – Strongly-typed session configuration
  - **ServerVad** – Configure voice activity detection
  - **AzureStandardVoice** – Configure voice settings
- **Audio Handling**:
  - **InputAudioBufferResource** – Manage audio input to the service with async methods
  - **OutputAudioBufferResource** – Control audio output from the service with async methods
- **Conversation Management**:
  - **ResponseResource** – Create or cancel model responses with async methods
  - **ConversationResource** – Manage conversation items with async methods
- **Error Handling**:
  - **ConnectionError** – Base exception for WebSocket connection errors
  - **ConnectionClosed** – Raised when WebSocket connection is closed
- **Strongly-Typed Events** – Process service events with type safety:
  - `SESSION_UPDATED`, `RESPONSE_AUDIO_DELTA`, `RESPONSE_DONE`
  - `INPUT_AUDIO_BUFFER_SPEECH_STARTED`, `INPUT_AUDIO_BUFFER_SPEECH_STOPPED`
  - `ERROR`, and more
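
As a quick orientation, here is a minimal sketch of how these pieces fit together. The `connection.session.update(...)` call matches the minimal example later in this README; the `connection.response.create()` call and the exact attribute names for the other resources are assumptions based on the resource names above, not verified API.

```python
import asyncio

from azure.core.credentials import AzureKeyCredential
from azure.ai.voicelive.aio import connect
from azure.ai.voicelive.models import Modality, RequestSession

async def demo() -> None:
    async with connect(
        endpoint="your-endpoint",
        credential=AzureKeyCredential("your-api-key"),
        model="gpt-4o-realtime-preview",
    ) as connection:
        # SessionResource: push strongly-typed session configuration.
        await connection.session.update(
            session=RequestSession(modalities=[Modality.TEXT, Modality.AUDIO])
        )
        # ResponseResource: request a model response (method name assumed).
        await connection.response.create()

asyncio.run(demo())
```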
---
Examples
--------
### Basic Voice Assistant (Featured Sample)
The Basic Voice Assistant sample demonstrates full-featured voice interaction with:
- Real-time speech streaming
- Server-side voice activity detection
- Interruption handling
- High-quality audio processing
```bash
# Run the basic voice assistant sample
# Requires [aiohttp] for async
python samples/basic_voice_assistant_async.py
# With custom parameters
python samples/basic_voice_assistant_async.py --model gpt-4o-realtime-preview --voice alloy --instructions "You're a helpful assistant"
```
### Minimal example
```python
import asyncio

from azure.core.credentials import AzureKeyCredential
from azure.ai.voicelive.aio import connect
from azure.ai.voicelive.models import (
    RequestSession, Modality, InputAudioFormat, OutputAudioFormat, ServerVad, ServerEventType
)

API_KEY = "your-api-key"
ENDPOINT = "wss://your-endpoint.com/openai/realtime"
MODEL = "gpt-4o-realtime-preview"

async def main():
    async with connect(
        endpoint=ENDPOINT,
        credential=AzureKeyCredential(API_KEY),
        model=MODEL,
    ) as conn:
        session = RequestSession(
            modalities=[Modality.TEXT, Modality.AUDIO],
            instructions="You are a helpful assistant.",
            input_audio_format=InputAudioFormat.PCM16,
            output_audio_format=OutputAudioFormat.PCM16,
            turn_detection=ServerVad(
                threshold=0.5,
                prefix_padding_ms=300,
                silence_duration_ms=500,
            ),
        )
        await conn.session.update(session=session)

        # Process events
        async for evt in conn:
            print(f"Event: {evt.type}")
            if evt.type == ServerEventType.RESPONSE_DONE:
                break

asyncio.run(main())
```
Available Voice Options
-----------------------
### Azure Neural Voices
```python
from azure.ai.voicelive.models import AzureStandardVoice

# Use Azure Neural voices
voice_config = AzureStandardVoice(
    name="en-US-AvaNeural",  # Or another voice name
    type="azure-standard"
)
```
Popular voices include:
- `en-US-AvaNeural` - Female, natural and professional
- `en-US-JennyNeural` - Female, conversational
- `en-US-GuyNeural` - Male, professional
### OpenAI Voices
```python
# Use OpenAI voices (as string)
voice_config = "alloy" # Or another OpenAI voice
```
Available OpenAI voices:
- `alloy` - Versatile, neutral
- `echo` - Precise, clear
- `fable` - Animated, expressive
- `onyx` - Deep, authoritative
- `nova` - Warm, conversational
- `shimmer` - Optimistic, friendly
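
Whichever option you choose, the voice is applied through the session configuration. A minimal sketch, assuming `RequestSession` accepts a `voice` field (the field name is an assumption; the model names come from the examples above):

```python
from azure.ai.voicelive.models import AzureStandardVoice, RequestSession

# Either an Azure Neural voice object...
voice = AzureStandardVoice(name="en-US-AvaNeural", type="azure-standard")
# ...or a plain OpenAI voice string:
# voice = "alloy"

session = RequestSession(voice=voice)  # `voice` field assumed
# Then: await connection.session.update(session=session)
```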
---
Handling Events
---------------
```python
async for event in connection:
    if event.type == ServerEventType.SESSION_UPDATED:
        print(f"Session ready: {event.session.id}")
        # Start audio capture

    elif event.type == ServerEventType.INPUT_AUDIO_BUFFER_SPEECH_STARTED:
        print("User started speaking")
        # Stop playback and cancel any current response

    elif event.type == ServerEventType.RESPONSE_AUDIO_DELTA:
        # Play the audio chunk
        audio_bytes = event.delta

    elif event.type == ServerEventType.ERROR:
        print(f"Error: {event.error.message}")
```
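
The loop above covers the receiving side. For the sending side, here is a minimal sketch of pushing microphone audio into the input buffer, assuming the connection exposes `InputAudioBufferResource` as `connection.input_audio_buffer` with an async `append` method taking base64-encoded PCM16 (both the attribute and the parameter name are assumptions):

```python
import base64

async def send_audio_chunk(connection, pcm16_chunk: bytes) -> None:
    # Assumed API shape; adjust to the actual InputAudioBufferResource surface.
    await connection.input_audio_buffer.append(
        audio=base64.b64encode(pcm16_chunk).decode("ascii")
    )
```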
---
Troubleshooting
---------------
### Connection Issues
- **WebSocket connection errors (1006/timeout):**
  Verify `AZURE_VOICELIVE_ENDPOINT`, network rules, and that your credential has access.
- **Missing WebSocket dependencies:**
  If you see import errors, make sure you have installed the package:
  `pip install "azure-ai-voicelive[aiohttp]"`
- **Auth failures:**
  For API key, double-check `AZURE_VOICELIVE_API_KEY`. For AAD, ensure the identity is authorized.
### Audio Device Issues
- **No microphone/speaker detected:**
  Check device connections and permissions. On headless CI environments, audio samples can't run.
- **Audio library installation problems:**
  On Linux/macOS you may need PortAudio:

  ```bash
  # Debian/Ubuntu
  sudo apt-get install -y portaudio19-dev libasound2-dev
  # macOS (Homebrew)
  brew install portaudio
  ```
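
Before running the samples, you can confirm that PyAudio sees your devices with a short check (standard `pyaudio` API):

```python
import pyaudio

p = pyaudio.PyAudio()
for i in range(p.get_device_count()):
    info = p.get_device_info_by_index(i)
    print(i, info["name"], "in:", info["maxInputChannels"], "out:", info["maxOutputChannels"])
p.terminate()
```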
### Enable Verbose Logging
```python
import logging
logging.basicConfig(level=logging.DEBUG)
```
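
Global `DEBUG` output is noisy. Azure SDK libraries log under the `azure` namespace, so you can keep the root logger quiet and turn up only this client:

```python
import logging

logging.basicConfig(level=logging.WARNING)
logging.getLogger("azure.ai.voicelive").setLevel(logging.DEBUG)
```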
---
Next steps
----------
1. **Run the featured sample:**
   - Try `samples/basic_voice_assistant_async.py` for a complete voice assistant implementation
2. **Customize your implementation:**
   - Experiment with different voices and parameters
   - Add custom instructions for specialized assistants
   - Integrate with your own audio capture/playback systems
3. **Advanced scenarios:**
   - Add function calling support
   - Implement tool usage
   - Create multi-turn conversations with history
4. **Explore other samples:**
   - Check the `samples/` directory for specialized examples
   - See `samples/README.md` for a full list of samples
---
Contributing
------------
This project follows the Azure SDK guidelines. If you'd like to contribute:
1. Fork the repo and create a feature branch
2. Run linters and tests locally
3. Submit a pull request with a clear description of the change
---
Release notes
-------------
Changelogs are available in the package directory.
---
License
-------
This project is released under the **MIT License**.
# Release History
## 1.0.0 (2025-10-01)
### Features Added
- **Enhanced WebSocket Connection Options**: Significantly improved WebSocket connection configuration with transport-agnostic design:
  - Added new timeout configuration options: `receive_timeout`, `close_timeout`, and `handshake_timeout` for fine-grained control
  - Enhanced `compression` parameter to support both boolean and integer types for advanced zlib window configuration
  - Added `vendor_options` parameter for implementation-specific options passthrough (escape hatch for advanced users)
  - Improved documentation with clearer descriptions for all connection parameters
  - Better support for common aliases from other WebSocket ecosystems (`max_size`, `ping_interval`, etc.)
  - More robust option mapping with proper type conversion and safety checks (see the sketch after this list)
- **Enhanced Type Safety**: Improved type safety for content parts with proper enum usage:
  - `InputAudioContentPart`, `InputTextContentPart`, and `OutputTextContentPart` now use `ContentPartType` enum values instead of string literals
  - Better IntelliSense support and compile-time type checking for content part discriminators
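
A hedged sketch of these options in use. The keyword names are taken from the entries above; passing them directly to `connect()` is an assumption about where they are accepted, and `autoping` is merely one example of an aiohttp-specific vendor option:

```python
from azure.core.credentials import AzureKeyCredential
from azure.ai.voicelive.aio import connect

connection_ctx = connect(
    endpoint="your-endpoint",
    credential=AzureKeyCredential("your-api-key"),
    model="gpt-4o-realtime-preview",
    receive_timeout=30.0,    # seconds to wait for a server event
    close_timeout=5.0,       # seconds to wait for a clean close
    handshake_timeout=10.0,  # seconds to wait for the WebSocket upgrade
    compression=True,        # or an int zlib window size
    vendor_options={"autoping": True},  # implementation-specific passthrough
)
```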
### Breaking Changes
- **Improved Naming Conventions**: Updated model and enum names for better clarity and consistency:
  - `OAIVoice` enum renamed to `OpenAIVoiceName` for more descriptive naming
  - `ToolChoiceObject` model renamed to `ToolChoiceSelection` for better semantic meaning
  - `ToolChoiceFunctionObject` model renamed to `ToolChoiceFunctionSelection` for consistency
  - Updated type unions and imports to reflect the new naming conventions
  - Cross-language package mappings updated to maintain compatibility across SDKs
- **Session Model Architecture**: Separated `ResponseSession` and `RequestSession` models for better design clarity:
  - `ResponseSession` no longer inherits from `RequestSession` and now inherits directly from `_Model`
  - All session configuration fields are now explicitly defined in `ResponseSession` instead of being inherited
  - This provides clearer separation of concerns between request and response session configurations
  - May affect type checking and code that relied on the previous inheritance relationship
- **Model Cleanup**: Removed unused `AgentConfig` model and related fields from the public API:
  - `AgentConfig` class has been completely removed from imports and exports
  - `agent` field removed from `ResponseSession` model (including constructor parameter)
  - Updated cross-language package mappings to reflect the removal
- **Model Naming Convention Update**: Renamed `EOUDetection` to `EouDetection` for better naming consistency:
  - Class name changed from `EOUDetection` to `EouDetection`
  - All inheritance relationships updated: `AzureSemanticDetection`, `AzureSemanticDetectionEn`, and `AzureSemanticDetectionMultilingual` now inherit from `EouDetection`
  - Type annotations updated in `AzureSemanticVad`, `AzureSemanticVadEn`, `AzureSemanticVadMultilingual`, and `ServerVad` classes
  - Import statements and exports updated to reflect the new naming
- **Enhanced Content Part Type Safety**: Content part discriminators now use enum values instead of string literals:
  - `InputAudioContentPart.type` now uses `ContentPartType.INPUT_AUDIO` instead of `"input_audio"`
  - `InputTextContentPart.type` now uses `ContentPartType.INPUT_TEXT` instead of `"input_text"`
  - `OutputTextContentPart.type` now uses `ContentPartType.TEXT` instead of `"text"`
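
Illustrated with an assumed constructor (the `text` field name is an assumption; the enum values are those listed above):

```python
from azure.ai.voicelive.models import ContentPartType, InputTextContentPart

part = InputTextContentPart(text="Hello")  # `text` field assumed
# The discriminator is now an enum member rather than a bare string:
assert part.type == ContentPartType.INPUT_TEXT
```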
### Other Changes
- Initial GA release
## 1.0.0b5 (2025-09-26)
### Features Added
- **Enhanced Semantic Detection Type Safety**: Added new `EouThresholdLevel` enum for better type safety in end-of-utterance detection:
  - `LOW` for low sensitivity threshold level
  - `MEDIUM` for medium sensitivity threshold level
  - `HIGH` for high sensitivity threshold level
  - `DEFAULT` for default sensitivity threshold level
- **Improved Semantic Detection Configuration**: Enhanced semantic detection classes with better type annotations:
  - `threshold_level` parameter now supports both string values and the `EouThresholdLevel` enum
  - Cleaner type definitions for `AzureSemanticDetection`, `AzureSemanticDetectionEn`, and `AzureSemanticDetectionMultilingual`
  - Improved documentation for threshold level parameters
- **Comprehensive Unit Test Suite**: Added extensive unit test coverage with 200+ test cases covering:
  - All enum types and their functionality
  - Model creation, validation, and serialization
  - Async connection functionality with proper mocking
  - Client event handling and workflows
  - Voice configuration across all supported types
  - Message handling with content part hierarchy
  - Integration scenarios and real-world usage patterns
  - Recent changes validation and backwards compatibility
- **API Version Update**: Updated to API version `2025-10-01` (from `2025-05-01-preview`)
- **Enhanced Type Safety**: Added new `AzureVoiceType` enum with values for better Azure voice type categorization:
  - `AZURE_CUSTOM` for custom voice configurations
  - `AZURE_STANDARD` for standard voice configurations
  - `AZURE_PERSONAL` for personal voice configurations
- **Improved Message Handling**: Added `MessageRole` enum for better role type safety in message items
- **Enhanced Model Documentation**: Comprehensive documentation improvements across all models:
  - Added detailed docstrings for model classes and their parameters
  - Enhanced enum value documentation with descriptions
  - Improved type annotations and parameter descriptions
- **Enhanced Semantic Detection**: Added improved configuration options for all semantic detection classes:
  - Added `threshold_level` parameter with options: `"low"`, `"medium"`, `"high"`, `"default"` (recommended over deprecated `threshold`)
  - Added `timeout_ms` parameter for timeout configuration in milliseconds (recommended over deprecated `timeout`); both parameters appear in the sketch after this list
- **Video Background Support**: Added new `Background` model for video background customization:
  - Support for solid color backgrounds in hex format (e.g., `#00FF00FF`)
  - Support for image URL backgrounds
  - Mutually exclusive color and image URL options
- **Enhanced Video Parameters**: Extended `VideoParams` model with:
  - `background` parameter for configuring video backgrounds using the new `Background` model
  - `gop_size` parameter for Group of Pictures (GOP) size control, affecting compression efficiency and seeking performance
- **Improved Type Safety**: Added `TurnDetectionType` enum for better type safety and IntelliSense support
- **Package Structure Modernization**: Simplified package initialization with namespace package support
- **Enhanced Error Handling**: Added `ConnectionError` and `ConnectionClosed` exception classes to the async API for better WebSocket error management
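
A sketch of the new configuration surfaces named above. The parameter names follow the changelog entries; the exact constructor signatures (in particular the `Background` field names) are assumptions:

```python
from azure.ai.voicelive.models import AzureSemanticDetection, Background, VideoParams

# End-of-utterance detection with the new, non-deprecated parameters.
detection = AzureSemanticDetection(threshold_level="medium", timeout_ms=800)

# Video background and GOP size; color and image URL are mutually exclusive.
params = VideoParams(
    background=Background(color="#00FF00FF"),  # field name assumed
    gop_size=60,
)
```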
### Breaking Changes
- **Cross-Language Package Identity Update**: Updated package ID from `VoiceLive` to `VoiceLive.WebSocket` for better cross-language consistency
- **Model Refactoring**:
  - Renamed `UserContentPart` to `MessageContentPart` for clearer content part hierarchy
  - All message items now require a `content` field with a list of `MessageContentPart` objects
  - `OutputTextContentPart` now inherits from `MessageContentPart` instead of being standalone
- **Enhanced Type Safety**:
  - Azure voice classes now use `AzureVoiceType` enum discriminators instead of string literals
  - Message role discriminators now use `MessageRole` enum values for better type safety
- **Removed Deprecated Parameters**: Completely removed deprecated parameters from semantic detection classes:
  - Removed `threshold` parameter from all semantic detection classes (`AzureSemanticDetection`, `AzureSemanticDetectionEn`, `AzureSemanticDetectionMultilingual`)
  - Removed `timeout` parameter from all semantic detection classes
  - Users must now use `threshold_level` and `timeout_ms` parameters respectively
- **Removed Synchronous API**: Completely removed synchronous WebSocket operations to focus exclusively on async patterns:
  - Removed sync `connect()` function and sync `VoiceLiveConnection` class from main patch implementation
  - Removed sync `basic_voice_assistant.py` sample (only async version remains)
  - Simplified sync patch to minimal structure with empty exports
  - All functionality now available only through async patterns
- **Updated Dependencies**: Modified package dependencies to reflect async-only architecture:
  - Moved `aiohttp>=3.9.0,<4.0.0` from optional to required dependency
  - Removed `websockets` optional dependency as sync API no longer exists
  - Removed optional dependency groups `websockets`, `aiohttp`, and `all-websockets`
- **Model Rename**:
  - Renamed `AudioInputTranscriptionSettings` to `AudioInputTranscriptionOptions` for consistency with naming conventions
  - Renamed `AzureMultilingualSemanticVad` to `AzureSemanticVadMultilingual` for naming consistency with other multilingual variants
- **Enhanced Type Safety**: Turn detection discriminator types now use enum values instead of string literals for better type safety
### Bug Fixes
- **Serialization Improvements**: Fixed type casting issue in serialization utilities for better enum handling and type safety
### Other Changes
- **Testing Infrastructure**: Added comprehensive unit test suite with extensive coverage:
  - 8 main test files with 200+ individual test methods
  - Tests for all enums, models, async operations, client events, voice configurations, and message handling
  - Integration tests covering real-world scenarios and recent changes
  - Proper mocking for async WebSocket connections
  - Backwards compatibility validation
  - Test coverage for all recent changes and enhancements
- **API Documentation**: Updated API view properties to reflect model structure changes, new enums, and cross-language package identity
- **Documentation Updates**: Comprehensive updates to all markdown documentation:
  - Updated README.md to reflect async-only nature with updated examples and installation instructions
  - Updated samples README.md to remove sync sample references
  - Enhanced BASIC_VOICE_ASSISTANT.md with comprehensive async implementation guide
  - Added MIGRATION_GUIDE.md for users upgrading from previous versions
## 1.0.0b4 (2025-09-19)
### Features Added
- **Personal Voice Models**: Added `PersonalVoiceModels` enum with support for `DragonLatestNeural`, `PhoenixLatestNeural`, and `PhoenixV2Neural` models
- **Enhanced Animation Support**: Added comprehensive server event classes for animation blendshapes and viseme handling:
  - `ServerEventResponseAnimationBlendshapeDelta` and `ServerEventResponseAnimationBlendshapeDone`
  - `ServerEventResponseAnimationVisemeDelta` and `ServerEventResponseAnimationVisemeDone`
- **Audio Timestamp Events**: Added `ServerEventResponseAudioTimestampDelta` and `ServerEventResponseAudioTimestampDone` for better audio timing control
- **Improved Error Handling**: Added `ErrorResponse` class for better error management
- **Enhanced Base Classes**: Added `ConversationItemBase` and `SessionBase` for better code organization and inheritance
- **Token Usage Improvements**: Renamed `Usage` to `TokenUsage` for better clarity
- **Audio Format Improvements**: Reorganized audio format enums with separate `InputAudioFormat` and `OutputAudioFormat` enums for better clarity
- **Enhanced Output Audio Format Support**: Added more granular output audio format options including specific sampling rates (8kHz, 16kHz) for PCM16
### Breaking Changes
- **Model Cleanup**: Removed experimental classes `AzurePlatformVoice`, `LLMVoice`, `AzureSemanticVadServer`, `InputAudio`, `NoTurnDetection`, and `ToolChoiceFunctionObjectFunction`
- **Class Rename**: Renamed `Usage` class to `TokenUsage` for better clarity
- **Enum Reorganization**:
  - Replaced `AudioFormat` enum with separate `InputAudioFormat` and `OutputAudioFormat` enums
  - Removed `Phi4mmVoice` enum
  - Removed `EMOTION` value from `AnimationOutputType` enum
  - Removed `IN_PROGRESS` value from `ItemParamStatus` enum
- **Server Events**: Removed `RESPONSE_EMOTION_HYPOTHESIS` from `ServerEventType` enum
### Other Changes
- **Package Structure**: Simplified package initialization with namespace package support
- **Sample Updates**: Improved basic voice assistant samples
- **Code Optimization**: Streamlined model definitions with significant code reduction
- **API Configuration**: Updated API view properties for better tooling support
## 1.0.0b3 (2025-09-17)
### Features Added
- **Transcription improvement**: Added phrase list support
- **New Voice Types**: Added `AzurePlatformVoice` and `LLMVoice` classes
- **Enhanced Speech Detection**: Added `AzureSemanticVadServer` class
- **Improved Function Calling**: Enhanced async function calling sample with better error handling
- **English-Specific Detection**: Added `AzureSemanticDetectionEn` class for optimized English-only semantic end-of-utterance detection
- **English-Specific Voice Activity Detection**: Added `AzureSemanticVadEn` class for enhanced English-only voice activity detection
### Breaking Changes
- **Transcription**: Removed `custom_model` and `enabled` from `AudioInputTranscriptionSettings`.
- **Async Authentication**: Fixed credential handling for async scenarios
- **Model Serialization**: Improved error handling and deserialization
### Other Changes
- **Code Modernization**: Updated type annotations throughout
## 1.0.0b2 (2025-09-10)
### Features Added
- Added async function calling support
### Bugs Fixed
- Fixed function calling: ensure `FunctionCallOutputItem.output` is properly serialized as a JSON string before sending to the service.
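
In practice this means tool results reach the service as JSON strings. A hedged illustration (`output` is the field named in the fix; `call_id` is an assumed field):

```python
import json

from azure.ai.voicelive.models import FunctionCallOutputItem

item = FunctionCallOutputItem(
    call_id="call_123",  # field name assumed
    output=json.dumps({"temperature_c": 22}),  # JSON string, per the fix
)
```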
## 1.0.0b1 (2025-08-28)
### Features Added
- Added WebSocket connection support through `connect()`.
- Added `VoiceLiveConnection` for managing WebSocket connections.
- Added models of Voice Live preview.
- Added WebSocket-based examples in the samples directory.
### Other Changes
- Initial preview release.
Raw data
--------

{
  "_id": null,
  "home_page": null,
  "name": "azure-ai-voicelive",
  "maintainer": null,
  "docs_url": null,
  "requires_python": ">=3.9",
  "maintainer_email": null,
  "keywords": "azure, azure sdk",
  "author": null,
  "author_email": "Microsoft Corporation <azpysdkhelp@microsoft.com> License-Expression: MIT",
  "download_url": "https://files.pythonhosted.org/packages/e7/cf/bc7114d4d625043b1e447b7d09f7a2af52d2ef5c79a13b178449e723a104/azure_ai_voicelive-1.0.0.tar.gz",
  "platform": null,
"description": "Azure AI VoiceLive client library for Python\n============================================\n\nThis package provides a **real-time, speech-to-speech** client for Azure AI VoiceLive.\nIt opens a WebSocket session to stream microphone audio to the service and receive\ntyped server events (including audio) for responsive, interruptible conversations.\n\n> **Status:** General Availability (GA). This is a stable release suitable for production use.\n\n> **Important:** As of version 1.0.0, this SDK is **async-only**. The synchronous API has been removed to focus exclusively on async patterns. All examples and samples use `async`/`await` syntax.\n\n---\n\nGetting started\n---------------\n\n### Prerequisites\n\n- **Python 3.9+**\n- An **Azure subscription**\n- A **VoiceLive** resource and endpoint\n- A working **microphone** and **speakers/headphones** if you run the voice samples\n\n### Install\n\nInstall the stable GA version:\n\n```bash\n# Base install (core client only)\npython -m pip install azure-ai-voicelive\n\n# For asynchronous streaming (uses aiohttp)\npython -m pip install \"azure-ai-voicelive[aiohttp]\"\n\n# For voice samples (includes audio processing)\npython -m pip install azure-ai-voicelive[aiohttp] pyaudio python-dotenv\n```\n\nThe SDK provides async-only WebSocket connections using `aiohttp` for optimal performance and reliability.\n\n### Authenticate\n\nYou can authenticate with an **API key** or an **Azure Active Directory (AAD) token**.\n\n#### API Key Authentication (Quick Start)\n\nSet environment variables in a `.env` file or directly in your environment:\n\n```bash\n# In your .env file or environment variables\nAZURE_VOICELIVE_API_KEY=\"your-api-key\"\nAZURE_VOICELIVE_ENDPOINT=\"your-endpoint\"\n```\n\nThen, use the key in your code:\n\n```python\nimport asyncio\nfrom azure.core.credentials import AzureKeyCredential\nfrom azure.ai.voicelive import connect\n\nasync def main():\n async with connect(\n endpoint=\"your-endpoint\",\n credential=AzureKeyCredential(\"your-api-key\"),\n model=\"gpt-4o-realtime-preview\"\n ) as connection:\n # Your async code here\n pass\n\nasyncio.run(main())\n```\n\n#### AAD Token Authentication\n\nFor production applications, AAD authentication is recommended:\n\n```python\nimport asyncio\nfrom azure.identity.aio import DefaultAzureCredential\nfrom azure.ai.voicelive import connect\n\nasync def main():\n credential = DefaultAzureCredential()\n \n async with connect(\n endpoint=\"your-endpoint\",\n credential=credential,\n model=\"gpt-4o-realtime-preview\"\n ) as connection:\n # Your async code here\n pass\n\nasyncio.run(main())\n```\n\n---\n\nKey concepts\n------------\n\n- **VoiceLiveConnection** \u2013 Manages an active async WebSocket connection to the service\n- **Session Management** \u2013 Configure conversation parameters:\n - **SessionResource** \u2013 Update session parameters (voice, formats, VAD) with async methods\n - **RequestSession** \u2013 Strongly-typed session configuration\n - **ServerVad** \u2013 Configure voice activity detection\n - **AzureStandardVoice** \u2013 Configure voice settings\n- **Audio Handling**:\n - **InputAudioBufferResource** \u2013 Manage audio input to the service with async methods\n - **OutputAudioBufferResource** \u2013 Control audio output from the service with async methods\n- **Conversation Management**:\n - **ResponseResource** \u2013 Create or cancel model responses with async methods\n - **ConversationResource** \u2013 Manage conversation items with async methods\n- **Error 
Handling**: \n - **ConnectionError** \u2013 Base exception for WebSocket connection errors\n - **ConnectionClosed** \u2013 Raised when WebSocket connection is closed\n- **Strongly-Typed Events** \u2013 Process service events with type safety:\n - `SESSION_UPDATED`, `RESPONSE_AUDIO_DELTA`, `RESPONSE_DONE`\n - `INPUT_AUDIO_BUFFER_SPEECH_STARTED`, `INPUT_AUDIO_BUFFER_SPEECH_STOPPED`\n - `ERROR`, and more\n\n---\n\nExamples\n--------\n\n### Basic Voice Assistant (Featured Sample)\n\nThe Basic Voice Assistant sample demonstrates full-featured voice interaction with:\n\n- Real-time speech streaming\n- Server-side voice activity detection \n- Interruption handling\n- High-quality audio processing\n\n```bash\n# Run the basic voice assistant sample\n# Requires [aiohttp] for async\npython samples/basic_voice_assistant_async.py\n\n# With custom parameters\npython samples/basic_voice_assistant_async.py --model gpt-4o-realtime-preview --voice alloy --instructions \"You're a helpful assistant\"\n```\n\n### Minimal example\n\n```python\nimport asyncio\nfrom azure.core.credentials import AzureKeyCredential\nfrom azure.ai.voicelive.aio import connect\nfrom azure.ai.voicelive.models import (\n RequestSession, Modality, InputAudioFormat, OutputAudioFormat, ServerVad, ServerEventType\n)\n\nAPI_KEY = \"your-api-key\"\nENDPOINT = \"wss://your-endpoint.com/openai/realtime\"\nMODEL = \"gpt-4o-realtime-preview\"\n\nasync def main():\n async with connect(\n endpoint=ENDPOINT,\n credential=AzureKeyCredential(API_KEY),\n model=MODEL,\n ) as conn:\n session = RequestSession(\n modalities=[Modality.TEXT, Modality.AUDIO],\n instructions=\"You are a helpful assistant.\",\n input_audio_format=InputAudioFormat.PCM16,\n output_audio_format=OutputAudioFormat.PCM16,\n turn_detection=ServerVad(\n threshold=0.5, \n prefix_padding_ms=300, \n silence_duration_ms=500\n ),\n )\n await conn.session.update(session=session)\n\n # Process events\n async for evt in conn:\n print(f\"Event: {evt.type}\")\n if evt.type == ServerEventType.RESPONSE_DONE:\n break\n\nasyncio.run(main())\n```\n\nAvailable Voice Options\n-----------------------\n\n### Azure Neural Voices\n\n```python\n# Use Azure Neural voices\nvoice_config = AzureStandardVoice(\n name=\"en-US-AvaNeural\", # Or another voice name\n type=\"azure-standard\"\n)\n```\n\nPopular voices include:\n\n- `en-US-AvaNeural` - Female, natural and professional\n- `en-US-JennyNeural` - Female, conversational\n- `en-US-GuyNeural` - Male, professional\n\n### OpenAI Voices\n\n```python\n# Use OpenAI voices (as string)\nvoice_config = \"alloy\" # Or another OpenAI voice\n```\n\nAvailable OpenAI voices:\n\n- `alloy` - Versatile, neutral\n- `echo` - Precise, clear\n- `fable` - Animated, expressive\n- `onyx` - Deep, authoritative\n- `nova` - Warm, conversational\n- `shimmer` - Optimistic, friendly\n\n---\n\nHandling Events\n---------------\n\n```python\nasync for event in connection:\n if event.type == ServerEventType.SESSION_UPDATED:\n print(f\"Session ready: {event.session.id}\")\n # Start audio capture\n \n elif event.type == ServerEventType.INPUT_AUDIO_BUFFER_SPEECH_STARTED:\n print(\"User started speaking\")\n # Stop playback and cancel any current response\n \n elif event.type == ServerEventType.RESPONSE_AUDIO_DELTA:\n # Play the audio chunk\n audio_bytes = event.delta\n \n elif event.type == ServerEventType.ERROR:\n print(f\"Error: {event.error.message}\")\n```\n\n---\n\nTroubleshooting\n---------------\n\n### Connection Issues\n\n- **WebSocket connection errors (1006/timeout):** \n Verify 
`AZURE_VOICELIVE_ENDPOINT`, network rules, and that your credential has access.\n\n- **Missing WebSocket dependencies:** \n If you see import errors, make sure you have installed the package:\n pip install azure-ai-voicelive[aiohttp]\n\n- **Auth failures:** \n For API key, double-check `AZURE_VOICELIVE_API_KEY`. For AAD, ensure the identity is authorized.\n\n### Audio Device Issues\n\n- **No microphone/speaker detected:** \n Check device connections and permissions. On headless CI environments, audio samples can't run.\n\n- **Audio library installation problems:** \n On Linux/macOS you may need PortAudio:\n\n ```bash\n # Debian/Ubuntu\n sudo apt-get install -y portaudio19-dev libasound2-dev\n # macOS (Homebrew)\n brew install portaudio\n ```\n\n### Enable Verbose Logging\n\n```python\nimport logging\nlogging.basicConfig(level=logging.DEBUG)\n```\n\n---\n\nNext steps\n----------\n\n1. **Run the featured sample:**\n - Try `samples/basic_voice_assistant_async.py` for a complete voice assistant implementation\n\n2. **Customize your implementation:**\n - Experiment with different voices and parameters\n - Add custom instructions for specialized assistants\n - Integrate with your own audio capture/playback systems\n\n3. **Advanced scenarios:**\n - Add function calling support\n - Implement tool usage\n - Create multi-turn conversations with history\n\n4. **Explore other samples:**\n - Check the `samples/` directory for specialized examples\n - See `samples/README.md` for a full list of samples\n\n---\n\nContributing\n------------\n\nThis project follows the Azure SDK guidelines. If you'd like to contribute:\n\n1. Fork the repo and create a feature branch\n2. Run linters and tests locally\n3. Submit a pull request with a clear description of the change\n\n---\n\nRelease notes\n-------------\n\nChangelogs are available in the package directory.\n\n---\n\nLicense\n-------\n\nThis project is released under the **MIT License**.\n\n# Release History\n\n## 1.0.0 (2025-10-01)\n\n### Features Added\n\n- **Enhanced WebSocket Connection Options**: Significantly improved WebSocket connection configuration with transport-agnostic design:\n - Added new timeout configuration options: `receive_timeout`, `close_timeout`, and `handshake_timeout` for fine-grained control\n - Enhanced `compression` parameter to support both boolean and integer types for advanced zlib window configuration\n - Added `vendor_options` parameter for implementation-specific options passthrough (escape hatch for advanced users)\n - Improved documentation with clearer descriptions for all connection parameters\n - Better support for common aliases from other WebSocket ecosystems (`max_size`, `ping_interval`, etc.)\n - More robust option mapping with proper type conversion and safety checks\n- **Enhanced Type Safety**: Improved type safety for content parts with proper enum usage:\n - `InputAudioContentPart`, `InputTextContentPart`, and `OutputTextContentPart` now use `ContentPartType` enum values instead of string literals\n - Better IntelliSense support and compile-time type checking for content part discriminators\n\n### Breaking Changes\n\n- **Improved Naming Conventions**: Updated model and enum names for better clarity and consistency:\n - `OAIVoice` enum renamed to `OpenAIVoiceName` for more descriptive naming\n - `ToolChoiceObject` model renamed to `ToolChoiceSelection` for better semantic meaning\n - `ToolChoiceFunctionObject` model renamed to `ToolChoiceFunctionSelection` for consistency\n - Updated type unions and imports to 
reflect the new naming conventions\n - Cross-language package mappings updated to maintain compatibility across SDKs\n- **Session Model Architecture**: Separated `ResponseSession` and `RequestSession` models for better design clarity:\n - `ResponseSession` no longer inherits from `RequestSession` and now inherits directly from `_Model`\n - All session configuration fields are now explicitly defined in `ResponseSession` instead of being inherited\n - This provides clearer separation of concerns between request and response session configurations\n - May affect type checking and code that relied on the previous inheritance relationship\n- **Model Cleanup**: Removed unused `AgentConfig` model and related fields from the public API:\n - `AgentConfig` class has been completely removed from imports and exports\n - `agent` field removed from `ResponseSession` model (including constructor parameter)\n - Updated cross-language package mappings to reflect the removal\n- **Model Naming Convention Update**: Renamed `EOUDetection` to `EouDetection` for better naming consistency:\n - Class name changed from `EOUDetection` to `EouDetection` \n - All inheritance relationships updated: `AzureSemanticDetection`, `AzureSemanticDetectionEn`, and `AzureSemanticDetectionMultilingual` now inherit from `EouDetection`\n - Type annotations updated in `AzureSemanticVad`, `AzureSemanticVadEn`, `AzureSemanticVadMultilingual`, and `ServerVad` classes\n - Import statements and exports updated to reflect the new naming\n- **Enhanced Content Part Type Safety**: Content part discriminators now use enum values instead of string literals:\n - `InputAudioContentPart.type` now uses `ContentPartType.INPUT_AUDIO` instead of `\"input_audio\"`\n - `InputTextContentPart.type` now uses `ContentPartType.INPUT_TEXT` instead of `\"input_text\"` \n - `OutputTextContentPart.type` now uses `ContentPartType.TEXT` instead of `\"text\"`\n\n### Other Changes\n\n- Initial GA release\n\n## 1.0.0b5 (2025-09-26)\n\n### Features Added\n\n- **Enhanced Semantic Detection Type Safety**: Added new `EouThresholdLevel` enum for better type safety in end-of-utterance detection:\n - `LOW` for low sensitivity threshold level\n - `MEDIUM` for medium sensitivity threshold level \n - `HIGH` for high sensitivity threshold level\n - `DEFAULT` for default sensitivity threshold level\n- **Improved Semantic Detection Configuration**: Enhanced semantic detection classes with better type annotations:\n - `threshold_level` parameter now supports both string values and `EouThresholdLevel` enum\n - Cleaner type definitions for `AzureSemanticDetection`, `AzureSemanticDetectionEn`, and `AzureSemanticDetectionMultilingual`\n - Improved documentation for threshold level parameters\n- **Comprehensive Unit Test Suite**: Added extensive unit test coverage with 200+ test cases covering:\n - All enum types and their functionality\n - Model creation, validation, and serialization\n - Async connection functionality with proper mocking\n - Client event handling and workflows\n - Voice configuration across all supported types\n - Message handling with content part hierarchy\n - Integration scenarios and real-world usage patterns\n - Recent changes validation and backwards compatibility\n- **API Version Update**: Updated to API version `2025-10-01` (from `2025-05-01-preview`)\n- **Enhanced Type Safety**: Added new `AzureVoiceType` enum with values for better Azure voice type categorization:\n - `AZURE_CUSTOM` for custom voice configurations\n - `AZURE_STANDARD` for standard voice 
configurations \n - `AZURE_PERSONAL` for personal voice configurations\n- **Improved Message Handling**: Added `MessageRole` enum for better role type safety in message items\n- **Enhanced Model Documentation**: Comprehensive documentation improvements across all models:\n - Added detailed docstrings for model classes and their parameters\n - Enhanced enum value documentation with descriptions\n - Improved type annotations and parameter descriptions\n- **Enhanced Semantic Detection**: Added improved configuration options for all semantic detection classes:\n - Added `threshold_level` parameter with options: `\"low\"`, `\"medium\"`, `\"high\"`, `\"default\"` (recommended over deprecated `threshold`)\n - Added `timeout_ms` parameter for timeout configuration in milliseconds (recommended over deprecated `timeout`)\n- **Video Background Support**: Added new `Background` model for video background customization:\n - Support for solid color backgrounds in hex format (e.g., `#00FF00FF`)\n - Support for image URL backgrounds\n - Mutually exclusive color and image URL options\n- **Enhanced Video Parameters**: Extended `VideoParams` model with:\n - `background` parameter for configuring video backgrounds using the new `Background` model\n - `gop_size` parameter for Group of Pictures (GOP) size control, affecting compression efficiency and seeking performance\n- **Improved Type Safety**: Added `TurnDetectionType` enum for better type safety and IntelliSense support\n- **Package Structure Modernization**: Simplified package initialization with namespace package support\n- **Enhanced Error Handling**: Added `ConnectionError` and `ConnectionClosed` exception classes to the async API for better WebSocket error management\n\n### Breaking Changes\n\n- **Cross-Language Package Identity Update**: Updated package ID from `VoiceLive` to `VoiceLive.WebSocket` for better cross-language consistency\n- **Model Refactoring**: \n - Renamed `UserContentPart` to `MessageContentPart` for clearer content part hierarchy\n - All message items now require a `content` field with list of `MessageContentPart` objects\n - `OutputTextContentPart` now inherits from `MessageContentPart` instead of being standalone\n- **Enhanced Type Safety**: \n - Azure voice classes now use `AzureVoiceType` enum discriminators instead of string literals\n - Message role discriminators now use `MessageRole` enum values for better type safety\n- **Removed Deprecated Parameters**: Completely removed deprecated parameters from semantic detection classes:\n - Removed `threshold` parameter from all semantic detection classes (`AzureSemanticDetection`, `AzureSemanticDetectionEn`, `AzureSemanticDetectionMultilingual`)\n - Removed `timeout` parameter from all semantic detection classes\n - Users must now use `threshold_level` and `timeout_ms` parameters respectively\n- **Removed Synchronous API**: Completely removed synchronous WebSocket operations to focus exclusively on async patterns:\n - Removed sync `connect()` function and sync `VoiceLiveConnection` class from main patch implementation\n - Removed sync `basic_voice_assistant.py` sample (only async version remains)\n - Simplified sync patch to minimal structure with empty exports\n - All functionality now available only through async patterns\n- **Updated Dependencies**: Modified package dependencies to reflect async-only architecture:\n - Moved `aiohttp>=3.9.0,<4.0.0` from optional to required dependency\n - Removed `websockets` optional dependency as sync API no longer exists\n - Removed optional 
dependency groups `websockets`, `aiohttp`, and `all-websockets`\n- **Model Rename**:\n - Renamed `AudioInputTranscriptionSettings` to `AudioInputTranscriptionOptions` for consistency with naming conventions\n - Renamed `AzureMultilingualSemanticVad` to `AzureSemanticVadMultilingual` for naming consistency with other multilingual variants\n- **Enhanced Type Safety**: Turn detection discriminator types now use enum values instead of string literals for better type safety\n\n### Bug Fixes\n\n- **Serialization Improvements**: Fixed type casting issue in serialization utilities for better enum handling and type safety\n\n### Other Changes\n\n- **Testing Infrastructure**: Added comprehensive unit test suite with extensive coverage:\n - 8 main test files with 200+ individual test methods\n - Tests for all enums, models, async operations, client events, voice configurations, and message handling\n - Integration tests covering real-world scenarios and recent changes\n - Proper mocking for async WebSocket connections\n - Backwards compatibility validation\n - Test coverage for all recent changes and enhancements\n- **API Documentation**: Updated API view properties to reflect model structure changes, new enums, and cross-language package identity\n- **Documentation Updates**: Comprehensive updates to all markdown documentation:\n - Updated README.md to reflect async-only nature with updated examples and installation instructions\n - Updated samples README.md to remove sync sample references\n - Enhanced BASIC_VOICE_ASSISTANT.md with comprehensive async implementation guide\n - Added MIGRATION_GUIDE.md for users upgrading from previous versions\n\n## 1.0.0b4 (2025-09-19)\n\n### Features Added\n\n- **Personal Voice Models**: Added `PersonalVoiceModels` enum with support for `DragonLatestNeural`, `PhoenixLatestNeural`, and `PhoenixV2Neural` models\n- **Enhanced Animation Support**: Added comprehensive server event classes for animation blendshapes and viseme handling:\n - `ServerEventResponseAnimationBlendshapeDelta` and `ServerEventResponseAnimationBlendshapeDone`\n - `ServerEventResponseAnimationVisemeDelta` and `ServerEventResponseAnimationVisemeDone`\n- **Audio Timestamp Events**: Added `ServerEventResponseAudioTimestampDelta` and `ServerEventResponseAudioTimestampDone` for better audio timing control\n- **Improved Error Handling**: Added `ErrorResponse` class for better error management\n- **Enhanced Base Classes**: Added `ConversationItemBase` and `SessionBase` for better code organization and inheritance\n- **Token Usage Improvements**: Renamed `Usage` to `TokenUsage` for better clarity\n- **Audio Format Improvements**: Reorganized audio format enums with separate `InputAudioFormat` and `OutputAudioFormat` enums for better clarity\n- **Enhanced Output Audio Format Support**: Added more granular output audio format options including specific sampling rates (8kHz, 16kHz) for PCM16\n\n### Breaking Changes\n\n- **Model Cleanup**: Removed experimental classes `AzurePlatformVoice`, `LLMVoice`, `AzureSemanticVadServer`, `InputAudio`, `NoTurnDetection`, and `ToolChoiceFunctionObjectFunction`\n- **Class Rename**: Renamed `Usage` class to `TokenUsage` for better clarity\n- **Enum Reorganization**:\n - Replaced `AudioFormat` enum with separate `InputAudioFormat` and `OutputAudioFormat` enums\n - Removed `Phi4mmVoice` enum\n - Removed `EMOTION` value from `AnimationOutputType` enum\n - Removed `IN_PROGRESS` value from `ItemParamStatus` enum\n- **Server Events**: Removed `RESPONSE_EMOTION_HYPOTHESIS` from 
`ServerEventType` enum\n\n### Other Changes\n\n- **Package Structure**: Simplified package initialization with namespace package support\n- **Sample Updates**: Improved basic voice assistant samples\n- **Code Optimization**: Streamlined model definitions with significant code reduction\n- **API Configuration**: Updated API view properties for better tooling support\n\n## 1.0.0b3 (2025-09-17)\n\n### Features Added\n\n- **Transcription improvement**: Added phrase list\n- **New Voice Types**: Added `AzurePlatformVoice` and `LLMVoice` classes\n- **Enhanced Speech Detection**: Added `AzureSemanticVadServer` class\n- **Improved Function Calling**: Enhanced async function calling sample with better error handling\n- **English-Specific Detection**: Added `AzureSemanticDetectionEn` class for optimized English-only semantic end-of-utterance detection\n- **English-Specific Voice Activity Detection**: Added `AzureSemanticVadEn` class for enhanced English-only voice activity detection\n\n### Breaking Changes\n\n- **Transcription**: Removed `custom_model` and `enabled` from `AudioInputTranscriptionSettings`.\n- **Async Authentication**: Fixed credential handling for async scenarios\n- **Model Serialization**: Improved error handling and deserialization\n\n### Other Changes\n\n- **Code Modernization**: Updated type annotations throughout\n\n## 1.0.0b2 (2025-09-10)\n\n### Features Added\n\n- Async function call\n\n### Bugs Fixed\n\n- Fixed function calling: ensure `FunctionCallOutputItem.output` is properly serialized as a JSON string before sending to the service.\n\n## 1.0.0b1 (2025-08-28)\n\n### Features Added\n\n- Added WebSocket connection support through `connect()`.\n- Added `VoiceLiveConnection` for managing WebSocket connections.\n- Added models of Voice Live preview.\n- Added WebSocket-based examples in the samples directory.\n\n### Other Changes\n\n- Initial preview release.\n",
"bugtrack_url": null,
"license": null,
"summary": "Microsoft Corporation Azure Ai Voicelive Client Library for Python",
"version": "1.0.0",
"project_urls": {
"repository": "https://github.com/Azure/azure-sdk-for-python"
},
"split_keywords": [
"azure",
" azure sdk"
],
"urls": [
{
"comment_text": null,
"digests": {
"blake2b_256": "3a08dd166e29378e3184640fc35d177d4b114b2dc1b6bd6340298ab3d884ad23",
"md5": "6e43d8460da4ebc61e87d3b8fcd88c56",
"sha256": "985f398d3d05d336792b4164fd307dd7ad57029110db4a888aec745e2ae27c61"
},
"downloads": -1,
"filename": "azure_ai_voicelive-1.0.0-py3-none-any.whl",
"has_sig": false,
"md5_digest": "6e43d8460da4ebc61e87d3b8fcd88c56",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.9",
"size": 82939,
"upload_time": "2025-10-02T18:56:44",
"upload_time_iso_8601": "2025-10-02T18:56:44.931368Z",
"url": "https://files.pythonhosted.org/packages/3a/08/dd166e29378e3184640fc35d177d4b114b2dc1b6bd6340298ab3d884ad23/azure_ai_voicelive-1.0.0-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": null,
"digests": {
"blake2b_256": "e7cfbc7114d4d625043b1e447b7d09f7a2af52d2ef5c79a13b178449e723a104",
"md5": "6297aabb992ec86ff49f245030eefb01",
"sha256": "2c19dd34f8d10398e2c2254e44f05f5182a1b332810bdd370e8cd3da7719a598"
},
"downloads": -1,
"filename": "azure_ai_voicelive-1.0.0.tar.gz",
"has_sig": false,
"md5_digest": "6297aabb992ec86ff49f245030eefb01",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.9",
"size": 126303,
"upload_time": "2025-10-02T18:56:43",
"upload_time_iso_8601": "2025-10-02T18:56:43.447362Z",
"url": "https://files.pythonhosted.org/packages/e7/cf/bc7114d4d625043b1e447b7d09f7a2af52d2ef5c79a13b178449e723a104/azure_ai_voicelive-1.0.0.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2025-10-02 18:56:43",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "Azure",
"github_project": "azure-sdk-for-python",
"travis_ci": false,
"coveralls": true,
"github_actions": true,
"lcname": "azure-ai-voicelive"
}