| Field | Value |
|---|---|
| Name | azure-ai-voicelive |
| Version | 1.0.0b1 |
| Summary | Microsoft Corporation Azure Ai Voicelive Client Library for Python |
| Upload time | 2025-08-29 00:54:50 |
| Requires Python | >=3.9 |
| License | MIT |
| Keywords | azure, azure sdk |
| Repository | https://github.com/Azure/azure-sdk-for-python |

Azure AI VoiceLive client library for Python
============================================
This package provides a **real-time, speech-to-speech** client for Azure AI VoiceLive.
It opens a WebSocket session to stream microphone audio to the service and receive
typed server events (including audio) for responsive, interruptible conversations.
> **Status:** Preview. APIs are subject to change.
---
Getting started
---------------
### Prerequisites
- **Python 3.9+**
- An **Azure subscription**
- A **VoiceLive** resource and endpoint
- A working **microphone** and **speakers/headphones** if you run the voice samples
### Install
```bash
# Base install (core client only)
python -m pip install azure-ai-voicelive

# For synchronous streaming (uses websockets)
python -m pip install "azure-ai-voicelive[websockets]"

# For asynchronous streaming (uses aiohttp)
python -m pip install "azure-ai-voicelive[aiohttp]"

# For both sync + async scenarios (recommended if unsure)
python -m pip install "azure-ai-voicelive[all-websockets]" pyaudio python-dotenv
```
If you try to use streaming without the matching extra installed, the client raises an
error naming these extras; see the Troubleshooting section below for the exact commands.
### Authenticate
You can authenticate with an **API key** or an **Azure Active Directory (AAD) token**.
#### API Key Authentication (Quick Start)
Set environment variables in a `.env` file or directly in your environment:
```bash
# In your .env file or environment variables
AZURE_VOICELIVE_API_KEY="your-api-key"
AZURE_VOICELIVE_ENDPOINT="your-endpoint"
```
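If you keep these values in a `.env` file, `python-dotenv` (installed by the last command above) can load them at startup. A minimal sketch, assuming `.env` sits in the working directory:

```python
import os

from dotenv import load_dotenv

load_dotenv()  # reads .env from the current directory into os.environ
API_KEY = os.environ["AZURE_VOICELIVE_API_KEY"]
ENDPOINT = os.environ["AZURE_VOICELIVE_ENDPOINT"]
```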
Then, use the key in your code:
```python
from azure.core.credentials import AzureKeyCredential
from azure.ai.voicelive import connect
connection = connect(
    endpoint="your-endpoint",
    credential=AzureKeyCredential("your-api-key"),
    model="gpt-4o-realtime-preview"
)
```
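The returned `connection` is used as a context manager in the examples below (`with connect(...) as conn:`), which closes the underlying WebSocket cleanly when the block exits.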
#### AAD Token Authentication
For production applications, AAD authentication is recommended:
```python
from azure.identity import DefaultAzureCredential
from azure.ai.voicelive import connect
credential = DefaultAzureCredential()
connection = connect(
    endpoint="your-endpoint",
    credential=credential,
    model="gpt-4o-realtime-preview"
)
```
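`DefaultAzureCredential` walks a chain of credential sources (environment variables, managed identity, developer logins). In a fixed deployment you can pass any other `azure-identity` credential explicitly; a sketch, assuming the app runs with a managed identity:

```python
from azure.identity import ManagedIdentityCredential

# Sketch: use the App Service/VM managed identity directly
# instead of the default credential chain
credential = ManagedIdentityCredential()
```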
---
Key concepts
------------
- **VoiceLiveConnection** – Manages an active WebSocket connection to the service
- **Session Management** – Configure conversation parameters:
  - **SessionResource** – Update session parameters (voice, formats, VAD)
  - **RequestSession** – Strongly-typed session configuration
  - **ServerVad** – Configure voice activity detection
  - **AzureStandardVoice** – Configure voice settings
- **Audio Handling**:
  - **InputAudioBufferResource** – Manage audio input to the service
  - **OutputAudioBufferResource** – Control audio output from the service
- **Conversation Management**:
  - **ResponseResource** – Create or cancel model responses
  - **ConversationResource** – Manage conversation items
- **Strongly-Typed Events** – Process service events with type safety:
  - `SESSION_UPDATED`, `RESPONSE_AUDIO_DELTA`, `RESPONSE_DONE`
  - `INPUT_AUDIO_BUFFER_SPEECH_STARTED`, `INPUT_AUDIO_BUFFER_SPEECH_STOPPED`
  - `ERROR`, and more
---
Examples
--------
### Basic async Voice Assistant (Featured Sample)
The Basic async Voice Assistant sample demonstrates full-featured voice interaction with:
- Real-time speech streaming
- Server-side voice activity detection
- Interruption handling
- High-quality audio processing
```bash
# Run the basic voice assistant sample
# Requires [aiohttp] for async (easiest: [all-websockets])
python samples/basic_voice_assistant_async.py

# With custom parameters
python samples/basic_voice_assistant_async.py --model gpt-4o-realtime-preview --voice alloy --instructions "You're a helpful assistant"
```
### Minimal async example
```python
import asyncio
from azure.core.credentials import AzureKeyCredential
from azure.ai.voicelive.aio import connect
from azure.ai.voicelive.models import (
    RequestSession, Modality, AudioFormat, ServerVad, ServerEventType
)

API_KEY = "your-api-key"
ENDPOINT = "wss://your-endpoint.com/openai/realtime"
MODEL = "gpt-4o-realtime-preview"

async def main():
    async with connect(
        endpoint=ENDPOINT,
        credential=AzureKeyCredential(API_KEY),
        model=MODEL,
    ) as conn:
        session = RequestSession(
            modalities=[Modality.TEXT, Modality.AUDIO],
            instructions="You are a helpful assistant.",
            input_audio_format=AudioFormat.PCM16,
            output_audio_format=AudioFormat.PCM16,
            turn_detection=ServerVad(
                threshold=0.5,
                prefix_padding_ms=300,
                silence_duration_ms=500
            ),
        )
        await conn.session.update(session=session)

        # Process events
        async for evt in conn:
            print(f"Event: {evt.type}")
            if evt.type == ServerEventType.RESPONSE_DONE:
                break

asyncio.run(main())
```
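The loop above only consumes events; a complete assistant also streams microphone audio up to the service. The sketch below is hypothetical: it assumes the connection exposes the input-audio-buffer resource listed under Key concepts as `conn.input_audio_buffer` with an `append` operation taking base64-encoded PCM16 (method and parameter names are assumptions, not confirmed API), and it uses PyAudio for capture:

```python
import base64
import pyaudio

RATE = 24000         # assumed PCM16 sample rate; match your session's input format
FRAMES = RATE // 50  # ~20 ms of audio per chunk

async def stream_microphone(conn):
    pa = pyaudio.PyAudio()
    mic = pa.open(format=pyaudio.paInt16, channels=1, rate=RATE,
                  input=True, frames_per_buffer=FRAMES)
    try:
        while True:
            # Blocking read, kept simple for the sketch
            chunk = mic.read(FRAMES, exception_on_overflow=False)
            # `input_audio_buffer.append` is assumed from the resource names above
            await conn.input_audio_buffer.append(
                audio=base64.b64encode(chunk).decode("ascii")
            )
    finally:
        mic.close()
        pa.terminate()
```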
### Minimal sync example
```python
from azure.core.credentials import AzureKeyCredential
from azure.ai.voicelive import connect
from azure.ai.voicelive.models import (
    RequestSession, Modality, AudioFormat, ServerVad, ServerEventType
)

API_KEY = "your-api-key"
ENDPOINT = "your-endpoint"
MODEL = "gpt-4o-realtime-preview"

with connect(
    endpoint=ENDPOINT,
    credential=AzureKeyCredential(API_KEY),
    model=MODEL
) as conn:
    session = RequestSession(
        modalities=[Modality.TEXT, Modality.AUDIO],
        instructions="You are a helpful assistant.",
        input_audio_format=AudioFormat.PCM16,
        output_audio_format=AudioFormat.PCM16,
        turn_detection=ServerVad(
            threshold=0.5,
            prefix_padding_ms=300,
            silence_duration_ms=500
        ),
    )
    conn.session.update(session=session)

    # Process events
    for evt in conn:
        print(f"Event: {evt.type}")
        if evt.type == ServerEventType.RESPONSE_DONE:
            break
```
Available Voice Options
-----------------------
### Azure Neural Voices
```python
# Use Azure Neural voices
voice_config = AzureStandardVoice(
    name="en-US-AvaNeural",  # Or another voice name
    type="azure-standard"
)
```
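To apply it, pass the configuration into your session update. A sketch assuming `RequestSession` accepts a `voice` field alongside the parameters shown in the minimal examples (the field name is an assumption):

```python
from azure.ai.voicelive.models import Modality, RequestSession

# Hypothetical: `voice` field name assumed, by analogy with the
# other session parameters; reuses voice_config from the block above
session = RequestSession(
    modalities=[Modality.TEXT, Modality.AUDIO],
    voice=voice_config,
)
```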
Popular voices include:
- `en-US-AvaNeural` - Female, natural and professional
- `en-US-JennyNeural` - Female, conversational
- `en-US-GuyNeural` - Male, professional
### OpenAI Voices
```python
# Use OpenAI voices (as string)
voice_config = "alloy" # Or another OpenAI voice
```
Available OpenAI voices:
- `alloy` - Versatile, neutral
- `echo` - Precise, clear
- `fable` - Animated, expressive
- `onyx` - Deep, authoritative
- `nova` - Warm, conversational
- `shimmer` - Optimistic, friendly
---
Handling Events
---------------
```python
async for event in connection:
    if event.type == ServerEventType.SESSION_UPDATED:
        print(f"Session ready: {event.session.id}")
        # Start audio capture

    elif event.type == ServerEventType.INPUT_AUDIO_BUFFER_SPEECH_STARTED:
        print("User started speaking")
        # Stop playback and cancel any current response

    elif event.type == ServerEventType.RESPONSE_AUDIO_DELTA:
        # Play the audio chunk
        audio_bytes = event.delta

    elif event.type == ServerEventType.ERROR:
        print(f"Error: {event.error.message}")
```
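To make `RESPONSE_AUDIO_DELTA` chunks audible you need a playback sink. A minimal sketch using PyAudio, assuming 24 kHz, 16-bit mono PCM output (an assumption; check your session's `output_audio_format`):

```python
import pyaudio

pa = pyaudio.PyAudio()
player = pa.open(format=pyaudio.paInt16, channels=1, rate=24000, output=True)

def play_chunk(audio_bytes: bytes) -> None:
    # Write one RESPONSE_AUDIO_DELTA payload to the default output device.
    # If your event delivers base64 text rather than raw bytes, decode it
    # with base64.b64decode() first.
    player.write(audio_bytes)
```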
---
Troubleshooting
---------------
### Connection Issues
- **WebSocket connection errors (1006/timeout):**
Verify `AZURE_VOICELIVE_ENDPOINT`, network rules, and that your credential has access.
- **Missing WebSocket dependencies:**
  If you see `WebSocket streaming features require additional dependencies.`,
  install the matching extra:

      pip install "azure-ai-voicelive[websockets]"       # for sync
      pip install "azure-ai-voicelive[aiohttp]"          # for async
      pip install "azure-ai-voicelive[all-websockets]"   # for both
- **Auth failures:**
For API key, double-check `AZURE_VOICELIVE_API_KEY`. For AAD, ensure the identity is authorized.
### Audio Device Issues
- **No microphone/speaker detected:**
Check device connections and permissions. On headless CI environments, audio samples can't run.
- **Audio library installation problems:**
On Linux/macOS you may need PortAudio:
```bash
# Debian/Ubuntu
sudo apt-get install -y portaudio19-dev libasound2-dev
# macOS (Homebrew)
brew install portaudio
```
### Enable Verbose Logging
```python
import logging
logging.basicConfig(level=logging.DEBUG)
```
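Root-level DEBUG is noisy. Azure SDK clients log under the `azure` logger namespace, so you can scope verbose output to the SDK alone:

```python
import logging

# Verbose logs from Azure SDK libraries only, not the whole application
logging.getLogger("azure").setLevel(logging.DEBUG)
```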
---
Next steps
----------
1. **Run the featured sample:**
- Try `samples/basic_voice_assistant_async.py` for a complete voice assistant implementation
2. **Customize your implementation:**
- Experiment with different voices and parameters
- Add custom instructions for specialized assistants
- Integrate with your own audio capture/playback systems
3. **Advanced scenarios:**
- Add function calling support
- Implement tool usage
- Create multi-turn conversations with history
4. **Explore other samples:**
- Check the `samples/` directory for specialized examples
- See `samples/README.md` for a full list of samples
---
Contributing
------------
This project follows the Azure SDK guidelines. If you'd like to contribute:
1. Fork the repo and create a feature branch
2. Run linters and tests locally
3. Submit a pull request with a clear description of the change
---
Release notes
-------------
Changelogs are available in the package directory.
---
License
-------
This project is released under the **MIT License**.
# Release History
## 1.0.0b1 (2025-08-28)
### Features Added
- Added WebSocket connection support through `connect()`.
- Added `VoiceLiveConnection` for managing WebSocket connections.
- Added models of Voice Live preview.
- Added WebSocket-based examples in the samples directory.
### Other Changes
- Initial preview release.