# 🗣️ Debabelizer
**Voice Processing Library - Breaking Down Language Barriers**
[Python 3.8+](https://www.python.org/downloads/)
[MIT License](https://opensource.org/licenses/MIT)
Debabelizer is a voice processing library that provides a unified interface for speech-to-text (STT) and text-to-speech (TTS) operations across multiple cloud providers and local engines. Break down language barriers with support for 100+ languages and dialects.
## 🌟 Features
### 🎯 **Pluggable Provider Support**
- **6 STT Providers**: Soniox, Deepgram, Google Cloud, Azure, OpenAI Whisper (local), OpenAI Whisper (API)
- **4 TTS Providers**: ElevenLabs, OpenAI, Google Cloud, Azure
- **Unified API**: Switch providers without changing code
- **Provider-specific optimizations**: Each provider uses its optimal streaming/processing approach
### 🌍 **Comprehensive Language Support**
- **100+ languages and dialects** across all providers
- **Automatic language detection**
- **Multi-language processing** in single workflows
- **Custom language hints** for improved accuracy
### ⚡ **Advanced Processing**
- **Real-time streaming** transcription (Soniox, Deepgram with true WebSocket streaming)
- **Chunk-based transcription** for reliable web application audio processing
- **File-based transcription** for batch processing
- **Word-level timestamps** and confidence scores
- **Speaker diarization** and voice identification (provider-dependent)
- **Custom voice training** and cloning (ElevenLabs)
### 🏠 **Local & Cloud Options**
- **OpenAI Whisper**: Complete offline processing (FREE)
- **Cloud APIs**: Enterprise-grade accuracy and features
- **Hybrid workflows**: Mix local and cloud processing
- **Cost optimization**: Automatic provider selection by cost/quality
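The cost/quality selection above can be sketched roughly as follows; the score table and `pick_provider` helper are illustrative stand-ins, not part of the Debabelizer API:

```python
# Illustrative sketch of cost/quality provider selection; the score table and
# pick_provider helper are hypothetical, not part of the Debabelizer API.
PROVIDER_SCORES = {
    # provider: (relative cost, relative quality, relative latency), 1 = best
    "whisper":  (1, 3, 4),   # free local processing, but slower
    "deepgram": (2, 1, 1),   # paid, fast and accurate
    "soniox":   (3, 2, 1),   # paid, strong for true streaming
}

def pick_provider(optimize_for: str = "balanced") -> str:
    """Pick the provider with the best score for the given preference."""
    index = {"cost": 0, "quality": 1, "latency": 2}
    if optimize_for == "balanced":
        return min(PROVIDER_SCORES, key=lambda p: sum(PROVIDER_SCORES[p]))
    return min(PROVIDER_SCORES, key=lambda p: PROVIDER_SCORES[p][index[optimize_for]])
```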
### 🛠️ **Enterprise Ready**
- **Async/await support** for high-performance applications
- **Session management** for long-running processes
- **Error handling** with provider-specific fallbacks
- **Usage tracking** and cost estimation
- **Extensive configuration** options
## 📦 Installation
### Basic Installation
```bash
pip install debabelizer
```
### Provider-Specific Installation
```bash
# Individual providers
pip install debabelizer[soniox] # Soniox STT
pip install debabelizer[deepgram] # Deepgram STT
pip install debabelizer[google] # Google Cloud STT & TTS
pip install debabelizer[azure] # Azure STT & TTS
pip install debabelizer[whisper] # OpenAI Whisper STT (local)
pip install debabelizer[elevenlabs] # ElevenLabs TTS
pip install debabelizer[openai] # OpenAI TTS & Whisper API
# All providers
pip install debabelizer[all]
# Development
pip install debabelizer[dev]
```
### Development Installation
```bash
git clone https://github.com/your-org/debabelizer.git
cd debabelizer
pip install -e .[dev]
```
## 🚀 Quick Start
### Basic Speech-to-Text
```python
import asyncio
from debabelizer import VoiceProcessor, DebabelizerConfig

async def transcribe_audio():
    # Configure with your preferred provider
    config = DebabelizerConfig({
        "deepgram": {"api_key": "your_deepgram_key"},
        "preferences": {"stt_provider": "deepgram"}
    })

    # Create processor
    processor = VoiceProcessor(config=config)

    # Transcribe audio file
    result = await processor.transcribe_file("audio.wav")

    print(f"Text: {result.text}")
    print(f"Language: {result.language_detected}")
    print(f"Confidence: {result.confidence}")

# Run transcription
asyncio.run(transcribe_audio())
```
### Basic Text-to-Speech
```python
import asyncio
from debabelizer import VoiceProcessor, DebabelizerConfig

async def synthesize_speech():
    # Configure TTS provider
    config = DebabelizerConfig({
        "elevenlabs": {"api_key": "your_elevenlabs_key"}
    })

    processor = VoiceProcessor(
        tts_provider="elevenlabs",
        config=config
    )

    # Synthesize speech
    result = await processor.synthesize(
        text="Hello world! This is Debabelizer speaking.",
        voice="Rachel"  # ElevenLabs voice
    )

    # Save audio
    with open("output.mp3", "wb") as f:
        f.write(result.audio_data)

asyncio.run(synthesize_speech())
```
### Local Processing (FREE with Whisper)
```python
import asyncio
from debabelizer import VoiceProcessor, DebabelizerConfig

async def local_transcription():
    # No API key needed for Whisper!
    config = DebabelizerConfig({
        "whisper": {
            "model_size": "base",  # tiny, base, small, medium, large
            "device": "auto"       # auto-detects GPU/CPU
        }
    })

    processor = VoiceProcessor(stt_provider="whisper", config=config)

    # Completely offline transcription
    result = await processor.transcribe_file("audio.wav")
    print(f"Offline transcription: {result.text}")

asyncio.run(local_transcription())
```
### Real-time Streaming (Provider-Specific)
```python
import asyncio
from debabelizer import VoiceProcessor, DebabelizerConfig

async def streaming_transcription():
    # Note: true streaming varies by provider
    # Soniox:       true real-time WebSocket streaming
    # Deepgram:     true real-time WebSocket streaming
    # Google/Azure: session-based streaming with optimizations
    config = DebabelizerConfig({
        "soniox": {"api_key": "your_key"}  # Best for true streaming
    })

    processor = VoiceProcessor(stt_provider="soniox", config=config)

    # Start streaming session
    session_id = await processor.start_streaming_transcription(
        audio_format="pcm",  # Raw PCM preferred for streaming
        sample_rate=16000,
        language="en"
    )

    # Stream audio chunks (typically 16 ms - 100 ms of audio each)
    with open("audio.wav", "rb") as f:
        chunk_size = 1024  # Small chunks for real-time
        while chunk := f.read(chunk_size):
            await processor.stream_audio(session_id, chunk)

    # Get results as they arrive
    async for result in processor.get_streaming_results(session_id):
        if result.is_final:
            print(f"Final: {result.text}")
        else:
            print(f"Interim: {result.text}")

    await processor.stop_streaming_transcription(session_id)

asyncio.run(streaming_transcription())
```
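As a sanity check on chunk sizing: for 16-bit mono PCM at 16 kHz, a 1024-byte chunk covers 32 ms of audio, squarely within the 16-100 ms range mentioned above:

```python
def chunk_duration_ms(chunk_bytes: int, sample_rate: int = 16000,
                      bytes_per_sample: int = 2, channels: int = 1) -> float:
    """Duration of a raw PCM chunk in milliseconds."""
    return chunk_bytes / (sample_rate * bytes_per_sample * channels) * 1000

print(chunk_duration_ms(1024))  # 32.0 ms for 16-bit mono PCM at 16 kHz
```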
### File-Based Transcription (Alternative to Streaming)
```python
import asyncio
from debabelizer import VoiceProcessor, DebabelizerConfig

async def file_transcription():
    """
    Process complete audio files or buffered audio chunks.
    An alternative to streaming for applications that can buffer audio.
    """
    config = DebabelizerConfig({
        "deepgram": {"api_key": "your_key"}
    })

    processor = VoiceProcessor(stt_provider="deepgram", config=config)

    # Process a complete audio file
    result = await processor.transcribe_file("audio.wav")

    # Or process audio data from memory (e.g., from a web upload)
    with open("audio_chunk.webm", "rb") as f:
        chunk_data = f.read()  # WebM/Opus from MediaRecorder

    # Process audio data directly
    result = await processor.transcribe_audio(
        audio_data=chunk_data,
        audio_format="webm",  # Browser WebM/Opus format
        sample_rate=48000,    # Browser standard
        language="en"
    )

    print(f"Result: {result.text}")
    print(f"Confidence: {result.confidence}")
    print(f"Language: {result.language_detected}")

asyncio.run(file_transcription())
```
## 🔧 Configuration
### Environment Variables
Create a `.env` file:
```bash
# Provider API Keys
DEEPGRAM_API_KEY=your_deepgram_key
ELEVENLABS_API_KEY=your_elevenlabs_key
OPENAI_API_KEY=your_openai_key
SONIOX_API_KEY=your_soniox_key
# Azure (requires key + region)
AZURE_SPEECH_KEY=your_azure_key
AZURE_SPEECH_REGION=eastus
# Google Cloud (requires service account JSON)
GOOGLE_APPLICATION_CREDENTIALS=/path/to/google-credentials.json
# Preferences
DEBABELIZER_STT_PROVIDER=deepgram
DEBABELIZER_TTS_PROVIDER=elevenlabs
DEBABELIZER_OPTIMIZE_FOR=quality # cost, latency, quality, balanced
```
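If you want to inspect these preferences yourself, the fallback pattern is plain standard library (the `get_preference` helper below is illustrative; `DebabelizerConfig` is assumed to read the same variables internally):

```python
import os

def get_preference(name: str, default: str) -> str:
    """Read a DEBABELIZER_* preference from the environment, with a default.

    Illustrative helper: DebabelizerConfig is assumed to apply the same
    environment-variable fallback on its own.
    """
    return os.environ.get(f"DEBABELIZER_{name}", default)

stt_provider = get_preference("STT_PROVIDER", "deepgram")
optimize_for = get_preference("OPTIMIZE_FOR", "balanced")
```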
### Authentication Requirements by Provider
#### Google Cloud STT/TTS
**Requires**: Service account JSON file or Application Default Credentials
```bash
# Option 1: Service account file
GOOGLE_APPLICATION_CREDENTIALS=/path/to/service-account.json
# Option 2: Use gcloud CLI (for development)
gcloud auth login
gcloud auth application-default login
gcloud config set project YOUR_PROJECT_ID
```
#### Azure STT/TTS
**Requires**: API key + region
```bash
AZURE_SPEECH_KEY=your_api_key_here
AZURE_SPEECH_REGION=eastus # or your preferred region
```
#### OpenAI (TTS & Whisper API)
**Requires**: OpenAI API key
```bash
OPENAI_API_KEY=your_openai_api_key
```
#### ElevenLabs TTS
**Requires**: ElevenLabs API key
```bash
ELEVENLABS_API_KEY=your_elevenlabs_key
```
#### Deepgram STT
**Requires**: Deepgram API key
```bash
DEEPGRAM_API_KEY=your_deepgram_key
```
#### Soniox STT
**Requires**: Soniox API key
```bash
SONIOX_API_KEY=your_soniox_key
```
#### OpenAI Whisper (Local)
**Requires**: No API key (completely offline)
- Automatically downloads models on first use
- Supports GPU acceleration with CUDA/MPS
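The `device: "auto"` option can be pictured as logic like the following (an illustrative sketch; in practice the availability flags come from `torch.cuda.is_available()` and `torch.backends.mps.is_available()`, and the provider's own logic may differ):

```python
def resolve_device(device: str = "auto",
                   cuda_available: bool = False,
                   mps_available: bool = False) -> str:
    """Resolve the 'auto' device setting to a concrete device string.

    Illustrative sketch: in practice the availability flags would come from
    torch.cuda.is_available() and torch.backends.mps.is_available().
    """
    if device != "auto":
        return device      # an explicit setting wins
    if cuda_available:
        return "cuda"      # NVIDIA GPU
    if mps_available:
        return "mps"       # Apple Silicon GPU
    return "cpu"           # fallback
```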
## 🎯 Provider Comparison & Testing Status
### Speech-to-Text (STT) Providers
| Provider | Status | Streaming | Language Auto-Detection | Testing | Authentication | Best For |
|----------|--------|-----------|-------------------------|---------|----------------|----------|
| **Soniox** | ✅ **Verified** | True WebSocket streaming | ✅ **Verified** | ✅ **Tested & Fixed** | API Key | Real-time applications |
| **Deepgram** | ✅ **Verified** | True WebSocket streaming | ✅ **Claimed** | ✅ **Tested & Fixed** | API Key | High accuracy & speed |
| **Google Cloud** | ✅ **Code Fixed** | Session-based streaming | ⚠️ **Limited** | ⚠️ **Needs Auth Setup** | Service Account JSON | Enterprise features |
| **Azure** | ✅ **Code Fixed** | Session-based streaming | ✅ **Claimed** | ⚠️ **Needs Auth Setup** | API Key + Region | Microsoft ecosystem |
| **OpenAI Whisper (Local)** | ✅ **Verified** | File-based only | ❓ **Unclear** | ✅ **Tested** | None (offline) | Cost-free processing |
| **OpenAI Whisper (API)** | ✅ **Available** | File-based only | ❓ **Unclear** | ⚠️ **Not tested** | OpenAI API Key | Cloud Whisper |
### Text-to-Speech (TTS) Providers
| Provider | Status | Streaming | Testing | Authentication | Best For |
|----------|--------|-----------|---------|----------------|----------|
| **ElevenLabs** | ✅ **Verified** | Simulated streaming | ✅ **Tested & Working** | API Key | Voice cloning & quality |
| **OpenAI** | ✅ **Verified** | Simulated streaming | ✅ **Tested & Fixed** | OpenAI API Key | Natural voices |
| **Google Cloud** | ✅ **Available** | TBD | ⚠️ **Not tested** | Service Account JSON | Enterprise features |
| **Azure** | ✅ **Available** | TBD | ⚠️ **Not tested** | API Key + Region | Microsoft ecosystem |
### Key Testing Results
#### ✅ **Fully Tested & Verified**
- **OpenAI TTS**: All features working, issues fixed (sample rate accuracy, duration estimation, streaming transparency)
- **ElevenLabs TTS**: All features working, fully tested and verified
- **Soniox STT**: Streaming implementation fixed (method names, session management)
- **Deepgram STT**: True WebSocket streaming implemented and working
#### ✅ **Code Issues Fixed (Ready for Testing)**
- **Google Cloud STT**: Fixed critical async/sync mixing bugs in streaming implementation
- **Azure STT**: Fixed critical async/sync mixing bugs in event handlers
#### โ ๏ธ **Available but Needs Testing**
- **Google Cloud TTS**: Implementation exists but not tested
- **Azure TTS**: Implementation exists but not tested
- **OpenAI Whisper API**: Implementation exists but not tested
## 🔧 Advanced Usage
### Provider-Specific Optimizations
```python
from debabelizer import DebabelizerConfig

# Soniox: best for true real-time streaming
soniox_config = DebabelizerConfig({
    "soniox": {
        "api_key": "your_key",
        "model": "en_v2",
        "include_profanity": False,
        "enable_global_speaker_diarization": True
    }
})

# Deepgram: high accuracy with true streaming
deepgram_config = DebabelizerConfig({
    "deepgram": {
        "api_key": "your_key",
        "model": "nova-2",
        "language": "en",
        "interim_results": True,
        "vad_events": True
    }
})

# Google Cloud: enterprise features (requires service account)
google_config = DebabelizerConfig({
    "google": {
        "credentials_path": "/path/to/service-account.json",
        "project_id": "your-project-id",
        "model": "latest_long",
        "enable_speaker_diarization": True,
        "enable_word_time_offsets": True
    }
})

# Azure: Microsoft ecosystem integration
azure_config = DebabelizerConfig({
    "azure": {
        "api_key": "your_key",
        "region": "eastus",
        "language": "en-US",
        "enable_dictation": True,
        "profanity_filter": True
    }
})

# OpenAI Whisper: free local processing
whisper_config = DebabelizerConfig({
    "whisper": {
        "model_size": "medium",  # tiny, base, small, medium, large
        "device": "cuda",        # cpu, cuda, mps, auto
        "fp16": True,            # Faster inference with GPU
        "language": None         # Auto-detect
    }
})
```
### Web Application Integration
```python
from fastapi import FastAPI, UploadFile, File, WebSocket
from debabelizer import VoiceProcessor, DebabelizerConfig

app = FastAPI()

# Initialize processor globally
config = DebabelizerConfig()
processor = VoiceProcessor(config=config)

@app.post("/transcribe-chunk")
async def transcribe_chunk(file: UploadFile = File(...)):
    """
    Recommended approach for web applications:
    process audio chunks from the browser's MediaRecorder.
    """
    content = await file.read()

    # Use buffered audio transcription for uploaded chunks
    result = await processor.transcribe_audio(
        audio_data=content,
        audio_format="webm",  # Common browser format
        sample_rate=48000,    # Browser standard
        language="en"
    )

    return {
        "text": result.text,
        "language": result.language_detected,
        "confidence": result.confidence,
        "duration": result.duration,
        "method": "chunk_transcription"
    }

@app.websocket("/transcribe-stream")
async def transcribe_stream(websocket: WebSocket):
    """
    True streaming approach for specialized applications.
    Requires careful connection management.
    """
    await websocket.accept()

    # Start streaming session
    session_id = await processor.start_streaming_transcription(
        audio_format="pcm",
        sample_rate=16000,
        language="en"
    )

    try:
        while True:
            # Receive an audio chunk from the WebSocket
            audio_chunk = await websocket.receive_bytes()

            # Stream it to the STT provider
            await processor.stream_audio(session_id, audio_chunk)

            # Get results and send them back
            async for result in processor.get_streaming_results(session_id):
                await websocket.send_json({
                    "text": result.text,
                    "is_final": result.is_final,
                    "confidence": result.confidence
                })

                if result.is_final:
                    break
    except Exception as e:
        print(f"Streaming error: {e}")
    finally:
        await processor.stop_streaming_transcription(session_id)
```
## 🧪 Testing
### Run Tests
```bash
# All tests
python -m pytest
# Specific test categories
python -m pytest tests/test_voice_processor.py # Core functionality
python -m pytest tests/test_config.py # Configuration
python -m pytest tests/test_providers/ # Provider tests
# Integration tests (requires API keys)
python -m pytest tests/test_integration.py
# With coverage
python -m pytest --cov=debabelizer --cov-report=html
```
### Test Results
Current test status: **150/165 tests passing, 15 skipped** ✅
```bash
# Test specific providers (requires API keys in .env)
python tests/test_openai_tts.py      # OpenAI TTS (tested ✅)
python tests/test_soniox_stt.py      # Soniox STT (tested ✅)
python tests/test_deepgram_stt.py    # Deepgram STT (tested ✅)
python tests/test_google_stt.py      # Google STT (needs auth setup)
python tests/test_azure_stt.py       # Azure STT (needs auth setup)
```
## 🚨 Known Issues & Limitations
### Current Limitations
1. **Google Cloud & Azure**: Code is fixed but requires proper authentication setup for testing
2. **TTS Streaming**: Most providers simulate streaming (the full audio is downloaded, then chunked); true streaming is available only from specialized streaming TTS APIs
3. **OpenAI TTS**: Correctly reports 24 kHz output but does not support custom sample rates
4. **WebM Audio**: Some providers may need audio format conversion for browser-generated WebM/Opus
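A common workaround for the WebM limitation is to convert browser audio to 16-bit mono PCM WAV with `ffmpeg` before transcription. A minimal sketch, assuming `ffmpeg` is installed and the file paths are placeholders:

```python
import subprocess
from typing import List

def webm_to_wav_cmd(src: str, dst: str, sample_rate: int = 16000) -> List[str]:
    """Build an ffmpeg command converting WebM/Opus to 16-bit mono PCM WAV."""
    return [
        "ffmpeg", "-y",           # overwrite output if it exists
        "-i", src,                # input WebM/Opus from MediaRecorder
        "-ar", str(sample_rate),  # resample to the STT provider's rate
        "-ac", "1",               # downmix to mono
        "-c:a", "pcm_s16le",      # 16-bit little-endian PCM
        dst,
    ]

# subprocess.run(webm_to_wav_cmd("chunk.webm", "chunk.wav"), check=True)
```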
### Fixed Issues
- ✅ **Google STT**: Fixed critical async/sync mixing in streaming implementation
- ✅ **Azure STT**: Fixed async/sync mixing in event handlers
- ✅ **OpenAI TTS**: Fixed sample rate accuracy, duration estimation, and streaming transparency
- ✅ **Soniox STT**: Fixed method name mismatches and session management
- ✅ **Deepgram STT**: Implemented true WebSocket streaming
## 🤝 Contributing
We welcome contributions!
### Development Setup
```bash
git clone https://github.com/your-org/debabelizer.git
cd debabelizer
pip install -e .[dev]
pre-commit install
```
### Testing New Providers
1. Add comprehensive test coverage
2. Follow the systematic debugging approach documented in CLAUDE.md
3. Test both file-based and streaming implementations
4. Verify error handling and edge cases
### Adding New Providers
1. Implement the provider interface in `src/debabelizer/providers/`
2. Add configuration support in `src/debabelizer/core/config.py`
3. Update processor in `src/debabelizer/core/processor.py`
4. Add comprehensive tests in `tests/`
5. Update documentation
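As a rough picture of step 1, a new STT provider implements the async transcription interface; the class and method names below are hypothetical and should be checked against the actual base class in `src/debabelizer/providers/`:

```python
# Hypothetical sketch of a new STT provider; the real base class lives in
# src/debabelizer/providers/ and its method names may differ.
from abc import ABC, abstractmethod

class STTProvider(ABC):
    """Minimal interface a new STT provider would implement."""

    @abstractmethod
    async def transcribe_file(self, path: str):
        """Transcribe a complete audio file."""

    @abstractmethod
    async def transcribe_audio(self, audio_data: bytes, audio_format: str,
                               sample_rate: int, language: str):
        """Transcribe buffered audio data from memory."""

class MyProvider(STTProvider):
    async def transcribe_file(self, path: str):
        raise NotImplementedError("call the provider's API here")

    async def transcribe_audio(self, audio_data, audio_format,
                               sample_rate, language):
        raise NotImplementedError("call the provider's API here")
```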
## 📄 License
This project is licensed under the MIT License.
## 🆘 Support
- **Issues**: [GitHub Issues](https://github.com/techwiz42/debabelizer/issues)
- **Discussions**: [GitHub Discussions](https://github.com/techwiz42/debabelizer/discussions)
## 🙏 Acknowledgments
- OpenAI for Whisper models and TTS API
- All provider teams for their excellent APIs
- Contributors and testers
- The open-source community
---
**Debabelizer** - *Breaking down language barriers, one voice at a time* 🌍🗣️
*Last updated: 2025-07-31 - Comprehensive testing and bug fixes for OpenAI TTS, Soniox STT, Deepgram STT, Google Cloud STT, and Azure STT implementations*
Raw data
{
"_id": null,
"home_page": null,
"name": "debabelizer",
"maintainer": null,
"docs_url": null,
"requires_python": ">=3.8",
"maintainer_email": "Peter Sisk <pete@cyberiad.ai>",
"keywords": "speech, voice, transcription, synthesis, stt, tts, ai, ml",
"author": null,
"author_email": "Peter Sisk <pete@cyberiad.ai>",
"download_url": "https://files.pythonhosted.org/packages/a5/bf/2e4667c903f038847d24a959885d5bfd8cc14f01ae5f0e936f986ce79f34/debabelizer-0.1.0b1.tar.gz",
"platform": null,
"description": "# \ud83d\udde3\ufe0f Debabelizer\n\n**Voice Processing Library - Breaking Down Language Barriers**\n\n[](https://www.python.org/downloads/)\n[](https://opensource.org/licenses/MIT)\n\nDebabelizer is a voice processing library that provides a unified interface for speech-to-text (STT) and text-to-speech (TTS) operations across multiple cloud providers and local engines. Break down language barriers with support for 100+ languages and dialects.\n\n## \ud83c\udf1f Features\n\n### \ud83c\udfaf **Pluggable Provider Support**\n- **6 STT Providers**: Soniox, Deepgram, Google Cloud, Azure, OpenAI Whisper (local), OpenAI Whisper (API)\n- **4 TTS Providers**: ElevenLabs, OpenAI, Google Cloud, Azure\n- **Unified API**: Switch providers without changing code\n- **Provider-specific optimizations**: Each provider uses its optimal streaming/processing approach\n\n### \ud83c\udf0d **Comprehensive Language Support**\n- **100+ languages and dialects** across all providers\n- **Automatic language detection** \n- **Multi-language processing** in single workflows\n- **Custom language hints** for improved accuracy\n\n### \u26a1 **Advanced Processing**\n- **Real-time streaming** transcription (Soniox, Deepgram with true WebSocket streaming)\n- **Chunk-based transcription** for reliable web application audio processing\n- **File-based transcription** for batch processing\n- **Word-level timestamps** and confidence scores\n- **Speaker diarization** and voice identification (provider-dependent)\n- **Custom voice training** and cloning (ElevenLabs)\n\n### \ud83c\udfe0 **Local & Cloud Options**\n- **OpenAI Whisper**: Complete offline processing (FREE)\n- **Cloud APIs**: Enterprise-grade accuracy and features\n- **Hybrid workflows**: Mix local and cloud processing\n- **Cost optimization**: Automatic provider selection by cost/quality\n\n### \ud83d\udee0\ufe0f **Enterprise Ready**\n- **Async/await support** for high-performance applications\n- **Session management** for 
long-running processes\n- **Error handling** with provider-specific fallbacks\n- **Usage tracking** and cost estimation\n- **Extensive configuration** options\n\n## \ud83d\udce6 Installation\n\n### Basic Installation\n```bash\npip install debabelizer\n```\n\n### Provider-Specific Installation\n```bash\n# Individual providers\npip install debabelizer[soniox] # Soniox STT\npip install debabelizer[deepgram] # Deepgram STT\npip install debabelizer[google] # Google Cloud STT & TTS\npip install debabelizer[azure] # Azure STT & TTS\npip install debabelizer[whisper] # OpenAI Whisper STT (local)\npip install debabelizer[elevenlabs] # ElevenLabs TTS\npip install debabelizer[openai] # OpenAI TTS & Whisper API\n\n# All providers\npip install debabelizer[all]\n\n# Development\npip install debabelizer[dev]\n```\n\n### Development Installation\n```bash\ngit clone https://github.com/your-org/debabelizer.git\ncd debabelizer\npip install -e .[dev]\n```\n\n## \ud83d\ude80 Quick Start\n\n### Basic Speech-to-Text\n```python\nimport asyncio\nfrom debabelizer import VoiceProcessor, DebabelizerConfig\n\nasync def transcribe_audio():\n # Configure with your preferred provider\n config = DebabelizerConfig({\n \"deepgram\": {\"api_key\": \"your_deepgram_key\"},\n \"preferences\": {\"stt_provider\": \"deepgram\"}\n })\n \n # Create processor\n processor = VoiceProcessor(config=config)\n \n # Transcribe audio file\n result = await processor.transcribe_file(\"audio.wav\")\n \n print(f\"Text: {result.text}\")\n print(f\"Language: {result.language_detected}\")\n print(f\"Confidence: {result.confidence}\")\n\n# Run transcription\nasyncio.run(transcribe_audio())\n```\n\n### Basic Text-to-Speech\n```python\nimport asyncio\nfrom debabelizer import VoiceProcessor, DebabelizerConfig\n\nasync def synthesize_speech():\n # Configure TTS provider\n config = DebabelizerConfig({\n \"elevenlabs\": {\"api_key\": \"your_elevenlabs_key\"}\n })\n \n processor = VoiceProcessor(\n tts_provider=\"elevenlabs\", \n 
config=config\n )\n \n # Synthesize speech\n result = await processor.synthesize(\n text=\"Hello world! This is Debabelizer speaking.\",\n voice=\"Rachel\" # ElevenLabs voice\n )\n \n # Save audio\n with open(\"output.mp3\", \"wb\") as f:\n f.write(result.audio_data)\n\nasyncio.run(synthesize_speech())\n```\n\n### Local Processing (FREE with Whisper)\n```python\nimport asyncio\nfrom debabelizer import VoiceProcessor, DebabelizerConfig\n\nasync def local_transcription():\n # No API key needed for Whisper!\n config = DebabelizerConfig({\n \"whisper\": {\n \"model_size\": \"base\", # tiny, base, small, medium, large\n \"device\": \"auto\" # auto-detects GPU/CPU\n }\n })\n \n processor = VoiceProcessor(stt_provider=\"whisper\", config=config)\n \n # Completely offline transcription\n result = await processor.transcribe_file(\"audio.wav\")\n print(f\"Offline transcription: {result.text}\")\n\nasyncio.run(local_transcription())\n```\n\n### Real-time Streaming (Provider-Specific)\n```python\nimport asyncio\nfrom debabelizer import VoiceProcessor, DebabelizerConfig\n\nasync def streaming_transcription():\n # Note: True streaming varies by provider\n # Soniox: True real-time WebSocket streaming\n # Deepgram: True real-time WebSocket streaming \n # Google/Azure: Session-based streaming with optimizations\n \n config = DebabelizerConfig({\n \"soniox\": {\"api_key\": \"your_key\"} # Best for true streaming\n })\n \n processor = VoiceProcessor(stt_provider=\"soniox\", config=config)\n \n # Start streaming session\n session_id = await processor.start_streaming_transcription(\n audio_format=\"pcm\", # Raw PCM preferred for streaming\n sample_rate=16000,\n language=\"en\"\n )\n \n # Stream audio chunks (typically 16ms - 100ms chunks)\n with open(\"audio.wav\", \"rb\") as f:\n chunk_size = 1024 # Small chunks for real-time\n while chunk := f.read(chunk_size):\n await processor.stream_audio(session_id, chunk)\n \n # Get results as they arrive\n async for result in 
processor.get_streaming_results(session_id):\n if result.is_final:\n print(f\"Final: {result.text}\")\n else:\n print(f\"Interim: {result.text}\")\n \n await processor.stop_streaming_transcription(session_id)\n\nasyncio.run(streaming_transcription())\n```\n\n### File-Based Transcription (Alternative to Streaming)\n```python\nimport asyncio\nfrom debabelizer import VoiceProcessor, DebabelizerConfig\n\nasync def file_transcription():\n \"\"\"\n Process complete audio files or buffered audio chunks.\n Alternative to streaming for applications that can buffer audio.\n \"\"\"\n config = DebabelizerConfig({\n \"deepgram\": {\"api_key\": \"your_key\"}\n })\n \n processor = VoiceProcessor(stt_provider=\"deepgram\", config=config)\n \n # Process complete audio file\n result = await processor.transcribe_file(\"audio.wav\")\n \n # Or process audio data from memory (e.g., from web upload)\n with open(\"audio_chunk.webm\", \"rb\") as f:\n chunk_data = f.read() # WebM/Opus from MediaRecorder\n \n # Process audio data directly\n result = await processor.transcribe_audio(\n audio_data=chunk_data,\n audio_format=\"webm\", # Browser WebM/Opus format\n sample_rate=48000, # Browser standard\n language=\"en\"\n )\n \n print(f\"Result: {result.text}\")\n print(f\"Confidence: {result.confidence}\")\n print(f\"Language: {result.language_detected}\")\n\nasyncio.run(file_transcription())\n```\n\n## \ud83d\udd27 Configuration\n\n### Environment Variables\nCreate a `.env` file:\n```bash\n# Provider API Keys\nDEEPGRAM_API_KEY=your_deepgram_key\nELEVENLABS_API_KEY=your_elevenlabs_key\nOPENAI_API_KEY=your_openai_key\nSONIOX_API_KEY=your_soniox_key\n\n# Azure (requires key + region)\nAZURE_SPEECH_KEY=your_azure_key\nAZURE_SPEECH_REGION=eastus\n\n# Google Cloud (requires service account JSON)\nGOOGLE_APPLICATION_CREDENTIALS=/path/to/google-credentials.json\n\n# Preferences\nDEBABELIZER_STT_PROVIDER=deepgram\nDEBABELIZER_TTS_PROVIDER=elevenlabs\nDEBABELIZER_OPTIMIZE_FOR=quality # cost, latency, 
quality, balanced\n```\n\n### Authentication Requirements by Provider\n\n#### Google Cloud STT/TTS\n**Requires**: Service account JSON file or Application Default Credentials\n```bash\n# Option 1: Service account file\nGOOGLE_APPLICATION_CREDENTIALS=/path/to/service-account.json\n\n# Option 2: Use gcloud CLI (for development)\ngcloud auth login\ngcloud auth application-default login\ngcloud config set project YOUR_PROJECT_ID\n```\n\n#### Azure STT/TTS\n**Requires**: API key + region\n```bash\nAZURE_SPEECH_KEY=your_api_key_here\nAZURE_SPEECH_REGION=eastus # or your preferred region\n```\n\n#### OpenAI (TTS & Whisper API)\n**Requires**: OpenAI API key\n```bash\nOPENAI_API_KEY=your_openai_api_key\n```\n\n#### ElevenLabs TTS\n**Requires**: ElevenLabs API key\n```bash\nELEVENLABS_API_KEY=your_elevenlabs_key\n```\n\n#### Deepgram STT\n**Requires**: Deepgram API key\n```bash\nDEEPGRAM_API_KEY=your_deepgram_key\n```\n\n#### Soniox STT\n**Requires**: Soniox API key\n```bash\nSONIOX_API_KEY=your_soniox_key\n```\n\n#### OpenAI Whisper (Local)\n**Requires**: No API key (completely offline)\n- Automatically downloads models on first use\n- Supports GPU acceleration with CUDA/MPS\n\n## \ud83c\udfaf Provider Comparison & Testing Status\n\n### Speech-to-Text (STT) Providers\n\n| Provider | Status | Streaming | Language Auto-Detection | Testing | Authentication | Best For |\n|----------|--------|-----------|-------------------------|---------|----------------|----------|\n| **Soniox** | \u2705 **Verified** | True WebSocket streaming | \u2705 **Verified** | \u2705 **Tested & Fixed** | API Key | Real-time applications |\n| **Deepgram** | \u2705 **Verified** | True WebSocket streaming | \u2705 **Claimed** | \u2705 **Tested & Fixed** | API Key | High accuracy & speed |\n| **Google Cloud** | \u2705 **Code Fixed** | Session-based streaming | \u26a0\ufe0f **Limited** | \u26a0\ufe0f **Needs Auth Setup** | Service Account JSON | Enterprise features |\n| **Azure** | \u2705 **Code Fixed** | 
Session-based streaming | \u2705 **Claimed** | \u26a0\ufe0f **Needs Auth Setup** | API Key + Region | Microsoft ecosystem |\n| **OpenAI Whisper (Local)** | \u2705 **Verified** | File-based only | \u2753 **Unclear** | \u2705 **Tested** | None (offline) | Cost-free processing |\n| **OpenAI Whisper (API)** | \u2705 **Available** | File-based only | \u2753 **Unclear** | \u26a0\ufe0f **Not tested** | OpenAI API Key | Cloud Whisper |\n\n### Text-to-Speech (TTS) Providers\n\n| Provider | Status | Streaming | Testing | Authentication | Best For |\n|----------|--------|-----------|---------|----------------|----------|\n| **ElevenLabs** | \u2705 **Verified** | Simulated streaming | \u2705 **Tested & Working** | API Key | Voice cloning & quality |\n| **OpenAI** | \u2705 **Verified** | Simulated streaming | \u2705 **Tested & Fixed** | OpenAI API Key | Natural voices |\n| **Google Cloud** | \u2705 **Available** | TBD | \u26a0\ufe0f **Not tested** | Service Account JSON | Enterprise features |\n| **Azure** | \u2705 **Available** | TBD | \u26a0\ufe0f **Not tested** | API Key + Region | Microsoft ecosystem |\n\n### Key Testing Results\n\n#### \u2705 **Fully Tested & Verified**\n- **OpenAI TTS**: All features working, issues fixed (sample rate accuracy, duration estimation, streaming transparency)\n- **ElevenLabs TTS**: All features working, fully tested and verified\n- **Soniox STT**: Streaming implementation fixed (method names, session management)\n- **Deepgram STT**: True WebSocket streaming implemented and working\n\n#### \u2705 **Code Issues Fixed (Ready for Testing)**\n- **Google Cloud STT**: Fixed critical async/sync mixing bugs in streaming implementation\n- **Azure STT**: Fixed critical async/sync mixing bugs in event handlers\n\n#### \u26a0\ufe0f **Available but Needs Testing**\n- **Google Cloud TTS**: Implementation exists but not tested \n- **Azure TTS**: Implementation exists but not tested\n- **OpenAI Whisper API**: Implementation exists but not tested\n\n## 
\ud83d\udd27 Advanced Usage\n\n\n### Provider-Specific Optimizations\n\n```python\n# Soniox: Best for true real-time streaming\nsoniox_config = DebabelizerConfig({\n \"soniox\": {\n \"api_key\": \"your_key\",\n \"model\": \"en_v2\",\n \"include_profanity\": False,\n \"enable_global_speaker_diarization\": True\n }\n})\n\n# Deepgram: High accuracy with true streaming\ndeepgram_config = DebabelizerConfig({\n \"deepgram\": {\n \"api_key\": \"your_key\", \n \"model\": \"nova-2\",\n \"language\": \"en\",\n \"interim_results\": True,\n \"vad_events\": True\n }\n})\n\n# Google Cloud: Enterprise features (requires service account)\ngoogle_config = DebabelizerConfig({\n \"google\": {\n \"credentials_path\": \"/path/to/service-account.json\",\n \"project_id\": \"your-project-id\",\n \"model\": \"latest_long\",\n \"enable_speaker_diarization\": True,\n \"enable_word_time_offsets\": True\n }\n})\n\n# Azure: Microsoft ecosystem integration\nazure_config = DebabelizerConfig({\n \"azure\": {\n \"api_key\": \"your_key\",\n \"region\": \"eastus\",\n \"language\": \"en-US\",\n \"enable_dictation\": True,\n \"profanity_filter\": True\n }\n})\n\n# OpenAI Whisper: Free local processing\nwhisper_config = DebabelizerConfig({\n \"whisper\": {\n \"model_size\": \"medium\", # tiny, base, small, medium, large\n \"device\": \"cuda\", # cpu, cuda, mps, auto\n \"fp16\": True, # Faster inference with GPU\n \"language\": None # Auto-detect\n }\n})\n```\n\n### Web Application Integration\n\n```python\nfrom fastapi import FastAPI, UploadFile, File, WebSocket\nfrom debabelizer import VoiceProcessor, DebabelizerConfig\nimport asyncio\n\napp = FastAPI()\n\n# Initialize processor globally\nconfig = DebabelizerConfig()\nprocessor = VoiceProcessor(config=config)\n\n@app.post(\"/transcribe-chunk\")\nasync def transcribe_chunk(file: UploadFile = File(...)):\n \"\"\"\n Recommended approach for web applications.\n Process audio chunks from browser MediaRecorder.\n \"\"\"\n content = await file.read()\n \n # 
Use audio transcription for buffered chunks\n result = await processor.transcribe_audio(\n audio_data=content,\n audio_format=\"webm\", # Common browser format\n sample_rate=48000, # Browser standard\n language=\"en\"\n )\n \n return {\n \"text\": result.text,\n \"language\": result.language_detected,\n \"confidence\": result.confidence,\n \"duration\": result.duration,\n \"method\": \"chunk_transcription\"\n }\n\n@app.websocket(\"/transcribe-stream\")\nasync def transcribe_stream(websocket: WebSocket):\n \"\"\"\n True streaming approach for specialized applications.\n Requires careful connection management.\n \"\"\"\n await websocket.accept()\n \n # Start streaming session\n session_id = await processor.start_streaming_transcription(\n audio_format=\"pcm\",\n sample_rate=16000,\n language=\"en\"\n )\n \n try:\n while True:\n # Receive audio chunk from WebSocket\n audio_chunk = await websocket.receive_bytes()\n \n # Stream to STT provider\n await processor.stream_audio(session_id, audio_chunk)\n \n # Get results and send back\n async for result in processor.get_streaming_results(session_id):\n await websocket.send_json({\n \"text\": result.text,\n \"is_final\": result.is_final,\n \"confidence\": result.confidence\n })\n \n if result.is_final:\n break\n \n except Exception as e:\n print(f\"Streaming error: {e}\")\n finally:\n await processor.stop_streaming_transcription(session_id)\n```\n\n## \ud83e\uddea Testing\n\n### Run Tests\n```bash\n# All tests\npython -m pytest\n\n# Specific test categories\npython -m pytest tests/test_voice_processor.py # Core functionality\npython -m pytest tests/test_config.py # Configuration\npython -m pytest tests/test_providers/ # Provider tests\n\n# Integration tests (requires API keys)\npython -m pytest tests/test_integration.py\n\n# With coverage\npython -m pytest --cov=debabelizer --cov-report=html\n```\n\n### Test Results\nCurrent test status: **150/165 tests passing, 15 skipped** \u2705\n\n```bash\n# Test specific providers 
# Test specific providers (requires API keys in .env)
python tests/test_openai_tts.py    # OpenAI TTS (tested ✅)
python tests/test_soniox_stt.py    # Soniox STT (tested ✅)
python tests/test_deepgram_stt.py  # Deepgram STT (tested ✅)
python tests/test_google_stt.py    # Google STT (needs auth setup)
python tests/test_azure_stt.py     # Azure STT (needs auth setup)
```

## 🚨 Known Issues & Limitations

### Current Limitations
1. **Google Cloud & Azure**: Code is fixed but requires proper authentication setup for testing
2. **TTS Streaming**: Most providers simulate streaming (download the full audio, then chunk it); true streaming is available only from specialized streaming TTS APIs
3. **OpenAI TTS**: Correctly reports 24kHz output but does not support custom sample rates
4. **WebM Audio**: Some providers may need audio format conversion for browser-generated WebM/Opus

### Fixed Issues
- ✅ **Google STT**: Fixed critical async/sync mixing in the streaming implementation
- ✅ **Azure STT**: Fixed async/sync mixing in event handlers
- ✅ **OpenAI TTS**: Fixed sample rate accuracy, duration estimation, and streaming transparency
- ✅ **Soniox STT**: Fixed method name mismatches and session management
- ✅ **Deepgram STT**: Implemented true WebSocket streaming

## 🤝 Contributing

We welcome contributions!

### Development Setup
```bash
git clone https://github.com/techwiz42/debabelizer.git
cd debabelizer
pip install -e .[dev]
pre-commit install
```

### Testing New Providers
1. Add comprehensive test coverage
2. Follow the systematic debugging approach documented in CLAUDE.md
3. Test both file-based and streaming implementations
4. Verify error handling and edge cases

### Adding New Providers
1. Implement the provider interface in `src/debabelizer/providers/`
2. Add configuration support in `src/debabelizer/core/config.py`
3. Update the processor in `src/debabelizer/core/processor.py`
4. Add comprehensive tests in `tests/`
5. Update documentation
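The actual abstract base class lives in `src/debabelizer/providers/` and its signatures may differ; purely as an illustration of the plug-in shape (all names and fields below are hypothetical, not the library's real interface), a new STT provider might look like:

```python
# Illustrative sketch only -- the real provider base class is defined in
# src/debabelizer/providers/ and its exact signatures may differ.
import asyncio
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Optional


@dataclass
class SketchResult:
    """Stand-in for the library's transcription result object."""
    text: str
    confidence: float
    language_detected: str


class SketchSTTProvider(ABC):
    """Hypothetical shape of a pluggable STT provider."""

    @abstractmethod
    async def transcribe(self, audio_data: bytes, audio_format: str,
                         sample_rate: int,
                         language: Optional[str] = None) -> SketchResult:
        ...


class EchoSTTProvider(SketchSTTProvider):
    """Toy provider showing where a real implementation would call its API."""

    async def transcribe(self, audio_data, audio_format, sample_rate,
                         language=None):
        # A real provider would send audio_data to its STT backend here
        # and map the response into the shared result type.
        return SketchResult(text="<transcript>", confidence=1.0,
                            language_detected=language or "en")


if __name__ == "__main__":
    result = asyncio.run(
        EchoSTTProvider().transcribe(b"\x00\x01", "wav", 16000, "en")
    )
    print(result.text)
```

Because every provider maps into one shared result type, the processor can swap backends without callers changing code, which is the point of steps 1–3 above.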
## 📄 License

This project is licensed under the MIT License.

## 🆘 Support

- **Issues**: [GitHub Issues](https://github.com/techwiz42/debabelizer/issues)
- **Discussions**: [GitHub Discussions](https://github.com/techwiz42/debabelizer/discussions)

## 🙏 Acknowledgments

- OpenAI for Whisper models and TTS API
- All provider teams for their excellent APIs
- Contributors and testers
- The open-source community

---

**Debabelizer** - *Breaking down language barriers, one voice at a time* 🌍🗣️

*Last updated: 2025-07-31 - Comprehensive testing and bug fixes for OpenAI TTS, Soniox STT, Deepgram STT, Google Cloud STT, and Azure STT implementations*