debabelizer 0.1.0b1

- Summary: Universal Voice Processing Library - Breaking Down Language Barriers
- Author: Peter Sisk <pete@cyberiad.ai>
- Uploaded: 2025-08-07 18:08:45
- Requires Python: >=3.8
- Keywords: speech, voice, transcription, synthesis, stt, tts, ai, ml
            # ๐Ÿ—ฃ๏ธ Debabelizer

**Voice Processing Library - Breaking Down Language Barriers**

[![Python 3.8+](https://img.shields.io/badge/python-3.8+-blue.svg)](https://www.python.org/downloads/)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)

Debabelizer is a voice processing library that provides a unified interface for speech-to-text (STT) and text-to-speech (TTS) operations across multiple cloud providers and local engines. Break down language barriers with support for 100+ languages and dialects.

## 🌟 Features

### 🎯 **Pluggable Provider Support**
- **6 STT Providers**: Soniox, Deepgram, Google Cloud, Azure, OpenAI Whisper (local), OpenAI Whisper (API)
- **4 TTS Providers**: ElevenLabs, OpenAI, Google Cloud, Azure
- **Unified API**: Switch providers without changing code
- **Provider-specific optimizations**: Each provider uses its optimal streaming/processing approach

### 🌍 **Comprehensive Language Support**
- **100+ languages and dialects** across all providers
- **Automatic language detection** 
- **Multi-language processing** in single workflows
- **Custom language hints** for improved accuracy

### ⚡ **Advanced Processing**
- **Real-time streaming** transcription (Soniox, Deepgram with true WebSocket streaming)
- **Chunk-based transcription** for reliable web application audio processing
- **File-based transcription** for batch processing
- **Word-level timestamps** and confidence scores
- **Speaker diarization** and voice identification (provider-dependent)
- **Custom voice training** and cloning (ElevenLabs)

### 🏠 **Local & Cloud Options**
- **OpenAI Whisper**: Complete offline processing (FREE)
- **Cloud APIs**: Enterprise-grade accuracy and features
- **Hybrid workflows**: Mix local and cloud processing
- **Cost optimization**: Automatic provider selection by cost/quality

### 🛠️ **Enterprise Ready**
- **Async/await support** for high-performance applications
- **Session management** for long-running processes
- **Error handling** with provider-specific fallbacks
- **Usage tracking** and cost estimation
- **Extensive configuration** options
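The fallback behaviour described above can be pictured as a try-providers-in-order loop. This is a minimal, library-independent sketch of the pattern — the `cloud_stt`/`local_stt` callables are stand-ins, not Debabelizer APIs:

```python
import asyncio

async def transcribe_with_fallback(audio, providers):
    """Try each (name, transcribe) pair in order; return the first
    successful result, or raise after every provider has failed."""
    errors = {}
    for name, transcribe in providers:
        try:
            return await transcribe(audio)
        except Exception as exc:  # real code would catch provider-specific errors
            errors[name] = exc
    raise RuntimeError(f"all providers failed: {errors}")

# Stand-in providers for the demo (hypothetical, not Debabelizer APIs)
async def cloud_stt(audio):
    raise ConnectionError("primary provider unreachable")

async def local_stt(audio):
    return "hello world"

text = asyncio.run(
    transcribe_with_fallback(b"...", [("cloud", cloud_stt), ("local", local_stt)])
)
print(text)  # hello world
```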

## 📦 Installation

### Basic Installation
```bash
pip install debabelizer
```

### Provider-Specific Installation
```bash
# Individual providers
pip install debabelizer[soniox]      # Soniox STT
pip install debabelizer[deepgram]    # Deepgram STT
pip install debabelizer[google]      # Google Cloud STT & TTS
pip install debabelizer[azure]       # Azure STT & TTS
pip install debabelizer[whisper]     # OpenAI Whisper STT (local)
pip install debabelizer[elevenlabs]  # ElevenLabs TTS
pip install debabelizer[openai]      # OpenAI TTS & Whisper API

# All providers
pip install debabelizer[all]

# Development
pip install debabelizer[dev]
```

### Development Installation
```bash
git clone https://github.com/techwiz42/debabelizer.git
cd debabelizer
pip install -e .[dev]
```

## 🚀 Quick Start

### Basic Speech-to-Text
```python
import asyncio
from debabelizer import VoiceProcessor, DebabelizerConfig

async def transcribe_audio():
    # Configure with your preferred provider
    config = DebabelizerConfig({
        "deepgram": {"api_key": "your_deepgram_key"},
        "preferences": {"stt_provider": "deepgram"}
    })
    
    # Create processor
    processor = VoiceProcessor(config=config)
    
    # Transcribe audio file
    result = await processor.transcribe_file("audio.wav")
    
    print(f"Text: {result.text}")
    print(f"Language: {result.language_detected}")
    print(f"Confidence: {result.confidence}")

# Run transcription
asyncio.run(transcribe_audio())
```

### Basic Text-to-Speech
```python
import asyncio
from debabelizer import VoiceProcessor, DebabelizerConfig

async def synthesize_speech():
    # Configure TTS provider
    config = DebabelizerConfig({
        "elevenlabs": {"api_key": "your_elevenlabs_key"}
    })
    
    processor = VoiceProcessor(
        tts_provider="elevenlabs", 
        config=config
    )
    
    # Synthesize speech
    result = await processor.synthesize(
        text="Hello world! This is Debabelizer speaking.",
        voice="Rachel"  # ElevenLabs voice
    )
    
    # Save audio
    with open("output.mp3", "wb") as f:
        f.write(result.audio_data)

asyncio.run(synthesize_speech())
```

### Local Processing (FREE with Whisper)
```python
import asyncio
from debabelizer import VoiceProcessor, DebabelizerConfig

async def local_transcription():
    # No API key needed for Whisper!
    config = DebabelizerConfig({
        "whisper": {
            "model_size": "base",  # tiny, base, small, medium, large
            "device": "auto"       # auto-detects GPU/CPU
        }
    })
    
    processor = VoiceProcessor(stt_provider="whisper", config=config)
    
    # Completely offline transcription
    result = await processor.transcribe_file("audio.wav")
    print(f"Offline transcription: {result.text}")

asyncio.run(local_transcription())
```

### Real-time Streaming (Provider-Specific)
```python
import asyncio
from debabelizer import VoiceProcessor, DebabelizerConfig

async def streaming_transcription():
    # Note: True streaming varies by provider
    # Soniox: True real-time WebSocket streaming
    # Deepgram: True real-time WebSocket streaming  
    # Google/Azure: Session-based streaming with optimizations
    
    config = DebabelizerConfig({
        "soniox": {"api_key": "your_key"}  # Best for true streaming
    })
    
    processor = VoiceProcessor(stt_provider="soniox", config=config)
    
    # Start streaming session
    session_id = await processor.start_streaming_transcription(
        audio_format="pcm",  # Raw PCM preferred for streaming
        sample_rate=16000,
        language="en"
    )
    
    # Stream audio chunks (typically 16-100 ms each)
    # Production code would strip the WAV header before sending raw PCM
    with open("audio.wav", "rb") as f:
        chunk_size = 1024  # ~32 ms of 16 kHz, 16-bit mono PCM
        while chunk := f.read(chunk_size):
            await processor.stream_audio(session_id, chunk)
    
    # Get results as they arrive
    async for result in processor.get_streaming_results(session_id):
        if result.is_final:
            print(f"Final: {result.text}")
        else:
            print(f"Interim: {result.text}")
    
    await processor.stop_streaming_transcription(session_id)

asyncio.run(streaming_transcription())
```
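The chunk-size comment in the example above is plain PCM arithmetic: raw 16-bit mono audio is `sample_rate × 2` bytes per second, so a target chunk duration maps directly to a byte count:

```python
def pcm_chunk_bytes(sample_rate_hz, chunk_ms, channels=1, sample_width_bytes=2):
    """Bytes needed for one chunk of raw PCM audio of the given duration."""
    return int(sample_rate_hz * channels * sample_width_bytes * chunk_ms / 1000)

# The 1024-byte chunks in the example hold 32 ms of 16 kHz 16-bit mono audio:
print(pcm_chunk_bytes(16000, 32))  # 1024

# The 16-100 ms guideline translates to this byte range at 16 kHz mono:
print(pcm_chunk_bytes(16000, 16), pcm_chunk_bytes(16000, 100))  # 512 3200
```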

### File-Based Transcription (Alternative to Streaming)
```python
import asyncio
from debabelizer import VoiceProcessor, DebabelizerConfig

async def file_transcription():
    """
    Process complete audio files or buffered audio chunks.
    Alternative to streaming for applications that can buffer audio.
    """
    config = DebabelizerConfig({
        "deepgram": {"api_key": "your_key"}
    })
    
    processor = VoiceProcessor(stt_provider="deepgram", config=config)
    
    # Process complete audio file
    result = await processor.transcribe_file("audio.wav")
    
    # Or process audio data from memory (e.g., from web upload)
    with open("audio_chunk.webm", "rb") as f:
        chunk_data = f.read()  # WebM/Opus from MediaRecorder
    
    # Process audio data directly
    result = await processor.transcribe_audio(
        audio_data=chunk_data,
        audio_format="webm",     # Browser WebM/Opus format
        sample_rate=48000,       # Browser standard
        language="en"
    )
    
    print(f"Result: {result.text}")
    print(f"Confidence: {result.confidence}")
    print(f"Language: {result.language_detected}")

asyncio.run(file_transcription())
```

## 🔧 Configuration

### Environment Variables
Create a `.env` file:
```bash
# Provider API Keys
DEEPGRAM_API_KEY=your_deepgram_key
ELEVENLABS_API_KEY=your_elevenlabs_key
OPENAI_API_KEY=your_openai_key
SONIOX_API_KEY=your_soniox_key

# Azure (requires key + region)
AZURE_SPEECH_KEY=your_azure_key
AZURE_SPEECH_REGION=eastus

# Google Cloud (requires service account JSON)
GOOGLE_APPLICATION_CREDENTIALS=/path/to/google-credentials.json

# Preferences
DEBABELIZER_STT_PROVIDER=deepgram
DEBABELIZER_TTS_PROVIDER=elevenlabs
DEBABELIZER_OPTIMIZE_FOR=quality  # cost, latency, quality, balanced
```
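`DebabelizerConfig()` appears to pick these values up from the process environment. If you want to load the `.env` file yourself without extra dependencies, a minimal parser is enough — this sketch handles only the simple `KEY=VALUE  # comment` style shown above, not full dotenv syntax:

```python
import os

def load_env_file(path=".env"):
    """Populate os.environ from simple KEY=VALUE lines."""
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            # drop inline comments like "quality  # cost, latency, ..."
            value = value.split("#", 1)[0].strip()
            os.environ.setdefault(key.strip(), value)

if os.path.exists(".env"):
    load_env_file()
```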

### Authentication Requirements by Provider

#### Google Cloud STT/TTS
**Requires**: Service account JSON file or Application Default Credentials
```bash
# Option 1: Service account file
GOOGLE_APPLICATION_CREDENTIALS=/path/to/service-account.json

# Option 2: Use gcloud CLI (for development)
gcloud auth login
gcloud auth application-default login
gcloud config set project YOUR_PROJECT_ID
```

#### Azure STT/TTS
**Requires**: API key + region
```bash
AZURE_SPEECH_KEY=your_api_key_here
AZURE_SPEECH_REGION=eastus  # or your preferred region
```

#### OpenAI (TTS & Whisper API)
**Requires**: OpenAI API key
```bash
OPENAI_API_KEY=your_openai_api_key
```

#### ElevenLabs TTS
**Requires**: ElevenLabs API key
```bash
ELEVENLABS_API_KEY=your_elevenlabs_key
```

#### Deepgram STT
**Requires**: Deepgram API key
```bash
DEEPGRAM_API_KEY=your_deepgram_key
```

#### Soniox STT
**Requires**: Soniox API key
```bash
SONIOX_API_KEY=your_soniox_key
```

#### OpenAI Whisper (Local)
**Requires**: No API key (completely offline)
- Automatically downloads models on first use
- Supports GPU acceleration with CUDA/MPS
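The `"device": "auto"` setting from the Whisper example typically resolves along these lines — a hypothetical helper, not Debabelizer's actual code:

```python
def resolve_device(device="auto"):
    """Map an 'auto' device setting to cuda, mps, or cpu."""
    if device != "auto":
        return device
    try:
        import torch  # only present when Whisper's dependencies are installed
        if torch.cuda.is_available():
            return "cuda"
        mps = getattr(torch.backends, "mps", None)
        if mps is not None and mps.is_available():
            return "mps"
    except ImportError:
        pass
    return "cpu"

print(resolve_device("cpu"))  # cpu
print(resolve_device())       # cuda, mps, or cpu depending on the machine
```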

## 🎯 Provider Comparison & Testing Status

### Speech-to-Text (STT) Providers

| Provider | Status | Streaming | Language Auto-Detection | Testing | Authentication | Best For |
|----------|--------|-----------|-------------------------|---------|----------------|----------|
| **Soniox** | ✅ **Verified** | True WebSocket streaming | ✅ **Verified** | ✅ **Tested & Fixed** | API Key | Real-time applications |
| **Deepgram** | ✅ **Verified** | True WebSocket streaming | ✅ **Claimed** | ✅ **Tested & Fixed** | API Key | High accuracy & speed |
| **Google Cloud** | ✅ **Code Fixed** | Session-based streaming | ⚠️ **Limited** | ⚠️ **Needs Auth Setup** | Service Account JSON | Enterprise features |
| **Azure** | ✅ **Code Fixed** | Session-based streaming | ✅ **Claimed** | ⚠️ **Needs Auth Setup** | API Key + Region | Microsoft ecosystem |
| **OpenAI Whisper (Local)** | ✅ **Verified** | File-based only | ❓ **Unclear** | ✅ **Tested** | None (offline) | Cost-free processing |
| **OpenAI Whisper (API)** | ✅ **Available** | File-based only | ❓ **Unclear** | ⚠️ **Not tested** | OpenAI API Key | Cloud Whisper |

### Text-to-Speech (TTS) Providers

| Provider | Status | Streaming | Testing | Authentication | Best For |
|----------|--------|-----------|---------|----------------|----------|
| **ElevenLabs** | ✅ **Verified** | Simulated streaming | ✅ **Tested & Working** | API Key | Voice cloning & quality |
| **OpenAI** | ✅ **Verified** | Simulated streaming | ✅ **Tested & Fixed** | OpenAI API Key | Natural voices |
| **Google Cloud** | ✅ **Available** | TBD | ⚠️ **Not tested** | Service Account JSON | Enterprise features |
| **Azure** | ✅ **Available** | TBD | ⚠️ **Not tested** | API Key + Region | Microsoft ecosystem |

### Key Testing Results

#### ✅ **Fully Tested & Verified**
- **OpenAI TTS**: All features working, issues fixed (sample rate accuracy, duration estimation, streaming transparency)
- **ElevenLabs TTS**: All features working, fully tested and verified
- **Soniox STT**: Streaming implementation fixed (method names, session management)
- **Deepgram STT**: True WebSocket streaming implemented and working

#### ✅ **Code Issues Fixed (Ready for Testing)**
- **Google Cloud STT**: Fixed critical async/sync mixing bugs in streaming implementation
- **Azure STT**: Fixed critical async/sync mixing bugs in event handlers

#### ⚠️ **Available but Needs Testing**
- **Google Cloud TTS**: Implementation exists but not tested  
- **Azure TTS**: Implementation exists but not tested
- **OpenAI Whisper API**: Implementation exists but not tested

## 🔧 Advanced Usage

### Provider-Specific Optimizations

```python
# Soniox: Best for true real-time streaming
soniox_config = DebabelizerConfig({
    "soniox": {
        "api_key": "your_key",
        "model": "en_v2",
        "include_profanity": False,
        "enable_global_speaker_diarization": True
    }
})

# Deepgram: High accuracy with true streaming
deepgram_config = DebabelizerConfig({
    "deepgram": {
        "api_key": "your_key", 
        "model": "nova-2",
        "language": "en",
        "interim_results": True,
        "vad_events": True
    }
})

# Google Cloud: Enterprise features (requires service account)
google_config = DebabelizerConfig({
    "google": {
        "credentials_path": "/path/to/service-account.json",
        "project_id": "your-project-id",
        "model": "latest_long",
        "enable_speaker_diarization": True,
        "enable_word_time_offsets": True
    }
})

# Azure: Microsoft ecosystem integration
azure_config = DebabelizerConfig({
    "azure": {
        "api_key": "your_key",
        "region": "eastus",
        "language": "en-US",
        "enable_dictation": True,
        "profanity_filter": True
    }
})

# OpenAI Whisper: Free local processing
whisper_config = DebabelizerConfig({
    "whisper": {
        "model_size": "medium",  # tiny, base, small, medium, large
        "device": "cuda",        # cpu, cuda, mps, auto
        "fp16": True,           # Faster inference with GPU
        "language": None        # Auto-detect
    }
})
```

### Web Application Integration

```python
from fastapi import FastAPI, UploadFile, File, WebSocket
from debabelizer import VoiceProcessor, DebabelizerConfig
import asyncio

app = FastAPI()

# Initialize processor globally
config = DebabelizerConfig()
processor = VoiceProcessor(config=config)

@app.post("/transcribe-chunk")
async def transcribe_chunk(file: UploadFile = File(...)):
    """
    Recommended approach for web applications.
    Process audio chunks from browser MediaRecorder.
    """
    content = await file.read()
    
    # Use audio transcription for buffered chunks
    result = await processor.transcribe_audio(
        audio_data=content,
        audio_format="webm",    # Common browser format
        sample_rate=48000,      # Browser standard
        language="en"
    )
    
    return {
        "text": result.text,
        "language": result.language_detected,
        "confidence": result.confidence,
        "duration": result.duration,
        "method": "chunk_transcription"
    }

@app.websocket("/transcribe-stream")
async def transcribe_stream(websocket: WebSocket):
    """
    True streaming approach for specialized applications.
    Requires careful connection management.
    """
    await websocket.accept()
    
    # Start streaming session
    session_id = await processor.start_streaming_transcription(
        audio_format="pcm",
        sample_rate=16000,
        language="en"
    )
    
    try:
        while True:
            # Receive audio chunk from WebSocket
            audio_chunk = await websocket.receive_bytes()
            
            # Stream to STT provider
            await processor.stream_audio(session_id, audio_chunk)
            
            # Get results and send back
            async for result in processor.get_streaming_results(session_id):
                await websocket.send_json({
                    "text": result.text,
                    "is_final": result.is_final,
                    "confidence": result.confidence
                })
                
                if result.is_final:
                    break
                    
    except Exception as e:
        print(f"Streaming error: {e}")
    finally:
        await processor.stop_streaming_transcription(session_id)
```

## 🧪 Testing

### Run Tests
```bash
# All tests
python -m pytest

# Specific test categories
python -m pytest tests/test_voice_processor.py  # Core functionality
python -m pytest tests/test_config.py          # Configuration
python -m pytest tests/test_providers/         # Provider tests

# Integration tests (requires API keys)
python -m pytest tests/test_integration.py

# With coverage
python -m pytest --cov=debabelizer --cov-report=html
```

### Test Results
Current test status: **150/165 tests passing, 15 skipped** ✅

```bash
# Test specific providers (requires API keys in .env)
python tests/test_openai_tts.py      # OpenAI TTS (tested ✅)
python tests/test_soniox_stt.py      # Soniox STT (tested ✅)
python tests/test_deepgram_stt.py    # Deepgram STT (tested ✅)
python tests/test_google_stt.py      # Google STT (needs auth setup)
python tests/test_azure_stt.py       # Azure STT (needs auth setup)
```

## 🚨 Known Issues & Limitations

### Current Limitations
1. **Google Cloud & Azure**: Code is fixed but requires proper authentication setup for testing
2. **TTS Streaming**: Most providers simulate streaming (the full audio is downloaded, then chunked); true streaming is only available from dedicated streaming TTS APIs
3. **OpenAI TTS**: Correctly reports 24kHz output, but doesn't support custom sample rates
4. **WebM Audio**: Some providers may need audio format conversion for browser-generated WebM/Opus
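For point 4, `ffmpeg` (assumed to be installed separately) can convert browser-generated WebM/Opus into the 16 kHz mono 16-bit PCM WAV that STT providers accept most reliably:

```bash
# Decode WebM/Opus, downmix to mono, resample to 16 kHz, write 16-bit PCM WAV
ffmpeg -i recording.webm -ac 1 -ar 16000 -sample_fmt s16 output.wav
```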

### Fixed Issues
- ✅ **Google STT**: Fixed critical async/sync mixing in streaming implementation
- ✅ **Azure STT**: Fixed async/sync mixing in event handlers
- ✅ **OpenAI TTS**: Fixed sample rate accuracy, duration estimation, and streaming transparency
- ✅ **Soniox STT**: Fixed method name mismatches and session management
- ✅ **Deepgram STT**: Implemented true WebSocket streaming

## 🤝 Contributing

We welcome contributions! 

### Development Setup
```bash
git clone https://github.com/techwiz42/debabelizer.git
cd debabelizer
pip install -e .[dev]
pre-commit install
```

### Testing New Providers
1. Add comprehensive test coverage
2. Follow the systematic debugging approach documented in CLAUDE.md
3. Test both file-based and streaming implementations
4. Verify error handling and edge cases

### Adding New Providers
1. Implement the provider interface in `src/debabelizer/providers/`
2. Add configuration support in `src/debabelizer/core/config.py`
3. Update processor in `src/debabelizer/core/processor.py`
4. Add comprehensive tests in `tests/`
5. Update documentation
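The provider interface in step 1 can be pictured as an abstract base class. The names below are illustrative guesses only — check the actual base classes in `src/debabelizer/providers/` before implementing:

```python
import asyncio
from abc import ABC, abstractmethod
from typing import Optional

class STTProvider(ABC):
    """Illustrative shape of an STT provider plugin; names are hypothetical."""

    name: str = "base"

    @abstractmethod
    async def transcribe_file(self, path: str, language: Optional[str] = None) -> str:
        """Transcribe a complete audio file."""

    @abstractmethod
    async def transcribe_audio(self, audio_data: bytes, audio_format: str,
                               sample_rate: int, language: Optional[str] = None) -> str:
        """Transcribe buffered audio data."""

class EchoProvider(STTProvider):
    """Toy subclass that only demonstrates the plugin pattern."""
    name = "echo"

    async def transcribe_file(self, path, language=None):
        return f"transcribed {path}"

    async def transcribe_audio(self, audio_data, audio_format, sample_rate, language=None):
        return f"{len(audio_data)} bytes of {audio_format} at {sample_rate} Hz"

print(asyncio.run(EchoProvider().transcribe_audio(b"abc", "wav", 16000)))
# 3 bytes of wav at 16000 Hz
```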

## 📄 License

This project is licensed under the MIT License.

## 🆘 Support

- **Issues**: [GitHub Issues](https://github.com/techwiz42/debabelizer/issues)
- **Discussions**: [GitHub Discussions](https://github.com/techwiz42/debabelizer/discussions)

## 🙏 Acknowledgments

- OpenAI for Whisper models and TTS API
- All provider teams for their excellent APIs
- Contributors and testers
- The open-source community

---

**Debabelizer** - *Breaking down language barriers, one voice at a time* 🌍🗣️

*Last updated: 2025-07-31 - Comprehensive testing and bug fixes for OpenAI TTS, Soniox STT, Deepgram STT, Google Cloud STT, and Azure STT implementations*

            

Raw data

            {
    "_id": null,
    "home_page": null,
    "name": "debabelizer",
    "maintainer": null,
    "docs_url": null,
    "requires_python": ">=3.8",
    "maintainer_email": "Peter Sisk <pete@cyberiad.ai>",
    "keywords": "speech, voice, transcription, synthesis, stt, tts, ai, ml",
    "author": null,
    "author_email": "Peter Sisk <pete@cyberiad.ai>",
    "download_url": "https://files.pythonhosted.org/packages/a5/bf/2e4667c903f038847d24a959885d5bfd8cc14f01ae5f0e936f986ce79f34/debabelizer-0.1.0b1.tar.gz",
    "platform": null,
    "description": "# \ud83d\udde3\ufe0f Debabelizer\n\n**Voice Processing Library - Breaking Down Language Barriers**\n\n[![Python 3.8+](https://img.shields.io/badge/python-3.8+-blue.svg)](https://www.python.org/downloads/)\n[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)\n\nDebabelizer is a voice processing library that provides a unified interface for speech-to-text (STT) and text-to-speech (TTS) operations across multiple cloud providers and local engines. Break down language barriers with support for 100+ languages and dialects.\n\n## \ud83c\udf1f Features\n\n### \ud83c\udfaf **Pluggable Provider Support**\n- **6 STT Providers**: Soniox, Deepgram, Google Cloud, Azure, OpenAI Whisper (local), OpenAI Whisper (API)\n- **4 TTS Providers**: ElevenLabs, OpenAI, Google Cloud, Azure\n- **Unified API**: Switch providers without changing code\n- **Provider-specific optimizations**: Each provider uses its optimal streaming/processing approach\n\n### \ud83c\udf0d **Comprehensive Language Support**\n- **100+ languages and dialects** across all providers\n- **Automatic language detection** \n- **Multi-language processing** in single workflows\n- **Custom language hints** for improved accuracy\n\n### \u26a1 **Advanced Processing**\n- **Real-time streaming** transcription (Soniox, Deepgram with true WebSocket streaming)\n- **Chunk-based transcription** for reliable web application audio processing\n- **File-based transcription** for batch processing\n- **Word-level timestamps** and confidence scores\n- **Speaker diarization** and voice identification (provider-dependent)\n- **Custom voice training** and cloning (ElevenLabs)\n\n### \ud83c\udfe0 **Local & Cloud Options**\n- **OpenAI Whisper**: Complete offline processing (FREE)\n- **Cloud APIs**: Enterprise-grade accuracy and features\n- **Hybrid workflows**: Mix local and cloud processing\n- **Cost optimization**: Automatic provider selection by cost/quality\n\n### 
\ud83d\udee0\ufe0f **Enterprise Ready**\n- **Async/await support** for high-performance applications\n- **Session management** for long-running processes\n- **Error handling** with provider-specific fallbacks\n- **Usage tracking** and cost estimation\n- **Extensive configuration** options\n\n## \ud83d\udce6 Installation\n\n### Basic Installation\n```bash\npip install debabelizer\n```\n\n### Provider-Specific Installation\n```bash\n# Individual providers\npip install debabelizer[soniox]      # Soniox STT\npip install debabelizer[deepgram]    # Deepgram STT\npip install debabelizer[google]      # Google Cloud STT & TTS\npip install debabelizer[azure]       # Azure STT & TTS\npip install debabelizer[whisper]     # OpenAI Whisper STT (local)\npip install debabelizer[elevenlabs]  # ElevenLabs TTS\npip install debabelizer[openai]      # OpenAI TTS & Whisper API\n\n# All providers\npip install debabelizer[all]\n\n# Development\npip install debabelizer[dev]\n```\n\n### Development Installation\n```bash\ngit clone https://github.com/your-org/debabelizer.git\ncd debabelizer\npip install -e .[dev]\n```\n\n## \ud83d\ude80 Quick Start\n\n### Basic Speech-to-Text\n```python\nimport asyncio\nfrom debabelizer import VoiceProcessor, DebabelizerConfig\n\nasync def transcribe_audio():\n    # Configure with your preferred provider\n    config = DebabelizerConfig({\n        \"deepgram\": {\"api_key\": \"your_deepgram_key\"},\n        \"preferences\": {\"stt_provider\": \"deepgram\"}\n    })\n    \n    # Create processor\n    processor = VoiceProcessor(config=config)\n    \n    # Transcribe audio file\n    result = await processor.transcribe_file(\"audio.wav\")\n    \n    print(f\"Text: {result.text}\")\n    print(f\"Language: {result.language_detected}\")\n    print(f\"Confidence: {result.confidence}\")\n\n# Run transcription\nasyncio.run(transcribe_audio())\n```\n\n### Basic Text-to-Speech\n```python\nimport asyncio\nfrom debabelizer import VoiceProcessor, DebabelizerConfig\n\nasync 
def synthesize_speech():\n    # Configure TTS provider\n    config = DebabelizerConfig({\n        \"elevenlabs\": {\"api_key\": \"your_elevenlabs_key\"}\n    })\n    \n    processor = VoiceProcessor(\n        tts_provider=\"elevenlabs\", \n        config=config\n    )\n    \n    # Synthesize speech\n    result = await processor.synthesize(\n        text=\"Hello world! This is Debabelizer speaking.\",\n        voice=\"Rachel\"  # ElevenLabs voice\n    )\n    \n    # Save audio\n    with open(\"output.mp3\", \"wb\") as f:\n        f.write(result.audio_data)\n\nasyncio.run(synthesize_speech())\n```\n\n### Local Processing (FREE with Whisper)\n```python\nimport asyncio\nfrom debabelizer import VoiceProcessor, DebabelizerConfig\n\nasync def local_transcription():\n    # No API key needed for Whisper!\n    config = DebabelizerConfig({\n        \"whisper\": {\n            \"model_size\": \"base\",  # tiny, base, small, medium, large\n            \"device\": \"auto\"       # auto-detects GPU/CPU\n        }\n    })\n    \n    processor = VoiceProcessor(stt_provider=\"whisper\", config=config)\n    \n    # Completely offline transcription\n    result = await processor.transcribe_file(\"audio.wav\")\n    print(f\"Offline transcription: {result.text}\")\n\nasyncio.run(local_transcription())\n```\n\n### Real-time Streaming (Provider-Specific)\n```python\nimport asyncio\nfrom debabelizer import VoiceProcessor, DebabelizerConfig\n\nasync def streaming_transcription():\n    # Note: True streaming varies by provider\n    # Soniox: True real-time WebSocket streaming\n    # Deepgram: True real-time WebSocket streaming  \n    # Google/Azure: Session-based streaming with optimizations\n    \n    config = DebabelizerConfig({\n        \"soniox\": {\"api_key\": \"your_key\"}  # Best for true streaming\n    })\n    \n    processor = VoiceProcessor(stt_provider=\"soniox\", config=config)\n    \n    # Start streaming session\n    session_id = await processor.start_streaming_transcription(\n  
      audio_format=\"pcm\",  # Raw PCM preferred for streaming\n        sample_rate=16000,\n        language=\"en\"\n    )\n    \n    # Stream audio chunks (typically 16ms - 100ms chunks)\n    with open(\"audio.wav\", \"rb\") as f:\n        chunk_size = 1024  # Small chunks for real-time\n        while chunk := f.read(chunk_size):\n            await processor.stream_audio(session_id, chunk)\n    \n    # Get results as they arrive\n    async for result in processor.get_streaming_results(session_id):\n        if result.is_final:\n            print(f\"Final: {result.text}\")\n        else:\n            print(f\"Interim: {result.text}\")\n    \n    await processor.stop_streaming_transcription(session_id)\n\nasyncio.run(streaming_transcription())\n```\n\n### File-Based Transcription (Alternative to Streaming)\n```python\nimport asyncio\nfrom debabelizer import VoiceProcessor, DebabelizerConfig\n\nasync def file_transcription():\n    \"\"\"\n    Process complete audio files or buffered audio chunks.\n    Alternative to streaming for applications that can buffer audio.\n    \"\"\"\n    config = DebabelizerConfig({\n        \"deepgram\": {\"api_key\": \"your_key\"}\n    })\n    \n    processor = VoiceProcessor(stt_provider=\"deepgram\", config=config)\n    \n    # Process complete audio file\n    result = await processor.transcribe_file(\"audio.wav\")\n    \n    # Or process audio data from memory (e.g., from web upload)\n    with open(\"audio_chunk.webm\", \"rb\") as f:\n        chunk_data = f.read()  # WebM/Opus from MediaRecorder\n    \n    # Process audio data directly\n    result = await processor.transcribe_audio(\n        audio_data=chunk_data,\n        audio_format=\"webm\",     # Browser WebM/Opus format\n        sample_rate=48000,       # Browser standard\n        language=\"en\"\n    )\n    \n    print(f\"Result: {result.text}\")\n    print(f\"Confidence: {result.confidence}\")\n    print(f\"Language: 
{result.language_detected}\")\n\nasyncio.run(file_transcription())\n```\n\n## \ud83d\udd27 Configuration\n\n### Environment Variables\nCreate a `.env` file:\n```bash\n# Provider API Keys\nDEEPGRAM_API_KEY=your_deepgram_key\nELEVENLABS_API_KEY=your_elevenlabs_key\nOPENAI_API_KEY=your_openai_key\nSONIOX_API_KEY=your_soniox_key\n\n# Azure (requires key + region)\nAZURE_SPEECH_KEY=your_azure_key\nAZURE_SPEECH_REGION=eastus\n\n# Google Cloud (requires service account JSON)\nGOOGLE_APPLICATION_CREDENTIALS=/path/to/google-credentials.json\n\n# Preferences\nDEBABELIZER_STT_PROVIDER=deepgram\nDEBABELIZER_TTS_PROVIDER=elevenlabs\nDEBABELIZER_OPTIMIZE_FOR=quality  # cost, latency, quality, balanced\n```\n\n### Authentication Requirements by Provider\n\n#### Google Cloud STT/TTS\n**Requires**: Service account JSON file or Application Default Credentials\n```bash\n# Option 1: Service account file\nGOOGLE_APPLICATION_CREDENTIALS=/path/to/service-account.json\n\n# Option 2: Use gcloud CLI (for development)\ngcloud auth login\ngcloud auth application-default login\ngcloud config set project YOUR_PROJECT_ID\n```\n\n#### Azure STT/TTS\n**Requires**: API key + region\n```bash\nAZURE_SPEECH_KEY=your_api_key_here\nAZURE_SPEECH_REGION=eastus  # or your preferred region\n```\n\n#### OpenAI (TTS & Whisper API)\n**Requires**: OpenAI API key\n```bash\nOPENAI_API_KEY=your_openai_api_key\n```\n\n#### ElevenLabs TTS\n**Requires**: ElevenLabs API key\n```bash\nELEVENLABS_API_KEY=your_elevenlabs_key\n```\n\n#### Deepgram STT\n**Requires**: Deepgram API key\n```bash\nDEEPGRAM_API_KEY=your_deepgram_key\n```\n\n#### Soniox STT\n**Requires**: Soniox API key\n```bash\nSONIOX_API_KEY=your_soniox_key\n```\n\n#### OpenAI Whisper (Local)\n**Requires**: No API key (completely offline)\n- Automatically downloads models on first use\n- Supports GPU acceleration with CUDA/MPS\n\n## \ud83c\udfaf Provider Comparison & Testing Status\n\n### Speech-to-Text (STT) Providers\n\n| Provider | Status | Streaming | 
Language Auto-Detection | Testing | Authentication | Best For |\n|----------|--------|-----------|-------------------------|---------|----------------|----------|\n| **Soniox** | \u2705 **Verified** | True WebSocket streaming | \u2705  **Verified** | \u2705 **Tested & Fixed** | API Key | Real-time applications |\n| **Deepgram** | \u2705 **Verified** | True WebSocket streaming | \u2705 **Claimed** | \u2705 **Tested & Fixed** | API Key | High accuracy & speed |\n| **Google Cloud** | \u2705 **Code Fixed** | Session-based streaming | \u26a0\ufe0f **Limited** | \u26a0\ufe0f **Needs Auth Setup** | Service Account JSON | Enterprise features |\n| **Azure** | \u2705 **Code Fixed** | Session-based streaming | \u2705 **Claimed** | \u26a0\ufe0f **Needs Auth Setup** | API Key + Region | Microsoft ecosystem |\n| **OpenAI Whisper (Local)** | \u2705 **Verified** | File-based only | \u2753 **Unclear** | \u2705 **Tested** | None (offline) | Cost-free processing |\n| **OpenAI Whisper (API)** | \u2705 **Available** | File-based only | \u2753 **Unclear** | \u26a0\ufe0f **Not tested** | OpenAI API Key | Cloud Whisper |\n\n### Text-to-Speech (TTS) Providers\n\n| Provider | Status | Streaming | Testing | Authentication | Best For |\n|----------|--------|-----------|---------|----------------|----------|\n| **ElevenLabs** | \u2705 **Verified** | Simulated streaming | \u2705 **Tested & Working** | API Key | Voice cloning & quality |\n| **OpenAI** | \u2705 **Verified** | Simulated streaming | \u2705 **Tested & Fixed** | OpenAI API Key | Natural voices |\n| **Google Cloud** | \u2705 **Available** | TBD | \u26a0\ufe0f **Not tested** | Service Account JSON | Enterprise features |\n| **Azure** | \u2705 **Available** | TBD | \u26a0\ufe0f **Not tested** | API Key + Region | Microsoft ecosystem |\n\n### Key Testing Results\n\n#### \u2705 **Fully Tested & Verified**\n- **OpenAI TTS**: All features working, issues fixed (sample rate accuracy, duration estimation, streaming transparency)\n- 
**ElevenLabs TTS**: All features working, fully tested and verified\n- **Soniox STT**: Streaming implementation fixed (method names, session management)\n- **Deepgram STT**: True WebSocket streaming implemented and working\n\n#### \u2705 **Code Issues Fixed (Ready for Testing)**\n- **Google Cloud STT**: Fixed critical async/sync mixing bugs in streaming implementation\n- **Azure STT**: Fixed critical async/sync mixing bugs in event handlers\n\n#### \u26a0\ufe0f **Available but Needs Testing**\n- **Google Cloud TTS**: Implementation exists but not tested  \n- **Azure TTS**: Implementation exists but not tested\n- **OpenAI Whisper API**: Implementation exists but not tested\n\n## \ud83d\udd27 Advanced Usage\n\n\n### Provider-Specific Optimizations\n\n```python\n# Soniox: Best for true real-time streaming\nsoniox_config = DebabelizerConfig({\n    \"soniox\": {\n        \"api_key\": \"your_key\",\n        \"model\": \"en_v2\",\n        \"include_profanity\": False,\n        \"enable_global_speaker_diarization\": True\n    }\n})\n\n# Deepgram: High accuracy with true streaming\ndeepgram_config = DebabelizerConfig({\n    \"deepgram\": {\n        \"api_key\": \"your_key\", \n        \"model\": \"nova-2\",\n        \"language\": \"en\",\n        \"interim_results\": True,\n        \"vad_events\": True\n    }\n})\n\n# Google Cloud: Enterprise features (requires service account)\ngoogle_config = DebabelizerConfig({\n    \"google\": {\n        \"credentials_path\": \"/path/to/service-account.json\",\n        \"project_id\": \"your-project-id\",\n        \"model\": \"latest_long\",\n        \"enable_speaker_diarization\": True,\n        \"enable_word_time_offsets\": True\n    }\n})\n\n# Azure: Microsoft ecosystem integration\nazure_config = DebabelizerConfig({\n    \"azure\": {\n        \"api_key\": \"your_key\",\n        \"region\": \"eastus\",\n        \"language\": \"en-US\",\n        \"enable_dictation\": True,\n        \"profanity_filter\": True\n    }\n})\n\n# OpenAI 
### Web Application Integration

```python
from fastapi import FastAPI, UploadFile, File, WebSocket
from debabelizer import VoiceProcessor, DebabelizerConfig

app = FastAPI()

# Initialize the processor once at module level
config = DebabelizerConfig()
processor = VoiceProcessor(config=config)

@app.post("/transcribe-chunk")
async def transcribe_chunk(file: UploadFile = File(...)):
    """
    Recommended approach for web applications:
    process audio chunks from the browser's MediaRecorder.
    """
    content = await file.read()

    # Use buffered transcription for complete chunks
    result = await processor.transcribe_audio(
        audio_data=content,
        audio_format="webm",    # common browser format
        sample_rate=48000,      # browser standard
        language="en"
    )

    return {
        "text": result.text,
        "language": result.language_detected,
        "confidence": result.confidence,
        "duration": result.duration,
        "method": "chunk_transcription"
    }

@app.websocket("/transcribe-stream")
async def transcribe_stream(websocket: WebSocket):
    """
    True streaming approach for specialized applications.
    Requires careful connection management.
    """
    await websocket.accept()

    # Start a streaming session with the configured STT provider
    session_id = await processor.start_streaming_transcription(
        audio_format="pcm",
        sample_rate=16000,
        language="en"
    )

    try:
        while True:
            # Receive an audio chunk from the WebSocket
            audio_chunk = await websocket.receive_bytes()

            # Stream to the STT provider
            await processor.stream_audio(session_id, audio_chunk)

            # Collect results and send them back
            async for result in processor.get_streaming_results(session_id):
                await websocket.send_json({
                    "text": result.text,
                    "is_final": result.is_final,
                    "confidence": result.confidence
                })

                if result.is_final:
                    break

    except Exception as e:
        print(f"Streaming error: {e}")
    finally:
        await processor.stop_streaming_transcription(session_id)
```

## 🧪 Testing

### Run Tests
```bash
# All tests
python -m pytest

# Specific test categories
python -m pytest tests/test_voice_processor.py  # Core functionality
python -m pytest tests/test_config.py           # Configuration
python -m pytest tests/test_providers/          # Provider tests

# Integration tests (require API keys)
python -m pytest tests/test_integration.py

# With coverage
python -m pytest --cov=debabelizer --cov-report=html
```

### Test Results
Current test status: **150/165 tests passing, 15 skipped** ✅

```bash
# Test specific providers (requires API keys in .env)
python tests/test_openai_tts.py      # OpenAI TTS (tested ✅)
python tests/test_soniox_stt.py      # Soniox STT (tested ✅)
python tests/test_deepgram_stt.py    # Deepgram STT (tested ✅)
python tests/test_google_stt.py      # Google STT (needs auth setup)
python tests/test_azure_stt.py       # Azure STT (needs auth setup)
```

## 🚨 Known Issues & Limitations

### Current Limitations
1. **Google Cloud & Azure**: Code is fixed, but testing requires proper authentication setup
2. **TTS Streaming**: Most providers simulate streaming (download the full audio, then chunk it); true streaming is limited to specialized streaming TTS APIs
3. **OpenAI TTS**: Correctly reports 24 kHz output, but does not support custom sample rates
4. **WebM Audio**: Some providers may need audio format conversion for browser-generated WebM/Opus
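The "simulated streaming" limitation above is easy to picture: the provider synthesizes (or downloads) the complete clip first, then the audio is yielded to the caller in slices. A minimal sketch of that behavior — illustrative only, not Debabelizer's internal code:

```python
from typing import Iterator

# "Simulated streaming": the full audio already exists in memory, and
# chunks are just slices of it. Contrast with true streaming, where
# audio arrives incrementally from the provider.
def simulated_stream(audio: bytes, chunk_size: int = 4096) -> Iterator[bytes]:
    for offset in range(0, len(audio), chunk_size):
        yield audio[offset:offset + chunk_size]

audio = b"\x00" * 10000          # stand-in for a fully synthesized clip
chunks = list(simulated_stream(audio))
print(len(chunks))               # 3 chunks: 4096 + 4096 + 1808 bytes
```

The consequence is latency: the first chunk cannot be delivered until the whole clip is available, which is why the tables above label most TTS providers "Simulated streaming".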
### Fixed Issues
- ✅ **Google STT**: Fixed critical async/sync mixing in the streaming implementation
- ✅ **Azure STT**: Fixed async/sync mixing in event handlers
- ✅ **OpenAI TTS**: Fixed sample rate accuracy, duration estimation, and streaming transparency
- ✅ **Soniox STT**: Fixed method name mismatches and session management
- ✅ **Deepgram STT**: Implemented true WebSocket streaming

## 🤝 Contributing

We welcome contributions!

### Development Setup
```bash
git clone https://github.com/techwiz42/debabelizer.git
cd debabelizer
pip install -e .[dev]
pre-commit install
```

### Testing New Providers
1. Add comprehensive test coverage
2. Follow the systematic debugging approach documented in CLAUDE.md
3. Test both file-based and streaming implementations
4. Verify error handling and edge cases

### Adding New Providers
1. Implement the provider interface in `src/debabelizer/providers/`
2. Add configuration support in `src/debabelizer/core/config.py`
3. Update the processor in `src/debabelizer/core/processor.py`
4. Add comprehensive tests in `tests/`
5. Update documentation
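As a starting point for step 1, a provider typically satisfies a small async contract. The sketch below is hypothetical — the actual base class in `src/debabelizer/providers/` may differ — but it shows the general shape such an interface takes:

```python
import asyncio
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Optional

# Hypothetical provider contract for illustration only; consult the real
# base class in src/debabelizer/providers/ before implementing.
@dataclass
class TranscriptionResult:
    text: str
    confidence: float
    language_detected: str

class BaseSTTProvider(ABC):
    @abstractmethod
    async def transcribe_audio(self, audio_data: bytes, audio_format: str,
                               sample_rate: int,
                               language: Optional[str] = None) -> TranscriptionResult:
        """Transcribe a complete audio buffer."""

class MySTTProvider(BaseSTTProvider):
    async def transcribe_audio(self, audio_data, audio_format,
                               sample_rate, language=None):
        # A real provider would call its backend API here; this stub
        # returns a canned result so the contract is visible.
        return TranscriptionResult("hello", 0.99, language or "en")

result = asyncio.run(MySTTProvider().transcribe_audio(b"", "pcm", 16000))
print(result.text)  # hello
```

Keeping the contract small (one buffered method plus optional streaming hooks) is what lets the processor treat every backend interchangeably.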
## 📄 License

This project is licensed under the MIT License.

## 🆘 Support

- **Issues**: [GitHub Issues](https://github.com/techwiz42/debabelizer/issues)
- **Discussions**: [GitHub Discussions](https://github.com/techwiz42/debabelizer/discussions)

## 🙏 Acknowledgments

- OpenAI for Whisper models and the TTS API
- All provider teams for their excellent APIs
- Contributors and testers
- The open-source community

---

**Debabelizer** - *Breaking down language barriers, one voice at a time* 🌍🗣️

*Last updated: 2025-07-31 - Comprehensive testing and bug fixes for OpenAI TTS, Soniox STT, Deepgram STT, Google Cloud STT, and Azure STT implementations*
    "bugtrack_url": null,
    "license": null,
    "summary": "Universal Voice Processing Library - Breaking Down Language Barriers",
    "version": "0.1.0b1",
    "project_urls": {
        "Bug Tracker": "https://github.com/techwiz42/debabelizer/issues",
        "Documentation": "https://github.com/techwiz42/debabelizer#readme",
        "Homepage": "https://debabelize.me",
        "Source Code": "https://github.com/techwiz42/debabelizer"
    },
    "split_keywords": [
        "speech",
        " voice",
        " transcription",
        " synthesis",
        " stt",
        " tts",
        " ai",
        " ml"
    ],
    "urls": [
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "2edda111a768c741313e2c7e5b658ef0f4415e49913ba1244d0fce5382d92f66",
                "md5": "ad6201d6a70ce5f8980b9c4aae588a50",
                "sha256": "6b554f1cf38e3ffea593693798333df75c8587ad9033861da214215cafa3777b"
            },
            "downloads": -1,
            "filename": "debabelizer-0.1.0b1-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "ad6201d6a70ce5f8980b9c4aae588a50",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.8",
            "size": 81517,
            "upload_time": "2025-08-07T18:08:43",
            "upload_time_iso_8601": "2025-08-07T18:08:43.808815Z",
            "url": "https://files.pythonhosted.org/packages/2e/dd/a111a768c741313e2c7e5b658ef0f4415e49913ba1244d0fce5382d92f66/debabelizer-0.1.0b1-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "a5bf2e4667c903f038847d24a959885d5bfd8cc14f01ae5f0e936f986ce79f34",
                "md5": "e16c5c7984f75c8c3455d1d58f19f5a8",
                "sha256": "f23b899251261a0d37073d04663b28a982003f5fc5c86f6bb68c45115d1e0aa4"
            },
            "downloads": -1,
            "filename": "debabelizer-0.1.0b1.tar.gz",
            "has_sig": false,
            "md5_digest": "e16c5c7984f75c8c3455d1d58f19f5a8",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.8",
            "size": 97378,
            "upload_time": "2025-08-07T18:08:45",
            "upload_time_iso_8601": "2025-08-07T18:08:45.208947Z",
            "url": "https://files.pythonhosted.org/packages/a5/bf/2e4667c903f038847d24a959885d5bfd8cc14f01ae5f0e936f986ce79f34/debabelizer-0.1.0b1.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2025-08-07 18:08:45",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "techwiz42",
    "github_project": "debabelizer",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": false,
    "requirements": [],
    "lcname": "debabelizer"
}
        