# Realtime_mlx_STT
[![PyPI version](https://badge.fury.io/py/realtime-mlx-stt.svg)](https://badge.fury.io/py/realtime-mlx-stt)
[![Python 3.9+](https://img.shields.io/badge/python-3.9%2B-blue.svg)](https://www.python.org/downloads/)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
[![Platform: Apple Silicon](https://img.shields.io/badge/platform-Apple%20Silicon-lightgrey.svg)](https://support.apple.com/en-us/HT211814)
High-performance speech-to-text transcription library optimized exclusively for Apple Silicon. It leverages the MLX framework for real-time, on-device transcription with low latency.
> ⚠️ **IMPORTANT: This library is designed for LOCAL USE ONLY on macOS with Apple Silicon.** The included server is a development tool and should NOT be exposed to the internet or used in production environments without implementing proper security measures.
## Features
- **Real-time transcription** with low latency using MLX Whisper
- **Multiple APIs** - Python API, REST API, and WebSocket for different use cases
- **Apple Silicon optimization** using MLX with Neural Engine acceleration
- **Voice activity detection** with WebRTC and Silero (configurable thresholds)
- **Wake word detection** using Porcupine ("Jarvis", "Alexa", etc.)
- **OpenAI integration** for cloud-based transcription alternative
- **Interactive CLI** for easy exploration of features
- **Web UI** with modern interface and real-time updates
- **Profile system** for quick configuration switching
- **Event-driven architecture** with command pattern
- **Thread-safe** and production-ready
## Language Selection
The Whisper large-v3-turbo model supports 99 languages with intelligent language detection:
- **Language-specific mode**: When you select a specific language (e.g., Norwegian, French, Spanish), the model uses language-specific tokens that significantly improve transcription accuracy for that language
- **Multi-language capability**: Even with a language selected, Whisper can still transcribe other languages if spoken - it's not restricted to only the selected language
- **Accuracy benefit**: Selecting the primary language you'll be speaking provides much more accurate transcription compared to auto-detect mode
- **Auto-detect mode**: When no language is specified, the model attempts to detect the language automatically, though with potentially lower accuracy
For example, if you select Norwegian (`no`) as your language:
- Norwegian speech will be transcribed with high accuracy
- English speech will still be transcribed correctly if spoken
- The model uses the Norwegian language token (50288) to optimize for Norwegian
This behavior matches OpenAI's Whisper API - the language parameter guides but doesn't restrict the model.
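As a concrete sketch using the high-level client documented later in this README (the `STTClient` constructor and its `default_language` parameter appear in the API Documentation section):

```python
from realtime_mlx_stt import STTClient

# Pin Norwegian as the primary language for best accuracy;
# pass default_language=None to use auto-detect instead.
client = STTClient(default_language="no")
for result in client.transcribe(duration=10):
    print(result.text)
```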
## Requirements
- **macOS** with Apple Silicon (M1/M2/M3) - Required, not optional
- **Python 3.9+** (3.11+ recommended for best performance)
- **MLX** for Apple Silicon optimization
- **PyAudio** for audio capture
- **WebRTC VAD** and **Silero VAD** for voice activity detection
- **Porcupine** for wake word detection (optional)
- **Torch** and **NumPy** for audio processing
> **Important Note**: This library is specifically optimized for Apple Silicon and will not work on Intel-based Macs or other platforms. It requires the Neural Engine found in Apple Silicon chips to achieve optimal performance.
## Installation
### Install from PyPI (Recommended)
```bash
# Basic installation
pip install realtime-mlx-stt

# With OpenAI support for cloud transcription
pip install "realtime-mlx-stt[openai]"

# With development tools
pip install "realtime-mlx-stt[dev]"

# With server support for REST/WebSocket APIs
pip install "realtime-mlx-stt[server]"

# Install everything
pip install "realtime-mlx-stt[openai,server,dev]"
```
## 📚 Documentation
- **[Usage Guide](USAGE_GUIDE.md)** - Common patterns and troubleshooting
- **[API Reference](realtime_mlx_stt/README.md)** - Detailed API documentation
- **[Examples](examples/)** - Working code examples
### Install from Source
```bash
# Clone the repository
git clone https://github.com/kristofferv98/Realtime_mlx_STT.git
cd Realtime_mlx_STT

# Set up a Python environment (3.9+ required; 3.11+ recommended)
python -m venv venv
source venv/bin/activate

# Install in development mode
pip install -e .
```
## Quick Start
### Interactive CLI (Recommended)
The easiest way to explore all features:
```bash
python examples/cli.py
```
This provides a menu-driven interface for:
- Quick 10-second transcription
- Continuous streaming mode
- OpenAI cloud transcription
- Wake word detection
- Audio device selection
- Language configuration
### Python API
```python
from realtime_mlx_stt import STTClient
# Simple transcription
client = STTClient()
for result in client.transcribe(duration=10):
    print(result.text)

# With OpenAI
client = STTClient(openai_api_key="sk-...")
for result in client.transcribe(engine="openai"):
    print(result.text)

# Wake word mode
client.start_wake_word("jarvis")
```
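A slightly fuller wake-word sketch is below. Note that the callback keyword arguments (`on_wake_word`, `on_transcription`) are assumptions for illustration only; the documented API shows just `start_wake_word("jarvis")`:

```python
from realtime_mlx_stt import STTClient

client = STTClient()

# Hypothetical callback wiring -- these keyword arguments are NOT
# confirmed parts of the start_wake_word() signature.
client.start_wake_word(
    "jarvis",
    on_wake_word=lambda word: print(f"Wake word detected: {word}"),
    on_transcription=lambda result: print(result.text),
)
```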
### Server Mode
> **Security Note**: The server is for local development only and binds to localhost by default. Do NOT expose it to the internet without proper authentication and security measures.
```bash
# Start server (localhost only - safe)
cd example_server
python server_example.py

# Opens web UI at http://localhost:8000
```
## Architecture
The library provides two specialized interfaces built on a common Features layer:
```
┌─────────────────────────────────────────────────┐
│ User Interfaces │
│ • CLI (examples/cli.py) │
│ • Web UI (example_server/) │
├─────────────────────────────────────────────────┤
│ API Layers │
│ • Python API (realtime_mlx_stt/) │
│ • REST/WebSocket (src/Application/Server/) │
├─────────────────────────────────────────────────┤
│ Features Layer │
│ • AudioCapture │
│ • VoiceActivityDetection │
│ • Transcription (MLX/OpenAI) │
│ • WakeWordDetection │
├─────────────────────────────────────────────────┤
│ Core & Infrastructure │
│ • Command/Event System │
│ • Logging & Configuration │
└─────────────────────────────────────────────────┘
```
### Key Design Principles
- **Vertical Slice Architecture**: Each feature is self-contained with Commands, Events, Handlers, and Models
- **Dual API Design**: Python API optimized for direct use, Server API optimized for multi-client scenarios
- **Event-Driven**: Features communicate via commands and events, not direct dependencies (a minimal sketch follows this list)
- **Production Ready**: Thread-safe, lazy initialization, comprehensive error handling
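To make the event-driven principle concrete, here is a hypothetical, minimal command/event bus illustrating the pattern; the library's actual Core types are named and structured differently:

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, DefaultDict, List

# Hypothetical event type -- the real feature events will differ.
@dataclass
class TranscriptionUpdated:
    text: str
    is_final: bool

class EventBus:
    def __init__(self) -> None:
        self._subscribers: DefaultDict[type, List[Callable]] = defaultdict(list)

    def subscribe(self, event_type: type, handler: Callable) -> None:
        self._subscribers[event_type].append(handler)

    def publish(self, event: object) -> None:
        # Handlers only know about event types, never about each other,
        # which is what keeps the feature slices decoupled.
        for handler in self._subscribers[type(event)]:
            handler(event)

bus = EventBus()
bus.subscribe(TranscriptionUpdated, lambda e: print(e.text))
bus.publish(TranscriptionUpdated(text="hello world", is_final=True))
```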
## API Documentation
### Python API (realtime_mlx_stt)
```python
import time

from realtime_mlx_stt import STTClient, TranscriptionSession, create_transcriber

# Method 1: Modern Client API
client = STTClient(
    openai_api_key="sk-...",       # Optional
    default_engine="mlx_whisper",  # or "openai"
    default_language="en"          # or None for auto-detect
)

# Transcribe for fixed duration
for result in client.transcribe(duration=10):
    print(f"{result.text} (confidence: {result.confidence})")

# Streaming with stop word
with client.stream() as stream:
    for result in stream:
        print(result.text)
        if "stop" in result.text.lower():
            break

# Method 2: Session-based API
from realtime_mlx_stt import TranscriptionSession, ModelConfig, VADConfig

session = TranscriptionSession(
    model=ModelConfig(engine="mlx_whisper", language="no"),
    vad=VADConfig(sensitivity=0.8),
    on_transcription=lambda r: print(r.text)
)

with session:
    time.sleep(30)  # Listen for 30 seconds

# Method 3: Simple Transcriber
from realtime_mlx_stt import Transcriber

transcriber = Transcriber(language="es")
text = transcriber.transcribe_from_mic(duration=5)
print(f"You said: {text}")
```
### REST API
```bash
# Start system with profile
# Start system with profile
curl -X POST http://localhost:8000/api/v1/system/start \
  -H "Content-Type: application/json" \
  -d '{
    "profile": "vad-triggered",
    "custom_config": {
      "transcription": {"language": "fr"},
      "vad": {"sensitivity": 0.7}
    }
  }'

# Get system status
curl http://localhost:8000/api/v1/system/status

# Transcribe audio file
curl -X POST http://localhost:8000/api/v1/transcription/audio \
  -H "Content-Type: application/json" \
  -d '{"audio_data": "base64_encoded_audio_data"}'
```
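For the `/transcription/audio` endpoint, the `audio_data` field is base64-encoded audio. A minimal Python sketch of preparing such a request, using only the standard library (the file name `sample.wav` is just an example; the expected audio format isn't specified here):

```python
import base64
import json
import urllib.request

# Read and base64-encode a local audio file (example path).
with open("sample.wav", "rb") as f:
    audio_b64 = base64.b64encode(f.read()).decode("ascii")

req = urllib.request.Request(
    "http://localhost:8000/api/v1/transcription/audio",
    data=json.dumps({"audio_data": audio_b64}).encode("utf-8"),
    headers={"Content-Type": "application/json"},
    method="POST",
)
with urllib.request.urlopen(req) as resp:
    print(json.loads(resp.read()))
```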
### WebSocket Events
```javascript
const ws = new WebSocket('ws://localhost:8000/events');

ws.onmessage = (event) => {
  const data = JSON.parse(event.data);

  switch (data.type) {
    case 'transcription':
      if (data.is_final) {
        console.log(`Final: ${data.text}`);
      } else {
        console.log(`Transcribing: ${data.text}`);
      }
      break;
    case 'wake_word':
      console.log(`Wake word: ${data.wake_word}`);
      break;
  }
};
```
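An equivalent listener in Python, assuming the third-party `websockets` package is installed (`pip install websockets`):

```python
import asyncio
import json

import websockets  # third-party: pip install websockets

async def listen():
    # Connect to the same /events endpoint the JavaScript example uses.
    async with websockets.connect("ws://localhost:8000/events") as ws:
        async for message in ws:
            data = json.loads(message)
            if data.get("type") == "transcription" and data.get("is_final"):
                print(f"Final: {data['text']}")
            elif data.get("type") == "wake_word":
                print(f"Wake word: {data['wake_word']}")

asyncio.run(listen())
```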
## Configuration
### Environment Variables
```bash
# API Keys
export OPENAI_API_KEY="sk-..."       # For OpenAI transcription
export PORCUPINE_ACCESS_KEY="..."    # For wake word detection
# Alternative names for the Picovoice universal key (same value as PORCUPINE_ACCESS_KEY):
# export PICOVOICE_ACCESS_KEY="..."
# export PICOVOICE_API_KEY="..."

# Logging
export LOG_LEVEL="INFO"    # DEBUG, INFO, WARNING, ERROR
export LOG_FORMAT="human"  # human, json, detailed
```
### Python Configuration
```python
from realtime_mlx_stt import ModelConfig, VADConfig, WakeWordConfig

# Model configuration
model = ModelConfig(
    engine="mlx_whisper",  # or "openai"
    model="whisper-large-v3-turbo",
    language="en"  # or None for auto-detect
)

# VAD configuration
vad = VADConfig(
    enabled=True,
    sensitivity=0.6,           # 0.0-1.0
    min_speech_duration=0.25,  # seconds
    min_silence_duration=0.1   # seconds
)

# Wake word configuration
# Note: Requires PORCUPINE_ACCESS_KEY environment variable
wake_word = WakeWordConfig(
    words=["jarvis", "computer"],
    sensitivity=0.7,
    timeout=30  # seconds
)
```
## Testing
The project includes comprehensive tests for each feature and component:
```bash
# Run all tests
python tests/run_tests.py
# Run tests for a specific feature or component
python tests/run_tests.py -f VoiceActivityDetection
python tests/run_tests.py -f Infrastructure
python tests/run_tests.py -f Application # Server/Client tests
# Run a specific test with verbose output
python tests/run_tests.py -t webrtc_vad_test -v
python tests/run_tests.py -t test_server_module -v
# Test with PYTHONPATH (if imports fail)
PYTHONPATH=/path/to/Realtime_mlx_STT python tests/run_tests.py
```
The Server implementation includes tests for:
- API Controllers (Transcription and System)
- WebSocket connections and event broadcasting
- Configuration and profile management
- Command/Event integration
## Performance
On Apple Silicon (M1/M2/M3), the MLX-optimized Whisper-large-v3-turbo model typically achieves:
- **Batch mode**: real-time factor of ~0.3-0.5 (60 seconds of audio processed in 20-30 seconds)
- **Streaming mode**: real-time factor of ~0.5-0.7, with roughly 2-3 seconds of latency
The MLX implementation takes full advantage of the Neural Engine in Apple Silicon chips, providing significantly better performance than CPU-based implementations.
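A trivial worked example of the real-time-factor arithmetic (the numbers below are illustrative, matching the batch-mode range quoted above):

```python
# Real-time factor (RTF) = processing time / audio duration.
# Values below 1.0 mean faster than real time.
audio_seconds = 60.0
processing_seconds = 25.0  # measured, e.g., with time.perf_counter()
rtf = processing_seconds / audio_seconds
print(f"RTF: {rtf:.2f}")  # 0.42 -- inside the ~0.3-0.5 batch-mode range
```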
## Contributing
Contributions are welcome! Please feel free to submit a Pull Request.
1. Fork the repository
2. Create your feature branch (`git checkout -b feature/amazing-feature`)
3. Commit your changes (`git commit -m 'Add some amazing feature'`)
4. Push to the branch (`git push origin feature/amazing-feature`)
5. Open a Pull Request
## Recent Updates
- **New Python API**: Added high-level `realtime_mlx_stt` package with STTClient, TranscriptionSession, and Transcriber
- **Interactive CLI**: New user-friendly CLI at `examples/cli.py` for exploring all features
- **Dual API Architecture**: Python API optimized for direct use, Server API for multi-client scenarios
- **Improved Examples**: Consolidated examples with clear documentation
- **Architecture Documentation**: Added comprehensive architecture documentation
- **OpenAI Integration**: Support for OpenAI's transcription API as alternative to local MLX
## License
This project is licensed under the MIT License - see the LICENSE file for details.
## Acknowledgments
- [OpenAI Whisper](https://github.com/openai/whisper) for the base Whisper large-v3-turbo model
- [MLX](https://github.com/ml-explore/mlx) for Apple Silicon optimization
- [RealtimeSTT](https://github.com/KoljaB/RealtimeSTT) for the original audio processing concepts
- [Picovoice Porcupine](https://picovoice.ai/platform/porcupine/) for wake word detection
- [Hugging Face](https://huggingface.co) for model distribution infrastructure