# whisperpipe
Real-time speech-to-text streaming with OpenAI Whisper
## Description
whisperpipe is a powerful, easy-to-use Python package for real-time, offline audio transcription using OpenAI's Whisper model. It runs locally, making it a free and private solution for continuous speech-to-text applications. It provides seamless integration with callback functions for LLM processing and supports pause/resume functionality for interactive applications.
## Why whisperpipe?
In a world where most ASR (Automatic Speech Recognition) services are cloud-based, whisperpipe offers a refreshing alternative by harnessing the power of OpenAI's Whisper model to run directly on your local machine. This approach provides several key advantages:
- **Complete Privacy**: Since all transcription is done locally, your voice data never leaves your computer. This is crucial for applications that handle sensitive or private conversations.
- **Zero Cost**: Say goodbye to recurring subscription fees and per-minute charges. whisperpipe is free to use, making it an economical choice for both hobbyists and commercial projects.
- **No Internet Required**: Whether you're on a plane, in a remote location, or simply have an unstable internet connection, whisperpipe keeps working, entirely offline.
- **Real-time Performance**: Designed for continuous, real-time transcription, whisperpipe is ideal for live applications such as voice-controlled assistants, dictation software, and more.
- **Unleash the Power of Whisper**: By running the Whisper model locally, you have full control over the transcription process, from model selection to performance tuning.
whisperpipe empowers you to build powerful, private, and cost-effective voice applications with ease.
## Features
- **Real-time audio transcription** using OpenAI Whisper
- **Callback system** for custom processing (LLM integration, etc.)
- **Pause/Resume functionality** for interactive applications
- **Multiple language support**
- **Configurable processing parameters**
- **Thread-safe operation**
- **Easy installation and usage**
## Installation
### From PyPI
```bash
pip install whisperpipe
```
### From GitHub
```bash
pip install git+https://github.com/Erfan-ram/whisperpipe.git
```
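whisperpipe captures audio through PyAudio, which needs the PortAudio system library. If `pip install` fails while building PyAudio, installing PortAudio first usually resolves it; the package names below are examples for Debian/Ubuntu and Homebrew, so adjust for your platform:

```bash
# Debian/Ubuntu
sudo apt-get install portaudio19-dev

# macOS (Homebrew)
brew install portaudio
```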
## Quick Start
```python
from whisperpipe import pipeStream

# Basic usage
transcriber = pipeStream(
    model_name="base",
    language="en",
    finalization_delay=10.0,
    processing_interval=1.0
)

# Start streaming
transcriber.start_streaming()
```
## Usage Examples
### Basic Transcription
```python
from whisperpipe import pipeStream

# Create transcriber instance
transcriber = pipeStream(
    model_name="base",
    language="en",
    finalization_delay=10.0,
    processing_interval=1.0
)

# Start transcription
transcriber.start_streaming()

# The transcribed text will be printed to console
# Press Ctrl+C to stop
```
### With Custom Callback (LLM Integration)
```python
from whisperpipe import pipeStream

def llm_processor(text):
    """Custom function to process transcribed text"""
    print(f"Processing: {text}")
    # Your LLM integration here
    # e.g., send to OpenAI, Claude, a local model, etc.
    response = your_llm_api.chat(text)
    print(f"Response: {response}")
    return response

# Create transcriber with callback
transcriber = pipeStream(
    model_name="base",
    language="en",
    finalization_delay=10.0,
    processing_interval=1.0
)

# Register callback
transcriber.set_def_callback(llm_processor)

# Start streaming with LLM integration
transcriber.start_streaming()
```
### Interactive Mode with Pause/Resume
```python
from whisperpipe import pipeStream

def interactive_processor(text):
    """Process text and pause for response"""
    # Pause transcriber while processing
    transcriber.pause_streaming()

    print(f"User said: {text}")

    # Process with your system
    response = process_with_llm(text)

    # Speak or display response
    print(f"Assistant: {response}")

    # Resume for next input
    transcriber.resume_streaming()

transcriber = pipeStream()
transcriber.set_def_callback(interactive_processor)
transcriber.start_streaming()
```
## API Reference
### Constructor Parameters
- `model_name` (str): Whisper model name ("tiny", "base", "small", "medium", "large"). Default: "base"
- `language` (str): Language code for transcription ("en", "es", "fr", etc.). Default: "en"
- `finalization_delay` (float): Wait time in seconds before finalizing transcription. Default: 10.0
- `processing_interval` (float): Interval in seconds between processing cycles. Default: 1.0
- `buffer_duration_seconds` (float): Length of the audio window, in seconds, held for processing. Default: 5.0
- `debug_mode` (bool): Enable debug mode for detailed logging. Default: True
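Putting these parameters together, a fully specified constructor call might look like the sketch below; the values are illustrative, not recommendations:

```python
from whisperpipe import pipeStream

transcriber = pipeStream(
    model_name="small",           # larger models are more accurate but slower
    language="fr",                # transcribe French speech
    finalization_delay=5.0,       # wait 5 s before finalizing a transcription
    processing_interval=0.5,      # run a processing cycle twice per second
    buffer_duration_seconds=5.0,  # hold a 5 s window of audio for processing
    debug_mode=False,             # suppress the detailed logging
)
```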
### Methods
#### Core Methods
- `start_streaming()`: Start audio capture and transcription
- `stop_streaming()`: Stop audio capture and transcription
#### Callback System
- `set_def_callback(callback_function)`: Register a callback function for processing transcribed text
- `set_def_callback(None)`: Clear the callback and restore the default behavior (printing transcriptions to the console)
#### Pause/Resume Control
- `pause_streaming()`: Pause audio processing temporarily
- `resume_streaming()`: Resume audio processing
- `is_paused()`: Check if transcriber is paused
- `is_running()`: Check if transcriber is running
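The pause/resume and status methods compose into a simple supervised lifecycle. Below is a minimal sketch; it assumes `start_streaming()` blocks its caller (the basic example stops it with Ctrl+C), so the stream is driven from a worker thread here, and the sleep durations are arbitrary:

```python
from whisperpipe import pipeStream
import threading
import time

transcriber = pipeStream(debug_mode=False)

# Assumption: start_streaming() blocks, so run it on a worker thread.
worker = threading.Thread(target=transcriber.start_streaming, daemon=True)
worker.start()

time.sleep(30)  # transcribe for a while
transcriber.pause_streaming()
assert transcriber.is_paused()

time.sleep(5)   # e.g. while the application is busy elsewhere
transcriber.resume_streaming()

# Shut down cleanly when done
if transcriber.is_running():
    transcriber.stop_streaming()
```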
## Requirements
- Python >= 3.9, < 3.13
- PyAudio
- OpenAI Whisper
- PyTorch
- NumPy
- pynput
## License
MIT License
## Author
Erfan Ramezani - erfanramezany245@gmail.com
## Contributing
Contributions are welcome! Please feel free to submit a Pull Request.
## Support
For issues and questions, please use the [GitHub Issues](https://github.com/Erfan-ram/whisperpipe/issues) page.