<div align="center">
# Live-Translation
**A real-time speech-to-text translation system built on a modular server–client architecture.**
[CI](https://github.com/AbdullahHendy/live-translation/actions/workflows/ci.yml)
[PyPI](https://pypi.org/project/live-translation/)
[License](https://github.com/AbdullahHendy/live-translation/blob/main/LICENSE)
[Python](https://www.python.org/downloads/)
<br/>
[Client–Server](https://en.wikipedia.org/wiki/Client%E2%80%93server_model)
[WebSocket](https://en.wikipedia.org/wiki/WebSocket)
[PCM](https://en.wikipedia.org/wiki/Pulse-code_modulation)
[Streaming](https://en.wikipedia.org/wiki/Streaming_media)
[Opus](https://en.wikipedia.org/wiki/Opus_(audio_format))
<br/>
[Last Commit](https://github.com/AbdullahHendy/live-translation/commits/main)
[Issues](https://github.com/AbdullahHendy/live-translation/issues)
[Stars](https://github.com/AbdullahHendy/live-translation/stargazers)
<br/>
[Ruff](https://github.com/AbdullahHendy/live-translation/blob/main/ruff.toml)
[Codecov](https://codecov.io/github/AbdullahHendy/live-translation)
<br/>
[Examples](https://github.com/AbdullahHendy/live-translation/tree/main/examples)
[Node.js Client](https://github.com/AbdullahHendy/live-translation/tree/main/examples/clients/nodejs)
[Browser JS Client](https://github.com/AbdullahHendy/live-translation/tree/main/examples/clients/browser_js)
[C# Client](https://github.com/AbdullahHendy/live-translation/tree/main/examples/clients/csharpclient)
[Go Client](https://github.com/AbdullahHendy/live-translation/tree/main/examples/clients/go_client)
[Android Client](https://github.com/AbdullahHendy/live-translation/tree/main/examples/clients/android)
<br/>
[Helsinki-NLP](https://huggingface.co/Helsinki-NLP)
[Whisper](https://github.com/openai/whisper)
[Python](https://www.python.org/)
</div>
---
## Demos
> **NOTE**: This project is ***not*** meant to be a plug-and-play translation app for web browsers. Instead, it serves as a ***foundational enabler*** for building [real-time translation experiences](https://github.com/AbdullahHendy/live-translation/tree/main/examples).
>
### 🌐 Browser Client Experience
*A JavaScript example client for the ***live translation*** server*
*See [Under the Hood](#-under-the-hood)*
<a href="https://github.com/AbdullahHendy/live-translation/blob/main/doc/browser_js.gif?raw=true" target="_blank">
<img src="https://github.com/AbdullahHendy/live-translation/blob/main/doc/browser_js.gif?raw=true" alt="Browser-Client Demo" />
</a>
### 🪛 Under the Hood
*On the left, the ***live translation*** [CLI](#cli) server*
*On the right, the ***live translation*** [CLI](#cli) client*
*For a deeper dive into more ways to use the ***live translation*** server and clients, see the [Usage](#-usage) section*
<a href="https://github.com/AbdullahHendy/live-translation/blob/main/doc/demo.gif?raw=true" target="_blank">
<img src="https://github.com/AbdullahHendy/live-translation/blob/main/doc/demo.gif?raw=true" alt="Server-Client Demo" />
</a>
---
## 👷🏼‍♂️ Architecture Overview
***The diagram omits finer details.***
<img src="https://github.com/AbdullahHendy/live-translation/blob/main/doc/live-translation-pipeline.png?raw=true" alt="Architecture Diagram" />
---
## ⭐ Features
- Real-time speech capture using **PyAudio**
- Voice Activity Detection (VAD) using **Silero** for more efficient processing
- Speech-to-text transcription using OpenAI's **Whisper**
- Translation of transcriptions using Helsinki-NLP's **OpusMT**
- **Full-duplex WebSocket streaming** between client and server
- Audio compression via **Opus** codec support for lower bandwidth usage
- Multithreaded design for parallelized processing
- Optional server logging:
- Print to **stdout**
- Save transcription/translation logs to a structured **.jsonl** file
- Designed for both:
- Simple **CLI** usage (***live-translate-server***, ***live-translate-client***)
- **Python API** usage (***LiveTranslationServer***, ***LiveTranslationClient***) with Asynchronous support for embedding in larger systems
---
## 📜 Prerequisites
Before running the project, you need to install the following system dependencies:
### **Debian**
- [**PortAudio**](https://www.portaudio.com/) (for audio input handling)
```bash
sudo apt-get install portaudio19-dev
```
### **MacOS**
- [**PortAudio**](https://www.portaudio.com/) (for audio input handling)
```bash
brew install portaudio
```
---
## 📥 Installation
**(RECOMMENDED)**: Install this package inside a virtual environment to avoid dependency conflicts.
```bash
python -m venv .venv
source .venv/bin/activate
```
**Install** the [PyPI package](https://pypi.org/project/live-translation/):
```bash
pip install live-translation
```
**Verify** the installation:
```bash
python -c "import live_translation; print(f'live-translation installed successfully\n{live_translation.__version__}')"
```
---
## 🚀 Usage
> **NOTE**: One can safely ignore warnings similar to the following, which might appear on **Linux** systems when the client tries to open the mic:
>
> ALSA lib pcm_dsnoop.c:567:(snd_pcm_dsnoop_open) unable to open slave
> ALSA lib pcm_dmix.c:1000:(snd_pcm_dmix_open) unable to open slave
> ALSA lib pcm.c:2722:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.rear
> ALSA lib pcm.c:2722:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.center_lfe
> ALSA lib pcm.c:2722:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.side
> ALSA lib pcm_dmix.c:1000:(snd_pcm_dmix_open) unable to open slave
> Cannot connect to server socket err = No such file or directory
> Cannot connect to server request channel
> jack server is not running or cannot be started
> JackShmReadWritePtr::~JackShmReadWritePtr - Init not done for -1, skipping unlock
> JackShmReadWritePtr::~JackShmReadWritePtr - Init not done for -1, skipping unlock
>
### CLI
* **demo** can be run directly from the command line:
> **NOTE**: This is a convenience CLI tool that runs both the **server** and the **client** with ***default*** configs. It is intended only for a quick demo. It's ***highly recommended*** to start a separate server and client for full customization, as shown below.
>
```bash
live-translate-demo
```
* **server** can be run directly from the command line:
> **NOTE**: Running the server for the first time downloads the required models into the **cache** folder (e.g. `~/.cache` on Linux). The download output during the first run might clutter the terminal, scattering the **initial server logs** in unpredictable places. It is advised to rerun the server after all models finish downloading for a better view of the **initial server logs**.
>
```bash
live-translate-server [OPTIONS]
```
**[OPTIONS]**
```bash
usage: live-translate-server [-h] [--silence_threshold SILENCE_THRESHOLD] [--vad_aggressiveness {0,1,2,3,4,5,6,7,8,9}] [--max_buffer_duration {5,6,7,8,9,10}] [--codec {pcm,opus}]
[--device {cpu,cuda}] [--whisper_model {tiny,base,small,medium,large,large-v2,large-v3,large-v3-turbo}]
[--trans_model {Helsinki-NLP/opus-mt,Helsinki-NLP/opus-mt-tc-big}] [--src_lang SRC_LANG] [--tgt_lang TGT_LANG] [--log {print,file}] [--ws_port WS_PORT]
[--transcribe_only] [--version]
Live Translation Server - Configure runtime settings.
options:
-h, --help show this help message and exit
--silence_threshold SILENCE_THRESHOLD
Number of consecutive seconds to detect SILENCE.
SILENCE clears the audio buffer for transcription/translation.
NOTE: Minimum value is 1.5.
Default is 2.
--vad_aggressiveness {0,1,2,3,4,5,6,7,8,9}
Voice Activity Detection (VAD) aggressiveness level (0-9).
Higher values mean VAD has to be more confident to detect speech vs silence.
Default is 8.
--max_buffer_duration {5,6,7,8,9,10}
Max audio buffer duration in seconds before trimming it.
Default is 7 seconds.
--codec {pcm,opus} Audio codec for WebSocket communication ('pcm', 'opus').
Default is 'opus'.
--device {cpu,cuda} Device for processing ('cpu', 'cuda').
Default is 'cpu'.
--whisper_model {tiny,base,small,medium,large,large-v2,large-v3,large-v3-turbo}
Whisper model size ('tiny', 'base', 'small', 'medium', 'large', 'large-v2', 'large-v3', 'large-v3-turbo').
NOTE: Running large models like 'large-v3', or 'large-v3-turbo' might require a decent GPU with CUDA support for reasonable performance.
NOTE: large-v3-turbo has great accuracy while being significantly faster than the original large-v3 model. see: https://github.com/openai/whisper/discussions/2363
Default is 'base'.
--trans_model {Helsinki-NLP/opus-mt,Helsinki-NLP/opus-mt-tc-big}
Translation model ('Helsinki-NLP/opus-mt', 'Helsinki-NLP/opus-mt-tc-big').
NOTE: Don't include source and target languages here.
Default is 'Helsinki-NLP/opus-mt'.
--src_lang SRC_LANG Source/Input language for transcription (e.g., 'en', 'fr').
Default is 'en'.
--tgt_lang TGT_LANG Target language for translation (e.g., 'es', 'de').
Default is 'es'.
--log {print,file} Optional logging mode for saving transcription output.
- 'file': Save each result to a structured .jsonl file in ./transcripts/transcript_{TIMESTAMP}.jsonl.
- 'print': Print each result to stdout.
Default is None (no logging).
--ws_port WS_PORT WebSocket port of the server.
Used to listen for client audio and publish output (e.g., 8765).
--transcribe_only Transcribe only mode. No translations are performed.
--version Print version and exit.
```
* **client** can be run directly from the command line:
```bash
live-translate-client [OPTIONS]
```
**[OPTIONS]**
```bash
usage: live-translate-client [-h] [--server SERVER] [--codec {pcm,opus}] [--version]
Live Translation Client - Stream audio to the server.
options:
-h, --help show this help message and exit
--server SERVER WebSocket URI of the server (e.g., ws://localhost:8765)
--codec {pcm,opus} Audio codec for WebSocket communication ('pcm', 'opus').
Default is 'opus'.
--version Print version and exit.
```
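For reference, when the server runs with `--log file`, each result is appended as one JSON object per line to `./transcripts/transcript_{TIMESTAMP}.jsonl`. A hypothetical sample line (assuming file entries mirror the WebSocket JSON messages shown in [Non-Python Integration](#non-python-integration)):
```json
{"timestamp": "2025-05-25T12:58:35.259085+00:00", "transcription": "Good morning, I hope everyone's doing great.", "translation": "Buenos días, espero que todo el mundo esté bien"}
```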
### Python API
You can also import and use ***live_translation*** directly in your Python code.
The following are ***simple*** examples of running ***live_translation***'s server and client in a **blocking** fashion.
For more detailed examples showing **non-blocking** and **asynchronous** workflows, see [./examples/](https://github.com/AbdullahHendy/live-translation/tree/main/examples).
> **NOTE**: The examples below assume the ***live_translation*** package has been installed as shown in [Installation](#-installation).
>
> **NOTE**: To run a provided example using the ***Python API***, see instructions in the [./examples/](https://github.com/AbdullahHendy/live-translation/tree/main/examples) directory.
- **Server**
```python
from live_translation import LiveTranslationServer, ServerConfig
def main():
config = ServerConfig(
device="cpu",
ws_port=8765,
log="print",
transcribe_only=False,
codec="opus",
)
server = LiveTranslationServer(config)
server.run(blocking=True)
# The main guard is CRITICAL on systems that use the spawn method to create new processes.
# This is the case for Windows and macOS.
if __name__ == "__main__":
main()
```
- **Client**
```python
from live_translation import LiveTranslationClient, ClientConfig
def parser_callback(entry, *args, **kwargs):
"""Callback function to parse the output from the server.
Args:
entry (dict): The message from the server.
*args: Optional positional args passed from the client.
**kwargs: Optional keyword args passed from the client.
"""
print(f"π {entry['transcription']}")
print(f"π {entry['translation']}")
# Returning True signals the client to shut down
return False
def main():
config = ClientConfig(
server_uri="ws://localhost:8765",
codec="opus",
)
client = LiveTranslationClient(config)
client.run(
callback=parser_callback,
callback_args=(), # Optional: positional args to pass
callback_kwargs={}, # Optional: keyword args to pass
blocking=True,
)
if __name__ == "__main__":
main()
```
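As a usage note, the callback's boolean return value drives shutdown. Below is a small ***hypothetical*** variation of `parser_callback` that stops the client after five results, using only the API shown above (the name `stop_after_five` is illustrative, not part of the package):
```python
import itertools

# Hypothetical helper: a counter shared across callback invocations.
_counter = itertools.count(1)

def stop_after_five(entry, *args, **kwargs):
    """Print results, then shut the client down after five messages."""
    print(f"📝 {entry['transcription']}")
    # Returning True signals the client to shut down (as noted above).
    return next(_counter) >= 5
```
Pass `stop_after_five` as the `callback` argument to `client.run(...)` in place of `parser_callback`.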
### Non-Python Integration
If you're writing a **custom client** or integrating this system into another application, you can interact with the server directly using the WebSocket protocol.
### Protocol Overview
The server listens on a WebSocket endpoint (default: `ws://localhost:8765`) and expects the client to:
- **Send**: **encoded PCM** audio using the [**Opus codec**](https://en.wikipedia.org/wiki/Opus_(audio_format)) with the following specs:
- Format: 16-bit signed integer (`int16`)
- Sample Rate: 16,000 Hz
- Channels: Mono (1 channel)
- Chunk Size: 640 samples = 1280 bytes per message (40 ms)
- Each encoded chunk should be sent immediately over the WebSocket
> **NOTE**: The server also supports receiving **raw PCM** using the ***--codec pcm*** server option. The specs are identical to the above, just without Opus encoding.
>
- **Receive**: structured ***JSON*** messages with timestamp, transcription, and translation fields
```json
{
"timestamp": "2025-05-25T12:58:35.259085+00:00",
"transcription": "Good morning, I hope everyone's doing great.",
"translation": "Buenos dΓas, espero que todo el mundo estΓ© bien"
}
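For illustration, here is a minimal Python sketch of a custom client speaking this protocol directly, without the bundled `LiveTranslationClient`. It is a sketch under assumptions: the server was started with ***--codec pcm*** (so no Opus encoding is needed), and the third-party `websockets` and `pyaudio` packages are installed.
```python
# Minimal custom-client sketch: stream raw PCM chunks, print results.
# Assumes a server started with `--codec pcm` on ws://localhost:8765.
import asyncio
import json

import pyaudio
import websockets

RATE = 16000   # 16 kHz, mono, int16 (per the protocol specs above)
CHUNK = 640    # 640 samples = 1280 bytes = 40 ms per message


async def stream(uri: str = "ws://localhost:8765") -> None:
    pa = pyaudio.PyAudio()
    mic = pa.open(format=pyaudio.paInt16, channels=1, rate=RATE,
                  input=True, frames_per_buffer=CHUNK)
    async with websockets.connect(uri) as ws:

        async def sender():
            while True:
                # Read one 40 ms chunk off the mic without blocking the loop
                data = await asyncio.to_thread(
                    mic.read, CHUNK, exception_on_overflow=False
                )
                await ws.send(data)  # raw PCM bytes, sent immediately

        async def receiver():
            async for message in ws:
                entry = json.loads(message)
                print(entry["transcription"], "->", entry.get("translation"))

        await asyncio.gather(sender(), receiver())


if __name__ == "__main__":
    asyncio.run(stream())
```
With ***--codec opus***, each chunk would additionally be Opus-encoded before sending; see the bundled clients in [./examples/clients](https://github.com/AbdullahHendy/live-translation/tree/main/examples/clients) for per-language encoder setups.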
### Client Examples
For fully working, ***yet simple***, examples in multiple languages, see [./examples/clients](https://github.com/AbdullahHendy/live-translation/tree/main/examples/clients).
To create more complex clients, look at the [Python client](https://github.com/AbdullahHendy/live-translation/blob/main/live_translation/client/client.py) for guidance.
Available Examples:
- **Node.js**
- **Browser JS**
- **Go**
- **C#**
- **Kotlin/Android**
---
## 🤝 Development & Contribution
To contribute or modify this project, these steps might be helpful:
> **NOTE**: The workflow below was developed with Linux-based systems and typical build tools (e.g. ***Make***) in mind. One might need to install ***Make*** and possibly other tools on other systems. However, one can still do things manually without ***Make***; for example, run tests with `python -m pytest -s tests/` instead of `make test`.
> See the **Makefile** for more details.
>
**Fork & Clone** the repository:
```bash
git clone git@github.com:<your-username>/live-translation.git
cd live-translation
```
**Create** a virtual environment:
```bash
python -m venv .venv
source .venv/bin/activate
```
**Install** the package and its dependencies in ***editable mode***:
```bash
pip install --upgrade pip
pip install -e .[dev,examples] # Install with optional examples dependencies
```
This is **equivalent** to:
```bash
make install
```
**Test** the package:
```bash
make test
```
**Build** the package:
```bash
make build
```
> **NOTE**: Building runs ***lint*** and ***formatting*** checks using [ruff](https://docs.astral.sh/ruff/). One can do that separately using `make format` and `make lint`. For linting and formatting rules, see the [ruff config](https://github.com/AbdullahHendy/live-translation/blob/main/ruff.toml).
> **NOTE**: Building generates a ***.whl*** file that can be ***pip*** installed in a new environment for testing.
**Check** more available ***make*** commands:
```bash
make help
```
**For quick testing**, run the server and the client within the virtual environment:
```bash
live-translate-server [OPTIONS]
live-translate-client [OPTIONS]
```
> **NOTE**: Since the package was installed in editable mode, any changes will be reflected when the CLI tools are run.
**For contribution**:
- Make your changes in a feature branch
- Ensure all tests pass
- Open a Pull Request (PR) with a clear description of your changes
---
## 🌱 Tested Environments
This project was tested and developed on the following system configuration:
- **Architecture**: x86_64 (64-bit)
- **Operating System**: Ubuntu 24.10 (Oracular Oriole)
- **Kernel Version**: 6.11.0-18-generic
- **Python Version**: 3.12.7
- **Processor**: 13th Gen Intel(R) Core(TM) i9-13900HX
- **GPU**: GeForce RTX 4070 Max-Q / Mobile [^1]
- **NVIDIA Driver Version**: 560.35.03
- **CUDA Toolkit Version**: 12.1
- **cuDNN Version**: 9.7.1
- **RAM**: 32GB DDR5
- **Dependencies**: All required dependencies are listed in `pyproject.toml` and [Prerequisites](#-prerequisites)
[^1]: CUDA as the `DEVICE` is probably needed for heavier Whisper models like `large-v3-turbo`. [**Nvidia drivers**](https://www.nvidia.com/drivers/), the [**CUDA Toolkit**](https://developer.nvidia.com/cuda-downloads), and [**cuDNN**](https://developer.nvidia.com/cudnn-downloads) must be installed if the `"cuda"` option is used.
---
## 📈 Improvements
- **ARM64 Support**: Ensure support for ARM64-based systems.
- **Concurrency Design Check**: Review and optimize the threading design to ensure thread safety and prevent issues like race conditions and deadlocks; revisit the current design in which ***WebSocketIO*** runs as a thread while ***AudioProcessor***, ***Transcriber***, and ***Translator*** run as processes.
- **Logging**: Integrate detailed logging to track system activity, errors, and performance metrics using a more formal logging framework.
- **Translation Models**: Some of the models downloaded in ***Translator*** from [OpusMT's Hugging Face](https://huggingface.co/Helsinki-NLP) are not the best performing when compared with the top models on [Opus-MT's Leaderboard](https://opus.nlpl.eu/dashboard/). Find a way to automatically download the best-performing models while still using the user's `src_lang` and `tgt_lang` input, as is currently done.
- **System Profiling & Resource Guidelines**: Benchmark and document CPU, memory, and GPU usage across all multiprocessing components (for example, "~35% CPU usage on a 24-core **Intel i9-13900HX**" or "~20% GPU load on an **Nvidia RTX 4070** with the `large-v3-turbo` Whisper model"). This will help with hardware requirements and deployment decisions.
- **Proper Handshake Protocol**: Instead of duplicating server and client options (e.g. `--codec`), establish a handshake protocol where, for example, the server advertises its capabilities and negotiates with the client over which options to use.
---
## 📚 Citations
```bibtex
@article{Whisper,
title = {Robust Speech Recognition via Large-Scale Weak Supervision},
url = {https://arxiv.org/abs/2212.04356},
author = {Radford, Alec and Kim, Jong Wook and Xu, Tao and Brockman, Greg and McLeavey, Christine and Sutskever, Ilya},
publisher = {arXiv},
year = {2022}
}
@misc{SileroVAD,
author = {Silero Team},
title = {Silero VAD: pre-trained enterprise-grade Voice Activity Detector (VAD), Number Detector and Language Classifier},
year = {2021},
publisher = {GitHub},
journal = {GitHub repository},
howpublished = {\url{https://github.com/snakers4/silero-vad}},
email = {hello@silero.ai}
}
@article{tiedemann2023democratizing,
title={Democratizing neural machine translation with {OPUS-MT}},
author={Tiedemann, J{\"o}rg and Aulamo, Mikko and Bakshandaeva, Daria and Boggia, Michele and Gr{\"o}nroos, Stig-Arne and Nieminen, Tommi and Raganato, Alessandro and Scherrer, Yves and Vazquez, Raul and Virpioja, Sami},
journal={Language Resources and Evaluation},
number={58},
pages={713--755},
year={2023},
publisher={Springer Nature},
issn={1574-0218},
doi={10.1007/s10579-023-09704-w}
}
@InProceedings{TiedemannThottingal:EAMT2020,
author = {J{\"o}rg Tiedemann and Santhosh Thottingal},
title = {{OPUS-MT} — {B}uilding open translation services for the {W}orld},
booktitle = {Proceedings of the 22nd Annual Conference of the European Association for Machine Translation (EAMT)},
year = {2020},
address = {Lisbon, Portugal}
}
```