# stream-translator-gpt
Command line utility to transcribe or translate audio from livestreams in real time. Uses [yt-dlp](https://github.com/yt-dlp/yt-dlp) to
get livestream URLs from various services and [Whisper](https://github.com/openai/whisper) / [Faster-Whisper](https://github.com/SYSTRAN/faster-whisper) for transcription.
This fork optimizes the audio slicing logic using [VAD](https://github.com/snakers4/silero-vad),
adds [GPT API](https://platform.openai.com/api-keys) / [Gemini API](https://aistudio.google.com/app/apikey) support for translating into languages other than English, and supports input from audio devices.
Try it on Colab: [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/ionic-bond/stream-translator-gpt/blob/main/stream_translator.ipynb)
## Prerequisites
**Linux or Windows:**
1. Python >= 3.8 (>= 3.10 recommended)
2. [**Install CUDA on your system.**](https://developer.nvidia.com/cuda-downloads)
3. [**Install cuDNN v8 to your CUDA dir**](https://developer.nvidia.com/rdp/cudnn-archive) if you want to use **Faster-Whisper**.
4. [**Install PyTorch (with CUDA) to your Python.**](https://pytorch.org/get-started/locally/)
5. [**Create a Google API key**](https://aistudio.google.com/app/apikey) if you want to use **Gemini API** for translation. (Free tier: 15 requests / minute)
6. [**Create an OpenAI API key**](https://platform.openai.com/api-keys) if you want to use **Whisper API** for transcription or **GPT API** for translation.
**If you are on Windows, you also need to:**
1. [**Install and add ffmpeg to your PATH.**](https://www.thewindowsclub.com/how-to-install-ffmpeg-on-windows-10#:~:text=Click%20New%20and%20type%20the,Click%20OK%20to%20apply%20changes.)
2. Install [**yt-dlp**](https://github.com/yt-dlp/yt-dlp) and add it to your PATH.
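The prerequisites above can be sanity-checked before installing. The following is a minimal sketch, not part of this project: the function name `check_prerequisites` and the report keys are hypothetical, and it only checks that the Python version is high enough, that `ffmpeg` and `yt-dlp` are on your PATH, and that PyTorch (if installed) reports a usable CUDA device.

```python
import shutil
import sys


def check_prerequisites(min_python=(3, 8), tools=("ffmpeg", "yt-dlp")):
    """Return a dict mapping each prerequisite name to True or False.

    Hypothetical helper for illustration; it is not shipped with
    stream-translator-gpt.
    """
    report = {"python": sys.version_info >= min_python}
    for tool in tools:
        # shutil.which returns None when the executable is not on PATH.
        report[tool] = shutil.which(tool) is not None
    try:
        import torch  # only importable after the PyTorch step above
        report["torch_cuda"] = bool(torch.cuda.is_available())
    except ImportError:
        report["torch_cuda"] = False
    return report


if __name__ == "__main__":
    for name, ok in check_prerequisites().items():
        print(f"{name}: {'ok' if ok else 'missing'}")
```

A `missing` entry for `torch_cuda` only means PyTorch is absent or cannot see a GPU; it says nothing about whether CUDA or cuDNN themselves are installed correctly.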
## Installation
**Install the release version from PyPI (recommended):**
```
pip install stream-translator-gpt -U
stream-translator-gpt
```
or
**Clone the master branch from GitHub:**
```
git clone https://github.com/ionic-bond/stream-translator-gpt.git
pip install -r ./stream-translator-gpt/requirements.txt
python3 ./stream-translator-gpt/translator.py
```
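To confirm which release ended up in your environment after the PyPI install, you can query the package metadata from Python. This is a small sketch using only the standard library (`importlib.metadata`, available from Python 3.8); the helper name `installed_version` is hypothetical. It returns `None` when the package is not installed, which is also what you would see if you only cloned the repository without installing it.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package="stream-translator-gpt"):
    """Return the installed version string of `package`, or None if absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    print(installed_version() or "stream-translator-gpt is not installed")
```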
## Usage
- Transcribe a livestream (uses **Whisper** by default):
```stream-translator-gpt {URL} --model large --language {input_language}```
- Transcribe with **Faster-Whisper**:
```stream-translator-gpt {URL} --model large --language {input_language} --use_faster_whisper```
- Transcribe with **Whisper API**:
```stream-translator-gpt {URL} --language {input_language} --use_whisper_api --openai_api_key {your_openai_key}```
- Translate to another language with **Gemini**:
```stream-translator-gpt {URL} --model large --language ja --gpt_translation_prompt "Translate from Japanese to Chinese" --google_api_key {your_google_key}```
- Translate to another language with **GPT**:
```stream-translator-gpt {URL} --model large --language ja --gpt_translation_prompt "Translate from Japanese to Chinese" --openai_api_key {your_openai_key}```
- Using **Whisper API** and **Gemini** at the same time:
```stream-translator-gpt {URL} --model large --language ja --use_whisper_api --openai_api_key {your_openai_key} --gpt_translation_prompt "Translate from Japanese to Chinese" --google_api_key {your_google_key}```
- Local video/audio file as input:
```stream-translator-gpt /path/to/file --model large --language {input_language}```
- Computer microphone as input:
```stream-translator-gpt device --model large --language {input_language}```
The system's default audio device is used as input.
To use another audio input device, run `stream-translator-gpt device --print_all_devices` to get the device index, then run the CLI with `--device_index {index}`.
To use the audio output of another program as input, you need to [**enable stereo mix**](https://www.howtogeek.com/39532/how-to-enable-stereo-mix-in-windows-7-to-record-audio/).
- Sending results to Cqhttp:
```stream-translator-gpt {URL} --model large --language {input_language} --cqhttp_url {your_cqhttp_url} --cqhttp_token {your_cqhttp_token}```
- Sending results to Discord:
```stream-translator-gpt {URL} --model large --language {input_language} --discord_webhook_url {your_discord_webhook_url}```
- Saving results to a .srt subtitle file:
```stream-translator-gpt {URL} --model large --language ja --gpt_translation_prompt "Translate from Japanese to Chinese" --google_api_key {your_google_key} --hide_transcribe_result --output_timestamps --output_file_path ./result.srt```
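If you launch the tool from a script rather than typing the commands above, it can help to assemble the argument list programmatically. The sketch below is an illustration only: the helper names `build_command` and `run` are hypothetical, and it uses just the flags that appear in the examples above. It assumes `stream-translator-gpt` is on your PATH.

```python
import subprocess


def build_command(source, language, model="large", translation_prompt=None,
                  google_api_key=None, srt_path=None):
    """Build an argument list for the stream-translator-gpt CLI.

    `source` is a stream URL, a local file path, or the literal "device",
    as in the usage examples. Only flags shown in this README are used.
    """
    cmd = ["stream-translator-gpt", source,
           "--model", model, "--language", language]
    if translation_prompt:
        cmd += ["--gpt_translation_prompt", translation_prompt]
    if google_api_key:
        cmd += ["--google_api_key", google_api_key]
    if srt_path:
        # Same flag combination as the .srt example above.
        cmd += ["--hide_transcribe_result", "--output_timestamps",
                "--output_file_path", srt_path]
    return cmd


def run(cmd):
    """Run the command and raise CalledProcessError on a non-zero exit."""
    return subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Prints the command instead of running it, so no key or stream is needed.
    print(build_command("device", "ja",
                        translation_prompt="Translate from Japanese to Chinese",
                        google_api_key="{your_google_key}",
                        srt_path="./result.srt"))
```

Passing the arguments as a list avoids shell quoting problems with prompts that contain spaces, such as the translation prompt.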
Raw data
{
"_id": null,
"home_page": null,
"name": "stream-translator-gpt",
"maintainer": null,
"docs_url": null,
"requires_python": ">=3.8",
"maintainer_email": null,
"keywords": "translator, translation, translate, transcribe, yt-dlp, vad, whisper, faster-whisper, whisper-api, gpt, gemini",
"author": null,
"author_email": "ion <ionicbond3@gmail.com>",
"download_url": "https://files.pythonhosted.org/packages/e6/8d/6509e90ac1070eb6aee7edd2b660252736661d9ffee4cf82539ad4ffc895/stream_translator_gpt-2025.1.13.tar.gz",
"platform": null,
"description": "# stream-translator-gpt\n\nCommand line utility to transcribe or translate audio from livestreams in real time. Uses [yt-dlp](https://github.com/yt-dlp/yt-dlp) to \nget livestream URLs from various services and [Whisper](https://github.com/openai/whisper) / [Faster-Whisper](https://github.com/SYSTRAN/faster-whisper) for transcription.\n\nThis fork optimized the audio slicing logic based on [VAD](https://github.com/snakers4/silero-vad), \nintroduced [GPT API](https://platform.openai.com/api-keys) / [Gemini API](https://aistudio.google.com/app/apikey) to support language translation beyond English, and supports input from the audio devices.\n\nTry it on Colab: [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/ionic-bond/stream-translator-gpt/blob/main/stream_translator.ipynb)\n\n## Prerequisites\n\n**Linux or Windows:**\n\n1. Python >= 3.8 (Recommend >= 3.10)\n2. [**Install CUDA on your system.**](https://developer.nvidia.com/cuda-downloads).\n3. [**Install cuDNN v8 to your CUDA dir**](https://developer.nvidia.com/rdp/cudnn-archive) if you want to use **Faster-Whisper**.\n4. [**Install PyTorch (with CUDA) to your Python.**](https://pytorch.org/get-started/locally/)\n5. [**Create a Google API key**](https://aistudio.google.com/app/apikey) if you want to use **Gemini API** for translation. (Free 15 requests / minute)\n6. [**Create a OpenAI API key**](https://platform.openai.com/api-keys) if you want to use **Whisper API** for transcription or **GPT API** for translation.\n\n**If you are in Windows, you also need to:**\n\n1. [**Install and add ffmpeg to your PATH.**](https://www.thewindowsclub.com/how-to-install-ffmpeg-on-windows-10#:~:text=Click%20New%20and%20type%20the,Click%20OK%20to%20apply%20changes.)\n2. 
Install [**yt-dlp**](https://github.com/yt-dlp/yt-dlp) and add it to your PATH.\n\n## Installation\n\n**Install release version from PyPI (Recommend):**\n\n```\npip install stream-translator-gpt -U\nstream-translator-gpt\n```\n\nor\n\n**Clone master version code from Github:**\n\n```\ngit clone https://github.com/ionic-bond/stream-translator-gpt.git\npip install -r ./stream-translator-gpt/requirements.txt\npython3 ./stream-translator-gpt/translator.py\n```\n\n## Usage\n\n- Transcribe live streaming (default use **Whisper**):\n\n ```stream-translator-gpt {URL} --model large --language {input_language}```\n\n- Transcribe by **Faster-Whisper**:\n\n ```stream-translator-gpt {URL} --model large --language {input_language} --use_faster_whisper```\n\n- Transcribe by **Whisper API**:\n\n ```stream-translator-gpt {URL} --language {input_language} --use_whisper_api --openai_api_key {your_openai_key}```\n\n- Translate to other language by **Gemini**:\n\n ```stream-translator-gpt {URL} --model large --language ja --gpt_translation_prompt \"Translate from Japanese to Chinese\" --google_api_key {your_google_key}```\n\n- Translate to other language by **GPT**:\n\n ```stream-translator-gpt {URL} --model large --language ja --gpt_translation_prompt \"Translate from Japanese to Chinese\" --openai_api_key {your_openai_key}```\n\n- Using **Whisper API** and **Gemini** at the same time:\n\n ```stream-translator-gpt {URL} --model large --language ja --use_whisper_api --openai_api_key {your_openai_key} --gpt_translation_prompt \"Translate from Japanese to Chinese\" --google_api_key {your_google_key}```\n\n- Local video/audio file as input:\n\n ```stream-translator-gpt /path/to/file --model large --language {input_language}```\n\n- Computer microphone as input:\n\n ```stream-translator-gpt device --model large --language {input_language}```\n \n Will use the system's default audio device as input.\n\n If you want to use another audio input device, `stream-translator-gpt device 
--print_all_devices` get device index and then run the CLI with `--device_index {index}`.\n\n If you want to use the audio output of another program as input, you need to [**enable stereo mix**](https://www.howtogeek.com/39532/how-to-enable-stereo-mix-in-windows-7-to-record-audio/).\n\n- Sending result to Cqhttp:\n\n ```stream-translator-gpt {URL} --model large --language {input_language} --cqhttp_url {your_cqhttp_url} --cqhttp_token {your_cqhttp_token}```\n\n- Sending result to Discord:\n\n ```stream-translator-gpt {URL} --model large --language {input_language} --discord_webhook_url {your_discord_webhook_url}```\n\n- Saving result to a .srt subtitle file:\n\n ```stream-translator-gpt {URL} --model large --language ja --gpt_translation_prompt \"Translate from Japanese to Chinese\" --google_api_key {your_google_key} --hide_transcribe_result --output_timestamps --output_file_path ./result.srt```\n",
"bugtrack_url": null,
"license": null,
"summary": "Command line tool to transcribe & translate audio from livestreams in real time",
"version": "2025.1.13",
"project_urls": {
"Homepage": "https://github.com/ionic-bond/stream-translator-gpt",
"Issues": "https://github.com/ionic-bond/stream-translator-gpt/issues"
},
"split_keywords": [
"translator",
" translation",
" translate",
" transcribe",
" yt-dlp",
" vad",
" whisper",
" faster-whisper",
" whisper-api",
" gpt",
" gemini"
],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "49c430870fa40b3bfe627c03dc1deec324f6e30af682b32ac6feb19b046d672e",
"md5": "c43ab48461103e3ccfab184b0f791b6f",
"sha256": "81ba6a4d249cd04ac252d96e2726c0e5ead919f42b6add2500c1d3935ee4345a"
},
"downloads": -1,
"filename": "stream_translator_gpt-2025.1.13-py3-none-any.whl",
"has_sig": false,
"md5_digest": "c43ab48461103e3ccfab184b0f791b6f",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.8",
"size": 1140269,
"upload_time": "2025-01-13T13:48:20",
"upload_time_iso_8601": "2025-01-13T13:48:20.883664Z",
"url": "https://files.pythonhosted.org/packages/49/c4/30870fa40b3bfe627c03dc1deec324f6e30af682b32ac6feb19b046d672e/stream_translator_gpt-2025.1.13-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "e68d6509e90ac1070eb6aee7edd2b660252736661d9ffee4cf82539ad4ffc895",
"md5": "8b7d2ddae0cf187ddb6b23753f4f861b",
"sha256": "60df07090d4a48617d95589cb6396b1b2ad2f1441f875b068887eed027c90d87"
},
"downloads": -1,
"filename": "stream_translator_gpt-2025.1.13.tar.gz",
"has_sig": false,
"md5_digest": "8b7d2ddae0cf187ddb6b23753f4f861b",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.8",
"size": 1144540,
"upload_time": "2025-01-13T13:48:25",
"upload_time_iso_8601": "2025-01-13T13:48:25.627704Z",
"url": "https://files.pythonhosted.org/packages/e6/8d/6509e90ac1070eb6aee7edd2b660252736661d9ffee4cf82539ad4ffc895/stream_translator_gpt-2025.1.13.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2025-01-13 13:48:25",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "ionic-bond",
"github_project": "stream-translator-gpt",
"travis_ci": false,
"coveralls": false,
"github_actions": true,
"requirements": [
{
"name": "numpy",
"specs": []
},
{
"name": "scipy",
"specs": []
},
{
"name": "yt-dlp",
"specs": [
[
">=",
"2024.12.23"
]
]
},
{
"name": "ffmpeg-python",
"specs": [
[
"<",
"0.3"
],
[
">=",
"0.2.0"
]
]
},
{
"name": "sounddevice",
"specs": [
[
"<",
"1.0"
]
]
},
{
"name": "openai-whisper",
"specs": [
[
"==",
"20240930"
]
]
},
{
"name": "faster-whisper",
"specs": [
[
"<",
"2.0.0"
],
[
">=",
"1.1.0"
]
]
},
{
"name": "openai",
"specs": [
[
"<",
"2.0"
],
[
">=",
"1.40"
]
]
},
{
"name": "google-generativeai",
"specs": [
[
"<",
"1.0"
],
[
">=",
"0.7"
]
]
}
],
"lcname": "stream-translator-gpt"
}