# Open-Lyrics
[![PyPI](https://img.shields.io/pypi/v/openlrc)](https://pypi.org/project/openlrc/)
[![PyPI - License](https://img.shields.io/pypi/l/openlrc)](https://pypi.org/project/openlrc/)
[![Downloads](https://static.pepy.tech/badge/openlrc)](https://pepy.tech/project/openlrc)
![GitHub Workflow Status (with event)](https://img.shields.io/github/actions/workflow/status/zh-plus/Open-Lyrics/ci.yml)
Open-Lyrics is a Python library that transcribes voice files using
[faster-whisper](https://github.com/guillaumekln/faster-whisper), then translates or polishes the resulting text
into `.lrc` files in the desired language using an LLM,
e.g. [OpenAI-GPT](https://github.com/openai/openai-python) or [Anthropic-Claude](https://github.com/anthropics/anthropic-sdk-python).
## New 🚨
- 2024.3.29: Claude models are now available for translation. In our testing, Claude 3 Sonnet performs considerably
better than GPT-3.5 Turbo. We recommend Claude 3 Sonnet for translating non-English audio (source language).
For now, the default model is still GPT-3.5 Turbo. To switch to Claude:
```python
lrcer = LRCer(chatbot_model='claude-3-sonnet-20240229')
```
- 2024.4.4: Added basic Streamlit GUI support. Run `openlrc gui` to start the GUI.
## Installation ⚙️
1. Install CUDA 11.x and [cuDNN 8 for CUDA 11](https://developer.nvidia.com/cudnn) first, following
https://opennmt.net/CTranslate2/installation.html, to enable `faster-whisper`.
`faster-whisper` also needs [cuBLAS for CUDA 11](https://developer.nvidia.com/cublas) installed.
<details>
<summary>For Windows Users (click to expand)</summary>
Windows users can download the libraries from Purfview's repository:
Purfview's [whisper-standalone-win](https://github.com/Purfview/whisper-standalone-win) provides the required NVIDIA
libraries for Windows in a [single archive](https://github.com/Purfview/whisper-standalone-win/releases/tag/libs).
Decompress the archive and place the libraries in a directory included in the `PATH`.
</details>
2. Add an LLM API key. You can either:
- Add your [OpenAI API key](https://platform.openai.com/account/api-keys) to environment variable `OPENAI_API_KEY`.
- Add your [Anthropic API key](https://console.anthropic.com/settings/keys) to environment
variable `ANTHROPIC_API_KEY`.
3. Install [PyTorch](https://pytorch.org/get-started/locally/):
```shell
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
```
4. Install the latest [faster-whisper](https://github.com/guillaumekln/faster-whisper):
```shell
pip install git+https://github.com/guillaumekln/faster-whisper
```
5. Install [ffmpeg](https://ffmpeg.org/download.html) and add its `bin` directory
to your `PATH`.
6. This project can be installed from PyPI:
```shell
pip install openlrc
```
or install directly from GitHub:
```shell
pip install git+https://github.com/zh-plus/Open-Lyrics
```
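Before the first run, it can help to confirm that the prerequisites from the steps above are in place. The following is a minimal sketch, not part of `openlrc`: the helper name `check_prerequisites` is hypothetical, and it assumes that one API key is enough for the model you plan to use. It only checks the two environment variables and whether `ffmpeg` is reachable through `PATH`.

```python
import os
import shutil


def check_prerequisites(env=None):
    """Return a list of human-readable problems; an empty list means all checks passed.

    Hypothetical helper for illustration only, not an openlrc API.
    """
    env = os.environ if env is None else env
    problems = []
    # Assumption: at least one LLM API key is needed for translation.
    if not (env.get('OPENAI_API_KEY') or env.get('ANTHROPIC_API_KEY')):
        problems.append('Set OPENAI_API_KEY or ANTHROPIC_API_KEY.')
    # shutil.which searches the directories listed in PATH for the executable.
    if shutil.which('ffmpeg', path=env.get('PATH')) is None:
        problems.append('ffmpeg was not found on PATH.')
    return problems


if __name__ == '__main__':
    for problem in check_prerequisites():
        print('Missing prerequisite:', problem)
```

The check does not verify the CUDA, cuDNN, or cuBLAS installation; those are easiest to confirm by running a short transcription.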
## Usage 🐍
### GUI
```shell
openlrc gui
```
![](https://github.com/zh-plus/openlrc/blob/master/resources/streamlit_app.jpg?raw=true)
### Python code
```python
from openlrc import LRCer

if __name__ == '__main__':
    lrcer = LRCer()

    # Single file: generates translated ./data/test.lrc with the default translate prompt.
    lrcer.run('./data/test.mp3', target_lang='zh-cn')

    # Multiple files
    lrcer.run(['./data/test1.mp3', './data/test2.mp3'], target_lang='zh-cn')
    # Note: transcription runs sequentially, but translation runs concurrently for each file.

    # Paths can include video files
    lrcer.run(['./data/test_audio.mp3', './data/test_video.mp4'], target_lang='zh-cn')
    # Generates translated ./data/test_audio.lrc and ./data/test_video.srt

    # Use context.yaml to improve translation
    lrcer.run('./data/test.mp3', target_lang='zh-cn', context_path='./data/context.yaml')

    # Skip the translation step
    lrcer.run('./data/test.mp3', target_lang='en', skip_trans=True)

    # Change asr_options or vad_options; check openlrc.defaults for details
    vad_options = {"threshold": 0.1}
    lrcer = LRCer(vad_options=vad_options)
    lrcer.run('./data/test.mp3', target_lang='zh-cn')

    # Enhance the audio using noise suppression (takes more time).
    lrcer.run('./data/test.mp3', target_lang='zh-cn', noise_suppress=True)

    # Change the LLM model used for translation
    lrcer = LRCer(chatbot_model='claude-3-sonnet-20240229')
    lrcer.run('./data/test.mp3', target_lang='zh-cn')

    # Clear the temp folder after processing is done
    lrcer.run('./data/test.mp3', target_lang='zh-cn', clear_temp_folder=True)
```
See the [documentation](https://zh-plus.github.io/openlrc/#/) for more details.
### Context
Provide background context to improve the quality of your translation.
Save it as `context.yaml` in the same directory as your audio file.
> [!NOTE]
> Context is **NOT** guaranteed to improve translation quality.
```yaml
background: "This is a multi-line background.
This is a basic example."
audio_type: Movie
description_map: {
movie_name1 (without extension): "This
is a multi-line description for movie1.",
movie_name2 (without extension): "This
is a multi-line description for movie2.",
movie_name3 (without extension): "This is a single-line description for movie 3.",
}
```
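The context file above can also be generated from a list of input files. The sketch below is an illustration under stated assumptions, not an `openlrc` API: `build_context` and `write_context` are hypothetical helpers, and the file is written with `json.dumps`, relying on JSON output generally being parseable as YAML flow syntax. The keys of `description_map` are derived with `Path(...).stem`, which matches the "without extension" requirement shown above.

```python
import json
from pathlib import Path


def build_context(audio_paths, background, audio_type, descriptions):
    """Build the context mapping, keyed by file name without extension.

    Hypothetical helper for illustration only.
    """
    description_map = {}
    for path, text in zip(audio_paths, descriptions):
        # Path.stem drops the directory and the final extension.
        description_map[Path(path).stem] = text
    return {
        'background': background,
        'audio_type': audio_type,
        'description_map': description_map,
    }


def write_context(directory, context):
    """Write context.yaml into the given directory and return its path.

    Assumption: JSON text is accepted by the YAML loader as flow-style YAML.
    """
    target = Path(directory) / 'context.yaml'
    target.write_text(json.dumps(context, indent=2, ensure_ascii=False), encoding='utf-8')
    return target


if __name__ == '__main__':
    context = build_context(
        ['./data/test_video.mp4'],
        background='This is a basic example.',
        audio_type='Movie',
        descriptions=['A single-line description for the video.'],
    )
    print(write_context('./data', context))
```

The resulting path can then be passed as `context_path` to `lrcer.run`, as in the usage example.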
## Pricing 💰
*Pricing data from [OpenAI](https://openai.com/pricing)
and [Anthropic](https://docs.anthropic.com/claude/docs/models-overview#model-comparison)*
| Model Name | Pricing for 1M Tokens <br/>(Input/Output) (USD) | Cost for 1 Hour Audio <br/>(USD) |
|----------------------------|-------------------------------------------------|----------------------------------|
| `gpt-3.5-turbo-0125` | 0.5, 1.5 | 0.01 |
| `gpt-3.5-turbo` | 0.5, 1.5 | 0.01 |
| `gpt-4-0125-preview` | 10, 30 | 0.5 |
| `gpt-4-turbo-preview` | 10, 30 | 0.5 |
| `claude-3-haiku-20240307` | 0.25, 1.25 | 0.015 |
| `claude-3-sonnet-20240229` | 3, 15 | 0.2 |
| `claude-3-opus-20240229` | 15, 75 | 1 |
**Note that the cost is estimated from the token count of the input and output text.
The actual cost may vary with the language and audio speed.**
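To make the estimate concrete, the cost of a request is the input and output token counts multiplied by their respective per-million-token prices. The sketch below copies a few prices from the table above; the function name `estimate_cost` and the token counts in the usage example are hypothetical and only illustrate the arithmetic.

```python
# (input, output) prices in USD per 1M tokens, copied from the pricing table.
PRICES = {
    'gpt-3.5-turbo': (0.5, 1.5),
    'claude-3-haiku-20240307': (0.25, 1.25),
    'claude-3-sonnet-20240229': (3.0, 15.0),
}


def estimate_cost(model, input_tokens, output_tokens):
    """Return the estimated cost in USD for the given token counts."""
    input_price, output_price = PRICES[model]
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


if __name__ == '__main__':
    # Hypothetical token counts, chosen only to illustrate the formula.
    cost = estimate_cost('claude-3-sonnet-20240229', 20_000, 10_000)
    print(f'Estimated cost: ${cost:.2f}')
```

With these made-up counts, the estimate is (20,000 × 3 + 10,000 × 15) / 1,000,000 = 0.21 USD.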
### Recommended translation model
For English audio, we recommend using `gpt-3.5-turbo`.
For non-English audio, we recommend using `claude-3-sonnet-20240229`.
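This recommendation can be expressed as a small helper. It is a sketch only: `recommended_model` is a hypothetical function, not part of `openlrc`, and it assumes you already know the source language code of the audio (for example `en` or `ja`).

```python
def recommended_model(source_lang):
    """Pick a translation model from the audio's source language code.

    Hypothetical helper; treats 'en' and regional variants such as 'en-US' as English.
    """
    is_english = source_lang.strip().lower().split('-')[0] == 'en'
    return 'gpt-3.5-turbo' if is_english else 'claude-3-sonnet-20240229'


if __name__ == '__main__':
    # The result can be passed to LRCer, e.g. LRCer(chatbot_model=recommended_model('ja')).
    print(recommended_model('ja'))
```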
## Todo
- [x] [Efficiency] Batched translate/polish for GPT request (enable contextual ability).
- [x] [Efficiency] Concurrent support for GPT request.
- [x] [Translation Quality] Make translate prompt more robust according to https://github.com/openai/openai-cookbook.
- [x] [Feature] Automatically fix JSON encoder errors using GPT.
- [x] [Efficiency] Asynchronously perform transcription and translation for multiple audio inputs.
- [x] [Quality] Improve batched translation/polish prompt according
to [gpt-subtrans](https://github.com/machinewrapped/gpt-subtrans).
- [x] [Feature] Input video support.
- [X] [Feature] Multiple output format support.
- [x] [Quality] Speech enhancement for input audio.
- [ ] [Feature] Preprocessor: Voice-music separation.
- [ ] [Feature] Align ground-truth transcription with audio.
- [ ] [Quality]
Use [multilingual language model](https://www.sbert.net/docs/pretrained_models.html#multi-lingual-models) to assess
translation quality.
- [ ] [Efficiency] Add Azure OpenAI Service support.
- [ ] [Quality] Use [claude](https://www.anthropic.com/index/introducing-claude) for translation.
- [ ] [Feature] Add local LLM support.
- [X] [Feature] Multiple translate engine (Anthropic, Microsoft, DeepL, Google, etc.) support.
- [ ] [**Feature**] Build
an [electron + fastapi](https://ivanyu2021.hashnode.dev/electron-django-desktop-app-integrate-javascript-and-python)
GUI for cross-platform application.
- [x] [Feature] Web-based [streamlit](https://streamlit.io/) GUI.
- [ ] Add [fine-tuned whisper-large-v2](https://huggingface.co/models?search=whisper-large-v2) models for common
languages.
- [ ] [Others] Add transcribed examples.
    - [ ] Song
    - [ ] Podcast
    - [ ] Audiobook
## Credits
- https://github.com/guillaumekln/faster-whisper
- https://github.com/m-bain/whisperX
- https://github.com/openai/openai-python
- https://github.com/openai/whisper
- https://github.com/machinewrapped/gpt-subtrans
- https://github.com/MicrosoftTranslator/Text-Translation-API-V3-Python
- https://github.com/streamlit/streamlit
## Star History
[![Star History Chart](https://api.star-history.com/svg?repos=zh-plus/Open-Lyrics&type=Date)](https://star-history.com/#zh-plus/Open-Lyrics&Date)