# Subtitle to Audio
Subtitle to Audio generates audio from any subtitle file using Coqui-ai TTS and synchronizes the audio timing with the subtitle timestamps.
**Demo :** [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/bnsantoso/sub-to-audio//blob/main/subtitle_to_audio.ipynb)
[![ko-fi](https://ko-fi.com/img/githubbutton_sm.svg)](https://ko-fi.com/bnsantoso)
## Dependencies
[ffmpeg](https://ffmpeg.org/), [pydub](https://github.com/jiaaro/pydub), [librosa](https://github.com/librosa/librosa), [coqui-ai TTS](https://github.com/coqui-ai/TTS/), [ffmpeg-python](https://github.com/kkroening/ffmpeg-python)
## Installation
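Install the latest version from GitHub:
```bash
pip install git+https://github.com/bnsantoso/sub-to-audio
```
Or install the released package from PyPI:
```bash
pip install subtoaudio
```
Install ffmpeg on Linux (Debian/Ubuntu):
```bash
apt-get install ffmpeg
```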
## Example usage
Basic usage is very similar to [Coqui-ai TTS](https://github.com/coqui-ai/TTS/). See their [documentation](https://tts.readthedocs.io/en/latest/inference.html) and the list of supported [<lang-iso_code>](https://dl.fbaipublicfiles.com/mms/tts/all-tts-languages.html) values.
**Note: For best results, use non-overlapping subtitles with an optimal characters-per-second (CPS) rate.**
```python
from subtoaudio import SubToAudio

# Use the Fairseq English speaker model (the default).
# This writes 'yoursubtitle.wav' to the current directory.
sub = SubToAudio(gpu=True)
subtitle = sub.subtitle("yoursubtitle.srt")
sub.convert_to_audio(sub_data=subtitle)

# Choose from 1100 different languages using the Fairseq model.
sub = SubToAudio(language='<lang-iso_code>')
subtitle = sub.subtitle("yoursubtitle.ass")
sub.convert_to_audio(sub_data=subtitle)

# Specify a model name.
sub = SubToAudio(model_name="tts_models/multilingual/multi-dataset/your_tts")
subtitle = sub.subtitle("yoursubtitle.srt")
sub.convert_to_audio(sub_data=subtitle, output_path="subtitle.wav")

# Specify a model path and a config path.
sub = SubToAudio(model_path="path/to/your/model.pth", config_path="config/path.json")
subtitle = sub.subtitle("yoursubtitle.srt")
sub.convert_to_audio(sub_data=subtitle)

# By default, speaker=tts.speakers[0] or None,
# language=tts.languages[0] or None, and speaker_wav=None.
sub = SubToAudio(model_name="tts_models/multilingual/multi-dataset/your_tts")
subtitle = sub.subtitle("yoursubtitle.srt")
sub.convert_to_audio(sub_data=subtitle, language="en", speaker="speakername", speaker_wav="your/path/speaker.wav", output_path="subtitle.wav")

# Save the temporary audio files to the current folder.
sub = SubToAudio(model_name="tts_models/multilingual/multi-dataset/your_tts")
subtitle = sub.subtitle("yoursubtitle.srt")
sub.convert_to_audio(sub_data=subtitle, output_path="subtitle.wav", save_temp=True)
```
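The note above recommends non-overlapping subtitles with a reasonable CPS. A minimal sketch of how one might check a subtitle file before conversion is shown here. It does not use `subtoaudio`; the `Cue` class, the helper names, and the `MAX_CPS` threshold are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Cue:
    """One subtitle entry (hypothetical structure, times in milliseconds)."""
    start_ms: int
    end_ms: int
    text: str


def chars_per_second(cue: Cue) -> float:
    """Return the number of characters shown per second for one cue."""
    duration_s = (cue.end_ms - cue.start_ms) / 1000
    if duration_s <= 0:
        return float("inf")  # zero-length cue: no time to speak anything
    return len(cue.text) / duration_s


def find_overlaps(cues):
    """Return pairs of consecutive cues where the second starts before the first ends."""
    ordered = sorted(cues, key=lambda c: c.start_ms)
    return [(a, b) for a, b in zip(ordered, ordered[1:]) if b.start_ms < a.end_ms]


MAX_CPS = 20  # assumed threshold, not a value taken from subtoaudio

cues = [
    Cue(0, 2000, "Hello there."),
    Cue(1500, 3000, "This cue starts too early."),
    Cue(4000, 5000, "A very long sentence squeezed into one second."),
]

for cue in cues:
    cps = chars_per_second(cue)
    if cps > MAX_CPS:
        print(f"High CPS ({cps:.1f}): {cue.text!r}")

for first, second in find_overlaps(cues):
    print(f"Overlap: {first.text!r} and {second.text!r}")
```

Cues flagged by a check like this are the ones most likely to need `tempo_mode` or `shift_mode`, described in the sections that follow.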
## Tempo Mode
Use the `tempo_mode` parameter to speed up the audio. There are three tempo modes:
- `tempo_mode="all"` : This accelerates all audio. Use `tempo_speed=float` to specify the speed.
- `tempo_mode="overflow"` : This accelerates the audio to fit within the subtitle duration plus the blank interval before the next subtitle appears. `tempo_limit` caps the speed increase in this mode.
- `tempo_mode="precise"` : This accelerates the audio to match the duration for which the subtitle is displayed.
```python
from subtoaudio import SubToAudio

# Speed up the tempo (speech rate) of all audio files; tempo_speed defaults to 1.2.
sub = SubToAudio(gpu=True)
subtitle = sub.subtitle("yoursubtitle.srt")
sub.convert_to_audio(sub_data=subtitle, tempo_mode="all", tempo_speed=1.3)

# Speed up only the audio that doesn't fit the subtitle duration.
sub = SubToAudio(gpu=True)
subtitle = sub.subtitle("yoursubtitle.srt")
sub.convert_to_audio(sub_data=subtitle, tempo_mode="overflow")

# Limit the tempo speed in overflow mode.
sub.convert_to_audio(sub_data=subtitle, tempo_mode="overflow", tempo_limit=1.2)

# Match the audio length to the subtitle duration.
sub = SubToAudio(gpu=True)
subtitle = sub.subtitle("yoursubtitle.srt")
sub.convert_to_audio(sub_data=subtitle, tempo_mode="precise")
```
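To make the difference between the three tempo modes concrete, the sketch below computes the speed-up factor each mode would imply for a single clip. This is an illustration of the descriptions above, not the library's implementation: the function `tempo_factor` and its arguments are hypothetical, and it assumes that audio already short enough is left at its original speed.

```python
def tempo_factor(mode, audio_ms, sub_ms, gap_ms=0, tempo_speed=1.2, tempo_limit=None):
    """Return an illustrative speed-up factor for one synthesized clip.

    audio_ms: length of the synthesized audio
    sub_ms:   duration the subtitle is displayed
    gap_ms:   blank time before the next subtitle appears
    """
    if mode == "all":
        # Every clip is accelerated by the same fixed factor.
        return tempo_speed
    if mode == "overflow":
        available = sub_ms + gap_ms  # subtitle duration plus the following gap
    elif mode == "precise":
        available = sub_ms  # only the subtitle's own duration
    else:
        raise ValueError(f"unknown tempo mode: {mode!r}")
    if available <= 0:
        raise ValueError("available duration must be positive")
    if audio_ms <= available:
        return 1.0  # assumption: audio that already fits is not changed
    factor = audio_ms / available
    if mode == "overflow" and tempo_limit is not None:
        factor = min(factor, tempo_limit)  # cap the speed increase
    return factor


# A 3-second clip for a 2-second subtitle followed by a 0.5-second gap.
for mode in ("all", "overflow", "precise"):
    print(mode, tempo_factor(mode, audio_ms=3000, sub_ms=2000, gap_ms=500))
```

In this sketch `overflow` needs a smaller speed-up than `precise` because it may borrow the silent gap after the subtitle.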
## Shift Mode
The `shift_mode` parameter shifts audio that doesn't fit the subtitle duration.
- `shift_mode="right"` : Shift audio to the right and prevent audio from overlapping.
- `shift_mode="left"` : Shift audio to the left and prevent audio from overlapping. Be cautious of limited space on the left side, as some audio may disappear.
- `shift_mode="interpose"` : Shift audio to the middle position and prevent overlap with the audio on both the left and the right. (Note: This mode can be clunky, so use it cautiously.)
- `shift_mode="left-overlap"` : Shift audio to the left, allowing overlap.
- `shift_mode="interpose-overlap"` : Shift audio to the middle position, allowing overlap.
- `shift_limit=int or "str"` : Limit the audio shift. Use an integer for milliseconds or a string such as `"2.5s"` for seconds.
```python
from subtoaudio import SubToAudio

# Shift to the right, with a limit of 2 seconds.
sub = SubToAudio(languages="vie", gpu=True)
subtitle = sub.subtitle("yoursubtitle.srt")
sub.convert_to_audio(sub_data=subtitle, tempo_mode="overflow", shift_mode="right", shift_limit="2s")

# Shift audio to the left, allowing overlap.
sub = SubToAudio(languages="fra", gpu=True)
subtitle = sub.subtitle("yoursubtitle.srt")
sub.convert_to_audio(sub_data=subtitle, shift_mode="left-overlap")

# Shift to the left, and limit the shift to 1 second.
sub = SubToAudio(gpu=False)
subtitle = sub.subtitle("yoursubtitle.srt")
sub.convert_to_audio(sub_data=subtitle, shift_mode="left", shift_limit=1000)  # 1000 ms = 1 s
```
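As a rough illustration of the `shift_limit` formats and of what a limited right shift does, the sketch below normalizes a limit to milliseconds and then pushes clip start times later so that clips do not overlap. Both helpers (`shift_limit_ms`, `shift_right`) are hypothetical and only mirror the descriptions above; the library's actual behavior, in particular what happens when the limit is reached, may differ.

```python
def shift_limit_ms(limit):
    """Convert a limit given as int milliseconds or a string like '2.5s' to milliseconds."""
    if isinstance(limit, bool):
        raise TypeError("limit must be an int or a string such as '2.5s'")
    if isinstance(limit, int):
        return limit
    if isinstance(limit, str) and limit.strip().lower().endswith("s"):
        seconds = float(limit.strip()[:-1])
        return round(seconds * 1000)
    raise ValueError(f"unsupported shift limit: {limit!r}")


def shift_right(starts_ms, durations_ms, limit_ms=None):
    """Move each clip later until it no longer overlaps the previous clip.

    If limit_ms is given, a clip is never moved more than limit_ms past its
    original start (assumption: any remaining overlap is simply kept).
    """
    placed = []
    previous_end = 0
    for start, duration in zip(starts_ms, durations_ms):
        new_start = max(start, previous_end)
        if limit_ms is not None:
            new_start = min(new_start, start + limit_ms)
        placed.append(new_start)
        previous_end = new_start + duration
    return placed


starts = [0, 2000, 3000]      # subtitle start times in ms
durations = [2500, 1500, 500]  # synthesized audio lengths in ms

print(shift_right(starts, durations))
print(shift_right(starts, durations, limit_ms=shift_limit_ms("0.3s")))
```

Without a limit every clip waits for the previous one to finish, so delays can accumulate; a limit keeps audio close to its subtitle at the cost of possible overlap.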
### Citation
Eren, G., & The Coqui TTS Team. (2021). Coqui TTS (Version 1.4) [Computer software]. https://doi.org/10.5281/zenodo.6334862
Raw data
{
"_id": null,
"home_page": "https://github.com/bnsantoso/",
"name": "subtoaudio",
"maintainer": "",
"docs_url": null,
"requires_python": "",
"maintainer_email": "",
"keywords": "subtitle,tts,text to audio,subtitle to audio,subtitle to speech",
"author": "Bagas NS",
"author_email": "bagassantoso71@gmail.com",
"download_url": "https://files.pythonhosted.org/packages/86/ff/3386a97d39cb6b37583ea420d6197ab6f4dde1a6358452a73a5727af9e7c/subtoaudio-0.1.4.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": "MPL 2.0",
"summary": "Subtitle to Audio, generate audio or speech from any subtitle file",
"version": "0.1.4",
"project_urls": {
"Homepage": "https://github.com/bnsantoso/"
},
"split_keywords": [
"subtitle",
"tts",
"text to audio",
"subtitle to audio",
"subtitle to speech"
],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "ff9589aaf50b38643857087455b83501d52c288706e10fd7672e0a2bcaa6af41",
"md5": "2f50470a550e95954c52f04473e8789f",
"sha256": "e5327e47ef6204f90581b1cbb1d6905849bb7cea61291e186cd8e48005cbe66d"
},
"downloads": -1,
"filename": "subtoaudio-0.1.4-py3-none-any.whl",
"has_sig": false,
"md5_digest": "2f50470a550e95954c52f04473e8789f",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": null,
"size": 11675,
"upload_time": "2023-08-11T04:29:45",
"upload_time_iso_8601": "2023-08-11T04:29:45.330203Z",
"url": "https://files.pythonhosted.org/packages/ff/95/89aaf50b38643857087455b83501d52c288706e10fd7672e0a2bcaa6af41/subtoaudio-0.1.4-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "86ff3386a97d39cb6b37583ea420d6197ab6f4dde1a6358452a73a5727af9e7c",
"md5": "87b0b86a0660ec845176a84113eb6b20",
"sha256": "14ecad278f500521b9bf723361acb464a0cb7045e76fbce18a31ba7caa329e42"
},
"downloads": -1,
"filename": "subtoaudio-0.1.4.tar.gz",
"has_sig": false,
"md5_digest": "87b0b86a0660ec845176a84113eb6b20",
"packagetype": "sdist",
"python_version": "source",
"requires_python": null,
"size": 11990,
"upload_time": "2023-08-11T04:29:47",
"upload_time_iso_8601": "2023-08-11T04:29:47.728023Z",
"url": "https://files.pythonhosted.org/packages/86/ff/3386a97d39cb6b37583ea420d6197ab6f4dde1a6358452a73a5727af9e7c/subtoaudio-0.1.4.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2023-08-11 04:29:47",
"github": false,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"lcname": "subtoaudio"
}