subtoaudio

- Name: subtoaudio
- Version: 0.1.4
- Home page: https://github.com/bnsantoso/
- Summary: Subtitle to Audio, generate audio or speech from any subtitle file
- Upload time: 2023-08-11 04:29:47
- Author: Bagas NS
- License: MPL 2.0
- Keywords: subtitle, tts, text to audio, subtitle to audio, subtitle to speech

# Subtitle to Audio
Generate audio from any subtitle file using Coqui-ai TTS and synchronize the audio timing with the subtitle timestamps.

**Demo :** [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/bnsantoso/sub-to-audio//blob/main/subtitle_to_audio.ipynb)

[![ko-fi](https://ko-fi.com/img/githubbutton_sm.svg)](https://ko-fi.com/bnsantoso)
## Dependencies
[ffmpeg](https://ffmpeg.org/), [pydub](https://github.com/jiaaro/pydub), [librosa](https://github.com/librosa/librosa), [coqui-ai TTS](https://github.com/coqui-ai/TTS/), [ffmpeg-python](https://github.com/kkroening/ffmpeg-python)

## Installation

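Install the latest version from GitHub:

```bash
pip install git+https://github.com/bnsantoso/sub-to-audio
```

Or install the released package from PyPI:

```bash
pip install subtoaudio
```

ffmpeg must also be installed. On Debian/Ubuntu-based Linux:

```bash
apt-get install ffmpeg
```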
## Example usage

Basic use is very similar to [Coqui-ai TTS](https://github.com/coqui-ai/TTS/). See their [documentation](https://tts.readthedocs.io/en/latest/inference.html) and the list of supported [<lang-iso_code>](https://dl.fbaipublicfiles.com/mms/tts/all-tts-languages.html) values.

**Note: Use non-overlapping subtitles with a reasonable characters-per-second (CPS) rate for best results.**

```python
from subtoaudio import SubToAudio

# Use the default model (Fairseq English speaker).
# This writes 'yoursubtitle.wav' to the current directory.
sub = SubToAudio(gpu=True)
subtitle = sub.subtitle("yoursubtitle.srt")
sub.convert_to_audio(sub_data=subtitle)

# The Fairseq models cover about 1,100 languages; pass the language ISO code.
sub = SubToAudio(language='<lang-iso_code>')
subtitle = sub.subtitle("yoursubtitle.ass")
sub.convert_to_audio(sub_data=subtitle)

# Specify a model name
sub = SubToAudio(model_name="tts_models/multilingual/multi-dataset/your_tts")
subtitle = sub.subtitle("yoursubtitle.srt")
sub.convert_to_audio(sub_data=subtitle, output_path="subtitle.wav")

# Specify model and config paths
sub = SubToAudio(model_path="path/to/your/model.pth", config_path="config/path.json")
subtitle = sub.subtitle("yoursubtitle.srt")
sub.convert_to_audio(sub_data=subtitle)

# Defaults: speaker=tts.speakers[0] or None, language=tts.languages[0] or None,
# speaker_wav=None
sub = SubToAudio(model_name="tts_models/multilingual/multi-dataset/your_tts")
subtitle = sub.subtitle("yoursubtitle.srt")
sub.convert_to_audio(sub_data=subtitle, language="en", speaker="speakername", speaker_wav="your/path/speaker.wav", output_path="subtitle.wav")

# Save the temporary audio files to the current folder
sub = SubToAudio(model_name="tts_models/multilingual/multi-dataset/your_tts")
subtitle = sub.subtitle("yoursubtitle.srt")
sub.convert_to_audio(sub_data=subtitle, output_path="subtitle.wav", save_temp=True)

```
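
The note about non-overlapping subtitles and CPS can be checked before conversion. The following is a minimal sketch in plain Python, independent of subtoaudio: the `Cue` structure, the helper names, and the `max_cps=20.0` threshold are all assumptions made for illustration, not part of the library.

```python
from dataclasses import dataclass


@dataclass
class Cue:
    """One subtitle cue; times are in milliseconds (hypothetical structure)."""
    start_ms: int
    end_ms: int
    text: str


def chars_per_second(cue: Cue) -> float:
    """Characters shown per second of display time."""
    duration_s = (cue.end_ms - cue.start_ms) / 1000
    if duration_s <= 0:
        return float("inf")
    return len(cue.text) / duration_s


def check_cues(cues: list[Cue], max_cps: float = 20.0) -> list[str]:
    """Return warnings for cues that overlap or exceed the assumed CPS threshold."""
    warnings = []
    ordered = sorted(cues, key=lambda c: c.start_ms)
    for i, cue in enumerate(ordered):
        cps = chars_per_second(cue)
        if cps > max_cps:
            warnings.append(f"cue {i}: {cps:.1f} CPS exceeds {max_cps}")
        # A cue overlaps when it ends after the next cue has already started.
        if i + 1 < len(ordered) and cue.end_ms > ordered[i + 1].start_ms:
            warnings.append(f"cue {i}: overlaps cue {i + 1}")
    return warnings


cues = [
    Cue(0, 2000, "Hello there."),
    Cue(1500, 3000, "This cue starts before the previous one ends."),
]
for warning in check_cues(cues):
    print(warning)
```

An empty list from `check_cues` suggests the file is a reasonable candidate for conversion under these assumed limits.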

## Tempo Mode

Use the `tempo_mode` parameter to speed up the audio. There are three tempo modes: 

- `tempo_mode="all"` : This accelerates all audio. Use `tempo_speed=float` to specify the speed.
- `tempo_mode="overflow"` : This accelerates the audio to match the total subtitle duration plus the blank duration before the next subtitle appears. `tempo_limit` limits the speed increase in this mode.
- `tempo_mode="precise"` : This accelerates the audio to match the duration the subtitle appears.


```python
from subtoaudio import SubToAudio

# Speed up the tempo (speech rate) of all audio; tempo_speed defaults to 1.2
sub = SubToAudio(gpu=True)
subtitle = sub.subtitle("yoursubtitle.srt")
sub.convert_to_audio(sub_data=subtitle, tempo_mode="all", tempo_speed=1.3)

# Speed up only the audio that doesn't fit the subtitle duration
sub = SubToAudio(gpu=True)
subtitle = sub.subtitle("yoursubtitle.srt")
sub.convert_to_audio(sub_data=subtitle, tempo_mode="overflow")

# Limit the speed-up in overflow mode
sub.convert_to_audio(sub_data=subtitle, tempo_mode="overflow", tempo_limit=1.2)

# Match the audio length to the subtitle duration
sub = SubToAudio(gpu=True)
subtitle = sub.subtitle("yoursubtitle.srt")
sub.convert_to_audio(sub_data=subtitle, tempo_mode="precise")

```
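
To make the difference between the three tempo modes concrete, here is a rough sketch of the arithmetic they describe. It is not subtoaudio's implementation: the function `tempo_factor` and its `audio_ms`, `subtitle_ms`, and `gap_ms` arguments are hypothetical, and it assumes audio is only ever sped up (a factor of at least 1.0), never slowed down.

```python
def tempo_factor(mode, audio_ms, subtitle_ms, gap_ms=0,
                 tempo_speed=1.2, tempo_limit=None):
    """Return an illustrative speed-up factor (1.0 means unchanged)."""
    if mode == "all":
        # Every clip gets the same fixed factor.
        return tempo_speed
    if mode == "overflow":
        # Target: subtitle duration plus the blank time before the next one.
        target_ms = subtitle_ms + gap_ms
    elif mode == "precise":
        # Target: exactly the time the subtitle is on screen.
        target_ms = subtitle_ms
    else:
        raise ValueError(f"unknown tempo mode: {mode!r}")
    if target_ms <= 0:
        raise ValueError("target duration must be positive")
    # Assumption: audio that already fits is left untouched.
    factor = max(1.0, audio_ms / target_ms)
    if mode == "overflow" and tempo_limit is not None:
        factor = min(factor, tempo_limit)
    return factor


# A 3.0 s clip for a 2.0 s subtitle followed by 0.5 s of blank time.
for mode in ("all", "overflow", "precise"):
    print(mode, tempo_factor(mode, audio_ms=3000, subtitle_ms=2000, gap_ms=500))
```

In this sketch `"precise"` needs the largest speed-up because it ignores the blank time that `"overflow"` is allowed to use.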

## Shift Mode

The `shift_mode` parameter shifts audio that doesn't fit within the subtitle duration.

- `shift_mode="right"` : Shift audio to the right and prevent audio overlap.
- `shift_mode="left"` : Shift audio to the left and prevent audio overlap. Be cautious of limited space on the left side, as some audio may disappear.
- `shift_mode="interpose"` : Shift audio to the middle position and prevent overlap on both the left and right. (Note: This mode can be clunky, so use it cautiously.)
- `shift_mode="left-overlap"` : Shift audio to the left, allowing overlap.
- `shift_mode="interpose-overlap"` : Shift audio to the middle position, allowing overlap.
- `shift_limit=int or "str"` : Limit the audio shift. Use an integer for milliseconds or a string like `"2.5s"` for seconds.

```python
from subtoaudio import SubToAudio

# Shift audio to the right, limited to 2 seconds
sub = SubToAudio(languages="vie", gpu=True)
subtitle = sub.subtitle("yoursubtitle.srt")
sub.convert_to_audio(sub_data=subtitle, tempo_mode="overflow", shift_mode="right", shift_limit="2s")

# Shift audio to the left, allowing overlap
sub = SubToAudio(languages="fra", gpu=True)
subtitle = sub.subtitle("yoursubtitle.srt")
sub.convert_to_audio(sub_data=subtitle, shift_mode="left-overlap")

# Shift audio to the left, limited to 1 second
sub = SubToAudio(gpu=False)
subtitle = sub.subtitle("yoursubtitle.srt")
sub.convert_to_audio(sub_data=subtitle, shift_mode="left", shift_limit=1000)  # 1000 ms = 1 s

```
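
The `shift_limit` formats and the idea behind a right shift can be sketched as follows. This is an illustration under assumptions, not the library's code: `parse_shift_limit` and `shift_right` are hypothetical helpers, and the rule shown (delay each clip until the previous one has finished, capped by the limit) is one plausible reading of the `"right"` mode.

```python
def parse_shift_limit(limit):
    """Convert an int (milliseconds) or a string such as "2.5s" to milliseconds."""
    if isinstance(limit, bool):
        raise TypeError("shift limit must be an int or a string like '2.5s'")
    if isinstance(limit, int):
        return limit
    if isinstance(limit, str) and limit.endswith("s"):
        return round(float(limit[:-1]) * 1000)
    raise ValueError(f"unsupported shift limit: {limit!r}")


def shift_right(starts_ms, durations_ms, limit_ms=None):
    """Delay each clip so it starts no earlier than the end of the previous clip.

    With a limit, a clip is never moved more than limit_ms past its original
    start, so some overlap can remain.
    """
    placed = []
    previous_end = 0
    for start, duration in zip(starts_ms, durations_ms):
        new_start = max(start, previous_end)
        if limit_ms is not None:
            new_start = min(new_start, start + limit_ms)
        placed.append(new_start)
        previous_end = new_start + duration
    return placed


starts = [0, 1000, 2000]        # subtitle start times in ms
durations = [1500, 1200, 500]   # generated audio lengths in ms
print(shift_right(starts, durations))
print(shift_right(starts, durations, parse_shift_limit("0.5s")))
```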

### Citation 
Eren, G., & The Coqui TTS Team. (2021). Coqui TTS (Version 1.4) [Computer software]. https://doi.org/10.5281/zenodo.6334862


            
