open-dubbing

Name: open-dubbing
Version: 0.1.7
Home page: https://github.com/Softcatala/open-dubbing
Summary: AI dubbing system which uses machine learning models to automatically translate and synchronize audio dialogue into different languages.
Upload time: 2024-12-31 14:30:33
Author: Jordi Mas
Requires Python: >=3.10
License: Apache Software License 2.0
Requirements: numpy, moviepy, demucs, psutil, torch, pyannote.audio, pydub, ctranslate2, faster-whisper, transformers, spacy, iso639-lang, edge-tts
            [![PyPI version](https://img.shields.io/pypi/v/open-dubbing.svg?logo=pypi&logoColor=FFE873)](https://pypi.org/project/open-dubbing/)
[![PyPI downloads](https://img.shields.io/pypi/dm/open-dubbing.svg)](https://pypistats.org/packages/open-dubbing)
[![codecov](https://codecov.io/github/softcatala/open-dubbing/graph/badge.svg?token=TI6SIB9SGK)](https://codecov.io/github/softcatala/open-dubbing)

# Introduction

Open dubbing is an AI dubbing system that uses machine learning models to automatically translate and synchronize audio dialogue into different languages.
It is designed as a command line tool.

At the moment, it is purely *experimental* and an excuse to help me better understand STT, TTS and translation systems combined together.

If you want to see a live system running, you can do so at https://www.softcatala.org/doblatge/. It combines this project and https://github.com/Softcatala/subdub-editor.

# Features

* Built on top of open-source models and able to run locally
* Automatically dubs a video from a source into a target language
* Supports multiple Text To Speech (TTS) engines: Coqui, MMS, Edge
  * Allows using any unsupported engine by configuring an API or CLI
* Voice gender detection to properly assign a synthetic voice
* Support for multiple translation engines (Meta's NLLB, Apertium API, etc.)
* Automatic detection of the source language of the video (using Whisper)

# Roadmap

Areas we would like to explore:

* Better control of voice used for dubbing
* Optimize it for long videos and less resource usage
* Support for multiple video input formats

# Demo

This video purposely shows the strengths and limitations of the system.

*Original English video*

https://github.com/user-attachments/assets/54c0d37f-0cc8-4ea2-8f8d-fd2d2f4eeccc

*Automatic dubbed video in Catalan*


https://github.com/user-attachments/assets/99936655-5851-4d0c-827b-f36f79f56190


# Limitations

* This is an experimental project
* Automatic video dubbing involves speech recognition, translation, speaker recognition, etc. Errors can be introduced at each of these steps

# Supported languages

The supported languages depend on the combination of speech recognition, translation and text-to-speech systems used. With Coqui TTS, these are the supported languages (I have only tested a few of them):

Supported source languages: Afrikaans, Amharic, Armenian, Assamese, Bashkir, Basque, Belarusian, Bengali, Bosnian, Bulgarian, Burmese, Catalan, Chinese, Croatian, Czech, Danish, Dutch, English, Estonian, Faroese, Finnish, French, Galician, Georgian, German, Gujarati, Haitian, Hausa, Hebrew, Hindi, Hungarian, Icelandic, Indonesian, Italian, Japanese, Javanese, Kannada, Kazakh, Khmer, Korean, Lao, Lingala, Lithuanian, Luxembourgish, Macedonian, Malayalam, Maltese, Maori, Marathi, Modern Greek (1453-), Norwegian Nynorsk, Occitan (post 1500), Panjabi, Polish, Portuguese, Romanian, Russian, Sanskrit, Serbian, Shona, Sindhi, Sinhala, Slovak, Slovenian, Somali, Spanish, Sundanese, Swedish, Tagalog, Tajik, Tamil, Tatar, Telugu, Thai, Tibetan, Turkish, Turkmen, Ukrainian, Urdu, Vietnamese, Welsh, Yoruba, Yue Chinese

Supported target languages: Achinese, Akan, Amharic, Assamese, Awadhi, Ayacucho Quechua, Balinese, Bambara, Bashkir, Basque, Bemba (Zambia), Bengali, Bulgarian, Burmese, Catalan, Cebuano, Central Aymara, Chhattisgarhi, Crimean Tatar, Dutch, Dyula, Dzongkha, English, Ewe, Faroese, Fijian, Finnish, Fon, French, Ganda, German, Guarani, Gujarati, Haitian, Hausa, Hebrew, Hindi, Hungarian, Icelandic, Iloko, Indonesian, Javanese, Kabiyè, Kabyle, Kachin, Kannada, Kazakh, Khmer, Kikuyu, Kinyarwanda, Kirghiz, Korean, Lao, Magahi, Maithili, Malayalam, Marathi, Minangkabau, Modern Greek (1453-), Mossi, North Azerbaijani, Northern Kurdish, Nuer, Nyanja, Odia, Pangasinan, Panjabi, Papiamento, Polish, Portuguese, Romanian, Rundi, Russian, Samoan, Sango, Shan, Shona, Somali, South Azerbaijani, Southwestern Dinka, Spanish, Sundanese, Swahili (individual language), Swedish, Tagalog, Tajik, Tamasheq, Tamil, Tatar, Telugu, Thai, Tibetan, Tigrinya, Tok Pisin, Tsonga, Turkish, Turkmen, Uighur, Ukrainian, Urdu, Vietnamese, Waray (Philippines), Welsh, Yoruba

# Installation

To install open-dubbing on all platforms:

```shell
pip install open_dubbing
```

If you also want to install Coqui TTS, do:

```shell
pip install "open_dubbing[coqui]"
```

## Linux additional dependencies

On Linux you also need to install ffmpeg:

```shell
sudo apt install ffmpeg
```

If you are going to use Coqui TTS, you also need to install espeak-ng:

```shell
sudo apt install espeak-ng
```

## macOS additional dependencies

On macOS you also need to install ffmpeg:

```shell
brew install ffmpeg
```

If you are going to use Coqui TTS, you also need to install espeak-ng:

```shell
brew install espeak-ng
```

## Windows additional dependencies

Windows currently works but it has not been tested extensively.

You also need to install [ffmpeg](https://www.ffmpeg.org/download.html) for Windows. Make sure that it is in the system path.

## Accept pyannote license

1. Accept the [`pyannote/segmentation-3.0`](https://hf.co/pyannote/segmentation-3.0) user conditions
2. Accept the [`pyannote/speaker-diarization-3.1`](https://hf.co/pyannote/speaker-diarization-3.1) user conditions
3. Create an access token at [`hf.co/settings/tokens`](https://hf.co/settings/tokens)

# Quick start

```shell
open-dubbing --input_file video.mp4 --target_language=cat --hugging_face_token=TOKEN
```

Where:
- _TOKEN_ is the Hugging Face token that allows access to the models
- _cat_ in this case is the target language, specified as an ISO 639-3 language code
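Language codes are given in ISO 639-3 form. As an illustration only (not part of the tool), a minimal lookup for a few common codes; the full registry contains thousands of entries, and the iso639-lang package that open-dubbing depends on can resolve them all:

```python
# A handful of real ISO 639-3 codes, shown for illustration only.
ISO_639_3_EXAMPLES = {
    "eng": "English",
    "cat": "Catalan",
    "spa": "Spanish",
    "fra": "French",
    "deu": "German",
    "jpn": "Japanese",
}

def describe(code: str) -> str:
    """Return a human-readable name for a known ISO 639-3 code."""
    return ISO_639_3_EXAMPLES.get(code, "unknown code")

print(describe("cat"))  # Catalan
```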

By default, the source language is predicted using the first 30 seconds of the video. If this does not work (e.g. there is only music at the beginning), use the parameter _source_language_ to specify the source language using ISO 639-3 language codes (e.g. 'eng' for English).

To get a list of available options:

```shell
open-dubbing --help
```

# Post-editing automatically generated dubbed files

There are cases where you want to manually adjust the automatically generated text for dubbing, the voice used or the timings.

After you have executed _open-dubbing_, you have the intermediate files and the resulting dubbed file in the selected output directory.

You can edit the file _utterance_metadata_XXX.json_ (where XXX is the target language code), make manual adjustments, and generate the video again.
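The metadata file is plain JSON, so edits can also be scripted. A minimal sketch using only the standard library; the function name and the output path are hypothetical, and the file layout is assumed to match the example JSON shown in this section:

```python
import json

def edit_utterance(metadata_path, index, **changes):
    """Apply manual post-edits (e.g. translated_text, gender, speed)
    to one utterance in an utterance_metadata_XXX.json file."""
    with open(metadata_path, encoding="utf-8") as f:
        metadata = json.load(f)
    # Each utterance is a dict; overwrite only the edited fields.
    metadata["utterances"][index].update(changes)
    with open(metadata_path, "w", encoding="utf-8") as f:
        json.dump(metadata, f, ensure_ascii=False, indent=4)

# Hypothetical usage: fix the translation and slow the voice down.
# edit_utterance("output/utterance_metadata_cat.json", 0,
#                translated_text="I m'encanta aquesta ciutat.", speed=1.1)
```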

See an example JSON:

```json
    "utterances": [
        {
            "start": 7.607843750000001,
            "end": 8.687843750000003,
            "speaker_id": "SPEAKER_00",
            "path": "short/chunk_7.607843750000001_8.687843750000003.mp3",
            "text": "And I love this city.",
            "for_dubbing": true,
            "gender": "Male",
            "translated_text": "I m'encanta aquesta ciutat.",
            "assigned_voice": "ca-ES-EnricNeural",
            "speed": 1.3,
            "dubbed_path": "short/dubbed_chunk_7.607843750000001_8.687843750000003.mp3",
            "hash": "b11d7f0e2aa5475e652937469d89ef0a178fecea726f076095942d552944089f"
        },
```

Imagine that you have changed the **translated_text**. To generate the post-edited video:

```shell
open-dubbing --input_file video.mp4 --target_language=cat --hugging_face_token=TOKEN --update
```

The _update_ parameter changes the behavior of _open-dubbing_: instead of producing a full dubbing, it rebuilds the already existing dubbing, incorporating any changes made to the JSON file.

Fields that are useful to modify are: translated_text, gender (of the voice) or speed.
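Before rerunning with the update parameter, a quick sanity check of the hand-edited fields can catch typos. A hypothetical sketch: the accepted gender labels and speed range are assumptions inferred from the example JSON above, not a documented schema:

```python
def check_utterance_edits(utterance):
    """Return a list of problems found in a hand-edited utterance dict.
    The accepted values below are assumptions, not a documented schema."""
    problems = []
    if not utterance.get("translated_text", "").strip():
        problems.append("translated_text is empty")
    # The example JSON uses "Male"; "Female" is assumed to be the other label.
    if utterance.get("gender") not in ("Male", "Female"):
        problems.append("gender should be 'Male' or 'Female'")
    # Assumed plausible playback-speed range around the example value 1.3.
    speed = utterance.get("speed", 1.0)
    if not (0.5 <= speed <= 2.0):
        problems.append("speed outside the assumed 0.5-2.0 range")
    return problems

# Hypothetical usage before regenerating the video:
# for i, u in enumerate(metadata["utterances"]):
#     for problem in check_utterance_edits(u):
#         print(f"utterance {i}: {problem}")
```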

# Documentation

For more detailed documentation on how the tool works and how to use it, see our [documentation page](./DOCUMENTATION.md).

# Appreciation

Core libraries used:
* [demucs](https://github.com/facebookresearch/demucs) to separate vocals from the audio
* [pyannote-audio](https://github.com/pyannote/pyannote-audio) to diarize speakers
* [faster-whisper](https://github.com/SYSTRAN/faster-whisper) for speech to text
* [NLLB-200](https://github.com/facebookresearch/fairseq/tree/nllb) for machine translation
* TTS
  * [coqui-tts](https://github.com/idiap/coqui-ai-TTS)
  * Meta [mms](https://github.com/facebookresearch/fairseq/tree/main/examples/mms)
  * Microsoft [Edge TTS](https://github.com/rany2/edge-tts)

And very special thanks to [ariel](https://github.com/google-marketing-solutions/ariel) from which we leveraged parts of their code base.

# License

See [license](./LICENSE)

# Contact

Email address: Jordi Mas: jmas@softcatala.org

            
