# gTech Ads Ariel for AI Video Ad Dubbing
### Ariel is an open-source Python library that facilitates efficient and cost-effective dubbing of video ads into multiple languages.
[![python](https://img.shields.io/badge/Python->=3.10-3776AB.svg?style=flat&logo=python&logoColor=white)](https://www.python.org)
[![PyPI](https://img.shields.io/pypi/v/gtech-ariel?logo=pypi&logoColor=white&style=flat)](https://pypi.org/project/gtech-ariel/)
[![GitHub last commit](https://img.shields.io/github/last-commit/google-marketing-solutions/ariel)](https://github.com/google-marketing-solutions/ariel/commits)
[![Code Style: Google](https://img.shields.io/badge/code%20style-google-blueviolet.svg)](https://google.github.io/styleguide/pyguide.html)
[![Open in Colab](https://img.shields.io/badge/Dubbing_Workflow-blue?style=flat&logo=google%20colab&labelColor=grey)](https://colab.research.google.com/github/google-marketing-solutions/ariel/blob/main/examples/video_ad_dubbing_gtech_ads_ariel_demo.ipynb)
##### _This is not an official Google product._
[Overview](#overview) •
[Features](#features) •
[Benefits](#benefits) •
[Building Blocks](#building-blocks) •
[Requirements](#requirements) •
[Language Compatibility](#language-compatibility) •
[Getting Started](#getting-started) •
[References](#references)
## Overview
Ariel is a cutting-edge solution designed to enhance the global reach of digital advertising. It enables advertisers to automate the translation and dubbing of their video ads into a wide range of languages.
## Features
* **Automated Dubbing:** Streamline the generation of high-quality dubbed versions of video ads in various target languages.
* **Scalability:** Handle large volumes of videos and diverse languages efficiently.
* **User-Friendly:** Offers a straightforward Python API, a command-line script, and a Colab demo notebook.
* **Cost-Effective:** Significantly reduce dubbing costs compared to traditional methods. The primary expenses are limited to Gemini API and Text-To-Speech API calls.
## Benefits
* **Enhanced Ad Performance:** Improve viewer engagement and potentially increase conversion rates with localized ads.
* **Streamlined Production:** Minimize the time and cost associated with manual translation and voiceover work.
* **Rapid Turnaround:** Quickly generate dubbed versions of ads to accelerate multilingual campaign deployment.
* **Expanded Global Reach:** Reach broader audiences worldwide with localized advertising content.
## Building Blocks
Ariel leverages a powerful combination of state-of-the-art AI and audio processing techniques to deliver accurate and efficient dubbing results:
1. **Video Processing:** Extracts the audio track from the input video file.
2. **Audio Processing:**
    * **DEMUCS:** Employed for advanced audio source separation.
    * **pyannote:** Performs speaker diarization to identify and separate individual speakers.
3. **Speech-To-Text (STT):**
    * **faster-whisper:** A high-performance speech-to-text model.
    * **Gemini 1.5 Flash:** A powerful multimodal language model that contributes to enhanced transcription.
4. **Translation:**
    * **Gemini 1.5 Flash:** Leverages its language understanding for accurate and contextually relevant translation.
5. **Text-to-Speech (TTS):**
    * **GCP's Text-To-Speech:** Generates natural-sounding speech in the target language.
    * **[OPTIONAL] ElevenLabs:** An alternative API to generate speech. It's recommended for the best results. **WARNING:** ElevenLabs is a paid solution and will generate extra costs. See the pricing [here](https://elevenlabs.io/pricing).
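
The first building block, extracting the audio track, can be illustrated with a short sketch. This is not Ariel's own implementation: it is a minimal example that calls the FFmpeg command-line tool directly, and the function names `build_extract_audio_command` and `extract_audio` are hypothetical names chosen for this illustration.

```python
import subprocess
from pathlib import Path


def build_extract_audio_command(video_path: str, audio_path: str) -> list[str]:
    """Builds an FFmpeg command that writes a video's audio track to a WAV file."""
    return [
        "ffmpeg",
        "-y",  # Overwrite the output file if it already exists.
        "-i", video_path,  # Input video file.
        "-vn",  # Drop the video stream, keep audio only.
        "-acodec", "pcm_s16le",  # Encode as uncompressed 16-bit PCM.
        audio_path,
    ]


def extract_audio(video_path: str, output_directory: str) -> str:
    """Runs FFmpeg and returns the path of the extracted audio file."""
    audio_path = str(Path(output_directory) / (Path(video_path).stem + ".wav"))
    # check=True raises CalledProcessError if FFmpeg exits with an error.
    subprocess.run(build_extract_audio_command(video_path, audio_path), check=True)
    return audio_path
```

Keeping the command construction separate from the `subprocess.run` call makes it possible to inspect or log the exact command before any file is processed.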
## Requirements
* **System Requirements:**
    * **FFmpeg:** For video and audio processing. If it is not installed, you can install it on Debian or Ubuntu with the following commands:
      ```bash
      sudo apt update
      sudo apt install ffmpeg
      ```
    * **GPU (Recommended):** For optimal performance, especially with larger videos.
* **Accounts and Tokens:**
    * **Google Cloud Platform (GCP) Project:** Set up a GCP project. See [here](https://cloud.google.com/resource-manager/docs/creating-managing-projects) for instructions.
    * **Enabled Text-To-Speech API:** Enable the Text-To-Speech API in your GCP project. See [here](https://cloud.google.com/text-to-speech/docs/before-you-begin) for instructions.
    * **Hugging Face Token:** To access the pyannote speaker diarization model. See [here](https://huggingface.co/docs/hub/en/security-tokens) on how to get the token.
    * **Google AI Studio Token:** To access the Gemini language model. See [here](https://ai.google.dev/gemini-api/docs) on how to get the token.
    * **[OPTIONAL] ElevenLabs API Key:** To access the ElevenLabs API. See [here](https://help.elevenlabs.io/hc/en-us/articles/14599447207697-How-to-authorize-yourself-using-your-xi-api-key) on how to authorize with your key.
* **User Agreements:**
    * **Hugging Face Model License:** You must accept the user conditions for the pyannote [speaker diarization](https://huggingface.co/pyannote/speaker-diarization-3.1) and [segmentation](https://huggingface.co/pyannote/segmentation-3.0) models.
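
Before a first run, it can help to confirm that the prerequisites above are in place. The following is a minimal sketch, not part of Ariel: the environment variable names `HUGGING_FACE_TOKEN` and `GEMINI_TOKEN` are assumptions made for this example (Ariel itself accepts tokens through command-line flags), and `find_missing_prerequisites` is a hypothetical helper.

```python
import os
import shutil
from typing import Callable, Mapping, Optional

# Hypothetical environment variable names, chosen only for this example.
REQUIRED_TOKENS = ("HUGGING_FACE_TOKEN", "GEMINI_TOKEN")


def find_missing_prerequisites(
    environ: Mapping[str, str] = os.environ,
    which: Callable[[str], Optional[str]] = shutil.which,
) -> list[str]:
    """Returns the names of prerequisites that could not be found."""
    missing = []
    # shutil.which returns None when the executable is not on PATH.
    if which("ffmpeg") is None:
        missing.append("ffmpeg")
    # Treat unset and empty variables the same way.
    missing.extend(name for name in REQUIRED_TOKENS if not environ.get(name))
    return missing
```

Calling `find_missing_prerequisites()` with no arguments checks the current environment; the two parameters exist so the check can be exercised without touching the real system.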
## Language Compatibility
You can dub video ads from and to the following languages:
* Arabic (ar-SA), (ar-EG)
* Bengali (bn-BD), (bn-IN)
* Bulgarian (bg-BG)
* Chinese (Simplified) (zh-CN)
* Chinese (Traditional) (zh-TW)
* Croatian (hr-HR)
* Czech (cs-CZ)
* Danish (da-DK)
* Dutch (nl-NL)
* English (en-US), (en-GB), (en-CA), (en-AU)
* Estonian (et-EE)
* Finnish (fi-FI)
* French (fr-FR), (fr-CA)
* German (de-DE)
* Greek (el-GR)
* Gujarati (gu-IN)
* Hebrew (he-IL) (Note: Not supported with ElevenLabs API)
* Hindi (hi-IN)
* Hungarian (hu-HU)
* Indonesian (id-ID)
* Italian (it-IT)
* Japanese (ja-JP)
* Kannada (kn-IN)
* Korean (ko-KR)
* Latvian (lv-LV)
* Lithuanian (lt-LT)
* Malayalam (ml-IN)
* Marathi (mr-IN)
* Norwegian (nb-NO), (nn-NO)
* Polish (pl-PL)
* Portuguese (pt-PT), (pt-BR)
* Romanian (ro-RO)
* Russian (ru-RU)
* Serbian (sr-RS)
* Slovak (sk-SK)
* Slovenian (sl-SI)
* Spanish (es-ES), (es-MX)
* Swahili (sw-KE)
* Swedish (sv-SE)
* Tamil (ta-IN), (ta-LK)
* Telugu (te-IN)
* Thai (th-TH)
* Turkish (tr-TR)
* Ukrainian (uk-UA)
* Vietnamese (vi-VN)
Language coverage depends on the underlying services. Check the sources below for any changes:
### Speech-to-Text (Whisper)
Ariel leverages the open-source Whisper model, which supports a wide array of languages for speech-to-text conversion. The supported languages can be found [here](https://github.com/openai/whisper).
### Translation (Gemini)
Gemini, the language model used for translation, is proficient in multiple languages. For the most current list of supported languages, refer to [here](https://cloud.google.com/gemini/docs/codeassist/supported-languages).
### Text-to-Speech (GCP Text-to-Speech or ElevenLabs)
GCP Text-to-Speech offers an extensive selection of voices in various languages. For a comprehensive list of supported languages and available voices, refer to [here](https://cloud.google.com/text-to-speech/docs/voices).
The ElevenLabs API is an alternative to GCP Text-to-Speech. See the list of supported languages [here](https://elevenlabs.io/docs/api-reference/text-to-speech#supported-languages).
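
Because a dubbing run calls paid APIs, it can be worth validating language codes before starting. The sketch below is an illustration under stated assumptions, not Ariel's own validation: it only checks that a code has the `xx-YY` shape used in the list above and that it appears in a caller-supplied set. `SUPPORTED_EXAMPLES` is a small excerpt of that list, and the function names are hypothetical.

```python
import re

# Codes in the list above follow the shape "xx-YY": two lowercase letters,
# a hyphen, and two uppercase letters (for example "pt-BR").
_LANGUAGE_CODE = re.compile(r"[a-z]{2}-[A-Z]{2}")

# A small excerpt of the supported list, for illustration only.
SUPPORTED_EXAMPLES = frozenset({"en-US", "en-GB", "fr-FR", "de-DE", "pt-BR", "ja-JP"})


def is_well_formed(code: str) -> bool:
    """Returns True if the code has the xx-YY shape."""
    return _LANGUAGE_CODE.fullmatch(code) is not None


def check_language_pair(
    original: str, target: str, supported: frozenset[str] = SUPPORTED_EXAMPLES
) -> list[str]:
    """Returns a list of human-readable problems; empty if the pair looks fine."""
    problems = []
    for label, code in (("original", original), ("target", target)):
        if not is_well_formed(code):
            problems.append(f"{label} language {code!r} is not in xx-YY form")
        elif code not in supported:
            problems.append(f"{label} language {code!r} is not in the supported set")
    return problems
```

Passing the full list of codes as `supported` turns this into a complete pre-flight check for the `--original_language` and `--target_language` flags.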
## Getting Started
1. **Installation:**
```bash
pip install gtech-ariel
```
2. **Usage:**
```bash
python main.py --input_file=<path_to_video> --output_directory=<output_dir> --advertiser_name=<name> --original_language=<lang_code> --target_language=<lang_code> [--number_of_speakers=<num>] [--diarization_instructions=<instructions>] [--translation_instructions=<instructions>] [--merge_utterances=<True/False>] [--minimum_merge_threshold=<seconds>] [--preferred_voices=<voice1>,<voice2>] [--clean_up=<True/False>] [--pyannote_model=<model_name>] [--diarization_system_instructions=<instructions>] [--translation_system_instructions=<instructions>] [--hugging_face_token=<token>] [--gemini_token=<token>] [--model_name=<model_name>] [--temperature=<value>] [--top_p=<value>] [--top_k=<value>] [--max_output_tokens=<value>] [--elevenlabs_token=<token>] [--use_elevenlabs=<value>]
```
3. **Configuration:** (Optional)
* Customize settings for speaker diarization, translation, voice selection, and more using the command-line flags.
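
When dubbing one ad into several languages, the usage command can be assembled programmatically. The following is a minimal sketch that uses only the flags shown in the usage line above. It assumes `main.py` from the Ariel repository is in the current working directory; `build_dubbing_command` and `run_dubbing` are hypothetical helper names, not part of the library.

```python
import subprocess
import sys


def build_dubbing_command(
    input_file: str,
    output_directory: str,
    advertiser_name: str,
    original_language: str,
    target_language: str,
    **optional_flags: object,
) -> list[str]:
    """Builds the argument list for one dubbing run."""
    command = [
        sys.executable,  # The Python interpreter running this script.
        "main.py",  # Assumed to be in the current working directory.
        f"--input_file={input_file}",
        f"--output_directory={output_directory}",
        f"--advertiser_name={advertiser_name}",
        f"--original_language={original_language}",
        f"--target_language={target_language}",
    ]
    # Optional flags such as number_of_speakers=2 become --number_of_speakers=2.
    # Flags set to None are skipped.
    for name, value in optional_flags.items():
        if value is not None:
            command.append(f"--{name}={value}")
    return command


def run_dubbing(command: list[str]) -> None:
    """Runs the command and raises CalledProcessError on a non-zero exit code."""
    subprocess.run(command, check=True)
```

For example, `build_dubbing_command("ad.mp4", "output", "Example Advertiser", "en-US", "pl-PL", number_of_speakers=2)` produces a command that can be passed to `run_dubbing`. Passing arguments as a list avoids shell quoting issues with values that contain spaces, such as advertiser names or instructions.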
## References
* **DEMUCS:** [https://github.com/facebookresearch/demucs](https://github.com/facebookresearch/demucs)
* **pyannote:** [https://github.com/pyannote/pyannote-audio](https://github.com/pyannote/pyannote-audio)
* **faster-whisper:** [https://github.com/SYSTRAN/faster-whisper](https://github.com/SYSTRAN/faster-whisper)
* **ElevenLabs:** [https://elevenlabs.io/docs/introduction](https://elevenlabs.io/docs/introduction)
Raw data
{
"_id": null,
"home_page": "https://github.com/google-marketing-solutions/ariel",
"name": "gtech-ariel",
"maintainer": null,
"docs_url": null,
"requires_python": ">=3.10",
"maintainer_email": null,
"keywords": "python ai genai speech-to-text translation text-to-speech video dubbing youtube gcp",
"author": "Google EMEA gTech Ads Data Science Team",
"author_email": null,
"download_url": "https://files.pythonhosted.org/packages/44/0c/8e386558f1af26ebbb15c159178f0d3c44cfc78fdb82dc0799e2ce5fdfca/gtech-ariel-0.0.11.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": "Apache Software License 2.0",
"summary": "Google EMEA gTech Ads Data Science Team's solution to automatically translate and dub video ads into multiple languages using AI.",
"version": "0.0.11",
"project_urls": {
"Homepage": "https://github.com/google-marketing-solutions/ariel"
},
"split_keywords": [
"python",
"ai",
"genai",
"speech-to-text",
"translation",
"text-to-speech",
"video",
"dubbing",
"youtube",
"gcp"
],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "f03991836cf364e15cf5096b9cb7da57d70c45b63b15e2098543485ed837da4d",
"md5": "4fb7af9d064f45207a29abfc4924fcdf",
"sha256": "d1703eee2b5b304d7dc431ea320300b52ca8c04db86fd5e39004a75fcd8beba2"
},
"downloads": -1,
"filename": "gtech_ariel-0.0.11-py3-none-any.whl",
"has_sig": false,
"md5_digest": "4fb7af9d064f45207a29abfc4924fcdf",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.10",
"size": 49660,
"upload_time": "2024-09-19T13:35:07",
"upload_time_iso_8601": "2024-09-19T13:35:07.310839Z",
"url": "https://files.pythonhosted.org/packages/f0/39/91836cf364e15cf5096b9cb7da57d70c45b63b15e2098543485ed837da4d/gtech_ariel-0.0.11-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "440c8e386558f1af26ebbb15c159178f0d3c44cfc78fdb82dc0799e2ce5fdfca",
"md5": "b51a17bf31ed2e100e3f0a6cc6d520c1",
"sha256": "607a49f14490650407b1807a5016a806264b3bc43b9723a10e7b0526c7f99161"
},
"downloads": -1,
"filename": "gtech-ariel-0.0.11.tar.gz",
"has_sig": false,
"md5_digest": "b51a17bf31ed2e100e3f0a6cc6d520c1",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.10",
"size": 46246,
"upload_time": "2024-09-19T13:35:09",
"upload_time_iso_8601": "2024-09-19T13:35:09.057225Z",
"url": "https://files.pythonhosted.org/packages/44/0c/8e386558f1af26ebbb15c159178f0d3c44cfc78fdb82dc0799e2ce5fdfca/gtech-ariel-0.0.11.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2024-09-19 13:35:09",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "google-marketing-solutions",
"github_project": "ariel",
"travis_ci": false,
"coveralls": false,
"github_actions": true,
"requirements": [
{
"name": "numpy",
"specs": [
[
">=",
"1.17.3"
]
]
},
{
"name": "moviepy",
"specs": [
[
"==",
"1.0.3"
]
]
},
{
"name": "absl-py",
"specs": [
[
"==",
"2.1.0"
]
]
},
{
"name": "demucs",
"specs": [
[
"==",
"4.0.1"
]
]
},
{
"name": "torch",
"specs": [
[
"==",
"2.3.1"
]
]
},
{
"name": "pyannote.audio",
"specs": [
[
"==",
"3.3.0"
]
]
},
{
"name": "pydub",
"specs": [
[
"==",
"0.25.1"
]
]
},
{
"name": "faster-whisper",
"specs": [
[
"==",
"1.0.2"
]
]
},
{
"name": "google-generativeai",
"specs": [
[
"==",
"0.6.0"
]
]
},
{
"name": "google-cloud-texttospeech",
"specs": [
[
"==",
"2.16.3"
]
]
},
{
"name": "tensorflow",
"specs": [
[
"==",
"2.16.1"
]
]
},
{
"name": "elevenlabs",
"specs": [
[
"==",
"1.3.1"
]
]
},
{
"name": "google-api-core",
"specs": [
[
"==",
"2.19.1"
]
]
}
],
"lcname": "gtech-ariel"
}