# 📚 ebook2audiobook
Convert eBooks to audiobooks with chapters and metadata using Calibre and Coqui XTTS. Supports optional voice cloning and multiple languages!
#### 🖥️ Web GUI Interface
![demo_web_gui](https://github.com/user-attachments/assets/85af88a7-05dd-4a29-91de-76a14cf5ef06)
<details>
<summary>Click to see images of Web GUI</summary>
<img width="1728" alt="image" src="https://github.com/user-attachments/assets/b36c71cf-8e06-484c-a252-934e6b1d0c2f">
<img width="1728" alt="image" src="https://github.com/user-attachments/assets/c0dab57a-d2d4-4658-bff9-3842ec90cb40">
<img width="1728" alt="image" src="https://github.com/user-attachments/assets/0a99eeac-c521-4b21-8656-e064c1adc528">
</details>
## README.md
- en [English](README.md)
- zh_CN [简体中文 (Simplified Chinese)](readme/README_CN.md)
## 🌟 Features
- 📖 Converts eBooks to text format with Calibre.
- 📚 Splits eBook into chapters for organized audio.
- 🎙️ High-quality text-to-speech with Coqui XTTS.
- 🗣️ Optional voice cloning with your own voice file.
- 🌍 Supports multiple languages (English by default).
- 🖥️ Designed to run on 4 GB of RAM.
## 🤗 [Huggingface space demo](https://huggingface.co/spaces/drewThomasson/ebook2audiobookXTTS)
- The Hugging Face Space runs on the free CPU tier, so expect it to be very slow or to time out. Just don't give it giant files.
- It is best to duplicate the Space or run locally.
## Free Google Colab [![Free Google Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/DrewThomasson/ebook2audiobookXTTS/blob/main/Notebooks/colab_ebook2audiobookxtts.ipynb)
## 🛠️ Requirements
- Python 3.10 (the package accepts any version from 3.7 up to, but not including, 3.13)
- `coqui-tts` Python package
- Calibre (for eBook conversion)
- FFmpeg (for audiobook creation)
- Optional: Custom voice file for voice cloning
### 🔧 Installation Instructions
1. **Install Python** (version 3.7 or newer, but below 3.13) from [Python.org](https://www.python.org/downloads/).
2. **Install Calibre**:
   - **Ubuntu**: `sudo apt-get install -y calibre`
   - **macOS**: `brew install calibre`
   - **Windows** (Admin PowerShell): `choco install calibre`
3. **Install FFmpeg**:
   - **Ubuntu**: `sudo apt-get install -y ffmpeg`
   - **macOS**: `brew install ffmpeg`
   - **Windows** (Admin PowerShell): `choco install ffmpeg`
4. **Optional: Install MeCab** (for non-Latin languages):
   - **Ubuntu**: `sudo apt-get install -y mecab libmecab-dev mecab-ipadic-utf8`
   - **macOS**: `brew install mecab`, then `brew install mecab-ipadic`
   - **Windows**: [install manually from the MeCab website](https://taku910.github.io/mecab/#download) (Note: Japanese support is limited)
5. **Install ebook2audiobook with pip**:
   ```bash
   pip install ebook2audiobook
   ```
   **For non-Latin languages (Japanese support)**:
   ```bash
   pip install mecab mecab-python3 unidic
   python -m unidic download
   ```
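Before the first conversion, it can help to confirm that the prerequisites from the steps above are in place. The following is a minimal sketch, not part of the project: the helper names are hypothetical, and it assumes that Calibre exposes its converter as `ebook-convert` and FFmpeg as `ffmpeg` on your `PATH`.

```python
import shutil
import sys

# Range taken from the install step above: Python >= 3.7 and < 3.13.
MIN_PYTHON = (3, 7)
MAX_PYTHON_EXCLUSIVE = (3, 13)

# Assumption: these are the executable names installed by Calibre and FFmpeg.
REQUIRED_TOOLS = ("ebook-convert", "ffmpeg")


def python_version_ok(version=None):
    """Return True if (major, minor) falls inside the supported range."""
    major_minor = tuple((version or sys.version_info)[:2])
    return MIN_PYTHON <= major_minor < MAX_PYTHON_EXCLUSIVE


def missing_tools(tools=REQUIRED_TOOLS):
    """Return the tools from `tools` that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]
```

Calling `python_version_ok()` and `missing_tools()` with no arguments checks the running interpreter and the current `PATH`; an empty list from `missing_tools()` means both tools were found.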
## 🌐 Supported Languages
- **English (en)**
- **Spanish (es)**
- **French (fr)**
- **German (de)**
- **Italian (it)**
- **Portuguese (pt)**
- **Polish (pl)**
- **Turkish (tr)**
- **Russian (ru)**
- **Dutch (nl)**
- **Czech (cs)**
- **Arabic (ar)**
- **Chinese (zh-cn)**
- **Japanese (ja)**
- **Hungarian (hu)**
- **Korean (ko)**
Specify the language code when running the script in headless mode.
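If you script conversions, validating the language code up front avoids starting a long run with a typo. This is a small sketch under the assumption that the codes listed above are the complete set; `normalize_language` is a hypothetical helper, not something the package provides.

```python
# Codes copied from the list above; "en" is the documented default.
SUPPORTED_LANGUAGES = {
    "en": "English", "es": "Spanish", "fr": "French", "de": "German",
    "it": "Italian", "pt": "Portuguese", "pl": "Polish", "tr": "Turkish",
    "ru": "Russian", "nl": "Dutch", "cs": "Czech", "ar": "Arabic",
    "zh-cn": "Chinese", "ja": "Japanese", "hu": "Hungarian", "ko": "Korean",
}


def normalize_language(code="en"):
    """Return the lower-cased code, or raise ValueError if it is not listed."""
    key = code.strip().lower()
    if key not in SUPPORTED_LANGUAGES:
        raise ValueError(
            f"Unsupported language {code!r}; choose one of: "
            + ", ".join(sorted(SUPPORTED_LANGUAGES))
        )
    return key
```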
## 🚀 Usage
### 🖥️ Launching Gradio Web Interface
1. **Run the command**:
```bash
ebook2audiobook
```
2. **Open the Web App**: Click the URL printed in the terminal to open the web app and convert eBooks.
3. **For a Public Link**: Append `--share True` to the command: `ebook2audiobook --share True`
- **[For More Parameters]**: Run `ebook2audiobook -h`.
### 📝 Basic Headless Usage
```bash
ebook2audiobook --headless True --ebook <path_to_ebook_file> --voice [path_to_voice_file] --language [language_code]
```
- **`<path_to_ebook_file>`**: Path to your eBook file.
- **`[path_to_voice_file]`**: Optional. Path to a voice file to clone.
- **`[language_code]`**: Optional. Language code of the eBook (defaults to `en`).
- **[For More Parameters]**: Run `ebook2audiobook -h`.
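The headless command above can also be assembled from a script, for example to convert several books in a row. The sketch below only builds the argument list from the flags shown above; `build_headless_command` is a hypothetical helper, and it assumes the `ebook2audiobook` command is on your `PATH`.

```python
import subprocess


def build_headless_command(ebook, voice=None, language=None):
    """Build the argument list for a headless run; optional flags are skipped when unset."""
    command = ["ebook2audiobook", "--headless", "True", "--ebook", str(ebook)]
    if voice:
        command += ["--voice", str(voice)]
    if language:
        command += ["--language", language]
    return command


def convert(ebook, voice=None, language=None):
    """Run the conversion and raise CalledProcessError if the tool exits with an error."""
    subprocess.run(build_headless_command(ebook, voice, language), check=True)
```

Passing a list to `subprocess.run` avoids shell quoting problems with file names that contain spaces.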
### 🧩 Headless Custom XTTS Model Usage
```bash
ebook2audiobook --headless True --use_custom_model True --ebook <ebook_file_path> --voice <target_voice_file_path> --language <language> --custom_model <custom_model_path> --custom_config <custom_config_path> --custom_vocab <custom_vocab_path>
```
- **`<ebook_file_path>`**: Path to your eBook file.
- **`<target_voice_file_path>`**: Optional. Path to a voice file to clone.
- **`<language>`**: Optional. Language code of the eBook.
- **`<custom_model_path>`**: Path to `model.pth`.
- **`<custom_config_path>`**: Path to `config.json`.
- **`<custom_vocab_path>`**: Path to `vocab.json`.
- **[For More Parameters]**: Run `ebook2audiobook -h`.
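Because a custom model needs all three files, a script can check for them before launching a long conversion. This is a sketch under that assumption; `custom_model_args` is a hypothetical helper that returns the flags shown above so they can be appended to a headless command.

```python
from pathlib import Path


def custom_model_args(model, config, vocab):
    """Return the custom-model flags after checking that all three files exist."""
    files = {
        "--custom_model": Path(model),
        "--custom_config": Path(config),
        "--custom_vocab": Path(vocab),
    }
    missing = [str(path) for path in files.values() if not path.is_file()]
    if missing:
        raise FileNotFoundError("Missing custom model files: " + ", ".join(missing))
    args = ["--use_custom_model", "True"]
    for flag, path in files.items():
        args += [flag, str(path)]
    return args
```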
### 🧩 Headless Custom XTTS Model Usage With Zip Link to XTTS Fine-Tune Model 🌐
```bash
ebook2audiobook --headless True --use_custom_model True --ebook <ebook_file_path> --voice <target_voice_file_path> --language <language> --custom_model_url <custom_model_URL_ZIP_path>
```
- **`<ebook_file_path>`**: Path to your eBook file.
- **`<target_voice_file_path>`**: Optional. Path to a voice file to clone.
- **`<language>`**: Optional. Language code of the eBook.
- **`<custom_model_URL_ZIP_path>`**: URL of a zip archive of the model folder. For example, for the [xtts_David_Attenborough_fine_tune](https://huggingface.co/drewThomasson/xtts_David_Attenborough_fine_tune/tree/main) model, use `https://huggingface.co/drewThomasson/xtts_David_Attenborough_fine_tune/resolve/main/Finished_model_files.zip?download=true`
- A custom model also needs a reference audio clip of the voice: [reference audio clip of David Attenborough](https://huggingface.co/drewThomasson/xtts_David_Attenborough_fine_tune/blob/main/ref.wav)
- **[For More Parameters]**: Run `ebook2audiobook -h`.
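If you package your own fine-tune as a zip, you may want to check the archive before sharing its URL. The sketch below assumes, based on the three custom-model flags documented above, that the archive should contain `model.pth`, `config.json`, and `vocab.json`; the exact layout the tool expects is not confirmed here, and `missing_model_files` is a hypothetical helper.

```python
import zipfile
from pathlib import PurePosixPath

# Assumption: these are the files the tool looks for inside the archive.
EXPECTED_FILES = ("model.pth", "config.json", "vocab.json")


def missing_model_files(zip_source):
    """Return the expected file names that are absent from the archive.

    `zip_source` may be a path or a binary file-like object. Files are
    matched by base name, so they may sit inside a sub-folder.
    """
    with zipfile.ZipFile(zip_source) as archive:
        present = {PurePosixPath(name).name for name in archive.namelist()}
    return [name for name in EXPECTED_FILES if name not in present]
```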
### 🔍 Detailed Guide: List of All Parameters
```bash
ebook2audiobook -h
```
- This will output the following:
```bash
usage: app.py [-h] [--share SHARE] [--headless HEADLESS] [--ebook EBOOK] [--voice VOICE]
[--language LANGUAGE] [--use_custom_model USE_CUSTOM_MODEL]
[--custom_model CUSTOM_MODEL] [--custom_config CUSTOM_CONFIG]
[--custom_vocab CUSTOM_VOCAB] [--custom_model_url CUSTOM_MODEL_URL]
[--temperature TEMPERATURE] [--length_penalty LENGTH_PENALTY]
[--repetition_penalty REPETITION_PENALTY] [--top_k TOP_K] [--top_p TOP_P]
[--speed SPEED] [--enable_text_splitting ENABLE_TEXT_SPLITTING]
Convert eBooks to Audiobooks using a Text-to-Speech model. You can either launch the
Gradio interface or run the script in headless mode for direct conversion.
options:
-h, --help show this help message and exit
--share SHARE Set to True to enable a public shareable Gradio link. Defaults
to False.
--headless HEADLESS Set to True to run in headless mode without the Gradio
interface. Defaults to False.
--ebook EBOOK Path to the ebook file for conversion. Required in headless
mode.
--voice VOICE Path to the target voice file for TTS. Optional, uses a default
voice if not provided.
--language LANGUAGE Language for the audiobook conversion. Options: en, es, fr, de,
it, pt, pl, tr, ru, nl, cs, ar, zh-cn, ja, hu, ko. Defaults to
English (en).
--use_custom_model USE_CUSTOM_MODEL
Set to True to use a custom TTS model. Defaults to False. Must
be True to use custom models, otherwise you'll get an error.
--custom_model CUSTOM_MODEL
Path to the custom model file (.pth). Required if using a custom
model.
--custom_config CUSTOM_CONFIG
Path to the custom config file (config.json). Required if using
a custom model.
--custom_vocab CUSTOM_VOCAB
Path to the custom vocab file (vocab.json). Required if using a
custom model.
--custom_model_url CUSTOM_MODEL_URL
URL to download the custom model as a zip file. Optional, but
will be used if provided. Examples include David Attenborough's
model: 'https://huggingface.co/drewThomasson/xtts_David_Attenbor
ough_fine_tune/resolve/main/Finished_model_files.zip?download=tr
ue'. More XTTS fine-tunes can be found on my Hugging Face at
'https://huggingface.co/drewThomasson'.
--temperature TEMPERATURE
Temperature for the model. Defaults to 0.65. Higher Tempatures
will lead to more creative outputs IE: more Hallucinations.
Lower Tempatures will be more monotone outputs IE: less
Hallucinations.
--length_penalty LENGTH_PENALTY
A length penalty applied to the autoregressive decoder. Defaults
to 1.0. Not applied to custom models.
--repetition_penalty REPETITION_PENALTY
A penalty that prevents the autoregressive decoder from
repeating itself. Defaults to 2.0.
--top_k TOP_K Top-k sampling. Lower values mean more likely outputs and
increased audio generation speed. Defaults to 50.
--top_p TOP_P Top-p sampling. Lower values mean more likely outputs and
increased audio generation speed. Defaults to 0.8.
--speed SPEED Speed factor for the speech generation. IE: How fast the
Narrerator will speak. Defaults to 1.0.
--enable_text_splitting ENABLE_TEXT_SPLITTING
Enable splitting text into sentences. Defaults to True.
Example: python script.py --headless --ebook path_to_ebook --voice path_to_voice
--language en --use_custom_model True --custom_model model.pth --custom_config
config.json --custom_vocab vocab.json
```
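To give a feel for what `--temperature`, `--top_k`, and `--top_p` control, here is a generic illustration of temperature scaling followed by top-k and top-p (nucleus) filtering over a list of scores. It is a textbook sketch, not the XTTS implementation: the function name and the toy scores are made up, and only the default values are taken from the help text above.

```python
import math


def filter_distribution(logits, temperature=0.65, top_k=50, top_p=0.8):
    """Return {index: probability} after temperature, top-k, and top-p filtering."""
    # Temperature: dividing by a value below 1 sharpens the distribution.
    scaled = [value / temperature for value in logits]
    peak = max(scaled)
    weights = [math.exp(value - peak) for value in scaled]
    total = sum(weights)
    # Sort candidates from most to least likely.
    ranked = sorted(((w / total, i) for i, w in enumerate(weights)), reverse=True)
    # Top-k: keep at most the k most likely candidates.
    ranked = ranked[:top_k]
    # Top-p: keep the smallest prefix whose cumulative probability reaches top_p.
    kept, cumulative = [], 0.0
    for prob, index in ranked:
        kept.append((prob, index))
        cumulative += prob
        if cumulative >= top_p:
            break
    # Renormalize so the surviving probabilities sum to 1.
    norm = sum(prob for prob, _ in kept)
    return {index: prob / norm for prob, index in kept}
```

Lowering any of the three values narrows the set of candidates that can be sampled, which is why the help text describes lower values as giving "more likely outputs".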
<details>
<summary>⚠️ Legacy (Deprecated) Usage Instructions</summary>
## 🚀 Usage
## Legacy files have been moved to `ebook2audiobookXTTS/legacy/`
### 🖥️ Gradio Web Interface
1. **Run the Script**:
```bash
python custom_model_ebook2audiobookXTTS_gradio.py
```
2. **Open the Web App**: Click the URL provided in the terminal to access the web app and convert eBooks.
### 📝 Basic Usage
```bash
python ebook2audiobook.py <path_to_ebook_file> [path_to_voice_file] [language_code]
```
- **`<path_to_ebook_file>`**: Path to your eBook file.
- **`[path_to_voice_file]`**: Optional for voice cloning.
- **`[language_code]`**: Optional to specify language.
### 🧩 Custom XTTS Model
```bash
python custom_model_ebook2audiobookXTTS.py <ebook_file_path> <target_voice_file_path> <language> <custom_model_path> <custom_config_path> <custom_vocab_path>
```
- **`<ebook_file_path>`**: Path to your eBook file.
- **`<target_voice_file_path>`**: Optional for voice cloning.
- **`<language>`**: Optional to specify language.
- **`<custom_model_path>`**: Path to `model.pth`.
- **`<custom_config_path>`**: Path to `config.json`.
- **`<custom_vocab_path>`**: Path to `vocab.json`.
</details>
### 🐳 Using Docker
You can also run the eBook to audiobook converter with Docker. This keeps the environment consistent across machines and simplifies setup.
#### 🚀 Running the Docker Container
To run the Docker container and start the Gradio interface, use one of the following commands:
- Run with CPU only
```powershell
docker run -it --rm -p 7860:7860 --platform=linux/amd64 athomasson2/ebook2audiobookxtts:huggingface python app.py
```
- Run with GPU speedup (NVIDIA graphics cards only)
```powershell
docker run -it --rm --gpus all -p 7860:7860 --platform=linux/amd64 athomasson2/ebook2audiobookxtts:huggingface python app.py
```
Either command starts the Gradio interface on port 7860 (http://localhost:7860).
- For more options, such as running the container in headless mode or making the Gradio link public, add the `-h` parameter after `app.py` in the `docker run` command.
<details>
<summary><strong>Example of using Docker in headless mode or with any of the extra parameters + full guide</strong></summary>
## Example of using Docker in headless mode
First, pull the latest image:
```bash
docker pull athomasson2/ebook2audiobookxtts:huggingface
```
- Before running the container, create two directories in your current directory: `input-folder`, where you put the input files for the container to read, and `Audiobooks`, where the output is written. Both are mounted into the container.
```bash
mkdir input-folder && mkdir Audiobooks
```
- In the command below, replace **YOUR_INPUT_FILE.TXT** with the name of your input file.
```bash
docker run -it --rm \
-v $(pwd)/input-folder:/home/user/app/input_folder \
-v $(pwd)/Audiobooks:/home/user/app/Audiobooks \
--platform linux/amd64 \
athomasson2/ebook2audiobookxtts:huggingface \
python app.py --headless True --ebook /home/user/app/input_folder/YOUR_INPUT_FILE.TXT
```
- And that should be it!
- The output audiobooks will be in the `Audiobooks` folder, inside the directory where you ran the `docker run` command.
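The headless Docker invocation above can be generated from a script so that the volume paths are always absolute. This is a sketch that reuses the image name and container paths from the command above; `build_docker_command` is a hypothetical helper, and it assumes `input-folder` and `Audiobooks` already exist under `workdir`.

```python
from pathlib import Path

# Values copied from the docker command above.
IMAGE = "athomasson2/ebook2audiobookxtts:huggingface"
CONTAINER_INPUT = "/home/user/app/input_folder"
CONTAINER_OUTPUT = "/home/user/app/Audiobooks"


def build_docker_command(ebook_name, workdir=".", use_gpu=False):
    """Build a `docker run` argument list for a headless conversion."""
    base = Path(workdir).resolve()
    command = ["docker", "run", "-it", "--rm"]
    if use_gpu:
        # GPU passthrough, as in the NVIDIA example earlier in this section.
        command += ["--gpus", "all"]
    command += [
        "-v", f"{base / 'input-folder'}:{CONTAINER_INPUT}",
        "-v", f"{base / 'Audiobooks'}:{CONTAINER_OUTPUT}",
        "--platform", "linux/amd64",
        IMAGE,
        "python", "app.py", "--headless", "True",
        "--ebook", f"{CONTAINER_INPUT}/{ebook_name}",
    ]
    return command
```

The resulting list can be passed to `subprocess.run(command, check=True)`.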
## To see the help text for the program's other parameters, run:
```bash
docker run -it --rm \
--platform linux/amd64 \
athomasson2/ebook2audiobookxtts:huggingface \
python app.py -h
```
This prints the same help text shown in the detailed parameter guide above.
</details>
#### 🖥️ Docker GUI
![demo_web_gui](https://github.com/user-attachments/assets/85af88a7-05dd-4a29-91de-76a14cf5ef06)
<details>
<summary>Click to see images of Web GUI</summary>
<img width="1728" alt="image" src="https://github.com/user-attachments/assets/b36c71cf-8e06-484c-a252-934e6b1d0c2f">
<img width="1728" alt="image" src="https://github.com/user-attachments/assets/c0dab57a-d2d4-4658-bff9-3842ec90cb40">
<img width="1728" alt="image" src="https://github.com/user-attachments/assets/0a99eeac-c521-4b21-8656-e064c1adc528">
</details>
### 🛠️ For Custom XTTS Models
These are models fine-tuned to be better at a specific voice. Check out my Hugging Face page [here](https://huggingface.co/drewThomasson).
To use a custom model, paste the link to its `Finished_model_files.zip` file, for example:
[David Attenborough fine-tuned Finished_model_files.zip](https://huggingface.co/drewThomasson/xtts_David_Attenborough_fine_tune/resolve/main/Finished_model_files.zip?download=true)
A custom model also needs a reference audio clip of the voice:
[reference audio clip of David Attenborough](https://huggingface.co/drewThomasson/xtts_David_Attenborough_fine_tune/blob/main/ref.wav)
More details can be found on the [Docker Hub page](https://hub.docker.com/repository/docker/athomasson2/ebook2audiobookxtts/general).
## Fine-Tuned XTTS Models
To find already fine-tuned XTTS models, visit [this Hugging Face link](https://huggingface.co/drewThomasson). Search for models that include "xtts fine tune" in their names.
## Demos
Rainy day voice
https://github.com/user-attachments/assets/8486603c-38b1-43ce-9639-73757dfb1031
David Attenborough voice
https://github.com/user-attachments/assets/47c846a7-9e51-4eb9-844a-7460402a20a8
## Supported eBook Formats
- `.epub`, `.pdf`, `.mobi`, `.txt`, `.html`, `.rtf`, `.chm`, `.lit`, `.pdb`, `.fb2`, `.odt`, `.cbr`, `.cbz`, `.prc`, `.lrf`, `.pml`, `.snb`, `.cbc`, `.rb`, `.tcr`
- **Best results**: `.epub` or `.mobi` for automatic chapter detection
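For batch jobs, a script can filter a folder down to the formats listed above before handing files to the converter. This is a minimal sketch; the helper names are hypothetical and the extension set is copied from the list above.

```python
from pathlib import Path

# Extensions copied from the supported-formats list above.
SUPPORTED_EXTENSIONS = {
    ".epub", ".pdf", ".mobi", ".txt", ".html", ".rtf", ".chm", ".lit",
    ".pdb", ".fb2", ".odt", ".cbr", ".cbz", ".prc", ".lrf", ".pml",
    ".snb", ".cbc", ".rb", ".tcr",
}


def is_supported(path):
    """Return True if the file extension (case-insensitive) is in the list."""
    return Path(path).suffix.lower() in SUPPORTED_EXTENSIONS


def find_ebooks(folder):
    """Return the supported eBook files directly inside `folder`, sorted by path."""
    return sorted(p for p in Path(folder).iterdir() if p.is_file() and is_supported(p))
```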
## Output
- Creates an `.m4b` file with metadata and chapters.
- **Example Output**: ![Example](https://github.com/DrewThomasson/VoxNovel/blob/dc5197dff97252fa44c391dc0596902d71278a88/readme_files/example_in_app.jpeg)
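To confirm that a finished `.m4b` contains the chapters you expect, you can list them with `ffprobe`, which ships with FFmpeg. The sketch below assumes `ffprobe` is on your `PATH`; the helper names are hypothetical, and the parser only relies on the `chapters` array that `ffprobe -show_chapters` emits in JSON mode.

```python
import json
import subprocess


def parse_chapters(ffprobe_json):
    """Turn ffprobe's JSON output into (title, start_seconds, end_seconds) tuples."""
    data = json.loads(ffprobe_json)
    return [
        (
            chapter.get("tags", {}).get("title", ""),
            float(chapter["start_time"]),
            float(chapter["end_time"]),
        )
        for chapter in data.get("chapters", [])
    ]


def read_chapters(m4b_path):
    """Run ffprobe on the audiobook and return its chapter list."""
    result = subprocess.run(
        ["ffprobe", "-v", "error", "-print_format", "json",
         "-show_chapters", str(m4b_path)],
        capture_output=True, text=True, check=True,
    )
    return parse_chapters(result.stdout)
```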
## 🛠️ Common Issues
- "It's slow!" - On CPU only, conversion is very slow; the only way to speed it up is an NVIDIA GPU. [Discussion about this](https://github.com/DrewThomasson/ebook2audiobookXTTS/discussions/19#discussioncomment-10879846). For faster multilingual generation, I suggest my other [project that uses piper-tts](https://github.com/DrewThomasson/ebook2audiobookpiper-tts) instead. (It doesn't have zero-shot voice cloning and its voices are Siri quality, but it is much faster on CPU.)
- "I'm having dependency issues" - Just use the Docker image. It's fully self-contained and has a headless mode; add the `-h` parameter after `app.py` in the `docker run` command for more information.
- "I'm getting a truncated audio issue!" - PLEASE OPEN AN ISSUE FOR THIS. I don't speak every language, and I need advice from speakers of each language to fine-tune my sentence-splitting function for other languages.
- "The loading bar is stuck at 30% in the web GUI!" - The web GUI loading bar is extremely basic: it is simply split between the three loading steps. For a more accurate gauge of progress, check the terminal to see which sentence it is on.
## What I need help with!
## [The full list can be found here](https://github.com/DrewThomasson/ebook2audiobookXTTS/issues/32)
- Help from speakers of any of the supported languages with proper sentence-splitting methods
- Potentially creating README guides in multiple languages (because the only language I know is English)
## Special Thanks
- **Coqui TTS**: [Coqui TTS GitHub](https://github.com/coqui-ai/TTS)
- **Calibre**: [Calibre Website](https://calibre-ebook.com)
- [@shakenbake15 for better chapter saving method](https://github.com/DrewThomasson/ebook2audiobookXTTS/issues/8)
Raw data
{
"_id": null,
"home_page": "https://github.com/DrewThomasson/ebook2audiobookXTTS",
"name": "ebook2audiobook",
"maintainer": null,
"docs_url": null,
"requires_python": "<3.13,>=3.7",
"maintainer_email": null,
"keywords": null,
"author": "Andrew Phillip Thomasson",
"author_email": "drew.thomasson100@gmail.com",
"download_url": "https://files.pythonhosted.org/packages/a1/65/92b230ded658916c14d0b062ecddc65a4b50ea0ebc2ffb4ee7d7840cb8b3/ebook2audiobook-0.0.8.tar.gz",
"platform": null,
"description": "# \ud83d\udcda ebook2audiobook\n\nConvert eBooks to audiobooks with chapters and metadata using Calibre and Coqui XTTS. Supports optional voice cloning and multiple languages!\n\n\n#### \ud83d\udda5\ufe0f Web GUI Interface\n![demo_web_gui](https://github.com/user-attachments/assets/85af88a7-05dd-4a29-91de-76a14cf5ef06)\n\n<details>\n <summary>Click to see images of Web GUI</summary>\n<img width=\"1728\" alt=\"image\" src=\"https://github.com/user-attachments/assets/b36c71cf-8e06-484c-a252-934e6b1d0c2f\">\n<img width=\"1728\" alt=\"image\" src=\"https://github.com/user-attachments/assets/c0dab57a-d2d4-4658-bff9-3842ec90cb40\">\n<img width=\"1728\" alt=\"image\" src=\"https://github.com/user-attachments/assets/0a99eeac-c521-4b21-8656-e064c1adc528\">\n</details>\n\n## README.md\n- en [English](README.md)\n- zh_CN [\u7b80\u4f53\u4e2d\u6587](readme/README_CN.md)\n\n\n## \ud83c\udf1f Features\n\n- \ud83d\udcd6 Converts eBooks to text format with Calibre.\n- \ud83d\udcda Splits eBook into chapters for organized audio.\n- \ud83c\udf99\ufe0f High-quality text-to-speech with Coqui XTTS.\n- \ud83d\udde3\ufe0f Optional voice cloning with your own voice file.\n- \ud83c\udf0d Supports multiple languages (English by default).\n- \ud83d\udda5\ufe0f Designed to run on 4GB RAM.\n\n## \ud83e\udd17 [Huggingface space demo](https://huggingface.co/spaces/drewThomasson/ebook2audiobookXTTS)\n- Huggingface space is running on free cpu tier so expect very slow or timeout lol, just don't give it giant files is all\n- Best to duplicate space or run locally.\n\n## Free Google Colab [![Free Google Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/DrewThomasson/ebook2audiobookXTTS/blob/main/Notebooks/colab_ebook2audiobookxtts.ipynb)\n\n\n## \ud83d\udee0\ufe0f Requirements\n\n- Python 3.10\n- `coqui-tts` Python package\n- Calibre (for eBook conversion)\n- FFmpeg (for audiobook creation)\n- Optional: Custom voice file for 
voice cloning\n\n\n### \ud83d\udd27 Installation Instructions\n\n1. **Install Python 3.7 < version < 3.13** from [Python.org](https://www.python.org/downloads/).\n\n2. **Install Calibre**:\n - **Ubuntu**: `sudo apt-get install -y calibre`\n - **macOS**: `brew install calibre`\n - **Windows** (Admin Powershell): `choco install calibre`\n\n3. **Install FFmpeg**:\n - **Ubuntu**: `sudo apt-get install -y ffmpeg`\n - **macOS**: `brew install ffmpeg`\n - **Windows** (Admin Powershell): `choco install ffmpeg`\n\n4. **Optional: Install Mecab** (for non-Latin languages):\n - **Ubuntu**: `sudo apt-get install -y mecab libmecab-dev mecab-ipadic-utf8`\n - **macOS**: `brew install mecab`, `brew install mecab-ipadic`\n - **Windows**: [mecab-website-to-install-manually](https://taku910.github.io/mecab/#download) (Note: Japanese support is limited)\n\n5. **Pip install ebook2audiobook**:\n ```bash\n pip install ebook2audiobook\n ```\n\n **For non-Latin languages (Japanese Support)**:\n ```bash\n pip install mecab mecab-python3 unidic\n \n python -m unidic download\n ```\n\n## \ud83c\udf10 Supported Languages\n\n- **English (en)**\n- **Spanish (es)**\n- **French (fr)**\n- **German (de)**\n- **Italian (it)**\n- **Portuguese (pt)**\n- **Polish (pl)**\n- **Turkish (tr)**\n- **Russian (ru)**\n- **Dutch (nl)**\n- **Czech (cs)**\n- **Arabic (ar)**\n- **Chinese (zh-cn)**\n- **Japanese (ja)**\n- **Hungarian (hu)**\n- **Korean (ko)**\n\nSpecify the language code when running the script in headless mode.\n## \ud83d\ude80 Usage\n\n### \ud83d\udda5\ufe0f Launching Gradio Web Interface\n\n1. **Run the command**:\n ```bash\n ebook2audiobook\n ```\n\n2. **Open the Web App**: Click the URL provided in the terminal to access the web app and convert eBooks.\n3. 
**For Public Link**: Add `--share True` to the end of it like this: `ebook2audiobook --share True`\n- **[For More Parameters]**: use the `-h` parameter like this `ebook2audiobook -h`\n\n### \ud83d\udcdd Basic Headless Usage\n\n```bash\nebook2audiobook --headless True --ebook <path_to_ebook_file> --voice [path_to_voice_file] --language [language_code]\n```\n\n- **<path_to_ebook_file>**: Path to your eBook file.\n- **[path_to_voice_file]**: Optional for voice cloning.\n- **[language_code]**: Optional to specify language.\n- **[For More Parameters]**: use the `-h` parameter like this `ebook2audiobook -h`\n\n### \ud83e\udde9 Headless Custom XTTS Model Usage\n\n```bash\nebook2audiobook --headless True --use_custom_model True --ebook <ebook_file_path> --voice <target_voice_file_path> --language <language> --custom_model <custom_model_path> --custom_config <custom_config_path> --custom_vocab <custom_vocab_path>\n```\n\n- **<ebook_file_path>**: Path to your eBook file.\n- **<target_voice_file_path>**: Optional for voice cloning.\n- **<language>**: Optional to specify language.\n- **<custom_model_path>**: Path to `model.pth`.\n- **<custom_config_path>**: Path to `config.json`.\n- **<custom_vocab_path>**: Path to `vocab.json`.\n- **[For More Parameters]**: use the `-h` parameter like this `ebook2audiobook -h`\n\n\n### \ud83e\udde9 Headless Custom XTTS Model Usage With Zip link to XTTS Fine-Tune Model \ud83c\udf10\n\n```bash\nebook2audiobook --headless True --use_custom_model True --ebook <ebook_file_path> --voice <target_voice_file_path> --language <language> --custom_model_url <custom_model_URL_ZIP_path>\n```\n\n- **<ebook_file_path>**: Path to your eBook file.\n- **<target_voice_file_path>**: Optional for voice cloning.\n- **<language>**: Optional to specify language.\n- **<custom_model_URL_ZIP_path>**: URL Path to zip of Model folder. 
For Example this for the [xtts_David_Attenborough_fine_tune](https://huggingface.co/drewThomasson/xtts_David_Attenborough_fine_tune/tree/main) `https://huggingface.co/drewThomasson/xtts_David_Attenborough_fine_tune/resolve/main/Finished_model_files.zip?download=true`\n- For a custom model a ref audio clip of the voice will also be needed:\n[ref audio clip of David Attenborough](https://huggingface.co/drewThomasson/xtts_David_Attenborough_fine_tune/blob/main/ref.wav)\n- **[For More Parameters]**: use the `-h` parameter like this `ebook2audiobook -h`\n\n### \ud83d\udd0d For Detailed Guide with list of all Parameters to use\n```bash\nebook2audiobook -h\n```\n- This will output the following:\n```bash\nusage: app.py [-h] [--share SHARE] [--headless HEADLESS] [--ebook EBOOK] [--voice VOICE]\n [--language LANGUAGE] [--use_custom_model USE_CUSTOM_MODEL]\n [--custom_model CUSTOM_MODEL] [--custom_config CUSTOM_CONFIG]\n [--custom_vocab CUSTOM_VOCAB] [--custom_model_url CUSTOM_MODEL_URL]\n [--temperature TEMPERATURE] [--length_penalty LENGTH_PENALTY]\n [--repetition_penalty REPETITION_PENALTY] [--top_k TOP_K] [--top_p TOP_P]\n [--speed SPEED] [--enable_text_splitting ENABLE_TEXT_SPLITTING]\n\nConvert eBooks to Audiobooks using a Text-to-Speech model. You can either launch the\nGradio interface or run the script in headless mode for direct conversion.\n\noptions:\n -h, --help show this help message and exit\n --share SHARE Set to True to enable a public shareable Gradio link. Defaults\n to False.\n --headless HEADLESS Set to True to run in headless mode without the Gradio\n interface. Defaults to False.\n --ebook EBOOK Path to the ebook file for conversion. Required in headless\n mode.\n --voice VOICE Path to the target voice file for TTS. Optional, uses a default\n voice if not provided.\n --language LANGUAGE Language for the audiobook conversion. Options: en, es, fr, de,\n it, pt, pl, tr, ru, nl, cs, ar, zh-cn, ja, hu, ko. 
Defaults to\n English (en).\n --use_custom_model USE_CUSTOM_MODEL\n Set to True to use a custom TTS model. Defaults to False. Must\n be True to use custom models, otherwise you'll get an error.\n --custom_model CUSTOM_MODEL\n Path to the custom model file (.pth). Required if using a custom\n model.\n --custom_config CUSTOM_CONFIG\n Path to the custom config file (config.json). Required if using\n a custom model.\n --custom_vocab CUSTOM_VOCAB\n Path to the custom vocab file (vocab.json). Required if using a\n custom model.\n --custom_model_url CUSTOM_MODEL_URL\n URL to download the custom model as a zip file. Optional, but\n will be used if provided. Examples include David Attenborough's\n model: 'https://huggingface.co/drewThomasson/xtts_David_Attenbor\n ough_fine_tune/resolve/main/Finished_model_files.zip?download=tr\n ue'. More XTTS fine-tunes can be found on my Hugging Face at\n 'https://huggingface.co/drewThomasson'.\n --temperature TEMPERATURE\n Temperature for the model. Defaults to 0.65. Higher Tempatures\n will lead to more creative outputs IE: more Hallucinations.\n Lower Tempatures will be more monotone outputs IE: less\n Hallucinations.\n --length_penalty LENGTH_PENALTY\n A length penalty applied to the autoregressive decoder. Defaults\n to 1.0. Not applied to custom models.\n --repetition_penalty REPETITION_PENALTY\n A penalty that prevents the autoregressive decoder from\n repeating itself. Defaults to 2.0.\n --top_k TOP_K Top-k sampling. Lower values mean more likely outputs and\n increased audio generation speed. Defaults to 50.\n --top_p TOP_P Top-p sampling. Lower values mean more likely outputs and\n increased audio generation speed. Defaults to 0.8.\n --speed SPEED Speed factor for the speech generation. IE: How fast the\n Narrerator will speak. Defaults to 1.0.\n --enable_text_splitting ENABLE_TEXT_SPLITTING\n Enable splitting text into sentences. 
Defaults to True.\n\nExample: python script.py --headless --ebook path_to_ebook --voice path_to_voice\n--language en --use_custom_model True --custom_model model.pth --custom_config\nconfig.json --custom_vocab vocab.json\n```\n\n\n<details>\n <summary>\u26a0\ufe0f Legacy-Depricated Old Use Instructions</summary>\n \n## \ud83d\ude80 Usage\n\n## Legacy files have been moved to `ebook2audiobookXTTS/legacy/`\n\n### \ud83d\udda5\ufe0f Gradio Web Interface\n\n1. **Run the Script**:\n ```bash\n python custom_model_ebook2audiobookXTTS_gradio.py\n ```\n\n2. **Open the Web App**: Click the URL provided in the terminal to access the web app and convert eBooks.\n\n### \ud83d\udcdd Basic Usage\n\n```bash\npython ebook2audiobook.py <path_to_ebook_file> [path_to_voice_file] [language_code]\n```\n\n- **<path_to_ebook_file>**: Path to your eBook file.\n- **[path_to_voice_file]**: Optional for voice cloning.\n- **[language_code]**: Optional to specify language.\n\n### \ud83e\udde9 Custom XTTS Model\n\n```bash\npython custom_model_ebook2audiobookXTTS.py <ebook_file_path> <target_voice_file_path> <language> <custom_model_path> <custom_config_path> <custom_vocab_path>\n```\n\n- **<ebook_file_path>**: Path to your eBook file.\n- **<target_voice_file_path>**: Optional for voice cloning.\n- **<language>**: Optional to specify language.\n- **<custom_model_path>**: Path to `model.pth`.\n- **<custom_config_path>**: Path to `config.json`.\n- **<custom_vocab_path>**: Path to `vocab.json`.\n</details>\n\n### \ud83d\udc33 Using Docker\n\nYou can also use Docker to run the eBook to Audiobook converter. 
This method ensures consistency across different environments and simplifies setup.\n\n#### \ud83d\ude80 Running the Docker Container\n\nTo run the Docker container and start the Gradio interface, use the following command:\n\n -Run with CPU only\n```powershell\ndocker run -it --rm -p 7860:7860 --platform=linux/amd64 athomasson2/ebook2audiobookxtts:huggingface python app.py\n```\n -Run with GPU Speedup (Nvida graphics cards only)\n```powershell\ndocker run -it --rm --gpus all -p 7860:7860 --platform=linux/amd64 athomasson2/ebook2audiobookxtts:huggingface python app.py\n```\n\nThis command will start the Gradio interface on port 7860.(localhost:7860)\n- For more options like running the docker in headless mode or making the gradio link public add the `-h` parameter after the `app.py` in the docker launch command\n<details>\n <summary><strong>Example of using docker in headless mode or modifying anything with the extra parameters + Full guide</strong></summary>\n \n## Example of using docker in headless mode\n\nfirst for a docker pull of the latest with\n```bash \ndocker pull athomasson2/ebook2audiobookxtts:huggingface\n```\n\n- Before you do run this you need to create a dir named \"input-folder\" in your current dir which will be linked, This is where you can put your input files for the docker image to see\n```bash\nmkdir input-folder && mkdir Audiobooks\n```\n\n- In the command below swap out **YOUR_INPUT_FILE.TXT** with the name of your input file \n\n```bash\ndocker run -it --rm \\\n -v $(pwd)/input-folder:/home/user/app/input_folder \\\n -v $(pwd)/Audiobooks:/home/user/app/Audiobooks \\\n --platform linux/amd64 \\\n athomasson2/ebook2audiobookxtts:huggingface \\\n python app.py --headless True --ebook /home/user/app/input_folder/YOUR_INPUT_FILE.TXT\n```\n\n- And that should be it! 
\n\n- The output audiobooks will be in the `Audiobooks` folder, located in the directory where you ran this docker command.\n\n\n## To see the help text for the program's other parameters, run this\n\n```bash\ndocker run -it --rm \\\n --platform linux/amd64 \\\n athomasson2/ebook2audiobookxtts:huggingface \\\n python app.py -h\n\n```\n\n\nThat will output the following:\n\n```bash\nuser/app/ebook2audiobookXTTS/input-folder -v $(pwd)/Audiobooks:/home/user/app/ebook2audiobookXTTS/Audiobooks --memory=\"4g\" --network none --platform linux/amd64 athomasson2/ebook2audiobookxtts:huggingface python app.py -h\nstarting...\nusage: app.py [-h] [--share SHARE] [--headless HEADLESS] [--ebook EBOOK] [--voice VOICE]\n [--language LANGUAGE] [--use_custom_model USE_CUSTOM_MODEL]\n [--custom_model CUSTOM_MODEL] [--custom_config CUSTOM_CONFIG]\n [--custom_vocab CUSTOM_VOCAB] [--custom_model_url CUSTOM_MODEL_URL]\n [--temperature TEMPERATURE] [--length_penalty LENGTH_PENALTY]\n [--repetition_penalty REPETITION_PENALTY] [--top_k TOP_K] [--top_p TOP_P]\n [--speed SPEED] [--enable_text_splitting ENABLE_TEXT_SPLITTING]\n\nConvert eBooks to Audiobooks using a Text-to-Speech model. You can either launch the\nGradio interface or run the script in headless mode for direct conversion.\n\noptions:\n -h, --help show this help message and exit\n --share SHARE Set to True to enable a public shareable Gradio link. Defaults\n to False.\n --headless HEADLESS Set to True to run in headless mode without the Gradio\n interface. Defaults to False.\n --ebook EBOOK Path to the ebook file for conversion. Required in headless\n mode.\n --voice VOICE Path to the target voice file for TTS. Optional, uses a default\n voice if not provided.\n --language LANGUAGE Language for the audiobook conversion. Options: en, es, fr, de,\n it, pt, pl, tr, ru, nl, cs, ar, zh-cn, ja, hu, ko. 
Defaults to\n English (en).\n --use_custom_model USE_CUSTOM_MODEL\n Set to True to use a custom TTS model. Defaults to False. Must\n be True to use custom models, otherwise you'll get an error.\n --custom_model CUSTOM_MODEL\n Path to the custom model file (.pth). Required if using a custom\n model.\n --custom_config CUSTOM_CONFIG\n Path to the custom config file (config.json). Required if using\n a custom model.\n --custom_vocab CUSTOM_VOCAB\n Path to the custom vocab file (vocab.json). Required if using a\n custom model.\n --custom_model_url CUSTOM_MODEL_URL\n URL to download the custom model as a zip file. Optional, but\n will be used if provided. Examples include David Attenborough's\n model: 'https://huggingface.co/drewThomasson/xtts_David_Attenbor\n ough_fine_tune/resolve/main/Finished_model_files.zip?download=tr\n ue'. More XTTS fine-tunes can be found on my Hugging Face at\n 'https://huggingface.co/drewThomasson'.\n --temperature TEMPERATURE\n Temperature for the model. Defaults to 0.65. Higher Tempatures\n will lead to more creative outputs IE: more Hallucinations.\n Lower Tempatures will be more monotone outputs IE: less\n Hallucinations.\n --length_penalty LENGTH_PENALTY\n A length penalty applied to the autoregressive decoder. Defaults\n to 1.0. Not applied to custom models.\n --repetition_penalty REPETITION_PENALTY\n A penalty that prevents the autoregressive decoder from\n repeating itself. Defaults to 2.0.\n --top_k TOP_K Top-k sampling. Lower values mean more likely outputs and\n increased audio generation speed. Defaults to 50.\n --top_p TOP_P Top-p sampling. Lower values mean more likely outputs and\n increased audio generation speed. Defaults to 0.8.\n --speed SPEED Speed factor for the speech generation. IE: How fast the\n Narrerator will speak. Defaults to 1.0.\n --enable_text_splitting ENABLE_TEXT_SPLITTING\n Enable splitting text into sentences. 
Defaults to True.\n\nExample: python script.py --headless --ebook path_to_ebook --voice path_to_voice\n--language en --use_custom_model True --custom_model model.pth --custom_config\nconfig.json --custom_vocab vocab.json\n```\n</details>\n\n#### \ud83d\udda5\ufe0f Docker GUI\n![demo_web_gui](https://github.com/user-attachments/assets/85af88a7-05dd-4a29-91de-76a14cf5ef06)\n\n<details>\n <summary>Click to see images of Web GUI</summary>\n<img width=\"1728\" alt=\"image\" src=\"https://github.com/user-attachments/assets/b36c71cf-8e06-484c-a252-934e6b1d0c2f\">\n<img width=\"1728\" alt=\"image\" src=\"https://github.com/user-attachments/assets/c0dab57a-d2d4-4658-bff9-3842ec90cb40\">\n<img width=\"1728\" alt=\"image\" src=\"https://github.com/user-attachments/assets/0a99eeac-c521-4b21-8656-e064c1adc528\">\n</details>\n\n### \ud83d\udee0\ufe0f For Custom XTTS Models\n\nThese are models fine-tuned to be better at a specific voice. Check out my Hugging Face page [here](https://huggingface.co/drewThomasson).\n\nTo use a custom model, paste the link to its `Finished_model_files.zip` file, like this:\n\n[David Attenborough fine tuned Finished_model_files.zip](https://huggingface.co/drewThomasson/xtts_David_Attenborough_fine_tune/resolve/main/Finished_model_files.zip?download=true)\n\nA custom model also needs a reference audio clip of the voice:\n[ref audio clip of David Attenborough](https://huggingface.co/drewThomasson/xtts_David_Attenborough_fine_tune/blob/main/ref.wav)\n\n\n\nMore details can be found on the [Docker Hub page](https://hub.docker.com/repository/docker/athomasson2/ebook2audiobookxtts/general).\n\n## \ud83c\udf10 Fine-Tuned XTTS Models\n\nTo find already fine-tuned XTTS models, visit [this Hugging Face link](https://huggingface.co/drewThomasson) \ud83c\udf10. 
Search for models that include \"xtts fine tune\" in their names.\n\n## \ud83c\udfa5 Demos\n\nRainy day voice\n\nhttps://github.com/user-attachments/assets/8486603c-38b1-43ce-9639-73757dfb1031\n\nDavid Attenborough voice\n\nhttps://github.com/user-attachments/assets/47c846a7-9e51-4eb9-844a-7460402a20a8\n\n\n## \ud83e\udd17 [Huggingface space demo](https://huggingface.co/spaces/drewThomasson/ebook2audiobookXTTS)\n- The Huggingface space runs on the free CPU tier, so expect it to be very slow or to time out; just don't give it giant files.\n- Best to duplicate the space or run locally.\n\n## Free Google Colab [![Free Google Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/DrewThomasson/ebook2audiobookXTTS/blob/main/Notebooks/colab_ebook2audiobookxtts.ipynb)\n\n\n\n## \ud83d\udcda Supported eBook Formats\n\n- `.epub`, `.pdf`, `.mobi`, `.txt`, `.html`, `.rtf`, `.chm`, `.lit`, `.pdb`, `.fb2`, `.odt`, `.cbr`, `.cbz`, `.prc`, `.lrf`, `.pml`, `.snb`, `.cbc`, `.rb`, `.tcr`\n- **Best results**: `.epub` or `.mobi` for automatic chapter detection\n\n## \ud83d\udcc2 Output\n\n- Creates an `.m4b` file with metadata and chapters.\n- **Example Output**: ![Example](https://github.com/DrewThomasson/VoxNovel/blob/dc5197dff97252fa44c391dc0596902d71278a88/readme_files/example_in_app.jpeg)\n\n## \ud83d\udee0\ufe0f Common Issues:\n- \"It's slow!\" - On CPU only this is very slow, and you can only get speedups through an NVIDIA GPU. 
[Discussion about this](https://github.com/DrewThomasson/ebook2audiobookXTTS/discussions/19#discussioncomment-10879846). For faster multilingual generation, I would suggest my other [project that uses piper-tts](https://github.com/DrewThomasson/ebook2audiobookpiper-tts) instead. (It doesn't have zero-shot voice cloning and its voices are Siri quality, but it is much faster on CPU.)\n- \"I'm having dependency issues\" - Just use the Docker image; it's fully self-contained and has a headless mode. Add the `-h` parameter after `app.py` in the docker run command for more information.\n- \"I'm getting a truncated audio issue!\" - PLEASE MAKE AN ISSUE OF THIS. I don't speak every language, and I need advice from each person to fine-tune my sentence-splitting function for other languages.\ud83d\ude0a\n- \"The loading bar is stuck at 30% in the web GUI!\" - The web GUI loading bar is extremely basic, as it's just split between the three loading steps. Refer to the terminal and the sentence it's on for a more accurate gauge of progress.\n\n## What I need help with! \ud83d\ude4c \n## [Full list of things can be found here](https://github.com/DrewThomasson/ebook2audiobookXTTS/issues/32)\n- Any help from people speaking any of the supported languages with proper sentence-splitting methods\n- Potentially creating README guides for multiple languages (because the only language I know is English \ud83d\ude14)\n\n## \ud83d\ude4f Special Thanks\n\n- **Coqui TTS**: [Coqui TTS GitHub](https://github.com/coqui-ai/TTS)\n- **Calibre**: [Calibre Website](https://calibre-ebook.com)\n\n- [@shakenbake15 for better chapter saving method](https://github.com/DrewThomasson/ebook2audiobookXTTS/issues/8) \n\n",
"bugtrack_url": null,
"license": null,
"summary": "Convert eBooks to Audiobooks using a Text-to-Speech model with optional Gradio interface.",
"version": "0.0.8",
"project_urls": {
"Homepage": "https://github.com/DrewThomasson/ebook2audiobookXTTS"
},
"split_keywords": [],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "7087e23423141fe4afe4832e4c99e1417a7d57294cc354262e1dca0c94556b3b",
"md5": "b5c539e57bb3b171b6d6cfdbc5846c8c",
"sha256": "5039ccb0f363d4ea51b0b31393439890d5c6ca4d2580b54a7eb1b4223a01c6e3"
},
"downloads": -1,
"filename": "ebook2audiobook-0.0.8-py3-none-any.whl",
"has_sig": false,
"md5_digest": "b5c539e57bb3b171b6d6cfdbc5846c8c",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": "<3.13,>=3.7",
"size": 287842,
"upload_time": "2024-11-07T04:25:57",
"upload_time_iso_8601": "2024-11-07T04:25:57.699658Z",
"url": "https://files.pythonhosted.org/packages/70/87/e23423141fe4afe4832e4c99e1417a7d57294cc354262e1dca0c94556b3b/ebook2audiobook-0.0.8-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "a16592b230ded658916c14d0b062ecddc65a4b50ea0ebc2ffb4ee7d7840cb8b3",
"md5": "d3640880af07303e76961061c3322d88",
"sha256": "4e1d92235a3d500ef3077aa53c38130c7daff2dd6e913d0a56c57f1bcb439068"
},
"downloads": -1,
"filename": "ebook2audiobook-0.0.8.tar.gz",
"has_sig": false,
"md5_digest": "d3640880af07303e76961061c3322d88",
"packagetype": "sdist",
"python_version": "source",
"requires_python": "<3.13,>=3.7",
"size": 8911005,
"upload_time": "2024-11-07T04:25:59",
"upload_time_iso_8601": "2024-11-07T04:25:59.642016Z",
"url": "https://files.pythonhosted.org/packages/a1/65/92b230ded658916c14d0b062ecddc65a4b50ea0ebc2ffb4ee7d7840cb8b3/ebook2audiobook-0.0.8.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2024-11-07 04:25:59",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "DrewThomasson",
"github_project": "ebook2audiobookXTTS",
"travis_ci": false,
"coveralls": false,
"github_actions": false,
"lcname": "ebook2audiobook"
}