This is a re-implementation of [F5-TTS](https://github.com/SWivid/F5-TTS) aimed at reducing dependencies, increasing speed, reducing model size and improving usability.
# Installation
Fairytaler assumes you have a working CUDA environment to install into.
```sh
pip install fairytaler
```
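To confirm that PyTorch (which `pip install fairytaler` pulls in) can see a CUDA device, you can run a quick check. This is a minimal sketch, not part of Fairytaler. It calls `torch.cuda.is_available()` only if PyTorch is importable, and the helper name `cuda_ready` is hypothetical.

```python
import importlib.util


def cuda_ready() -> bool:
    """Return True if PyTorch is importable and reports a usable CUDA device."""
    # Avoid an ImportError when PyTorch has not been installed yet.
    if importlib.util.find_spec("torch") is None:
        return False
    import torch

    return torch.cuda.is_available()


if __name__ == "__main__":
    print("CUDA available:", cuda_ready())
```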
# How to Use
You do not need to pre-download anything; the necessary data is downloaded at runtime. Weights are fetched from [HuggingFace](https://huggingface.co/benjamin-paine/fairytaler).
## Command Line
Use the `fairytaler` binary from the command line like so:
```sh
fairytaler examples/reference.wav examples/reference.txt "Hello, this is some test audio!"
```
Many options are available; for complete documentation, run `fairytaler --help`.
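If you want to drive the CLI from a Python script, one option is to call it through `subprocess`. This sketch uses only the three positional arguments shown above (reference audio, reference transcript file, text to speak); where the output is written depends on the CLI's defaults and options, so check `fairytaler --help` rather than assuming any flags. The helper names `build_command` and `run_fairytaler` are hypothetical.

```python
import shutil
import subprocess
from typing import List


def build_command(reference_audio: str, reference_text_file: str, text: str) -> List[str]:
    """Build the argument list for the positional form shown above."""
    return ["fairytaler", reference_audio, reference_text_file, text]


def run_fairytaler(reference_audio: str, reference_text_file: str, text: str) -> int:
    """Run the CLI and return its exit code; raise if it is not on PATH."""
    if shutil.which("fairytaler") is None:
        raise FileNotFoundError("`fairytaler` is not on PATH; run `pip install fairytaler` first")
    cmd = build_command(reference_audio, reference_text_file, text)
    # Passing a list (not a shell string) avoids quoting problems with the text.
    return subprocess.run(cmd, check=False).returncode


if __name__ == "__main__":
    run_fairytaler("examples/reference.wav", "examples/reference.txt", "Hello, this is some test audio!")
```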
## Python
```py
from fairytaler import F5TTSPipeline

pipeline = F5TTSPipeline.from_pretrained(
    "benjamin-paine/fairytaler",
    variant="fp16",  # Omit for float32
    device="auto"
)
output_wav_file = pipeline(
    text="Hello, this is some test audio!",
    reference_audio="examples/reference.wav",
    reference_text="examples/reference.txt",
    output_save=True
)
print(f"Output saved to {output_wav_file}")
```
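The example passes a file path as `reference_text`, while the `__call__` signature in this README types that argument as a plain `str`, so the documentation alone does not say whether the pipeline reads the file for you. If you would rather pass the transcript itself, a small helper like the hypothetical `load_reference_text` below handles both cases; it is not part of Fairytaler.

```python
from pathlib import Path


def load_reference_text(path_or_text: str) -> str:
    """Return the file's contents if `path_or_text` names a file, else the string itself."""
    try:
        is_file = Path(path_or_text).is_file()
    except OSError:
        # A long transcript string can exceed the OS file-name length limit.
        is_file = False
    if is_file:
        # Transcripts are plain text; strip the trailing newline editors usually add.
        return Path(path_or_text).read_text(encoding="utf-8").strip()
    return path_or_text.strip()
```

You would then call the pipeline with `reference_text=load_reference_text("examples/reference.txt")`.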
The full execution signature is:
```py
def __call__(
    self,
    text: Union[str, List[str]],
    reference_audio: AudioType,
    reference_text: str,
    reference_sample_rate: Optional[int]=None,
    seed: SeedType=None,
    speed: float=1.0,
    sway_sampling_coef: float=-1.0,
    target_rms: float=0.1,
    cross_fade_duration: float=0.15,
    punctuation_pause_duration: float=0.10,
    num_steps: int=32,
    cfg_strength: float=2.0,
    fix_duration: Optional[float]=None,
    use_tqdm: bool=False,
    output_format: AUDIO_OUTPUT_FORMAT_LITERAL="wav",
    output_save: bool=False,
) -> AudioResultType
```
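The signature does not explain `num_steps`, `sway_sampling_coef`, `cfg_strength`, `cross_fade_duration`, or `target_rms`. In upstream F5-TTS, the ODE is solved on `num_steps + 1` time points warped by sway sampling, `t + s * (cos(pi/2 * t) - 1 + t)`; the model output is guided as `pred + (pred - null_pred) * cfg_strength`; consecutive generated chunks are joined with a linear cross-fade; and reference audio quieter than `target_rms` is scaled up (with the inverse scaling applied to the generated audio). Assuming Fairytaler keeps those semantics, which you should verify against its source, the stdlib sketch below illustrates each one. All function names are hypothetical, and the 24 kHz default is the sample rate upstream F5-TTS uses.

```python
import math
from typing import List, Sequence


def sway_time_steps(num_steps: int = 32, sway_sampling_coef: float = -1.0) -> List[float]:
    """Return num_steps + 1 ODE time points in [0, 1] after sway sampling.

    Sway sampling maps each uniform t to t + s * (cos(pi / 2 * t) - 1 + t).
    With s = -1 this reduces to 1 - cos(pi / 2 * t), which packs steps near t = 0.
    """
    uniform = [i / num_steps for i in range(num_steps + 1)]
    return [t + sway_sampling_coef * (math.cos(math.pi / 2 * t) - 1 + t) for t in uniform]


def apply_cfg(cond: Sequence[float], uncond: Sequence[float], cfg_strength: float = 2.0) -> List[float]:
    """Classifier-free guidance: extrapolate away from the unconditional prediction."""
    return [c + (c - u) * cfg_strength for c, u in zip(cond, uncond)]


def cross_fade(prev: Sequence[float], nxt: Sequence[float],
               cross_fade_duration: float = 0.15, sample_rate: int = 24000) -> List[float]:
    """Join two audio chunks with a linear fade-out / fade-in over the overlap."""
    n = min(int(cross_fade_duration * sample_rate), len(prev), len(nxt))
    if n <= 0:
        return list(prev) + list(nxt)
    denom = max(n - 1, 1)  # matches linspace(1, 0, n) and linspace(0, 1, n)
    overlap = [
        prev[len(prev) - n + i] * (1 - i / denom) + nxt[i] * (i / denom)
        for i in range(n)
    ]
    return list(prev[: len(prev) - n]) + overlap + list(nxt[n:])


def normalize_reference(audio: Sequence[float], target_rms: float = 0.1) -> List[float]:
    """Scale quiet reference audio up to target_rms; louder audio is left unchanged."""
    rms = math.sqrt(sum(x * x for x in audio) / len(audio))
    # The rms > 0 guard (an addition here) avoids dividing by zero on silence.
    if 0 < rms < target_rms:
        return [x * target_rms / rms for x in audio]
    return list(audio)
```

For example, `sway_time_steps(32)` starts at 0, ends at (approximately) 1, and bunches its points toward the start, so a negative coefficient spends more of the step budget early in the flow.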
Valid `output_format` values are `wav`, `ogg`, `flac`, `mp3`, `float`, and `int`. Passing `output_save=True` saves the result to a file; otherwise the audio data is returned directly.
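The README does not specify how `float` and `int` data are laid out. A common convention, and only an assumption here, is float samples in [-1, 1] and signed 16-bit PCM for integers. Under that assumption, this stdlib-only sketch converts float samples to 16-bit PCM and writes a mono WAV, which can be handy if you request `output_format="float"` to post-process audio before saving it yourself. The function names and the 24 kHz default (the rate upstream F5-TTS uses) are assumptions, not Fairytaler API.

```python
import struct
import wave
from typing import Iterable


def float_to_pcm16(samples: Iterable[float]) -> bytes:
    """Clip floats to [-1, 1] and pack them as little-endian signed 16-bit integers."""
    ints = [int(round(max(-1.0, min(1.0, float(s))) * 32767)) for s in samples]
    return struct.pack(f"<{len(ints)}h", *ints)


def write_wav(path: str, samples: Iterable[float], sample_rate: int = 24000) -> None:
    """Write a mono 16-bit PCM WAV file using only the standard library."""
    frames = float_to_pcm16(samples)
    with wave.open(path, "wb") as wav_file:
        wav_file.setnchannels(1)
        wav_file.setsampwidth(2)  # 2 bytes = 16 bits per sample
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(frames)
```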
# Citation
```bibtex
@misc{chen2024f5ttsfairytalerfakesfluent,
    title={F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching},
    author={Yushen Chen and Zhikang Niu and Ziyang Ma and Keqi Deng and Chunhui Wang and Jian Zhao and Kai Yu and Xie Chen},
    year={2024},
    eprint={2410.06885},
    archivePrefix={arXiv},
    primaryClass={eess.AS},
    url={https://arxiv.org/abs/2410.06885},
}
```
Raw data
{
"_id": null,
"home_page": "https://github.com/painebenjamin/fairytaler",
"name": "fairytaler",
"maintainer": null,
"docs_url": null,
"requires_python": ">=3.8.0",
"maintainer_email": null,
"keywords": null,
"author": "Benjamin Paine",
"author_email": "painebenjamin@gmail.com",
"download_url": "https://files.pythonhosted.org/packages/a3/f4/4d28f61a5ad24f136193ca9a94dcd94e451a4d70e9affc93164f594deb03/fairytaler-0.1.1.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": "cc-by-nc-4.0",
"summary": "An unofficial reimplementation of F5TTS",
"version": "0.1.1",
"project_urls": {
"Homepage": "https://github.com/painebenjamin/fairytaler"
},
"split_keywords": [],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "a3f44d28f61a5ad24f136193ca9a94dcd94e451a4d70e9affc93164f594deb03",
"md5": "86fde4a2dcce5d0873ee11dbbb151717",
"sha256": "48bfead1770f6b6f3de2b6fe6dff516f5d7d163461faecc4f3ec8fe4c6d2f142"
},
"downloads": -1,
"filename": "fairytaler-0.1.1.tar.gz",
"has_sig": false,
"md5_digest": "86fde4a2dcce5d0873ee11dbbb151717",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.8.0",
"size": 39845,
"upload_time": "2024-12-09T03:38:00",
"upload_time_iso_8601": "2024-12-09T03:38:00.865585Z",
"url": "https://files.pythonhosted.org/packages/a3/f4/4d28f61a5ad24f136193ca9a94dcd94e451a4d70e9affc93164f594deb03/fairytaler-0.1.1.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2024-12-09 03:38:00",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "painebenjamin",
"github_project": "fairytaler",
"travis_ci": false,
"coveralls": false,
"github_actions": false,
"requirements": [
{
"name": "accelerate",
"specs": [
[
"~=",
"1.0"
]
]
},
{
"name": "einops",
"specs": [
[
">=",
"0.8"
]
]
},
{
"name": "huggingface-hub",
"specs": [
[
"~=",
"0.26"
]
]
},
{
"name": "librosa",
"specs": [
[
">=",
"0.10"
]
]
},
{
"name": "numpy",
"specs": [
[
"~=",
"1.22"
]
]
},
{
"name": "pillow",
"specs": [
[
"~=",
"9.5"
]
]
},
{
"name": "safetensors",
"specs": [
[
"~=",
"0.4"
]
]
},
{
"name": "scipy",
"specs": [
[
">=",
"1.11"
]
]
},
{
"name": "scikit-learn",
"specs": [
[
"~=",
"1.5"
]
]
},
{
"name": "torch",
"specs": [
[
">=",
"2.4"
]
]
},
{
"name": "torchaudio",
"specs": [
[
">=",
"2.4"
]
]
},
{
"name": "torchdiffeq",
"specs": [
[
"~=",
"0.2"
]
]
},
{
"name": "torchvision",
"specs": [
[
">=",
"0.19"
]
]
},
{
"name": "transformers",
"specs": [
[
">=",
"4.41"
]
]
},
{
"name": "vocos",
"specs": [
[
"~=",
"0.1"
]
]
}
],
"lcname": "fairytaler"
}