![F5 TTS diagram](f5tts.jpg)
# F5 TTS — MLX
Implementation of [F5-TTS](https://arxiv.org/abs/2410.06885), with the [MLX](https://github.com/ml-explore/mlx) framework.
F5 TTS is a non-autoregressive, zero-shot text-to-speech system using a flow-matching mel spectrogram generator with a diffusion transformer (DiT).
You can listen to a [sample here](https://s3.amazonaws.com/lucasnewman.datasets/f5tts/sample.wav) that was generated in ~11 seconds on an M3 Max MacBook Pro.
F5 is an evolution of [E2 TTS](https://arxiv.org/abs/2406.18009v2) and improves performance with ConvNeXt V2 blocks for the learned text alignment. This repository is based on the [original PyTorch implementation](https://github.com/SWivid/F5-TTS).
## Installation
```bash
pip install f5-tts-mlx
```
## Usage
```bash
python -m f5_tts_mlx.generate --text "The quick brown fox jumped over the lazy dog."
```
If you want to use your own reference audio sample, make sure it's a mono, 24 kHz WAV file of around 5-10 seconds:
```bash
python -m f5_tts_mlx.generate \
  --text "The quick brown fox jumped over the lazy dog." \
  --ref-audio /path/to/audio.wav \
  --ref-text "This is the caption for the reference audio."
```
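If you need to run the command above from a script, for example once per sentence, one option is to assemble the argument list in Python. The sketch below is only an illustration: `build_generate_command` is a hypothetical helper, not part of the package, and it uses only the `--text`, `--ref-audio`, and `--ref-text` flags shown above.

```python
import shlex
import sys


def build_generate_command(text, ref_audio=None, ref_text=None):
    """Build the argument list for `python -m f5_tts_mlx.generate`.

    Hypothetical helper for illustration; it only uses the flags
    documented in this README.
    """
    cmd = [sys.executable, "-m", "f5_tts_mlx.generate", "--text", text]
    if ref_audio is not None:
        cmd += ["--ref-audio", str(ref_audio)]
    if ref_text is not None:
        cmd += ["--ref-text", ref_text]
    return cmd


cmd = build_generate_command(
    "The quick brown fox jumped over the lazy dog.",
    ref_audio="/path/to/audio.wav",
    ref_text="This is the caption for the reference audio.",
)
# Print a shell-quoted version of the command for inspection.
print(shlex.join(cmd))
```

Passing the list to `subprocess.run(cmd, check=True)` runs the generation. Using a list rather than a single string avoids shell quoting problems when the text contains quotes or other special characters.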
You can convert an audio file to the correct format with ffmpeg like this:
```bash
ffmpeg -i /path/to/audio.wav -ac 1 -ar 24000 -sample_fmt s16 -t 10 /path/to/output_audio.wav
```
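To confirm that a converted file meets the requirements above (mono, 24 kHz, roughly 5-10 seconds), you can inspect its header with Python's standard `wave` module. This is a minimal sketch: `check_reference_wav` is a hypothetical helper, not part of the package, and it assumes an uncompressed PCM WAV file such as the one the ffmpeg command produces.

```python
import wave


def check_reference_wav(path, expected_rate=24000, min_seconds=5.0, max_seconds=10.0):
    """Return a list of problems; an empty list means the file looks suitable.

    Hypothetical helper for illustration. The default thresholds mirror
    the guidance in this README (mono, 24 kHz, about 5-10 seconds).
    """
    with wave.open(path, "rb") as wav:
        channels = wav.getnchannels()
        rate = wav.getframerate()
        duration = wav.getnframes() / rate

    problems = []
    if channels != 1:
        problems.append(f"expected mono audio, found {channels} channels")
    if rate != expected_rate:
        problems.append(f"expected {expected_rate} Hz, found {rate} Hz")
    if not (min_seconds <= duration <= max_seconds):
        problems.append(f"duration {duration:.1f}s is outside {min_seconds}-{max_seconds}s")
    return problems
```

Calling `check_reference_wav("/path/to/output_audio.wav")` returns an empty list when the file matches all three checks, or a list of human-readable messages otherwise.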
See [here](./f5_tts_mlx) for more options to customize generation.
You can also generate speech from Python with the pretrained model:
```python
from f5_tts_mlx.generate import generate
# Pass any further generation options as additional keyword arguments.
audio = generate(text="Hello world.")
```
Pretrained model weights are also available [on Hugging Face](https://huggingface.co/lucasnewman/f5-tts-mlx).
## Appreciation
[Yushen Chen](https://github.com/SWivid) for the original PyTorch implementation of F5 TTS and pretrained model.
[Phil Wang](https://github.com/lucidrains) for the E2 TTS implementation that this model is based on.
## Citations
```bibtex
@article{chen-etal-2024-f5tts,
title={F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching},
author={Yushen Chen and Zhikang Niu and Ziyang Ma and Keqi Deng and Chunhui Wang and Jian Zhao and Kai Yu and Xie Chen},
journal={arXiv preprint arXiv:2410.06885},
year={2024},
}
```
```bibtex
@inproceedings{Eskimez2024E2TE,
title = {E2 TTS: Embarrassingly Easy Fully Non-Autoregressive Zero-Shot TTS},
author = {Sefik Emre Eskimez and Xiaofei Wang and Manthan Thakker and Canrun Li and Chung-Hsien Tsai and Zhen Xiao and Hemin Yang and Zirun Zhu and Min Tang and Xu Tan and Yanqing Liu and Sheng Zhao and Naoyuki Kanda},
year = {2024},
url = {https://api.semanticscholar.org/CorpusID:270738197}
}
```
## License
The code in this repository is released under the MIT license as found in the
[LICENSE](LICENSE) file.