# zho-tts
[![PyPI](https://img.shields.io/pypi/v/zho-tts.svg)](https://pypi.python.org/pypi/zho-tts)
![PyPI](https://img.shields.io/pypi/pyversions/zho-tts.svg)
[![Hugging Face 🤗](https://img.shields.io/badge/%20%F0%9F%A4%97_Hugging_Face-zho--tts-blue.svg)](https://huggingface.co/spaces/stefantaubert/zho-tts)
[![pytorch](https://img.shields.io/badge/PyTorch_2.0+-ee4c2c?logo=pytorch&logoColor=white)](https://pytorch.org/get-started/pytorch-2.0/)
[![MIT](https://img.shields.io/github/license/stefantaubert/zh-tts.svg)](https://github.com/stefantaubert/zh-tts/blob/master/LICENSE)
[![PyPI](https://img.shields.io/pypi/wheel/zho-tts.svg)](https://pypi.python.org/pypi/zho-tts/#files)
![PyPI](https://img.shields.io/pypi/implementation/zho-tts.svg)
[![PyPI](https://img.shields.io/github/commits-since/stefantaubert/zh-tts/latest/master.svg)](https://github.com/stefantaubert/zh-tts/compare/v0.0.2...master)
[![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.11048515.svg)](https://doi.org/10.5281/zenodo.11048515)
Web app, command-line interface and Python library for synthesizing Chinese texts into speech.
## Installation
Requires Python 3.8 to 3.11 (`>=3.8,<3.12`).
```sh
pip install zho-tts --user
```
## Usage as web app
Visit [🤗 Hugging Face](https://huggingface.co/spaces/stefantaubert/zho-tts) for a live demo.
<a href="https://huggingface.co/spaces/stefantaubert/zho-tts">
<img src="https://github.com/stefantaubert/zh-tts/raw/master/img/hf.png" alt="Screenshot Hugging Face" style="max-width: 600px; width: 100%"/>
</a>
You can also run it locally by executing `zho-tts-web` on the command line and opening [http://127.0.0.1:7860](http://127.0.0.1:7860) in your browser.
## Usage as CLI
```sh
zho-tts-cli synthesize "长江 航务 管理局 和 长江 轮船 总公司 最近 决定 安排 一百三十三 艘 客轮 迎接 长江 干线 春运。"
```
You can listen to the output [here](https://github.com/stefantaubert/zh-tts/raw/master/examples/synthesize.wav).
```sh
# Same example using IPA input
zho-tts-cli synthesize-ipa "ʈʂː|a˧˩˧˘|ŋ|tɕ˘|j|a˥˘|ŋ˘|SIL0|x|a˧˥˘|ŋ|u˥˩|SIL0|k|w|a˧˩˧|n|l˘|i˧˩˧|tɕː|y˧˥ˑ|SIL0|x|ɤ˧˥|SIL0|ʈʂː|a˧˩˧˘|ŋ|tɕ˘|j|a˥˘|ŋ|SIL0|l|w|ə˧˥|n|ʈʂʰ˘|w|a˧˥|n|SIL0|ts˘|ʊ˧˩˧|ŋ˘|kː|ʊ˥|ŋ|s|ɹ̩˥ˑ|SIL0|ts|w˘|ei̯˥˩|tɕ|i˥˩˘|n|SIL0|tɕ|ɥ|e˧˥|t|i˥˩|ŋ|SIL3|a˥|n|pʰ|ai̯˧˥|SIL0|i˥ˑ|p|ai̯˧˩˧|s|a˥˘|n|ʂ˘|ɻ̩˧˥|s|a˥|n|SIL0|s˘|ou̯˥|SIL0|kʰˑ|ɤ˥˩|lː|wˑ|ə˧˥ˑ|n|SIL0|i˧˥ː|ŋ|tɕ˘|j˘|e˥|SIL0|ʈʂː|a˧˩˧|ŋ|tɕ˘|j|a˥˘|ŋ|SIL0|k˘|a˥˩|n|ɕ|j˘|ɛ˥˩|n˘|SIL0|ʈʂʰˑ|w˘|ə˥˘|nː|y˥˩ˑ|nː|。"
```
You can listen to the output [here](https://github.com/stefantaubert/zh-tts/raw/master/examples/synthesize-ipa.wav).
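
The two subcommands above can also be called from a script. The following is a minimal sketch, not part of zho-tts: the helper names `build_command` and `run_synthesis` are hypothetical, and only the `synthesize` and `synthesize-ipa` subcommands shown above are used. Where the CLI writes its output is not covered here.

```python
import shutil
import subprocess
from typing import List


def build_command(text: str, ipa: bool = False) -> List[str]:
    """Return the argument list for one of the two subcommands shown above."""
    subcommand = "synthesize-ipa" if ipa else "synthesize"
    return ["zho-tts-cli", subcommand, text]


def run_synthesis(text: str, ipa: bool = False) -> None:
    """Run zho-tts-cli (hypothetical wrapper; requires zho-tts to be installed)."""
    if shutil.which("zho-tts-cli") is None:
        raise RuntimeError("zho-tts-cli not found on PATH; install zho-tts first")
    # check=True raises CalledProcessError if the CLI exits with an error code
    subprocess.run(build_command(text, ipa), check=True)
```

Passing the arguments as a list bypasses the shell, so the `|` separators in IPA input need no extra quoting.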
## Usage as library
```py
from pathlib import Path
from tempfile import gettempdir

from zho_tts import Synthesizer, Transcriber, normalize_audio, save_audio

text = "长江 航务 管理局 和 长江 轮船 总公司 最近 决定 安排 一百三十三 艘 客轮 迎接 长江 干线 春运。"

transcriber = Transcriber()
synthesizer = Synthesizer()

# Convert the Chinese text to IPA, then synthesize audio from the IPA
text_ipa = transcriber.transcribe_to_ipa(text)
audio = synthesizer.synthesize(text_ipa)

# Write the result to the system's temporary directory
tmp_dir = Path(gettempdir())
save_audio(audio, tmp_dir / "output.wav")

# Optional: normalize output
normalize_audio(tmp_dir / "output.wav", tmp_dir / "output_norm.wav")
```
## Model info
The TTS model used is published [here](https://doi.org/10.5281/zenodo.10209990).
### Phoneme set
- Vowels: a ɛ e ə ɚ ɤ i o u ʊ y
- Diphthongs: ai̯ au̯ ei̯ ou̯
- Consonants: f j k kʰ l m n p pʰ ɹ̩¹ ɻ¹ ɻ̩¹ s t ts tsʰ tɕ tɕʰ tʰ w x ŋ ɕ ɥ ʂ ʈʂ ʈʂʰ
- Breaks:
  - SIL0 (no break)
  - SIL1 (short break)
  - SIL2 (break)
  - SIL3 (long break)
- Special characters: 。 ?

Vowels and diphthongs carry one of these tones:
- ˥ (first tone)
- ˧˥ (second tone)
- ˧˩˧ (third tone)
- ˥˩ (fourth tone)
- (none)

¹ These consonants can also carry tones.

Vowels, diphthongs and consonants carry one of these duration markers:
- ˘ -> very short, e.g., ou̯˘
- nothing -> normal, e.g., ou̯
- ˑ -> half long, e.g., ou̯ˑ
- ː -> long, e.g., ou̯ː

Tones and duration markers can be combined, e.g., ə˧˥ː
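
The notation above can be taken apart mechanically. The following is a minimal sketch, not part of the zho-tts API: `Symbol`, `parse_symbol` and `parse_sequence` are hypothetical names, and it assumes that symbols are separated by `|` as in the CLI example and that the tone precedes the duration marker as in `ə˧˥ː`.

```python
from typing import List, NamedTuple

TONE_LETTERS = "˥˧˩"        # tone letters that make up the four tones listed above
DURATION_MARKERS = "˘ˑː"    # very short, half long, long
BREAKS = {"SIL0", "SIL1", "SIL2", "SIL3"}
SPECIAL = {"。", "?"}


class Symbol(NamedTuple):
    base: str      # phoneme, break or special character
    tone: str      # empty string if the symbol carries no tone
    duration: str  # empty string means normal duration


def parse_symbol(symbol: str) -> Symbol:
    """Split one symbol into base, tone and duration marker."""
    if symbol in BREAKS or symbol in SPECIAL:
        return Symbol(symbol, "", "")
    base, duration = symbol, ""
    # Assumption: at most one duration marker, placed at the very end
    if base and base[-1] in DURATION_MARKERS:
        duration = base[-1]
        base = base[:-1]
    # Everything left at the end that is a tone letter forms the tone
    tone_start = len(base)
    while tone_start > 0 and base[tone_start - 1] in TONE_LETTERS:
        tone_start -= 1
    return Symbol(base[:tone_start], base[tone_start:], duration)


def parse_sequence(ipa: str) -> List[Symbol]:
    """Parse a '|'-separated symbol string."""
    return [parse_symbol(s) for s in ipa.split("|") if s]
```

For example, `parse_symbol("ə˧˥ː")` yields the base `ə`, the second tone `˧˥` and the duration marker `ː`. The sketch does not check whether the base is a member of the phoneme set.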
### Speakers
![Objective Evaluation](https://github.com/stefantaubert/zh-tts/raw/master/img/eval.png)
## Citation
If you want to cite this repo, you can use the BibTeX entry generated by GitHub (see *About => Cite this repository*).
- Taubert, S. (2024). zho-tts (Version 0.0.2) [Computer software]. [https://doi.org/10.5281/zenodo.11048515](https://doi.org/10.5281/zenodo.11048515)
## Acknowledgments
Funded by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) – Project-ID 416228727 – CRC 1410.

The authors gratefully acknowledge the GWK support for funding this project by providing computing time through the Center for Information Services and HPC (ZIH) at TU Dresden.

The authors are grateful to the Center for Information Services and High Performance Computing [Zentrum für Informationsdienste und Hochleistungsrechnen (ZIH)] at TU Dresden for providing its facilities for high throughput calculations.