en-tts


- Name: en-tts
- Version: 0.0.2
- Summary: Web app, command-line interface and Python library for synthesizing English texts into speech.
- Uploaded: 2024-04-23 10:20:44
- Requires Python: <3.12,>=3.8
- License: MIT
- Keywords: text-to-speech, speech synthesis, praat, textgrid, utils, language, linguistics
- Homepage: https://github.com/stefantaubert/en-tts
- Issues: https://github.com/stefantaubert/en-tts/issues
# en-tts

[![PyPI](https://img.shields.io/pypi/v/en-tts.svg)](https://pypi.python.org/pypi/en-tts)
![PyPI](https://img.shields.io/pypi/pyversions/en-tts.svg)
[![Hugging Face 🤗](https://img.shields.io/badge/%20%F0%9F%A4%97_Hugging_Face-en--tts-blue.svg)](https://huggingface.co/spaces/stefantaubert/en-tts)
[![pytorch](https://img.shields.io/badge/PyTorch_2.0+-ee4c2c?logo=pytorch&logoColor=white)](https://pytorch.org/get-started/pytorch-2.0/)
[![MIT](https://img.shields.io/github/license/stefantaubert/en-tts.svg)](https://github.com/stefantaubert/en-tts/blob/master/LICENSE)
[![PyPI](https://img.shields.io/pypi/wheel/en-tts.svg)](https://pypi.python.org/pypi/en-tts/#files)
![PyPI](https://img.shields.io/pypi/implementation/en-tts.svg)
[![PyPI](https://img.shields.io/github/commits-since/stefantaubert/en-tts/latest/master.svg)](https://github.com/stefantaubert/en-tts/compare/v0.0.2...master)
[![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.11032264.svg)](https://doi.org/10.5281/zenodo.11032264)

Web app, command-line interface and Python library for synthesizing English texts into speech.

## Installation

```sh
pip install en-tts --user
```
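
The package metadata lists `requires_python` as `<3.12,>=3.8`, so pip will not find a compatible release of this version on Python 3.12 or newer. A minimal sketch for checking the current interpreter before installing. `is_supported_python` is a hypothetical helper, not part of en-tts.

```python
import sys
from typing import Optional, Tuple

# Bounds taken from the package metadata: requires_python "<3.12,>=3.8"
MIN_VERSION = (3, 8)
MAX_VERSION_EXCLUSIVE = (3, 12)


def is_supported_python(version: Optional[Tuple[int, int]] = None) -> bool:
    """Return True if (major, minor) lies in the half-open range [3.8, 3.12)."""
    if version is None:
        # Default to the interpreter running this code
        version = (sys.version_info.major, sys.version_info.minor)
    return MIN_VERSION <= tuple(version[:2]) < MAX_VERSION_EXCLUSIVE
```

For example, `is_supported_python((3, 11))` is `True` and `is_supported_python((3, 12))` is `False`.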

## Usage as web app

Visit [🤗 Hugging Face](https://huggingface.co/spaces/stefantaubert/en-tts) for a live demo.

<a href="https://huggingface.co/spaces/stefantaubert/en-tts">
<img src="https://github.com/stefantaubert/en-tts/raw/master/img/hf.png" alt="Screenshot Hugging Face" style="max-width: 600px; width: 100%"/>
</a>

You can also run the app locally by executing `en-tts-web` on the command line and then opening [http://127.0.0.1:7860](http://127.0.0.1:7860) in your browser.
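
Loading the model can take a moment after `en-tts-web` starts, so scripts that talk to the local app may want to wait until the address answers. A minimal sketch using only the standard library; `wait_for_web_app` is a hypothetical helper and assumes the app listens on the default address given above.

```python
import time
import urllib.error
import urllib.request

# Default address of a locally started `en-tts-web`
DEFAULT_URL = "http://127.0.0.1:7860"


def wait_for_web_app(url: str = DEFAULT_URL, timeout: float = 30.0, interval: float = 0.5) -> bool:
    """Poll `url` until the server answers or `timeout` seconds pass.

    Returns True as soon as any HTTP response arrives, False otherwise.
    """
    # Bypass any configured HTTP proxy: the app runs on localhost
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    deadline = time.monotonic() + timeout
    while True:
        try:
            with opener.open(url, timeout=interval):
                return True
        except urllib.error.HTTPError:
            # The server answered, even if with an error status
            return True
        except OSError:
            # URLError, connection refused and timeouts are all OSError subclasses
            pass
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

Start `en-tts-web` in another terminal first; `wait_for_web_app()` then returns `True` once the page is being served.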

## Usage as CLI

```sh
en-tts-cli synthesize "When the sunlight strikes raindrops in the air, they act as a prism and form a rainbow."
```
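
To drive the CLI from a Python script, a minimal sketch with `subprocess` follows. It assumes `en-tts-cli` is on `PATH` after installation; the helper names are hypothetical. Because this README does not state where the CLI writes its WAV output, the sketch only runs the documented command; consult the CLI's own help for output options.

```python
import shutil
import subprocess
from typing import List


def build_synthesize_command(text: str) -> List[str]:
    """Mirror the documented invocation: en-tts-cli synthesize "<text>"."""
    return ["en-tts-cli", "synthesize", text]


def synthesize_via_cli(text: str) -> subprocess.CompletedProcess:
    """Run the CLI and raise if it is missing or exits with an error."""
    if shutil.which("en-tts-cli") is None:
        raise FileNotFoundError("en-tts-cli not found on PATH; install en-tts first")
    return subprocess.run(build_synthesize_command(text), check=True)
```

Passing the arguments as a list instead of a shell string means punctuation and quotes inside the text need no escaping.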

You can listen to the output [here](https://github.com/stefantaubert/en-tts/raw/master/examples/rainbow.wav).

## Usage as library

```py
from pathlib import Path
from tempfile import gettempdir

from en_tts import Synthesizer, Transcriber, normalize_audio, save_audio

text = "When the sunlight strikes raindrops in the air, they act as a prism and form a rainbow."

transcriber = Transcriber()
synthesizer = Synthesizer()

text_ipa = transcriber.transcribe_to_ipa(text)
audio = synthesizer.synthesize(text_ipa)

tmp_dir = Path(gettempdir())
save_audio(audio, tmp_dir / "output.wav")

# Optional: normalize output
normalize_audio(tmp_dir / "output.wav", tmp_dir / "output_norm.wav")
```

## Model info

The TTS model used here is published on [Zenodo](https://zenodo.org/records/10107104).

Evaluation results (GT = ground-truth recordings):

- Mean opinion score (MOS), naturalness: 3.55 ± 0.28 (GT: 4.17 ± 0.23)
- MOS, intelligibility: 4.44 ± 0.24 (GT: 4.63 ± 0.19)
- Mean MCD-DTW: 29.15
- Mean penalty: 0.1018
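
MCD-DTW is the mel-cepstral distortion (conventionally reported in dB; lower is better) between synthesized and reference speech after aligning the two frame sequences with dynamic time warping. The exact variant behind the 29.15 above (which coefficients, frame settings, normalization) is not stated here, so this pure-Python sketch only illustrates the metric under common assumptions: coefficient 0 (energy) is skipped and the distortion is averaged over the DTW path. The function names are hypothetical, and the sketch will not necessarily reproduce the reported number.

```python
import math
from typing import List, Sequence, Tuple

Frame = Sequence[float]

# (10 / ln 10) * sqrt(2): the usual constant that expresses MCD in dB
MCD_CONST = 10.0 / math.log(10.0) * math.sqrt(2.0)


def frame_mcd(a: Frame, b: Frame) -> float:
    """Mel-cepstral distortion between two frames, skipping coefficient 0."""
    return MCD_CONST * math.sqrt(sum((x - y) ** 2 for x, y in zip(a[1:], b[1:])))


def mcd_dtw(ref: List[Frame], syn: List[Frame]) -> Tuple[float, int]:
    """Align two cepstral sequences with DTW; return (mean MCD on the path, path length)."""
    if not ref or not syn:
        raise ValueError("both sequences must contain at least one frame")
    n, m = len(ref), len(syn)
    inf = float("inf")
    # cost[i][j]: minimal accumulated MCD aligning ref[:i] with syn[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    # steps[i][j]: number of aligned frame pairs on that best path
    steps = [[0] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = frame_mcd(ref[i - 1], syn[j - 1])
            # Best predecessor among diagonal, vertical and horizontal moves
            prev_cost, prev_steps = min(
                (cost[i - 1][j - 1], steps[i - 1][j - 1]),
                (cost[i - 1][j], steps[i - 1][j]),
                (cost[i][j - 1], steps[i][j - 1]),
            )
            cost[i][j] = prev_cost + d
            steps[i][j] = prev_steps + 1
    return cost[n][m] / steps[n][m], steps[n][m]
```

Identical inputs give a mean MCD of 0.0; a single frame pair that differs by 1.0 in one non-energy coefficient gives about 6.14 dB.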

### Phoneme set

- Vowels: i, u, æ, ɑ, ɔ, ə, ɛ, ɪ, ʊ, ʌ
- Diphthongs: aɪ, aʊ, eɪ, oʊ, ɔɪ
- R-colored vowels: ɔr, ər, ɛr, ɪr, ʊr, ʌr
- Consonants: b, d, dʒ, f, h, j, k, l, m, n, p, r, s, t, tʃ, v, w, z, ð, ŋ, ɡ, ʃ, θ
- Breaks:
  - SIL0 (no break)
  - SIL1 (short break)
  - SIL2 (break)
  - SIL3 (long break)
- Special characters: . ? ! , : ; - — " ' ( ) [ ]
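
For checking input before synthesis, the inventory above can be held as plain sets. A minimal sketch: the set names and `unknown_symbols` are hypothetical, the input is assumed to be already split into a list of bare symbols (the tokenization that `Synthesizer.synthesize` expects is not documented here), and stress or duration markers must be stripped before the check.

```python
from typing import Iterable, List

# Symbol inventory transcribed from the "Phoneme set" list above
VOWELS = {"i", "u", "æ", "ɑ", "ɔ", "ə", "ɛ", "ɪ", "ʊ", "ʌ"}
DIPHTHONGS = {"aɪ", "aʊ", "eɪ", "oʊ", "ɔɪ"}
R_COLORED_VOWELS = {"ɔr", "ər", "ɛr", "ɪr", "ʊr", "ʌr"}
CONSONANTS = {
    "b", "d", "dʒ", "f", "h", "j", "k", "l", "m", "n", "p", "r", "s", "t",
    "tʃ", "v", "w", "z", "ð", "ŋ",
    "\u0261",  # IPA script g (U+0261), not ASCII "g"
    "ʃ", "θ",
}
BREAKS = {"SIL0", "SIL1", "SIL2", "SIL3"}
SPECIAL = {".", "?", "!", ",", ":", ";", "-", "\u2014", '"', "'", "(", ")", "[", "]"}

KNOWN = VOWELS | DIPHTHONGS | R_COLORED_VOWELS | CONSONANTS | BREAKS | SPECIAL


def unknown_symbols(tokens: Iterable[str]) -> List[str]:
    """Return the tokens that are not bare symbols of the inventory."""
    return [t for t in tokens if t not in KNOWN]
```

A frequent pitfall is the ASCII `g`, which is not in the set; the inventory uses the IPA character U+0261.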

Each vowel, diphthong, r-colored vowel and consonant can have one of these duration markers:

- ˘ -> very short, e.g., oʊ˘
- nothing -> normal, e.g., oʊ
- ˑ -> half long, e.g., oʊˑ
- ː -> long, e.g., oʊː

Furthermore, each vowel, diphthong and r-colored vowel can have a leading stress symbol attached:

- ˈ -> primary stress, e.g., ˈoʊ
- ˌ -> secondary stress, e.g., ˌoʊ
- nothing -> no stress, e.g., oʊ

Stress and duration markers can be combined, e.g., ˌoʊː (secondary stress, long).
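
The marker rules above (at most one leading stress mark, at most one trailing duration mark) can be parsed mechanically. A minimal sketch; `Symbol` and `parse_symbol` are hypothetical names, and the parser does not verify that the remaining base is in the inventory or that stress only appears on vowels, diphthongs and r-colored vowels.

```python
from typing import NamedTuple

# Leading stress marks and trailing duration marks as listed above
STRESS_MARKS = {"\u02c8": "primary", "\u02cc": "secondary"}  # ˈ, ˌ
DURATION_MARKS = {"\u02d8": "very short", "\u02d1": "half long", "\u02d0": "long"}  # ˘, ˑ, ː


class Symbol(NamedTuple):
    base: str
    stress: str    # "primary", "secondary" or "none"
    duration: str  # "very short", "normal", "half long" or "long"


def parse_symbol(token: str) -> Symbol:
    """Split a marked symbol such as "ˌoʊː" into base, stress and duration."""
    stress = "none"
    if token[:1] in STRESS_MARKS:
        stress = STRESS_MARKS[token[0]]
        token = token[1:]
    duration = "normal"
    if token[-1:] in DURATION_MARKS:
        duration = DURATION_MARKS[token[-1]]
        token = token[:-1]
    if not token:
        raise ValueError("symbol consists of markers only")
    return Symbol(token, stress, duration)
```

For example, `parse_symbol("ˌoʊː")` yields base `oʊ` with secondary stress and long duration.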

## Citation

If you want to cite this repository, you can use the BibTeX entry generated by GitHub (see *About => Cite this repository*).

- Taubert, S. (2024). en-tts (Version 0.0.2) [Computer software]. [https://doi.org/10.5281/zenodo.11032264](https://doi.org/10.5281/zenodo.11032264)

## Acknowledgments

Funded by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) – Project-ID 416228727 – CRC 1410

The authors gratefully acknowledge the GWK support for funding this project by providing computing time through the Center for Information Services and HPC (ZIH) at TU Dresden.

The authors are grateful to the Center for Information Services and High Performance Computing [Zentrum für Informationsdienste und Hochleistungsrechnen (ZIH)] at TU Dresden for providing its facilities for high-throughput calculations.

            
