tacotron-cli

- **Name:** tacotron-cli
- **Version:** 0.0.5
- **Summary:** Command-line interface (CLI) to train Tacotron 2 using .wav <=> .TextGrid pairs.
- **Upload time:** 2024-01-25 15:29:27
- **Requires Python:** <3.12, >=3.8
- **License:** MIT
- **Keywords:** text-to-speech, speech synthesis, corpus, utils, language, linguistics
- **Homepage:** https://github.com/stefantaubert/tacotron
# tacotron-cli

[![PyPI](https://img.shields.io/pypi/v/tacotron-cli.svg)](https://pypi.python.org/pypi/tacotron-cli)
[![PyPI](https://img.shields.io/pypi/pyversions/tacotron-cli.svg)](https://pypi.python.org/pypi/tacotron-cli)
[![MIT](https://img.shields.io/github/license/stefantaubert/tacotron.svg)](https://github.com/stefantaubert/tacotron/blob/master/LICENSE)
[![PyPI](https://img.shields.io/pypi/wheel/tacotron-cli.svg)](https://pypi.python.org/pypi/tacotron-cli)
[![PyPI](https://img.shields.io/pypi/implementation/tacotron-cli.svg)](https://pypi.python.org/pypi/tacotron-cli)
[![PyPI](https://img.shields.io/github/commits-since/stefantaubert/tacotron/latest/master.svg)](https://github.com/stefantaubert/tacotron/compare/v0.0.5...master)
[![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.10568731.svg)](https://doi.org/10.5281/zenodo.10568731)

Command-line interface (CLI) to train Tacotron 2 using .wav <=> .TextGrid pairs.

## Features

- train phoneme stress separately (ARPAbet/IPA)
- train phoneme tone separately (IPA)
- train phoneme duration separately (IPA)
- train single/multi-speaker
- train/synthesize on CPU or GPU
- synthesis of paragraphs
- copy embeddings from one checkpoint to another
- train using embeddings or one-hot encodings

## Installation

```sh
pip install tacotron-cli --user
```

## Usage

```txt
usage: tacotron-cli [-h] [-v] {create-mels,train,continue-train,validate,synthesize,synthesize-grids,analyze,add-missing-symbols} ...

Command-line interface (CLI) to train Tacotron 2 using .wav <=> .TextGrid pairs.

positional arguments:
  {create-mels,train,continue-train,validate,synthesize,synthesize-grids,analyze,add-missing-symbols}
                              description
    create-mels               create mel-spectrograms from audio files
    train                     start training
    continue-train            continue training from a checkpoint
    validate                  validate checkpoint(s)
    synthesize                synthesize lines from a file
    synthesize-grids          synthesize .TextGrid files
    analyze                   analyze checkpoint
    add-missing-symbols       copy missing symbols from one checkpoint to another

options:
  -h, --help                  show this help message and exit
  -v, --version               show program's version number and exit
```

## Training

The dataset structure needs to follow the generic format of [speech-dataset-parser](https://pypi.org/project/speech-dataset-parser/), i.e., each TextGrid needs to contain a tier in which all phonemes are separated into individual intervals, e.g., `T|h|i|s| |i|s| |a| |t|e|x|t|.`.
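
A TextGrid with such a per-phoneme tier can also be written by hand, since the TextGrid text format is plain text. The sketch below is a minimal, stdlib-only writer; the tier name `phonemes`, the timings, and the file name are illustrative assumptions (in practice you would use a TextGrid library or forced alignment to produce the intervals).

```python
# Minimal sketch: write a TextGrid with one interval tier in which each
# phoneme occupies its own interval. Timings and names are made up.

def write_textgrid(path, intervals, tier_name="phonemes"):
    """intervals: list of (xmin, xmax, text) tuples covering [0, xmax]."""
    xmax = intervals[-1][1]
    lines = [
        'File type = "ooTextFile"',
        'Object class = "TextGrid"',
        "",
        "xmin = 0",
        f"xmax = {xmax}",
        "tiers? <exists>",
        "size = 1",
        "item []:",
        "    item [1]:",
        '        class = "IntervalTier"',
        f'        name = "{tier_name}"',
        "        xmin = 0",
        f"        xmax = {xmax}",
        f"        intervals: size = {len(intervals)}",
    ]
    for i, (lo, hi, text) in enumerate(intervals, start=1):
        lines += [
            f"        intervals [{i}]:",
            f"            xmin = {lo}",
            f"            xmax = {hi}",
            f'            text = "{text}"',
        ]
    with open(path, "w", encoding="utf-8") as f:
        f.write("\n".join(lines) + "\n")

# Each phoneme of "This" in its own interval (timings invented):
write_textgrid("example.TextGrid", [(0.0, 0.1, "T"), (0.1, 0.2, "h"),
                                    (0.2, 0.3, "i"), (0.3, 0.4, "s")])
```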

Tips:

- place stress markers directly on the vowel of the syllable, e.g., `b|ˈo|d|i` instead of `ˈb|o|d|i` (body)
- place tone markers directly on the vowel of the syllable, e.g., `ʈʂʰ|w|a˥˩|n` instead of `ʈʂʰ|w|a|n˥˩` (串)
  - supported tone characters: `˥ ˦ ˧ ˨ ˩`, e.g., `ɑ˥˩`
- supported duration characters: `˘ ˑ ː`, e.g., `ʌː`
- normalize the text, e.g., write out numbers
- substitute each space with `SIL0`, `SIL1`, or `SIL2`, depending on the duration of the pause:
  - use `SIL0` for no pause
  - use `SIL1` for a short pause, for example after a comma: `...|v|i|ˈɛ|n|ʌ|,|SIL1|ˈɔ|s|t|ɹ|i|ʌ|...`
  - use `SIL2` for a longer pause, for example after a sentence: `...|ˈɝ|θ|.|SIL2`
- Note: only phonemes that occur in the TextGrids (on the selected tier) can be synthesized
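
The space-substitution convention above can be sketched as a small helper that joins per-word phoneme lists with the `|` separator and inserts a pause marker after each word. The function name and the choice of pause level per boundary are illustrative assumptions, not part of the tool's API.

```python
# Sketch: build a pipe-separated, SIL-annotated line from per-word
# phoneme lists, following the conventions described above.

def to_training_line(words, pauses):
    """words: list of phoneme lists; pauses: SIL marker after each word."""
    assert len(words) == len(pauses)
    tokens = []
    for phonemes, pause in zip(words, pauses):
        tokens.extend(phonemes)   # each phoneme becomes its own token
        tokens.append(pause)      # pause marker replaces the space
    return "|".join(tokens)

line = to_training_line(
    [["ð", "ʌ"], ["ˈɝ", "θ", "."]],
    ["SIL0", "SIL2"],  # no pause after "the", long pause after the sentence
)
print(line)  # ð|ʌ|SIL0|ˈɝ|θ|.|SIL2
```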

## Synthesis

To prepare a text for synthesis, the following needs to be considered:

- each line in the text file will be synthesized as a single file; therefore, it is recommended to place each sentence on its own line
- paragraphs can be separated by a blank line
- each symbol needs to be separated by a separator such as `|`, e.g., `s|ˌɪ|ɡ|ɝ|ˈɛ|t`
  - this is useful if the model contains phonemes/symbols that consist of multiple characters, e.g., `ˈɛ`

Example valid sentence: "As the overlying plate lifts up, it also forms mountain ranges." => `ˈæ|z|SIL0|ð|ʌ|SIL0|ˌoʊ|v|ɝ|l|ˈaɪ|ɪ|ŋ|SIL0|p|l|ˈeɪ|t|SIL0|l|ˈɪ|f|t|s|SIL0|ˈʌ|p|,|SIL1|ɪ|t|SIL0|ˈɔ|l|s|oʊ|SIL0|f|ˈɔ|ɹ|m|z|SIL0|m|ˈaʊ|n|t|ʌ|n|SIL0|ɹ|ˈeɪ|n|d͡ʒ|ʌ|z|.|SIL2`

Example invalid sentence: "Digestion is a vital process which involves the breakdown of food into smaller and smaller components, until they can be absorbed and assimilated into the body." => `daɪˈʤɛsʧʌn ɪz ʌ ˈvaɪtʌl ˈpɹɑˌsɛs wɪʧ ɪnˈvɑlvz ðʌ ˈbɹeɪkˌdaʊn ʌv fud ˈɪntu ˈsmɔlɝ ænd ˈsmɔlɝ kʌmˈpoʊnʌnts, ʌnˈtɪl ðeɪ kæn bi ʌbˈzɔɹbd ænd ʌˈsɪmʌˌleɪtɪd ˈɪntu ðʌ ˈbɑdi.`
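
The second sentence is invalid because its symbols are not separated, so the model cannot map them to its known phonemes. A simple pre-check is to split each line on the separator and flag any symbol the model has not seen; the sketch below does this with a made-up symbol set standing in for the one stored in a real checkpoint.

```python
# Sketch: flag symbols in a line that the model does not know. The
# "known" set here is a hypothetical stand-in for a checkpoint's symbols.

def unknown_symbols(line, known, sep="|"):
    """Return the set of symbols in `line` not found in `known`."""
    return {s for s in line.split(sep) if s not in known}

known = {"s", "ˌɪ", "ɡ", "ɝ", "ˈɛ", "t", "SIL0", "SIL1", "SIL2", "."}

# Properly separated input: every symbol is known.
assert unknown_symbols("s|ˌɪ|ɡ|ɝ|ˈɛ|t", known) == set()

# Without separators, the whole word is treated as one unknown symbol.
assert unknown_symbols("sɪɡɝɛt", known) == {"sɪɡɝɛt"}
```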

## Pretrained Models

- English
  - [LJ Speech English TTS](https://zenodo.org/records/10200955)
  - [LJ Speech English TTS with explicit duration markers](https://zenodo.org/records/10107104)
- Chinese
  - [THCHS-30 Chinese TTS](https://zenodo.org/records/10210310)
  - [THCHS-30 Chinese TTS with explicit duration markers](https://zenodo.org/records/10209990)

## Audio Example

"The North Wind and the Sun were disputing which was the stronger, when a traveler came along wrapped in a warm cloak." [Listen here](https://tuc.cloud/index.php/s/gzaYDNKinHw6GCz) (headphones recommended)

## Example Synthesis

To reproduce the audio example from above, you can use the following commands:

```sh
# Create example directory
mkdir ~/example

# Download pre-trained Tacotron model checkpoint
wget https://tuc.cloud/index.php/s/xxFCDMgEk8dZKbp/download/LJS-IPA-101500.pt -O ~/example/checkpoint-tacotron.pt

# Download pre-trained Waveglow model checkpoint
wget https://tuc.cloud/index.php/s/yBRaWz5oHrFwigf/download/LJS-v3-580000.pt -O ~/example/checkpoint-waveglow.pt

# Create text containing phonetic transcription of: "The North Wind and the Sun were disputing which was the stronger, when a traveler came along wrapped in a warm cloak."
cat > ~/example/text.txt << EOF
ð|ʌ|SIL0|n|ˈɔ|ɹ|θ|SIL0|w|ˈɪ|n|d|SIL0|ˈæ|n|d|SIL0|ð|ʌ|SIL0|s|ˈʌ|n|SIL0|w|ɝ|SIL0|d|ɪ|s|p|j|ˈu|t|ɪ|ŋ|SIL0|h|w|ˈɪ|t͡ʃ|SIL0|w|ˈɑ|z|SIL0|ð|ʌ|SIL0|s|t|ɹ|ˈɔ|ŋ|ɝ|,|SIL1|h|w|ˈɛ|n|SIL0|ʌ|SIL0|t|ɹ|ˈæ|v|ʌ|l|ɝ|SIL0|k|ˈeɪ|m|SIL0|ʌ|l|ˈɔ|ŋ|SIL0|ɹ|ˈæ|p|t|SIL0|ɪ|n|SIL0|ʌ|SIL0|w|ˈɔ|ɹ|m|SIL0|k|l|ˈoʊ|k|.|SIL2
EOF

# Synthesize text to mel-spectrogram
tacotron-cli synthesize \
  ~/example/checkpoint-tacotron.pt \
  ~/example/text.txt \
  --sep "|"

# Install waveglow-cli for synthesis of mel-spectrograms
pip install waveglow-cli --user

# Synthesize mel-spectrogram to wav
waveglow-cli synthesize \
  ~/example/checkpoint-waveglow.pt \
  ~/example/text -o

# Resulting wav is written to: ~/example/text/1-1.npy.wav
```

## Roadmap

- Outsource the method for converting audio files to mel-spectrograms before training
- Better logging
- Provide more pre-trained models
- Add tests

## Development setup

```sh
# update
sudo apt update
# install Python 3.8-3.11 to ensure the tests can be run
sudo apt install python3-pip \
  python3.8 python3.8-dev python3.8-distutils python3.8-venv \
  python3.9 python3.9-dev python3.9-distutils python3.9-venv \
  python3.10 python3.10-dev python3.10-distutils python3.10-venv \
  python3.11 python3.11-dev python3.11-distutils python3.11-venv
# install pipenv for creation of virtual environments
python3.8 -m pip install pipenv --user

# check out repo
git clone https://github.com/stefantaubert/tacotron.git
cd tacotron
# create virtual environment
python3.8 -m pipenv install --dev
```

## Running the tests

```sh
# first install the tool like in "Development setup"
# then, navigate into the directory of the repo (if not already done)
cd tacotron
# activate environment
python3.8 -m pipenv shell
# run tests
tox
```

Final lines of test result output:

```log
py38: commands succeeded
py39: commands succeeded
py310: commands succeeded
py311: commands succeeded
congratulations :)
```

## License

MIT License

## Acknowledgments

Model code adapted from [Nvidia](https://github.com/NVIDIA/tacotron2).

Papers:

- [Tacotron: Towards End-to-End Speech Synthesis](https://www.isca-speech.org/archive/interspeech_2017/wang17n_interspeech.html)
- [Natural TTS Synthesis by Conditioning Wavenet on MEL Spectrogram Predictions](https://ieeexplore.ieee.org/document/8461368)

Funded by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) – Project-ID 416228727 – CRC 1410

## Citation

If you want to cite this repo, you can use the BibTeX entry generated by GitHub (see *About => Cite this repository*).

```txt
Taubert, S. (2024). tacotron-cli (Version 0.0.5) [Computer software]. [https://doi.org/10.5281/zenodo.10568731](https://doi.org/10.5281/zenodo.10568731)
```

## Cited by

- Taubert, S., Sternkopf, J., Kahl, S., & Eibl, M. (2022). A Comparison of Text Selection Algorithms for Sequence-to-Sequence Neural TTS. 2022 IEEE International Conference on Signal Processing, Communications and Computing (ICSPCC), 1–6. [https://doi.org/10.1109/ICSPCC55723.2022.9984283](https://doi.org/10.1109/ICSPCC55723.2022.9984283)
- Albrecht, S., Tamboli, R., Taubert, S., Eibl, M., Rey, G. D., & Schmied, J. (2022). Towards a Vowel Formant Based Quality Metric for Text-to-Speech Systems: Measuring Monophthong Naturalness. 2022 IEEE 9th International Conference on Computational Intelligence and Virtual Environments for Measurement Systems and Applications (CIVEMSA), 1–6. [https://doi.org/10.1109/CIVEMSA53371.2022.9853712](https://doi.org/10.1109/CIVEMSA53371.2022.9853712)

            
