tts-say


- Name: tts-say
- Version: 2.4.1
- Home page: https://gitlab.com/waser-technologies/technologies/say
- Summary: echo but with TTS.
- Upload time: 2023-12-02 00:21:01
- Author: Danny Waser
- Requires Python: >=3.7,<3.12

# Say: echo but with TTS

Say uses Coqui TTS to create convincing voices for TTS applications.

As flexible as you like.

## Installation

```zsh
pip install tts-say
# Or from source
pip install git+https://gitlab.com/waser-technologies/technologies/say.git
```

## Usage

From super simple...

```zsh
❯ say Hello World
Hello World
```

...to choosing your own vocoder.

```zsh
❯ say --help
usage: say [-h] [-n] [-e] [-E] [-v] [-L LANG] [--out_path OUT_PATH] [--list_models [LIST_MODELS]] [--model_name MODEL_NAME] [--vocoder_name VOCODER_NAME] [--config_path CONFIG_PATH] [--model_path MODEL_PATH] [--vocoder_path VOCODER_PATH] [--vocoder_config_path VOCODER_CONFIG_PATH] [--speaker_idx SPEAKER_IDX]
           [--speaker_wav SPEAKER_WAV [SPEAKER_WAV ...]] [--speakers_file_path SPEAKERS_FILE_PATH] [--use_cuda USE_CUDA] [--debug DEBUG]
           [text ...]

Same as echo but with Text-To-Speech.

positional arguments:
  text                  Text to be said.

options:
  -h, --help            show this help message and exit
  -n, --n               do not output the trailing newline
  -e, --e               enable interpretation of backslash escapes
  -E, --E               disable interpretation of backslash escapes (default)
  -v, --version         output version information and exit
  -L LANG, --lang LANG  Language to be spoken (default: $LANG)
  --out_path OUT_PATH   Output wav file path.
  --list_models [LIST_MODELS]
                        list available pre-trained tts and vocoder models.
  --model_name MODEL_NAME
                        Name of one of the pre-trained tts models in format <language>/<dataset>/<model_name>
  --vocoder_name VOCODER_NAME
                        name of one of the released vocoder models.
  --config_path CONFIG_PATH
                        Path to model config file.
  --model_path MODEL_PATH
                        Path to model file.
  --vocoder_path VOCODER_PATH
                        Path to vocoder model file. If it is not defined, model uses GL as vocoder. Please make sure that you installed vocoder library before (WaveRNN).
  --vocoder_config_path VOCODER_CONFIG_PATH
                        Path to vocoder model config file.
  --speaker_idx SPEAKER_IDX
                        Target speaker ID for a multi-speaker TTS model.
  --speaker_wav SPEAKER_WAV [SPEAKER_WAV ...]
                        wav file(s) to condition a multi-speaker TTS model with a Speaker Encoder. You can give multiple file paths. The d_vectors is computed as their average.
  --speakers_file_path SPEAKERS_FILE_PATH
                        JSON file for multi-speaker model.
  --use_cuda USE_CUDA   true to use CUDA.
  --debug DEBUG         true to enable debug mode.
```

`say` gives you the power.

### Start the server

First, you need to load the models into memory.

To do so, start the TTS server using `say` without any `text` argument.

```
say [--list_models [LIST_MODELS]] [--model_name MODEL_NAME] [--vocoder_name VOCODER_NAME] [--config_path CONFIG_PATH] [--model_path MODEL_PATH] [--vocoder_path VOCODER_PATH] [--vocoder_config_path VOCODER_CONFIG_PATH] [--speakers_file_path SPEAKERS_FILE_PATH] [--use_cuda USE_CUDA] [--debug DEBUG]
No attribute `text`.
say --help
For more information.
Starting server now.
Please wait.
...
```

Or enable its service.

```
cp ./speak.service.example /usr/lib/systemd/user/speak.service
systemctl --user enable --now speak.service
```

#### Get authorization to speak

You need to authorize the system to speak first. Change the service configuration as follows.

```toml
# ~/.assistant/tts.toml
...
[tts]
is_allowed = true
...
```

Then [start the server](#start-the-server) and use `say` with some `text` argument to [say something](#use-the-client).

### Use the client

Before you use the client, make sure that:
  1. the system has a valid [authorization to speak](#get-authorization-to-speak),
  2. the server has correctly loaded the models,
  3. if the server has loaded `YourTTS` (the default), you have [created a `style_wav` file for your default speaker](#setup-your-own-voice-yourtts-only).


```zsh
say [-n] [-e] [-E] [-v] [-L LANG] [--out_path OUT_PATH] [text ...]

❯ say --version
Say, version two dot, zero dot, three.
Say: version 2.0.3
Copyright (c) 2022, Danny Waser
TTS version 0.6.2
...

❯ say Hello, this is a test
Hello, this is a test
```
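
If you want to call the client from a Python script, one option is to go through `subprocess`. This is a rough sketch under a few assumptions: `say` is on your `PATH`, the server is already running, and only the `-L` and `--out_path` flags from the help text are used. The helper names `build_say_command` and `say` are hypothetical, and the `fr` language value mentioned below follows the language codes used elsewhere in this README.

```python
import subprocess
from typing import List, Optional


def build_say_command(text: str, lang: Optional[str] = None,
                      out_path: Optional[str] = None) -> List[str]:
    """Build the argument list for the `say` client (text first, then options)."""
    command = ["say", text]
    if lang is not None:
        command += ["-L", lang]
    if out_path is not None:
        command += ["--out_path", out_path]
    return command


def say(text: str, **options) -> str:
    """Run the client and return whatever it wrote to stdout."""
    result = subprocess.run(build_say_command(text, **options),
                            capture_output=True, text=True, check=True)
    return result.stdout
```

Passing the arguments as a list avoids shell quoting problems when the text contains spaces or punctuation. For example, `say("Bonjour.", lang="fr", out_path="say_output.wav")` runs the client with both options.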

### Save the audio

To save the resulting speech, use the `--out_path` argument.

```zsh
❯ say "Bonjour." --out_path "say_output.wav"
Bonjour.
❯ soxi say_output.wav

Input File     : 'say_output.wav'
Channels       : 1
Sample Rate    : 16000
Precision      : 16-bit
Duration       : 00:00:01.17 = 18726 samples ~ 87.7781 CDDA sectors
File Size      : 37.5k
Bit Rate       : 256k
Sample Encoding: 16-bit Signed Integer PCM
```
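
If `soxi` is not installed, the same fields can be read with Python's standard `wave` module. This is a small sketch, assuming the output is an uncompressed PCM WAV file like the one shown above; the function name `describe_wav` is made up for this example.

```python
import wave


def describe_wav(path: str) -> dict:
    """Return channel count, sample rate, bit depth, sample count and duration."""
    with wave.open(path, "rb") as wav_file:
        frames = wav_file.getnframes()
        rate = wav_file.getframerate()
        return {
            "channels": wav_file.getnchannels(),
            "sample_rate": rate,
            "bits_per_sample": wav_file.getsampwidth() * 8,
            "samples": frames,
            # Duration is simply the sample count divided by the sample rate.
            "duration_seconds": frames / rate,
        }
```

For the file above, 18726 samples at 16000 Hz give 18726 / 16000 ≈ 1.17 seconds, which matches the duration reported by `soxi`.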

## Setup your own voice (YourTTS only)

By default, the server uses YourTTS to produce speech.

Therefore, before saying anything, you need to add a wav file to `~/.assistant/data/${lang}/TTS/styles/default.wav`, where `${lang}` is your target language (for example `en` or `fr`).

This wav file must contain between 5 and 15 seconds of speech.

Make sure it matches your `tts.toml` configuration.
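
Before starting the server, you may want to check that your file is in the right place and has the right length. The sketch below is only an illustration: it assumes a PCM WAV file, builds the path from the pattern given above, and measures total audio length (it cannot tell speech from silence). The names `default_style_path` and `has_valid_length` are hypothetical.

```python
import wave
from pathlib import Path


def default_style_path(lang: str) -> Path:
    """Path of the default speaker file for a language code such as 'en'."""
    return Path.home() / ".assistant" / "data" / lang / "TTS" / "styles" / "default.wav"


def has_valid_length(path, min_seconds: float = 5.0, max_seconds: float = 15.0) -> bool:
    """Check that a WAV file holds between min_seconds and max_seconds of audio."""
    with wave.open(str(path), "rb") as wav_file:
        duration = wav_file.getnframes() / wav_file.getframerate()
    return min_seconds <= duration <= max_seconds
```

A typical check would be `has_valid_length(default_style_path("en"))`, which raises `FileNotFoundError` if the file has not been created yet.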

You can also use the flag `--speaker_wav` manually.

```zsh
# Use $HOME here: the shell does not expand ~ inside double quotes.
say "Hello." --speaker_wav "$HOME/.assistant/data/en/TTS/styles/default.wav"
```

### _Don't want to hunt down a voice?_

Check out my [collection of high quality TTS voices](https://gitlab.com/waser-technologies/data/tts/en/voices) generated using TTS VCTK/VITS models.

### Audio samples
<audio src="https://gitlab.com/waser-technologies/data/tts/en/voices/-/raw/master/female/default.wav?inline=false" controls preload></audio>
![](img/default_female.wav)

<audio src="https://gitlab.com/waser-technologies/data/tts/en/voices/-/raw/master/male/default.wav?inline=false" controls preload></audio>
![](img/default_male.wav)

<audio src="https://gitlab.com/waser-technologies/data/tts/en/voices/-/raw/master/female/default_2.wav?inline=false" controls preload></audio>
![](img/default_female_2.wav)

## Yes, yes, but echo is for text, right?

Yes, but you should be able to `alias` `echo` to `say` inside your favorite shell.

Because when you think about it, asking your computer to `say something` is like asking it to `echo something`.

Both cases output `something`.

Where `echo` repeats the arguments it is given, `say`, as an injunction, asks someone to repeat what comes after it.

Like so:

```
❯ Assistant, say Hello.
[Assistant] Hello.
```

            
