# Say: echo but with TTS
Say uses Coqui TTS to create convincing voices for text-to-speech (TTS) applications.
It is as flexible as you like.
## Installation
```zsh
pip install tts-say
# Or from source
pip install git+https://gitlab.com/waser-technologies/technologies/say.git
```
## Usage
From super simple...
```zsh
❯ say Hello World
Hello World
```
...to choosing your own vocoder.
```zsh
❯ say --help
usage: say [-h] [-n] [-e] [-E] [-v] [-L LANG] [--out_path OUT_PATH] [--list_models [LIST_MODELS]] [--model_name MODEL_NAME] [--vocoder_name VOCODER_NAME] [--config_path CONFIG_PATH] [--model_path MODEL_PATH] [--vocoder_path VOCODER_PATH] [--vocoder_config_path VOCODER_CONFIG_PATH] [--speaker_idx SPEAKER_IDX]
[--speaker_wav SPEAKER_WAV [SPEAKER_WAV ...]] [--speakers_file_path SPEAKERS_FILE_PATH] [--use_cuda USE_CUDA] [--debug DEBUG]
[text ...]
Same as echo but with Text-To-Speech.
positional arguments:
text Text to be said.
options:
-h, --help show this help message and exit
-n, --n do not output the trailing newline
-e, --e enable interpretation of backslash escapes
-E, --E disable interpretation of backslash escapes (default)
-v, --version output version information and exit
-L LANG, --lang LANG Language to be spoken (default: $LANG)
--out_path OUT_PATH Output wav file path.
--list_models [LIST_MODELS]
list available pre-trained tts and vocoder models.
--model_name MODEL_NAME
Name of one of the pre-trained tts models in format <language>/<dataset>/<model_name>
--vocoder_name VOCODER_NAME
name of one of the released vocoder models.
--config_path CONFIG_PATH
Path to model config file.
--model_path MODEL_PATH
Path to model file.
--vocoder_path VOCODER_PATH
Path to vocoder model file. If it is not defined, model uses GL as vocoder. Please make sure that you installed vocoder library before (WaveRNN).
--vocoder_config_path VOCODER_CONFIG_PATH
Path to vocoder model config file.
--speaker_idx SPEAKER_IDX
Target speaker ID for a multi-speaker TTS model.
--speaker_wav SPEAKER_WAV [SPEAKER_WAV ...]
wav file(s) to condition a multi-speaker TTS model with a Speaker Encoder. You can give multiple file paths. The d_vectors is computed as their average.
--speakers_file_path SPEAKERS_FILE_PATH
JSON file for multi-speaker model.
--use_cuda USE_CUDA true to use CUDA.
--debug DEBUG true to enable debug mode.
```
`say` gives you the power.
### Start the server
First, you need to load the models into memory.
To do so, start the TTS server using `say` without any `text` argument.
```
say [--list_models [LIST_MODELS]] [--model_name MODEL_NAME] [--vocoder_name VOCODER_NAME] [--config_path CONFIG_PATH] [--model_path MODEL_PATH] [--vocoder_path VOCODER_PATH] [--vocoder_config_path VOCODER_CONFIG_PATH] [--speakers_file_path SPEAKERS_FILE_PATH] [--use_cuda USE_CUDA] [--debug DEBUG]
No attribute `text`.
say --help
For more information.
Starting server now.
Please wait.
...
```
Alternatively, install and enable its systemd user service.
```
cp ./speak.service.example /usr/lib/systemd/user/speak.service
systemctl --user enable --now speak.service
```
#### Get authorization to speak
You need to authorize the system to speak first. Change the service configuration as follows.
```toml
# ~/.assistant/tts.toml
...
[tts]
is_allowed = true
...
```
Then [start the server](#start-the-server) and use `say` with some `text` argument to [say something](#use-the-client).
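To confirm the authorization before starting the server, the setting can be read back from the shell. This is a minimal sketch, not a real TOML parser: the helper name `tts_is_allowed` is hypothetical, and it assumes the layout shown above, with `is_allowed = true` on its own line inside the `[tts]` table.

```zsh
# Hypothetical helper: succeed only if `is_allowed = true` appears in the [tts] table.
tts_is_allowed() {
  awk '
    # On every table header, record whether it is exactly [tts].
    /^[ \t]*\[/ { in_tts = ($0 ~ /^[ \t]*\[tts\][ \t]*$/) }
    # Accept the key only inside [tts], optionally followed by a comment.
    in_tts && /^[ \t]*is_allowed[ \t]*=[ \t]*true[ \t]*(#.*)?$/ { found = 1 }
    END { exit !found }
  ' "$1"
}

# Usage:
# tts_is_allowed ~/.assistant/tts.toml && echo "speech is authorized"
```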
### Use the client
Before you use the client, make sure that:
1. the system has a valid [authorization to speak](#get-authorization-to-speak),
2. the server has correctly loaded the models,
3. if the server has loaded `YourTTS` (the default), you have [created a `style_wav` file for your default speaker](#setup-your-own-voice-yourtts-only).
```zsh
say [-n] [-e] [-E] [-v] [-L LANG] [--out_path OUT_PATH] [text ...]
❯ say --version
Say, version two dot, zero dot, three.
Say: version 2.0.3
Copyright (c) 2022, Danny Waser
TTS version 0.6.2
...
❯ say Hello, this is a test
Hello, this is a test
```
### Save the audio
To save the resulting speech to a file, use the `--out_path` argument.
```zsh
❯ say "Bonjour." --out_path "say_output.wav"
Bonjour.
❯ soxi say_output.wav
Input File : 'say_output.wav'
Channels : 1
Sample Rate : 16000
Precision : 16-bit
Duration : 00:00:01.17 = 18726 samples ~ 87.7781 CDDA sectors
File Size : 37.5k
Bit Rate : 256k
Sample Encoding: 16-bit Signed Integer PCM
```
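The figures that `soxi` reports are linked by simple arithmetic: the duration is the sample count divided by the sample rate, and the bit rate of uncompressed PCM is sample rate × bit depth × channels. The sketch below recomputes both from the values above; `wav_stats` is a hypothetical helper, not part of `say` or SoX.

```zsh
# Hypothetical helper: wav_stats SAMPLES SAMPLE_RATE BITS CHANNELS
wav_stats() {
  awk -v s="$1" -v r="$2" -v b="$3" -v c="$4" \
    'BEGIN { printf "%.2f s, %d bit/s\n", s / r, r * b * c }'
}

wav_stats 18726 16000 16 1   # prints "1.17 s, 256000 bit/s"
```

Both numbers agree with the `Duration` and `Bit Rate` lines in the `soxi` output.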
## Setup your own voice (YourTTS only)
By default, the server uses YourTTS to produce speech.
Therefore, before saying anything, you need to add a wav file at `~/.assistant/data/${lang}/TTS/styles/default.wav`,
where `${lang}` is your target language (_e.g._ _`en`_, _`fr`_, _etc._).
This wav file must contain between 5 and 15 seconds of speech.
Make sure it matches your `tts.toml` configuration.
You can also use the flag `--speaker_wav` manually.
```zsh
# Leave the path unquoted (or use "$HOME/...") so that the shell expands ~.
say "Hello." --speaker_wav ~/.assistant/data/en/TTS/styles/default.wav
```
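Since the wav file must hold between 5 and 15 seconds of speech, it can help to check its length before pointing the server at it. A minimal sketch, assuming SoX is installed (`soxi -D` prints a file's duration in seconds); `is_valid_style_duration` is a hypothetical helper name.

```zsh
# Hypothetical helper: succeed when a duration in seconds lies within 5-15 s.
is_valid_style_duration() {
  awk -v d="$1" 'BEGIN { exit !(d + 0 >= 5 && d + 0 <= 15) }'
}

# Usage (requires SoX):
# is_valid_style_duration "$(soxi -D ~/.assistant/data/en/TTS/styles/default.wav)" \
#   && echo "duration OK" || echo "duration out of range"
```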
### _Don't want to hunt down a voice?_
Check out my [collection of high quality TTS voices](https://gitlab.com/waser-technologies/data/tts/en/voices) generated using TTS VCTK/VITS models.
### Audio samples
<audio src="https://gitlab.com/waser-technologies/data/tts/en/voices/-/raw/master/female/default.wav?inline=false" controls preload></audio>
![](img/default_female.wav)
<audio src="https://gitlab.com/waser-technologies/data/tts/en/voices/-/raw/master/male/default.wav?inline=false" controls preload></audio>
![](img/default_male.wav)
<audio src="https://gitlab.com/waser-technologies/data/tts/en/voices/-/raw/master/female/default_2.wav?inline=false" controls preload></audio>
![](img/default_female_2.wav)
## Yes, yes, but echo is for text, right?
Yes, but you should be able to `alias` `echo` to `say` inside your favorite shell.
Because when you think about it, asking your computer to `say something` is like asking it to `echo something`.
Both cases output `something`.
Where `echo` repeats the arguments it is given, "say" as an injunction asks someone to repeat what comes after it.
Like so:
```
❯ Assistant, say Hello.
[Assistant] Hello.
```
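One way to set up such an alias in zsh or bash is through your shell startup file. This is only a sketch: the guard assumes `say` is on your `PATH`, and because the alias lives in the interactive startup file, scripts keep using the regular `echo`.

```zsh
# ~/.zshrc (or ~/.bashrc)
# Only define the alias when the `say` command is available.
if command -v say >/dev/null 2>&1; then
  alias echo='say'
fi
```

When you want plain text only, `\echo Hello` bypasses the alias.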
Raw data
{
"_id": null,
"home_page": "https://gitlab.com/waser-technologies/technologies/say",
"name": "tts-say",
"maintainer": "",
"docs_url": null,
"requires_python": ">=3.7,<3.12",
"maintainer_email": "",
"keywords": "",
"author": "Danny Waser",
"author_email": "danny@waser.tech",
"download_url": "https://files.pythonhosted.org/packages/d7/9d/d4c14ecd9f54c96473b0c23fe36cc96fc9f1f37afcc7be0fd6fbc2bca103/tts-say-2.4.1.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": "",
"summary": "echo but with TTS.",
"version": "2.4.1",
"project_urls": {
"Code": "https://gitlab.com/waser-technologies/technologies/say",
"Documentation": "https://gitlab.com/waser-technologies/technologies/say/blob/main/README.md",
"Homepage": "https://gitlab.com/waser-technologies/technologies/say",
"Issue tracker": "https://gitlab.com/waser-technologies/technologies/say/issues"
},
"split_keywords": [],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "ddc5ba7a08f5abbbf3be2976f0ffc1da4f720a7e4b7bae6ab1a9df70686b8611",
"md5": "b87f6cff765008637e46eaf2fd5131b8",
"sha256": "0314348c71c2dc75b59323a23b5f3880480134cef919074d0328671265c12477"
},
"downloads": -1,
"filename": "tts_say-2.4.1-py3-none-any.whl",
"has_sig": false,
"md5_digest": "b87f6cff765008637e46eaf2fd5131b8",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.7,<3.12",
"size": 20566,
"upload_time": "2023-12-02T00:21:00",
"upload_time_iso_8601": "2023-12-02T00:21:00.338135Z",
"url": "https://files.pythonhosted.org/packages/dd/c5/ba7a08f5abbbf3be2976f0ffc1da4f720a7e4b7bae6ab1a9df70686b8611/tts_say-2.4.1-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "d79dd4c14ecd9f54c96473b0c23fe36cc96fc9f1f37afcc7be0fd6fbc2bca103",
"md5": "7f7474b53c87130b81dde7c939137375",
"sha256": "1a0f8486f6685f479c74d2a1090e143a1efe37acd2ed4078ed37d5a3fdd7fdbb"
},
"downloads": -1,
"filename": "tts-say-2.4.1.tar.gz",
"has_sig": false,
"md5_digest": "7f7474b53c87130b81dde7c939137375",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.7,<3.12",
"size": 21102,
"upload_time": "2023-12-02T00:21:01",
"upload_time_iso_8601": "2023-12-02T00:21:01.824926Z",
"url": "https://files.pythonhosted.org/packages/d7/9d/d4c14ecd9f54c96473b0c23fe36cc96fc9f1f37afcc7be0fd6fbc2bca103/tts-say-2.4.1.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2023-12-02 00:21:01",
"github": false,
"gitlab": true,
"bitbucket": false,
"codeberg": false,
"gitlab_user": "waser-technologies",
"gitlab_project": "technologies",
"lcname": "tts-say"
}