primitive-tts

Name	primitive-tts
Version	0.0.1
home_page	None
Summary	A simple text-to-speech library
upload_time	2025-02-02 16:00:04
maintainer	None
docs_url	None
author	Gunther Cox
requires_python	>=3.4
license	MIT
keywords	speech speech synthesis text-to-speech tts
VCS
bugtrack_url
requirements	No requirements were recorded.
Travis-CI	No Travis.
coveralls test coverage	No coveralls.

# Primitive TTS

Primitive TTS is a text-to-speech (TTS) library intended to have as simple an implementation as possible. It was created because, although numerous other speech synthesis libraries exist, almost all of them seem to suffer from at least one of the following issues:

* Complex or strict dependencies, often failing to install on various _`*nix`_ distributions
* Unclear licensing of produced voices
* Little or no maintenance in several years
* The available voices don't quite fit what I'm building

In an attempt to remedy this, Primitive TTS is built with:

* Minimal dependencies - [`wave`](https://docs.python.org/3/library/wave.html) audio, for example, is natively supported in Python
* Licensing that allows commercial use of the generated speech
* A design focused on simplicity, to keep the effort of maintenance low

The generated speech is far from perfect, is not high quality, **and** when a simple alternative comes along I'll switch to that.

Anecdotally, there was a period from 2000 to 2015 when excessive plastic housings and computer-monitored hardware seemed to be rapidly encroaching on the engines and other repairable areas of both everyday and industrial vehicles. This is not fully a firsthand observation, but rather what I observed through the lens of my father (who is notably the main influence that led me to pursue technology as a line of work). Even with over 50 years of experience as a mechanic working on countless different machines, he still needed additional specialized tools and training. That's where the inspiration for Primitive TTS comes in. There is a significant learning curve between a standard multimeter and proprietary diagnostic software, not to mention the factors of price and access.

I'm seeing similar themes within the current bloom of artificial intelligence. Various AI models are limited to specific brands of hardware and have significant resource requirements (memory, GPU, etc.). I know for certain that within a few years these limitations will ebb as the software becomes more distilled and standard hardware catches up with it. But until then, I think it might help to have at least some software that can be fixed and edited easily, using standard and widely available tools.

## Design

Primitive TTS uses a set of 39 phonemes that are compatible with [The CMU Pronouncing Dictionary](http://www.speech.cs.cmu.edu/cgi-bin/cmudict). Phonemes are the distinct units of sound in a given language, and are represented here by the following letter codes:

```
AA, AE, AH, AO, AW, AY, B, CH, D, DH, EH, ER, EY,
F, G, HH, IH, IY, JH, K, L, M, N, NG, OW, OY, P,
R, S, SH, T, TH, UH, UW, V, W, Y, Z, ZH
```
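
As a rough sketch, the codes above could be held in a plain tuple and used to validate phoneme sequences. The names `PHONEMES` and `is_valid_sequence` are assumptions made for illustration and are not taken from the library's code.

```python
# The 39 phoneme codes listed above (the tuple name is hypothetical).
PHONEMES = (
    "AA", "AE", "AH", "AO", "AW", "AY", "B", "CH", "D", "DH", "EH", "ER", "EY",
    "F", "G", "HH", "IH", "IY", "JH", "K", "L", "M", "N", "NG", "OW", "OY", "P",
    "R", "S", "SH", "T", "TH", "UH", "UW", "V", "W", "Y", "Z", "ZH",
)


def is_valid_sequence(codes):
    """Return True if every code in `codes` is one of the 39 phonemes."""
    return all(code in PHONEMES for code in codes)
```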

To generate speech, text is split into tokens (words) and each word is split into chunks consisting of two letters. For example, the word `"system"` becomes `["sy", "st", "em"]`. Each two-letter chunk is mapped to a set of phonemes, which in this example would be `[S, AY], [S, T], [AH, M]`. Speech is then produced by stitching together the `wav` files corresponding to each phoneme.
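
The steps above can be sketched with nothing but the standard library. This is a minimal illustration and not the library's actual implementation: the function names, the `CHUNK_TO_PHONEMES` table (which holds only the three chunks from the `"system"` example), and the one-file-per-phoneme layout `<phoneme_dir>/<CODE>.wav` are all assumptions.

```python
import os
import wave

# Hypothetical lookup table: only the three chunks from the "system"
# example are included; a full table would cover every two-letter chunk.
CHUNK_TO_PHONEMES = {
    "sy": ["S", "AY"],
    "st": ["S", "T"],
    "em": ["AH", "M"],
}


def split_into_chunks(word):
    """Split an even-length word into consecutive two-letter chunks."""
    return [word[i:i + 2] for i in range(0, len(word), 2)]


def word_to_phonemes(word):
    """Map each two-letter chunk of a word to its phonemes, in order."""
    phonemes = []
    for chunk in split_into_chunks(word.lower()):
        phonemes.extend(CHUNK_TO_PHONEMES[chunk])
    return phonemes


def stitch(phonemes, phoneme_dir, output_path):
    """Concatenate <phoneme_dir>/<CODE>.wav clips into one wav file.

    Assumes a non-empty phoneme list and that every clip shares the same
    channel count, sample width, and frame rate (assumed file layout).
    """
    params = None
    frames = []
    for code in phonemes:
        with wave.open(os.path.join(phoneme_dir, code + ".wav"), "rb") as clip:
            if params is None:
                params = clip.getparams()
            frames.append(clip.readframes(clip.getnframes()))
    with wave.open(output_path, "wb") as out:
        out.setparams(params)
        # writeframes() corrects the frame count in the header on close.
        out.writeframes(b"".join(frames))
```

With a directory of phoneme clips in place, `stitch(word_to_phonemes("system"), "phonemes", "system.wav")` would write the combined audio; the directory name here is only a placeholder.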

Odd-length words are handled by selecting one letter at random and doubling it. This generally seems OK because there is also logic that removes duplicate consecutive phonemes before generating the combined wave file. This deduplication reduces jitter in the generated speech.
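
A minimal sketch of these two steps might look like the following. The function names and the optional `rng` parameter are assumptions for illustration; the library's own logic may differ in detail.

```python
import random


def pad_odd_word(word, rng=random):
    """Double one randomly chosen letter so the word has even length."""
    if len(word) % 2 == 0:
        return word
    index = rng.randrange(len(word))
    # Insert a copy of the chosen letter directly after the original.
    return word[:index + 1] + word[index] + word[index + 1:]


def remove_consecutive_duplicates(phonemes):
    """Drop any phoneme that repeats the one immediately before it."""
    result = []
    for code in phonemes:
        if not result or result[-1] != code:
            result.append(code)
    return result
```

Passing a seeded `random.Random` instance as `rng` makes the padding repeatable, which can help when comparing generated audio between runs.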

Splitting words into two-letter chunks results in a relatively small, finite set of possible letter combinations for the 26 letters of the English alphabet (26² = 676).
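
Because the set is this small, it can be enumerated outright, for example to check whether a mapping table covers every chunk. The sketch below assumes lowercase ASCII letters only; `ALL_CHUNKS` and `missing_chunks` are hypothetical names.

```python
from itertools import product
from string import ascii_lowercase

# Every ordered pair of lowercase letters: "aa", "ab", ..., "zz".
ALL_CHUNKS = ["".join(pair) for pair in product(ascii_lowercase, repeat=2)]

print(len(ALL_CHUNKS))  # 676


def missing_chunks(mapping):
    """Return the chunks that a chunk-to-phoneme mapping does not cover."""
    return [chunk for chunk in ALL_CHUNKS if chunk not in mapping]
```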

## Language Support

Currently only American English is supported, but in theory other languages could be supported by adding new phoneme mappings (see `en` and `en.py` as examples).

## Voices

This library is only going to support one "voice", which, although not expressly named anywhere within the code, is referred to as the "Salvius voice". This voice aims to be principally recognizable as _the voice of the humanoid robot, [Salvius](https://salvius.org)_.

## Installation

```bash
pip install primitive-tts
```

## Usage

```python
from primitive_tts.speech import speak

speak('system online')
```

## Samples

> "system online"

<audio controls>
<source src="./samples/system-online.wav" type="audio/wav">
Audio sample cannot be rendered.
</audio>

> "lorem ipsum"

<audio controls>
<source src="./samples/lorem-ipsum.wav" type="audio/wav">
Audio sample cannot be rendered.
</audio>

> "Attention captain I have finished my analysis"

<audio controls>
<source src="./samples/attention-captain-I-have-finished-my-analysis.wav" type="audio/wav">
Audio sample cannot be rendered.
</audio>

> "The quick brown fox jumps over the lazy dog"

<audio controls>
<source src="./samples/the-quick-brown-fox-jumps-over-the-lazy-dog.wav" type="audio/wav">
Audio sample cannot be rendered.
</audio>

Raw data

{
    "_id": null,
    "home_page": null,
    "name": "primitive-tts",
    "maintainer": null,
    "docs_url": null,
    "requires_python": ">=3.4",
    "maintainer_email": null,
    "keywords": "speech, speech synthesis, text-to-speech, tts",
    "author": "Gunther Cox",
    "author_email": null,
    "download_url": "https://files.pythonhosted.org/packages/6d/bb/16fef2994a66a52a4b451cec4188b81059d3b550f31df0b13b493f65095a/primitive_tts-0.0.1.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "MIT",
    "summary": "A simple text-to-speech library",
    "version": "0.0.1",
    "project_urls": {
        "Repository": "https://github.com/gunthercox/primitive-tts.git"
    },
    "split_keywords": [
        "speech",
        " speech synthesis",
        " text-to-speech",
        " tts"
    ],
    "urls": [
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "94e04250c79af4d15beb5300b56c26476d323c6d890a7da7da319393cff13c5b",
                "md5": "dccc15573e280ab2bded40ed9cfc2d39",
                "sha256": "8c75c3d8152832306bcbbbefb3d7ea0fce406a24c357864375688c021965411f"
            },
            "downloads": -1,
            "filename": "primitive_tts-0.0.1-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "dccc15573e280ab2bded40ed9cfc2d39",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.4",
            "size": 9225,
            "upload_time": "2025-02-02T16:00:06",
            "upload_time_iso_8601": "2025-02-02T16:00:06.601083Z",
            "url": "https://files.pythonhosted.org/packages/94/e0/4250c79af4d15beb5300b56c26476d323c6d890a7da7da319393cff13c5b/primitive_tts-0.0.1-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "6dbb16fef2994a66a52a4b451cec4188b81059d3b550f31df0b13b493f65095a",
                "md5": "e8a2ea0f35836d6923ed8bb23a4c7942",
                "sha256": "ad0422361d8fd176235e87f46667dd984a85d305e18109590670e6d6dcca0d99"
            },
            "downloads": -1,
            "filename": "primitive_tts-0.0.1.tar.gz",
            "has_sig": false,
            "md5_digest": "e8a2ea0f35836d6923ed8bb23a4c7942",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.4",
            "size": 10248,
            "upload_time": "2025-02-02T16:00:04",
            "upload_time_iso_8601": "2025-02-02T16:00:04.952169Z",
            "url": "https://files.pythonhosted.org/packages/6d/bb/16fef2994a66a52a4b451cec4188b81059d3b550f31df0b13b493f65095a/primitive_tts-0.0.1.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2025-02-02 16:00:04",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "gunthercox",
    "github_project": "primitive-tts",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": false,
    "lcname": "primitive-tts"
}
