voxpopuli


Name: voxpopuli
Version: 0.3.9
Home page: https://github.com/hadware/voxpopuli
Summary: A wrapper around Espeak and Mbrola, to do simple Text-To-Speech (TTS), with the possibility to tweak the phonemic form.
Upload time: 2022-12-05 02:16:23
Author: Hadware
License: MIT
Keywords: tts, speech, phonemes, audio
Requirements: No requirements were recorded.

# Voxpopuli
[![PyPI](https://img.shields.io/pypi/v/voxpopuli.svg)](https://pypi.python.org/pypi/voxpopuli)
[![PyPI](https://img.shields.io/pypi/pyversions/voxpopuli.svg)](http://py3readiness.org/)
[![Build Status](https://travis-ci.org/hadware/voxpopuli.svg?branch=master)](https://travis-ci.org/hadware/voxpopuli)
[![Documentation Status](https://readthedocs.org/projects/voxpopuli/badge/?version=latest)](http://voxpopuli.readthedocs.io/en/latest/?badge=latest)
[![license](https://img.shields.io/github/license/mashape/apistatus.svg)](LICENSE)


**A wrapper around Espeak and Mbrola.**

This is a lightweight Python wrapper for Espeak and Mbrola, two co-dependent TTS tools. It lets you
render sound by simply feeding it text and voice parameters. Phonemes (the data transmitted by Espeak to
Mbrola) can also be manipulated using a minimalistic API.

This is a short introduction; for more detail, see the [Read the Docs documentation](http://voxpopuli.readthedocs.io/en/latest/).

## Install

**These instructions should work on any Debian/Ubuntu derivative.**

Install with pip as:
```sh
pip install voxpopuli
```

Espeak and mbrola must be installed beforehand:
```sh
sudo apt install mbrola espeak
```

You'll also need some mbrola voices installed. You can either get them from the MBROLA project page
and unpack them in `/usr/share/mbrola/<lang><voiceid>/`, or, more simply,
install them from the Ubuntu repos. All the voice packages are named
`mbrola-<lang><voiceid>`. Simpler still, you can install all the available voices
by running:
```sh
# the quotes keep the shell from expanding the pattern against local files
sudo apt install 'mbrola-*'
```

In case the voices you need aren't all in the Ubuntu repos, you can use this convenient little script
that installs voices directly from [Mbrola's voice repo](https://github.com/numediart/MBROLA-voices):
```sh
# for instance, this installs all British English and French voices
sudo python3 -m voxpopuli.voice_install en fr
```
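
Before going further, it can help to confirm that the external tools and voices are actually reachable. The following is a minimal sketch using only the standard library. The helper names `find_tts_binaries` and `list_mbrola_voices` are made up for this example and are not part of voxpopuli, and the default directory assumes voices live under `/usr/share/mbrola/` as described above.

```python
import shutil
from pathlib import Path


def find_tts_binaries(names=("espeak", "mbrola")):
    """Map each executable name to its location on PATH, or None if missing."""
    return {name: shutil.which(name) for name in names}


def list_mbrola_voices(voices_dir="/usr/share/mbrola"):
    """Return the sorted names of the voice directories found in voices_dir."""
    root = Path(voices_dir)
    if not root.is_dir():
        return []
    return sorted(entry.name for entry in root.iterdir() if entry.is_dir())


if __name__ == "__main__":
    for name, path in find_tts_binaries().items():
        print(f"{name}: {path or 'not found'}")
    print("voices:", list_mbrola_voices() or "none found")
```

If either binary is reported as not found, or the voice list is empty, the install steps above are the place to look first.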

## Usage

### Picking a voice and making it say things

The simplest use of this lib is bare TTS, using a voice and
a text. The rendered audio is returned as a `bytes` object containing .wav data:
```python
from voxpopuli import Voice
voice = Voice(lang="fr")
wav = voice.to_audio("salut c'est cool")
```
Evaluating `type(wav)` would return `bytes`. You can then save the audio by opening a file
in binary write mode (`wb`):

```python
with open("salut.wav", "wb") as wavfile:
    wavfile.write(wav)
```
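
Since the returned bytes form a complete WAV file, they can also be inspected with Python's standard `wave` module. This is a minimal sketch: `describe_wav` and `make_silence` are hypothetical helpers, and the silent clip only stands in for the output of `voice.to_audio(...)` so that the example runs without espeak or mbrola installed.

```python
import io
import wave


def describe_wav(wav_bytes):
    """Return (channels, sample_width_bytes, sample_rate, duration_seconds)."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as wav_file:
        frames = wav_file.getnframes()
        rate = wav_file.getframerate()
        return (wav_file.getnchannels(), wav_file.getsampwidth(), rate, frames / rate)


def make_silence(duration_s=0.5, rate=16000):
    """Build a mono 16-bit silent WAV in memory (stand-in for real TTS output)."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav_file:
        wav_file.setnchannels(1)
        wav_file.setsampwidth(2)
        wav_file.setframerate(rate)
        wav_file.writeframes(b"\x00\x00" * int(duration_s * rate))
    return buffer.getvalue()


wav = make_silence()  # with voxpopuli: wav = voice.to_audio("salut c'est cool")
print(describe_wav(wav))
```
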
If you wish to hear how it sounds right away, make sure pyaudio is installed (*via* pip), then run:
```python
voice.say("Salut c'est cool")
```

You can also, say, use scipy to get the PCM audio as an `ndarray`:

```python
from io import BytesIO

from scipy.io.wavfile import read, write

# `wav` is the bytes object returned by voice.to_audio(...)
rate, wave_array = read(BytesIO(wav))
reversed_wave = wave_array[::-1]  # reversing the sound file
write("tulas.wav", rate, reversed_wave)
```

### Getting different voices

You can set some parameters on the voice, such as language or pitch:

```python
from voxpopuli import Voice
# really slow voice with high pitch
voice = Voice(lang="us", pitch=99, speed=40, voice_id=2)
voice.say("I'm high on helium")
```

The exhaustive list of parameters is:

 * lang, a language code among those available (us, fr, en, es, ...). You can list
    them using the `listvoices` method from a `Voice` instance.
 * voice_id, an integer used to select a voice for the given language. If not specified,
    the first voice id found for that language is used.
 * pitch, an integer between 0 and 99 (inclusive).
 * speed, an integer, in words per minute. The default, regular speed
    is 160 wpm.
 * volume, a float ratio applied to the output sample. Some languages have presets
    that our best specialists tested. Otherwise, it defaults to 1.
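
The constraints in this list can be checked before a voice is built. Here is a rough sketch that relies only on what the list states (pitch between 0 and 99). The helper `voice_kwargs` is hypothetical and not part of voxpopuli, the checks that speed and volume are positive are assumptions of this example, and the library may well do its own validation.

```python
def voice_kwargs(lang, voice_id=None, pitch=None, speed=None, volume=None):
    """Collect keyword arguments for Voice(...), rejecting out-of-range values."""
    if pitch is not None and not (isinstance(pitch, int) and 0 <= pitch <= 99):
        raise ValueError("pitch must be an integer between 0 and 99 inclusive")
    # Assumption of this sketch: speed and volume only make sense when positive.
    if speed is not None and not (isinstance(speed, int) and speed > 0):
        raise ValueError("speed must be a positive integer (words per minute)")
    if volume is not None and volume <= 0:
        raise ValueError("volume must be a positive ratio")
    candidates = {"lang": lang, "voice_id": voice_id, "pitch": pitch,
                  "speed": speed, "volume": volume}
    # Leave out anything unset so that the library's own defaults apply.
    return {key: value for key, value in candidates.items() if value is not None}


kwargs = voice_kwargs("us", voice_id=2, pitch=99, speed=40)
print(kwargs)
# With voxpopuli installed: voice = Voice(**kwargs)
```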

### Handling the phonemic form

To render a string of text to audio, the Voice object chains espeak's output
to mbrola, which then renders it to audio. Espeak only converts the text to a list of
phonemes (such as those of the IPA), which are then processed by mbrola.
For those who like pictures, here is a diagram of what happens when you run
`voice.to_audio("Hello world")`:

![phonemes](docs/source/img/phonemes.png?raw=true)

Phonemes are represented sequentially by a code, a duration in milliseconds, and
a list of pitch modifiers. The pitch modifiers are a list of pairs, each pair
giving the percentage of the sample at which to apply the pitch modification and
the pitch itself.
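
To make that structure concrete, here is an illustrative sketch of one phoneme record and of a one-line textual form, which as far as I understand mirrors the layout of MBROLA's `.pho` input. `PhonemeSketch` and its field names are assumptions made for this example; this is not voxpopuli's actual phoneme class.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class PhonemeSketch:
    """Illustrative stand-in, not voxpopuli's actual phoneme class."""
    name: str       # phoneme code, e.g. "a" (here "_" is used for a silence)
    duration: int   # milliseconds
    # (percentage of the sample, pitch) pairs, in order
    pitch_modifiers: List[Tuple[int, int]] = field(default_factory=list)

    def to_line(self) -> str:
        """Serialize as 'code duration [percent pitch]...' on a single line."""
        parts = [self.name, str(self.duration)]
        for percent, pitch in self.pitch_modifiers:
            parts += [str(percent), str(pitch)]
        return " ".join(parts)


phonemes = [
    PhonemeSketch("_", 100),
    PhonemeSketch("a", 120, [(0, 110), (50, 130)]),
]
print("\n".join(p.to_line() for p in phonemes))
```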

The fun part is that, with voxpopuli, you can "intercept" that phoneme list as a
simple object, modify it, and then pass it back to the voice to render it to
audio. For instance, let's make a simple alteration that triples the
duration of each vowel in an English text.

```python
from voxpopuli import Voice, BritishEnglishPhonemes

voice = Voice(lang="en")
# here's how you get the phonemes list
phoneme_list = voice.to_phonemes("Now go away or I will taunt you a second time.") 
for phoneme in phoneme_list:  # the phoneme list object inherits from list
    if phoneme.name in BritishEnglishPhonemes.VOWELS:
        phoneme.duration *= 3
        
# rendering and saving the sound, then saying it out loud:
voice.to_audio(phoneme_list, "modified.wav")
voice.say(phoneme_list)
```
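
The same loop can be wrapped in a small reusable function. This is only a sketch: `stretch` is a hypothetical helper that relies on nothing but the `name` and `duration` attributes used above, and the `SimpleNamespace` objects stand in for real phonemes so that it runs without voxpopuli. Rounding the result to an integer number of milliseconds is an assumption of this example.

```python
from types import SimpleNamespace


def stretch(phonemes, names, factor):
    """Multiply, in place, the duration of every phoneme whose name is in `names`."""
    for phoneme in phonemes:
        if phoneme.name in names:
            phoneme.duration = int(phoneme.duration * factor)
    return phonemes


# Stand-in objects exposing only the `name` and `duration` attributes.
fake_list = [
    SimpleNamespace(name="h", duration=60),
    SimpleNamespace(name="@", duration=80),
    SimpleNamespace(name="l", duration=70),
]
stretch(fake_list, names={"@"}, factor=1.5)
print([(p.name, p.duration) for p in fake_list])
```

With the library installed, a call such as `stretch(phoneme_list, BritishEnglishPhonemes.VOWELS, 3)` should behave like the loop above.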

Notes:

 * For French, Spanish, German and Italian, the phoneme codes
 used by espeak and mbrola are available as class attributes, in classes similar to `BritishEnglishPhonemes` above.
 * More info on the phonemes can be found on the [SAMPA page](http://www.phon.ucl.ac.uk/home/sampa/).
 

## What's left to do

 * Moar unit tests
 * Maybe some examples

            
