# PYTHON WORLD VOCODER:
*************************************
This is a line-by-line Python implementation of the WORLD vocoder (originally written in Matlab and C++). The package declares support for *Python 3.7* and later.
# INSTALLATION
*********************
```
pip install worldvocoder
```
# EXAMPLE
**************
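```python
import worldvocoder as wv
import soundfile as sf
import librosa

# read audio; soundfile returns shape (samples,) or (samples, channels)
audio, sample_rate = sf.read("some_file.wav")
# librosa.to_mono expects channels first, so transpose before mixing down
audio = librosa.to_mono(audio.T)

# initialize vocoder
vocoder = wv.World()

# encode audio (analysis)
dat = vocoder.encode(sample_rate, audio, f0_method='harvest')
```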
Here, ```sample_rate``` is the sampling frequency and ```audio``` is the speech or singing signal.
```dat``` is a dictionary containing the pitch (F0), magnitude spectrum, and aperiodicity.
We can scale the pitch:
```python
dat = vocoder.scale_pitch(dat, 1.5)
```
Be careful when scaling the pitch: F0 has upper and lower limits, so a large scaling factor can push the pitch out of range.
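To illustrate why the limits matter, here is a minimal sketch of pitch scaling with clipping. ```scale_f0``` is a hypothetical helper, not worldvocoder's ```scale_pitch```; it assumes unvoiced frames are marked with F0 = 0 and uses illustrative limits of 71 Hz and 800 Hz.

```python
# Hypothetical helper, not part of worldvocoder: scale an F0 contour and
# clip voiced frames to an assumed [f0_floor, f0_ceil] range.
def scale_f0(f0, factor, f0_floor=71.0, f0_ceil=800.0):
    """Multiply voiced F0 values (> 0) by `factor`, clipping to the limits.

    Unvoiced frames are assumed to be marked with 0 and stay at 0.
    """
    scaled = []
    for value in f0:
        if value > 0:
            # Clip so the scaled pitch stays inside the analysis range.
            scaled.append(min(max(value * factor, f0_floor), f0_ceil))
        else:
            scaled.append(0.0)
    return scaled


contour = [0.0, 120.0, 300.0, 600.0, 0.0]
print(scale_f0(contour, 1.5))  # [0.0, 180.0, 450.0, 800.0, 0.0]
```

With a factor of 1.5, the 600 Hz frame would become 900 Hz and is clipped to 800 Hz, so the top of the contour is flattened rather than scaled.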
We can make speech faster or slower:
```python
dat = vocoder.scale_duration(dat, 2)
```
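The README does not describe how ```scale_duration``` changes the timing internally. One simple way to time-scale vocoder parameters is to resample the frame sequence while keeping ```frame_period``` fixed. The hypothetical helper ```stretch_frames``` (not part of worldvocoder) does this with nearest-neighbour frame repetition; in this sketch, a factor above 1 produces slower speech.

```python
# Hypothetical helper, not worldvocoder's scale_duration.
def stretch_frames(frames, factor):
    """Return a frame list roughly `factor` times as long as `frames`.

    factor > 1 repeats frames (slower speech); factor < 1 drops frames
    (faster speech). The frame period itself is unchanged.
    """
    if factor <= 0:
        raise ValueError("factor must be positive")
    if not frames:
        return []
    n_in = len(frames)
    n_out = max(1, round(n_in * factor))
    # Output frame i takes input frame floor(i / factor), clamped to the end.
    return [frames[min(int(i / factor), n_in - 1)] for i in range(n_out)]


print(stretch_frames(["a", "b", "c"], 2))         # ['a', 'a', 'b', 'b', 'c', 'c']
print(stretch_frames(["a", "b", "c", "d"], 0.5))  # ['a', 'c']
```

Nearest-neighbour repetition is the simplest choice; interpolating between neighbouring frames would give smoother parameter trajectories.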
To resynthesize the audio:
```python
dat = vocoder.decode(dat)
output = dat["out"]
```
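Assuming ```output``` is a one-dimensional float signal in [-1, 1] at the original ```sample_rate```, it can be saved with ```sf.write("resynth.wav", output, sample_rate)```. To avoid extra dependencies, the following sketch writes 16-bit PCM with Python's standard ```wave``` module; ```write_wav16``` is a hypothetical helper.

```python
import array
import sys
import wave


def write_wav16(path, samples, sample_rate):
    """Write float samples in [-1, 1] to a mono 16-bit PCM WAV file."""
    pcm = array.array("h")  # signed 16-bit integers
    for s in samples:
        # Clip first so out-of-range samples saturate instead of wrapping.
        s = max(-1.0, min(1.0, float(s)))
        pcm.append(int(round(s * 32767)))
    if sys.byteorder == "big":
        pcm.byteswap()  # WAV sample data is little-endian
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)  # mono
        wav.setsampwidth(2)  # 2 bytes = 16 bits per sample
        wav.setframerate(int(sample_rate))
        wav.writeframes(pcm.tobytes())


# usage, with output and sample_rate from the steps above:
# write_wav16("resynth.wav", output, sample_rate)
```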
To use the ```d4c_requiem``` analysis and ```requiem_synthesis``` from WORLD version 0.2.2, pass ```is_requiem=True``` to ```encode```:
```python
# requiem analysis
dat = vocoder.encode(sample_rate, audio, f0_method='harvest', is_requiem=True)
```
To extract the log-filterbank, MCEP-40, and VAE-12 features described in the paper `Using a Manifold Vocoder for Spectral Voice and Style Conversion`, see ```test/spectralFeatures.py```. Extracting VAE-12 requires Keras 2.2.4 and TensorFlow 1.14.0.
Check out the [speech samples](https://tuanad121.github.io/samples/2019-09-15-Manifold/).
# NOTE:
**********
* The vocoder uses pitch-adaptive analysis: the size of each window is determined by the fundamental frequency ```F0```, while the window centers are equally spaced ```frame_period``` ms apart.
* The Fourier transform size (```fft_size```) is determined automatically from the sampling frequency and the lowest F0 value, ```f0_floor```.
To specify your own ```fft_size```, set ```f0_floor = 3.0 * fs / fft_size```.
Decreasing ```fft_size``` therefore increases ```f0_floor```, and a high ```f0_floor``` may be unsuitable for analyzing male voices, whose F0 is typically lower.
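The relations in these notes can be checked numerically. The helpers below are hypothetical and assume, consistent with the ```f0_floor = 3.0 * fs / fft_size``` rule, a window about three pitch periods long (3 * fs / F0 samples) and an ```fft_size``` rounded up to a power of two; they are a sketch, not code taken from worldvocoder.

```python
import math

# Assumptions (sketch, not worldvocoder code): the analysis window spans
# three pitch periods, and fft_size is the power of two that holds the
# longest window, i.e. the one at f0_floor.


def window_length(fs, f0):
    """Approximate window length in samples for a frame with pitch f0 (Hz)."""
    return 3.0 * fs / f0


def fft_size_for(fs, f0_floor):
    """Power of two large enough to hold the window at f0_floor."""
    return 2 ** math.ceil(math.log2(3.0 * fs / f0_floor + 1))


def f0_floor_for(fs, fft_size):
    """Lowest F0 whose window still fits in fft_size (the README's rule)."""
    return 3.0 * fs / fft_size


def frame_centers(n_frames, frame_period_ms):
    """Window centres in seconds, equally spaced by frame_period."""
    return [i * frame_period_ms / 1000.0 for i in range(n_frames)]


print(fft_size_for(16000, 71.0))  # 1024
print(f0_floor_for(16000, 512))   # 93.75
```

At ```fs = 16000``` Hz and ```f0_floor = 71``` Hz this gives ```fft_size = 1024```; forcing ```fft_size = 512``` raises ```f0_floor``` to 93.75 Hz, which can be higher than the lowest pitch of many male voices.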
# CITATION:
Dinh, T., Kain, A., & Tjaden, K. (2019). Using a manifold vocoder for spectral voice and style conversion. Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH, 2019-September, 1388-1392.