wavegrad

- Name: wavegrad
- Version: 0.1.3
- Home page: https://www.lmnt.com
- Summary: wavegrad
- Upload time: 2020-09-18 06:27:53
- Author: LMNT, Inc.
- License: Apache 2.0
- Keywords: wavegrad, machine learning, neural vocoder, tts, speech
# WaveGrad
![PyPI Release](https://img.shields.io/pypi/v/wavegrad?label=release) [![License](https://img.shields.io/github/license/lmnt-com/wavegrad)](https://github.com/lmnt-com/wavegrad/blob/master/LICENSE)

WaveGrad is a fast, high-quality neural vocoder designed by the folks at Google Brain. The architecture is described in [WaveGrad: Estimating Gradients for Waveform Generation](https://arxiv.org/pdf/2009.00713.pdf). In short, this model takes a log-scaled Mel spectrogram and converts it to a waveform via iterative refinement.
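
For intuition, iterative refinement runs the reverse of a diffusion process: start from Gaussian noise and repeatedly denoise, conditioned on the spectrogram. The sketch below is the generic DDPM-style sampler from the papers referenced at the bottom, not this package's exact code; `model`, its conditioning signature, and `hop_length` are illustrative assumptions:

```python
import torch

def refine(model, spectrogram, betas, hop_length=300):
    """Sketch of DDPM-style iterative refinement (see the referenced papers).

    `model` is assumed to predict the injected noise given the noisy audio,
    the conditioning spectrogram, and the current noise level; `hop_length`
    (spectrogram frames -> audio samples) is an illustrative value.
    """
    alphas = 1.0 - betas
    alpha_bars = torch.cumprod(alphas, dim=0)
    # Start from pure Gaussian noise at the target waveform length [N, T].
    audio = torch.randn(spectrogram.shape[0], spectrogram.shape[-1] * hop_length)
    for t in reversed(range(len(betas))):
        eps = model(audio, spectrogram, alpha_bars[t])  # predicted noise
        audio = (audio - betas[t] / torch.sqrt(1 - alpha_bars[t]) * eps) / torch.sqrt(alphas[t])
        if t > 0:  # re-inject a little noise on every step except the last
            sigma = torch.sqrt((1 - alpha_bars[t - 1]) / (1 - alpha_bars[t]) * betas[t])
            audio += sigma * torch.randn_like(audio)
    return torch.clamp(audio, -1.0, 1.0)
```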

## Status
- [x] stable training (22 kHz, 24 kHz)
- [x] high-quality synthesis
- [x] mixed-precision training
- [x] custom noise schedule (faster inference)
- [x] programmatic API
- [x] PyPI package
- [x] audio samples
- [x] pretrained models

## Audio samples
[24 kHz audio samples](https://lmnt.com/assets/wavegrad/24kHz)

## Pretrained models
[24 kHz pretrained model](https://lmnt.com/assets/wavegrad/wavegrad-24kHz.pt) (183 MB, SHA256: `65e9366da318d58d60d2c78416559351ad16971de906e53b415836c068e335f3`)
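
To verify the download against the checksum above, a plain `hashlib` check works (the filename is whatever you saved the model as):

```python
import hashlib

EXPECTED = '65e9366da318d58d60d2c78416559351ad16971de906e53b415836c068e335f3'

with open('wavegrad-24kHz.pt', 'rb') as f:  # path to your downloaded model
    digest = hashlib.sha256(f.read()).hexdigest()
assert digest == EXPECTED, f'checksum mismatch: {digest}'
```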

## Install

Install using pip:
```
pip install wavegrad
```

or from GitHub:
```
git clone https://github.com/lmnt-com/wavegrad.git
cd wavegrad
pip install .
```

### Training
Before you start training, you'll need to prepare a training dataset. The dataset can have any directory structure as long as the contained .wav files are 16-bit mono (e.g. [LJSpeech](https://keithito.com/LJ-Speech-Dataset/), [VCTK](https://pytorch.org/audio/_modules/torchaudio/datasets/vctk.html)). By default, this implementation assumes a sample rate of 22 kHz. If you need to change this value, edit [params.py](https://github.com/lmnt-com/wavegrad/blob/master/src/wavegrad/params.py).
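
If you're not sure whether your dataset already meets these requirements, a quick check with Python's standard `wave` module might look like this (a convenience sketch, not part of the package; `SAMPLE_RATE` should match the value in params.py):

```python
import sys
import wave
from pathlib import Path

SAMPLE_RATE = 22050  # must match params.py

for path in Path(sys.argv[1]).rglob('*.wav'):
    with wave.open(str(path), 'rb') as w:
        ok = (w.getnchannels() == 1          # mono
              and w.getsampwidth() == 2      # 16-bit
              and w.getframerate() == SAMPLE_RATE)
    if not ok:
        print(f'needs conversion: {path}')
```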

```
python -m wavegrad.preprocess /path/to/dir/containing/wavs
python -m wavegrad /path/to/model/dir /path/to/dir/containing/wavs

# in another shell to monitor training progress:
tensorboard --logdir /path/to/model/dir --bind_all
```

You should expect to hear intelligible speech by ~20k steps (~1.5h on a 2080 Ti).

### Inference API
Basic usage:

```python
from wavegrad.inference import predict as wavegrad_predict

model_dir = '/path/to/model/dir'
spectrogram = ...  # get your hands on a spectrogram in [N,C,W] format
audio, sample_rate = wavegrad_predict(spectrogram, model_dir)

# `audio` is a GPU tensor in [N,T] format.
```
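
If you don't already have a spectrogram, one possible way to compute a log-scaled mel spectrogram with torchaudio is sketched below. The STFT and mel parameters here are placeholder assumptions; they must match whatever the model was trained with (see params.py):

```python
import torch
import torchaudio

# Placeholder parameters; use the values from params.py for your model.
mel = torchaudio.transforms.MelSpectrogram(
    sample_rate=22050, n_fft=1024, hop_length=256, n_mels=128)

waveform, sr = torchaudio.load('reference.wav')  # [C, T], C == 1 for mono
spectrogram = torch.log(mel(waveform) + 1e-5)    # [1, n_mels, W], i.e. [N,C,W]
```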

If you have a custom noise schedule (see below):
```python
import numpy as np
from wavegrad.inference import predict as wavegrad_predict

params = {'noise_schedule': np.load('/path/to/noise_schedule.npy')}
model_dir = '/path/to/model/dir'
spectrogram = ...  # get your hands on a spectrogram in [N,C,W] format
audio, sample_rate = wavegrad_predict(spectrogram, model_dir, params=params)

# `audio` is a GPU tensor in [N,T] format.
```

### Inference CLI
```
python -m wavegrad.inference /path/to/model /path/to/spectrogram -o output.wav
```

### Noise schedule
The default implementation uses 1000 refinement iterations, which runs slower than real-time. WaveGrad can achieve high-quality, faster-than-real-time synthesis with as few as 6 iterations, without retraining the model with new hyperparameters.

To achieve this speed-up, you will need to search for a noise schedule that works well for your dataset. This implementation provides a script that performs the search for you:

```
python -m wavegrad.noise_schedule /path/to/trained/model /path/to/preprocessed/validation/dataset
python -m wavegrad.inference /path/to/trained/model /path/to/spectrogram -n noise_schedule.npy -o output.wav
```

The default settings should give good results without spending too much time on the search. If you'd like to find a better noise schedule or use a different number of inference iterations, run the `noise_schedule` script with `--help` to see additional configuration options.
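
Under the hood, the noise schedule is just a 1-D array of per-iteration noise levels stored as a .npy file. As an illustrative, untuned example, a hand-picked 6-step schedule could be written out like this and passed via `-n` on the CLI or the `params` dict shown earlier (the geometric spacing and endpoint values here are assumptions, not a searched schedule):

```python
import numpy as np

# Illustrative, untuned 6-step schedule: noise levels spaced geometrically
# between a small and a moderate value.
noise_schedule = np.geomspace(1e-6, 1e-1, num=6)
np.save('noise_schedule.npy', noise_schedule)
```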


## References
- [WaveGrad: Estimating Gradients for Waveform Generation](https://arxiv.org/pdf/2009.00713.pdf)
- [Denoising Diffusion Probabilistic Models](https://arxiv.org/pdf/2006.11239.pdf)
- [Code for Denoising Diffusion Probabilistic Models](https://github.com/hojonathanho/diffusion)