tvocos

- Name: tvocos
- Version: 0.1.0
- Summary: Fourier-based neural vocoder
- Home page: https://github.com/hubertsiuzdak/vocos-local
- Author: Hubert Siuzdak
- Upload time: 2023-06-11 19:06:04
- Requirements: none recorded
# Vocos: Closing the gap between time-domain and Fourier-based neural vocoders for high-quality audio synthesis

[Audio samples](https://charactr-platform.github.io/vocos/) |
Paper [[abs]](https://arxiv.org/abs/2306.00814) [[pdf]](https://arxiv.org/pdf/2306.00814.pdf)

## Installation

To use Vocos only in inference mode, install it using:

```bash
pip install vocos
```

If you wish to train the model, install it with additional dependencies:

```bash
pip install vocos[train]
```

## Usage

### Reconstruct audio from mel-spectrogram

```python
import torch

from vocos import Vocos

vocos = Vocos.from_pretrained("charactr/vocos-mel-24khz")

mel = torch.randn(1, 100, 256)  # B, C, T

with torch.no_grad():
    audio = vocos.decode(mel)
```
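
In practice the mel spectrogram comes from an acoustic model or is extracted from real audio rather than sampled at random. Given a 24 kHz waveform `y` (loaded as in the copy-synthesis example below), a minimal sketch of the explicit two-step path, assuming the loaded model exposes its mel feature extractor as `vocos.feature_extractor`, as in the reference implementation:

```python
with torch.no_grad():
    mel = vocos.feature_extractor(y)  # assumed attribute; expected shape (B, 100, frames)
    audio = vocos.decode(mel)
```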

Copy-synthesis from a file:

```python
import torchaudio

y, sr = torchaudio.load(YOUR_AUDIO_FILE)
y = torchaudio.functional.resample(y, orig_freq=sr, new_freq=24000)

with torch.no_grad():
    y_hat = vocos(y)
```
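
To listen to the reconstruction, the waveform can be written back to disk with torchaudio (the output file name here is arbitrary):

```python
# y_hat has shape (channels, samples) at 24 kHz after the forward pass above.
torchaudio.save("reconstruction.wav", y_hat, 24000)
```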

### Reconstruct audio from EnCodec

Additionally, you need to provide a `bandwidth_id` that selects the bandwidth embedding, i.e. the index into the list of supported bandwidths (in kbps): `[1.5, 3.0, 6.0, 12.0]`.

```python
vocos = Vocos.from_pretrained("charactr/vocos-encodec-24khz")

quantized_features = torch.randn(1, 128, 256)
bandwidth_id = torch.tensor([3])  # 12 kbps

with torch.no_grad():
    audio = vocos.decode(quantized_features, bandwidth_id=bandwidth_id)  
```
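
The index can also be derived from a target bitrate; a small helper sketch (the function below is ours for illustration, not part of the Vocos API):

```python
import torch

SUPPORTED_BANDWIDTHS_KBPS = [1.5, 3.0, 6.0, 12.0]

def bandwidth_id_for(kbps: float) -> torch.Tensor:
    """Return the embedding index for a supported EnCodec bandwidth in kbps."""
    return torch.tensor([SUPPORTED_BANDWIDTHS_KBPS.index(kbps)])

bandwidth_id = bandwidth_id_for(12.0)  # tensor([3]), matching the example above
```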

Copy-synthesis from a file. Extract and quantize features with EnCodec, reconstruct with Vocos:

```python
y, sr = torchaudio.load(YOUR_AUDIO_FILE)
y = torchaudio.functional.resample(y, orig_freq=sr, new_freq=24000)

with torch.no_grad():
    y_hat = vocos(y, bandwidth_id=bandwidth_id)
```

## Pre-trained models

The provided models were trained up to 2.5 million generator iterations, which resulted in slightly better objective scores compared to those reported in the paper.

| Model Name                                                                          | Dataset       | Training Iterations | Parameters |
|-------------------------------------------------------------------------------------|---------------|---------------------|------------|
| [charactr/vocos-mel-24khz](https://huggingface.co/charactr/vocos-mel-24khz)         | LibriTTS      | 2.5 M               | 13.5 M     |
| [charactr/vocos-encodec-24khz](https://huggingface.co/charactr/vocos-encodec-24khz) | DNS Challenge | 2.5 M               | 7.9 M      |

## Training

Prepare filelists of audio files for the training and validation sets:

```bash
find $TRAIN_DATASET_DIR -name "*.wav" > filelist.train
find $VAL_DATASET_DIR -name "*.wav" > filelist.val
```
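
Each filelist is a plain text file with one audio path per line (the paths below are purely illustrative):

```
/data/train/speaker_001/utt_0001.wav
/data/train/speaker_001/utt_0002.wav
/data/train/speaker_002/utt_0001.wav
```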

Fill in a config file, e.g. [vocos.yaml](configs/vocos.yaml), with your filelist paths and start training with:

```bash
python train.py -c configs/vocos.yaml
```
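
The filelist paths go into the data section of the config; a rough sketch of the relevant entries, with key names assumed rather than confirmed (check [vocos.yaml](configs/vocos.yaml) for the exact structure):

```yaml
data:
  init_args:
    train_params:
      filelist_path: filelist.train  # assumed key name
      sampling_rate: 24000
    val_params:
      filelist_path: filelist.val    # assumed key name
      sampling_rate: 24000
```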

Refer to the [PyTorch Lightning documentation](https://lightning.ai/docs/pytorch/stable/) for details about customizing the training pipeline.

## Citation

If you use this code or results in your paper, please cite our work as:

```
@article{siuzdak2023vocos,
  title={Vocos: Closing the gap between time-domain and Fourier-based neural vocoders for high-quality audio synthesis},
  author={Siuzdak, Hubert},
  journal={arXiv preprint arXiv:2306.00814},
  year={2023}
}
```

## License

The code in this repository is released under the MIT license as found in the
[LICENSE](LICENSE) file.

            
