# Vocos — MLX
An implementation of [Vocos](https://github.com/gemelo-ai/vocos) using the [MLX](https://github.com/ml-explore/mlx) framework. Vocos reconstructs high-quality audio from mel spectrograms or EnCodec tokens.
### Vocos: Closing the gap between time-domain and Fourier-based neural vocoders for high-quality audio synthesis
Paper [[abs]](https://arxiv.org/abs/2306.00814) [[pdf]](https://arxiv.org/pdf/2306.00814.pdf)
## Installation
To use Vocos in inference mode, install it using:
```bash
pip install vocos-mlx
```
## Usage
### Mel Spectrogram
```python
from vocos_mlx import Vocos, load_audio, log_mel_spectrogram

# Load the pretrained mel-spectrogram model (24 kHz)
vocos = Vocos.from_pretrained("lucasnewman/vocos-mel-24khz")

# Reconstruct: pass the waveform through the full model
audio = load_audio("audio.wav", 24_000)
reconstructed_audio = vocos(audio)

# Decode: compute the mel spectrogram yourself, then decode it to audio
mel_spec = log_mel_spectrogram(audio, n_mels=100)
decoded_audio = vocos.decode(mel_spec)
```
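To listen to the result, the decoded samples need to be written to a file. The following is a minimal sketch using only the Python standard library; `write_wav` is a hypothetical helper and not part of `vocos_mlx`. It assumes the output can be turned into a flat sequence of mono floats in the range [-1.0, 1.0] at 24 kHz (for example via NumPy). The exact shape returned by the model is not stated here, so check it before flattening.

```python
import struct
import wave


def write_wav(path, samples, sample_rate=24_000):
    """Write mono float samples in [-1.0, 1.0] to a 16-bit PCM WAV file.

    Hypothetical helper for illustration; not part of vocos_mlx.
    Returns the number of samples written.
    """
    pcm = bytearray()
    for s in samples:
        # Clip to the valid range, then scale to a signed 16-bit integer.
        clipped = max(-1.0, min(1.0, float(s)))
        pcm += struct.pack("<h", int(round(clipped * 32767)))

    with wave.open(path, "wb") as f:
        f.setnchannels(1)            # mono
        f.setsampwidth(2)            # 2 bytes = 16-bit samples
        f.setframerate(sample_rate)  # 24 kHz matches the pretrained models
        f.writeframes(bytes(pcm))

    return len(pcm) // 2
```

Under those assumptions, a call might look like `write_wav("decoded.wav", flat_samples)`, where `flat_samples` is the decoded audio converted to a one-dimensional list or array of floats.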
### EnCodec
```python
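from vocos_mlx import Vocos, load_audio

# Load the pretrained EnCodec-token model (24 kHz)
vocos = Vocos.from_pretrained("lucasnewman/vocos-encodec-24khz")

# Reconstruct: pass the waveform through the full model at the chosen bandwidth
audio = load_audio("audio.wav", 24_000)
reconstructed_audio = vocos(audio, bandwidth_id=3)

# Decode: extract the EnCodec codes first, then decode them to audio
# (use the same bandwidth_id for both steps)
codes = vocos.get_encodec_codes(audio, bandwidth_id=3)
decoded_audio = vocos.decode_from_codes(codes, bandwidth_id=3)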
```
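The meaning of `bandwidth_id` is not spelled out above. In the upstream Vocos project it is an index into the EnCodec target bandwidths `[1.5, 3.0, 6.0, 12.0]` kbps; the sketch below assumes this port keeps the same convention, which would make `bandwidth_id = 3` correspond to 12 kbps. The arithmetic relies on two properties of the 24 kHz EnCodec model: 75 frames per second and 1024 entries (10 bits) per codebook. The function names are illustrative and not part of `vocos_mlx`.

```python
# Properties of the 24 kHz EnCodec model.
FRAME_RATE_HZ = 75   # latent frames per second of audio
BITS_PER_CODE = 10   # log2(1024) entries per codebook

# Assumption: bandwidth_id indexes this list, as in upstream Vocos.
BANDWIDTHS_KBPS = [1.5, 3.0, 6.0, 12.0]


def codebooks_for_bandwidth(kbps):
    """Number of residual codebooks needed to reach a target bitrate."""
    bits_per_second_per_codebook = FRAME_RATE_HZ * BITS_PER_CODE  # 750 bps
    return int(round(kbps * 1000 / bits_per_second_per_codebook))


def describe_bandwidth_id(bandwidth_id):
    """Return (kbps, codebook count) for a bandwidth_id under the assumed mapping."""
    kbps = BANDWIDTHS_KBPS[bandwidth_id]
    return kbps, codebooks_for_bandwidth(kbps)
```

Under this assumption, a higher `bandwidth_id` means more codebooks per frame, so the codes carry more detail at the cost of a higher bitrate.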
## Appreciation
Thanks to [Awni Hannun](https://github.com/awni) for the reference [EnCodec](https://github.com/ml-explore/mlx-examples/tree/main/encodec) implementation for MLX.
## Citations
```bibtex
@article{siuzdak2023vocos,
title={Vocos: Closing the gap between time-domain and Fourier-based neural vocoders for high-quality audio synthesis},
author={Siuzdak, Hubert},
journal={arXiv preprint arXiv:2306.00814},
year={2023}
}
```
## License
The code in this repository is released under the MIT license as found in the
[LICENSE](LICENSE) file.