vocos-mlx

Name	vocos-mlx
Version	0.0.7
home_page	None
Summary	Vocos - MLX
upload_time	2024-10-30 22:15:46
maintainer	None
docs_url	None
author	None
requires_python	>=3.9
license	MIT
keywords	artificial intelligence asr audio-generation deep learning transformers text-to-speech

# Vocos — MLX

An implementation of [Vocos](https://github.com/gemelo-ai/vocos) using the [MLX](https://github.com/ml-explore/mlx) framework. Vocos reconstructs high-quality audio from mel spectrograms or EnCodec tokens.

### Vocos: Closing the gap between time-domain and Fourier-based neural vocoders for high-quality audio synthesis
Paper [[abs]](https://arxiv.org/abs/2306.00814) [[pdf]](https://arxiv.org/pdf/2306.00814.pdf)

## Installation

To use Vocos in inference mode, install it using:

```bash
pip install vocos-mlx
```
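To confirm that the install took effect, one option is to query the installed distribution's version with the standard library. This is a minimal sketch; the helper name `installed_version` is hypothetical and not part of vocos-mlx.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version("vocos-mlx")
    print(found if found is not None else "vocos-mlx is not installed")
```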

## Usage

### Mel Spectrogram

```python
from vocos_mlx import Vocos, load_audio, log_mel_spectrogram

vocos = Vocos.from_pretrained("lucasnewman/vocos-mel-24khz")

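# Round trip: extract features from the waveform and decode them back to audio
audio = load_audio("audio.wav", 24_000)  # second argument: sample rate in Hz
reconstructed_audio = vocos(audio)

# Decode from a precomputed log-mel spectrogram with 100 mel bands
mel_spec = log_mel_spectrogram(audio, n_mels=100)
decoded_audio = vocos.decode(mel_spec)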
```
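The `n_mels` argument sets how many mel bands the spectrogram has. As a rough illustration of what such a band layout looks like, the sketch below spaces 100 band centers evenly on the mel scale up to 12 kHz, the Nyquist frequency of 24 kHz audio. It assumes the common HTK-style mel formula; the filterbank vocos-mlx actually uses may differ, and all function names here are hypothetical.

```python
import math


def hz_to_mel(freq_hz: float) -> float:
    """Convert a frequency in Hz to mels (HTK-style formula, an assumption)."""
    return 2595.0 * math.log10(1.0 + freq_hz / 700.0)


def mel_to_hz(mel: float) -> float:
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_band_centers(n_mels: int, sample_rate: int) -> list[float]:
    """Center frequencies (Hz) of n_mels bands spaced evenly in mel from 0 to Nyquist."""
    top_mel = hz_to_mel(sample_rate / 2)
    step = top_mel / (n_mels + 1)
    return [mel_to_hz(step * (i + 1)) for i in range(n_mels)]


if __name__ == "__main__":
    centers = mel_band_centers(n_mels=100, sample_rate=24_000)
    print(f"lowest band center:  {centers[0]:.1f} Hz")
    print(f"highest band center: {centers[-1]:.1f} Hz")
```

The bands are narrow at low frequencies and wide at high frequencies, which is why a 100-band mel spectrogram is much more compact than a linear-frequency one.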

### EnCodec

```python
from vocos_mlx import Vocos, load_audio

vocos = Vocos.from_pretrained("lucasnewman/vocos-encodec-24khz")

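# Round trip: encode the waveform with EnCodec and decode it back to audio
audio = load_audio("audio.wav", 24_000)  # second argument: sample rate in Hz
reconstructed_audio = vocos(audio, bandwidth_id=3)

# Extract the EnCodec codes, then decode audio from those codes
codes = vocos.get_encodec_codes(audio, bandwidth_id=3)
decoded_audio = vocos.decode_from_codes(codes, bandwidth_id=3)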
```
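The `bandwidth_id` argument is an index, not a bitrate. In the upstream Vocos EnCodec model it selects one of the bandwidths 1.5, 3, 6, or 12 kbps; the sketch below assumes this port keeps the same ordering, which this README does not state. It also uses the published EnCodec 24 kHz figures (75 frames per second, 1024-entry codebooks) to show how many codebooks each bandwidth implies. All names are hypothetical.

```python
import math

# Assumption: same ordering as the upstream Vocos EnCodec model.
BANDWIDTHS_KBPS = [1.5, 3.0, 6.0, 12.0]
FRAME_RATE_HZ = 75      # EnCodec 24 kHz emits 75 frames per second
CODEBOOK_SIZE = 1024    # each code carries log2(1024) = 10 bits


def codebooks_for(bandwidth_id: int) -> int:
    """Number of residual codebooks implied by a bandwidth index."""
    bits_per_second_per_codebook = FRAME_RATE_HZ * math.log2(CODEBOOK_SIZE)
    target_bits_per_second = BANDWIDTHS_KBPS[bandwidth_id] * 1000
    return int(target_bits_per_second // bits_per_second_per_codebook)


if __name__ == "__main__":
    for idx, kbps in enumerate(BANDWIDTHS_KBPS):
        print(f"bandwidth_id={idx}: {kbps} kbps, {codebooks_for(idx)} codebooks")
```

Under these assumptions, `bandwidth_id = 3` corresponds to the highest listed bandwidth and therefore the largest number of codebooks per frame.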

## Appreciation

Thanks to [Awni Hannun](https://github.com/awni) for the reference [EnCodec](https://github.com/ml-explore/mlx-examples/tree/main/encodec) implementation for MLX.

## Citations

```
@article{siuzdak2023vocos,
  title={Vocos: Closing the gap between time-domain and Fourier-based neural vocoders for high-quality audio synthesis},
  author={Siuzdak, Hubert},
  journal={arXiv preprint arXiv:2306.00814},
  year={2023}
}
```

## License

The code in this repository is released under the MIT license as found in the
[LICENSE](LICENSE) file.

Raw data

{
    "_id": null,
    "home_page": null,
    "name": "vocos-mlx",
    "maintainer": null,
    "docs_url": null,
    "requires_python": ">=3.9",
    "maintainer_email": null,
    "keywords": "artificial intelligence, asr, audio-generation, deep learning, transformers, text-to-speech",
    "author": null,
    "author_email": "Lucas Newman <lucasnewman@me.com>",
    "download_url": "https://files.pythonhosted.org/packages/f1/f5/fc2149185abe3a65358e96d429f9e00148341f88da52851615fc4c991fd4/vocos_mlx-0.0.7.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "MIT",
    "summary": "Vocos - MLX",
    "version": "0.0.7",
    "project_urls": {
        "Homepage": "https://github.com/lucasnewman/vocos-mlx"
    },
    "split_keywords": [
        "artificial intelligence",
        " asr",
        " audio-generation",
        " deep learning",
        " transformers",
        " text-to-speech"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "16ecd92a634ba5538d8213c300e2c3429f8f4a512db1012967cbf3ac11f08c01",
                "md5": "012c7ff2cbdf8d9bb8212ed7946bbde4",
                "sha256": "a80ea5961811e982ef82a77b2450ea4cbeb5c47318497acaddbea4b58cbaa8d6"
            },
            "downloads": -1,
            "filename": "vocos_mlx-0.0.7-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "012c7ff2cbdf8d9bb8212ed7946bbde4",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.9",
            "size": 15096,
            "upload_time": "2024-10-30T22:15:45",
            "upload_time_iso_8601": "2024-10-30T22:15:45.242499Z",
            "url": "https://files.pythonhosted.org/packages/16/ec/d92a634ba5538d8213c300e2c3429f8f4a512db1012967cbf3ac11f08c01/vocos_mlx-0.0.7-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "f1f5fc2149185abe3a65358e96d429f9e00148341f88da52851615fc4c991fd4",
                "md5": "eded1e7e1751e244635460a239596745",
                "sha256": "3d556736aaafe23b760befa5f7b639977a3d5c52ad7af65aefe0e636d5541ba3"
            },
            "downloads": -1,
            "filename": "vocos_mlx-0.0.7.tar.gz",
            "has_sig": false,
            "md5_digest": "eded1e7e1751e244635460a239596745",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.9",
            "size": 14660,
            "upload_time": "2024-10-30T22:15:46",
            "upload_time_iso_8601": "2024-10-30T22:15:46.898760Z",
            "url": "https://files.pythonhosted.org/packages/f1/f5/fc2149185abe3a65358e96d429f9e00148341f88da52851615fc4c991fd4/vocos_mlx-0.0.7.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2024-10-30 22:15:46",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "lucasnewman",
    "github_project": "vocos-mlx",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": true,
    "lcname": "vocos-mlx"
}
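
The `sha256` values under `digests` can be used to check a downloaded wheel or source archive before installing it. The following is a minimal sketch using only the standard library; the function names are hypothetical, and the file path in the usage note is an example.

```python
import hashlib


def sha256_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_digest(path: str, expected_hex: str) -> bool:
    """Compare a file's SHA-256 digest with an expected hex string."""
    return sha256_of_file(path) == expected_hex.strip().lower()
```

For example, `matches_digest("vocos_mlx-0.0.7.tar.gz", "<sha256 from the sdist entry>")` should return `True` for an intact download.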
