# Music2Latent
Encode and decode audio samples to/from compressed representations! Useful for efficient generative modelling applications and for other downstream tasks.
![music2latent](music2latent.png)
Read the ISMIR 2024 paper [here](https://arxiv.org/abs/2408.06500).
Listen to audio samples [here](https://sonycslparis.github.io/music2latent-companion/).
Under the hood, __Music2Latent__ uses a __Consistency Autoencoder__ model to efficiently encode and decode audio samples.
44.1 kHz audio is encoded into a sequence of latent vectors at a rate of __~10 Hz__, where each latent has 64 channels.
48 kHz audio can also be encoded, resulting in a latent rate of ~12 Hz.
A generative model can then be trained on these embeddings, or they can be used for other downstream tasks.
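The rates above imply a large reduction in data rate. As a back-of-envelope sketch (using only the numbers quoted here, not the library itself), mono 44.1 kHz audio produces 44,100 values per second, while the latent sequence produces roughly 10 × 64 values per second:

```python
# Back-of-envelope compression arithmetic for the rates quoted above.
sample_rate = 44100      # input samples per second (mono)
latent_rate = 10         # approximate latent vectors per second
latent_channels = 64     # values per latent vector

values_in = sample_rate
values_out = latent_rate * latent_channels
ratio = values_in / values_out
print(f"~{ratio:.0f}x fewer values per second")  # roughly 69x
```

The exact latent rate depends on the model's internal hop size, so treat this as an approximation.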
Music2Latent was trained on __music__ and on __speech__. Refer to the [paper](https://arxiv.org/abs/2408.06500) for more details.
## Installation
```bash
pip install music2latent
```
The model weights will be downloaded automatically the first time the code is run.
## How to use
To encode and decode audio samples to/from latent embeddings:
```python
import librosa
from music2latent import EncoderDecoder

# Load an example clip at 44.1 kHz
audio_path = librosa.example('trumpet')
wv, sr = librosa.load(audio_path, sr=44100)

encdec = EncoderDecoder()

latent = encdec.encode(wv)
# latent has shape (batch_size/audio_channels, dim (64), sequence_length)

wv_rec = encdec.decode(latent)
```
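Since the autoencoder is lossy, you may want to sanity-check reconstructions. A simple metric is the signal-to-noise ratio between the original and the reconstruction. The helper below (`snr_db` is our own illustrative function, not part of music2latent) is demonstrated on a synthetic signal so it runs without the model:

```python
import numpy as np

def snr_db(reference, estimate):
    """Signal-to-noise ratio in dB between a reference signal and its reconstruction."""
    reference = np.asarray(reference, dtype=np.float64)
    estimate = np.asarray(estimate, dtype=np.float64)
    noise = reference - estimate
    return 10.0 * np.log10(np.sum(reference ** 2) / np.sum(noise ** 2))

# Synthetic check: a 440 Hz sine with a small amount of added noise.
t = np.linspace(0, 1, 44100, endpoint=False)
clean = np.sin(2 * np.pi * 440 * t)
noisy = clean + 0.001 * np.random.default_rng(0).standard_normal(clean.shape)
print(f"{snr_db(clean, noisy):.1f} dB")  # ~57 dB
```

With real data you would call `snr_db(wv, wv_rec)` after trimming both signals to the same length.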
To extract encoder features to use in downstream tasks:
```python
features = encdec.encode(wv, extract_features=True)
```
These features are extracted before the encoder bottleneck, and thus have more channels (contain more information) than the latents used for reconstruction. They cannot be directly decoded back to audio.
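For downstream tasks such as classification, a common first step is to pool the per-frame features over time into one fixed-size vector per clip. The sketch below uses a random array as a stand-in for the encoder output and assumes a `(channels, time)` layout, mirroring the latent layout above (the channel count of 512 is purely illustrative):

```python
import numpy as np

# Stand-in for encoder features with an assumed (channels, time) layout.
features = np.random.default_rng(0).standard_normal((512, 120))

# Mean-pool over the time axis to get one embedding per clip.
clip_embedding = features.mean(axis=1)
print(clip_embedding.shape)  # (512,)
```

Max-pooling or attention pooling are common alternatives when the task is sensitive to short events rather than overall timbre.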
music2latent supports more advanced usage, including GPU memory management controls. Please refer to __tutorial.ipynb__.
## License
This library is released under the CC BY-NC 4.0 license. Please refer to the LICENSE file for more details.
This work was conducted by [Marco Pasini](https://twitter.com/marco_ppasini) during his PhD at Queen Mary University of London, in partnership with Sony Computer Science Laboratories Paris.
This work was supervised by Stefan Lattner and George Fazekas.