snac


| Field                   | Value                                 |
|-------------------------|---------------------------------------|
| Name                    | snac                                  |
| Version                 | 1.2.1                                 |
| home_page               | https://github.com/hubertsiuzdak/snac |
| Summary                 | Multi-Scale Neural Audio Codec        |
| upload_time             | 2024-09-11 15:00:46                   |
| maintainer              | None                                  |
| docs_url                | None                                  |
| author                  | Hubert Siuzdak                        |
| requires_python         | None                                  |
| license                 | None                                  |
| keywords                |                                       |
| VCS                     |                                       |
| bugtrack_url            |                                       |
| requirements            | torch, numpy, einops, huggingface_hub |
| Travis-CI               | No Travis.                            |
| coveralls test coverage | No coveralls.                         |

# SNAC 🍿

Multi-**S**cale **N**eural **A**udio **C**odec (SNAC) compresses audio into discrete codes at a low bitrate.

| 🎸 Music samples                                                                                         | 🗣️ Speech samples                                                                                       |
|----------------------------------------------------------------------------------------------------------|----------------------------------------------------------------------------------------------------------|
| <video src='https://github.com/hubertsiuzdak/snac/assets/35269911/e8adac68-d3f1-4fc1-8cf9-f48d9bcd95ed'> | <video src='https://github.com/hubertsiuzdak/snac/assets/35269911/65ac2547-c711-49d4-8a5d-64d52e6d6ba1'> |

🎧 More audio samples available at https://hubertsiuzdak.github.io/snac/

## Overview

SNAC encodes audio into hierarchical tokens similarly to SoundStream, EnCodec, and DAC (see the image
on the left). However, SNAC introduces a simple change where coarse tokens are sampled less frequently,
covering a broader time span (see the image on the right).

This not only saves bitrate but, more importantly, may be very useful for language-modeling approaches to
audio generation. For example, with coarse tokens at ~10 Hz and a context window of 2048 tokens, a model can
maintain a consistent structure of an audio track for roughly 3 minutes.
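
The ~3 minute figure follows from dividing the context length by the coarse token rate. Here is a minimal sketch of that arithmetic, assuming the language model attends only to the coarsest token level; `context_span_seconds` is an illustrative name and not part of the `snac` package:

```python
def context_span_seconds(context_tokens: int, token_rate_hz: float) -> float:
    """Seconds of audio covered by `context_tokens` tokens emitted at `token_rate_hz`."""
    return context_tokens / token_rate_hz


# Values taken from the paragraph above: 2048-token context, ~10 Hz coarse tokens.
span = context_span_seconds(2048, 10.0)
print(f"{span:.1f} s = {span / 60:.1f} min")  # 204.8 s = 3.4 min
```

If finer levels share the same context window, the covered time span shrinks accordingly.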

![snac.png](img%2Fsnac.png)

## Pretrained models

Currently, all models support only a single audio channel (mono).

| Model                                                                       | Bitrate   | Sample Rate | Params | Recommended use case     | 
|-----------------------------------------------------------------------------|-----------|-------------|--------|--------------------------|
| [hubertsiuzdak/snac_24khz](https://huggingface.co/hubertsiuzdak/snac_24khz) | 0.98 kbps | 24 kHz      | 19.8 M | 🗣️ Speech               | 
| [hubertsiuzdak/snac_32khz](https://huggingface.co/hubertsiuzdak/snac_32khz) | 1.9 kbps  | 32 kHz      | 54.5 M | 🎸 Music / Sound Effects | 
| [hubertsiuzdak/snac_44khz](https://huggingface.co/hubertsiuzdak/snac_44khz) | 2.6 kbps  | 44 kHz      | 54.5 M | 🎸 Music / Sound Effects |
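
To put the bitrates in the table into perspective, the sketch below estimates how many bytes one minute of encoded audio would occupy. It assumes 1 kbps means 1000 bits per second and ignores any container or header overhead; `compressed_size_bytes` is a hypothetical helper, not part of the `snac` package:

```python
def compressed_size_bytes(bitrate_kbps: float, duration_s: float) -> float:
    """Approximate payload size, taking 1 kbps = 1000 bits per second."""
    return bitrate_kbps * 1000 * duration_s / 8


# Bitrates copied from the table above.
for name, kbps in [("snac_24khz", 0.98), ("snac_32khz", 1.9), ("snac_44khz", 2.6)]:
    size_kb = compressed_size_bytes(kbps, 60) / 1000
    print(f"{name}: about {size_kb:.2f} kB per minute")
```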

## Usage

Install it using:

```bash
pip install snac
```

To encode (and decode) audio with SNAC in Python, use the following code:

```python
import torch
from snac import SNAC

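device = "cuda" if torch.cuda.is_available() else "cpu"  # fall back to CPU when no GPU is available
model = SNAC.from_pretrained("hubertsiuzdak/snac_32khz").eval().to(device)
audio = torch.randn(1, 1, 32000, device=device)  # placeholder for actual audio with shape (B, 1, T)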

with torch.inference_mode():
    codes = model.encode(audio)
    audio_hat = model.decode(codes)
```

You can also encode and reconstruct in a single call:

```python
with torch.inference_mode():
    audio_hat, codes = model(audio)
```

⚠️ Note that `codes` is a list of token sequences of variable lengths, each corresponding to a different temporal
resolution.

```
>>> [code.shape[1] for code in codes]
[12, 24, 48, 96]
```
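
Because each level in the example above holds twice as many tokens as the previous one, the levels can be regrouped into frames aligned to the coarsest token. The sketch below shows one possible flattening for sequence modeling. It is not something the `snac` package provides: `flatten_codes` is a hypothetical helper that works on plain Python lists for a single batch item and assumes every level's length is an exact multiple of the coarsest level's length.

```python
def flatten_codes(levels):
    """Interleave hierarchical token lists into one sequence, coarse-first within each frame."""
    n_coarse = len(levels[0])
    flat = []
    for i in range(n_coarse):
        for level in levels:
            ratio = len(level) // n_coarse  # tokens of this level per coarse token
            flat.extend(level[i * ratio:(i + 1) * ratio])
    return flat


# Dummy token ids with the lengths from the example above.
levels = [list(range(n)) for n in (12, 24, 48, 96)]
flat = flatten_codes(levels)
print(len(flat))  # 180 tokens: 12 frames of 1 + 2 + 4 + 8
```

With real model output, lists of this form could be obtained for the first batch item with `[code[0].tolist() for code in codes]`.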

## Acknowledgements

Module definitions are adapted from the [Descript Audio Codec](https://github.com/descriptinc/descript-audio-codec).

            

Raw data

{
    "_id": null,
    "home_page": "https://github.com/hubertsiuzdak/snac",
    "name": "snac",
    "maintainer": null,
    "docs_url": null,
    "requires_python": null,
    "maintainer_email": null,
    "keywords": null,
    "author": "Hubert Siuzdak",
    "author_email": "hubert.siuzdak@gmail.com",
    "download_url": "https://files.pythonhosted.org/packages/43/38/5b64fb15c1cf02233252975c43c4b85ccacd9f77c55f7fee72b16b3bd2f6/snac-1.2.1.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": null,
    "summary": "Multi-Scale Neural Audio Codec",
    "version": "1.2.1",
    "project_urls": {
        "Homepage": "https://github.com/hubertsiuzdak/snac"
    },
    "split_keywords": [],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "794f6401dc74af3d9e9602209763eccbb7eac739c2501e499b51b560f71443c0",
                "md5": "142366c2a36bac18efab8cb84bc81dbe",
                "sha256": "96f90e221121ad03d6e3b060a787268b1efdbe424560a58f6f732df6d4914dc7"
            },
            "downloads": -1,
            "filename": "snac-1.2.1-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "142366c2a36bac18efab8cb84bc81dbe",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": null,
            "size": 8444,
            "upload_time": "2024-09-11T15:00:44",
            "upload_time_iso_8601": "2024-09-11T15:00:44.906830Z",
            "url": "https://files.pythonhosted.org/packages/79/4f/6401dc74af3d9e9602209763eccbb7eac739c2501e499b51b560f71443c0/snac-1.2.1-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "43385b64fb15c1cf02233252975c43c4b85ccacd9f77c55f7fee72b16b3bd2f6",
                "md5": "fe9f116edbda97af3ae87aa93775a769",
                "sha256": "697f27fc5b98308eee8946739e5fd9c1b4ec629ef51b4f01c08dace1290685ee"
            },
            "downloads": -1,
            "filename": "snac-1.2.1.tar.gz",
            "has_sig": false,
            "md5_digest": "fe9f116edbda97af3ae87aa93775a769",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": null,
            "size": 8737,
            "upload_time": "2024-09-11T15:00:46",
            "upload_time_iso_8601": "2024-09-11T15:00:46.175379Z",
            "url": "https://files.pythonhosted.org/packages/43/38/5b64fb15c1cf02233252975c43c4b85ccacd9f77c55f7fee72b16b3bd2f6/snac-1.2.1.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2024-09-11 15:00:46",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "hubertsiuzdak",
    "github_project": "snac",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": true,
    "requirements": [
        {
            "name": "torch",
            "specs": []
        },
        {
            "name": "numpy",
            "specs": []
        },
        {
            "name": "einops",
            "specs": []
        },
        {
            "name": "huggingface_hub",
            "specs": []
        }
    ],
    "lcname": "snac"
}
        