descript-audio-codec


- Name: descript-audio-codec
- Version: 0.0.4
- Home page: https://github.com/descriptinc/descript-audio-codec
- Summary: A high-quality general neural audio codec.
- Uploaded: 2023-06-21 17:49:36
- Authors: Prem Seetharaman, Rithesh Kumar (prem@descript.com)
- License: MIT
- Keywords: audio, compression, machine learning
# Descript Audio Codec (.dac): High-Fidelity Audio Compression with Improved RVQGAN

This repository contains training and inference scripts
for the Descript Audio Codec (.dac), a high-fidelity, general-purpose
neural audio codec introduced in the paper **High-Fidelity Audio Compression with Improved RVQGAN**.

![](https://static.arxiv.org/static/browse/0.3.4/images/icons/favicon-16x16.png) [arXiv Paper: High-Fidelity Audio Compression with Improved RVQGAN](http://arxiv.org/abs/2306.06546) <br>
📈 [Demo Site](https://descript.notion.site/Descript-Audio-Codec-11389fce0ce2419891d6591a68f814d5)<br>
⚙ [Model Weights](https://github.com/descriptinc/descript-audio-codec/releases/download/0.0.1/weights.pth)

👉 With Descript Audio Codec, you can compress **44.1 kHz audio** into discrete codes at a **low 8 kbps bitrate**.  <br>
🤌 That's approximately **90x compression** relative to uncompressed 16-bit PCM, while maintaining exceptional fidelity and minimizing artifacts.  <br>
💪 Our universal model works on all domains (speech, environment, music, etc.), making it widely applicable to generative modeling of all audio.  <br>
👌 It can be used as a drop-in replacement for EnCodec in all audio language modeling applications (such as AudioLM, MusicLM, and MusicGen). <br>

<p align="center">
<img src="./assets/comparsion_stats.png" alt="Comparison of compression approaches. Our model achieves a higher compression factor than all baseline methods: ~90x, compared to 32x for EnCodec and 64x for SoundStream. Note that we operate at a target bitrate of 8 kbps, whereas EnCodec operates at 24 kbps and SoundStream at 6 kbps. We also operate at 44.1 kHz, whereas EnCodec operates at 48 kHz and SoundStream operates at 24 kHz." width=35%></p>
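The compression factors in this comparison can be reproduced with back-of-the-envelope arithmetic. The sketch below assumes the uncompressed reference is mono 16-bit PCM (an assumption, chosen because it reproduces all three quoted factors) and ignores any container or file-format overhead; the helper names are illustrative only.

```python
def pcm_bitrate_bps(sample_rate_hz: float, bits_per_sample: int = 16, channels: int = 1) -> float:
    """Bitrate of uncompressed PCM audio in bits per second."""
    return sample_rate_hz * bits_per_sample * channels


def compression_factor(sample_rate_hz: float, codec_bitrate_bps: float, bits_per_sample: int = 16) -> float:
    """Ratio of the PCM bitrate to the codec bitrate (assumes mono, 16-bit PCM by default)."""
    return pcm_bitrate_bps(sample_rate_hz, bits_per_sample) / codec_bitrate_bps


def code_bytes(duration_s: float, codec_bitrate_bps: float) -> float:
    """Approximate size of the discrete codes in bytes, ignoring file-format overhead."""
    return duration_s * codec_bitrate_bps / 8


# (sample rate in Hz, target bitrate in bits/s) as stated in the comparison above
codecs = {
    "DAC": (44_100, 8_000),
    "EnCodec": (48_000, 24_000),
    "SoundStream": (24_000, 6_000),
}

for name, (sr, bitrate) in codecs.items():
    print(f"{name}: {compression_factor(sr, bitrate):.1f}x")
# DAC: 88.2x
# EnCodec: 32.0x
# SoundStream: 64.0x

print(code_bytes(60, 8_000))  # one minute of codes at 8 kbps -> 60000.0 bytes
```

The 88.2x result is the figure that rounds to the ~90x quoted above, and one minute of audio at 8 kbps comes to roughly 60 kB of codes.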


## Usage

### Installation
```
pip install descript-audio-codec
```
Or, to install directly from the GitHub repository:

```
pip install git+https://github.com/descriptinc/descript-audio-codec
```

### Weights
Weights are released as part of this repo under the MIT license.
We release weights for models that natively support 24 kHz and 44.1 kHz sample rates.
Weights are downloaded automatically the first time you run the `encode` or `decode` command. To cache them ahead of time, run one of the following commands:
```bash
python3 -m dac download # downloads the default 44kHz variant
python3 -m dac download --model_type 44khz # downloads the 44kHz variant
python3 -m dac download --model_type 24khz # downloads the 24kHz variant
```
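To pre-fetch both variants from a script (for example, in an environment setup step), the CLI commands above can be invoked through `subprocess`. This is a minimal sketch that uses only the `download` subcommand and the `--model_type` values shown above; `download_command` and `download_all` are hypothetical helper names.

```python
import subprocess
import sys

# Variants listed in this README's download commands
MODEL_TYPES = ("44khz", "24khz")


def download_command(model_type: str) -> list[str]:
    """Build argv equivalent to `python3 -m dac download --model_type <model_type>`."""
    if model_type not in MODEL_TYPES:
        raise ValueError(f"unknown model_type: {model_type!r}")
    # sys.executable is the interpreter running this script, i.e. the
    # environment where descript-audio-codec is expected to be installed
    return [sys.executable, "-m", "dac", "download", "--model_type", model_type]


def download_all() -> None:
    """Cache the weights for every variant; raises CalledProcessError on failure."""
    for model_type in MODEL_TYPES:
        subprocess.run(download_command(model_type), check=True)
```

Calling `download_all()` runs each command in turn; `check=True` stops the loop at the first non-zero exit status instead of silently continuing.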
We provide a Dockerfile that installs all required dependencies for encoding and decoding. The build process caches the default model weights inside the image, so the image can be used without an internet connection. [See the instructions below.](#docker-image)


### Compress audio
```
python3 -m dac encode /path/to/input --output /path/to/output/codes
```

This command creates `.dac` files with the same names as the input files.
It also preserves the directory structure relative to the input root and
re-creates it in the output directory. Run `python3 -m dac encode --help`
for more options.
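The directory mirroring described above can be illustrated with `pathlib`. This sketch is not the CLI's actual implementation; `mirror_output_path` is a hypothetical helper showing the mapping the text describes: the path relative to the input root is kept and only the suffix changes (`.dac` when encoding, and `.wav` when decoding as described in the next section).

```python
from pathlib import Path


def mirror_output_path(input_file: Path, input_root: Path, output_root: Path, suffix: str) -> Path:
    """Map input_root/<rel>/name.ext to output_root/<rel>/name<suffix>."""
    # relative_to raises ValueError if input_file is not under input_root
    relative = input_file.relative_to(input_root)
    return (output_root / relative).with_suffix(suffix)


# Encoding: a WAV file under /data/in becomes a .dac file under /data/codes
encoded = mirror_output_path(
    Path("/data/in/speech/clip01.wav"), Path("/data/in"), Path("/data/codes"), ".dac"
)
print(encoded)  # /data/codes/speech/clip01.dac

# Decoding: the .dac file maps back to a .wav file under /data/recon
decoded = mirror_output_path(encoded, Path("/data/codes"), Path("/data/recon"), ".wav")
print(decoded)  # /data/recon/speech/clip01.wav
```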

### Reconstruct audio from compressed codes
```
python3 -m dac decode /path/to/output/codes --output /path/to/reconstructed_input
```

This command creates `.wav` files with the same names as the input `.dac` files.
It also preserves the directory structure relative to the input root and
re-creates it in the output directory. Run `python3 -m dac decode --help`
for more options.

### Programmatic Usage
```py
import torch
from audiotools import AudioSignal

from dac.utils import load_model
from dac.utils.encode import process as encode
from dac.utils.decode import process as decode

# Run on the GPU when one is available, otherwise on the CPU
device = "cuda" if torch.cuda.is_available() else "cpu"

# An untrained model can be created with dac.model.DAC();
# here we load compatible pre-trained weights instead
model = load_model(tag="latest", model_type="44khz")
model.eval()
model.to(device)

# Load an audio file
signal = AudioSignal("input.wav")

# Encode the signal into discrete codes
encoded_out = encode(signal, device, model)

# Decode the codes back into an audio signal
recon = decode(encoded_out, device, model, preserve_sample_rate=True)

# Write the reconstruction to disk
recon.write("recon.wav")
```

### Docker image
We provide a Dockerfile to build a Docker image with all the necessary
dependencies.
1. Building the image.
    ```
    docker build -t dac .
    ```
2. Using the image.

    Usage on CPU:
    ```
    docker run dac <command>
    ```

    Usage on GPU:
    ```
    docker run --gpus=all dac <command>
    ```

    `<command>` can be any of the compression and reconstruction commands listed
    above. For example, to run compression:

    ```
    docker run --gpus=all dac python3 -m dac encode ...
    ```


## Training
The baseline model configuration can be trained using the following commands.

### Prerequisites
From the root of a clone of this repository, install the development dependencies:
```
pip install -e ".[dev]"
```


### Single GPU training
```
export CUDA_VISIBLE_DEVICES=0
python scripts/train.py --args.load conf/ablations/baseline.yml --save_path runs/baseline/
```

### Multi GPU training
```
export CUDA_VISIBLE_DEVICES=0,1
torchrun --nproc_per_node gpu scripts/train.py --args.load conf/ablations/baseline.yml --save_path runs/baseline/
```
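The single- and multi-GPU commands above differ only in the launcher. As a sketch, a small Python wrapper can choose between them from a list of GPU indices; `training_command` is a hypothetical helper that reuses only the script path, config file, and flags shown above.

```python
import os


def training_command(
    gpus: list[int],
    config: str = "conf/ablations/baseline.yml",
    save_path: str = "runs/baseline/",
) -> tuple[list[str], dict[str, str]]:
    """Return (argv, env) for launching training on the given GPU indices."""
    if not gpus:
        raise ValueError("at least one GPU index is required")
    env = dict(os.environ)
    # Restrict the visible devices, as the `export CUDA_VISIBLE_DEVICES=...` lines do
    env["CUDA_VISIBLE_DEVICES"] = ",".join(str(g) for g in gpus)
    script_args = ["scripts/train.py", "--args.load", config, "--save_path", save_path]
    if len(gpus) == 1:
        argv = ["python", *script_args]
    else:
        # `--nproc_per_node gpu` starts one worker process per visible GPU
        argv = ["torchrun", "--nproc_per_node", "gpu", *script_args]
    return argv, env


argv, env = training_command([0, 1])
print(env["CUDA_VISIBLE_DEVICES"], " ".join(argv))
# To launch: subprocess.run(argv, env=env, check=True)
```

Passing the environment explicitly, rather than exporting it in the parent shell, keeps each launch's device selection local to that one process.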

## Testing
We provide two test scripts to test the CLI and training functionality. Please
make sure the training prerequisites are satisfied before launching these
tests. To launch them, run:
```
python -m pytest tests
```

## Results

<p align="left">
<img src="./assets/objective_comparisons.png" width=75%></p>

            
