dtokenizer


| Field | Value |
| --- | --- |
| Name | dtokenizer |
| Version | 0.0.6 |
| Home page | https://github.com/voidful/dtokenizer |
| Summary | None |
| Upload time | 2024-08-13 10:26:00 |
| Maintainer | None |
| Docs URL | None |
| Author | Voidful |
| Requires Python | None |
| License | Apache |
| Keywords | tokenizer |
| Requirements | No requirements were recorded. |

# dtokenizer
Discretize everything into tokens.

## Introduction
`dtokenizer` is a Python library that discretizes audio files into tokens. It supports models such as Hubert and Encodec.
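
To make "discretize into tokens" concrete, the toy sketch below maps each feature vector to the index of its nearest centroid, which is the general idea behind cluster-based unit tokenizers. It is not `dtokenizer`'s implementation: the `quantize` function, the two-dimensional features, and the three centroids are all made up for illustration.

```python
import math

def quantize(frames, centroids):
    """Map each frame (a feature vector) to the index of its nearest centroid."""
    tokens = []
    for frame in frames:
        distances = [math.dist(frame, c) for c in centroids]
        tokens.append(distances.index(min(distances)))
    return tokens

# Hypothetical 2-D features and a 3-entry codebook (real models use
# far more dimensions and many more centroids).
centroids = [(0.0, 0.0), (1.0, 1.0), (-1.0, 1.0)]
frames = [(0.1, -0.1), (0.9, 1.2), (-0.8, 0.7), (0.2, 0.1)]

tokens = quantize(frames, centroids)
print(tokens)  # [0, 1, 2, 0]
```

Each integer in `tokens` stands in for one short slice of audio, so a clip becomes a sequence of small integers.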

## Installation
To use `dtokenizer`, first ensure you have Python and pip installed. Then, from the root of a clone of the repository, install the required dependencies by running:
```bash
pip install -r requirements.txt
```

## Usage

### Hubert Tokenizer
The Hubert tokenizer can be used to tokenize audio files into discrete tokens and then decode them back. Here's how you can use it:

```python
from dtokenizer.audio.model.hubert_model import HubertTokenizer
import soundfile as sf

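# Load the tokenizer by model name
ht = HubertTokenizer('hubert_layer6_code100')
# Encode the audio file; `code` holds the discrete tokens
code, decodec_stuff = ht.encode_file('./sample2_22k.wav')
# Reconstruct a waveform from the tokens
wav_values = ht.decode(code)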

# Write the decoded audio to a file
sf.write('output.wav', wav_values, 16000)
```
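
For a rough sense of how many tokens to expect, the sketch below computes the number of frames that a HuBERT Base convolutional feature extractor emits for a given number of 16 kHz samples. It assumes the standard HuBERT Base kernel and stride configuration and that `hubert_layer6_code100` yields one token per frame. `dtokenizer` may pad, resample, or post-process differently, so treat the result as an estimate. The function name `hubert_frame_count` is hypothetical.

```python
# Kernel sizes and strides of the HuBERT Base convolutional feature
# extractor (assumed here; not read from dtokenizer itself).
CONV_KERNELS = (10, 3, 3, 3, 3, 2, 2)
CONV_STRIDES = (5, 2, 2, 2, 2, 2, 2)

def hubert_frame_count(num_samples):
    """Estimate how many frames an unpadded waveform produces."""
    length = num_samples
    for kernel, stride in zip(CONV_KERNELS, CONV_STRIDES):
        if length < kernel:
            return 0  # too short to pass through this layer
        length = (length - kernel) // stride + 1
    return length

one_second = hubert_frame_count(16000)
print(one_second)  # 49, i.e. roughly one frame every 20 ms
```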

### Encodec Tokenizer
Similarly, the Encodec tokenizer allows for efficient audio file tokenization. Here's an example of its usage:

```python
import torch
from dtokenizer.audio.model.encodec_model import EncodecTokenizer
import torchaudio

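# Load the tokenizer by model name
et = EncodecTokenizer('encodec_24k_6bps')
# Encode the audio file; the second return value is what `decode` expects
code, stuff_for_decode = et.encode_file('./sample2_22k.wav')
# Reconstruct a waveform from the second return value, not from `code`
wav_values = et.decode(stuff_for_decode)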

# Save the decoded audio to a file
torchaudio.save('output.wav', torch.from_numpy(wav_values), 22050)
```
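
The model name `encodec_24k_6bps` suggests the 24 kHz EnCodec model at a 6 kbps target bandwidth (an assumption based on the name alone). Under that assumption, the arithmetic below shows how the bandwidth translates into codebooks and tokens per second, using the published 24 kHz EnCodec figures of 75 frames per second and 1024 entries per codebook. The helper names are hypothetical and are not part of `dtokenizer`.

```python
import math

FRAME_RATE_HZ = 75      # 24,000 samples/s divided by a hop of 320 samples
CODEBOOK_SIZE = 1024    # entries per codebook, i.e. 10 bits per token

def codebooks_for_bandwidth(kbps):
    """Number of residual codebooks that fit in a target bandwidth."""
    bits_per_token = int(math.log2(CODEBOOK_SIZE))
    bits_per_second_per_codebook = FRAME_RATE_HZ * bits_per_token  # 750
    return int(kbps * 1000 // bits_per_second_per_codebook)

def tokens_per_second(kbps):
    """Total tokens emitted per second across all codebooks."""
    return FRAME_RATE_HZ * codebooks_for_bandwidth(kbps)

print(codebooks_for_bandwidth(6))  # 8
print(tokens_per_second(6))        # 600
```

Under these assumptions, a 10-second clip would yield about 6,000 tokens in total, spread over 8 parallel codebook streams of 750 tokens each.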

## Contributing
We welcome contributions to the `dtokenizer` project. Please feel free to submit issues or pull requests.

## License
This project is released under the MIT License. See the LICENSE file for more details.

            

Raw data

{
    "_id": null,
    "home_page": "https://github.com/voidful/dtokenizer",
    "name": "dtokenizer",
    "maintainer": null,
    "docs_url": null,
    "requires_python": null,
    "maintainer_email": null,
    "keywords": "tokenizer",
    "author": "Voidful",
    "author_email": "voidful.stack@gmail.com",
    "download_url": "https://files.pythonhosted.org/packages/4a/c1/b0f69b87d91ba84a46bfe7b925fbc51326590493f938bbc48c090ca2be03/dtokenizer-0.0.6.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "Apache",
    "summary": null,
    "version": "0.0.6",
    "project_urls": {
        "Homepage": "https://github.com/voidful/dtokenizer"
    },
    "split_keywords": [
        "tokenizer"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "3aca7ba55ac54043c8a475c2dcc32b778a8de32953eeccb4d6aa005aa916d1e7",
                "md5": "5c5805bef6b63013496a0208b7f7c210",
                "sha256": "679388a733fe94f3137bce77102bd2d6787f10d48457c6f6c24ec3f3139acf05"
            },
            "downloads": -1,
            "filename": "dtokenizer-0.0.6-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "5c5805bef6b63013496a0208b7f7c210",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": null,
            "size": 20335,
            "upload_time": "2024-08-13T10:25:59",
            "upload_time_iso_8601": "2024-08-13T10:25:59.381758Z",
            "url": "https://files.pythonhosted.org/packages/3a/ca/7ba55ac54043c8a475c2dcc32b778a8de32953eeccb4d6aa005aa916d1e7/dtokenizer-0.0.6-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "4ac1b0f69b87d91ba84a46bfe7b925fbc51326590493f938bbc48c090ca2be03",
                "md5": "51cbf0916a031ad1927d19a5515030cb",
                "sha256": "d800c475ecaf2ec6f1d1fc0060d2aadd4f218556213a8dfea4c1869253eb473c"
            },
            "downloads": -1,
            "filename": "dtokenizer-0.0.6.tar.gz",
            "has_sig": false,
            "md5_digest": "51cbf0916a031ad1927d19a5515030cb",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": null,
            "size": 15112,
            "upload_time": "2024-08-13T10:26:00",
            "upload_time_iso_8601": "2024-08-13T10:26:00.975118Z",
            "url": "https://files.pythonhosted.org/packages/4a/c1/b0f69b87d91ba84a46bfe7b925fbc51326590493f938bbc48c090ca2be03/dtokenizer-0.0.6.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2024-08-13 10:26:00",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "voidful",
    "github_project": "dtokenizer",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": false,
    "requirements": [],
    "lcname": "dtokenizer"
}
        