# dtokenizer
discretize everything into tokens
## Introduction
`dtokenizer` is a Python library for discretizing audio files into tokens. It currently supports HuBERT- and EnCodec-based tokenization models.
## Installation
To use `dtokenizer`, first ensure you have Python and pip installed. Then install the package from PyPI:
```bash
pip install dtokenizer
```
Alternatively, for development, clone the repository and install the dependencies with `pip install -r requirements.txt`.
## Usage
### Hubert Tokenizer
The HuBERT tokenizer encodes audio files into sequences of discrete tokens and can decode those tokens back into audio:
```python
from dtokenizer.audio.model.hubert_model import HubertTokenizer
import soundfile as sf

# Load the pretrained HuBERT tokenizer (layer-6 features, 100 clusters)
ht = HubertTokenizer('hubert_layer6_code100')

# Encode an audio file into discrete tokens, plus extra info needed for decoding
code, decode_info = ht.encode_file('./sample2_22k.wav')

# Decode the tokens back into a waveform
wav_values = ht.decode(code)

# Write the decoded audio to a file (the HuBERT pipeline operates at 16 kHz)
sf.write('output.wav', wav_values, 16000)
```
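Discrete-unit pipelines built on HuBERT tokens often collapse runs of repeated consecutive tokens before feeding them to a downstream model. This step is not part of dtokenizer's API; a minimal standalone sketch of the idea:

```python
from itertools import groupby

def deduplicate(codes):
    """Collapse consecutive repeated tokens, e.g. [5, 5, 7, 7, 7, 5] -> [5, 7, 5]."""
    return [token for token, _ in groupby(codes)]

print(deduplicate([5, 5, 7, 7, 7, 5]))  # [5, 7, 5]
```

Whether to deduplicate depends on the downstream task: unit language models usually want the collapsed sequence, while resynthesis needs the full-rate one.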
### Encodec Tokenizer
Similarly, the EnCodec tokenizer encodes audio into discrete codes and reconstructs audio from them:
```python
import torch
import torchaudio
from dtokenizer.audio.model.encodec_model import EncodecTokenizer

# Load the pretrained EnCodec tokenizer (24 kHz model, 6 kbps target bitrate)
et = EncodecTokenizer('encodec_24k_6bps')

# Encode an audio file into discrete codes, plus info needed for decoding
code, decode_info = et.encode_file('./sample2_22k.wav')

# Decode back into a waveform
wav_values = et.decode(decode_info)

# Save the decoded audio; the 24 kHz EnCodec model outputs audio at 24000 Hz
wav_tensor = torch.from_numpy(wav_values)
if wav_tensor.dim() == 1:
    wav_tensor = wav_tensor.unsqueeze(0)  # torchaudio.save expects (channels, time)
torchaudio.save('output.wav', wav_tensor, 24000)
```
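The `6bps` in the model name refers to a 6 kbps bitrate. Assuming the standard EnCodec 24 kHz configuration from the original EnCodec paper (hop length 320, i.e. 75 frames per second, and residual vector quantization with 1024-entry codebooks, i.e. 10 bits per codebook; these figures are not stated by dtokenizer itself), the number of codebooks used per frame at that bitrate works out as:

```python
import math

sample_rate = 24_000                   # Hz
hop_length = 320                       # encoder samples per frame (assumed)
frame_rate = sample_rate / hop_length  # 75 frames per second
bits_per_codebook = math.log2(1024)    # 10 bits per RVQ codebook (assumed)

target_bitrate = 6_000                 # 6 kbps
codebooks = target_bitrate / (frame_rate * bits_per_codebook)
print(codebooks)  # 8.0 -> eight RVQ codebooks per frame at 6 kbps
```

So each frame of `code` carries eight stacked codebook indices under these assumptions, which is why EnCodec codes are typically a 2-D array of shape (codebooks, frames).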
## Contributing
We welcome contributions to the `dtokenizer` project. Please feel free to submit issues or pull requests.
## License
This project is released under the MIT License. See the LICENSE file for more details.