# chat-tokenizer
## Usage
```python
from chat_tokenizer import ChatTokenizer
from transformers import AutoTokenizer

tokenizer = ChatTokenizer(AutoTokenizer.from_pretrained("qwen/Qwen2.5-0.5B-Instruct"))

# Audio segment lengths per sample; a nested list means the sample
# contains several segments, each with its own length.
audio_lens = [[4, 2], 3]
# Text labels aligned one-to-one with the audio segments above.
labels = [["今天天气不错", "哈哈"], "你好啊"]

input_ids, input_lens, label_ids, label_lens = tokenizer.batch_tokenize(audio_lens, labels)
input_ids = tokenizer.fill_labels(label_ids, input_ids)
```
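Token id sequences produced per sample generally differ in length, so they typically need padding to a common length before being stacked into a batch tensor. A minimal pure-Python sketch under that assumption (the helper `pad_ids` and the pad id of `0` are illustrative, not part of the chat-tokenizer API):

```python
def pad_ids(batch, pad_id=0):
    """Right-pad each list of token ids to the longest length in the batch."""
    max_len = max(len(ids) for ids in batch)
    return [ids + [pad_id] * (max_len - len(ids)) for ids in batch]

batch = [[101, 7, 42], [101, 9]]
print(pad_ids(batch))  # → [[101, 7, 42], [101, 9, 0]]
```

In practice you would use the pad token id from your tokenizer rather than a hard-coded `0`.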
## Raw data

```json
{
  "_id": null,
  "home_page": "https://github.com/pengzhendong/chat-tokenizer",
  "name": "chat-tokenizer",
  "maintainer": null,
  "docs_url": null,
  "requires_python": null,
  "maintainer_email": null,
  "keywords": null,
  "author": "Zhendong Peng",
  "author_email": "pzd17@tsinghua.org.cn",
  "download_url": null,
  "platform": null,
  "bugtrack_url": null,
  "license": null,
  "summary": "Chat Tokenizer",
  "version": "0.0.6",
  "project_urls": {
    "Homepage": "https://github.com/pengzhendong/chat-tokenizer"
  },
  "split_keywords": [],
  "urls": [
    {
      "comment_text": null,
      "digests": {
        "blake2b_256": "3dc83dd7381bab754a2a88d9eae3ccd9b0511eb1e1544b3635f942aec2431715",
        "md5": "134dc2f48a795171099b906824f64044",
        "sha256": "775e507e4c87c87feab6d6b801a4fb531f041521081e3de323253da437f36370"
      },
      "downloads": -1,
      "filename": "chat_tokenizer-0.0.6-py3-none-any.whl",
      "has_sig": false,
      "md5_digest": "134dc2f48a795171099b906824f64044",
      "packagetype": "bdist_wheel",
      "python_version": "py3",
      "requires_python": null,
      "size": 8175,
      "upload_time": "2025-03-03T06:36:52",
      "upload_time_iso_8601": "2025-03-03T06:36:52.160141Z",
      "url": "https://files.pythonhosted.org/packages/3d/c8/3dd7381bab754a2a88d9eae3ccd9b0511eb1e1544b3635f942aec2431715/chat_tokenizer-0.0.6-py3-none-any.whl",
      "yanked": false,
      "yanked_reason": null
    }
  ],
  "upload_time": "2025-03-03 06:36:52",
  "github": true,
  "gitlab": false,
  "bitbucket": false,
  "codeberg": false,
  "github_user": "pengzhendong",
  "github_project": "chat-tokenizer",
  "travis_ci": false,
  "coveralls": false,
  "github_actions": true,
  "requirements": [
    {
      "name": "torch",
      "specs": []
    }
  ],
  "lcname": "chat-tokenizer"
}
```