chat-tokenizer


Name: chat-tokenizer
Version: 0.0.6
Home page: https://github.com/pengzhendong/chat-tokenizer
Summary: Chat Tokenizer
Upload time: 2025-03-03 06:36:52
Maintainer: None
Docs URL: None
Author: Zhendong Peng
Requires Python: None
License: None
Keywords: None
VCS: GitHub
Bugtrack URL: None
Requirements: torch
Travis-CI: No Travis.
Coveralls test coverage: No coveralls.
# chat-tokenizer

## Usage

```python
from chat_tokenizer import ChatTokenizer
from transformers import AutoTokenizer


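# Wrap a Hugging Face tokenizer (here Qwen2.5-0.5B-Instruct).
tokenizer = ChatTokenizer(AutoTokenizer.from_pretrained("qwen/Qwen2.5-0.5B-Instruct"))

# A batch of two samples. The first has two audio segments (lengths 4 and 2)
# and two labels ("The weather is nice today", "Haha"); the second has one
# audio segment (length 3) and one label ("Hello").
audio_lens = [[4, 2], 3]
labels = [["今天天气不错", "哈哈"], "你好啊"]
input_ids, input_lens, label_ids, label_lens = tokenizer.batch_tokenize(audio_lens, labels)
input_ids = tokenizer.fill_labels(label_ids, input_ids)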
```
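The README does not spell out the input format. Judging only from the example above, `audio_lens` and `labels` look like parallel batches in which each item is either a single turn (an `int` length with a `str` label) or a multi-turn sample (a list of lengths with an equally long list of labels). The sketch below assumes that reading; `pair_turns` is a hypothetical helper written for illustration and is not part of chat-tokenizer.

```python
from typing import List, Sequence, Tuple, Union

# Assumed item shapes: a single turn (int / str) or a multi-turn
# sample (list of ints / list of strs). Inferred from the README example.
LenItem = Union[int, Sequence[int]]
LabelItem = Union[str, Sequence[str]]


def pair_turns(
    audio_lens: Sequence[LenItem], labels: Sequence[LabelItem]
) -> List[List[Tuple[int, str]]]:
    """Pair each audio length with its label, one list of turns per sample.

    Hypothetical helper for illustration only; not part of chat-tokenizer.
    """
    if len(audio_lens) != len(labels):
        raise ValueError("audio_lens and labels must have the same batch size")
    batch = []
    for lens, texts in zip(audio_lens, labels):
        # Promote a single turn to a one-element list of turns.
        if isinstance(lens, int):
            lens = [lens]
        # Check str before treating texts as a list, since str is a Sequence.
        if isinstance(texts, str):
            texts = [texts]
        if len(lens) != len(texts):
            raise ValueError("each audio segment needs exactly one label")
        batch.append(list(zip(lens, texts)))
    return batch


turns = pair_turns([[4, 2], 3], [["今天天气不错", "哈哈"], "你好啊"])
# turns == [[(4, "今天天气不错"), (2, "哈哈")], [(3, "你好啊")]]
```

Checking shapes this way before calling `batch_tokenize` can surface a mismatched batch early; whether chat-tokenizer performs a similar check itself is not stated in the README.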

            

Raw data

{
    "_id": null,
    "home_page": "https://github.com/pengzhendong/chat-tokenizer",
    "name": "chat-tokenizer",
    "maintainer": null,
    "docs_url": null,
    "requires_python": null,
    "maintainer_email": null,
    "keywords": null,
    "author": "Zhendong Peng",
    "author_email": "pzd17@tsinghua.org.cn",
    "download_url": null,
    "platform": null,
    "description": "# chat-tokenizer\n\n## Usage\n\n```python\nfrom chat_tokenizer import ChatTokenizer\nfrom transformers import AutoTokenizer\n\n\ntokenizer = ChatTokenizer(AutoTokenizer.from_pretrained(\"qwen/Qwen2.5-0.5B-Instruct\"))\n\naudio_lens = [[4, 2], 3]\nlabels = [[\"\u4eca\u5929\u5929\u6c14\u4e0d\u9519\", \"\u54c8\u54c8\"], \"\u4f60\u597d\u554a\"]\ninput_ids, input_lens, label_ids, label_lens = tokenizer.batch_tokenize(audio_lens, labels)\ninput_ids = tokenizer.fill_labels(label_ids, input_ids)\n```\n",
    "bugtrack_url": null,
    "license": null,
    "summary": "Chat Tokenizer",
    "version": "0.0.6",
    "project_urls": {
        "Homepage": "https://github.com/pengzhendong/chat-tokenizer"
    },
    "split_keywords": [],
    "urls": [
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "3dc83dd7381bab754a2a88d9eae3ccd9b0511eb1e1544b3635f942aec2431715",
                "md5": "134dc2f48a795171099b906824f64044",
                "sha256": "775e507e4c87c87feab6d6b801a4fb531f041521081e3de323253da437f36370"
            },
            "downloads": -1,
            "filename": "chat_tokenizer-0.0.6-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "134dc2f48a795171099b906824f64044",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": null,
            "size": 8175,
            "upload_time": "2025-03-03T06:36:52",
            "upload_time_iso_8601": "2025-03-03T06:36:52.160141Z",
            "url": "https://files.pythonhosted.org/packages/3d/c8/3dd7381bab754a2a88d9eae3ccd9b0511eb1e1544b3635f942aec2431715/chat_tokenizer-0.0.6-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2025-03-03 06:36:52",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "pengzhendong",
    "github_project": "chat-tokenizer",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": true,
    "requirements": [
        {
            "name": "torch",
            "specs": []
        }
    ],
    "lcname": "chat-tokenizer"
}