# rs-bytepiece
## Install
```bash
pip install rs_bytepiece
```
## Usage
```python
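from rs_bytepiece import Tokenizer

# Create a tokenizer with the default settings.
tokenizer = Tokenizer()
# Encode a string into token ids.
ids = tokenizer.encode("今天天气不错")
# Decode the token ids back into a string.
text = tokenizer.decode(ids)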
```
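
A quick sanity check after installing is an encode/decode round trip. The sketch below is only an illustration: `ByteTokenizer` and `round_trips` are hypothetical names, and the stand-in class is a plain UTF-8 byte tokenizer so that the snippet runs without the package. It assumes that an `rs_bytepiece.Tokenizer()` instance could be passed in instead, since the helper only relies on the `encode` and `decode` calls shown above. Whether `decode(encode(text))` always returns the input exactly is not stated here, so treat the result as something to check rather than a guarantee.

```python
from typing import List


class ByteTokenizer:
    """Hypothetical stand-in with the same encode/decode shape as the usage above."""

    def encode(self, text: str) -> List[int]:
        # One id per UTF-8 byte; this is NOT how bytepiece segments text.
        return list(text.encode("utf-8"))

    def decode(self, ids: List[int]) -> str:
        return bytes(ids).decode("utf-8")


def round_trips(tokenizer, text: str) -> bool:
    """Return True if decoding the encoded text gives back the original string."""
    return tokenizer.decode(tokenizer.encode(text)) == text


tokenizer = ByteTokenizer()  # assumption: swap in rs_bytepiece.Tokenizer() here
sample = "今天天气不错"
ok = round_trips(tokenizer, sample)
print(ok)
```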
## Performance
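The Rust implementation (`aho_rs`) is faster than the original implementation: in the table below it takes roughly half the time of `aho_cy` and about a fifth of the time of `aho_py`. The benchmark text is《鲁迅全集》(The Complete Works of Lu Xun), which has 625,890 characters. All times are in milliseconds.
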
| length | jieba    | aho_py  | aho_cy | aho_rs |
| -----: | -------: | ------: | -----: | -----: |
|    100 | 17062.12 | 1404.37 | 564.31 | 299.09 |
|   1000 | 17104.38 | 1424.60 | 573.32 | 281.84 |
|  10000 | 17432.58 | 1429.00 | 574.93 | 293.16 |
| 100000 | 17228.17 | 1401.01 | 574.50 | 280.81 |
| 625890 | 17305.95 | 1419.79 | 567.78 | 282.35 |
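
To make the comparison easier to read, the timings can be turned into speedup ratios relative to `aho_rs`. The snippet below is a small sketch that only does arithmetic on the numbers from the table; the names `TIMINGS_MS`, `COLUMNS`, and `speedups` are made up for this illustration and are not part of the package.

```python
# Timings in milliseconds, copied from the table above.
COLUMNS = ("jieba", "aho_py", "aho_cy", "aho_rs")
TIMINGS_MS = {
    100: (17062.12, 1404.37, 564.31, 299.09),
    1000: (17104.38, 1424.60, 573.32, 281.84),
    10000: (17432.58, 1429.00, 574.93, 293.16),
    100000: (17228.17, 1401.01, 574.50, 280.81),
    625890: (17305.95, 1419.79, 567.78, 282.35),
}


def speedups(row):
    """Return how many times slower each column is than aho_rs for one row."""
    baseline = row[COLUMNS.index("aho_rs")]
    return {name: elapsed / baseline for name, elapsed in zip(COLUMNS, row)}


for length, row in TIMINGS_MS.items():
    ratios = speedups(row)
    summary = ", ".join(
        f"{name} {ratios[name]:.1f}x" for name in COLUMNS if name != "aho_rs"
    )
    print(f"length {length}: {summary}")
```

Across all rows the `aho_cy` ratio stays between about 1.9 and 2.0, and the `aho_py` ratio stays close to 5.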

## Raw data
{
"_id": null,
"home_page": null,
"name": "rs-bytepiece",
"maintainer": null,
"docs_url": null,
"requires_python": ">=3.7",
"maintainer_email": null,
"keywords": "NLP,tokenizer,bytepiece,Deep Learning",
"author": "Yam(\u957f\u7434) <haoshaochun@gmail.com>",
"author_email": "Yam(\u957f\u7434) <haoshaochun@gmail.com>",
"download_url": "https://files.pythonhosted.org/packages/b3/8f/0c45bbe2b117502ed15e3b006fb5115da493fcb9e5b0e66a204f5b6b00fa/rs_bytepiece-0.1.0.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": "MIT",
"summary": "bytepiece-rs Python binding",
"version": "0.1.0",
"project_urls": {
"documentation": "https://github.com/hscspring/bytepiece-rs",
"homepage": "https://github.com/hscspring/bytepiece-rs",
"repository": "https://github.com/hscspring/bytepiece-rs"
},
"split_keywords": [
"nlp",
"tokenizer",
"bytepiece",
"deep learning"
],
"urls": [
{
"comment_text": null,
"digests": {
"blake2b_256": "9ba9989fefc126fc658ab52925ca89ba7d88a2528bd61f6cfd68e1ff10d24f56",
"md5": "144167070d471e4bad2ab650eb3d89dc",
"sha256": "65f88bb0878bae7c5add49dc5077116428e46edc89dbd25b0ce49e098df4981f"
},
"downloads": -1,
"filename": "rs_bytepiece-0.1.0-cp37-abi3-macosx_10_7_x86_64.whl",
"has_sig": false,
"md5_digest": "144167070d471e4bad2ab650eb3d89dc",
"packagetype": "bdist_wheel",
"python_version": "cp37",
"requires_python": ">=3.7",
"size": 2242127,
"upload_time": "2023-09-20T13:26:11",
"upload_time_iso_8601": "2023-09-20T13:26:11.389927Z",
"url": "https://files.pythonhosted.org/packages/9b/a9/989fefc126fc658ab52925ca89ba7d88a2528bd61f6cfd68e1ff10d24f56/rs_bytepiece-0.1.0-cp37-abi3-macosx_10_7_x86_64.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": null,
"digests": {
"blake2b_256": "91d9cb576d4bbf36b9df2d2fa74cce06b94fa65815138ad7e5fc4da7fb491ac7",
"md5": "4a9de2f2bbe80e54fd7bc3f92058ebee",
"sha256": "404a7aa84ff603b9d4554d30ce8a892249be09b0e0ef1a10ff593367d89ac6a6"
},
"downloads": -1,
"filename": "rs_bytepiece-0.1.0-cp37-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl",
"has_sig": false,
"md5_digest": "4a9de2f2bbe80e54fd7bc3f92058ebee",
"packagetype": "bdist_wheel",
"python_version": "cp37",
"requires_python": ">=3.7",
"size": 3391255,
"upload_time": "2023-09-20T13:26:16",
"upload_time_iso_8601": "2023-09-20T13:26:16.690397Z",
"url": "https://files.pythonhosted.org/packages/91/d9/cb576d4bbf36b9df2d2fa74cce06b94fa65815138ad7e5fc4da7fb491ac7/rs_bytepiece-0.1.0-cp37-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": null,
"digests": {
"blake2b_256": "892f50b11b57eea11225e4f19bd493f7a454eed42aa91ed810957a345c3130b9",
"md5": "50f9cfd9ff1f78e9ce16e3374ec7fbdc",
"sha256": "020a47804007a430627016eeb025fe7a6fad18af5704b63fa40408b1ea706538"
},
"downloads": -1,
"filename": "rs_bytepiece-0.1.0-cp37-abi3-win_amd64.whl",
"has_sig": false,
"md5_digest": "50f9cfd9ff1f78e9ce16e3374ec7fbdc",
"packagetype": "bdist_wheel",
"python_version": "cp37",
"requires_python": ">=3.7",
"size": 3757801,
"upload_time": "2023-09-20T13:26:22",
"upload_time_iso_8601": "2023-09-20T13:26:22.317966Z",
"url": "https://files.pythonhosted.org/packages/89/2f/50b11b57eea11225e4f19bd493f7a454eed42aa91ed810957a345c3130b9/rs_bytepiece-0.1.0-cp37-abi3-win_amd64.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": null,
"digests": {
"blake2b_256": "b38f0c45bbe2b117502ed15e3b006fb5115da493fcb9e5b0e66a204f5b6b00fa",
"md5": "128737282102f92368900d8e1d5d4213",
"sha256": "93e434129cd5bf93bdc56771a5bbdca6e775b780e39a0e992bd59d7b378a9083"
},
"downloads": -1,
"filename": "rs_bytepiece-0.1.0.tar.gz",
"has_sig": false,
"md5_digest": "128737282102f92368900d8e1d5d4213",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.7",
"size": 14609,
"upload_time": "2023-09-20T13:26:25",
"upload_time_iso_8601": "2023-09-20T13:26:25.406890Z",
"url": "https://files.pythonhosted.org/packages/b3/8f/0c45bbe2b117502ed15e3b006fb5115da493fcb9e5b0e66a204f5b6b00fa/rs_bytepiece-0.1.0.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2023-09-20 13:26:25",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "hscspring",
"github_project": "bytepiece-rs",
"travis_ci": false,
"coveralls": false,
"github_actions": false,
"lcname": "rs-bytepiece"
}