rs-bytepiece


| Field                   | Value                                    |
| ----------------------- | ---------------------------------------- |
| Name                    | rs-bytepiece                             |
| Version                 | 0.1.0                                    |
| home_page               | None                                     |
| Summary                 | bytepiece-rs Python binding              |
| upload_time             | 2023-09-20 13:26:25                      |
| maintainer              | None                                     |
| docs_url                | None                                     |
| author                  | Yam(长琴) <haoshaochun@gmail.com>        |
| requires_python         | >=3.7                                    |
| license                 | MIT                                      |
| keywords                | nlp, tokenizer, bytepiece, deep learning |
| requirements            | No requirements were recorded.           |
| Travis-CI               | No Travis.                               |
| coveralls test coverage | No coveralls.                            |
# rs-bytepiece

## Install

```bash
pip install rs_bytepiece
```

## Usage

```python
from rs_bytepiece import Tokenizer

tokenizer = Tokenizer()
ids = tokenizer.encode("今天天气不错")
text = tokenizer.decode(ids)
```
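
A quick way to sanity-check a tokenizer is to confirm that `decode(encode(text))` returns the original text. The sketch below is not part of `rs_bytepiece`: `roundtrip_ok` and `Utf8ByteTokenizer` are hypothetical names introduced here, and the stand-in tokenizer exists only so the snippet runs without the package installed. It also assumes that `decode` might return either `str` or UTF-8 `bytes`, since the README does not say which.

```python
def roundtrip_ok(tokenizer, text):
    """Return True if decode(encode(text)) reproduces text.

    `tokenizer` is any object with the encode/decode pair shown above.
    """
    ids = tokenizer.encode(text)
    decoded = tokenizer.decode(ids)
    # Assumption: decode may hand back raw UTF-8 bytes rather than str.
    if isinstance(decoded, (bytes, bytearray)):
        decoded = bytes(decoded).decode("utf-8")
    return decoded == text


class Utf8ByteTokenizer:
    """Hypothetical stand-in: one id per UTF-8 byte. Not rs_bytepiece."""

    def encode(self, text):
        return list(text.encode("utf-8"))

    def decode(self, ids):
        return bytes(ids)


print(roundtrip_ok(Utf8ByteTokenizer(), "今天天气不错"))
```

With the package installed, the same helper should accept `Tokenizer()` from `rs_bytepiece` in place of the stand-in.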

## Performance

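This binding is faster than the original implementation: in the table below, the `aho_rs` column is roughly 2x faster than `aho_cy`, 5x faster than `aho_py`, and 60x faster than `jieba`. The test corpus is《鲁迅全集》(The Complete Works of Lu Xun), which has 625,890 characters. All times are in milliseconds.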

| length | jieba    | aho_py  | aho_cy | aho_rs |
| ------ | -------- | ------- | ------ | ------ |
| 100    | 17062.12 | 1404.37 | 564.31 | 299.09 |
| 1000   | 17104.38 | 1424.6  | 573.32 | 281.84 |
| 10000  | 17432.58 | 1429.0  | 574.93 | 293.16 |
| 100000 | 17228.17 | 1401.01 | 574.5  | 280.81 |
| 625890 | 17305.95 | 1419.79 | 567.78 | 282.35 |
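
The README does not describe how these numbers were measured, so the following is only a sketch of one way to produce comparable timings. It assumes that the `length` column means the corpus was fed to the tokenizer in pieces of that many characters; `split_chunks` and `time_ms` are hypothetical helper names, not part of any library.

```python
import time


def split_chunks(text, length):
    """Cut text into consecutive pieces of at most `length` characters."""
    return [text[i:i + length] for i in range(0, len(text), length)]


def time_ms(fn, chunks, repeat=3):
    """Best-of-`repeat` wall-clock time, in milliseconds, to run fn on every chunk."""
    best = float("inf")
    for _ in range(repeat):
        start = time.perf_counter()
        for chunk in chunks:
            fn(chunk)
        elapsed_ms = (time.perf_counter() - start) * 1000.0
        best = min(best, elapsed_ms)
    return best


# str.upper stands in for a tokenizer's encode method so the sketch runs as-is.
sample = "今天天气不错" * 1000
for length in (100, 1000, len(sample)):
    print(length, round(time_ms(str.upper, split_chunks(sample, length)), 3))
```

To time the binding itself, pass `Tokenizer().encode` as `fn` and load the real corpus into `sample`. Taking the best of several runs reduces noise from other processes.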



            

Raw data

{
    "_id": null,
    "home_page": null,
    "name": "rs-bytepiece",
    "maintainer": null,
    "docs_url": null,
    "requires_python": ">=3.7",
    "maintainer_email": null,
    "keywords": "NLP,tokenizer,bytepiece,Deep Learning",
    "author": "Yam(\u957f\u7434) <haoshaochun@gmail.com>",
    "author_email": "Yam(\u957f\u7434) <haoshaochun@gmail.com>",
    "download_url": "https://files.pythonhosted.org/packages/b3/8f/0c45bbe2b117502ed15e3b006fb5115da493fcb9e5b0e66a204f5b6b00fa/rs_bytepiece-0.1.0.tar.gz",
    "platform": null,
    "description": "# rs-bytepiece\n\n## Install\n\n```bash\npip install rs_bytepiece\n```\n\n## Usage\n\n```python\nfrom rs_bytepiece import Tokenizer\n\ntokenizer = Tokenizer()\nids = tokenizer.encode(\"\u4eca\u5929\u5929\u6c14\u4e0d\u9519\")\ntext = tokenizer.decode(ids)\n```\n\n## Performance\n\nThe performance is a bit faster than the original implementation. I've tested the\u300a\u9c81\u8fc5\u5168\u96c6\u300bwhich has 625890 chars. The time unit is millisecond.\n\n| length | jieba    | aho_py  | aho_cy | aho_rs |\n| ------ | -------- | ------- | ------ | ------ |\n| 100    | 17062.12 | 1404.37 | 564.31 | 299.09 |\n| 1000   | 17104.38 | 1424.6  | 573.32 | 281.84 |\n| 10000  | 17432.58 | 1429.0  | 574.93 | 293.16 |\n| 100000 | 17228.17 | 1401.01 | 574.5  | 280.81 |\n| 625890 | 17305.95 | 1419.79 | 567.78 | 282.35 |\n\n\n",
    "bugtrack_url": null,
    "license": "MIT",
    "summary": "bytepiece-rs Python binding",
    "version": "0.1.0",
    "project_urls": {
        "documentation": "https://github.com/hscspring/bytepiece-rs",
        "homepage": "https://github.com/hscspring/bytepiece-rs",
        "repository": "https://github.com/hscspring/bytepiece-rs"
    },
    "split_keywords": [
        "nlp",
        "tokenizer",
        "bytepiece",
        "deep learning"
    ],
    "urls": [
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "9ba9989fefc126fc658ab52925ca89ba7d88a2528bd61f6cfd68e1ff10d24f56",
                "md5": "144167070d471e4bad2ab650eb3d89dc",
                "sha256": "65f88bb0878bae7c5add49dc5077116428e46edc89dbd25b0ce49e098df4981f"
            },
            "downloads": -1,
            "filename": "rs_bytepiece-0.1.0-cp37-abi3-macosx_10_7_x86_64.whl",
            "has_sig": false,
            "md5_digest": "144167070d471e4bad2ab650eb3d89dc",
            "packagetype": "bdist_wheel",
            "python_version": "cp37",
            "requires_python": ">=3.7",
            "size": 2242127,
            "upload_time": "2023-09-20T13:26:11",
            "upload_time_iso_8601": "2023-09-20T13:26:11.389927Z",
            "url": "https://files.pythonhosted.org/packages/9b/a9/989fefc126fc658ab52925ca89ba7d88a2528bd61f6cfd68e1ff10d24f56/rs_bytepiece-0.1.0-cp37-abi3-macosx_10_7_x86_64.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "91d9cb576d4bbf36b9df2d2fa74cce06b94fa65815138ad7e5fc4da7fb491ac7",
                "md5": "4a9de2f2bbe80e54fd7bc3f92058ebee",
                "sha256": "404a7aa84ff603b9d4554d30ce8a892249be09b0e0ef1a10ff593367d89ac6a6"
            },
            "downloads": -1,
            "filename": "rs_bytepiece-0.1.0-cp37-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl",
            "has_sig": false,
            "md5_digest": "4a9de2f2bbe80e54fd7bc3f92058ebee",
            "packagetype": "bdist_wheel",
            "python_version": "cp37",
            "requires_python": ">=3.7",
            "size": 3391255,
            "upload_time": "2023-09-20T13:26:16",
            "upload_time_iso_8601": "2023-09-20T13:26:16.690397Z",
            "url": "https://files.pythonhosted.org/packages/91/d9/cb576d4bbf36b9df2d2fa74cce06b94fa65815138ad7e5fc4da7fb491ac7/rs_bytepiece-0.1.0-cp37-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "892f50b11b57eea11225e4f19bd493f7a454eed42aa91ed810957a345c3130b9",
                "md5": "50f9cfd9ff1f78e9ce16e3374ec7fbdc",
                "sha256": "020a47804007a430627016eeb025fe7a6fad18af5704b63fa40408b1ea706538"
            },
            "downloads": -1,
            "filename": "rs_bytepiece-0.1.0-cp37-abi3-win_amd64.whl",
            "has_sig": false,
            "md5_digest": "50f9cfd9ff1f78e9ce16e3374ec7fbdc",
            "packagetype": "bdist_wheel",
            "python_version": "cp37",
            "requires_python": ">=3.7",
            "size": 3757801,
            "upload_time": "2023-09-20T13:26:22",
            "upload_time_iso_8601": "2023-09-20T13:26:22.317966Z",
            "url": "https://files.pythonhosted.org/packages/89/2f/50b11b57eea11225e4f19bd493f7a454eed42aa91ed810957a345c3130b9/rs_bytepiece-0.1.0-cp37-abi3-win_amd64.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "b38f0c45bbe2b117502ed15e3b006fb5115da493fcb9e5b0e66a204f5b6b00fa",
                "md5": "128737282102f92368900d8e1d5d4213",
                "sha256": "93e434129cd5bf93bdc56771a5bbdca6e775b780e39a0e992bd59d7b378a9083"
            },
            "downloads": -1,
            "filename": "rs_bytepiece-0.1.0.tar.gz",
            "has_sig": false,
            "md5_digest": "128737282102f92368900d8e1d5d4213",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.7",
            "size": 14609,
            "upload_time": "2023-09-20T13:26:25",
            "upload_time_iso_8601": "2023-09-20T13:26:25.406890Z",
            "url": "https://files.pythonhosted.org/packages/b3/8f/0c45bbe2b117502ed15e3b006fb5115da493fcb9e5b0e66a204f5b6b00fa/rs_bytepiece-0.1.0.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2023-09-20 13:26:25",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "hscspring",
    "github_project": "bytepiece-rs",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": false,
    "lcname": "rs-bytepiece"
}
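
Each entry under `urls` carries a `sha256` digest, which can be used to check a downloaded file. The sketch below relies only on the standard library; `sha256_of_file` and `matches_digest` are hypothetical helper names, and the file path in the usage note is an assumption about where the download was saved.

```python
import hashlib


def sha256_of_file(path, block_size=1 << 20):
    """Hash a file in blocks so large wheels need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for block in iter(lambda: fh.read(block_size), b""):
            digest.update(block)
    return digest.hexdigest()


def matches_digest(path, expected_hex):
    """Compare a file's SHA-256 with the hex digest from the metadata."""
    return sha256_of_file(path) == expected_hex.strip().lower()
```

For example, `matches_digest("rs_bytepiece-0.1.0.tar.gz", "93e434129cd5bf93bdc56771a5bbdca6e775b780e39a0e992bd59d7b378a9083")` should return `True` for an intact copy of the sdist listed above.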
        