Tokenize2


Name: Tokenize2
Version: 2.0.3
Home page: https://github.com/TnsaAi/Tokenize2
Summary: A byte-level BPE tokenizer for efficient text processing
Upload time: 2024-10-16 06:03:23
Maintainer: None
Docs URL: None
Author: TNSA AI
Requires Python: >=3.6
License: MIT
Requirements: No requirements were recorded.
# Tokenize2

Tokenize2 is an improved byte-level BPE tokenizer, inspired by the tokenizers used in models like GPT-3, that efficiently splits text into subword units. It supports special tokens and operates on raw bytes, so tokenization remains robust even for non-ASCII characters.

## Features

- Byte-level tokenization for handling a wide range of characters
- Special tokens (like `<PAD>`, `<UNK>`) for flexible token management
- Supports efficient BPE merges for subword tokenization (see the sketch after this list)
- Suitable for natural language processing and text generation tasks
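
To make the byte-level idea concrete, here is a minimal, self-contained sketch of a single BPE merge step over UTF-8 bytes. It is deliberately independent of Tokenize2's actual API (none of the function names below come from the package) and only illustrates the mechanism the feature list refers to:

```python
from collections import Counter

def most_frequent_pair(ids):
    """Return the most common adjacent token pair, or None if the
    sequence is too short to contain a pair."""
    pairs = Counter(zip(ids, ids[1:]))
    return pairs.most_common(1)[0][0] if pairs else None

def merge_pair(ids, pair, new_id):
    """Rewrite the sequence, replacing every occurrence of `pair`
    with the single new token `new_id`."""
    out, i = [], 0
    while i < len(ids):
        if i + 1 < len(ids) and (ids[i], ids[i + 1]) == pair:
            out.append(new_id)
            i += 2
        else:
            out.append(ids[i])
            i += 1
    return out

# Byte-level: any Unicode string is first turned into UTF-8 bytes
# (values 0-255), so "é" needs no special casing -- it is simply the
# two bytes 0xC3 0xA9.
text = "héllo héllo"
ids = list(text.encode("utf-8"))

# One training-style step: fuse the most frequent adjacent pair into a
# new token id above the byte range. Special tokens such as <PAD> or
# <UNK> are likewise just reserved ids outside the byte range.
pair = most_frequent_pair(ids)
ids = merge_pair(ids, pair, 256)
print(pair, ids)
```

A real tokenizer repeats this merge step many times over a training corpus and stores the resulting merge table, which is then replayed in order at encoding time.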

## Installation

You can install Tokenize2 via pip:
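
```bash
pip install Tokenize2
```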



Raw data

{
    "_id": null,
    "home_page": "https://github.com/TnsaAi/Tokenize2",
    "name": "Tokenize2",
    "maintainer": null,
    "docs_url": null,
    "requires_python": ">=3.6",
    "maintainer_email": null,
    "keywords": null,
    "author": "TNSA AI",
    "author_email": "thishyakethabimalla@gmail.com",
    "download_url": "https://files.pythonhosted.org/packages/23/1e/19f970ac89f7b6dcbafa5aed933331fd2a57cb989c9ecc7f756f15d3df40/tokenize2-2.0.3.tar.gz",
    "platform": null,
    "description": "# Tokenize2\r\n\r\nTokenize2 is an improved byte-level BPE tokenizer, inspired by models like GPT-3, designed for efficient tokenization of text into subword units. It supports special tokens and byte-level text handling for robust tokenization, including for non-ASCII characters.\r\n\r\n## Features\r\n\r\n- Byte-level tokenization for handling a wide range of characters\r\n- Special tokens (like `<PAD>`, `<UNK>`) for flexible token management\r\n- Supports efficient BPE merges for subword tokenization\r\n- Suitable for natural language processing and text generation tasks\r\n\r\n## Installation\r\n\r\nYou can install Tokenize2 via pip:\r\n\r\n",
    "bugtrack_url": null,
    "license": "MIT",
    "summary": "A byte-level BPE tokenizer for efficient text processing",
    "version": "2.0.3",
    "project_urls": {
        "Homepage": "https://github.com/TnsaAi/Tokenize2"
    },
    "split_keywords": [],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "b4a0dba0a280c36708e5c1870dc1d8cb6b8eccb31287a9fadbcbc4ccc73a1276",
                "md5": "ca993e1526c2092b85dadef7a43745e3",
                "sha256": "6f58195d600e503d295bdbf650ce2b58cee4de0e7bd3ebc773ffa0350e492ecc"
            },
            "downloads": -1,
            "filename": "Tokenize2-2.0.3-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "ca993e1526c2092b85dadef7a43745e3",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.6",
            "size": 4181,
            "upload_time": "2024-10-16T06:03:22",
            "upload_time_iso_8601": "2024-10-16T06:03:22.253518Z",
            "url": "https://files.pythonhosted.org/packages/b4/a0/dba0a280c36708e5c1870dc1d8cb6b8eccb31287a9fadbcbc4ccc73a1276/Tokenize2-2.0.3-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "231e19f970ac89f7b6dcbafa5aed933331fd2a57cb989c9ecc7f756f15d3df40",
                "md5": "5ae284fa012d2385861c04a229e1eee5",
                "sha256": "61c5730bc4d897cb55dae1c9a00b05ffc66d7c794623f75f33d6e69f7c7a6d86"
            },
            "downloads": -1,
            "filename": "tokenize2-2.0.3.tar.gz",
            "has_sig": false,
            "md5_digest": "5ae284fa012d2385861c04a229e1eee5",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.6",
            "size": 3850,
            "upload_time": "2024-10-16T06:03:23",
            "upload_time_iso_8601": "2024-10-16T06:03:23.533248Z",
            "url": "https://files.pythonhosted.org/packages/23/1e/19f970ac89f7b6dcbafa5aed933331fd2a57cb989c9ecc7f756f15d3df40/tokenize2-2.0.3.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2024-10-16 06:03:23",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "TnsaAi",
    "github_project": "Tokenize2",
    "github_not_found": true,
    "lcname": "tokenize2"
}
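
Most of the fields above mirror PyPI's public JSON API; the `github_*`, `lcname`, and elapsed-time entries are added by the indexing site. For reference, a minimal standard-library sketch of fetching the canonical record from pypi.org:

```python
import json
from urllib.request import urlopen

# Fetch Tokenize2's metadata from PyPI's public JSON endpoint.
# The response carries an "info" dict plus per-file "urls" entries like
# the ones shown above; scraper-added fields (e.g. "github_user") will
# not be present.
with urlopen("https://pypi.org/pypi/Tokenize2/json") as resp:
    meta = json.load(resp)

print(meta["info"]["version"])          # "2.0.3" at the time of this page
print(meta["info"]["requires_python"])  # ">=3.6"
```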