whitespacetokenizer

Name: whitespacetokenizer
Version: 1.0.3
Home page: https://github.com/mdocekal/whitespacetokenizer
Summary: Fast Python whitespace tokenizer written in Cython.
Upload time: 2024-11-28 13:08:56
Maintainer: None
Docs URL: None
Author: Martin Dočekal
Requires Python: >=3.10
License: The Unlicense
Keywords: tokenizer, whitespace
Requirements: no requirements were recorded
Travis-CI: none
Coveralls test coverage: none
# whitespacetokenizer
A fast Python whitespace tokenizer, written in Cython, that also gives the start and end character positions of tokens.
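
In other words, a token is a maximal run of non-whitespace characters, and the offsets are positions into the original string. A rough pure-Python sketch of this behaviour, for reference only (not the library's Cython implementation, and the exact whitespace definition may differ):

```python
import re

def whitespace_tokenize_py(text: str) -> list[tuple[str, int, int]]:
    """Reference behaviour: (token, start, end) for each run of non-whitespace."""
    return [(m.group(), m.start(), m.end()) for m in re.finditer(r"\S+", text)]
```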

## Installation

    pip install whitespacetokenizer

## Usage

```python
from whitespacetokenizer import whitespace_tokenizer

text = "Hello, world! How are you?"
tokens = whitespace_tokenizer(text)

print(tokens)
# [("Hello,", 0, 6), ("world!", 7, 13), ("How", 14, 17), ("are", 18, 21), ("you?", 22, 26)]
```
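
Since the example output shows end-exclusive offsets (`("Hello,", 0, 6)` covers characters 0 through 5), each token can be recovered by slicing the original string:

```python
from whitespacetokenizer import whitespace_tokenizer

text = "Hello, world! How are you?"

# Each tuple is (token, start, end) with an exclusive end,
# so text[start:end] reproduces the token exactly.
for token, start, end in whitespace_tokenizer(text):
    assert text[start:end] == token
```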

            

Raw data

```json
{
    "_id": null,
    "home_page": "https://github.com/mdocekal/whitespacetokenizer",
    "name": "whitespacetokenizer",
    "maintainer": null,
    "docs_url": null,
    "requires_python": ">=3.10",
    "maintainer_email": null,
    "keywords": "tokenizer, whitespace",
    "author": "Martin Do\u010dekal",
    "author_email": null,
    "download_url": "https://files.pythonhosted.org/packages/df/4a/1464b1331fbeb60a8b61c656b4865c61f6f882761798e1da8177c1fcba84/whitespacetokenizer-1.0.3.tar.gz",
    "platform": null,
    "description": "# whitespacetokenizer\nFast python whitespace tokenizer written in cython that also gives start and end character positions of tokens.\n\n## Installation\n\n    pip install whitespacetokenizer\n\n## Usage\n\n```python\nfrom whitespacetokenizer import whitespace_tokenizer\n\ntext = \"Hello, world! How are you?\"\ntokens = whitespace_tokenizer(text)\n\nprint(tokens)\n# [(\"Hello,\", 0, 6), (\"world!\", 7, 13), (\"How\", 14, 17), (\"are\", 18, 21), (\"you?\", 22, 26)]\n```\n",
    "bugtrack_url": null,
    "license": "The Unlicense",
    "summary": "Fast python whitespace tokenizer wtitten in cython.",
    "version": "1.0.3",
    "project_urls": {
        "Homepage": "https://github.com/mdocekal/whitespacetokenizer"
    },
    "split_keywords": [
        "tokenizer",
        " whitespace"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "df4a1464b1331fbeb60a8b61c656b4865c61f6f882761798e1da8177c1fcba84",
                "md5": "c7c8a11297ce11d1c6c486e3525880ae",
                "sha256": "80f81705bee8c36098d7df25dc0d24d716a9712bd37ecac4eb861af4b1d92fe7"
            },
            "downloads": -1,
            "filename": "whitespacetokenizer-1.0.3.tar.gz",
            "has_sig": false,
            "md5_digest": "c7c8a11297ce11d1c6c486e3525880ae",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.10",
            "size": 46123,
            "upload_time": "2024-11-28T13:08:56",
            "upload_time_iso_8601": "2024-11-28T13:08:56.733272Z",
            "url": "https://files.pythonhosted.org/packages/df/4a/1464b1331fbeb60a8b61c656b4865c61f6f882761798e1da8177c1fcba84/whitespacetokenizer-1.0.3.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2024-11-28 13:08:56",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "mdocekal",
    "github_project": "whitespacetokenizer",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": false,
    "requirements": [],
    "lcname": "whitespacetokenizer"
}
```
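
The digests in the raw data can be used to verify a downloaded sdist before unpacking it. A minimal sketch using only the standard library, with the URL and expected sha256 taken from the JSON above:

```python
import hashlib
import urllib.request

# Values copied from the "urls" entry in the raw data above.
SDIST_URL = (
    "https://files.pythonhosted.org/packages/df/4a/"
    "1464b1331fbeb60a8b61c656b4865c61f6f882761798e1da8177c1fcba84/"
    "whitespacetokenizer-1.0.3.tar.gz"
)
EXPECTED_SHA256 = "80f81705bee8c36098d7df25dc0d24d716a9712bd37ecac4eb861af4b1d92fe7"

# The sdist is ~46 kB, so hashing it in memory is fine.
with urllib.request.urlopen(SDIST_URL) as resp:
    digest = hashlib.sha256(resp.read()).hexdigest()

assert digest == EXPECTED_SHA256, f"digest mismatch: {digest}"
print("sha256 verified:", digest)
```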