# whitespacetokenizer
Fast Python whitespace tokenizer written in Cython that also returns the start and end character offsets of each token.
## Installation

```shell
pip install whitespacetokenizer
```
## Usage
```python
from whitespacetokenizer import whitespace_tokenizer
text = "Hello, world! How are you?"
tokens = whitespace_tokenizer(text)
print(tokens)
# [("Hello,", 0, 6), ("world!", 7, 13), ("How", 14, 17), ("are", 18, 21), ("you?", 22, 26)]
```
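For reference, the behavior shown above can be approximated in pure Python with `re.finditer` (a sketch of the semantics only, not the package's Cython implementation; `whitespace_tokenize` is a hypothetical name for this illustration):

```python
import re

def whitespace_tokenize(text):
    # Split on runs of non-whitespace; each token is a
    # (substring, start, end) triple with an exclusive end,
    # mirroring the output format shown above.
    return [(m.group(), m.start(), m.end())
            for m in re.finditer(r"\S+", text)]

text = "Hello, world! How are you?"
tokens = whitespace_tokenize(text)
# [("Hello,", 0, 6), ("world!", 7, 13), ("How", 14, 17), ("are", 18, 21), ("you?", 22, 26)]
```

Because the offsets index into the original string, `text[start:end]` always recovers the token, which is useful for mapping annotations back onto the source text.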
## Raw data

```json
{
  "_id": null,
  "home_page": "https://github.com/mdocekal/whitespacetokenizer",
  "name": "whitespacetokenizer",
  "maintainer": null,
  "docs_url": null,
  "requires_python": ">=3.10",
  "maintainer_email": null,
  "keywords": "tokenizer, whitespace",
  "author": "Martin Do\u010dekal",
  "author_email": null,
  "download_url": "https://files.pythonhosted.org/packages/df/4a/1464b1331fbeb60a8b61c656b4865c61f6f882761798e1da8177c1fcba84/whitespacetokenizer-1.0.3.tar.gz",
  "platform": null,
  "description": "# whitespacetokenizer\nFast python whitespace tokenizer written in cython that also gives start and end character positions of tokens.\n\n## Installation\n\n    pip install whitespacetokenizer\n\n## Usage\n\n```python\nfrom whitespacetokenizer import whitespace_tokenizer\n\ntext = \"Hello, world! How are you?\"\ntokens = whitespace_tokenizer(text)\n\nprint(tokens)\n# [(\"Hello,\", 0, 6), (\"world!\", 7, 13), (\"How\", 14, 17), (\"are\", 18, 21), (\"you?\", 22, 26)]\n```\n",
  "bugtrack_url": null,
  "license": "The Unlicense",
  "summary": "Fast python whitespace tokenizer wtitten in cython.",
  "version": "1.0.3",
  "project_urls": {
    "Homepage": "https://github.com/mdocekal/whitespacetokenizer"
  },
  "split_keywords": [
    "tokenizer",
    " whitespace"
  ],
  "urls": [
    {
      "comment_text": "",
      "digests": {
        "blake2b_256": "df4a1464b1331fbeb60a8b61c656b4865c61f6f882761798e1da8177c1fcba84",
        "md5": "c7c8a11297ce11d1c6c486e3525880ae",
        "sha256": "80f81705bee8c36098d7df25dc0d24d716a9712bd37ecac4eb861af4b1d92fe7"
      },
      "downloads": -1,
      "filename": "whitespacetokenizer-1.0.3.tar.gz",
      "has_sig": false,
      "md5_digest": "c7c8a11297ce11d1c6c486e3525880ae",
      "packagetype": "sdist",
      "python_version": "source",
      "requires_python": ">=3.10",
      "size": 46123,
      "upload_time": "2024-11-28T13:08:56",
      "upload_time_iso_8601": "2024-11-28T13:08:56.733272Z",
      "url": "https://files.pythonhosted.org/packages/df/4a/1464b1331fbeb60a8b61c656b4865c61f6f882761798e1da8177c1fcba84/whitespacetokenizer-1.0.3.tar.gz",
      "yanked": false,
      "yanked_reason": null
    }
  ],
  "upload_time": "2024-11-28 13:08:56",
  "github": true,
  "gitlab": false,
  "bitbucket": false,
  "codeberg": false,
  "github_user": "mdocekal",
  "github_project": "whitespacetokenizer",
  "travis_ci": false,
  "coveralls": false,
  "github_actions": false,
  "requirements": [],
  "lcname": "whitespacetokenizer"
}
```