# Fast Sentence Tokenizer (fast-sentence-tokenize)
A fast, best-in-class sentence tokenizer for Python.
## Usage
### Import
```python
from fast_sentence_tokenize import fast_sentence_tokenize
```
### Call Tokenizer
```python
results = fast_sentence_tokenize("isn't a test great!!?")
```
### Results
```json
[
"isn't",
"a",
"test",
"great",
"!",
"!",
"?"
]
```
Note that whitespace is not preserved in the output by default.
This generally results in a more accurate parse from downstream components, but may make the reassembly of the original sentence more challenging.
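For example, naively re-joining the default tokens with single spaces does not round-trip the original string (a minimal sketch based on the output shown above):
```python
results = fast_sentence_tokenize("isn't a test great!!?")

# Joining with spaces puts whitespace around every punctuation token,
# so the result differs from the original input.
assert ' '.join(results) == "isn't a test great ! ! ?"
```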
### Preserve Whitespace
```python
results = fast_sentence_tokenize("isn't a test great!!?", eliminate_whitespace=False)
```
### Results
```json
[
"isn't ",
"a ",
"test ",
"great",
"!",
"!",
"?"
]
```
This option preserves each token's trailing whitespace, which is useful if you want to re-assemble the tokens using the pre-existing spacing:
```python
assert ''.join(results) == "isn't a test great!!?"
```
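Putting the pieces together, a self-contained round-trip sketch (assuming only the `fast_sentence_tokenize` API shown above):
```python
from fast_sentence_tokenize import fast_sentence_tokenize

input_text = "isn't a test great!!?"

# With eliminate_whitespace=False each token keeps its trailing space,
# so plain concatenation reconstructs the input exactly.
tokens = fast_sentence_tokenize(input_text, eliminate_whitespace=False)
assert ''.join(tokens) == input_text
```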