[![PyPI version](https://badge.fury.io/py/tokeniser.svg)](https://badge.fury.io/py/tokeniser)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
[![Downloads](https://pepy.tech/badge/tokeniser)](https://pepy.tech/project/tokeniser)
# Tokeniser
`Tokeniser` is a lightweight Python package for simple and efficient token counting in text. It uses regular expressions to identify tokens, offering a straightforward way to estimate token counts without relying on complex NLP models.
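For intuition, here is a minimal sketch of how regex-based token counting can work. The pattern below is an illustrative assumption, not necessarily the exact expression `tokeniser` uses internally:

```python
import re

# Illustrative pattern (an assumption, not tokeniser's actual regex):
# runs of word characters and individual punctuation marks each count as one token.
_TOKEN_RE = re.compile(r"\w+|[^\w\s]")

def estimate_tokens_sketch(text: str) -> int:
    """Rough token estimate: count the regex matches in the text."""
    return len(_TOKEN_RE.findall(text))

print(estimate_tokens_sketch("Hello, World!"))  # 4 -> "Hello", ",", "World", "!"
```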
## Installation
To install `Tokeniser`, you can use pip:
```bash
pip install tokeniser
```
## Usage
`Tokeniser` is easy to use in your Python scripts. Here's a basic example:
```python
import tokeniser
text = "Hello, World!"
token_count = tokeniser.estimate_tokens(text)
print(f"Number of tokens: {token_count}")
```
This package is ideal for scenarios where a quick token estimate is enough and the overhead of heavier NLP tools is unwanted.
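Since `estimate_tokens` takes a plain string, it composes naturally with ordinary Python. For example, to estimate the total token count for a batch of texts (the sample strings here are just illustrative):

```python
import tokeniser

texts = [
    "Hello, World!",
    "Regex-based counting keeps things lightweight.",
]

# Sum the per-string estimates to budget a whole batch of inputs.
total = sum(tokeniser.estimate_tokens(t) for t in texts)
print(f"Total estimated tokens: {total}")
```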
## Features
- Simple and efficient token counting using regular expressions.
- Lightweight with no dependencies on large NLP models or frameworks.
- Versatile for various text processing tasks, such as quick token-budget checks (see the sketch below).
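As an example of such a task, here is a small, hypothetical guard that checks whether a text stays within a token budget; the limit of 512 is an arbitrary illustrative value:

```python
import tokeniser

MAX_TOKENS = 512  # arbitrary illustrative budget, e.g. for an LLM prompt

def fits_budget(text: str, limit: int = MAX_TOKENS) -> bool:
    """Return True if the estimated token count is within the limit."""
    return tokeniser.estimate_tokens(text) <= limit

print(fits_budget("Hello, World!"))  # True: well under the budget
```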
## Contributing
Contributions, issues, and feature requests are welcome! Feel free to check the [issues page](https://github.com/chigwell/tokeniser/issues).
## License
This project is licensed under the [MIT License](https://choosealicense.com/licenses/mit/).
## Raw data

```json
{
    "_id": null,
    "home_page": "https://github.com/chigwell/tokeniser",
    "name": "tokeniser",
    "maintainer": null,
    "docs_url": null,
    "requires_python": ">=3.6",
    "maintainer_email": null,
    "keywords": null,
    "author": "Eugene Evstafev",
    "author_email": "chigwel@gmail.com",
    "download_url": "https://files.pythonhosted.org/packages/07/ea/1548d27059f09987588d2a9dcf3fe75a9f058e59215d07a257ebc5d4386f/tokeniser-0.0.3.tar.gz",
    "platform": null,
    "description": "[](https://badge.fury.io/py/tokeniser)\n[](https://opensource.org/licenses/MIT)\n[](https://pepy.tech/project/tokeniser)\n\n# Tokeniser\n\n`Tokeniser` is a lightweight Python package designed for simple and efficient token counting in text. It uses regular expressions to identify tokens, providing a straightforward approach to tokenization without relying on complex NLP models.\n\n## Installation\n\nTo install `Tokeniser`, you can use pip:\n\n```bash\npip install tokeniser\n```\n\n## Usage\n\n`Tokeniser` is easy to use in your Python scripts. Here's a basic example:\n\n```python\nimport tokeniser\n\ntext = \"Hello, World!\"\ntoken_count = tokeniser.estimate_tokens(text)\nprint(f\"Number of tokens: {token_count}\")\n```\n\nThis package is ideal for scenarios where a simple token count is needed, without the overhead of more complex NLP tools.\n\n## Features\n\n- Simple and efficient token counting using regular expressions.\n- Lightweight with no dependencies on large NLP models or frameworks.\n- Versatile for use in various text processing tasks.\n\n## Contributing\n\nContributions, issues, and feature requests are welcome! Feel free to check the [issues page](https://github.com/chigwell/tokeniser/issues).\n\n## License\n\nThis project is licensed under the [MIT License](https://choosealicense.com/licenses/mit/).\n",
    "bugtrack_url": null,
    "license": null,
    "summary": null,
    "version": "0.0.3",
    "project_urls": {
        "Homepage": "https://github.com/chigwell/tokeniser"
    },
    "split_keywords": [],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "240d8777b5942fb608ac3b6a81427658d7336667e61656e927f62ad9ca800518",
                "md5": "f2ce4884c26af5da24f8c4cdaf6a006c",
                "sha256": "7940ab3b2a02b8b02307805c2cc53cf7c591fd9b106c963ad017349cf65330f0"
            },
            "downloads": -1,
            "filename": "tokeniser-0.0.3-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "f2ce4884c26af5da24f8c4cdaf6a006c",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.6",
            "size": 3152,
            "upload_time": "2024-04-15T09:46:47",
            "upload_time_iso_8601": "2024-04-15T09:46:47.388403Z",
            "url": "https://files.pythonhosted.org/packages/24/0d/8777b5942fb608ac3b6a81427658d7336667e61656e927f62ad9ca800518/tokeniser-0.0.3-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "07ea1548d27059f09987588d2a9dcf3fe75a9f058e59215d07a257ebc5d4386f",
                "md5": "fdde3f89d5b3f6cb15fd5df9d17f9734",
                "sha256": "5d3160809f4ea9288b93aeff67fe0f22bccc63fd729173df591a5b8b65543c95"
            },
            "downloads": -1,
            "filename": "tokeniser-0.0.3.tar.gz",
            "has_sig": false,
            "md5_digest": "fdde3f89d5b3f6cb15fd5df9d17f9734",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.6",
            "size": 2803,
            "upload_time": "2024-04-15T09:46:50",
            "upload_time_iso_8601": "2024-04-15T09:46:50.339533Z",
            "url": "https://files.pythonhosted.org/packages/07/ea/1548d27059f09987588d2a9dcf3fe75a9f058e59215d07a257ebc5d4386f/tokeniser-0.0.3.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2024-04-15 09:46:50",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "chigwell",
    "github_project": "tokeniser",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": false,
    "lcname": "tokeniser"
}
```