| Field | Value |
|---|---|
| Name | bnlp-toolkit |
| Version | 4.0.3 |
| Home page | https://github.com/sagorbrur/bnlp |
| Summary | BNLP is a natural language processing toolkit for Bengali Language |
| Upload time | 2024-08-20 11:37:41 |
| Maintainer | None |
| Docs URL | None |
| Author | Sagor Sarker |
| Requires Python | >=3.6 |
| License | MIT |
| Requirements | No requirements were recorded. |
| Travis-CI | No Travis. |
| Coveralls test coverage | No coveralls. |

# Bengali Natural Language Processing (BNLP)
[![PyPI version](https://img.shields.io/pypi/v/bnlp_toolkit)](https://pypi.org/project/bnlp-toolkit/)
[![Downloads](https://static.pepy.tech/badge/bnlp_toolkit)](https://pepy.tech/project/bnlp_toolkit)

BNLP is a natural language processing toolkit for the Bengali language. It helps you **tokenize Bengali text**, **embed Bengali words** and **documents**, perform **Bengali part-of-speech (POS) tagging** and **Bengali named entity recognition (NER)**, and **clean Bangla text** for Bengali NLP tasks.
## Features
- Tokenization
  - [Basic Tokenizer](./docs/README.md#basic-tokenizer)
  - [NLTK Tokenizer](./docs/README.md#nltk-tokenization)
  - [SentencePiece Tokenizer](./docs/README.md#bengali-sentencepiece-tokenization)
- Embeddings
  - [Word2Vec embedding](./docs/README.md#bengali-word2vec)
  - [fastText embedding](./docs/README.md#bengali-fasttext)
  - [GloVe embedding](./docs/README.md#bengali-glove-word-vectors)
  - [Doc2Vec document embedding](./docs/README.md#document-embedding)
- Part-of-speech tagging
  - [CRF-based POS tagging](./docs/README.md#bengali-crf-pos-tagging)
- Named entity recognition
  - [CRF-based NER](./docs/README.md#bengali-crf-ner)
- [Text Cleaning](./docs/README.md#text-cleaning)
- [Corpus](./docs/README.md#bengali-corpus-class)
  - Letters, vowels, punctuation marks, stopwords
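
The text-cleaning feature can be pictured as a small normalization pipeline. The sketch below uses only the standard library and is not BNLP's implementation. The `clean_bangla_text` helper, its parameters, the regular expressions and the `<URL>`/`<EMAIL>` placeholder tokens are all hypothetical. For the cleaning options BNLP actually exposes, see the Text Cleaning section of the documentation.

```python
import re
import unicodedata

# Hypothetical, deliberately simple patterns; real URL/e-mail matching is messier.
URL_RE = re.compile(r"https?://\S+|www\.\S+")
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")


def clean_bangla_text(text, unicode_form="NFKC", url_token="<URL>", email_token="<EMAIL>"):
    """Normalize Unicode, mask URLs and e-mails, and collapse whitespace.

    Hypothetical helper for illustration; not part of BNLP.
    """
    # Normalization maps different encodings of the same Bengali letter
    # (e.g. precomposed U+09DF vs. U+09AF + U+09BC nukta) to one form.
    text = unicodedata.normalize(unicode_form, text)
    # Mask e-mail addresses first, then URLs.
    text = EMAIL_RE.sub(email_token, text)
    text = URL_RE.sub(url_token, text)
    # Collapse runs of whitespace into single spaces and trim the ends.
    return " ".join(text.split())


sample = "দেখুন  https://example.com বা লিখুন a@b.com"
print(clean_bangla_text(sample))  # দেখুন <URL> বা লিখুন <EMAIL>
```

Normalizing before any matching or comparison matters for Bengali, because the same visible word can arrive in more than one code-point sequence.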
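
Word-embedding queries such as "find the most similar words" typically rank the vocabulary by cosine similarity to the query word's vector. The sketch below shows that ranking over a tiny hand-made vector table. The vectors are invented, and the `most_similar` helper and its `topn` parameter are hypothetical. Neither reflects BNLP's pretrained Word2Vec, fastText or GloVe models or their API.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def most_similar(word, vectors, topn=2):
    """Rank every other word in `vectors` by cosine similarity to `word`."""
    query = vectors[word]
    scores = [(w, cosine(query, vec)) for w, vec in vectors.items() if w != word]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:topn]


# Made-up 3-dimensional vectors for illustration only (not trained embeddings).
toy_vectors = {
    "রাজা": [0.9, 0.1, 0.3],     # king
    "রানী": [0.85, 0.15, 0.35],  # queen
    "গ্রাম": [0.1, 0.9, 0.2],     # village
    "শহর": [0.2, 0.8, 0.25],     # city
}

print(most_similar("রাজা", toy_vectors, topn=1))  # রানী (queen) ranks first with these toy vectors
```

Real embeddings have hundreds of dimensions and a large vocabulary, so libraries usually pre-normalize the vectors and use matrix operations instead of a Python loop. The ranking idea is the same.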
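
CRF-based POS and NER taggers do not score raw words directly. Each token is first turned into a dictionary of features, such as the word itself, its affixes and its neighbours, and the CRF learns weights over those features. The sketch below shows one common template of this kind. The `token_features`/`sentence_features` names and the chosen features are assumptions for illustration, and BNLP's trained models may use a different feature set. A list of such per-sentence feature-dict lists is the input shape that CRF libraries like sklearn-crfsuite expect, though none is needed to run this snippet.

```python
def token_features(tokens, i):
    """Build a feature dict for tokens[i] using a simple, hypothetical template."""
    word = tokens[i]
    features = {
        "bias": 1.0,
        "word": word,
        "prefix2": word[:2],
        "suffix2": word[-2:],
        "suffix3": word[-3:],
        "length": len(word),
        # str.isdigit() is also True for Bengali digits such as ১২৩.
        "is_digit": word.isdigit(),
    }
    if i > 0:
        features["prev_word"] = tokens[i - 1]
    else:
        features["BOS"] = True  # beginning of sentence
    if i < len(tokens) - 1:
        features["next_word"] = tokens[i + 1]
    else:
        features["EOS"] = True  # end of sentence
    return features


def sentence_features(tokens):
    """Feature dicts for every token in a sentence, in order."""
    return [token_features(tokens, i) for i in range(len(tokens))]


sentence = ["আমি", "বাংলায়", "গান", "গাই", "।"]
for feats in sentence_features(sentence):
    print(feats["word"], feats.get("prev_word"), feats.get("next_word"))
```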
## Installation
### PIP installer
```
pip install bnlp_toolkit
```
**Or upgrade to the latest release:**
```
pip install -U bnlp_toolkit
```
- Supported Python versions: 3.8, 3.9, 3.10, 3.11
- Supported operating systems: Linux, Windows, macOS
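
After installing or upgrading, you can check which release is present from Python itself. This sketch uses only the standard library (`importlib.metadata`, available from Python 3.8). The `installed_bnlp_version` helper is hypothetical and returns `None` when the package is not installed.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_bnlp_version():
    """Return the installed bnlp-toolkit version string, or None if it is absent."""
    try:
        # Look the distribution up by its underscore spelling, which matches
        # the bnlp_toolkit-<version>.dist-info metadata directory.
        return version("bnlp_toolkit")
    except PackageNotFoundError:
        return None


print(installed_bnlp_version())  # a version string such as "4.0.3", or None if not installed
```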
### Build from source
```
git clone https://github.com/sagorbrur/bnlp.git
cd bnlp
pip install .
```
## Sample Usage
```py
from bnlp import BasicTokenizer
tokenizer = BasicTokenizer()
raw_text = "আমি বাংলায় গান গাই।"
tokens = tokenizer(raw_text)
print(tokens)
# output: ["আমি", "বাংলায়", "গান", "গাই", "।"]
```
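
As a rough picture of what a basic tokenizer does with the sample sentence, here is a stdlib-only sketch. It splits on whitespace and emits punctuation, such as the Bengali danda (`।`), as separate tokens. It is an illustration under assumptions, not the source of BNLP's `BasicTokenizer`. The `simple_tokenize` name and the punctuation set are hypothetical.

```python
import re

# Hypothetical punctuation set: danda, double danda, and common ASCII marks.
PUNCTUATION = "।॥,;:!?\"'()"
# A token is either a run of non-space, non-punctuation characters
# or a single punctuation mark.
TOKEN_RE = re.compile(r"[^\s{p}]+|[{p}]".format(p=re.escape(PUNCTUATION)))


def simple_tokenize(text):
    """Split text into word and punctuation tokens."""
    return TOKEN_RE.findall(text)


print(simple_tokenize("আমি বাংলায় গান গাই।"))
# ['আমি', 'বাংলায়', 'গান', 'গাই', '।']
```

Vowel signs, the virama and the anusvara are not in the punctuation set, so they stay attached to their words. This is what keeps tokens like `বাংলায়` intact.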
## Documentation
Full documentation is available [here](https://github.com/sagorbrur/bnlp/tree/master/docs).

If you are using a previous version of **bnlp**, check the documentation [archive](https://github.com/sagorbrur/bnlp/tree/master/docs/archive).
## Contributor Guide
See the [CONTRIBUTING.md](https://github.com/sagorbrur/bnlp/blob/master/CONTRIBUTING.md) page for details.
## Thanks To
* [Semantics Lab](https://www.facebook.com/lab.semantics/)
* All the developers who contribute to enriching Bengali NLP.
Raw data
{
"_id": null,
"home_page": "https://github.com/sagorbrur/bnlp",
"name": "bnlp-toolkit",
"maintainer": null,
"docs_url": null,
"requires_python": ">=3.6",
"maintainer_email": null,
"keywords": null,
"author": "Sagor Sarker",
"author_email": "sagorhem3532@gmail.com",
"download_url": "https://files.pythonhosted.org/packages/8c/fb/d89762ed8ce174d896eec825646304df28f20ab2dcd13e131e4fcd6eeff3/bnlp_toolkit-4.0.3.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": "MIT",
"summary": "BNLP is a natural language processing toolkit for Bengali Language",
"version": "4.0.3",
"project_urls": {
"Homepage": "https://github.com/sagorbrur/bnlp"
},
"split_keywords": [],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "71ef519a0f9b66086db2e6e2404d5bd7754c8286d3308fee48d81fd23a5819c8",
"md5": "cfda69c0e84ebf330eb7ee31451fbe2e",
"sha256": "dda244b9f97f4e8cec501deb2ae7a0c4aa85a3ca91910cb3dc850b4c9bee66c7"
},
"downloads": -1,
"filename": "bnlp_toolkit-4.0.3-py3-none-any.whl",
"has_sig": false,
"md5_digest": "cfda69c0e84ebf330eb7ee31451fbe2e",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.6",
"size": 22667,
"upload_time": "2024-08-20T11:37:39",
"upload_time_iso_8601": "2024-08-20T11:37:39.723285Z",
"url": "https://files.pythonhosted.org/packages/71/ef/519a0f9b66086db2e6e2404d5bd7754c8286d3308fee48d81fd23a5819c8/bnlp_toolkit-4.0.3-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "8cfbd89762ed8ce174d896eec825646304df28f20ab2dcd13e131e4fcd6eeff3",
"md5": "9ad5111340fe0cd388391b9f06d28e8d",
"sha256": "4a19b4bba6347635b5cda88e0cf9e266938c57648d1eeb629ff65745f4791b94"
},
"downloads": -1,
"filename": "bnlp_toolkit-4.0.3.tar.gz",
"has_sig": false,
"md5_digest": "9ad5111340fe0cd388391b9f06d28e8d",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.6",
"size": 18531,
"upload_time": "2024-08-20T11:37:41",
"upload_time_iso_8601": "2024-08-20T11:37:41.148281Z",
"url": "https://files.pythonhosted.org/packages/8c/fb/d89762ed8ce174d896eec825646304df28f20ab2dcd13e131e4fcd6eeff3/bnlp_toolkit-4.0.3.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2024-08-20 11:37:41",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "sagorbrur",
"github_project": "bnlp",
"travis_ci": false,
"coveralls": false,
"github_actions": true,
"requirements": [],
"lcname": "bnlp-toolkit"
}