# Bengali Natural Language Processing (BengaliNLP)
BengaliNLP is a natural language processing toolkit for the Bengali language. It helps you **tokenize Bengali text**, **embed Bengali words**, **embed Bengali documents**, **tag parts of speech**, **recognize named entities**, and **clean Bangla text**.
## Features
- Tokenization
  - [Basic Tokenizer](./docs/README.md#basic-tokenizer)
  - [NLTK Tokenizer](./docs/README.md#nltk-tokenization)
  - [Sentencepiece Tokenizer](./docs/README.md#bengali-sentencepiece-tokenization)
- Embeddings
  - [Word2vec embedding](./docs/README.md#bengali-word2vec)
  - [Fasttext embedding](./docs/README.md#bengali-fasttext)
  - [Glove Embedding](./docs/README.md#bengali-glove-word-vectors)
  - [Doc2vec Document embedding](./docs/README.md#document-embedding)
- Part of speech tagging
  - [CRF-based POS tagging](./docs/README.md#bengali-crf-pos-tagging)
- Named Entity Recognition
  - [CRF-based NER](./docs/README.md#bengali-crf-ner)
- [Text Cleaning](./docs/README.md#text-cleaning)
- [Corpus](./docs/README.md#bengali-corpus-class)
  - Letters, vowels, punctuations, stopwords
## Installation
### PIP installer
```
pip install bengalinlp
```
**or upgrade**
```
pip install -U bengalinlp
```
- Python: 3.8, 3.9, 3.10, 3.11
- OS: Linux, Windows, Mac
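
To confirm that the install worked, you can look up the installed distribution by name. The sketch below uses only the standard library (`importlib.metadata`, available from Python 3.8); `installed_version` is a hypothetical helper written for this example, not part of BengaliNLP.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string of `package`, or None if it is not installed."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Prints the version string if the package is installed, otherwise None.
    print(installed_version("bengalinlp"))
```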
### Build from source
```
git clone https://github.com/banglawiki/bengalinlp.git
cd bengalinlp
python setup.py install
```
## Sample Usage
```py
from bengalinlp import BasicTokenizer
tokenizer = BasicTokenizer()
raw_text = "আমি বাংলায় গান গাই।"
tokens = tokenizer(raw_text)
print(tokens)
# output: ["আমি", "বাংলায়", "গান", "গাই", "।"]
```
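
To give a feel for what a basic tokenizer does with the sentence above, here is a minimal standard-library sketch that splits on whitespace and emits each punctuation character (including the Bengali danda `।`) as its own token. It is an illustration only and is not BengaliNLP's implementation; `simple_tokenize` is a hypothetical name introduced for this example.

```python
import unicodedata


def simple_tokenize(text):
    """Split text on whitespace and emit each punctuation character as its own token."""
    tokens = []
    current = []  # characters of the word being built

    def flush():
        # Emit the word collected so far, if any.
        if current:
            tokens.append("".join(current))
            current.clear()

    for ch in text:
        if ch.isspace():
            flush()
        elif unicodedata.category(ch).startswith("P"):
            # Unicode punctuation categories all start with "P"; the danda is "Po".
            flush()
            tokens.append(ch)
        else:
            # Letters and combining vowel signs stay attached to the current word.
            current.append(ch)
    flush()
    return tokens


raw_text = "আমি বাংলায় গান গাই।"
print(simple_tokenize(raw_text))
```

Walking the string character by character keeps Bengali vowel signs attached to the consonant they follow, since only whitespace and punctuation end a token.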
## Raw data
{
"_id": null,
"home_page": "https://github.com/banglawiki/bengalinlp",
"name": "bengalinlp",
"maintainer": null,
"docs_url": null,
"requires_python": ">=3.6",
"maintainer_email": null,
"keywords": null,
"author": "KhulnaSoft DevOps",
"author_email": "info@khulnasoft.com",
"download_url": "https://files.pythonhosted.org/packages/e8/f4/575ad90d9cba293ae10e7f33db6292dc29c60c0ec1443196c768a8980a20/bengalinlp-2.0.0.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": "MIT",
"summary": "BengaliNLP is a natural language processing toolkit for Bengali Language",
"version": "2.0.0",
"project_urls": {
"Homepage": "https://github.com/banglawiki/bengalinlp"
},
"split_keywords": [],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "7ed1020432cb6402f58fcd357ca97e0e165eac2f8fa0664722f8d04cf44d5e53",
"md5": "7c160d0fb17a0499a8eda27213cf4820",
"sha256": "0b77ef8253dc64d52b574da27ecaef290122a27358889babb4fe72d43372c07a"
},
"downloads": -1,
"filename": "bengalinlp-2.0.0-py3-none-any.whl",
"has_sig": false,
"md5_digest": "7c160d0fb17a0499a8eda27213cf4820",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.6",
"size": 27243,
"upload_time": "2024-09-07T05:29:32",
"upload_time_iso_8601": "2024-09-07T05:29:32.062210Z",
"url": "https://files.pythonhosted.org/packages/7e/d1/020432cb6402f58fcd357ca97e0e165eac2f8fa0664722f8d04cf44d5e53/bengalinlp-2.0.0-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "e8f4575ad90d9cba293ae10e7f33db6292dc29c60c0ec1443196c768a8980a20",
"md5": "995b72627be44e0f0403b5e9094dacf5",
"sha256": "21fdb29a906f7aad0a1f896139c99c8d32ada9345fca7310103e6ed928eb650f"
},
"downloads": -1,
"filename": "bengalinlp-2.0.0.tar.gz",
"has_sig": false,
"md5_digest": "995b72627be44e0f0403b5e9094dacf5",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.6",
"size": 20715,
"upload_time": "2024-09-07T05:29:33",
"upload_time_iso_8601": "2024-09-07T05:29:33.289223Z",
"url": "https://files.pythonhosted.org/packages/e8/f4/575ad90d9cba293ae10e7f33db6292dc29c60c0ec1443196c768a8980a20/bengalinlp-2.0.0.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2024-09-07 05:29:33",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "banglawiki",
"github_project": "bengalinlp",
"travis_ci": false,
"coveralls": false,
"github_actions": true,
"requirements": [
{
"name": "sentencepiece",
"specs": [
[
"==",
"0.2.0"
]
]
},
{
"name": "gensim",
"specs": [
[
"==",
"4.3.2"
]
]
},
{
"name": "numpy",
"specs": []
},
{
"name": "scipy",
"specs": [
[
"==",
"1.10.1"
]
]
},
{
"name": "sklearn-crfsuite",
"specs": [
[
"==",
"0.3.6"
]
]
},
{
"name": "tqdm",
"specs": [
[
"==",
"4.66.3"
]
]
},
{
"name": "ftfy",
"specs": [
[
"==",
"6.2.0"
]
]
},
{
"name": "emoji",
"specs": [
[
"==",
"1.7.0"
]
]
},
{
"name": "requests",
"specs": []
},
{
"name": "nltk",
"specs": []
}
],
"lcname": "bengalinlp"
}
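
The `requirements` array above stores each dependency as a name plus a list of `[operator, version]` pairs. As a rough sketch, assuming that shape, the entries can be turned into pip-style requirement lines as follows. `to_pip_lines` is a hypothetical helper, and only three of the listed entries are reproduced here for brevity.

```python
def to_pip_lines(requirements):
    """Render metadata entries as pip requirement strings such as 'gensim==4.3.2'."""
    lines = []
    for req in requirements:
        # Each spec is an [operator, version] pair; an empty list means "any version".
        spec = ",".join(op + ver for op, ver in req["specs"])
        lines.append(req["name"] + spec)
    return lines


# A subset of the entries from the metadata above.
requirements = [
    {"name": "sentencepiece", "specs": [["==", "0.2.0"]]},
    {"name": "numpy", "specs": []},
    {"name": "emoji", "specs": [["==", "1.7.0"]]},
]

print("\n".join(to_pip_lines(requirements)))
```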