bnlp-toolkit

Name	bnlp-toolkit
Version	4.0.3
home_page	https://github.com/sagorbrur/bnlp
Summary	BNLP is a natural language processing toolkit for Bengali Language
upload_time	2024-08-20 11:37:41
maintainer	None
docs_url	None
author	Sagor Sarker
requires_python	>=3.6
license	MIT
keywords
VCS
bugtrack_url
requirements	No requirements were recorded.
Travis-CI	No Travis.
coveralls test coverage	No coveralls.

# Bengali Natural Language Processing (BNLP)

[![PyPI version](https://img.shields.io/pypi/v/bnlp_toolkit)](https://pypi.org/project/bnlp-toolkit/)
[![Downloads](https://static.pepy.tech/badge/bnlp_toolkit)](https://pepy.tech/project/bnlp_toolkit)

BNLP is a natural language processing toolkit for the Bengali language. It helps you **tokenize Bengali text**, **embed Bengali words**, **embed Bengali documents**, and perform **Bengali POS tagging**, **Bengali named entity recognition**, and **Bangla text cleaning**.


## Features
- Tokenization
   - [Basic Tokenizer](./docs/README.md#basic-tokenizer)
   - [NLTK Tokenizer](./docs/README.md#nltk-tokenization)
   - [Sentencepiece Tokenizer](./docs/README.md#bengali-sentencepiece-tokenization)
- Embeddings
   - [Word2vec embedding](./docs/README.md#bengali-word2vec)
   - [Fasttext embedding](./docs/README.md#bengali-fasttext)
   - [Glove Embedding](./docs/README.md#bengali-glove-word-vectors)
   - [Doc2vec Document embedding](./docs/README.md#document-embedding)
- Part of speech tagging
   - [CRF-based POS tagging](./docs/README.md#bengali-crf-pos-tagging)
- Named Entity Recognition
   - [CRF-based NER](./docs/README.md#bengali-crf-ner)
- [Text Cleaning](./docs/README.md#text-cleaning)
- [Corpus](./docs/README.md#bengali-corpus-class)
   - Letters, vowels, punctuations, stopwords

## Installation

### PIP installer

  ```
  pip install bnlp_toolkit
  ```
  **or upgrade**

  ```
  pip install -U bnlp_toolkit
  ```
  - Python: 3.8, 3.9, 3.10, 3.11
  - OS: Linux, Windows, Mac
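
To confirm that the install succeeded, you can query the installed distribution from Python. The snippet below is a minimal sketch that uses only the standard library (`importlib.metadata`, available from Python 3.8); the helper name `installed_version` is hypothetical and not part of BNLP.

```python
from importlib import metadata
from typing import Optional


def installed_version(distribution: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent.

    Hypothetical helper for this sketch; it is not part of BNLP.
    """
    try:
        return metadata.version(distribution)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_version("bnlp_toolkit")
    if version is None:
        print("bnlp_toolkit is not installed")
    else:
        print(f"bnlp_toolkit {version} is installed")
```

If the package is reported as missing, check that `pip` installed into the same interpreter you are running.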

### Build from source
```
git clone https://github.com/sagorbrur/bnlp.git
cd bnlp
python setup.py install
```

## Sample Usage

```py
from bnlp import BasicTokenizer

tokenizer = BasicTokenizer()

raw_text = "আমি বাংলায় গান গাই।"
tokens = tokenizer(raw_text)
print(tokens)
# output: ["আমি", "বাংলায়", "গান", "গাই", "।"]
```
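
To illustrate what the output above represents, here is a rough standard-library sketch of the same idea: split on whitespace and emit each punctuation mark, including the Bengali danda (`।`), as its own token. This is not BNLP's implementation; the function name `simple_tokenize` and the `PUNCTUATION` set are assumptions made for this example, and `BasicTokenizer` may handle more cases.

```python
import re

# Hypothetical separator set for this sketch; BNLP's own tokenizer may treat
# more (or different) characters as punctuation.
PUNCTUATION = "।,;:!?()[]{}\"'"

# A negated character class is used instead of \w because Python's \w does not
# match Bengali vowel signs (combining marks), which would split words apart.
_escaped = re.escape(PUNCTUATION)
TOKEN_PATTERN = re.compile(f"[^\\s{_escaped}]+|[{_escaped}]")


def simple_tokenize(text: str) -> list:
    """Split text into word tokens and single-character punctuation tokens."""
    return TOKEN_PATTERN.findall(text)


if __name__ == "__main__":
    print(simple_tokenize("আমি বাংলায় গান গাই।"))
    # output: ['আমি', 'বাংলায়', 'গান', 'গাই', '।']
```

For real workloads, prefer the tokenizers shipped with BNLP; the sketch only shows the shape of the result.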

## Documentation
Full documentation is available [here](https://github.com/sagorbrur/bnlp/tree/master/docs).

If you are using a previous version of **bnlp**, check the documentation [archive](https://github.com/sagorbrur/bnlp/tree/master/docs/archive).

## Contributor Guide

See [CONTRIBUTING.md](https://github.com/sagorbrur/bnlp/blob/master/CONTRIBUTING.md) for details.


## Thanks To

* [Semantics Lab](https://www.facebook.com/lab.semantics/)
* All the developers who are contributing to enrich Bengali NLP.

Raw data

{
    "_id": null,
    "home_page": "https://github.com/sagorbrur/bnlp",
    "name": "bnlp-toolkit",
    "maintainer": null,
    "docs_url": null,
    "requires_python": ">=3.6",
    "maintainer_email": null,
    "keywords": null,
    "author": "Sagor Sarker",
    "author_email": "sagorhem3532@gmail.com",
    "download_url": "https://files.pythonhosted.org/packages/8c/fb/d89762ed8ce174d896eec825646304df28f20ab2dcd13e131e4fcd6eeff3/bnlp_toolkit-4.0.3.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "MIT",
    "summary": "BNLP is a natural language processing toolkit for Bengali Language",
    "version": "4.0.3",
    "project_urls": {
        "Homepage": "https://github.com/sagorbrur/bnlp"
    },
    "split_keywords": [],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "71ef519a0f9b66086db2e6e2404d5bd7754c8286d3308fee48d81fd23a5819c8",
                "md5": "cfda69c0e84ebf330eb7ee31451fbe2e",
                "sha256": "dda244b9f97f4e8cec501deb2ae7a0c4aa85a3ca91910cb3dc850b4c9bee66c7"
            },
            "downloads": -1,
            "filename": "bnlp_toolkit-4.0.3-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "cfda69c0e84ebf330eb7ee31451fbe2e",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.6",
            "size": 22667,
            "upload_time": "2024-08-20T11:37:39",
            "upload_time_iso_8601": "2024-08-20T11:37:39.723285Z",
            "url": "https://files.pythonhosted.org/packages/71/ef/519a0f9b66086db2e6e2404d5bd7754c8286d3308fee48d81fd23a5819c8/bnlp_toolkit-4.0.3-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "8cfbd89762ed8ce174d896eec825646304df28f20ab2dcd13e131e4fcd6eeff3",
                "md5": "9ad5111340fe0cd388391b9f06d28e8d",
                "sha256": "4a19b4bba6347635b5cda88e0cf9e266938c57648d1eeb629ff65745f4791b94"
            },
            "downloads": -1,
            "filename": "bnlp_toolkit-4.0.3.tar.gz",
            "has_sig": false,
            "md5_digest": "9ad5111340fe0cd388391b9f06d28e8d",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.6",
            "size": 18531,
            "upload_time": "2024-08-20T11:37:41",
            "upload_time_iso_8601": "2024-08-20T11:37:41.148281Z",
            "url": "https://files.pythonhosted.org/packages/8c/fb/d89762ed8ce174d896eec825646304df28f20ab2dcd13e131e4fcd6eeff3/bnlp_toolkit-4.0.3.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2024-08-20 11:37:41",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "sagorbrur",
    "github_project": "bnlp",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": true,
    "requirements": [],
    "lcname": "bnlp-toolkit"
}
