bengalinlp


| Field | Value |
|---|---|
| Name | bengalinlp |
| Version | 2.0.0 |
| Home page | https://github.com/banglawiki/bengalinlp |
| Summary | BengaliNLP is a natural language processing toolkit for Bengali Language |
| Upload time | 2024-09-07 05:29:33 |
| Maintainer | None |
| Docs URL | None |
| Author | KhulnaSoft DevOps |
| Requires Python | >=3.6 |
| License | MIT |
| Requirements | sentencepiece, gensim, numpy, scipy, sklearn-crfsuite, tqdm, ftfy, emoji, requests, nltk |
| Travis CI | No |
| Coveralls test coverage | No |

# Bengali Natural Language Processing (BengaliNLP)

[![PyPI version](https://img.shields.io/pypi/v/bengalinlp)](https://pypi.org/project/bengalinlp/)
[![Downloads](https://static.pepy.tech/badge/bengalinlp)](https://pepy.tech/project/bengalinlp)

BengaliNLP is a natural language processing toolkit for the Bengali language. It helps you **tokenize Bengali text**, **embed Bengali words**, **embed Bengali documents**, **tag parts of speech**, **recognize named entities**, and **clean Bangla text**.


## Features
- Tokenization
   - [Basic Tokenizer](./docs/README.md#basic-tokenizer)
   - [NLTK Tokenizer](./docs/README.md#nltk-tokenization)
   - [Sentencepiece Tokenizer](./docs/README.md#bengali-sentencepiece-tokenization)
- Embeddings
   - [Word2vec embedding](./docs/README.md#bengali-word2vec)
   - [Fasttext embedding](./docs/README.md#bengali-fasttext)
   - [Glove Embedding](./docs/README.md#bengali-glove-word-vectors)
   - [Doc2vec Document embedding](./docs/README.md#document-embedding)
- Part of speech tagging
   - [CRF-based POS tagging](./docs/README.md#bengali-crf-pos-tagging)
- Named Entity Recognition
   - [CRF-based NER](./docs/README.md#bengali-crf-ner)
- [Text Cleaning](./docs/README.md#text-cleaning)
- [Corpus](./docs/README.md#bengali-corpus-class)
   - Letters, vowels, punctuation, stopwords

## Installation

### PIP installer

  ```
  pip install bengalinlp
  ```
  **or upgrade**

  ```
  pip install -U bengalinlp
  ```
  - Python: 3.8, 3.9, 3.10, 3.11
  - OS: Linux, Windows, Mac
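
After installing or upgrading, it can be useful to confirm which version is active in the current environment. The following is a minimal sketch that uses only the standard library's `importlib.metadata` (available from Python 3.8); the helper name `installed_version` is hypothetical and not part of BengaliNLP.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is not installed."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    # "bengalinlp" is the distribution name used in the pip commands above.
    found = installed_version("bengalinlp")
    print(found if found is not None else "bengalinlp is not installed")
```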

### Build from source
```
git clone https://github.com/banglawiki/bengalinlp.git
cd bengalinlp
pip install .
```

## Sample Usage

```py
from bengalinlp import BasicTokenizer

tokenizer = BasicTokenizer()

raw_text = "আমি বাংলায় গান গাই।"
tokens = tokenizer(raw_text)
print(tokens)
# output: ["আমি", "বাংলায়", "গান", "গাই", "।"]
```
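
To illustrate the kind of splitting shown in the sample output, here is a rough, self-contained sketch of whitespace-and-punctuation tokenization. It is not BengaliNLP's implementation: the function `simple_tokenize` and the punctuation set are assumptions made for illustration only. A negated character class is used so that Bengali vowel signs and other combining marks stay attached to their base letters.

```python
import re
from typing import List

# Either a run of characters that are neither whitespace nor punctuation,
# or a single punctuation mark. "।" (U+0964) is the Bengali danda.
# The punctuation set is an illustrative assumption, not the library's list.
_TOKEN_RE = re.compile(r"[^\s।,;:!?]+|[।,;:!?]")


def simple_tokenize(text: str) -> List[str]:
    """Split text on whitespace and emit each punctuation mark as its own token."""
    return _TOKEN_RE.findall(text)


if __name__ == "__main__":
    print(simple_tokenize("আমি বাংলায় গান গাই।"))
    # ['আমি', 'বাংলায়', 'গান', 'গাই', '।']
```

For real work, prefer the library's `BasicTokenizer`, which may handle cases this sketch ignores.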

            

## Raw data

            {
    "_id": null,
    "home_page": "https://github.com/banglawiki/bengalinlp",
    "name": "bengalinlp",
    "maintainer": null,
    "docs_url": null,
    "requires_python": ">=3.6",
    "maintainer_email": null,
    "keywords": null,
    "author": "KhulnaSoft DevOps",
    "author_email": "info@khulnasoft.com",
    "download_url": "https://files.pythonhosted.org/packages/e8/f4/575ad90d9cba293ae10e7f33db6292dc29c60c0ec1443196c768a8980a20/bengalinlp-2.0.0.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "MIT",
    "summary": "BengaliNLP is a natural language processing toolkit for Bengali Language",
    "version": "2.0.0",
    "project_urls": {
        "Homepage": "https://github.com/banglawiki/bengalinlp"
    },
    "split_keywords": [],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "7ed1020432cb6402f58fcd357ca97e0e165eac2f8fa0664722f8d04cf44d5e53",
                "md5": "7c160d0fb17a0499a8eda27213cf4820",
                "sha256": "0b77ef8253dc64d52b574da27ecaef290122a27358889babb4fe72d43372c07a"
            },
            "downloads": -1,
            "filename": "bengalinlp-2.0.0-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "7c160d0fb17a0499a8eda27213cf4820",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.6",
            "size": 27243,
            "upload_time": "2024-09-07T05:29:32",
            "upload_time_iso_8601": "2024-09-07T05:29:32.062210Z",
            "url": "https://files.pythonhosted.org/packages/7e/d1/020432cb6402f58fcd357ca97e0e165eac2f8fa0664722f8d04cf44d5e53/bengalinlp-2.0.0-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "e8f4575ad90d9cba293ae10e7f33db6292dc29c60c0ec1443196c768a8980a20",
                "md5": "995b72627be44e0f0403b5e9094dacf5",
                "sha256": "21fdb29a906f7aad0a1f896139c99c8d32ada9345fca7310103e6ed928eb650f"
            },
            "downloads": -1,
            "filename": "bengalinlp-2.0.0.tar.gz",
            "has_sig": false,
            "md5_digest": "995b72627be44e0f0403b5e9094dacf5",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.6",
            "size": 20715,
            "upload_time": "2024-09-07T05:29:33",
            "upload_time_iso_8601": "2024-09-07T05:29:33.289223Z",
            "url": "https://files.pythonhosted.org/packages/e8/f4/575ad90d9cba293ae10e7f33db6292dc29c60c0ec1443196c768a8980a20/bengalinlp-2.0.0.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2024-09-07 05:29:33",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "banglawiki",
    "github_project": "bengalinlp",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": true,
    "requirements": [
        {
            "name": "sentencepiece",
            "specs": [
                [
                    "==",
                    "0.2.0"
                ]
            ]
        },
        {
            "name": "gensim",
            "specs": [
                [
                    "==",
                    "4.3.2"
                ]
            ]
        },
        {
            "name": "numpy",
            "specs": []
        },
        {
            "name": "scipy",
            "specs": [
                [
                    "==",
                    "1.10.1"
                ]
            ]
        },
        {
            "name": "sklearn-crfsuite",
            "specs": [
                [
                    "==",
                    "0.3.6"
                ]
            ]
        },
        {
            "name": "tqdm",
            "specs": [
                [
                    "==",
                    "4.66.3"
                ]
            ]
        },
        {
            "name": "ftfy",
            "specs": [
                [
                    "==",
                    "6.2.0"
                ]
            ]
        },
        {
            "name": "emoji",
            "specs": [
                [
                    "==",
                    "1.7.0"
                ]
            ]
        },
        {
            "name": "requests",
            "specs": []
        },
        {
            "name": "nltk",
            "specs": []
        }
    ],
    "lcname": "bengalinlp"
}
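
The `digests` entries above can be used to check a downloaded file before installing it. The sketch below uses only the standard library's `hashlib`; the helper names `sha256_of` and `matches_digest` and the local file path are assumptions for illustration, while the expected hash is the wheel's `sha256` value from the raw data.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 65536) -> str:
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_digest(path: Path, expected_hex: str) -> bool:
    """Compare a file's SHA-256 digest with an expected hex string."""
    return sha256_of(path) == expected_hex.lower()


if __name__ == "__main__":
    # Assumed location of a manually downloaded wheel.
    wheel = Path("bengalinlp-2.0.0-py3-none-any.whl")
    expected = "0b77ef8253dc64d52b574da27ecaef290122a27358889babb4fe72d43372c07a"
    if wheel.exists():
        print("digest OK" if matches_digest(wheel, expected) else "digest MISMATCH")
    else:
        print("wheel not found in the current directory")
```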
        