# Piraye: NLP Utils
<p align="center">
<a href="https://pypi.org/project/piraye"><img alt="PyPI Version" src="https://img.shields.io/pypi/v/piraye.svg?maxAge=86400" /></a>
<a href="https://pypi.org/project/piraye"><img alt="Python Versions" src="https://img.shields.io/pypi/pyversions/piraye.svg?maxAge=86400" /></a>
<a href="https://pypi.org/project/piraye"><img alt="License" src="https://img.shields.io/pypi/l/piraye.svg?maxAge=86400" /></a>
<a href="https://github.com/arushadev/piraye/actions/workflows/pylint.yml"><img alt="Pylint" src="https://github.com/arushadev/piraye/actions/workflows/pylint.yml/badge.svg" /></a>
<a href="https://github.com/arushadev/piraye/actions/workflows/unit-test.yml"><img alt="Unit Test" src="https://github.com/arushadev/piraye/actions/workflows/unit-test.yml/badge.svg" /></a>
</p>
A utility for normalizing Persian, Arabic, and English text.
## Requirements
* Python 3.11+
* nltk 3.4.5+
## Installation
Install the latest version with pip:
`pip install piraye`
## Usage
Create a `Normalizer` instance with `NormalizerBuilder`, then call its `normalize` function. See the
[Configs](#configs) section for the list of all available configs.
* Using the builder pattern:
```python
from piraye import NormalizerBuilder
text = "این یک متن تسة اسﺘ , 24/12/1400 "
# Chain the normalization steps you need, then build the normalizer.
builder = NormalizerBuilder().alphabet_fa().digit_fa().punctuation_fa()
normalizer = builder.tokenizing().remove_extra_spaces().build()
normalizer.normalize(text) # "این یک متن تست است ، ۲۴/۱۲/۱۴۰۰"
```
* Using the constructor:
```python
from piraye import NormalizerBuilder
from piraye.tasks.normalizer.normalizer_builder import Config
text = "این یک متن تسة اسﺘ , 24/12/1400 "
# Pass the configs and the extra options directly to the constructor.
normalizer = NormalizerBuilder(
    [Config.PUNCTUATION_FA, Config.ALPHABET_FA, Config.DIGIT_FA],
    remove_extra_spaces=True,
    tokenization=True,
).build()
normalizer.normalize(text) # "این یک متن تست است ، ۲۴/۱۲/۱۴۰۰"
```
Also see [other examples](https://github.com/arushadev/piraye/blob/readme/examples.md)
## Configs
|      Config      |     Function     |                         Description                         |
|:----------------:|:----------------:|:-----------------------------------------------------------:|
| ALPHABET_AR      | alphabet_ar      | Map alphabet characters to Arabic                           |
| ALPHABET_EN      | alphabet_en      | Map alphabet characters to English                          |
| ALPHABET_FA      | alphabet_fa      | Map alphabet characters to Persian                          |
| DIGIT_AR         | digit_ar         | Convert digits to Arabic digits                             |
| DIGIT_EN         | digit_en         | Convert digits to English digits                            |
| DIGIT_FA         | digit_fa         | Convert digits to Persian digits                            |
| DIACRITIC_DELETE | diacritic_delete | Remove all diacritics                                       |
| SPACE_DELETE     | space_delete     | Remove all spaces                                           |
| SPACE_NORMAL     | space_normal     | Normalize space characters (such as NO-BREAK SPACE and Tab) |
| SPACE_KEEP       | space_keep       | Map space characters without normalizing them               |
| PUNCTUATION_AR   | punctuation_ar   | Map punctuation to Arabic punctuation                       |
| PUNCTUATION_FA   | punctuation_fa   | Map punctuation to Persian punctuation                      |
| PUNCTUATION_EN   | punctuation_en   | Map punctuation to English punctuation                      |
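
To give a feel for what the digit configs do, here is a minimal standard-library sketch of digit mapping between the three scripts. It is not Piraye's implementation: the names `DIGITS` and `convert_digits` are hypothetical and exist only for this illustration, and the library may handle more digit variants than shown here.

```python
# Illustrative sketch only; this is not Piraye's implementation.
# DIGITS and convert_digits are hypothetical names used for this example.
DIGITS = {
    "en": "0123456789",
    "fa": "".join(chr(0x06F0 + i) for i in range(10)),  # Extended Arabic-Indic digits
    "ar": "".join(chr(0x0660 + i) for i in range(10)),  # Arabic-Indic digits
}


def convert_digits(text: str, target: str) -> str:
    """Rewrite digits from the other two scripts in the `target` script."""
    table = {}
    for script, digits in DIGITS.items():
        if script != target:
            # Map each source digit to the digit with the same value in `target`.
            table.update(str.maketrans(digits, DIGITS[target]))
    return text.translate(table)


# Convert to Persian digits, then back to English digits.
persian = convert_digits("24/12/1400", "fa")
print(persian)
print(convert_digits(persian, "en"))  # 24/12/1400
```

Characters that are not digits, such as the `/` separators, pass through unchanged, which is the behavior the `DIGIT_*` descriptions above suggest.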
Other attributes:
* remove_extra_spaces : merge consecutive spaces into a single space
* tokenization : replace punctuation characters only where they stand as separate tokens
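
The space-related options can be pictured with a short regular-expression sketch. This is an assumption-laden illustration rather than Piraye's code: the function names `normalize_spaces` and `collapse_spaces` are hypothetical, and the set of characters treated as spaces here (NO-BREAK SPACE and Tab) is only the pair named in the table above.

```python
import re

# Illustrative sketch only; this is not Piraye's implementation.
# normalize_spaces and collapse_spaces are hypothetical names.


def normalize_spaces(text: str) -> str:
    """Replace NO-BREAK SPACE and Tab with a plain space (the idea behind SPACE_NORMAL)."""
    return re.sub(r"[\u00a0\t]", " ", text)


def collapse_spaces(text: str) -> str:
    """Merge runs of plain spaces into one (the idea behind remove_extra_spaces)."""
    return re.sub(r" {2,}", " ", text)


sample = "a\u00a0\u00a0b\t c"
print(repr(collapse_spaces(normalize_spaces(sample))))  # 'a b c'
```

Running normalization first matters in this sketch: a run that mixes different space characters only collapses once they have all been mapped to the same plain space.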
## Development
* Install dependencies with `pip install -e .[dev]`
## License
**GNU Lesser General Public License v2.1**
Primarily used for software libraries, the GNU LGPL requires that derived works be licensed under the same license, but
works that only link to it do not fall under this restriction. There are two commonly used versions of the GNU LGPL.
See [LICENSE](https://github.com/arushadev/piraye/blob/main/LICENSE)
## About
[Arusha](https://www.arusha.dev)
Raw data
{
"_id": null,
"home_page": "",
"name": "priaye",
"maintainer": "",
"docs_url": null,
"requires_python": ">=3.11",
"maintainer_email": "Arusha Developers <info@arusha.dev>",
"keywords": "NLP,Natural Language Processing,Tokenizing,Normalization",
"author": "",
"author_email": "Hamed Khademi Khaledi <khaledihkh@gmail.com>, HosseiN Khademi khaeldi <hossein@arusha.dev>, Majid Asgiar Bidhendi <majid@arusha.dev>",
"download_url": "https://files.pythonhosted.org/packages/7f/65/5e99ea63e41da831f4ebff69c8fd02d88eee393323f4bf1c803959c782d9/priaye-0.4.0.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": "LGPLv2",
"summary": "A utility for normalizing persian, arabic and english texts",
"version": "0.4.0",
"project_urls": {
"Bug Tracker": "https://github.com/arushadev/piraye/issues",
"Homepage": "https://github.com/arushadev/piraye"
},
"split_keywords": [
"nlp",
"natural language processing",
"tokenizing",
"normalization"
],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "693b087f086a2fe4e068f436595a263e785d8e3bf4e7278b97a616bbd9a57a99",
"md5": "4411f56558aeb5dc2a15739399fdf0fd",
"sha256": "52aebafd9d69e5c242df74f08c4786f92e9fe035f5405a617a6589537231c21f"
},
"downloads": -1,
"filename": "priaye-0.4.0-py3-none-any.whl",
"has_sig": false,
"md5_digest": "4411f56558aeb5dc2a15739399fdf0fd",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.11",
"size": 52537,
"upload_time": "2024-02-14T17:05:51",
"upload_time_iso_8601": "2024-02-14T17:05:51.291805Z",
"url": "https://files.pythonhosted.org/packages/69/3b/087f086a2fe4e068f436595a263e785d8e3bf4e7278b97a616bbd9a57a99/priaye-0.4.0-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "7f655e99ea63e41da831f4ebff69c8fd02d88eee393323f4bf1c803959c782d9",
"md5": "2ffc328688510dd1ece05a7da54552a2",
"sha256": "fb12f53cf271936a67042aecf0d1c369d5b6e80aecbe356716e05d305915e560"
},
"downloads": -1,
"filename": "priaye-0.4.0.tar.gz",
"has_sig": false,
"md5_digest": "2ffc328688510dd1ece05a7da54552a2",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.11",
"size": 49904,
"upload_time": "2024-02-14T17:05:53",
"upload_time_iso_8601": "2024-02-14T17:05:53.147910Z",
"url": "https://files.pythonhosted.org/packages/7f/65/5e99ea63e41da831f4ebff69c8fd02d88eee393323f4bf1c803959c782d9/priaye-0.4.0.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2024-02-14 17:05:53",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "arushadev",
"github_project": "piraye",
"travis_ci": false,
"coveralls": false,
"github_actions": true,
"lcname": "priaye"
}