# Piraye: NLP Utilities
<p align="center">
<a href="https://pypi.org/project/piraye"><img alt="PyPI Version" src="https://img.shields.io/pypi/v/piraye.svg?maxAge=86400" /></a>
<a href="https://pypi.org/project/piraye"><img alt="Python Versions" src="https://img.shields.io/pypi/pyversions/piraye.svg?maxAge=86400" /></a>
<a href="https://pypi.org/project/piraye"><img alt="License" src="https://img.shields.io/pypi/l/piraye.svg?maxAge=86400" /></a>
<a href="https://pepy.tech/project/piraye"><img alt="Downloads" src="https://static.pepy.tech/badge/piraye" /></a>
<a href="https://github.com/arushadev/piraye/actions/workflows/pylint.yml"><img alt="Pylint" src="https://github.com/arushadev/piraye/actions/workflows/pylint.yml/badge.svg" /></a>
<a href="https://github.com/arushadev/piraye/actions/workflows/unit-test.yml"><img alt="Unit Test" src="https://github.com/arushadev/piraye/actions/workflows/unit-test.yml/badge.svg" /></a>
</p>
**Piraye** is a Python library for normalizing Persian, Arabic, and English text.
## Requirements
* Python 3.11+
* nltk 3.4.5+
## Installation
You can install the latest version of Piraye via pip:
`pip install piraye`
## Usage
To use Piraye, build a `Normalizer` with `NormalizerBuilder`, then call its `normalize` method. The settings listed under Configs control which normalization steps are applied. The two examples below configure the same normalizer in different ways:
* Using builder pattern:
```python
from piraye import NormalizerBuilder
text = "این یک متن تسة اسﺘ , 24/12/1400 "
normalizer = NormalizerBuilder().alphabet_fa().digit_fa().punctuation_fa().tokenizing().remove_extra_spaces().build()
normalizer.normalize(text) # "این یک متن تست است ، ۲۴/۱۲/۱۴۰۰"
```
* Using constructor:
```python
from piraye import NormalizerBuilder
from piraye.tasks.normalizer.normalizer_builder import Config
text = "این یک متن تسة اسﺘ , 24/12/1400 "
normalizer = NormalizerBuilder(
    [Config.PUNCTUATION_FA, Config.ALPHABET_FA, Config.DIGIT_FA],
    remove_extra_spaces=True,
    tokenization=True,
).build()
normalizer.normalize(text) # "این یک متن تست است ، ۲۴/۱۲/۱۴۰۰"
```
More examples are available in [examples.md](https://github.com/arushadev/piraye/blob/readme/examples.md).
## Configs
Piraye provides various configurations for text normalization. Here's a list of available configurations:
| Config | Function | Description |
|:----------------:|:----------------:|:-----------------------------------------------------:|
| ALPHABET_AR | alphabet_ar | map alphabet characters to Arabic |
| ALPHABET_EN | alphabet_en | map alphabet characters to English |
| ALPHABET_FA | alphabet_fa | map alphabet characters to Persian |
| DIGIT_AR | digit_ar | convert digits to Arabic digits |
| DIGIT_EN | digit_en | convert digits to English digits |
| DIGIT_FA | digit_fa | convert digits to Persian digits |
| DIACRITIC_DELETE | diacritic_delete | remove all diacritics |
| SPACE_DELETE | space_delete | remove all spaces |
| SPACE_NORMAL | space_normal | normalize space characters (such as NO-BREAK SPACE and Tab) |
| SPACE_KEEP | space_keep | map space characters without normalizing them |
| PUNCTUATION_AR | punctuation_ar | map punctuation to Arabic punctuation |
| PUNCTUATION_FA | punctuation_fa | map punctuation to Persian punctuation |
| PUNCTUATION_EN | punctuation_en | map punctuation to English punctuation |
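
As a rough illustration of what the digit and diacritic configs describe, the sketch below uses only the Python standard library. It is not Piraye's implementation: the helper names `digits_to_persian` and `strip_diacritics` are hypothetical, and treating every Unicode combining mark as a diacritic is an assumption that may be broader than what `DIACRITIC_DELETE` actually removes.

```python
import unicodedata

# Translation table: Western digits (0-9) and Arabic-Indic digits
# (U+0660..U+0669) both map to Persian digits (U+06F0..U+06F9).
TO_PERSIAN_DIGITS = {}
for i in range(10):
    persian_digit = chr(0x06F0 + i)
    TO_PERSIAN_DIGITS[ord("0") + i] = persian_digit
    TO_PERSIAN_DIGITS[0x0660 + i] = persian_digit


def digits_to_persian(text: str) -> str:
    """Hypothetical helper: rewrite every digit as a Persian digit."""
    return text.translate(TO_PERSIAN_DIGITS)


def strip_diacritics(text: str) -> str:
    """Hypothetical helper: drop characters with a non-zero combining class.

    Assumption: "diacritic" means any Unicode combining mark.
    """
    return "".join(ch for ch in text if not unicodedata.combining(ch))


if __name__ == "__main__":
    print(digits_to_persian("24/12/1400"))
    print(strip_diacritics("\u0633\u064e\u0644\u0627\u0645"))  # word with a fatha
```

The digit helper leaves every non-digit character, such as the `/` separators, untouched, which matches how the date appears in the usage examples.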
Other attributes:
* remove_extra_spaces: Collapses runs of consecutive spaces into a single space.
* tokenization: Replaces punctuation characters only where they are standalone tokens.
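
The two options above can be pictured with a small standard-library sketch. It assumes that `remove_extra_spaces` merges runs of spaces and that a token is any space-delimited chunk. Piraye lists nltk as a requirement and likely tokenizes differently, so its real output may not match. The function names and the punctuation map are hypothetical.

```python
import re

# Hypothetical, partial map from English to Persian punctuation.
PERSIAN_PUNCTUATION = {
    ",": "\u060c",  # Arabic comma
    ";": "\u061b",  # Arabic semicolon
    "?": "\u061f",  # Arabic question mark
}


def collapse_spaces(text: str) -> str:
    """Merge runs of spaces into one space and trim both ends."""
    return re.sub(r" {2,}", " ", text).strip()


def map_standalone_punctuation(text: str) -> str:
    """Replace a punctuation mark only when it is a token by itself.

    Assumption: tokens are separated by single spaces.
    """
    tokens = text.split(" ")
    return " ".join(PERSIAN_PUNCTUATION.get(token, token) for token in tokens)


if __name__ == "__main__":
    cleaned = collapse_spaces("test ,  24/12/1400 ")
    print(map_standalone_punctuation(cleaned))
```

In this sketch the lone `,` is replaced, while a comma embedded in a larger token such as `1,2` is left alone.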
## Development
To set up a development environment, install dependencies with:
`pip install -e .[dev]`
## License
**GNU Lesser General Public License v2.1**
Piraye is licensed under the GNU Lesser General Public License v2.1, which primarily applies to software libraries.
See the [LICENSE](https://github.com/arushadev/piraye/blob/main/LICENSE) file for more details.
## About
Piraye is maintained by [Arusha](https://www.arusha.dev).
Raw data
{
"_id": null,
"home_page": null,
"name": "piraye",
"maintainer": null,
"docs_url": null,
"requires_python": ">=3.11",
"maintainer_email": "Arusha Developers <info@arusha.dev>",
"keywords": "NLP, Natural Language Processing, Tokenizing, Normalization",
"author": null,
"author_email": "Hamed Khademi Khaledi <khaledihkh@gmail.com>, HosseiN Khademi khaeldi <hossein@arusha.dev>, Majid Asgiar Bidhendi <majid@arusha.dev>",
"download_url": "https://files.pythonhosted.org/packages/38/37/47542ae6857ad54b026d1327e6ade378582428a29749f9a035002a16e660/piraye-0.6.1.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": "LGPLv2",
"summary": "A utility for normalizing persian, arabic and english texts",
"version": "0.6.1",
"project_urls": {
"Bug Tracker": "https://github.com/arushadev/piraye/issues",
"Homepage": "https://github.com/arushadev/piraye"
},
"split_keywords": [
"nlp",
" natural language processing",
" tokenizing",
" normalization"
],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "69d5353bf283b948479acce2ac0d244d1f68ea85a656e2468cfb6a5389330988",
"md5": "0935764fc66de474d81f01c77dc7b218",
"sha256": "6275d06b49ce780b8ed3ae854fec742a1aa9cb4447b5fa9710ce13164901cd4d"
},
"downloads": -1,
"filename": "piraye-0.6.1-py3-none-any.whl",
"has_sig": false,
"md5_digest": "0935764fc66de474d81f01c77dc7b218",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.11",
"size": 53807,
"upload_time": "2024-04-28T13:05:52",
"upload_time_iso_8601": "2024-04-28T13:05:52.935565Z",
"url": "https://files.pythonhosted.org/packages/69/d5/353bf283b948479acce2ac0d244d1f68ea85a656e2468cfb6a5389330988/piraye-0.6.1-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "383747542ae6857ad54b026d1327e6ade378582428a29749f9a035002a16e660",
"md5": "66c7e01591f1d430829a20896399f37f",
"sha256": "49735dc63669a6e0cbcd2a9c3a1aac1c9cec0923d3d7b3e27e663d297b0e1007"
},
"downloads": -1,
"filename": "piraye-0.6.1.tar.gz",
"has_sig": false,
"md5_digest": "66c7e01591f1d430829a20896399f37f",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.11",
"size": 51128,
"upload_time": "2024-04-28T13:05:54",
"upload_time_iso_8601": "2024-04-28T13:05:54.801995Z",
"url": "https://files.pythonhosted.org/packages/38/37/47542ae6857ad54b026d1327e6ade378582428a29749f9a035002a16e660/piraye-0.6.1.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2024-04-28 13:05:54",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "arushadev",
"github_project": "piraye",
"travis_ci": false,
"coveralls": false,
"github_actions": true,
"lcname": "piraye"
}