piraye

|      Field      |                            Value                            |
|:---------------:|:-----------------------------------------------------------:|
|      Name       |                           piraye                            |
|     Version     |                            0.5.1                            |
|     Summary     | A utility for normalizing Persian, Arabic and English texts |
|   Upload time   |                     2024-04-21 11:58:16                     |
| Requires Python |                           >=3.11                            |
|     License     |                           LGPLv2                            |
|    Keywords     | nlp, natural language processing, tokenizing, normalization |

# Piraye: NLP Utilities

<p align="center">
  <a href="https://pypi.org/project/piraye"><img alt="PyPI Version" src="https://img.shields.io/pypi/v/piraye.svg?maxAge=86400" /></a>
  <a href="https://pypi.org/project/piraye"><img alt="Python Versions" src="https://img.shields.io/pypi/pyversions/piraye.svg?maxAge=86400" /></a>
  <a href="https://pypi.org/project/piraye"><img alt="License" src="https://img.shields.io/pypi/l/piraye.svg?maxAge=86400" /></a>
  <a href="https://pepy.tech/project/piraye"><img alt="Downloads" src="https://static.pepy.tech/badge/piraye" /></a>
  <a href="https://github.com/arushadev/piraye/actions/workflows/pylint.yml"><img alt="Pylint" src="https://github.com/arushadev/piraye/actions/workflows/pylint.yml/badge.svg" /></a>
  <a href="https://github.com/arushadev/piraye/actions/workflows/unit-test.yml"><img alt="Unit Test" src="https://github.com/arushadev/piraye/actions/workflows/unit-test.yml/badge.svg" /></a>
</p>


**Piraye** is a Python library for normalizing Persian, Arabic, and English text.

## Requirements

* Python 3.11+
* nltk 3.4.5+

## Installation

You can install the latest version of Piraye via pip:

`pip install piraye`

## Usage

To use Piraye, build a `Normalizer` instance with `NormalizerBuilder`, then call its `normalize` method. The settings you pass to the builder control which normalization steps are applied. The two examples below configure the same normalizer in different ways:

* Using builder pattern:

```python
from piraye import NormalizerBuilder

text = "این یک متن تسة اسﺘ       , 24/12/1400 "
normalizer = NormalizerBuilder().alphabet_fa().digit_fa().punctuation_fa().tokenizing().remove_extra_spaces().build()
normalizer.normalize(text)  # "این یک متن تست است ، ۲۴/۱۲/۱۴۰۰"
```

* Using constructor:

```python
from piraye import NormalizerBuilder
from piraye.tasks.normalizer.normalizer_builder import Config

text = "این یک متن تسة اسﺘ       , 24/12/1400 "
normalizer = NormalizerBuilder([Config.PUNCTUATION_FA, Config.ALPHABET_FA, Config.DIGIT_FA], remove_extra_spaces=True,
                               tokenization=True).build()
normalizer.normalize(text)  # "این یک متن تست است ، ۲۴/۱۲/۱۴۰۰"
```

You can find more examples [in examples.md](https://github.com/arushadev/piraye/blob/readme/examples.md).

## Configs

Piraye provides various configurations for text normalization. Here's a list of available configurations:

|      Config      |     Function     |                       Description                       |
|:----------------:|:----------------:|:-------------------------------------------------------:|
|   ALPHABET_AR    |   alphabet_ar    |            map alphabet characters to Arabic            |
|   ALPHABET_EN    |   alphabet_en    |           map alphabet characters to English            |
|   ALPHABET_FA    |   alphabet_fa    |           map alphabet characters to Persian            |
|     DIGIT_AR     |     digit_ar     |             convert digits to Arabic digits             |
|     DIGIT_EN     |     digit_en     |            convert digits to English digits             |
|     DIGIT_FA     |     digit_fa     |            convert digits to Persian digits             |
| DIACRITIC_DELETE | diacritic_delete |                  remove all diacritics                  |
|   SPACE_DELETE   |   space_delete   |                    remove all spaces                    |
|   SPACE_NORMAL   |   space_normal   | normalize space characters (such as NO-BREAK SPACE, Tab) |
|    SPACE_KEEP    |    space_keep    |          map spaces without normalizing them            |
|  PUNCTUATION_AR  |  punctuation_ar  |          map punctuation to Arabic punctuation          |
|  PUNCTUATION_FA  |  punctuation_fa  |         map punctuation to Persian punctuation          |
|  PUNCTUATION_EN  |  punctuation_en  |         map punctuation to English punctuation          |
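
To make the digit rows of the table concrete, here is a minimal standard-library sketch of what converting digits between scripts could look like. It is an illustration only and not Piraye's implementation: the names `digit_table` and `convert_digits` are hypothetical, and the sketch assumes that only ASCII, Persian (U+06F0 to U+06F9), and Arabic-Indic (U+0660 to U+0669) digits need to be handled.

```python
# Illustration only: NOT Piraye's implementation. All names here are hypothetical.
ASCII_DIGITS = "0123456789"
PERSIAN_DIGITS = "".join(chr(0x06F0 + i) for i in range(10))  # U+06F0..U+06F9
ARABIC_DIGITS = "".join(chr(0x0660 + i) for i in range(10))   # U+0660..U+0669


def digit_table(target: str) -> dict[int, str]:
    """Build a translation table sending every known digit form to `target`."""
    table: dict[int, str] = {}
    for source in (ASCII_DIGITS, PERSIAN_DIGITS, ARABIC_DIGITS):
        for value, char in enumerate(source):
            # The digit with numeric value `value` maps to the same value in `target`.
            table[ord(char)] = target[value]
    return table


def convert_digits(text: str, target: str) -> str:
    """Rewrite all digits in `text` using the digit set `target`."""
    return text.translate(digit_table(target))


if __name__ == "__main__":
    persian = convert_digits("24/12/1400", PERSIAN_DIGITS)
    print(persian)
    # Converting back recovers the ASCII form.
    print(convert_digits(persian, ASCII_DIGITS))
```

Non-digit characters such as `/` pass through unchanged because they have no entry in the translation table.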

Other attributes:

* remove_extra_spaces: Collapses runs of consecutive spaces into a single space.
* tokenization: Replaces only those punctuation characters that stand alone as tokens.
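
The space-related settings can be pictured with a small standard-library sketch. It is not how Piraye implements them: the function names are hypothetical, and the set of characters treated as unusual spaces is an assumption chosen for the example.

```python
import re

# Illustration only: NOT Piraye's implementation. All names here are hypothetical.
# Assumed set of space-like characters to map onto a plain ASCII space.
ODD_SPACES = str.maketrans({
    "\u00a0": " ",  # NO-BREAK SPACE
    "\t": " ",      # Tab
    "\u2003": " ",  # EM SPACE
})


def normalize_spaces(text: str) -> str:
    """Map each unusual space character to a plain space."""
    return text.translate(ODD_SPACES)


def collapse_spaces(text: str) -> str:
    """Replace every run of two or more spaces with a single space."""
    return re.sub(r" {2,}", " ", text)


if __name__ == "__main__":
    raw = "a\u00a0\tb   c"
    print(repr(collapse_spaces(normalize_spaces(raw))))
```

Normalizing first and collapsing second matters here: a no-break space next to a plain space only becomes a collapsible run once both are the same character.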

## Development

To set up a development environment, install dependencies with:

`pip install -e ".[dev]"`

## License

**GNU Lesser General Public License v2.1**

Piraye is licensed under the GNU Lesser General Public License v2.1, which primarily applies to software libraries.
See the [LICENSE](https://github.com/arushadev/piraye/blob/main/LICENSE) file for more details.

## About

Piraye is maintained by [Arusha](https://www.arusha.dev).



            

Raw data

            {
    "_id": null,
    "home_page": null,
    "name": "piraye",
    "maintainer": null,
    "docs_url": null,
    "requires_python": ">=3.11",
    "maintainer_email": "Arusha Developers <info@arusha.dev>",
    "keywords": "NLP, Natural Language Processing, Tokenizing, Normalization",
    "author": null,
    "author_email": "Hamed Khademi Khaledi <khaledihkh@gmail.com>, HosseiN Khademi khaeldi <hossein@arusha.dev>, Majid Asgiar Bidhendi <majid@arusha.dev>",
    "download_url": "https://files.pythonhosted.org/packages/47/2a/12d6f6fc93dcf36276c36fa5589a90480612aa77d56f3982801cdee542e4/piraye-0.5.1.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "LGPLv2",
    "summary": "A utility for normalizing persian, arabic and english texts",
    "version": "0.5.1",
    "project_urls": {
        "Bug Tracker": "https://github.com/arushadev/piraye/issues",
        "Homepage": "https://github.com/arushadev/piraye"
    },
    "split_keywords": [
        "nlp",
        " natural language processing",
        " tokenizing",
        " normalization"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "c122e23b80a7b1f806cd90be2f64490f6f025e3f4c96a56a570dc413381523c1",
                "md5": "db9ce35fb38422d51e58ae4782b9f47a",
                "sha256": "192878e4fae40a7e2154364596106405c95e6631da014aeb0c20da82e3c742f5"
            },
            "downloads": -1,
            "filename": "piraye-0.5.1-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "db9ce35fb38422d51e58ae4782b9f47a",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.11",
            "size": 53623,
            "upload_time": "2024-04-21T11:58:14",
            "upload_time_iso_8601": "2024-04-21T11:58:14.738198Z",
            "url": "https://files.pythonhosted.org/packages/c1/22/e23b80a7b1f806cd90be2f64490f6f025e3f4c96a56a570dc413381523c1/piraye-0.5.1-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "472a12d6f6fc93dcf36276c36fa5589a90480612aa77d56f3982801cdee542e4",
                "md5": "6cd8713879f0e3edba0a8caef175a4d1",
                "sha256": "d986d9e305ecad20c5d813c151ef0938fa159428b7ef6c7c1c14529316773721"
            },
            "downloads": -1,
            "filename": "piraye-0.5.1.tar.gz",
            "has_sig": false,
            "md5_digest": "6cd8713879f0e3edba0a8caef175a4d1",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.11",
            "size": 50688,
            "upload_time": "2024-04-21T11:58:16",
            "upload_time_iso_8601": "2024-04-21T11:58:16.206813Z",
            "url": "https://files.pythonhosted.org/packages/47/2a/12d6f6fc93dcf36276c36fa5589a90480612aa77d56f3982801cdee542e4/piraye-0.5.1.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2024-04-21 11:58:16",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "arushadev",
    "github_project": "piraye",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": true,
    "lcname": "piraye"
}
        