priaye

| Field           | Value                                                       |
|:----------------|:------------------------------------------------------------|
| Name            | priaye                                                      |
| Version         | 0.4.0                                                       |
| Summary         | A utility for normalizing persian, arabic and english texts |
| Upload time     | 2024-02-14 17:05:53                                         |
| Requires Python | >=3.11                                                      |
| License         | LGPLv2                                                      |
| Keywords        | nlp, natural language processing, tokenizing, normalization |
| Requirements    | No requirements were recorded.                              |

# Piraye: NLP Utils

<p align="center">
  <a href="https://pypi.org/project/piraye"><img alt="PyPI Version" src="https://img.shields.io/pypi/v/piraye.svg?maxAge=86400" /></a>
  <a href="https://pypi.org/project/piraye"><img alt="Python Versions" src="https://img.shields.io/pypi/pyversions/piraye.svg?maxAge=86400" /></a>
  <a href="https://pypi.org/project/piraye"><img alt="License" src="https://img.shields.io/pypi/l/piraye.svg?maxAge=86400" /></a>
  <a href="https://github.com/arushadev/piraye/actions/workflows/pylint.yml"><img alt="Pylint" src="https://github.com/arushadev/piraye/actions/workflows/pylint.yml/badge.svg" /></a>
  <a href="https://github.com/arushadev/piraye/actions/workflows/unit-test.yml"><img alt="Unit Test" src="https://github.com/arushadev/piraye/actions/workflows/unit-test.yml/badge.svg" /></a>
</p>


A utility for normalizing Persian, Arabic, and English text.

## Requirements

* Python 3.11+
* nltk 3.4.5+

## Installation

Install the latest version with pip:
`pip install piraye`

## Usage

Build a `Normalizer` with `NormalizerBuilder`, then call its `normalize` method. The [Configs](#Configs) section lists
all available options.

* Using the builder pattern:

```python
from piraye import NormalizerBuilder

text = "این یک متن تسة اسﺘ       , 24/12/1400 "
normalizer = NormalizerBuilder().alphabet_fa().digit_fa().punctuation_fa().tokenizing().remove_extra_spaces().build()
normalizer.normalize(text)  # "این یک متن تست است ، ۲۴/۱۲/۱۴۰۰"
```

* Using the constructor:

```python
from piraye import NormalizerBuilder
from piraye.tasks.normalizer.normalizer_builder import Config

text = "این یک متن تسة اسﺘ       , 24/12/1400 "
normalizer = NormalizerBuilder([Config.PUNCTUATION_FA, Config.ALPHABET_FA, Config.DIGIT_FA], remove_extra_spaces=True,
                               tokenization=True).build()
normalizer.normalize(text)  # "این یک متن تست است ، ۲۴/۱۲/۱۴۰۰"
```

Also see [other examples](https://github.com/arushadev/piraye/blob/readme/examples.md)

## Configs

|      Config      |     Function     |                  Description                   |
|:----------------:|:----------------:|:----------------------------------------------:|
|   ALPHABET_AR    |   alphabet_ar    |       map alphabet characters to Arabic        |
|   ALPHABET_EN    |   alphabet_en    |       map alphabet characters to English       |
|   ALPHABET_FA    |   alphabet_fa    |       map alphabet characters to Persian       |
|     DIGIT_AR     |     digit_ar     |        convert digits to Arabic digits         |
|     DIGIT_EN     |     digit_en     |        convert digits to English digits        |
|     DIGIT_FA     |     digit_fa     |        convert digits to Persian digits        |
| DIACRITIC_DELETE | diacritic_delete |             remove all diacritics              |
|   SPACE_DELETE   |   space_delete   |               remove all spaces                |
|   SPACE_NORMAL   |   space_normal   |  normalize spaces (NO-BREAK SPACE, tab, etc.)  |
|    SPACE_KEEP    |    space_keep    |      map spaces without normalizing them       |
|  PUNCTUATION_AR  |  punctuation_ar  |      map punctuation to Arabic punctuation     |
|  PUNCTUATION_FA  |  punctuation_fa  |     map punctuation to Persian punctuation     |
|  PUNCTUATION_EN  |  punctuation_en  |     map punctuation to English punctuation     |
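
To give a feel for what the digit configs do, the sketch below converts between ASCII, Persian, and Arabic-Indic digits with `str.translate`. It uses only the standard library and is not piraye's implementation; `convert_digits` and the `"en"`/`"fa"`/`"ar"` keys are names made up for this illustration.

```python
# Illustrative sketch only: NOT piraye's code, just the general idea of digit mapping.
ASCII_DIGITS = "0123456789"
PERSIAN_DIGITS = "".join(chr(0x06F0 + i) for i in range(10))  # U+06F0..U+06F9
ARABIC_DIGITS = "".join(chr(0x0660 + i) for i in range(10))   # U+0660..U+0669

# Hypothetical target keys, loosely mirroring DIGIT_EN / DIGIT_FA / DIGIT_AR.
_TARGETS = {"en": ASCII_DIGITS, "fa": PERSIAN_DIGITS, "ar": ARABIC_DIGITS}


def convert_digits(text: str, target: str) -> str:
    """Map every ASCII, Persian, or Arabic-Indic digit to the target digit set."""
    dest = _TARGETS[target]
    table = {}
    for source in _TARGETS.values():
        for src_char, dst_char in zip(source, dest):
            table[ord(src_char)] = dst_char
    # Characters that are not digits are left untouched by translate().
    return text.translate(table)


print(convert_digits("24/12/1400", "fa"))  # ۲۴/۱۲/۱۴۰۰
```

A single translation table keeps the conversion to one pass over the text, which is why this kind of mapping is usually cheap even on long inputs.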

Other attributes:

* remove_extra_spaces : collapse runs of consecutive spaces into a single space
* tokenization : replace punctuation characters only where they stand alone as tokens
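
The space handling described above can be approximated with two regular expressions. This is a minimal standard-library sketch under stated assumptions, not piraye's implementation: the helper names `normalize_spaces` and `collapse_spaces` and the exact set of characters treated as spaces are choices made for this example.

```python
import re

# Assumed set of "unusual" spaces: NO-BREAK SPACE, tab, U+2000..U+200A,
# NARROW NO-BREAK SPACE, and IDEOGRAPHIC SPACE. The zero-width non-joiner
# (U+200C) is deliberately excluded because it is meaningful in Persian.
_UNUSUAL_SPACES = re.compile(r"[\u00a0\t\u2000-\u200a\u202f\u3000]")
_SPACE_RUN = re.compile(r" {2,}")


def normalize_spaces(text: str) -> str:
    """Replace each unusual space character with a plain ASCII space."""
    return _UNUSUAL_SPACES.sub(" ", text)


def collapse_spaces(text: str) -> str:
    """Collapse runs of two or more ASCII spaces into a single space."""
    return _SPACE_RUN.sub(" ", text)


print(collapse_spaces(normalize_spaces("a\u00a0\u00a0b\t c")))  # a b c
```

Running normalization before collapsing matters: otherwise a run that mixes plain and non-breaking spaces would not be recognized as a single run.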

## Development

* Install dependencies with `pip install -e .[dev]`

## License

**GNU Lesser General Public License v2.1**

Primarily used for software libraries, the GNU LGPL requires that derived works be licensed under the same license, but
works that only link to it do not fall under this restriction. There are two commonly used versions of the GNU LGPL.

See [LICENSE](https://github.com/arushadev/piraye/blob/main/LICENSE)

## About

[Arusha](https://www.arusha.dev)


            

Raw data

            {
    "_id": null,
    "home_page": "",
    "name": "priaye",
    "maintainer": "",
    "docs_url": null,
    "requires_python": ">=3.11",
    "maintainer_email": "Arusha Developers <info@arusha.dev>",
    "keywords": "NLP,Natural Language Processing,Tokenizing,Normalization",
    "author": "",
    "author_email": "Hamed Khademi Khaledi <khaledihkh@gmail.com>, HosseiN Khademi khaeldi <hossein@arusha.dev>, Majid Asgiar Bidhendi <majid@arusha.dev>",
    "download_url": "https://files.pythonhosted.org/packages/7f/65/5e99ea63e41da831f4ebff69c8fd02d88eee393323f4bf1c803959c782d9/priaye-0.4.0.tar.gz",
    "platform": null,
    "description": "# Piraye: NLP Utils\n\n<p align=\"center\">\n  <a href=\"https://pypi.org/project/piraye\"><img alt=\"PyPI Version\" src=\"https://img.shields.io/pypi/v/piraye.svg?maxAge=86400\" /></a>\n  <a href=\"https://pypi.org/project/piraye\"><img alt=\"Python Versions\" src=\"https://img.shields.io/pypi/pyversions/piraye.svg?maxAge=86400\" /></a>\n  <a href=\"https://pypi.org/project/piraye\"><img alt=\"License\" src=\"https://img.shields.io/pypi/l/piraye.svg?maxAge=86400\" /></a>\n  <a href=\"https://github.com/arushadev/piraye/actions/workflows/pylint.yml\"><img alt=\"Pylint\" src=\"https://github.com/arushadev/piraye/actions/workflows/pylint.yml/badge.svg\" /></a>\n  <a href=\"https://github.com/arushadev/piraye/actions/workflows/unit-test.yml/badge.svg)](https://github.com/arushadev/piraye/actions/workflows/unit-test.yml\"><img alt=\"Unit Test\" src=\"https://github.com/arushadev/piraye/actions/workflows/unit-test.yml/badge.svg\" /></a>\n</p>\n\n\nA utility for normalizing persian, arabic and english texts\n\n## Requirements\n\n* Python 3.11+\n* nltk 3.4.5+\n\n## Installation\n\nInstall the latest version with pip\n`pip install piraye`\n\n## Usage\n\nCreate an instance of Normalizer with NormalizerBuilder and then call normalize function. 
Also see list of all available\nconfigs in [configs](#Configs) section.\n\n* Using builder pattern:\n\n```python\nfrom piraye import NormalizerBuilder\nfrom piraye.tasks.normalizer.normalizer_builder import Config\n\ntext = \"\u0627\u06cc\u0646 \u06cc\u06a9 \u0645\u062a\u0646 \u062a\u0633\u0629 \u0627\u0633\ufe98       , 24/12/1400 \"\nnormalizer = NormalizerBuilder().alphabet_fa().digit_fa().punctuation_fa().tokenizing().remove_extra_spaces().build()\nnormalizer.normalize(text)  # \"\u0627\u06cc\u0646 \u06cc\u06a9 \u0645\u062a\u0646 \u062a\u0633\u062a \u0627\u0633\u062a \u060c \u06f2\u06f4/\u06f1\u06f2/\u06f1\u06f4\u06f0\u06f0\"\n```\n\n* Using constructor:\n\n```python\nfrom piraye import NormalizerBuilder\nfrom piraye.tasks.normalizer.normalizer_builder import Config\n\ntext = \"\u0627\u06cc\u0646 \u06cc\u06a9 \u0645\u062a\u0646 \u062a\u0633\u0629 \u0627\u0633\ufe98       , 24/12/1400 \"\nnormalizer = NormalizerBuilder([Config.PUNCTUATION_FA, Config.ALPHABET_FA, Config.DIGIT_FA], remove_extra_spaces=True,\n                               tokenization=True).build()\nnormalizer.normalize(text)  # \"\u0627\u06cc\u0646 \u06cc\u06a9 \u0645\u062a\u0646 \u062a\u0633\u062a \u0627\u0633\u062a \u060c \u06f2\u06f4/\u06f1\u06f2/\u06f1\u06f4\u06f0\u06f0\"\n```\n\nAlso see [other examples](https://github.com/arushadev/piraye/blob/readme/examples.md)\n\n## Configs\n\n|      Config      |     Function     |                      Description                      |\n|:----------------:|:----------------:|:-----------------------------------------------------:|\n|   ALPHABET_AR    |   alphabet_ar    |         mapping alphabet characters to arabic         |\n|   ALPHABET_EN    |   alphabet_en    |        mapping alphabet characters to english         |\n|   ALPHABET_FA    |   alphabet_fa    |        mapping alphabet characters to persian         |\n|     DIGIT_AR     |     digit_ar     |            convert digits to arabic digits            |\n|     DIGIT_EN     |     digit_en     |  
         convert digits to english digits            |\n|     DIGIT_FA     |     digit_fa     |           convert digits to persian digits            |\n| DIACRITIC_DELETE | diacritic_delete |                 remove all diacritics                 |\n|   SPACE_DELETE   |   space_delete   |                   remove all spaces                   |\n|   SPACE_NORMAL   |   space_normal   | normal spaces ( like NO-BREAK SPACE , Tab and etc...) |\n|    SPACE_KEEP    |    space_keep    |          mapping spaces and not normal them           |\n|  PUNCTUATION_AR  |  punctuation_ar  |      mapping punctuations to arabic punctuations      |\n|  PUNCTUATION_Fa  |  punctuation_fa  |     mapping punctuations to persian punctuations      |\n|  PUNCTUATION_EN  |  punctuation_en  |     mapping punctuations to english punctuations      |\n\nOther attributes:\n\n* remove_extra_spaces : append multiple spaces together\n* tokenization : replace punctuation characters that just are tokens\n\n## Development\n\n* Install dependencies with `pip install -e .[dev]`\n\n## License\n\n**GNU Lesser General Public License v2.1**\n\nPrimarily used for software libraries, the GNU LGPL requires that derived works be licensed under the same license, but\nworks that only link to it do not fall under this restriction. There are two commonly used versions of the GNU LGPL.\n\nSee [LICENSE](https://github.com/arushadev/piraye/blob/main/LICENSE)\n\n## About \ufe0f\n\n[Arusha](https://www.arusha.dev)\n\n",
    "bugtrack_url": null,
    "license": "LGPLv2",
    "summary": "A utility for normalizing persian, arabic and english texts",
    "version": "0.4.0",
    "project_urls": {
        "Bug Tracker": "https://github.com/arushadev/piraye/issues",
        "Homepage": "https://github.com/arushadev/piraye"
    },
    "split_keywords": [
        "nlp",
        "natural language processing",
        "tokenizing",
        "normalization"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "693b087f086a2fe4e068f436595a263e785d8e3bf4e7278b97a616bbd9a57a99",
                "md5": "4411f56558aeb5dc2a15739399fdf0fd",
                "sha256": "52aebafd9d69e5c242df74f08c4786f92e9fe035f5405a617a6589537231c21f"
            },
            "downloads": -1,
            "filename": "priaye-0.4.0-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "4411f56558aeb5dc2a15739399fdf0fd",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.11",
            "size": 52537,
            "upload_time": "2024-02-14T17:05:51",
            "upload_time_iso_8601": "2024-02-14T17:05:51.291805Z",
            "url": "https://files.pythonhosted.org/packages/69/3b/087f086a2fe4e068f436595a263e785d8e3bf4e7278b97a616bbd9a57a99/priaye-0.4.0-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "7f655e99ea63e41da831f4ebff69c8fd02d88eee393323f4bf1c803959c782d9",
                "md5": "2ffc328688510dd1ece05a7da54552a2",
                "sha256": "fb12f53cf271936a67042aecf0d1c369d5b6e80aecbe356716e05d305915e560"
            },
            "downloads": -1,
            "filename": "priaye-0.4.0.tar.gz",
            "has_sig": false,
            "md5_digest": "2ffc328688510dd1ece05a7da54552a2",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.11",
            "size": 49904,
            "upload_time": "2024-02-14T17:05:53",
            "upload_time_iso_8601": "2024-02-14T17:05:53.147910Z",
            "url": "https://files.pythonhosted.org/packages/7f/65/5e99ea63e41da831f4ebff69c8fd02d88eee393323f4bf1c803959c782d9/priaye-0.4.0.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2024-02-14 17:05:53",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "arushadev",
    "github_project": "piraye",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": true,
    "lcname": "priaye"
}
        