ruqia

| Field | Value |
| --- | --- |
| Name | ruqia |
| Version | 0.0.23 |
| Home page | https://github.com/Ruqyai/Ruqia-Library |
| Summary | Arabic NLP |
| Upload time | 2024-07-06 11:23:51 |
| Maintainer | None |
| Docs URL | None |
| Author | Ruqiya Bin Safi |
| Requires Python | <4,>=3.7 |
| License | None |
| Keywords | arabic, nlp, development |
| Requirements | No requirements were recorded. |

# Ruqia Library
This library is used for Arabic NLP to process, prepare, and clean Arabic text. It includes a number of functions for cleaning text and related tasks.

## Install
```
pip install ruqia
```
## Use
The package is installed as `ruqia` but imported as `ruqiya`:
```
from ruqiya import ruqiya
```
## Example: Apply a Function to a Single Pandas Column

```
from ruqiya.ruqiya import clean_text

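# df is assumed to be an existing pandas DataFrame with a 'text' column.
# The column often has dtype object rather than string, so convert each value with str first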
df['text']=df['text'].apply(str)
# Now apply our function
df['cleaned_text']=df['text'].apply(clean_text)
# Show the result
df['cleaned_text']
```

## All Functions

## Clean the text 
The `clean_text` function includes all of these functions:

1. `remove_emails`
2. `remove_URLs`
3. `remove_mentions`
4. `hashtags_to_words`
5. `remove_punctuations`
6. `normalize_arabic`
7. `remove_diacritics`
8. `remove_repeating_char`
9. `remove_stop_words`
10. `remove_emojis`

In other words, `clean_text` includes all functions except `remove_hashtags`.
```
text_cleaned1=ruqiya.clean_text(text)
print(text_cleaned1)
```
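To illustrate how one function can chain several cleaning steps, here is a minimal sketch. It does not reproduce the library's implementation: `make_pipeline`, `strip_mentions`, and `collapse_spaces` are hypothetical stand-ins, and the order in which `clean_text` applies its steps is an assumption left open here.

```python
import re
from functools import reduce

def make_pipeline(*steps):
    """Return a function that applies each step to the text, in order."""
    def run(text):
        return reduce(lambda acc, step: step(acc), steps, text)
    return run

# Hypothetical stand-in steps (not the library's own functions)
def strip_mentions(text):
    return re.sub(r"@\w+", " ", text)

def collapse_spaces(text):
    return re.sub(r"\s+", " ", text).strip()

clean = make_pipeline(strip_mentions, collapse_spaces)
print(clean("hello @user   world"))
```

Keeping each step as a separate function means any single step can also be called on its own, as the sections that follow do.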
## Remove repeating characters
`remove_repeating_char` function
```
text_cleaned2=ruqiya.remove_repeating_char(text)
print(text_cleaned2)
```
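As a rough idea of what collapsing repeated characters can look like, the sketch below uses a regular expression with a backreference. The helper name `collapse_repeats` and the threshold of three are assumptions for illustration; the library's own rule may differ.

```python
import re

def collapse_repeats(text, min_run=3):
    """Collapse any run of `min_run` or more identical characters to one."""
    pattern = re.compile(r"(.)\1{%d,}" % (min_run - 1))
    return pattern.sub(r"\1", text)

print(collapse_repeats("cooool!!!"))
```

A threshold above two leaves legitimately doubled letters alone, at the cost of missing shorter elongations.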
## Remove punctuation
`remove_punctuations` function
```
text_cleaned3=ruqiya.remove_punctuations(text)
print(text_cleaned3)
```
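One possible way to strip punctuation, including Arabic marks such as `؟`, is to filter on Unicode categories. This is only a sketch with a hypothetical helper name, `strip_punctuation`; it is not taken from the library. It also removes `#`, `@`, and `_`, which Unicode classifies as punctuation.

```python
import unicodedata

def strip_punctuation(text):
    """Drop every character whose Unicode category starts with 'P'."""
    return "".join(
        ch for ch in text
        if not unicodedata.category(ch).startswith("P")
    )

print(strip_punctuation("هل هذه هي المرة الأولى؟!!"))
```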
## Normalize Arabic
`normalize_arabic` function

```
text_cleaned4=ruqiya.normalize_arabic(text)
print(text_cleaned4)
```
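Normalization usually means mapping letter variants to one canonical form. The sketch below assumes a common convention (alef variants to bare alef, alef maqsura to yeh, teh marbuta to heh); which mappings `normalize_arabic` applies is not stated here, so treat both the mapping and the name `normalize` as hypothetical.

```python
# Assumed mapping: a common normalization convention, not necessarily the library's
NORMALIZATION_MAP = str.maketrans({
    "\u0623": "\u0627",  # alef with hamza above -> bare alef
    "\u0625": "\u0627",  # alef with hamza below -> bare alef
    "\u0622": "\u0627",  # alef with madda -> bare alef
    "\u0649": "\u064a",  # alef maqsura -> yeh
    "\u0629": "\u0647",  # teh marbuta -> heh
})

def normalize(text):
    """Replace each mapped letter variant with its canonical form."""
    return text.translate(NORMALIZATION_MAP)

print(normalize("الأولى"))
```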
## Remove diacritics
`remove_diacritics` function
```
text_cleaned5=ruqiya.remove_diacritics(text)
print(text_cleaned5)
```
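For intuition, Arabic diacritics (tashkeel) occupy the Unicode range U+064B to U+0652, so a character-class regex can strip them. This is a minimal sketch with a hypothetical name, `strip_diacritics`; the library may cover additional marks.

```python
import re

# Fathatan (U+064B) through sukun (U+0652)
TASHKEEL = re.compile(r"[\u064B-\u0652]")

def strip_diacritics(text):
    """Remove the short-vowel and related marks, keeping the base letters."""
    return TASHKEEL.sub("", text)

print(strip_diacritics("أهلًا وسهلًا"))
```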
## Remove stop words
`remove_stop_words` function
```
text_cleaned6=ruqiya.remove_stop_words(text)
print(text_cleaned6)
```
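Stop-word removal generally comes down to filtering tokens against a list. The sketch below uses a tiny hypothetical list and plain whitespace tokenization; the library's actual list and tokenizer are not shown in this document.

```python
# Tiny hypothetical stop-word list, for illustration only
STOP_WORDS = {"في", "من", "هل", "هي", "هذه"}

def drop_stop_words(text, stop_words=STOP_WORDS):
    """Keep only whitespace-separated tokens that are not in stop_words."""
    return " ".join(tok for tok in text.split() if tok not in stop_words)

print(drop_stop_words("هل هذه هي المرة الأولى"))
```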
## Remove emojis
`remove_emojis` function
```
text_cleaned7=ruqiya.remove_emojis(text)
print(text_cleaned7)
```
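One way to approximate emoji removal is a character class over the main emoji code-point blocks. The ranges below are an assumption and are not exhaustive (flag sequences, for example, are not covered); `strip_emojis` is a hypothetical name, not the library's function.

```python
import re

# Common emoji blocks plus the variation selector and zero-width joiner
EMOJI = re.compile("[\U0001F300-\U0001FAFF\u2600-\u27BF\uFE0F\u200D]+")

def strip_emojis(text):
    """Remove characters that fall in the emoji ranges listed above."""
    return EMOJI.sub("", text)

print(strip_emojis("أهلًا وسهلًا بك 👋"))
```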

## Remove mentions
`remove_mentions` function
```
text_cleaned8=ruqiya.remove_mentions(text)
print(text_cleaned8)
```
## Convert any hashtags to words
`hashtags_to_words` function
```
text_cleaned9=ruqiya.hashtags_to_words(text)
print(text_cleaned9)
```
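To show the difference from removing a hashtag outright, the sketch below keeps the hashtag's words: it drops the `#` and turns underscores into spaces. This behavior and the name `hashtag_to_words` are assumptions for illustration, not the library's confirmed logic.

```python
import re

HASHTAG = re.compile(r"#(\w+)")

def hashtag_to_words(text):
    """Replace '#some_tag' with 'some tag'."""
    return HASHTAG.sub(lambda m: m.group(1).replace("_", " "), text)

print(hashtag_to_words("وسم #معالجة_العربية"))
```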

## Remove hashtags
`remove_hashtags` function
```
text_cleaned10=ruqiya.remove_hashtags(text)
print(text_cleaned10)
```
## Remove emails
`remove_emails` function
```
text_cleaned11=ruqiya.remove_emails(text)
print(text_cleaned11)
```
## Remove URLs
`remove_URLs` function
```
text_cleaned12=ruqiya.remove_URLs(text)
print(text_cleaned12)
```
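URL removal is typically a pattern match on a scheme or `www.` prefix followed by non-space characters. The pattern and the name `strip_urls` below are a simplified assumption; they are not the library's implementation.

```python
import re

URL = re.compile(r"(?:https?://|www\.)\S+")

def strip_urls(text):
    """Remove anything that starts like a URL, up to the next whitespace."""
    return URL.sub("", text)

print(strip_urls("site https://pypi.org/project/ruqia/ ok"))
```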
## Example
```
from ruqiya import ruqiya

text="""
!!أهلًا وسهلًا بك 👋 في الإصدارِ الأولِ من مكتبة رقيا
هل هذه هي المرة الأولى التي تستخدم فيها المكتبة😀؟!!
معلومات التواصل 
ايميل
example@email.com
الموقع
https://pypi.org/project/ruqia/
تويتر
@Ru0Sa
وسم
#معالجة_العربية
"""

print('===========clean_text===========')
text_cleaned1=ruqiya.clean_text(text)
print(text_cleaned1)
print('===========remove_repeating_char===========')
text_cleaned2=ruqiya.remove_repeating_char(text)
print(text_cleaned2)
print('===========remove_punctuations===========')
text_cleaned3=ruqiya.remove_punctuations(text)
print(text_cleaned3)
print('===========normalize_arabic===========')
text_cleaned4=ruqiya.normalize_arabic(text)
print(text_cleaned4)
print('===========remove_diacritics===========')
text_cleaned5=ruqiya.remove_diacritics(text)
print(text_cleaned5)
print('===========remove_stop_words===========')
text_cleaned6=ruqiya.remove_stop_words(text)
print(text_cleaned6)
print('===========remove_emojis===========')
text_cleaned7=ruqiya.remove_emojis(text)
print(text_cleaned7)
print('===========remove_mentions===========')
text_cleaned8=ruqiya.remove_mentions(text)
print(text_cleaned8)
print('===========hashtags_to_words===========')
text_cleaned9=ruqiya.hashtags_to_words(text)
print(text_cleaned9)
print('===========remove_hashtags===========')
text_cleaned10=ruqiya.remove_hashtags(text)
print(text_cleaned10)
print('===========remove_emails===========')
text_cleaned11=ruqiya.remove_emails(text)
print(text_cleaned11)
print('===========remove_URLs===========')
text_cleaned12=ruqiya.remove_URLs(text)
print(text_cleaned12)

```

## Example 2: Apply a Function to a Pandas DataFrame (Single Column)

```
from ruqiya.ruqiya import clean_text
import pandas as pd

data="https://raw.githubusercontent.com/Ruqyai/data4test/main/test_with_lables.csv"
df=pd.read_csv(data)
df['text']=df['poem_text']

#--------------------
# df['text'] often has dtype object rather than string, so convert each value with str first
df['text']=df['text'].apply(str)
# Now apply our function
df['cleaned_text']=df['text'].apply(clean_text)
#--------------------

# Show the result
df['cleaned_text']
```
# Citing Ruqia
If Ruqia helps your research, we would appreciate a citation. Here is the BibTeX entry:
```
@misc{Ruqia2022,
  title={Ruqia-Library},
  author={Ruqiya Bin Safi},
  year={2022},
  howpublished={\url{https://github.com/Ruqyai/Ruqia-Library}},
}
```

            

        