ruqia


| Field | Value |
| --- | --- |
| Name | ruqia |
| Version | 0.0.17 |
| Home page | https://github.com/Ruqyai/Ara-NLP-lib |
| Summary | Arabic NLP |
| Upload time | 2023-01-02 12:04:58 |
| Author | Ruqiya Bin Safi |
| Requires Python | >=3.7, <4 |
| Keywords | arabic, nlp, development |
| Requirements | No requirements were recorded. |

# Ruqia lib
This library is for Arabic NLP. It provides a number of functions to process, prepare, and clean Arabic text, among other tasks.

## Install
```
pip install ruqia
```
## Use
Note that the package is installed as `ruqia` but imported as `ruqiya`.
```
from ruqiya import ruqiya
```
## Example: Apply a Function to Pandas Single Column

```
from ruqiya.ruqiya import clean_text

# df is assumed to be an existing pandas DataFrame with a 'text' column.
# df['text'] often has dtype object rather than str, so convert each value first
df['text']=df['text'].apply(str)
# Now apply our function
df['cleaned_text']=df['text'].apply(clean_text)
# Show the result
df['cleaned_text']
```

## All Functions

## Clean the text 
The `clean_text` function combines all of the following functions:

1. `remove_emails`
2. `remove_URLs`
3. `remove_mentions`
4. `hashtags_to_words`
5. `remove_punctuations`
6. `normalize_arabic`
7. `remove_diacritics`
8. `remove_repeating_char`
9. `remove_stop_words`
10. `remove_emojis`

In other words, `clean_text` includes every function except `remove_hashtags`.
```
text_cleaned1=ruqiya.clean_text(text)
print(text_cleaned1)
```
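
To illustrate how several cleaning steps can be chained into one call, here is a minimal sketch. It is not the library's implementation: `make_pipeline`, `strip_edges`, and `collapse_spaces` are hypothetical names, and whether `clean_text` applies its steps in the listed order is an assumption.

```python
from functools import reduce
from typing import Callable

Cleaner = Callable[[str], str]


def make_pipeline(*steps: Cleaner) -> Cleaner:
    """Return a function that applies each step to the text, in order."""
    def run(text: str) -> str:
        return reduce(lambda acc, step: step(acc), steps, text)
    return run


# Stand-in steps for illustration only (hypothetical, not from ruqiya).
def strip_edges(text: str) -> str:
    return text.strip()


def collapse_spaces(text: str) -> str:
    return " ".join(text.split())


pipeline = make_pipeline(strip_edges, collapse_spaces)
print(pipeline("  hello    world  "))
```

With the library installed, the same helper could be given any subset of its functions, for example `make_pipeline(ruqiya.remove_emails, ruqiya.remove_URLs)`, when the full `clean_text` is more than you need.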
## Remove repeating characters
The `remove_repeating_char` function removes repeated characters from the text.
```
text_cleaned2=ruqiya.remove_repeating_char(text)
print(text_cleaned2)
```
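
For intuition, this kind of step is commonly written as a backreference regex. The sketch below is an assumption about the general technique, not ruqiya's code: `collapse_repeats` is a hypothetical helper, and the threshold of three repeats is an illustrative choice.

```python
import re

# Hypothetical helper, not ruqiya's implementation.
# Matches the same character three or more times in a row.
_REPEATS = re.compile(r"(.)\1{2,}")


def collapse_repeats(text: str) -> str:
    """Collapse any run of 3+ identical characters into a single character."""
    return _REPEATS.sub(r"\1", text)


print(collapse_repeats("cooool"))
print(collapse_repeats("good"))  # a run of only two is left alone
```

Leaving runs of two untouched avoids damaging words that legitimately contain a doubled letter.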
## Remove punctuation
The `remove_punctuations` function removes punctuation marks from the text.
```
text_cleaned3=ruqiya.remove_punctuations(text)
print(text_cleaned3)
```
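
A rough idea of how punctuation stripping can work for mixed Arabic and Latin text, using only the standard library. This is a sketch under assumptions: `strip_punctuation` is a hypothetical helper, and the set of Arabic marks covered here is illustrative, not the set the library uses.

```python
import string

# Assumed set of Arabic marks: comma (U+060C), semicolon (U+061B), question mark (U+061F).
ARABIC_PUNCT = "\u060C\u061B\u061F"

# Translation table that deletes ASCII punctuation plus the Arabic marks above.
_TABLE = str.maketrans("", "", string.punctuation + ARABIC_PUNCT)


def strip_punctuation(text: str) -> str:
    """Delete every character listed in the translation table."""
    return text.translate(_TABLE)


print(strip_punctuation("Hello, world!"))
```

Because `string.punctuation` contains `#`, `@`, and `_`, a step like this would also alter hashtags and mentions, so it usually makes sense to handle those first.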
## Normalize Arabic
The `normalize_arabic` function normalizes the Arabic text.

```
text_cleaned4=ruqiya.normalize_arabic(text)
print(text_cleaned4)
```
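
Arabic normalization usually means mapping letter variants to one canonical form. The mapping below follows a common convention and is only an assumption about what such a function might do; the library's actual mapping is not documented here, and `normalize_sketch` is a hypothetical name.

```python
# Hypothetical mapping based on common practice, not ruqiya's implementation.
_NORMALIZE = str.maketrans({
    "\u0623": "\u0627",  # alef with hamza above -> bare alef
    "\u0625": "\u0627",  # alef with hamza below -> bare alef
    "\u0622": "\u0627",  # alef with madda       -> bare alef
    "\u0649": "\u064A",  # alef maksura          -> yeh
    "\u0629": "\u0647",  # teh marbuta           -> heh
    "\u0640": None,      # tatweel (elongation)  -> removed
})


def normalize_sketch(text: str) -> str:
    """Apply the character-level mapping above to the whole string."""
    return text.translate(_NORMALIZE)


print(normalize_sketch("\u0623\u062D\u0645\u062F"))
```

Normalizing this way trades some orthographic detail for more consistent token matching, which is why it is typically applied before stop-word removal or vectorization.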
## Remove diacritics
The `remove_diacritics` function removes diacritics (tashkeel) from the text.
```
text_cleaned5=ruqiya.remove_diacritics(text)
print(text_cleaned5)
```
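
Arabic short-vowel marks occupy a contiguous Unicode range, so a single character class is enough to strip them. The following is a minimal sketch, assuming the range U+064B to U+0652 (fathatan through sukun); `strip_diacritics` is a hypothetical helper and the library may cover a different set of marks.

```python
import re

# U+064B..U+0652: fathatan, dammatan, kasratan, fatha, damma, kasra, shadda, sukun.
_DIACRITICS = re.compile("[\u064B-\u0652]")


def strip_diacritics(text: str) -> str:
    """Remove Arabic diacritic marks, leaving the base letters untouched."""
    return _DIACRITICS.sub("", text)


# A fully vowelled word reduced to its four base letters.
print(strip_diacritics("\u0645\u064F\u062D\u064E\u0645\u0651\u064E\u062F"))
```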
## Remove stop words
The `remove_stop_words` function removes stop words from the text.
```
text_cleaned6=ruqiya.remove_stop_words(text)
print(text_cleaned6)
```
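
Stop-word removal is typically a token filter against a fixed word list. The sketch below uses a tiny, made-up list purely for illustration; the list shipped with the library is not shown in this document, and `drop_stop_words` is a hypothetical helper.

```python
# Illustrative stop-word list only (assumption); a real list would be much longer.
STOP_WORDS = {"في", "من", "هل", "هذه", "هي", "التي"}


def drop_stop_words(text: str) -> str:
    """Split on whitespace, drop tokens found in STOP_WORDS, and rejoin."""
    kept = [token for token in text.split() if token not in STOP_WORDS]
    return " ".join(kept)


print(drop_stop_words("هل هذه هي المكتبة"))
```

Exact-match filtering like this only works reliably after diacritics and punctuation have been removed, since a vowelled or punctuated token will not match the list entry.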
## Remove emojis
The `remove_emojis` function removes emojis from the text.
```
text_cleaned7=ruqiya.remove_emojis(text)
print(text_cleaned7)
```
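
One common way to strip emojis is a character class over the main emoji code-point ranges. This sketch makes no claim about the library's approach: `strip_emojis` is a hypothetical helper, and the ranges are an approximation that does not cover every emoji sequence.

```python
import re

# Approximate emoji ranges (assumption, not exhaustive):
# regional indicators, pictographs and emoticons, misc symbols/dingbats, variation selector.
_EMOJI = re.compile(
    "[\U0001F1E6-\U0001F1FF"
    "\U0001F300-\U0001FAFF"
    "\u2600-\u27BF"
    "\uFE0F]"
)


def strip_emojis(text: str) -> str:
    """Delete characters that fall inside the emoji ranges above."""
    return _EMOJI.sub("", text)


print(strip_emojis("hi \U0001F44B there \U0001F600"))
```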

## Remove mentions
The `remove_mentions` function removes `@` mentions from the text.
```
text_cleaned8=ruqiya.remove_mentions(text)
print(text_cleaned8)
```
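
A mention can be matched as `@` followed by word characters. The sketch below adds a lookbehind so that the `@` inside an email address is left alone; both the pattern and the name `strip_mentions` are assumptions for illustration, not the library's code.

```python
import re

# '@name' that is not preceded by a word character or dot,
# which keeps addresses such as user@example.com intact.
_MENTION = re.compile(r"(?<![\w.])@\w+")


def strip_mentions(text: str) -> str:
    """Remove @mentions while leaving email addresses in place."""
    return _MENTION.sub("", text)


print(strip_mentions("contact a@b.com or @Ru0Sa"))
```

Without the lookbehind, running this step before email removal would cut addresses in half, which may be why `remove_emails` is listed first among the `clean_text` functions.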
## Convert hashtags to words
The `hashtags_to_words` function converts hashtags into plain words.
```
text_cleaned9=ruqiya.hashtags_to_words(text)
print(text_cleaned9)
```

## Remove hashtags
The `remove_hashtags` function removes hashtags from the text entirely.
```
text_cleaned10=ruqiya.remove_hashtags(text)
print(text_cleaned10)
```
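
The difference between converting a hashtag and removing it can be sketched with one regex and two substitutions. This is an illustration under assumptions: `hashtag_to_words_sketch` and `drop_hashtags_sketch` are hypothetical names, and replacing underscores with spaces is a guess at what "to words" means.

```python
import re

# '#' followed by word characters (letters, digits, underscore; Arabic letters included).
_HASHTAG = re.compile(r"#(\w+)")


def hashtag_to_words_sketch(text: str) -> str:
    """Keep the hashtag's text: drop '#' and turn underscores into spaces."""
    return _HASHTAG.sub(lambda m: m.group(1).replace("_", " "), text)


def drop_hashtags_sketch(text: str) -> str:
    """Delete the hashtag together with its text."""
    return _HASHTAG.sub("", text)


print(hashtag_to_words_sketch("#arabic_nlp rocks"))
print(drop_hashtags_sketch("#arabic_nlp rocks"))
```

Converting keeps the topical words available to downstream models, while removal suits cases where hashtags are considered noise.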
## Remove emails
The `remove_emails` function removes email addresses from the text.
```
text_cleaned11=ruqiya.remove_emails(text)
print(text_cleaned11)
```
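
Email removal is usually a pragmatic regex rather than a full address parser. The pattern below is a simplified assumption that handles typical addresses; it is not the library's pattern, and `strip_emails` is a hypothetical helper.

```python
import re

# Simplified pattern: local part, '@', then a domain with at least one dot.
_EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


def strip_emails(text: str) -> str:
    """Remove anything that looks like an email address."""
    return _EMAIL.sub("", text)


print(strip_emails("mail example@email.com now"))
```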
## Remove URLs
The `remove_URLs` function removes URLs from the text.
```
text_cleaned12=ruqiya.remove_URLs(text)
print(text_cleaned12)
```
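
URLs in social-media text can generally be caught by matching a scheme or a `www.` prefix up to the next whitespace. The following is a sketch of that idea, with `strip_urls` as a hypothetical helper; the library's own pattern may differ.

```python
import re

# 'http://' or 'https://' or 'www.' followed by any run of non-whitespace characters.
_URL = re.compile(r"https?://\S+|www\.\S+")


def strip_urls(text: str) -> str:
    """Remove URLs, leaving the surrounding text unchanged."""
    return _URL.sub("", text)


print(strip_urls("see https://pypi.org/project/ruqia/ for details"))
```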
## Example
```
from ruqiya import ruqiya

text="""
!!أهلًا وسهلًا بك 👋 في الإصدارِ الأولِ من مكتبة رقيا
هل هذه هي المرة الأولى التي تستخدم فيها المكتبة😀؟!!
معلومات التواصل 
ايميل
example@email.com
الموقع
https://pypi.org/project/ruqia/
تويتر
@Ru0Sa
وسم
#معالجة_العربية
"""

print('===========clean_text===========')
text_cleaned1=ruqiya.clean_text(text)
print(text_cleaned1)
print('===========remove_repeating_char===========')
text_cleaned2=ruqiya.remove_repeating_char(text)
print(text_cleaned2)
print('===========remove_punctuations===========')
text_cleaned3=ruqiya.remove_punctuations(text)
print(text_cleaned3)
print('===========normalize_arabic===========')
text_cleaned4=ruqiya.normalize_arabic(text)
print(text_cleaned4)
print('===========remove_diacritics===========')
text_cleaned5=ruqiya.remove_diacritics(text)
print(text_cleaned5)
print('===========remove_stop_words===========')
text_cleaned6=ruqiya.remove_stop_words(text)
print(text_cleaned6)
print('===========remove_emojis===========')
text_cleaned7=ruqiya.remove_emojis(text)
print(text_cleaned7)
print('===========remove_mentions===========')
text_cleaned8=ruqiya.remove_mentions(text)
print(text_cleaned8)
print('===========hashtags_to_words===========')
text_cleaned9=ruqiya.hashtags_to_words(text)
print(text_cleaned9)
print('===========remove_hashtags===========')
text_cleaned10=ruqiya.remove_hashtags(text)
print(text_cleaned10)
print('===========remove_emails===========')
text_cleaned11=ruqiya.remove_emails(text)
print(text_cleaned11)
print('===========remove_URLs===========')
text_cleaned12=ruqiya.remove_URLs(text)
print(text_cleaned12)

```

## Example 2: Apply a Function to Pandas DataFrame (Single Column)

```
from ruqiya.ruqiya import clean_text
import pandas as pd

data="https://raw.githubusercontent.com/Ruqyai/data4test/main/test_with_lables.csv"
df=pd.read_csv(data)
df['text']=df['poem_text']

#--------------------
# df['text'] often has dtype object rather than str, so convert each value first
df['text']=df['text'].apply(str)
# Now apply our function
df['cleaned_text']=df['text'].apply(clean_text)
#--------------------

# Show the result
df['cleaned_text']
```
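
One caveat with `.apply(str)` is that missing values become the literal string `"nan"`, which would then pass through the cleaner as if it were text. A small converter can map missing cells to an empty string instead. This is an optional sketch, and `to_text` is a hypothetical helper that is not part of ruqiya.

```python
import math


def to_text(value) -> str:
    """Convert a cell to str, mapping missing values (None or NaN) to ''."""
    if value is None:
        return ""
    if isinstance(value, float) and math.isnan(value):
        return ""
    return str(value)


print(repr(str(float("nan"))))      # what plain str() produces for a missing cell
print(repr(to_text(float("nan"))))  # what the helper produces instead
```

In the example above it would replace the conversion step: `df['text'] = df['text'].apply(to_text)`.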

            

Raw data

            {
    "_id": null,
    "home_page": "https://github.com/Ruqyai/Ara-NLP-lib",
    "name": "ruqia",
    "maintainer": "",
    "docs_url": null,
    "requires_python": ">=3.7, <4",
    "maintainer_email": "",
    "keywords": "Arabic,NLP,development",
    "author": "Ruqiya Bin Safi",
    "author_email": "myacount05@gmail.com",
    "download_url": "https://files.pythonhosted.org/packages/d1/bd/7d8e547a4939612ae8e34f74d599e4a2e035c9a6775ccdef82e60e78e9b4/ruqia-0.0.17.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "",
    "summary": "Arabic NLP",
    "version": "0.0.17",
    "split_keywords": [
        "arabic",
        "nlp",
        "development"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "md5": "ff345af1b061e93b14807bf6b327a61a",
                "sha256": "cb17594701f45eb5c20ccb0216cc8a74fea225bf47363051d1b05eae55fc9f60"
            },
            "downloads": -1,
            "filename": "ruqia-0.0.17-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "ff345af1b061e93b14807bf6b327a61a",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.7, <4",
            "size": 10072,
            "upload_time": "2023-01-02T12:04:56",
            "upload_time_iso_8601": "2023-01-02T12:04:56.485611Z",
            "url": "https://files.pythonhosted.org/packages/d7/97/c69313e4a918635b5a3ffd385d8698b51638572aa813321bfb31d71de869/ruqia-0.0.17-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "md5": "c1d5c3b175e23318b869441d9d2cad21",
                "sha256": "151fed03ca22f830f07de7d78bea7a1930ac73198c27cac9b4952c34951c3f19"
            },
            "downloads": -1,
            "filename": "ruqia-0.0.17.tar.gz",
            "has_sig": false,
            "md5_digest": "c1d5c3b175e23318b869441d9d2cad21",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.7, <4",
            "size": 13613,
            "upload_time": "2023-01-02T12:04:58",
            "upload_time_iso_8601": "2023-01-02T12:04:58.188024Z",
            "url": "https://files.pythonhosted.org/packages/d1/bd/7d8e547a4939612ae8e34f74d599e4a2e035c9a6775ccdef82e60e78e9b4/ruqia-0.0.17.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2023-01-02 12:04:58",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "github_user": "Ruqyai",
    "github_project": "Ara-NLP-lib",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": false,
    "lcname": "ruqia"
}
        