# Ruqia lib
This library is for Arabic NLP. It provides a number of functions to process, prepare, and clean Arabic text.
## Install
```
pip install ruqia
```
## Use
```
# The package is installed as `ruqia` but imported as `ruqiya`
from ruqiya import ruqiya
```
## Example: Apply a Function to a Single Pandas Column
```
from ruqiya.ruqiya import clean_text
# df['text'] often has dtype object rather than string, so convert each value with str
df['text']=df['text'].apply(str)
# Now apply our function
df['cleaned_text']=df['text'].apply(clean_text)
# Show the result
df['cleaned_text']
```
## All Functions
## Clean the text
The `clean_text` function includes all of these functions:
> 1. remove_emails
> 2. remove_URLs
> 3. remove_mentions
> 4. hashtags_to_words
> 5. remove_punctuations
> 6. normalize_arabic
> 7. remove_diacritics
> 8. remove_repeating_char
> 9. remove_stop_words
> 10. remove_emojis
In other words, `clean_text` includes all functions except `remove_hashtags`
```
text_cleaned1=ruqiya.clean_text(text)
print(text_cleaned1)
```
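One way to picture a combined cleaner like `clean_text` is as a chain of single-purpose steps, each taking a string and returning a string. The sketch below is not the library's implementation: `run_pipeline` and the two stand-in steps are hypothetical names used only to illustrate the idea.

```python
import re
from functools import reduce

def strip_mentions(text):
    # Stand-in step: drop @handles that are not part of a longer word.
    return re.sub(r"(?<!\w)@\w+", "", text)

def squeeze_spaces(text):
    # Stand-in step: collapse runs of whitespace into one space.
    return re.sub(r"\s+", " ", text).strip()

def run_pipeline(text, steps):
    # Feed the output of each step into the next one, in order.
    return reduce(lambda acc, step: step(acc), steps, text)

print(run_pipeline("hello   @user   world", [strip_mentions, squeeze_spaces]))
```

With a design like this, the order of the list matters, because each step only sees what the previous steps left behind.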
## Remove repeating characters
`remove_repeating_char` function
```
text_cleaned2=ruqiya.remove_repeating_char(text)
print(text_cleaned2)
```
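As a rough illustration of what collapsing repeated characters can look like, here is a regex sketch. It is an assumption about the general technique, not the library's code; `collapse_repeats` and its `min_run` parameter are hypothetical, and the library may keep a different number of repeats.

```python
import re

def collapse_repeats(text, min_run=2):
    # Replace any run of at least `min_run` identical characters with one copy.
    pattern = r"(.)\1{%d,}" % (min_run - 1)
    return re.sub(pattern, r"\1", text)

print(collapse_repeats("heeello!!"))             # every repeated run is collapsed
print(collapse_repeats("heeello!!", min_run=3))  # only runs of three or more
```

A threshold of two also collapses legitimately doubled letters (for example the two lams in "الله"), so a higher threshold can be the safer choice for real text.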
## Remove punctuation
`remove_punctuations` function
```
text_cleaned3=ruqiya.remove_punctuations(text)
print(text_cleaned3)
```
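Punctuation stripping for Arabic text usually has to cover both ASCII marks and Arabic-specific ones. The following is a minimal sketch under that assumption; `strip_punctuation` and the `PUNCTUATION` set are hypothetical and may not match the characters the library removes.

```python
import string

# ASCII punctuation plus the Arabic comma, semicolon, and question mark.
PUNCTUATION = string.punctuation + "\u060C\u061B\u061F"
_TABLE = str.maketrans("", "", PUNCTUATION)

def strip_punctuation(text):
    # Delete every character listed in PUNCTUATION.
    return text.translate(_TABLE)

print(strip_punctuation("Hello, world!! \u0645\u0631\u062D\u0628\u0627\u061F"))
```

Because `#`, `@`, and `_` are part of `string.punctuation`, a step like this would need to run after any mention or hashtag handling.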
## Normalize Arabic
`normalize_arabic` function
```
text_cleaned4=ruqiya.normalize_arabic(text)
print(text_cleaned4)
```
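Arabic normalization generally means mapping letter variants to one canonical form. The mapping below follows one common convention and is purely an assumption; the library's own table may differ, and `normalize_letters` is a hypothetical name.

```python
# Hypothetical mapping: one common normalization convention, not the library's table.
_NORMALIZE = str.maketrans({
    "\u0623": "\u0627",  # alef with hamza above -> bare alef
    "\u0625": "\u0627",  # alef with hamza below -> bare alef
    "\u0622": "\u0627",  # alef with madda -> bare alef
    "\u0649": "\u064A",  # alef maksura -> yeh
    "\u0629": "\u0647",  # teh marbuta -> heh
    "\u0640": None,      # tatweel (elongation mark) is removed
})

def normalize_letters(text):
    # Apply the character-for-character mapping in one pass.
    return text.translate(_NORMALIZE)

print(normalize_letters("الإصدار الأول من مكتبة"))
```

Normalization of this kind trades some spelling distinctions for more consistent matching, so it is worth checking whether the downstream task tolerates that loss.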
## Remove diacritics
`remove_diacritics` function
```
text_cleaned5=ruqiya.remove_diacritics(text)
print(text_cleaned5)
```
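Diacritics (tashkeel) occupy a contiguous Unicode range, so a character class is enough to remove them. This is a sketch of that technique, assuming only the eight core marks are targeted; `strip_diacritics` is a hypothetical name and the library may cover more marks.

```python
import re

# U+064B..U+0652 covers the tanween, short-vowel, shadda, and sukun marks.
_DIACRITICS = re.compile("[\u064B-\u0652]")

def strip_diacritics(text):
    # Remove the combining marks and leave the base letters untouched.
    return _DIACRITICS.sub("", text)

print(strip_diacritics("أهلًا وسهلًا"))
```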
## Remove stop words
`remove_stop_words` function
```
text_cleaned6=ruqiya.remove_stop_words(text)
print(text_cleaned6)
```
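Stop-word removal is typically a token filter against a fixed word list. The sketch below assumes whitespace tokenization and uses a hypothetical three-word list; the list the library actually ships is not shown in this README.

```python
# Hypothetical three-word list for illustration only.
STOP_WORDS = {"في", "من", "هل"}

def drop_stop_words(text, stop_words=STOP_WORDS):
    # Split on whitespace, keep tokens not in the list, and rejoin with single spaces.
    return " ".join(tok for tok in text.split() if tok not in stop_words)

print(drop_stop_words("هل هذه هي المرة الأولى"))
```

Since this approach rejoins tokens with single spaces, it also flattens line breaks, which may or may not be wanted.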
## Remove emojis
`remove_emojis` function
```
text_cleaned7=ruqiya.remove_emojis(text)
print(text_cleaned7)
```
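Emoji removal is commonly done with a character class over the relevant Unicode blocks. The ranges below are an approximation chosen for this sketch, not the library's pattern, and `strip_emojis` is a hypothetical name.

```python
import re

# Assumed ranges: main emoji blocks, misc symbols and dingbats, flags,
# and the emoji variation selector.
_EMOJI = re.compile(
    "["
    "\U0001F300-\U0001FAFF"
    "\u2600-\u27BF"
    "\U0001F1E6-\U0001F1FF"
    "\uFE0F"
    "]+"
)

def strip_emojis(text):
    # Delete every character that falls in one of the ranges above.
    return _EMOJI.sub("", text)

print(strip_emojis("hi \U0001F44B there\U0001F600"))
```

The surrounding spaces are left in place, so a whitespace cleanup step may be needed afterwards.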
## Remove mentions
`remove_mentions` function
```
text_cleaned8=ruqiya.remove_mentions(text)
print(text_cleaned8)
```
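A naive `@\w+` pattern would also eat the domain part of an email address. The sketch below avoids that with a lookbehind; it is an assumption about one reasonable approach, and `strip_mentions` is a hypothetical name rather than the library's code.

```python
import re

# Match @handle only when the @ is not preceded by a word character,
# so "example@email.com" is left alone.
_MENTION = re.compile(r"(?<!\w)@\w+")

def strip_mentions(text):
    return _MENTION.sub("", text)

print(strip_mentions("contact @Ru0Sa or example@email.com"))
```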
## Convert any hashtags to words
`hashtags_to_words` function
```
text_cleaned9=ruqiya.hashtags_to_words(text)
print(text_cleaned9)
```
## Remove hashtags
`remove_hashtags` function
```
text_cleaned10=ruqiya.remove_hashtags(text)
print(text_cleaned10)
```
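The difference between converting a hashtag to words and removing it can be shown side by side. Both helpers below are hypothetical sketches of the two behaviours, assuming a hashtag is `#` followed by word characters and that underscores separate its words.

```python
import re

_HASHTAG = re.compile(r"#(\w+)")

def hashtag_to_words(text):
    # Keep the tag's text: drop the '#' and turn underscores into spaces.
    return _HASHTAG.sub(lambda m: m.group(1).replace("_", " "), text)

def drop_hashtags(text):
    # Discard the whole tag, including its text.
    return _HASHTAG.sub("", text)

sample = "tag #arabic_nlp here"
print(hashtag_to_words(sample))
print(drop_hashtags(sample))
```

Keeping the words preserves topical information for downstream models, while dropping the tag is more suitable when hashtags are treated as noise.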
## Remove emails
`remove_emails` function
```
text_cleaned11=ruqiya.remove_emails(text)
print(text_cleaned11)
```
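Email removal can be approximated with a deliberately simple pattern. This sketch assumes addresses of the form `local@domain.tld` and does not attempt full RFC coverage; `strip_emails` is a hypothetical name.

```python
import re

# Simplified address shape: local part, '@', then a dotted domain.
_EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")

def strip_emails(text):
    return _EMAIL.sub("", text)

print(strip_emails("mail example@email.com now"))
```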
## Remove URLs
`remove_URLs` function
```
text_cleaned12=ruqiya.remove_URLs(text)
print(text_cleaned12)
```
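For URLs, a common shortcut is to match a scheme or `www.` prefix and then everything up to the next whitespace. The sketch below assumes that shortcut is acceptable; `strip_urls` is a hypothetical name, not the library's function.

```python
import re

# Anything starting with http://, https://, or www. up to the next whitespace.
_URL = re.compile(r"https?://\S+|www\.\S+")

def strip_urls(text):
    return _URL.sub("", text)

print(strip_urls("site https://pypi.org/project/ruqia/ end"))
```

Punctuation that directly follows a URL (such as a closing parenthesis) is removed along with it under this pattern.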
## Example
```
from ruqiya import ruqiya
text="""
!!أهلًا وسهلًا بك 👋 في الإصدارِ الأولِ من مكتبة رقيا
هل هذه هي المرة الأولى التي تستخدم فيها المكتبة😀؟!!
معلومات التواصل
ايميل
example@email.com
الموقع
https://pypi.org/project/ruqia/
تويتر
@Ru0Sa
وسم
#معالجة_العربية
"""
print('===========clean_text===========')
text_cleaned1=ruqiya.clean_text(text)
print(text_cleaned1)
print('===========remove_repeating_char===========')
text_cleaned2=ruqiya.remove_repeating_char(text)
print(text_cleaned2)
print('===========remove_punctuations===========')
text_cleaned3=ruqiya.remove_punctuations(text)
print(text_cleaned3)
print('===========normalize_arabic===========')
text_cleaned4=ruqiya.normalize_arabic(text)
print(text_cleaned4)
print('===========remove_diacritics===========')
text_cleaned5=ruqiya.remove_diacritics(text)
print(text_cleaned5)
print('===========remove_stop_words===========')
text_cleaned6=ruqiya.remove_stop_words(text)
print(text_cleaned6)
print('===========remove_emojis===========')
text_cleaned7=ruqiya.remove_emojis(text)
print(text_cleaned7)
print('===========remove_mentions===========')
text_cleaned8=ruqiya.remove_mentions(text)
print(text_cleaned8)
print('===========hashtags_to_words===========')
text_cleaned9=ruqiya.hashtags_to_words(text)
print(text_cleaned9)
print('===========remove_hashtags===========')
text_cleaned10=ruqiya.remove_hashtags(text)
print(text_cleaned10)
print('===========remove_emails===========')
text_cleaned11=ruqiya.remove_emails(text)
print(text_cleaned11)
print('===========remove_URLs===========')
text_cleaned12=ruqiya.remove_URLs(text)
print(text_cleaned12)
```
## Example 2: Apply a Function to a Pandas DataFrame (Single Column)
```
from ruqiya.ruqiya import clean_text
import pandas as pd
data="https://raw.githubusercontent.com/Ruqyai/data4test/main/test_with_lables.csv"
df=pd.read_csv(data)
df['text']=df['poem_text']
#--------------------
# df['text'] often has dtype object rather than string, so convert each value with str
df['text']=df['text'].apply(str)
# Now apply our function
df['cleaned_text']=df['text'].apply(clean_text)
#--------------------
# Show the result
df['cleaned_text']
```
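One caveat with converting every value through `str`: missing cells, which pandas usually represents as `NaN`, become the literal text `nan` and are then cleaned as if they were words. The helper below is a hypothetical sketch, assuming missing values appear as `None` or a float `NaN`, that maps them to an empty string instead.

```python
import math

def to_text(value):
    # Map missing values to an empty string instead of "None" or "nan".
    if value is None:
        return ""
    if isinstance(value, float) and math.isnan(value):
        return ""
    return str(value)

raw = ["نص عربي", None, float("nan"), 42]
print([str(v) for v in raw])      # plain str() keeps 'None' and 'nan' as text
print([to_text(v) for v in raw])  # missing values become ''
```

Under that assumption, `df['text'].apply(to_text)` could stand in for `df['text'].apply(str)` in the example above.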
Raw data
{
"_id": null,
"home_page": "https://github.com/Ruqyai/Ara-NLP-lib",
"name": "ruqia",
"maintainer": "",
"docs_url": null,
"requires_python": ">=3.7, <4",
"maintainer_email": "",
"keywords": "Arabic,NLP,development",
"author": "Ruqiya Bin Safi",
"author_email": "myacount05@gmail.com",
"download_url": "https://files.pythonhosted.org/packages/d1/bd/7d8e547a4939612ae8e34f74d599e4a2e035c9a6775ccdef82e60e78e9b4/ruqia-0.0.17.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": "",
"summary": "Arabic NLP",
"version": "0.0.17",
"split_keywords": [
"arabic",
"nlp",
"development"
],
"urls": [
{
"comment_text": "",
"digests": {
"md5": "ff345af1b061e93b14807bf6b327a61a",
"sha256": "cb17594701f45eb5c20ccb0216cc8a74fea225bf47363051d1b05eae55fc9f60"
},
"downloads": -1,
"filename": "ruqia-0.0.17-py3-none-any.whl",
"has_sig": false,
"md5_digest": "ff345af1b061e93b14807bf6b327a61a",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.7, <4",
"size": 10072,
"upload_time": "2023-01-02T12:04:56",
"upload_time_iso_8601": "2023-01-02T12:04:56.485611Z",
"url": "https://files.pythonhosted.org/packages/d7/97/c69313e4a918635b5a3ffd385d8698b51638572aa813321bfb31d71de869/ruqia-0.0.17-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"md5": "c1d5c3b175e23318b869441d9d2cad21",
"sha256": "151fed03ca22f830f07de7d78bea7a1930ac73198c27cac9b4952c34951c3f19"
},
"downloads": -1,
"filename": "ruqia-0.0.17.tar.gz",
"has_sig": false,
"md5_digest": "c1d5c3b175e23318b869441d9d2cad21",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.7, <4",
"size": 13613,
"upload_time": "2023-01-02T12:04:58",
"upload_time_iso_8601": "2023-01-02T12:04:58.188024Z",
"url": "https://files.pythonhosted.org/packages/d1/bd/7d8e547a4939612ae8e34f74d599e4a2e035c9a6775ccdef82e60e78e9b4/ruqia-0.0.17.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2023-01-02 12:04:58",
"github": true,
"gitlab": false,
"bitbucket": false,
"github_user": "Ruqyai",
"github_project": "Ara-NLP-lib",
"travis_ci": false,
"coveralls": false,
"github_actions": false,
"lcname": "ruqia"
}