# Ruqia Library
This library provides Arabic NLP utilities for processing, preparing, and cleaning Arabic text.
It is dedicated to Arabic language processing and includes a number of functions for cleaning text, among other tasks.
## Install
```
pip install ruqia
```
## Use
```
from ruqiya import ruqiya
```
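## Example: Apply a Function to a Single Pandas Column
```
import pandas as pd
from ruqiya.ruqiya import clean_text

# Sample DataFrame for illustration only; replace it with your own data
df=pd.DataFrame({'text': ['أهلًا وسهلًا بك 👋', 'example@email.com']})
# df['text'] often has dtype object rather than string, so cast every value to str first
df['text']=df['text'].apply(str)
# Now apply our function to every row of the column
df['cleaned_text']=df['text'].apply(clean_text)
# Show the result
df['cleaned_text']
```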
## All Functions
## Clean the text
The `clean_text` function applies all of the following functions:
> 1. remove_emails
> 2. remove_URLs
> 3. remove_mentions
> 4. hashtags_to_words
> 5. remove_punctuations
> 6. normalize_arabic
> 7. remove_diacritics
> 8. remove_repeating_char
> 9. remove_stop_words
> 10. remove_emojis
In other words, `clean_text` applies every function in the library except `remove_hashtags`.
```
text_cleaned1=ruqiya.clean_text(text)
print(text_cleaned1)
```
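The idea of one function that bundles several cleaning steps can be sketched as a simple composition. This is only an illustration of the pattern, not Ruqia's implementation: the `compose` helper and the three step functions below are hypothetical stand-ins, and their regular expressions are assumptions.

```python
import re
from functools import reduce
from typing import Callable

Step = Callable[[str], str]

def compose(*steps: Step) -> Step:
    """Return a function that applies each step to the text, in order."""
    def pipeline(text: str) -> str:
        return reduce(lambda acc, step: step(acc), steps, text)
    return pipeline

# Hypothetical stand-in steps (not the library's functions)
def strip_urls(text: str) -> str:
    return re.sub(r"https?://\S+", " ", text)

def strip_mentions(text: str) -> str:
    return re.sub(r"@\w+", " ", text)

def squeeze_spaces(text: str) -> str:
    return " ".join(text.split())

clean = compose(strip_urls, strip_mentions, squeeze_spaces)
print(clean("visit https://pypi.org/project/ruqia/ via @Ru0Sa today"))
# prints: visit via today
```

Writing the pipeline as an ordered list of steps makes it easy to drop or reorder a step when the default bundle does not fit your data.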
## Remove repeating character
Use the `remove_repeating_char` function to remove repeated characters:
```
text_cleaned2=ruqiya.remove_repeating_char(text)
print(text_cleaned2)
```
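For intuition, collapsing repeated characters is commonly done with a backreference regular expression. The sketch below is an assumption about the general technique, not a description of what `remove_repeating_char` does internally; the function name `collapse_repeats` and its `keep` parameter are hypothetical.

```python
import re

def collapse_repeats(text: str, keep: int = 1) -> str:
    """Shrink any run of one character longer than `keep` down to `keep` copies.

    `keep` must be at least 1.
    """
    # (.) captures a character; \1{keep,} matches `keep` or more further copies of it
    pattern = re.compile(r"(.)\1{%d,}" % keep)
    return pattern.sub(lambda m: m.group(1) * keep, text)

print(collapse_repeats("ههههههه"))
print(collapse_repeats("coooool", keep=2))
```

With `keep=1` every doubled letter is collapsed, which also shortens words that legitimately contain a doubled letter; `keep=2` leaves such words untouched while still trimming long runs.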
## Remove punctuations
Use the `remove_punctuations` function to remove punctuation marks:
```
text_cleaned3=ruqiya.remove_punctuations(text)
print(text_cleaned3)
```
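As a rough illustration of punctuation removal, the sketch below deletes ASCII punctuation plus three Arabic marks with `str.translate`. The set of characters is an assumption chosen for the example; the set that `remove_punctuations` actually strips may be different.

```python
import string

# Arabic comma, Arabic semicolon, Arabic question mark (assumed set for this sketch)
ARABIC_PUNCT = "\u060C\u061B\u061F"
_PUNCT_TABLE = str.maketrans("", "", string.punctuation + ARABIC_PUNCT)

def strip_punct(text: str) -> str:
    """Delete every character listed in the translation table."""
    return text.translate(_PUNCT_TABLE)

print(strip_punct("مرحبا، كيف الحال؟!!"))
```

Note that `string.punctuation` contains `#`, `@`, and `_`, so a step like this would also break hashtags, mentions, and email addresses if it ran before they were handled.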
## Normalize Arabic
Use the `normalize_arabic` function to normalize Arabic text:
```
text_cleaned4=ruqiya.normalize_arabic(text)
print(text_cleaned4)
```
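Normalization in Arabic NLP usually means mapping letter variants to one canonical form. The mapping below (alef variants to bare alef, alef maqsura to yeh, teh marbuta to heh) is one common convention and is shown only as an assumption; the README does not state which rules `normalize_arabic` applies, so check its output on your own data.

```python
import re

def normalize_letters(text: str) -> str:
    """Map common Arabic letter variants to a single form (assumed rule set)."""
    text = re.sub("[\u0622\u0623\u0625]", "\u0627", text)  # alef with madda/hamza -> bare alef
    text = text.replace("\u0649", "\u064A")                # alef maqsura -> yeh
    text = text.replace("\u0629", "\u0647")                # teh marbuta -> heh
    return text

print(normalize_letters("أهلا بك في مكتبة"))
```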
## Remove diacritics
Use the `remove_diacritics` function to remove diacritics:
```
text_cleaned5=ruqiya.remove_diacritics(text)
print(text_cleaned5)
```
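To show what stripping diacritics means at the character level, here is a minimal sketch that deletes the Unicode range U+064B to U+0652 (tanween, fatha, damma, kasra, shadda, sukun). The function name `strip_diacritics` is hypothetical, and the exact range that `remove_diacritics` covers is not stated in this README.

```python
import re

# U+064B..U+0652: fathatan, dammatan, kasratan, fatha, damma, kasra, shadda, sukun
_DIACRITICS = re.compile("[\u064B-\u0652]")

def strip_diacritics(text: str) -> str:
    """Delete Arabic short-vowel and related marks, keeping the base letters."""
    return _DIACRITICS.sub("", text)

print(strip_diacritics("أهلًا وسهلًا"))
```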
## Remove stop words
Use the `remove_stop_words` function to remove stop words:
```
text_cleaned6=ruqiya.remove_stop_words(text)
print(text_cleaned6)
```
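Stop-word removal is typically a token filter against a word list. The sketch below uses a tiny hand-picked list, `SAMPLE_STOP_WORDS`, which is purely hypothetical; the list that ships with the library is not shown in this README and is likely much larger.

```python
# Hypothetical sample list for illustration only
SAMPLE_STOP_WORDS = frozenset({"في", "من", "هل", "هذه", "هي", "التي"})

def drop_stop_words(text: str, stop_words=SAMPLE_STOP_WORDS) -> str:
    """Split on whitespace and keep only tokens that are not in `stop_words`."""
    return " ".join(tok for tok in text.split() if tok not in stop_words)

print(drop_stop_words("هل هذه هي المرة الأولى"))
```

Because the filter compares whole tokens, a stop word with punctuation or diacritics attached will not match, which is one reason to run those cleaning steps first.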
## Remove emojis
Use the `remove_emojis` function to remove emojis:
```
text_cleaned7=ruqiya.remove_emojis(text)
print(text_cleaned7)
```
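Emoji removal is often implemented by deleting characters in known Unicode ranges. The ranges below are an assumption and are not exhaustive (for example, flag sequences are not covered); they are not taken from the library's source.

```python
import re

_EMOJI = re.compile(
    "["
    "\U0001F300-\U0001FAFF"  # pictographs, emoticons, transport, supplemental symbols
    "\u2600-\u27BF"          # miscellaneous symbols and dingbats
    "\uFE0F"                 # variation selector-16, which often follows an emoji
    "]+"
)

def strip_emojis(text: str) -> str:
    """Delete characters that fall in the emoji ranges listed above."""
    return _EMOJI.sub("", text)

print(strip_emojis("أهلًا وسهلًا بك 👋"))
```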
## Remove mentions
Use the `remove_mentions` function to remove mentions such as `@Ru0Sa`:
```
text_cleaned8=ruqiya.remove_mentions(text)
print(text_cleaned8)
```
## Convert any hashtags to words
Use the `hashtags_to_words` function to convert hashtags to words:
```
text_cleaned9=ruqiya.hashtags_to_words(text)
print(text_cleaned9)
```
## Remove hashtags
Use the `remove_hashtags` function to remove hashtags entirely:
```
text_cleaned10=ruqiya.remove_hashtags(text)
print(text_cleaned10)
```
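The difference between converting a hashtag to words and removing it can be sketched as follows. This is one plausible reading of the two function names, stated as an assumption: the conversion drops `#` and turns underscores into spaces, while the removal deletes the whole tag. The helper names here are hypothetical.

```python
import re

_HASHTAG = re.compile(r"#(\w+)")  # \w matches Arabic letters, digits, and underscore

def hashtag_to_plain_words(text: str) -> str:
    """Keep the tag's text: drop '#' and replace underscores with spaces."""
    return _HASHTAG.sub(lambda m: m.group(1).replace("_", " "), text)

def drop_hashtags(text: str) -> str:
    """Delete the whole hashtag, including its text."""
    return _HASHTAG.sub("", text)

sample = "وسم #معالجة_العربية"
print(hashtag_to_plain_words(sample))
print(drop_hashtags(sample))
```

Keeping the words preserves topical vocabulary for downstream models, which may be why the bundled `clean_text` uses the conversion rather than the removal.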
## Remove emails
Use the `remove_emails` function to remove email addresses:
```
text_cleaned11=ruqiya.remove_emails(text)
print(text_cleaned11)
```
## Remove URLs
Use the `remove_URLs` function to remove URLs:
```
text_cleaned12=ruqiya.remove_URLs(text)
print(text_cleaned12)
```
## Example 1: Apply Every Function to a Sample Text
```
from ruqiya import ruqiya
text="""
!!أهلًا وسهلًا بك 👋 في الإصدارِ الأولِ من مكتبة رقيا
هل هذه هي المرة الأولى التي تستخدم فيها المكتبة😀؟!!
معلومات التواصل
ايميل
example@email.com
الموقع
https://pypi.org/project/ruqia/
تويتر
@Ru0Sa
وسم
#معالجة_العربية
"""
print('===========clean_text===========')
text_cleaned1=ruqiya.clean_text(text)
print(text_cleaned1)
print('===========remove_repeating_char===========')
text_cleaned2=ruqiya.remove_repeating_char(text)
print(text_cleaned2)
print('===========remove_punctuations===========')
text_cleaned3=ruqiya.remove_punctuations(text)
print(text_cleaned3)
print('===========normalize_arabic===========')
text_cleaned4=ruqiya.normalize_arabic(text)
print(text_cleaned4)
print('===========remove_diacritics===========')
text_cleaned5=ruqiya.remove_diacritics(text)
print(text_cleaned5)
print('===========remove_stop_words===========')
text_cleaned6=ruqiya.remove_stop_words(text)
print(text_cleaned6)
print('===========remove_emojis===========')
text_cleaned7=ruqiya.remove_emojis(text)
print(text_cleaned7)
print('===========remove_mentions===========')
text_cleaned8=ruqiya.remove_mentions(text)
print(text_cleaned8)
print('===========hashtags_to_words===========')
text_cleaned9=ruqiya.hashtags_to_words(text)
print(text_cleaned9)
print('===========remove_hashtags===========')
text_cleaned10=ruqiya.remove_hashtags(text)
print(text_cleaned10)
print('===========remove_emails===========')
text_cleaned11=ruqiya.remove_emails(text)
print(text_cleaned11)
print('===========remove_URLs===========')
text_cleaned12=ruqiya.remove_URLs(text)
print(text_cleaned12)
```
## Example 2: Apply a Function to Pandas DataFrame (Single Column)
```
from ruqiya.ruqiya import clean_text
import pandas as pd
data="https://raw.githubusercontent.com/Ruqyai/data4test/main/test_with_lables.csv"
df=pd.read_csv(data)
df['text']=df['poem_text']
#--------------------
# df['text'] often has dtype object rather than string, so cast every value to str first
df['text']=df['text'].apply(str)
# Now apply our function
df['cleaned_text']=df['text'].apply(clean_text)
#--------------------
# Show the result
df['cleaned_text']
```
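One caveat when casting a column with `str`: `str(float('nan'))` is the literal string `'nan'`, so missing values would reach the cleaning function as the word "nan". Whether the dataset above contains missing values is not stated; the sketch below is a hypothetical wrapper, `safe_apply`, that returns an empty string for `None` or NaN and otherwise calls whatever cleaner you pass in.

```python
import math
from typing import Callable

def safe_apply(cleaner: Callable[[str], str], value: object) -> str:
    """Return '' for None/NaN; otherwise run `cleaner` on str(value)."""
    if value is None:
        return ""
    if isinstance(value, float) and math.isnan(value):
        return ""
    return cleaner(str(value))

# Stand-in cleaner for the demo; with the library installed you would pass clean_text
print(repr(safe_apply(str.strip, float("nan"))))
print(repr(safe_apply(str.strip, "  نص  ")))
```

With pandas, this would be used on the raw column before any cast, for example `df['poem_text'].apply(lambda v: safe_apply(clean_text, v))`.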
# Citing Ruqia
If Ruqia helps your research, please consider citing it. Here is the BibTeX entry:
```
@misc{Ruqia2022,
title={Ruqia-Library},
author={Ruqiya Bin Safi},
year={2022},
howpublished={\url{https://github.com/Ruqyai/Ruqia-Library}},
}
```