# fathah
A lightweight NLP preprocessing package for the Arabic language.
## Installation
```sh
pip install fathah
```
## Usage
```python
from fathah import TextClean
```
## Methods
### Clean the text
The `clean_text` function applies all of the following functions:
> 1. remove_emails
> 2. remove_URLs
> 3. remove_mentions
> 4. hashtags_to_words
> 5. remove_punctuations
> 6. normalize_arabic
> 7. remove_diacritics
> 8. remove_repeating_char
> 9. remove_stop_words
> 10. remove_emojis
In other words, `clean_text` applies every function except `remove_hashtags`.
```python
text_cleaned1 = TextClean.clean_text(text)
print(text_cleaned1)
```
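The order of the list above can matter. For example, email removal comes before mention removal, and one plausible reason is that a mention pattern would otherwise cut an address in half. The sketch below illustrates that effect with hypothetical stand-in functions (`strip_emails`, `strip_mentions`, `squeeze_spaces`, `run_pipeline`); they are simple illustrative regexes, not fathah's own implementation.

```python
import re
from functools import reduce
from typing import Callable, Iterable

# Hypothetical stand-ins for pipeline steps (not fathah's code).
def strip_emails(text: str) -> str:
    return re.sub(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+", "", text)

def strip_mentions(text: str) -> str:
    return re.sub(r"@\w+", "", text)

def squeeze_spaces(text: str) -> str:
    return re.sub(r"\s+", " ", text).strip()

def run_pipeline(text: str, steps: Iterable[Callable[[str], str]]) -> str:
    """Feed the output of each step into the next one."""
    return reduce(lambda acc, step: step(acc), steps, text)

sample = "contact me@example.com or @user"

# Emails first: the whole address disappears.
print(run_pipeline(sample, [strip_emails, strip_mentions, squeeze_spaces]))

# Mentions first: "@example" is removed and "me.com" is left behind.
print(run_pipeline(sample, [strip_mentions, strip_emails, squeeze_spaces]))
```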
### Remove repeating characters
Use `remove_repeating_char` to remove repeating characters from the text.
```python
text_cleaned2 = TextClean.remove_repeating_char(text)
print(text_cleaned2)
```
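As a rough illustration of what collapsing repeated characters involves, here is a minimal regex-based sketch. The function name `collapse_repeats` and its `keep` parameter are assumptions made for this example; how many copies fathah keeps is not documented here.

```python
import re

def collapse_repeats(text: str, keep: int = 2) -> str:
    """Cut any run longer than `keep` identical characters down to `keep`."""
    # (.)\1{keep,} matches one character followed by at least `keep` copies,
    # i.e. a run of keep + 1 or more identical characters.
    pattern = re.compile(r"(.)\1{%d,}" % keep)
    return pattern.sub(lambda m: m.group(1) * keep, text)

print(collapse_repeats("جمييييل", keep=1))
print(collapse_repeats("hellooooo"))
```

Keeping two copies rather than one is a deliberate choice in this sketch: words such as "الله" legitimately contain a doubled letter, and collapsing every run to a single character would corrupt them.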
### Remove punctuation
Use `remove_punctuations` to strip punctuation marks from the text.
```python
text_cleaned3 = TextClean.remove_punctuations(text)
print(text_cleaned3)
```
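Arabic text uses its own punctuation marks in addition to the ASCII ones, so a punctuation filter usually needs both sets. The sketch below shows one way to do this with `str.translate`; the name `strip_punctuation` and the exact character set are assumptions for illustration, not fathah's actual list.

```python
import string

# Arabic comma, semicolon and question mark, plus guillemets.
ARABIC_PUNCTUATION = "\u060C\u061B\u061F\u00AB\u00BB"

# Translation table that deletes every ASCII and Arabic punctuation mark.
_PUNCT_TABLE = str.maketrans("", "", string.punctuation + ARABIC_PUNCTUATION)

def strip_punctuation(text: str) -> str:
    """Delete punctuation characters, leaving letters and spaces untouched."""
    return text.translate(_PUNCT_TABLE)

print(strip_punctuation("مرحبا، كيف حالك؟"))
```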
### Normalize Arabic
Use `normalize_arabic` to normalize Arabic text.
```python
text_cleaned4 = TextClean.normalize_arabic(text)
print(text_cleaned4)
```
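Which rules `normalize_arabic` applies is not spelled out here. For orientation only, the sketch below shows a few normalization rules that are common in Arabic NLP preprocessing: unifying alef variants, mapping alef maksura to yeh, and dropping the tatweel (elongation) character. The function name `normalize_letters` and the choice of rules are assumptions, not a description of fathah's behavior.

```python
import re

def normalize_letters(text: str) -> str:
    """Apply a few widely used Arabic letter normalizations."""
    # Alef with madda / hamza above / hamza below -> bare alef
    text = re.sub("[\u0622\u0623\u0625]", "\u0627", text)
    # Alef maksura -> yeh
    text = text.replace("\u0649", "\u064A")
    # Remove tatweel (kashida), which only stretches the word visually
    text = text.replace("\u0640", "")
    return text

print(normalize_letters("أحمد"))
print(normalize_letters("جميـــل"))
```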
### Remove diacritics
Use `remove_diacritics` to strip diacritics (tashkeel) from the text.
```python
text_cleaned5 = TextClean.remove_diacritics(text)
print(text_cleaned5)
```
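The core Arabic diacritics occupy a contiguous Unicode range (U+064B to U+0652: the three tanween marks, fatha, damma, kasra, shadda and sukun), so removing them can be sketched with a single character class. The name `strip_diacritics` is hypothetical, and fathah may cover a wider or narrower set of marks.

```python
import re

# U+064B..U+0652: fathatan, dammatan, kasratan, fatha, damma, kasra, shadda, sukun
_DIACRITICS = re.compile("[\u064B-\u0652]")

def strip_diacritics(text: str) -> str:
    """Remove the core Arabic diacritic marks, keeping the base letters."""
    return _DIACRITICS.sub("", text)

print(strip_diacritics("السَّلَامُ عَلَيْكُمْ"))
```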
### Remove stop words
Use `remove_stop_words` to drop stop words from the text.
```python
text_cleaned6 = TextClean.remove_stop_words(text)
print(text_cleaned6)
```
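Stop-word removal usually amounts to filtering tokens against a fixed word list. The sketch below assumes whitespace tokenization and a tiny hand-picked list; both `drop_stop_words` and `STOP_WORDS` are hypothetical, and the list fathah ships is not reproduced here.

```python
from typing import Iterable

# A deliberately tiny, illustrative stop-word list (not fathah's list).
STOP_WORDS = frozenset({"في", "من", "عن", "ثم", "قد"})

def drop_stop_words(text: str, stop_words: Iterable[str] = STOP_WORDS) -> str:
    """Split on whitespace and keep only tokens that are not stop words."""
    stop_set = set(stop_words)
    return " ".join(token for token in text.split() if token not in stop_set)

print(drop_stop_words("ذهبت في الصباح"))
```

Because the comparison is an exact string match, a filter like this works best after diacritics have been removed and letters normalized, which matches the order in which `clean_text` lists its steps.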
### Remove emojis
Use `remove_emojis` to strip emojis from the text.
```python
text_cleaned7 = TextClean.remove_emojis(text)
print(text_cleaned7)
```
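Emoji are spread across several Unicode blocks, so a regex-based filter has to enumerate ranges. The sketch below covers the most common blocks and is intentionally approximate; the name `strip_emojis` and the chosen ranges are assumptions, not fathah's pattern.

```python
import re

_EMOJI = re.compile(
    "["
    "\U0001F1E6-\U0001F1FF"  # regional indicators (flag pairs)
    "\U0001F300-\U0001FAFF"  # pictographs, emoticons, transport, extended symbols
    "\u2600-\u27BF"          # miscellaneous symbols and dingbats
    "\uFE0F"                 # variation selector-16 (emoji presentation)
    "]+"
)

def strip_emojis(text: str) -> str:
    """Remove characters that fall in common emoji ranges."""
    return _EMOJI.sub("", text)

print(strip_emojis("مرحبا 😀🎉").strip())
```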
### Remove mentions
Use `remove_mentions` to strip `@` mentions from the text.
```python
text_cleaned8 = TextClean.remove_mentions(text)
print(text_cleaned8)
```
### Convert any hashtags to words
Use `hashtags_to_words` to convert hashtags into plain words.
```python
text_cleaned9 = TextClean.hashtags_to_words(text)
print(text_cleaned9)
```
### Remove hashtags
Use `remove_hashtags` to delete hashtags from the text entirely.
```python
text_cleaned10 = TextClean.remove_hashtags(text)
print(text_cleaned10)
```
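The difference between `hashtags_to_words` and `remove_hashtags` can be sketched as follows: one keeps the tag's text as ordinary words, the other discards the tag altogether. The functions `hashtag_to_words` and `drop_hashtags` below are hypothetical stand-ins, and replacing underscores with spaces is an assumption about what "to words" means.

```python
import re

# "#" followed by word characters; \w also matches Arabic letters and "_".
_HASHTAG = re.compile(r"#(\w+)")

def hashtag_to_words(text: str) -> str:
    """Drop the '#' and turn underscores inside the tag into spaces."""
    return _HASHTAG.sub(lambda m: m.group(1).replace("_", " "), text)

def drop_hashtags(text: str) -> str:
    """Delete each hashtag together with its text."""
    return _HASHTAG.sub("", text)

sample = "#اليوم_الوطني جميل"
print(hashtag_to_words(sample))
print(drop_hashtags(sample).strip())
```

Converting rather than deleting is useful when hashtags carry the topic of a post, which may be why `clean_text` uses `hashtags_to_words` and leaves `remove_hashtags` as a separate option.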
### Remove emails
Use `remove_emails` to strip email addresses from the text.
```python
text_cleaned11 = TextClean.remove_emails(text)
print(text_cleaned11)
```
### Remove URLs
Use `remove_URLs` to strip URLs from the text.
```python
text_cleaned12 = TextClean.remove_URLs(text)
print(text_cleaned12)
```
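For a sense of what URL removal involves, here is a minimal sketch that treats anything starting with `http://`, `https://` or `www.` and running up to the next whitespace as a URL. The name `strip_urls` and the pattern are assumptions for illustration; fathah's own pattern may differ.

```python
import re

# A URL starts with a scheme or "www." and runs until the next whitespace.
_URL = re.compile(r"(?:https?://|www\.)\S+")

def strip_urls(text: str) -> str:
    """Remove http(s) and www-style URLs from the text."""
    return _URL.sub("", text)

print(strip_urls("see https://example.com/a?b=1 now"))
```

A pattern this simple also swallows punctuation that directly follows a URL, which is usually acceptable when punctuation is removed later in the pipeline anyway.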
## Example
```python
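from fathah import TextClean

# Sample input carrying diacritics (tashkeel)
text = "السَّلَامُ عَلَيْكُمْ وَرَحْمَةُ اللهِ وَبَرَكَاتُهُ"
cleaner = TextClean(text)
cleaner.remove_diacritics()
# Outputs: السلام عليكم ورحمة الله وبركاته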
```
*This package is under development. Contributions are highly welcome.*

[GitHub](https://github.com/fathah) | [IG](https://instagram.com/fatha_cr)
Raw data
{
"_id": null,
"home_page": "https://github.com/fathah/fathah_python",
"name": "fathah",
"maintainer": "",
"docs_url": null,
"requires_python": "",
"maintainer_email": "",
"keywords": "nlp,fathah,arabic",
"author": "Abdul Fathah KA",
"author_email": "fathah@ziqx.in",
"download_url": "https://files.pythonhosted.org/packages/67/83/ae299c84346b5bf62a2f285bf098d07a077b85f193b3218c05c73c51f3b8/fathah-0.0.2.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": "",
"summary": "Lightweight NLP preprocessing package for Arabic language",
"version": "0.0.2",
"split_keywords": [
"nlp",
"fathah",
"arabic"
],
"urls": [
{
"comment_text": "",
"digests": {
"md5": "2f4083f3ac2b549b6c5b9176176c35c2",
"sha256": "9ef6c0e02f13396e510b707c8d8da36769b30dda87bb35d207e4d06da21fa96f"
},
"downloads": -1,
"filename": "fathah-0.0.2-py3-none-any.whl",
"has_sig": false,
"md5_digest": "2f4083f3ac2b549b6c5b9176176c35c2",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": null,
"size": 9518,
"upload_time": "2022-12-11T08:48:19",
"upload_time_iso_8601": "2022-12-11T08:48:19.340346Z",
"url": "https://files.pythonhosted.org/packages/5e/41/f553b8d235813779c47c9d5b2267990ab186cafab55ceed36651c21eeef5/fathah-0.0.2-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"md5": "d79384d81725f3b47a0761f75d37960d",
"sha256": "c0a4e56cb44d0b6456e0885eaad3990913900b4bbc09fd509b67655a9e4397c2"
},
"downloads": -1,
"filename": "fathah-0.0.2.tar.gz",
"has_sig": false,
"md5_digest": "d79384d81725f3b47a0761f75d37960d",
"packagetype": "sdist",
"python_version": "source",
"requires_python": null,
"size": 10722,
"upload_time": "2022-12-11T08:48:23",
"upload_time_iso_8601": "2022-12-11T08:48:23.334208Z",
"url": "https://files.pythonhosted.org/packages/67/83/ae299c84346b5bf62a2f285bf098d07a077b85f193b3218c05c73c51f3b8/fathah-0.0.2.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2022-12-11 08:48:23",
"github": true,
"gitlab": false,
"bitbucket": false,
"github_user": "fathah",
"github_project": "fathah_python",
"travis_ci": false,
"coveralls": false,
"github_actions": false,
"lcname": "fathah"
}