# UltraClean
UltraClean is a fast, efficient Python library for cleaning and preprocessing text, designed for AI/ML and data processing tasks.
## Features
- Remove unwanted characters, links, emails, phone numbers, underscores, unicode characters, emojis, numbers, currencies, punctuation, HTML tags, LaTeX commands, and more.
- Handle multi-dots, extra spaces, and hashtags.
- Batch processing for efficient text cleaning.
- Spam detection and filtering using pre-trained models.
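To make the cleaning features above concrete, here is a minimal sketch of this kind of processing using only the standard `re` module. It is not UltraClean's implementation: the patterns and the `basic_clean` helper are assumptions made for illustration, and they cover only links, emails, HTML tags, multi-dots, and extra spaces.

```python
import re

# Illustrative patterns only; these are NOT UltraClean's internal rules.
URL_RE = re.compile(r"https?://\S+|www\.\S+")
EMAIL_RE = re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b")
HTML_TAG_RE = re.compile(r"<[^>]+>")
MULTI_DOT_RE = re.compile(r"\.{2,}")
SPACE_RE = re.compile(r"\s+")


def basic_clean(text: str) -> str:
    """Hypothetical helper: strip HTML tags, links, and emails, then tidy dots and spaces."""
    text = HTML_TAG_RE.sub(" ", text)   # remove tags first so they do not cling to URLs
    text = URL_RE.sub(" ", text)        # remove http(s):// and www. links
    text = EMAIL_RE.sub(" ", text)      # remove email addresses
    text = MULTI_DOT_RE.sub(".", text)  # collapse runs of dots into a single dot
    return SPACE_RE.sub(" ", text).strip()  # collapse whitespace runs


sample = "<p>Big   sale... see https://example.com or write to sales@example.com</p>"
print(basic_clean(sample))
```

Tags are removed before links so that a closing tag directly after a URL is not swallowed by the URL pattern.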
## Installation
Install UltraClean with pip:
```bash
pip install ultraclean
```
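To confirm the install worked, one option is to query the installed version through the standard library. This sketch assumes Python 3.8 or newer, where `importlib.metadata` is available; `installed_version` is a hypothetical helper, not part of UltraClean.

```python
from importlib import metadata
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Prints the version string if UltraClean is installed, otherwise None.
print(installed_version("ultraclean"))
```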
## Usage
### Text Cleaning
```python
from ultraclean.clean import cleanup

# Raw input text to clean
text = "Congratulations! You've won a free trip to Hawaii. Click here to claim your prize. This is not a scam."

# Run the cleaning pipeline and print the result
cleaned_text = cleanup(text)
print(cleaned_text)
```
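The features list mentions batch processing, but the library's own batch interface is not shown here. As a sketch under that limitation, the hypothetical `clean_many` helper below simply applies a cleaning callable to each text in a list. It assumes the callable takes a string and returns a string, and it uses a stand-in cleaner so the example runs without UltraClean installed.

```python
from typing import Callable, Iterable, List


def clean_many(texts: Iterable[str], cleaner: Callable[[str], str]) -> List[str]:
    """Hypothetical helper: apply `cleaner` to each text, dropping results that end up empty."""
    cleaned = []
    for text in texts:
        result = cleaner(text)
        if result:  # skip texts that are empty after cleaning
            cleaned.append(result)
    return cleaned


# Stand-in cleaner so the sketch runs without UltraClean installed;
# it only collapses whitespace and is far simpler than a real cleaner.
def stand_in_cleaner(text: str) -> str:
    return " ".join(text.split())


print(clean_many(["  Hello   world ", "", "Free   trip!"], stand_in_cleaner))
```

For real use, `cleanup` from `ultraclean.clean` could be passed in place of `stand_in_cleaner`, assuming it returns a string for each input.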
### Spam Detection
```python
from ultraclean.predict import Spam

# Create the spam detector
spam_detector = Spam()

# Classify a single piece of text
text = "Congratulations! You've won a free trip to Hawaii. Click here to claim your prize."
is_spam = spam_detector.predict(text)
print(f"Is the text spam? {'Yes' if is_spam else 'No'}")

# Filter spam out of a longer paragraph
paragraph = "Congratulations! You've won a free trip to Hawaii. Click here to claim your prize. This is not a scam."
cleaned_paragraph = spam_detector.filter(paragraph)
print(cleaned_paragraph)
```
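How `filter` works internally is not documented here. One plausible approach to paragraph filtering is to split the text into sentences and keep only those a classifier does not flag. The sketch below illustrates that idea; `filter_sentences` and `keyword_is_spam` are hypothetical names, and the keyword check is a stand-in for a trained model, not UltraClean's logic.

```python
import re
from typing import Callable


def filter_sentences(paragraph: str, is_spam: Callable[[str], bool]) -> str:
    """Hypothetical helper: keep only the sentences that `is_spam` does not flag."""
    # Naive split on whitespace that follows ., !, or ?
    sentences = re.split(r"(?<=[.!?])\s+", paragraph.strip())
    kept = [s for s in sentences if s and not is_spam(s)]
    return " ".join(kept)


# Stand-in keyword check so the sketch runs without a trained model.
def keyword_is_spam(sentence: str) -> bool:
    return any(word in sentence.lower() for word in ("free", "prize", "won"))


paragraph = (
    "Congratulations! You've won a free trip to Hawaii. "
    "Click here to claim your prize. This is not a scam."
)
print(filter_sentences(paragraph, keyword_is_spam))
```

Since the usage example treats the result of `predict` as a truthy value for spam, `spam_detector.predict` could in principle be passed as the `is_spam` callable.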
## License
This project is licensed under the MIT License with an attribution requirement.
## Author
Ranit Bhowmick - [bhowmickranitking@duck.com](mailto:bhowmickranitking@duck.com)
## Raw data
{
"_id": null,
"home_page": "https://github.com/Kawai-Senpai/UltraClean",
"name": "ultraclean",
"maintainer": null,
"docs_url": null,
"requires_python": ">=3.7",
"maintainer_email": null,
"keywords": "Text Cleaning, Data Preprocessing, AI, ML, Spam Detection",
"author": "Ranit Bhowmick",
"author_email": "bhowmickranitking@duck.com",
"download_url": "https://files.pythonhosted.org/packages/1d/f6/69331e1224049788172374174686638be715517dead4e60167db9ab8833f/ultraclean-0.2.2.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": "MIT License with attribution requirement",
"summary": "UltraClean is a fast and efficient Python library for cleaning and preprocessing text data for AI/ML tasks and data processing.",
"version": "0.2.2",
"project_urls": {
"Download": "https://github.com/Kawai-Senpai/UltraClean",
"Homepage": "https://github.com/Kawai-Senpai/UltraClean"
},
"split_keywords": [
"text cleaning",
" data preprocessing",
" ai",
" ml",
" spam detection"
],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "2f6a4685265e621ff2b351c22ffcaa8ba1ae9159d452a9f1ca461707cb37ad20",
"md5": "86d1e6642771e1e4165c0c2d1e26b7f5",
"sha256": "10a407318e0042b6608477d7fed7b71693b88e89502969533e8506866a7d5826"
},
"downloads": -1,
"filename": "ultraclean-0.2.2-py3-none-any.whl",
"has_sig": false,
"md5_digest": "86d1e6642771e1e4165c0c2d1e26b7f5",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.7",
"size": 5732,
"upload_time": "2024-12-30T14:23:10",
"upload_time_iso_8601": "2024-12-30T14:23:10.960090Z",
"url": "https://files.pythonhosted.org/packages/2f/6a/4685265e621ff2b351c22ffcaa8ba1ae9159d452a9f1ca461707cb37ad20/ultraclean-0.2.2-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "1df669331e1224049788172374174686638be715517dead4e60167db9ab8833f",
"md5": "acd9eb09af2a7ada4031b7d21e122634",
"sha256": "a1835b943569aa8f730676d427ce9a62817fafd2656aed77c057d24a023b4665"
},
"downloads": -1,
"filename": "ultraclean-0.2.2.tar.gz",
"has_sig": false,
"md5_digest": "acd9eb09af2a7ada4031b7d21e122634",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.7",
"size": 6050,
"upload_time": "2024-12-30T14:23:13",
"upload_time_iso_8601": "2024-12-30T14:23:13.264372Z",
"url": "https://files.pythonhosted.org/packages/1d/f6/69331e1224049788172374174686638be715517dead4e60167db9ab8833f/ultraclean-0.2.2.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2024-12-30 14:23:13",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "Kawai-Senpai",
"github_project": "UltraClean",
"travis_ci": false,
"coveralls": false,
"github_actions": false,
"lcname": "ultraclean"
}