ultraclean

- Name: ultraclean
- Version: 0.2.2
- Home page: https://github.com/Kawai-Senpai/UltraClean
- Summary: UltraClean is a fast and efficient Python library for cleaning and preprocessing text data for AI/ML tasks and data processing.
- Upload time: 2024-12-30 14:23:13
- Maintainer: None
- Docs URL: None
- Author: Ranit Bhowmick
- Requires Python: >=3.7
- License: MIT License with attribution requirement
- Keywords: text cleaning, data preprocessing, ai, ml, spam detection
- Requirements: No requirements were recorded.

# UltraClean

UltraClean is a fast and efficient Python library for cleaning and preprocessing text data, specifically designed for AI/ML tasks and data processing.

## Features

- Remove unwanted characters, links, emails, phone numbers, underscores, Unicode characters, emojis, numbers, currencies, punctuation, HTML tags, LaTeX commands, and more.
- Handle multi-dots, extra spaces, and hashtags.
- Batch processing for efficient text cleaning.
- Spam detection and filtering using pre-trained models.
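
To make the cleaning features above concrete, here is a minimal sketch of a few of them (HTML tags, links, emails, multi-dots, extra spaces, and batch processing) using only the standard library. The patterns and the names `sketch_cleanup` and `sketch_cleanup_batch` are hypothetical and chosen for illustration; this is not UltraClean's implementation, and its actual rules may differ.

```python
import re

# Hypothetical patterns for illustration only; UltraClean's own rules may differ.
HTML_TAG_RE = re.compile(r"<[^>]+>")
URL_RE = re.compile(r"https?://\S+|www\.\S+")
EMAIL_RE = re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b")
MULTI_DOT_RE = re.compile(r"\.{2,}")
SPACE_RE = re.compile(r"\s+")


def sketch_cleanup(text: str) -> str:
    """Strip HTML tags, links, and emails, then normalize dots and spaces."""
    text = HTML_TAG_RE.sub(" ", text)
    text = URL_RE.sub(" ", text)
    text = EMAIL_RE.sub(" ", text)
    text = MULTI_DOT_RE.sub(".", text)  # collapse runs of dots to a single dot
    return SPACE_RE.sub(" ", text).strip()


def sketch_cleanup_batch(texts):
    """Apply the same cleaner to every string in a list."""
    return [sketch_cleanup(t) for t in texts]


if __name__ == "__main__":
    print(sketch_cleanup("<b>Hi</b>   there... see https://example.com/x now"))
```

Precompiling the patterns once keeps the batch helper cheap when it is called on many strings.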

## Installation

You can install UltraClean using pip:

```bash
pip install ultraclean
```

## Usage

### Text Cleaning

```python
from ultraclean.clean import cleanup

# Sample input to clean.
text = "Congratulations! You've won a free trip to Hawaii. Click here to claim your prize. This is not a scam."

# Run the cleaner and print the result.
cleaned_text = cleanup(text)
print(cleaned_text)
```

### Spam Detection

```python
from ultraclean.predict import Spam

# Create the spam detector.
spam_detector = Spam()

# Classify a single piece of text; the result is used as a truthy/falsy flag.
text = "Congratulations! You've won a free trip to Hawaii. Click here to claim your prize."
is_spam = spam_detector.predict(text)
print(f"Is the text spam? {'Yes' if is_spam else 'No'}")

# Run a longer paragraph through the spam filter and print what it returns.
paragraph = "Congratulations! You've won a free trip to Hawaii. Click here to claim your prize. This is not a scam."
cleaned_paragraph = spam_detector.filter(paragraph)
print(cleaned_paragraph)
```
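
The README does not say how `filter` treats a paragraph. One plausible reading is that it splits the text into sentences and drops the ones flagged as spam. The sketch below illustrates that idea only, under that assumption: `looks_spammy`, `SPAM_HINTS`, and `sketch_filter` are hypothetical names, and the keyword check is a stand-in for a trained model, not UltraClean's classifier.

```python
import re

# Hypothetical keyword hints standing in for a trained spam model.
SPAM_HINTS = ("you've won", "click here", "claim your prize")


def looks_spammy(sentence: str) -> bool:
    """Flag a sentence if it contains any of the hint phrases."""
    lowered = sentence.lower()
    return any(hint in lowered for hint in SPAM_HINTS)


def sketch_filter(paragraph: str) -> str:
    """Split on sentence-ending punctuation and keep the unflagged sentences."""
    sentences = re.split(r"(?<=[.!?])\s+", paragraph.strip())
    return " ".join(s for s in sentences if s and not looks_spammy(s))


if __name__ == "__main__":
    print(sketch_filter(
        "Congratulations! You've won a free trip to Hawaii. "
        "Click here to claim your prize. This is not a scam."
    ))
```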

## License

This project is licensed under the MIT License with an attribution requirement.

## Author

Ranit Bhowmick - [bhowmickranitking@duck.com](mailto:bhowmickranitking@duck.com)

            

Raw data

            {
    "_id": null,
    "home_page": "https://github.com/Kawai-Senpai/UltraClean",
    "name": "ultraclean",
    "maintainer": null,
    "docs_url": null,
    "requires_python": ">=3.7",
    "maintainer_email": null,
    "keywords": "Text Cleaning, Data Preprocessing, AI, ML, Spam Detection",
    "author": "Ranit Bhowmick",
    "author_email": "bhowmickranitking@duck.com",
    "download_url": "https://files.pythonhosted.org/packages/1d/f6/69331e1224049788172374174686638be715517dead4e60167db9ab8833f/ultraclean-0.2.2.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "MIT License with attribution requirement",
    "summary": "UltraClean is a fast and efficient Python library for cleaning and preprocessing text data for AI/ML tasks and data processing.",
    "version": "0.2.2",
    "project_urls": {
        "Download": "https://github.com/Kawai-Senpai/UltraClean",
        "Homepage": "https://github.com/Kawai-Senpai/UltraClean"
    },
    "split_keywords": [
        "text cleaning",
        " data preprocessing",
        " ai",
        " ml",
        " spam detection"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "2f6a4685265e621ff2b351c22ffcaa8ba1ae9159d452a9f1ca461707cb37ad20",
                "md5": "86d1e6642771e1e4165c0c2d1e26b7f5",
                "sha256": "10a407318e0042b6608477d7fed7b71693b88e89502969533e8506866a7d5826"
            },
            "downloads": -1,
            "filename": "ultraclean-0.2.2-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "86d1e6642771e1e4165c0c2d1e26b7f5",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.7",
            "size": 5732,
            "upload_time": "2024-12-30T14:23:10",
            "upload_time_iso_8601": "2024-12-30T14:23:10.960090Z",
            "url": "https://files.pythonhosted.org/packages/2f/6a/4685265e621ff2b351c22ffcaa8ba1ae9159d452a9f1ca461707cb37ad20/ultraclean-0.2.2-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "1df669331e1224049788172374174686638be715517dead4e60167db9ab8833f",
                "md5": "acd9eb09af2a7ada4031b7d21e122634",
                "sha256": "a1835b943569aa8f730676d427ce9a62817fafd2656aed77c057d24a023b4665"
            },
            "downloads": -1,
            "filename": "ultraclean-0.2.2.tar.gz",
            "has_sig": false,
            "md5_digest": "acd9eb09af2a7ada4031b7d21e122634",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.7",
            "size": 6050,
            "upload_time": "2024-12-30T14:23:13",
            "upload_time_iso_8601": "2024-12-30T14:23:13.264372Z",
            "url": "https://files.pythonhosted.org/packages/1d/f6/69331e1224049788172374174686638be715517dead4e60167db9ab8833f/ultraclean-0.2.2.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2024-12-30 14:23:13",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "Kawai-Senpai",
    "github_project": "UltraClean",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": false,
    "lcname": "ultraclean"
}
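
The `digests` entries above can be used to check a downloaded archive before installing it. The following is a minimal sketch, assuming the sdist has already been saved locally; the helper names `sha256_of` and `matches_expected` and the local filename in the usage line are hypothetical, while the expected digest is the `sha256` value listed for `ultraclean-0.2.2.tar.gz`.

```python
import hashlib
from pathlib import Path

# sha256 digest listed for ultraclean-0.2.2.tar.gz in the raw data above.
EXPECTED_SHA256 = "a1835b943569aa8f730676d427ce9a62817fafd2656aed77c057d24a023b4665"


def sha256_of(path, chunk_size=65536):
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, expected=EXPECTED_SHA256):
    """Compare a file's digest with the expected hex string."""
    return sha256_of(path) == expected.lower()


if __name__ == "__main__":
    # Assumed local path; adjust to wherever the archive was saved.
    archive = Path("ultraclean-0.2.2.tar.gz")
    if archive.exists():
        print("digest OK" if matches_expected(archive) else "digest mismatch")
```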
        
Elapsed time: 0.50992s