text-prettifier


- Name: text-prettifier
- Version: 1.1.4
- Home page: None
- Summary: A Python library for cleaning and preprocessing text data by removing emojis, internet words, special characters, digits, HTML tags, URLs, and stopwords.
- Upload time: 2024-08-17 21:26:33
- Maintainer: None
- Docs URL: None
- Author: Qadeer Ahmad
- Requires Python: None
- License: None
- Keywords: text cleaning, text preprocessing, text scrubber, NLP, natural language processing, data cleaning, data preprocessing, string manipulation, text manipulation, stopwords removal, contractions expansion, text normalization, text sanitization, internet words removal, emojis removal, emojis killer
- Requirements: No requirements were recorded.
            
# TextPrettifier

TextPrettifier is a Python library for cleaning text data. It removes emojis, internet words, HTML tags, URLs, numbers, special characters, and stopwords, and it expands contractions.
## TextPrettifier Key Features

### 1. Removing Emojis
The `remove_emojis` method removes emojis from the text.

### 2. Removing Internet Words
The `remove_internet_words` method removes internet-specific words from the text.

### 3. Removing HTML Tags
The `remove_html_tags` method removes HTML tags from the text.

### 4. Removing URLs
The `remove_urls` method removes URLs from the text.

### 5. Removing Numbers
The `remove_numbers` method removes numbers from the text.

### 6. Removing Special Characters
The `remove_special_chars` method removes special characters from the text.

### 7. Expanding Contractions
Despite its name, the `remove_contractions` method expands contractions in the text rather than deleting them (for example, "can't" becomes "cannot").

### 8. Removing Stopwords
The `remove_stopwords` method removes stopwords from the text.

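### Additional Functionality
The `is_lower` and `is_token` flags, passed to `sigma_cleaner` in the examples in this document, control the form of the returned text:
- If both `is_lower` and `is_token` are `True`, the text is returned in lowercase as a list of tokens.
- If only `is_lower` is `True`, the text is returned as a lowercase string.
- If only `is_token` is `True`, the text is returned as a list of tokens.
- If neither `is_lower` nor `is_token` is `True`, the cleaned text is returned as a string with its case unchanged.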
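
The four cases above can be sketched in plain Python. This is only an illustration of the described behavior, not TextPrettifier's own code: the helper name `finalize` is hypothetical, and splitting on whitespace is an assumption about how tokens are produced.

```python
from typing import List, Union


def finalize(text: str, is_lower: bool = False, is_token: bool = False) -> Union[str, List[str]]:
    """Hypothetical helper mirroring the four flag combinations."""
    if is_lower:
        text = text.lower()
    if is_token:
        # Assumption: tokens are split on whitespace; the library may tokenize differently.
        return text.split()
    return text


print(finalize("Hello World"))                                # Hello World
print(finalize("Hello World", is_lower=True))                 # hello world
print(finalize("Hello World", is_token=True))                 # ['Hello', 'World']
print(finalize("Hello World", is_lower=True, is_token=True))  # ['hello', 'world']
```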


## Installation

You can install TextPrettifier using pip:

```bash
pip install text-prettifier
```
### Initialize TextPrettifier
```python
from text_prettifier import TextPrettifier

text_prettifier = TextPrettifier()
```

#### Example: Remove Emojis
```python
emoji_text = "Hi,Pythonogist! I ❤️ Python."
cleaned_emojis = text_prettifier.remove_emojis(emoji_text)
print(cleaned_emojis)
```
**Output**
Hi,Pythonogist! I Python.
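
To give a feel for what emoji removal involves, here is a minimal standard-library sketch. It is not how `remove_emojis` is implemented: the function name `strip_emojis` and the chosen Unicode ranges are assumptions, and the ranges are not exhaustive (flag sequences, for instance, are not covered).

```python
import re

# Assumption: a small set of Unicode ranges covering common emojis; not exhaustive.
EMOJI_PATTERN = re.compile(
    "["
    "\U0001F300-\U0001FAFF"  # pictographs, emoticons, transport, supplemental symbols
    "\u2600-\u27BF"          # miscellaneous symbols and dingbats (includes U+2764)
    "\uFE0F"                 # variation selector that often follows an emoji
    "\u200D"                 # zero-width joiner used in emoji sequences
    "]+"
)


def strip_emojis(text: str) -> str:
    """Hypothetical stand-in: delete emoji matches, then collapse leftover spaces."""
    without_emojis = EMOJI_PATTERN.sub("", text)
    return re.sub(r"\s{2,}", " ", without_emojis).strip()


print(strip_emojis("Hi,Pythonogist! I \u2764\ufe0f Python."))  # Hi,Pythonogist! I Python.
```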
#### Example: Remove HTML tags
```python
html_text = "<p>Hello, <b>world</b>!</p>"
cleaned_html = text_prettifier.remove_html_tags(html_text)
print(cleaned_html)
```
**Output**
Hello,world!
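
For comparison, tag stripping can be sketched with the standard library's `html.parser`. This is an illustration only, not the code behind `remove_html_tags`; the names `_TextExtractor` and `strip_tags` are hypothetical. The sketch keeps the whitespace between text fragments, so its spacing may differ from the library's output.

```python
from html.parser import HTMLParser
from typing import List


class _TextExtractor(HTMLParser):
    """Collects only the text nodes of a document and ignores every tag."""

    def __init__(self) -> None:
        super().__init__()
        self.parts: List[str] = []

    def handle_data(self, data: str) -> None:
        self.parts.append(data)


def strip_tags(html: str) -> str:
    """Hypothetical helper: return the text content of an HTML fragment."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()  # flush any text still buffered by the parser
    return "".join(parser.parts)


print(strip_tags("<p>Hello, <b>world</b>!</p>"))  # Hello, world!
```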
#### Example: Remove URLs
```python
url_text = "Visit our website at https://example.com"
cleaned_urls = text_prettifier.remove_urls(url_text)
print(cleaned_urls)
```
**Output**
Visit our website at
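
A rough idea of URL removal, using a regular expression. This is a sketch under stated assumptions, not the pattern `remove_urls` uses: the name `strip_urls` is hypothetical, and a "URL" is taken to be any run of non-space characters starting with `http://`, `https://`, or `www.`.

```python
import re

# Assumption: a URL starts with a scheme or "www." and runs until the next whitespace.
URL_PATTERN = re.compile(r"(?:https?://|www\.)\S+")


def strip_urls(text: str) -> str:
    """Hypothetical helper: delete URL-like substrings and trim the ends."""
    return URL_PATTERN.sub("", text).strip()


print(strip_urls("Visit our website at https://example.com"))  # Visit our website at
```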
#### Example: Remove numbers
```python
number_text = "There are 123 apples"
cleaned_numbers = text_prettifier.remove_numbers(number_text)
print(cleaned_numbers)
```
**Output**
There are apples
#### Example: Remove special characters
```python
special_text = "Hello, @world!"
cleaned_special = text_prettifier.remove_special_chars(special_text)
print(cleaned_special)
```
**Output**
Hello world
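
What counts as a "special character" is a design decision. The sketch below assumes it means anything other than ASCII letters, digits, and whitespace, which would also drop accented letters. The name `strip_special_chars` is hypothetical and the library's definition may be different.

```python
import re

# Assumption: "special" means not an ASCII letter, digit, or whitespace character.
SPECIAL_PATTERN = re.compile(r"[^A-Za-z0-9\s]")


def strip_special_chars(text: str) -> str:
    """Hypothetical helper: delete every character matched by SPECIAL_PATTERN."""
    return SPECIAL_PATTERN.sub("", text)


print(strip_special_chars("Hello, @world!"))  # Hello world
```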
#### Example: Remove contractions
```python
contraction_text = "I can't do it"
cleaned_contractions = text_prettifier.remove_contractions(contraction_text)
print(cleaned_contractions)
```
**Output**
I cannot do it
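
Contraction expansion is usually a lookup table plus a pattern that finds the table's keys. The sketch below shows that idea with a tiny, hand-written table. The table contents and the name `expand_contractions` are assumptions, not the mapping TextPrettifier ships with.

```python
import re

# Assumption: a deliberately tiny mapping; a real table would be far longer.
CONTRACTIONS = {
    "can't": "cannot",
    "won't": "will not",
    "don't": "do not",
    "isn't": "is not",
}

# One alternation that matches any key as a whole word, ignoring case.
CONTRACTION_PATTERN = re.compile(
    r"\b(" + "|".join(re.escape(key) for key in CONTRACTIONS) + r")\b",
    re.IGNORECASE,
)


def expand_contractions(text: str) -> str:
    """Hypothetical helper: replace each known contraction with its expansion."""
    return CONTRACTION_PATTERN.sub(lambda m: CONTRACTIONS[m.group(0).lower()], text)


print(expand_contractions("I can't do it"))  # I cannot do it
```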
#### Example: Remove stopwords
```python
stopwords_text = "This is a test"
cleaned_stopwords = text_prettifier.remove_stopwords(stopwords_text)
print(cleaned_stopwords)
```
**Output**
This test
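
The output above keeps "This", which is what a case-sensitive comparison against a lowercase stopword list would produce. The sketch below reproduces that behavior. The stopword set and the name `drop_stopwords` are assumptions, and whether the library matches case-sensitively is a guess based on this one example.

```python
# Assumption: a tiny lowercase stopword list, compared case-sensitively.
STOPWORDS = {"a", "an", "the", "is", "are", "this", "that", "it"}


def drop_stopwords(text: str) -> str:
    """Hypothetical helper: keep only the words that are not in STOPWORDS."""
    kept = [word for word in text.split() if word not in STOPWORDS]
    return " ".join(kept)


print(drop_stopwords("This is a test"))  # This test
print(drop_stopwords("this is a test"))  # test
```

Lowercasing the text before filtering would drop the capitalized "This" as well.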
#### Example: Apply all cleaning methods
```python
all_text = "<p>Hello, @world!</p> There are 123 apples. I can't do it. This is a test."
all_cleaned = text_prettifier.sigma_cleaner(all_text)
print(all_cleaned)

```
**Output**
Hello world 123 apples cannot test
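
When several cleaning steps are chained, their order affects the result. The sketch below is a generic illustration and makes no claim about the order `sigma_cleaner` uses internally. `run_pipeline`, `expand_cant`, and `strip_special` are hypothetical names.

```python
import re
from functools import reduce
from typing import Callable, List

Step = Callable[[str], str]


def run_pipeline(text: str, steps: List[Step]) -> str:
    """Apply each step to the result of the previous one, left to right."""
    return reduce(lambda acc, step: step(acc), steps, text)


def expand_cant(text: str) -> str:
    # Toy contraction step: handles a single contraction only.
    return text.replace("can't", "cannot")


def strip_special(text: str) -> str:
    # Toy special-character step: keeps ASCII letters, digits, and whitespace.
    return re.sub(r"[^A-Za-z0-9\s]", "", text)


# Expanding first works; stripping first destroys the apostrophe the expander needs.
print(run_pipeline("I can't do it!", [expand_cant, strip_special]))  # I cannot do it
print(run_pipeline("I can't do it!", [strip_special, expand_cant]))  # I cant do it
```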


To get the cleaned text lowercased and tokenized, pass `is_token=True` and `is_lower=True`:
```python
all_text = "<p>Hello, @world!</p> There are 123 apples. I can't do it. This is a test."
all_cleaned = text_prettifier.sigma_cleaner(all_text,is_token=True,is_lower=True)
print(all_cleaned)

```
**Output**
['hello', 'world', '123', 'apples', 'cannot', 'test']

**Note:** I didn't include `remove_numbers` in `sigma_cleaner` because numbers sometimes carry useful information for NLP tasks. If you want to remove numbers, apply `remove_numbers` separately to the output of `sigma_cleaner`.
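
One thing to keep in mind when post-processing: the cleaned result is a string by default but a list when `is_token=True`. The standard-library sketch below handles both shapes. `drop_numbers` is a hypothetical helper, not part of TextPrettifier, and it treats a "number" as a run of digits.

```python
import re
from typing import List, Union

NUMBER_PATTERN = re.compile(r"\d+")


def drop_numbers(cleaned: Union[str, List[str]]) -> Union[str, List[str]]:
    """Hypothetical helper: remove digit runs from a string or a token list."""
    if isinstance(cleaned, list):
        # Token list: drop tokens made up entirely of digits.
        return [token for token in cleaned if not token.isdigit()]
    # String: delete digit runs, then normalize the remaining whitespace.
    return " ".join(NUMBER_PATTERN.sub("", cleaned).split())


print(drop_numbers("Hello world 123 apples cannot test"))  # Hello world apples cannot test
print(drop_numbers(["hello", "world", "123", "apples"]))   # ['hello', 'world', 'apples']
```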


### Contact Information

Feel free to reach out to me on social media:

[![GitHub](https://img.shields.io/badge/GitHub-mrqadeer)](https://github.com/mrqadeer)
[![LinkedIn](https://img.shields.io/badge/LinkedIn-Qadeer)](https://www.linkedin.com/in/qadeer-ahmad-3499a4205/)
[![Twitter](https://img.shields.io/badge/Twitter-Twitter)](https://twitter.com/mr_sin_of_me)
[![Facebook](https://img.shields.io/badge/Facebook-Facebook)](https://web.facebook.com/mrqadeerofficial/)




## License

This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.

            

Raw data

            {
    "_id": null,
    "home_page": null,
    "name": "text-prettifier",
    "maintainer": null,
    "docs_url": null,
    "requires_python": null,
    "maintainer_email": null,
    "keywords": "text cleaning, text preprocessing, text scrubber, NLP, natural language processing, data cleaning, data preprocessing, string manipulation, text manipulation, stopwords removal, contractions expansion, text normalization, text sanitization, internet words removal, emojis removal, emojis killer",
    "author": "Qadeer Ahmad",
    "author_email": "mrqadeer1231122@gmail.com",
    "download_url": "https://files.pythonhosted.org/packages/96/30/1058adc542addde82034ecd099cb74816e09ed87d5ba86d89aec8a38a888/text_prettifier-1.1.4.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": null,
    "summary": "A Python library for cleaning and preprocessing text data by removing,emojies,internet words, special characters, digits, HTML tags, URLs, and stopwords.",
    "version": "1.1.4",
    "project_urls": null,
    "split_keywords": [
        "text cleaning",
        " text preprocessing",
        " text scrubber",
        " nlp",
        " natural language processing",
        " data cleaning",
        " data preprocessing",
        " string manipulation",
        " text manipulation",
        " stopwords removal",
        " contractions expansion",
        " text normalization",
        " text sanitization",
        " internet words removal",
        " emojis removal",
        " emojis killer"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "7216f75decea8eae64f145a52a2d586ba683120d6933b2362ffa6a6a7bb3fe87",
                "md5": "54330abbe2da48e9143d8b02d2e1334e",
                "sha256": "a3db624155eaf151f32f9a22aca5481696be112201c133269cc603b6171b72ae"
            },
            "downloads": -1,
            "filename": "text_prettifier-1.1.4-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "54330abbe2da48e9143d8b02d2e1334e",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": null,
            "size": 7126,
            "upload_time": "2024-08-17T21:26:32",
            "upload_time_iso_8601": "2024-08-17T21:26:32.000172Z",
            "url": "https://files.pythonhosted.org/packages/72/16/f75decea8eae64f145a52a2d586ba683120d6933b2362ffa6a6a7bb3fe87/text_prettifier-1.1.4-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "96301058adc542addde82034ecd099cb74816e09ed87d5ba86d89aec8a38a888",
                "md5": "d15b8066eba015c33db1bee656f1c4bd",
                "sha256": "cfee5fc8c43960037321b5245982d4f79a787bffaf65dc63dfe6dd8a9a9f6286"
            },
            "downloads": -1,
            "filename": "text_prettifier-1.1.4.tar.gz",
            "has_sig": false,
            "md5_digest": "d15b8066eba015c33db1bee656f1c4bd",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": null,
            "size": 7469,
            "upload_time": "2024-08-17T21:26:33",
            "upload_time_iso_8601": "2024-08-17T21:26:33.303657Z",
            "url": "https://files.pythonhosted.org/packages/96/30/1058adc542addde82034ecd099cb74816e09ed87d5ba86d89aec8a38a888/text_prettifier-1.1.4.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2024-08-17 21:26:33",
    "github": false,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "lcname": "text-prettifier"
}
        