Name | tweetben |
Version | 0.0.1 |
home_page | |
Summary | This is for text preprocessing |
upload_time | 2023-04-26 15:25:20 |
maintainer | |
docs_url | None |
author | Behdad Ehsani |
requires_python | |
license | |
keywords | |
VCS | |
bugtrack_url | |
requirements | No requirements were recorded. |
Travis-CI | No Travis. |
coveralls test coverage | No coveralls. |
# Text and Tweet Preprocessing package
This package was created by Behdad (Ben) Ehsani. It is designed to clean tweets quickly, in a single function call, and some of its functions can also be used for general text preprocessing. An example below demonstrates typical usage.
## Installing the library
`pip install preprocessing-text-ben`
## Uninstalling the library
`pip uninstall preprocessing-text-ben`
Example of one-shot cleaning:
```
# Assumption: the importable module name uses underscores, because hyphens
# are not valid in a Python import statement.
import preprocessing_text_ben as pp


def get_clean(x):
    """Clean one tweet or text string in a single pass."""
    # Convert the string to lowercase
    x = str(x).lower()
    # Expand contractions like "don't" to "do not"
    x = pp.cont_to_exp(x)
    # Remove any email addresses from the string
    x = pp.remove_emails(x)
    # Remove any URLs from the string
    x = pp.remove_urls(x)
    # Remove any HTML tags from the string
    x = pp.remove_html_tags(x)
    # Remove any retweet tags (RT) from the string
    x = pp.remove_rt(x)
    # Remove any accented characters from the string
    x = pp.remove_accented_chars(x)
    # Remove any special characters from the string
    x = pp.remove_special_chars(x)
    # Return the cleaned string
    return x


# Clean the whole column in one shot (df is an existing pandas DataFrame)
df['your_cleaned_column'] = df['your_text_column'].apply(get_clean)
```
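To make the individual cleaning steps more concrete, here is a minimal standard-library sketch of what a pipeline of this kind might do. It is not the package's implementation: the function name `sketch_clean` and all of the regular expressions are hypothetical stand-ins chosen for illustration, and the real functions may behave differently. Contraction expansion is left out because it needs a lookup table.

```python
import re
import unicodedata

# Hypothetical patterns for illustration only; the package's own rules may differ.
EMAIL_RE = re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b")
URL_RE = re.compile(r"(?:https?://|www\.)\S+")
HTML_TAG_RE = re.compile(r"<[^>]+>")
RT_RE = re.compile(r"\brt\b")          # applied after lowercasing
SPECIAL_RE = re.compile(r"[^\w\s]")    # anything that is not a word char or whitespace


def sketch_clean(text):
    """Approximate the one-shot cleaning steps with the standard library."""
    text = str(text).lower()
    text = EMAIL_RE.sub("", text)      # drop email addresses
    text = URL_RE.sub("", text)        # drop URLs
    text = HTML_TAG_RE.sub("", text)   # drop HTML tags, keep their inner text
    text = RT_RE.sub("", text)         # drop the standalone retweet marker
    # Strip accents: decompose characters, then discard the combining marks
    text = unicodedata.normalize("NFKD", text)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    text = SPECIAL_RE.sub("", text)    # drop punctuation and symbols
    return " ".join(text.split())      # collapse runs of whitespace


print(sketch_clean("RT @User: Café is <b>GREAT</b>! Mail me@example.com or see https://t.co/abc"))
```

The order of the steps matters in this sketch: emails and URLs are removed before special characters, because stripping punctuation first would break the patterns that recognize them.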
version: 0.0.1
Raw data
{
"_id": null,
"home_page": "",
"name": "tweetben",
"maintainer": "",
"docs_url": null,
"requires_python": "",
"maintainer_email": "",
"keywords": "",
"author": "Behdad Ehsani",
"author_email": "behdad.ehsani@hec.ca",
"download_url": "https://files.pythonhosted.org/packages/a3/0b/2d7e915e08bb9da05a34bb4c601d0acebdceef47dd0dc6ba21fd53f18b04/tweetben-0.0.1.tar.gz",
"platform": null,
"description": "# Text and Tweet Preprocessing package\n\n\n\nThis package is created by Behdad (Ben) Ehsani. The package is designed for cleaning tweets on Twitter immediately and with one-shot coding. Additionally, some functions can be used for text preprocessing. An example is provided to demonstrate efficient usage.\n\n\n## Installing the library\n\n`pip install preprocessing-text-ben`\n\n## Unistalling the library\n\n`pip uninstall preprocessing-text-ben`\n\n\n\nExample of one-shot cleaning the code: \n\n```\nimport preprocessing-text-ben as pp\n\ndef get_clean(x):\n \n # Convert the string to lowercase\n x = str(x).lower()\n \n # Expand contractions like \"don't\" to \"do not\"\n x = pp.cont_to_exp(x)\n \n # Remove any email addresses from the string\n x = pp.remove_emails(x)\n \n # Remove any URLs from the string\n x = pp.remove_urls(x)\n \n # Remove any HTML tags from the string\n x = pp.remove_html_tags(x)\n \n # Remove any retweet tags (RT) from the string\n x = pp.remove_rt(x)\n \n # Remove any accented characters from the string\n x = pp.remove_accented_chars(x)\n \n # Remove any special characters from the string\n x = pp.remove_special_chars(x)\n \n # Return the cleaned string\n return x\n\n\n#here is the cleaned text in one shot\ndf['your_cleaned_column'] = df['your_text_column'].apply(lambda x: get_clean(x))\n\n```\n\n\n\n\n\n\nversion: 0.0.1\n",
"bugtrack_url": null,
"license": "",
"summary": "This is for text preprocessing",
"version": "0.0.1",
"split_keywords": [],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "b1038d3447f7037d88066233ca7c10197e3c0894aed9962cb42c13a800807ae4",
"md5": "28140dde169246506a2f0558e765a671",
"sha256": "aa726d3e375a3c712db382eb4a0faee02a633c26c5361940cc169027b2935aae"
},
"downloads": -1,
"filename": "tweetben-0.0.1-py3-none-any.whl",
"has_sig": false,
"md5_digest": "28140dde169246506a2f0558e765a671",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": null,
"size": 4086,
"upload_time": "2023-04-26T15:25:17",
"upload_time_iso_8601": "2023-04-26T15:25:17.937475Z",
"url": "https://files.pythonhosted.org/packages/b1/03/8d3447f7037d88066233ca7c10197e3c0894aed9962cb42c13a800807ae4/tweetben-0.0.1-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "a30b2d7e915e08bb9da05a34bb4c601d0acebdceef47dd0dc6ba21fd53f18b04",
"md5": "1b4035448cf64ab097c487374fe7f270",
"sha256": "1fa997dece0e1121d684022b87af9e24e9d328b89f2b8e38c6715c3d729b1ced"
},
"downloads": -1,
"filename": "tweetben-0.0.1.tar.gz",
"has_sig": false,
"md5_digest": "1b4035448cf64ab097c487374fe7f270",
"packagetype": "sdist",
"python_version": "source",
"requires_python": null,
"size": 3647,
"upload_time": "2023-04-26T15:25:20",
"upload_time_iso_8601": "2023-04-26T15:25:20.726641Z",
"url": "https://files.pythonhosted.org/packages/a3/0b/2d7e915e08bb9da05a34bb4c601d0acebdceef47dd0dc6ba21fd53f18b04/tweetben-0.0.1.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2023-04-26 15:25:20",
"github": false,
"gitlab": false,
"bitbucket": false,
"lcname": "tweetben"
}