# Proces
[![Pypi](https://img.shields.io/pypi/v/proces.svg)](https://pypi.org/project/proces/)
[![MIT License](https://img.shields.io/badge/license-MIT-green.svg)](https://github.com/Ailln/proces/blob/master/LICENSE)
[![stars](https://img.shields.io/github/stars/Ailln/proces.svg)](https://github.com/Ailln/proces/stargazers)
🐨 Text preprocessing.
## 1 Installation
> ⚠️ Note:
> 1. Local installation requires Python 3.6 or later;
> 2. Use the latest version of `proces` whenever possible.
### Install with pip
```shell
pip install proces -U
```
### Install from source
```shell
git clone https://github.com/Ailln/proces.git
cd proces && python setup.py install
```
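To confirm that the interpreter meets the version requirement and that the package is importable, a small check like the one below can help. This is only a sketch: `check_environment` is a hypothetical helper written for this README, not part of `proces`, and it uses nothing beyond the standard library.

```python
import importlib.util
import sys


def check_environment(package="proces", min_python=(3, 6)):
    """Return (python_ok, installed) for the given package name.

    Hypothetical helper for illustration; not provided by proces.
    """
    # Compare the running interpreter against the minimum version.
    python_ok = sys.version_info >= min_python
    # find_spec returns None when a top-level package cannot be found.
    installed = importlib.util.find_spec(package) is not None
    return python_ok, installed


python_ok, installed = check_environment()
print("Python version OK:", python_ok)
print("proces importable:", installed)
```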
## 2 Usage
```python
from proces import preprocess
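# By default the following steps run in order: handle blank characters, uppercase to lowercase, Traditional to Simplified Chinese, full-width to half-width
result = preprocess("Today, 你 幹 什 麼 ！")
# result: today,你干什么!
# Configure the pipeline, e.g. only remove blank characters
result = preprocess("Today, 你 幹 什 麼 ！", pipelines=["handle_blank_character"])
# result: Today,你幹什麼！
# Use the individual functions directly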
from proces import filter_unusual_characters, filter_
from proces import handle_blank_character
from proces import uppercase_to_lowercase
from proces import traditional_to_simplified
from proces import full_angle_to_half_angle
from proces import handle_substitute
# Remove unusual characters
result = filter_unusual_characters("【你是个恶魔😈啊�】")
# result: 【你是个恶魔啊】
# The short alias filter_ also works
result = filter_("【你是个恶魔😈啊�】")
# result: 【你是个恶魔啊】
# Handle blank characters
result = handle_blank_character("空 白 字 符")
# result: 空白字符
result = handle_blank_character("空 白 字 符", ",")
# result: 空,白,字,符
# Uppercase to lowercase
result = uppercase_to_lowercase("UP to low")
# result: up to low
# Traditional to Simplified Chinese
result = traditional_to_simplified("我幹什麼不干你事")
# result: 我干什么不干你事
# Full-width to half-width
result = full_angle_to_half_angle("你好！")
# result: 你好!
# Substitute characters
result = handle_substitute("你好！/:-", r"/:-", "表情")
# result: 你好！表情
```
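To make the idea of a name-based pipeline more concrete, here is a minimal standard-library sketch of how steps could be looked up by name and applied in order. It is not the implementation used by `proces`: the step names (`strip_blank`, `to_lower`, `to_half_width`), the `SKETCH_STEPS` table, and `run_pipeline` are all hypothetical, and the Traditional-to-Simplified step is left out because it needs a character mapping table.

```python
import re


def strip_blank(text, repl=""):
    """Replace every run of whitespace with `repl` (illustrative stand-in)."""
    return re.sub(r"\s+", repl, text)


def to_lower(text):
    """Lowercase the text; characters without case are left unchanged."""
    return text.lower()


def to_half_width(text):
    """Map full-width ASCII variants (U+FF01..U+FF5E) and the
    ideographic space (U+3000) to their half-width counterparts."""
    out = []
    for ch in text:
        code = ord(ch)
        if code == 0x3000:
            code = 0x20
        elif 0xFF01 <= code <= 0xFF5E:
            code -= 0xFEE0  # fixed offset between the two ranges
        out.append(chr(code))
    return "".join(out)


# Hypothetical registry: step name -> function, in default execution order.
SKETCH_STEPS = {
    "strip_blank": strip_blank,
    "to_lower": to_lower,
    "to_half_width": to_half_width,
}


def run_pipeline(text, pipelines=None):
    """Apply the named steps in order; all steps run when none are given."""
    names = pipelines if pipelines is not None else list(SKETCH_STEPS)
    for name in names:
        text = SKETCH_STEPS[name](text)
    return text


# "\uff01" is the full-width exclamation mark.
print(run_pipeline("Today, 你 好 \uff01"))
print(run_pipeline("Today, 你 好 \uff01", pipelines=["strip_blank"]))
```

Keeping the steps in a dictionary keyed by name is one simple way to let callers select a subset, as the `pipelines` argument of `preprocess` does; whether `proces` is organized this way internally is an assumption.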
```python
## Sensitive information masking
from proces import mask_phone, mask_address
# Mask phone numbers
result = mask_phone("手机号 13397238231")
# result: 手机号 133********
# Mask addresses
result = mask_address("我在浙江杭州余杭区")
# result: 我在浙江杭州***
```
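For readers curious how phone masking of this kind can work, the sketch below reproduces the output format shown above with a regular expression. It is an assumption-laden illustration, not the logic inside `mask_phone`: the name `mask_phone_sketch`, the pattern (11 digits starting with `1`, not embedded in a longer digit run), and the `mask_char` parameter are all hypothetical.

```python
import re

# Assumption: a mobile number is 11 digits starting with 1 and is not
# part of a longer run of digits. The first three digits are captured.
PHONE_RE = re.compile(r"(?<!\d)(1\d{2})\d{8}(?!\d)")


def mask_phone_sketch(text, mask_char="*"):
    """Keep the first three digits of each match and mask the other eight."""
    return PHONE_RE.sub(lambda m: m.group(1) + mask_char * 8, text)


print(mask_phone_sketch("手机号 13397238231"))
```

The lookarounds keep longer digit strings, such as order numbers, from being partially masked.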
## 3 TODO
- [x] add a way to get all methods of preprocess
- [ ] decorators
## 4 License
[![](https://award.dovolopor.com?lt=License&rt=MIT&rbc=green)](./LICENSE)
Raw data
{
"_id": null,
"home_page": "https://github.com/Ailln/proces",
"name": "proces",
"maintainer": "",
"docs_url": null,
"requires_python": ">=3.6",
"maintainer_email": "",
"keywords": "",
"author": "Ailln",
"author_email": "kinggreenhall@gmail.com",
"download_url": "https://files.pythonhosted.org/packages/2c/3d/4159b57736ced0fd22553226df20a985ef7655519c80ffcb8a9fb49ebeee/proces-0.1.7.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": "MIT License",
"summary": "text preprocess.",
"version": "0.1.7",
"project_urls": {
"Homepage": "https://github.com/Ailln/proces"
},
"split_keywords": [],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "6f8806cc0c7d890ed8d7e16ef0e56880dea516a21643fb1f3a69a50f4cc6f716",
"md5": "a1bf89c15906e1fb75c1dba894a07847",
"sha256": "308325bbc96877263f06e57e5e9c760c4b42cc722887ad60be6b18fc37d68762"
},
"downloads": -1,
"filename": "proces-0.1.7-py3-none-any.whl",
"has_sig": false,
"md5_digest": "a1bf89c15906e1fb75c1dba894a07847",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.6",
"size": 137718,
"upload_time": "2023-09-09T03:27:35",
"upload_time_iso_8601": "2023-09-09T03:27:35.463017Z",
"url": "https://files.pythonhosted.org/packages/6f/88/06cc0c7d890ed8d7e16ef0e56880dea516a21643fb1f3a69a50f4cc6f716/proces-0.1.7-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "2c3d4159b57736ced0fd22553226df20a985ef7655519c80ffcb8a9fb49ebeee",
"md5": "f67ef78a899e4d55828fa09a63752ef1",
"sha256": "70a05d9e973dd685f7a9092c58be695a8181a411d63796c213232fd3fdc43775"
},
"downloads": -1,
"filename": "proces-0.1.7.tar.gz",
"has_sig": false,
"md5_digest": "f67ef78a899e4d55828fa09a63752ef1",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.6",
"size": 31188,
"upload_time": "2023-09-09T03:27:38",
"upload_time_iso_8601": "2023-09-09T03:27:38.158176Z",
"url": "https://files.pythonhosted.org/packages/2c/3d/4159b57736ced0fd22553226df20a985ef7655519c80ffcb8a9fb49ebeee/proces-0.1.7.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2023-09-09 03:27:38",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "Ailln",
"github_project": "proces",
"travis_ci": false,
"coveralls": false,
"github_actions": false,
"lcname": "proces"
}