proces

- Name: proces
- Version: 0.1.7
- Home page: https://github.com/Ailln/proces
- Summary: text preprocess.
- Upload time: 2023-09-09 03:27:38
- Author: Ailln
- Requires Python: >=3.6
- License: MIT License
- Requirements: No requirements were recorded.
- Travis-CI: No Travis.
- Coveralls test coverage: No coveralls.

# Proces

[![Pypi](https://img.shields.io/pypi/v/proces.svg)](https://pypi.org/project/proces/)
[![MIT License](https://img.shields.io/badge/license-MIT-green.svg)](https://github.com/Ailln/proces/blob/master/LICENSE)
[![stars](https://img.shields.io/github/stars/Ailln/proces.svg)](https://github.com/Ailln/proces/stargazers)

🐨 Text preprocessing.

## 1 Installation

> ⚠️ Note:
> 1. Local installation requires Python 3.6 or later;
> 2. Use the latest version of `proces` whenever possible.

### Install with pip

```shell
pip install proces -U
```

### Install from the repository

```shell
git clone https://github.com/Ailln/proces.git

cd proces && python setup.py install
```

## 2 Usage

```python
from proces import preprocess

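# By default, these steps run in order: handle blank characters,
# uppercase to lowercase, Traditional to Simplified Chinese, full-width to half-width
result = preprocess("Today, 你 幹 什 麼 !")
# result: today,你干什么!

# Configure the pipeline, e.g. only remove blank characters
result = preprocess("Today, 你 幹 什 麼 !", pipelines=["handle_blank_character"])
# result: Today,你幹什麼!

# Use the individual functions on their own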
from proces import filter_unusual_characters, filter_
from proces import handle_blank_character
from proces import uppercase_to_lowercase
from proces import traditional_to_simplified
from proces import full_angle_to_half_angle
from proces import handle_substitute

# Remove unusual characters
result = filter_unusual_characters("【你是个恶魔😈啊�】")
# result: 【你是个恶魔啊】
# The short alias filter_ also works
result = filter_("【你是个恶魔😈啊�】")
# result: 【你是个恶魔啊】

# Handle blank characters
result = handle_blank_character("空 白 字 符")
# result: 空白字符
result = handle_blank_character("空 白 字 符", ",")
# result: 空,白,字,符

# Uppercase to lowercase
result = uppercase_to_lowercase("UP to low")
# result: up to low

# Traditional to Simplified Chinese
result = traditional_to_simplified("我幹什麼不干你事")
# result: 我干什么不干你事

# Full-width to half-width
result = full_angle_to_half_angle("你好!")
# result: 你好!

# Substitute some characters
result = handle_substitute("你好!/:-", r"/:-", "表情")
# result: 你好!表情
```
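
To make the idea of an ordered, configurable pipeline concrete, here is a minimal standard-library sketch. It is not the `proces` implementation: the names `strip_whitespace`, `to_lowercase`, `to_half_width`, `STEPS`, `DEFAULT_ORDER`, and `run_pipeline` are all hypothetical, and the Traditional-to-Simplified step is left out because it needs a character mapping table.

```python
import re


def strip_whitespace(text, repl=""):
    """Replace every run of whitespace with `repl` (removed by default)."""
    return re.sub(r"\s+", repl, text)


def to_lowercase(text):
    """Lowercase all cased characters."""
    return text.lower()


def to_half_width(text):
    """Map full-width ASCII variants (U+FF01..U+FF5E) and the ideographic
    space (U+3000) to their half-width counterparts."""
    chars = []
    for ch in text:
        code = ord(ch)
        if code == 0x3000:  # ideographic space -> ASCII space
            code = 0x20
        elif 0xFF01 <= code <= 0xFF5E:  # full-width forms sit 0xFEE0 above ASCII
            code -= 0xFEE0
        chars.append(chr(code))
    return "".join(chars)


# Hypothetical registry: step name -> function (not the names proces uses).
STEPS = {
    "strip_whitespace": strip_whitespace,
    "to_lowercase": to_lowercase,
    "to_half_width": to_half_width,
}
DEFAULT_ORDER = ["strip_whitespace", "to_lowercase", "to_half_width"]


def run_pipeline(text, pipelines=None):
    """Apply the named steps in order; use DEFAULT_ORDER when none are given."""
    for name in DEFAULT_ORDER if pipelines is None else pipelines:
        text = STEPS[name](text)
    return text


# "\uff01" is the full-width exclamation mark.
print(run_pipeline("Today, 你 好 \uff01"))
print(run_pipeline("Today, 你 好 \uff01", pipelines=["strip_whitespace"]))
```

Looking steps up by name in a dictionary keeps the default order and a caller-supplied order on the same code path, which is one plausible way to support a `pipelines=[...]` argument.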

```python
## Masking sensitive information
from proces import mask_phone, mask_address

# Mask phone numbers
result = mask_phone("手机号 13397238231")
# result: 手机号 133********

# Mask addresses
result = mask_address("我在浙江杭州余杭区")
# result: 我在浙江杭州***
```
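
For readers curious how this kind of masking can be done, the following is a rough sketch using only the `re` module. It assumes a phone number is any standalone 11-digit run starting with `1`, which may differ from the rule `mask_phone` actually applies; `PHONE_RE` and `mask_phone_sketch` are hypothetical names.

```python
import re

# Assumption: a mobile number is exactly 11 digits, starts with 1, and is not
# embedded in a longer digit run. The first three digits are kept.
PHONE_RE = re.compile(r"(?<!\d)(1\d{2})\d{8}(?!\d)")


def mask_phone_sketch(text, mask_char="*"):
    """Keep the first three digits of each matched number and mask the other eight."""
    return PHONE_RE.sub(lambda m: m.group(1) + mask_char * 8, text)


print(mask_phone_sketch("手机号 13397238231"))
```

The lookarounds `(?<!\d)` and `(?!\d)` keep the pattern from masking part of a longer number, such as an order ID.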

## 3 TODO

- [x] add a way to get all methods of preprocess
- [ ] decorators

## 4 License

[![](https://award.dovolopor.com?lt=License&rt=MIT&rbc=green)](./LICENSE)

            
