easy-spider-tool


| Field | Value |
| --- | --- |
| Name | easy-spider-tool |
| Version | 1.0.16 |
| Home page | https://easy-spider-tool.xink.top/ |
| Summary | Simple, handy crawler utilities that reduce duplicated code and redundant files |
| Upload time | 2023-12-18 10:36:24 |
| Author | hanxinkong |
| Requires Python | >=3.6.8 |
| License | MIT |
| Keywords | easy, spider, tool |
| Requirements | No requirements were recorded. |

# easy spider tool

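A collection of simple, handy crawler utilities accumulated in day-to-day work, intended to reduce duplicated code and redundant files. I hope users find them just as useful. If you would like to contribute a good code snippet, please email the code and a description to [xinkonghan@gmail.com](mailto:xinkonghan@gmail.com). The code style follows my own preferences, so please point out any shortcomings!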

## Installation

```shell
pip install easy_spider_tool
```

## Main features

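- `Time`
    - `before_day` yesterday's date (can be used to step dates backward)
    - `after_day` tomorrow's date (can be used to step dates forward)
    - `between_day` between two dates
    - `current_date` the current time
    - `timestamp` the current timestamp (millisecond precision supported)
    - `date_parse` parses times in arbitrary formats (supports time zone conversion and keeping only the date or time part, with a configurable default value)
- `JSON`
    - `format_json` pretty-printed, formatted output
    - `jsonpath` parses any number of JSON paths (supports a default value and selecting the first matched value)
- `Hash digests`
    - `md5` MD5 digest of a string
- `Regex matching`
    - `regex_match` conditional matching (supports multiple unrelated conditions, a default value, and selecting the first matched value)
    - `for_to_regx_match` matching against multiple unrelated conditions (kept for compatibility with older versions)
- `Data cleaning / conversion`
    - `cookie_to_dic` converts a cookie to dictionary (Dict) format
    - `clear_value` removes a specified value from a list (List) or dictionary (Dict), recursing through all nested dictionaries and lists
- `Validation`
    - `verify_ip_address` IP address validation
    - `verify_domain_name` domain name validation
    - `verify_port` port validation
    - `verify_url` URL validation
- `Notifications`
    - None yet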
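
This README does not show the signatures of the helpers listed above, so the following is only a standard-library sketch of what a few of them do conceptually: cookie conversion, recursive value clearing, MD5 digests, and IP and port validation. Every function name ending in `_sketch` is hypothetical and is not the package's API; the real functions may take different arguments and handle more cases.

```python
import hashlib
import ipaddress


def cookie_to_dict_sketch(cookie: str) -> dict:
    """Turn a 'k1=v1; k2=v2' Cookie header string into a dict."""
    result = {}
    for pair in cookie.split(";"):
        if "=" not in pair:
            continue  # skip empty or malformed fragments
        key, value = pair.split("=", 1)  # a value may itself contain '='
        result[key.strip()] = value.strip()
    return result


def clear_value_sketch(data, unwanted):
    """Recursively drop every item equal to `unwanted` from nested lists and dicts."""
    if isinstance(data, dict):
        return {k: clear_value_sketch(v, unwanted)
                for k, v in data.items() if v != unwanted}
    if isinstance(data, list):
        return [clear_value_sketch(v, unwanted) for v in data if v != unwanted]
    return data  # scalars are returned unchanged


def md5_sketch(text: str) -> str:
    """Hex MD5 digest of a UTF-8 encoded string."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def is_valid_ip_sketch(value: str) -> bool:
    """True if `value` parses as an IPv4 or IPv6 address."""
    try:
        ipaddress.ip_address(value)
        return True
    except ValueError:
        return False


def is_valid_port_sketch(value) -> bool:
    """True if `value` converts to an integer in the 16-bit port range."""
    try:
        return 0 <= int(value) <= 65535
    except (TypeError, ValueError):
        return False


print(cookie_to_dict_sketch("sid=abc; token=x=y"))
print(clear_value_sketch({"a": None, "b": [1, None, {"c": None, "d": 2}]}, None))
print(md5_sketch("hello"))
print(is_valid_ip_sketch("192.168.1.1"), is_valid_port_sketch("8080"))
```

The recursive helper builds new containers instead of mutating its input, which keeps the original response data available for debugging.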

## Quick start

```python
from easy_spider_tool import format_json, jsonpath

data = {
    "code": 200,
    "data": [
        {
            "id": 1,
            "username": "admin",
            "level": "boss"
        },
        {
            "id": 2,
            "username": "user",
            "level": "staff"
        }
    ]
}

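# Filter expression: username of the entry whose level is "boss";
# first=True selects the first matched value
boss_name = jsonpath(data, '$.data[?(@.level=="boss")].username', first=True)
# Wildcard expression: usernames of every entry under "data"
all_user_info = jsonpath(data, '$.data[*].username')

print(boss_name)
print(format_json(all_user_info))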
```
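
For readers unfamiliar with JSONPath, the two expressions in the example above can be read as the plain-Python selections below. This is an illustrative equivalent using only the standard library, not the package's implementation. The variable names are made up for the example, and the exact return shape of `jsonpath` (for instance, a list versus a single value) is not stated in this README.

```python
import json

data = {
    "code": 200,
    "data": [
        {"id": 1, "username": "admin", "level": "boss"},
        {"id": 2, "username": "user", "level": "staff"},
    ],
}

# $.data[?(@.level=="boss")].username
# Keep only entries whose "level" equals "boss", then take their usernames.
boss_names = [row["username"] for row in data["data"] if row.get("level") == "boss"]
# Selecting the first match, with None as an assumed fallback when nothing matches.
first_boss = boss_names[0] if boss_names else None

# $.data[*].username
# Take the username of every entry in the "data" list.
all_usernames = [row["username"] for row in data["data"]]

print(first_boss)
# json.dumps with indent is the standard-library way to pretty-print.
print(json.dumps(all_usernames, indent=4, ensure_ascii=False))
```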

## Links

GitHub: https://github.com/hanxinkong/easy-spider-tool

Online documentation: https://easy-spider-tool.xink.top/

## Notes

This tool draws on the work of the author xingcweb, with some features added at my own discretion.


            

Raw data

{
    "_id": null,
    "home_page": "https://easy-spider-tool.xink.top/",
    "name": "easy-spider-tool",
    "maintainer": "",
    "docs_url": null,
    "requires_python": ">=3.6.8",
    "maintainer_email": "",
    "keywords": "easy,spider,tool",
    "author": "hanxinkong",
    "author_email": "xinkonghan@gmail.com",
    "download_url": "https://files.pythonhosted.org/packages/9e/8a/fd38d87f3b11713e6202b58e14ced3baa8a206b65a330ac43c6ca2990efe/easy_spider_tool-1.0.16.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "MIT",
    "summary": "\u7b80\u6613\u3001\u597d\u7528\u7684\u722c\u866b\u5de5\u5177,\u51cf\u5c11\u91cd\u590d\u4ee3\u7801\u4e0e\u6587\u4ef6\u5197\u4f59",
    "version": "1.0.16",
    "project_urls": {
        "Homepage": "https://easy-spider-tool.xink.top/"
    },
    "split_keywords": [
        "easy",
        "spider",
        "tool"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "e040023ea7750e28c69e61498bf35b8880eceb34f36db5b1eee83c6faebd484b",
                "md5": "fbdd546c5d6bc9dc405e85df65e9e082",
                "sha256": "23b460e777d41b1767135d89f455dbb18b41456fd02b134b106419f89fe8b889"
            },
            "downloads": -1,
            "filename": "easy_spider_tool-1.0.16-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "fbdd546c5d6bc9dc405e85df65e9e082",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.6.8",
            "size": 13521,
            "upload_time": "2023-12-18T10:36:22",
            "upload_time_iso_8601": "2023-12-18T10:36:22.995333Z",
            "url": "https://files.pythonhosted.org/packages/e0/40/023ea7750e28c69e61498bf35b8880eceb34f36db5b1eee83c6faebd484b/easy_spider_tool-1.0.16-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "9e8afd38d87f3b11713e6202b58e14ced3baa8a206b65a330ac43c6ca2990efe",
                "md5": "36ff87f57a700416b57c06275855e7b9",
                "sha256": "51b1561bff13c8706af99e60da6b847844dc80755af1e9ed9aef0b126712217f"
            },
            "downloads": -1,
            "filename": "easy_spider_tool-1.0.16.tar.gz",
            "has_sig": false,
            "md5_digest": "36ff87f57a700416b57c06275855e7b9",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.6.8",
            "size": 11970,
            "upload_time": "2023-12-18T10:36:24",
            "upload_time_iso_8601": "2023-12-18T10:36:24.924096Z",
            "url": "https://files.pythonhosted.org/packages/9e/8a/fd38d87f3b11713e6202b58e14ced3baa8a206b65a330ac43c6ca2990efe/easy_spider_tool-1.0.16.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2023-12-18 10:36:24",
    "github": false,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "lcname": "easy-spider-tool"
}
        