# easy spider tool
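A collection of simple, handy spider (web scraping) utilities distilled from day-to-day work, meant to cut down on duplicated code and redundant files. I hope they prove as useful to you as they have been to me. If you would like to contribute a good code snippet, please email the code along with a description to [xinkonghan@gmail.com](mailto:xinkonghan@gmail.com). The code style reflects my own preferences; if you spot any shortcomings, please point them out!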
## Installation
```shell
pip install easy_spider_tool
```
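## Main features
- `Time`
  - `before_day` Yesterday's date (can be used to step dates backward)
  - `after_day` Tomorrow's date (can be used to step dates forward)
  - `between_day` Between two dates
  - `current_date` Current time
  - `timestamp` Current timestamp (millisecond precision supported)
  - `date_parse` Parse times in any format (supports time-zone conversion and keeping only the date or time part, with configurable defaults)
- `JSON`
  - `format_json` Pretty, nicely formatted output
  - `jsonpath` Evaluate any number of JSON paths (supports default values and selecting the first match)
- `Hash digests`
  - `md5` MD5-hash a string
- `Regex matching`
  - `regex_match` Conditional matching (supports multiple independent patterns, default values, and selecting the first match)
  - `for_to_regx_match` Matching against multiple independent patterns (kept for backward compatibility)
- `Data cleaning/conversion`
  - `cookie_to_dic` Convert a cookie to dictionary (Dict) format
  - `clear_value` Remove specified values from a list (List) or dictionary (Dict), recursively through all nested dictionaries and lists
- `Validation`
  - `verify_ip_address` IP address validation
  - `verify_domain_name` Domain name validation
  - `verify_port` Port validation
  - `verify_url` URL validation
- `Notifications`
  - None yet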
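Most of these helpers are small, so their intent is easy to show with the standard library alone. The sketch below is a hypothetical stand-in, not the package's implementation: the function names mirror the list above, but every signature, parameter (`n`, `ms`, `targets`) and edge-case rule (for example, treating 1-65535 as the valid port range, or `None` and `""` as the default values to clear) is an assumption. Check the online documentation for the real signatures.

```python
import hashlib
import ipaddress
import time
from datetime import date, timedelta
from typing import Any, Dict, Union

# Hypothetical stdlib-only stand-ins for a few helpers in the feature list.
# Names mirror the list; signatures and behaviour are assumptions,
# not easy_spider_tool's actual API.


def before_day(day: date, n: int = 1) -> date:
    """Return the date n days before `day`."""
    return day - timedelta(days=n)


def after_day(day: date, n: int = 1) -> date:
    """Return the date n days after `day`."""
    return day + timedelta(days=n)


def timestamp(ms: bool = False) -> int:
    """Current Unix timestamp in seconds, or in milliseconds if ms=True."""
    now = time.time()
    return int(now * 1000) if ms else int(now)


def md5(text: str) -> str:
    """Hex MD5 digest of the UTF-8 encoding of `text`."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def cookie_to_dic(cookie: str) -> Dict[str, str]:
    """Parse a 'k1=v1; k2=v2' Cookie header string into a dict."""
    result = {}
    for part in cookie.split(";"):
        key, sep, value = part.strip().partition("=")
        if sep:  # ignore fragments that contain no '='
            result[key] = value
    return result


def clear_value(data: Any, targets: tuple = (None, "")) -> Any:
    """Recursively drop dict entries and list items whose value is in `targets`.

    Containers that become empty are kept (e.g. {"a": {"b": None}} -> {"a": {}}).
    """
    if isinstance(data, dict):
        return {k: clear_value(v, targets) for k, v in data.items() if v not in targets}
    if isinstance(data, list):
        return [clear_value(v, targets) for v in data if v not in targets]
    return data


def verify_ip_address(value: str) -> bool:
    """True if `value` is a valid IPv4 or IPv6 address."""
    try:
        ipaddress.ip_address(value)
    except ValueError:
        return False
    return True


def verify_port(value: Union[int, str]) -> bool:
    """True for an int (or decimal string) in the usable port range 1-65535."""
    if isinstance(value, bool):  # bool is a subclass of int; reject it
        return False
    if isinstance(value, str):
        if not value.strip().isdecimal():
            return False
        value = int(value)
    return isinstance(value, int) and 1 <= value <= 65535
```

Under these assumptions, `cookie_to_dic("a=1; b=2")` returns `{'a': '1', 'b': '2'}` and `clear_value({"a": None, "b": [1, ""]})` returns `{'b': [1]}`.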
## Quick start
```python
from easy_spider_tool import format_json, jsonpath

data = {
    "code": 200,
    "data": [
        {
            "id": 1,
            "username": "admin",
            "level": "boss"
        },
        {
            "id": 2,
            "username": "user",
            "level": "staff"
        }
    ]
}

# Filter expression: username of the entry whose level is "boss"; first=True keeps only the first match
boss_name = jsonpath(data, '$.data[?(@.level=="boss")].username', first=True)
# Wildcard: usernames of every entry under "data"
all_user_info = jsonpath(data, '$.data[*].username')

print(boss_name)
print(format_json(all_user_info))
```
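For comparison, the two queries in the example can be reproduced with plain Python comprehensions. This sketch assumes that `jsonpath` returns the list of all matches and that `first=True` returns only the first one (or a default when nothing matches); under that assumption the example prints `admin` followed by a formatted `["admin", "user"]`. The `json.dumps(..., indent=4, ensure_ascii=False)` call is only a guess at what the pretty output of `format_json` resembles, not its confirmed implementation.

```python
import json

data = {
    "code": 200,
    "data": [
        {"id": 1, "username": "admin", "level": "boss"},
        {"id": 2, "username": "user", "level": "staff"},
    ],
}

# Equivalent of '$.data[?(@.level=="boss")].username': filter, then project
boss_matches = [item["username"] for item in data["data"] if item.get("level") == "boss"]
# first=True is assumed to mean "first match, or a default (None here) if none"
boss_name = boss_matches[0] if boss_matches else None

# Equivalent of '$.data[*].username': project every element
all_user_info = [item["username"] for item in data["data"]]

print(boss_name)  # admin
# Standard-library pretty printing (assumed analogue of format_json)
print(json.dumps(all_user_info, indent=4, ensure_ascii=False))
```

The library's JSONPath strings become more convenient than hand-written comprehensions once paths get deeper or several alternative paths need to be tried in turn.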
## Links
GitHub: https://github.com/hanxinkong/easy-spider-tool

Online documentation: https://easy-spider-tool.xink.top/
## Notes
This tool builds on work by the author xingcweb, with some features added according to my own needs.
## Package metadata
- Name: `easy-spider-tool`
- Version: 1.0.16 (uploaded 2023-12-18)
- Summary: Simple, handy spider tools that reduce duplicated code and redundant files
- License: MIT
- Requires Python: >=3.6.8
- Author: hanxinkong (xinkonghan@gmail.com)
- Homepage: https://easy-spider-tool.xink.top/
- Keywords: easy, spider, tool
- Distribution files:
  - `easy_spider_tool-1.0.16-py3-none-any.whl` (wheel, 13,521 bytes), SHA256 `23b460e777d41b1767135d89f455dbb18b41456fd02b134b106419f89fe8b889`
  - `easy_spider_tool-1.0.16.tar.gz` (sdist, 11,970 bytes), SHA256 `51b1561bff13c8706af99e60da6b847844dc80755af1e9ed9aef0b126712217f`