# easy-spider-tool-document
An optional extension package for easy-spider-tool that provides unified xpath/jsonpath parsing.
## Installation
```shell
pip install easy-spider-tool[document]
```
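## Main features
- `data_extractor`: expression-based data extraction (supports jsonpath and xpath)
- `xpath`: data extraction with xpath syntax (supports a first-match option and a default value)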
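An aggregated extractor has to decide, for each expression, whether it is JSONPath or XPath. The sketch below shows one plausible way to make that decision. It is not the package's implementation: the function name `guess_expression_kind` is hypothetical, and the rule (a leading `$` means JSONPath, anything else is treated as XPath) is an assumption based on how the two syntaxes usually start.

```python
def guess_expression_kind(expression: str) -> str:
    """Classify an expression as 'jsonpath' or 'xpath' (hypothetical helper).

    Assumption: JSONPath expressions start at the root symbol '$',
    and every other non-empty expression is treated as XPath.
    """
    stripped = expression.strip()
    if not stripped:
        raise ValueError("expression must not be empty")
    if stripped.startswith("$"):
        return "jsonpath"
    return "xpath"


print(guess_expression_kind("//p//text()"))
print(guess_expression_kind("$.data[*].username"))
```

With a rule like this, a caller can pass a mixed list of expressions and let each one be routed to the matching parser.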
## Quick start
```python
from easy_spider_tool_document import data_extractor

# XPath against an HTML string: first=True asks for a single value, default='' is the fallback
data = '<p>This is an example of the document extension for easy_spider_tool</p>'
print(data_extractor(data, ['//p//text()'], first=True, default=''))
# This is an example of the document extension for easy_spider_tool

# JSONPath against a dict: first=False asks for every match as a list
data = {
    "code": 200,
    "data": [
        {
            "id": 1,
            "username": "admin",
            "level": "boss"
        },
        {
            "id": 2,
            "username": "user",
            "level": "staff"
        }
    ]
}

print(data_extractor(data, ['$.data[*].username'], first=False, default=''))
# ['admin', 'user']
```
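The `first` and `default` arguments used above follow a common pattern for reducing a list of matches. The sketch below illustrates that pattern with plain Python only. The helper `pick` is hypothetical and is not part of this package; in particular, what `data_extractor` actually returns when nothing matches is an assumption here, since the README does not state it.

```python
from typing import Any, List


def pick(matches: List[Any], first: bool = False, default: Any = None) -> Any:
    """Reduce a list of matches (hypothetical helper, not the package API).

    - no matches: return `default` (assumed behavior)
    - first=True: return only the first match
    - first=False: return the whole list
    """
    if not matches:
        return default
    return matches[0] if first else matches


usernames = ["admin", "user"]
print(pick(usernames, first=True, default=""))   # admin
print(pick(usernames, first=False, default=""))  # ['admin', 'user']
print(repr(pick([], first=True, default="")))    # ''
```

Returning `default` instead of raising on an empty result keeps scraping code free of repeated emptiness checks, at the cost of hiding expressions that silently stopped matching.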
## Links
GitHub: https://github.com/hanxinkong/easy-spider-tool-document
Online documentation: https://easy-spider-tool-document.xink.top/
## Notes
- Author: hanxinkong
- License: MIT
- Version: 1.0.13 (uploaded 2023-09-21)
- Requires Python: >=3.6.8