Name | hssp |
Version | 0.4.7 |
home_page | None |
Summary | A simple, fast asynchronous crawler framework |
upload_time | 2024-11-04 04:21:45 |
maintainer | None |
docs_url | None |
author | None |
requires_python | >=3.10 |
license | None |
keywords | async, crawler, crawler framework |
VCS | |
bugtrack_url | |
requirements | No requirements were recorded. |
Travis-CI | No Travis. |
coveralls test coverage | No coveralls. |
# HSSP Crawler Framework
A crawler framework built on Python asyncio (under development).
## Author
- [@昊色居士](https://github.com/x-haose)
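## Features
- Uses `parsel`, the selector library from the Scrapy framework, as the built-in web page selector
- Automatic retry on exceptions, based on tenacity
- Optional random User-Agent, based on fake-useragent
- A choice of downloaders: httpx, aiohttp, requests, curl-cffi, and others
- Listeners before a request, after a response, and after a retry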
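
The retry and listener features listed above can be pictured with a small standard-library sketch. It does not use hssp's actual API or tenacity; `fetch_with_retry`, `flaky_fetch`, and the three `on_*` callbacks are hypothetical names chosen only for illustration.

```python
import asyncio
from typing import Awaitable, Callable

# Hypothetical names throughout: this is not hssp's API.
Fetch = Callable[[str], Awaitable[str]]


async def fetch_with_retry(
    url: str,
    fetch: Fetch,
    attempts: int = 3,
    on_request: Callable[[str], None] = lambda url: None,
    on_response: Callable[[str, str], None] = lambda url, body: None,
    on_retry: Callable[[str, int, Exception], None] = lambda url, n, exc: None,
) -> str:
    """Call `fetch`, retrying on any exception and notifying listeners."""
    for attempt in range(1, attempts + 1):
        on_request(url)  # listener: before each request
        try:
            body = await fetch(url)
        except Exception as exc:
            if attempt == attempts:
                raise  # out of attempts: surface the last error
            on_retry(url, attempt, exc)  # listener: after a failed attempt
            continue
        on_response(url, body)  # listener: after a successful response
        return body
    raise ValueError("attempts must be at least 1")


async def demo() -> tuple[str, list[str]]:
    events: list[str] = []
    calls = 0

    async def flaky_fetch(url: str) -> str:
        # Stand-in downloader that fails twice, then succeeds.
        nonlocal calls
        calls += 1
        if calls < 3:
            raise ConnectionError("temporary failure")
        return "<html>ok</html>"

    body = await fetch_with_retry(
        "https://example.com",
        flaky_fetch,
        on_request=lambda url: events.append("request"),
        on_response=lambda url, body: events.append("response"),
        on_retry=lambda url, n, exc: events.append(f"retry {n}"),
    )
    return body, events


body, events = asyncio.run(demo())
print(events)
```

In this sketch the retry listener fires after each failed attempt and before the next one; the framework itself may order or name these events differently.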
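## Plans
- Temporarily switch downloaders during a request: for example, when net is initialized with the httpx downloader, a single request can switch to `DrissionPage` while all other requests keep using httpx
- Support a `DrissionPage` browser-rendering downloader
- Support a `playwright` browser-rendering downloader
- More configuration and customization options for curl-cffi
- Write detailed usage documentation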
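
The first planned item, a per-request downloader override, could look roughly like the following. This is only a sketch of the idea under stated assumptions: `Net`, `StubDownloader`, and the `downloader=` keyword are hypothetical and do not reflect hssp's real classes or signatures.

```python
import asyncio
from typing import Optional, Protocol


class Downloader(Protocol):
    """Anything with a name and an async download method (hypothetical)."""

    name: str

    async def download(self, url: str) -> str: ...


class StubDownloader:
    """Placeholder; a real one would wrap httpx, DrissionPage, and so on."""

    def __init__(self, name: str) -> None:
        self.name = name

    async def download(self, url: str) -> str:
        return f"{self.name} fetched {url}"


class Net:
    """Hypothetical client that holds a default downloader."""

    def __init__(self, downloader: Downloader) -> None:
        self.downloader = downloader

    async def get(self, url: str, downloader: Optional[Downloader] = None) -> str:
        # A downloader passed here applies to this call only;
        # the default set at initialization is left unchanged.
        chosen = downloader if downloader is not None else self.downloader
        return await chosen.download(url)


async def demo() -> list[str]:
    net = Net(StubDownloader("httpx"))
    browser = StubDownloader("DrissionPage")
    return [
        await net.get("https://example.com/a"),
        await net.get("https://example.com/b", downloader=browser),
        await net.get("https://example.com/c"),
    ]


results = asyncio.run(demo())
print(results)
```

Passing the override as an argument, rather than mutating the client, keeps concurrent requests from affecting one another.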
## Installation
### Install hssp with pip
```bash
pip install hssp
```
### Install hssp with rye
```bash
rye add hssp
```
## Roadmap
- Random User-Agent based on fake-useragent
- curl-cffi support
- DrissionPage support
## Support
For support, email xhrtxh@gmail.com.
## Development and Testing
The project uses `rye` to manage dependencies, so install rye first, then sync the environment:
```bash
rye sync
```
Raw data
{
"_id": null,
"home_page": null,
"name": "hssp",
"maintainer": null,
"docs_url": null,
"requires_python": ">=3.10",
"maintainer_email": null,
"keywords": "\u5f02\u6b65, \u722c\u866b, \u722c\u866b\u6846\u67b6",
"author": null,
"author_email": "\u660a\u8272\u5c45\u58eb <xhrtxh@gmail.com>",
"download_url": "https://files.pythonhosted.org/packages/13/9a/2fe1e2a34e94ebc455dbf4d807cb72548ae163831105bf88255f59e0828d/hssp-0.4.7.tar.gz",
"platform": null,
"description": "# HSSP \u722c\u866b\u6846\u67b6\n\n\u4e00\u4e2a\u57fa\u4e8epython asyncio\u5f00\u53d1\u7684\u722c\u866b\u6846\u67b6 (\u5f00\u53d1\u4e2d)\n\n## \u4f5c\u8005\n\n- [@\u660a\u8272\u5c45\u58eb](https://github.com/x-haose)\n\n## \u7279\u6027\n\n- \u4f7f\u7528scrapy\u6846\u67b6\u7684\u9009\u62e9\u5668`parsel`\u4f5c\u4e3a\u5185\u7f6e\u7f51\u9875\u9009\u62e9\u5668\n- \u57fa\u4e8etenacity\u7684\u81ea\u52a8\u5f02\u5e38\u91cd\u8bd5\n- \u57fa\u4e8efake-useragent\u7684\u53ef\u9009\u968f\u673aUA\n- \u53ef\u9009\u7684\u591a\u79cd\u4e0b\u8f7d\u5668: httpx\u3001aiohttp\u3001requests\u3001curl-cffi\u7b49\n- \u8bf7\u6c42\u524d\u3001\u54cd\u5e94\u540e\u3001\u91cd\u8bd5\u540e\u76d1\u542c\n\n## \u8ba1\u5212\n\n- \u5728\u60c5\u6c42\u8fc7\u7a0b\u4e2d\u4e34\u65f6\u66f4\u6362\u4e0b\u8f7d\u5668\uff1a\u6bd4\u5982net\u521d\u59cb\u5316\u65f6\u4f7f\u7528\u7684\u662fhttpx\u4e0b\u8f7d\u5668\uff0c\u5176\u4e2d\u4e00\u4e2a\u60c5\u6c42\u8981\u4e34\u65f6\u5207\u6362\u81f3 `DrissionPage`, \u5176\u4ed6\u7684\u4f9d\u65e7\u662fhttpx\n- \u652f\u6301 `DrissionPage` \u6d4f\u89c8\u5668\u6e32\u67d3\u7684\u4e0b\u8f7d\u5668\n- \u652f\u6301 `playwright` \u6d4f\u89c8\u5668\u6e32\u67d3\u7684\u4e0b\u8f7d\u5668\n- \u9488\u5bf9curl-cffi\u4f7f\u7528\u66f4\u591a\u4e86\u914d\u7f6e\u9879\u53ca\u81ea\u5b9a\u4e49\u9879\n- \u7f16\u5199\u8be6\u7ec6\u4f7f\u7528\u6587\u6863\n\n## \u5b89\u88c5\n\n###\n\n\u4f7f\u7528 pip \u5b89\u88c5 hssp\n\n```bash\npip install hssp\n```\n\n###\n\n\u4f7f\u7528 rye \u5b89\u88c5 hssp\n\n```bash\nrye add hssp\n```\n\n## \u8def\u7ebf\u56fe\n\n- \u57fa\u4e8efake-useragent\u7684\u968f\u673aUA\n- curl-cff\u7684\u652f\u6301\n- drissionpage\u7684\u652f\u6301\n\n## \u652f\u6301\n\n\u5982\u9700\u652f\u6301\uff0c\u8bf7\u53d1\u9001\u7535\u5b50\u90ae\u4ef6\u81f3 xhrtxh@gmail.com\u3002\n\n## \u5f00\u53d1\u6d4b\u8bd5\n\n\u9879\u76ee\u4f7f\u7528`rye`\u7ba1\u7406\u4f9d\u8d56\uff0c\u9700\u5148\u5b89\u88c5rye\n\n```bash\n rye sync\n```\n",
"bugtrack_url": null,
"license": null,
"summary": "\u4e00\u4e2a\u7b80\u5355\u5feb\u901f\u7684\u5f02\u6b65\u722c\u866b\u6846\u67b6",
"version": "0.4.7",
"project_urls": {
"documentation": "https://github.com/x-haose/hssp",
"homepage": "https://github.com/x-haose/hssp",
"repository": "https://github.com/x-haose/hssp"
},
"split_keywords": [
"\u5f02\u6b65",
" \u722c\u866b",
" \u722c\u866b\u6846\u67b6"
],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "dc0ea6068350f1c30cb2fa58de4d923ff6e00a2a7c9d2fdbd0d894722333ba7a",
"md5": "d841e0d749543dac268a823d3d46c6fe",
"sha256": "9a3a4b8eab3e7d53ee5589585ba870bae34f043cdeefe82fe2551f5560da4273"
},
"downloads": -1,
"filename": "hssp-0.4.7-py3-none-any.whl",
"has_sig": false,
"md5_digest": "d841e0d749543dac268a823d3d46c6fe",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.10",
"size": 20009,
"upload_time": "2024-11-04T04:21:43",
"upload_time_iso_8601": "2024-11-04T04:21:43.688511Z",
"url": "https://files.pythonhosted.org/packages/dc/0e/a6068350f1c30cb2fa58de4d923ff6e00a2a7c9d2fdbd0d894722333ba7a/hssp-0.4.7-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "139a2fe1e2a34e94ebc455dbf4d807cb72548ae163831105bf88255f59e0828d",
"md5": "0ab32e96b074b11b2cb4ece15e7e759c",
"sha256": "186f05c3c2a57e224a8340b50e1c2fe66b7d710357cec68e4b1fba67aa2c46b8"
},
"downloads": -1,
"filename": "hssp-0.4.7.tar.gz",
"has_sig": false,
"md5_digest": "0ab32e96b074b11b2cb4ece15e7e759c",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.10",
"size": 34645,
"upload_time": "2024-11-04T04:21:45",
"upload_time_iso_8601": "2024-11-04T04:21:45.390612Z",
"url": "https://files.pythonhosted.org/packages/13/9a/2fe1e2a34e94ebc455dbf4d807cb72548ae163831105bf88255f59e0828d/hssp-0.4.7.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2024-11-04 04:21:45",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "x-haose",
"github_project": "hssp",
"travis_ci": false,
"coveralls": false,
"github_actions": false,
"lcname": "hssp"
}