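# Changelog
- Added the `jsonp2json` static method
- The spider now keeps session state by default
- Added static methods for `get_uuid` and for `base64` encoding/decoding
- Removed `download_text` and `download_bdata`; both are merged into `download`
- Added the `update_default_headers` method
- `make_md5` accepts string or bytes arguments and supports an optional salt
- `send` gained a `delay` parameter, so a delay can be set per request
- Added the `tools` and `spiders` packages
- The thread pool manager supports the context-manager protocol and can be used with `with`
- Added the `get_results` method, which collects the return values of all `fs`
- Delay and timeout can be customized ahead of calling `send`
- The thread pool manager gained a `running` method for checking task status
- Added detailed comments to the `send` method
- Added the `todos` method; renamed `tools` to `utils`
- `done` gained a `func_name` parameter, which pinpoints the thread function that raised an exception
- `WaitPool`, `SpeedPool`
- Some parameter changes (renames, added annotations)
- Added some decorator functions
- Documented the `**kwargs` of the `send` method
- Added the `block` method for blocking
- Miscellaneous optimizations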
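
The changelog names helpers such as `make_md5` (string or bytes input, optional salt) and `jsonp2json` without showing them. The following is only a standalone sketch of what helpers of that kind typically do, written with the standard library. The names `make_md5_sketch` and `jsonp2json_sketch`, their signatures, and the choice to append the salt after the input are assumptions for illustration, not the package's actual code.

```python
import hashlib
import json
import re
from typing import Union


def make_md5_sketch(src: Union[str, bytes], salt: Union[str, bytes] = b"") -> str:
    """Hypothetical stand-in: MD5 hex digest of src with the salt appended."""
    if isinstance(src, str):
        src = src.encode("utf-8")
    if isinstance(salt, str):
        salt = salt.encode("utf-8")
    return hashlib.md5(src + salt).hexdigest()


def jsonp2json_sketch(jsonp: str):
    """Hypothetical stand-in: strip the callback wrapper and parse the JSON payload."""
    match = re.search(r"^[^(]*\((.*)\)\s*;?\s*$", jsonp.strip(), re.DOTALL)
    if match is None:
        raise ValueError("not a JSONP string")
    return json.loads(match.group(1))


print(make_md5_sketch("abc"))
print(make_md5_sketch(b"abc", salt="pepper"))
print(jsonp2json_sketch('callback({"a": 1});'))
```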
# Project description
- A spider class built on top of requests
# Python interpreter
- Python 3.10+
# How to use
## Import
```python
from wauo import WauoSpider
spider = WauoSpider()
```
## Requests
### GET
- Requests are sent as GET by default
```python
url = 'https://github.com/markadc'
resp = spider.send(url)
print(resp.text)
```
### POST
- If the `data` or `json` parameter is passed, the request is sent as POST
```python
api = 'https://github.com/markadc'
payload = {
    'key1': 'value1',
    'key2': 'value2'
}
resp = spider.send(api, data=payload)  # using the data parameter
resp = spider.send(api, json=payload)  # using the json parameter
```
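
The GET/POST rule described above can be sketched as a small function. This is an illustration under assumptions, not the package's code: `infer_method` is a hypothetical name, and it assumes the check is "argument was passed" (not `None`) rather than "argument is non-empty".

```python
def infer_method(data=None, json=None) -> str:
    """Hypothetical helper: pick POST when a body argument is supplied, else GET."""
    if data is not None or json is not None:
        return "POST"
    return "GET"


print(infer_method())                     # no body arguments
print(infer_method(data={"key1": "v1"}))  # form body supplied
print(infer_method(json={"key1": "v1"}))  # JSON body supplied
```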
## Responses
### Validating responses
#### 1. Restrict status codes
- If the status code is not in `codes`, the response is discarded (`send` then returns `None`)
```python
resp = spider.send('https://github.com/markadc', codes=[200, 301, 302])
```
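
Since a rejected response comes back as `None`, callers need to handle that case before touching `resp.text`. The sketch below imitates the status-code filter with a stand-in response object so it runs without network access. `filter_by_codes` is a hypothetical name, and treating `codes=None` as "accept everything" is an assumption.

```python
from types import SimpleNamespace


def filter_by_codes(response, codes=None):
    """Hypothetical helper: return the response only if its status code is allowed."""
    if codes is not None and response.status_code not in codes:
        return None
    return response


# Stand-in objects that only carry the attribute the filter reads.
ok = SimpleNamespace(status_code=200)
missing = SimpleNamespace(status_code=404)

for candidate in (ok, missing):
    resp = filter_by_codes(candidate, codes=[200, 301, 302])
    if resp is None:
        print("discarded", candidate.status_code)
    else:
        print("accepted", resp.status_code)
```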
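#### 2. Restrict response content
- If `checker` returns `False`, the response is discarded (`send` then returns `None`)
```python
def is_ok(response):
    html = response.text
    # Reject pages that contain the CAPTCHA marker text '验证码'
    if html.find('验证码') != -1:
        return False
    return True


resp = spider.send('https://github.com/markadc', checker=is_ok)
```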
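
A Python function that reaches its end without a `return` statement yields `None`, so a checker with no explicit success branch returns `None` for good pages. Whether `None` counts as a pass depends on how the caller tests the result. The sketch below assumes only an explicit `False` rejects; `apply_checker` and both checker functions are hypothetical names, not part of the package. Returning an explicit `True`/`False` from a checker avoids relying on that detail.

```python
from types import SimpleNamespace


def apply_checker(response, checker=None):
    """Hypothetical helper: discard the response only when checker returns False."""
    if checker is not None and checker(response) is False:
        return None
    return response


def implicit_checker(response):
    # Falls off the end (returns None) when the marker is absent.
    if "captcha" in response.text:
        return False


def explicit_checker(response):
    # Always returns a bool, so the result does not depend on how None is treated.
    return "captcha" not in response.text


good = SimpleNamespace(text="<html>hello</html>")
blocked = SimpleNamespace(text="<html>captcha required</html>")

print(implicit_checker(good))                  # None
print(explicit_checker(good))                  # True
print(apply_checker(blocked, explicit_checker))  # None
```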
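## Setting default request configuration
- Set a Cookie in the headers
- Set a proxy in the headers
- Set authentication information in the headers
- ...
### Example 1
- Send the `cookie` in the headers of every request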
```python
from wauo import WauoSpider
cookie = 'Your Cookies'
spider = WauoSpider(default_headers={'Cookie': cookie})
resp1 = spider.send('https://github.com/markadc')
resp2 = spider.send('https://github.com/markadc/wauo')
print(resp1.request.headers)
print(resp2.request.headers)
```
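
The idea behind default headers can be sketched as a dictionary merge: the defaults are applied to every request, and headers passed for a single request are layered on top. `build_headers` is a hypothetical helper, and the rule that per-request headers override defaults is an assumption, not confirmed behaviour of `WauoSpider`.

```python
def build_headers(default_headers, headers=None):
    """Hypothetical helper: merge per-request headers over the defaults."""
    merged = dict(default_headers)  # copy so the defaults are never mutated
    merged.update(headers or {})
    return merged


defaults = {"Cookie": "Your Cookies"}

first = build_headers(defaults)
second = build_headers(defaults, {"Referer": "https://github.com/markadc"})

print(first)
print(second)
```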
Raw data
{
"_id": null,
"home_page": "https://github.com/markadc/wauo",
"name": "wauo",
"maintainer": null,
"docs_url": null,
"requires_python": null,
"maintainer_email": null,
"keywords": "Python, Spider",
"author": "WangTuo",
"author_email": "markadc@126.com",
"download_url": "https://files.pythonhosted.org/packages/2a/da/ff908dc1d8a32cf374d773546a2514611ae2f405ead8be52fbf12b03a2ef/wauo-0.6.6.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": "MIT",
"summary": "\u722c\u866b\u8005\u7684\u8d34\u5fc3\u52a9\u624b",
"version": "0.6.6",
"project_urls": {
"Homepage": "https://github.com/markadc/wauo"
},
"split_keywords": [
"python",
" spider"
],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "2adaff908dc1d8a32cf374d773546a2514611ae2f405ead8be52fbf12b03a2ef",
"md5": "c2b81ee41604a2eb1f6b289bc72a7be7",
"sha256": "0dc8c4939b9f9f6ed07cb9c343ff26a5f59dc700e8a9d98ff2c7653d11392f8f"
},
"downloads": -1,
"filename": "wauo-0.6.6.tar.gz",
"has_sig": false,
"md5_digest": "c2b81ee41604a2eb1f6b289bc72a7be7",
"packagetype": "sdist",
"python_version": "source",
"requires_python": null,
"size": 10936,
"upload_time": "2024-12-17T14:08:23",
"upload_time_iso_8601": "2024-12-17T14:08:23.341837Z",
"url": "https://files.pythonhosted.org/packages/2a/da/ff908dc1d8a32cf374d773546a2514611ae2f405ead8be52fbf12b03a2ef/wauo-0.6.6.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2024-12-17 14:08:23",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "markadc",
"github_project": "wauo",
"travis_ci": false,
"coveralls": false,
"github_actions": false,
"requirements": [
{
"name": "fake_useragent",
"specs": [
[
"==",
"0.1.11"
]
]
},
{
"name": "loguru",
"specs": [
[
"==",
"0.5.3"
]
]
},
{
"name": "requests",
"specs": [
[
"==",
"2.28.1"
]
]
},
{
"name": "parsel",
"specs": [
[
"==",
"1.9.1"
]
]
}
],
"lcname": "wauo"
}