wauo

- Name: wauo
- Version: 0.6.6
- Home page: https://github.com/markadc/wauo
- Summary: A thoughtful assistant for web scraper developers
- Upload time: 2024-12-17 14:08:23
- Author: WangTuo
- License: MIT
- Keywords: python, spider
- Requirements: fake_useragent, loguru, requests, parsel

# Changelog

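- Added the `jsonp2json` static method
- The spider now keeps session state by default
- Added static methods for `get_uuid` and for `base64` encoding and decoding
- Removed `download_text` and `download_bdata`; both are merged into `download`
- Added the `update_default_headers` method
- `make_md5` accepts string or binary arguments and supports an optional salt
- `send` gained a `delay` parameter, so a delay can be set per request
- Added the `tools` and `spiders` packages
- The thread pool manager now supports the context manager protocol, so it can be used with `with`
- Added the `get_results` method, which collects the return values of all `fs`
- Delay and timeout can be configured ahead of time, before `send` is called
- The thread pool manager gained a `running` method for checking task status
- Added detailed comments to the `send` method
- Added the `todos` method; renamed `tools` to `utils`
- `done` gained a `func_name` parameter, which identifies the thread function that raised an exception
- `WaitPool` and `SpeedPool`
- Some parameter changes (renames, added annotations)
- Added some decorator functions
- Documented the `**kwargs` of the `send` method
- Added the `block` method, which blocks until work is finished
- Miscellaneous optimizations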
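
Two of the helpers named above, `jsonp2json` and the salted `make_md5`, are easy to picture with the standard library alone. The functions below (`jsonp_to_json_sketch`, `md5_with_salt_sketch`) are hypothetical stand-ins written for illustration, not wauo's own code; the real signatures, and in particular how wauo combines the salt with the input, may differ.

```python
import hashlib
import json
import re


def jsonp_to_json_sketch(jsonp: str):
    """Strip the callback wrapper from a JSONP string and parse the payload.

    Hypothetical stand-in; not the wauo implementation.
    """
    match = re.fullmatch(r"\s*[\w$.]+\s*\((.*)\)\s*;?\s*", jsonp, flags=re.DOTALL)
    if match is None:
        raise ValueError("input does not look like JSONP")
    return json.loads(match.group(1))


def md5_with_salt_sketch(value, salt: str = "") -> str:
    """Return the hex MD5 of `value` (str or bytes) with an optional salt appended.

    Appending the salt to the input is an assumption made for illustration.
    """
    data = value.encode("utf-8") if isinstance(value, str) else bytes(value)
    return hashlib.md5(data + salt.encode("utf-8")).hexdigest()


print(jsonp_to_json_sketch('callback({"name": "wauo", "ok": true});'))
print(md5_with_salt_sketch("hello") == md5_with_salt_sketch(b"hello"))
```

Accepting both `str` and `bytes` in one function mirrors the changelog entry: text is encoded to UTF-8 first, so both input types hash to the same digest.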

# Project description

- A spider class built on top of requests

# Python interpreter

- Python 3.10+

# How to use

## Import

```python
from wauo import WauoSpider

spider = WauoSpider()
```

## Requests

### GET

- `send` issues a GET request by default

```python
url = 'https://github.com/markadc'
resp = spider.send(url)
print(resp.text)
```

### POST

- Passing the `data` or `json` parameter makes `send` issue a POST request

```python
api = 'https://github.com/markadc'
payload = {
    'key1': 'value1',
    'key2': 'value2'
}
resp = spider.send(api, data=payload)  # send the payload with the data parameter
resp = spider.send(api, json=payload)  # send the payload with the json parameter
```
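
The rule above (GET unless a body is supplied) can be pictured as a small dispatch function. `pick_method` is a hypothetical name used only for illustration; how `send` makes this decision internally is not shown in this README, and treating an empty dict as a body is an assumption of the sketch.

```python
def pick_method(data=None, json=None) -> str:
    """Return the HTTP method implied by the arguments.

    POST when a body is given through `data` or `json`, otherwise GET.
    """
    return "POST" if data is not None or json is not None else "GET"


payload = {"key1": "value1", "key2": "value2"}
print(pick_method())               # GET
print(pick_method(data=payload))   # POST
print(pick_method(json=payload))   # POST
```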

## Responses

### Validating responses

#### 1. Restricting status codes

- If the status code is not in `codes`, the response is discarded and `send` returns `None`

```python
resp = spider.send('https://github.com/markadc', codes=[200, 301, 302])
```
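
Because a rejected response comes back as `None`, calling code needs to check for `None` before touching `resp.text`. The sketch below imitates the filtering rule with a hypothetical `keep_if_allowed` function and stand-in response objects built from `SimpleNamespace`; it does not call wauo or the network.

```python
from types import SimpleNamespace


def keep_if_allowed(response, codes=None):
    """Return the response when its status code is allowed, otherwise None."""
    if codes is not None and response.status_code not in codes:
        return None
    return response


# Stand-in responses; real ones would come from spider.send(...)
ok = SimpleNamespace(status_code=200, text="<html>ok</html>")
missing = SimpleNamespace(status_code=404, text="not found")

for candidate in (ok, missing):
    resp = keep_if_allowed(candidate, codes=[200, 301, 302])
    if resp is None:
        print("discarded", candidate.status_code)
    else:
        print("kept", resp.status_code)
```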

#### 2. Restricting response content

- If `checker` returns `False`, the response is discarded and `send` returns `None`

```python
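def is_ok(response):
    """Reject pages that show a CAPTCHA prompt."""
    html = response.text
    # '验证码' is the Chinese word for "CAPTCHA"; return an explicit bool in both cases
    return '验证码' not in html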


resp = spider.send('https://github.com/markadc', checker=is_ok)
```
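
When several marker strings should cause a response to be rejected, the checker can be generated rather than written by hand. `reject_if_contains` below is a hypothetical helper, not part of wauo; it only assumes that the checker receives an object with a `.text` attribute, as `is_ok` above does.

```python
from types import SimpleNamespace


def reject_if_contains(*markers):
    """Build a checker that returns False when any marker appears in response.text."""
    def checker(response) -> bool:
        return not any(marker in response.text for marker in markers)
    return checker


# '验证码' is the Chinese word for "CAPTCHA"
no_captcha = reject_if_contains("验证码", "captcha")

# Stand-in responses; real ones would come from spider.send(...)
blocked = SimpleNamespace(text="请输入验证码")
normal = SimpleNamespace(text="<html>profile page</html>")
print(no_captcha(blocked), no_captcha(normal))
```

The returned function has the same shape as `is_ok`, so it could be passed through the `checker` parameter in the same way.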

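## Setting default request configuration

- Set a Cookie in the headers
- Set a proxy in the headers
- Set authentication credentials in the headers
- ...

### Example 1

- Every request carries the `Cookie` header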

```python
from wauo import WauoSpider

cookie = 'Your Cookies'
spider = WauoSpider(default_headers={'Cookie': cookie})
resp1 = spider.send('https://github.com/markadc')
resp2 = spider.send('https://github.com/markadc/wauo')
print(resp1.request.headers)
print(resp2.request.headers)
```
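
The cookie string and an authentication header can be assembled with the standard library before they are handed to the spider. `cookie_header` and `basic_auth_header` are hypothetical helper names, and the cookie names and credentials are placeholders; only the `default_headers` argument itself comes from the example above.

```python
import base64


def cookie_header(cookies: dict) -> str:
    """Join name/value pairs into a Cookie header value such as 'a=1; b=2'."""
    return "; ".join(f"{name}={value}" for name, value in cookies.items())


def basic_auth_header(username: str, password: str) -> str:
    """Build the value of an HTTP Basic Authorization header."""
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    return f"Basic {token}"


# Placeholder values for illustration only
default_headers = {
    "Cookie": cookie_header({"sessionid": "abc123", "theme": "dark"}),
    "Authorization": basic_auth_header("user", "pass"),
}
print(default_headers)
```

The resulting dict has the same form as the `default_headers={'Cookie': cookie}` argument used in the example.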

            

Raw data

{
    "_id": null,
    "home_page": "https://github.com/markadc/wauo",
    "name": "wauo",
    "maintainer": null,
    "docs_url": null,
    "requires_python": null,
    "maintainer_email": null,
    "keywords": "Python, Spider",
    "author": "WangTuo",
    "author_email": "markadc@126.com",
    "download_url": "https://files.pythonhosted.org/packages/2a/da/ff908dc1d8a32cf374d773546a2514611ae2f405ead8be52fbf12b03a2ef/wauo-0.6.6.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "MIT",
    "summary": "\u722c\u866b\u8005\u7684\u8d34\u5fc3\u52a9\u624b",
    "version": "0.6.6",
    "project_urls": {
        "Homepage": "https://github.com/markadc/wauo"
    },
    "split_keywords": [
        "python",
        " spider"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "2adaff908dc1d8a32cf374d773546a2514611ae2f405ead8be52fbf12b03a2ef",
                "md5": "c2b81ee41604a2eb1f6b289bc72a7be7",
                "sha256": "0dc8c4939b9f9f6ed07cb9c343ff26a5f59dc700e8a9d98ff2c7653d11392f8f"
            },
            "downloads": -1,
            "filename": "wauo-0.6.6.tar.gz",
            "has_sig": false,
            "md5_digest": "c2b81ee41604a2eb1f6b289bc72a7be7",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": null,
            "size": 10936,
            "upload_time": "2024-12-17T14:08:23",
            "upload_time_iso_8601": "2024-12-17T14:08:23.341837Z",
            "url": "https://files.pythonhosted.org/packages/2a/da/ff908dc1d8a32cf374d773546a2514611ae2f405ead8be52fbf12b03a2ef/wauo-0.6.6.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2024-12-17 14:08:23",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "markadc",
    "github_project": "wauo",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": false,
    "requirements": [
        {
            "name": "fake_useragent",
            "specs": [
                [
                    "==",
                    "0.1.11"
                ]
            ]
        },
        {
            "name": "loguru",
            "specs": [
                [
                    "==",
                    "0.5.3"
                ]
            ]
        },
        {
            "name": "requests",
            "specs": [
                [
                    "==",
                    "2.28.1"
                ]
            ]
        },
        {
            "name": "parsel",
            "specs": [
                [
                    "==",
                    "1.9.1"
                ]
            ]
        }
    ],
    "lcname": "wauo"
}
        