wauo

Name: wauo
Version: 0.6.5.3
Home page: https://github.com/markadc/wauo
Summary: A handy helper for web scraper developers
Upload time: 2024-08-14 14:58:34
Maintainer: None
Docs URL: None
Author: WangTuo
Requires Python: None
License: MIT
Keywords: python, requests, spider
Requirements: fake_useragent, loguru, requests
Travis-CI: none
Coveralls test coverage: none

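# Changelog

- Added the `jsonp2json` static method
- The spider now keeps its session alive by default
- Added the `get_uuid` and Base64 encode/decode static methods
- Removed `download_text` and `download_bdata`; both are merged into `download`
- Added the `update_default_headers` method
- `make_md5` accepts either a string or a bytes argument and supports an optional salt
- `send` gained a `delay` parameter for delaying a request
- Added the `tools` and `spiders` packages
- The thread pool manager now supports the context manager protocol, so it can be used with `with`
- Added the `get_results` method, which collects the return values of all `fs`
- Delay and timeout can be customized in advance, before `send` is called
- The thread pool manager gained a `running` method for checking task status
- Added detailed comments to the `send` method
- Added the `todos` method; renamed `tools` to `utils`
- `done` gained a `func_name` parameter, which identifies the thread function that raised an exception
- `WaitPool`, `SpeedPool`
- Various parameter changes (renames, added annotations)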
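
To make two of the helpers named above more concrete, here is a minimal standard-library sketch of what a JSONP-to-JSON conversion and a salted MD5 digest might look like. The names `jsonp_to_json` and `md5_hex` are hypothetical stand-ins, not wauo's own `jsonp2json` or `make_md5`, and appending the salt after the data is an assumption; the real signatures and salt placement are not shown in this document.

```python
import hashlib
import json
import re
from typing import Union


def jsonp_to_json(text: str):
    """Strip a JSONP wrapper such as `callback({...});` and parse the payload."""
    # Hypothetical helper: everything before the first "(" is treated as the callback name.
    match = re.search(r"^[^(]*\((.*)\)\s*;?\s*$", text.strip(), re.S)
    if match is None:
        raise ValueError("input does not look like JSONP")
    return json.loads(match.group(1))


def md5_hex(data: Union[str, bytes], salt: str = "") -> str:
    """Return the MD5 hex digest of str or bytes input, with an optional salt."""
    if isinstance(data, str):
        data = data.encode("utf-8")
    # Assumption: the salt is appended after the data before hashing.
    return hashlib.md5(data + salt.encode("utf-8")).hexdigest()


print(jsonp_to_json('callback({"a": 1});'))
print(md5_hex("abc"))
```

Accepting both `str` and `bytes` means the same helper can hash page text and downloaded binary content alike.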

# Project description

- A spider class built on top of requests

# Python interpreter

- Python 3.10+

# How to use?

## Import

```python
from wauo import WauoSpider

spider = WauoSpider()
```

## Requests

### GET

- Requests are GET by default

```python
url = 'https://github.com/markadc'
resp = spider.send(url)
print(resp.text)
```

### POST

- Passing the `data` or `json` parameter turns the request into a POST

```python
api = 'https://github.com/markadc'
payload = {
    'key1': 'value1',
    'key2': 'value2'
}
resp = spider.send(api, data=payload)  # send the payload with the data parameter
resp = spider.send(api, json=payload)  # send the payload with the json parameter
```
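
The two parameters differ in how the payload is encoded. In requests, a dict passed as `data` is form-encoded, while a dict passed as `json` is serialized to a JSON body. Assuming `send` forwards both parameters to requests unchanged (this document does not confirm it), the bodies would look roughly like the output of this standard-library sketch:

```python
import json
from urllib.parse import urlencode

payload = {'key1': 'value1', 'key2': 'value2'}

# Roughly what data=payload produces: application/x-www-form-urlencoded
form_body = urlencode(payload)

# Roughly what json=payload produces: application/json
json_body = json.dumps(payload)

print(form_body)
print(json_body)
```

Which one to use depends on what the target endpoint expects; HTML form handlers usually want the form encoding, while JSON APIs want the JSON body.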

## Responses

### Validating responses

#### 1. Restrict status codes

- If the status code is not in `codes`, the response is discarded (`send` then returns `None`)

```python
resp = spider.send('https://github.com/markadc', codes=[200, 301, 302])
```
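
The rule above can be sketched without touching the network. The helpers `keep_if_allowed` and `body_or_empty` and the stand-in response objects are hypothetical; this is not wauo's internal code, only an illustration of the documented behaviour, assuming that leaving `codes` unset disables the check.

```python
from types import SimpleNamespace


def keep_if_allowed(response, codes=None):
    """Return the response if its status code is allowed, otherwise None."""
    if codes is not None and response.status_code not in codes:
        return None
    return response


def body_or_empty(response):
    """Guard against a discarded (None) response before reading its text."""
    return response.text if response is not None else ''


# Stand-in objects that mimic the two attributes used here.
ok = SimpleNamespace(status_code=200, text='hello')
missing = SimpleNamespace(status_code=404, text='not found')

print(body_or_empty(keep_if_allowed(ok, [200, 301, 302])))
print(repr(body_or_empty(keep_if_allowed(missing, [200, 301, 302]))))
```

Because a discarded response comes back as `None`, code that calls `send` with `codes` should check for `None` before reading `resp.text`.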

#### 2. Restrict response content

- If `checker` returns `False`, the response is discarded (`send` then returns `None`)

```python
def is_ok(response):
    html = response.text
    # '验证码' means "CAPTCHA"; return an explicit bool instead of falling through to None
    return '验证码' not in html


resp = spider.send('https://github.com/markadc', checker=is_ok)
```
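
When several marker strings should cause a response to be rejected, a small factory can build the checker. This is a sketch under assumptions: `make_checker` is a hypothetical helper, not part of wauo, and the stand-in objects only mimic the `text` attribute of a response.

```python
from types import SimpleNamespace


def make_checker(markers):
    """Build a checker that returns False when any marker appears in the body."""
    def checker(response) -> bool:
        return not any(marker in response.text for marker in markers)
    return checker


# '验证码' is the Chinese word for "CAPTCHA".
no_captcha = make_checker(['验证码', 'captcha'])

normal_page = SimpleNamespace(text='<html>repository list</html>')
blocked_page = SimpleNamespace(text='<html>请输入验证码</html>')

print(no_captcha(normal_page))
print(no_captcha(blocked_page))
```

The resulting function takes a response and returns a bool, so it could presumably be passed as `checker=no_captcha` in the same way as `is_ok`.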

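## Setting default request configuration

- Set a Cookie in the headers
- Set a proxy in the headers
- Set authentication info in the headers
- ...

### Example 1

- Send the `cookie` in the headers of every request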

```python
from wauo import WauoSpider

cookie = 'Your Cookies'
spider = WauoSpider(default_headers={'Cookie': cookie})
resp1 = spider.send('https://github.com/markadc')
resp2 = spider.send('https://github.com/markadc/wauo')
print(resp1.request.headers)
print(resp2.request.headers)
```
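
How default headers combine with headers given for a single request is not spelled out here. A common convention, shown below as an assumption rather than as wauo's confirmed behaviour, is that per-request headers override the defaults. The helper `merge_headers` is hypothetical.

```python
def merge_headers(default_headers, request_headers=None):
    """Combine default headers with per-request headers.

    Assumption: per-request values win over defaults with the same name.
    Note that a plain dict compares header names case-sensitively.
    """
    merged = dict(default_headers)
    merged.update(request_headers or {})
    return merged


defaults = {'Cookie': 'Your Cookies', 'Accept': '*/*'}

print(merge_headers(defaults))
print(merge_headers(defaults, {'Accept': 'application/json'}))
```

Copying the defaults before updating keeps the shared default dict unchanged between requests.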

            

Raw data

{
    "_id": null,
    "home_page": "https://github.com/markadc/wauo",
    "name": "wauo",
    "maintainer": null,
    "docs_url": null,
    "requires_python": null,
    "maintainer_email": null,
    "keywords": "python, requests, spider",
    "author": "WangTuo",
    "author_email": "markadc@126.com",
    "download_url": "https://files.pythonhosted.org/packages/b4/95/10968413a4efd12e9ba685d909de1d9af946af55cf070d3d8a0db419c28d/wauo-0.6.5.3.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "MIT",
    "summary": "\u722c\u866b\u8005\u7684\u8d34\u5fc3\u52a9\u624b",
    "version": "0.6.5.3",
    "project_urls": {
        "Homepage": "https://github.com/markadc/wauo"
    },
    "split_keywords": [
        "python",
        " requests",
        " spider"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "b49510968413a4efd12e9ba685d909de1d9af946af55cf070d3d8a0db419c28d",
                "md5": "d4669acde176b18193cd8e1c07f6467e",
                "sha256": "7f70df36f17596e47bd18b5358b485978f6d02bca8f040afc6c7268f1a2759db"
            },
            "downloads": -1,
            "filename": "wauo-0.6.5.3.tar.gz",
            "has_sig": false,
            "md5_digest": "d4669acde176b18193cd8e1c07f6467e",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": null,
            "size": 8577,
            "upload_time": "2024-08-14T14:58:34",
            "upload_time_iso_8601": "2024-08-14T14:58:34.464565Z",
            "url": "https://files.pythonhosted.org/packages/b4/95/10968413a4efd12e9ba685d909de1d9af946af55cf070d3d8a0db419c28d/wauo-0.6.5.3.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2024-08-14 14:58:34",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "markadc",
    "github_project": "wauo",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": false,
    "requirements": [
        {
            "name": "fake_useragent",
            "specs": [
                [
                    "==",
                    "0.1.11"
                ]
            ]
        },
        {
            "name": "loguru",
            "specs": [
                [
                    "==",
                    "0.5.3"
                ]
            ]
        },
        {
            "name": "requests",
            "specs": [
                [
                    "==",
                    "2.28.1"
                ]
            ]
        }
    ],
    "lcname": "wauo"
}
        