SmartProxyPool


- Name: SmartProxyPool
- Version: 1.1.5
- Home page: https://github.com/yanjlee/SmartProxyPool
- Summary: A high-quality, highly flexible open proxy pool service; possibly the `world's first` proxy pool service with `smart dynamic proxying`.
- Upload time: 2024-06-01 08:25:52
- Author: yanjlee
- Requirements: APScheduler==3.2.0, Werkzeug==0.15.3, Flask==1.0.2, requests==2.20.0, lxml==4.3.3, gevent==1.4.0, Flask-RESTful==0.3.6, ipip-datx==0.4.0, pymongo==3.7.2, flask-mongoengine==0.8.2, Flask-Admin==1.5.3, Flask-Security==3.0.0
            
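A high-quality, highly flexible open proxy pool service.

Possibly the `world's first` proxy pool service with `smart dynamic proxying`.

That is a big boast, and it will be hard to walk back.

[ProxyPool Demo](http://proxy.1again.cc:35050/api/v1/proxy/) (This is only a demo; don't expect it to be very stable!)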

---

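# Features

Our goals are `high quality` and `high flexibility`.

Every feature is built around these two goals:

1. Every proxy carries a verification `count` and `score`: successful verifications / total verifications == proxy availability rate (database view)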

![](Docs/images/2019-06-12-22-11-21.png)

2. Dynamic proxy support (emphasis intended): each request through the proxy port can leave from a different IP

```
root@1again:~# curl -x "proxy.1again.cc:36050" https://httpbin.org/ip
{
  "origin": "183.82.32.56"
}
root@1again:~# curl -x "proxy.1again.cc:36050" https://httpbin.org/ip
{
  "origin": "200.149.19.170"
}
root@1again:~# curl -x "proxy.1again.cc:36050" https://httpbin.org/ip
{
  "origin": "125.21.43.82"
}
root@1again:~# curl -x "proxy.1again.cc:36050" https://httpbin.org/ip
{
  "origin": "110.52.235.124"
}
root@1again:~# curl -x "proxy.1again.cc:36050" https://httpbin.org/ip
{
  "origin": "176.74.134.6"
}
```

3. When fetching a proxy you can filter by `https` support, by `type` (transparent or ordinary anonymous), and by the proxy's `region`. For example:

```
# Get a proxy that supports https
http://proxy.1again.cc:35050/api/v1/proxy/?https=1

# Get an anonymous proxy
http://proxy.1again.cc:35050/api/v1/proxy/?type=2

# Get a proxy located in China (region=中国)
http://proxy.1again.cc:35050/api/v1/proxy/?region=中国

# Get a proxy located outside China
http://proxy.1again.cc:35050/api/v1/proxy/?region=!中国

# Get an anonymous proxy that supports https and is located in China
http://proxy.1again.cc:35050/api/v1/proxy/?https=1&type=2&region=中国
```

4. [Web admin page](http://proxy.1again.cc:35050/admin) Username: admin, password: admin (Mess with it and you'll get a beating!)

![](Docs/images/2019-06-15-08-18-36.png)

5. Parameters can be configured through the web UI.

![](Docs/images/2019-06-15-13-18-47.png)

6. Web management of the `sites that proxies are fetched from`

![](Docs/images/2019-06-12-22-22-46.png)

7. Supports the `gevent` concurrency mode, and it works well. Don't trust the advertising, look at the results!

```
2019-06-13 10:00:26,656 ProxyFetch.py[line:103] INFO fetch [   xicidaili   ] proxy finish,             total:400, succ:65, fail:0, skip:335, elapsed_time:1s
2019-06-13 10:00:26,662 ProxyFetch.py[line:103] INFO fetch [ proxylistplus ] proxy finish,             total:0, succ:0, fail:0, skip:0, elapsed_time:1s
2019-06-13 10:00:27,179 ProxyFetch.py[line:103] INFO fetch [     iphai     ] proxy finish,             total:83, succ:17, fail:0, skip:66, elapsed_time:2s
2019-06-13 10:00:27,374 ProxyFetch.py[line:103] INFO fetch [     66ip      ] proxy finish,             total:0, succ:0, fail:0, skip:0, elapsed_time:2s
2019-06-13 10:00:32,276 ProxyFetch.py[line:103] INFO fetch [    ip3366     ] proxy finish,             total:15, succ:0, fail:0, skip:15, elapsed_time:7s
2019-06-13 10:00:33,888 ProxyFetch.py[line:103] INFO fetch [     ip181     ] proxy finish,             total:0, succ:0, fail:0, skip:0, elapsed_time:8s
2019-06-13 10:00:34,978 ProxyFetch.py[line:103] INFO fetch [    mimiip     ] proxy finish,             total:0, succ:0, fail:0, skip:0, elapsed_time:9s
2019-06-13 10:00:38,182 ProxyFetch.py[line:103] INFO fetch [  proxy-list   ] proxy finish,             total:28, succ:28, fail:0, skip:0, elapsed_time:13s
2019-06-13 10:01:36,432 ProxyVerify.py[line:301] INFO useful_proxy verify proxy finish, total:636, succ:327, fail:309, elapsed_time:58s
2019-06-13 10:31:15,800 ProxyVerify.py[line:301] INFO useful_proxy verify proxy finish, total:481, succ:299, fail:182, elapsed_time:37s
2019-06-13 11:01:37,569 ProxyVerify.py[line:301] INFO useful_proxy verify proxy finish, total:639, succ:315, fail:324, elapsed_time:59s
2019-06-13 11:31:54,798 ProxyVerify.py[line:301] INFO useful_proxy verify proxy finish, total:977, succ:342, fail:635, elapsed_time:76s
2019-06-13 12:01:21,659 ProxyVerify.py[line:301] INFO useful_proxy verify proxy finish, total:608, succ:314, fail:294, elapsed_time:43s
```

8. I've run out of things to make up. If you can do better, go ahead!
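
The availability rate from item 1 is plain arithmetic, and a small sketch makes it concrete. The record layout below (`proxy`, `succ`, `total`) is a hypothetical stand-in, not the project's database schema, and treating a never-verified proxy as 0.0 is an assumption made here for illustration.

```python
def availability_rate(succ_count: int, total_count: int) -> float:
    """Return successful verifications / total verifications.

    A proxy that has never been verified gets 0.0 (an assumption of this sketch).
    """
    if total_count <= 0:
        return 0.0
    return succ_count / total_count


# Hypothetical records; field names are illustrative only.
proxies = [
    {"proxy": "10.0.0.2:3128", "succ": 1, "total": 4},
    {"proxy": "10.0.0.1:8080", "succ": 9, "total": 10},
    {"proxy": "10.0.0.3:80", "succ": 0, "total": 0},
]

# Rank proxies from most to least reliable.
ranked = sorted(
    proxies,
    key=lambda p: availability_rate(p["succ"], p["total"]),
    reverse=True,
)

for p in ranked:
    print(p["proxy"], f'{availability_rate(p["succ"], p["total"]):.2f}')
```

Sorting by this ratio is one simple way to prefer proxies that have passed verification more often.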

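# Documentation

[Design document](Docs/Design.md)

# Status

The project basically meets the original goals. Next up: writing documentation and optimizing the code.

# Use cases

1. Mainly web crawlers.

2. A company needs an internal proxy pool service for some utterly unconscionable deeds.

3. An individual needs one for things best kept out of sight.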

# Installation/Deployment

## Production

```shell
# Install Docker
curl -sSL https://get.docker.com | sh

# start mongo database
docker run -d --name mongo -v /data/mongodb:/data -p 27017:27017 mongo

# Start proxy_pool container
docker run -d --name proxy_pool --link mongo:proxy_pool_db -p 35050:35050 -p 36050:36050 1again/proxy_pool
```
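
After the containers start, it can help to confirm that the two published ports (35050 for the API and admin UI, 36050 for the dynamic proxy) accept connections. The following is a minimal sketch using only the Python standard library; the function name `port_is_open` and the `localhost` host are assumptions for illustration, not part of the project.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


def check_proxy_pool(host: str = "localhost") -> dict:
    """Check the API port and the dynamic proxy port from the docker commands above."""
    return {port: port_is_open(host, port) for port in (35050, 36050)}
```

Calling `check_proxy_pool()` on the Docker host returns a mapping such as `{35050: True, 36050: True}` once both services are listening.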

## Development

```shell
# Clone Repo
git clone https://github.com/1again/ProxyPool

# Enter the repository directory
cd ProxyPool

# Install Docker
curl -sSL https://get.docker.com | sh

# start mongo database
docker run -d --name mongo -v /data/mongodb:/data -p 27017:27017 mongo

# Start proxy_pool container
docker run -it --rm --link mongo:proxy_pool_db -v $(pwd):/usr/src/app -p 35050:35050 -p 36050:36050 1again/proxy_pool
```

# Usage

A few minutes after startup you will see the fetched proxy IPs. You can view them directly in the web admin UI.

## DYNAMIC PROXY

```shell
curl -x 'your_server_ip:36050' your_access_url

# For example:
curl -x "proxy.1again.cc:36050" https://httpbin.org/ip
```
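
The same dynamic proxy can be used from Python. This is a minimal sketch with the standard library's `urllib.request`; the helper names `proxy_settings`, `build_proxy_opener`, and `fetch_through_proxy` are hypothetical, and `your_server_ip:36050` stands for your own deployment.

```python
import urllib.request


def proxy_settings(proxy_address: str) -> dict:
    """Map both http and https traffic to the same proxy endpoint (host:port)."""
    proxy_url = f"http://{proxy_address}"
    return {"http": proxy_url, "https": proxy_url}


def build_proxy_opener(proxy_address: str) -> urllib.request.OpenerDirector:
    """Build an opener that routes every request through the dynamic proxy."""
    handler = urllib.request.ProxyHandler(proxy_settings(proxy_address))
    return urllib.request.build_opener(handler)


def fetch_through_proxy(proxy_address: str, url: str, timeout: float = 10.0) -> str:
    """Fetch a URL through the proxy and return the response body as text."""
    opener = build_proxy_opener(proxy_address)
    with opener.open(url, timeout=timeout) as response:
        return response.read().decode("utf-8")


# Usage (performs network I/O, so it is left commented out):
# print(fetch_through_proxy("your_server_ip:36050", "https://httpbin.org/ip"))
```

Free proxies fail often, so real callers would likely wrap `fetch_through_proxy` in a retry loop; that is left out here to keep the sketch short.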

## RESTFUL API

```python

API_LIST = {
    "/api/v1/proxy/": {
        "args": {
            "https": {
                "value": [1],
                "desc": "need https proxy? 1 == true",
                "required": False,
            },
            "region": {
                "value": "region name like 中国 or 广州 or 江苏",
                "desc": "Get Region Proxy",
                "required": False,
            },
            "type": {
                "value": [1,2],
                "desc": "clear proxy 1 or (common) anonymous 2",
                "required": False,
            }
        },
        "desc": "Get A Random Proxy"
    },
    "/api/v1/proxies/": {
        "args": {
            "https": {
                "value": [1],
                "desc": "need https proxy? 1 == true",
                "required": False,
            },
            "region": {
                "value": "region name like 中国 or 广州 or 江苏",
                "desc": "Get Region Proxy",
                "required": False,
            },
            "type": {
                "value": [1,2],
                "desc": "clear proxy 1 or (common) anonymous 2",
                "required": False,
            }
        },
        "desc": "Get All Proxy",
    },
}

```
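
The query arguments in `API_LIST` can be assembled programmatically. The sketch below builds request URLs with `urllib.parse`; the function name `build_proxy_query` and its keyword arguments are assumptions for illustration, and the response format is not shown in this README, so the sketch stops at URL construction.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def build_proxy_query(base_url, https=False, proxy_type=None,
                      region=None, exclude_region=False):
    """Build a URL for /api/v1/proxy/ or /api/v1/proxies/ with optional filters.

    https=True          -> https=1
    proxy_type=1 or 2   -> type=1 (transparent) or type=2 (anonymous)
    region="中国"        -> region=中国, or region=!中国 when exclude_region is True
    """
    params = {}
    if https:
        params["https"] = 1
    if proxy_type is not None:
        if proxy_type not in (1, 2):
            raise ValueError("proxy_type must be 1 or 2")
        params["type"] = proxy_type
    if region:
        params["region"] = f"!{region}" if exclude_region else region
    query = urlencode(params)  # percent-encodes non-ASCII values as UTF-8
    return f"{base_url}?{query}" if query else base_url


url = build_proxy_query(
    "http://localhost:35050/api/v1/proxy/",
    https=True,
    proxy_type=2,
    region="中国",
)
print(url)
```

Percent-encoding the region keeps the URL valid for any HTTP client; decoding the query string recovers the original value.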

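## Extending proxy sources

The project ships with several fetchers for free proxy sources.

If you come across a good free proxy source, you can add your own fetcher for it.

To add a new fetcher:

First, add your fetcher class to the `Src/Fetcher/fetchers/` directory.

The class needs a `run` method that works as a generator (using `yield`) and returns proxies in `host:port` format. For example: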

```python

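# The file name is up to you; matching the middle part of `fetcher_host` makes it easy to recognize.
# The class name must be `CustomFetcher`.
# NOTE: `getHtmlTree` is the project's helper. It is used here as a function that fetches
# a URL and returns a tree supporting `.xpath()`; import it from its module in your checkout.
class CustomFetcher():
    # Used only for identification; this value is mapped into the database.
    fetcher_host = "www.66ip.cn"

    def run(self):
        url_list = [
            'http://www.xxx.com/',
        ]
        for url in url_list:
            html_tree = getHtmlTree(url)
            ul_list = html_tree.xpath('//ul[@class="l2"]')
            for ul in ul_list:
                try:
                    # Join the first two <li> texts (host and port) into `host:port`.
                    yield ':'.join(ul.xpath('.//li/text()')[0:2])
                except Exception as e:
                    print(e)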
```

`ProxyFetchSchedule` fetches proxies at a regular interval. On the next fetch it will automatically detect and call the fetcher you defined.
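
Scraped pages often contain junk rows, so a fetcher may want to validate each candidate before yielding it. The sketch below is one way to do that with the standard library only; `is_valid_proxy`, `fetch_rows`, and the `www.example.com` host are hypothetical, and `fetch_rows` returns hard-coded data in place of real scraping.

```python
import re

# IPv4 address, a colon, then a port of 1 to 5 digits.
PROXY_PATTERN = re.compile(r"^(\d{1,3}(?:\.\d{1,3}){3}):(\d{1,5})$")


def is_valid_proxy(candidate: str) -> bool:
    """Return True if candidate looks like `host:port` with sane numeric ranges."""
    match = PROXY_PATTERN.match(candidate.strip())
    if not match:
        return False
    octets_ok = all(0 <= int(part) <= 255 for part in match.group(1).split("."))
    port_ok = 1 <= int(match.group(2)) <= 65535
    return octets_ok and port_ok


class CustomFetcher:
    # Hypothetical source name, used only for identification.
    fetcher_host = "www.example.com"

    def fetch_rows(self):
        """Stand-in for real scraping: returns (host, port) text pairs."""
        return [
            ("183.82.32.56", "8080"),
            ("not-an-ip", "80"),
            ("10.0.0.1", "99999"),
        ]

    def run(self):
        # Yield only well-formed `host:port` strings.
        for host, port in self.fetch_rows():
            candidate = f"{host.strip()}:{port.strip()}"
            if is_valid_proxy(candidate):
                yield candidate


print(list(CustomFetcher().run()))
```

Filtering in the fetcher keeps malformed entries out of the pool before verification runs; whether the project already does this elsewhere is not stated in this README.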

# Contributing

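Thank you for your support; it makes the project better!

To keep things orderly and clear, we need to agree on a few simple conventions.

There are two main branches:

- `develop`: content for the next release
- `master`: content of the current stable release

1. Small fixes that do not affect existing versions can be made on `develop`, followed by a pull request.
2. Major changes that affect earlier versions need a new branch, e.g. `feature_random_proxy`, followed by a pull request.

I will merge the new branch into `develop`, and merge it into `master` after it has run on the demo machine for a while.

That's all, thank you!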

# Feedback

Please report any problems in [Issues](https://github.com/1again/ProxyPool/issues).

Our goal: no cavities!

            
