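A high-quality, highly flexible open proxy pool service
Possibly `the world's first` proxy pool service with `smart dynamic proxying`.
That's quite a boast, and now there's no walking it back.
[ProxyPool Demo](http://proxy.1again.cc:35050/api/v1/proxy/) (I'm just a demo, so don't count on me being stable!)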
---
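# Features
Our goals are `high quality` and `high flexibility`.
Every feature is built around these two goals:
1. Every proxy carries a verification `count` and a `score`. Availability rate = successful verifications / total verifications (shown here in the database view).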
![](Docs/images/2019-06-12-22-11-21.png)
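2. Dynamic proxy support (pretend this is in bold): repeated requests through the same endpoint exit from different IPs.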
```
root@1again:~# curl -x "proxy.1again.cc:36050" https://httpbin.org/ip
{
"origin": "183.82.32.56"
}
root@1again:~# curl -x "proxy.1again.cc:36050" https://httpbin.org/ip
{
"origin": "200.149.19.170"
}
root@1again:~# curl -x "proxy.1again.cc:36050" https://httpbin.org/ip
{
"origin": "125.21.43.82"
}
root@1again:~# curl -x "proxy.1again.cc:36050" https://httpbin.org/ip
{
"origin": "110.52.235.124"
}
root@1again:~# curl -x "proxy.1again.cc:36050" https://httpbin.org/ip
{
"origin": "176.74.134.6"
}
```
3. When fetching a proxy, you can filter by `https` support, by `type` (transparent or anonymous), and by the proxy's `region`. For example:
```
# Get a proxy that supports https
http://proxy.1again.cc:35050/api/v1/proxy/?https=1
# Get an anonymous proxy
http://proxy.1again.cc:35050/api/v1/proxy/?type=2
# Get a proxy located in China
http://proxy.1again.cc:35050/api/v1/proxy/?region=中国
# Get a proxy not located in China
http://proxy.1again.cc:35050/api/v1/proxy/?region=!中国
# Get a proxy that supports https, is anonymous, and is located in China
http://proxy.1again.cc:35050/api/v1/proxy/?https=1&type=2&region=中国
```
4. [Web admin interface](http://proxy.1again.cc:35050/admin), username: admin, password: admin (mess with it and there will be consequences!)
![](Docs/images/2019-06-15-08-18-36.png)
5. Parameters can be configured through the web interface.
![](Docs/images/2019-06-15-13-18-47.png)
6. Manage the `proxy source sites` from the web interface.
![](Docs/images/2019-06-12-22-22-46.png)
7. Supports `gevent` concurrency mode. It works great; don't take the ad's word for it, look at the results!
```
2019-06-13 10:00:26,656 ProxyFetch.py[line:103] INFO fetch [ xicidaili ] proxy finish, total:400, succ:65, fail:0, skip:335, elapsed_time:1s
2019-06-13 10:00:26,662 ProxyFetch.py[line:103] INFO fetch [ proxylistplus ] proxy finish, total:0, succ:0, fail:0, skip:0, elapsed_time:1s
2019-06-13 10:00:27,179 ProxyFetch.py[line:103] INFO fetch [ iphai ] proxy finish, total:83, succ:17, fail:0, skip:66, elapsed_time:2s
2019-06-13 10:00:27,374 ProxyFetch.py[line:103] INFO fetch [ 66ip ] proxy finish, total:0, succ:0, fail:0, skip:0, elapsed_time:2s
2019-06-13 10:00:32,276 ProxyFetch.py[line:103] INFO fetch [ ip3366 ] proxy finish, total:15, succ:0, fail:0, skip:15, elapsed_time:7s
2019-06-13 10:00:33,888 ProxyFetch.py[line:103] INFO fetch [ ip181 ] proxy finish, total:0, succ:0, fail:0, skip:0, elapsed_time:8s
2019-06-13 10:00:34,978 ProxyFetch.py[line:103] INFO fetch [ mimiip ] proxy finish, total:0, succ:0, fail:0, skip:0, elapsed_time:9s
2019-06-13 10:00:38,182 ProxyFetch.py[line:103] INFO fetch [ proxy-list ] proxy finish, total:28, succ:28, fail:0, skip:0, elapsed_time:13s
2019-06-13 10:01:36,432 ProxyVerify.py[line:301] INFO useful_proxy verify proxy finish, total:636, succ:327, fail:309, elapsed_time:58s
2019-06-13 10:31:15,800 ProxyVerify.py[line:301] INFO useful_proxy verify proxy finish, total:481, succ:299, fail:182, elapsed_time:37s
2019-06-13 11:01:37,569 ProxyVerify.py[line:301] INFO useful_proxy verify proxy finish, total:639, succ:315, fail:324, elapsed_time:59s
2019-06-13 11:31:54,798 ProxyVerify.py[line:301] INFO useful_proxy verify proxy finish, total:977, succ:342, fail:635, elapsed_time:76s
2019-06-13 12:01:21,659 ProxyVerify.py[line:301] INFO useful_proxy verify proxy finish, total:608, succ:314, fail:294, elapsed_time:43s
```
8. I've run out of things to make up. If you can do better, be my guest!
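
The availability rate from item 1 can be sketched in a few lines. This is only an illustration of the formula; the function name and the zero-verification fallback are assumptions, not the project's actual scoring code.

```python
def availability_rate(succ: int, total: int) -> float:
    """Availability rate = successful verifications / total verifications.

    Hypothetical helper: returning 0.0 for a proxy that has never been
    verified is an assumption made for this sketch.
    """
    if total <= 0:
        return 0.0
    return succ / total


# Example with made-up counts: 8 successful checks out of 10.
print(availability_rate(8, 10))
```

A rate computed this way could be used to rank proxies, which is presumably the point of keeping both counters per proxy.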
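# Documentation
[Design document](Docs/Design.md)
# Status
The original goals are largely met; next up are documentation and code optimization.
# Use cases
1. Mainly web crawling.
2. A company needs an internal proxy pool service for doing some utterly unconscionable things.
3. An individual needs it for things that can't see the light of day.
# Installation/Deployment
## Production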
```shell
# Install Docker
curl -sSL https://get.docker.com | sh
# start mongo database
docker run -d --name mongo -v /data/mongodb:/data -p 27017:27017 mongo
# Start proxy_pool container
docker run -d --name proxy_pool --link mongo:proxy_pool_db -p 35050:35050 -p 36050:36050 1again/proxy_pool
```
## Development
```shell
# Clone Repo
git clone https://github.com/1again/ProxyPool
# Enter the repo directory
cd ProxyPool
# Install Docker
curl -sSL https://get.docker.com | sh
# start mongo database
docker run -d --name mongo -v /data/mongodb:/data -p 27017:27017 mongo
# Start proxy_pool container
docker run -it --rm --link mongo:proxy_pool_db -v $(pwd):/usr/src/app -p 35050:35050 -p 36050:36050 1again/proxy_pool
```
# Usage
A few minutes after startup you will see the fetched proxy IPs; you can view them directly in the web admin interface.
## DYNAMIC PROXY
```shell
curl -x 'your_server_ip:36050' your_access_url
# For example:
curl -x "proxy.1again.cc:36050" https://httpbin.org/ip
```
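
The same request can be made from Python. The sketch below uses only the standard library and assumes the dynamic proxy listens on port 36050 as in the `curl` example; the helper names `proxy_map` and `fetch_origin` are made up for illustration.

```python
import json
import urllib.request


def proxy_map(server: str, port: int = 36050) -> dict:
    """Build a scheme -> proxy URL mapping for the dynamic proxy endpoint."""
    endpoint = f"http://{server}:{port}"
    return {"http": endpoint, "https": endpoint}


def fetch_origin(server: str, timeout: float = 10.0) -> str:
    """Request httpbin through the dynamic proxy and return the exit IP it reports."""
    handler = urllib.request.ProxyHandler(proxy_map(server))
    opener = urllib.request.build_opener(handler)
    with opener.open("https://httpbin.org/ip", timeout=timeout) as resp:
        return json.load(resp)["origin"]


# Usage (requires network access and a running proxy pool):
#     print(fetch_origin("your_server_ip"))
```

Calling `fetch_origin` several times in a row should report different exit IPs if the dynamic proxy behaves as in the `curl` transcript above.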
## RESTFUL API
```python
API_LIST = {
"/api/v1/proxy/": {
"args": {
"https": {
"value": [1],
"desc": "need https proxy? 1 == true",
"required": False,
},
"region": {
"value": "region name like 中国 or 广州 or 江苏",
"desc": "Get Region Proxy",
"required": False,
},
"type": {
"value": [1,2],
"desc": "clear proxy 1 or (common) anonymous 2",
"required": False,
}
},
"desc": "Get A Random Proxy"
},
"/api/v1/proxies/": {
"args": {
"https": {
"value": [1],
"desc": "need https proxy? 1 == true",
"required": False,
},
"region": {
"value": "region name like 中国 or 广州 or 江苏",
"desc": "Get Region Proxy",
"required": False,
},
"type": {
"value": [1,2],
"desc": "clear proxy 1 or (common) anonymous 2",
"required": False,
}
},
"desc": "Get All Proxy",
},
}
```
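
Building a query URL from these arguments might look like the sketch below. The function `proxy_api_url` and its parameters are hypothetical; only the paths, the argument names, and the `!` prefix for excluding a region come from this README. The response format is not documented here, so the sketch stops at constructing the URL.

```python
from typing import Optional
from urllib.parse import urlencode


def proxy_api_url(
    base: str,
    https: bool = False,
    proxy_type: Optional[int] = None,
    region: Optional[str] = None,
    exclude_region: bool = False,
    all_proxies: bool = False,
) -> str:
    """Return a URL for /api/v1/proxy/ (one random proxy) or /api/v1/proxies/ (all)."""
    path = "/api/v1/proxies/" if all_proxies else "/api/v1/proxy/"
    params = {}
    if https:
        params["https"] = 1
    if proxy_type is not None:
        if proxy_type not in (1, 2):
            raise ValueError("type must be 1 (transparent) or 2 (anonymous)")
        params["type"] = proxy_type
    if region:
        # A leading "!" asks for proxies outside the given region.
        params["region"] = "!" + region if exclude_region else region
    query = urlencode(params)  # percent-encodes non-ASCII region names
    return base.rstrip("/") + path + ("?" + query if query else "")


print(proxy_api_url("http://localhost:35050", https=True, proxy_type=2))
```

Percent-encoding the region is a design choice for safety in HTTP clients; browsers do the same thing when you paste the raw URLs shown earlier.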
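## Adding proxy sources
The project ships with several fetchers for free proxy sources.
If you come across a good free proxy source, you can add your own fetcher for it.
To add a new fetcher:
First, add your fetcher class to the `Src/Fetcher/fetchers/` directory.
The class needs a `run` method that works as a generator (`yield`) and yields proxies in `host:port` format, for example: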
```python
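# Any file name works; matching the middle part of `fetcher_host` makes it easy to identify
# The class name must be `CustomFetcher`
# Note: getHtmlTree is a helper provided by the project (its import is not shown here)
class CustomFetcher():
    # Used only for identification; it is stored in the database
    fetcher_host = "www.66ip.cn"

    def run(self):
        url_list = [
            'http://www.xxx.com/',
        ]
        for url in url_list:
            html_tree = getHtmlTree(url)
            ul_list = html_tree.xpath('//ul[@class="l2"]')
            for ul in ul_list:
                try:
                    # Join the first two <li> texts into a `host:port` string
                    yield ':'.join(ul.xpath('.//li/text()')[0:2])
                except Exception as e:
                    print(e)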
```
`ProxyFetchSchedule` fetches proxies at regular intervals; on its next run it automatically detects and calls the fetcher you defined.
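
Scraped pages change often, so it can help to sanity-check what a fetcher yields before relying on it. The helpers below are a hypothetical sketch and not part of the project; they assume each entry is an IPv4 address and a port, which is stricter than a general `host:port`.

```python
import ipaddress
from typing import Iterable, Iterator


def is_valid_proxy(entry: str) -> bool:
    """Return True if entry looks like `IPv4:port` with a port in 1..65535."""
    host, sep, port = entry.strip().rpartition(":")
    if not sep or not (port.isascii() and port.isdigit()):
        return False
    try:
        ipaddress.IPv4Address(host)
    except ValueError:
        return False
    return 1 <= int(port) <= 65535


def clean(entries: Iterable[str]) -> Iterator[str]:
    """Yield only the well-formed entries, stripped of surrounding whitespace."""
    for entry in entries:
        if is_valid_proxy(entry):
            yield entry.strip()


print(list(clean(["1.2.3.4:8080", "not a proxy", "1.2.3.4:99999"])))
```

During development you could wrap a fetcher like this: `list(clean(CustomFetcher().run()))`.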
# Contributing
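Thank you for your support in making this project better!
To keep things orderly and clear, let's agree on a few simple conventions.
There are two main branches:
develop holds the content for the next release
master holds the current stable release
1. Small changes that do not affect the existing version can be made on develop and submitted as a pull request.
2. Major changes that affect the existing version need a new branch, e.g. feature_random_proxy, submitted as a pull request.
I will merge the new branch into develop, run it on the demo machine for a while, and then merge it into master.
That's all. Thanks!
# Feedback
Any problems are welcome in [Issues](https://github.com/1again/ProxyPool/issues).
Our goal: no cavities!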
Raw data
{
"_id": null,
"home_page": "https://github.com/yanjlee/SmartProxyPool",
"name": "SmartProxyPool",
"maintainer": null,
"docs_url": null,
"requires_python": null,
"maintainer_email": null,
"keywords": null,
"author": "yanjlee",
"author_email": "yanjlee@163.com",
"download_url": "https://files.pythonhosted.org/packages/5e/38/4d08d99308efc9ba19646ee623de2797d208cf71cd3a54c9d2581a6e2a83/smartproxypool-1.1.5.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": null,
"summary": "\u9ad8\u8d28\u91cf, \u9ad8\u7075\u6d3b\u7684\u5f00\u653e\u4ee3\u7406\u6c60\u670d\u52a1 \u53ef\u80fd\u662f`\u5168\u7403\u7b2c\u4e00\u4e2a`\u5e26\u6709`\u667a\u80fd\u52a8\u6001\u4ee3\u7406`\u7684\u4ee3\u7406\u6c60\u670d\u52a1..",
"version": "1.1.5",
"project_urls": {
"Homepage": "https://github.com/yanjlee/SmartProxyPool"
},
"split_keywords": [],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "93c10c1e5efde0d28339943db332a629cebd9c4301ff2c0fd5f0071775f0135a",
"md5": "fa1c73123905c8683ed91b1b0d2dd254",
"sha256": "0fa357c6c53535f1399592bea9c7c61e53281cb1e24f0b65b9cd1e6bd145de41"
},
"downloads": -1,
"filename": "SmartProxyPool-1.1.5-py3-none-any.whl",
"has_sig": false,
"md5_digest": "fa1c73123905c8683ed91b1b0d2dd254",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": null,
"size": 7605,
"upload_time": "2024-06-01T08:25:50",
"upload_time_iso_8601": "2024-06-01T08:25:50.325717Z",
"url": "https://files.pythonhosted.org/packages/93/c1/0c1e5efde0d28339943db332a629cebd9c4301ff2c0fd5f0071775f0135a/SmartProxyPool-1.1.5-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "5e384d08d99308efc9ba19646ee623de2797d208cf71cd3a54c9d2581a6e2a83",
"md5": "6f0b183f11ed9a7f717f4d1a53170e7e",
"sha256": "f9c5ad147546cba38edfce6ed3b8e59f98f233068e0af2ca43f6dc4f0334d7bf"
},
"downloads": -1,
"filename": "smartproxypool-1.1.5.tar.gz",
"has_sig": false,
"md5_digest": "6f0b183f11ed9a7f717f4d1a53170e7e",
"packagetype": "sdist",
"python_version": "source",
"requires_python": null,
"size": 7890,
"upload_time": "2024-06-01T08:25:52",
"upload_time_iso_8601": "2024-06-01T08:25:52.089731Z",
"url": "https://files.pythonhosted.org/packages/5e/38/4d08d99308efc9ba19646ee623de2797d208cf71cd3a54c9d2581a6e2a83/smartproxypool-1.1.5.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2024-06-01 08:25:52",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "yanjlee",
"github_project": "SmartProxyPool",
"travis_ci": true,
"coveralls": false,
"github_actions": false,
"requirements": [
{
"name": "APScheduler",
"specs": [
[
"==",
"3.2.0"
]
]
},
{
"name": "Werkzeug",
"specs": [
[
"==",
"0.15.3"
]
]
},
{
"name": "Flask",
"specs": [
[
"==",
"1.0.2"
]
]
},
{
"name": "requests",
"specs": [
[
"==",
"2.20.0"
]
]
},
{
"name": "lxml",
"specs": [
[
"==",
"4.3.3"
]
]
},
{
"name": "gevent",
"specs": [
[
"==",
"1.4.0"
]
]
},
{
"name": "Flask-RESTful",
"specs": [
[
"==",
"0.3.6"
]
]
},
{
"name": "ipip-datx",
"specs": [
[
"==",
"0.4.0"
]
]
},
{
"name": "pymongo",
"specs": [
[
"==",
"3.7.2"
]
]
},
{
"name": "flask-mongoengine",
"specs": [
[
"==",
"0.8.2"
]
]
},
{
"name": "Flask-Admin",
"specs": [
[
"==",
"1.5.3"
]
]
},
{
"name": "Flask-Security",
"specs": [
[
"==",
"3.0.0"
]
]
}
],
"lcname": "smartproxypool"
}