# Eagle-Eye Scraper
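**Eagle-Eye Scraper** is an efficient, flexible Python data-collection framework with native distributed support. It collects data from static and dynamic web pages as well as from APIs, and its modular architecture fully decouples collection logic from business logic, making it well suited to building maintainable, extensible scraping systems.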
---
## ✨ Key Features
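* **Native distributed design**
  Built-in support for distributed task scheduling lets crawling scale out to many nodes running concurrently, which suits large-scale scraping jobs.
* **General-purpose collection**
  Handles static web pages, JavaScript-rendered pages, API endpoints, and other data-source types to cover a wide range of business needs.
* **Decoupled architecture**
  The collection engine is kept fully separate from business processing logic, which makes testing, maintenance, and further development easier.
* **High-performance task scheduling**
  Integrates `APScheduler` for efficient asynchronous scheduled jobs and supports complex task management.
* **Modular, plugin-based design**
  Custom collectors, filters, parsers, and other components can be plugged in, which simplifies extension and integration.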
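The decoupled, pluggable design described above can be pictured with a small, self-contained sketch. Everything in it (`Pipeline`, `fake_fetch`, `title_parser`, and the filter) is hypothetical and uses only the standard library; it is not Eagle-Eye Scraper's API, only an illustration of a fetch step and parse/filter steps that meet solely through plain data, so each piece can be swapped or tested on its own.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, List

# Hypothetical component types for illustration; NOT the library's API.
Fetcher = Callable[[str], str]
Parser = Callable[[str], Dict[str, str]]
Filter = Callable[[Dict[str, str]], bool]


@dataclass
class Pipeline:
    """Wires a fetcher, a parser and optional filters together."""

    fetch: Fetcher
    parse: Parser
    filters: List[Filter] = field(default_factory=list)

    def run(self, urls: Iterable[str]) -> List[Dict[str, str]]:
        results = []
        for url in urls:
            # Collection step and business step only exchange plain data.
            record = self.parse(self.fetch(url))
            # Keep the record only if every filter accepts it.
            if all(f(record) for f in self.filters):
                results.append(record)
        return results


def fake_fetch(url: str) -> str:
    # Stand-in for a network request so the sketch runs offline.
    return f"<title>{url}</title>"


def title_parser(html: str) -> Dict[str, str]:
    # Naive title extraction, enough for this toy input.
    start = html.find("<title>") + len("<title>")
    end = html.find("</title>")
    return {"title": html[start:end]}


pipeline = Pipeline(
    fetch=fake_fetch,
    parse=title_parser,
    filters=[lambda r: r["title"].startswith("https://")],
)
print(pipeline.run(["https://example.com", "ftp://example.org"]))
# → [{'title': 'https://example.com'}]
```

Replacing `fake_fetch` with a real HTTP or browser-based fetcher would leave the parser and filters untouched, which is the point of keeping the two sides apart.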
---
## 📦 Installation
### Basic installation
```bash
pip install eagle-eye-scraper
```
### Optional dependencies
Install the extras your use case needs:

| Component | Install command |
| --------- | ------------------------------------------------------ |
| Redis | `pip install "eagle-eye-scraper[redis]"` |
| MongoDB | `pip install "eagle-eye-scraper[mongodb]"` |
| MySQL | `pip install "eagle-eye-scraper[mysql]"` |
| MinIO | `pip install "eagle-eye-scraper[minio]"` |
| Pulsar MQ | `pip install "eagle-eye-scraper[mq]"` |
| Multiple components | `pip install "eagle-eye-scraper[redis,mongodb,minio]"` |
> 💡 Some shells (zsh, for example) treat square brackets as glob patterns, so keep the package spec in quotes, for example:
>
> ```bash
> pip install "eagle-eye-scraper[mongodb,redis]"
> ```
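To see which optional backends are already available in the current environment, a quick standard-library check is sketched below. The mapping from each extra to the module it installs is an **assumption** (for example, that the `mysql` extra provides `pymysql` and `mq` provides `pulsar`); the authoritative dependency lists live in the package metadata and may differ.

```python
import importlib.util
from typing import Dict

# ASSUMPTION: the driver module each extra is expected to pull in.
# Check the package metadata for the real dependency lists.
EXTRA_MODULES: Dict[str, str] = {
    "redis": "redis",
    "mongodb": "pymongo",
    "mysql": "pymysql",
    "minio": "minio",
    "mq": "pulsar",
}


def installed_extras(mapping: Dict[str, str] = EXTRA_MODULES) -> Dict[str, bool]:
    """Report, per extra, whether its (assumed) driver module is importable."""
    return {
        extra: importlib.util.find_spec(module) is not None
        for extra, module in mapping.items()
    }


if __name__ == "__main__":
    for extra, ok in installed_extras().items():
        print(f"{extra:8} {'installed' if ok else 'missing'}")
```

`find_spec` only locates the module without importing it, so the check is cheap and has no side effects.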
---
## 🧰 Example Usage
```python
from eagle_eye_scraper import Spider


class SimpleSpider(Spider):
    def crawl(self, **kwargs):
        # Simulate fetching data from the network
        self.raw_data = "<html><title>Example Page</title><body>Hello World</body></html>"
        print("Crawl finished")

    def parse(self, **kwargs):
        # Simulate parsing the fetched data: slice out the text between the <title> tags
        title_start = self.raw_data.find("<title>") + len("<title>")
        title_end = self.raw_data.find("</title>")
        title = self.raw_data[title_start:title_end]
        print(f"Parsed title: {title}")


if __name__ == "__main__":
    spider = SimpleSpider()
    # run() drives the spider; here it is expected to call crawl() and then parse()
    spider.run()
```
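In `parse()` above, if a page has no `<title>` tag, `find()` returns `-1` and the slice silently yields a wrong substring instead of failing. A sturdier option, sketched here with only the standard library's `html.parser`, collects the text of the first `<title>` element and returns `None` when there is none. `TitleParser` and `extract_title` are illustrative helpers, not part of Eagle-Eye Scraper.

```python
from html.parser import HTMLParser
from typing import Optional


class TitleParser(HTMLParser):
    """Collects the text inside the first <title> element."""

    def __init__(self) -> None:
        super().__init__()
        self._in_title = False
        self.title: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        # Tag names arrive lowercased; only the first <title> is captured.
        if tag == "title" and self.title is None:
            self._in_title = True
            self.title = ""

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        # Text may arrive in several chunks, so append rather than assign.
        if self._in_title:
            self.title += data


def extract_title(html: str) -> Optional[str]:
    parser = TitleParser()
    parser.feed(html)
    parser.close()
    return parser.title
```

Inside `SimpleSpider.parse()`, the three slicing lines could then become `title = extract_title(self.raw_data)`, with a `None` check for pages that lack a title.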
---
## 📄 License
MIT License
## 📋 Package Information

* **Version:** 1.3.5 (released 2025-07-09)
* **Python:** >= 3.8
* **Author:** Nick <mr.nickdone@gmail.com>
* **Keywords:** python, scraper, data extraction, distributed scraping