eagle-eye-scraper


Name: eagle-eye-scraper
Version: 1.3.5
Summary: eagle-eye-scraper is an efficient Python data-collection framework that supports distributed deployment and suits complex pages and large-scale data collection.
Upload time: 2025-07-09 08:50:44
Requires Python: >=3.8
License: MIT License
Keywords: python, scraper, data extraction, distributed scraping
Requirements: No requirements were recorded.

# Eagle-Eye Scraper

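**Eagle-Eye Scraper** is an efficient, flexible Python data-collection framework with native distributed support. It collects data from static and dynamic web pages and from APIs, and its modular architecture fully decouples collection logic from business logic, which makes it a good fit for building maintainable, extensible scraping systems.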

---

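## ✨ Core Features

* **Native distributed design**
  Built-in support for distributed task scheduling makes it easy to scale out to concurrent collection across multiple nodes, which suits large-scale crawling jobs.

* **General-purpose collection**
  Supports multiple source types, including static web pages, JavaScript-rendered pages, and API endpoints, to meet a range of business needs.

* **Decoupled architecture**
  The collection engine is fully separated from business-processing logic, which simplifies testing, maintenance, and feature evolution.

* **High-performance task scheduling**
  Integrates `APScheduler` to provide efficient asynchronous timed scheduling and supports complex task management.

* **Modular, plugin-based design**
  Supports custom collectors, filters, parsers, and other components, which eases extension and integration.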
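
The decoupling and plug-in ideas above can be illustrated with a minimal, standard-library-only sketch. It does not use or reproduce the Eagle-Eye Scraper API: `Pipeline`, `fake_fetch`, `parse_lines`, and `only_ok` are hypothetical names invented for this illustration, and the "fetch" step returns canned text instead of making a network request.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable, List


@dataclass
class Pipeline:
    """Hypothetical engine: it only knows how to chain components."""

    fetch: Callable[[str], str]             # collector: URL -> raw text
    parse: Callable[[str], Iterable[dict]]  # parser: raw text -> records
    filters: List[Callable[[dict], bool]] = field(default_factory=list)

    def run(self, urls: Iterable[str]) -> List[dict]:
        results = []
        for url in urls:
            raw = self.fetch(url)
            for record in self.parse(raw):
                # Keep a record only if every filter accepts it.
                if all(keep(record) for keep in self.filters):
                    results.append(record)
        return results


# Business logic lives in plain functions that can be tested on their own.
def fake_fetch(url: str) -> str:
    # Stand-in for a real HTTP request; returns canned text.
    return f"{url},200\n{url}/missing,404"


def parse_lines(raw: str) -> Iterable[dict]:
    for line in raw.splitlines():
        url, status = line.rsplit(",", 1)
        yield {"url": url, "status": int(status)}


def only_ok(record: dict) -> bool:
    return record["status"] == 200


if __name__ == "__main__":
    pipeline = Pipeline(fetch=fake_fetch, parse=parse_lines, filters=[only_ok])
    print(pipeline.run(["https://example.com"]))
```

Because the engine receives its collector, parser, and filters as arguments, each piece can be swapped or unit-tested without touching the others, which is the property the feature list describes.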

---

## 📦 Installation

### Basic installation

```bash
pip install eagle-eye-scraper
```

### Installing optional dependencies

Depending on your use case, install any of the following optional dependencies:

| Component                    | Install command                                        |
| ---------------------------- | ------------------------------------------------------ |
| Redis                        | `pip install "eagle-eye-scraper[redis]"`               |
| MongoDB                      | `pip install "eagle-eye-scraper[mongodb]"`             |
| MySQL                        | `pip install "eagle-eye-scraper[mysql]"`               |
| MinIO                        | `pip install "eagle-eye-scraper[minio]"`               |
| Pulsar MQ                    | `pip install "eagle-eye-scraper[mq]"`                  |
| Multiple components combined | `pip install "eagle-eye-scraper[redis,mongodb,minio]"` |


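> 💡 If you are using an older version of pip, or a shell such as zsh that treats `[]` as a glob pattern, wrap the requirement in quotes, for example: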
>
> ```bash
> pip install "eagle-eye-scraper[mongodb,redis]"
> ```
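
To confirm which extra names a locally installed copy actually declares, the standard library's `importlib.metadata` can read the distribution's `Provides-Extra` fields. The sketch below is a small helper written for this purpose; `declared_extras` is a name made up here, not part of the package, and it returns an empty list when the distribution is not installed.

```python
from importlib import metadata
from typing import List


def declared_extras(dist_name: str) -> List[str]:
    """Return the extras a locally installed distribution declares."""
    try:
        info = metadata.metadata(dist_name)
    except metadata.PackageNotFoundError:
        return []  # the distribution is not installed in this environment
    # "Provides-Extra" appears once per extra; get_all returns None if absent.
    return sorted(info.get_all("Provides-Extra") or [])


if __name__ == "__main__":
    # Prints the declared extras, or [] if eagle-eye-scraper is not installed.
    print(declared_extras("eagle-eye-scraper"))
```

Names that appear in the printed list are the ones that can go inside the square brackets of a `pip install` command.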

---

## 🧰 Example Usage

```python
from eagle_eye_scraper import Spider


class SimpleSpider(Spider):
    def crawl(self, **kwargs):
        # Simulate fetching data from the network
        self.raw_data = "<html><title>Example Page</title><body>Hello World</body></html>"
        print("Crawl finished")

    def parse(self, **kwargs):
        # Simulate parsing the fetched data: slice out the text between the <title> tags
        title_start = self.raw_data.find("<title>") + len("<title>")
        title_end = self.raw_data.find("</title>")
        title = self.raw_data[title_start:title_end]
        print(f"Parsed title: {title}")


if __name__ == "__main__":
    spider = SimpleSpider()
    spider.run()
```
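
The `str.find` slicing in the example works for the hard-coded string, but `find` returns `-1` when a tag is missing, which would silently produce a wrong slice on real pages. As one possible alternative, the sketch below extracts the title with the standard library's `html.parser`. `TitleParser` and `extract_title` are hypothetical helpers written for this illustration and are not part of Eagle-Eye Scraper; the body of a `parse` method could call something like them.

```python
from html.parser import HTMLParser
from typing import List, Optional


class TitleParser(HTMLParser):
    """Collects the text inside the first complete <title> element."""

    def __init__(self) -> None:
        super().__init__()
        self._in_title = False
        self._done = False
        self._chunks: List[str] = []

    def handle_starttag(self, tag, attrs):
        # Tag names arrive lowercased; ignore any title after the first.
        if tag == "title" and not self._done:
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title" and self._in_title:
            self._in_title = False
            self._done = True

    def handle_data(self, data):
        if self._in_title:
            self._chunks.append(data)

    @property
    def title(self) -> Optional[str]:
        # None means no complete <title>...</title> pair was seen.
        return "".join(self._chunks).strip() if self._done else None


def extract_title(html: str) -> Optional[str]:
    parser = TitleParser()
    parser.feed(html)
    parser.close()
    return parser.title


if __name__ == "__main__":
    page = "<html><title>Example Page</title><body>Hello World</body></html>"
    print(extract_title(page))
```

Returning `None` for a missing title lets the caller decide how to handle pages without one, instead of working with a mis-sliced string.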

---

## 📄 License

MIT License



            

Raw data

{
    "_id": null,
    "home_page": null,
    "name": "eagle-eye-scraper",
    "maintainer": null,
    "docs_url": null,
    "requires_python": ">=3.8",
    "maintainer_email": null,
    "keywords": "python, scraper, data extraction, distributed scraping",
    "author": null,
    "author_email": "Nick <mr.nickdone@gmail.com>",
    "download_url": "https://files.pythonhosted.org/packages/86/40/cac884512a79d62b00b432d9d03545c1075a955fa025fee3b62efe9436b6/eagle_eye_scraper-1.3.5.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "MIT License",
    "summary": "eagle-eye-scraper is an efficient Python data-collection framework that supports distributed deployment and suits complex pages and large-scale data collection.",
    "version": "1.3.5",
    "project_urls": null,
    "split_keywords": [
        "python",
        " scraper",
        " data extraction",
        " distributed scraping"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "376afbb692a5127a0126411cd7876d534fb775323316bb2d90ff2fd0e9402f0c",
                "md5": "632bca91e0546cdfe24db3dc5ba352cb",
                "sha256": "4fb4bae44df5d93a69c1569395ec992f8213cf1d342e5c1f6fad18b29d1df329"
            },
            "downloads": -1,
            "filename": "eagle_eye_scraper-1.3.5-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "632bca91e0546cdfe24db3dc5ba352cb",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.8",
            "size": 41034,
            "upload_time": "2025-07-09T08:50:43",
            "upload_time_iso_8601": "2025-07-09T08:50:43.161213Z",
            "url": "https://files.pythonhosted.org/packages/37/6a/fbb692a5127a0126411cd7876d534fb775323316bb2d90ff2fd0e9402f0c/eagle_eye_scraper-1.3.5-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "8640cac884512a79d62b00b432d9d03545c1075a955fa025fee3b62efe9436b6",
                "md5": "c7e5d5ee31e973cb6d44491a96a52a4d",
                "sha256": "4570b1b78db63360f150a271f67f86ecfd376f3e1a8673288853bed98d196fd4"
            },
            "downloads": -1,
            "filename": "eagle_eye_scraper-1.3.5.tar.gz",
            "has_sig": false,
            "md5_digest": "c7e5d5ee31e973cb6d44491a96a52a4d",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.8",
            "size": 28941,
            "upload_time": "2025-07-09T08:50:44",
            "upload_time_iso_8601": "2025-07-09T08:50:44.814566Z",
            "url": "https://files.pythonhosted.org/packages/86/40/cac884512a79d62b00b432d9d03545c1075a955fa025fee3b62efe9436b6/eagle_eye_scraper-1.3.5.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2025-07-09 08:50:44",
    "github": false,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "lcname": "eagle-eye-scraper"
}
        