# cobweb
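> A general-purpose crawler framework: 1. a standalone (single-machine) crawling framework; 2. a distributed crawling framework.
>
> It consists of five parts:
>
> 1. starter -- launcher
>
> 2. scheduler -- task scheduler
>
> 3. distributor -- dispatcher
>
> 4. storer -- storage
>
> 5. utils -- utility functions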
>
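
A minimal sketch of how the five parts could fit together in standalone mode, assuming a thread-per-component design built only on Python's standard `queue` and `threading` modules. The functions `starter`, `distributor`, `storer` and `run` (playing the scheduler's role) and the `fetch` callback are hypothetical illustrations, not cobweb's actual API. For synchronizing the modules, note that Python's `queue.Queue` has no `wait()` method; the standard mechanism is to call `task_done()` after each item and `join()` to block until every queued item has been processed, which is what the sketch does.

```python
import queue
import threading


def starter(seeds, task_queue):
    """Starter (hypothetical): push the initial seeds onto the task queue."""
    for seed in seeds:
        task_queue.put(seed)


def distributor(task_queue, result_queue, fetch):
    """Distributor (hypothetical): hand each seed to a fetch callback."""
    while True:
        seed = task_queue.get()
        try:
            result_queue.put(fetch(seed))
        except Exception as exc:  # keep the worker alive when a fetch fails
            result_queue.put({"seed": seed, "error": repr(exc)})
        finally:
            task_queue.task_done()  # mark this seed as processed


def storer(result_queue, sink):
    """Storer (hypothetical): persist fetched items; here, append to a list."""
    while True:
        item = result_queue.get()
        sink.append(item)
        result_queue.task_done()


def run(seeds, fetch, workers=4):
    """Scheduler (hypothetical): start the components and wait for completion."""
    task_queue, result_queue = queue.Queue(), queue.Queue()
    sink = []
    for _ in range(workers):
        threading.Thread(
            target=distributor, args=(task_queue, result_queue, fetch), daemon=True
        ).start()
    threading.Thread(target=storer, args=(result_queue, sink), daemon=True).start()
    starter(seeds, task_queue)
    task_queue.join()    # returns once every seed has been task_done()
    result_queue.join()  # returns once every result has been stored
    return sink
```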
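
To do:
- Optimize and refine the queues: use the queue's wait() mechanism to synchronize execution across modules?
- Improve logging: in standalone mode, write scheduling and saved-data records to files, and output structured logs for each task
- Deduplication filtering (Bloom filter, etc.)
- Loss prevention (in standalone mode, seeds can be checked against the log files)
- Support for custom databases
- Complete the Excel, MySQL and Redis data handling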
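
For the deduplication item, a minimal in-memory Bloom filter sketch, assuming seeds/URLs are plain strings and using the standard sizing formulas m = -n·ln p / (ln 2)² bits and k = (m/n)·ln 2 hash functions. The `BloomFilter` class and `dedup` helper are hypothetical, not part of cobweb; the k bit positions are derived from a single SHA-256 digest by double hashing. A Bloom filter may report an unseen URL as already seen (false positive) but never the reverse, so a small fraction of new seeds can be skipped. In distributed mode the bit array would have to live in shared storage (for example a Redis bitmap via `SETBIT`/`GETBIT`) rather than in process memory.

```python
import hashlib
import math


class BloomFilter:
    """Hypothetical in-memory Bloom filter for de-duplicating seeds/URLs."""

    def __init__(self, capacity, error_rate=0.001):
        # Sizing: m = -n ln p / (ln 2)^2 bits, k = (m / n) ln 2 hash functions.
        self.size = math.ceil(-capacity * math.log(error_rate) / (math.log(2) ** 2))
        self.num_hashes = max(1, round(self.size / capacity * math.log(2)))
        self.bits = bytearray((self.size + 7) // 8)

    def _positions(self, item):
        # Double hashing: k positions from two 64-bit slices of one SHA-256 digest.
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # odd step size
        return [(h1 + i * h2) % self.size for i in range(self.num_hashes)]

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item))


def dedup(urls, bloom):
    """Yield only URLs the filter has (probably) not seen before."""
    for url in urls:
        if url not in bloom:
            bloom.add(url)
            yield url
```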

![img.png](https://image-luyuan.oss-cn-hangzhou.aliyuncs.com/image/D2388CDC-B9E5-4CE4-9F2C-7D173763B6A8.png)
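
For the logging and loss-prevention items, a minimal sketch assuming one JSON object per line (JSON Lines) is appended to a log file for every seed status change; on restart, seeds whose latest logged status is not `done` are treated as lost and can be re-queued. `log_seed`, `pending_seeds` and the status values `queued`/`done` are hypothetical names chosen for illustration, not cobweb's API.

```python
import json
import time


def log_seed(log_path, seed, status):
    """Append one structured record per seed event (e.g. 'queued', 'done')."""
    record = {"ts": time.time(), "seed": seed, "status": status}
    with open(log_path, "a", encoding="utf-8") as fh:
        fh.write(json.dumps(record, ensure_ascii=False) + "\n")


def pending_seeds(log_path):
    """Return seeds whose latest status is not 'done', in first-seen order."""
    last_status = {}
    try:
        with open(log_path, encoding="utf-8") as fh:
            for line in fh:
                line = line.strip()
                if not line:
                    continue
                try:
                    record = json.loads(line)
                except json.JSONDecodeError:
                    continue  # skip a partially written last line after a crash
                last_status[record["seed"]] = record["status"]
    except FileNotFoundError:
        return []  # no log yet: nothing to recover
    return [seed for seed, status in last_status.items() if status != "done"]
```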

Package metadata (PyPI):
- Package: `cbb` 0.0.8 (summary: "cobweb"; keywords: cobweb)
- Author: Juannie-PP (2604868278@qq.com)
- License: MIT
- Homepage: https://github.com/Juannie-PP/cobweb
- Requires Python: >=3.7
- Wheel: `cbb-0.0.8-py3-none-any.whl`, 37,270 bytes, uploaded 2024-03-02, sha256 `c06fdf0c6b9b36b31970e264190ac66ec3e4c01869014bffb276eac5fd7ea131`
- Source distribution: `cbb-0.0.8.tar.gz`, 18,513 bytes, uploaded 2024-03-02, sha256 `305abda8f37b5f0e48bd3a00f874a668f3e982be48fada5f05ac2e11576a244b`