Name | netnovelcrawler |
Version | 0.0.2 |
home_page | None |
Summary | Crawler framework to download Internet-novels from web. |
upload_time | 2024-09-16 06:27:24 |
maintainer | None |
docs_url | None |
author | None |
requires_python | None |
license | MIT |
keywords | net novel, crawler |
VCS | |
bugtrack_url | |
requirements | No requirements were recorded. |
Travis-CI | No Travis. |
coveralls test coverage | No coveralls. |
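## Crawl novel websites and generate TXT files
### Currently supported
- [夜读库 (Yeduku)](https://m.yeduku.net)
- [顶点小说 (Dingdian Xiaoshuo)](https://www.dingdianks.com)
- [笔趣阁 (Biquge)](https://www.22biqu.com)
- [~~欢乐书客/刺猬猫 (Huanle Shuke / Ciweimao)~~](https://www.ciweimao.com)
- [~~sf轻小说 (SF Light Novel)~~](https://book.sfacg.com)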
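### Installation
The package is published on PyPI as `netnovelcrawler`, so `pip install netnovelcrawler` should install it.
### Usage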
#### CLI
```python
from netnovelcrawler import Crawler
from netnovelcrawler.utils.starter_stopper import AfterChapterStarter, CountStopper

# Arguments passed positionally to Crawler: a local directory path and an options dict.
config = (
    r'D:\net_novels\crawler_ocr\lord',
    {
        'start_page': 'https://ccc.xxxx.com/Novel/xxxxxx/',
        'login_info': ('test_login', 'test_pwd'),
        'image_folder': 'vip_images',
        'image_process': 'ocr',
        'text_file': 'xxx.txt',
    }
)

mycrawler = Crawler(*config)
# "10. 某章节" is a placeholder chapter title ("10. Some chapter").
mycrawler.crawl(starter=AfterChapterStarter("10. 某章节"), stopper=CountStopper(50))
```
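The `starter` and `stopper` arguments above appear to bound which chapters are crawled. As a rough illustration of that idea only, the sketch below filters a list of chapter titles with two hypothetical classes, `SkipUntilAfter` and `StopAfterCount`, and a hypothetical helper `select_chapters`. None of these names come from netnovelcrawler, and the assumption that crawling begins with the chapter following the named one is not confirmed by the package.

```python
from typing import Iterable, List


class SkipUntilAfter:
    """Hypothetical starter: becomes active once the named chapter has passed."""

    def __init__(self, title: str) -> None:
        self.title = title
        self.active = False

    def should_take(self, title: str) -> bool:
        if self.active:
            return True
        if title == self.title:
            # The marker chapter itself is skipped; later chapters are taken.
            self.active = True
        return False


class StopAfterCount:
    """Hypothetical stopper: signals a stop once `limit` chapters were taken."""

    def __init__(self, limit: int) -> None:
        self.limit = limit
        self.taken = 0

    def record(self) -> None:
        self.taken += 1

    def should_stop(self) -> bool:
        return self.taken >= self.limit


def select_chapters(titles: Iterable[str], starter: SkipUntilAfter,
                    stopper: StopAfterCount) -> List[str]:
    """Return the titles that fall between the start and stop conditions."""
    selected: List[str] = []
    for title in titles:
        if stopper.should_stop():
            break
        if starter.should_take(title):
            selected.append(title)
            stopper.record()
    return selected


titles = ["1. A", "2. B", "3. C", "4. D", "5. E"]
picked = select_chapters(titles, SkipUntilAfter("2. B"), StopAfterCount(2))
print(picked)  # ['3. C', '4. D']
```

Keeping the start and stop conditions as separate objects lets either one be swapped without touching the loop, which may be why the package exposes them as two arguments.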
#### GUI
```bash
python -m netnovelcrawlertaskmgr
```
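#### Bypassing slider-verification anti-crawler checks
###### Modify chromedriver.exe
- Open chromedriver.exe in a text editor
- Find the `cdc_` string
- Replace `$cdc_lasutopfhvcZLmcfl` with another string of the same length
- Save the file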
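The equal-length replacement described above can also be scripted instead of done by hand. The following is a minimal sketch, assuming the marker appears in the binary as plain ASCII bytes. The helper name `replace_same_length` and the replacement string are illustrative choices, not part of netnovelcrawler or ChromeDriver. The example runs on an in-memory byte string rather than on the real file.

```python
def replace_same_length(data: bytes, old: bytes, new: bytes) -> bytes:
    """Replace every occurrence of `old` with `new`, refusing length changes."""
    if len(old) != len(new):
        raise ValueError("replacement must be the same length as the original")
    return data.replace(old, new)


# Stand-in for the contents of chromedriver.exe (not real driver bytes).
sample = b"\x00\x01var key = '$cdc_lasutopfhvcZLmcfl';\x00"
old = b"$cdc_lasutopfhvcZLmcfl"
new = b"$abc_lasutopfhvcZLmcfl"  # arbitrary string of the same length

patched = replace_same_length(sample, old, new)
print(len(patched) == len(sample), old in patched)  # True False
```

To apply this to an actual file, the bytes would be read and written in binary mode (`"rb"` and `"wb"`), and keeping a backup copy of the original executable first would be prudent.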
Raw data
{
"_id": null,
"home_page": null,
"name": "netnovelcrawler",
"maintainer": null,
"docs_url": null,
"requires_python": null,
"maintainer_email": null,
"keywords": "net novel, crawler",
"author": null,
"author_email": "NovelReader <xxxx@hotmail.com>",
"download_url": "https://files.pythonhosted.org/packages/8e/59/e22840195f7f24d51fb0c8a501d94385218e84eadd3efeda008e5f7859cb/netnovelcrawler-0.0.2.tar.gz",
"platform": null,
"description": "## \u722c\u53d6\u5c0f\u8bf4\u7f51\u7ad9\u5e76\u751f\u6210TXT\r\n\r\n### \u76ee\u524d\u652f\u6301\r\n- [\u591c\u8bfb\u5e93](https://m.yeduku.net)\r\n- [\u9876\u70b9\u5c0f\u8bf4](https://www.dingdianks.com)\r\n- [\u7b14\u8da3\u9601](https://www.22biqu.com)\r\n- [~~\u6b22\u4e50\u4e66\u5ba2/\u523a\u732c\u732b~~](https://www.ciweimao.com)\r\n- [~~sf\u8f7b\u5c0f\u8bf4~~](https://book.sfacg.com)\r\n\r\n### \u5b89\u88c5\r\n\r\n\r\n### \u7528\u6cd5\r\n\r\n#### CLI\r\n```python\r\nconfig = (\r\n r'D:\\net_novels\\crawler_ocr\\lord',\r\n {\r\n 'start_page': 'https://ccc.xxxx.com/Novel/xxxxxx/',\r\n 'login_info': ('test_login', 'test_pwd'),\r\n 'image_folder': 'vip_images',\r\n 'image_process': 'ocr',\r\n 'text_file': 'xxx.txt',\r\n }\r\n)\r\nfrom netnovelcrawler import Crawler\r\nfrom netnovelcrawler.utils.starter_stopper import AfterChapterStarter, CountStopper\r\n\r\nmycrawler = Crawler(*config)\r\nmycrawler.crawl(starter=AfterChapterStarter(\"10. \u67d0\u7ae0\u8282\"), stopper=CountStopper(50))\r\n```\r\n\r\n#### GUI\r\n```bash\r\npython -m netnovelcrawlertaskmgr\r\n```\r\n\r\n\r\n#### \u7ed5\u8fc7\u6ed1\u5757\u9a8c\u8bc1\u53cd\u722c\u866b\u673a\u5236\r\n\r\n######\u4fee\u6539chromedriver.exe\r\n- \u6587\u672c\u7f16\u8f91\u5668\u6253\u5f00chromedriver.exe\r\n- \u627e\u5230`cdc_`\u5b57\u7b26\u4e32\r\n- \u7b49\u957f\u66ff\u6362$cdc_lasutopfhvcZLmcfl\r\n- \u4fdd\u5b58\r\n\r\n",
"bugtrack_url": null,
"license": "MIT",
"summary": "Crawler framework to download Internet-novels from web.",
"version": "0.0.2",
"project_urls": null,
"split_keywords": [
"net novel",
" crawler"
],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "dce722e1501b93433511075322f2c34e02cecb5aeaa31028f99a7cee677024be",
"md5": "32184814f367d73c564675c2f70387d3",
"sha256": "13e8f51512a33c739366b502caa3c17d752e6e48bd12e8f20defee2a600eb811"
},
"downloads": -1,
"filename": "netnovelcrawler-0.0.2-py3-none-any.whl",
"has_sig": false,
"md5_digest": "32184814f367d73c564675c2f70387d3",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": null,
"size": 26200,
"upload_time": "2024-09-16T06:27:23",
"upload_time_iso_8601": "2024-09-16T06:27:23.377052Z",
"url": "https://files.pythonhosted.org/packages/dc/e7/22e1501b93433511075322f2c34e02cecb5aeaa31028f99a7cee677024be/netnovelcrawler-0.0.2-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "8e59e22840195f7f24d51fb0c8a501d94385218e84eadd3efeda008e5f7859cb",
"md5": "35794f4b0455ffe7166c72984fc2dcbc",
"sha256": "4264c6c68a8ff46aaed21f97881f988dde6bd392e7fd05ccd77fa51e7d90dd18"
},
"downloads": -1,
"filename": "netnovelcrawler-0.0.2.tar.gz",
"has_sig": false,
"md5_digest": "35794f4b0455ffe7166c72984fc2dcbc",
"packagetype": "sdist",
"python_version": "source",
"requires_python": null,
"size": 25438,
"upload_time": "2024-09-16T06:27:24",
"upload_time_iso_8601": "2024-09-16T06:27:24.637592Z",
"url": "https://files.pythonhosted.org/packages/8e/59/e22840195f7f24d51fb0c8a501d94385218e84eadd3efeda008e5f7859cb/netnovelcrawler-0.0.2.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2024-09-16 06:27:24",
"github": false,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"lcname": "netnovelcrawler"
}