# baidu-image-crawling

- Name: baidu-image-crawling
- Version: 0.0.1
- Summary: Baidu Image Spider
- Author: SWHL (liekkaskono@163.com)
- Home page: https://github.com/SWHL/BaiduImageCrawling
- License: Apache-2.0
- Requires Python: >=3.6, <3.13
- Keywords: spider, baidu
- Requirements: fake_useragent
- Upload time: 2025-01-15 15:10:24
## Baidu Image Crawling

A super-lightweight Baidu image crawler, modified from <https://github.com/kong36088/BaiduImageCrawling>.

### Installation

```bash
pip install baidu_image_crawling
```
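
Per the package metadata, the wheel supports Python >=3.6 and <3.13.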

### Python Usage

```python
from baidu_image_crawling.main import Crawler

crawler = Crawler(0.05, save_dir="outputs")  # crawl delay of 0.05 s

# Fetch the keyword "美女" ("beautiful woman"): 2 pages starting at page 1,
# 30 images per page, i.e. 2 * 30 = 60 images in total
crawler(word="美女", total_page=2, start_page=1, per_page=30)
```
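
The README only shows a single call, but assuming the `Crawler` instance is reusable across calls, a batch run over several keywords might look like the sketch below (the keyword list is hypothetical; only the call signature shown above is assumed):

```python
from baidu_image_crawling.main import Crawler

crawler = Crawler(0.05, save_dir="outputs")  # 0.05 s delay between requests

# Hypothetical keyword list; each call downloads one batch into save_dir
for keyword in ["sunset", "mountain"]:
    crawler(word=keyword, total_page=1, start_page=1, per_page=30)
```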

### Command-Line Usage

```bash
baidu_image_crawling -w 美女 -tp 1 -sp 1 -pp 2
```
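
This fetches a single page of two images (`-tp 1 -pp 2`) for the keyword 美女, starting from page 1.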

To view the parameter documentation:

```bash
$ baidu_image_crawling -h
usage: baidu_image_crawling [-h] -w WORD -tp TOTAL_PAGE -sp START_PAGE [-pp [PER_PAGE]] [-sd SAVE_DIR] [-d DELAY]

options:
  -h, --help            show this help message and exit
  -w WORD, --word WORD  keyword to crawl
  -tp TOTAL_PAGE, --total_page TOTAL_PAGE
                        total number of pages to crawl
  -sp START_PAGE, --start_page START_PAGE
                        starting page number
  -pp [PER_PAGE], --per_page [PER_PAGE]
                        images per page
  -sd SAVE_DIR, --save_dir SAVE_DIR
                        directory to save images
  -d DELAY, --delay DELAY
                        crawl delay (interval)
```
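
Combining all of the documented flags, a fuller invocation might look like this (the keyword, directory, and delay values are illustrative):

```bash
# Fetch 2 pages of 30 images each for "sunset", starting at page 1,
# saving into ./sunset_imgs with a 0.1 s delay between requests
baidu_image_crawling -w sunset -tp 2 -sp 1 -pp 30 -sd sunset_imgs -d 0.1
```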

            
