baidu-serp-api


- Name: baidu-serp-api
- Version: 1.0.2
- Home page: https://github.com/ohblue/baidu-serp-api
- Summary: A library to extract data from Baidu SERP and output it as JSON objects
- Upload time: 2024-06-15 08:06:27
- Author: Ben Chen
- Requires Python: >=3.6

[English](#baidu-serp-api) | [中文](#百度SERP-API)

# Baidu SERP API

A Python library to extract data from Baidu Search Engine Results Pages (SERP) and output it as JSON objects.

## Installation

```bash
pip install baidu-serp-api
```

## Usage

```python
from baidu_serp_api import BaiduPc, BaiduMobile

pc_serp = BaiduPc()
results = pc_serp.search('keyword', date_range='20240501,20240531', pn='2', proxies={'http': 'http://your-proxy-server:port'})
print(results)

m_serp = BaiduMobile()
results = m_serp.search('keyword', date_range='day', pn='2', proxies={'http': 'http://your-proxy-server:port'})
print(results)

# Exclude specific fields: the result below omits 'recommend', 'last_page', and 'match_count'
results = m_serp.search('keyword', exclude=['recommend', 'last_page', 'match_count'])
```
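To walk several pages, the `pn` parameter can be combined with the `last_page` flag described under Return Values. The sketch below is not part of baidu-serp-api: `collect_results` and `fake_search` are hypothetical names, and it assumes that `pn` is a 1-based page number passed as a string (the usage example passes `'2'`) and that a missing `last_page` should be treated as the final page. `fake_search` stands in for `BaiduPc().search` so the block runs without network access, and its string results are placeholders for whatever entries the library returns.

```python
def collect_results(search, keyword, max_pages=3, **kwargs):
    """Gather results page by page until last_page is reported or max_pages is hit.

    `search` is any callable shaped like search(keyword, pn=..., **kwargs),
    for example BaiduPc().search. This helper is hypothetical, not library API.
    """
    collected = []
    for page in range(1, max_pages + 1):  # assumption: pages are numbered from 1
        response = search(keyword, pn=str(page), **kwargs)
        if response.get('code') != 200:
            break  # stop on any non-success code
        data = response.get('data', {})
        collected.extend(data.get('results', []))
        if data.get('last_page', True):  # a missing flag is treated as "last page"
            break
    return collected


def fake_search(keyword, pn='1', **kwargs):
    """Offline stand-in returning two canned pages in the documented shape."""
    pages = {'1': (['result-1', 'result-2'], False), '2': (['result-3'], True)}
    results, last_page = pages[pn]
    return {'code': 200, 'msg': 'ok',
            'data': {'results': results, 'recommend': [], 'last_page': last_page}}


print(collect_results(fake_search, 'keyword'))
```

With the real library, `fake_search` would be replaced by `BaiduPc().search` or `BaiduMobile().search`. Keeping `max_pages` small is in line with the disclaimer's restriction on large-scale scraping.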

### Parameters

- `keyword`: The search keyword.
- `date_range` (optional): Restrict results to the given date range. The value is a date-range string such as `'20240501,20240531'`, which covers results from May 1, 2024 to May 31, 2024. The usage example above also passes the string `'day'`.
- `pn` (optional): The results page to fetch.
- `proxies` (optional): Proxies to use for the request.
- `exclude` (optional): A list of fields to omit from the returned data, as in the usage example above (`['recommend', 'last_page', 'match_count']`).
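The `'YYYYMMDD,YYYYMMDD'` string for `date_range` can be built from `datetime.date` values instead of being typed by hand. This is a minimal sketch; `make_date_range` is a hypothetical helper and not something the library provides.

```python
from datetime import date


def make_date_range(start: date, end: date) -> str:
    """Format two dates as 'YYYYMMDD,YYYYMMDD' (hypothetical helper)."""
    if start > end:
        raise ValueError("start must not be after end")
    # date objects accept strftime directives inside a format spec
    return f"{start:%Y%m%d},{end:%Y%m%d}"


print(make_date_range(date(2024, 5, 1), date(2024, 5, 31)))  # 20240501,20240531
```

The resulting string can then be passed as `date_range=...` to `search`.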

### Return Values

- `{'code': 500, 'msg': '网络请求错误'}`: Network request exception.
- `{'code': 501, 'msg': '百度安全验证'}`: Baidu security verification required.
- `{'code': 404, 'msg': '未找到相关结果'}`: No relevant results found.
- `{'code': 403, 'msg': '疑似违禁词'}`: Suspected prohibited word.
- `{'code': 200, 'msg': 'ok', 'data': {'results': [], 'recommend': [], 'last_page': True}}`: Successful response.
    - `results`: The list of search results.
    - `recommend`: Recommended related keywords.
    - `last_page`: Whether this is the last page of results.
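Because every outcome comes back as a dictionary with a `code` field, callers will usually branch on that field before touching `data`. The sketch below shows one way to do this using only the response shapes listed above; `SerpError` and `unpack` are hypothetical names, not part of baidu-serp-api.

```python
class SerpError(Exception):
    """Hypothetical exception carrying the documented code and msg fields."""

    def __init__(self, code, msg):
        super().__init__(f"{code}: {msg}")
        self.code = code
        self.msg = msg


def unpack(response):
    """Return (results, last_page) on success, raise SerpError otherwise."""
    code = response.get('code')
    if code != 200:
        raise SerpError(code, response.get('msg', ''))
    data = response.get('data', {})
    # last_page may be absent if it was listed in `exclude`
    return data.get('results', []), data.get('last_page')


ok = {'code': 200, 'msg': 'ok',
      'data': {'results': [], 'recommend': [], 'last_page': True}}
print(unpack(ok))  # ([], True)

try:
    unpack({'code': 404, 'msg': '未找到相关结果'})
except SerpError as err:
    print(err.code, err.msg)  # 404 未找到相关结果
```

Raising on non-200 codes keeps the success path free of status checks; a caller that wants to retry on code 500 can catch `SerpError` and inspect `err.code`.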

### Disclaimer
This project is intended for educational use only. It must not be used commercially or for large-scale scraping of Baidu data. It is licensed under the GPLv3 open-source license: projects that reuse its content must themselves be open source and must credit the source. The author accepts no responsibility for legal risks arising from misuse; anyone who misuses the project bears the consequences.



# 百度SERP API

A Python library that extracts data from Baidu search engine results pages (SERP) and outputs it as JSON objects.

## Installation

```bash
pip install baidu-serp-api
```

## Usage

```python
from baidu_serp_api import BaiduPc, BaiduMobile

pc_serp = BaiduPc()
results = pc_serp.search('keyword', date_range='20240501,20240531', pn='2', proxies={'http': 'http://your-proxy-server:port'})
print(results)

m_serp = BaiduMobile()
results = m_serp.search('keyword', date_range='20240501,20240531', pn='2', proxies={'http': 'http://your-proxy-server:port'})
print(results)

# Exclude specific fields: the result below omits 'recommend', 'last_page', and 'match_count'
results = m_serp.search('keyword', exclude=['recommend', 'last_page', 'match_count'])
```

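### Parameters

- `keyword`: The search keyword.
- `date_range` (optional): Restrict results to the given date range. The value is a date-range string such as `'20240501,20240531'`, which covers results from May 1, 2024 to May 31, 2024.
- `pn` (optional): The results page to fetch.
- `proxies` (optional): Proxies to use for the search.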

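### Return Values

- `{'code': 500, 'msg': '网络请求错误'}`: The network request failed and should be retried.
- `{'code': 502, 'msg': '百度安全验证'}`: Baidu security verification is required.
- `{'code': 404, 'msg': '未找到相关结果'}`: No relevant results were found.
- `{'code': 403, 'msg': '疑似违禁词'}`: Suspected prohibited word.
- `{'code': 200, 'msg': 'ok', 'data': {'results': [], 'last_page': True}}`: Successful response.
    - `results`: The list of search results.
    - `recommend`: Recommended related search terms.
    - `last_page`: Whether this is the last page.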

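### Disclaimer

This project is for learning purposes only and must not be used commercially or for large-scale scraping of Baidu data. It is licensed under the GPLv3 open-source license; other projects that use its content must be open source and credit the source. The author accepts no responsibility for legal risks that may result from misuse, and violators bear the consequences themselves.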

            

Raw data

{
    "_id": null,
    "home_page": "https://github.com/ohblue/baidu-serp-api",
    "name": "baidu-serp-api",
    "maintainer": null,
    "docs_url": null,
    "requires_python": ">=3.6",
    "maintainer_email": null,
    "keywords": null,
    "author": "Ben Chen",
    "author_email": "chan@live.cn",
    "download_url": "https://files.pythonhosted.org/packages/44/d0/2e21a5b520c7622ab357abc3b777c7b7681a91b637d8e32afe8c6b454542/baidu_serp_api-1.0.2.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": null,
    "summary": "A library to extract data from Baidu SERP and output it as JSON objects",
    "version": "1.0.2",
    "project_urls": {
        "Homepage": "https://github.com/ohblue/baidu-serp-api"
    },
    "split_keywords": [],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "ab6639f91a25302d2dd527c799c224f9d5cd0f3ca82e0162deea9eba85574968",
                "md5": "b52b4767ed1e980d931561df894ef8d6",
                "sha256": "b184bed7500cac88d2aaed41d6b4b4bbe584d240d963fb58888d06c9147d44ce"
            },
            "downloads": -1,
            "filename": "baidu_serp_api-1.0.2-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "b52b4767ed1e980d931561df894ef8d6",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.6",
            "size": 22211,
            "upload_time": "2024-06-15T08:06:25",
            "upload_time_iso_8601": "2024-06-15T08:06:25.804100Z",
            "url": "https://files.pythonhosted.org/packages/ab/66/39f91a25302d2dd527c799c224f9d5cd0f3ca82e0162deea9eba85574968/baidu_serp_api-1.0.2-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "44d02e21a5b520c7622ab357abc3b777c7b7681a91b637d8e32afe8c6b454542",
                "md5": "f8d25c05348897cf7ebc1a8f03e08923",
                "sha256": "ec5c9b1b0ef767d755f40101c021e5443133ee10f55d8edfaa566f21de6d2cbd"
            },
            "downloads": -1,
            "filename": "baidu_serp_api-1.0.2.tar.gz",
            "has_sig": false,
            "md5_digest": "f8d25c05348897cf7ebc1a8f03e08923",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.6",
            "size": 21553,
            "upload_time": "2024-06-15T08:06:27",
            "upload_time_iso_8601": "2024-06-15T08:06:27.627503Z",
            "url": "https://files.pythonhosted.org/packages/44/d0/2e21a5b520c7622ab357abc3b777c7b7681a91b637d8e32afe8c6b454542/baidu_serp_api-1.0.2.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2024-06-15 08:06:27",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "ohblue",
    "github_project": "baidu-serp-api",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": true,
    "requirements": [],
    "lcname": "baidu-serp-api"
}
        