Name | googlesearch-tool |
Version | 1.1.1 |
Summary | A Python library for performing Google searches with support for dynamic query parameters, result deduplication, and custom proxy configuration. |
upload_time | 2024-12-02 02:00:43 |
home_page | None |
maintainer | None |
docs_url | None |
author | None |
requires_python | >=3.8 |
license | MIT |
keywords | async, google, proxy, search, web-scraping |
requirements | No requirements were recorded. |
Travis-CI | No Travis. |
coveralls test coverage | No coveralls. |
# GooglSearch-Tool
**GooglSearch-Tool** is a Python library for performing Google searches and retrieving the results, with support for dynamic query parameters, result deduplication, and custom proxy configuration.
[![GitHub stars](https://img.shields.io/github/stars/huazz233/googlesearch.svg)](https://github.com/huazz233/googlesearch/stargazers)
[![GitHub issues](https://img.shields.io/github/issues/huazz233/googlesearch.svg)](https://github.com/huazz233/googlesearch/issues)
[![GitHub license](https://img.shields.io/github/license/huazz233/googlesearch.svg)](https://github.com/huazz233/googlesearch/blob/master/LICENSE)
Simplified Chinese | [English](README_EN.md)
## Table of Contents
- [Features](#features)
- [Installation](#installation)
- [Quick Start](#quick-start)
- [Advanced Usage](#advanced-usage)
- [Configuration](#configuration)
- [Packaging](#packaging)
- [FAQ](#faq)
- [Contributing](#contributing)
- [Community Support](#community-support)
## Features
- Google search support
- Configurable query parameters (including time range)
- Result deduplication based on title, URL, and snippet
- Custom proxy support
- Search results include title, link, description, and time information
- Requests use a random Google domain to avoid access restrictions
- Randomly selected User-Agent request headers
- Manual update and saving of the latest User-Agent and Google domain lists (the functions and saved files live in the `/config/data` directory)
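The result-deduplication feature above can be sketched as keying each result on its (title, URL, description) triple. The helper below and its minimal `SearchResult` class are illustrative stand-ins, not the library's internal code:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class SearchResult:
    """Minimal stand-in for a search result with the fields used for dedup."""
    title: str
    url: str
    description: str

def dedupe(results):
    """Drop results whose (title, url, description) triple was already seen,
    preserving the original order of first occurrences."""
    seen = set()
    unique = []
    for r in results:
        key = (r.title, r.url, r.description)
        if key not in seen:
            seen.add(key)
            unique.append(r)
    return unique
```

Because the dataclass is frozen (hashable), the triple could equally well be replaced by the result object itself as the set key.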
## Installation
Install `googlesearch-tool` via `pip`:
```bash
pip install googlesearch-tool
```
## Quick Start
A basic example of using the GooglSearch-Tool library:
### Basic Example
```python
import asyncio
from googlesearch.search import search
from googlesearch.news_search import search_news
async def test_search():
    """Test a regular search."""
    try:
        # Proxy configuration notes:
        # 1. No proxy: delete or comment out the proxies configuration
        # 2. With a proxy: uncomment it and fill in your proxy address
        # proxies = {
        #     "http://": "http://your-proxy-host:port",
        #     "https://": "http://your-proxy-host:port"
        # }

        print("\n=== Regular search results ===")
        results = await search(
            term="python programming",
            num=10,
            lang="en"
        )

        if not results:
            print("No search results found")
            return False

        for i, result in enumerate(results, 1):
            print(f"\nResult {i}:")
            print(f"Title: {result.title}")
            print(f"Link: {result.url}")
            print(f"Description: {result.description}")
            if result.time:
                print(f"Time: {result.time}")
            print("-" * 80)

        return True
    except Exception as e:
        print(f"Regular search failed: {e}")
        return False

async def test_news_search():
    """Test a news search."""
    try:
        print("\n=== News search results ===")
        results = await search_news(
            term="python news",
            num=5,
            lang="en"
        )

        if not results:
            print("No news results found")
            return False

        for i, result in enumerate(results, 1):
            print(f"\nNews {i}:")
            print(f"Title: {result.title}")
            print(f"Link: {result.url}")
            print(f"Description: {result.description}")
            if result.time:
                print(f"Time: {result.time}")
            print("-" * 80)

        return True
    except Exception as e:
        print(f"News search failed: {e}")
        return False

async def main():
    """Run all tests."""
    print("Starting search...\n")
    await test_search()
    await test_news_search()

if __name__ == "__main__":
    asyncio.run(main())
```
### Proxy Configuration
1. **Without a proxy**
   - Delete or comment out the proxies configuration
   - Make sure the proxies/proxy argument in the search call is also commented out
2. **With a proxy**
   - Uncomment the proxies configuration
   - Replace the proxy address with your actual proxy server address
   - Uncomment the proxies/proxy argument in the search call
### Parameters
- `url`: a random Google domain obtained via `Config.get_random_domain()`
- `headers`: request headers containing a random User-Agent
- `term`: the search query string
- `num`: the number of results to fetch
- `tbs`: time-range parameter
  - `qdr:h` - past hour
  - `qdr:d` - past day
  - `qdr:w` - past week
  - `qdr:m` - past month
  - `qdr:y` - past year
- `proxies`: proxy configuration (optional)
- `timeout`: request timeout in seconds
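As a rough sketch of how these parameters end up in a request, the helper below assembles a Google search URL with `urllib.parse.urlencode`. The function name `build_search_url` and its defaults are hypothetical; the library builds its requests internally:

```python
from urllib.parse import urlencode

def build_search_url(base, term, num=10, tbs=None, start=0, hl="en"):
    """Assemble a search URL from the parameters described above.
    `base` would come from Config.get_random_domain() in the real library."""
    params = {"q": term, "num": num, "start": start, "hl": hl}
    if tbs:
        params["tbs"] = tbs  # e.g. "qdr:w" for results from the past week
    return f"{base}?{urlencode(params)}"

url = build_search_url("https://www.google.com/search",
                       "python programming", num=10, tbs="qdr:w")
```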
### Result Object
Each search result object contains the following fields:
- `link`: the result's URL
- `title`: the result's title
- `description`: the result's description
- `time_string`: the result's time information (if available)
## Advanced Usage
### Getting a Random Domain and Request Headers
To avoid requests being throttled, the library can provide a random Google search domain and a random User-Agent:
```python
from googlesearch.config.config import Config
# Get a random Google search domain
url = Config.get_random_domain()
print(url)  # Example output: https://www.google.ge/search
# Get a random User-Agent
headers = {"User-Agent": Config.get_random_user_agent()}
print(headers)  # Example output: {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 11_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/127.1.7760.206 Safari/537.36'}
```
### Updating Domains and User-Agents
The domain and User-Agent lists are stored in the `config/data` directory:
- `all_domain.txt`: all available Google search domains
- `user_agents.txt`: the latest Chrome User-Agent list
To refresh these lists:
1. Run `fetch_and_save_user_domain.py` to update the domain list
2. Run `fetch_and_save_user_agents.py` to update the User-Agent list
## Advanced Search Syntax
> For more Google search operators and advanced search techniques, see [Google Search Help](https://support.google.com/websearch/answer/2466433).
### Basic Search Operators
Some commonly used operators; note that there must be no space between an operator and its search term:
- **Exact match**: wrap the phrase in quotes, e.g. `"exact phrase"`
- **Site search**: `site:domain.com keywords`
- **Exclude a term**: prefix it with a minus sign, e.g. `china -snake`
- **File type**: `filetype:pdf keywords`
- **Title search**: `intitle:keywords`
- **URL search**: `inurl:keywords`
- **Multiple conditions**: `site:domain.com filetype:pdf keywords`
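Since these operators compose by plain string concatenation, a small helper can assemble them programmatically. `build_query` and its parameter names are illustrative, not part of this library's API:

```python
def build_query(keywords, site=None, filetype=None, exact=None, exclude=()):
    """Compose a Google query string from the operators listed above."""
    parts = []
    if site:
        parts.append(f"site:{site}")        # restrict to one domain
    if filetype:
        parts.append(f"filetype:{filetype}")  # restrict to a file type
    if exact:
        parts.append(f'"{exact}"')          # exact-phrase match
    parts.extend(f"-{word}" for word in exclude)  # excluded terms
    parts.append(keywords)
    return " ".join(parts)
```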
### Time-Range Parameter (tbs)
The search functions support the following time-range values:
```python
tbs = {
    "qdr:h",  # results from the past hour
    "qdr:d",  # results from the past day
    "qdr:w",  # results from the past week
    "qdr:m",  # results from the past month
    "qdr:y"   # results from the past year
}
```
### Other Search Parameters
```python
params = {
    "hl": "zh-CN",     # interface language (e.g. zh-CN, en)
    "lr": "lang_zh",   # search result language
    "safe": "active",  # safe-search setting ("active" enables it)
    "start": 0,        # result offset (for pagination)
    "num": 100,        # number of results to return (max 100)
}
```
### Advanced Search Examples
```python
# Search for PDF files on a specific site
term = "site:example.com filetype:pdf china programming"

# Search news within a time range
term = "china news site:cnn.com"
tbs = "qdr:d"  # results from the past 24 hours

# Exact phrase match in the title
term = 'intitle:"machine learning" site:arxiv.org'

# Exclude specific content
term = "china programming -beginner -tutorial site:github.com"
```
### Filtering Search Results
Search results can be filtered by type:
- Web
- News
- Images
- Videos
The library provides a dedicated function for each search type:
```python
# Regular web search
results = await search(...)
# News search
news_results = await search_news(...)
```
### Search Tips
1. **Combine multiple conditions**
   ```python
   # Search across multiple specific sites
   term = "site:edu.cn OR site:ac.cn machine learning"
   ```
2. **Use wildcards**
   ```python
   # Use an asterisk as a wildcard
   term = "china * programming"
   ```
3. **Use number ranges**
   ```python
   # Search a specific range of years
   term = "china programming 2020..2024"
   ```
4. **Search related terms**
   ```python
   # Use a tilde to search related terms
   term = "~programming tutorials"
   ```
## Configuration
### Why do my requests keep timing out?
Check your network connection and proxy settings. Make sure the proxy is configured correctly and the target site is not blocked.
### How do I build more complex queries?
Use Google's advanced search syntax (such as `site:` and `filetype:`) to construct more complex query strings.
### How do I handle failed requests or exceptions?
Add appropriate exception handling around your requests and check the error logs for details. See the [httpx documentation](https://www.python-httpx.org/) for more on exception handling.
## Packaging
When packaging with PyInstaller, make sure the configuration files are bundled correctly. The steps and caveats:
### 1. Create a spec file
```bash
pyi-makespec --onefile your_script.py
```
### 2. Edit the spec file
Add a `datas` entry to the spec file so the required configuration files are included:
```python
# your_script.spec
a = Analysis(
['your_script.py'],
pathex=[],
binaries=[],
datas=[
# bundle the configuration files
('googlesearch/config/data/all_domain.txt', 'googlesearch/config/data'),
('googlesearch/config/data/user_agents.txt', 'googlesearch/config/data'),
],
# ... 其他配置 ...
)
```
### 3. Build from the spec file
```bash
pyinstaller your_script.spec
```
### 4. Verify the build
Run the packaged program and confirm it can read the configuration files:
```python
from googlesearch.config.config import Config
# check that the configuration files load correctly
url = Config.get_random_domain()
headers = {"User-Agent": Config.get_random_user_agent()}
```
If you get a file-not-found error, check that the paths in the spec file are correct.
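A common way to make such reads work both in a PyInstaller onefile build and from a normal checkout is to resolve data paths against `sys._MEIPASS`, the temporary directory PyInstaller sets at runtime. `resource_path` here is a sketch of that pattern, not a function the library provides:

```python
import sys
from pathlib import Path

def resource_path(relative):
    """Resolve a bundled data file.

    In a PyInstaller onefile build, data files are unpacked under the
    temporary directory exposed as sys._MEIPASS; otherwise fall back to
    the current working directory.
    """
    base = Path(getattr(sys, "_MEIPASS", Path.cwd()))
    return base / relative

domains_file = resource_path("googlesearch/config/data/all_domain.txt")
```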
## FAQ
### Why do my requests keep timing out?
Check your network connection and proxy settings. Make sure the proxy is configured correctly and the target site is not blocked.
### How do I build more complex queries?
Use Google's advanced search syntax (such as `site:`) to construct more complex query strings.
### How do I handle failed requests or exceptions?
Add appropriate exception handling around your requests and check the error logs for details. See the [httpx documentation](https://www.python-httpx.org/) for more on exception handling.
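One generic way to make transient failures recoverable is to wrap the request in a retry loop with exponential backoff. `fetch_with_retry` below is a library-agnostic sketch; the `fetch` callable stands in for whatever request function you use (for example, one built on httpx):

```python
import time

def fetch_with_retry(fetch, attempts=3, backoff=0.5):
    """Call `fetch` until it succeeds, sleeping with exponential backoff
    between failures; re-raise the last exception once attempts run out."""
    delay = backoff
    for attempt in range(1, attempts + 1):
        try:
            return fetch()
        except Exception:
            if attempt == attempts:
                raise  # out of attempts: surface the error to the caller
            time.sleep(delay)
            delay *= 2
```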
## Contributing
Community contributions are very welcome! Ways to get involved:
### Star ⭐ the project
If you find this project helpful, click the Star button in the top-right corner to support us!
### File an Issue
Found a bug or have a feature suggestion? Open an [Issue](https://github.com/huazz233/googlesearch/issues)!
- 🐛 Bug reports: describe the symptom and the steps to reproduce in detail
- 💡 Feature requests: explain the use case and the expected behavior
### Pull Request
Want to contribute code? PRs are very welcome!
1. Fork this repository
2. Create a branch: `git checkout -b feature/your-feature-name`
3. Commit your changes: `git commit -am 'Add some feature'`
4. Push the branch: `git push origin feature/your-feature-name`
5. Open a Pull Request
We review every PR carefully and give timely feedback.
## Community Support
- 📫 Email: [huazz233@163.com](mailto:huazz233@163.com)
- 💬 Feedback: [GitHub Issues](https://github.com/huazz233/googlesearch/issues)
- 📖 Documentation: [Wiki](https://github.com/huazz233/googlesearch/wiki)
- 👥 Discussions: [Discussions](https://github.com/huazz233/googlesearch/discussions)
## License
This project is released under the MIT License - see [LICENSE](LICENSE) for details
Raw data
{
"_id": null,
"home_page": null,
"name": "googlesearch-tool",
"maintainer": null,
"docs_url": null,
"requires_python": ">=3.8",
"maintainer_email": null,
"keywords": "async, google, proxy, search, web-scraping",
"author": null,
"author_email": "huazz233 <huazz233@163.com>",
"download_url": "https://files.pythonhosted.org/packages/50/61/c4958e1671feb6b9115ca771b2b4549a77ad9267eb7c3f601d4f552449ce/googlesearch_tool-1.1.1.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": "MIT",
"summary": "A Python library for performing Google searches with support for dynamic query parameters, result deduplication, and custom proxy configuration.",
"version": "1.1.1",
"project_urls": {
"Bug Tracker": "https://github.com/huazz233/googlesearcher/issues",
"Documentation": "https://github.com/huazz233/googlesearcher#readme",
"Homepage": "https://github.com/huazz233/googlesearcher",
"Source Code": "https://github.com/huazz233/googlesearcher"
},
"split_keywords": [
"async",
" google",
" proxy",
" search",
" web-scraping"
],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "456542ea36f47a1ae06af7c3b2ee5b39525a8e489f856533f362fb39080b1cde",
"md5": "f2c4fa80dfb58b1dbaaa725061b82709",
"sha256": "1951df29d568417cdfd7c9dde375b7eed640a3f2004039b92e532c5dacbb585e"
},
"downloads": -1,
"filename": "googlesearch_tool-1.1.1-py3-none-any.whl",
"has_sig": false,
"md5_digest": "f2c4fa80dfb58b1dbaaa725061b82709",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.8",
"size": 30807,
"upload_time": "2024-12-02T02:00:42",
"upload_time_iso_8601": "2024-12-02T02:00:42.310312Z",
"url": "https://files.pythonhosted.org/packages/45/65/42ea36f47a1ae06af7c3b2ee5b39525a8e489f856533f362fb39080b1cde/googlesearch_tool-1.1.1-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "5061c4958e1671feb6b9115ca771b2b4549a77ad9267eb7c3f601d4f552449ce",
"md5": "692bbfb6013cc79b13faf62a283526bb",
"sha256": "f13f521fa2d88ad4e5cb6cad7b0d1e501c14f4f177b9719a8efdb4d79c4a9152"
},
"downloads": -1,
"filename": "googlesearch_tool-1.1.1.tar.gz",
"has_sig": false,
"md5_digest": "692bbfb6013cc79b13faf62a283526bb",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.8",
"size": 24391,
"upload_time": "2024-12-02T02:00:43",
"upload_time_iso_8601": "2024-12-02T02:00:43.801779Z",
"url": "https://files.pythonhosted.org/packages/50/61/c4958e1671feb6b9115ca771b2b4549a77ad9267eb7c3f601d4f552449ce/googlesearch_tool-1.1.1.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2024-12-02 02:00:43",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "huazz233",
"github_project": "googlesearcher",
"github_not_found": true,
"lcname": "googlesearch-tool"
}