googlesearch-tool


- Name: googlesearch-tool
- Version: 1.1.1
- Summary: A Python library for performing Google searches with support for dynamic query parameters, result deduplication, and custom proxy configuration.
- Upload time: 2024-12-02 02:00:43
- Requires Python: >=3.8
- License: MIT
- Keywords: async, google, proxy, search, web-scraping

# GooglSearch-Tool

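**GooglSearch-Tool** is a Python library for running Google searches and retrieving the results. It supports dynamic query parameters, result deduplication, and custom proxy configuration.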

[![GitHub stars](https://img.shields.io/github/stars/huazz233/googlesearch.svg)](https://github.com/huazz233/googlesearch/stargazers)
[![GitHub issues](https://img.shields.io/github/issues/huazz233/googlesearch.svg)](https://github.com/huazz233/googlesearch/issues)
[![GitHub license](https://img.shields.io/github/license/huazz233/googlesearch.svg)](https://github.com/huazz233/googlesearch/blob/master/LICENSE)

Simplified Chinese | [English](README_EN.md)

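## Table of Contents

- [Features](#features)
- [Installation](#installation)
- [Quick Start](#quick-start)
- [Advanced Usage](#advanced-usage)
- [Configuration](#configuration)
- [Packaging](#packaging)
- [FAQ](#faq)
- [Contributing](#contributing)
- [Community Support](#community-support)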

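## Features

- Google search support
- Configurable query parameters, including time range
- Result deduplication based on title, URL, and summary
- Custom proxy support
- Search results include title, link, description, and time information
- Requests use a randomly chosen Google domain to reduce the chance of being blocked
- A random User-Agent request header is selected for each request
- The User-Agent and Google domain lists can be refreshed and saved manually (the update functions and the saved files live in the `/config/data` directory)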

## Installation

Install `googlesearch-tool` with `pip`:

```bash
pip install googlesearch-tool
```

## Quick Start

The following is a basic example of using the GooglSearch-Tool library:

### Basic Example

```python
import asyncio
from googlesearch.search import search
from googlesearch.news_search import search_news

async def test_search():
    """Run a regular web search."""
    try:
        # Proxy configuration:
        # 1. No proxy: delete the proxies block or leave it commented out.
        # 2. With a proxy: uncomment the block and set your proxy address.
        # proxies = {
        #     "http://": "http://your-proxy-host:port",
        #     "https://": "http://your-proxy-host:port"
        # }

        print("\n=== Web search results ===")
        results = await search(
            term="python programming",
            num=10,
            lang="en"
        )

        if not results:
            print("No search results found")
            return False

        for i, result in enumerate(results, 1):
            print(f"\nResult {i}:")
            print(f"Title: {result.title}")
            print(f"URL: {result.url}")
            print(f"Description: {result.description}")
            if result.time:
                print(f"Time: {result.time}")
            print("-" * 80)

        return True
    except Exception as e:
        print(f"Web search failed: {e}")
        return False

async def test_news_search():
    """Run a news search."""
    try:
        print("\n=== News search results ===")
        results = await search_news(
            term="python news",
            num=5,
            lang="en"
        )

        if not results:
            print("No news results found")
            return False

        for i, result in enumerate(results, 1):
            print(f"\nNews {i}:")
            print(f"Title: {result.title}")
            print(f"URL: {result.url}")
            print(f"Description: {result.description}")
            if result.time:
                print(f"Time: {result.time}")
            print("-" * 80)

        return True
    except Exception as e:
        print(f"News search failed: {e}")
        return False

async def main():
    """Run both searches."""
    print("Starting search...\n")
    await test_search()
    await test_news_search()

if __name__ == "__main__":
    asyncio.run(main())
```

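### Proxy Configuration

1. **Without a proxy**
   - Delete the `proxies` configuration or leave it commented out
   - Make sure the `proxies`/`proxy` argument in the search function call is commented out as well

2. **With a proxy**
   - Uncomment the `proxies` configuration
   - Change the proxy address to that of your actual proxy server
   - Uncomment the `proxies`/`proxy` argument in the search function call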
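The two cases above can be sketched as a small helper that adds the proxy argument only when a proxy is configured. This is an illustrative sketch, not part of the library: `build_proxies` and `search_kwargs` are hypothetical names, and the `proxies` keyword is assumed from the parameter list in this README (the text also mentions `proxy`, so check the signature of the installed version).

```python
from typing import Dict, Optional


def build_proxies(proxy_url: Optional[str]) -> Optional[Dict[str, str]]:
    """Return a proxy mapping in the README's format, or None when no proxy is wanted."""
    if not proxy_url:
        return None
    # Both schemes are routed through the same proxy, as in the basic example.
    return {"http://": proxy_url, "https://": proxy_url}


def search_kwargs(term: str, proxy_url: Optional[str] = None, **extra) -> dict:
    """Assemble keyword arguments, adding `proxies` only when a proxy is set."""
    kwargs = {"term": term, **extra}
    proxies = build_proxies(proxy_url)
    if proxies is not None:
        kwargs["proxies"] = proxies  # assumed parameter name
    return kwargs


print(search_kwargs("python programming", num=10, lang="en"))
print(search_kwargs("python programming", "http://127.0.0.1:7890", num=10))
```

Under these assumptions the call site would look like `results = await search(**search_kwargs("python programming", proxy_url))`, so switching the proxy on or off no longer requires commenting code in and out.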

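### Parameters

- `url`: a random Google domain obtained via `Config.get_random_domain()`
- `headers`: request headers containing a random User-Agent
- `term`: the search query string
- `num`: the number of results to fetch
- `tbs`: the time range parameter
  - `qdr:h` - past hour
  - `qdr:d` - past day
  - `qdr:w` - past week
  - `qdr:m` - past month
  - `qdr:y` - past year
- `proxies`: proxy configuration (optional)
- `timeout`: request timeout in seconds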
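Because the `tbs` values are terse, a small lookup can make call sites easier to read and catch typos early. The sketch below is only an illustration: `TIME_RANGES` and `tbs_for` are hypothetical names, not part of the library; the `qdr:*` values are the ones listed above.

```python
# Readable names mapped to the `tbs` values documented above.
TIME_RANGES = {
    "hour": "qdr:h",
    "day": "qdr:d",
    "week": "qdr:w",
    "month": "qdr:m",
    "year": "qdr:y",
}


def tbs_for(period: str) -> str:
    """Translate a readable period name into the matching `tbs` value."""
    try:
        return TIME_RANGES[period]
    except KeyError:
        valid = ", ".join(TIME_RANGES)
        raise ValueError(f"unknown period {period!r}; expected one of: {valid}") from None


print(tbs_for("week"))  # qdr:w
```

The result would then be passed as the `tbs` argument, for example `tbs=tbs_for("day")`.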

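### Result Object

Each search result object contains the following fields:

- `link`: the URL of the result
- `title`: the title of the result
- `description`: the description of the result
- `time_string`: time information for the result, if available

Note that the basic example above reads `result.url` and `result.time` instead; check the installed version for the exact attribute names.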
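The feature list says results are deduplicated by title, URL, and summary. A minimal sketch of that idea follows, assuming the field names listed above. `Result` is a hypothetical stand-in rather than the library's own class, and `deduplicate` is not claimed to be the library's implementation.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class Result:
    """Hypothetical stand-in for the library's result object."""
    link: str
    title: str
    description: str
    time_string: Optional[str] = None


def deduplicate(results: Iterable[Result]) -> List[Result]:
    """Keep the first result for each (title, link, description) triple."""
    seen = set()
    unique = []
    for result in results:
        key = (result.title, result.link, result.description)
        if key not in seen:
            seen.add(key)
            unique.append(result)
    return unique


sample = [
    Result("https://example.com/a", "A", "first page"),
    Result("https://example.com/a", "A", "first page", "2 days ago"),
    Result("https://example.com/b", "B", "second page"),
]
unique_results = deduplicate(sample)
print(len(unique_results))  # 2
```

The time field is left out of the key on purpose, so the same page reported with different timestamps still counts as one result.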

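## Advanced Usage

### Getting a Random Domain and Request Headers

To reduce the chance of requests being rate-limited, the library can return a random Google search domain and a random User-Agent: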

```python
from googlesearch.config.config import Config

# Get a random Google search domain
url = Config.get_random_domain()
print(url)  # Example output: https://www.google.ge/search

# Get a random User-Agent
headers = {"User-Agent": Config.get_random_user_agent()}
print(headers)  # Example output: {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 11_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/127.1.7760.206 Safari/537.36'}
```

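### Updating the Domain and User-Agent Lists

The domain list and the User-Agent list are stored in the `config/data` directory:
- `all_domain.txt`: all available Google search domains
- `user_agents.txt`: an up-to-date list of Chrome User-Agent strings

To update these lists:
1. Run `fetch_and_save_user_domain.py` to update the domain list
2. Run `fetch_and_save_user_agents.py` to update the User-Agent list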
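Picking a random entry from text files like the ones above could work roughly as follows. This is a sketch under stated assumptions, not the library's code: it assumes one entry per line, `load_entries` and `pick_random` are hypothetical names, and the sample file contents are made up for the demonstration.

```python
import random
import tempfile
from pathlib import Path
from typing import List


def load_entries(path: Path) -> List[str]:
    """Read the non-empty lines of a text file, stripped of whitespace."""
    lines = path.read_text(encoding="utf-8").splitlines()
    return [line.strip() for line in lines if line.strip()]


def pick_random(entries: List[str]) -> str:
    """Return one entry at random, failing loudly on an empty list."""
    if not entries:
        raise ValueError("the list is empty; refresh the data files first")
    return random.choice(entries)


# Demonstration with a temporary file standing in for all_domain.txt.
with tempfile.TemporaryDirectory() as tmp:
    domains_file = Path(tmp) / "all_domain.txt"
    domains_file.write_text("www.google.com\n\nwww.google.de\n", encoding="utf-8")
    domains = load_entries(domains_file)
    print(pick_random(domains))
```

Skipping blank lines keeps a trailing newline or an accidental empty row in the file from producing an empty domain or User-Agent.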

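## Advanced Search Syntax

> For more Google search operators and advanced search techniques, see [Google Search Help](https://support.google.com/websearch/answer/2466433).

### Basic Search Operators

Below are some commonly used search operators. Do not put a space between an operator and its search term:

- **Exact match**: wrap the phrase in quotes, e.g. `"exact phrase"`
- **Site search**: `site:domain.com keywords`
- **Exclude a word**: prefix the word with a minus sign, e.g. `china -snake`
- **File type**: `filetype:pdf keywords`
- **Title search**: `intitle:keywords`
- **URL search**: `inurl:keywords`
- **Multiple conditions**: `site:domain.com filetype:pdf keywords`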
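The operators above are plain text inside the query string, so they can be assembled programmatically. The helper below is a hypothetical convenience (`build_query` is not part of the library) that joins operators and keywords without a space after the colon.

```python
from typing import Iterable, Optional


def build_query(
    keywords: str,
    site: Optional[str] = None,
    filetype: Optional[str] = None,
    intitle: Optional[str] = None,
    exclude: Iterable[str] = (),
) -> str:
    """Compose a Google query string from keywords and optional operators."""
    parts = []
    if site:
        parts.append(f"site:{site}")
    if filetype:
        parts.append(f"filetype:{filetype}")
    if intitle:
        # Quote multi-word titles so they are matched as a phrase.
        value = f'"{intitle}"' if " " in intitle else intitle
        parts.append(f"intitle:{value}")
    parts.append(keywords)
    parts.extend(f"-{word}" for word in exclude)
    return " ".join(parts)


print(build_query("china programming", site="example.com", filetype="pdf"))
# site:example.com filetype:pdf china programming
print(build_query("china programming", site="github.com", exclude=["beginner", "tutorial"]))
# site:github.com china programming -beginner -tutorial
```

The returned string would be passed as the `term` argument of the search functions.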

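### Time Range Parameter (tbs)

The search functions accept the following time range values:

```python
# Valid values for the `tbs` argument; pass exactly one of them as a string.
tbs = {
    "qdr:h",  # results from the past hour
    "qdr:d",  # results from the past day
    "qdr:w",  # results from the past week
    "qdr:m",  # results from the past month
    "qdr:y"   # results from the past year
}
```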

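### Other Search Parameters

```python
params = {
    "hl": "zh-CN",     # interface language (e.g. zh-CN, en)
    "lr": "lang_zh",   # language of the search results
    "safe": "active",  # SafeSearch setting ("active" turns SafeSearch on)
    "start": 0,        # offset of the first result (used for pagination)
    "num": 100,        # number of results to return (maximum 100)
}
```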
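To show how `start` and `num` work together for pagination, the sketch below builds the parameters for a given page and encodes them as a query string with the standard library. `page_params` is a hypothetical helper, and the `q` key for the search term is an assumption about how the query is sent to Google, not something this README states.

```python
from urllib.parse import urlencode


def page_params(term: str, page: int, per_page: int = 10, **extra) -> dict:
    """Return query parameters for a zero-based page of results."""
    if page < 0 or per_page < 1:
        raise ValueError("page must be >= 0 and per_page must be >= 1")
    # `start` is the offset of the first result on the requested page.
    params = {"q": term, "num": per_page, "start": page * per_page}
    params.update(extra)
    return params


params = page_params("python programming", page=2, hl="en", safe="active")
print(urlencode(params))
# q=python+programming&num=10&start=20&hl=en&safe=active
```

With ten results per page, page 0 starts at offset 0, page 1 at offset 10, and page 2 at offset 20.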

### Advanced Search Examples

```python
# Search for PDF files on a specific site
term = "site:example.com filetype:pdf china programming"

# Search for news within a specific time range
term = "china news site:cnn.com"
tbs = "qdr:d"  # results from the past 24 hours

# Exact match of a phrase in the title
term = 'intitle:"machine learning" site:arxiv.org'

# Exclude specific content
term = "china programming -beginner -tutorial site:github.com"
```

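### Filtering Search Results

Search results can be filtered by the following types:
- Web
- News
- Images
- Videos

The library provides dedicated functions for the different search types:
```python
# Regular web search
results = await search(...)

# News search
news_results = await search_news(...)
```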
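Since both functions are awaited, they are coroutines and can be run concurrently with `asyncio.gather` instead of one after the other. The sketch below uses stand-in coroutines (`fake_search` and `fake_search_news`, both hypothetical) so that it runs without network access; in real code they would be replaced by `search` and `search_news`.

```python
import asyncio


async def fake_search(term: str, num: int = 10):
    """Stand-in for `search`; returns placeholder strings instead of results."""
    await asyncio.sleep(0)
    return [f"web:{term}:{i}" for i in range(num)]


async def fake_search_news(term: str, num: int = 5):
    """Stand-in for `search_news`."""
    await asyncio.sleep(0)
    return [f"news:{term}:{i}" for i in range(num)]


async def search_both(term: str) -> dict:
    """Run the web and news searches concurrently and collect both lists."""
    web, news = await asyncio.gather(
        fake_search(term, num=3),
        fake_search_news(term, num=2),
    )
    return {"web": web, "news": news}


results = asyncio.run(search_both("python"))
print(len(results["web"]), len(results["news"]))  # 3 2
```

`asyncio.gather` returns the results in the order the coroutines were passed, which is why the tuple unpacking is safe.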

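### Search Tips

1. **Combine multiple conditions**
   ```python
   # Search across several specific sites
   term = "site:edu.cn OR site:ac.cn machine learning"
   ```

2. **Use wildcards**
   ```python
   # Use an asterisk as a wildcard
   term = "china * programming"
   ```

3. **Use number ranges**
   ```python
   # Search within a range of years
   term = "china programming 2020..2024"
   ```

4. **Search for related terms**
   ```python
   # Use a tilde to search for related terms
   # (Google has reportedly retired this operator, so it may have no effect)
   term = "~programming tutorials"
   ```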

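## Configuration

### Why do my requests always time out?

Check your network connection and proxy settings. Make sure the proxy is configured correctly and that the target site is not blocked.

### How can I run more complex queries?

You can use Google's advanced search syntax (such as `site:` and `filetype:`) to build more complex query strings.

### How should I handle failed requests or exceptions?

Wrap your requests in appropriate exception handling and check the error logs for details. See the [httpx documentation](https://www.python-httpx.org/) for more information on exception handling.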

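## Packaging

When packaging with PyInstaller, make sure the configuration files are included. The steps and caveats are as follows:

### 1. Create the spec file

```bash
pyi-makespec --onefile your_script.py
```

### 2. Edit the spec file

Add the `datas` argument to the spec file so that the required configuration files are included: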

```python
# your_script.spec
a = Analysis(
    ['your_script.py'],
    pathex=[],
    binaries=[],
    datas=[
        # Add the configuration files
        ('googlesearch/config/data/all_domain.txt', 'googlesearch/config/data'),
        ('googlesearch/config/data/user_agents.txt', 'googlesearch/config/data'),
    ],
    # ... other settings ...
)
```

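### 3. Build from the spec file

```bash
pyinstaller your_script.spec
```

### 4. Verify the build

Run the packaged program and confirm that it can read the configuration files:
```python
from googlesearch.config.config import Config

# Check that the configuration files load correctly
url = Config.get_random_domain()
headers = {"User-Agent": Config.get_random_user_agent()}
```

If a file-not-found error occurs, check that the paths in the spec file are correct.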
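To narrow down a file-not-found error, it can help to check directly whether the two data files are present where the bundle expects them. The sketch below relies on the `sys.frozen` and `sys._MEIPASS` attributes that PyInstaller sets in bundled applications; `bundle_root` and `missing_data_files` are hypothetical helper names, and the relative paths are the destination paths from the spec file above.

```python
import sys
from pathlib import Path
from typing import Iterable, List


def bundle_root() -> Path:
    """Return the directory that should contain the bundled data files."""
    # In a PyInstaller bundle, data files are placed under sys._MEIPASS.
    if getattr(sys, "frozen", False) and hasattr(sys, "_MEIPASS"):
        return Path(sys._MEIPASS)
    # Outside a bundle, fall back to the current working directory.
    return Path.cwd()


def missing_data_files(root: Path, relative_paths: Iterable[str]) -> List[str]:
    """List the relative paths that do not exist as files under `root`."""
    return [rel for rel in relative_paths if not (root / rel).is_file()]


EXPECTED = [
    "googlesearch/config/data/all_domain.txt",
    "googlesearch/config/data/user_agents.txt",
]

missing = missing_data_files(bundle_root(), EXPECTED)
print("missing:", missing or "none")
```

If either path is reported as missing inside the packaged program, the `datas` entries in the spec file are the first place to look.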

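## FAQ

### Why do my requests always time out?

Check your network connection and proxy settings. Make sure the proxy is configured correctly and that the target site is not blocked.

### How can I run more complex queries?

You can use Google's advanced search syntax (such as `site:`) to build more complex query strings.

### How should I handle failed requests or exceptions?

Wrap your requests in appropriate exception handling and check the error logs for details. See the [httpx documentation](https://www.python-httpx.org/) for more information on exception handling.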
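One common way to deal with timeouts and transient failures is to retry with a per-attempt timeout and an increasing delay. The following is a generic sketch, not a feature of the library: `with_retries` is a hypothetical helper, and the `flaky` coroutine stands in for a real call such as `lambda: search(term="python")`.

```python
import asyncio
from typing import Awaitable, Callable, TypeVar

T = TypeVar("T")


async def with_retries(
    make_call: Callable[[], Awaitable[T]],
    attempts: int = 3,
    timeout: float = 10.0,
    base_delay: float = 0.5,
) -> T:
    """Await `make_call()` up to `attempts` times with exponential backoff."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(attempts):
        try:
            # Each attempt gets its own timeout.
            return await asyncio.wait_for(make_call(), timeout=timeout)
        except Exception as exc:  # narrow this to the errors you expect
            last_error = exc
            if attempt < attempts - 1:
                await asyncio.sleep(base_delay * 2 ** attempt)
    raise last_error


calls = {"n": 0}


async def flaky() -> str:
    """Fail twice, then succeed, to exercise the retry loop."""
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("temporary failure")
    return "ok"


outcome = asyncio.run(with_retries(flaky, base_delay=0))
print(outcome)  # ok
```

Passing a zero-argument callable rather than a coroutine object matters here, because a coroutine can only be awaited once and each retry needs a fresh one.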

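## Contributing

Community contributions are very welcome! Here are a few ways to take part:

### Star ⭐ the project
If you find this project helpful, please click the Star button in the top-right corner to support us!

### Open an Issue
Found a bug or have a feature suggestion? Please open an [Issue](https://github.com/huazz233/googlesearch/issues)!
- 🐛 Bug reports: describe the problem and the steps to reproduce it in detail
- 💡 Feature requests: explain the use case and the expected behavior of the new feature

### Pull Request
Want to contribute code? Pull requests are very welcome!

1. Fork this repository
2. Create a new branch: `git checkout -b feature/your-feature-name`
3. Commit your changes: `git commit -am 'Add some feature'`
4. Push the branch: `git push origin feature/your-feature-name`
5. Open a Pull Request

We review every PR carefully and give timely feedback.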

## Community Support

- 📫 Email: [huazz233@163.com](mailto:huazz233@163.com)
- 💬 Issue tracker: [GitHub Issues](https://github.com/huazz233/googlesearch/issues)
- 📖 Developer documentation: [Wiki](https://github.com/huazz233/googlesearch/wiki)
- 👥 Discussions: [Discussions](https://github.com/huazz233/googlesearch/discussions)

## License

This project is released under the MIT License. See [LICENSE](LICENSE) for details.

            
