AIWebSearcher

Name: AIWebSearcher
Version: 0.1.1
Summary: A Model Context Protocol (MCP) server providing AI-powered Baidu search with intelligent reranking and web content extraction
Upload time: 2025-10-15 14:43:14
Requires Python: >=3.10
License: MIT
Keywords: ai, baidu, llm, mcp, model-context-protocol, search
Requirements: fastmcp, agno, requests, asyncio-mqtt, baidusearch, pydantic, pycountry, beautifulsoup4, trafilatura, lxml, charset-normalizer

# Search MCP Server

[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
[![Python 3.10+](https://img.shields.io/badge/python-3.10+-blue.svg)](https://www.python.org/downloads/)

A powerful Model Context Protocol (MCP) server providing AI-enhanced Baidu search with intelligent reranking and comprehensive web content extraction capabilities.

## ✨ Features

- 🔍 **Baidu Search Integration**: Fast and reliable search results from Baidu
- 🤖 **AI-Powered Reranking**: Uses multiple AI agents (Qwen) to intelligently rerank search results by relevance
- 📄 **Web Content Extraction**: Extract clean, readable text from web pages with pagination support
- 🎯 **Batch Processing**: Extract content from multiple URLs simultaneously
- 🌐 **MCP Standard**: Fully compliant with Model Context Protocol for seamless integration

## 🚀 Quick Start

### Prerequisites

- Python 3.10 or higher
- [uv](https://github.com/astral-sh/uv) (recommended) or pip
- DashScope API key (for AI search features)

### Installation

#### Using uv (Recommended)

```bash
# Clone the repository
git clone https://github.com/Vist233/Google-Search-Tool.git
cd search-mcp

# Install with uv
uv pip install -e .
```

#### Using pip

```bash
pip install -e .
```

### Environment Setup

Create a `.env` file or set environment variables for AI features:

```bash
export DASHSCOPE_API_KEY="your-api-key-here"
```
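
In Python code that consumes this setting, reading it straight from the environment is enough. The helper below is only an illustrative sketch: the variable name comes from this README, while the function itself is hypothetical.

```python
import os


def require_dashscope_key() -> str:
    """Return the DashScope API key, failing fast if it is missing (illustrative helper)."""
    key = os.environ.get("DASHSCOPE_API_KEY")
    if not key:
        raise RuntimeError("DASHSCOPE_API_KEY is not set; AI search features will be unavailable.")
    return key
```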

## 📖 Usage

### As an MCP Server

Add to your MCP client configuration (e.g., Claude Desktop):

**For macOS**: `~/Library/Application Support/Claude/claude_desktop_config.json`

**For Windows**: `%APPDATA%/Claude/claude_desktop_config.json`

```json
{
  "mcpServers": {
    "aiwebsearcher": {
      "command": "uvx",
      "args": [
        "aiwebsearcher"
      ]
    }
  }
}
```

**Note**: The API key is read from the `DASHSCOPE_API_KEY` environment variable. Set it before running:

```bash
# macOS/Linux
export DASHSCOPE_API_KEY="your-api-key-here"

# Windows (PowerShell)
$env:DASHSCOPE_API_KEY="your-api-key-here"
```
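
Alternatively, many MCP clients (Claude Desktop among them) let you supply environment variables in the server entry itself. A possible configuration, assuming your client supports an `env` field:

```json
{
  "mcpServers": {
    "aiwebsearcher": {
      "command": "uvx",
      "args": ["aiwebsearcher"],
      "env": {
        "DASHSCOPE_API_KEY": "your-api-key-here"
      }
    }
  }
}
```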

### Standalone Testing

```bash
# Install the package
pip install aiwebsearcher

# Set API key
export DASHSCOPE_API_KEY="your-key"

# Run the server
aiwebsearcher
```
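
To drive the server programmatically rather than from an MCP client app, you can talk to it over stdio with the official MCP Python SDK. A minimal sketch, assuming the `mcp` package is installed separately (it is not listed in this project's requirements) and that the tool names match those documented below:

```python
import asyncio

from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client


async def main() -> None:
    # Launch the installed `aiwebsearcher` command as an MCP server over stdio.
    params = StdioServerParameters(command="aiwebsearcher", args=[])
    async with stdio_client(params) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()
            tools = await session.list_tools()
            print("Available tools:", [tool.name for tool in tools.tools])
            result = await session.call_tool(
                "search_baidu", {"query": "Model Context Protocol", "max_results": 3}
            )
            print(result)


asyncio.run(main())
```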


## 🛠️ Available Tools

### 1. `search_baidu`

Execute a basic Baidu search and return structured results.

**Parameters:**
- `query` (str): Search keyword
- `max_results` (int, optional): Maximum results to return (default: 5)
- `language` (str, optional): Search language (default: "zh")

**Returns:** JSON string with title, url, and abstract for each result.

**Example:**
```python
{
  "query": "ไบบๅทฅๆ™บ่ƒฝๅ‘ๅฑ•็Žฐ็Šถ",
  "max_results": 5
}
```
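
The returned JSON string has roughly this shape; the values below are placeholders, and the exact wrapping may differ in practice:

```python
[
  {
    "title": "Example result title",
    "url": "https://example.com/result",
    "abstract": "Short snippet summarizing the result page..."
  }
]
```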

### 2. `AI_search_baidu`

AI-enhanced search with intelligent reranking and content extraction. It takes roughly 3x longer than `search_baidu` but returns higher-quality, ranked results with full page content.

**Parameters:**
- `query` (str): Search keyword
- `max_results` (int, optional): Number of initial results to fetch (default: 5; 5 or more recommended)
- `language` (str, optional): Search language (default: "zh")

**Returns:** JSON string with rank, title, url, and Content (full page text) for each result.

**Example:**
```python
{
  "query": "AIๅ‘ๅฑ•่ถ‹ๅŠฟ 2025",
  "max_results": 12
}
```

### 3. `extractTextFromUrl`

Extract clean, readable text from a single webpage.

**Parameters:**
- `url` (str): Target webpage URL
- `follow_pagination` (bool, optional): Follow rel="next" links (default: true)
- `pagination_limit` (int, optional): Max pagination depth (default: 3)
- `timeout` (float, optional): HTTP timeout in seconds (default: 10.0)
- `user_agent` (str, optional): Custom User-Agent header
- `regular_expressions` (list[str], optional): Regex patterns to filter text

**Returns:** Extracted text content as string.
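
**Example** (illustrative values; the URL is a placeholder):
```python
{
  "url": "https://example.com/long-article",
  "pagination_limit": 2,
  "timeout": 15.0
}
```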

### 4. `extractTextFromUrls`

Extract text from multiple webpages in batch.

**Parameters:** Same as `extractTextFromUrl`, plus:
- `urls` (list[str]): List of target URLs

**Returns:** Combined text from all URLs, separated by double newlines.
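
**Example** (illustrative placeholder URLs):
```python
{
  "urls": [
    "https://example.com/page-1",
    "https://example.com/page-2"
  ],
  "timeout": 15.0
}
```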

## 🏗️ Project Structure

```
search-mcp/
├── searcher/
│   └── src/
│       ├── server.py              # MCP server entry point
│       ├── FetchPage/
│       │   └── fetchWeb.py        # Web content extraction
│       ├── WebSearch/
│       │   ├── baiduSearchTool.py # Baidu search implementation
│       │   └── SearchAgent.py     # AI agent definitions (legacy)
│       └── useAI2Search/
│           └── SearchAgent.py     # AI-powered search orchestration
├── tests/                         # Test files
├── pyproject.toml                 # Project configuration
├── requirements.txt               # Dependencies
└── README.md                      # This file
```
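
For orientation, `server.py` is where tools are exposed through FastMCP. The snippet below is a minimal, hypothetical sketch of what such a registration looks like; the names and bodies are placeholders, not the project's actual code:

```python
# Hypothetical sketch of a FastMCP entry point; not the actual server.py.
from fastmcp import FastMCP

mcp = FastMCP("aiwebsearcher")


@mcp.tool()
def search_baidu(query: str, max_results: int = 5, language: str = "zh") -> str:
    """Run a Baidu search and return results as a JSON string (placeholder body)."""
    raise NotImplementedError("See searcher/src for the real implementation.")


if __name__ == "__main__":
    mcp.run()
```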

## 🔧 Development

### Install Development Dependencies

```bash
uv pip install -e ".[dev]"
```

### Run Tests

```bash
pytest
```

### Code Formatting

```bash
# Format with black
black searcher/

# Lint with ruff
ruff check searcher/
```

## 📝 Configuration

### MCP Client Configuration Examples

**Minimal configuration:**
```json
{
  "mcpServers": {
    "search": {
      "command": "python",
      "args": ["server.py"],
      "cwd": "/path/to/search-mcp/searcher/src"
    }
  }
}
```

**With uv for dependency isolation:**
```json
{
  "mcpServers": {
    "search": {
      "command": "uv",
      "args": ["--directory", "/path/to/search-mcp/searcher/src", "run", "python", "server.py"]
    }
  }
}
```

## 🤝 Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

1. Fork the repository
2. Create your feature branch (`git checkout -b feature/AmazingFeature`)
3. Commit your changes (`git commit -m 'Add some AmazingFeature'`)
4. Push to the branch (`git push origin feature/AmazingFeature`)
5. Open a Pull Request

## 📄 License

This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.

## 🙏 Acknowledgments

- Built with [FastMCP](https://github.com/jlowin/fastmcp)
- AI models powered by [Agno](https://github.com/agno-agi/agno) and DashScope
- Search powered by [baidusearch](https://github.com/liuxingwt/baidusearch)
- Content extraction using [trafilatura](https://github.com/adbar/trafilatura)

## 📮 Contact

- GitHub: [@Vist233](https://github.com/Vist233)
- Repository: [Google-Search-Tool](https://github.com/Vist233/Google-Search-Tool)

## ⚠️ Disclaimer

This tool is for educational and research purposes. Please respect website terms of service and rate limits when scraping content.

            
