mcp-web-tools


| Field | Value |
| --- | --- |
| Name | mcp-web-tools |
| Version | 0.9.0 |
| Summary | A powerful MCP server to equip LLMs with web access, search, and content extraction capabilities |
| Upload time | 2025-11-03 20:08:53 |
| Requires Python | >=3.12 |
| License | MIT |
| Keywords | extraction, llm, mcp, pdf, search, web |

# MCP Web Tools

This package provides a powerful MCP server to equip LLMs with web access, going beyond naive methods of searching, fetching and extracting content.

## Introduction

I created this package out of frustration that most MCP servers giving LLMs web access didn't perform as well as I had hoped. The shortcomings I wanted to fix include:

- [x] Good search results without requiring an API key
- [x] Sophisticated fetching for more complex JavaScript sites
- [x] Extracting content in nicely formatted Markdown
- [x] Support for extracting content from PDFs
- [x] Support for loading and displaying images
- [x] Capture rendered webpage screenshots for visual context
- [x] Usage options for advanced cases like loading raw HTML

## Installation

### Claude Desktop

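Add the server to `claude_desktop_config.json`. The entry below is a minimal sketch that assumes the same `uvx mcp-web-tools` launch command and `web-tools` server name used for Claude Code:

```json
{
  "mcpServers": {
    "web-tools": {
      "command": "uvx",
      "args": ["mcp-web-tools"]
    }
  }
}
```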

### Claude Code

```bash
claude mcp add web-tools uvx mcp-web-tools
```

Or to also set the Brave Search API key:

```bash
claude mcp add web-tools uvx mcp-web-tools -e BRAVE_SEARCH_API_KEY=<key>
```

Provide a [Perplexity Search API](https://www.perplexity.ai/api-platform) key to prioritize their fresh, citation-rich index:

```bash
claude mcp add web-tools uvx mcp-web-tools -e PERPLEXITY_API_KEY=<key>
```

Set both environment variables to have the server fall back from Perplexity to Brave automatically.

## Internals

The package is written in Python and relies on the libraries and services described below to improve results.

### Searching

We use the [Perplexity Search API](https://www.perplexity.ai/hub/blog/introducing-the-perplexity-search-api) when a `PERPLEXITY_API_KEY` is configured. It delivers ranked snippets with citations from Perplexity's continuously refreshed index. If no Perplexity key is available, we fall back to the [Brave Search API](https://brave.com/search/api) (via `BRAVE_SEARCH_API_KEY`), then a lightweight Google workaround, and finally DuckDuckGo. While we recommend adding at least one API key, the chained fallbacks continue working for most workloads.

### Fetching

Fetching web content is based on [Zendriver](https://github.com/stephanlensky/zendriver), a fork of [nodriver](https://github.com/ultrafunkamsterdam/nodriver/) built for fast, hard-to-detect web scraping. It should stay undetected by most anti-bot solutions and can fetch content even from complex JavaScript-heavy sites.

### Extracting

For web extraction, we use [Trafilatura](https://trafilatura.readthedocs.io/en/latest/index.html), which consistently outperforms the alternatives at extracting content from HTML pages. For PDFs, we use [PyMuPDF4LLM](https://pymupdf.readthedocs.io/en/latest/pymupdf4llm/), which similarly extracts content in a format that is easy for LLMs to read, with advanced layout support.

### Screenshots

Rendered page previews are powered by [Zendriver](https://github.com/stephanlensky/zendriver). The `view_website` tool navigates to a URL in a headless Chromium session and returns the resulting page as a PNG screenshot. By default only the current viewport is captured, but callers can request a full-page image by setting the `full_page` argument to `true`.

## Contributing

While it's impossible to support all pages and layouts, we strive to make this package better over time. For unsupported sites, problems, or feature requests, please open an issue.

## CI, Releases, and Publishing

This repo includes a GitHub Actions workflow that:

- Runs tests via `uv` on PRs and pushes to `main`.
- On push to `main`, if `project.version` in `pyproject.toml` changed, it:
  - Builds distributions with `uv build`.
  - Creates a GitHub Release tagged `v<version>` with autogenerated notes.
  - Publishes the package to PyPI using `uv publish`.

To trigger a release, merge a PR that bumps `project.version` in `pyproject.toml`.

Rollback:

- If a release was created erroneously, delete the GitHub Release and tag `v<version>`.
- Yank the version on PyPI if needed.

            

Project links

- Author: Paul-Louis Pröve
- Homepage: https://github.com/pietz/mcp-web-tools
- Issues: https://github.com/pietz/mcp-web-tools/issues