webcap


- Name: webcap
- Version: 0.1.21
- Upload time: 2024-12-16 22:08:16
- Author: TheTechromancer
- Requires Python: <3.13,>=3.9
- License: GPL-v3.0
<img src="https://github.com/user-attachments/assets/25912aba-690a-45e2-a6a9-2b0445e8218f" width="600"/>

[![Python Version](https://img.shields.io/badge/python-3.9+-8400ff)](https://www.python.org) [![License](https://img.shields.io/badge/license-GPLv3-8400ff.svg)](https://github.com/blacklanternsecurity/webcap/blob/dev/LICENSE) [![Ruff](https://img.shields.io/endpoint?url=https://raw.githubusercontent.com/astral-sh/ruff/main/assets/badge/v2.json)](https://github.com/astral-sh/ruff) [![Tests](https://github.com/blacklanternsecurity/webcap/actions/workflows/tests.yml/badge.svg?branch=stable)](https://github.com/blacklanternsecurity/webcap/actions?query=workflow%3A"tests") [![Codecov](https://codecov.io/gh/blacklanternsecurity/webcap/branch/dev/graph/badge.svg?token=IR5AZBDM5K)](https://codecov.io/gh/blacklanternsecurity/webcap) [![Discord](https://img.shields.io/discord/859164869970362439)](https://discord.com/invite/PZqkgxu5SA)

**WebCap** is an extremely lightweight headless browser tool. It doesn't require Selenium, Playwright, Puppeteer, or any other browser automation framework; all it needs is a working Chrome installation. Used by [BBOT](https://github.com/blacklanternsecurity/bbot).

### Installation

```bash
pipx install webcap
```

### Features

WebCap's standout feature is its ability to capture not only the **fully-rendered DOM**, but also every snippet of **parsed JavaScript** (whether inline or external) and the **full content** of every HTTP request and response (including JavaScript API calls). For convenience, it outputs directly to JSON:

#### Screenshots

![image](https://github.com/user-attachments/assets/c5a409ea-d068-45d7-a4c1-8a8cddf6c491)

#### Fully-rendered DOM

![image](https://github.com/user-attachments/assets/60dd2a80-f9c3-438e-8f00-f982c356625d)

#### JavaScript Capture

![image](https://github.com/user-attachments/assets/6f960bbb-efb6-4294-a1f2-2c6181baa31a)

#### Requests + Responses

![image](https://github.com/user-attachments/assets/0f036384-a465-4579-b70a-b567daaa8113)

#### OCR

![image](https://github.com/user-attachments/assets/cffb268e-8b9b-490c-8949-39e73e73aa8a)

### Full feature list

- [x] Blazing fast screenshots
- [x] Full-page capture (entire scrollable page)
- [x] JSON output
- [x] Full DOM extraction
- [x] JavaScript extraction (inline + external)
- [ ] JavaScript extraction (environment dump)
- [x] Full network logs (incl. request/response bodies)
- [x] Title
- [x] Status code
- [x] Fuzzy (perceptual) hashing
- [ ] Technology detection
- [x] OCR text extraction
- [ ] Web interface
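
The fuzzy hashing item above deserves a short illustration. Perceptual hashes in general are designed so that visually similar images produce similar hashes, which makes near-duplicate screenshots easy to group. This README does not say which algorithm WebCap uses, so the sketch below shows a generic "average hash" on a tiny grayscale grid; `average_hash` and `hamming_distance` are illustrative names, not WebCap functions.

```python
def average_hash(pixels):
    """Hash a 2D grid of grayscale values (0-255).

    Each pixel contributes one bit: 1 if it is brighter than the
    grid's mean brightness, otherwise 0.
    """
    flat = [value for row in pixels for value in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for value in flat:
        bits = (bits << 1) | (1 if value > mean else 0)
    return bits


def hamming_distance(hash_a, hash_b):
    """Count the bits that differ between two hashes."""
    return bin(hash_a ^ hash_b).count("1")


# Two slightly different renderings of the same checkerboard,
# plus an inverted one (hypothetical sample data).
original = [[0, 255], [255, 0]]
noisy = [[10, 250], [240, 5]]
inverted = [[255, 0], [0, 255]]

same = hamming_distance(average_hash(original), average_hash(noisy))
different = hamming_distance(average_hash(original), average_hash(inverted))
```

A small Hamming distance suggests two screenshots look alike; a large one suggests they differ. Real implementations first downscale the image to a fixed grid (commonly 8x8) before hashing.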

### Example Commands

```bash
# Capture screenshots of all URLs in urls.txt
webcap -u urls.txt -o ./my_screenshots

# Output to JSON, and include the fully-rendered DOM
webcap -u urls.txt --json --dom | jq

# Capture requests and responses
webcap -u urls.txt --json --requests --responses | jq

# Capture JavaScript
webcap -u urls.txt --json --javascript | jq

# Extract text from screenshots
webcap -u urls.txt --json --ocr | jq
```
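
If you would rather post-process the JSON in Python than in `jq`, something like the following may help. The output schema is not documented in this README, so the sketch makes no assumptions about field names: it only lists the top-level keys of each object. It also assumes the output may arrive as several JSON documents back to back (for example one per line), which is why it uses `json.JSONDecoder.raw_decode` rather than a single `json.loads`. `iter_json_objects`, `summarize`, and `main` are hypothetical helper names.

```python
import json
import sys


def iter_json_objects(text):
    """Yield each top-level JSON value from text that holds one or more
    JSON documents separated only by whitespace (including newlines)."""
    decoder = json.JSONDecoder()
    pos = 0
    end = len(text)
    while pos < end:
        # raw_decode does not skip leading whitespace, so do it here
        while pos < end and text[pos].isspace():
            pos += 1
        if pos >= end:
            break
        value, pos = decoder.raw_decode(text, pos)
        yield value


def summarize(text):
    """Return the sorted top-level keys of every JSON object in text."""
    return [sorted(obj) for obj in iter_json_objects(text) if isinstance(obj, dict)]


def main():
    """Read webcap's JSON from stdin and print each object's keys."""
    for keys in summarize(sys.stdin.read()):
        print(", ".join(keys))
```

To use it, save the block as a script that ends with a call to `main()` and pipe the output of `webcap -u urls.txt --json` into it. Once you know which keys are present, you can pull out the ones you need.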

### Use as a Python library

```python
import asyncio

from webcap import Browser


async def main():
    # create a browser instance
    browser = Browser()
    # start the browser
    await browser.start()
    try:
        # take a screenshot
        webscreenshot = await browser.screenshot("http://example.com")
        # save the screenshot to a file
        with open("screenshot.png", "wb") as f:
            f.write(webscreenshot.blob)
    finally:
        # stop the browser, even if the capture fails
        await browser.stop()


if __name__ == "__main__":
    asyncio.run(main())
```
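
If you need more than one screenshot, the same calls can be wrapped in a small helper. The sketch below relies only on what the example above shows, namely that `screenshot()` is awaitable and returns an object whose `.blob` holds the image bytes. `capture_all`, `url_to_filename`, and the file-naming scheme are hypothetical and not part of WebCap.

```python
import re
from pathlib import Path


def url_to_filename(url):
    """Turn a URL into a filesystem-safe PNG filename (our own naming scheme)."""
    stem = re.sub(r"[^A-Za-z0-9]+", "_", url).strip("_")
    return f"{stem}.png"


async def capture_all(browser, urls, out_dir="screenshots"):
    """Screenshot each URL in turn using an already-started browser.

    `browser` is expected to behave like webcap's Browser in the example
    above: `await browser.screenshot(url)` returns an object with `.blob`.
    Returns the list of paths that were written.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    saved = []
    for url in urls:
        shot = await browser.screenshot(url)
        path = out / url_to_filename(url)
        path.write_bytes(shot.blob)
        saved.append(path)
    return saved
```

Call it between `await browser.start()` and `await browser.stop()`, for example `await capture_all(browser, ["http://example.com"])`. The helper takes the browser as an argument so that it never starts or stops Chrome itself.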

### CLI Usage (--help)

```
usage: webcap [-h] [-u URLS [URLS ...]] [-o OUTPUT] [-j] [-r RESOLUTION] [-f] [--no-screenshots] [-t THREADS] [--delay DELAY] [-U USER_AGENT] [-H HEADERS [HEADERS ...]] [-p PROXY] [-b]
              [-d] [-rs] [-rq] [-J] [--ignore-types IGNORE_TYPES [IGNORE_TYPES ...]] [--ocr] [-s] [--debug] [--no-color] [-c CHROME]

options:
  -h, --help            show this help message and exit
  -u URLS [URLS ...], --urls URLS [URLS ...]
                        URL(s) to capture, or file(s) containing URLs
  -o OUTPUT, --output OUTPUT
                        Output directory
  -j, --json            Output JSON

Screenshots:
  -r RESOLUTION, --resolution RESOLUTION
                        Resolution to capture
  -f, --full-page       Capture the full page (larger resolution images)
  --no-screenshots      Don't take screenshots

Performance:
  -t THREADS, --threads THREADS
                        Number of threads to use
  --delay DELAY         Delay before capturing (default: 3.0 seconds)

HTTP:
  -U USER_AGENT, --user-agent USER_AGENT
                        User agent to use
  -H HEADERS [HEADERS ...], --headers HEADERS [HEADERS ...]
                        Additional headers to send in format: 'Header-Name: Header-Value' (multiple supported)
  -p PROXY, --proxy PROXY
                        HTTP proxy to use

JSON Output:
  -b, --base64          Output each screenshot as base64
  -d, --dom             Capture the fully-rendered DOM
  -rs, --responses      Capture the full body of each HTTP response (including API calls etc.)
  -rq, --requests       Capture the full body of each HTTP request (including API calls etc.)
  -J, --javascript      Capture every snippet of Javascript (inline + external)
  --ignore-types IGNORE_TYPES [IGNORE_TYPES ...]
                        Ignore certain types of network requests (default: Image, Media, Font, Stylesheet)
  --ocr                 Extract text from screenshots

Misc:
  -s, --silent          Silent mode
  --debug               Enable debugging
  --no-color            Disable color output
  -c CHROME, --chrome CHROME
                        Path to Chrome executable
```
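
When driving the CLI from a script, it can be convenient to assemble the argument list programmatically. The following is a minimal sketch that uses only flags listed in the help text above; `build_webcap_command` and its keyword names are hypothetical, and the function only builds the list without running anything.

```python
def build_webcap_command(urls, *, dom=False, requests=False, responses=False,
                         javascript=False, ocr=False, full_page=False,
                         headers=(), proxy=None, delay=None, threads=None):
    """Build a `webcap ... --json` argument list from flags shown in --help."""
    urls = list(urls)
    if not urls:
        raise ValueError("at least one URL or URL file is required")

    cmd = ["webcap", "--urls", *urls, "--json"]

    # Boolean switches: include the flag only when enabled
    switches = [
        ("--full-page", full_page),
        ("--dom", dom),
        ("--responses", responses),
        ("--requests", requests),
        ("--javascript", javascript),
        ("--ocr", ocr),
    ]
    cmd += [flag for flag, enabled in switches if enabled]

    # Options that take a value
    if threads is not None:
        cmd += ["--threads", str(threads)]
    if delay is not None:
        cmd += ["--delay", str(delay)]
    if headers:
        # --headers accepts several 'Header-Name: Header-Value' strings
        cmd += ["--headers", *headers]
    if proxy:
        cmd += ["--proxy", proxy]
    return cmd
```

The resulting list can be passed to `subprocess.run(cmd, capture_output=True, text=True, check=True)`. Passing a list rather than a shell string avoids quoting problems with header values that contain spaces.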


            
