| Field | Value |
|---|---|
| Name | webcap |
| Version | 0.1.21 |
| home_page | None |
| Summary | None |
| upload_time | 2024-12-16 22:08:16 |
| maintainer | None |
| docs_url | None |
| author | TheTechromancer |
| requires_python | <3.13,>=3.9 |
| license | GPL-v3.0 |
| keywords | |
| VCS | |
| bugtrack_url | |
| requirements | No requirements were recorded. |
| Travis-CI | No Travis. |
| coveralls test coverage | No coveralls. |
<img src="https://github.com/user-attachments/assets/25912aba-690a-45e2-a6a9-2b0445e8218f" width="600"/>
[Python](https://www.python.org) [License](https://github.com/blacklanternsecurity/webcap/blob/dev/LICENSE) [Ruff](https://github.com/astral-sh/ruff) [Tests](https://github.com/blacklanternsecurity/webcap/actions?query=workflow%3A"tests") [Codecov](https://codecov.io/gh/blacklanternsecurity/webcap) [Discord](https://discord.com/invite/PZqkgxu5SA)
**WebCap** is an extremely lightweight headless browser tool. It doesn't require Selenium, Playwright, Puppeteer, or any other browser automation framework; all it needs is a working Chrome installation. Used by [BBOT](https://github.com/blacklanternsecurity/bbot).
### Installation
```bash
pipx install webcap
```
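Note that this release supports Python 3.9 through 3.12 (`requires_python` is `<3.13,>=3.9`), so make sure pipx is running on a compatible interpreter.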
### Features
WebCap's most distinctive feature is its ability to capture not only the **fully-rendered DOM**, but also every snippet of **parsed Javascript** (whether inline or external), and the **full content** of every HTTP request + response (including Javascript API calls etc.). For convenience, it outputs directly to JSON:
#### Screenshots

#### Fully-rendered DOM

#### Javascript Capture

#### Requests + Responses

#### OCR

### Full feature list
- [x] Blazing fast screenshots
- [x] Fullscreen capture (entire scrollable page)
- [x] JSON output
- [x] Full DOM extraction
- [x] Javascript extraction (inline + external)
- [ ] Javascript extraction (environment dump)
- [x] Full network logs (incl. request/response bodies)
- [x] Title
- [x] Status code
- [x] Fuzzy (perception) hashing
- [ ] Technology detection
- [x] OCR text extraction
- [ ] Web interface
### Example Commands
```bash
# Capture screenshots of all URLs in urls.txt
webcap -u urls.txt -o ./my_screenshots
# Output to JSON, and include the fully-rendered DOM
webcap -u urls.txt --json --dom | jq
# Capture requests and responses
webcap -u urls.txt --json --requests --responses | jq
# Capture javascript
webcap -u urls.txt --json --javascript | jq
# Extract text from screenshots
webcap -u urls.txt --json --ocr | jq
```
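The JSON output can also be consumed without `jq`. Below is a minimal Python sketch (not part of WebCap itself) that shells out to the CLI; it assumes `--json` mode emits one JSON object per captured URL on stdout and makes no assumptions about field names:

```python
import json
import subprocess

# Minimal sketch: run webcap with --json and inspect whatever it prints.
# Assumes one JSON object per line on stdout; adjust if the output differs.
proc = subprocess.run(
    ["webcap", "-u", "urls.txt", "--json", "--dom"],
    capture_output=True,
    text=True,
    check=True,
)

for line in proc.stdout.splitlines():
    line = line.strip()
    if not line:
        continue
    record = json.loads(line)
    # Print just the top-level keys so you can see what each record contains.
    print(sorted(record.keys()))
```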
### Use as a Python library
```python
import base64
from webcap import Browser

async def main():
    # create a browser instance
    browser = Browser()
    # start the browser
    await browser.start()
    # take a screenshot
    webscreenshot = await browser.screenshot("http://example.com")
    # save the screenshot to a file
    with open("screenshot.png", "wb") as f:
        f.write(webscreenshot.blob)
    # stop the browser
    await browser.stop()

if __name__ == "__main__":
    import asyncio
    asyncio.run(main())
```
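Since the API is async, multiple URLs can be captured concurrently. The sketch below is illustrative rather than official: it reuses only the calls shown above (`Browser()`, `start()`, `screenshot()`, `.blob`, `stop()`) and assumes `screenshot()` can be awaited concurrently against a single browser instance.

```python
import asyncio
from webcap import Browser

async def capture_all(urls):
    browser = Browser()
    await browser.start()
    try:
        # Launch all captures at once and wait for every screenshot to finish.
        shots = await asyncio.gather(*(browser.screenshot(url) for url in urls))
        for url, shot in zip(urls, shots):
            # Derive a simple filename from the URL.
            filename = url.replace("://", "_").replace("/", "_") + ".png"
            with open(filename, "wb") as f:
                f.write(shot.blob)
    finally:
        await browser.stop()

if __name__ == "__main__":
    asyncio.run(capture_all(["http://example.com", "http://example.org"]))
```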
## CLI Usage (--help)
```
usage: webcap [-h] [-u URLS [URLS ...]] [-o OUTPUT] [-j] [-r RESOLUTION] [-f] [--no-screenshots] [-t THREADS] [--delay DELAY] [-U USER_AGENT] [-H HEADERS [HEADERS ...]] [-p PROXY] [-b]
              [-d] [-rs] [-rq] [-J] [--ignore-types IGNORE_TYPES [IGNORE_TYPES ...]] [--ocr] [-s] [--debug] [--no-color] [-c CHROME]

options:
  -h, --help            show this help message and exit
  -u URLS [URLS ...], --urls URLS [URLS ...]
                        URL(s) to capture, or file(s) containing URLs
  -o OUTPUT, --output OUTPUT
                        Output directory
  -j, --json            Output JSON

Screenshots:
  -r RESOLUTION, --resolution RESOLUTION
                        Resolution to capture
  -f, --full-page       Capture the full page (larger resolution images)
  --no-screenshots      Don't take screenshots

Performance:
  -t THREADS, --threads THREADS
                        Number of threads to use
  --delay DELAY         Delay before capturing (default: 3.0 seconds)

HTTP:
  -U USER_AGENT, --user-agent USER_AGENT
                        User agent to use
  -H HEADERS [HEADERS ...], --headers HEADERS [HEADERS ...]
                        Additional headers to send in format: 'Header-Name: Header-Value' (multiple supported)
  -p PROXY, --proxy PROXY
                        HTTP proxy to use

JSON Output:
  -b, --base64          Output each screenshot as base64
  -d, --dom             Capture the fully-rendered DOM
  -rs, --responses      Capture the full body of each HTTP response (including API calls etc.)
  -rq, --requests       Capture the full body of each HTTP request (including API calls etc.)
  -J, --javascript      Capture every snippet of Javascript (inline + external)
  --ignore-types IGNORE_TYPES [IGNORE_TYPES ...]
                        Ignore certain types of network requests (default: Image, Media, Font, Stylesheet)
  --ocr                 Extract text from screenshots

Misc:
  -s, --silent          Silent mode
  --debug               Enable debugging
  --no-color            Disable color output
  -c CHROME, --chrome CHROME
                        Path to Chrome executable
```
Raw data
{
    "_id": null,
    "home_page": null,
    "name": "webcap",
    "maintainer": null,
    "docs_url": null,
    "requires_python": "<3.13,>=3.9",
    "maintainer_email": null,
    "keywords": null,
    "author": "TheTechromancer",
    "author_email": null,
    "download_url": "https://files.pythonhosted.org/packages/e2/2f/339c4541dfe4b38a3926148f313ef901daf263a9603fb535d42fd51f5f1a/webcap-0.1.21.tar.gz",
    "platform": null,
"description": "<img src=\"https://github.com/user-attachments/assets/25912aba-690a-45e2-a6a9-2b0445e8218f\" width=\"600\"/>\n\n[](https://www.python.org) [](https://github.com/blacklanternsecurity/webcap/blob/dev/LICENSE) [](https://github.com/astral-sh/ruff) [](https://github.com/blacklanternsecurity/webcap/actions?query=workflow%3A\"tests\") [](https://codecov.io/gh/blacklanternsecurity/webcap) [](https://discord.com/invite/PZqkgxu5SA)\n\n**WebCap** is an extremely lightweight headless browser tool. It doesn't require Selenium, Playwright, Puppeteer, or any other browser automation framework; all it needs is a working Chrome installation. Used by [BBOT](https://github.com/blacklanternsecurity/bbot).\n\n### Installation\n\n```bash\npipx install webcap\n```\n\n### Features\n\nWebCap's most unique feature is its ability to capture not only the **fully-rendered DOM**, but also every snippet of **parsed Javascript** (regardless of inline or external), and the **full content** of every HTTP request + response (including Javascript API calls etc.). For convenience, it outputs directly to JSON:\n\n#### Screenshots\n\n\n\n#### Fully-rendered DOM\n\n\n\n#### Javascript Capture\n\n\n\n#### Requests + Responses\n\n\n\n#### OCR\n\n\n\n### Full feature list\n\n- [x] Blazing fast screenshots\n- [x] Fullscreen capture (entire scrollable page)\n- [x] JSON output\n- [x] Full DOM extraction\n- [x] Javascript extraction (inline + external)\n- [ ] Javascript extraction (environment dump)\n- [x] Full network logs (incl. request/response bodies)\n- [x] Title\n- [x] Status code\n- [x] Fuzzy (perception) hashing\n- [ ] Technology detection\n- [x] OCR text extraction\n- [ ] Web interface\n\n### Example Commands\n\n```bash\n# Capture screenshots of all URLs in urls.txt\nwebcap -u urls.txt -o ./my_screenshots\n\n# Output to JSON, and include the fully-rendered DOM\nwebcap -u urls.txt --json --dom | jq\n\n# Capture requests and responses\nwebcap -u urls.txt --json --requests --responses | jq\n\n# Capture javascript\nwebcap -u urls.txt --json --javascript | jq\n\n# Extract text from screenshots\nwebcap -u urls.txt --json --ocr | jq\n```\n\n### Use as a Python library\n\n```python\nimport base64\nfrom webcap import Browser\n\nasync def main():\n # create a browser instance\n browser = Browser()\n # start the browser\n await browser.start()\n # take a screenshot\n webscreenshot = await browser.screenshot(\"http://example.com\")\n # save the screenshot to a file\n with open(\"screenshot.png\", \"wb\") as f:\n f.write(webscreenshot.blob)\n # stop the browser\n await browser.stop()\n\nif __name__ == \"__main__\":\n import asyncio\n asyncio.run(main())\n```\n\n## CLI Usage (--help)\n\n```\nusage: webcap [-h] [-u URLS [URLS ...]] [-o OUTPUT] [-j] [-r RESOLUTION] [-f] [--no-screenshots] [-t THREADS] [--delay DELAY] [-U USER_AGENT] [-H HEADERS [HEADERS ...]] [-p PROXY] [-b]\n [-d] [-rs] [-rq] [-J] [--ignore-types IGNORE_TYPES [IGNORE_TYPES ...]] [--ocr] [-s] [--debug] [--no-color] [-c CHROME]\n\noptions:\n -h, --help show this help message and exit\n -u URLS [URLS ...], --urls URLS [URLS ...]\n URL(s) to capture, or file(s) containing URLs\n -o OUTPUT, --output OUTPUT\n Output directory\n -j, --json Output JSON\n\nScreenshots:\n -r RESOLUTION, --resolution RESOLUTION\n Resolution to capture\n -f, --full-page Capture the full page (larger resolution images)\n --no-screenshots Don't take screenshots\n\nPerformance:\n -t THREADS, --threads THREADS\n Number of threads to use\n --delay DELAY Delay before capturing (default: 3.0 
seconds)\n\nHTTP:\n -U USER_AGENT, --user-agent USER_AGENT\n User agent to use\n -H HEADERS [HEADERS ...], --headers HEADERS [HEADERS ...]\n Additional headers to send in format: 'Header-Name: Header-Value' (multiple supported)\n -p PROXY, --proxy PROXY\n HTTP proxy to use\n\nJSON Output:\n -b, --base64 Output each screenshot as base64\n -d, --dom Capture the fully-rendered DOM\n -rs, --responses Capture the full body of each HTTP response (including API calls etc.)\n -rq, --requests Capture the full body of each HTTP request (including API calls etc.)\n -J, --javascript Capture every snippet of Javascript (inline + external)\n --ignore-types IGNORE_TYPES [IGNORE_TYPES ...]\n Ignore certain types of network requests (default: Image, Media, Font, Stylesheet)\n --ocr Extract text from screenshots\n\nMisc:\n -s, --silent Silent mode\n --debug Enable debugging\n --no-color Disable color output\n -c CHROME, --chrome CHROME\n Path to Chrome executable\n```\n\n",
"bugtrack_url": null,
"license": "GPL-v3.0",
"summary": null,
"version": "0.1.21",
"project_urls": null,
"split_keywords": [],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "bb8ae8f1eef129b208d03d926c795bbbeac083b91bef535d6666fe998879a68b",
"md5": "62eece409289c9c561d01383e3b826c9",
"sha256": "28a72f19157d92ba9a6dc34c5fe6bb229a9ccbfa3ad100fa8656660caa4f36d9"
},
"downloads": -1,
"filename": "webcap-0.1.21-py3-none-any.whl",
"has_sig": false,
"md5_digest": "62eece409289c9c561d01383e3b826c9",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": "<3.13,>=3.9",
"size": 36098,
"upload_time": "2024-12-16T22:08:14",
"upload_time_iso_8601": "2024-12-16T22:08:14.358360Z",
"url": "https://files.pythonhosted.org/packages/bb/8a/e8f1eef129b208d03d926c795bbbeac083b91bef535d6666fe998879a68b/webcap-0.1.21-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "e22f339c4541dfe4b38a3926148f313ef901daf263a9603fb535d42fd51f5f1a",
"md5": "21305c880418391438125d29ef04fe7a",
"sha256": "99cddd626aa1850137fb7c94e0ce42e03f427267696fb328334afc7203f54094"
},
"downloads": -1,
"filename": "webcap-0.1.21.tar.gz",
"has_sig": false,
"md5_digest": "21305c880418391438125d29ef04fe7a",
"packagetype": "sdist",
"python_version": "source",
"requires_python": "<3.13,>=3.9",
"size": 33722,
"upload_time": "2024-12-16T22:08:16",
"upload_time_iso_8601": "2024-12-16T22:08:16.548194Z",
"url": "https://files.pythonhosted.org/packages/e2/2f/339c4541dfe4b38a3926148f313ef901daf263a9603fb535d42fd51f5f1a/webcap-0.1.21.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2024-12-16 22:08:16",
"github": false,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"lcname": "webcap"
}