# HTTPZ Web Scanner

A high-performance concurrent web scanner written in Python. HTTPZ efficiently scans domains for HTTP/HTTPS services, extracting valuable information like status codes, titles, SSL certificates, and more.
## Requirements
- [Python](https://www.python.org/downloads/)
- [aiohttp](https://pypi.org/project/aiohttp/)
- [beautifulsoup4](https://pypi.org/project/beautifulsoup4/)
- [cryptography](https://pypi.org/project/cryptography/)
- [dnspython](https://pypi.org/project/dnspython/)
- [mmh3](https://pypi.org/project/mmh3/)
- [python-dotenv](https://pypi.org/project/python-dotenv/)
## Installation
### Via pip *(recommended)*
```bash
# Install from PyPI
pip install httpz_scanner
# The 'httpz' command will now be available in your terminal
httpz --help
```
### From source
```bash
# Clone the repository
git clone https://github.com/acidvegas/httpz
cd httpz
pip install -r requirements.txt
```
## Usage
### Command Line Interface
Basic usage:
```bash
python -m httpz_scanner domains.txt
```
Scan with all flags enabled and output to JSONL:
```bash
python -m httpz_scanner domains.txt -all -c 100 -o results.jsonl -j -p
```
Read from stdin:
```bash
cat domains.txt | python -m httpz_scanner - -all -c 100
echo "example.com" | python -m httpz_scanner - -all
```
Filter by status codes and follow redirects:
```bash
python -m httpz_scanner domains.txt -mc 200,301-399 -ec 404,500 -fr -p
```
Show specific fields with custom timeout and resolvers:
```bash
python -m httpz_scanner domains.txt -sc -ti -i -tls -to 10 -r resolvers.txt
```
Full scan with all options:
```bash
python -m httpz_scanner domains.txt -c 100 -o output.jsonl -j -all -to 10 -mc 200,301 -ec 404,500 -p -ax -r resolvers.txt
```
### Distributed Scanning
Split scanning across multiple machines using the `--shard` argument:
```bash
# Machine 1
httpz domains.txt --shard 1/3

# Machine 2
httpz domains.txt --shard 2/3

# Machine 3
httpz domains.txt --shard 3/3
```
Each machine will process a different subset of domains without overlap. For example, with 3 shards:
- Machine 1 processes lines 0,3,6,9,...
- Machine 2 processes lines 1,4,7,10,...
- Machine 3 processes lines 2,5,8,11,...

This allows efficient distribution of large scans across multiple machines.
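The round-robin split described above can be illustrated in a few lines of Python. This is only a sketch of the documented behaviour, not the library's internal code; the `shard_lines` helper is a hypothetical name used for illustration.

```python
# Hypothetical illustration of the round-robin sharding described above;
# not the library's internal implementation.
def shard_lines(lines, shard_index, total_shards):
    """Yield the lines assigned to shard `shard_index` (1-based, as in --shard 1/3)."""
    for position, line in enumerate(lines):
        if position % total_shards == shard_index - 1:
            yield line

domains = ['a.com', 'b.com', 'c.com', 'd.com', 'e.com', 'f.com']
print(list(shard_lines(domains, 1, 3)))  # ['a.com', 'd.com']  (lines 0, 3, ...)
print(list(shard_lines(domains, 2, 3)))  # ['b.com', 'e.com']  (lines 1, 4, ...)
print(list(shard_lines(domains, 3, 3)))  # ['c.com', 'f.com']  (lines 2, 5, ...)
```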
### Python Library
```python
import asyncio
import urllib.request
from httpz_scanner import HTTPZScanner

async def scan_from_list() -> list:
    with urllib.request.urlopen('https://example.com/domains.txt') as response:
        content = response.read().decode()
        return [line.strip() for line in content.splitlines() if line.strip()][:20]

async def scan_from_url():
    with urllib.request.urlopen('https://example.com/domains.txt') as response:
        for line in response:
            if line := line.strip():
                yield line.decode().strip()

async def scan_from_file():
    with open('domains.txt', 'r') as file:
        for line in file:
            if line := line.strip():
                yield line

async def main():
    # Initialize scanner with all possible options (showing defaults)
    scanner = HTTPZScanner(
        concurrent_limit=100,   # Number of concurrent requests
        timeout=5,              # Request timeout in seconds
        follow_redirects=False, # Follow redirects (max 10)
        check_axfr=False,       # Try AXFR transfer against nameservers
        resolver_file=None,     # Path to custom DNS resolvers file
        output_file=None,       # Path to JSONL output file
        show_progress=False,    # Show progress counter
        debug_mode=False,       # Show error states and debug info
        jsonl_output=False,     # Output in JSONL format
        shard=None,             # Tuple of (shard_index, total_shards) for distributed scanning

        # Control which fields to show (all False by default unless show_fields is None)
        show_fields={
            'status_code': True,      # Show status code
            'content_type': True,     # Show content type
            'content_length': True,   # Show content length
            'title': True,            # Show page title
            'body': True,             # Show body preview
            'ip': True,               # Show IP addresses
            'favicon': True,          # Show favicon hash
            'headers': True,          # Show response headers
            'follow_redirects': True, # Show redirect chain
            'cname': True,            # Show CNAME records
            'tls': True               # Show TLS certificate info
        },

        # Filter results
        match_codes={200, 301, 302},  # Only show these status codes
        exclude_codes={404, 500, 503} # Exclude these status codes
    )

    # Example 1: Process file
    print('\nProcessing file:')
    async for result in scanner.scan(scan_from_file()):
        print(f"{result['domain']}: {result['status']}")

    # Example 2: Stream URLs
    print('\nStreaming URLs:')
    async for result in scanner.scan(scan_from_url()):
        print(f"{result['domain']}: {result['status']}")

    # Example 3: Process list
    print('\nProcessing list:')
    domains = await scan_from_list()
    async for result in scanner.scan(domains):
        print(f"{result['domain']}: {result['status']}")

if __name__ == '__main__':
    asyncio.run(main())
```
The scanner accepts various input types:
- File paths (string)
- Lists/tuples of domains
- stdin (using '-')
- Async generators that yield domains

All inputs support sharding for distributed scanning using the `shard` parameter.
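As a minimal sketch, the `shard` parameter can be combined with a plain list of domains. The constructor arguments mirror the larger example above; whether the shard index in the tuple is 1-based (matching the CLI's `--shard 1/2` style) is an assumption here, so verify it against your installed version.

```python
import asyncio
from httpz_scanner import HTTPZScanner

async def main():
    # Minimal sketch: list input plus the shard parameter from the example above.
    # The (shard_index, total_shards) tuple is assumed to use the same 1-based
    # numbering as the CLI's --shard 1/2; confirm against your installed version.
    scanner = HTTPZScanner(concurrent_limit=10, timeout=5, shard=(1, 2))

    domains = ['example.com', 'example.org', 'example.net']
    async for result in scanner.scan(domains):
        print(f"{result['domain']}: {result['status']}")

if __name__ == '__main__':
    asyncio.run(main())
```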
## Arguments
| Argument | Long Form | Description |
|---------------|------------------|-------------------------------------------------------------|
| `file` | | File containing domains *(one per line)*, use `-` for stdin |
| `-d` | `--debug` | Show error states and debug information |
| `-c N` | `--concurrent N` | Number of concurrent checks *(default: 100)* |
| `-o FILE` | `--output FILE` | Output file path *(JSONL format)* |
| `-j` | `--jsonl` | Output JSON Lines format to console |
| `-all` | `--all-flags` | Enable all output flags |
| `-sh` | `--shard N/T` | Process shard N of T total shards *(e.g., 1/3)* |
### Output Field Flags
| Flag | Long Form | Description |
|--------| ---------------------|----------------------------------|
| `-sc` | `--status-code` | Show status code |
| `-ct` | `--content-type` | Show content type |
| `-ti` | `--title` | Show page title |
| `-b` | `--body` | Show body preview |
| `-i` | `--ip` | Show IP addresses |
| `-f` | `--favicon` | Show favicon hash |
| `-hr` | `--headers` | Show response headers |
| `-cl` | `--content-length` | Show content length |
| `-fr` | `--follow-redirects` | Follow redirects *(max 10)* |
| `-cn` | `--cname` | Show CNAME records |
| `-tls` | `--tls-info` | Show TLS certificate information |
### Other Options
| Option | Long Form | Description |
|-------------|-------------------------|-----------------------------------------------------|
| `-to N` | `--timeout N` | Request timeout in seconds *(default: 5)* |
| `-mc CODES` | `--match-codes CODES` | Only show specific status codes *(comma-separated)* |
| `-ec CODES` | `--exclude-codes CODES` | Exclude specific status codes *(comma-separated)* |
| `-p` | `--progress` | Show progress counter |
| `-ax` | `--axfr` | Try AXFR transfer against nameservers |
| `-r FILE` | `--resolvers FILE` | File containing DNS resolvers *(one per line)* |