httpz-scanner

Name	httpz-scanner
Version	2.1.9
home_page	https://github.com/acidvegas/httpz
Summary	Hyper-fast HTTP Scraping Tool
upload_time	2025-02-12 08:09:32
maintainer	None
docs_url	None
author	acidvegas
requires_python	>=3.8
license	None
keywords
VCS
bugtrack_url
requirements	aiohttp beautifulsoup4 cryptography dnspython mmh3

# HTTPZ Web Scanner

![](./.screens/preview.gif)

A high-performance concurrent web scanner written in Python. HTTPZ scans lists of domains for HTTP/HTTPS services and extracts details such as status codes, page titles, TLS certificate information, IP addresses, CNAME records, and favicon hashes.

## Requirements

- [Python](https://www.python.org/downloads/)
  - [aiohttp](https://pypi.org/project/aiohttp/)
  - [beautifulsoup4](https://pypi.org/project/beautifulsoup4/)
  - [cryptography](https://pypi.org/project/cryptography/)
  - [dnspython](https://pypi.org/project/dnspython/)
  - [mmh3](https://pypi.org/project/mmh3/)
  - [python-dotenv](https://pypi.org/project/python-dotenv/)

## Installation

### Via pip *(recommended)*
```bash
# Install from PyPI
pip install httpz_scanner

# The 'httpz' command will now be available in your terminal
httpz --help
```

### From source
```bash
# Clone the repository
git clone https://github.com/acidvegas/httpz
cd httpz
pip install -r requirements.txt
```

## Usage

### Command Line Interface

Basic usage:
```bash
python -m httpz_scanner domains.txt
```

Scan with all flags enabled and output to JSONL:
```bash
python -m httpz_scanner domains.txt -all -c 100 -o results.jsonl -j -p
```
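A JSONL file such as `results.jsonl` holds one JSON object per line, so it can be post-processed with the standard library alone. The sketch below is a minimal reader; `load_jsonl` is a hypothetical helper (not part of httpz), and it deliberately assumes nothing about which keys each record contains.

```python
import json
from typing import Iterable, Iterator


def load_jsonl(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one parsed object per non-empty line of JSON Lines input."""
    for line in lines:
        line = line.strip()
        if line:  # tolerate blank lines between records
            yield json.loads(line)


def read_results(path: str) -> list:
    """Load every record from a JSONL file into a list."""
    with open(path, 'r') as fh:
        return list(load_jsonl(fh))
```

Calling `read_results('results.jsonl')` after a scan would return a list of dictionaries, one per scanned domain that produced output.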

Read from stdin:
```bash
cat domains.txt | python -m httpz_scanner - -all -c 100
echo "example.com" | python -m httpz_scanner - -all
```

Filter by status codes and follow redirects:
```bash
python -m httpz_scanner domains.txt -mc 200,301-399 -ec 404,500 -fr -p
```
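The `-mc` and `-ec` values above mix single codes with ranges such as `301-399`. As an illustration of how such a specification could be expanded and applied, here is a small sketch. Both `parse_status_codes` and `keep_status` are hypothetical names, and the rule that exclusions take priority over matches is an assumption, not a statement about how httpz resolves conflicts.

```python
from typing import Optional, Set


def parse_status_codes(spec: str) -> Set[int]:
    """Expand a spec like '200,301-399' into a set of integer status codes."""
    codes: Set[int] = set()
    for part in spec.split(','):
        part = part.strip()
        if not part:
            continue
        if '-' in part:
            start, end = (int(x) for x in part.split('-', 1))
            codes.update(range(start, end + 1))  # ranges are inclusive
        else:
            codes.add(int(part))
    return codes


def keep_status(status: int,
                match: Optional[Set[int]] = None,
                exclude: Optional[Set[int]] = None) -> bool:
    """Return True if a result with this status should be shown.

    Assumption: an excluded code is dropped even if it is also matched.
    """
    if exclude and status in exclude:
        return False
    if match and status not in match:
        return False
    return True
```

With these helpers, `keep_status(302, parse_status_codes('200,301-399'), parse_status_codes('404,500'))` evaluates to `True`, while a `404` would be dropped.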

Show specific fields with custom timeout and resolvers:
```bash
python -m httpz_scanner domains.txt -sc -ti -i -tls -to 10 -r resolvers.txt
```

Full scan with all options:
```bash
python -m httpz_scanner domains.txt -c 100 -o output.jsonl -j -all -to 10 -mc 200,301 -ec 404,500 -p -ax -r resolvers.txt
```

### Distributed Scanning
Split scanning across multiple machines using the `--shard` argument:

```bash
# Machine 1
httpz domains.txt --shard 1/3

# Machine 2
httpz domains.txt --shard 2/3

# Machine 3
httpz domains.txt --shard 3/3
```

Each machine will process a different subset of domains without overlap. For example, with 3 shards:
- Machine 1 processes lines 0,3,6,9,...
- Machine 2 processes lines 1,4,7,10,...
- Machine 3 processes lines 2,5,8,11,...

This allows efficient distribution of large scans across multiple machines.
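The line assignment listed above is a round-robin split: shard `N` of `T` takes every line whose zero-based index satisfies `index % T == N - 1`. The sketch below reproduces that mapping; `parse_shard` and `shard_lines` are hypothetical helpers written for illustration and are not taken from the httpz source.

```python
from typing import List, Sequence, Tuple


def parse_shard(spec: str) -> Tuple[int, int]:
    """Parse an 'N/T' string such as '2/3' into (shard_index, total_shards)."""
    index_text, total_text = spec.split('/', 1)
    index, total = int(index_text), int(total_text)
    if not 1 <= index <= total:
        raise ValueError(f'shard index must be between 1 and {total}: {spec!r}')
    return index, total


def shard_lines(lines: Sequence[str], shard_index: int, total_shards: int) -> List[str]:
    """Return the lines owned by a 1-based shard under round-robin assignment."""
    return [line for n, line in enumerate(lines)
            if n % total_shards == shard_index - 1]
```

For a seven-line input, shard `1/3` receives lines 0, 3 and 6, shard `2/3` receives lines 1 and 4, and shard `3/3` receives lines 2 and 5, so every line is scanned exactly once.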

### Python Library
```python
import asyncio
import urllib.request
from httpz_scanner import HTTPZScanner

async def scan_from_list() -> list:
    with urllib.request.urlopen('https://example.com/domains.txt') as response:
        content = response.read().decode()
        return [line.strip() for line in content.splitlines() if line.strip()][:20]
    
async def scan_from_url():
    with urllib.request.urlopen('https://example.com/domains.txt') as response:
        for line in response:
            # Lines arrive as bytes: strip whitespace, skip blanks, then decode
            if line := line.strip():
                yield line.decode()

async def scan_from_file():
    with open('domains.txt', 'r') as file:
        for line in file:
            if line := line.strip():
                yield line

async def main():
    # Initialize scanner with all possible options (showing defaults)
    scanner = HTTPZScanner(
        concurrent_limit=100,   # Number of concurrent requests
        timeout=5,              # Request timeout in seconds
        follow_redirects=False, # Follow redirects (max 10)
        check_axfr=False,       # Try AXFR transfer against nameservers
        resolver_file=None,     # Path to custom DNS resolvers file
        output_file=None,       # Path to JSONL output file
        show_progress=False,    # Show progress counter
        debug_mode=False,       # Show error states and debug info
        jsonl_output=False,     # Output in JSONL format
        shard=None,             # Tuple of (shard_index, total_shards) for distributed scanning
        
        # Control which fields to show (all False by default unless show_fields is None)
        show_fields={
            'status_code': True,      # Show status code
            'content_type': True,     # Show content type
            'content_length': True,   # Show content length
            'title': True,            # Show page title
            'body': True,             # Show body preview
            'ip': True,               # Show IP addresses
            'favicon': True,          # Show favicon hash
            'headers': True,          # Show response headers
            'follow_redirects': True, # Show redirect chain
            'cname': True,            # Show CNAME records
            'tls': True               # Show TLS certificate info
        },
        
        # Filter results
        match_codes={200,301,302},  # Only show these status codes
        exclude_codes={404,500,503} # Exclude these status codes
    )

    # Example 1: Process file
    print('\nProcessing file:')
    async for result in scanner.scan(scan_from_file()):
        print(f"{result['domain']}: {result['status']}")

    # Example 2: Stream URLs
    print('\nStreaming URLs:')
    async for result in scanner.scan(scan_from_url()):
        print(f"{result['domain']}: {result['status']}")

    # Example 3: Process list
    print('\nProcessing list:')
    domains = await scan_from_list()
    async for result in scanner.scan(domains):
        print(f"{result['domain']}: {result['status']}")

if __name__ == '__main__':
    asyncio.run(main())
```

The scanner accepts various input types:
- File paths (string)
- Lists/tuples of domains
- stdin (using '-')
- Async generators that yield domains

All inputs support sharding for distributed scanning using the `shard` parameter.
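To make the list of input types concrete, the following sketch shows one way a caller could normalize all four kinds of input into a single async stream of domains. It is not the library's implementation: `iter_domains` and `collect` are hypothetical names, and the dispatch order is an assumption chosen for readability.

```python
import asyncio
import sys
from typing import AsyncIterator, List


async def iter_domains(source) -> AsyncIterator[str]:
    """Yield stripped, non-empty domains from several kinds of input."""
    if hasattr(source, '__aiter__'):       # async generator
        async for item in source:
            if item := item.strip():
                yield item
        return
    if source == '-':                      # stdin
        lines = sys.stdin
    elif isinstance(source, str):          # file path
        with open(source, 'r') as fh:
            lines = fh.readlines()
    else:                                  # list or tuple of domains
        lines = source
    for item in lines:
        if item := item.strip():
            yield item


async def collect(source) -> List[str]:
    """Drain iter_domains() into a list, mainly useful for testing."""
    return [domain async for domain in iter_domains(source)]
```

For example, `asyncio.run(collect(['example.com\n', '', 'example.org']))` returns `['example.com', 'example.org']`, and passing an async generator gives the same cleaned-up stream.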

## Arguments

| Argument      | Long Form        | Description                                                 |
|---------------|------------------|-------------------------------------------------------------|
| `file`        |                  | File containing domains *(one per line)*, use `-` for stdin |
| `-d`          | `--debug`        | Show error states and debug information                     |
| `-c N`        | `--concurrent N` | Number of concurrent checks *(default: 100)*                |
| `-o FILE`     | `--output FILE`  | Output file path *(JSONL format)*                           |
| `-j`          | `--jsonl`        | Output JSON Lines format to console                         |
| `-all`        | `--all-flags`    | Enable all output flags                                     |
| `-sh N/T`     | `--shard N/T`    | Process shard N of T total shards *(e.g., 1/3)*             |

### Output Field Flags

| Flag   | Long Form            | Description                      |
|--------|----------------------|----------------------------------|
| `-sc`  | `--status-code`      | Show status code                 |
| `-ct`  | `--content-type`     | Show content type                |
| `-ti`  | `--title`            | Show page title                  |
| `-b`   | `--body`             | Show body preview                |
| `-i`   | `--ip`               | Show IP addresses                |
| `-f`   | `--favicon`          | Show favicon hash                |
| `-hr`  | `--headers`          | Show response headers            |
| `-cl`  | `--content-length`   | Show content length              |
| `-fr`  | `--follow-redirects` | Follow redirects *(max 10)*      |
| `-cn`  | `--cname`            | Show CNAME records               |
| `-tls` | `--tls-info`         | Show TLS certificate information |

### Other Options

| Option      | Long Form               | Description                                         |
|-------------|-------------------------|-----------------------------------------------------|
| `-to N`     | `--timeout N`           | Request timeout in seconds *(default: 5)*           |
| `-mc CODES` | `--match-codes CODES`   | Only show specific status codes *(comma-separated)* |
| `-ec CODES` | `--exclude-codes CODES` | Exclude specific status codes *(comma-separated)*   |
| `-p`        | `--progress`            | Show progress counter                               |
| `-ax`       | `--axfr`                | Try AXFR transfer against nameservers               |
| `-r FILE`   | `--resolvers FILE`      | File containing DNS resolvers *(one per line)*      |

Raw data

{
    "_id": null,
    "home_page": "https://github.com/acidvegas/httpz",
    "name": "httpz-scanner",
    "maintainer": null,
    "docs_url": null,
    "requires_python": ">=3.8",
    "maintainer_email": null,
    "keywords": null,
    "author": "acidvegas",
    "author_email": "acid.vegas@acid.vegas",
    "download_url": "https://files.pythonhosted.org/packages/00/cd/2484fcc50f385c0b528734d4dc73641b86c816aad98a3885f80993b8757c/httpz_scanner-2.1.9.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": null,
    "summary": "Hyper-fast HTTP Scraping Tool",
    "version": "2.1.9",
    "project_urls": {
        "Homepage": "https://github.com/acidvegas/httpz"
    },
    "split_keywords": [],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "4ba68d36afdd9c926c74fa2a6989b6760794e1613cb76f275e43853f082e7df4",
                "md5": "cb7beb27128c6c4f586d982928b2a830",
                "sha256": "a882a78d16730713c4d2f9300238115e73d57ff5401125785441c7707f84d506"
            },
            "downloads": -1,
            "filename": "httpz_scanner-2.1.9-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "cb7beb27128c6c4f586d982928b2a830",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.8",
            "size": 19826,
            "upload_time": "2025-02-12T08:09:25",
            "upload_time_iso_8601": "2025-02-12T08:09:25.278363Z",
            "url": "https://files.pythonhosted.org/packages/4b/a6/8d36afdd9c926c74fa2a6989b6760794e1613cb76f275e43853f082e7df4/httpz_scanner-2.1.9-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "00cd2484fcc50f385c0b528734d4dc73641b86c816aad98a3885f80993b8757c",
                "md5": "a486cd69d8e84336cd1e41467dd16151",
                "sha256": "ecc7bd61e48a498487605f76859b1489709e85190739846ec856ad84ea8676d1"
            },
            "downloads": -1,
            "filename": "httpz_scanner-2.1.9.tar.gz",
            "has_sig": false,
            "md5_digest": "a486cd69d8e84336cd1e41467dd16151",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.8",
            "size": 18821,
            "upload_time": "2025-02-12T08:09:32",
            "upload_time_iso_8601": "2025-02-12T08:09:32.422044Z",
            "url": "https://files.pythonhosted.org/packages/00/cd/2484fcc50f385c0b528734d4dc73641b86c816aad98a3885f80993b8757c/httpz_scanner-2.1.9.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2025-02-12 08:09:32",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "acidvegas",
    "github_project": "httpz",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": false,
    "requirements": [
        {
            "name": "aiohttp",
            "specs": [
                [
                    ">=",
                    "3.8.0"
                ]
            ]
        },
        {
            "name": "beautifulsoup4",
            "specs": [
                [
                    ">=",
                    "4.9.3"
                ]
            ]
        },
        {
            "name": "cryptography",
            "specs": [
                [
                    ">=",
                    "3.4.7"
                ]
            ]
        },
        {
            "name": "dnspython",
            "specs": [
                [
                    ">=",
                    "2.1.0"
                ]
            ]
        },
        {
            "name": "mmh3",
            "specs": [
                [
                    ">=",
                    "3.0.0"
                ]
            ]
        }
    ],
    "lcname": "httpz-scanner"
}
