hrfh

Name	hrfh
Version	0.1.22
home_page	None
Summary	HTTP Response Fuzzy Hashing
upload_time	2025-08-12 13:25:45
maintainer	None
docs_url	None
author	None
requires_python	>=3.7
license	None

# HRFH - HTTP Response Fuzzy Hashing

[![Python 3.7+](https://img.shields.io/badge/python-3.7+-blue.svg)](https://www.python.org/downloads/)
[![License](https://img.shields.io/badge/license-MIT-green.svg)](LICENSE)
[![PyPI](https://img.shields.io/badge/pypi-v0.1.22-blue.svg)](https://pypi.org/project/hrfh/)

A Python library for generating fuzzy hashes of HTTP responses, useful for identifying similar web content, detecting CDN configurations, and analyzing web infrastructure.

## Features

- **Fast Processing**: Efficient HTTP response parsing and hashing
- **Fuzzy Hashing**: Generate consistent hashes for similar content
- **Content Masking**: Intelligent masking of dynamic content (timestamps, IDs, etc.)
- **Multiple Formats**: Support for raw HTTP responses and JSON data
- **Python 3.7+**: Requires Python 3.7 or later
- **Easy Integration**: Simple API for embedding in your projects
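
The idea behind masking and hashing can be illustrated with a small standard-library sketch. This is not hrfh's actual algorithm: the set of headers treated as dynamic, the hex-ID pattern, the `[MASK]` substitution rule, and the use of SHA-256 over sorted header lines are all assumptions chosen for illustration, and `mask_headers` and `sketch_hash` are hypothetical names.

```python
import hashlib
import re

# Assumption: values of these headers are treated as dynamic.
DYNAMIC_HEADERS = {"etag", "date", "set-cookie"}
# Assumption: long hex runs look like generated IDs.
HEX_ID = re.compile(r"\b[0-9a-fA-F]{16,}\b")

def mask_headers(headers):
    """Return sorted 'Name: value' lines with dynamic values replaced by [MASK]."""
    lines = []
    for name, value in headers:
        if name.lower() in DYNAMIC_HEADERS:
            value = "[MASK]"
        else:
            value = HEX_ID.sub("[MASK]", value)
        lines.append(f"{name}: {value}")
    return sorted(lines)

def sketch_hash(status_line, headers):
    """SHA-256 over the status line plus the masked, sorted header lines."""
    text = "\n".join([status_line] + mask_headers(headers))
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

# Two responses that differ only in ETag value and header order
a = sketch_hash("HTTP/1.0 200 OK",
                [("Server", "nginx"), ("ETag", "ea67ba7f802fb5c6cfa13a6b6d27adc6")])
b = sketch_hash("HTTP/1.0 200 OK",
                [("ETag", "0123456789abcdef0123456789abcdef"), ("Server", "nginx")])
print(a == b)  # True
```

Because the volatile values are replaced before hashing and the lines are sorted, responses that differ only in those values or in header order collapse to the same digest in this sketch.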

## Installation

### From PyPI (Recommended)

```bash
pip install hrfh
```

### From Source

```bash
git clone https://github.com/yourusername/hrfh.git
cd hrfh
uv sync
```

## Quick Start

### Basic Usage

```python
from hrfh.utils.parser import create_http_response_from_bytes

# Parse HTTP response from bytes
response = create_http_response_from_bytes(
    b"""HTTP/1.0 200 OK\r\nServer: nginx\r\nServer: apache\r\nETag: ea67ba7f802fb5c6cfa13a6b6d27adc6\r\n\r\n"""
)

# Get basic response info
print(response)
# Output: <HTTPResponse 1.1.1.1:80 200 OK>

# Get masked content (with dynamic parts masked)
print(response.masked)
# Output: HTTP/1.0 200 OK
#         ETag: [MASK]
#         Server: apache
#         Server: nginx

# Generate fuzzy hash for similarity detection
print(response.fuzzy_hash())
# Output: ba15cc1f9ad3ef632d0ce7798f7fa44718f1e7fcc2c0f94c1a702f647b79923b
```

### Interactive Example

```python
>>> from hrfh.utils.parser import create_http_response_from_bytes
>>> response = create_http_response_from_bytes(b"""HTTP/1.0 200 OK\r\nServer: nginx\r\nServer: apache\r\nETag: ea67ba7f802fb5c6cfa13a6b6d27adc6\r\n\r\n""")
>>> print(response)
<HTTPResponse 1.1.1.1:80 200 OK>
>>> print(response.masked)
HTTP/1.0 200 OK
ETag: [MASK]
Server: apache
Server: nginx
>>> print(response.fuzzy_hash())
ba15cc1f9ad3ef632d0ce7798f7fa44718f1e7fcc2c0f94c1a702f647b79923b
```
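
The byte string passed to `create_http_response_from_bytes` follows the usual HTTP/1.x wire layout: a status line, header lines, and a blank line, each terminated by `\r\n`, followed by an optional body. A minimal sketch for assembling such input from its parts is shown below; `build_raw_response` is a hypothetical helper written for this example and is not part of hrfh.

```python
def build_raw_response(version, status_code, reason, headers, body=b""):
    """Assemble raw HTTP response bytes: status line, header lines, blank line, body."""
    lines = [f"{version} {status_code} {reason}"]
    lines += [f"{name}: {value}" for name, value in headers]
    head = "\r\n".join(lines) + "\r\n\r\n"
    return head.encode("iso-8859-1") + body

# Rebuild the byte string used in the examples above
raw = build_raw_response(
    "HTTP/1.0", 200, "OK",
    [("Server", "nginx"), ("Server", "apache"),
     ("ETag", "ea67ba7f802fb5c6cfa13a6b6d27adc6")],
)
print(raw)
```

The resulting `raw` value is identical to the literal used in the Quick Start and can be passed to `create_http_response_from_bytes` in the same way.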

## API Reference

### Core Classes

#### HTTPResponse

Main class for representing HTTP responses with fuzzy hashing capabilities.

```python
from hrfh.models import HTTPResponse

response = HTTPResponse(
    ip="1.2.3.4",
    port=80,
    version="HTTP/1.1",
    status_code=200,
    status_reason="OK",
    headers=[("Server", "nginx"), ("Content-Type", "text/html")],
    body=b"<html>Hello World</html>"
)
```

**Key Methods and Attributes:**
- `fuzzy_hash()`: Generate fuzzy hash for similarity detection
- `masked`: Attribute holding the masked content with dynamic parts hidden (accessed without parentheses)
- `dump()`: Get formatted HTTP response string

#### HTTPRequest

Class for representing HTTP requests.

```python
from hrfh.models import HTTPRequest

request = HTTPRequest(
    ip="1.2.3.4",
    port=80,
    method="GET",
    version="HTTP/1.1",
    headers=[("Host", "example.com")],
    body=b""
)
```

### Utility Functions

#### Parsing Functions

```python
from hrfh.utils.parser import (
    create_http_response_from_bytes,
    create_http_response_from_json,
    create_http_request_from_json
)

# Parse from raw HTTP response bytes
response = create_http_response_from_bytes(http_bytes)

# Parse from JSON data
response = create_http_response_from_json(json_data)
request = create_http_request_from_json(json_data)
```

## Advanced Usage

### Working with JSON Data

```python
import json
from hrfh.utils.parser import create_http_response_from_json

# Load HTTP response data from JSON file
with open('response_data.json', 'r') as f:
    data = json.load(f)

response = create_http_response_from_json(data)
hash_value = response.fuzzy_hash()
```

**Example JSON format:**
```json
{
  "ip": "104.103.147.116",
  "timestamp": 1717146116,
  "status_code": 400,
  "status_reason": "Bad Request",
  "headers": {
    "Server": "AkamaiGHost",
    "Content-Type": "text/html",
    "Content-Length": "312"
  },
  "body": "<HTML><HEAD><TITLE>Invalid URL</TITLE></HEAD><BODY>...</BODY></HTML>"
}
```
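
If captured responses are stored as raw bytes, they can be converted into records of this shape with the standard library alone. The sketch below is an illustration under assumptions: `response_bytes_to_record` is a hypothetical helper, the field set is copied from the example above, and whether hrfh requires every field or accepts others is not stated here.

```python
import json

def response_bytes_to_record(raw, ip, timestamp):
    """Build a dict shaped like the example JSON from raw HTTP response bytes."""
    head, _, body = raw.partition(b"\r\n\r\n")
    lines = head.decode("iso-8859-1").split("\r\n")
    # Status line: "<version> <code> <reason>"; the reason may contain spaces or be absent
    parts = lines[0].split(" ", 2)
    status_code = int(parts[1])
    status_reason = parts[2] if len(parts) > 2 else ""
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        # A plain dict keeps only the last value of a repeated header
        headers[name.strip()] = value.strip()
    return {
        "ip": ip,
        "timestamp": timestamp,
        "status_code": status_code,
        "status_reason": status_reason,
        "headers": headers,
        "body": body.decode("utf-8", errors="replace"),
    }

raw = (b"HTTP/1.1 400 Bad Request\r\n"
       b"Server: AkamaiGHost\r\n"
       b"Content-Type: text/html\r\n\r\n"
       b"<HTML>...</HTML>")
record = response_bytes_to_record(raw, ip="104.103.147.116", timestamp=1717146116)
print(json.dumps(record, indent=2))
```

Note that storing headers as a JSON object cannot represent repeated headers such as the two `Server` lines in the Quick Start; this sketch keeps only the last one.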

### Batch Processing

```python
import json
import os
from hrfh.utils.parser import create_http_response_from_json

def process_responses(data_dir):
    """Hash every .json file found one directory level below data_dir."""
    results = {}

    # Layout used here: data_dir/<cdn_name>/<file>.json
    for cdn_dir in os.listdir(data_dir):
        cdn_path = os.path.join(data_dir, cdn_dir)
        if os.path.isdir(cdn_path):
            for json_file in os.listdir(cdn_path):
                if json_file.endswith('.json'):
                    file_path = os.path.join(cdn_path, json_file)
                    with open(file_path, 'r') as f:
                        data = json.load(f)

                    response = create_http_response_from_json(data)
                    hash_value = response.fuzzy_hash()
                    # Responses with the same hash overwrite each other,
                    # so one representative is kept per hash
                    results[hash_value] = response

    return results

# Usage
results = process_responses('data/')
for hash_val, response in results.items():
    print(f"{hash_val[:16]} {response}")
```
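
A dictionary keyed by hash keeps only one response per hash. When every member of a cluster is needed, grouping into lists is a common alternative. The following is a minimal sketch using only the standard library; `group_by_hash` is a hypothetical helper, and the labels and the stand-in hash function exist only to make the example runnable without hrfh.

```python
from collections import defaultdict

def group_by_hash(items, hash_of):
    """Map each hash to the list of items that produced it."""
    groups = defaultdict(list)
    for item in items:
        groups[hash_of(item)].append(item)
    return dict(groups)

# Stand-in data and hash function, for illustration only
labels = ["akamai-1", "akamai-2", "cloudflare-1"]
groups = group_by_hash(labels, hash_of=lambda label: label.split("-")[0])
for key, members in groups.items():
    print(key, len(members))
```

With parsed responses, the same helper could be called as `group_by_hash(responses, lambda r: r.fuzzy_hash())`, so that clusters with more than one member indicate responses that look alike after masking.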

## Development

### Setting Up Development Environment

1. **Clone the repository**
   ```bash
   git clone https://github.com/yourusername/hrfh.git
   cd hrfh
   ```

2. **Install dependencies**
   ```bash
   uv sync
   ```

3. **Run tests**
   ```bash
   uv run pytest
   ```

4. **Type checking**
   ```bash
   uv run mypy hrfh/
   ```

### Project Structure

```
hrfh/
├── hrfh/                    # Main package
│   ├── models/             # Data models (HTTPRequest, HTTPResponse)
│   ├── utils/              # Utility functions
│   │   ├── parser.py       # HTTP parsing utilities
│   │   ├── masker.py       # Content masking logic
│   │   ├── hasher.py       # Hashing algorithms
│   │   └── tokenizer.py    # HTML tokenization
│   └── __main__.py         # CLI entry point
├── tests/                   # Test suite
├── data/                    # Sample data for testing
├── pyproject.toml          # Project configuration
└── README.md               # This file
```

### Running the CLI Tool

```bash
# Install the package in development mode
uv sync

# Run the CLI tool
uv run hrfh --help

# Process a specific file
uv run hrfh data/akamai/104.103.147.116.json

# Process from stdin
cat data/akamai/104.103.147.116.json | uv run hrfh -
```

### Contributing

1. Fork the repository
2. Create a feature branch (`git checkout -b feature/amazing-feature`)
3. Commit your changes (`git commit -m 'Add amazing feature'`)
4. Push to the branch (`git push origin feature/amazing-feature`)
5. Open a Pull Request

### Testing

```bash
# Run all tests
uv run pytest

# Run with coverage
uv run pytest --cov=hrfh

# Run specific test file
uv run pytest tests/test_http_response.py
```

## Examples

### CDN Analysis

```python
from hrfh.utils.parser import create_http_response_from_bytes

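# Analyze responses from different CDNs
# `akamai_bytes` and `cloudflare_bytes` are raw HTTP responses (bytes) captured beforehand
akamai_response = create_http_response_from_bytes(akamai_bytes)
cloudflare_response = create_http_response_from_bytes(cloudflare_bytes)

# Equal fuzzy hashes mean the responses match once dynamic parts are masked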
if akamai_response.fuzzy_hash() == cloudflare_response.fuzzy_hash():
    print("Same content served from different CDNs")
```

### Content Change Detection

```python
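from hrfh.utils.parser import create_http_response_from_bytes

# Monitor for content changes
# `response` is a previously parsed HTTPResponse
old_hash = response.fuzzy_hash()

# After some time, parse the newly fetched bytes (`new_bytes`)
new_response = create_http_response_from_bytes(new_bytes)
new_hash = new_response.fuzzy_hash()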

if old_hash != new_hash:
    print("Content has changed!")
```
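
For monitoring across separate runs, the previous hash has to be stored somewhere. One simple option is a small JSON state file, sketched below with the standard library only. `has_changed` is a hypothetical helper, the state-file layout is an assumption, and the short strings stand in for real `fuzzy_hash()` values.

```python
import json
import os
import tempfile

def has_changed(state_path, key, new_hash):
    """Record new_hash under key; return True if it differs from the stored one."""
    state = {}
    if os.path.exists(state_path):
        with open(state_path, "r") as f:
            state = json.load(f)
    changed = key in state and state[key] != new_hash
    state[key] = new_hash
    with open(state_path, "w") as f:
        json.dump(state, f)
    return changed

state_file = os.path.join(tempfile.mkdtemp(), "hashes.json")
first = has_changed(state_file, "example.com", "aaa")   # first sighting
second = has_changed(state_file, "example.com", "aaa")  # same hash as before
third = has_changed(state_file, "example.com", "bbb")   # different hash
print(first, second, third)  # False False True
```

The first sighting of a key is reported as unchanged here; a caller that wants to treat new targets as changes would need to adjust that rule.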

## License

This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.

## Support

- **Issues**: [GitHub Issues](https://github.com/yourusername/hrfh/issues)
- **Documentation**: [GitHub Wiki](https://github.com/yourusername/hrfh/wiki)
- **Discussions**: [GitHub Discussions](https://github.com/yourusername/hrfh/discussions)

## Acknowledgments

- Built with [BeautifulSoup](https://www.crummy.com/software/BeautifulSoup/) for HTML parsing
- Uses [NLTK](https://www.nltk.org/) for natural language processing
- Inspired by fuzzy hashing techniques for digital forensics

Raw data

{
    "_id": null,
    "home_page": null,
    "name": "hrfh",
    "maintainer": null,
    "docs_url": null,
    "requires_python": ">=3.7",
    "maintainer_email": null,
    "keywords": null,
    "author": null,
    "author_email": null,
    "download_url": "https://files.pythonhosted.org/packages/bc/d5/3a1fa8b8d131c453226af21c5fd542e8c459e58597c6f5a3401fbe87d7d8/hrfh-0.1.22.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": null,
    "summary": "HTTP Response Fuzzy Hashing",
    "version": "0.1.22",
    "project_urls": null,
    "split_keywords": [],
    "urls": [
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "31197840c805fdd644dd5f818c2c9a21b714a77756ca5af41df003fed1b99cf8",
                "md5": "7aba3ee514db327b072e0649a6a31c8d",
                "sha256": "1aa65c187302ce179cf0c04e58544ac75494e5b8701c1e2992c4e4f58fe3dca7"
            },
            "downloads": -1,
            "filename": "hrfh-0.1.22-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "7aba3ee514db327b072e0649a6a31c8d",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.7",
            "size": 131876,
            "upload_time": "2025-08-12T13:25:43",
            "upload_time_iso_8601": "2025-08-12T13:25:43.700057Z",
            "url": "https://files.pythonhosted.org/packages/31/19/7840c805fdd644dd5f818c2c9a21b714a77756ca5af41df003fed1b99cf8/hrfh-0.1.22-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "bcd53a1fa8b8d131c453226af21c5fd542e8c459e58597c6f5a3401fbe87d7d8",
                "md5": "c2408081b4f2393df50a032c0e8ae81e",
                "sha256": "e63962348a69b6d67ed8205f60d848f932471c0d59bb8f2a33aaa03ac58fffee"
            },
            "downloads": -1,
            "filename": "hrfh-0.1.22.tar.gz",
            "has_sig": false,
            "md5_digest": "c2408081b4f2393df50a032c0e8ae81e",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.7",
            "size": 30534,
            "upload_time": "2025-08-12T13:25:45",
            "upload_time_iso_8601": "2025-08-12T13:25:45.078131Z",
            "url": "https://files.pythonhosted.org/packages/bc/d5/3a1fa8b8d131c453226af21c5fd542e8c459e58597c6f5a3401fbe87d7d8/hrfh-0.1.22.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2025-08-12 13:25:45",
    "github": false,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "lcname": "hrfh"
}
