hhhash

- Name: hhhash
- Version: 0.4
- Home page: https://github.com/adulau/HHHash
- Summary: HHHash library to calculate HHHash from HTTP servers.
- Upload time: 2023-08-22 12:34:06
- Author: Alexandre Dulaunoy
- Requires Python: >=3.6,<4.0
- Requirements: No requirements were recorded.

# HTTP Headers Hashing (HHHash)

HTTP Headers Hashing (HHHash) is a technique used to create a fingerprint of an HTTP server based on the headers it returns. HHHash employs one-way hashing to generate a hash value for the set of header keys returned by the server.

For more details about the background of HHHash, see [HTTP Headers Hashing (HHHash) or improving correlation of crawled content](https://www.foo.be/2023/07/HTTP-Headers-Hashing_HHHash).

## Calculation of the HHHash

To calculate the HHHash, we concatenate the names of the headers returned by the HTTP server, in the order in which they appear in the server's response. The header names are separated with `:`.

The HHHash value is the SHA-256 hash of the resulting string.
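
The calculation can be sketched in a few lines of Python. This is a minimal illustration, not the library's implementation: the function name `hhhash_from_names` and the sample header names are assumptions, and the names are joined without a trailing separator, as the curl example in this document does.

~~~python
import hashlib


def hhhash_from_names(header_names):
    """Return an HHHash string for header names given in response order.

    Hypothetical helper, written for illustration only.
    """
    # Join the names with ":" in the order the server sent them.
    joined = ":".join(header_names)
    # Hash the joined string and prepend the "hhh" prefix and version 1.
    digest = hashlib.sha256(joined.encode("utf-8")).hexdigest()
    return "hhh:1:" + digest


# Sample header names (made up for this example).
print(hhhash_from_names(["Date", "Server", "Content-Type"]))
~~~

Because the order of the names is part of the hashed string, two servers that return the same headers in a different order produce different values.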

## HHHash format

`hhh`:`1`:`20247663b5c63bf1291fe5350010dafb6d5e845e4c0daaf7dc9c0f646e947c29`

`prefix`:`version`:`SHA-256 value`
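
A consumer of such values may want to split them back into their three fields. The sketch below is one possible way to do so; the names `ParsedHHHash` and `parse_hhhash` are hypothetical, and the pattern assumes a numeric version and a lowercase hexadecimal digest of 64 characters, as in the example value.

~~~python
import re
from typing import NamedTuple


class ParsedHHHash(NamedTuple):
    prefix: str
    version: int
    digest: str


# "hhh", a numeric version, and 64 lowercase hex characters (assumed layout).
_HHHASH_RE = re.compile(r"(hhh):(\d+):([0-9a-f]{64})")


def parse_hhhash(value):
    """Split an HHHash string into prefix, version and digest (illustrative)."""
    match = _HHHASH_RE.fullmatch(value)
    if match is None:
        raise ValueError("not a valid HHHash value: " + repr(value))
    prefix, version, digest = match.groups()
    return ParsedHHHash(prefix, int(version), digest)


parsed = parse_hhhash(
    "hhh:1:20247663b5c63bf1291fe5350010dafb6d5e845e4c0daaf7dc9c0f646e947c29"
)
print(parsed.prefix, parsed.version, parsed.digest)
~~~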

## Example

### Calculating HHHash from a curl command

For HTTPS URLs, curl attempts to use HTTP/2 by default when it is built with HTTP/2 support. To get the same hash as the Python requests module (which doesn't support HTTP/2), you need to force the version with the `--http1.1` switch.

~~~bash
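# Fetch the headers over HTTP/1.1, keep the header names, join them with ":" and hash.
curl --http1.1 -s -D - https://www.circl.lu/ -o /dev/null |
  awk 'NR != 1' |              # skip the status line
  cut -f1 -d: |                # keep the header names
  sed '/^[[:space:]]*$/d' |    # drop blank lines
  sed -z 's/\n/:/g' |          # join with ":" (GNU sed)
  sed 's/.$//' |               # remove the trailing ":"
  sha256sum | cut -f1 -d " " | awk '{print "hhh:1:"$1}'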
~~~

Output value:
~~~
hhh:1:78f7ef0651bac1a5ea42ed9d22242ed8725f07815091032a34ab4e30d3c3cefc
~~~
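
The same steps as the shell pipeline can be applied in Python to a saved header dump, such as the output of `curl -D -`. The sketch below is an approximation written for illustration: `hhhash_from_raw_headers` is a hypothetical name, the sample response is made up, and the parsing only mirrors what the pipeline does (skip the status line, drop blank lines, keep the text before the first `:`).

~~~python
import hashlib


def hhhash_from_raw_headers(raw):
    """Compute an HHHash from a raw HTTP/1.1 response header block (illustrative)."""
    # Skip the status line, like `awk 'NR != 1'`.
    lines = raw.splitlines()[1:]
    # Keep the text before the first ":" and ignore blank lines.
    names = [line.split(":", 1)[0] for line in lines if line.strip()]
    digest = hashlib.sha256(":".join(names).encode("utf-8")).hexdigest()
    return "hhh:1:" + digest


# Made-up response headers, used only to exercise the function.
sample = (
    "HTTP/1.1 200 OK\r\n"
    "Date: Tue, 22 Aug 2023 12:00:00 GMT\r\n"
    "Server: Apache\r\n"
    "Content-Type: text/html\r\n"
    "\r\n"
)
print(hhhash_from_raw_headers(sample))
~~~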

## Limitations

HHHash is an effective technique; however, its results depend heavily on the characteristics of the HTTP client's requests. Correlations between a set of hashes are therefore typically only meaningful when the hashes were collected with the same crawler or the same HTTP client parameters.

HTTP/2 requires [header field names to be lowercase](https://www.rfc-editor.org/rfc/rfc7540#section-8.1.2). This changes the hash, so you need to be aware of the HTTP version you are using.
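
The effect of lowercase names can be seen with a small experiment. The snippet is only a sketch: `hash_names` is a hypothetical helper and the header names are made up. A real server may also return a different set of headers over each protocol version, which this example does not model.

~~~python
import hashlib


def hash_names(names):
    """SHA-256 of the header names joined with ":" (illustrative helper)."""
    return hashlib.sha256(":".join(names).encode("utf-8")).hexdigest()


# Names as an HTTP/1.1 server might send them (made up for this example).
http11_names = ["Date", "Content-Type", "Server"]
# The same names as they would appear over HTTP/2, where they are lowercase.
http2_names = [name.lower() for name in http11_names]

print(hash_names(http11_names))
print(hash_names(http2_names))
print(hash_names(http11_names) == hash_names(http2_names))  # prints False
~~~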

## hhhash - Python Library

The [hhhash package](https://pypi.org/project/hhhash/) can be installed with `pip install hhhash`, or built from this repository with Poetry using `poetry build` and `poetry install`.

### Usage

~~~ipython
In [1]: import hhhash

In [2]: hhhash.buildhash(url="https://www.misp-lea.org", debug=False)
Out[2]: 'hhh:1:adca8a87f2a537dbbf07ba6d8cba6db53fde257ae2da4dad6f3ee6b47080c53f'

In [3]: hhhash.buildhash(url="https://www.misp-project.org", debug=False)
Out[3]: 'hhh:1:adca8a87f2a537dbbf07ba6d8cba6db53fde257ae2da4dad6f3ee6b47080c53f'

In [4]: hhhash.buildhash(url="https://www.circl.lu", debug=False)
Out[4]: 'hhh:1:334d8ab68f9e935f3af7c4a91220612f980f2d9168324530c03d28c9429e1299'

~~~
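
In the session above, two of the three sites return the same HHHash. Grouping URLs by their hash makes such correlations easy to spot. The following is a minimal sketch: `group_by_hhhash` is a hypothetical helper, and the hashes are copied from the session rather than fetched, so the example runs without network access. With the library installed, the `hasher` argument could instead be `lambda url: hhhash.buildhash(url=url, debug=False)`.

~~~python
from collections import defaultdict


def group_by_hhhash(urls, hasher):
    """Group URLs by the HHHash that `hasher` returns for each of them."""
    groups = defaultdict(list)
    for url in urls:
        groups[hasher(url)].append(url)
    return dict(groups)


# Hashes recorded in the session above, used as a stand-in for live requests.
recorded = {
    "https://www.misp-lea.org": "hhh:1:adca8a87f2a537dbbf07ba6d8cba6db53fde257ae2da4dad6f3ee6b47080c53f",
    "https://www.misp-project.org": "hhh:1:adca8a87f2a537dbbf07ba6d8cba6db53fde257ae2da4dad6f3ee6b47080c53f",
    "https://www.circl.lu": "hhh:1:334d8ab68f9e935f3af7c4a91220612f980f2d9168324530c03d28c9429e1299",
}

groups = group_by_hhhash(recorded, recorded.get)
for value, members in groups.items():
    print(value, members)
~~~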

## Other libraries

- [c-hhhash](https://github.com/hrbrmstr/c-hhhash) - C++ HTTP Headers Hashing CLI
- [go-hhhash](https://github.com/hrbrmstr/go-hhhash) - golang HTTP Headers Hashing CLI
- [R hhhash](https://github.com/hrbrmstr/hhhash) - R library HHHash

            
