hibp-downloader


Name: hibp-downloader
Version: 0.3.2
Summary: Efficiently download HIBP new pwned password data by hash-prefix for a local-copy
Upload time: 2025-01-26 01:48:11
Author: Nicholas de Jong
Requires Python: <4.0,>=3.8
License: BSD-3-Clause
Keywords: hibp-downloader, hibp, haveibeenpwned, haveibeenpwned-downloader, sha1, ntlm

# hibp-downloader

[![pypi](https://img.shields.io/pypi/v/hibp-downloader.svg)](https://pypi.python.org/pypi/hibp-downloader/)
[![python](https://img.shields.io/pypi/pyversions/hibp-downloader.svg)](https://github.com/threatpatrols/hibp-downloader/)
[![build tests](https://github.com/threatpatrols/hibp-downloader/actions/workflows/build-tests.yml/badge.svg)](https://github.com/threatpatrols/hibp-downloader/actions/workflows/build-tests.yml)
[![docs](https://img.shields.io/readthedocs/hibp-downloader)](https://hibp-downloader.readthedocs.io)
[![license](https://img.shields.io/github/license/threatpatrols/hibp-downloader.svg)](https://github.com/threatpatrols/hibp-downloader)

This is a CLI tool to efficiently download a local copy of the pwned password hash data from the very awesome
[HIBP](https://haveibeenpwned.com/Passwords) pwned passwords [api-endpoint](https://api.pwnedpasswords.com) using all the good bits:
multiprocessing, async-processes, local-caching, content-etags and http2-connection pooling, to make things
probably as fast as is Pythonly possible.

## Features
 - Interface to directly `query` for compromised password values from the *compressed* file data-store!
 - Download and store acquired data gzip-compressed to save on storage and speed up queries.
 - Download the full dataset in under 45 mins (generally CPU bound)
 - Easily resume interrupted `download` operations into a `--data-path` without re-clobbering api-source.
 - Only download hash-prefix content blocks when the source content has changed (via content ETag values), making it
   easy to periodically sync-up when needed.
 - Query interface performance is efficient enough to back a user-facing web-service under reasonable load (i.e. don't waste
   your own resources decompressing the dataset and storing it in a database!)
 - Ability to generate a single text file with in-order pwned password hash values, similar to [PwnedPasswordsDownloader](https://github.com/HaveIBeenPwned/PwnedPasswordsDownloader) from 
   the awesome HIBP team.
 - Per-prefix file metadata in JSON format for easy data reuse by other tooling if required.
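
The hash-prefix and ETag features above map onto how the HIBP range API is addressed: a SHA-1 hash is split into a 5-character prefix (sent to the API) and a 35-character suffix (matched locally), and a previously seen ETag can be sent back in an `If-None-Match` header so an unchanged block is not downloaded again. The following is a minimal sketch of that idea using only the standard library; the function names `split_sha1` and `build_range_request` are hypothetical and are not part of hibp-downloader, and no network request is made.

```python
import hashlib
from typing import Dict, Optional, Tuple

# Public HIBP range endpoint; the 5-character hash prefix goes in the path.
RANGE_URL = "https://api.pwnedpasswords.com/range/{prefix}"


def split_sha1(password: str) -> Tuple[str, str]:
    """Return (prefix, suffix) of the upper-case SHA-1 hex digest."""
    digest = hashlib.sha1(password.encode("utf-8")).hexdigest().upper()
    return digest[:5], digest[5:]


def build_range_request(prefix: str, etag: Optional[str] = None) -> Tuple[str, Dict[str, str]]:
    """Build the URL and headers for one prefix block (hypothetical helper).

    When an ETag from an earlier download is supplied, it is sent as
    If-None-Match so the server can answer 304 Not Modified.
    """
    headers: Dict[str, str] = {}
    if etag:
        headers["If-None-Match"] = etag
    return RANGE_URL.format(prefix=prefix), headers


if __name__ == "__main__":
    prefix, suffix = split_sha1("password")
    # The ETag value here is a made-up placeholder for illustration.
    url, headers = build_range_request(prefix, etag='"example-etag"')
    print(prefix, suffix)
    print(url, headers)
```

Only the prefix ever leaves the machine; whether a password is compromised is decided by looking for the suffix in the returned (or locally stored) block.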

## Install
```commandline
pipx install hibp-downloader
```

## Usage (download)
![screenshot-help.png](https://raw.githubusercontent.com/threatpatrols/hibp-downloader/main/docs/content/assets/screenshot-help.png)

## Performance
Sample download activity log from a host with 32 cores on a 500 Mbit/s connection.
```text
...
2024-05-16T10:18:01-0400 | INFO | hibp-downloader | prefix=f80c7 source=[lc:13616 et:3 rc:1002358 ro:25 xx:1] processed=[17836.6MB ~414462H/s] api=[918req/s 17597.4MB] runtime=36.4min
2024-05-16T10:18:02-0400 | INFO | hibp-downloader | prefix=f81af source=[lc:13616 et:3 rc:1002558 ro:25 xx:1] processed=[17840.1MB ~414454H/s] api=[918req/s 17600.9MB] runtime=36.4min
2024-05-16T10:18:02-0400 | INFO | hibp-downloader | prefix=f826f source=[lc:13616 et:3 rc:1002758 ro:25 xx:1] processed=[17843.6MB ~414454H/s] api=[918req/s 17604.4MB] runtime=36.4min
2024-05-16T10:18:03-0400 | INFO | hibp-downloader | prefix=f833f source=[lc:13616 et:3 rc:1002958 ro:25 xx:1] processed=[17847.1MB ~414450H/s] api=[918req/s 17607.9MB] runtime=36.4min
```

 - 918 requests per second to `api.pwnedpasswords.com`
 - Log sources are shorthand:
     - `lc`: 13616 from local-cache (lc) - request-responses handled locally without hitting the network. 
     - `et`: 3 etag-matched (et) - request-responses that confirmed our local data was up-to-date and did not require a new download.
     - `rc`: 1002958 from remote-cache (rc) - request-responses that were downloaded to local, but came from the remote-server cache.
     - `ro`: 25 from remote-origin (ro) - request-responses that were downloaded to local, and the download needed to be fetched from remote origin source.
     - `xx`: 1 failed response - request-responses that failed (and were successfully retried).
 - ~17GB downloaded in ~36 minutes (full dataset)
 - ~414k hash values received per second
 - Processing in this example appears to be CPU bound; measured traffic was around 160 Mbit/s.
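
For anyone who wants to track a long-running download from its log, the line format shown above is regular enough to parse. The sketch below is an illustration built only from the sample lines in this section, not an official parser; the regular expression, the `parse_log_line` name, and the progress estimate (which assumes prefixes are processed in ascending order across all 16^5 possible 5-character prefixes) are all assumptions.

```python
import re

# Last sample line from the log excerpt above.
LINE = (
    "2024-05-16T10:18:03-0400 | INFO | hibp-downloader | prefix=f833f "
    "source=[lc:13616 et:3 rc:1002958 ro:25 xx:1] "
    "processed=[17847.1MB ~414450H/s] api=[918req/s 17607.9MB] runtime=36.4min"
)

PATTERN = re.compile(
    r"prefix=(?P<prefix>[0-9a-f]{5}) "
    r"source=\[(?P<source>[^\]]*)\] "
    r"processed=\[(?P<processed_mb>[\d.]+)MB ~(?P<hashes_per_s>\d+)H/s\] "
    r"api=\[(?P<req_per_s>\d+)req/s (?P<api_mb>[\d.]+)MB\] "
    r"runtime=(?P<runtime_min>[\d.]+)min"
)

TOTAL_PREFIXES = 16 ** 5  # every possible 5-hex-character prefix


def parse_log_line(line: str) -> dict:
    """Parse one activity log line into typed fields (hypothetical helper)."""
    match = PATTERN.search(line)
    if match is None:
        raise ValueError("line does not look like a download activity log line")
    # "lc:13616 et:3 ..." -> {"lc": 13616, "et": 3, ...}
    sources = {
        key: int(value)
        for key, value in (item.split(":") for item in match["source"].split())
    }
    return {
        "prefix": match["prefix"],
        "sources": sources,
        "processed_mb": float(match["processed_mb"]),
        "hashes_per_s": int(match["hashes_per_s"]),
        "req_per_s": int(match["req_per_s"]),
        "api_mb": float(match["api_mb"]),
        "runtime_min": float(match["runtime_min"]),
        # Rough progress, assuming prefixes are walked in ascending order.
        "progress": (int(match["prefix"], 16) + 1) / TOTAL_PREFIXES,
    }


if __name__ == "__main__":
    record = parse_log_line(LINE)
    print(record["sources"], sum(record["sources"].values()))
    print(f"approximate progress: {record['progress']:.1%}")
```

The sum of the source counters stays close to the numeric value of the current prefix, which is a quick sanity check that every prefix block has been accounted for by one of the five sources.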

## Usage (query)
![screenshot-query-help.png](https://raw.githubusercontent.com/threatpatrols/hibp-downloader/main/docs/content/assets/screenshot-query-help.png)
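
The screenshot shows the actual `query` options. As a rough illustration of why querying the compressed data directly is cheap, the sketch below looks up one hash suffix inside a single gzip-compressed prefix block, streaming it line by line instead of unpacking the whole dataset. The storage layout is an assumption made for this example (one gzip blob per prefix holding `SUFFIX:COUNT` lines, as the HIBP range API returns them) and is not necessarily how hibp-downloader lays out its `--data-path`; the helper names and the counts are invented for illustration.

```python
import gzip
import hashlib
import io
from typing import Iterable


def build_prefix_blob(lines: Iterable[str]) -> bytes:
    """Compress SUFFIX:COUNT lines into one gzip blob (stand-in for a stored file)."""
    return gzip.compress("\n".join(lines).encode("ascii"))


def lookup_count(blob: bytes, suffix: str) -> int:
    """Return the breach count for a hash suffix, or 0 if it is not present."""
    wanted = suffix.upper()
    # Decompress as a stream so only this one small block is ever expanded.
    with gzip.open(io.BytesIO(blob), "rt", encoding="ascii") as handle:
        for line in handle:
            candidate, _, count = line.strip().partition(":")
            if candidate == wanted:
                return int(count)
    return 0


if __name__ == "__main__":
    digest = hashlib.sha1(b"password").hexdigest().upper()
    prefix, suffix = digest[:5], digest[5:]
    # Counts below are made up for the example, not real HIBP data.
    blob = build_prefix_blob([
        "0018A45C4D1DEF81644B54AB7F969B88D65:3",
        f"{suffix}:42",
    ])
    print(prefix, lookup_count(blob, suffix))
```

Because a lookup touches only the block for one 5-character prefix, the cost per query stays small no matter how large the full dataset is.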

## Project

 - GitHub - [github.com/threatpatrols/hibp-downloader](https://github.com/threatpatrols/hibp-downloader)
 - PyPI - [pypi.org/project/hibp-downloader/](https://pypi.org/project/hibp-downloader/)
 - ReadTheDocs - [hibp-downloader.readthedocs.io](https://hibp-downloader.readthedocs.io)

## Copyright
 - Copyright &copy; 2023-2024 [Threat Patrols Pty Ltd](https://www.threatpatrols.com)
 - Copyright &copy; 2023-2024 [Nicholas de Jong](https://www.nicholasdejong.com)

All rights reserved.

## License
 * BSD-3-Clause - see LICENSE file for details.


            

Raw data

{
    "_id": null,
    "home_page": null,
    "name": "hibp-downloader",
    "maintainer": null,
    "docs_url": null,
    "requires_python": "<4.0,>=3.8",
    "maintainer_email": null,
    "keywords": "hibp-downloader, hibp, haveibeenpwned, haveibeenpwned-downloader, sha1, ntlm",
    "author": "Nicholas de Jong",
    "author_email": "contact@threatpatrols.com",
    "download_url": "https://files.pythonhosted.org/packages/b1/b7/823b5db215c0892f0bfac5080303ea9078d22dbed95a6ead116a34575169/hibp_downloader-0.3.2.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "BSD-3-Clause",
    "summary": "Efficiently download HIBP new pwned password data by hash-prefix for a local-copy",
    "version": "0.3.2",
    "project_urls": {
        "Bug Tracker": "https://github.com/threatpatrols/hibp-downloader/issues",
        "Documentation": "https://hibp-downloader.readthedocs.io/en/latest/",
        "Homepage": "https://github.com/threatpatrols/hibp-downloader",
        "Repository": "https://github.com/threatpatrols/hibp-downloader"
    },
    "split_keywords": [
        "hibp-downloader",
        " hibp",
        " haveibeenpwned",
        " haveibeenpwned-downloader",
        " sha1",
        " ntlm"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "0c97383d94aa70c61026046f80a50b468a498e33be9b6d90311cbe0ee07726a5",
                "md5": "e093e6547b3d3fafff11f54a465f0d86",
                "sha256": "c2a5308eccdae351e33c5d99d0ef5652c6fa27230a84e77dd9987956d315d17f"
            },
            "downloads": -1,
            "filename": "hibp_downloader-0.3.2-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "e093e6547b3d3fafff11f54a465f0d86",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": "<4.0,>=3.8",
            "size": 26453,
            "upload_time": "2025-01-26T01:48:08",
            "upload_time_iso_8601": "2025-01-26T01:48:08.189915Z",
            "url": "https://files.pythonhosted.org/packages/0c/97/383d94aa70c61026046f80a50b468a498e33be9b6d90311cbe0ee07726a5/hibp_downloader-0.3.2-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "b1b7823b5db215c0892f0bfac5080303ea9078d22dbed95a6ead116a34575169",
                "md5": "8a789804f2e94f92db7418add548daca",
                "sha256": "7eaed086ec3b50af31e295850bb56e470e82676c52cb0de2f8774568e72c6023"
            },
            "downloads": -1,
            "filename": "hibp_downloader-0.3.2.tar.gz",
            "has_sig": false,
            "md5_digest": "8a789804f2e94f92db7418add548daca",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": "<4.0,>=3.8",
            "size": 20600,
            "upload_time": "2025-01-26T01:48:11",
            "upload_time_iso_8601": "2025-01-26T01:48:11.374096Z",
            "url": "https://files.pythonhosted.org/packages/b1/b7/823b5db215c0892f0bfac5080303ea9078d22dbed95a6ead116a34575169/hibp_downloader-0.3.2.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2025-01-26 01:48:11",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "threatpatrols",
    "github_project": "hibp-downloader",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": true,
    "lcname": "hibp-downloader"
}
        