# hibp-downloader
This is a CLI tool to efficiently download a local copy of the pwned password hash data from the very awesome
[HIBP](https://haveibeenpwned.com/Passwords) pwned passwords [api-endpoint](https://api.pwnedpasswords.com) using all the good bits:
multiprocessing, async-processes, local-caching, content-etags and http2-connection pooling, to make things
about as fast as is Pythonly possible.
## Features
- Interface to directly `query` for compromised password values from the *compressed* file data-store!
- Download and store acquired data as gzip-compressed files to save on storage and speed up queries.
- Download the full dataset in under 45 mins (generally CPU bound)
- Easily resume interrupted `download` operations into a `--data-path` without re-clobbering api-source.
- Only download hash-prefix content blocks when the source content has changed (via content ETag values), making it
easy to periodically sync-up when needed.
- Query interface performance is efficient enough to attach a user web-service with reasonable loads (i.e. don't waste
your own resources decompressing the dataset and storing it in a database!)
- Ability to generate a single text file with in-order pwned password hash values, similar to [PwnedPasswordsDownloader](https://github.com/HaveIBeenPwned/PwnedPasswordsDownloader) from
the awesome HIBP team.
- Per prefix file metadata in JSON format for easy data reuse by other tooling if required.
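
To illustrate the hash-prefix idea behind the features above, here is a minimal sketch (not the tool's actual implementation). It relies on two facts about the HIBP range API: content is keyed by the first 5 hex characters of the SHA-1 hash, and each line in a range response has the form `SUFFIX:COUNT`. The function names `split_sha1` and `lookup_count`, and the one-gzip-file-per-prefix layout, are assumptions made for this example; the on-disk layout that `hibp-downloader` really uses may differ.

```python
import gzip
import hashlib
import tempfile
from pathlib import Path
from typing import Optional, Tuple


def split_sha1(password: str) -> Tuple[str, str]:
    """Return (5-char prefix, 35-char suffix) of the uppercase SHA-1 hex digest."""
    digest = hashlib.sha1(password.encode("utf-8")).hexdigest().upper()
    return digest[:5], digest[5:]


def lookup_count(gz_path: Path, suffix: str) -> Optional[int]:
    """Scan a gzip'd file of `SUFFIX:COUNT` lines; return the count, or None if absent.

    Hypothetical helper: assumes one gzip file per hash-prefix.
    """
    with gzip.open(gz_path, "rt", encoding="ascii") as handle:
        for line in handle:
            line_suffix, _, count = line.strip().partition(":")
            if line_suffix == suffix:
                return int(count)
    return None


if __name__ == "__main__":
    prefix, suffix = split_sha1("password")
    with tempfile.TemporaryDirectory() as tmp:
        # Build a tiny stand-in prefix file; the count below is made up for illustration.
        sample = Path(tmp) / f"{prefix}.gz"
        with gzip.open(sample, "wt", encoding="ascii") as handle:
            handle.write(f"{suffix}:123\n")
        print(prefix, lookup_count(sample, suffix))
```

Because the file is read as a compressed stream and only one small prefix block is touched per lookup, nothing needs to be unpacked to disk or loaded into a database first.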
## Install
```commandline
pipx install hibp-downloader
```
## Usage (download)

## Performance
Sample download activity log; host with 32 cores on 500Mbit/s connection.
```text
...
2024-05-16T10:18:01-0400 | INFO | hibp-downloader | prefix=f80c7 source=[lc:13616 et:3 rc:1002358 ro:25 xx:1] processed=[17836.6MB ~414462H/s] api=[918req/s 17597.4MB] runtime=36.4min
2024-05-16T10:18:02-0400 | INFO | hibp-downloader | prefix=f81af source=[lc:13616 et:3 rc:1002558 ro:25 xx:1] processed=[17840.1MB ~414454H/s] api=[918req/s 17600.9MB] runtime=36.4min
2024-05-16T10:18:02-0400 | INFO | hibp-downloader | prefix=f826f source=[lc:13616 et:3 rc:1002758 ro:25 xx:1] processed=[17843.6MB ~414454H/s] api=[918req/s 17604.4MB] runtime=36.4min
2024-05-16T10:18:03-0400 | INFO | hibp-downloader | prefix=f833f source=[lc:13616 et:3 rc:1002958 ro:25 xx:1] processed=[17847.1MB ~414450H/s] api=[918req/s 17607.9MB] runtime=36.4min
```
- ~918 requests per second to `api.pwnedpasswords.com`
- Log sources are shorthand:
  - `lc`: 13616 from local-cache (lc) - request-responses handled locally without hitting the network.
  - `et`: 3 etag-matched (et) - request-responses that confirmed our local data was up-to-date and did not require a new download.
  - `rc`: 1002958 from remote-cache (rc) - request-responses that were downloaded to local, but came from the remote-server cache.
  - `ro`: 25 from remote-origin (ro) - request-responses that were downloaded to local, and the download needed to be fetched from remote origin source.
  - `xx`: 1 failed response - request-responses that failed (and successfully retried).
- ~17GB downloaded in ~36 minutes (full dataset)
- ~414k hash values received per second
- Processing in this example appears to be CPU bound, measured traffic around ~160 Mbit/s.
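
If you want to track these figures over time, the progress lines are regular enough to parse. The sketch below is based only on the sample log shown above; the pattern `LOG_PATTERN` and the function `parse_progress` are hypothetical names for this example, and the log format may differ between versions of the tool.

```python
import re
from typing import Dict

# Pattern derived from the sample progress lines above (an assumption, not a documented format).
LOG_PATTERN = re.compile(
    r"prefix=(?P<prefix>[0-9a-f]{5}) "
    r"source=\[(?P<source>[^\]]*)\] "
    r"processed=\[(?P<processed_mb>[\d.]+)MB ~(?P<hashes_per_sec>\d+)H/s\] "
    r"api=\[(?P<req_per_sec>\d+)req/s (?P<api_mb>[\d.]+)MB\] "
    r"runtime=(?P<runtime_min>[\d.]+)min"
)


def parse_progress(line: str) -> Dict[str, object]:
    """Extract the prefix, per-source counters and throughput figures from one log line."""
    match = LOG_PATTERN.search(line)
    if match is None:
        raise ValueError("not a progress line")
    # "lc:13616 et:3 ..." becomes {"lc": 13616, "et": 3, ...}
    sources = {
        key: int(value)
        for key, value in (item.split(":") for item in match["source"].split())
    }
    return {
        "prefix": match["prefix"],
        "sources": sources,
        "processed_mb": float(match["processed_mb"]),
        "hashes_per_sec": int(match["hashes_per_sec"]),
        "req_per_sec": int(match["req_per_sec"]),
        "api_mb": float(match["api_mb"]),
        "runtime_min": float(match["runtime_min"]),
    }


if __name__ == "__main__":
    sample = (
        "2024-05-16T10:18:03-0400 | INFO | hibp-downloader | prefix=f833f "
        "source=[lc:13616 et:3 rc:1002958 ro:25 xx:1] "
        "processed=[17847.1MB ~414450H/s] api=[918req/s 17607.9MB] runtime=36.4min"
    )
    parsed = parse_progress(sample)
    # Total request-responses accounted for so far, across all sources.
    print(parsed["prefix"], sum(parsed["sources"].values()))
```

Summing the per-source counters gives the number of request-responses handled so far, which can be compared against the 16^5 = 1,048,576 possible 5-character hash prefixes.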
## Usage (query)

## Project
- GitHub - [github.com/threatpatrols/hibp-downloader](https://github.com/threatpatrols/hibp-downloader)
- PyPI - [pypi.org/project/hibp-downloader/](https://pypi.org/project/hibp-downloader/)
- ReadTheDocs - [hibp-downloader.readthedocs.io](https://hibp-downloader.readthedocs.io)
## Copyright
- Copyright © 2023-2024 [Threat Patrols Pty Ltd](https://www.threatpatrols.com)
- Copyright © 2023-2024 [Nicholas de Jong](https://www.nicholasdejong.com)
All rights reserved.
## License
* BSD-3-Clause - see LICENSE file for details.