# hibp-downloader
[![pypi](https://img.shields.io/pypi/v/hibp-downloader.svg)](https://pypi.python.org/pypi/hibp-downloader/)
[![python](https://img.shields.io/pypi/pyversions/hibp-downloader.svg)](https://github.com/threatpatrols/hibp-downloader/)
[![build tests](https://github.com/threatpatrols/hibp-downloader/actions/workflows/build-tests.yml/badge.svg)](https://github.com/threatpatrols/hibp-downloader/actions/workflows/build-tests.yml)
[![docs](https://img.shields.io/readthedocs/hibp-downloader)](https://hibp-downloader.readthedocs.io)
[![license](https://img.shields.io/github/license/threatpatrols/hibp-downloader.svg)](https://github.com/threatpatrols/hibp-downloader)
This is a CLI tool to efficiently download a local copy of the pwned password hash data from the very awesome
[HIBP](https://haveibeenpwned.com/Passwords) pwned passwords [api-endpoint](https://api.pwnedpasswords.com), using all the good bits:
multiprocessing, async processes, local caching, content ETags and HTTP/2 connection pooling to make things as fast
as Pythonly possible.
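The range API is keyed by the first five hex characters of a hash, so a complete local copy means fetching one content block per prefix: 16^5 = 1,048,576 blocks in total. The sketch below enumerates that prefix space and shows the standard HTTP conditional-request pattern behind ETag-based re-syncs (send `If-None-Match` with the cached ETag; a `304 Not Modified` reply means the cached block is still current). It is a minimal illustration under those assumptions, not hibp-downloader's own code: the helper names are hypothetical and no network request is made.

```python
from typing import Dict, Iterator, Optional

API_BASE = "https://api.pwnedpasswords.com/range/"  # public HIBP range endpoint
PREFIX_SPACE = 16 ** 5  # every 5-hex-digit prefix, 00000 .. FFFFF


def iter_prefixes() -> Iterator[str]:
    """Yield all five-character hex prefixes in ascending order (hypothetical helper)."""
    for i in range(PREFIX_SPACE):
        # Uppercase hex; the tool's own log happens to print lowercase prefixes.
        yield f"{i:05X}"


def range_url(prefix: str) -> str:
    """Build the range URL for one prefix (default SHA-1 mode)."""
    return API_BASE + prefix


def conditional_headers(cached_etag: Optional[str]) -> Dict[str, str]:
    """Ask the server to skip the body if our cached copy is still current."""
    return {"If-None-Match": cached_etag} if cached_etag else {}


def is_unchanged(status_code: int) -> bool:
    """HTTP 304 Not Modified: reuse the locally cached block as-is."""
    return status_code == 304
```

In a real downloader these requests would be spread across worker processes and a pooled HTTP/2 client; that part is deliberately left out here.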
## Features
- Easily resume interrupted `download` operations into a `--data-path` without re-fetching content already pulled from the API source.
- Only download hash-prefix content blocks when the source content has changed (detected via content ETag values), which
makes periodic re-syncs easy.
- Ability to `query` for compromised password values directly against the downloaded data in place; efficient enough to
back a service under reasonable load.
- Ability to generate a single text file of pwned password hash values in hash order, similar to [PwnedPasswordsDownloader](https://github.com/HaveIBeenPwned/PwnedPasswordsDownloader) from the HIBP team.
- Per-prefix file metadata in JSON format for easy reuse by other tooling if required.
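Because the data is organised by hash prefix, checking one password only needs the single block for its prefix. A minimal sketch of that lookup, assuming a block's text is in the HIBP range-response format (one `SUFFIX:COUNT` line per hash) and has already been read into memory. How hibp-downloader lays blocks out under `--data-path` is not shown, and these helper names are hypothetical rather than the tool's `query` implementation.

```python
import hashlib
from typing import Dict, Tuple


def split_sha1(password: str) -> Tuple[str, str]:
    """Return (5-char prefix, 35-char suffix) of the uppercase SHA-1 hex digest."""
    digest = hashlib.sha1(password.encode("utf-8")).hexdigest().upper()
    return digest[:5], digest[5:]


def parse_range_block(text: str) -> Dict[str, int]:
    """Parse 'SUFFIX:COUNT' lines (HIBP range format) into {suffix: count}."""
    counts: Dict[str, int] = {}
    for line in text.splitlines():  # handles both \n and \r\n line endings
        suffix, sep, count = line.strip().partition(":")
        if not sep:
            continue  # skip blank or malformed lines
        counts[suffix.upper()] = int(count)
    return counts


def pwned_count(password: str, block_text: str) -> int:
    """Occurrences of the password; block_text must be the block for its prefix."""
    _, suffix = split_sha1(password)
    return parse_range_block(block_text).get(suffix, 0)


# Made-up sample block standing in for prefix 5BAA6 (SHA-1 of "password" starts 5BAA6).
sample_block = "1E4C9B93F3F0682250B6CF8331B7EE68FD8:42\n"
print(pwned_count("password", sample_block))  # 42
```

Parsing a whole block into a dict keeps the sketch short; a service answering many queries would more likely cache parsed blocks or search the sorted lines directly.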
## Install
```commandline
pip install --upgrade hibp-downloader
```
## Usage
![screenshot-help.png](https://raw.githubusercontent.com/threatpatrols/hibp-downloader/main/docs/content/assets/screenshot-help.png)
## Performance
Sample download activity log from a host with 12 cores on a 45 Mbit/s DSL connection.
```text
2023-11-12T21:25:08+1000 | INFO | hibp-downloader | prefix=00ec3 source=[lc:10 et:2 rc:3800 ro:0 xx:0] processed=[62.0MB ~43589H/s] api=[105req/s 60.0MB] runtime=1.2min
2023-11-12T21:25:09+1000 | INFO | hibp-downloader | prefix=00eff source=[lc:10 et:2 rc:3850 ro:0 xx:0] processed=[62.8MB ~43547H/s] api=[105req/s 60.8MB] runtime=1.2min
2023-11-12T21:25:10+1000 | INFO | hibp-downloader | prefix=00f3b source=[lc:10 et:2 rc:3900 ro:0 xx:0] processed=[63.7MB ~43528H/s] api=[105req/s 61.7MB] runtime=1.2min
2023-11-12T21:25:11+1000 | INFO | hibp-downloader | prefix=00f6d source=[lc:10 et:2 rc:3950 ro:0 xx:0] processed=[64.5MB ~43541H/s] api=[105req/s 62.5MB] runtime=1.3min
```
- 105 requests per second to `api.pwnedpasswords.com`
- Log sources are shorthand:
  - `lc`: 10 prefix files from the local cache
  - `et`: 2 ETag-match responses
  - `rc`: 3950 from the remote cache
  - `ro`: 0 from the remote origin
  - `xx`: 0 failed downloads
- 62MB downloaded in ~75 seconds
- Approximately 43k hash values per second
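The `source=[...]` counters are easy to pull out of a log for monitoring. A small sketch, assuming the field keeps exactly the layout shown in the sample lines above; `parse_sources` is a hypothetical helper, not part of hibp-downloader.

```python
import re
from typing import Dict

# Matches the "source=[lc:10 et:2 rc:3950 ro:0 xx:0]" field of a progress line.
SOURCE_FIELD = re.compile(r"source=\[([^\]]*)\]")


def parse_sources(log_line: str) -> Dict[str, int]:
    """Return the per-source counters from one progress log line ({} if absent)."""
    match = SOURCE_FIELD.search(log_line)
    if match is None:
        return {}
    counters: Dict[str, int] = {}
    for item in match.group(1).split():
        key, _, value = item.partition(":")
        counters[key] = int(value)
    return counters


line = ("2023-11-12T21:25:11+1000 | INFO | hibp-downloader | prefix=00f6d "
        "source=[lc:10 et:2 rc:3950 ro:0 xx:0] processed=[64.5MB ~43541H/s] "
        "api=[105req/s 62.5MB] runtime=1.3min")
print(parse_sources(line))  # {'lc': 10, 'et': 2, 'rc': 3950, 'ro': 0, 'xx': 0}
```

A non-zero `xx` value is the obvious thing to alert on, since it counts failed downloads.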
## Project
- GitHub - [github.com/threatpatrols/hibp-downloader](https://github.com/threatpatrols/hibp-downloader)
- PyPI - [pypi.org/project/hibp-downloader/](https://pypi.org/project/hibp-downloader/)
- ReadTheDocs - [hibp-downloader.readthedocs.io](https://hibp-downloader.readthedocs.io)
## Copyright
- Copyright © 2023 [Threat Patrols Pty Ltd](https://www.threatpatrols.com)
- Copyright © 2023 [Nicholas de Jong](https://www.nicholasdejong.com)
All rights reserved.
## License
* BSD-3-Clause - see LICENSE file for details.
Raw data
{
"_id": null,
"home_page": "",
"name": "hibp-downloader",
"maintainer": "",
"docs_url": null,
"requires_python": ">=3.8,<4.0",
"maintainer_email": "",
"keywords": "hibp-downloader,hibp,haveibeenpwned,haveibeenpwned-downloader,sha1,ntlm",
"author": "Nicholas de Jong",
"author_email": "contact@threatpatrols.com",
"download_url": "https://files.pythonhosted.org/packages/0e/b9/f18a66f51a8184abd788f6e1ce3bda629de6a4f846145ae76d6cceb7b222/hibp_downloader-0.3.1.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": "BSD-3-Clause",
"summary": "Efficiently download HIBP new pwned password data by hash-prefix for a local-copy",
"version": "0.3.1",
"project_urls": {
"Bug Tracker": "https://github.com/threatpatrols/hibp-downloader/issues",
"Documentation": "https://hibp-downloader.readthedocs.io/en/latest/",
"Homepage": "https://github.com/threatpatrols/hibp-downloader",
"Repository": "https://github.com/threatpatrols/hibp-downloader"
},
"split_keywords": [
"hibp-downloader",
"hibp",
"haveibeenpwned",
"haveibeenpwned-downloader",
"sha1",
"ntlm"
],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "b349570b9fe497295aa6401e9b88f03c8692d1670aaeb568ca1e7a67f46ee54b",
"md5": "7414e82e6d9c91248777c37af5290ce9",
"sha256": "025d961f6957e1cb859178e553d2568890136913e1d67000d36f79a0ca9a3a29"
},
"downloads": -1,
"filename": "hibp_downloader-0.3.1-py3-none-any.whl",
"has_sig": false,
"md5_digest": "7414e82e6d9c91248777c37af5290ce9",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.8,<4.0",
"size": 25859,
"upload_time": "2024-02-09T01:48:55",
"upload_time_iso_8601": "2024-02-09T01:48:55.327491Z",
"url": "https://files.pythonhosted.org/packages/b3/49/570b9fe497295aa6401e9b88f03c8692d1670aaeb568ca1e7a67f46ee54b/hibp_downloader-0.3.1-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "0eb9f18a66f51a8184abd788f6e1ce3bda629de6a4f846145ae76d6cceb7b222",
"md5": "a9af145eb8ddd7e098cf5c7ae408b5e1",
"sha256": "54a0119672bcf9d86a6e2a531c34c89a300532c52b1167ae9f6ecc67d8f95b1e"
},
"downloads": -1,
"filename": "hibp_downloader-0.3.1.tar.gz",
"has_sig": false,
"md5_digest": "a9af145eb8ddd7e098cf5c7ae408b5e1",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.8,<4.0",
"size": 20499,
"upload_time": "2024-02-09T01:48:57",
"upload_time_iso_8601": "2024-02-09T01:48:57.600946Z",
"url": "https://files.pythonhosted.org/packages/0e/b9/f18a66f51a8184abd788f6e1ce3bda629de6a4f846145ae76d6cceb7b222/hibp_downloader-0.3.1.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2024-02-09 01:48:57",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "threatpatrols",
"github_project": "hibp-downloader",
"travis_ci": false,
"coveralls": false,
"github_actions": true,
"lcname": "hibp-downloader"
}