pywaybackup

Name	pywaybackup
Version	4.0.0
home_page	None
Summary	Query and download archive.org as simple as possible.
upload_time	2025-09-01 18:21:22
maintainer	None
docs_url	None
author	None
requires_python	>=3.8
license	MIT License Copyright (c) 2023 bitdruid Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

# python wayback machine downloader

[![PyPI](https://img.shields.io/pypi/v/pywaybackup)](https://pypi.org/project/pywaybackup/)
[![PyPI - Downloads](https://img.shields.io/pypi/dm/pywaybackup)](https://pypi.org/project/pywaybackup/)
![Python Version](https://img.shields.io/badge/Python-3.8-blue)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)

Downloading archived web pages from the [Wayback Machine](https://archive.org/web/).

The Internet Archive is a useful source of OSINT information. This tool is a work in progress for querying and fetching archived web pages.

This tool allows you to download content from the Wayback Machine (archive.org). You can use it to download either the latest version or all versions of web page snapshots within a specified range.

# Content

➡️ [Installation](#installation) <br>
➡️ [notes / issues / hints](#notes--issues--hints) <br>
➡️ [import](#import) <br>
➡️ [cli](#cli) <br>
➡️ [Usage](#usage) <br>
➡️ [Examples](#examples) <br>
➡️ [Output](#output) <br>
➡️ [Contributing](#contributing) <br>

## Installation

### Pip

1. Install the package <br>
   `pip install pywaybackup`
2. Run the tool <br>
   `waybackup -h`

### Manual

1. Clone the repository <br>
   `git clone https://github.com/bitdruid/python-wayback-machine-downloader.git`
2. Install <br>
   `pip install .`
   - in a virtual env or use `--break-system-packages`

## notes / issues / hints

- Linux recommended: On Windows machines, the path length is limited. Files that exceed the path length will not be downloaded.
- The tool uses a sqlite database to handle snapshots. The database will only persist while the download is running.
- If you query an explicit file (e.g. a query-string `?query=this` or `login.html`), the `--explicit`-argument is recommended as a wildcard query may lead to an empty result.
- Downloading directly into a network share is not recommended. The sqlite locking mechanism may cause issues. If you need to download into a network share, set the `--metadata` argument to a local path.
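The Windows path-length note above can be checked before a download starts. The following is a minimal sketch, assuming the classic Windows `MAX_PATH` limit of 260 characters; the helper name `exceeds_windows_limit` is hypothetical and not part of pywaybackup.

```python
from pathlib import PureWindowsPath

# Assumption: the classic MAX_PATH limit, which includes the terminating NUL.
WINDOWS_MAX_PATH = 260


def exceeds_windows_limit(output_dir: str, relative_file: str) -> bool:
    """Return True if the combined path would be too long for classic Windows APIs."""
    full = PureWindowsPath(output_dir) / relative_file
    # MAX_PATH counts the trailing NUL, so at most 259 characters are usable.
    return len(str(full)) >= WINDOWS_MAX_PATH


print(exceeds_windows_limit("C:\\out", "example.com/index.html"))
print(exceeds_windows_limit("C:\\out", "a/" * 150 + "x.html"))
```

`PureWindowsPath` only manipulates strings, so the check also runs on Linux.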

<br>
<br>

## import

You can import pywaybackup into your own scripts and run it. The arguments are the same as for the cli.

Additional args:
- `silent` (default False): If True, suppresses all output to the console.
- `debug` (default True): If False, disables writing errors to the error log file.

Use:
- `run()`
- `status()`
- `paths()`
- `stop()`

```python
from pywaybackup import PyWayBackup

backup = PyWayBackup(
  url="https://example.com",
  all=True,
  start="20200101",
  end="20201231",
  silent=False,
  debug=True,
  log=True,
  keep=True
)

backup.run()
backup_paths = backup.paths(rel=True)
print(backup_paths)
```
output:
```bash
{
  'snapshots': 'output/example.com',
  'cdxfile': 'output/waybackup_example.cdx',
  'dbfile': 'output/waybackup_example.com.db',
  'csvfile': 'output/waybackup_https.example.com.csv',
  'log': 'output/waybackup_example.com.log',
  'debug': 'output/waybackup_error.log'
}
```

... or run it asynchronously and print the current status or stop it whenever needed.

```python
import time
from pywaybackup import PyWayBackup

backup = PyWayBackup( ... )
backup.run(daemon=True)
print(backup.status())
time.sleep(10)
print(backup.status())
backup.stop()
```
output:
```bash
{
  'task': 'downloading snapshots',
  'current': 15,
  'total': 84,
  'progress': '18%'
}
```
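When polling, the status dictionary can be condensed into a single line. This is a small sketch that assumes `status()` returns a dictionary shaped like the sample output above; `format_status` is a hypothetical helper, not part of pywaybackup.

```python
def format_status(status: dict) -> str:
    """Render a status dictionary (keys as in the sample output) as one line."""
    return (
        f"{status['task']}: "
        f"{status['current']}/{status['total']} ({status['progress']})"
    )


# Sample copied from the output shown above.
sample = {
    "task": "downloading snapshots",
    "current": 15,
    "total": 84,
    "progress": "18%",
}
print(format_status(sample))
```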

## cli

- `-h`, `--help`: Show the help message and exit.
- `-v`, `--version`: Show information about the tool and exit.

#### Required

- **`-u`**, **`--url`**:<br>
  The URL of the web page to download. This argument is required.

#### Mode Selection (Choose One)

- **`-a`**, **`--all`**:<br>
  Download snapshots of all timestamps. You will get a folder per timestamp with the files available at that time.
- **`-l`**, **`--last`**:<br>
  Download the last version of each file snapshot. You will get one directory with a rebuild of the page. It contains the last version of each file of your specified `--range`.
- **`-f`**, **`--first`**:<br>
  Download the first version of each file snapshot. You will get one directory with a rebuild of the page. It contains the first version of each file of your specified `--range`.
- **`-s`**, **`--save`**:<br>
  Save a page to the Wayback Machine. (beta)

#### Optional query parameters

- **`-e`**, **`--explicit`**:<br>
  Only download the explicitly given URL. No wildcard subdomains or paths. Use e.g. to get root-only snapshots. This is recommended for explicit files like `login.html` or `?query=this`.

- **`--limit`** `<count>`:<br>
  Limits the number of snapshots to query from the CDX server. If an existing CDX file is injected, the limit will have no effect. So you would need to set `--keep`.

- **Range Selection:**<br>
  Specify the range in years, or a specific timestamp for the start, the end, or both. If you specify `range`, `start` and `end` are ignored. Timestamp format: YYYYMMDDhhmmss. You can give just a year, or increase specificity by extending the timestamp from the left.<br>
  (year 2019, year+month+day 20190101, year+month+day+hour 2019010112)

  - **`-r`**, **`--range`**:<br>
    Specify the range in years for which to search and download snapshots.
  - **`--start`**:<br>
    Timestamp to start searching.
  - **`--end`**:<br>
    Timestamp to end searching.

- **Filtering:**<br>
  A filter will result in a filtered cdx-file. So if you want to download all files later, you need to query again without the filter.

  - **`--filetype`** `<filetype>`:<br>
    Specify filetypes to download. Default is all filetypes. Separate multiple filetypes with a comma. Example: `--filetype jpg,css,js`. Filetypes are filtered as they appear in the snapshot. So if there is no explicit `html` file in the path (common practice), you can't filter for it.

  - **`--statuscode`** `<statuscode>`:<br>
    Specify HTTP status codes to download. Default is all status codes. Separate multiple status codes with a comma. Example: `--statuscode 200,301`. Pywaybackup will try to download any snapshot regardless of its status code. For 404 this of course means logged errors and corresponding entries in the csv. However, you may want a csv that includes these failed attempts.<br>
    Common status codes you may want to handle/filter:
      - `200` (OK)
      - `301` (Moved Permanently - will redirect snapshot)
      - `404` (Not Found - snapshot seems to be empty)
      - `500` (Internal Server Error - snapshot is at least for now not available)
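To illustrate why `--filetype html` can miss pages, here is a minimal sketch of extension-based filtering. It is not pywaybackup's implementation: the helpers `url_filetype` and `filter_by_filetype` are hypothetical, and the sketch assumes the filetype is simply the extension of the URL path.

```python
import posixpath
from urllib.parse import urlparse


def url_filetype(url: str) -> str:
    """Return the lowercase extension of the URL path, or '' if there is none."""
    path = urlparse(url).path  # the query string is not part of the path
    return posixpath.splitext(path)[1].lstrip(".").lower()


def filter_by_filetype(urls, filetypes: str):
    """Keep only URLs whose extension is in a comma-separated list like 'jpg,css,js'."""
    wanted = {part.strip().lower() for part in filetypes.split(",") if part.strip()}
    return [url for url in urls if url_filetype(url) in wanted]


urls = [
    "https://example.com/",                     # no extension: served as HTML
    "https://example.com/about",                # no extension either
    "https://example.com/assets/style.css?v=2",
    "https://example.com/assets/image.JPG",
]
print(filter_by_filetype(urls, "jpg,css,js"))
print(filter_by_filetype(urls, "html"))
```

The second call returns an empty list because neither page URL carries an explicit `.html` suffix.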

### Optional

#### Behavior Manipulation

- **`-o`**, **`--output`**:<br>
  Defaults to `waybackup_snapshots` in the current directory. The folder where downloaded files will be saved.

- **`-m`**, **`--metadata`**:<br>
  Change the folder where metadata will be saved (`cdx`/`db`/`csv`/`log`). Especially if you are downloading into a network share, you SHOULD set this to a local path because the sqlite locking mechanism may cause issues with network shares.

- **`--verbose`**:<br>
  Increase output verbosity.

- **`--log`** <!-- `<path>` -->:<br>
  Saves a log file into the output-dir. Named as `waybackup_<sanitized_url>.log`.

- **`--progress`**:<br>
  Shows a progress bar instead of the default output.

- **`--workers`** `<count>`:<br>
  Sets the number of simultaneous download workers. Default is 1; up to about 10 is a safe range. Be cautious, as too many workers may lead to refused connections from the Wayback Machine.

- **`--no-redirect`**:<br>
  Disables following redirects of snapshots. Useful for preventing timestamp-folder mismatches caused by Archive.org redirects.

- **`--retry`** `<attempts>`:<br>
  Specifies number of retry attempts for failed downloads.

- **`--delay`** `<seconds>`:<br>
  Specifies delay between download requests in seconds. Default is no delay (0).

#### Job Handling:

- **`--reset`**:  
  If set, the job will be reset, and any existing `cdx`, `db`, `csv` files will be **deleted**. This allows you to start the job from scratch without considering previously downloaded data.

- **`--keep`**:  
  If set, all files will be kept after the job is finished. This includes the `cdx` and `db` file. Without this argument, they will be deleted if the job finished successfully.

<br>
<br>

## Usage

### Handling Interrupted Jobs

`pywaybackup` resumes interrupted jobs. The tool automatically continues from where it left off.

- Detects existing `.cdx` and `.db` files in an `output dir` to resume downloading from the last successful point.
- Compares `URL`, `mode`, and `optional query parameters` to ensure automatic resumption.
- Skips previously downloaded files to save time.
  > **Note:** Changing URL, mode selection, query parameters or output prevents automatic resumption.
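The comparison of URL, mode and query parameters can be pictured as a job fingerprint. The sketch below is purely illustrative and an assumption on every point: pywaybackup's actual resume logic is not documented here, and `job_fingerprint` is a hypothetical helper.

```python
import hashlib
import json


def job_fingerprint(url: str, mode: str, **query) -> str:
    """Derive a stable identifier from the parameters that define a job."""
    # sort_keys makes the result independent of keyword-argument order.
    payload = json.dumps({"url": url, "mode": mode, "query": query}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()[:16]


first = job_fingerprint("https://example.com", "all", start="2020", end="2021")
same = job_fingerprint("https://example.com", "all", end="2021", start="2020")
changed = job_fingerprint("https://example.com", "last", start="2020", end="2021")

print(first == same)     # identical parameters: the job could be resumed
print(first == changed)  # different mode: treated as a new job
```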

#### Resetting a Job (`--reset`)

- Deletes `.cdx` and `.db` files and restarts the process from scratch.
- Does **not** remove already downloaded files.
- `waybackup -u https://example.com -a --reset`

#### Keeping Job Data (`--keep`)

- Normally, `.cdx` and `.db` files are deleted after a successful job.
- `--keep` preserves them for future re-analysis or extending the query.
- `waybackup -u https://example.com -a --keep`

<br>
<br>

## Examples

1. Download a specific single snapshot of all available files (starting from root):<br>
   `waybackup -u https://example.com -a --start 20210101000000 --end 20210101000000`
2. Download a specific single snapshot of all available files (starting from a subdirectory):<br>
   `waybackup -u https://example.com/subdir1/subdir2/assets/ -a --start 20210101000000 --end 20210101000000`
3. Download a specific single snapshot of the exact given URL (no subdirs):<br>
   `waybackup -u https://example.com -a --start 20210101000000 --end 20210101000000 --explicit`
4. Download all snapshots of all available files in the given range:<br>
   `waybackup -u https://example.com -a --start 20210101000000 --end 20231122000000`
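The examples use full 14-digit timestamps, while the cli section notes that shorter prefixes such as `2019` or `20190101` are also accepted. The sketch below shows one way such a prefix could be expanded to a full timestamp. It is an assumption about the interpretation (earliest moment for a start, latest for an end), and `expand_timestamp` is a hypothetical helper, not part of pywaybackup.

```python
import calendar

VALID_LENGTHS = {4, 6, 8, 10, 12, 14}  # YYYY, YYYYMM, ..., YYYYMMDDhhmmss


def expand_timestamp(partial: str, upper: bool = False) -> str:
    """Pad a timestamp prefix to 14 digits (lower bound by default)."""
    if not partial.isdigit() or len(partial) not in VALID_LENGTHS:
        raise ValueError(f"not a YYYYMMDDhhmmss prefix: {partial!r}")
    year = int(partial[0:4])
    month = int(partial[4:6]) if len(partial) >= 6 else (12 if upper else 1)
    if len(partial) >= 8:
        day = int(partial[6:8])
    else:
        # Last day of the month for an upper bound, first day otherwise.
        day = calendar.monthrange(year, month)[1] if upper else 1
    hour = int(partial[8:10]) if len(partial) >= 10 else (23 if upper else 0)
    minute = int(partial[10:12]) if len(partial) >= 12 else (59 if upper else 0)
    second = int(partial[12:14]) if len(partial) == 14 else (59 if upper else 0)
    return f"{year:04d}{month:02d}{day:02d}{hour:02d}{minute:02d}{second:02d}"


print(expand_timestamp("2019"))
print(expand_timestamp("202002", upper=True))
print(expand_timestamp("2019010112"))
```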

<br>
<br>

## Output

### Path Structure

The output path is currently structured as follows, shown for the example query:<br>
`http://example.com/subdir1/subdir2/assets/`
<br><br>
For the first and last version (`-f` or `-l`):

- Will only include files/folders starting from your query path.

```
your/path/waybackup_snapshots/
└── the_root_of_your_query/ (example.com/)
    └── subdir1/
        └── subdir2/
            └── assets/
                ├── image.jpg
                ├── style.css
                ...
```

For all versions (`-a`):

- Will create a folder named as the root of your query. Inside this folder, you will find all timestamps and per timestamp the path you requested.

```
your/path/waybackup_snapshots/
└── the_root_of_your_query/ (example.com/)
    ├── yyyymmddhhmmss/
    │   ├── subdir1/
    │   │   └── subdir2/
    │   │       └── assets/
    │   │           ├── image.jpg
    │   │           └── style.css
    ├── yyyymmddhhmmss/
    │   ├── subdir1/
    │   │   └── subdir2/
    │   │       └── assets/
    │   │           ├── image.jpg
    │   │           └── style.css
    ...
```
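The two layouts above can be expressed as a small path-mapping function. This is a sketch derived from the directory trees only, not from pywaybackup's source; `local_path` is a hypothetical helper.

```python
from pathlib import PurePosixPath
from typing import Optional
from urllib.parse import urlparse


def local_path(output: str, url: str, timestamp: Optional[str] = None) -> PurePosixPath:
    """Map a snapshot URL to its location below the output folder.

    With a timestamp the `-a` layout is used (one folder per timestamp);
    without it the `-f`/`-l` layout is used (a single rebuilt tree).
    """
    parsed = urlparse(url)
    root = PurePosixPath(output) / parsed.netloc
    if timestamp is not None:
        root = root / timestamp
    return root / parsed.path.lstrip("/")


url = "http://example.com/subdir1/subdir2/assets/image.jpg"
print(local_path("your/path/waybackup_snapshots", url))
print(local_path("your/path/waybackup_snapshots", url, "20240225193302"))
```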

### CSV

The CSV contains a snapshot per row:

```
[
   {
      "file": "/your/path/waybackup_snapshots/example.com/yyyymmddhhmmss/index.html",
      "id": 1,
      "redirect_timestamp": "yyyymmddhhmmss",
      "redirect_url": "http://web.archive.org/web/yyyymmddhhmmssid_/http://example.com/",
      "response": 200,
      "timestamp": "yyyymmddhhmmss",
      "url_archive": "http://web.archive.org/web/yyyymmddhhmmssid_/http://example.com/",
      "url_origin": "http://example.com/"
   },
    ...
]
```
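A quick way to summarize the CSV is to count rows per response code. The sketch below assumes a comma-delimited file with a header row using the field names shown above; the two sample rows are made up for illustration and `count_responses` is a hypothetical helper.

```python
import csv
import io
from collections import Counter

# Made-up sample rows; the column names follow the fields listed above.
SAMPLE = """file,id,redirect_timestamp,redirect_url,response,timestamp,url_archive,url_origin
/out/example.com/20240225193302/index.html,1,,,200,20240225193302,http://web.archive.org/web/20240225193302id_/http://example.com/,http://example.com/
,2,,,404,20240226101500,http://web.archive.org/web/20240226101500id_/http://example.com/missing,http://example.com/missing
"""


def count_responses(handle) -> Counter:
    """Count snapshots per response code in an open CSV file or file-like object."""
    return Counter(row["response"] for row in csv.DictReader(handle))


counts = count_responses(io.StringIO(SAMPLE))
print(counts)
```

For a real file, pass `open(path, newline="")` instead of the `io.StringIO` sample.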

### Log

Verbose:

```
-----> Worker: 2 - Attempt: [1/1] Snapshot ID: [23/81]
SUCCESS   -> 200 OK
          -> URL:  https://web.archive.org/web/20240225193302id_/https://example.com/assets/css/custom-styles.css
          -> FILE: /home/manjaro/Stuff/python-wayback-machine-downloader/waybackup_snapshots/example.com/20240225193302id_/assets/css/custom-styles.css
```

Non-verbose:

```
55/81 - W:2 - SUCCESS - 20240225193302 - https://example.com/assets/css/custom-styles.css
```
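Non-verbose log lines are regular enough to parse with a single regular expression. This is a sketch that assumes every line follows the format of the sample above; `parse_line` and the field names are hypothetical.

```python
import re

# Pattern derived from the sample line: current/total - W:worker - RESULT - timestamp - url
LINE = re.compile(
    r"^(?P<current>\d+)/(?P<total>\d+) - W:(?P<worker>\d+) - "
    r"(?P<result>\w+) - (?P<timestamp>\d{14}) - (?P<url>\S+)$"
)


def parse_line(line: str):
    """Return the fields of a non-verbose log line as a dict, or None if it does not match."""
    match = LINE.match(line.strip())
    if match is None:
        return None
    fields = match.groupdict()
    for key in ("current", "total", "worker"):
        fields[key] = int(fields[key])
    return fields


sample_line = (
    "55/81 - W:2 - SUCCESS - 20240225193302 - "
    "https://example.com/assets/css/custom-styles.css"
)
print(parse_line(sample_line))
```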

### Debugging

Exceptions will be written into `waybackup_error.log` (each run overwrites the file).

<br>
<br>

## Contributing

I'm always happy to receive feature requests that improve the usability of this tool.
Feel free to make suggestions and report issues. The project is still far from perfect.
