wayback-downloader


| Field | Value |
|---|---|
| Name | wayback-downloader |
| Version | 0.1.6 |
| Home page | https://github.com/carygeo/wayback_downloader |
| Summary | A powerful CLI tool to download and archive historical versions of websites from the Wayback Machine. |
| Upload time | 2024-07-24 16:38:01 |
| Maintainer | None |
| Docs URL | None |
| Author | Cary Greenwood |
| Requires Python | <4.0,>=3.7 |
| License | MIT |
| Keywords | wayback, archive, downloader, web, history, internet archive |
| Requirements | No requirements were recorded. |

# Wayback Downloader

Wayback Downloader is a Python command-line tool that retrieves and archives historical versions of websites from the Internet Archive's Wayback Machine. It is aimed at researchers, developers, and digital archivists who need to capture and preserve web content as it changed over time.

## Key Features:

1. **Efficient Retrieval**: Quickly download multiple snapshots of a website within a specified date range.
2. **Selective Archiving**: Save only unique content, avoiding duplicate snapshots to conserve storage space.
3. **Recursive Crawling**: Automatically discover and download linked pages within the same domain.
4. **Flexible Date Range**: Specify custom start and end dates for targeted historical content retrieval.
5. **Robust Error Handling**: Implements retry mechanisms and comprehensive error reporting for reliable operation.
6. **User-Friendly CLI**: Simple command-line interface for easy integration into workflows and scripts.
7. **Customizable Output**: Option to specify the output directory for downloaded archives.
8. **Verbose Logging**: Detailed progress and diagnostic information available with verbose mode.
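
The feature list does not say how duplicate snapshots are detected. One common approach, shown here as a minimal sketch rather than the tool's actual implementation, is to hash each snapshot body and keep only the first occurrence of each digest. The names `content_digest` and `unique_snapshots` are hypothetical.

```python
import hashlib
from typing import Iterable, Iterator, Tuple

Snapshot = Tuple[str, bytes]  # (timestamp, raw page body); an assumed shape


def content_digest(body: bytes) -> str:
    """Return a hex SHA-256 digest that identifies a snapshot's content."""
    return hashlib.sha256(body).hexdigest()


def unique_snapshots(snapshots: Iterable[Snapshot]) -> Iterator[Snapshot]:
    """Yield only snapshots whose body has not been seen earlier in the stream."""
    seen = set()
    for timestamp, body in snapshots:
        digest = content_digest(body)
        if digest in seen:
            continue  # identical content was already kept; skip this copy
        seen.add(digest)
        yield timestamp, body


if __name__ == "__main__":
    sample = [
        ("20200101000000", b"<html>v1</html>"),
        ("20200201000000", b"<html>v1</html>"),  # same bytes as the first entry
        ("20200301000000", b"<html>v2</html>"),
    ]
    for timestamp, _ in unique_snapshots(sample):
        print(timestamp)
```

Exact-byte hashing treats even a one-character change as new content, so a real tool might normalize the page before hashing.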

Wayback Downloader simplifies accessing and preserving web history, which makes it easier to study how a website evolved, recover lost content, or build a comprehensive web archive. Whether you're conducting academic research, performing due diligence, or simply curious about the past state of the web, it provides a streamlined way to reach the Wayback Machine's archives.

Get started with Wayback Downloader today and unlock the power of web history at your fingertips!

## Installation

You can install Wayback Downloader using pip:

```
pip install wayback_downloader
```

## Usage

After installation, you can use Wayback Downloader from the command line:

```
wayback-downloader [URL] [START_DATE] [END_DATE] [-o OUTPUT_DIR] [-v]
```

For example:

```
wayback-downloader http://example.com 20200101 20230101 -o /path/to/output -v
```

This downloads archived snapshots of example.com captured between January 1, 2020, and January 1, 2023 (dates are written as `YYYYMMDD`), saves them to the specified output directory, and prints verbose output.
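
The README does not state which Internet Archive endpoint the tool queries. As an illustration only, the sketch below assumes the public Wayback CDX search API and shows how the `YYYYMMDD` arguments from the example could be validated and turned into a snapshot query. The function names `parse_day`, `build_cdx_query`, and `snapshot_url` are hypothetical.

```python
from datetime import datetime
from urllib.parse import urlencode

# Public CDX search endpoint; that the tool uses it is an assumption.
CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"


def parse_day(value: str) -> datetime:
    """Parse a YYYYMMDD string; raises ValueError on malformed input."""
    return datetime.strptime(value, "%Y%m%d")


def build_cdx_query(url: str, start: str, end: str) -> str:
    """Build a CDX query URL listing snapshots of `url` between two days."""
    start_day, end_day = parse_day(start), parse_day(end)
    if start_day > end_day:
        raise ValueError("START_DATE must not be later than END_DATE")
    params = {
        "url": url,
        "from": start_day.strftime("%Y%m%d"),
        "to": end_day.strftime("%Y%m%d"),
        "output": "json",      # ask for JSON rows instead of plain text
        "collapse": "digest",  # drop adjacent captures with identical content
    }
    return f"{CDX_ENDPOINT}?{urlencode(params)}"


def snapshot_url(timestamp: str, original: str) -> str:
    """Return the Wayback Machine replay URL for one capture."""
    return f"https://web.archive.org/web/{timestamp}/{original}"


if __name__ == "__main__":
    print(build_cdx_query("http://example.com", "20200101", "20230101"))
    print(snapshot_url("20200101000000", "http://example.com"))
```

Building the query as a pure function keeps date validation separate from network access, so it can be checked without contacting the archive.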

For more information on available options:

```
wayback-downloader --help
```
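
For readers scripting around the tool, the synopsis above can be mirrored with `argparse`. This is a hedged sketch of an equivalent parser, not the package's own code: only the positional arguments and the `-o` and `-v` flags come from the README, while the attribute names and the default output directory (`.`) are assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser matching: [URL] [START_DATE] [END_DATE] [-o OUTPUT_DIR] [-v]."""
    parser = argparse.ArgumentParser(prog="wayback-downloader")
    parser.add_argument("url", metavar="URL", help="site to retrieve")
    parser.add_argument("start_date", metavar="START_DATE", help="first day, YYYYMMDD")
    parser.add_argument("end_date", metavar="END_DATE", help="last day, YYYYMMDD")
    # The default of "." is an assumption; the README does not state one.
    parser.add_argument("-o", dest="output_dir", metavar="OUTPUT_DIR", default=".",
                        help="directory for downloaded archives")
    parser.add_argument("-v", dest="verbose", action="store_true",
                        help="print detailed progress information")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args(
        ["http://example.com", "20200101", "20230101", "-o", "/path/to/output", "-v"]
    )
    print(args.url, args.start_date, args.end_date, args.output_dir, args.verbose)
```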

## License

This project is licensed under the MIT License - see the LICENSE file for details.


            

Raw data

{
    "_id": null,
    "home_page": "https://github.com/carygeo/wayback_downloader",
    "name": "wayback-downloader",
    "maintainer": null,
    "docs_url": null,
    "requires_python": "<4.0,>=3.7",
    "maintainer_email": null,
    "keywords": "wayback, archive, downloader, web, history, internet archive",
    "author": "Cary Greenwood",
    "author_email": "carygreenwood@gmail.com",
    "download_url": "https://files.pythonhosted.org/packages/16/d4/559526a8114964823af7fa7de957f512c3301fadcca8cd47414eab86488b/wayback_downloader-0.1.6.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "MIT",
    "summary": "A powerful CLI tool to download and archive historical versions of websites from the Wayback Machine.",
    "version": "0.1.6",
    "project_urls": {
        "Homepage": "https://github.com/carygeo/wayback_downloader",
        "Repository": "https://github.com/carygeo/wayback_downloader"
    },
    "split_keywords": [
        "wayback",
        " archive",
        " downloader",
        " web",
        " history",
        " internet archive"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "0059108b98070ea17297f9b2c00e461f8c9468dff43dd26c431078aa4b30f816",
                "md5": "3d6ada74dc60950b32ae6348f7f55c43",
                "sha256": "a0e4d5257889d6dcfe1415cf2f1aabf97dc770ac31b63911444928ed7f4c4735"
            },
            "downloads": -1,
            "filename": "wayback_downloader-0.1.6-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "3d6ada74dc60950b32ae6348f7f55c43",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": "<4.0,>=3.7",
            "size": 5870,
            "upload_time": "2024-07-24T16:37:59",
            "upload_time_iso_8601": "2024-07-24T16:37:59.783939Z",
            "url": "https://files.pythonhosted.org/packages/00/59/108b98070ea17297f9b2c00e461f8c9468dff43dd26c431078aa4b30f816/wayback_downloader-0.1.6-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "16d4559526a8114964823af7fa7de957f512c3301fadcca8cd47414eab86488b",
                "md5": "29c6202b73507d5c5e0d362d06395c02",
                "sha256": "41ea5068161bfb14ee9bb928269710cb8634d69ffa128328aab95cc8897486b0"
            },
            "downloads": -1,
            "filename": "wayback_downloader-0.1.6.tar.gz",
            "has_sig": false,
            "md5_digest": "29c6202b73507d5c5e0d362d06395c02",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": "<4.0,>=3.7",
            "size": 5023,
            "upload_time": "2024-07-24T16:38:01",
            "upload_time_iso_8601": "2024-07-24T16:38:01.210716Z",
            "url": "https://files.pythonhosted.org/packages/16/d4/559526a8114964823af7fa7de957f512c3301fadcca8cd47414eab86488b/wayback_downloader-0.1.6.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2024-07-24 16:38:01",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "carygeo",
    "github_project": "wayback_downloader",
    "github_not_found": true,
    "lcname": "wayback-downloader"
}
        