pypdl

Name	pypdl JSON
Version	1.5.5 JSON
	download
home_page	https://github.com/mjishnu/pypdl
Summary	A concurrent pure python download manager
upload_time	2025-07-12 09:59:23
maintainer	None
docs_url	None
author	mjishnu
requires_python	>=3.8
license	MIT
keywords	python downloader multi-threaded-downloader concurrent-downloader parallel-downloader async-downloader asyncronous-downloader download-manager fast-downloader download-accelerator download-optimizer download-utility download-tool download-automation
VCS
bugtrack_url
requirements	aiohttp aiofiles
Travis-CI	No Travis.
coveralls test coverage	No coveralls.

            # pypdl

pypdl is a Python library for downloading files from the internet. It provides features such as multi-segmented downloads, retry download in case of failure, option to continue downloading using a different URL if necessary, progress tracking, pause/resume functionality, checksum and many more.

## Table of Contents

- [Prerequisites](#prerequisites)
- [Installation](#installation)
- [Usage](#usage)
  - [Basic Usage](#basic-usage)
  - [Advanced Usage](#advanced-usage)
  - [Examples](#examples)
- [API Reference](#api-reference)
- [License](#license)
- [Contribution](#contribution)
- [Contact](#contact)

## Prerequisites

* Python 3.8 or later.

## Installation

To install the pypdl, run the following command:


```bash
pip install pypdl
```
## Usage

### Basic Usage

To download a file using the pypdl, simply create a new `Pypdl` object and call its `start` method, passing in the URL of the file to be downloaded:

```py
from pypdl import Pypdl

dl = Pypdl()
dl.start('http://example.com/file.txt')
```

### Advanced Usage

The `Pypdl` object provides additional options for advanced usage:

```py
from pypdl import Pypdl

dl = Pypdl(allow_reuse=False, logger=default_logger("Pypdl"), max_concurrent=1)
dl.start(
    url: Union[Callable, str] = None,
    file_path: str = None,
    tasks: List = None,
    multisegment: bool = True,
    segments: int = 5,
    retries: int = 0,
    mirrors: Union[str, List, Callable] = None,
    overwrite: bool = True,
    speed_limit: float = 0,
    etag_validation: bool = True,
    hash_algorithms: Union[str, List] = None,
    callback: Callable = None,
    block: bool = True,
    display: bool = True,
)
```

Each option is explained below:
- `allow_reuse`: Whether to allow reuse of existing Pypdl object for the next download. The default value is `False`.
- `logger`: A logger object to log messages. The default value is a custom `Logger` with the name *Pypdl*.
- `max_concurrent`: The maximum number of concurrent downloads. The default value is 1.
- `url`: This can either be the URL of the file to download or a function that returns the URL.
- `file_path`: An optional path to save the downloaded file. By default, it uses the present working directory. If `file_path` is a directory, then the file is downloaded into it; otherwise, the file is downloaded into the given path.
- `tasks`: A list of tasks to be downloaded. Each task is a dictionary with the following keys:
    - `url` (required): The URL of the file to download.
    - Optional keys (The default value is set by the `Pypdl` start method):
        - `file_path`: path to save the downloaded file.
        - `multisegment`: Whether to use multi-segmented download. 
        - `segments`: The number of segments the file should be divided into for multi-segmented download.
        - `retries`: The number of times to retry the download in case of an error.
        - `mirrors`: The mirror URLs to be used if the primary URL fails.
        - `overwrite`: Whether to overwrite the file if it already exists. 
        - `speed_limit`: The maximum download speed in MB/s. 
        - `etag_validation`: Whether to validate the ETag before resuming downloads.
        - `hash_algorithms`: The hash algorithms to be used for precomputation of hash values.
        - `callback`: A callback function to be called when the download is complete.
    - Additional supported keyword arguments of `Pypdl` start method.
    
- `multisegment`: Whether to use multi-segmented download. The default value is `True`.
- `segments`: The number of segments the file should be divided into for multi-segmented download. The default value is 5.
- `retries`: The number of times to retry the download in case of an error. The default value is 0.
- `mirrors`: The mirror URLs to be used if the primary URL fails. The default value is `None`. It can be a callable (functions, coroutines), string or List of callables, strings or both.
- `overwrite`: Whether to overwrite the file if it already exists. The default value is `True`.
- `speed_limit`: The maximum download speed in MB/s. The default value is 0.
- `etag_validation`: Whether to validate the ETag before resuming downloads. The default value is `True`.
- `hash_algorithms`: The hash algorithms to be used for precomputation of hash values. It can be a string or a list of strings. The default value is `None`.
- `callback`: A callback function to be called when the download is complete. The default value is `None`. The function must accept 2 positional parameters: `status` (bool) indicating if the download was successful, and `result` (FileValidator object if successful, None if failed).
- `block`: Whether to block until the download is complete. The default value is `True`.
- `display`: Whether to display download progress and other optional messages. The default value is `True`.

- Supported Keyword Arguments:
    - `params`: Parameters to be sent in the query string of the new request. The default value is `None`.
    - `data`: The data to send in the body of the request. The default value is `None`.
    - `json`: A JSON-compatible Python object to send in the body of the request. The default value is `None`.
    - `cookies`: HTTP Cookies to send with the request. The default value is `None`.
    - `headers`: HTTP headers to be sent with the request. The default value is `None`. *Please note that [multi-range headers](https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Range#requesting_multiple_ranges) are not supported*.
    - `auth`: An object that represents HTTP Basic Authorization. The default value is `None`.
    - `allow_redirects`: If set to False, do not follow redirects. The default value is `True`.
    - `max_redirects`: Maximum number of redirects to follow. The default value is `10`.
    - `proxy`: Proxy URL. The default value is `None`.
    - `proxy_auth`: An object that represents proxy HTTP Basic Authorization. The default value is `None`.
    - `timeout`: (default `aiohttp.ClientTimeout(sock_read=60)`): Override the session’s timeout. The default value is `aiohttp.ClientTimeout(sock_read=60)`.
    - `ssl`: SSL validation mode. The default value is `True`.
    - `proxy_headers`: HTTP headers to send to the proxy if the `proxy` parameter has been provided. The default value is `None`.

    For detailed information on each parameter, refer the [aiohttp documentation](https://docs.aiohttp.org/en/stable/client_reference.html#aiohttp.ClientSession.request). Please ensure that only the *supported keyword arguments* are used. Using unsupported or irrelevant keyword arguments may lead to unexpected behavior or errors.

### Examples

Here is an example that demonstrates how to use pypdl library to download a file using headers, proxy and timeout:

```py
import aiohttp
from pypdl import Pypdl

def main():
    # Using headers 
    headers = {"User-Agent":"Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:47.0) Gecko/20100101 Firefox/47.0", "range":"bytes=-10485760"}
    # Using proxy
    proxy = "http://user:pass@some.proxy.com"
    # Using timeout
    timeout = aiohttp.ClientTimeout(sock_read=20)

    # create a new pypdl object
    dl = Pypdl()

    # start the download
    dl.start(
        url='https://speed.hetzner.de/100MB.bin',
        file_path='100MB.bin',
        segments=10,
        display=True,
        multisegment=True,
        block=True,
        retries=3,
        etag_validation=True,
        headers=headers, 
        proxy=proxy, 
        timeout=timeout
    )

if __name__ == '__main__':
    main()
```

This example downloads a file from the internet using 10 segments and displays the download progress. If the download fails, it will retry up to 3 times. We are also using headers to set the User-Agent and Range to download the last 10MB of the file, as well as a proxy and timeout. For more information on these parameters, refer to the [API reference](https://github.com/mjishnu/pypdl?tab=readme-ov-file#pypdl-1).

Another example of implementing pause resume functionality, printing the progress to console and changing log level to debug:

```py
from pypdl import Pypdl

# create a pypdl object
dl = Pypdl()

# changing log level to debug
dl.logger.setLevel('DEBUG')

# start the download process
# block=False so we can print the progress
# display=False so we can print the progress ourselves
future = dl.start('https://example.com/file.zip', segments=8,block=False,display=False)

# print the progress
while dl.progress != 70:
  print(dl.progress)

# stop the download process
dl.stop() 

#do something
#...

# resume the download process
future = dl.start('https://example.com/file.zip', segments=8,block=False,display=False)

# print rest of the progress
while not dl.completed:
  print(dl.progress)

# get the result, calling result() on future is essential when block=False so everything is properly cleaned up
result = future.result()

```

This example we start the download process and print the progress to console. We then stop the download process and do something else. After that we resume the download process and print the rest of the progress to console. This can be used to create a pause/resume functionality.

Another example of using hash validation with dynamic url:

```py
from pypdl import Pypdl

# Generate the url dynamically
def dynamic_url():
    return 'https://example.com/file.zip'

# create a pypdl object
dl = Pypdl()

# if block = True --> returns a FileValidator object
res = dl.start(dynamic_url, block=True) 

# validate hash
if res.validate_hash(correct_hash,'sha256'):
    print('Hash is valid')
else:
    print('Hash is invalid')

# scenario where block = False --> returns a AutoShutdownFuture object
mirror_urls = ['https://example1.com/file2.zip', 'https://example2.com/file2.zip']

# retry download with different url if current fails
future = dl.start(url="https://example.com/file2.zip", mirrors=mirror_urls, block=False,retries=2)

# do something
# ...

# It is essential to call result() on future when block=False so everything is properly cleaned up
res = future.result()
# validate hash
if dl.completed:
  if res.validate_hash(correct_hash,'sha256'):
      print('Hash is valid')
  else:
      print('Hash is invalid')
```
An example of using Pypdl object to get size of the files with `allow_reuse` set to `True` and custom logger:

```py
import logging
import time
from pypdl import Pypdl

urls = [
    'https://example.com/file1.zip',
    'https://example.com/file2.zip',
    'https://example.com/file3.zip',
    'https://example.com/file4.zip',
    'https://example.com/file5.zip',
]

# create a custom logger
logger = logging.getLogger('custom')

size = []

# create a pypdl object
dl = Pypdl(allow_reuse=True, logger=logger)

for url in urls:
    dl.start(url, block=False)
    
    # get the size of the file and add it to size list
    size.append(dl.size)

    # do something 

    while not dl.completed:
        print(dl.progress)

print(size)
# shutdown the downloader, this is essential when allow_reuse is enabled
dl.shutdown()

```


An example of downloading multiple files concurrently:

```py
from pypdl import pypdl

proxy = "http://user:pass@some.proxy.com"

# create a pypdl object with max_concurrent set to 2
dl = pypdl(max_concurrent=2, allow_reuse=True)

# List of tasks to be downloaded..
tasks = [
    {'url':'https://example.com/file1.zip', 'file_path': 'file1.zip'},
    {'url':'https://example.com/file2.zip', 'file_path': 'file2.zip'},
    {'url':'https://example.com/file3.zip', 'file_path': 'file3.zip'},
    {'url':'https://example.com/file4.zip', 'file_path': 'file4.zip'},
    {'url':'https://example.com/file5.zip', 'file_path': 'file5.zip'},
]

# start the download process with proxy
results = dl.start(tasks=tasks, display=True, block=False,proxy=proxy)

# do something
# ...

# stop the download process
dl.stop()

# do something
# ...

# restart the download process without proxy
results = dl.start(tasks=tasks, display=True, block=True)

# print the results
for url, result in results:
    # validate hash
    if result.validate_hash(correct_hash,'sha256'):
        print(f'{url} - Hash is valid')
    else:
        print(f'{url} - Hash is invalid')

task2 = [
    {'url':'https://example.com/file6.zip', 'file_path': 'file6.zip'},
    {'url':'https://example.com/file7.zip', 'file_path': 'file7.zip'},
    {'url':'https://example.com/file8.zip', 'file_path': 'file8.zip'},
    {'url':'https://example.com/file9.zip', 'file_path': 'file9.zip'},
    {'url':'https://example.com/file10.zip', 'file_path': 'file10.zip'},
]

# start the download process
dl.start(tasks=task2, display=True, block=True)

# shutdown the downloader, this is essential when allow_reuse is enabled
dl.shutdown()
```
Another example of using precomputed hash for parallel calculation and validation of hash using callbacks

```py
from pypdl import pypdl

# create a pypdl object with max_concurrent set to 2
dl = pypdl(max_concurrent=2)

# List of tasks to be downloaded..
tasks = [
    {'url':'https://example.com/file1.zip', 'file_path': 'file1.zip'},
    {'url':'https://example.com/file2.zip', 'file_path': 'file2.zip'},
    {'url':'https://example.com/file3.zip', 'file_path': 'file3.zip'},
    {'url':'https://example.com/file4.zip', 'file_path': 'file4.zip'},
    {'url':'https://example.com/file5.zip', 'file_path': 'file5.zip'},
]

# Callback requires 2 positional arguments
# status: bool indicating download success
# result: FileValidator object if successful, None if failed
def callback_func(status, result):
    if status == True:
        result.validate_hash(correct_hash=correct_hash, algorithm='sha256')
        # do something 

# hash_algorithms can be a list for multiple hashes: ['sha256', 'md5']
# Hashes are computed during download and cached into FileValidator
dl.start(tasks=tasks, hash_algorithms='sha256', callback=callback_func)
```
## API Reference

### `Pypdl()`

The `Pypdl` class represents a file downloader that can download a file from a given URL to a specified file path. The class supports both single-segmented and multi-segmented downloads and many other features like retry download in case of failure and option to continue downloading using a different url if necessary, pause/resume functionality, progress tracking etc.

#### Arguments
- `allow_reuse`: (bool, Optional) Whether to allow reuse of existing `Pypdl` object for next download. The default value is `False`. It's essential to use `shutdown()` method when `allow_reuse` is enabled to ensure efficient resource management.

- `logger`: (logging.Logger, Optional) A logger object to log messages. The default value is custom `Logger` with the name *Pypdl*.

- `max_concurrent`: (int, Optional) The maximum number of concurrent downloads. The default value is 1.

#### Attributes

- `size`: The total size of the file to be downloaded, in bytes.
- `current_size`: The amount of data downloaded so far, in bytes.
- `remaining_size`: The amount of data remaining to be downloaded, in bytes.
- `progress`: The download progress percentage.
- `speed`: The download speed, in MB/s.
- `time_spent`: The time spent downloading, in seconds.
- `eta`: The estimated time remaining for download completion, in seconds.
- `total_tasks`: The total number of tasks to be downloaded.
- `completed_tasks`: The number of tasks that have been completed.
- `task_progress`: The progress of all tasks.
- `completed`: A flag that indicates if the download is complete.
- `success`: A list of tasks that were successfully downloaded.
- `failed`: A list of tasks that failed to download.
- `logger`: A property that returns the logger if available.
- `is_idle`: A flag that indicates if the downloader is idle.

#### Methods

- `start(url = None,
    file_path = None,
    tasks = None,
    multisegment = True,
    segments = 5,
    retries = 0,
    mirrors = None,
    overwrite = True,
    speed_limit = 0,
    etag_validation = True,
    hash_algorithms = None,
    callback = None,
    block = True,
    display = True,
)`: Starts the download process.

    ##### Parameters

    - `url`: This can either be the URL of the file to download or a function that returns the URL.
    - `file_path`: An optional path to save the downloaded file. By default, it uses the present working directory. If `file_path` is a directory, then the file is downloaded into it; otherwise, the file is downloaded into the given path.
    - `tasks`: A list of tasks to be downloaded. Each task is a dictionary with the following keys:
        - `url` (required): The URL of the file to download.
        - Optional keys (The default value is set by the `Pypdl` start method):
            - `file_path`: path to save the downloaded file.
            - `multisegment`: Whether to use multi-segmented download. 
            - `segments`: The number of segments the file should be divided into for multi-segmented download.
            - `retries`: The number of times to retry the download in case of an error.
            - `mirrors`: The mirror URLs to be used if the primary URL fails.
            - `overwrite`: Whether to overwrite the file if it already exists. 
            - `speed_limit`: The maximum download speed in MB/s.
            - `etag_validation`: Whether to validate the ETag before resuming downloads.
            - `hash_algorithms`: The hash algorithms to be used for precomputation of hash values.
            - `callback`: A callback function to be called when the download is complete.
        - Additional supported keyword arguments of `Pypdl` start method.
        
    - `multisegment`: Whether to use multi-segmented download. The default value is `True`.
    - `segments`: The number of segments the file should be divided into for multi-segmented download. The default value is 5.
    - `retries`: The number of times to retry the download in case of an error. The default value is 0.
    - `mirrors`: The mirror URLs to be used if the primary URL fails. The default value is `None`. It can be a callable (functions, coroutines), string or List of callables, strings or both.
    - `overwrite`: Whether to overwrite the file if it already exists. The default value is `True`.
    - `speed_limit`: The maximum download speed in MB/s. The default value is 0.
    - `etag_validation`: Whether to validate the ETag before resuming downloads. The default value is `True`.
    - `hash_algorithms`: The hash algorithms to be used for precomputation of hash values. It can be a string or a list of strings. The default value is `None`.
    - `callback`: A callback function to be called when the download is complete. The default value is `None`. The function must accept 2 positional parameters: `status` (bool) indicating if the download was successful, and `result` (FileValidator object if successful, None if failed).
    - `block`: Whether to block until the download is complete. The default value is `True`.
    - `display`: Whether to display download progress and other optional messages. The default value is `True`.

    - Supported Keyword Arguments:
        - `params`: Parameters to be sent in the query string of the new request. The default value is `None`.
        - `data`: The data to send in the body of the request. The default value is `None`.
        - `json`: A JSON-compatible Python object to send in the body of the request. The default value is `None`.
        - `cookies`: HTTP Cookies to send with the request. The default value is `None`.
        - `headers`: HTTP headers to be sent with the request. The default value is `None`. *Please note that [multi-range headers](https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Range#requesting_multiple_ranges) are not supported*.
        - `auth`: An object that represents HTTP Basic Authorization. The default value is `None`.
        - `allow_redirects`: If set to False, do not follow redirects. The default value is `True`.
        - `max_redirects`: Maximum number of redirects to follow. The default value is `10`.
        - `proxy`: Proxy URL. The default value is `None`.
        - `proxy_auth`: An object that represents proxy HTTP Basic Authorization. The default value is `None`.
        - `timeout`: (default `aiohttp.ClientTimeout(sock_read=60)`): Override the session’s timeout. The default value is `aiohttp.ClientTimeout(sock_read=60)`.
        - `ssl`: SSL validation mode. The default value is `True`.
        - `proxy_headers`: HTTP headers to send to the proxy if the `proxy` parameter has been provided. The default value is `None`.

        For detailed information on each parameter, refer the [aiohttp documentation](https://docs.aiohttp.org/en/stable/client_reference.html#aiohttp.ClientSession.request). Please ensure that only the *supported keyword arguments* are used. Using unsupported or irrelevant keyword arguments may lead to unexpected behavior or errors.

    ##### Returns
    
    - `AutoShutdownFuture`: If `block` and `allow_reuse` is  set to `False`.
    - `EFuture`: If `block` is `False` and `allow_reuse` is `True`.
    - `List`: If `block` is `True` and the download is successful. Returns a list of tuples, where each tuple contains the URL and a `FileValidator` object for that URL.
    - `None`: If `block` is `True` and the download fails.

- `stop()`: Stops the download process.
- `shutdown()`: Shuts down the downloader.
- `set_allow_reuse(allow_reuse)`: Sets whether to allow reuse of existing `Pypdl` object for next download.
- `set_logger(logger)`: Sets the logger object to be used for logging messages.
- `set_max_concurrent(max_concurrent)`: Sets the maximum number of concurrent downloads.

### Helper Classes

#### `FileValidator()`

The `FileValidator` class is used to validate the integrity of the downloaded file.

##### Attributes
- `path`: The path of the file

##### Methods

- `get_hash(algorithm)`: Fetches/calculates the hash of the file using the specified algorithm. Returns the hash as a string.

- `validate_hash(correct_hash, algorithm)`: Validates the hash of the file against the correct hash. Returns `True` if the hashes match, `False` otherwise.

#### `AutoShutdownFuture()`

The `AutoShutdownFuture` class is a wrapper for concurrent.futures.Future object that shuts down the eventloop and executor when the result is retrieved.

##### Methods

- `result(timeout=None)`: Retrieves the result of the Future object and shuts down the executor. If the download was successful, it returns a `FileValidator` object; otherwise, it returns `None`.

#### `EFuture()`

The `EFuture` class is a wrapper for a `concurrent.futures.Future` object that integrates with an event loop to handle asynchronous operations.

##### Methods

- `result(timeout=None)`: Retrieves the result of the `Future` object. If the `Future` completes successfully, it returns the result; otherwise, it raises an exception.

## License

pypdl is licensed under the MIT License. See the [LICENSE](https://github.com/mjishnu/pypdl/blob/main/LICENSE) file for more details.

## Contribution

Contributions to pypdl are always welcome. If you want to contribute to this project, please fork the repository and submit a pull request.

## Contact

If you have any questions, issues, or feedback about pypdl, please open an issue on the [GitHub repository](https://github.com/mjishnu/pypdl).

Raw data

            {
    "_id": null,
    "home_page": "https://github.com/mjishnu/pypdl",
    "name": "pypdl",
    "maintainer": null,
    "docs_url": null,
    "requires_python": ">=3.8",
    "maintainer_email": null,
    "keywords": "python, downloader, multi-threaded-downloader, concurrent-downloader, parallel-downloader, async-downloader, asyncronous-downloader, download-manager, fast-downloader, download-accelerator, download-optimizer, download-utility, download-tool, download-automation",
    "author": "mjishnu",
    "author_email": "<mjishnu@proton.me>",
    "download_url": "https://files.pythonhosted.org/packages/53/00/6b83b810124425712cc816cc7ec86c60ef534c039049a133a5429a5b606b/pypdl-1.5.5.tar.gz",
    "platform": null,
    "description": "# pypdl\n\npypdl is a Python library for downloading files from the internet. It provides features such as multi-segmented downloads, retry download in case of failure, option to continue downloading using a different URL if necessary, progress tracking, pause/resume functionality, checksum and many more.\n\n## Table of Contents\n\n- [Prerequisites](#prerequisites)\n- [Installation](#installation)\n- [Usage](#usage)\n  - [Basic Usage](#basic-usage)\n  - [Advanced Usage](#advanced-usage)\n  - [Examples](#examples)\n- [API Reference](#api-reference)\n- [License](#license)\n- [Contribution](#contribution)\n- [Contact](#contact)\n\n## Prerequisites\n\n* Python 3.8 or later.\n\n## Installation\n\nTo install the pypdl, run the following command:\n\n\n```bash\npip install pypdl\n```\n## Usage\n\n### Basic Usage\n\nTo download a file using the pypdl, simply create a new `Pypdl` object and call its `start` method, passing in the URL of the file to be downloaded:\n\n```py\nfrom pypdl import Pypdl\n\ndl = Pypdl()\ndl.start('http://example.com/file.txt')\n```\n\n### Advanced Usage\n\nThe `Pypdl` object provides additional options for advanced usage:\n\n```py\nfrom pypdl import Pypdl\n\ndl = Pypdl(allow_reuse=False, logger=default_logger(\"Pypdl\"), max_concurrent=1)\ndl.start(\n    url: Union[Callable, str] = None,\n    file_path: str = None,\n    tasks: List = None,\n    multisegment: bool = True,\n    segments: int = 5,\n    retries: int = 0,\n    mirrors: Union[str, List, Callable] = None,\n    overwrite: bool = True,\n    speed_limit: float = 0,\n    etag_validation: bool = True,\n    hash_algorithms: Union[str, List] = None,\n    callback: Callable = None,\n    block: bool = True,\n    display: bool = True,\n)\n```\n\nEach option is explained below:\n- `allow_reuse`: Whether to allow reuse of existing Pypdl object for the next download. The default value is `False`.\n- `logger`: A logger object to log messages. The default value is a custom `Logger` with the name *Pypdl*.\n- `max_concurrent`: The maximum number of concurrent downloads. The default value is 1.\n- `url`: This can either be the URL of the file to download or a function that returns the URL.\n- `file_path`: An optional path to save the downloaded file. By default, it uses the present working directory. If `file_path` is a directory, then the file is downloaded into it; otherwise, the file is downloaded into the given path.\n- `tasks`: A list of tasks to be downloaded. Each task is a dictionary with the following keys:\n    - `url` (required): The URL of the file to download.\n    - Optional keys (The default value is set by the `Pypdl` start method):\n        - `file_path`: path to save the downloaded file.\n        - `multisegment`: Whether to use multi-segmented download. \n        - `segments`: The number of segments the file should be divided into for multi-segmented download.\n        - `retries`: The number of times to retry the download in case of an error.\n        - `mirrors`: The mirror URLs to be used if the primary URL fails.\n        - `overwrite`: Whether to overwrite the file if it already exists. \n        - `speed_limit`: The maximum download speed in MB/s. \n        - `etag_validation`: Whether to validate the ETag before resuming downloads.\n        - `hash_algorithms`: The hash algorithms to be used for precomputation of hash values.\n        - `callback`: A callback function to be called when the download is complete.\n    - Additional supported keyword arguments of `Pypdl` start method.\n    \n- `multisegment`: Whether to use multi-segmented download. The default value is `True`.\n- `segments`: The number of segments the file should be divided into for multi-segmented download. The default value is 5.\n- `retries`: The number of times to retry the download in case of an error. The default value is 0.\n- `mirrors`: The mirror URLs to be used if the primary URL fails. The default value is `None`. It can be a callable (functions, coroutines), string or List of callables, strings or both.\n- `overwrite`: Whether to overwrite the file if it already exists. The default value is `True`.\n- `speed_limit`: The maximum download speed in MB/s. The default value is 0.\n- `etag_validation`: Whether to validate the ETag before resuming downloads. The default value is `True`.\n- `hash_algorithms`: The hash algorithms to be used for precomputation of hash values. It can be a string or a list of strings. The default value is `None`.\n- `callback`: A callback function to be called when the download is complete. The default value is `None`. The function must accept 2 positional parameters: `status` (bool) indicating if the download was successful, and `result` (FileValidator object if successful, None if failed).\n- `block`: Whether to block until the download is complete. The default value is `True`.\n- `display`: Whether to display download progress and other optional messages. The default value is `True`.\n\n- Supported Keyword Arguments:\n    - `params`: Parameters to be sent in the query string of the new request. The default value is `None`.\n    - `data`: The data to send in the body of the request. The default value is `None`.\n    - `json`: A JSON-compatible Python object to send in the body of the request. The default value is `None`.\n    - `cookies`: HTTP Cookies to send with the request. The default value is `None`.\n    - `headers`: HTTP headers to be sent with the request. The default value is `None`. *Please note that [multi-range headers](https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Range#requesting_multiple_ranges) are not supported*.\n    - `auth`: An object that represents HTTP Basic Authorization. The default value is `None`.\n    - `allow_redirects`: If set to False, do not follow redirects. The default value is `True`.\n    - `max_redirects`: Maximum number of redirects to follow. The default value is `10`.\n    - `proxy`: Proxy URL. The default value is `None`.\n    - `proxy_auth`: An object that represents proxy HTTP Basic Authorization. The default value is `None`.\n    - `timeout`: (default `aiohttp.ClientTimeout(sock_read=60)`): Override the session\u2019s timeout. The default value is `aiohttp.ClientTimeout(sock_read=60)`.\n    - `ssl`: SSL validation mode. The default value is `True`.\n    - `proxy_headers`: HTTP headers to send to the proxy if the `proxy` parameter has been provided. The default value is `None`.\n\n    For detailed information on each parameter, refer the [aiohttp documentation](https://docs.aiohttp.org/en/stable/client_reference.html#aiohttp.ClientSession.request). Please ensure that only the *supported keyword arguments* are used. Using unsupported or irrelevant keyword arguments may lead to unexpected behavior or errors.\n\n### Examples\n\nHere is an example that demonstrates how to use pypdl library to download a file using headers, proxy and timeout:\n\n```py\nimport aiohttp\nfrom pypdl import Pypdl\n\ndef main():\n    # Using headers \n    headers = {\"User-Agent\":\"Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:47.0) Gecko/20100101 Firefox/47.0\", \"range\":\"bytes=-10485760\"}\n    # Using proxy\n    proxy = \"http://user:pass@some.proxy.com\"\n    # Using timeout\n    timeout = aiohttp.ClientTimeout(sock_read=20)\n\n    # create a new pypdl object\n    dl = Pypdl()\n\n    # start the download\n    dl.start(\n        url='https://speed.hetzner.de/100MB.bin',\n        file_path='100MB.bin',\n        segments=10,\n        display=True,\n        multisegment=True,\n        block=True,\n        retries=3,\n        etag_validation=True,\n        headers=headers, \n        proxy=proxy, \n        timeout=timeout\n    )\n\nif __name__ == '__main__':\n    main()\n```\n\nThis example downloads a file from the internet using 10 segments and displays the download progress. If the download fails, it will retry up to 3 times. We are also using headers to set the User-Agent and Range to download the last 10MB of the file, as well as a proxy and timeout. For more information on these parameters, refer to the [API reference](https://github.com/mjishnu/pypdl?tab=readme-ov-file#pypdl-1).\n\nAnother example of implementing pause resume functionality, printing the progress to console and changing log level to debug:\n\n```py\nfrom pypdl import Pypdl\n\n# create a pypdl object\ndl = Pypdl()\n\n# changing log level to debug\ndl.logger.setLevel('DEBUG')\n\n# start the download process\n# block=False so we can print the progress\n# display=False so we can print the progress ourselves\nfuture = dl.start('https://example.com/file.zip', segments=8,block=False,display=False)\n\n# print the progress\nwhile dl.progress != 70:\n  print(dl.progress)\n\n# stop the download process\ndl.stop() \n\n#do something\n#...\n\n# resume the download process\nfuture = dl.start('https://example.com/file.zip', segments=8,block=False,display=False)\n\n# print rest of the progress\nwhile not dl.completed:\n  print(dl.progress)\n\n# get the result, calling result() on future is essential when block=False so everything is properly cleaned up\nresult = future.result()\n\n```\n\nThis example we start the download process and print the progress to console. We then stop the download process and do something else. After that we resume the download process and print the rest of the progress to console. This can be used to create a pause/resume functionality.\n\nAnother example of using hash validation with dynamic url:\n\n```py\nfrom pypdl import Pypdl\n\n# Generate the url dynamically\ndef dynamic_url():\n    return 'https://example.com/file.zip'\n\n# create a pypdl object\ndl = Pypdl()\n\n# if block = True --> returns a FileValidator object\nres = dl.start(dynamic_url, block=True) \n\n# validate hash\nif res.validate_hash(correct_hash,'sha256'):\n    print('Hash is valid')\nelse:\n    print('Hash is invalid')\n\n# scenario where block = False --> returns a AutoShutdownFuture object\nmirror_urls = ['https://example1.com/file2.zip', 'https://example2.com/file2.zip']\n\n# retry download with different url if current fails\nfuture = dl.start(url=\"https://example.com/file2.zip\", mirrors=mirror_urls, block=False,retries=2)\n\n# do something\n# ...\n\n# It is essential to call result() on future when block=False so everything is properly cleaned up\nres = future.result()\n# validate hash\nif dl.completed:\n  if res.validate_hash(correct_hash,'sha256'):\n      print('Hash is valid')\n  else:\n      print('Hash is invalid')\n```\nAn example of using Pypdl object to get size of the files with `allow_reuse` set to `True` and custom logger:\n\n```py\nimport logging\nimport time\nfrom pypdl import Pypdl\n\nurls = [\n    'https://example.com/file1.zip',\n    'https://example.com/file2.zip',\n    'https://example.com/file3.zip',\n    'https://example.com/file4.zip',\n    'https://example.com/file5.zip',\n]\n\n# create a custom logger\nlogger = logging.getLogger('custom')\n\nsize = []\n\n# create a pypdl object\ndl = Pypdl(allow_reuse=True, logger=logger)\n\nfor url in urls:\n    dl.start(url, block=False)\n    \n    # get the size of the file and add it to size list\n    size.append(dl.size)\n\n    # do something \n\n    while not dl.completed:\n        print(dl.progress)\n\nprint(size)\n# shutdown the downloader, this is essential when allow_reuse is enabled\ndl.shutdown()\n\n```\n\n\nAn example of downloading multiple files concurrently:\n\n```py\nfrom pypdl import pypdl\n\nproxy = \"http://user:pass@some.proxy.com\"\n\n# create a pypdl object with max_concurrent set to 2\ndl = pypdl(max_concurrent=2, allow_reuse=True)\n\n# List of tasks to be downloaded..\ntasks = [\n    {'url':'https://example.com/file1.zip', 'file_path': 'file1.zip'},\n    {'url':'https://example.com/file2.zip', 'file_path': 'file2.zip'},\n    {'url':'https://example.com/file3.zip', 'file_path': 'file3.zip'},\n    {'url':'https://example.com/file4.zip', 'file_path': 'file4.zip'},\n    {'url':'https://example.com/file5.zip', 'file_path': 'file5.zip'},\n]\n\n# start the download process with proxy\nresults = dl.start(tasks=tasks, display=True, block=False,proxy=proxy)\n\n# do something\n# ...\n\n# stop the download process\ndl.stop()\n\n# do something\n# ...\n\n# restart the download process without proxy\nresults = dl.start(tasks=tasks, display=True, block=True)\n\n# print the results\nfor url, result in results:\n    # validate hash\n    if result.validate_hash(correct_hash,'sha256'):\n        print(f'{url} - Hash is valid')\n    else:\n        print(f'{url} - Hash is invalid')\n\ntask2 = [\n    {'url':'https://example.com/file6.zip', 'file_path': 'file6.zip'},\n    {'url':'https://example.com/file7.zip', 'file_path': 'file7.zip'},\n    {'url':'https://example.com/file8.zip', 'file_path': 'file8.zip'},\n    {'url':'https://example.com/file9.zip', 'file_path': 'file9.zip'},\n    {'url':'https://example.com/file10.zip', 'file_path': 'file10.zip'},\n]\n\n# start the download process\ndl.start(tasks=task2, display=True, block=True)\n\n# shutdown the downloader, this is essential when allow_reuse is enabled\ndl.shutdown()\n```\nAnother example of using precomputed hash for parallel calculation and validation of hash using callbacks\n\n```py\nfrom pypdl import pypdl\n\n# create a pypdl object with max_concurrent set to 2\ndl = pypdl(max_concurrent=2)\n\n# List of tasks to be downloaded..\ntasks = [\n    {'url':'https://example.com/file1.zip', 'file_path': 'file1.zip'},\n    {'url':'https://example.com/file2.zip', 'file_path': 'file2.zip'},\n    {'url':'https://example.com/file3.zip', 'file_path': 'file3.zip'},\n    {'url':'https://example.com/file4.zip', 'file_path': 'file4.zip'},\n    {'url':'https://example.com/file5.zip', 'file_path': 'file5.zip'},\n]\n\n# Callback requires 2 positional arguments\n# status: bool indicating download success\n# result: FileValidator object if successful, None if failed\ndef callback_func(status, result):\n    if status == True:\n        result.validate_hash(correct_hash=correct_hash, algorithm='sha256')\n        # do something \n\n# hash_algorithms can be a list for multiple hashes: ['sha256', 'md5']\n# Hashes are computed during download and cached into FileValidator\ndl.start(tasks=tasks, hash_algorithms='sha256', callback=callback_func)\n```\n## API Reference\n\n### `Pypdl()`\n\nThe `Pypdl` class represents a file downloader that can download a file from a given URL to a specified file path. The class supports both single-segmented and multi-segmented downloads and many other features like retry download in case of failure and option to continue downloading using a different url if necessary, pause/resume functionality, progress tracking etc.\n\n#### Arguments\n- `allow_reuse`: (bool, Optional) Whether to allow reuse of existing `Pypdl` object for next download. The default value is `False`. It's essential to use `shutdown()` method when `allow_reuse` is enabled to ensure efficient resource management.\n\n- `logger`: (logging.Logger, Optional) A logger object to log messages. The default value is custom `Logger` with the name *Pypdl*.\n\n- `max_concurrent`: (int, Optional) The maximum number of concurrent downloads. The default value is 1.\n\n#### Attributes\n\n- `size`: The total size of the file to be downloaded, in bytes.\n- `current_size`: The amount of data downloaded so far, in bytes.\n- `remaining_size`: The amount of data remaining to be downloaded, in bytes.\n- `progress`: The download progress percentage.\n- `speed`: The download speed, in MB/s.\n- `time_spent`: The time spent downloading, in seconds.\n- `eta`: The estimated time remaining for download completion, in seconds.\n- `total_tasks`: The total number of tasks to be downloaded.\n- `completed_tasks`: The number of tasks that have been completed.\n- `task_progress`: The progress of all tasks.\n- `completed`: A flag that indicates if the download is complete.\n- `success`: A list of tasks that were successfully downloaded.\n- `failed`: A list of tasks that failed to download.\n- `logger`: A property that returns the logger if available.\n- `is_idle`: A flag that indicates if the downloader is idle.\n\n#### Methods\n\n- `start(url = None,\n    file_path = None,\n    tasks = None,\n    multisegment = True,\n    segments = 5,\n    retries = 0,\n    mirrors = None,\n    overwrite = True,\n    speed_limit = 0,\n    etag_validation = True,\n    hash_algorithms = None,\n    callback = None,\n    block = True,\n    display = True,\n)`: Starts the download process.\n\n    ##### Parameters\n\n    - `url`: This can either be the URL of the file to download or a function that returns the URL.\n    - `file_path`: An optional path to save the downloaded file. By default, it uses the present working directory. If `file_path` is a directory, then the file is downloaded into it; otherwise, the file is downloaded into the given path.\n    - `tasks`: A list of tasks to be downloaded. Each task is a dictionary with the following keys:\n        - `url` (required): The URL of the file to download.\n        - Optional keys (The default value is set by the `Pypdl` start method):\n            - `file_path`: path to save the downloaded file.\n            - `multisegment`: Whether to use multi-segmented download. \n            - `segments`: The number of segments the file should be divided into for multi-segmented download.\n            - `retries`: The number of times to retry the download in case of an error.\n            - `mirrors`: The mirror URLs to be used if the primary URL fails.\n            - `overwrite`: Whether to overwrite the file if it already exists. \n            - `speed_limit`: The maximum download speed in MB/s.\n            - `etag_validation`: Whether to validate the ETag before resuming downloads.\n            - `hash_algorithms`: The hash algorithms to be used for precomputation of hash values.\n            - `callback`: A callback function to be called when the download is complete.\n        - Additional supported keyword arguments of `Pypdl` start method.\n        \n    - `multisegment`: Whether to use multi-segmented download. The default value is `True`.\n    - `segments`: The number of segments the file should be divided into for multi-segmented download. The default value is 5.\n    - `retries`: The number of times to retry the download in case of an error. The default value is 0.\n    - `mirrors`: The mirror URLs to be used if the primary URL fails. The default value is `None`. It can be a callable (functions, coroutines), string or List of callables, strings or both.\n    - `overwrite`: Whether to overwrite the file if it already exists. The default value is `True`.\n    - `speed_limit`: The maximum download speed in MB/s. The default value is 0.\n    - `etag_validation`: Whether to validate the ETag before resuming downloads. The default value is `True`.\n    - `hash_algorithms`: The hash algorithms to be used for precomputation of hash values. It can be a string or a list of strings. The default value is `None`.\n    - `callback`: A callback function to be called when the download is complete. The default value is `None`. The function must accept 2 positional parameters: `status` (bool) indicating if the download was successful, and `result` (FileValidator object if successful, None if failed).\n    - `block`: Whether to block until the download is complete. The default value is `True`.\n    - `display`: Whether to display download progress and other optional messages. The default value is `True`.\n\n    - Supported Keyword Arguments:\n        - `params`: Parameters to be sent in the query string of the new request. The default value is `None`.\n        - `data`: The data to send in the body of the request. The default value is `None`.\n        - `json`: A JSON-compatible Python object to send in the body of the request. The default value is `None`.\n        - `cookies`: HTTP Cookies to send with the request. The default value is `None`.\n        - `headers`: HTTP headers to be sent with the request. The default value is `None`. *Please note that [multi-range headers](https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Range#requesting_multiple_ranges) are not supported*.\n        - `auth`: An object that represents HTTP Basic Authorization. The default value is `None`.\n        - `allow_redirects`: If set to False, do not follow redirects. The default value is `True`.\n        - `max_redirects`: Maximum number of redirects to follow. The default value is `10`.\n        - `proxy`: Proxy URL. The default value is `None`.\n        - `proxy_auth`: An object that represents proxy HTTP Basic Authorization. The default value is `None`.\n        - `timeout`: (default `aiohttp.ClientTimeout(sock_read=60)`): Override the session\u2019s timeout. The default value is `aiohttp.ClientTimeout(sock_read=60)`.\n        - `ssl`: SSL validation mode. The default value is `True`.\n        - `proxy_headers`: HTTP headers to send to the proxy if the `proxy` parameter has been provided. The default value is `None`.\n\n        For detailed information on each parameter, refer the [aiohttp documentation](https://docs.aiohttp.org/en/stable/client_reference.html#aiohttp.ClientSession.request). Please ensure that only the *supported keyword arguments* are used. Using unsupported or irrelevant keyword arguments may lead to unexpected behavior or errors.\n\n    ##### Returns\n    \n    - `AutoShutdownFuture`: If `block` and `allow_reuse` is  set to `False`.\n    - `EFuture`: If `block` is `False` and `allow_reuse` is `True`.\n    - `List`: If `block` is `True` and the download is successful. Returns a list of tuples, where each tuple contains the URL and a `FileValidator` object for that URL.\n    - `None`: If `block` is `True` and the download fails.\n\n- `stop()`: Stops the download process.\n- `shutdown()`: Shuts down the downloader.\n- `set_allow_reuse(allow_reuse)`: Sets whether to allow reuse of existing `Pypdl` object for next download.\n- `set_logger(logger)`: Sets the logger object to be used for logging messages.\n- `set_max_concurrent(max_concurrent)`: Sets the maximum number of concurrent downloads.\n\n### Helper Classes\n\n#### `FileValidator()`\n\nThe `FileValidator` class is used to validate the integrity of the downloaded file.\n\n##### Attributes\n- `path`: The path of the file\n\n##### Methods\n\n- `get_hash(algorithm)`: Fetches/calculates the hash of the file using the specified algorithm. Returns the hash as a string.\n\n- `validate_hash(correct_hash, algorithm)`: Validates the hash of the file against the correct hash. Returns `True` if the hashes match, `False` otherwise.\n\n#### `AutoShutdownFuture()`\n\nThe `AutoShutdownFuture` class is a wrapper for concurrent.futures.Future object that shuts down the eventloop and executor when the result is retrieved.\n\n##### Methods\n\n- `result(timeout=None)`: Retrieves the result of the Future object and shuts down the executor. If the download was successful, it returns a `FileValidator` object; otherwise, it returns `None`.\n\n#### `EFuture()`\n\nThe `EFuture` class is a wrapper for a `concurrent.futures.Future` object that integrates with an event loop to handle asynchronous operations.\n\n##### Methods\n\n- `result(timeout=None)`: Retrieves the result of the `Future` object. If the `Future` completes successfully, it returns the result; otherwise, it raises an exception.\n\n## License\n\npypdl is licensed under the MIT License. See the [LICENSE](https://github.com/mjishnu/pypdl/blob/main/LICENSE) file for more details.\n\n## Contribution\n\nContributions to pypdl are always welcome. If you want to contribute to this project, please fork the repository and submit a pull request.\n\n## Contact\n\nIf you have any questions, issues, or feedback about pypdl, please open an issue on the [GitHub repository](https://github.com/mjishnu/pypdl).\n",
    "bugtrack_url": null,
    "license": "MIT",
    "summary": "A concurrent pure python download manager",
    "version": "1.5.5",
    "project_urls": {
        "Homepage": "https://github.com/mjishnu/pypdl"
    },
    "split_keywords": [
        "python",
        " downloader",
        " multi-threaded-downloader",
        " concurrent-downloader",
        " parallel-downloader",
        " async-downloader",
        " asyncronous-downloader",
        " download-manager",
        " fast-downloader",
        " download-accelerator",
        " download-optimizer",
        " download-utility",
        " download-tool",
        " download-automation"
    ],
    "urls": [
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "ec9f0c8b811eff0b9d7c7d19d3440753c57a51ec62f81bc211711bb9a5605481",
                "md5": "850c337783525a9b2168a624a2565f21",
                "sha256": "354682072d14c60b152534dd822a8e1e63672bfb640bc9e1446c7c7dbd09a549"
            },
            "downloads": -1,
            "filename": "pypdl-1.5.5-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "850c337783525a9b2168a624a2565f21",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.8",
            "size": 22533,
            "upload_time": "2025-07-12T09:59:22",
            "upload_time_iso_8601": "2025-07-12T09:59:22.213767Z",
            "url": "https://files.pythonhosted.org/packages/ec/9f/0c8b811eff0b9d7c7d19d3440753c57a51ec62f81bc211711bb9a5605481/pypdl-1.5.5-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "53006b83b810124425712cc816cc7ec86c60ef534c039049a133a5429a5b606b",
                "md5": "e39d913d2a112ce32d08dc1a842f3d59",
                "sha256": "239b7c2b86b9ff12c7a6293d41249dbbbaf8838c07227d9ddf3dc746ba534517"
            },
            "downloads": -1,
            "filename": "pypdl-1.5.5.tar.gz",
            "has_sig": false,
            "md5_digest": "e39d913d2a112ce32d08dc1a842f3d59",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.8",
            "size": 25855,
            "upload_time": "2025-07-12T09:59:23",
            "upload_time_iso_8601": "2025-07-12T09:59:23.800140Z",
            "url": "https://files.pythonhosted.org/packages/53/00/6b83b810124425712cc816cc7ec86c60ef534c039049a133a5429a5b606b/pypdl-1.5.5.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2025-07-12 09:59:23",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "mjishnu",
    "github_project": "pypdl",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": true,
    "requirements": [
        {
            "name": "aiohttp",
            "specs": []
        },
        {
            "name": "aiofiles",
            "specs": []
        }
    ],
    "lcname": "pypdl"
}

mjishnu