# Pymatris 📂
[![Pyversions](https://img.shields.io/pypi/pyversions/pymatris.svg?style=flat-square)](https://pypi.python.org/pypi/pymatris)
Parallel file downloader for HTTP/HTTPS, FTP and SFTP protocols, built using Python.
## Installation
```bash
pip install pymatris
```
## Usage
* Initialize Downloader
```python
from pymatris import Downloader
dl = Downloader()
```
* Enqueue file to download
```python
dl.enqueue_file("https://storage.data.gov.my/pricecatcher/pricecatcher_2022-01.parquet", path="./")
```
* Start downloading files and keep the returned results
```python
results = dl.download()
```
* View results
```python
print(results)
```
Under the hood, `pymatris.Downloader` uses a global queue to manage download tasks. `pymatris.Downloader.enqueue_file()` adds a URL to the download queue, and `pymatris.Downloader.download()` downloads the queued files in parallel. Pymatris runs the downloads concurrently with asyncio and performs asynchronous file I/O with aiofiles, so one slow transfer does not hold up the others.
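The enqueue-then-download pattern can be sketched with plain asyncio. This is a minimal illustration and not Pymatris's actual implementation: `fake_download`, `worker`, and `download_all` are hypothetical names, and the transfer is simulated with a short sleep.

```python
import asyncio


async def fake_download(url: str) -> str:
    """Hypothetical stand-in for a real transfer: wait briefly, return a file name."""
    await asyncio.sleep(0.01)
    return url.rsplit("/", 1)[-1]


async def worker(queue: asyncio.Queue, done: list) -> None:
    """Pull URLs off the shared queue until it is empty."""
    while True:
        try:
            url = queue.get_nowait()
        except asyncio.QueueEmpty:
            return
        done.append(await fake_download(url))


async def download_all(urls: list, max_parallel: int = 5) -> list:
    queue: asyncio.Queue = asyncio.Queue()
    for url in urls:
        queue.put_nowait(url)  # enqueue first, download later
    done: list = []
    # max_parallel workers drain the queue concurrently
    await asyncio.gather(*(worker(queue, done) for _ in range(max_parallel)))
    return done


urls = [f"https://example.com/file_{i}.bin" for i in range(8)]
files = asyncio.run(download_all(urls, max_parallel=3))
```

Because the workers share one queue, at most `max_parallel` transfers are in flight at any time, and completion order may differ from enqueue order.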
### Results and Error Handling
`pymatris.Downloader.download()` returns a `Results` object, which is a list of the filenames that were downloaded. The `Results` object also has two attributes, `success` and `errors`.
`success` is a list of named tuples, each with `.path` (the file path) and `.url` (the URL).
`errors` is a list of named tuples, each with `.filepath_partial` (the intended file path), `.url` (the URL), and `.exception` (the `Exception` or `aiohttp.ClientResponse` that occurred during the download).
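As a rough sketch of how these attributes might be consumed, the helper below builds one report line per entry. `summarize` is a hypothetical helper, and the `Success`, `Error`, and `demo` objects are stand-ins that only mimic the attribute names described above so the snippet runs without a network; in real use you would pass the object returned by `download()`.

```python
from collections import namedtuple
from types import SimpleNamespace


def summarize(results):
    """Return one human-readable line per finished or failed download."""
    lines = []
    for item in results.success:
        lines.append(f"OK   {item.url} -> {item.path}")
    for err in results.errors:
        lines.append(f"FAIL {err.url}: {err.exception!r}")
    return lines


# Hypothetical stand-ins mirroring the documented attribute names.
Success = namedtuple("Success", ["path", "url"])
Error = namedtuple("Error", ["filepath_partial", "url", "exception"])
demo = SimpleNamespace(
    success=[Success("./a.parquet", "https://example.com/a.parquet")],
    errors=[Error("./b.txt", "ftp://example.com/b.txt", ConnectionRefusedError("refused"))],
)

report = summarize(demo)
for line in report:
    print(line)
```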
### Example Usage
From [main.py](https://github.com/zhuolisam/pymatris/blob/main/main.py):
```python
from pymatris import Downloader

urls = [
    "https://storage.data.gov.my/pricecatcher/pricecatcher_2022-01.parquet",
    "ftp://bob:bob@192.168.1.6:20/tesfile.txt",
    "https://storage.data.gov.my/pricecatcher/pricecatcher_2022-01.csv",
]

dm = Downloader()
for url in urls:
    dm.enqueue_file(url, path="./")

results = dm.download()
print(results)
# Output:
# Success:
# pricecatcher_2022-01.csv https://storage.data.gov.my/pricecatcher/pricecatcher_2022-01.csv
# pricecatcher_2022-01.parquet https://storage.data.gov.my/pricecatcher/pricecatcher_2022-01.parquet
# Errors:
# (ftp://bob:bob@192.168.1.6:20/tesfile.txt,
# ConnectionRefusedError(61, "Connect call failed ('192.168.1.6', 20)"))
```
### Advanced Usage
Visit [main.py](https://github.com/zhuolisam/pymatris/blob/main/main.py) for advanced usage.
### CLI
Pymatris also provides a command line interface.
In your terminal, pass one or more URLs as arguments to download them in parallel.
```bash
# Pass a single URL as an argument
pymatris https://storage.data.gov.my/pricecatcher/pricecatcher_2022-01.parquet
# Or pass multiple URLs
pymatris https://storage.data.gov.my/pricecatcher/pricecatcher_2022-01.parquet https://storage.data.gov.my/pricecatcher/pricecatcher_2022-02.parquet https://storage.data.gov.my/pricecatcher/pricecatcher_2022-03.parquet
```
```bash
$ pymatris --help
usage: pymatris [-h] [--max-parallel MAX_PARALLEL] [--max-splits MAX_SPLITS]
                [--overwrite] [--quiet] [--dir DIR] [--show-errors SHOW_ERRORS]
                [--timeouts TIMEOUTS] [--max-tries MAX_TRIES]
                URLS [URLS ...]
```
#### Arguments
**To set the directory where files are saved, use the `--dir` option. By default, files are saved in the current directory.**
```bash
pymatris --dir "./" <urls>
```
**To overwrite existing files, use the `--overwrite` option. By default, existing files are not overwritten.**
```bash
pymatris --overwrite <urls>
```
_Assuming you have a "pricecatcher_2022-01.parquet" file in your current directory, running the above command overwrites it.
While downloading, Pymatris writes to a temporary file. If the download is interrupted, your existing file is left untouched and the temporary file is deleted._
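The general idea of writing to a temporary file and swapping it in at the end can be sketched as follows. This is a generic illustration of the technique, not Pymatris's own code: `safe_write` and `broken` are hypothetical names, and the `.part` suffix is an assumption.

```python
import os
import tempfile


def safe_write(target: str, chunks) -> None:
    """Write chunks to a temp file, then swap it into place in one step."""
    directory = os.path.dirname(os.path.abspath(target))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".part")
    try:
        with os.fdopen(fd, "wb") as tmp:
            for chunk in chunks:
                tmp.write(chunk)
        os.replace(tmp_path, target)  # the existing file is replaced only now
    except BaseException:
        os.remove(tmp_path)  # interrupted: drop the partial file, keep the original
        raise


def broken():
    """Simulate a download that fails partway through."""
    yield b"new"
    raise ConnectionError("link dropped")


workdir = tempfile.mkdtemp()
target = os.path.join(workdir, "data.bin")
safe_write(target, [b"old"])  # first, a successful write

try:
    safe_write(target, broken())  # then an interrupted overwrite
except ConnectionError:
    pass

with open(target, "rb") as fh:
    content = fh.read()
leftovers = os.listdir(workdir)
```

After the interrupted second write, `target` still holds the first payload and no `.part` file remains in the directory.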
**To configure the number of parallel downloads, use the `--max-parallel` option. By default, 5 downloads run in parallel.**
```bash
pymatris --max-parallel 10 <urls>
```
_Pymatris uses asyncio to download files in parallel. You can increase or decrease this limit as needed._
**To configure the number of parts downloaded in parallel per file, use the `--max-splits` option. By default, each file is downloaded in 5 parallel parts.**
```bash
pymatris --max-splits 10 <urls>
```
_This is only available for the HTTP/HTTPS and SFTP protocols. FTP does not currently support multipart downloads._
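To illustrate what a multipart download involves over HTTP, the sketch below divides a file of known size into contiguous byte ranges and builds the matching `Range` request headers. `split_ranges` is a hypothetical helper; how Pymatris actually sizes its parts may differ.

```python
def split_ranges(size: int, parts: int) -> list:
    """Split `size` bytes into up to `parts` contiguous (start, end) ranges, end inclusive."""
    if size <= 0:
        return []
    parts = max(1, min(parts, size))  # never more parts than bytes
    base, extra = divmod(size, parts)
    ranges, start = [], 0
    for i in range(parts):
        length = base + (1 if i < extra else 0)  # spread the remainder over the first parts
        ranges.append((start, start + length - 1))
        start += length
    return ranges


# One HTTP Range header per part of a 1000-byte file split three ways.
headers = [{"Range": f"bytes={a}-{b}"} for a, b in split_ranges(1000, 3)]
```

Each part can then be fetched independently and written at its own offset in the output file.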
**To configure the number of retries for failed downloads, use the `--max-tries` option. By default, 5 retries are allowed.**
```bash
pymatris --max-tries 10 <urls>
```
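A bounded retry loop can be sketched as below. This is an assumption-laden illustration rather than Pymatris's retry logic: `with_retries` and `flaky` are hypothetical names, `max_tries` is treated here as the total number of attempts (at least 1), and only `OSError` and timeout failures are retried.

```python
import asyncio


async def with_retries(make_attempt, max_tries: int = 5):
    """Await make_attempt() until it succeeds or max_tries attempts have failed."""
    last_error = None
    for _attempt in range(max_tries):
        try:
            return await make_attempt()
        except (OSError, asyncio.TimeoutError) as exc:
            last_error = exc
            await asyncio.sleep(0)  # yield control; a real client might back off here
    raise last_error


calls = {"n": 0}


async def flaky():
    """Fail twice with a connection error, then succeed."""
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionResetError("dropped")
    return "done"


outcome = asyncio.run(with_retries(flaky, max_tries=5))
```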
**To hide the progress bar, use the `--quiet` option. By default, the progress bar is shown.**
```bash
pymatris --quiet <urls>
```
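If you want to drive the CLI from a script, one option is to assemble the argument list in Python. The sketch below uses only the flags documented above; `build_command` and the example URL are hypothetical.

```python
import shlex


def build_command(urls, directory=None, max_parallel=None, max_splits=None,
                  max_tries=None, overwrite=False, quiet=False):
    """Assemble a pymatris argument list from the options described above."""
    cmd = ["pymatris"]
    if directory is not None:
        cmd += ["--dir", directory]
    for flag, value in (("--max-parallel", max_parallel),
                        ("--max-splits", max_splits),
                        ("--max-tries", max_tries)):
        if value is not None:
            cmd += [flag, str(value)]
    if overwrite:
        cmd.append("--overwrite")
    if quiet:
        cmd.append("--quiet")
    return cmd + list(urls)


cmd = build_command(["https://example.com/a.parquet"],
                    directory="./data", max_parallel=10, overwrite=True)
print(shlex.join(cmd))  # shell-quoted form, handy for logging
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` once `pymatris` is installed and on your `PATH`.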
### Requirements
* Python 3.9 or above
* aiohttp
* aioftp
* asyncssh
* aiofiles
* tqdm
### TODO
- [ ] Add better concurrency support for FTP protocol.
- [ ] Better error handling and logging for FTP and SFTP protocols.
### Acknowledgements
* [aiofiles](https://github.com/Tinche/aiofiles)
* [pytest-localserver](https://github.com/pytest-dev/pytest-localserver)
* [asyncssh](https://github.com/ronf/asyncssh)
* [Pyaiodl](https://github.com/aryanvikash/Pyaiodl)
* [aiodl](https://github.com/cshuaimin/aiodl)
* [parfive](https://github.com/Cadair/parfive)