# fastdownload
> Easily download, verify, and extract archives
If you have datasets or other archives that you want to make available to your users, and ensure they always have the latest versions and that they are downloaded correctly, `fastdownload` can help.
## Install
Using pip:
```bash
pip install fastdownload
```
...or using conda:
```bash
conda install -c fastai fastdownload
```
## What's this about?
You might want to use `fastdownload` when you have one or more URLs pointing at archives you want to make available, and you want to ensure that your users download those archives correctly, always have the latest version, and can access the information in them as easily as possible.
Your users just call a single method, `FastDownload.get`, passing the required URL. The URL is downloaded and extracted to the directories you choose, and the path to the extracted contents is returned. If that URL has already been downloaded, the cached archive or contents are used automatically. However, if the size or hash of the archive differs from what it should be, the user is informed and a new version is downloaded.
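Conceptually, the verification step compares the cached file's size and a content hash against recorded values. The sketch below illustrates that idea only; it assumes SHA-256 as the hash (fastdownload's own choice of algorithm may differ), and `file_hash` and `matches_checks` are hypothetical helpers, not part of the library's API.

```python
import hashlib
from pathlib import Path

def file_hash(path, chunk_size=1 << 20):
    "Hash a file in chunks so a large archive is never loaded into memory at once."
    h = hashlib.sha256()  # assumption: SHA-256; the real library may use another algorithm
    with open(path, 'rb') as f:
        for chunk in iter(lambda: f.read(chunk_size), b''):
            h.update(chunk)
    return h.hexdigest()

def matches_checks(path, expected_size, expected_hash):
    "Return True if `path` exists and has the recorded size and hash (hypothetical helper)."
    path = Path(path)
    if not path.exists():
        return False
    # The size check is cheap, so do it first and only hash when the size matches
    if path.stat().st_size != expected_size:
        return False
    return file_hash(path) == expected_hash
```

A `False` result corresponds to the case described above where the user is informed and a fresh copy is downloaded.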
In the future, you may want to update one or more of your archives. When you do so, `fastdownload` will ensure your users have the latest version, by checking their downloaded archives against your updated file size and hash information.
For instance, `fastai` uses `fastdownload` to provide access to datasets for deep learning. `fastai` users can download and extract them with a single command, using the return value to access the files. The files are automatically placed in appropriate subdirectories of a `.fastai` folder in the user's home directory. If a dataset is updated, users are informed the next time they use it, and the latest version is automatically downloaded and extracted for them.
## Usage: downloading files
When your users download an archive, `fastdownload` will automatically save it to a directory, check if the size and hash matches, and extract the contents. Minimal usage for downloading and extracting is:
```python
from fastdownload import FastDownload
d = FastDownload()
path = d.get('https://...')
```
After this, `path` will contain the path where the extracted files are located. By default, archives are saved to `{base}/archive`, and extracted to `{base}/data`. `{base}` defaults to `~/.fastdownload`. If there is more than one file or folder in the root of the downloaded archive, then a new folder is created in `data` for the contents.
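The rule for where extracted contents land can be sketched as a pure function over the archive's member names. This is an illustration, not fastdownload's code: `extract_dest` is a hypothetical helper, and naming the new folder after the archive's file name (everything before its first dot) is an assumption.

```python
from pathlib import Path, PurePosixPath

def extract_dest(member_names, data_dir, archive_name):
    """Return the folder that ends up holding the archive's contents (hypothetical helper).

    If every member sits under one top-level file or folder, that entry is extracted
    straight into `data_dir`; otherwise a new folder is created in `data_dir`.
    """
    # First path component of each member; PurePosixPath collapses leading './'
    roots = {PurePosixPath(n).parts[0] for n in member_names if n.strip('/')}
    if len(roots) == 1:
        return Path(data_dir) / roots.pop()
    # Assumption: the new folder is named after the archive, e.g. 'pets.tar.gz' -> 'pets'
    return Path(data_dir) / Path(archive_name).name.split('.')[0]
```

For example, an archive whose members are `train/...` and `labels.csv` has two roots, so its contents would go into a new folder rather than directly into `data`.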
Instead of `get`, use `download` to download the URL without extracting it, or `extract` to extract an archive that has already been downloaded to the `archive` directory, without downloading it again. All of these methods accept a `force` parameter, which downloads and/or extracts the archive even if it's already present.
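Read together, these methods suggest that `get` amounts to `download` followed by `extract`, each skipping work that is already done unless `force` is set. The class below is a hypothetical stand-in that sketches only that control flow: `MiniCache`, its injected `fetch` and `unpack` callables, and its folder naming are assumptions, and it omits the size/hash check the real library performs.

```python
from pathlib import Path

class MiniCache:
    "Hypothetical stand-in illustrating download/extract/get with a `force` flag."

    def __init__(self, archive_dir, data_dir, fetch, unpack):
        self.archive_dir, self.data_dir = Path(archive_dir), Path(data_dir)
        # Injected callables: fetch(url, dest_file) and unpack(src_file, dest_dir)
        self.fetch, self.unpack = fetch, unpack

    def _name(self, url):
        return url.rstrip('/').split('/')[-1]

    def download(self, url, force=False):
        dest = self.archive_dir / self._name(url)
        if force or not dest.exists():  # reuse the cached archive unless forced
            dest.parent.mkdir(parents=True, exist_ok=True)
            self.fetch(url, dest)
        return dest

    def extract(self, url, force=False):
        src = self.archive_dir / self._name(url)  # assumes download() already ran
        dest = self.data_dir / src.name.split('.')[0]
        if force or not dest.exists():  # reuse extracted contents unless forced
            self.unpack(src, dest)
        return dest

    def get(self, url, force=False):
        self.download(url, force=force)
        return self.extract(url, force=force)
```

Injecting `fetch` and `unpack` keeps the caching logic testable without a network connection or real archives.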
You can change any or all of the `base`, `archive`, and `data` paths by passing them to `FastDownload`:
```python
d = FastDownload(base='~/.mypath', archive='downloaded', data='extracted')
```
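As a sketch of how these three settings combine, assume `archive` and `data` are resolved relative to `base`, consistent with the defaults `{base}/archive` and `{base}/data` described earlier. `resolve_dirs` is a hypothetical helper, not part of fastdownload.

```python
from pathlib import Path

def resolve_dirs(base='~/.fastdownload', archive='archive', data='data'):
    "Return (archive_dir, data_dir) under an expanded base directory (hypothetical helper)."
    base = Path(base).expanduser()  # turn a leading '~' into the user's home directory
    # Joining with '/' keeps relative names under base; an absolute name would replace it
    return base / archive, base / data
```

With the example above, archives would land in `~/.mypath/downloaded` and extracted files in `~/.mypath/extracted`.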
You can remove the cached archive file and/or the extracted contents with `rm`:
```python
d.rm('https://...')
```
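Removing a cached entry means deleting the archive file, the extracted contents, or both. A minimal sketch of that choice, where `remove_cached` and its `rm_archive`/`rm_data` flags are hypothetical names rather than the library's own parameters:

```python
import shutil
from pathlib import Path

def remove_cached(archive_file, extracted, rm_archive=True, rm_data=True):
    "Delete a cached archive and/or its extracted contents (hypothetical helper)."
    archive_file, extracted = Path(archive_file), Path(extracted)
    if rm_archive and archive_file.exists():
        archive_file.unlink()
    if rm_data and extracted.exists():
        # Extracted contents may be a single file or a whole directory tree
        if extracted.is_dir():
            shutil.rmtree(extracted)
        else:
            extracted.unlink()
```

Keeping the archive while deleting the extracted folder lets a later extraction run again without a new download.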
## Usage: making archives available to download
`fastdownload` stores the file sizes and hashes for your archives in a file called `download_checks.py`, which it adds to your Python package. The file is located in the same directory as a module you choose, e.g.:
```python
import fastai.some_module  # placeholder: any module in your own package
d = FastDownload(module=fastai.some_module)
```
Then use `update` to create or update the size and hash for a URL:
```python
d.update('https://...')
```
You will now find a file called `download_checks.py` in the same directory as `fastai.some_module`, containing a Python dict with the URL, size, and hash for this file. If you've previously downloaded this file to your `archive` path, that copy is used instead of downloading a new one. To download a fresh copy even if you already have one in your archive, call `get(url, force=True)` first.
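The checks file can be pictured as a dict literal mapping each URL to its size and hash, stored next to the chosen module. The helpers below are a hypothetical sketch: the `{url: (size, hash)}` layout and the names `checks_path`, `save_checks`, `load_checks`, and `update_checks` are assumptions, not fastdownload's documented format or API.

```python
import ast
from pathlib import Path
from pprint import pformat

def checks_path(module):
    "Where download_checks.py goes: the directory containing `module`'s source file."
    return Path(module.__file__).parent / 'download_checks.py'

def save_checks(path, checks):
    "Write {url: (size, hash)} as a Python dict literal (assumed layout)."
    Path(path).write_text(pformat(checks) + '\n')

def load_checks(path):
    "Read the dict back with literal_eval, which never executes arbitrary code."
    path = Path(path)
    return ast.literal_eval(path.read_text()) if path.exists() else {}

def update_checks(path, url, size, file_hash):
    "Add or replace the entry for one URL while keeping all the others."
    checks = load_checks(path)
    checks[url] = (size, file_hash)
    save_checks(path, checks)
```

Because the file holds only literals, it can ship inside your package and be read safely by every user.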
## Config file
If there is a file called `config.ini` in your `base` directory, then keys `archive` and `data` will be used as the default values for `FastDownload`. The file should be in [configparser](https://docs.python.org/3/library/configparser.html) format. Here's a sample `config.ini`:
```
[DEFAULT]
archive = downloaded
data = extracted
```
If there is no ini file present, one will be automatically created for you using the details you pass to `FastDownload`.
You can add any additional key/value pairs you want to the config file. When you call `FastDownload.get`, pass `extract_key` to choose the extraction location from a key other than `data`.
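Using the standard-library `configparser`, the read-or-create behaviour and the `extract_key` lookup might look like the sketch below. `load_config` and `extract_dir` are hypothetical helpers, and fastdownload's own config handling may differ in detail.

```python
import configparser
from pathlib import Path

def load_config(base, defaults):
    """Read {base}/config.ini, creating it from `defaults` if it is missing (sketch).

    `defaults` maps keys such as 'archive' and 'data' to folder names.
    """
    base = Path(base).expanduser()
    cfg_file = base / 'config.ini'
    cfg = configparser.ConfigParser()
    if cfg_file.exists():
        cfg.read(cfg_file)  # an existing file wins over the passed-in defaults
    else:
        base.mkdir(parents=True, exist_ok=True)
        cfg['DEFAULT'] = defaults
        with open(cfg_file, 'w') as f:
            cfg.write(f)
    return cfg['DEFAULT']

def extract_dir(base, cfg, extract_key='data'):
    "Resolve the folder named by `extract_key` ('data' or any custom key) under `base`."
    return Path(base).expanduser() / cfg[extract_key]
```

Adding a line such as `models = models` to the `[DEFAULT]` section would then let `extract_key='models'` select `{base}/models`.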
Raw data
{
"_id": null,
"home_page": "https://github.com/fastai/fastdownload/tree/master/",
"name": "fastdownload",
"maintainer": "",
"docs_url": null,
"requires_python": ">=3.6",
"maintainer_email": "",
"keywords": "fastdownload,dataset,fastai,download",
"author": "Jeremy Howard",
"author_email": "info@fast.ai",
"download_url": "https://files.pythonhosted.org/packages/08/be/d2c2e8dc81aa88316ed27f1bd707440a83a7420c35e67c0b143fe81aeca9/fastdownload-0.0.7.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": "Apache Software License 2.0",
"summary": "A general purpose data downloading library.",
"version": "0.0.7",
"project_urls": {
"Homepage": "https://github.com/fastai/fastdownload/tree/master/"
},
"split_keywords": [
"fastdownload",
"dataset",
"fastai",
"download"
],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "4760ed35253a05a70b63e4f52df1daa39a6a464a3e22b0bd060b77f63e2e2b6a",
"md5": "ea2955724c778270aecddc1b2be4906a",
"sha256": "b791fa3406a2da003ba64615f03c60e2ea041c3c555796450b9a9a601bc0bbac"
},
"downloads": -1,
"filename": "fastdownload-0.0.7-py3-none-any.whl",
"has_sig": false,
"md5_digest": "ea2955724c778270aecddc1b2be4906a",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.6",
"size": 12803,
"upload_time": "2022-07-07T18:35:50",
"upload_time_iso_8601": "2022-07-07T18:35:50.912946Z",
"url": "https://files.pythonhosted.org/packages/47/60/ed35253a05a70b63e4f52df1daa39a6a464a3e22b0bd060b77f63e2e2b6a/fastdownload-0.0.7-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "08bed2c2e8dc81aa88316ed27f1bd707440a83a7420c35e67c0b143fe81aeca9",
"md5": "d0cc36f8c781774053e8b435a3628b69",
"sha256": "20507edb8e89406a1fbd7775e6e2a3d81a4dd633dd506b0e9cf0e1613e831d6a"
},
"downloads": -1,
"filename": "fastdownload-0.0.7.tar.gz",
"has_sig": false,
"md5_digest": "d0cc36f8c781774053e8b435a3628b69",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.6",
"size": 16096,
"upload_time": "2022-07-07T18:35:53",
"upload_time_iso_8601": "2022-07-07T18:35:53.336457Z",
"url": "https://files.pythonhosted.org/packages/08/be/d2c2e8dc81aa88316ed27f1bd707440a83a7420c35e67c0b143fe81aeca9/fastdownload-0.0.7.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2022-07-07 18:35:53",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "fastai",
"github_project": "fastdownload",
"travis_ci": false,
"coveralls": false,
"github_actions": true,
"lcname": "fastdownload"
}