<img src="https://github.com/fatiando/pooch/raw/main/doc/_static/readme-banner.png" alt="Pooch: A friend to fetch your data files">
<p align="center">
<a href="https://www.fatiando.org/pooch"><strong>Documentation</strong> (latest)</a> •
<a href="https://www.fatiando.org/pooch/dev"><strong>Documentation</strong> (main branch)</a> •
<a href="https://github.com/fatiando/pooch/blob/main/CONTRIBUTING.md"><strong>Contributing</strong></a> •
<a href="https://www.fatiando.org/contact/"><strong>Contact</strong></a>
</p>
<p align="center">
Part of the <a href="https://www.fatiando.org"><strong>Fatiando a Terra</strong></a> project
</p>
<p align="center">
<a href="https://pypi.python.org/pypi/pooch"><img src="https://img.shields.io/pypi/v/pooch.svg?style=flat-square" alt="Latest version on PyPI"></a>
<a href="https://github.com/conda-forge/pooch-feedstock"><img src="https://img.shields.io/conda/vn/conda-forge/pooch.svg?style=flat-square" alt="Latest version on conda-forge"></a>
<a href="https://codecov.io/gh/fatiando/pooch"><img src="https://img.shields.io/codecov/c/github/fatiando/pooch/main.svg?style=flat-square" alt="Test coverage status"></a>
<a href="https://pypi.python.org/pypi/pooch"><img src="https://img.shields.io/pypi/pyversions/pooch.svg?style=flat-square" alt="Compatible Python versions."></a>
<a href="https://doi.org/10.21105/joss.01943"><img src="https://img.shields.io/badge/doi-10.21105%2Fjoss.01943-blue?style=flat-square" alt="DOI used to cite Pooch"></a>
</p>
## About
*Does your Python package include sample datasets?
Are you shipping them with the code?
Are they getting too big?*
**Pooch** is here to help! It will manage a data *registry* by downloading your
data files from a server only when needed and storing them locally in a data
*cache* (a folder on your computer).
Here are Pooch's main features:
* Pure Python and minimal dependencies.
* Download a file only if necessary (it's not in the data cache or needs to be
updated).
* Verify download integrity through SHA256 hashes (also used to check if a file
needs to be updated).
* Designed to be extended: plug in custom download (FTP, scp, etc.) and
post-processing (unzip, decompress, rename) functions.
* Includes utilities to unzip/decompress the data upon download to save loading
time.
* Can handle basic HTTP authentication (for servers that require a login) and
printing download progress bars.
* Easily set up an environment variable to overwrite the data cache location.
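As a rough illustration of the last feature, the sketch below shows one way an environment variable can take precedence over a default cache folder. Pooch exposes this through the `env` argument of `pooch.create`; the helper name `resolve_cache_dir` is hypothetical and this is not Pooch's actual implementation.

```python
import os
from pathlib import Path


def resolve_cache_dir(default, env=None):
    """
    Pick the data cache folder (hypothetical helper, not part of Pooch).

    If *env* names an environment variable that is set and non-empty,
    its value wins. Otherwise fall back to *default*.
    """
    if env and os.environ.get(env):
        return Path(os.environ[env]).expanduser()
    return Path(default).expanduser()


# The variable name matches the one used in the package example in this README.
cache_dir = resolve_cache_dir("~/.cache/mypackage", env="MYPACKAGE_DATA_DIR")
```

Users can then point the cache somewhere else (a scratch disk, for example) without touching the code.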
*Are you a scientist or researcher? Pooch can help you too!*
* Automatically download your data files so you don't have to keep them in your
GitHub repository.
* Make sure everyone running the code has the same version of the data files
(enforced through the SHA256 hashes).
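To make the hash check concrete, here is a minimal sketch of the idea using only the standard library. The function names `sha256_of` and `needs_download` are made up for this illustration; Pooch ships its own `pooch.file_hash` utility, and its internal logic may differ from what is shown here.

```python
import hashlib


def sha256_of(path, chunk_size=65536):
    """Compute the SHA256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def needs_download(path, known_hash):
    """
    Return True if the file is missing or its hash differs from *known_hash*.

    Hypothetical helper: a mismatch can mean the local copy is corrupted
    or that the expected file has been updated.
    """
    try:
        return sha256_of(path) != known_hash.lower()
    except FileNotFoundError:
        return True
```

Because the expected hash lives in the code, everyone who runs it either gets the same bytes or gets an error.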
## Example
For a **scientist downloading a data file** for analysis:
```python
import pooch
import pandas as pd
# Download a file and save it locally, returning the path to it.
# Running this again will not cause a download. Pooch will check the hash
# (checksum) of the downloaded file against the given value to make sure
# it's the right file (not corrupted or outdated).
fname_bathymetry = pooch.retrieve(
    url="https://github.com/fatiando-data/caribbean-bathymetry/releases/download/v1/caribbean-bathymetry.csv.xz",
    known_hash="md5:a7332aa6e69c77d49d7fb54b764caa82",
)
# Pooch can also download based on a DOI from certain providers.
fname_gravity = pooch.retrieve(
    url="doi:10.5281/zenodo.5882430/southern-africa-gravity.csv.xz",
    known_hash="md5:1dee324a14e647855366d6eb01a1ef35",
)
# Load the data with Pandas
data_bathymetry = pd.read_csv(fname_bathymetry)
data_gravity = pd.read_csv(fname_gravity)
```
For **package developers** including sample data in their projects:
```python
"""
Module mypackage/datasets.py
"""
import pkg_resources
import pandas
import pooch
# Get the version string from your project. You have one of these, right?
from . import version
# Create a new friend to manage your sample data storage
GOODBOY = pooch.create(
    # Folder where the data will be stored. For a sensible default, use the
    # default cache folder for your OS.
    path=pooch.os_cache("mypackage"),
    # Base URL of the remote data store. Will call .format on this string
    # to insert the version (see below).
    base_url="https://github.com/myproject/mypackage/raw/{version}/data/",
    # Pooches are versioned so that you can use multiple versions of a
    # package simultaneously. Use a PEP 440 compliant version number. The
    # version will be appended to the path.
    version=version,
    # If a version has a "+XX.XXXXX" suffix, we'll assume that this is a dev
    # version and replace the version with this string.
    version_dev="main",
    # An environment variable that overwrites the path.
    env="MYPACKAGE_DATA_DIR",
    # The cache file registry. A dictionary with all files managed by this
    # pooch. Keys are the file names (relative to *base_url*) and values
    # are their respective SHA256 hashes. Files will be downloaded
    # automatically when needed (see fetch_gravity_data).
    registry={"gravity-data.csv": "89y10phsdwhs09whljwc09whcowsdhcwodcydw"}
)
# You can also load the registry from a file. Each line contains a file
# name and its SHA256 hash separated by a space. This makes it easier to
# manage large numbers of data files. The registry file should be packaged
# and distributed with your software.
GOODBOY.load_registry(
    pkg_resources.resource_stream("mypackage", "registry.txt")
)
# Define functions that your users can call to get back the data in memory
def fetch_gravity_data():
    """
    Load some sample gravity data to use in your docs.
    """
    # Fetch the path to a file in the local storage. If it's not there,
    # we'll download it.
    fname = GOODBOY.fetch("gravity-data.csv")
    # Load it with numpy/pandas/etc
    data = pandas.read_csv(fname)
    return data
```
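The registry file mentioned in the comments above is plain text with one file name and one hash per line. The sketch below shows how such a file could be turned into the dictionary form passed to `registry=`. It is only an illustration: `parse_registry` is a hypothetical helper, the hashes are placeholders, and Pooch's own `load_registry` may accept more than this simple two-column layout.

```python
def parse_registry(lines):
    """
    Build a {file name: hash} dictionary from registry lines.

    Hypothetical helper for illustration only. Assumes exactly two
    whitespace-separated fields per line and skips blank lines, so
    file names containing spaces are not supported.
    """
    registry = {}
    for number, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue
        fields = line.split()
        if len(fields) != 2:
            raise ValueError(f"Invalid registry entry on line {number}: {line!r}")
        file_name, file_hash = fields
        registry[file_name] = file_hash.lower()
    return registry


# Placeholder content standing in for a real registry.txt.
example_lines = [
    "gravity-data.csv aaaa1111\n",
    "\n",
    "topography.nc BBBB2222\n",
]
registry = parse_registry(example_lines)
```

Keeping the registry in a separate file means a new data release only requires regenerating that file, not editing the Python module.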
## Projects using Pooch
* [SciPy](https://github.com/scipy/scipy)
* [scikit-image](https://github.com/scikit-image/scikit-image)
* [MetPy](https://github.com/Unidata/MetPy)
* [icepack](https://github.com/icepack/icepack)
* [histolab](https://github.com/histolab/histolab)
* [seaborn-image](https://github.com/SarthakJariwala/seaborn-image)
* [Ensaio](https://github.com/fatiando/ensaio)
* [Open AR-Sandbox](https://github.com/cgre-aachen/open_AR_Sandbox)
* [climlab](https://github.com/climlab/climlab)
* [napari](https://github.com/napari/napari)
* [mne-python](https://github.com/mne-tools/mne-python)
* [GemGIS](https://github.com/cgre-aachen/gemgis)
*If you're using Pooch, send us a pull request adding your project to the list.*
## Getting involved
🗨️ **Contact us:**
Find out more about how to reach us at
[fatiando.org/contact](https://www.fatiando.org/contact/).
👩🏾‍💻 **Contributing to project development:**
Please read our
[Contributing Guide](https://github.com/fatiando/pooch/blob/main/CONTRIBUTING.md)
to see how you can help and give feedback.
🧑🏾‍🤝‍🧑🏼 **Code of conduct:**
This project is released with a
[Code of Conduct](https://github.com/fatiando/community/blob/main/CODE_OF_CONDUCT.md).
By participating in this project you agree to abide by its terms.
> **Imposter syndrome disclaimer:**
> We want your help. **No, really.** There may be a little voice inside your
> head that is telling you that you're not ready, that you aren't skilled
> enough to contribute. We assure you that the little voice in your head is
> wrong. Most importantly, **there are many valuable ways to contribute besides
> writing code**.
>
> *This disclaimer was adapted from the*
> [MetPy project](https://github.com/Unidata/MetPy).
## License
This is free software: you can redistribute it and/or modify it under the terms
of the **BSD 3-clause License**. A copy of this license is provided in
[`LICENSE.txt`](https://github.com/fatiando/pooch/blob/main/LICENSE.txt).
Raw data
{
"_id": null,
"home_page": "https://github.com/fatiando/pooch",
"name": "pooch",
"maintainer": "\"Leonardo Uieda\"",
"docs_url": null,
"requires_python": ">=3.7",
"maintainer_email": "leouieda@gmail.com",
"keywords": "data,download,caching,http",
"author": "The Pooch Developers",
"author_email": "fatiandoaterra@protonmail.com",
"download_url": "https://files.pythonhosted.org/packages/0b/ab/89804bd6f840d1e2eb7be9959dae0fd3945b0c7487699c2db1cc0f32f0b0/pooch-1.8.0.tar.gz",
"platform": "any",
"bugtrack_url": null,
"license": "BSD 3-Clause License",
"summary": "\"Pooch manages your Python library's sample data files: it automatically downloads and stores them in a local directory, with support for versioning and corruption checks.\"",
"version": "1.8.0",
"project_urls": {
"Bug Tracker": "https://github.com/fatiando/pooch/issues",
"Documentation": "https://www.fatiando.org/pooch",
"Homepage": "https://github.com/fatiando/pooch",
"Release Notes": "https://github.com/fatiando/pooch/releases",
"Source Code": "https://github.com/fatiando/pooch"
},
"split_keywords": [
"data",
"download",
"caching",
"http"
],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "1aa55174dac3957ac412e80a00f30b6507031fcab7000afc9ea0ac413bddcff2",
"md5": "909fda52d8f37beb0f05e9f193bcdd2f",
"sha256": "1bfba436d9e2ad5199ccad3583cca8c241b8736b5bb23fe67c213d52650dbb66"
},
"downloads": -1,
"filename": "pooch-1.8.0-py3-none-any.whl",
"has_sig": false,
"md5_digest": "909fda52d8f37beb0f05e9f193bcdd2f",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.7",
"size": 62687,
"upload_time": "2023-10-24T17:23:15",
"upload_time_iso_8601": "2023-10-24T17:23:15.038965Z",
"url": "https://files.pythonhosted.org/packages/1a/a5/5174dac3957ac412e80a00f30b6507031fcab7000afc9ea0ac413bddcff2/pooch-1.8.0-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "0bab89804bd6f840d1e2eb7be9959dae0fd3945b0c7487699c2db1cc0f32f0b0",
"md5": "0fb23ddbf2c4ea8cf6929176e619e8ab",
"sha256": "f59981fd5b9b5d032dcde8f4a11eaa492c2ac6343fae3596a2fdae35fc54b0a0"
},
"downloads": -1,
"filename": "pooch-1.8.0.tar.gz",
"has_sig": false,
"md5_digest": "0fb23ddbf2c4ea8cf6929176e619e8ab",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.7",
"size": 59408,
"upload_time": "2023-10-24T17:23:17",
"upload_time_iso_8601": "2023-10-24T17:23:17.609707Z",
"url": "https://files.pythonhosted.org/packages/0b/ab/89804bd6f840d1e2eb7be9959dae0fd3945b0c7487699c2db1cc0f32f0b0/pooch-1.8.0.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2023-10-24 17:23:17",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "fatiando",
"github_project": "pooch",
"travis_ci": false,
"coveralls": true,
"github_actions": true,
"lcname": "pooch"
}