pooch


| Field | Value |
|---|---|
| Name | pooch |
| Version | 1.8.2 |
| Summary | A friend to fetch your data files |
| Upload time | 2024-06-06 16:53:46 |
| Requires Python | >=3.7 |
| License | BSD-3-Clause |
| Keywords | data, download, caching, http |

<img src="https://github.com/fatiando/pooch/raw/main/doc/_static/readme-banner.png" alt="Pooch: A friend to fetch your data files">

<p align="center">
<a href="https://www.fatiando.org/pooch"><strong>Documentation</strong> (latest)</a> •
<a href="https://www.fatiando.org/pooch/dev"><strong>Documentation</strong> (main branch)</a> •
<a href="https://github.com/fatiando/pooch/blob/main/CONTRIBUTING.md"><strong>Contributing</strong></a> •
<a href="https://www.fatiando.org/contact/"><strong>Contact</strong></a>
</p>

<p align="center">
Part of the <a href="https://www.fatiando.org"><strong>Fatiando a Terra</strong></a> project
</p>

<p align="center">
<a href="https://pypi.python.org/pypi/pooch"><img src="http://img.shields.io/pypi/v/pooch.svg?style=flat-square" alt="Latest version on PyPI"></a>
<a href="https://github.com/conda-forge/pooch-feedstock"><img src="https://img.shields.io/conda/vn/conda-forge/pooch.svg?style=flat-square" alt="Latest version on conda-forge"></a>
<a href="https://codecov.io/gh/fatiando/pooch"><img src="https://img.shields.io/codecov/c/github/fatiando/pooch/main.svg?style=flat-square" alt="Test coverage status"></a>
<a href="https://pypi.python.org/pypi/pooch"><img src="https://img.shields.io/pypi/pyversions/pooch.svg?style=flat-square" alt="Compatible Python versions."></a>
<a href="https://doi.org/10.21105/joss.01943"><img src="https://img.shields.io/badge/doi-10.21105%2Fjoss.01943-blue?style=flat-square" alt="DOI used to cite Pooch"></a>
</p>

## About

> Just want to download a file without messing with `requests` and `urllib`?
> Trying to add sample datasets to your Python package?
> **Pooch is here to help!**

*Pooch* is a **Python library** that can manage data by **downloading files**
from a server (only when needed) and storing them locally in a data **cache**
(a folder on your computer).

* Pure Python and minimal dependencies.
* Download files over HTTP, FTP, and from data repositories like Zenodo and figshare.
* Built-in post-processors to unzip/decompress the data after download.
* Designed to be extended: create custom downloaders and post-processors.
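
As a rough illustration of the last point, a post-processor can be a plain function. The sketch below assumes the callable signature described in Pooch's documentation, `(fname, action, pooch)`, and uses a hypothetical name, `decompress_xz`. Pooch already ships a `Decompress` processor for this job, so treat this purely as an example of the extension mechanism. The demonstration runs on a locally created file and needs no network access.

```python
import lzma
import tempfile
from pathlib import Path


def decompress_xz(fname, action, pooch):
    """
    Hypothetical post-processor: decompress an .xz file next to the original.

    fname  : path of the downloaded file in the local cache
    action : "download", "update", or "fetch" (file was already up to date)
    pooch  : the calling Pooch instance (unused in this sketch)
    """
    source = Path(fname)
    target = source.with_suffix("")  # drop the trailing ".xz"
    # Redo the work only when the file is new/updated or the output is missing.
    if action in ("download", "update") or not target.exists():
        with lzma.open(source, "rb") as compressed:
            target.write_bytes(compressed.read())
    # The returned path is what the caller gets back instead of fname.
    return str(target)


# Local demonstration: create a small .xz file and call the processor the
# way it would be called after a fresh download.
with tempfile.TemporaryDirectory() as tmpdir:
    archive = Path(tmpdir) / "sample.csv.xz"
    with lzma.open(archive, "wb") as out:
        out.write(b"x,y\n1,2\n")
    result = decompress_xz(str(archive), "download", None)
    demo_contents = Path(result).read_text()
```

A function like this would be passed through the `processor` argument, for example `pooch.retrieve(url=..., known_hash=..., processor=decompress_xz)`.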

Are you a **scientist** or researcher? Pooch can help you too!

* Host your data on a repository and download using the DOI.
* Automatically download data using code instead of telling colleagues to do it themselves.
* Make sure everyone running the code has the same version of the data files.

## Projects using Pooch

[SciPy](https://github.com/scipy/scipy), 
[scikit-image](https://github.com/scikit-image/scikit-image),
[xarray](https://github.com/pydata/xarray),
[Ensaio](https://github.com/fatiando/ensaio),
[GemPy](https://github.com/cgre-aachen/gempy),
[MetPy](https://github.com/Unidata/MetPy),
[napari](https://github.com/napari/napari),
[Satpy](https://github.com/pytroll/satpy),
[yt](https://github.com/yt-project/yt),
[PyVista](https://github.com/pyvista/pyvista),
[icepack](https://github.com/icepack/icepack),
[histolab](https://github.com/histolab/histolab),
[seaborn-image](https://github.com/SarthakJariwala/seaborn-image),
[Open AR-Sandbox](https://github.com/cgre-aachen/open_AR_Sandbox),
[climlab](https://github.com/climlab/climlab),
[mne-python](https://github.com/mne-tools/mne-python),
[GemGIS](https://github.com/cgre-aachen/gemgis),
[SHTOOLS](https://github.com/SHTOOLS/SHTOOLS),
[MOABB](https://github.com/NeuroTechX/moabb),
[GeoViews](https://github.com/holoviz/geoviews),
[ScopeSim](https://github.com/AstarVienna/ScopeSim),
[Brainrender](https://github.com/brainglobe/brainrender),
[pyxem](https://github.com/pyxem/pyxem),
[cellfinder](https://github.com/brainglobe/cellfinder),
[PVGeo](https://github.com/OpenGeoVis/PVGeo),
[geosnap](https://github.com/oturns/geosnap),
[BioCypher](https://github.com/biocypher/biocypher),
[cf-xarray](https://github.com/xarray-contrib/cf-xarray),
[Scirpy](https://github.com/scverse/scirpy),
[rembg](https://github.com/danielgatis/rembg),
[DASCore](https://github.com/DASDAE/dascore),
[scikit-mobility](https://github.com/scikit-mobility/scikit-mobility),
[Py-ART](https://github.com/ARM-DOE/pyart),
[HyperSpy](https://github.com/hyperspy/hyperspy),
[RosettaSciIO](https://github.com/hyperspy/rosettasciio),
[eXSpy](https://github.com/hyperspy/exspy)


> If you're using Pooch, **send us a pull request** adding your project to the list.

## Example

For a **scientist downloading a data file** for analysis:

```python
import pooch
import pandas as pd

# Download a file and save it locally, returning the path to it.
# Running this again will not cause a download. Pooch will check the hash
# (checksum) of the downloaded file against the given value to make sure
# it's the right file (not corrupted or outdated).
fname_bathymetry = pooch.retrieve(
    url="https://github.com/fatiando-data/caribbean-bathymetry/releases/download/v1/caribbean-bathymetry.csv.xz",
    known_hash="md5:a7332aa6e69c77d49d7fb54b764caa82",
)

# Pooch can also download based on a DOI from certain providers.
fname_gravity = pooch.retrieve(
    url="doi:10.5281/zenodo.5882430/southern-africa-gravity.csv.xz",
    known_hash="md5:1dee324a14e647855366d6eb01a1ef35",
)

# Load the data with Pandas
data_bathymetry = pd.read_csv(fname_bathymetry)
data_gravity = pd.read_csv(fname_gravity)
```
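
The hash check mentioned in the comments above can be pictured with the standard library alone. This is a minimal sketch of the idea, not Pooch's actual implementation: `hash_matches` is a hypothetical helper that accepts the same `"algorithm:hexdigest"` string format used for `known_hash`, and assumes SHA256 when no algorithm prefix is given.

```python
import hashlib
import tempfile
from pathlib import Path


def hash_matches(fname, known_hash):
    """
    Return True if the file's digest equals the one in known_hash.

    known_hash looks like "md5:<hex>" or "sha256:<hex>". A bare hex digest
    is treated as SHA256 (an assumption of this sketch).
    """
    algorithm, _, expected = known_hash.rpartition(":")
    hasher = hashlib.new(algorithm or "sha256")
    # Read in chunks so large data files are never fully loaded into memory.
    with open(fname, "rb") as fin:
        for chunk in iter(lambda: fin.read(65536), b""):
            hasher.update(chunk)
    return hasher.hexdigest() == expected.lower()


# Local demonstration with a temporary file (no download involved).
with tempfile.TemporaryDirectory() as tmpdir:
    path = Path(tmpdir) / "data.csv"
    path.write_bytes(b"x,y\n1,2\n")
    good_hash = "sha256:" + hashlib.sha256(path.read_bytes()).hexdigest()
    intact = hash_matches(path, good_hash)
    corrupted = hash_matches(path, "sha256:" + "0" * 64)
```

Here `intact` is `True` and `corrupted` is `False`; a mismatch like the second case is what signals a corrupted or outdated file.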

For **package developers** including sample data in their projects:

```python
"""
Module mypackage/datasets.py
"""
import pkg_resources
import pandas
import pooch

# Get the version string from your project. You have one of these, right?
from . import version

# Create a new friend to manage your sample data storage
GOODBOY = pooch.create(
    # Folder where the data will be stored. For a sensible default, use the
    # default cache folder for your OS.
    path=pooch.os_cache("mypackage"),
    # Base URL of the remote data store. Will call .format on this string
    # to insert the version (see below).
    base_url="https://github.com/myproject/mypackage/raw/{version}/data/",
    # Pooches are versioned so that you can use multiple versions of a
    # package simultaneously. Use a PEP 440 compliant version number. The
    # version will be appended to the path.
    version=version,
    # If a version has a "+XX.XXXXX" suffix, we'll assume that this is a dev
    # version and replace the version with this string.
    version_dev="main",
    # An environment variable that overwrites the path.
    env="MYPACKAGE_DATA_DIR",
    # The cache file registry. A dictionary with all files managed by this
    # pooch. Keys are the file names (relative to *base_url*) and values
    # are their respective SHA256 hashes. Files will be downloaded
    # automatically when needed (see fetch_gravity_data). The hash below is
    # a placeholder, not a real SHA256 digest.
    registry={"gravity-data.csv": "89y10phsdwhs09whljwc09whcowsdhcwodcydw"},
)
# You can also load the registry from a file. Each line contains a file
# name and its SHA256 hash separated by a space. This makes it easier to
# manage large numbers of data files. The registry file should be packaged
# and distributed with your software.
GOODBOY.load_registry(
    pkg_resources.resource_stream("mypackage", "registry.txt")
)

# Define functions that your users can call to get back the data in memory
def fetch_gravity_data():
    """
    Load some sample gravity data to use in your docs.
    """
    # Fetch the path to a file in the local storage. If it's not there,
    # we'll download it.
    fname = GOODBOY.fetch("gravity-data.csv")
    # Load it with numpy/pandas/etc
    data = pandas.read_csv(fname)
    return data
```
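
To make the registry file format and the `{version}` substitution more concrete, here is a simplified standard-library sketch. The helpers `parse_registry` and `resolve_url` are hypothetical and only approximate what the comments above describe. In particular, treating any `+` in the version string as a development version is an assumption of this sketch, and the hashes are placeholders.

```python
def parse_registry(text):
    """Parse 'file-name hash' lines into a dictionary (simplified sketch)."""
    registry = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and comments.
        if not line or line.startswith("#"):
            continue
        name, checksum = line.split()[:2]
        registry[name] = checksum.lower()
    return registry


def resolve_url(base_url, version, fname, version_dev="main"):
    """Build the download URL for a file, swapping in version_dev for dev versions."""
    # Assumption: a "+local" suffix marks a development version.
    effective = version_dev if "+" in version else version
    return base_url.format(version=effective) + fname


# Placeholder digests for illustration only.
REGISTRY_TEXT = (
    "# file name followed by its SHA256 hash\n"
    f"gravity-data.csv {'a' * 64}\n"
    f"magnetic-data.csv {'b' * 64}\n"
)
BASE_URL = "https://github.com/myproject/mypackage/raw/{version}/data/"

registry = parse_registry(REGISTRY_TEXT)
release_url = resolve_url(BASE_URL, "1.2.0", "gravity-data.csv")
dev_url = resolve_url(BASE_URL, "1.2.0+12.abcdef", "gravity-data.csv")
```

With these inputs, `release_url` points at the `1.2.0` tag while `dev_url` points at the `main` branch, which is the behaviour the `version_dev` comment describes.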

## Getting involved

🗨️ **Contact us:**
Find out more about how to reach us at
[fatiando.org/contact](https://www.fatiando.org/contact/).

👩🏾‍💻 **Contributing to project development:**
Please read our
[Contributing Guide](https://github.com/fatiando/pooch/blob/main/CONTRIBUTING.md)
to see how you can help and give feedback.

🧑🏾‍🤝‍🧑🏼 **Code of conduct:**
This project is released with a
[Code of Conduct](https://github.com/fatiando/community/blob/main/CODE_OF_CONDUCT.md).
By participating in this project you agree to abide by its terms.

> **Imposter syndrome disclaimer:**
> We want your help. **No, really.** There may be a little voice inside your
> head that is telling you that you're not ready, that you aren't skilled
> enough to contribute. We assure you that the little voice in your head is
> wrong. Most importantly, **there are many valuable ways to contribute besides
> writing code**.
>
> *This disclaimer was adapted from the*
> [MetPy project](https://github.com/Unidata/MetPy).

## License

This is free software: you can redistribute it and/or modify it under the terms
of the **BSD 3-clause License**. A copy of this license is provided in
[`LICENSE.txt`](https://github.com/fatiando/pooch/blob/main/LICENSE.txt).

            

Raw data

            {
    "_id": null,
    "home_page": null,
    "name": "pooch",
    "maintainer": null,
    "docs_url": null,
    "requires_python": ">=3.7",
    "maintainer_email": "Leonardo Uieda <leo@uieda.com>",
    "keywords": "data, download, caching, http",
    "author": null,
    "author_email": "The Pooch Developers <fatiandoaterra@protonmail.com>",
    "download_url": "https://files.pythonhosted.org/packages/c6/77/b3d3e00c696c16cf99af81ef7b1f5fe73bd2a307abca41bd7605429fe6e5/pooch-1.8.2.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "BSD-3-Clause",
    "summary": "A friend to fetch your data files",
    "version": "1.8.2",
    "project_urls": {
        "Bug Tracker": "https://github.com/fatiando/pooch/issues",
        "Changelog": "https://www.fatiando.org/pooch/latest/changes.html",
        "Documentation": "https://www.fatiando.org/pooch",
        "Source Code": "https://github.com/fatiando/pooch"
    },
    "split_keywords": [
        "data",
        " download",
        " caching",
        " http"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "a88777cc11c7a9ea9fd05503def69e3d18605852cd0d4b0d3b8f15bbeb3ef1d1",
                "md5": "830674534379589ada41ffb585feeea4",
                "sha256": "3529a57096f7198778a5ceefd5ac3ef0e4d06a6ddaf9fc2d609b806f25302c47"
            },
            "downloads": -1,
            "filename": "pooch-1.8.2-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "830674534379589ada41ffb585feeea4",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.7",
            "size": 64574,
            "upload_time": "2024-06-06T16:53:44",
            "upload_time_iso_8601": "2024-06-06T16:53:44.343361Z",
            "url": "https://files.pythonhosted.org/packages/a8/87/77cc11c7a9ea9fd05503def69e3d18605852cd0d4b0d3b8f15bbeb3ef1d1/pooch-1.8.2-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "c677b3d3e00c696c16cf99af81ef7b1f5fe73bd2a307abca41bd7605429fe6e5",
                "md5": "7a333ef27c34984385c25f1e0b156185",
                "sha256": "76561f0de68a01da4df6af38e9955c4c9d1a5c90da73f7e40276a5728ec83d10"
            },
            "downloads": -1,
            "filename": "pooch-1.8.2.tar.gz",
            "has_sig": false,
            "md5_digest": "7a333ef27c34984385c25f1e0b156185",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.7",
            "size": 59353,
            "upload_time": "2024-06-06T16:53:46",
            "upload_time_iso_8601": "2024-06-06T16:53:46.224355Z",
            "url": "https://files.pythonhosted.org/packages/c6/77/b3d3e00c696c16cf99af81ef7b1f5fe73bd2a307abca41bd7605429fe6e5/pooch-1.8.2.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2024-06-06 16:53:46",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "fatiando",
    "github_project": "pooch",
    "travis_ci": false,
    "coveralls": true,
    "github_actions": true,
    "lcname": "pooch"
}
        