pangeo-forge-esgf


Name: pangeo-forge-esgf
Version: 0.3.0
Summary: Using queries to the ESGF API to generate URLs and keyword arguments for recipe generation in pangeo-forge
Upload time: 2024-04-30 15:33:30
Requires Python: >=3.9
License: Apache-2.0
Keywords: pangeo, data, esgf

# pangeo-forge-esgf

Using queries to the ESGF API to generate URLs and keyword arguments for recipe generation in pangeo-forge.

## Install
You can install pangeo-forge-esgf via pip:
```
pip install pangeo-forge-esgf
```

To also install the dependencies needed for testing and development, run:
```
pip install "pangeo-forge-esgf[dev]"
```
```

## Parsing a list of instance ids using wildcards

Pangeo Forge recipes require you to provide the exact instance_ids of the datasets you want to process. Finding these with the [web search](https://esgf-node.llnl.gov/search/cmip6/) can become cumbersome, especially when many models or ensemble members are involved.

`pangeo-forge-esgf` provides functions that query the ESGF API using instance_id patterns containing wildcards.

For example, to find all zonal (`uo`) and meridional (`vo`) velocities available for the PMIP `lgm` experiment, you can run:

```python
from pangeo_forge_esgf.parsing import parse_instance_ids
parse_iids = [
    "CMIP6.PMIP.*.*.lgm.*.*.[uo, vo].*.*",
]
# Comma-separated values in square brackets are expanded, so the above is equivalent to:
# parse_iids = [
#     "CMIP6.PMIP.*.*.lgm.*.*.uo.*.*",
#     "CMIP6.PMIP.*.*.lgm.*.*.vo.*.*",
# ]
iids = []
for piid in parse_iids:
    iids.extend(parse_instance_ids(piid))
iids
```

and you will get:

```
['CMIP6.PMIP.MIROC.MIROC-ES2L.lgm.r1i1p1f2.Omon.uo.gn.v20191002',
 'CMIP6.PMIP.AWI.AWI-ESM-1-1-LR.lgm.r1i1p1f1.Odec.uo.gn.v20200212',
 'CMIP6.PMIP.AWI.AWI-ESM-1-1-LR.lgm.r1i1p1f1.Omon.uo.gn.v20200212',
 'CMIP6.PMIP.MIROC.MIROC-ES2L.lgm.r1i1p1f2.Omon.uo.gr1.v20200911',
 'CMIP6.PMIP.MPI-M.MPI-ESM1-2-LR.lgm.r1i1p1f1.Omon.uo.gn.v20200909',
 'CMIP6.PMIP.AWI.AWI-ESM-1-1-LR.lgm.r1i1p1f1.Omon.vo.gn.v20200212',
 'CMIP6.PMIP.MIROC.MIROC-ES2L.lgm.r1i1p1f2.Omon.vo.gn.v20191002',
 'CMIP6.PMIP.AWI.AWI-ESM-1-1-LR.lgm.r1i1p1f1.Odec.vo.gn.v20200212',
 'CMIP6.PMIP.MIROC.MIROC-ES2L.lgm.r1i1p1f2.Omon.vo.gr1.v20200911',
 'CMIP6.PMIP.MPI-M.MPI-ESM1-2-LR.lgm.r1i1p1f1.Omon.vo.gn.v20190710']
```
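The square-bracket expansion described in the code comment above amounts to a Cartesian product over the dot-separated facets. The sketch below illustrates that rule in plain Python; `expand_brackets` is a hypothetical helper written for this example, not the function pangeo-forge-esgf uses internally.

```python
import itertools
import re


def expand_brackets(pattern: str) -> list[str]:
    """Expand every '[a, b, ...]' facet of a dot-separated instance_id pattern.

    Hypothetical illustration only; not the pangeo-forge-esgf implementation.
    """
    options = []
    for facet in pattern.split("."):
        match = re.fullmatch(r"\[(.*)\]", facet)
        if match:
            # Bracketed facet: split on commas and strip surrounding spaces
            options.append([value.strip() for value in match.group(1).split(",")])
        else:
            # Plain facet (including '*'): exactly one option
            options.append([facet])
    # One output pattern per combination of facet options
    return [".".join(combo) for combo in itertools.product(*options)]


print(expand_brackets("CMIP6.PMIP.*.*.lgm.*.*.[uo, vo].*.*"))
# ['CMIP6.PMIP.*.*.lgm.*.*.uo.*.*', 'CMIP6.PMIP.*.*.lgm.*.*.vo.*.*']
```

Each expanded pattern still contains `*` wildcards, which are then resolved by querying ESGF.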

Eventually, I hope to use this functionality to handle user requests in PRs that add wildcard instance_ids. For now, it can help you manually construct lists of instance_ids to submit to a pangeo-forge feedstock.
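To illustrate what querying ESGF "based on instance_id values with wildcards" involves, the sketch below maps each field of a CMIP6 instance_id onto its facet name from the CMIP6 Data Reference Syntax and drops the `*` fields, leaving only the constraints a search needs. The helper `iid_to_facets` is hypothetical; how pangeo-forge-esgf actually builds its ESGF queries, and which query parameter names it sends, is not shown here.

```python
# Facet order of a CMIP6 instance_id (CMIP6 Data Reference Syntax)
CMIP6_FACETS = (
    "mip_era", "activity_id", "institution_id", "source_id", "experiment_id",
    "member_id", "table_id", "variable_id", "grid_label", "version",
)


def iid_to_facets(iid: str) -> dict[str, str]:
    """Map a (possibly wildcarded) CMIP6 instance_id onto facet constraints.

    Fields given as '*' are omitted, i.e. left unconstrained.
    Hypothetical helper for illustration only.
    """
    values = iid.split(".")
    if len(values) != len(CMIP6_FACETS):
        raise ValueError(f"expected {len(CMIP6_FACETS)} fields, got {len(values)}: {iid!r}")
    return {name: value for name, value in zip(CMIP6_FACETS, values) if value != "*"}


print(iid_to_facets("CMIP6.PMIP.*.*.lgm.*.*.uo.*.*"))
# {'mip_era': 'CMIP6', 'activity_id': 'PMIP', 'experiment_id': 'lgm', 'variable_id': 'uo'}
```

A search client could translate such a dictionary into ESGF search parameters and rebuild exact instance_ids from the matching results.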

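## Generating Pangeo Forge recipe input (URLs) from instance_ids

`get_urls_from_esgf` is asynchronous, so its result must be awaited (the top-level `await` in the examples here works in Jupyter/IPython). It returns a dictionary that maps each instance_id to a list of file URLs.
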
```python
from pangeo_forge_esgf import get_urls_from_esgf
iids = ['CMIP6.CMIP.CSIRO-ARCCSS.ACCESS-CM2.historical.r1i1p1f1.SImon.sifb.gn.v20200817']
url_dict = await get_urls_from_esgf(iids)
url_dict['CMIP6.CMIP.CSIRO-ARCCSS.ACCESS-CM2.historical.r1i1p1f1.SImon.sifb.gn.v20200817']
```

gives

```
100%|██████████| 5/5 [00:01<00:00,  4.98it/s]
Processing responses
Processing responses: Expected files per iid
Processing responses: Check for missing iids
Processing responses: Flatten results
Processing responses: Group results
Find responsive urls
100%|██████████| 1/1 [00:00<00:00,  3.25it/s]
['https://esgf-data1.llnl.gov/thredds/fileServer/css03_data/CMIP6/CMIP/CSIRO-ARCCSS/ACCESS-CM2/historical/r1i1p1f1/SImon/sifb/gn/v20200817/sifb_SImon_ACCESS-CM2_historical_r1i1p1f1_gn_185001-201412.nc']
```
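If you post-process these URL lists yourself, you may want to make sure that multi-file datasets are in chronological order before concatenating them along time. A minimal sketch, assuming standard CMIP6 file names that end in `_<start>-<end>.nc` (as in the URL above) and a consistent date format within one dataset; `sort_by_time_range` is a hypothetical helper, not part of pangeo-forge-esgf.

```python
import posixpath
import re
from urllib.parse import urlparse

# Trailing time range in CMIP6 file names, e.g. "_185001-201412.nc"
TIME_RANGE = re.compile(r"_(\d+)-(\d+)\.nc$")


def sort_by_time_range(urls: list[str]) -> list[str]:
    """Return the URLs ordered by the start date encoded in their file names."""
    def start(url: str) -> int:
        filename = posixpath.basename(urlparse(url).path)
        match = TIME_RANGE.search(filename)
        if match is None:
            # e.g. fixed fields ("fx") carry no time range
            raise ValueError(f"no time range in file name: {filename!r}")
        return int(match.group(1))

    return sorted(urls, key=start)


# Hypothetical URLs for illustration
urls = [
    "https://example.org/tos_Omon_X_historical_r1i1p1f1_gn_190001-194912.nc",
    "https://example.org/tos_Omon_X_historical_r1i1p1f1_gn_185001-189912.nc",
]
print([url.rsplit("/", 1)[-1] for url in sort_by_time_range(urls)])
# ['tos_Omon_X_historical_r1i1p1f1_gn_185001-189912.nc', 'tos_Omon_X_historical_r1i1p1f1_gn_190001-194912.nc']
```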

To see detailed debugging statements, set up logging at the `DEBUG` level first:

```python
from pangeo_forge_esgf import get_urls_from_esgf, setup_logging
setup_logging('DEBUG')
iids = ['CMIP6.CMIP.CSIRO-ARCCSS.ACCESS-CM2.historical.r1i1p1f1.SImon.sifb.gn.v20200817']
url_dict = await get_urls_from_esgf(iids)
url_dict['CMIP6.CMIP.CSIRO-ARCCSS.ACCESS-CM2.historical.r1i1p1f1.SImon.sifb.gn.v20200817']
```
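In a plain Python script (outside Jupyter/IPython), a top-level `await` is a syntax error, so the call has to run inside an event loop, for example via the standard library's `asyncio.run`. To keep this sketch runnable offline, `get_urls_from_esgf` is replaced by a hypothetical stand-in coroutine that returns one fake URL per instance_id; in practice, import the real function instead.

```python
import asyncio


# Hypothetical stand-in so the sketch runs offline; in real use replace it with
# `from pangeo_forge_esgf import get_urls_from_esgf`.
async def get_urls_from_esgf(iids: list[str]) -> dict[str, list[str]]:
    await asyncio.sleep(0)  # placeholder for the network requests
    return {iid: [f"https://example.org/{iid}.nc"] for iid in iids}


async def main(iids: list[str]) -> dict[str, list[str]]:
    # `await` is valid here because we are inside a coroutine
    return await get_urls_from_esgf(iids)


if __name__ == "__main__":
    iids = ["CMIP6.CMIP.CSIRO-ARCCSS.ACCESS-CM2.historical.r1i1p1f1.SImon.sifb.gn.v20200817"]
    url_dict = asyncio.run(main(iids))  # starts and closes an event loop
    print(url_dict[iids[0]])
```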

            

Raw data

{
    "_id": null,
    "home_page": null,
    "name": "pangeo-forge-esgf",
    "maintainer": null,
    "docs_url": null,
    "requires_python": ">=3.9",
    "maintainer_email": null,
    "keywords": "pangeo, data, esgf",
    "author": null,
    "author_email": "Julius Busecke <julius@ldeo.columbia.edu>",
    "download_url": "https://files.pythonhosted.org/packages/3e/51/ef09c32470911ce1322ce41481bd1b5349c51925ebf0a3d555296ac59be9/pangeo_forge_esgf-0.3.0.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "Apache-2.0",
    "summary": "Using queries to the ESGF API to generate urls and keyword arguments for receipe generation in pangeo-forge",
    "version": "0.3.0",
    "project_urls": {
        "Homepage": "https://github.com/jbusecke/pangeo-forge-esgf",
        "Tracker": "https://github.com/jbusecke/pangeo-forge-esgf/issues"
    },
    "split_keywords": [
        "pangeo",
        " data",
        " esgf"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "e6794395b445b09856ffc2714e8faee5f563b68bfd7653bf8460be1e96881ee3",
                "md5": "3fcaf3f696e4273b3fe1439434c215ee",
                "sha256": "3cba7bad3e12f8fcb7010bce2e6136fa65d6b14ae9ff268e5927b5e1cc37298e"
            },
            "downloads": -1,
            "filename": "pangeo_forge_esgf-0.3.0-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "3fcaf3f696e4273b3fe1439434c215ee",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.9",
            "size": 17676,
            "upload_time": "2024-04-30T15:33:29",
            "upload_time_iso_8601": "2024-04-30T15:33:29.319502Z",
            "url": "https://files.pythonhosted.org/packages/e6/79/4395b445b09856ffc2714e8faee5f563b68bfd7653bf8460be1e96881ee3/pangeo_forge_esgf-0.3.0-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "3e51ef09c32470911ce1322ce41481bd1b5349c51925ebf0a3d555296ac59be9",
                "md5": "718bbf2c71c67ba40b4c2db52ae0ed1f",
                "sha256": "2635e2128dd2f87a532eac91a114028183313ccdd23eb4fb0abcf5dcee061be1"
            },
            "downloads": -1,
            "filename": "pangeo_forge_esgf-0.3.0.tar.gz",
            "has_sig": false,
            "md5_digest": "718bbf2c71c67ba40b4c2db52ae0ed1f",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.9",
            "size": 20754,
            "upload_time": "2024-04-30T15:33:30",
            "upload_time_iso_8601": "2024-04-30T15:33:30.510683Z",
            "url": "https://files.pythonhosted.org/packages/3e/51/ef09c32470911ce1322ce41481bd1b5349c51925ebf0a3d555296ac59be9/pangeo_forge_esgf-0.3.0.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2024-04-30 15:33:30",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "jbusecke",
    "github_project": "pangeo-forge-esgf",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": true,
    "lcname": "pangeo-forge-esgf"
}
        