- Name: pangeo-forge-esgf
- Version: 0.3.1
- Summary: Using queries to the ESGF API to generate URLs and keyword arguments for recipe generation in pangeo-forge
- Upload time: 2024-05-31 16:23:18
- Requires Python: >=3.9
- License: Apache-2.0
- Keywords: pangeo, data, esgf

# pangeo-forge-esgf

Using queries to the ESGF API to generate URLs and keyword arguments for recipe generation in pangeo-forge

## Install
You can install pangeo-forge-esgf via pip:
```
pip install pangeo-forge-esgf
```

To also install the dependencies needed for testing and development, use the `dev` extra:
```
pip install "pangeo-forge-esgf[dev]"
```

## Parsing a list of instance ids using wildcards

Pangeo Forge recipes require the user to provide exact instance_ids for the datasets they want processed. Discovering these with the [web search](https://esgf-node.llnl.gov/search/cmip6/) can become cumbersome, especially when dealing with a large number of members, models, etc.

`pangeo-forge-esgf` provides some functions to query the ESGF API based on instance_id values with wildcards.

For example, to find all the zonal (`uo`) and meridional (`vo`) velocities available for the `lgm` experiment of PMIP, you can do:

```python
from pangeo_forge_esgf.parsing import parse_instance_ids
parse_iids = [
    "CMIP6.PMIP.*.*.lgm.*.*.[uo, vo].*.*",
]
# Comma-separated values in square brackets are expanded, so the above is equivalent to:
# parse_iids = [
#     "CMIP6.PMIP.*.*.lgm.*.*.uo.*.*",
#     "CMIP6.PMIP.*.*.lgm.*.*.vo.*.*",
# ]
iids = []
for piid in parse_iids:
    iids.extend(parse_instance_ids(piid))
iids
```

and you will get:

```
['CMIP6.PMIP.MIROC.MIROC-ES2L.lgm.r1i1p1f2.Omon.uo.gn.v20191002',
 'CMIP6.PMIP.AWI.AWI-ESM-1-1-LR.lgm.r1i1p1f1.Odec.uo.gn.v20200212',
 'CMIP6.PMIP.AWI.AWI-ESM-1-1-LR.lgm.r1i1p1f1.Omon.uo.gn.v20200212',
 'CMIP6.PMIP.MIROC.MIROC-ES2L.lgm.r1i1p1f2.Omon.uo.gr1.v20200911',
 'CMIP6.PMIP.MPI-M.MPI-ESM1-2-LR.lgm.r1i1p1f1.Omon.uo.gn.v20200909',
 'CMIP6.PMIP.AWI.AWI-ESM-1-1-LR.lgm.r1i1p1f1.Omon.vo.gn.v20200212',
 'CMIP6.PMIP.MIROC.MIROC-ES2L.lgm.r1i1p1f2.Omon.vo.gn.v20191002',
 'CMIP6.PMIP.AWI.AWI-ESM-1-1-LR.lgm.r1i1p1f1.Odec.vo.gn.v20200212',
 'CMIP6.PMIP.MIROC.MIROC-ES2L.lgm.r1i1p1f2.Omon.vo.gr1.v20200911',
 'CMIP6.PMIP.MPI-M.MPI-ESM1-2-LR.lgm.r1i1p1f1.Omon.vo.gn.v20190710']
```
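
The bracket expansion mentioned in the code comment can be illustrated with a small standalone helper. This is only a sketch of the idea: `expand_brackets` is a hypothetical name that is not part of `pangeo-forge-esgf`, and the library's own implementation may differ.

```python
import itertools
import re


def expand_brackets(pattern: str) -> list[str]:
    """Expand comma-separated values in square brackets into separate patterns.

    Hypothetical helper for illustration only; not the library's implementation.
    """
    # With one capture group, re.split alternates literal text (even indices)
    # and bracket contents (odd indices).
    parts = re.split(r"\[([^\]]*)\]", pattern)
    options = [
        [part] if i % 2 == 0 else [value.strip() for value in part.split(",")]
        for i, part in enumerate(parts)
    ]
    # Combine every choice from every bracket group.
    return ["".join(combo) for combo in itertools.product(*options)]


expanded = expand_brackets("CMIP6.PMIP.*.*.lgm.*.*.[uo, vo].*.*")
for item in expanded:
    print(item)
```

Each expanded string could then be passed to `parse_instance_ids` on its own, which is what the equivalence in the comment describes.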

Eventually, I hope to use this functionality to handle user requests in PRs that add wildcard instance_ids. For now, it can help you manually construct lists of instance_ids to submit to a pangeo-forge feedstock.
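
When reviewing a long list of results before submitting it, it can help to split each instance_id into its dot-separated facets. The sketch below assumes the ten facet names of the CMIP6 data reference syntax; `FACETS` and `split_iid` are hypothetical names introduced here and are not part of `pangeo-forge-esgf`.

```python
from collections import defaultdict

# Facet names assumed from the CMIP6 data reference syntax (an assumption of this sketch).
FACETS = (
    "mip_era", "activity_id", "institution_id", "source_id", "experiment_id",
    "member_id", "table_id", "variable_id", "grid_label", "version",
)


def split_iid(iid: str) -> dict[str, str]:
    """Map each facet name to its value in a fully specified instance_id."""
    values = iid.split(".")
    if len(values) != len(FACETS):
        raise ValueError(f"Expected {len(FACETS)} facets, got {len(values)}: {iid!r}")
    return dict(zip(FACETS, values))


# A few of the instance_ids from the output above
iids = [
    "CMIP6.PMIP.MIROC.MIROC-ES2L.lgm.r1i1p1f2.Omon.uo.gn.v20191002",
    "CMIP6.PMIP.AWI.AWI-ESM-1-1-LR.lgm.r1i1p1f1.Omon.uo.gn.v20200212",
    "CMIP6.PMIP.AWI.AWI-ESM-1-1-LR.lgm.r1i1p1f1.Omon.vo.gn.v20200212",
]

# Collect which variables were found for each model (source_id).
variables_by_model = defaultdict(set)
for iid in iids:
    facets = split_iid(iid)
    variables_by_model[facets["source_id"]].add(facets["variable_id"])

for model, variables in sorted(variables_by_model.items()):
    print(model, sorted(variables))
```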

## Generating PGF recipe input (urls) from instance_ids

```python
from pangeo_forge_esgf import get_urls_from_esgf
iids = ['CMIP6.CMIP.CSIRO-ARCCSS.ACCESS-CM2.historical.r1i1p1f1.SImon.sifb.gn.v20200817']
url_dict = await get_urls_from_esgf(iids)
url_dict['CMIP6.CMIP.CSIRO-ARCCSS.ACCESS-CM2.historical.r1i1p1f1.SImon.sifb.gn.v20200817']
```

This prints progress information, followed by the list of file URLs for that instance_id:

```
100%|██████████| 5/5 [00:01<00:00,  4.98it/s]
Processing responses
Processing responses: Expected files per iid
Processing responses: Check for missing iids
Processing responses: Flatten results
Processing responses: Group results
Find responsive urls
100%|██████████| 1/1 [00:00<00:00,  3.25it/s]
['https://esgf-data1.llnl.gov/thredds/fileServer/css03_data/CMIP6/CMIP/CSIRO-ARCCSS/ACCESS-CM2/historical/r1i1p1f1/SImon/sifb/gn/v20200817/sifb_SImon_ACCESS-CM2_historical_r1i1p1f1_gn_185001-201412.nc']
```
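
As the output suggests, the result maps each instance_id to a list of file URLs. A quick sanity check before building a recipe is to look at the time range encoded in each file name. The sketch below assumes the CMIP6 file naming convention, in which the last underscore-separated field of the file name is `<start>-<end>`; `time_range_from_url` is a hypothetical helper, and files without a time range (for example fixed fields) would need separate handling.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse

# The dictionary shown above, written out literally so the example runs offline.
url_dict = {
    "CMIP6.CMIP.CSIRO-ARCCSS.ACCESS-CM2.historical.r1i1p1f1.SImon.sifb.gn.v20200817": [
        "https://esgf-data1.llnl.gov/thredds/fileServer/css03_data/CMIP6/CMIP/CSIRO-ARCCSS/ACCESS-CM2/historical/r1i1p1f1/SImon/sifb/gn/v20200817/sifb_SImon_ACCESS-CM2_historical_r1i1p1f1_gn_185001-201412.nc"
    ]
}


def time_range_from_url(url: str) -> tuple[str, str]:
    """Return (start, end) taken from the last field of the file name."""
    # File name without the ".nc" suffix
    stem = PurePosixPath(urlparse(url).path).stem
    start, end = stem.split("_")[-1].split("-")
    return start, end


for iid, urls in url_dict.items():
    for url in urls:
        print(iid, time_range_from_url(url))
```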

Or, if you want to see detailed debugging statements:

```python
from pangeo_forge_esgf import get_urls_from_esgf, setup_logging
setup_logging('DEBUG')
iids = ['CMIP6.CMIP.CSIRO-ARCCSS.ACCESS-CM2.historical.r1i1p1f1.SImon.sifb.gn.v20200817']
url_dict = await get_urls_from_esgf(iids)
url_dict['CMIP6.CMIP.CSIRO-ARCCSS.ACCESS-CM2.historical.r1i1p1f1.SImon.sifb.gn.v20200817']
```
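
The snippets above use a top-level `await`, which works in Jupyter or IPython but not in a plain Python script. In a script, the call can be wrapped in a coroutine and run with `asyncio.run`. The sketch below uses a stand-in coroutine, `fake_get_urls` (hypothetical, returning made-up `example.org` URLs), so that it runs without network access; in real use you would await `get_urls_from_esgf` in its place.

```python
import asyncio


async def fake_get_urls(iids: list[str]) -> dict[str, list[str]]:
    """Stand-in for get_urls_from_esgf; returns placeholder URLs, not real ones."""
    await asyncio.sleep(0)  # yield to the event loop as a real request would
    return {iid: [f"https://example.org/{iid}.nc"] for iid in iids}


async def main() -> dict[str, list[str]]:
    iids = [
        "CMIP6.CMIP.CSIRO-ARCCSS.ACCESS-CM2.historical.r1i1p1f1.SImon.sifb.gn.v20200817"
    ]
    # In real use: return await get_urls_from_esgf(iids)
    return await fake_get_urls(iids)


# asyncio.run creates an event loop, runs main() to completion, and closes the loop.
url_dict = asyncio.run(main())
print(url_dict)
```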

            

Raw data

            {
    "_id": null,
    "home_page": null,
    "name": "pangeo-forge-esgf",
    "maintainer": null,
    "docs_url": null,
    "requires_python": ">=3.9",
    "maintainer_email": null,
    "keywords": "pangeo, data, esgf",
    "author": null,
    "author_email": "Julius Busecke <julius@ldeo.columbia.edu>",
    "download_url": "https://files.pythonhosted.org/packages/86/30/8ecce769206c8783a8ec665a0776617d7663ed158507d67275665954c2e4/pangeo_forge_esgf-0.3.1.tar.gz",
    "platform": null,
    "description": "# pangeo-forge-esgf\n\nUsing queries to the ESGF API to generate urls and keyword arguments for receipe generation in pangeo-forge\n\n## Install\nYou can install pangeo-forge-esgf via pip:\n```\npip install pangeo-forge-esgf\n```\n\nIf you want all the required dependencies for testing and development simply do:\n```\npip install pangeo-forge-esgf[dev]\n```\n\n## Parsing a list of instance ids using wildcards\n\nPangeo forge recipes require the user to provide exact instance_id's for the datasets they want to be processed. Discovering these with the [web search](https://esgf-node.llnl.gov/search/cmip6/) can become cumbersome, especially when dealing with a large number of members/models etc.\n\n`pangeo-forge-esgf` provides some functions to query the ESGF API based on instance_id values with wildcards.\n\nFor example if you want to find all the zonal (`uo`) and meridonal (`vo`) velocities available for the `lgm` experiment of PMIP, you can do:\n\n```python\nfrom pangeo_forge_esgf.parsing import parse_instance_ids\nparse_iids = [\n    \"CMIP6.PMIP.*.*.lgm.*.*.[uo, vo].*.*\",\n]\n# Comma separated values in square brackets will be expanded and the above is equivalent to:\n# parse_iids = [\n#     \"CMIP6.PMIP.*.*.lgm.*.*.[uo, vo].*.*\", # this is equivalent to passing\n#     \"CMIP6.PMIP.*.*.lgm.*.*.vo.*.*\",\n# ]\niids = []\nfor piid in parse_iids:\n    iids.extend(parse_instance_ids(piid))\niids\n```\n\nand you will get:\n\n```\n['CMIP6.PMIP.MIROC.MIROC-ES2L.lgm.r1i1p1f2.Omon.uo.gn.v20191002',\n 'CMIP6.PMIP.AWI.AWI-ESM-1-1-LR.lgm.r1i1p1f1.Odec.uo.gn.v20200212',\n 'CMIP6.PMIP.AWI.AWI-ESM-1-1-LR.lgm.r1i1p1f1.Omon.uo.gn.v20200212',\n 'CMIP6.PMIP.MIROC.MIROC-ES2L.lgm.r1i1p1f2.Omon.uo.gr1.v20200911',\n 'CMIP6.PMIP.MPI-M.MPI-ESM1-2-LR.lgm.r1i1p1f1.Omon.uo.gn.v20200909',\n 'CMIP6.PMIP.AWI.AWI-ESM-1-1-LR.lgm.r1i1p1f1.Omon.vo.gn.v20200212',\n 'CMIP6.PMIP.MIROC.MIROC-ES2L.lgm.r1i1p1f2.Omon.vo.gn.v20191002',\n 
'CMIP6.PMIP.AWI.AWI-ESM-1-1-LR.lgm.r1i1p1f1.Odec.vo.gn.v20200212',\n 'CMIP6.PMIP.MIROC.MIROC-ES2L.lgm.r1i1p1f2.Omon.vo.gr1.v20200911',\n 'CMIP6.PMIP.MPI-M.MPI-ESM1-2-LR.lgm.r1i1p1f1.Omon.vo.gn.v20190710']\n```\n\nEventually I hope I can leverage this functionality to handle user requests in PRs that add wildcard instance_ids, but for now this might be helpful to manually construct lists of instance_ids to submit to a pangeo-forge feedstock.\n\n## Generating PGF recipe input (urls) from instance_ids\n\n```python\nfrom pangeo_forge_esgf import get_urls_from_esgf\niids = ['CMIP6.CMIP.CSIRO-ARCCSS.ACCESS-CM2.historical.r1i1p1f1.SImon.sifb.gn.v20200817']\nurl_dict = await get_urls_from_esgf(iids)\nurl_dict['CMIP6.CMIP.CSIRO-ARCCSS.ACCESS-CM2.historical.r1i1p1f1.SImon.sifb.gn.v20200817']\n```\n\ngives\n\n```\n100%|\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588| 5/5 [00:01<00:00,  4.98it/s]\nProcessing responses\nProcessing responses: Expected files per iid\nProcessing responses: Check for missing iids\nProcessing responses: Flatten results\nProcessing responses: Group results\nFind responsive urls\n100%|\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588| 1/1 [00:00<00:00,  3.25it/s]\n['https://esgf-data1.llnl.gov/thredds/fileServer/css03_data/CMIP6/CMIP/CSIRO-ARCCSS/ACCESS-CM2/historical/r1i1p1f1/SImon/sifb/gn/v20200817/sifb_SImon_ACCESS-CM2_historical_r1i1p1f1_gn_185001-201412.nc']\n```\n\nor if you want to see detaile debugging statements\n\n```python\nfrom pangeo_forge_esgf import get_urls_from_esgf, setup_logging\nsetup_logging('DEBUG')\niids = ['CMIP6.CMIP.CSIRO-ARCCSS.ACCESS-CM2.historical.r1i1p1f1.SImon.sifb.gn.v20200817']\nurl_dict = await get_urls_from_esgf(iids)\nurl_dict['CMIP6.CMIP.CSIRO-ARCCSS.ACCESS-CM2.historical.r1i1p1f1.SImon.sifb.gn.v20200817']\n```\n",
    "bugtrack_url": null,
    "license": "Apache-2.0",
    "summary": "Using queries to the ESGF API to generate urls and keyword arguments for receipe generation in pangeo-forge",
    "version": "0.3.1",
    "project_urls": {
        "Homepage": "https://github.com/jbusecke/pangeo-forge-esgf",
        "Tracker": "https://github.com/jbusecke/pangeo-forge-esgf/issues"
    },
    "split_keywords": [
        "pangeo",
        " data",
        " esgf"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "fbd461ba41fe0b7b12de3e3ae34753db39e2773e2275beba0f2a4b4726ab79b0",
                "md5": "4ce5378be7080df931eebb6f5f6649e0",
                "sha256": "fd84c5c4857ce376a8d4238c9c3b1180f9f4cc52caf1cf250f4fc512c2aa91f5"
            },
            "downloads": -1,
            "filename": "pangeo_forge_esgf-0.3.1-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "4ce5378be7080df931eebb6f5f6649e0",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.9",
            "size": 18723,
            "upload_time": "2024-05-31T16:23:16",
            "upload_time_iso_8601": "2024-05-31T16:23:16.859181Z",
            "url": "https://files.pythonhosted.org/packages/fb/d4/61ba41fe0b7b12de3e3ae34753db39e2773e2275beba0f2a4b4726ab79b0/pangeo_forge_esgf-0.3.1-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "86308ecce769206c8783a8ec665a0776617d7663ed158507d67275665954c2e4",
                "md5": "3b66f87011105d67ff800156b661724b",
                "sha256": "2ccd659b29e43071ee4ca5075e2cb7bf7af449fe77ceb4ce3cd28dd410ca522e"
            },
            "downloads": -1,
            "filename": "pangeo_forge_esgf-0.3.1.tar.gz",
            "has_sig": false,
            "md5_digest": "3b66f87011105d67ff800156b661724b",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.9",
            "size": 21449,
            "upload_time": "2024-05-31T16:23:18",
            "upload_time_iso_8601": "2024-05-31T16:23:18.629175Z",
            "url": "https://files.pythonhosted.org/packages/86/30/8ecce769206c8783a8ec665a0776617d7663ed158507d67275665954c2e4/pangeo_forge_esgf-0.3.1.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2024-05-31 16:23:18",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "jbusecke",
    "github_project": "pangeo-forge-esgf",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": true,
    "lcname": "pangeo-forge-esgf"
}
        