Name | pangeo-forge-esgf |
Version | 0.3.0 |
home_page | None |
Summary | Using queries to the ESGF API to generate URLs and keyword arguments for recipe generation in pangeo-forge |
upload_time | 2024-04-30 15:33:30 |
maintainer | None |
docs_url | None |
author | None |
requires_python | >=3.9 |
license | Apache-2.0 |
keywords | pangeo, data, esgf |
requirements | No requirements were recorded. |
Travis-CI | No Travis. |
coveralls test coverage | No coveralls. |
# pangeo-forge-esgf
Using queries to the ESGF API to generate URLs and keyword arguments for recipe generation in pangeo-forge.
## Install
You can install pangeo-forge-esgf via pip:
```
pip install pangeo-forge-esgf
```
To also install the dependencies needed for testing and development, run:
```
pip install "pangeo-forge-esgf[dev]"
```
## Parsing a list of instance ids using wildcards
Pangeo Forge recipes require the user to provide the exact instance_ids of the datasets they want processed. Discovering these with the [web search](https://esgf-node.llnl.gov/search/cmip6/) can become cumbersome, especially when dealing with a large number of members, models, etc.
`pangeo-forge-esgf` provides some functions to query the ESGF API based on instance_id values with wildcards.
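For orientation, a CMIP6 instance_id is a dot-separated string of ten facets. The sketch below splits one into named parts, assuming the standard CMIP6 facet order. The helper `split_iid` and the `CMIP6_FACETS` list are illustrative only and are not part of `pangeo-forge-esgf`.

```python
# Facet names follow the CMIP6 data reference syntax. This helper is an
# illustration only; it is NOT part of pangeo-forge-esgf.
CMIP6_FACETS = [
    "mip_era", "activity_id", "institution_id", "source_id", "experiment_id",
    "member_id", "table_id", "variable_id", "grid_label", "version",
]


def split_iid(iid: str) -> dict:
    """Split a fully specified instance_id into a facet-name -> value mapping."""
    parts = iid.split(".")
    if len(parts) != len(CMIP6_FACETS):
        raise ValueError(f"expected {len(CMIP6_FACETS)} facets, got {len(parts)}")
    return dict(zip(CMIP6_FACETS, parts))


facets = split_iid("CMIP6.PMIP.MIROC.MIROC-ES2L.lgm.r1i1p1f2.Omon.uo.gn.v20191002")
print(facets["source_id"], facets["variable_id"])
```

Each `*` in a wildcard instance_id stands in for one of these facets.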
For example, if you want to find all the zonal (`uo`) and meridional (`vo`) velocities available for the `lgm` experiment of PMIP, you can do:
```python
from pangeo_forge_esgf.parsing import parse_instance_ids
parse_iids = [
    "CMIP6.PMIP.*.*.lgm.*.*.[uo, vo].*.*",
]
# Comma-separated values in square brackets are expanded, so the above is equivalent to:
# parse_iids = [
#     "CMIP6.PMIP.*.*.lgm.*.*.uo.*.*",
#     "CMIP6.PMIP.*.*.lgm.*.*.vo.*.*",
# ]
iids = []
for piid in parse_iids:
    iids.extend(parse_instance_ids(piid))
iids
```
```
and you will get:
```
['CMIP6.PMIP.MIROC.MIROC-ES2L.lgm.r1i1p1f2.Omon.uo.gn.v20191002',
'CMIP6.PMIP.AWI.AWI-ESM-1-1-LR.lgm.r1i1p1f1.Odec.uo.gn.v20200212',
'CMIP6.PMIP.AWI.AWI-ESM-1-1-LR.lgm.r1i1p1f1.Omon.uo.gn.v20200212',
'CMIP6.PMIP.MIROC.MIROC-ES2L.lgm.r1i1p1f2.Omon.uo.gr1.v20200911',
'CMIP6.PMIP.MPI-M.MPI-ESM1-2-LR.lgm.r1i1p1f1.Omon.uo.gn.v20200909',
'CMIP6.PMIP.AWI.AWI-ESM-1-1-LR.lgm.r1i1p1f1.Omon.vo.gn.v20200212',
'CMIP6.PMIP.MIROC.MIROC-ES2L.lgm.r1i1p1f2.Omon.vo.gn.v20191002',
'CMIP6.PMIP.AWI.AWI-ESM-1-1-LR.lgm.r1i1p1f1.Odec.vo.gn.v20200212',
'CMIP6.PMIP.MIROC.MIROC-ES2L.lgm.r1i1p1f2.Omon.vo.gr1.v20200911',
'CMIP6.PMIP.MPI-M.MPI-ESM1-2-LR.lgm.r1i1p1f1.Omon.vo.gn.v20190710']
```
Eventually, I hope to leverage this functionality to handle user requests in PRs that add wildcard instance_ids. For now, it can help you manually construct lists of instance_ids to submit to a pangeo-forge feedstock.
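To make the square-bracket shorthand concrete, here is a minimal sketch of how such a pattern could be expanded into plain wildcard patterns. The function `expand_brackets` is hypothetical and written for illustration only. It does not query the ESGF API and is not necessarily how `parse_instance_ids` handles brackets internally.

```python
import itertools
import re


def expand_brackets(pattern: str) -> list:
    """Expand every `[a, b, ...]` group into one pattern per combination.

    Hypothetical helper for illustration; not part of pangeo-forge-esgf.
    """
    # re.split with a capturing group returns literal text at even indices
    # and the contents of each bracket group at odd indices.
    pieces = re.split(r"\[([^\]]*)\]", pattern)
    literals = pieces[0::2]
    choices = [[value.strip() for value in group.split(",")] for group in pieces[1::2]]

    expanded = []
    for combo in itertools.product(*choices):
        parts = [literals[0]]
        for value, literal in zip(combo, literals[1:]):
            parts.append(value)
            parts.append(literal)
        expanded.append("".join(parts))
    return expanded


expanded = expand_brackets("CMIP6.PMIP.*.*.lgm.*.*.[uo, vo].*.*")
for item in expanded:
    print(item)
```

A pattern without brackets is returned unchanged as a single-element list, so the helper can be applied to every entry of a mixed list.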
## Generating PGF recipe input (urls) from instance_ids
```python
from pangeo_forge_esgf import get_urls_from_esgf
iids = ['CMIP6.CMIP.CSIRO-ARCCSS.ACCESS-CM2.historical.r1i1p1f1.SImon.sifb.gn.v20200817']
# top-level `await` needs an async-aware shell such as Jupyter or IPython
url_dict = await get_urls_from_esgf(iids)
url_dict['CMIP6.CMIP.CSIRO-ARCCSS.ACCESS-CM2.historical.r1i1p1f1.SImon.sifb.gn.v20200817']
```
gives
```
100%|██████████| 5/5 [00:01<00:00, 4.98it/s]
Processing responses
Processing responses: Expected files per iid
Processing responses: Check for missing iids
Processing responses: Flatten results
Processing responses: Group results
Find responsive urls
100%|██████████| 1/1 [00:00<00:00, 3.25it/s]
['https://esgf-data1.llnl.gov/thredds/fileServer/css03_data/CMIP6/CMIP/CSIRO-ARCCSS/ACCESS-CM2/historical/r1i1p1f1/SImon/sifb/gn/v20200817/sifb_SImon_ACCESS-CM2_historical_r1i1p1f1_gn_185001-201412.nc']
```
Or, if you want to see detailed debugging statements:
```python
from pangeo_forge_esgf import get_urls_from_esgf, setup_logging
setup_logging('DEBUG')
iids = ['CMIP6.CMIP.CSIRO-ARCCSS.ACCESS-CM2.historical.r1i1p1f1.SImon.sifb.gn.v20200817']
url_dict = await get_urls_from_esgf(iids)
url_dict['CMIP6.CMIP.CSIRO-ARCCSS.ACCESS-CM2.historical.r1i1p1f1.SImon.sifb.gn.v20200817']
```
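As the output above shows, the result maps each instance_id to a list of file URLs. The following sketch inspects such a dictionary by extracting the file name from each URL. The sample data is copied from the output above so the example runs without network access, and `filenames_per_iid` is a hypothetical helper, not a function provided by `pangeo-forge-esgf`.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse

# Sample data copied from the output shown above; in practice this would be
# the dictionary returned by get_urls_from_esgf.
url_dict = {
    "CMIP6.CMIP.CSIRO-ARCCSS.ACCESS-CM2.historical.r1i1p1f1.SImon.sifb.gn.v20200817": [
        "https://esgf-data1.llnl.gov/thredds/fileServer/css03_data/CMIP6/CMIP/"
        "CSIRO-ARCCSS/ACCESS-CM2/historical/r1i1p1f1/SImon/sifb/gn/v20200817/"
        "sifb_SImon_ACCESS-CM2_historical_r1i1p1f1_gn_185001-201412.nc"
    ],
}


def filenames_per_iid(url_dict):
    """Hypothetical helper: map each instance_id to the file names in its URLs."""
    return {
        iid: [PurePosixPath(urlparse(url).path).name for url in urls]
        for iid, urls in url_dict.items()
    }


filenames = filenames_per_iid(url_dict)
for iid, names in filenames.items():
    print(iid, len(names), names)
```

A quick check like this makes it easy to spot an instance_id that came back with fewer files than expected before the URLs are passed to a recipe.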
Raw data
{
"_id": null,
"home_page": null,
"name": "pangeo-forge-esgf",
"maintainer": null,
"docs_url": null,
"requires_python": ">=3.9",
"maintainer_email": null,
"keywords": "pangeo, data, esgf",
"author": null,
"author_email": "Julius Busecke <julius@ldeo.columbia.edu>",
"download_url": "https://files.pythonhosted.org/packages/3e/51/ef09c32470911ce1322ce41481bd1b5349c51925ebf0a3d555296ac59be9/pangeo_forge_esgf-0.3.0.tar.gz",
"platform": null,
"description": "# pangeo-forge-esgf\n\nUsing queries to the ESGF API to generate urls and keyword arguments for receipe generation in pangeo-forge\n\n## Install\nYou can install pangeo-forge-esgf via pip:\n```\npip install pangeo-forge-esgf\n```\n\nIf you want all the required dependencies for testing and development simply do:\n```\npip install pangeo-forge-esgf[dev]\n```\n\n## Parsing a list of instance ids using wildcards\n\nPangeo forge recipes require the user to provide exact instance_id's for the datasets they want to be processed. Discovering these with the [web search](https://esgf-node.llnl.gov/search/cmip6/) can become cumbersome, especially when dealing with a large number of members/models etc.\n\n`pangeo-forge-esgf` provides some functions to query the ESGF API based on instance_id values with wildcards.\n\nFor example if you want to find all the zonal (`uo`) and meridonal (`vo`) velocities available for the `lgm` experiment of PMIP, you can do:\n\n```python\nfrom pangeo_forge_esgf.parsing import parse_instance_ids\nparse_iids = [\n \"CMIP6.PMIP.*.*.lgm.*.*.[uo, vo].*.*\",\n]\n# Comma separated values in square brackets will be expanded and the above is equivalent to:\n# parse_iids = [\n# \"CMIP6.PMIP.*.*.lgm.*.*.[uo, vo].*.*\", # this is equivalent to passing\n# \"CMIP6.PMIP.*.*.lgm.*.*.vo.*.*\",\n# ]\niids = []\nfor piid in parse_iids:\n iids.extend(parse_instance_ids(piid))\niids\n```\n\nand you will get:\n\n```\n['CMIP6.PMIP.MIROC.MIROC-ES2L.lgm.r1i1p1f2.Omon.uo.gn.v20191002',\n 'CMIP6.PMIP.AWI.AWI-ESM-1-1-LR.lgm.r1i1p1f1.Odec.uo.gn.v20200212',\n 'CMIP6.PMIP.AWI.AWI-ESM-1-1-LR.lgm.r1i1p1f1.Omon.uo.gn.v20200212',\n 'CMIP6.PMIP.MIROC.MIROC-ES2L.lgm.r1i1p1f2.Omon.uo.gr1.v20200911',\n 'CMIP6.PMIP.MPI-M.MPI-ESM1-2-LR.lgm.r1i1p1f1.Omon.uo.gn.v20200909',\n 'CMIP6.PMIP.AWI.AWI-ESM-1-1-LR.lgm.r1i1p1f1.Omon.vo.gn.v20200212',\n 'CMIP6.PMIP.MIROC.MIROC-ES2L.lgm.r1i1p1f2.Omon.vo.gn.v20191002',\n 
'CMIP6.PMIP.AWI.AWI-ESM-1-1-LR.lgm.r1i1p1f1.Odec.vo.gn.v20200212',\n 'CMIP6.PMIP.MIROC.MIROC-ES2L.lgm.r1i1p1f2.Omon.vo.gr1.v20200911',\n 'CMIP6.PMIP.MPI-M.MPI-ESM1-2-LR.lgm.r1i1p1f1.Omon.vo.gn.v20190710']\n```\n\nEventually I hope I can leverage this functionality to handle user requests in PRs that add wildcard instance_ids, but for now this might be helpful to manually construct lists of instance_ids to submit to a pangeo-forge feedstock.\n\n## Generating PGF recipe input (urls) from instance_ids\n\n```python\nfrom pangeo_forge_esgf import get_urls_from_esgf\niids = ['CMIP6.CMIP.CSIRO-ARCCSS.ACCESS-CM2.historical.r1i1p1f1.SImon.sifb.gn.v20200817']\nurl_dict = await get_urls_from_esgf(iids)\nurl_dict['CMIP6.CMIP.CSIRO-ARCCSS.ACCESS-CM2.historical.r1i1p1f1.SImon.sifb.gn.v20200817']\n```\n\ngives\n\n```\n100%|\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588| 5/5 [00:01<00:00, 4.98it/s]\nProcessing responses\nProcessing responses: Expected files per iid\nProcessing responses: Check for missing iids\nProcessing responses: Flatten results\nProcessing responses: Group results\nFind responsive urls\n100%|\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588| 1/1 [00:00<00:00, 3.25it/s]\n['https://esgf-data1.llnl.gov/thredds/fileServer/css03_data/CMIP6/CMIP/CSIRO-ARCCSS/ACCESS-CM2/historical/r1i1p1f1/SImon/sifb/gn/v20200817/sifb_SImon_ACCESS-CM2_historical_r1i1p1f1_gn_185001-201412.nc']\n```\n\nor if you want to see detaile debugging statements\n\n```python\nfrom pangeo_forge_esgf import get_urls_from_esgf, setup_logging\nsetup_logging('DEBUG')\niids = ['CMIP6.CMIP.CSIRO-ARCCSS.ACCESS-CM2.historical.r1i1p1f1.SImon.sifb.gn.v20200817']\nurl_dict = await get_urls_from_esgf(iids)\nurl_dict['CMIP6.CMIP.CSIRO-ARCCSS.ACCESS-CM2.historical.r1i1p1f1.SImon.sifb.gn.v20200817']\n```\n",
"bugtrack_url": null,
"license": "Apache-2.0",
"summary": "Using queries to the ESGF API to generate urls and keyword arguments for receipe generation in pangeo-forge",
"version": "0.3.0",
"project_urls": {
"Homepage": "https://github.com/jbusecke/pangeo-forge-esgf",
"Tracker": "https://github.com/jbusecke/pangeo-forge-esgf/issues"
},
"split_keywords": [
"pangeo",
" data",
" esgf"
],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "e6794395b445b09856ffc2714e8faee5f563b68bfd7653bf8460be1e96881ee3",
"md5": "3fcaf3f696e4273b3fe1439434c215ee",
"sha256": "3cba7bad3e12f8fcb7010bce2e6136fa65d6b14ae9ff268e5927b5e1cc37298e"
},
"downloads": -1,
"filename": "pangeo_forge_esgf-0.3.0-py3-none-any.whl",
"has_sig": false,
"md5_digest": "3fcaf3f696e4273b3fe1439434c215ee",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.9",
"size": 17676,
"upload_time": "2024-04-30T15:33:29",
"upload_time_iso_8601": "2024-04-30T15:33:29.319502Z",
"url": "https://files.pythonhosted.org/packages/e6/79/4395b445b09856ffc2714e8faee5f563b68bfd7653bf8460be1e96881ee3/pangeo_forge_esgf-0.3.0-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "3e51ef09c32470911ce1322ce41481bd1b5349c51925ebf0a3d555296ac59be9",
"md5": "718bbf2c71c67ba40b4c2db52ae0ed1f",
"sha256": "2635e2128dd2f87a532eac91a114028183313ccdd23eb4fb0abcf5dcee061be1"
},
"downloads": -1,
"filename": "pangeo_forge_esgf-0.3.0.tar.gz",
"has_sig": false,
"md5_digest": "718bbf2c71c67ba40b4c2db52ae0ed1f",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.9",
"size": 20754,
"upload_time": "2024-04-30T15:33:30",
"upload_time_iso_8601": "2024-04-30T15:33:30.510683Z",
"url": "https://files.pythonhosted.org/packages/3e/51/ef09c32470911ce1322ce41481bd1b5349c51925ebf0a3d555296ac59be9/pangeo_forge_esgf-0.3.0.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2024-04-30 15:33:30",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "jbusecke",
"github_project": "pangeo-forge-esgf",
"travis_ci": false,
"coveralls": false,
"github_actions": true,
"lcname": "pangeo-forge-esgf"
}