# pandasGWAS: a Python package for easy retrieval of GWAS Catalog data
## Cite this work
Cao, T., Li, A. & Huang, Y. pandasGWAS: a Python package for easy retrieval of GWAS catalog data. BMC Genomics 24, 238 (2023). https://doi.org/10.1186/s12864-023-09340-2
## News
Starting from V1.2.0, pandasGWAS requires Python 3.11 or later.
Starting from V0.99.18, pandasGWAS can cache API requests in memory.
Starting from V0.99.14, pandasGWAS can retrieve the summary statistics of the GWAS Catalog.
## Installation
`pip install pandasgwas`
## Example
Get studies related to triple-negative breast cancer:
```Python
from pandasgwas import get_studies
studies = get_studies(efo_trait = 'triple-negative breast cancer')
studies.studies[0:4]
# initialSampleSize gxe gxg snpCount qualifier imputed pooled studyDesignComment accessionId fullPvalueSet userRequested platforms ancestries genotypingTechnologies replicationSampleSize diseaseTrait.trait publicationInfo.pubmedId publicationInfo.publicationDate publicationInfo.publication publicationInfo.title publicationInfo.author.fullname publicationInfo.author.orcid
#0 1,529 European ancestry cases, 3,399 European ... False False NaN None True False None GCST002305 False False [{'manufacturer': 'Illumina'}] [{'type': 'replication', 'numberOfIndividuals'... [{'genotypingTechnology': 'Genome-wide genotyp... 2,148 European ancestry cases, 1,309 European ... Breast cancer (estrogen-receptor negative, pro... 24325915 2013-12-09 Carcinogenesis Genome-wide association study identifies 25 kn... Purrington KS 0000-0002-5710-1692
#1 8,602 European ancestry triple negative cases,... False False 9.700e+06 ~ True False None GCST010100 False True [{'manufacturer': 'Illumina'}] [{'type': 'initial', 'numberOfIndividuals': 11... [{'genotypingTechnology': 'Genome-wide genotyp... NA Breast cancer (estrogen-receptor negative, pro... 32424353 2020-05-18 Nat Genet Genome-wide association study identifies 32 no... Zhang H None
#2 5,631 European ancestry individuals False False 1.000e+07 None True False None GCST90029052 False False [] [{'type': 'initial', 'numberOfIndividuals': 56... [{'genotypingTechnology': 'Genome-wide genotyp... NA 15-year breast cancer-specific survival (ER ne... 34407845 2021-08-18 Breast Cancer Res Association of germline genetic variants with ... Morra A None
```
Find variants associated with study GCST002305:
```Python
from pandasgwas import get_variants
variants = get_variants(study_id='GCST002305')
variants.variants[['rsId', 'functionalClass']]
# rsId functionalClass
# 0 rs4245739 3_prime_UTR_variant
# 1 rs2363956 missense_variant
# 2 rs10069690 intron_variant
# 3 rs3757318 intron_variant
# 4 rs10771399 intergenic_variant
```
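The table printed above can be processed with ordinary pandas operations. The sketch below rebuilds those five rows as a stand-in DataFrame named `demo` (a hypothetical name, used so the example runs offline) and counts variants per functional class. With a live query, `variants.variants` would take the place of `demo`, assuming it behaves like a regular pandas DataFrame.

```Python
import pandas as pd

# Stand-in for variants.variants, copied from the output printed above.
demo = pd.DataFrame({
    "rsId": ["rs4245739", "rs2363956", "rs10069690", "rs3757318", "rs10771399"],
    "functionalClass": [
        "3_prime_UTR_variant",
        "missense_variant",
        "intron_variant",
        "intron_variant",
        "intergenic_variant",
    ],
})

# Count how many variants fall into each functional class.
class_counts = demo["functionalClass"].value_counts()

# Collect the rsIds of the intronic variants only.
intronic = demo.loc[demo["functionalClass"] == "intron_variant", "rsId"].tolist()

print(class_counts)
print(intronic)
```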
Query results can be combined with operators. Besides the plus sign (`+`), the symbols `-`, `&`, `|`, and `^` perform the corresponding set operations on data objects of the same type.
```Python
from pandasgwas.get_studies import get_studies
study1 = get_studies(reported_trait='Suicide risk')
study2 = get_studies(reported_trait="Dupuytren's disease")
study3 = get_studies(reported_trait="Triglycerides")
study4 = get_studies(reported_trait="Retinal vascular caliber")
study5 = get_studies(reported_trait="Non-small cell lung cancer (survival)")
all_studies = study1 + study2 + study3 + study4 + study5
```
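As an analogy only, the set operations behind `-`, `&`, `|`, and `^` can be illustrated with plain pandas on two lists of accession IDs. This sketch assumes the symbols follow Python's usual set conventions (difference, intersection, union, symmetric difference); the accession IDs are made up, and the package documentation remains the authority on how pandasGWAS objects are actually combined, including how `+` differs from `|`.

```Python
import pandas as pd

# Hypothetical accession IDs standing in for two query results.
a = pd.Index(["GCST000001", "GCST000002", "GCST000003"])
b = pd.Index(["GCST000003", "GCST000004"])

union = a.union(b)                        # analogous to |
intersection = a.intersection(b)          # analogous to &
difference = a.difference(b)              # analogous to -
sym_diff = a.symmetric_difference(b)      # analogous to ^

print(list(union))
print(list(intersection))
print(list(difference))
print(list(sym_diff))
```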
## Summary statistics
>It’s important to note that the data available on the FTP and REST API are out of sync. The FTP is updated nightly with any newly ingested data. Currently, we’re unable to release more data to the REST API as it’s undergoing a complete redevelopment to help us cope with the tremendous growth in summary statistics data.

Because of this note on the official website, pandasGWAS provides a programming interface that queries summary statistics from the FTP data.
An example to get started:
```Python
from pandasgwas.summary_statistics import search, browser, download, parse
# Search the index by PubMed_id, study_accession_id, and EFO_trait_id. The matching index entries are returned as a DataFrame.
search_DF = search(PubMed_id='27918534', study_accession_id='GCST003966')
# Open the data directory for the indexed results in the browser.
browser(search_DF)
# Download the summary statistics for the indexed results to $Home/pandasgwas_home.
download(search_DF)
# Load the downloaded data from $Home/pandasgwas_home and convert it into a DataFrame.
df = parse(search_DF)
```
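Once the summary statistics are loaded into a DataFrame, a common next step is to keep only genome-wide significant rows. The sketch below uses a small stand-in table rather than real data, and the column names `variant_id` and `p_value` are assumptions; the actual columns depend on the downloaded file and should be checked with `df.columns` first.

```Python
import pandas as pd

# Stand-in for the DataFrame returned by parse(); values and column names are assumed.
sumstats = pd.DataFrame({
    "variant_id": ["rs1", "rs2", "rs3"],
    "p_value": [3e-9, 0.02, 4.9e-8],
})

# Conventional genome-wide significance threshold.
GENOME_WIDE = 5e-8

# Keep significant rows, most significant first.
hits = sumstats[sumstats["p_value"] < GENOME_WIDE].sort_values("p_value")

print(hits)
```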
## Dependencies
python: 3.11
pandas: 1.5.3
requests: 2.31.0
progressbar2: 4.2.0
## Documentation
See [pandasGWAS Documentation](https://caotianze.github.io/pandasgwas/)
## Licensing information
### Source code
MIT License
### Data from NHGRI-EBI GWAS Catalog
The NHGRI-EBI GWAS Catalog and all its contents are available under the general [Terms of Use for EMBL-EBI Services](https://www.ebi.ac.uk/about/terms-of-use). Summary statistics are made available under [CC0](https://creativecommons.org/publicdomain/zero/1.0/) unless [otherwise stated](https://www.ebi.ac.uk/gwas/docs/faq#faq-H7).
## Development environment
OS: Windows 10 Professional
IDE: PyCharm 2022.1 (Community Edition)
## Similar projects
R package [gwasrapidd](https://github.com/ramiromagno/gwasrapidd) by Ramiro Magno