pandasgwas


Name: pandasgwas  
Version: 1.2.2  
Summary: A Python package for easy retrieval of GWAS Catalog data  
Author: Cao Tianze (hnrcao@qq.com)  
License: MIT  
Requires Python: >=3.11  
Keywords: gwas, genomics, snp, bioinformatics, pandas  
Home page: https://github.com/caotianze/pandasgwas  
Bug tracker: https://github.com/caotianze/pandasgwas/issues  
Upload time: 2024-03-24 07:30:36
# pandasGWAS: a Python package for easy retrieval of GWAS Catalog data
## Cite this work
Cao, T., Li, A. & Huang, Y. pandasGWAS: a Python package for easy retrieval of GWAS catalog data. BMC Genomics 24, 238 (2023). https://doi.org/10.1186/s12864-023-09340-2
## News
Starting from v1.2.0, pandasGWAS raised its minimum supported Python version to 3.11.  
Starting from v0.99.18, pandasGWAS can cache API requests in memory.  
Starting from v0.99.14, pandasGWAS can retrieve summary statistics from the GWAS Catalog.
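pandasGWAS's own caching code is not reproduced here. As a minimal sketch of the general idea behind in-memory request caching, Python's standard `functools.lru_cache` can memoize a fetch function so that repeated requests for the same URL are answered from memory. `fetch_json` and the `example.org` URL are hypothetical placeholders; no network access happens.

```python
from functools import lru_cache

# Hypothetical stand-in for an HTTP GET against the GWAS Catalog REST API.
# It only records each URL it is asked for, so we can count real "requests".
CALLS = []


def fetch_json(url: str) -> dict:
    CALLS.append(url)
    return {"url": url}


@lru_cache(maxsize=None)
def cached_fetch(url: str) -> dict:
    # The first call for a URL runs fetch_json; later identical calls are
    # served from the in-memory cache. Note that the cached dict is shared,
    # so callers should not mutate it.
    return fetch_json(url)


cached_fetch("https://example.org/studies/GCST002305")
cached_fetch("https://example.org/studies/GCST002305")
print(len(CALLS))  # prints 1
```

The cache lives only as long as the Python process, which matches the "in memory" wording above: restarting the interpreter starts with an empty cache.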
## Installation
`pip install pandasgwas`
## Example
Get studies related to triple-negative breast cancer:
```Python
from pandasgwas import get_studies
studies = get_studies(efo_trait='triple-negative breast cancer')
studies.studies[0:4]
#                  initialSampleSize                    gxe    gxg   snpCount  qualifier  imputed  pooled studyDesignComment  accessionId   fullPvalueSet  userRequested            platforms                                ancestries                                   genotypingTechnologies                             replicationSampleSize                                diseaseTrait.trait                 publicationInfo.pubmedId publicationInfo.publicationDate publicationInfo.publication               publicationInfo.title                publicationInfo.author.fullname publicationInfo.author.orcid
#0  1,529 European ancestry cases, 3,399 European ...  False  False        NaN    None     True     False        None           GCST002305      False          False      [{'manufacturer': 'Illumina'}]  [{'type': 'replication', 'numberOfIndividuals'...  [{'genotypingTechnology': 'Genome-wide genotyp...  2,148 European ancestry cases, 1,309 European ...  Breast cancer (estrogen-receptor negative, pro...         24325915                    2013-12-09                    Carcinogenesis      Genome-wide association study identifies 25 kn...           Purrington KS              0000-0002-5710-1692    
#1  8,602 European ancestry triple negative cases,...  False  False  9.700e+06       ~     True     False        None           GCST010100      False           True      [{'manufacturer': 'Illumina'}]  [{'type': 'initial', 'numberOfIndividuals': 11...  [{'genotypingTechnology': 'Genome-wide genotyp...                                                 NA  Breast cancer (estrogen-receptor negative, pro...         32424353                    2020-05-18                         Nat Genet      Genome-wide association study identifies 32 no...                 Zhang H                             None    
#2                5,631 European ancestry individuals  False  False  1.000e+07    None     True     False        None         GCST90029052      False          False                                  []  [{'type': 'initial', 'numberOfIndividuals': 56...  [{'genotypingTechnology': 'Genome-wide genotyp...                                                 NA  15-year breast cancer-specific survival (ER ne...         34407845                    2021-08-18                 Breast Cancer Res      Association of germline genetic variants with ...                 Morra A                             None
```
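The dot-separated column names in the output above (for example `publicationInfo.pubmedId` and `publicationInfo.author.fullname`) look like the result of flattening nested JSON records. A minimal sketch with `pandas.json_normalize`, assuming the study records are nested dictionaries shaped roughly like the ones below (the nesting is an assumption; the values are copied from the output above):

```python
import pandas as pd

# Two nested records loosely shaped like GWAS Catalog study entries.
# Values come from the output above; the exact nesting is assumed.
records = [
    {
        "accessionId": "GCST002305",
        "publicationInfo": {
            "pubmedId": "24325915",
            "publicationDate": "2013-12-09",
            "publication": "Carcinogenesis",
            "author": {"fullname": "Purrington KS"},
        },
    },
    {
        "accessionId": "GCST010100",
        "publicationInfo": {
            "pubmedId": "32424353",
            "publicationDate": "2020-05-18",
            "publication": "Nat Genet",
            "author": {"fullname": "Zhang H"},
        },
    },
]

# json_normalize flattens nested dicts into "parent.child" column names.
flat = pd.json_normalize(records)
print(flat[["accessionId", "publicationInfo.pubmedId", "publicationInfo.author.fullname"]])

# Flattened columns can then be filtered like any other column,
# e.g. keep studies published from 2015 onwards.
dates = pd.to_datetime(flat["publicationInfo.publicationDate"])
recent = flat[dates >= pd.Timestamp("2015-01-01")]
```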
Find the variants associated with study GCST002305:

```Python
from pandasgwas import get_variants
variants = get_variants(study_id='GCST002305')
variants.variants[['rsId', 'functionalClass']]
#      rsId      functionalClass   
# 0   rs4245739  3_prime_UTR_variant
# 1   rs2363956     missense_variant
# 2  rs10069690       intron_variant
# 3   rs3757318       intron_variant
# 4  rs10771399   intergenic_variant
```
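Assuming `variants.variants` is an ordinary pandas DataFrame (the column selection above suggests it is), standard pandas operations can summarize it. This sketch rebuilds only the five rows shown above by hand so that it runs offline; it counts variants per functional class and picks out the missense variants.

```python
import pandas as pd

# The rows shown above, rebuilt by hand (no network access needed).
variants_df = pd.DataFrame(
    {
        "rsId": ["rs4245739", "rs2363956", "rs10069690", "rs3757318", "rs10771399"],
        "functionalClass": [
            "3_prime_UTR_variant",
            "missense_variant",
            "intron_variant",
            "intron_variant",
            "intergenic_variant",
        ],
    }
)

# Number of variants in each functional class.
counts = variants_df["functionalClass"].value_counts()
print(counts)

# rsIds of variants that alter an amino acid in the encoded protein.
missense = variants_df.loc[
    variants_df["functionalClass"] == "missense_variant", "rsId"
].tolist()
print(missense)  # prints ['rs2363956']
```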
Query results can be combined with operators. Besides the plus sign (`+`), objects of the same type also support `-`, `&`, `|` and `^`, which perform the corresponding set operations (difference, intersection, union and symmetric difference):
```Python
from pandasgwas import get_studies
study1 = get_studies(reported_trait='Suicide risk')
study2 = get_studies(reported_trait="Dupuytren's disease")
study3 = get_studies(reported_trait="Triglycerides")
study4 = get_studies(reported_trait="Retinal vascular caliber")
study5 = get_studies(reported_trait="Non-small cell lung cancer (survival)")
all_studies = study1 + study2 + study3 + study4 + study5
```
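The exact semantics of each operator are defined by pandasGWAS itself. To illustrate how such operators can work, here is a hypothetical `StudySet` class (not pandasGWAS's own class) that implements them with Python's operator methods (`__add__`, `__sub__`, `__and__`, `__or__`, `__xor__`), assuming each study row is identified by its `accessionId`. Treating `+` as a union is also an assumption of this sketch.

```python
from __future__ import annotations

from dataclasses import dataclass

import pandas as pd


@dataclass
class StudySet:
    """Hypothetical container for illustration only; rows keyed by accessionId."""

    studies: pd.DataFrame

    def _ids(self) -> set[str]:
        return set(self.studies["accessionId"])

    def _select(self, other: StudySet, keep: set[str]) -> StudySet:
        # Stack both tables, drop duplicate accessions, keep only the wanted IDs.
        both = pd.concat([self.studies, other.studies]).drop_duplicates("accessionId")
        return StudySet(both[both["accessionId"].isin(keep)].reset_index(drop=True))

    def __add__(self, other: StudySet) -> StudySet:
        # Assumed here to behave like a union of the two result sets.
        return self._select(other, self._ids() | other._ids())

    __or__ = __add__  # union

    def __sub__(self, other: StudySet) -> StudySet:  # difference
        return self._select(other, self._ids() - other._ids())

    def __and__(self, other: StudySet) -> StudySet:  # intersection
        return self._select(other, self._ids() & other._ids())

    def __xor__(self, other: StudySet) -> StudySet:  # symmetric difference
        return self._select(other, self._ids() ^ other._ids())


a = StudySet(pd.DataFrame({"accessionId": ["GCST002305", "GCST010100"]}))
b = StudySet(pd.DataFrame({"accessionId": ["GCST010100", "GCST90029052"]}))
print(sorted((a & b).studies["accessionId"]))  # prints ['GCST010100']
print(sorted((a ^ b).studies["accessionId"]))  # prints ['GCST002305', 'GCST90029052']
```

Keying on an identifier column rather than comparing whole rows keeps the operations well defined even when columns contain unhashable values such as the lists of dicts seen in the `platforms` and `ancestries` columns above.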
## Summary statistics
>It’s important to note that the data available on the FTP and REST API are out of sync. The FTP is updated nightly with any newly ingested data. Currently, we’re unable to release more data to the REST API as it’s undergoing a complete redevelopment to help us cope with the tremendous growth in summary statistics data.

Because of this notice on the GWAS Catalog website, pandasGWAS queries summary statistics from the FTP data rather than from the REST API.  
A quick-start example:
```Python
from pandasgwas.summary_statistics import search, browser, download, parse
# Search the index by PubMed_id, study_accession_id and/or EFO_trait_id; matching index entries are returned as a DataFrame.
search_DF = search(PubMed_id='27918534', study_accession_id='GCST003966')
# Open the data directory of each matching entry in a web browser.
browser(search_DF)
# Download the matching summary statistics files into the pandasgwas_home folder in your home directory.
download(search_DF)
# Load the downloaded files from pandasgwas_home and convert them into a DataFrame.
df = parse(search_DF)
```
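Since `parse` returns a DataFrame, ordinary pandas filtering applies to the loaded summary statistics. A minimal sketch of selecting genome-wide significant variants (p < 5 × 10⁻⁸), assuming the table has `variant_id` and `p_value` columns (these names follow the GWAS Catalog's standard summary statistics format but are an assumption here; older files may use different headers). The tiny tab-separated table is hypothetical toy data so the snippet runs offline.

```python
import io

import pandas as pd

# Hypothetical toy excerpt of a tab-separated summary statistics file.
# Column names "variant_id" and "p_value" are assumptions.
TSV = """variant_id\tchromosome\tbase_pair_location\tp_value
rs1\t1\t1000\t3e-9
rs2\t1\t2000\t0.02
rs3\t2\t3000\t4.9e-8
"""

GENOME_WIDE = 5e-8  # conventional genome-wide significance threshold

df = pd.read_csv(io.StringIO(TSV), sep="\t")

# Keep rows below the threshold, most significant first.
hits = df[df["p_value"] < GENOME_WIDE].sort_values("p_value")
print(hits["variant_id"].tolist())  # prints ['rs1', 'rs3']
```

For a file on disk, pass its path instead of the `StringIO` buffer; `pandas.read_csv` infers gzip compression from a `.gz` suffix.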
## Dependencies
python: 3.11  
pandas: 1.5.3  
requests: 2.31.0  
progressbar2: 4.2.0
## Documentation
See [pandasGWAS Documentation](https://caotianze.github.io/pandasgwas/)
## Licensing information
### Source code
MIT License
### Data from NHGRI-EBI GWAS Catalog
The NHGRI-EBI GWAS Catalog and all its contents are available under the general [Terms of Use for EMBL-EBI Services](https://www.ebi.ac.uk/about/terms-of-use). Summary statistics are made available under [CC0](https://creativecommons.org/publicdomain/zero/1.0/) unless [otherwise stated](https://www.ebi.ac.uk/gwas/docs/faq#faq-H7).
## Development environment
OS: Windows 10 Professional  
IDE: PyCharm 2022.1 (Community Edition)
## Similar projects
R package [gwasrapidd](https://github.com/ramiromagno/gwasrapidd) by Ramiro Magno

            
