| Field | Value |
|---|---|
| Name | gwaslab |
| Version | 3.5.5 |
| home_page | None |
| Summary | A collection of handy tools for GWAS SumStats |
| upload_time | 2025-01-02 07:11:00 |
| maintainer | None |
| docs_url | None |
| author | None |
| requires_python | <3.11,>=3.9 |
| license | None |
| keywords | |
| VCS | |
| bugtrack_url | |
| requirements | No requirements were recorded. |
| Travis-CI | No Travis. |
| coveralls test coverage | No coveralls. |

# GWASLab
<img width="600" alt="image" src="https://user-images.githubusercontent.com/40289485/197167760-5f761f5e-5856-4b27-a540-8b9cd90bdadb.png">

* A handy Python toolkit for handling GWAS summary statistics (sumstats).
* Each process is modularized and can be customized to your needs.
* Sumstats-specific manipulations are designed as methods of a Python object, `gwaslab.Sumstats`.

Please check the GWASLab documentation at [https://cloufield.github.io/gwaslab/](https://cloufield.github.io/gwaslab/)

Note: GWASLab is currently being updated very frequently. The first stable version will be released soon, so please stay tuned.

Warning: Known issues of GWASLab are summarized at [https://cloufield.github.io/gwaslab/KnownIssues/](https://cloufield.github.io/gwaslab/KnownIssues/).
## Install
### install via pip
```
pip install gwaslab==3.5.5
```
```python
import gwaslab as gl

# load plink2 output
mysumstats = gl.Sumstats("t2d_bbj.txt.gz", fmt="plink2")

# load sumstats with auto mode (auto-detecting common headers)
# assuming ALT/A1 is EA, and frq is EAF
mysumstats = gl.Sumstats("t2d_bbj.txt.gz", fmt="auto")

# or you can specify the columns:
mysumstats = gl.Sumstats("t2d_bbj.txt.gz",
             snpid="SNP",
             chrom="CHR",
             pos="POS",
             ea="ALT",
             nea="REF",
             neaf="Frq",
             beta="BETA",
             se="SE",
             p="P",
             direction="Dir",
             n="N",
             build="19")

# manhattan and qq plot
mysumstats.plot_mqq()
...
```
### install in conda environment
Create a Python 3.9 environment and install gwaslab using pip:
```
conda create -n gwaslab -c conda-forge python=3.9
conda activate gwaslab
pip install gwaslab==3.4.45
```
Alternatively, create a new environment from the YAML file [environment_3.4.40.yml](https://github.com/Cloufield/gwaslab/blob/main/environment_3.4.40.yml):
```
conda env create -n gwaslab -f environment_3.4.40.yml
```
### install using docker
A Dockerfile is available [here](https://github.com/Cloufield/gwaslab/blob/main/docker/Dockerfile) for building images locally.
## Functions
### Loading and Formatting
- Loading sumstats by specifying the software name or format name, or by specifying each column name.
- Converting GWAS sumstats to specific formats:
  - LDSC / MAGMA / METAL / PLINK / SAIGE / REGENIE / MR-MEGA / GWAS-SSF / FUMA / GWAS-VCF / BED...
  - [check available formats](https://github.com/Cloufield/formatbook)
- Optional filtering of variants in commonly used genomic regions: HapMap3 SNPs / High-LD regions / MHC region
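
The region filter in the last bullet can be pictured with a few lines of plain Python. This is a minimal sketch rather than GWASLab's implementation: the `Variant` tuple, the `in_region` and `exclude_mhc` helpers, and the MHC boundaries (chromosome 6, 25-34 Mb) are all assumptions made for illustration, and the coordinates the package actually uses may differ by genome build.

```python
from typing import Iterable, NamedTuple


class Variant(NamedTuple):
    snpid: str
    chrom: int
    pos: int


# Assumed, illustrative MHC boundaries; not taken from GWASLab.
MHC_CHROM, MHC_START, MHC_END = 6, 25_000_000, 34_000_000


def in_region(v: Variant, chrom: int, start: int, end: int) -> bool:
    """Return True if the variant lies inside the closed interval [start, end]."""
    return v.chrom == chrom and start <= v.pos <= end


def exclude_mhc(variants: Iterable[Variant]) -> list[Variant]:
    """Keep only the variants that fall outside the assumed MHC region."""
    return [v for v in variants if not in_region(v, MHC_CHROM, MHC_START, MHC_END)]


variants = [
    Variant("rs_a", 6, 24_999_999),  # just before the region
    Variant("rs_b", 6, 30_000_000),  # inside the region
    Variant("rs_c", 1, 30_000_000),  # same position, different chromosome
]
kept = exclude_mhc(variants)
print([v.snpid for v in kept])
```

The same `in_region` predicate can be reused for any other interval, such as a list of high-LD regions, by looping over the intervals.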
### Standardization & Normalization
- Variant ID standardization
- CHR and POS notation standardization
- Variant POS and allele normalization
- Genome build: inference and liftover
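
To make two of these steps concrete, here is a small standalone sketch of chromosome notation standardization and allele trimming. It is not GWASLab's code: the function names and the integer coding for X, Y and MT (23, 24, 25) are assumptions for illustration. The trimming shown is only the parsimony step (removing shared bases), because true left-alignment of indels additionally requires the reference sequence.

```python
def standardize_chrom(chrom: str) -> int:
    """Map notations such as 'chr1', '01' or 'X' to an integer code."""
    special = {"X": 23, "Y": 24, "MT": 25, "M": 25}  # assumed coding
    c = chrom.strip().upper()
    if c.startswith("CHR"):
        c = c[3:]
    if c in special:
        return special[c]
    return int(c)  # raises ValueError for labels that are not recognized


def trim_alleles(pos: int, ref: str, alt: str) -> tuple[int, str, str]:
    """Remove bases shared by both alleles, keeping at least one base each."""
    # Trim shared trailing bases first; the position is unaffected.
    while len(ref) > 1 and len(alt) > 1 and ref[-1] == alt[-1]:
        ref, alt = ref[:-1], alt[:-1]
    # Then trim shared leading bases; each removed base shifts the position.
    while len(ref) > 1 and len(alt) > 1 and ref[0] == alt[0]:
        ref, alt = ref[1:], alt[1:]
        pos += 1
    return pos, ref, alt


print(standardize_chrom("chrX"))
print(trim_alleles(100, "ATCG", "ATG"))
```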
### Quality control, Value conversion & Filtering
- Statistics sanity check
- Extreme value removal
- Equivalent statistics conversion
  - BETA/SE, OR/OR_95L/OR_95U
  - P, Z, CHISQ, MLOG10P
- Customizable value filtering
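
The relationships between these equivalent statistics are standard and can be written down with the Python standard library alone. The sketch below is independent of GWASLab; the `convert` function and its output keys are hypothetical names that mirror the column names listed above. It assumes a two-sided test on a normally distributed Z and a 95% confidence interval.

```python
import math


def z_from_beta_se(beta: float, se: float) -> float:
    """Z score of the effect estimate."""
    return beta / se


def p_from_z(z: float) -> float:
    """Two-sided p-value of a standard normal Z score."""
    return math.erfc(abs(z) / math.sqrt(2.0))


def convert(beta: float, se: float, z_crit: float = 1.959964) -> dict:
    """Derive Z, CHISQ, P, MLOG10P and OR with its 95% CI from BETA and SE."""
    z = z_from_beta_se(beta, se)
    p = p_from_z(z)
    # Note: for very large |Z|, p underflows to 0.0 and log10 would fail;
    # this sketch does not handle that case.
    return {
        "Z": z,
        "CHISQ": z * z,  # 1 degree of freedom
        "P": p,
        "MLOG10P": -math.log10(p),
        "OR": math.exp(beta),
        "OR_95L": math.exp(beta - z_crit * se),
        "OR_95U": math.exp(beta + z_crit * se),
    }


stats = convert(beta=0.1, se=0.02)
print(stats)
```

Working on the MLOG10P scale is useful for highly significant variants, where the raw p-value is too small to be represented as a floating-point number.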
### Harmonization
- rsID assignment based on CHR, POS, and REF/ALT
- CHR POS assignment based on rsID using a reference text file
- Palindromic SNPs and indels strand inference using a reference VCF
- Check allele frequency discrepancy using a reference VCF
- Reference allele alignment using a reference genome sequence FASTA file
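
Two ideas behind these steps can be sketched without any reference files. The code below is an illustration under stated assumptions, not GWASLab's harmonization logic: `is_palindromic` and `align_to_reference` are hypothetical helpers, the convention that the effect allele (EA) should end up matching ALT is assumed, and strand flipping is deliberately left out.

```python
from typing import Optional

COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def is_palindromic(a1: str, a2: str) -> bool:
    """True for A/T and C/G SNPs, whose alleles are each other's complement."""
    return len(a1) == 1 and len(a2) == 1 and COMPLEMENT.get(a1) == a2


def align_to_reference(
    ea: str, nea: str, beta: float, eaf: float, ref: str, alt: str
) -> Optional[tuple[str, str, float, float]]:
    """Orient a record so that NEA == ref and EA == alt.

    Returns (ea, nea, beta, eaf), or None when the alleles do not match
    the reference in either orientation.
    """
    if (nea, ea) == (ref, alt):
        return ea, nea, beta, eaf
    if (ea, nea) == (ref, alt):
        # Swapping the alleles flips the effect sign and the frequency.
        return nea, ea, -beta, 1.0 - eaf
    return None


print(is_palindromic("A", "T"))
print(align_to_reference("A", "G", 0.5, 0.25, ref="A", alt="G"))
```

Palindromic SNPs are the hard case because swapping the alleles and flipping the strand look identical, which is why the strand is typically inferred from allele frequencies in a reference panel.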
### Visualization
- MQQ plot: Manhattan plot, QQ plot, or both combined (with many customizable features, including automatic annotation of the nearest gene names)
- Miami plot: mirrored Manhattan plot
- Brisbane plot: GWAS hits density plot
- Regional plot: GWAS regional plot
- Genetic correlation heatmap: ldsc-rg genetic correlation matrix
- Scatter plot: variant effect size comparison
- Scatter plot: allele frequency comparison
- Scatter plot: trumpet plot (plot of MAF and effect size with power lines)
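
As background for the QQ plot, the points it displays can be computed in a few lines. This is a generic sketch of the usual construction, not GWASLab's plotting code: the `qq_points` helper is hypothetical, and the expected quantile `(rank - 0.5) / n` is one common convention among several.

```python
import math


def qq_points(pvalues: list[float]) -> list[tuple[float, float]]:
    """Return (expected, observed) -log10(p) pairs, most significant first."""
    observed = sorted((-math.log10(p) for p in pvalues), reverse=True)
    n = len(observed)
    # Under the null, the i-th smallest p-value is expected near (i - 0.5) / n.
    expected = [-math.log10((i + 0.5) / n) for i in range(n)]
    return list(zip(expected, observed))


points = qq_points([0.5, 0.05, 0.005, 0.9])
for exp_val, obs_val in points:
    print(f"{exp_val:.3f}\t{obs_val:.3f}")
```

The resulting pairs can be passed to any plotting library; points that rise above the diagonal indicate more small p-values than expected under the null.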
### Visualization Examples
<img width="600" alt="image" src="https://user-images.githubusercontent.com/40289485/233836639-34b03c47-5a59-4fd4-9677-5e13b02aab15.png">
<img width="600" alt="image" src="https://user-images.githubusercontent.com/40289485/197393168-e3e7076f-2801-4d66-9526-80778d44f3da.png">
<img width="600" alt="image" src="https://user-images.githubusercontent.com/40289485/197463243-89352749-f882-418d-907d-27530fd4e922.png">
<img width="600" alt="image" src="https://user-images.githubusercontent.com/40289485/197126045-b1c55adf-3391-4c3d-b2f6-eaeac7c26024.png">
### Other Utilities
- Read ldsc h2 or rg outputs directly as DataFrames (auto-parsing).
- Extract lead variants given a sliding window size.
- Extract novel loci given a list of known lead variants or known loci obtained from the GWAS Catalog.
- Logging: keep a complete record of manipulations applied to the sumstats.
- Sumstats summary: get a quick overview of the sumstats.
- ...
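
Lead variant extraction with a window can be illustrated with a simple greedy, distance-based procedure. This is one possible approach under stated assumptions and not necessarily how GWASLab implements it: `extract_leads`, the tuple layout, the 500 kb window and the 5e-8 significance threshold are all illustrative choices.

```python
Variant = tuple[str, int, int, float]  # (snpid, chrom, pos, p)


def extract_leads(
    variants: list[Variant], window_bp: int = 500_000, sig_level: float = 5e-8
) -> list[Variant]:
    """Pick the most significant variant per locus, greedily by p-value."""
    significant = sorted(
        (v for v in variants if v[3] < sig_level), key=lambda v: v[3]
    )
    leads: list[Variant] = []
    for snpid, chrom, pos, p in significant:
        # Accept a variant only if no stronger lead lies within the window.
        if all(chrom != lc or abs(pos - lp) > window_bp for _, lc, lp, _ in leads):
            leads.append((snpid, chrom, pos, p))
    return sorted(leads, key=lambda v: (v[1], v[2]))


variants = [
    ("rs1", 1, 1_000_000, 1e-10),
    ("rs2", 1, 1_200_000, 1e-12),  # strongest signal in the first locus
    ("rs3", 1, 2_000_000, 1e-9),   # more than 500 kb away from rs2
    ("rs4", 2, 1_000_000, 1e-6),   # not genome-wide significant
    ("rs5", 2, 5_000_000, 3e-8),
]
leads = extract_leads(variants)
print([v[0] for v in leads])
```

Novel loci can then be found in the same spirit, by keeping only the leads that are farther than a chosen distance from every known variant.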
## Requirements (deprecated)
environment.yml
```
name: gwaslab
channels:
  - conda-forge
  - defaults
dependencies:
  - python=3.8.16=h7a1cb2a_3
  - jupyter==1.0.0
  - pip==23.1.2
  - pip:
    - adjusttext==0.8
    - biopython==1.81
    - gwaslab==3.4.16
    - liftover==1.1.16
    - matplotlib==3.7.1
    - numpy==1.24.2
    - pandas==1.4.4
    - scikit-allel==1.3.5
    - scikit-learn==1.2.2
    - scipy==1.10.1
    - seaborn==0.11.2
    - statsmodels==0.13
    - adjustText==0.8
    - pysam==0.19
    - pyensembl==2.2.3
    - h5py==3.10.0
```
## How to cite
- GWASLab preprint: He, Y., Koido, M., Shimmori, Y., Kamatani, Y. (2023). GWASLab: a Python package for processing and visualizing GWAS summary statistics. Preprint at Jxiv, 2023-5. https://doi.org/10.51094/jxiv.370
## Sample Data
- The sample GWAS data used in GWASLab were obtained from http://jenger.riken.jp/ (Suzuki, Ken, et al. "Identification of 28 new susceptibility loci for type 2 diabetes in the Japanese population." Nature genetics 51.3 (2019): 379-386.).
## Acknowledgement
Thanks to @sup3rgiu, @soumickmj and @gmauro for their contributions to the source code.
## Contacts
* Github: [https://github.com/Cloufield/gwaslab](https://github.com/Cloufield/gwaslab)
* Blog (in Chinese): [https://gwaslab.com/](https://gwaslab.com/)
* Email: gwaslab@gmail.com
* Stats: [https://pypistats.org/packages/gwaslab](https://pypistats.org/packages/gwaslab)
Raw data
{
"_id": null,
"home_page": null,
"name": "gwaslab",
"maintainer": null,
"docs_url": null,
"requires_python": "<3.11,>=3.9",
"maintainer_email": null,
"keywords": null,
"author": null,
"author_email": "Yunye <yunye@gwaslab.com>",
"download_url": "https://files.pythonhosted.org/packages/b4/85/ca66b24d77408f1feab88c4dd49c96045535ed1493db99ec9434e3b39e77/gwaslab-3.5.5.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": null,
"summary": "A collection of handy tools for GWAS SumStats",
"version": "3.5.5",
"project_urls": {
"Github": "https://github.com/Cloufield/gwaslab",
"Homepage": "https://cloufield.github.io/gwaslab/"
},
"split_keywords": [],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "7d9a33cbbfd861a2ed8c46d910279d2f308a39c0a902619d93defea629311a74",
"md5": "b4b00a2ca4440d0a2d6b8af182536818",
"sha256": "213e63fb1397fa280d0144fb14460b712cf665f4c83bd384b2fb1cef6073bdd9"
},
"downloads": -1,
"filename": "gwaslab-3.5.5-py3-none-any.whl",
"has_sig": false,
"md5_digest": "b4b00a2ca4440d0a2d6b8af182536818",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": "<3.11,>=3.9",
"size": 20803850,
"upload_time": "2025-01-02T07:10:54",
"upload_time_iso_8601": "2025-01-02T07:10:54.124404Z",
"url": "https://files.pythonhosted.org/packages/7d/9a/33cbbfd861a2ed8c46d910279d2f308a39c0a902619d93defea629311a74/gwaslab-3.5.5-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "b485ca66b24d77408f1feab88c4dd49c96045535ed1493db99ec9434e3b39e77",
"md5": "a821eabed035330e72aa579a12d31f03",
"sha256": "3560cb3aec443e9d429409f2dbcc5e8a058f4573d8930220f0d294495160776f"
},
"downloads": -1,
"filename": "gwaslab-3.5.5.tar.gz",
"has_sig": false,
"md5_digest": "a821eabed035330e72aa579a12d31f03",
"packagetype": "sdist",
"python_version": "source",
"requires_python": "<3.11,>=3.9",
"size": 20775736,
"upload_time": "2025-01-02T07:11:00",
"upload_time_iso_8601": "2025-01-02T07:11:00.334527Z",
"url": "https://files.pythonhosted.org/packages/b4/85/ca66b24d77408f1feab88c4dd49c96045535ed1493db99ec9434e3b39e77/gwaslab-3.5.5.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2025-01-02 07:11:00",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "Cloufield",
"github_project": "gwaslab",
"travis_ci": false,
"coveralls": false,
"github_actions": false,
"lcname": "gwaslab"
}