# Gauchian: WGS-based GBA variant caller
Gauchian is a targeted variant caller for the GBA gene that takes whole-genome sequencing (WGS) BAM or CRAM files as input. It uses a novel method to resolve the ambiguity caused by the high sequence similarity between GBA and its pseudogene paralog GBAP1, and it accurately detects variants in the exons 9-11 homology region. These include large deletions or duplications between GBA and GBAP1, and GBAP1-like variants in GBA, including p.A495P, p.L483P, p.D448H, c.1263del, RecNciI, RecTL and c.1263del+RecTL. In addition to these challenging variants, Gauchian calls known pathogenic or likely pathogenic GBA variants classified in ClinVar. Gauchian has been tested on Illumina WGS data at standard sequencing depth (>=30X). It does not work on targeted sequencing data. Please refer to our [preprint](https://www.medrxiv.org/content/10.1101/2021.11.12.21266253v1) for more details about the method.
## Installation
This Python package supports Linux and macOS. It has been tested on CentOS 7.9.2009.
The Python dependencies can be found in `requirements.txt`. Installation takes a few seconds.
```bash
git clone https://github.com/Illumina/Gauchian
cd Gauchian
python3 setup.py install
```
## Running the program
```bash
gauchian --manifest MANIFEST_FILE \
--genome [19/37/38] \
--prefix OUTPUT_FILE_PREFIX \
--outDir OUTPUT_DIRECTORY \
--threads NUMBER_THREADS
```
The manifest is a text file that lists the absolute path to one input WGS BAM/CRAM file per line. Full WGS BAM/CRAM files are recommended. If you would like to use a subsetted bamlet, subset it with the region files in `gauchian/data/GBA_region_*.bed`.
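The manifest format described above can be produced with a few lines of Python. This is a minimal sketch, not part of Gauchian itself: the helper name `write_manifest` and the idea of collecting every `.bam`/`.cram` file from a single directory are assumptions made for illustration.

```python
from pathlib import Path


def write_manifest(input_dir, manifest_path):
    """Write one absolute BAM/CRAM path per line and return the paths written.

    Hypothetical helper: it only scans the top level of input_dir and
    ignores index files such as .bai or .crai.
    """
    paths = sorted(
        str(p.resolve())
        for p in Path(input_dir).iterdir()
        if p.suffix in {".bam", ".cram"}
    )
    # One absolute path per line, as the manifest format requires.
    Path(manifest_path).write_text("".join(f"{p}\n" for p in paths))
    return paths
```

The resulting file can then be passed to `--manifest`.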
For CRAM input, we suggest providing the path to the reference FASTA file with `--reference` in the command.
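When Gauchian is driven from a script, the command line can be assembled programmatically. The sketch below uses only the options named in this README (`--manifest`, `--genome`, `--prefix`, `--outDir`, `--threads`, `--reference`); the function name `build_gauchian_command` and all file paths are hypothetical.

```python
import shlex


def build_gauchian_command(manifest, genome, prefix, out_dir, threads, reference=None):
    """Return the gauchian argument list; reference is optional (suggested for CRAM)."""
    if str(genome) not in {"19", "37", "38"}:
        raise ValueError("genome must be one of 19, 37 or 38")
    cmd = [
        "gauchian",
        "--manifest", str(manifest),
        "--genome", str(genome),
        "--prefix", str(prefix),
        "--outDir", str(out_dir),
        "--threads", str(threads),
    ]
    if reference is not None:
        cmd += ["--reference", str(reference)]
    return cmd


# Hypothetical paths, for illustration only.
cmd = build_gauchian_command(
    "/data/manifest.txt", 38, "cohort1", "/data/out", 4, reference="/refs/genome.fa"
)
print(shlex.join(cmd))
```

The returned list can be handed to `subprocess.run(cmd, check=True)` once Gauchian is installed.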
## Interpreting the output
The program produces a `.tsv` file in the directory specified by `--outDir`.
The fields are explained below:
| Fields in tsv | Explanation |
|:-----------------------------------------|:-------------------------------------------------------------------------------|
| Sample | Sample name |
| is_biallelic(GBAP1-like_variant_exon9-11)| Whether the sample is called as biallelic for GBAP1-like variants in exon9-11 |
| is_carrier(GBAP1-like_variant_exon9-11) | Whether the sample is called as a carrier for GBAP1-like variants in exon9-11 |
| CN(GBA+GBAP1) | Total copy number of GBA+GBAP1 |
| deletion_breakpoint_in_GBA | Whether the deletion breakpoint is in GBA gene if a deletion exists |
| GBAP1-like_variant_exon9-11 | GBAP1-like variants called in exon9-11, two alleles separated by / |
| other_unphased_variants | Other variants called (non-GBAP1-like variants or variants outside of exon9-11)|
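The table above can be read back with the standard `csv` module. This is a sketch under two assumptions that the README does not confirm: the file has a header row with exactly the field names listed, and the two boolean-like columns hold the string `True` for a positive call (adjustable through `true_value`). The function name `flagged_samples` is hypothetical.

```python
import csv

CARRIER = "is_carrier(GBAP1-like_variant_exon9-11)"
BIALLELIC = "is_biallelic(GBAP1-like_variant_exon9-11)"
VARIANT = "GBAP1-like_variant_exon9-11"


def flagged_samples(tsv_path, true_value="True"):
    """Return (sample, GBAP1-like variant call) for carrier or biallelic samples.

    Assumes a tab-separated file with a header row; true_value is the
    assumed text used for a positive call.
    """
    flagged = []
    with open(tsv_path, newline="") as handle:
        for row in csv.DictReader(handle, delimiter="\t"):
            if row[CARRIER] == true_value or row[BIALLELIC] == true_value:
                flagged.append((row["Sample"], row[VARIANT]))
    return flagged
```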
A .json file is also produced that contains more information about each sample.
| Fields in json | Explanation |
|:------------------|:----------------------------------------------------------------------------------|
| Coverage_MAD | Median absolute deviation of depth, measure of sample quality |
| Median_depth | Sample median depth |
| deletion_CN | CN of the unique region between GBA and GBAP1. This value plus 2 is the total CN |
| deletion_CN_raw | Raw normalized depth of the unique region between GBA and GBAP1 |
| variant_raw_count | Supporting reads for each variant |
| snp_call | GBA copy number call at GBA/GBAP1 differentiating sites |
| snp_raw | Raw GBA copy number at GBA/GBAP1 differentiating sites |
| haplotypes | Summary of haplotypes assembled across GBA/GBAP1 differentiating sites in Exon9-11|
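As the `deletion_CN` row states, adding 2 to that value gives the total copy number. The sketch below applies that arithmetic to the JSON output. It assumes the top level of the file is an object keyed by sample name, with each value holding the fields listed above; that layout is an assumption, as is the function name `total_copy_number`.

```python
import json


def total_copy_number(json_path):
    """Map each sample to deletion_CN + 2, or None when deletion_CN is missing.

    Assumes the JSON top level is {sample_name: {field: value, ...}}.
    """
    with open(json_path) as handle:
        samples = json.load(handle)
    totals = {}
    for name, info in samples.items():
        deletion_cn = info.get("deletion_CN")
        totals[name] = None if deletion_cn is None else deletion_cn + 2
    return totals
```

A sample with `deletion_CN` of 2 therefore has a total GBA+GBAP1 copy number of 4.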
Raw data
{
"_id": null,
"home_page": "https://github.com/illumina/Gauchian",
"name": "gauchian",
"maintainer": "",
"docs_url": null,
"requires_python": "",
"maintainer_email": "",
"keywords": "GBA",
"author": "Xiao Chen",
"author_email": "xchen2@illumina.com",
"download_url": "https://files.pythonhosted.org/packages/69/3d/1fcca60bdca179cf600fbe0d1927a916a99875fb10472932c11bcb6b261c/gauchian-1.0.2.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": "GPLv3",
"summary": "WGS-based GBA variant caller",
"version": "1.0.2",
"project_urls": {
"Homepage": "https://github.com/illumina/Gauchian"
},
"split_keywords": [
"gba"
],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "4a8c76d09a5651af0dcc386a913d74a29f49d7e5c736dcdd8b91787eb3062949",
"md5": "6b052be6d95fd33ffb710300bae4e733",
"sha256": "85bb886e72963b9cf2f7917a0fb25262b9cbfb63f711e5d6dabf7b58c76a4425"
},
"downloads": -1,
"filename": "gauchian-1.0.2-py3-none-any.whl",
"has_sig": false,
"md5_digest": "6b052be6d95fd33ffb710300bae4e733",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": null,
"size": 217532,
"upload_time": "2023-10-14T16:49:54",
"upload_time_iso_8601": "2023-10-14T16:49:54.684073Z",
"url": "https://files.pythonhosted.org/packages/4a/8c/76d09a5651af0dcc386a913d74a29f49d7e5c736dcdd8b91787eb3062949/gauchian-1.0.2-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "693d1fcca60bdca179cf600fbe0d1927a916a99875fb10472932c11bcb6b261c",
"md5": "e15cc66a796d432496d6f3f68862bac9",
"sha256": "be2e23c4afb3b6b170706b8864f808c3c5c5f3b1828e4cd76b7f9dc64f023a38"
},
"downloads": -1,
"filename": "gauchian-1.0.2.tar.gz",
"has_sig": false,
"md5_digest": "e15cc66a796d432496d6f3f68862bac9",
"packagetype": "sdist",
"python_version": "source",
"requires_python": null,
"size": 26663,
"upload_time": "2023-10-14T16:49:59",
"upload_time_iso_8601": "2023-10-14T16:49:59.835287Z",
"url": "https://files.pythonhosted.org/packages/69/3d/1fcca60bdca179cf600fbe0d1927a916a99875fb10472932c11bcb6b261c/gauchian-1.0.2.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2023-10-14 16:49:59",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "illumina",
"github_project": "Gauchian",
"travis_ci": false,
"coveralls": false,
"github_actions": false,
"requirements": [
{
"name": "numpy",
"specs": [
[
">=",
"1.16"
]
]
},
{
"name": "scipy",
"specs": [
[
">=",
"1.2"
]
]
},
{
"name": "pysam",
"specs": [
[
">=",
"0.15.3"
]
]
},
{
"name": "statsmodels",
"specs": [
[
">=",
"0.9"
]
]
}
],
"lcname": "gauchian"
}