| Field        | Value                                |
|:-------------|:-------------------------------------|
| Name         | gauchian                             |
| Version      | 1.0.2                                |
| Home page    | https://github.com/illumina/Gauchian |
| Summary      | WGS-based GBA variant caller         |
| Upload time  | 2023-10-14 16:49:59                  |
| Author       | Xiao Chen                            |
| License      | GPLv3                                |
| Keywords     | gba                                  |
| Requirements | numpy, scipy, pysam, statsmodels     |

# Gauchian: WGS-based GBA variant caller

Gauchian is a targeted variant caller for the GBA gene that works from a whole-genome sequencing (WGS) BAM or CRAM file. GBA shares high sequence similarity with its pseudogene paralog GBAP1, which makes variant calling in the exons 9-11 homology region difficult. Gauchian uses a novel method to resolve this region and accurately detects large deletions or duplications between GBA and GBAP1, as well as GBAP1-like variants in GBA, including p.A495P, p.L483P, p.D448H, c.1263del, RecNciI, RecTL and c.1263del+RecTL. In addition to these challenging variants, Gauchian calls known pathogenic or likely pathogenic GBA variants classified in ClinVar.

Gauchian has been tested on Illumina WGS data with standard sequencing depth (>=30X). It does not work on targeted sequencing data. Please refer to our [preprint](https://www.medrxiv.org/content/10.1101/2021.11.12.21266253v1) for more details about the method.

## Installation

This Python package is supported on Linux and macOS. It has been tested on CentOS 7.9.2009.

The Python dependencies can be found in `requirements.txt`. Installation takes a few seconds.

```bash
git clone https://github.com/Illumina/Gauchian
cd Gauchian
python3 setup.py install
```

## Running the program

```bash
gauchian --manifest MANIFEST_FILE \
         --genome [19/37/38] \
         --prefix OUTPUT_FILE_PREFIX \
         --outDir OUTPUT_DIRECTORY \
         --threads NUMBER_THREADS
```

The manifest is a text file that lists one input WGS BAM/CRAM file per line, given as an absolute path. Full WGS BAM/CRAM files are recommended. If you would like to use a subsetted bamlet, please subset it using the region files in `gauchian/data/GBA_region_*.bed`.
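
A manifest in the format described above could be generated with a short script. The following is a minimal sketch, not part of Gauchian itself: the helper name `write_manifest` and the `*.bam`/`*.cram` file patterns are assumptions made for illustration.

```python
from pathlib import Path


def write_manifest(input_dir, manifest_path, patterns=("*.bam", "*.cram")):
    """Write one absolute BAM/CRAM path per line and return the paths written.

    Hypothetical helper: the name and default patterns are illustrative only.
    """
    root = Path(input_dir)
    # resolve() turns each match into an absolute path, as the manifest requires.
    paths = sorted(
        str(p.resolve())
        for pattern in patterns
        for p in root.glob(pattern)
    )
    Path(manifest_path).write_text("".join(f"{p}\n" for p in paths))
    return paths
```

Calling `write_manifest("/data/wgs", "manifest.txt")` would list every BAM or CRAM file directly inside `/data/wgs`; the directory name here is only a placeholder.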

For CRAM input, we suggest providing the path to the reference FASTA file with `--reference` in the command.
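
When Gauchian is launched from a Python pipeline, the command line shown above can be assembled as an argument list. This is a sketch under stated assumptions: the function name `build_gauchian_command` and the default of one thread are hypothetical, and only the flags documented in this README are used.

```python
def build_gauchian_command(manifest, genome, prefix, out_dir,
                           threads=1, reference=None):
    """Return the gauchian command as a list suitable for subprocess.run.

    Hypothetical wrapper: only flags documented in the README are emitted.
    """
    if str(genome) not in {"19", "37", "38"}:
        raise ValueError("genome must be one of 19, 37 or 38")
    cmd = [
        "gauchian",
        "--manifest", str(manifest),
        "--genome", str(genome),
        "--prefix", str(prefix),
        "--outDir", str(out_dir),
        "--threads", str(threads),
    ]
    # --reference is suggested for CRAM input; it is omitted when not given.
    if reference is not None:
        cmd += ["--reference", str(reference)]
    return cmd
```

The returned list can be passed to `subprocess.run(cmd, check=True)`, which avoids shell quoting issues with paths that contain spaces.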

## Interpreting the output

The program produces a `.tsv` file in the directory specified by `--outDir`.
The fields are explained below:

| Fields in tsv                            | Explanation                                                                    |
|:-----------------------------------------|:-------------------------------------------------------------------------------|
| Sample                                   | Sample name                                                                    |
| is_biallelic(GBAP1-like_variant_exon9-11)| Whether the sample is called as biallelic for GBAP1-like variants in exon9-11  |
| is_carrier(GBAP1-like_variant_exon9-11)  | Whether the sample is called as a carrier for GBAP1-like variants in exon9-11  |
| CN(GBA+GBAP1)                            | Total copy number of GBA+GBAP1                                                 |
| deletion_breakpoint_in_GBA               | Whether the deletion breakpoint is in GBA gene if a deletion exists            |
| GBAP1-like_variant_exon9-11              | GBAP1-like variants called in exon9-11, two alleles separated by /             |
| other_unphased_variants                  | Other variants called (non-GBAP1-like variants or variants outside of exon9-11)|
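
The `.tsv` output can be screened programmatically using the column names in the table above. The sketch below is illustrative only: the helper name `flag_samples` is hypothetical, and it assumes that the two `is_*` columns contain the text `True` for positive calls, which should be checked against a real output file.

```python
import csv

CARRIER_COL = "is_carrier(GBAP1-like_variant_exon9-11)"
BIALLELIC_COL = "is_biallelic(GBAP1-like_variant_exon9-11)"


def _is_true(value):
    # Assumption: positive calls are written as the text "True".
    return str(value).strip().lower() == "true"


def flag_samples(tsv_handle):
    """Return (carriers, biallelic) sample-name lists from an open .tsv file."""
    reader = csv.DictReader(tsv_handle, delimiter="\t")
    carriers, biallelic = [], []
    for row in reader:
        # The two columns are checked independently of each other.
        if _is_true(row.get(CARRIER_COL, "")):
            carriers.append(row["Sample"])
        if _is_true(row.get(BIALLELIC_COL, "")):
            biallelic.append(row["Sample"])
    return carriers, biallelic
```

The function takes an open file handle, for example `with open(path, newline="") as fh: flag_samples(fh)`, so it can also be exercised on in-memory text.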

A .json file is also produced that contains more information about each sample.

| Fields in json    | Explanation                                                                       |
|:------------------|:----------------------------------------------------------------------------------|
| Coverage_MAD      | Median absolute deviation of depth, measure of sample quality                     |
| Median_depth      | Sample median depth                                                               |
| deletion_CN       | CN of the unique region between GBA and GBAP1. This value plus 2 is the total CN  |
| deletion_CN_raw   | Raw normalized depth of the unique region between GBA and GBAP1                   |
| variant_raw_count | Supporting reads for each variant                                                 |
| snp_call          | GBA copy number call at GBA/GBAP1 differentiating sites                           |
| snp_raw           | Raw GBA copy number at GBA/GBAP1 differentiating sites                            |
| haplotypes        | Summary of haplotypes assembled across GBA/GBAP1 differentiating sites in Exon9-11|


Raw data

{
    "_id": null,
    "home_page": "https://github.com/illumina/Gauchian",
    "name": "gauchian",
    "maintainer": "",
    "docs_url": null,
    "requires_python": "",
    "maintainer_email": "",
    "keywords": "GBA",
    "author": "Xiao Chen",
    "author_email": "xchen2@illumina.com",
    "download_url": "https://files.pythonhosted.org/packages/69/3d/1fcca60bdca179cf600fbe0d1927a916a99875fb10472932c11bcb6b261c/gauchian-1.0.2.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "GPLv3",
    "summary": "WGS-based GBA variant caller",
    "version": "1.0.2",
    "project_urls": {
        "Homepage": "https://github.com/illumina/Gauchian"
    },
    "split_keywords": [
        "gba"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "4a8c76d09a5651af0dcc386a913d74a29f49d7e5c736dcdd8b91787eb3062949",
                "md5": "6b052be6d95fd33ffb710300bae4e733",
                "sha256": "85bb886e72963b9cf2f7917a0fb25262b9cbfb63f711e5d6dabf7b58c76a4425"
            },
            "downloads": -1,
            "filename": "gauchian-1.0.2-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "6b052be6d95fd33ffb710300bae4e733",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": null,
            "size": 217532,
            "upload_time": "2023-10-14T16:49:54",
            "upload_time_iso_8601": "2023-10-14T16:49:54.684073Z",
            "url": "https://files.pythonhosted.org/packages/4a/8c/76d09a5651af0dcc386a913d74a29f49d7e5c736dcdd8b91787eb3062949/gauchian-1.0.2-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "693d1fcca60bdca179cf600fbe0d1927a916a99875fb10472932c11bcb6b261c",
                "md5": "e15cc66a796d432496d6f3f68862bac9",
                "sha256": "be2e23c4afb3b6b170706b8864f808c3c5c5f3b1828e4cd76b7f9dc64f023a38"
            },
            "downloads": -1,
            "filename": "gauchian-1.0.2.tar.gz",
            "has_sig": false,
            "md5_digest": "e15cc66a796d432496d6f3f68862bac9",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": null,
            "size": 26663,
            "upload_time": "2023-10-14T16:49:59",
            "upload_time_iso_8601": "2023-10-14T16:49:59.835287Z",
            "url": "https://files.pythonhosted.org/packages/69/3d/1fcca60bdca179cf600fbe0d1927a916a99875fb10472932c11bcb6b261c/gauchian-1.0.2.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2023-10-14 16:49:59",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "illumina",
    "github_project": "Gauchian",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": false,
    "requirements": [
        {
            "name": "numpy",
            "specs": [
                [
                    ">=",
                    "1.16"
                ]
            ]
        },
        {
            "name": "scipy",
            "specs": [
                [
                    ">=",
                    "1.2"
                ]
            ]
        },
        {
            "name": "pysam",
            "specs": [
                [
                    ">=",
                    "0.15.3"
                ]
            ]
        },
        {
            "name": "statsmodels",
            "specs": [
                [
                    ">=",
                    "0.9"
                ]
            ]
        }
    ],
    "lcname": "gauchian"
}
        