vrs-anvil


- Name: vrs-anvil
- Version: 0.0.1rc2
- Home page: https://github.com/ohsu-comp-bio/vrs_anvil_toolkit
- Summary: Commons utilities
- Upload time: 2024-04-08 21:57:07
- Maintainer: None
- Docs URL: None
- Author: Ellrott Lab
- Requires Python: <4,>=3.10
- License: None
- Keywords: anvil terra bioinformatics
- Requirements: ga4gh.vrs, diskcache, biocommons.seqrepo, glom, click, pyyaml, google, requests, boto3, tqdm, google-cloud-storage, psutil

<img width="685" alt="image" src="https://github.com/ohsu-comp-bio/vrs-python-testing/assets/47808/909db052-972c-4508-a2f4-8a389de03320">


# VRS AnVIL

## Project Overview

This Python project processes Variant Call Format (VCF) files and other sources of variant information, and looks up GA4GH Variation Representation Specification (VRS) identifiers for the variants they contain. GA4GH VRS identifiers provide a standardized way to represent genomic variation, making it easier to exchange and share genomic information.
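
VRS identifiers are computed from the content of a variant rather than assigned by a registry. Per the VRS specification, an object is serialized canonically and hashed with the GA4GH truncated digest `sha512t24u` (SHA-512, keep the first 24 bytes, base64url-encode); Allele identifiers carry the prefix `ga4gh:VA.`. The sketch below illustrates only that digest step, using the standard library. The `toy_allele_id` helper and the allele fields are hypothetical simplifications: real VRS 2.0 serialization has additional rules, so this code will not reproduce identifiers produced by vrs-python.

```python
import base64
import hashlib
import json


def sha512t24u(blob: bytes) -> str:
    """GA4GH truncated digest: SHA-512, keep the first 24 bytes, base64url-encode them."""
    truncated = hashlib.sha512(blob).digest()[:24]
    # 24 bytes always encode to 32 base64 characters, so no "=" padding appears.
    return base64.urlsafe_b64encode(truncated).decode("ascii")


def toy_allele_id(allele: dict) -> str:
    """Hypothetical, simplified ID: canonical JSON (sorted keys, no whitespace) -> sha512t24u.

    Real VRS 2.0 serialization has extra rules, so this will NOT match vrs-python output.
    """
    canonical = json.dumps(allele, sort_keys=True, separators=(",", ":"))
    return "ga4gh:VA." + sha512t24u(canonical.encode("utf-8"))


# Made-up allele-like object for illustration only (not a valid VRS 2.0 Allele).
allele = {
    "type": "Allele",
    "location": {"sequenceReference": "SQ.example", "start": 100, "end": 101},
    "state": {"type": "LiteralSequenceExpression", "sequence": "T"},
}
print(toy_allele_id(allele))
```

Because keys are sorted before hashing, two objects with the same content but a different key order get the same identifier; that determinism is what lets independently computed IDs be compared.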

In addition, the project retrieves evidence associated with genomic alleles from the GA4GH MetaKB service, a knowledge base that links genomic variants to relevant evidence such as studies and publications.

## Features

1. **VCF File Processing:**
   - Streamlines reading and parsing VCF files to extract relevant genomic information.

2. **GA4GH VRS Identifier Lookup:**
   - Uses the GA4GH VRS API to look up each genomic variation in the VCF file.
   - Retrieves standardized identifiers for the alleles, improving interoperability with GA4GH-compliant systems.
   - **GA4GH MetaKB integration:** queries the GA4GH MetaKB service to retrieve evidence associated with the specified genomic alleles.

3. **Output Generation:**
   - Generates summary metrics on throughput, errors, and evidence hits and misses.
   - Optionally generates a processed VCF file annotated with GA4GH VRS identifiers for each genomic variation.
   - Presents the retrieved evidence in a structured format, including information about studies, publications, and other relevant details.

4. **Error Handling:**
   - Handles errors such as invalid input files, invalid variants, and connectivity problems with the GA4GH MetaKB API.
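
The VCF processing and invalid-variant handling described above can be illustrated with a small standard-library sketch: it splits each VCF data line into one (CHROM, POS, REF, ALT) key per alternate allele and counts skipped or malformed records, in the spirit of the summary metrics. `AlleleKey`, `iter_alleles` and the metric names are hypothetical; the toolkit itself relies on vrs-python for VCF handling.

```python
from collections import Counter
from typing import Iterable, Iterator, NamedTuple


class AlleleKey(NamedTuple):
    """One alternate allele from a VCF record (POS is 1-based, as in VCF)."""
    chrom: str
    pos: int
    ref: str
    alt: str


SIMPLE_BASES = set("ACGTN")


def is_simple(seq: str) -> bool:
    """True for a non-empty literal sequence; False for <DEL>, breakends, '*', '.'."""
    return bool(seq) and set(seq.upper()) <= SIMPLE_BASES


def iter_alleles(lines: Iterable[str], metrics: Counter) -> Iterator[AlleleKey]:
    """Yield one AlleleKey per ALT allele and tally what was skipped in `metrics`."""
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # meta-information (##) and header (#CHROM) lines
        fields = line.rstrip("\n").split("\t")
        try:
            chrom, pos, ref, alts = fields[0], int(fields[1]), fields[3], fields[4]
        except (IndexError, ValueError):
            metrics["malformed_record"] += 1
            continue
        for alt in alts.split(","):  # multi-allelic records list ALTs comma-separated
            if is_simple(ref) and is_simple(alt):
                metrics["allele"] += 1
                yield AlleleKey(chrom, pos, ref.upper(), alt.upper())
            else:
                metrics["skipped_allele"] += 1


# Synthetic example input, not taken from the project's test fixtures.
sample = [
    "##fileformat=VCFv4.3\n",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n",
    "chr1\t10001\t.\tT\tA,C\t.\tPASS\t.\n",
    "chr1\t10100\t.\tG\t<DEL>\t.\tPASS\t.\n",
    "chr1\tnot-a-number\t.\tG\tA\t.\tPASS\t.\n",
]
metrics = Counter()
alleles = list(iter_alleles(sample, metrics))
print(alleles)
print(dict(metrics))
```

Keeping the counters separate from the generator means the same tallies can feed a progress report while records stream through, without holding the whole file in memory.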

## Getting Started

### Prerequisites

- Python 3.10 or later
- Internet connectivity for setting up dependencies, GA4GH MetaKB lookups, etc.

### Installation

1. Clone the repository:

   ```bash
   git clone https://github.com/ohsu-comp-bio/vrs_anvil_toolkit
   cd vrs_anvil_toolkit
   ```

2. Install dependencies:
   a. For local use:
   ```bash
   # install postgresql@14 (required for vrs-python)
   brew install postgresql@14
   bash scripts/setup.sh
   ```
   b. For use on Terra:
   ```bash
   export SEQREPO_ROOT=~
   bash terra/setup.sh
   ```

### Usage
**Manifest**

Configuration is controlled by a [manifest.yaml](tests/fixtures/manifest.yaml) file, which specifies the input VCF file(s), the output directory, and other parameters.
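
As a rough illustration of what checking a manifest before a run might look like, the sketch below wraps the dictionary a YAML loader (for example PyYAML's `yaml.safe_load`) would return in a small dataclass and validates the VCF entries. The key names `vcf_files` and `work_dir`, the `Manifest` class and the accepted URI schemes are all assumptions; see [manifest.yaml](tests/fixtures/manifest.yaml) for the real keys.

```python
from dataclasses import dataclass
from urllib.parse import urlparse

# Assumption: local paths and Google Cloud Storage (gs://) URIs are accepted.
SUPPORTED_SCHEMES = {"", "file", "gs"}
VCF_SUFFIXES = (".vcf", ".vcf.gz")


@dataclass
class Manifest:
    """Hypothetical typed view of manifest.yaml; the real key names may differ."""

    vcf_files: list[str]
    work_dir: str = "."

    @classmethod
    def from_dict(cls, data: dict) -> "Manifest":
        # `data` is what a YAML loader such as yaml.safe_load(...) would return.
        return cls(
            vcf_files=list(data.get("vcf_files", [])),
            work_dir=str(data.get("work_dir", ".")),
        )

    def validate(self) -> list[str]:
        """Return human-readable problems; an empty list means the manifest looks usable."""
        problems = []
        if not self.vcf_files:
            problems.append("manifest lists no VCF files")
        for uri in self.vcf_files:
            scheme = urlparse(uri).scheme
            if scheme not in SUPPORTED_SCHEMES:
                problems.append(f"unsupported scheme {scheme!r}: {uri}")
            elif not uri.endswith(VCF_SUFFIXES):
                problems.append(f"not a .vcf or .vcf.gz file: {uri}")
        return problems


manifest = Manifest.from_dict(
    {"vcf_files": ["tests/fixtures/1kGP.chr1.1000.vcf"], "work_dir": "work"}
)
print(manifest.validate())  # prints []
```

Validating up front lets a scattered run fail fast on a typo in one path instead of partway through a long batch.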

**CLI**
```bash
source venv/bin/activate
# navigate to a working directory that contains your manifest.yaml file, and add the VCF URLs or file paths to the manifest

# run the vrs_bulk command in the foreground
vrs_bulk annotate

# run the vrs_bulk command in parallel, one process per VCF file
vrs_bulk annotate --scatter

# run the vrs_bulk command in parallel in the background
nohup vrs_bulk annotate --scatter & # press enter to continue

# get the status of the scatter processes
vrs_bulk ps
```
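
`vrs_bulk annotate --scatter` runs one process per VCF file. The general fan-out pattern can be sketched with the standard library's `subprocess` module: start every child first, then wait on each. `scatter` and the stand-in worker command `demo_cmd` are hypothetical and are not vrs_bulk's implementation.

```python
import subprocess
import sys
from typing import Callable, Sequence


def scatter(vcf_paths: Sequence[str], make_cmd: Callable[[str], list[str]]) -> dict[str, int]:
    """Launch one child process per VCF path, wait for all of them, return exit codes by path."""
    # Popen returns immediately, so every child is started before we wait on any of them.
    # dict.fromkeys drops duplicate paths so no process handle is lost.
    procs = {path: subprocess.Popen(make_cmd(path)) for path in dict.fromkeys(vcf_paths)}
    return {path: proc.wait() for path, proc in procs.items()}


def demo_cmd(path: str) -> list[str]:
    """Hypothetical stand-in worker: a child Python process that just reports its input."""
    return [sys.executable, "-c", "import sys; print('annotating', sys.argv[1])", path]


if __name__ == "__main__":
    print(scatter(["a.vcf", "b.vcf.gz"], demo_cmd))
```

A fuller version would also record each child's PID (`proc.pid`) and redirect its output to a per-file log, which is the kind of bookkeeping a status command needs.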

**Processing VCF Files ([vrs-python](https://github.com/ga4gh/vrs-python))**

vrs-python is a GA4GH GKS package for creating Variation Representation Specification (VRS) IDs: consistent, globally unique identifiers for variation. Its functionality includes variant ID translation and VCF annotation. It is a dependency of vrs_bulk and can also be used as a standalone package.

For Python usage, see [vrs_vcf_annotator.py](scripts/vrs_vcf_annotator.py) for an example.

For CLI usage:
```bash
python3 -m ga4gh.vrs.extras.vcf_annotation --vcf_in tests/fixtures/1kGP.chr1.1000.vcf --vcf_out annotated_output.vcf.gz --vrs_pickle_out allele_dicts.pkl --seqrepo_root_dir ~/seqrepo/latest
```

The command above annotates an example VCF from the test fixtures. Replace the `--vcf_out` and `--vrs_pickle_out` values with your desired output file paths; the output VCF can be compressed (`.vcf.gz`) or uncompressed (`.vcf`).
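
To inspect the annotated output, the sketch below reads either a plain `.vcf` or a compressed `.vcf.gz` (bgzip output is valid multi-member gzip, so Python's `gzip` module can read it) and pulls VRS identifiers from the INFO column. The INFO key `VRS_Allele_IDs` is our assumption about what vrs-python writes; confirm it against the `##INFO` header lines of your file. The function names are hypothetical.

```python
import gzip
from typing import Iterator


def open_vcf(path: str):
    """Open a plain or gzip/bgzip-compressed VCF for reading as text."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt")  # BGZF is multi-member gzip, which gzip reads
    return open(path, "rt")


def parse_info(info: str) -> dict[str, str]:
    """Split a VCF INFO column into key -> value; flags map to an empty string."""
    if info in ("", "."):
        return {}
    result = {}
    for item in info.split(";"):
        key, _, value = item.partition("=")
        result[key] = value
    return result


def vrs_ids(path: str, info_key: str = "VRS_Allele_IDs") -> Iterator[tuple[str, int, list[str]]]:
    """Yield (CHROM, POS, ids) for every record whose INFO column carries `info_key`."""
    with open_vcf(path) as fh:
        for line in fh:
            if line.startswith("#"):
                continue  # skip meta-information and header lines
            fields = line.rstrip("\n").split("\t")
            info = parse_info(fields[7]) if len(fields) > 7 else {}
            if info_key in info:
                yield fields[0], int(fields[1]), info[info_key].split(",")
```

For example, `list(vrs_ids("annotated_output.vcf.gz"))` would list the identifiers attached to each record of the output produced by the command above.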

**Terra**

The command-line utility supports Google Cloud Storage URIs and can run commands in the background, so it works with Terra out of the box, as described in [CLI usage](#usage) above. For an example notebook, see `vrs-anvil-demo.ipynb` in the `vrs-anvil` workspace.
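
Google Cloud Storage URIs have the form `gs://<bucket>/<object name>`. A minimal sketch of splitting one into the bucket and object name that a storage client expects is shown below; `split_gs_uri` is a hypothetical helper, and the commented-out download uses the `google-cloud-storage` package listed in the requirements.

```python
def split_gs_uri(uri: str) -> tuple[str, str]:
    """Split 'gs://bucket/path/to/file.vcf.gz' into ('bucket', 'path/to/file.vcf.gz')."""
    prefix = "gs://"
    if not uri.startswith(prefix):
        raise ValueError(f"not a gs:// URI: {uri}")
    # Plain string splitting (not urlparse) so object names containing '#' or '?' survive.
    bucket, _, name = uri[len(prefix):].partition("/")
    if not bucket or not name:
        raise ValueError(f"expected gs://<bucket>/<object>: {uri}")
    return bucket, name


bucket, name = split_gs_uri("gs://example-bucket/vcfs/sample.vcf.gz")
print(bucket, name)  # example-bucket vcfs/sample.vcf.gz

# With google-cloud-storage installed and credentials available (e.g. on Terra),
# the object could then be downloaded like this:
# from google.cloud import storage
# storage.Client().bucket(bucket).blob(name).download_to_filename("sample.vcf.gz")
```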

### Contributing

This project is open to contributions from the research community. If you are interested in contributing to the project, please contact the project team.
See the [contributing guide](CONTRIBUTING.md) for more information on how to contribute to the project.

## License

This project is licensed under the MIT License - see the [LICENSE](LICENSE.md) file for details.

            

Raw data

            {
    "_id": null,
    "home_page": "https://github.com/ohsu-comp-bio/vrs_anvil_toolkit",
    "name": "vrs-anvil",
    "maintainer": null,
    "docs_url": null,
    "requires_python": "<4,>=3.10",
    "maintainer_email": null,
    "keywords": "anvil terra bioinformatics",
    "author": "Ellrott Lab",
    "author_email": null,
    "download_url": "https://files.pythonhosted.org/packages/ce/b1/fd9bb61dbb696ae93f9e749e803d622dc0847b0295eda88aaee83b2eeb42/vrs_anvil-0.0.1rc2.tar.gz",
    "platform": null,
    "description": "<img width=\"685\" alt=\"image\" src=\"https://github.com/ohsu-comp-bio/vrs-python-testing/assets/47808/909db052-972c-4508-a2f4-8a389de03320\">\n\n\n# VRS AnVIL\n\n## Project Overview\n\nThis Python project is designed to process Variant Call Format (VCF) files or other sources of variant information and perform lookup operations on Genomic Variation Representation Service (GA4GH VRS) identifiers. The GA4GH VRS identifiers provide a standardized way to represent genomic variations, making it easier to exchange and share genomic information.\n\nIn addition, this project facilitates the retrieval of evidence associated with genomic alleles by leveraging the Genomic Data Representation and Knowledge Base (GA4GH MetaKB) service. GA4GH MetaKB provides a comprehensive knowledge base that links genomic variants to relevant evidence, enabling users to access valuable information about genomic alleles.\n\n## Features\n\n1. **VCF File Processing:**\n   - Streamlines reading and parsing of VCF files, to extract relevant genomic information.\n\n2. **GA4GH VRS Identifier Lookup:**\n   - Utilizes the GA4GH VRS API to perform lookups for each genomic variation mentioned in the VCF file.\n   - Retrieves standardized identifiers for the alleles, enhancing interoperability with GA4GH-compliant systems.\n   - GA4GH MetaKB Service Integration:  Utilizes the GA4GH MetaKB service to query and retrieve evidence associated with the specified genomic alleles.\n3. **Output Generation:**\n   - Generates summary metrics about throughput, errors, and evidence hits and misses\n   - Optionally, generates a processed VCF file with additional GA4GH VRS identifiers for each genomic variation.\n   - Presents the retrieved evidence in a structured format, including information about studies, publications, and other relevant details.\n\n\n4. 
**Error Handling:**\n   - Implements robust error handling to address issues like invalid input files, invalid variants, connectivity problems with the GA4GH MetaKB API, and more.\n\n## Getting Started\n\n### Prerequisites\n\n- Python 3.10 or later\n- Internet connectivity for setting up dependencies, GA4GH MetaKB lookups, etc.\n\n### Installation\n\n1. Clone the repository:\n\n   ```bash\n   git clone https://github.com/ohsu-comp-bio/vrs_bulk_toolkit\n   cd vrs-anvil\n   ```\n\n2. Install dependencies:\n   a. for local use\n   ```bash\n   # install postgresql@14 (required for vrs-python)\n   brew install postgresql@14\n   bash scripts/setup.sh\n   ```\n   b. for use on Terra\n   ```bash\n   SEQREPO_ROOT=~\n   bash terra/setup.sh\n   ```\n\n### Usage\n**Manifest**\n\nThe configuration is controlled by a [manifest.yaml](tests/fixtures/manifest.yaml) file. The manifest file specifies the input VCF file(s), the output directory, and other parameters.\n\n**CLI**\n```bash\nsource venv/bin/activate\n# navigate to a working directory, with your manifest.yaml file.  Add the VCF urls or file paths to your manifest\n\n# run the vrs_anvil command in the fore ground\nvrs_bulk annotate\n\n# run the vrs_bulk command in parallel, one process per VCF file\nvrs_bulk annotate --scatter\n\n# run the vrs_bulk command in parallel in the background\nnohup vrs_bulk annotate --scatter & # press enter to continue\n\n# get the status of the scatter processes\nvrs_bulk ps\n```\n\n**Processing VCF Files ([vrs-python](https://github.com/ga4gh/vrs-python))**\n\nvrs-python is a GA4GH GKS package centered around creating Variant Representation specification (VRS) IDs: consistent, globally unique identifiers for variation. Some of its functionality includes variant ID translation and VCF annotation. 
Used as a dependency in vrs_bulk, it can also be used as a standalone package.\n\nFor Python usage, see [vrs_vcf_annotator.py](scripts/vrs_vcf_annotator.py) for an example.\n\nFor CLI usage:\n```bash\npython3 -m ga4gh.vrs.extras.vcf_annotation --vcf_in tests/fixtures/1kGP.chr1.1000.vcf --vcf_out annotated_output.vcf.gz --vrs_pickle_out allele_dicts.pkl --seqrepo_root_dir ~/seqrepo/latest\n```\n\nThe above is an example using an example vcf. Replace the `--vcf_out` and `vrs_pickle_out` here with your desired output file path, where the output vcf can be BCF (`vcf.gz`) or VCF (`vcf`)\n\n**Terra**\nThe command line utility supports Google Cloud URIs and running commands in the background to interop with Terra out-of-the-box. This is described in the [CLI usage](#features) above. For an example notebook, see `vrs-anvil-demo.ipynb` on the `vrs-anvil` workspace.\n\n### Contributing\n\nThis project is open to contributions from the research community. If you are interested in contributing to the project, please contact the project team.\nSee the [contributing guide](CONTRIBUTING.md) for more information on how to contribute to the project.\n\n## License\n\nThis project is licensed under the MIT License - see the [LICENSE](LICENSE.md) file for details.\n",
    "bugtrack_url": null,
    "license": null,
    "summary": "Commons utilities",
    "version": "0.0.1rc2",
    "project_urls": {
        "Bug Reports": "https://github.com/ohsu-comp-bio/vrs_anvil_toolkit/issues",
        "Homepage": "https://github.com/ohsu-comp-bio/vrs_anvil_toolkit",
        "Source": "https://github.com/ohsu-comp-bio/vrs_anvil_toolkit"
    },
    "split_keywords": [
        "anvil",
        "terra",
        "bioinformatics"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "f963fa2b09c4f80d5bfd9c886ff4403d32b9a16b44ffcb02f4aeabada9903a04",
                "md5": "0ba1f06317e810de1d98001ac8a6bdbb",
                "sha256": "396207ae3e1fbf71de27dc4acc5d15ea2c12964cb754e2d3870c7b0f4ba1418a"
            },
            "downloads": -1,
            "filename": "vrs_anvil-0.0.1rc2-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "0ba1f06317e810de1d98001ac8a6bdbb",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": "<4,>=3.10",
            "size": 26560,
            "upload_time": "2024-04-08T21:57:05",
            "upload_time_iso_8601": "2024-04-08T21:57:05.630988Z",
            "url": "https://files.pythonhosted.org/packages/f9/63/fa2b09c4f80d5bfd9c886ff4403d32b9a16b44ffcb02f4aeabada9903a04/vrs_anvil-0.0.1rc2-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "ceb1fd9bb61dbb696ae93f9e749e803d622dc0847b0295eda88aaee83b2eeb42",
                "md5": "b9ba91d85fc46100eb8d775a63394ec2",
                "sha256": "44d08e87b15b492a79a710444fdddb5446a889757a8780592d35f05060d3b13d"
            },
            "downloads": -1,
            "filename": "vrs_anvil-0.0.1rc2.tar.gz",
            "has_sig": false,
            "md5_digest": "b9ba91d85fc46100eb8d775a63394ec2",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": "<4,>=3.10",
            "size": 25140,
            "upload_time": "2024-04-08T21:57:07",
            "upload_time_iso_8601": "2024-04-08T21:57:07.771744Z",
            "url": "https://files.pythonhosted.org/packages/ce/b1/fd9bb61dbb696ae93f9e749e803d622dc0847b0295eda88aaee83b2eeb42/vrs_anvil-0.0.1rc2.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2024-04-08 21:57:07",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "ohsu-comp-bio",
    "github_project": "vrs_anvil_toolkit",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": false,
    "requirements": [
        {
            "name": "ga4gh.vrs",
            "specs": [
                [
                    "==",
                    "2.0.0a5"
                ]
            ]
        },
        {
            "name": "diskcache",
            "specs": []
        },
        {
            "name": "biocommons.seqrepo",
            "specs": []
        },
        {
            "name": "glom",
            "specs": []
        },
        {
            "name": "click",
            "specs": []
        },
        {
            "name": "pyyaml",
            "specs": []
        },
        {
            "name": "google",
            "specs": []
        },
        {
            "name": "requests",
            "specs": []
        },
        {
            "name": "boto3",
            "specs": []
        },
        {
            "name": "tqdm",
            "specs": []
        },
        {
            "name": "google-cloud-storage",
            "specs": []
        },
        {
            "name": "psutil",
            "specs": []
        }
    ],
    "lcname": "vrs-anvil"
}
        