clinvar-tsv


Name: clinvar-tsv
Version: 0.6.3
Home page: https://github.com/bihealth/clinvar-tsv
Summary: Python 3 library for accessing and managing BioMedical sheets
Upload time: 2023-06-22 09:37:01
Docs URL: None
Author: Manuel Holtgrewe
License: MIT license
Keywords: clinvar
Requirements: No requirements were recorded.
Travis-CI: No Travis.
Coveralls test coverage: No coveralls.

[![CI](https://github.com/bihealth/clinvar-tsv/actions/workflows/ci.yml/badge.svg)](https://github.com/bihealth/clinvar-tsv/actions/workflows/ci.yml)
[![codecov](https://codecov.io/gh/bihealth/clinvar-tsv/branch/main/graph/badge.svg?token=287XB5P11T)](https://codecov.io/gh/bihealth/clinvar-tsv)
[![Package Version](https://img.shields.io/pypi/v/clinvar-tsv.svg)](https://pypi.org/project/clinvar-tsv)
[![Python Versions](https://img.shields.io/pypi/pyversions/clinvar-tsv.svg)](https://pypi.org/project/clinvar-tsv)

# Clinvar-TSV

The code in this repository first downloads ClinVar XML files and then converts them into TSV files (one each for b37 and b38).
The TSV files will contain one entry for each ClinVar `<ReferenceClinVarAssertion>` entry with important information extracted from ClinVar.
The code is used by [bihealth/varfish-db-downloader](https://github.com/bihealth/varfish-db-downloader).

- [clinvar-tsv on PyPI](https://pypi.org/project/clinvar-tsv/)
- [clinvar-tsv on bioconda](http://bioconda.github.io/recipes/clinvar-tsv/README.html)

## Overview

Users usually run the tool by calling `clinvar_tsv main`.

```
$ clinvar_tsv main \
    --cores 2 \
    --b37-path hs37d5.fa \
    --b38-path hs38.fa
```
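
When the tool is driven from a Python script, the same invocation can be assembled as an argument list. The sketch below is only an illustration: `build_command` is a hypothetical helper, not part of clinvar-tsv, and it uses only the flags shown above.

```python
import shlex
from pathlib import Path


def build_command(b37_path, b38_path, cores=2):
    """Assemble the argument list for `clinvar_tsv main` (hypothetical helper)."""
    return [
        "clinvar_tsv",
        "main",
        "--cores",
        str(cores),
        "--b37-path",
        str(b37_path),
        "--b38-path",
        str(b38_path),
    ]


cmd = build_command(Path("hs37d5.fa"), Path("hs38.fa"))
# Print the command as it would be typed in a shell.
print(shlex.join(cmd))
# To actually run it (requires clinvar-tsv to be installed):
#   import subprocess; subprocess.run(cmd, check=True)
```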

This calls a Snakemake workflow that in turn does the following:

1. Download the latest ClinVar XML file to the `downloads/` directory using `wget`.
2. Parse the XML file and convert it into a "raw" TSV file in `parsed` for each of the 37 and 38 releases with `clinvar_tsv parse_xml`.
   This file contains one record for each ClinVar VCV record.
3. Sort this file by coordinate and VCV ID using Unix `sort`, and finally...
4. Merge the lines in the resulting TSV file (for each genome build) by VCV ID and produce aggregate summaries for each VCV.
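
The sort-then-merge pattern of steps 3 and 4 can be sketched in a few lines of Python. This is a minimal illustration, not the tool's implementation: the column names (`chromosome`, `start`, `vcv`, `significance`) and the sample rows are assumptions made up for the example, and the real workflow sorts with Unix `sort` rather than in memory.

```python
import csv
import io
from itertools import groupby

# Hypothetical columns and rows; the real clinvar-tsv output has its own header.
RAW_TSV = (
    "chromosome\tstart\tvcv\tsignificance\n"
    "1\t200\tVCV000002\tbenign\n"
    "1\t100\tVCV000001\tpathogenic\n"
    "1\t100\tVCV000001\tlikely pathogenic\n"
)


def sort_key(row):
    """Order rows by coordinate first, then by VCV ID.

    Chromosomes are compared as plain strings here, which is a simplification.
    """
    return (row["chromosome"], int(row["start"]), row["vcv"])


def merge_by_vcv(rows):
    """Sort rows, then collapse consecutive rows that share a VCV ID.

    Assumes each VCV ID maps to a single coordinate, so that sorting by
    coordinate keeps all rows of one VCV adjacent.
    """
    merged = []
    ordered = sorted(rows, key=sort_key)
    for vcv, group in groupby(ordered, key=lambda r: r["vcv"]):
        group = list(group)
        merged.append(
            {
                "chromosome": group[0]["chromosome"],
                "start": int(group[0]["start"]),
                "vcv": vcv,
                "significances": sorted({r["significance"] for r in group}),
            }
        )
    return merged


rows = list(csv.DictReader(io.StringIO(RAW_TSV), delimiter="\t"))
merged = merge_by_vcv(rows)
for record in merged:
    print(record)
```

Sorting first is what makes the merge a single streaming pass: all lines for one VCV ID are adjacent, so only one group needs to be held in memory at a time.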

There are two summaries:

- `summary_clinvar_*` -- which merges records in a way that attempts to imitate the [approach taken by ClinVar](https://www.ncbi.nlm.nih.gov/clinvar/docs/review_status/)
- `summary_paranoid_*` -- which considers all assessments as equally important, whether the reporter provided assessment criteria or not
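
The difference between the two summaries can be illustrated with a simplified sketch. It is not the merging code used by clinvar-tsv: the `Assessment` class, the `summarize` function, and the rule that assessments with criteria take precedence in the ClinVar-like mode are assumptions chosen to show the idea, and the real scheme distinguishes more review-status levels.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Assessment:
    """One submitted assessment of a variant (hypothetical model)."""

    significance: str
    has_criteria: bool  # whether the reporter provided assessment criteria


def summarize(assessments, paranoid=False):
    """Aggregate assessments of one VCV into a single summary string.

    ClinVar-like mode (assumed rule): if any assessment comes with criteria,
    only those are considered. Paranoid mode: every assessment counts.
    """
    considered = list(assessments)
    if not considered:
        raise ValueError("at least one assessment is required")
    if not paranoid:
        with_criteria = [a for a in considered if a.has_criteria]
        if with_criteria:
            considered = with_criteria
    values = sorted({a.significance for a in considered})
    if len(values) == 1:
        return values[0]
    return "conflicting: " + ", ".join(values)


assessments = [
    Assessment("pathogenic", has_criteria=True),
    Assessment("benign", has_criteria=False),
]
clinvar_like = summarize(assessments)
paranoid = summarize(assessments, paranoid=True)
print(clinvar_like)
print(paranoid)
```

In this example the ClinVar-like summary ignores the assessment without criteria, while the paranoid summary reports the disagreement.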

## References

Documentation in ClinVar:

- https://www.ncbi.nlm.nih.gov/clinvar/docs/review_status/
- https://www.ncbi.nlm.nih.gov/clinvar/docs/help/
- https://www.ncbi.nlm.nih.gov/clinvar/docs/variation_report/


# Changelog

### [0.6.3](https://www.github.com/bihealth/clinvar-tsv/compare/v0.6.2...v0.6.3) (2023-06-22)


### Bug Fixes

* improved debugging for normalize ([#29](https://www.github.com/bihealth/clinvar-tsv/issues/29)) ([4124a34](https://www.github.com/bihealth/clinvar-tsv/commit/4124a3434294ee6736e5487b1586f9a9f0921b02))
* memory usage in normalize command ([#31](https://www.github.com/bihealth/clinvar-tsv/issues/31)) ([33f8ac7](https://www.github.com/bihealth/clinvar-tsv/commit/33f8ac79891f1a36f788bc619e2ad728ebdabbae))

### [0.6.2](https://www.github.com/bihealth/clinvar-tsv/compare/v0.6.1...v0.6.2) (2023-06-21)


### Bug Fixes

* failure to determine location only goes to debug level ([#22](https://www.github.com/bihealth/clinvar-tsv/issues/22)) ([76bc510](https://www.github.com/bihealth/clinvar-tsv/commit/76bc5105e6f0b26146dcaca43465c1dd59d1aee2))
* higher verbosity in Snakemake rules ([#26](https://www.github.com/bihealth/clinvar-tsv/issues/26)) ([a3fc327](https://www.github.com/bihealth/clinvar-tsv/commit/a3fc3271a8984b39d52c8852f6fe77b05b1193c0))
* interpret --cores argument ([#24](https://www.github.com/bihealth/clinvar-tsv/issues/24)) ([3b09803](https://www.github.com/bihealth/clinvar-tsv/commit/3b098038c47fb33bffabf94510cc9b6fee3f7d43))
* map "low penetrance" to "uncertain significance" ([#25](https://www.github.com/bihealth/clinvar-tsv/issues/25)) ([b2708d7](https://www.github.com/bihealth/clinvar-tsv/commit/b2708d75ad37d4270c253bf6928056c7deba8d84))
* no verbose output by default ([#27](https://www.github.com/bihealth/clinvar-tsv/issues/27)) ([0ad10cb](https://www.github.com/bihealth/clinvar-tsv/commit/0ad10cb8122480d2f46fbb6d2fba1e063be6da3c))
* reduce tqdm progress display unless on TTY ([#21](https://www.github.com/bihealth/clinvar-tsv/issues/21)) ([770b0c8](https://www.github.com/bihealth/clinvar-tsv/commit/770b0c833a2707a88e5ee9c0f2a0eb1435defdc6))

### [0.6.1](https://www.github.com/bihealth/clinvar-tsv/compare/v0.6.0...v0.6.1) (2023-06-21)


### Bug Fixes

* missing/problematic clinvar version ([#19](https://www.github.com/bihealth/clinvar-tsv/issues/19)) ([b11a8a4](https://www.github.com/bihealth/clinvar-tsv/commit/b11a8a435d9269031589106cf8929169893db5ef))

## [0.6.0](https://www.github.com/bihealth/clinvar-tsv/compare/v0.5.0...v0.6.0) (2023-06-21)


### Features

* allow providing clinvar version ([#17](https://www.github.com/bihealth/clinvar-tsv/issues/17)) ([dd80f2d](https://www.github.com/bihealth/clinvar-tsv/commit/dd80f2d10fceab350c61fa8de61bbe6264ad2008))


### Documentation

* adding badges to README ([#15](https://www.github.com/bihealth/clinvar-tsv/issues/15)) ([6e7ac01](https://www.github.com/bihealth/clinvar-tsv/commit/6e7ac013be4c0c7c34df41b69594581a7b8116f9))

## [0.5.0](https://www.github.com/bihealth/clinvar-tsv/compare/v0.4.1...v0.5.0) (2023-05-03)


### Features

* export structural variants ([#13](https://www.github.com/bihealth/clinvar-tsv/issues/13)) ([db44d87](https://www.github.com/bihealth/clinvar-tsv/commit/db44d8739f6f619266f806611950f339b0842352))

## 0.4.1

- Also writing out ``set_type`` column (#10).

## 0.4.0

- Greatly refining record merging strategy (#6).
  Also, providing both a ClinVar-like and a paranoid merging scheme.
- Improving CI (#7)

## 0.3.0

- Various refinements of the code.
- Adding tests and CI.

## 0.2.2

- Fixing bug with quotes.

## 0.2.1

- Fixing bug in setting clinical significance flags.

## 0.2.0

- Complete refurbishing of XML parsing, using models based on python-attrs.
- Removing old tests.

## 0.1.1

- Fixing installation of Snakefile.

## 0.1.0

- First actual release, versioning done using versioneer.
- Everything is new!

            

Raw data

            {
    "_id": null,
    "home_page": "https://github.com/bihealth/clinvar-tsv",
    "name": "clinvar-tsv",
    "maintainer": "",
    "docs_url": null,
    "requires_python": "",
    "maintainer_email": "",
    "keywords": "clinvar",
    "author": "Manuel Holtgrewe",
    "author_email": "manuel.holtgrewe@bihealth.de",
    "download_url": "https://files.pythonhosted.org/packages/62/58/4703c553795d30e2d67858127db96463758261446acc2afa171bae731354/clinvar-tsv-0.6.3.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "MIT license",
    "summary": "Python 3 library for accessing and managing BioMedical sheets",
    "version": "0.6.3",
    "project_urls": {
        "Homepage": "https://github.com/bihealth/clinvar-tsv"
    },
    "split_keywords": [
        "clinvar"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "62584703c553795d30e2d67858127db96463758261446acc2afa171bae731354",
                "md5": "8e9d9361519b5c3de894e37aecd50ad0",
                "sha256": "1ff604fea0979313d8ee2eb6be7aafa0cd8e7d644d9561c1e06ef03f1a93812f"
            },
            "downloads": -1,
            "filename": "clinvar-tsv-0.6.3.tar.gz",
            "has_sig": false,
            "md5_digest": "8e9d9361519b5c3de894e37aecd50ad0",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": null,
            "size": 164972,
            "upload_time": "2023-06-22T09:37:01",
            "upload_time_iso_8601": "2023-06-22T09:37:01.371460Z",
            "url": "https://files.pythonhosted.org/packages/62/58/4703c553795d30e2d67858127db96463758261446acc2afa171bae731354/clinvar-tsv-0.6.3.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2023-06-22 09:37:01",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "bihealth",
    "github_project": "clinvar-tsv",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": true,
    "requirements": [],
    "tox": true,
    "lcname": "clinvar-tsv"
}
        