vcfparser


- Name: vcfparser
- Version: 0.2.2
- Home page: https://github.com/everestial/vcfparser
- Summary: Python (version <=3.12) package for parsing the genomics and transcriptomics VCF data.
- Upload time: 2024-09-11 11:05:15
- Maintainer: None
- Docs URL: None
- Author: Kiran Bishwa
- Requires Python: None
- License: MIT License. Copyright (c) 2019 Kiran N' Bishwa. Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
- Keywords: vcfparser
- Requirements: No requirements were recorded.
- Test coverage (coveralls): No coveralls.

# vcfparser

![PyPI version](https://img.shields.io/pypi/v/vcfparser.svg)  
[![Travis Build Status](https://img.shields.io/travis/everestial/vcfparser.svg)](https://travis-ci.org/everestial/vcfparser)  
[![Read the Docs](https://readthedocs.org/projects/vcfparser/badge/?version=latest)](https://vcfparser.readthedocs.io/en/latest/?badge=latest)

A Python (version >=3.6) package for parsing genomics and transcriptomics VCF data.

- Free software: MIT license
- Documentation: https://vcfparser.readthedocs.io.

## Features

- No external dependencies; only Python (version >=3.6) is required.
- Minimalistic in nature.
- Exposes VCF metadata and records to API users through a small set of objects and methods.
- Optional Cython compilation is supported to improve performance.

## Installation

**Method A:**

[VCFsimplify](https://github.com/everestial/VCF-Simplify) uses the vcfparser API, so the package is readily available if VCFsimplify is already installed.

This method is preferred only when developing or optimizing **VcfSimplify** together with **vcfparser**.

Navigate to the VCFsimplify directory, start Python, and import the `vcfparser` package:

```console

    $ C:\Users\>cd VCF-Simplify
    $ C:\Users\VCF-Simplify>dir
      Volume in drive C is StorageDrive
      Volume Serial Number is .........

      Directory of C:\Users\VCF-Simplify

      07/12/2020  10:14 AM    <DIR>          .
      07/12/2020  10:14 AM    <DIR>          ..
      07/12/2020  08:55 AM    <DIR>          .github
      ............................
      ............................
      07/12/2020  10:42 AM    <DIR>          vcfparser
      07/12/2020  08:55 AM             1,494 VcfSimplify.py
              11 File(s)     20,873,992 bytes
              13 Dir(s)  241,211,793,408 bytes free

    $ C:\Users\VCF-Simplify>python
    Python 3.8.1 (tags/v3.8.1:1b293b6, Dec 18 2019, 22:39:24) [MSC v.1916 (Intel)] on win32
    Type "help", "copyright", "credits" or "license" for more information.
    >>> from vcfparser import VcfParser
    >>>
```

**Method B (preferred method):**

Pip is the preferred way to install and use the **vcfparser** API when developing custom Python scripts or apps.

```console
    $ pip install vcfparser
```

**Method C:**

For an offline install, or to build from the source code, follow the advanced install guide (`advanced-install`) in the documentation.

## Cythonize (optional but helpful)

The installed "vcfparser" package can be cythonized to optimize performance.
Cythonizing the package can increase the speed of the parser by about x.x - y.y (?) times.

TODO: Bhuwan - add required cython method in here

## Usage

```python
from vcfparser import VcfParser

# Create a parser object for the input VCF file.
vcf_obj = VcfParser('input_test.vcf')
```

### Get metadata information from the vcf file

```python
metainfo = vcf_obj.parse_metadata()
metainfo.fileformat
# Output: 'VCFv4.2'

metainfo.filters
# Output: [{'ID': 'LowQual', 'Description': 'Low quality'}, {'ID': 'my_indel_filter', 'Description': 'QD < 2.0 || FS > 200.0 || ReadPosRankSum < -20.0'}, {'ID': 'my_snp_filter', 'Description': 'QD < 2.0 || FS > 60.0 || MQ < 40.0 || MQRankSum < -12.5 || ReadPosRankSum < -8.0'}]

metainfo.alt_
# Output: [{'ID': 'NON_REF', 'Description': 'Represents any possible alternative allele at this location'}]

metainfo.sample_names
# Output: ['ms01e', 'ms02g', 'ms03g', 'ms04h', 'MA611', 'MA605', 'MA622']

metainfo.record_keys
# Output: ['CHROM', 'POS', 'ID', 'REF', 'ALT', 'QUAL', 'FILTER', 'INFO', 'FORMAT', 'ms01e', 'ms02g', 'ms03g', 'ms04h', 'MA611', 'MA605', 'MA622']
```
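
Because `metainfo.filters` is shown above as a list of dictionaries, looking up one filter by its ID means scanning the list. A small sketch of indexing such a list by `ID` follows; it works on a literal copy of the output above rather than on a live parser object, and the helper name `index_by_id` is hypothetical (not part of vcfparser).

```python
# Literal copy of the `metainfo.filters` output shown above.
filters = [
    {'ID': 'LowQual', 'Description': 'Low quality'},
    {'ID': 'my_indel_filter',
     'Description': 'QD < 2.0 || FS > 200.0 || ReadPosRankSum < -20.0'},
    {'ID': 'my_snp_filter',
     'Description': 'QD < 2.0 || FS > 60.0 || MQ < 40.0 || MQRankSum < -12.5 || ReadPosRankSum < -8.0'},
]


def index_by_id(entries):
    """Build a dict keyed by each entry's 'ID' for direct lookup."""
    return {entry['ID']: entry for entry in entries}


filters_by_id = index_by_id(filters)
print(filters_by_id['LowQual']['Description'])
```

The same helper should apply to any metadata list whose entries carry an `ID` key, such as the `metainfo.alt_` output above.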

### Get Records from the vcf file

```python
records = vcf_obj.parse_records()
# Note: Records are returned as a generator.

first_record = next(records)
first_record.CHROM
# Output: '2'

first_record.POS
# Output: '15881018'

first_record.REF
# Output: 'G'

first_record.ALT
# Output: 'A,C'

first_record.QUAL
# Output: '5082.45'

first_record.FILTER
# Output: ['PASS']

first_record.get_mapped_samples()
# Output: {'ms01e': {'GT': './.', 'PI': '.', 'GQ': '.', 'PG': './.', 'PM': '.', 'PW': './.', 'AD': '0,0', 'PL': '0,0,0,.,.,.', 'DP': '0', 'PB': '.', 'PC': '.'},
#           'ms02g': {'GT': './.', 'PI': '.', 'GQ': '.', 'PG': './.', 'PM': '.', 'PW': './.', 'AD': '0,0', 'PL': '0,0,0,.,.,.', 'DP': '0', 'PB': '.', 'PC': '.'},
#           'ms03g': {'GT': './.', 'PI': '.', 'GQ': '.', 'PG': './.', 'PM': '.', 'PW': './.', 'AD': '0,0', 'PL': '0,0,0,.,.,.', 'DP': '0', 'PB': '.', 'PC': '.'},
#           'ms04h': {'GT': '1/1', 'PI': '.', 'GQ': '6', 'PG': '1/1', 'PM': '.', 'PW': '1/1', 'AD': '0,2', 'PL': '49,6,0,.,.,.', 'DP': '2', 'PB': '.', 'PC': '.'},
#           'MA611': {'GT': '0/0', 'PI': '.', 'GQ': '78', 'PG': '0/0', 'PM': '.', 'PW': '0/0', 'AD': '29,0,0', 'PL': '0,78,1170,78,1170,1170', 'DP': '29', 'PB': '.', 'PC': '.'},
#           'MA605': {'GT': '0/0', 'PI': '.', 'GQ': '9', 'PG': '0/0', 'PM': '.', 'PW': '0/0', 'AD': '3,0,0', 'PL': '0,9,112,9,112,112', 'DP': '3', 'PB': '.', 'PC': '.'},
#           'MA622': {'GT': '0/0', 'PI': '.', 'GQ': '99', 'PG': '0/0', 'PM': '.', 'PW': '0/0', 'AD': '40,0,0', 'PL': '0,105,1575,105,1575,1575', 'DP': '40', 'PB': '.', 'PC': '.\n'}}
```
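
In the outputs above, `POS`, `QUAL` and `ALT` appear as strings. For numeric filtering or per-allele work they usually need converting first. The sketch below assumes the values are strings exactly as printed above, and uses a hypothetical helper `to_typed` that is not part of vcfparser.

```python
def to_typed(chrom, pos, ref, alt, qual):
    """Convert the string fields of one record into typed Python values."""
    return {
        'chrom': chrom,
        'pos': int(pos),
        'ref': ref,
        # Multiple ALT alleles are comma-separated in VCF.
        'alts': alt.split(','),
        # '.' is the VCF marker for a missing value.
        'qual': None if qual == '.' else float(qual),
    }


# Values copied from the first record shown above.
typed = to_typed('2', '15881018', 'G', 'A,C', '5082.45')
print(typed)
```

With a live record, the call would look like `to_typed(first_record.CHROM, first_record.POS, first_record.REF, first_record.ALT, first_record.QUAL)`.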

TODO: Bhuwan (priority - high)
The very last example "first_record.get_mapped_samples()" is returning the value of the last sample/key with "\n".
i.e: 'PC': '.\n'
Please fix that issue - strip('\n') in the line before parsing.
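
Until that is fixed in the parser, callers can clean the returned dictionary themselves. This is a user-side workaround sketch, not the library fix requested above; it assumes every value in the mapping is a string, and the helper name `strip_sample_values` is hypothetical.

```python
def strip_sample_values(mapped_samples):
    """Return a copy of the mapping with trailing line endings removed."""
    return {
        sample: {key: value.rstrip('\r\n') for key, value in fields.items()}
        for sample, fields in mapped_samples.items()
    }


# Abbreviated copy of the output above; the last value carries the stray '\n'.
mapped = {
    'MA605': {'GT': '0/0', 'DP': '3', 'PC': '.'},
    'MA622': {'GT': '0/0', 'DP': '40', 'PC': '.\n'},
}

cleaned = strip_sample_values(mapped)
print(cleaned['MA622']['PC'] == '.')
```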


Alternatively, we can loop over the records with a for-loop:

```python
for record in records:
    chrom = record.CHROM
    pos = record.POS
    record_id = record.ID  # named to avoid shadowing the built-in id()
    ref = record.REF
    alt = record.ALT
    qual = record.QUAL
    filter_ = record.FILTER  # named to avoid shadowing the built-in filter()
    format_ = record.format_
    infos = record.get_info_dict()
    mapped_sample = record.get_mapped_samples()
```

For more specific use cases, see the tutorials in the documentation:

- For metadata, follow the Metadata Tutorial (`metadata-tutorial`).
- For the record parser, follow the Record Parser Tutorial (`record-parser-tutorial`).

            
