vcfpy


- Name: vcfpy
- Version: 0.13.8
- Home page: <https://github.com/bihealth/vcfpy>
- Summary: Python 3 VCF library with good support for both reading and writing
- Upload time: 2024-01-10 07:14:23
- Author: Manuel Holtgrewe
- License: MIT license
- Keywords: vcfpy
- Requirements: No requirements were recorded.

[![pypi](https://img.shields.io/pypi/v/vcfpy.svg)](https://pypi.python.org/pypi/vcfpy)
[![bioconda](https://img.shields.io/conda/dn/bioconda/vcfpy.svg?label=Bioconda)](https://bioconda.github.io/recipes/vcfpy/README.html)
[![CI](https://github.com/bihealth/vcfpy/actions/workflows/main.yml/badge.svg)](https://github.com/bihealth/vcfpy/actions/workflows/main.yml)
[![Documentation Status](https://readthedocs.org/projects/vcfpy/badge/?version=latest)](https://vcfpy.readthedocs.io/en/latest/?badge=latest)
[![Publication in The Journal of Open Source Software](http://joss.theoj.org/papers/edae85d90ea8a49843dbaaa109e47cba/status.svg)](http://joss.theoj.org/papers/10.21105/joss.00085)

# VCFPy

Python 3 VCF library with good support for both reading and writing

- Free software: MIT license
- Documentation: <https://vcfpy.readthedocs.io>.

## Features

- Support for reading and writing VCF v4.3
- Interface to `INFO` and `FORMAT` fields is based on `OrderedDict`, which allows for easier modification than PyVCF (I also find this more Pythonic)
- Read (and jump in) and write BGZF files just using `vcfpy`
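
As a rough illustration of the read-modify-write workflow these features describe, here is a minimal sketch. The function name `flag_low_depth`, the `DP<n>` filter ID, and the depth threshold are hypothetical and only serve the example; the sketch also assumes that the input file carries a `DP` value in its `FORMAT` fields.

```python
from collections import OrderedDict


def flag_low_depth(in_path, out_path, min_depth=10):
    """Copy a VCF file, adding a FILTER entry to records with low total depth."""
    # Imported here so this sketch can be loaded even where vcfpy is missing.
    import vcfpy

    filter_id = "DP{}".format(min_depth)  # hypothetical filter name
    with vcfpy.Reader.from_path(in_path) as reader:
        # Declare the new filter in the header before creating the writer.
        reader.header.add_filter_line(
            OrderedDict(
                [("ID", filter_id), ("Description", "total DP < {}".format(min_depth))]
            )
        )
        with vcfpy.Writer.from_path(out_path, reader.header) as writer:
            for record in reader:
                # call.data is the per-sample FORMAT mapping; DP may be absent.
                depth = sum(call.data.get("DP") or 0 for call in record.calls)
                if depth < min_depth:
                    record.add_filter(filter_id)
                writer.write_record(record)
```

Calling `flag_low_depth("input.vcf", "output.vcf")` would then write a copy of the input with the extra filter applied where the summed depth is below the threshold.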

## Why another VCF parser for Python?

I've been using PyVCF with quite some success in the past. However, the
main limitation of PyVCF shows when you want to modify the per-sample
genotype information. There are some issues in the tracker of PyVCF but
none of them can really be considered solved. I spent several hours
trying to solve these problems within PyVCF, but every attempt either
stalled or headed towards a complete rewrite...

For this reason, VCFPy was born and here it is!

## What's the State?

VCFPy is the result of two full days of development plus some later
maintenance work. I'm using it in several projects,
but it is not as battle-tested as PyVCF.

## Why Python 3 Only?

As I only use Python 3, I see no advantage in carrying around and
maintaining support for legacy Python 2. At a later point, when
VCFPy is known to be stable, Python 2 support might be added if someone
contributes a pull request.



# Changelog

## [0.13.8](https://github.com/bihealth/vcfpy/compare/v0.13.7...v0.13.8) (2024-01-10)


### Bug Fixes

* fixing manifest for changelog ([#169](https://github.com/bihealth/vcfpy/issues/169)) ([83c5b8e](https://github.com/bihealth/vcfpy/commit/83c5b8e6cd1199245673cc0d8deb2d6f3646d183))

## [0.13.7](https://github.com/bihealth/vcfpy/compare/v0.13.6...v0.13.7) (2024-01-10)


### Bug Fixes

* remove versioneer for Python 3.12 compatibility ([#160](https://github.com/bihealth/vcfpy/issues/160)) ([5e2860e](https://github.com/bihealth/vcfpy/commit/5e2860e22042aa794304c8805ca716a39c88f24e))


## [0.13.6](https://github.com/bihealth/vcfpy/compare/v0.13.5...v0.13.6) (2022-11-28)

- Fixing bug in `setup.py` that prevented the `pysam` dependency from being loaded (#150).

## v0.13.5 (2022-11-13)

- Treat `.bgz` files the same as `.gz` (#145, #149)

## v0.13.4 (2022-04-13)

- Switching to GitHub Actions for CI
- Fix `INFO` flag raising `TypeError` (#146)

## v0.13.3 (2020-09-14)

- Adding `Record.update_calls`.
- Making `Record.{format,calls}` use list when empty

## v0.13.2 (2020-08-20)

- Adding `Call.set_genotype()`.

## v0.13.1 (2020-08-20)

- Fixed `Call.ploidy`.
- Fixed `Call.is_variant`.

## v0.13.0 (2020-07-10)

- Fixing bug in case `GT` describes only one allele.
- Proper escaping of colon and semicolon (or the lack of escaping) in
  `INFO` and `FORMAT`.

## v0.12.2 (2020-04-29)

- Fixing bug in case `GT` describes only one allele.

## v0.12.1 (2019-03-08)

- Not warning on `PASS` filter if not defined in header.

## v0.12.0 (2019-01-29)

- Fixing tests for Python \>=3.6
- Fixing CI, improving tox integration.
- Applying `black` formatting.
- Replacing Makefile with more minimal one.
- Removing some linting errors from flake8.
- Adding support for reading VCF without `FORMAT` or any sample
  column.
- Adding support for writing headers and records without `FORMAT` and
  any sample columns.

## v0.11.2 (2018-04-16)

- Removing `pip` module from `setup.py` which is not recommended
  anyway.

## v0.11.1 (2018-03-06)

- Working around problem in HTSJDK output with incomplete `FORMAT`
  fields (#127). Writing out `.` instead of keeping trailing empty
  records empty.

## v0.11.0 (2017-11-22)

- The field `FORMAT/FT` is now expected to be a semicolon-separated
  string. Internally, we will handle it as a list.
- Switching from warning helper utility code to Python `warnings`
  module.
- Return `str` in case of problems with parsing value.
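
The `FORMAT/FT` handling described above can be pictured with the following sketch. It is not VCFPy's own code: the helper names `parse_ft` and `serialize_ft` are hypothetical, and treating `.` as "no filters" is an assumption based on the usual VCF missing-value convention.

```python
def parse_ft(raw):
    """Split a raw FORMAT/FT string into a list of filter names."""
    # "." is the VCF missing value; an empty string is treated the same way.
    if raw in (".", ""):
        return []
    return raw.split(";")


def serialize_ft(filters):
    """Join a list of filter names back into a FORMAT/FT string."""
    return ";".join(filters) if filters else "."
```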

## v0.10.0 (2017-02-27)

- Extending API to allow for reading subsets of records. (Writing for
  sample subsets or reordered samples is possible through using the
  appropriate `names` list in the `SamplesInfos` for the `Writer`).
- Deep-copying header lines and samples infos on `Writer` construction
- Using `samples` attribute from `Header` in `Reader` and `Writer`
  instead of passing explicitly

## 0.9.0 (2017-02-26)

- Restructuring of requirements.txt files
- Fixing parsing of no-call `GT` fields

## 0.8.1 (2017-02-08)

- PEP8 style adjustments
- Using versioneer for versioning
- Using `requirements*.txt` files now from setup.py
- Fixing dependency on cyordereddict to be for Python \<3.6 instead of
  \<3.5
- Jumping by samtools coordinate string now also allowed
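
For readers unfamiliar with samtools coordinate strings such as `chr1:1,000-2,000`, the sketch below shows one way such a string can be decomposed. It is an illustration only, not how VCFPy parses regions; `parse_region` and `REGION_RE` are hypothetical names, and contig names that themselves contain a colon are not handled.

```python
import re

# chrom, optionally followed by ":start" and then "-end"; digits may
# contain thousands separators as accepted by samtools.
REGION_RE = re.compile(
    r"^(?P<chrom>[^:]+)(?::(?P<start>[\d,]+)(?:-(?P<end>[\d,]+))?)?$"
)


def parse_region(region):
    """Return (chrom, start, end); start and end are None when omitted."""
    match = REGION_RE.match(region)
    if not match:
        raise ValueError("not a valid region string: {!r}".format(region))

    def to_int(text):
        return int(text.replace(",", "")) if text else None

    return match.group("chrom"), to_int(match.group("start")), to_int(match.group("end"))
```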

## 0.8.0 (2016-10-31)

- Adding `Header.has_header_line` for querying existence of header
  line
- `Header.add_*_line` now returns a `bool` indicating any conflicts
- Construction of Writer uses samples within header and no extra
  parameter (breaks API)

## 0.7.0 (2016-09-25)

- Smaller improvements and fixes to documentation
- Adding Codacy coverage and static code analysis results to README
- Various smaller code cleanup triggered by Codacy results
- Adding `__eq__`, `__neq__` and `__hash__` to data types (where
  applicable)

## 0.6.0 (2016-09-25)

- Refining implementation for breakend and symbolic allele class
- Removing `record.SV_CODES`
- Refactoring parser module a bit to make the code cleaner
- Fixing small typos and problems in documentation

## 0.5.0 (2016-09-24)

- Deactivating warnings on record parsing by default because of
  performance
- Adding validation for `INFO` and `FORMAT` fields on reading (#8)
- Adding predefined `INFO` and `FORMAT` fields to `vcfpy.header` (#32)

## 0.4.1 (2016-09-22)

- Initially enabling codeclimate

## 0.4.0 (2016-09-22)

- Exporting constants for encoding variant types
- Exporting genotype constants `HOM_REF`, `HOM_ALT`, `HET`
- Implementing `Call.is_phased`, `Call.is_het`, `Call.is_variant`,
  `Call.is_hom_ref`, `Call.is_hom_alt`
- Removing `Call.phased` (breaks API, next release is 0.4.0)
- Adding tests, fixing bugs for methods of `Call`
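
To make the genotype constants and the `Call.is_*` properties above more concrete, here is a small stand-alone sketch of how a `GT` string can be classified. It does not reproduce VCFPy's implementation: the helper names `gt_alleles`, `is_phased`, and `gt_type` are hypothetical, and the numeric values given to the constants are assumptions chosen for the example.

```python
import re

# Assumed values for illustration; only the names come from the changelog.
HOM_REF, HET, HOM_ALT = 0, 1, 2


def gt_alleles(gt):
    """Parse 'GT' such as '0/1' or '1|1' into allele indices (None for '.')."""
    return [None if a == "." else int(a) for a in re.split(r"[/|]", gt)]


def is_phased(gt):
    """A genotype is phased when its alleles are separated by '|'."""
    return "|" in gt


def gt_type(gt):
    """Classify a genotype; returns None for (partial) no-calls."""
    alleles = gt_alleles(gt)
    if None in alleles:
        return None
    if all(a == 0 for a in alleles):
        return HOM_REF
    if len(set(alleles)) == 1:
        return HOM_ALT
    return HET
```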

## 0.3.1 (2016-09-21)

- Work around `FORMAT/FT` being a string; this is done so in the Delly
  output

## 0.3.0 (2016-09-21)

- `Reader` and `Writer` can now be used as context manager (with
  `with`)
- Including license in documentation, including Biopython license
- Adding support for writing bgzf files (taken from Biopython)
- Adding support for parsing arrays in header lines
- Removing `example-4.1-bnd.vcf` example file because v4.1 tumor
  derival lacks `ID` field
- Adding `AltAlleleHeaderLine`, `MetaHeaderLine`,
  `PedigreeHeaderLine`, and `SampleHeaderLine`
- Renaming `SimpleHeaderFile` to `SimpleHeaderLine`
- Warn on missing `FILTER` entries on parsing
- Reordered parameters in `from_stream` and `from_file` (#18)
- Renamed `from_file` to `from_stream` (#18)
- Renamed `Reader.jump_to` to `Reader.fetch`
- Adding `header_without_lines` function
- Generally extending API to make it easier to use
- Upgrading dependencies, enabling pyup-bot
- Greatly extending documentation

## 0.2.1 (2016-09-19)

- First release on PyPI

            

        