tb-variant-filter


- Name: tb-variant-filter
- Version: 0.4.0
- Home page: https://github.com/COMBAT-TB/tb_variant_filter
- Summary: This tool offers multiple options for filtering variants (in VCF files, relative to M. tuberculosis H37Rv).
- Upload time: 2023-08-10 13:48:46
- Author: Peter van Heusden
- Requires Python: >=3.7
- License: GPLv3
- Keywords: bioinformatics, tuberculosis

### tb_variant_filter

![tb variant filter build status](https://github.com/COMBAT-TB/tb_variant_filter/actions/workflows/tb_variant_filter.yml/badge.svg)

This tool offers multiple options for filtering variants (in VCF files, relative to M. tuberculosis H37Rv coordinates).

It currently has 5 main modes:

1. Filter by region. Masks out variants in certain regions. The available region lists are:
    1. `farhat_rlc` and `farhat_rlc_lowmap`: Refined Low Confidence (RLC) regions, and RLC plus low-mappability regions, from [Marin et al](https://doi.org/10.1093/bioinformatics/btac023)
    2. `pe_ppe`: PE/PPE genes from [Fishbein et al 2015](https://onlinelibrary.wiley.com/doi/full/10.1111/mmi.12981)
    3. `tbprofiler`: [TBProfiler](http://tbdr.lshtm.ac.uk/) list of antibiotic resistance genes
    4. `mtbseq`: [MTBseq](https://github.com/ngs-fzb/MTBseq_source) list of antibiotic resistance genes
    5. `uvp`: [UVP](https://github.com/CPTR-ReSeqTB/UVP) list of repetitive loci in the M. tuberculosis genome
2. Filter by proximity to indels. Masks out variants within a certain distance (by default 5 bases) of an insertion or deletion site.
3. Filter by percentage of alternate allele bases. Masks out variants with less than a minimum percentage (by default 90%) of alternate allele bases.
4. Filter by depth of reads at a variant site. Masks out variants with less than a minimum depth of coverage (by default 30) at the site.
5. Filter all non-SNV variants. Masks out variants that are not single nucleotide variants.
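
As an illustration of mode 2, the following sketch flags single nucleotide variants that lie within a window of an indel. It is not the tool's implementation: the `Variant` record, the `close_to_indel` function, and the choice to measure the window from the span of the indel's reference allele are all assumptions made for this example.

```python
from typing import List, NamedTuple


class Variant(NamedTuple):
    """Hypothetical minimal variant record (1-based position, as in VCF)."""
    pos: int
    ref: str
    alt: str


def is_indel(v: Variant) -> bool:
    # Assumption: any variant whose REF and ALT lengths differ is an indel.
    return len(v.ref) != len(v.alt)


def close_to_indel(variants: List[Variant], window: int = 5) -> List[bool]:
    """Return one flag per variant: True if a non-indel variant lies within
    `window` bases of any indel. The indel is assumed to occupy the interval
    POS .. POS + len(REF) - 1."""
    intervals = [
        (v.pos - window, v.pos + len(v.ref) - 1 + window)
        for v in variants
        if is_indel(v)
    ]
    return [
        (not is_indel(v)) and any(lo <= v.pos <= hi for lo, hi in intervals)
        for v in variants
    ]


# Placeholder variants, not real H37Rv calls.
variants = [
    Variant(100, "A", "G"),    # SNV, 3 bases before the deletion
    Variant(103, "CTT", "C"),  # deletion spanning positions 103-105
    Variant(120, "T", "C"),    # SNV, far from any indel
]
flags = close_to_indel(variants)
print(flags)
```

Only the first SNV is flagged here; the indel itself is left alone, matching the help text's description of masking single nucleotide variants that are too close to indels.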

Filtering by (SAM/BAM) mapping quality was omitted because this filtering is performed by the upstream workflow that we ([SANBI](https://www.sanbi.ac.za)) currently use.
 
When used together, the effects of the filters are combined (i.e. a variant is masked out if it is masked by any of the filters).
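
The "masked by any filter" rule can be sketched as follows. This is a minimal illustration, not the tool's code: the `Call` record, its field names, and the three predicate functions are hypothetical, and only the default thresholds (90% and 30) come from the description above.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Call:
    """Hypothetical per-site summary; field names are illustrative only."""
    pos: int
    ref: str
    alt: str
    depth: int      # total reads at the site
    alt_reads: int  # reads supporting the alternate allele


def low_alt_percentage(call: Call, min_percentage: float = 90.0) -> bool:
    # Masked when the alternate allele percentage is below the minimum.
    if call.depth == 0:
        return True
    return 100.0 * call.alt_reads / call.depth < min_percentage


def low_depth(call: Call, min_depth: int = 30) -> bool:
    # Masked when the read depth is below the minimum.
    return call.depth < min_depth


def not_snv(call: Call) -> bool:
    # Masked when the variant is not a single nucleotide variant.
    return not (len(call.ref) == 1 and len(call.alt) == 1)


def apply_filters(calls: List[Call],
                  filters: List[Callable[[Call], bool]]) -> List[Call]:
    """Keep a call only if no enabled filter masks it."""
    return [c for c in calls if not any(f(c) for f in filters)]


# Placeholder calls, not real data.
calls = [
    Call(10, "A", "G", depth=50, alt_reads=49),   # passes every filter
    Call(20, "C", "T", depth=40, alt_reads=30),   # 75% alternate allele
    Call(30, "G", "A", depth=12, alt_reads=12),   # depth too low
    Call(40, "AT", "A", depth=60, alt_reads=60),  # deletion, not an SNV
]
kept = apply_filters(calls, [low_alt_percentage, low_depth, not_snv])
print([c.pos for c in kept])
```

Passing an empty filter list keeps every call, which mirrors the tool's behaviour of applying no filters unless they are requested.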

#### Installation

The software is available via [bioconda](https://bioconda.github.io/) and can be installed with:

```
conda install tb_variant_filter
```

#### Usage
```
usage: tb_variant_filter [-h] [--region_filter REGION_FILTER]
                         [--close_to_indel_filter]
                         [--indel_window_size INDEL_WINDOW_SIZE]
                         [--min_percentage_alt_filter]
                         [--min_percentage_alt MIN_PERCENTAGE_ALT]
                         [--min_depth_filter] [--min_depth MIN_DEPTH]
                         [--snv_only_filter]
                         input_file [output_file]

Filter variants from a VCF file (relative to M. tuberculosis H37Rv)

positional arguments:
  input_file            VCF input file (relative to H37Rv)
  output_file           Output file (VCF format)

optional arguments:
  -h, --help            show this help message and exit
  --region_filter REGION_FILTER, -R REGION_FILTER
  --close_to_indel_filter, -I
                        Mask out single nucleotide variants that are too close
                        to indels
  --indel_window_size INDEL_WINDOW_SIZE
                        Window around indel to mask out (mask this number of
                        bases upstream/downstream from the indel. Requires -I
                        option to selected)
  --min_percentage_alt_filter, -P
                        Mask out variants with less than a given percentage
                        variant allele at this site
  --min_percentage_alt MIN_PERCENTAGE_ALT
                        Variants with less than this percentage variants at a
                        site will be masked out
  --min_depth_filter, -D
                        Mask out variants with less than a given depth of
                        reads
  --min_depth MIN_DEPTH
                        Variants at sites with less than this depth of reads
                        will be masked out
  --snv_only_filter     Mask out variants that are not SNVs
```

*Note* that no filters are enabled by default. Each filter must be explicitly requested on the command line.
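
Because every filter has to be requested explicitly, it can help to assemble the command line programmatically. The sketch below only builds an argument list from the flags shown in the usage text; the `build_command` helper, its parameter names, and the file names are hypothetical, and the command is printed rather than executed.

```python
import shlex
from typing import List, Optional


def build_command(input_vcf: str,
                  output_vcf: Optional[str] = None,
                  region: Optional[str] = None,
                  indel_window: Optional[int] = None,
                  min_percentage_alt: Optional[float] = None,
                  min_depth: Optional[int] = None,
                  snv_only: bool = False) -> List[str]:
    """Build a tb_variant_filter argument list; a filter is added only
    when its parameter is set (hypothetical helper, not part of the tool)."""
    cmd = ["tb_variant_filter"]
    if region is not None:
        cmd += ["--region_filter", region]
    if indel_window is not None:
        cmd += ["--close_to_indel_filter",
                "--indel_window_size", str(indel_window)]
    if min_percentage_alt is not None:
        cmd += ["--min_percentage_alt_filter",
                "--min_percentage_alt", str(min_percentage_alt)]
    if min_depth is not None:
        cmd += ["--min_depth_filter", "--min_depth", str(min_depth)]
    if snv_only:
        cmd.append("--snv_only_filter")
    cmd.append(input_vcf)
    if output_vcf is not None:
        cmd.append(output_vcf)
    return cmd


# Placeholder file names.
cmd = build_command("sample.vcf", "filtered.vcf",
                    region="farhat_rlc", min_depth=30, snv_only=True)
print(" ".join(shlex.quote(part) for part in cmd))
```

The resulting list can be passed to `subprocess.run` once the tool is installed.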

To export a region list (one of the available region masks) in BED format, use the `tb_region_list_to_bed` command:


```
usage: tb_region_list_to_bed [-h] [--chromosome_name CHROMOSOME_NAME]
                             {farhat_rlc,farhat_rlc_lowmap,mtbseq,pe_ppe,tbprofiler,uvp} [output_file]

Output region filter in BED format

positional arguments:
  {farhat_rlc,farhat_rlc_lowmap,mtbseq,pe_ppe,tbprofiler,uvp}
                        Name of region list
  output_file           File to write output to

optional arguments:
  -h, --help            show this help message and exit
  --chromosome_name CHROMOSOME_NAME
                        Chromosome name to use in BED
```
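
For readers unfamiliar with BED, the format uses 0-based, half-open intervals, so a region given as 1-based inclusive coordinates needs its start shifted by one. The sketch below shows that conversion only; it is not how `tb_region_list_to_bed` is implemented. The `regions_to_bed` function, the region names, and the coordinates are placeholders, and `NC_000962.3` (the RefSeq accession for H37Rv) is used as an assumed chromosome name, since the tool's default is not stated here.

```python
from typing import Iterable, List, Tuple


def regions_to_bed(regions: Iterable[Tuple[str, int, int]],
                   chromosome_name: str = "NC_000962.3") -> List[str]:
    """Convert (name, start, end) regions with 1-based inclusive coordinates
    into tab-separated BED lines (0-based start, exclusive end)."""
    lines = []
    for name, start, end in sorted(regions, key=lambda r: (r[1], r[2])):
        lines.append(f"{chromosome_name}\t{start - 1}\t{end}\t{name}")
    return lines


# Placeholder regions, not real loci from any of the region lists.
regions = [("region_b", 5001, 5600), ("region_a", 101, 250)]
bed_lines = regions_to_bed(regions)
print("\n".join(bed_lines))
```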

### Testing and development environment

The repository contains a file, [test_environment.yml](test_environment.yml), for creating a [conda](https://docs.conda.io/en/latest/#)
environment for testing and development. Tests can be run with `pytest` or `tox`; `tox` also uses conda to create the testing environment.

For some tests, locus locations are looked up using the [COMBAT-TB NeoDB](https://combattb.org/combat-tb-neodb/). This requires the
environment variable `COMBATTB_BOLT_URL` to be set. If it is not set, tests requiring this lookup are skipped. The default in `tox.ini` uses the [SANBI](https://www.sanbi.ac.za/)-hosted NeoDB instance.

### Licensing and distribution

This code is free software and is licensed under the terms specified in [COPYING](COPYING), i.e. under the terms of the
[GNU General Public License version 3](https://www.gnu.org/licenses/gpl-3.0-standalone.html).
