### tb_variant_filter
![tb variant filter build status](https://github.com/COMBAT-TB/tb_variant_filter/actions/workflows/tb_variant_filter.yml/badge.svg)
This tool offers multiple options for filtering variants (in VCF files, relative to M. tuberculosis H37Rv coordinates).
It currently has 5 main modes:
1. Filter by region. Mask out variants in certain regions. The available region lists are:
   1. `farhat_rlc` and `farhat_rlc_lowmap`: Refined Low Confidence (RLC) regions, and RLC plus low-mappability regions, from [Marin et al](https://doi.org/10.1093/bioinformatics/btac023)
   2. `pe_ppe`: PE/PPE genes from [Fishbein et al 2015](https://onlinelibrary.wiley.com/doi/full/10.1111/mmi.12981)
   3. `tbprofiler`: [TBProfiler](http://tbdr.lshtm.ac.uk/) list of antibiotic resistance genes
   4. `mtbseq`: [MTBseq](https://github.com/ngs-fzb/MTBseq_source) list of antibiotic resistance genes
   5. `uvp`: [UVP](https://github.com/CPTR-ReSeqTB/UVP) list of repetitive loci in the M. tuberculosis genome
2. Filter by proximity to indels. Mask out variants within a certain distance (by default 5 bases) of an insertion or
   deletion site.
3. Filter by percentage of alternate allele bases. Mask out variants supported by less than a minimum percentage
   (by default 90%) of alternate allele bases.
4. Filter by depth of reads at a variant site. Mask out variants with less than a minimum depth of coverage
   (by default 30) at the site.
5. Filter all non-SNV variants. Mask out variants that are not single nucleotide variants.
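
The indel proximity filter can be sketched in a few lines of Python. This is only an illustration of the idea, not the tool's implementation: the `Variant` record and the function names are hypothetical, and it assumes that indels themselves are kept while other variants inside the window are masked.

```python
from typing import List, NamedTuple


class Variant(NamedTuple):
    """Hypothetical minimal variant record (not the tool's own data model)."""
    pos: int  # 1-based reference position, as in VCF
    ref: str
    alt: str


def is_indel(v: Variant) -> bool:
    # An insertion or deletion has REF and ALT alleles of different lengths.
    return len(v.ref) != len(v.alt)


def mask_close_to_indel(variants: List[Variant], window: int = 5) -> List[Variant]:
    """Drop non-indel variants that lie within `window` bases of any indel."""
    # An indel covers reference positions pos .. pos + len(ref) - 1;
    # widen that span by `window` bases on both sides.
    spans = [
        (v.pos - window, v.pos + len(v.ref) - 1 + window)
        for v in variants
        if is_indel(v)
    ]
    kept = []
    for v in variants:
        near_indel = any(start <= v.pos <= end for start, end in spans)
        if is_indel(v) or not near_indel:
            kept.append(v)
    return kept


# Made-up example records: an SNV 3 bases before a deletion, and a distant SNV.
example = [Variant(100, "A", "G"), Variant(103, "AT", "A"), Variant(120, "C", "T")]
filtered = mask_close_to_indel(example, window=5)
```

With the default window of 5, the SNV at position 100 is dropped because it falls inside the widened span around the deletion at 103, while the SNV at 120 is kept.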
Filtering by (SAM/BAM) mapping quality was omitted because this filtering is already performed by the upstream
workflow that we ([SANBI](https://www.sanbi.ac.za)) currently use.
When several filters are used together, their effects are combined: a variant is masked out if it is masked by any one of the filters.
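
The "masked by any filter" rule amounts to a logical OR over the individual filters. The following sketch shows that combination under stated assumptions: the dictionary keys (`ref`, `alt`, `depth`, `alt_percent`) and all function names are hypothetical, chosen for illustration rather than taken from the tool's code.

```python
from typing import Any, Callable, Dict, List

Record = Dict[str, Any]          # hypothetical record layout, see keys below
Filter = Callable[[Record], bool]  # returns True when the record is masked


def low_depth(min_depth: int = 30) -> Filter:
    # Mask sites with less than the minimum read depth.
    return lambda r: r["depth"] < min_depth


def low_alt_percentage(min_percentage: float = 90.0) -> Filter:
    # Mask variants with less than the minimum percentage of alternate bases.
    return lambda r: r["alt_percent"] < min_percentage


def not_snv() -> Filter:
    # Mask anything that is not a single-base substitution.
    return lambda r: len(r["ref"]) != 1 or len(r["alt"]) != 1


def apply_filters(records: List[Record], filters: List[Filter]) -> List[Record]:
    """Keep a record only if no filter masks it (filters combine with OR)."""
    return [r for r in records if not any(f(r) for f in filters)]


# Made-up records for illustration only.
records = [
    {"ref": "A", "alt": "G", "depth": 50, "alt_percent": 98.0},   # passes all
    {"ref": "C", "alt": "T", "depth": 12, "alt_percent": 99.0},   # low depth
    {"ref": "G", "alt": "A", "depth": 80, "alt_percent": 60.0},   # low alt %
    {"ref": "AT", "alt": "A", "depth": 70, "alt_percent": 95.0},  # not an SNV
]
kept_all = apply_filters(records, [low_depth(), low_alt_percentage(), not_snv()])
kept_depth_only = apply_filters(records, [low_depth()])
```

Passing an empty filter list returns every record unchanged, which mirrors the tool's behaviour of applying no filters unless they are requested.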
#### Installation
The software is available via [bioconda](https://bioconda.github.io/) and can be installed with:
```
conda install tb_variant_filter
```
#### Usage
```
usage: tb_variant_filter [-h] [--region_filter REGION_FILTER]
                         [--close_to_indel_filter]
                         [--indel_window_size INDEL_WINDOW_SIZE]
                         [--min_percentage_alt_filter]
                         [--min_percentage_alt MIN_PERCENTAGE_ALT]
                         [--min_depth_filter] [--min_depth MIN_DEPTH]
                         [--snv_only_filter]
                         input_file [output_file]

Filter variants from a VCF file (relative to M. tuberculosis H37Rv)

positional arguments:
  input_file            VCF input file (relative to H37Rv)
  output_file           Output file (VCF format)

optional arguments:
  -h, --help            show this help message and exit
  --region_filter REGION_FILTER, -R REGION_FILTER
  --close_to_indel_filter, -I
                        Mask out single nucleotide variants that are too close
                        to indels
  --indel_window_size INDEL_WINDOW_SIZE
                        Window around indel to mask out (mask this number of
                        bases upstream/downstream from the indel. Requires -I
                        option to selected)
  --min_percentage_alt_filter, -P
                        Mask out variants with less than a given percentage
                        variant allele at this site
  --min_percentage_alt MIN_PERCENTAGE_ALT
                        Variants with less than this percentage variants at a
                        site will be masked out
  --min_depth_filter, -D
                        Mask out variants with less than a given depth of
                        reads
  --min_depth MIN_DEPTH
                        Variants at sites with less than this depth of reads
                        will be masked out
  --snv_only_filter     Mask out variants that are not SNVs
```
*Note* that no filters are applied by default. Each filter must be enabled explicitly with its own option.
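
Because every filter has to be switched on individually, it can help to assemble the command line programmatically when calling the tool from a pipeline. The helper below is a hypothetical sketch: `build_command` and its parameters are not part of the package, and it only uses the long options listed in the usage text above. The file names are placeholders.

```python
from typing import List, Optional


def build_command(
    input_vcf: str,
    output_vcf: str,
    region: Optional[str] = None,
    indel_window: Optional[int] = None,
    min_percentage_alt: Optional[float] = None,
    min_depth: Optional[int] = None,
    snv_only: bool = False,
) -> List[str]:
    """Build a tb_variant_filter argument list; a filter is added only if requested."""
    cmd = ["tb_variant_filter"]
    if region is not None:
        cmd += ["--region_filter", region]
    if indel_window is not None:
        # The window size only takes effect when the indel filter is enabled.
        cmd += ["--close_to_indel_filter", "--indel_window_size", str(indel_window)]
    if min_percentage_alt is not None:
        cmd += ["--min_percentage_alt_filter", "--min_percentage_alt", str(min_percentage_alt)]
    if min_depth is not None:
        cmd += ["--min_depth_filter", "--min_depth", str(min_depth)]
    if snv_only:
        cmd.append("--snv_only_filter")
    cmd += [input_vcf, output_vcf]
    return cmd


# Placeholder file names; nothing is executed here.
command = build_command("sample.vcf", "sample.filtered.vcf", region="farhat_rlc", min_depth=30)
```

The resulting list can be passed to `subprocess.run(command, check=True)` once the tool is installed. Building a list rather than a shell string avoids quoting problems with file names.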
To export a region (from the list of possible region masks) in BED format, use the `tb_region_list_to_bed` command:
```
usage: tb_region_list_to_bed [-h] [--chromosome_name CHROMOSOME_NAME]
                             {farhat_rlc,farhat_rlc_lowmap,mtbseq,pe_ppe,tbprofiler,uvp} [output_file]

Output region filter in BED format

positional arguments:
  {farhat_rlc,farhat_rlc_lowmap,mtbseq,pe_ppe,tbprofiler,uvp}
                        Name of region list
  output_file           File to write output to

optional arguments:
  -h, --help            show this help message and exit
  --chromosome_name CHROMOSOME_NAME
                        Chromosome name to use in BED
```
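
An exported BED file can be used to check positions outside the tool. The sketch below relies on the general BED convention (0-based start, exclusive end) versus VCF's 1-based positions. It assumes the exported file has at least the three standard columns. The function names, the chromosome name, and the coordinates in the example are made up for illustration.

```python
from typing import Iterable, List, Tuple

Region = Tuple[str, int, int]  # (chromosome, 0-based start, exclusive end)


def parse_bed(lines: Iterable[str]) -> List[Region]:
    """Read the first three BED columns, skipping blank and header lines."""
    regions = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end = line.split()[:3]
        regions.append((chrom, int(start), int(end)))
    return regions


def in_regions(chrom: str, vcf_pos: int, regions: List[Region]) -> bool:
    """True if a 1-based VCF position falls inside any BED interval."""
    zero_based = vcf_pos - 1  # convert VCF coordinate to BED coordinate
    return any(c == chrom and s <= zero_based < e for c, s, e in regions)


# Illustrative interval only, not taken from any real region list.
bed_regions = parse_bed(["NC_000962.3\t99\t200\texample_region"])
inside = in_regions("NC_000962.3", 100, bed_regions)
outside = in_regions("NC_000962.3", 201, bed_regions)
```

The interval `99..200` in BED covers VCF positions 100 through 200, so position 100 is inside and position 201 is not.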
### Testing and development environment
The repository contains a file, [test_environment.yml](test_environment.yml), for creating a [conda](https://docs.conda.io/en/latest/#)
environment for testing and development. Tests can be run with `pytest` and `tox`, where `tox` also uses conda to create the testing environment.
For some tests, locus locations are looked up using the [COMBAT-TB NeoDB](https://combattb.org/combat-tb-neodb/). This requires an
environment variable, `COMBATTB_BOLT_URL`. If this is not set, tests requiring this lookup are skipped. The default in `tox.ini` uses the [SANBI](https://www.sanbi.ac.za/) hosted NeoDB instance.
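
One way to express the "skip when `COMBATTB_BOLT_URL` is unset" behaviour is shown below. This is a sketch using the standard library's `unittest.skipUnless`, which `pytest` also honours; the repository's own tests may implement the skip differently, and the helper and test class names are hypothetical.

```python
import os
import unittest
from typing import Mapping, Optional


def neodb_url(env: Mapping[str, str] = os.environ) -> Optional[str]:
    """Return the Bolt URL from the environment, or None when unset or empty."""
    return env.get("COMBATTB_BOLT_URL") or None


@unittest.skipUnless(neodb_url(), "COMBATTB_BOLT_URL is not set")
class LocusLookupTest(unittest.TestCase):
    # Hypothetical test case; a real test would query NeoDB for a locus here.
    def test_url_looks_like_a_url(self):
        self.assertIn("://", neodb_url())
```

To run the lookup tests against a different NeoDB instance, export the variable before invoking `pytest` or `tox`.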
### Licensing and distribution
This code is free software and is licensed under the terms specified in [COPYING](COPYING), i.e. under the terms of the
[GNU General Public License version 3](https://www.gnu.org/licenses/gpl-3.0-standalone.html).
Raw data
{
"_id": null,
"home_page": "https://github.com/COMBAT-TB/tb_variant_filter",
"name": "tb-variant-filter",
"maintainer": "",
"docs_url": null,
"requires_python": ">=3.7",
"maintainer_email": "",
"keywords": "bioinformatics,tuberculosis",
"author": "\"Peter van Heusden\"",
"author_email": "pvh@sanbi.ac.za",
"download_url": "https://files.pythonhosted.org/packages/0b/6b/ec057790b0040cd60182cca15f7239f02c593ee9f305b3dc32bf493f7ccc/tb_variant_filter-0.4.0.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": "GPLv3",
"summary": "This tool offers multiple options for filtering variants (in VCF files, relative to M. tuberculosis H37Rv).",
"version": "0.4.0",
"project_urls": {
"Homepage": "https://github.com/COMBAT-TB/tb_variant_filter"
},
"split_keywords": [
"bioinformatics",
"tuberculosis"
],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "f9d3c987006d028567504a6cdea642435d0bc176ddf42abc908f1335dcaa6939",
"md5": "8827d4e0de2c1c23baf50c661c670d11",
"sha256": "8f4ce1810b79ae89ddd23bdaab4a1e53062b6fd71d8f49b62b8a660b85ff6fca"
},
"downloads": -1,
"filename": "tb_variant_filter-0.4.0-py3-none-any.whl",
"has_sig": false,
"md5_digest": "8827d4e0de2c1c23baf50c661c670d11",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.7",
"size": 75401,
"upload_time": "2023-08-10T13:48:44",
"upload_time_iso_8601": "2023-08-10T13:48:44.575404Z",
"url": "https://files.pythonhosted.org/packages/f9/d3/c987006d028567504a6cdea642435d0bc176ddf42abc908f1335dcaa6939/tb_variant_filter-0.4.0-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "0b6bec057790b0040cd60182cca15f7239f02c593ee9f305b3dc32bf493f7ccc",
"md5": "bfb226ee71b83f25240e9b4367af5d70",
"sha256": "ff2b232be94ea4768587d805741f4eddb36642f86159e591c267473e31275b9f"
},
"downloads": -1,
"filename": "tb_variant_filter-0.4.0.tar.gz",
"has_sig": false,
"md5_digest": "bfb226ee71b83f25240e9b4367af5d70",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.7",
"size": 61597,
"upload_time": "2023-08-10T13:48:46",
"upload_time_iso_8601": "2023-08-10T13:48:46.890918Z",
"url": "https://files.pythonhosted.org/packages/0b/6b/ec057790b0040cd60182cca15f7239f02c593ee9f305b3dc32bf493f7ccc/tb_variant_filter-0.4.0.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2023-08-10 13:48:46",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "COMBAT-TB",
"github_project": "tb_variant_filter",
"travis_ci": false,
"coveralls": false,
"github_actions": true,
"tox": true,
"lcname": "tb-variant-filter"
}