- Name: readcomb
- Version: 0.3.10
- Home page: https://github.com/ness-lab/readcomb
- Summary: Fast detection of recombinant reads in BAMs
- Upload time: 2022-05-19 01:47:03
- License: GNU GPLv3+

# `readcomb` - fast detection of recombinant reads in BAMs

[![PyPI version](https://badge.fury.io/py/readcomb.svg)](https://badge.fury.io/py/readcomb)

`readcomb` is a collection of command line and Python tools for fast detection
of recombination events in pooled high-throughput sequencing data. `readcomb`
searches for changes in parental haplotype phase across individual reads and classifies
recombination events based on various properties of the observed recombinant haplotypes.

`readcomb` was designed for use with the model alga _Chlamydomonas reinhardtii_ and
currently only supports haploids. Although the approach to detecting gene
conversion is tailored to _C. reinhardtii_, everything else in `readcomb`
generalizes to the detection of recombination events in any haploid species.

## Installation

```bash
pip install readcomb
```

## Dependencies
- [cyvcf2](https://brentp.github.io/cyvcf2/) - Fast retrieval and filtering of VCF files and VCF objects written in C
- [pysam](https://pysam.readthedocs.io/en/latest/api.html) - Interface for reading SAM and BAM files; provides SAM and BAM objects
- [pandas](https://pandas.pydata.org/) - Support for data tables
- [tqdm](https://tqdm.github.io/) - Provides updating progress bars for command line programs
- [samtools](http://www.htslib.org/) - Used for preprocessing of VCF and BAM files

# Usage:

## bamprep 

Command line preprocessing script for BAM files. `bamprep` will prepare an
index file, filter out unusable reads, and output a BAM _sorted by read name_.
`readcomb` requires BAMs sorted by read name for fast parsing and filtering.


```bash
readcomb-bamprep --bam [bam_filepath] --out [outdir]
```

Optional parameters:

- `--samtools` - Path to samtools binary
- `--threads [int]` - Number of threads samtools should use (default 1)
- `--index_csi` - Create CSI index instead of BAI
- `--no_progress` - Disable index creation; this speeds up `bamprep` but
  means no progress bars when filtering
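
When `bamprep` is driven from a Python pipeline, it can help to assemble the argument list programmatically. The following is a minimal sketch that uses only the flags listed above; the helper name `bamprep_command` and the file names are hypothetical and not part of `readcomb`.

```python
import shlex


def bamprep_command(bam, outdir, samtools=None, threads=None,
                    index_csi=False, no_progress=False):
    """Build an argument list for readcomb-bamprep (hypothetical helper)."""
    cmd = ["readcomb-bamprep", "--bam", bam, "--out", outdir]
    if samtools is not None:
        cmd += ["--samtools", samtools]      # path to the samtools binary
    if threads is not None:
        cmd += ["--threads", str(threads)]   # threads samtools should use
    if index_csi:
        cmd.append("--index_csi")            # CSI index instead of BAI
    if no_progress:
        cmd.append("--no_progress")          # skip index creation
    return cmd


# Placeholder paths, chosen only for illustration.
cmd = bamprep_command("sample.bam", "prepped/", threads=4, index_csi=True)
print(shlex.join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` once `readcomb` is installed.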

## vcfprep

Command line preprocessing script for VCF files

```bash
readcomb-vcfprep --vcf [vcf_filepath] --out [output_filepath]
```

Optional arguments:
- `--snps_only` - Keep only SNPs
- `--indels_only` - Keep only indels
- `--no_hets` - Remove heterozygote calls
- `--min_GQ [int]` - Minimum genotype quality at both sites (default 30)
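
To make the effect of these options concrete, here is a rough sketch of the kind of per-record decision they describe. It is not `vcfprep`'s implementation: the `Call` class and `keep_record` function are hypothetical, and the sketch assumes that "both sites" refers to the calls of the two parental samples and that any non-SNP variant counts as an indel.

```python
from dataclasses import dataclass


@dataclass
class Call:
    """Hypothetical stand-in for one sample's genotype call at a variant."""
    alleles: tuple  # called alleles, e.g. ("A", "A")
    gq: int         # genotype quality


def keep_record(ref, alt, calls, snps_only=False, indels_only=False,
                no_hets=False, min_gq=30):
    """Return True if a variant passes the (assumed) vcfprep-style filters."""
    is_snp = len(ref) == 1 and len(alt) == 1
    if snps_only and not is_snp:
        return False
    if indels_only and is_snp:  # simplification: treat every non-SNP as an indel
        return False
    if no_hets and any(len(set(c.alleles)) > 1 for c in calls):
        return False
    # Assumption: every sample's call must reach the GQ threshold.
    return all(c.gq >= min_gq for c in calls)


parents = [Call(("A", "A"), 40), Call(("G", "G"), 35)]
print(keep_record("A", "G", parents, snps_only=True, no_hets=True))
```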

## filter

Command line multiprocessing script for identifying BAM sequences with phase changes

```bash
readcomb-filter --bam [bam_filepath] --vcf [vcf_filepath]
```

Optional arguments:
- `-p, --processes [processes]` - Number of processes available for filter (default 4)
- `-m, --mode [phase_change|no_match]` - Filtering mode (default `phase_change`)
- `-l, --log [log_filepath]` - Filename for log metric output
- `-o, --out [output_filepath]` - File to write filtered output to (default `recomb_diagnosis`)
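
The idea behind the two modes can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not the logic `readcomb-filter` actually runs: the function names are hypothetical, each variant on a read is assumed to have already been assigned to a parental strain (or to `None` when it matches neither), and `no_match` is assumed to mean that at least one variant matches neither parent.

```python
def has_phase_change(parents):
    """True if consecutive informative variants switch parental haplotype."""
    informative = [p for p in parents if p is not None]
    return any(a != b for a, b in zip(informative, informative[1:]))


def has_no_match(parents):
    """True if any variant on the read matches neither parent (assumed meaning)."""
    return any(p is None for p in parents)


# Parental assignments along one read; strain names taken from the
# classification example in this README.
read = ["CC2936", "CC2936", "CC2935", "CC2936"]
print(has_phase_change(read), has_no_match(read))
```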

## classification

Python module for detailed classification of sequences containing phase changes

```python
>>> import readcomb.classification as rc
>>> from cyvcf2 import VCF

>>> bam_filepath = 'data/example_sequences.bam'
>>> vcf_filepath = 'data/example_variants.vcf.gz'
>>> pairs = rc.pairs_creation(bam_filepath, vcf_filepath)     # generate list of Pair objects
>>> cyvcf_object = VCF(vcf_filepath)                          # cyvcf2 file object

>>> print(pairs[0])
Record name: chromosome_1-199370 
Read1: chromosome_1:499417-499667 
Read2: chromosome_1:499766-500016 
VCF: data/example_variants.vcf.gz

>>> pairs[0].classify(cyvcf_object)                           # run classification algorithm
>>> print(pairs[0])
Record name: chromosome_1-199370 
Read1: chromosome_1:499417-499667 
Read2: chromosome_1:499766-500016 
VCF: data/example_variants.vcf.gz
Unmatched Variant(s): False 
Condensed: [['CC2936', 499417, 499626], ['CC2935', 499626, 499736], ['CC2936', 499736, 500016]] 
Call: gene_conversion 
Condensed Masked: [['CC2936', 499487, 499626], ['CC2935', 499626, 499736], ['CC2936', 499736, 499946]] 
Call Masked: gene_conversion 
```
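
The `Condensed` line in the output above lists haplotype segments as `[strain, start, end]`. As a small sketch of how such a list could be summarized, the snippet below copies the values from the printed output as a literal, since the attribute that exposes them on a `Pair` object is not shown here. The `summarize` helper is hypothetical, and treating `end - start` as the segment length is an assumption about the coordinate convention.

```python
# Values copied from the `Condensed` line of the example output.
condensed = [
    ["CC2936", 499417, 499626],
    ["CC2935", 499626, 499736],
    ["CC2936", 499736, 500016],
]


def summarize(segments):
    """Return haplotype order, segment lengths, and number of phase changes."""
    haplotypes = [seg[0] for seg in segments]
    lengths = [end - start for _, start, end in segments]
    phase_changes = sum(a != b for a, b in zip(haplotypes, haplotypes[1:]))
    return haplotypes, lengths, phase_changes


haplotypes, lengths, phase_changes = summarize(condensed)
print(haplotypes)      # ['CC2936', 'CC2935', 'CC2936']
print(lengths)         # [209, 110, 280]
print(phase_changes)   # 2
```

In this example the read pair switches haplotype twice and returns to the starting strain, which is the pattern that `classify` labels `gene_conversion` here; the actual classification rules live in `readcomb.classification`.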
## License

GNU General Public License v3 or later (GPLv3+)

## Development

Currently in alpha

[Source code](https://github.com/ness-lab/readcomb)

[Development repo](https://github.com/ness-lab/recombinant-reads)



            
