koverage

Name	koverage
Version	0.1.11
home_page	https://github.com/beardymcjohnface/Koverage
Summary	Quickly get coverage statistics given reads and an assembly
upload_time	2024-02-08 02:54:51
author	Michael Roach
requires_python	>=3.10

![](koverage.png)


[![](https://img.shields.io/static/v1?label=CLI&message=Snaketool&color=blueviolet)](https://github.com/beardymcjohnface/Snaketool)
[![](https://img.shields.io/static/v1?label=Licence&message=MIT&color=black)](https://opensource.org/license/mit/)
[![](https://img.shields.io/static/v1?label=Install%20with&message=PIP&color=success)](https://pypi.org/project/koverage/)
[![install with bioconda](https://img.shields.io/badge/install%20with-bioconda-brightgreen.svg?style=flat)](http://bioconda.github.io/recipes/koverage/README.html)
![GitHub last commit (branch)](https://img.shields.io/github/last-commit/beardymcjohnface/Koverage/main)
[![](https://github.com/beardymcjohnface/Koverage/actions/workflows/py-app.yaml/badge.svg)](https://github.com/beardymcjohnface/Koverage/actions/workflows/py-app.yaml/)
[![Documentation Status](https://readthedocs.org/projects/koverage/badge/?version=latest)](https://koverage.readthedocs.io/en/latest/?badge=latest)
[![codecov](https://codecov.io/gh/beardymcjohnface/Koverage/branch/main/graph/badge.svg?token=17P2ZEL44U)](https://codecov.io/gh/beardymcjohnface/Koverage)

---

Quickly get coverage statistics given reads and an assembly.

# Motivation

While there are tools that will calculate read-coverage statistics, they do not scale particularly well for large 
datasets, large sample numbers, or large reference FASTAs.
Koverage is designed to place minimal burden on I/O and RAM to allow for maximum scalability.

# Install

Koverage is available on PyPI and Bioconda.

__Recommended: create a dedicated conda environment for the installation:__

```shell
conda create -n koverage python=3.11
conda activate koverage
```

__Install with PIP:__

```shell
pip install koverage
```

__Install with Bioconda:__

```shell
conda install -c bioconda koverage
```

__Test the installation:__

```shell
koverage test
```

__Developer install:__

```shell
git clone https://github.com/beardymcjohnface/Koverage.git
cd Koverage
pip install -e .
```

# Usage

Get coverage statistics from mapped reads (default method).

```shell
koverage run --reads readDir --ref assembly.fasta
```

Get coverage statistics using kmers (scales much better for very large reference FASTAs).

```shell
koverage run --reads readDir --ref assembly.fasta kmer
```

Any unrecognised arguments are passed on to Snakemake.
For example, run Koverage on an HPC using a Snakemake profile.

```shell
koverage run --reads readDir --ref assembly.fasta --profile mySlurmProfile
```

## Parsing samples with `--reads`

You can pass either a directory of reads or a TSV file to `--reads`.
Note that Koverage expects your read file names to include R1 or R2, e.g. `Tynes-BDA-rw-1_S14_L001_R1_001.fastq.gz` or `SRR7141305_R2.fastq.gz`.

 - __Directory:__ Koverage will infer sample names and \_R1/\_R2 pairs from the filenames.
 - __TSV file:__ Koverage expects 2 or 3 columns: column 1 is the sample name, and column 2 (plus column 3, if present) holds the read files.

[More information and examples are available here](https://gist.github.com/beardymcjohnface/bb161ba04ae1042299f48a4849e917c8#file-readme-md)
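
As a rough illustration of the directory case, the sketch below groups file names into samples by looking for an `_R1`/`_R2` token and writes the result in the 2 or 3 column TSV layout described above. The helper names (`pair_reads`, `to_tsv`) and the regular expression are assumptions made for this example; Koverage's own parsing rules may differ.

```python
import re
from collections import defaultdict
from pathlib import Path

# Assumption: the read token is "_R1" or "_R2" followed by "_" or ".",
# as in the example file names above. Koverage's real rules may differ.
READ_TOKEN = re.compile(r"_(R[12])(?=[_.])")


def pair_reads(filenames):
    """Group read files by inferred sample name: {sample: {"R1": f, "R2": f}}."""
    samples = defaultdict(dict)
    for name in sorted(filenames):
        base = Path(name).name
        match = READ_TOKEN.search(base)
        if match is None:
            continue  # no R1/R2 token, so the file is skipped
        sample = base[: match.start()]  # everything before the token
        samples[sample][match.group(1)] = name
    return dict(samples)


def to_tsv(samples):
    """Render samples as TSV text: sample name, R1 file, then R2 file if present."""
    lines = []
    for sample, reads in sorted(samples.items()):
        columns = [sample] + [reads[key] for key in ("R1", "R2") if key in reads]
        lines.append("\t".join(columns))
    return "\n".join(lines)
```

Writing the TSV yourself is mainly useful when file names do not follow the R1/R2 convention and the sample names need to be set by hand.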

# Test

You can test the methods with the inbuilt dataset like so.

```shell
# test default method
koverage test

# test all methods
koverage test map kmer coverm
```

# Coverage methods

## Mapping-based (default)

```shell
koverage run ...
# or 
koverage run ... map
```

This method will map reads using minimap2 and use the mapping coordinates to calculate coverage.
This method is suitable for most applications.
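
To make the idea of deriving coverage from mapping coordinates concrete, here is a minimal sketch that computes exact per-base depth for one contig from alignment intervals. It is not Koverage's implementation (Koverage reports fast estimates of these values); the function name `depth_stats` and the 0-based half-open coordinate convention are assumptions for this example.

```python
from statistics import mean, median, pvariance


def depth_stats(contig_length, alignments):
    """Exact depth statistics for one contig (contig_length must be > 0).

    alignments: iterable of (start, end) pairs, 0-based and half-open.
    """
    # Difference array: +1 where an alignment starts, -1 where it ends.
    diff = [0] * (contig_length + 1)
    for start, end in alignments:
        start = max(start, 0)
        end = min(end, contig_length)
        if start < end:
            diff[start] += 1
            diff[end] -= 1

    # A running sum over the difference array gives per-base depth.
    depths = []
    running = 0
    for delta in diff[:contig_length]:
        running += delta
        depths.append(running)

    return {
        "mean": mean(depths),
        "median": median(depths),
        "hitrate": sum(d > 0 for d in depths) / contig_length,
        "variance": pvariance(depths),
    }
```

Storing a full depth array per contig and sample is what becomes expensive at scale, which is why an estimate can be preferable for large inputs.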

## Kmer-based

```shell
koverage run ... kmer
```

This method builds [Jellyfish](https://github.com/gmarcais/Jellyfish) kmer-count databases from the sequencing reads.
It samples kmers from all reference contigs and queries them from the Jellyfish DBs to calculate coverage statistics.
This method is exceptionally fast for very large reference genomes.
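
The sample-and-query idea can be sketched in a few lines. In the example below a plain dictionary stands in for the Jellyfish database, and the kmer size, sampling step, and function names are all assumptions chosen for illustration; they do not reflect Koverage's actual defaults or code.

```python
from statistics import mean, median


def sample_kmers(sequence, k, step):
    """Yield the kmer of length k starting at every `step`-th position."""
    for i in range(0, len(sequence) - k + 1, step):
        yield sequence[i:i + k]


def kmer_coverage(sequence, kmer_counts, k=4, step=3):
    """Summarise depths of sampled kmers for one contig.

    kmer_counts: dict mapping kmer -> count in the reads
    (a stand-in for querying a Jellyfish database).
    """
    depths = [kmer_counts.get(kmer, 0) for kmer in sample_kmers(sequence, k, step)]
    if not depths:
        return None  # contig shorter than k
    return {
        "sum": sum(depths),
        "mean": mean(depths),
        "median": median(depths),
        "hitrate": sum(d > 0 for d in depths) / len(depths),
    }
```

Because only a sample of kmers per contig is queried, the work per contig stays small even when the reference is very large.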

## CoverM

```shell
koverage run ... coverm
```

We've included a wrapper for [CoverM](https://github.com/wwood/CoverM) which you may find useful.
The wrapper manually runs minimap2 and then invokes CoverM on the sorted BAM file. 
It then combines the output from all samples like the other methods.
If you have a large tmpfs, you'll probably find it faster to run CoverM directly on your reads.
__CoverM is not currently available for macOS.__

# Outputs

## Mapping-based

Default output files using fast estimations for mean, median, hitrate, and variance.

<details>
    <summary><b>sample_coverage.tsv</b></summary>
Per sample and per contig counts.

Column | description
--- | ---
Sample | Sample name derived from read file name
Contig | Contig ID from assembly FASTA
Count | Raw mapped read count
RPM | Reads per million
RPKM | Reads per kilobase per million mapped reads
RPK | Reads per kilobase
TPM | Transcripts per million
Mean | _Estimated_ mean read depth
Median | _Estimated_ median read depth
Hitrate | _Estimated_ fraction of contig with depth > 0
Variance | _Estimated_ read depth variance

</details>

<br>

<details>
    <summary><b>all_coverage.tsv</b></summary>
Per contig counts (all samples).

Column | description
--- | ---
Contig | Contig ID from assembly FASTA
Count | Raw mapped read count
RPM | Reads per million
RPKM | Reads per kilobase per million mapped reads
RPK | Reads per kilobase
TPM | Transcripts per million

</details>
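
The count-derived columns use the conventional normalisations, which can be written out as a short sketch. The function below applies the textbook definitions to raw counts and contig lengths; it assumes the total of the mapped counts is used as the library size, which may not match the exact denominator Koverage uses. The name `normalise_counts` is hypothetical.

```python
def normalise_counts(counts, lengths):
    """Compute RPM, RPKM, RPK and TPM per contig for one sample.

    counts:  {contig: raw mapped read count}
    lengths: {contig: contig length in bases}
    Assumption: library size is the sum of the mapped counts.
    """
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no mapped reads to normalise")

    # Reads per kilobase of contig.
    rpk = {contig: n / (lengths[contig] / 1000) for contig, n in counts.items()}
    rpk_per_million = sum(rpk.values()) / 1e6

    rows = {}
    for contig, n in counts.items():
        rpm = n / (total / 1e6)
        rows[contig] = {
            "Count": n,
            "RPM": rpm,
            "RPKM": rpm / (lengths[contig] / 1000),
            "RPK": rpk[contig],
            "TPM": rpk[contig] / rpk_per_million,
        }
    return rows
```

For example, two contigs with counts 100 and 300 and lengths 1000 and 2000 have RPK values of 100 and 150, so their TPM values are 400,000 and 600,000.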
    
## Kmer-based

Outputs for kmer-based coverage metrics.
Kmer outputs are gzipped as it is anticipated that this method will be used with very large reference FASTA files.

<details>
    <summary><b>sample_kmer_coverage.NNmer.tsv.gz</b></summary>
Per sample and contig kmer coverage.

Column | description
--- | ---
Sample | Sample name derived from read file name
Contig | Contig ID from assembly FASTA
Sum | Sum of sampled kmer depths
Mean | Mean sampled kmer depth
Median | Median sampled kmer depth
Hitrate | Fraction of kmers with depth > 0
Variance | Variance of lowest 95 % of sampled kmer depths

</details>

<br>

<details>
    <summary><b>all_kmer_coverage.NNmer.tsv.gz</b></summary>
Contig kmer coverage (all samples).

Column | description
--- | ---
Contig | Contig ID from assembly FASTA
Sum | Sum of sampled kmer depths
Mean | Mean sampled kmer depth
Median | Median sampled kmer depth

</details>
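
Since the kmer outputs are gzipped, they can be read directly without decompressing to disk. The sketch below uses only the Python standard library and assumes the file has a header row with the column names listed in the table above; the function name and the hitrate threshold are hypothetical.

```python
import csv
import gzip


def contigs_above_hitrate(path, min_hitrate=0.9):
    """Return (sample, contig) pairs whose Hitrate is at least min_hitrate.

    Assumption: the TSV starts with a header row containing the column
    names Sample, Contig and Hitrate.
    """
    hits = []
    # "rt" opens the gzip stream in text mode so csv can parse it line by line.
    with gzip.open(path, "rt", newline="") as handle:
        for row in csv.DictReader(handle, delimiter="\t"):
            if float(row["Hitrate"]) >= min_hitrate:
                hits.append((row["Sample"], row["Contig"]))
    return hits
```

Streaming the rows this way keeps memory use flat regardless of how many contigs the reference contains.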
