tadrep


Nametadrep JSON
Version 0.9.2 PyPI version JSON
download
home_pagehttps://github.com/oschwengers/tadrep
SummaryTaDRep: Targeted Detection and Reconstruction of Plasmids
upload_time2024-05-07 07:20:35
maintainerNone
docs_urlNone
authorOliver Schwengers
requires_python>=3.8
licenseGPLv3
keywords bioinformatics bacteria plasmids
VCS
bugtrack_url
requirements No requirements were recorded.
Travis-CI No Travis.
coveralls test coverage No coveralls.
            [![License: GPL v3](https://img.shields.io/badge/License-GPL%20v3-brightgreen.svg)](https://github.com/oschwengers/tadrep/blob/master/LICENSE)
![PyPI - Python Version](https://img.shields.io/pypi/pyversions/tadrep.svg)
![PyPI - Status](https://img.shields.io/pypi/status/tadrep.svg)
![GitHub release](https://img.shields.io/github/release/oschwengers/tadrep.svg)
[![PyPI](https://img.shields.io/pypi/v/tadrep.svg)](https://pypi.org/project/tadrep)
[![Conda](https://img.shields.io/conda/v/bioconda/tadrep.svg)](https://bioconda.github.io/recipes/tadrep/README.html)

# TaDReP: Targeted Detection and Reconstruction of Plasmids

TaDReP is a tool for the rapid targeted detection and reconstruction of plasmids within bacterial draft genomes.

- [Description](#description)
- [Installation](#installation)
- [Input & Output](#input-and-output)
- [Overview](#overview)
- [Usage](#usage)
  - [Setup](#setup)
  - [Database](#database)
  - [Extract](#extract)
  - [Characterize](#characterize)
  - [Cluster](#cluster)
  - [Detect](#detect)
  - [Visualize](#visualize)
- [Issues & Feature Requests](#issues)

## Description

TaDReP facilitates the rapid screening for target plasmids within single or cohorts of draft genomes.

It detects and reconstructs reference plasmids within bacterial draft assemblies via Blast+ alignments of draft genome contigs that are rigourously filtered for coverage and sequence identity thresholds. Finally, reference plasmids are detected and reconstructed if strict thresholds regarding plasmid-wise coverage and sequence identity are met.

## Installation

TaDRep can be installed via Conda and Pip. However, we encourage to use [Conda](https://conda.io/projects/conda/en/latest/user-guide/install/index.html) to automatically install all required 3rd party dependencies.

### Conda

```bash
conda install -c conda-forge -c bioconda tadrep
```

### Pip

```bash
$ python3 -m pip install --user tadrep
```

## Input and Output

### Input

TaDReP accepts bacterial draft genome assemblies in (zipped) fasta format. Complete reference plasmid sequences are either extracted from (semi-)closed genomes or plasmid sequence collections, or created from public plasmid databases (RefSeq / PLSDB). For further information how to extract plasmid sequences, please read the [extract](#extract) section below.

### Output

For each draft genome TaDReP writes a TSV summary file providing all detected reference plasmids and aligned genome contigs. For each reference plasmid that was detected in a draft assembly, ordered and rearranged contigs are exported as `N`-merged scaffolds, as well as mere contigs. Furthermore, for each reconstructed plasmid, the reference plasmid backbone and all contig alignments are visualized (PDF).

- `<genome>-summary.tsv`: detailed per contig alignment summary
- `<genome>-<plasmid>-contigs.fna`: ordered and rearranged contigs of the reconstructed plasmid
- `<genome>-<plasmid>-pseudo.fna`: pseudomolecule sequence of the reconstructed plasmid
- `<genome>-<plasmid>.pdf`: visualization of aligned contigs against the detected reference plasmid

If multiple genomes are provided, TaDReP also provides a presence/absence matrix of all detected plasmids as a cohort analyses, as well as a short summary of plasmids and which contigs were matched in each genome.

- `plasmids.info`: plasmid characterization summary
- `plasmids.tsv`: presence/absence table of detected plasmids
- `summary.tsv`: short summary of matched contigs through all genomes
- `tadrep.log`: log-file for debugging

## Overview

![TaDReP overview](./images/tadrep.png)

## Usage

TaDReP's workflow comprises seven steps implement in CLI submodules to ease semi-automated multi-step analyses.

```
usage: TaDReP [--help] [--verbose] [--threads THREADS] [--tmp-dir TMP_DIR] [--version] [--output OUTPUT] [--prefix PREFIX]  ...

Targeted Detection and Reconstruction of Plasmids

General:
  --help, -h            Show this help message and exit
  --verbose, -v         Print verbose information
  --threads THREADS, -t THREADS
                        Number of threads to use (default = number of available CPUs)
  --tmp-dir TMP_DIR     Temporary directory to store blast hits
  --version             show program's version number and exit

General Input / Output:
  --output OUTPUT, -o OUTPUT
                        Output directory (default = current working directory)
  --prefix PREFIX       Prefix for all output files (default = None)

Submodules:
  
    setup               Download and prepare inc-types
    database            Download and create database for TaDReP
    extract             Extract unique plasmid sequences
    characterize        Identify plasmids with GC content, Inc types, conjugation genes
    cluster             Cluster related plasmids
    detect              Detect and reconstruct plasmids in draft genomes
    visualize           Visualize plasmid coverage of contigs

Citation:
Schwengers et al. (2023)
TaDReP: Targeted Detection and Reconstruction of Plasmids.
GitHub https://github.com/oschwengers/tadrep
```

## Setup

The `setup` module downloads external databases, *e.g.* PlasmidFinders incompatibility groups that are required to characterize plasmids.

### Example

Verbosely download inc-types:

```bash
tadrep -v -o <output-path> setup
```

## Database

The `database` module downloads public plasmid databases (PLSDB / RefSeq) into a reference plasmid file. This creates a subdirectory in a user specified output directory.

If you downloaded a database, you can skip the extract step and start with the [characterization](#characterize).

```bash
usage: TaDReP database [-h] [--type {refseq,plsdb}] [--force]

options:
  -h, --help            show this help message and exit

Input / Output:
  --type {refseq,plsdb}
                        External DB to import (default = 'refseq')
  --force, -f           Force download and new setup of database
```

### Examples

Create refseq database:

```bash
tadrep -v -o <output-path> database --type refseq
```

Create PLSDB database:

```bash
tadrep -v -o <output-path> database --type plsdb
```

Overwrite existing refseq files with newly downloaded data.

```bash
tadrep -v -o <output-path> database --type refseq -f
```

## Extract

The `extract` module Extracts reference plasmid sequences from complete genomes, (semi-)draft genomes or plasmid files.

```bash
usage: TaDReP extract [-h] [--type {genome,plasmid,draft}] [--header HEADER] [--files FILES [FILES ...]] [--discard-longest DISCARD_LONGEST] [--max-length MAX_LENGTH]

options:
  -h, --help            show this help message and exit

Input:
  --type {genome,plasmid,draft}, -t {genome,plasmid,draft}
                        Type of input files
  --header HEADER       Template for header description inside input files: e.g.: header: ">pl1234" --> --header "pl"
  --files FILES [FILES ...], -f FILES [FILES ...]
                        File path
  --discard-longest DISCARD_LONGEST, -d DISCARD_LONGEST
                        Discard n longest sequences in output
  --max-length MAX_LENGTH, -m MAX_LENGTH
                        Max sequence length (default = 1000000 bp)
```

For different input types specified via `--type`:

- `genome`: extract all but the longest sequence. This can be adjusted via `--discard-longest`.
- `draft`: extracts only sequences with specific headers. Headers can be specified via `--header`.
- `plasmid`: extracts all sequences from a given file without any filtering.

If you extracted references, you can skip the database step and start with the [characterization](#characterize).

### Examples

Extract all sequences from file `plasmids.fna` ignoring the two longest:

```bash
tadrep -v --type genome --discard-longest 2 --files plasmids.fna
```

Extract all sequences from file `plasmids.fna` where `header` contains `pl`:

```bash
tadrep -v --type draft --header "pl" --files plasmids.fna
```

Extract all potential plasmid sequences (one of 'plasmid', 'complete', 'circular=true' in header) from file `plasmids.fna` ignoring sequences longer than 500000 bp:

```bash
tadrep -v --type draft --max-length 500000 --files plasmids.fna
```

Extract all sequences from file `plasmids.fna`:

```bash
tadrep -v --type plasmid --files plasmids.fna
```

## Characterize

The `characterize` module characterizes all reference plasmids by the following features:

- Length
- GC content
- Incompatibility types
- Number of coding sequences

If you downloaded a reference database this is the step to start with.

```bash
usage: TaDReP characterize [-h] [--db DATABASE] [--inc-types INC_TYPES]

optional arguments:
  -h, --help            show this help message and exit

Input:
  --db DATABASE         Import json file from a given database path into working directory
  --inc-types INC_TYPES
                        Import inc-types from given path into working directory
```

### Examples

Characterize plasmids in working directory `<output-path>` and import inc-types from `inc-types` folder:

```bash
tadrep -v -o <output-path> characterize --inc-types inc-types/inc-types.fasta
```

If inc-types is already present inside the working directory, the parameter `--inc-types` can be omitted:

```bash
tadrep -v -o <output-path> characterize
```

If you downloaded a database you can import it into the working directory `<output-path>` with the `--db` parameter:

```bash
tadrep -v -o <output-path> characterize --db databases/plsdb/plsdb.json --inc-types inc-types/inc-types.fasta
```

## Cluster

The `cluster` module groups plasmids with similar sequences and features.

```bash
usage: TaDReP cluster [-h] [--min-sequence-identity [1-100]] [--max-sequence-length-difference [1-1000000]] [--skip]

options:
  -h, --help            show this help message and exit

Parameter:
  --min-sequence-identity [1-100]
                        Minimal plasmid sequence identity (default = 90%)
  --max-sequence-length-difference [1-1000000]
                        Maximal plasmid sequence length difference in basepairs (default = 1000)
  --skip, -s            Skips clustering, one group for each plasmid
```

### Example

```bash
tadrep -v cluster
```

## Detect

The `detect` module aligns contigs of bacterial draft genomes to reference plasmids using BLAST+. Each match is evaluated by coverage and sequence identity of the aligned plasmid section and can be individualy adjusted by using `--min-plasmid-identity` and `--min-plasmid-coverage`. If various contigs match a plasmid and the combined coverage and identity exceed a certain threshold, the combination of aligned contigs is saved.

Each detected plasmid is reconstructed as a pseudo sequence, where matching contigs are linked by a sequence of `N`. Information on detected & reconstructed plasmids and in which draft genomes they were found in provided in a summary and a presence-absence table.

```bash
usage: TaDReP detect [-h] [--genome GENOME [GENOME ...]] [--min-contig-coverage [1-100]] [--min-contig-identity [1-100]] [--min-plasmid-coverage [1-100]] [--min-plasmid-identity [1-100]]
                     [--gap-sequence-length GAP_SEQUENCE_LENGTH]

optional arguments:
  -h, --help            show this help message and exit

Input / Output:
  --genome GENOME [GENOME ...], -g GENOME [GENOME ...]
                        Draft genome path

Annotation:
  --min-contig-coverage [1-100]
                        Minimal contig coverage (default = 90%)
  --min-contig-identity [1-100]
                        Maximal contig identity (default = 90%)
  --min-plasmid-coverage [1-100]
                        Minimal plasmid coverage (default = 80%)
  --min-plasmid-identity [1-100]
                        Minimal plasmid identity (default = 90%)
  --gap-sequence-length GAP_SEQUENCE_LENGTH
                        Gap sequence N length (default = 10)
```

### Examples

Detect reference plasmids from directory `<output-path>` in file `draft.fna` with default settings:

```bash
tadrep -v -o <output-path> detect --genome draft.fna
```

Detect reference plasmids from directory `<output-path>` in file `draft.fna`;

`75%` of `contig length` has to be covered by a match;

 `Combined contig matches` have to cover at least `95%` of `reference plasmid` length:

```bash
tadrep -v -o <output-path> detect --genome draft.fna --min-contig-coverage 75 --min-plasmid-coverage 95
```

Detect reference plasmids from directory `<output-path>` in file `draft.fna`;

`Contig sequence` of a match has to be at least `80%` identical to reference plasmid;

`Combined contig matches` have to sum up to at least `95%` identity of `reference plasmid` sequence:

```bash
tadrep -v -o <output-path> detect --genome draft.fna --min-contig-identity 80 --min-plasmid-identity 95
```

Note: `--min-contig-coverage` / `--min-plasmid-identity` and `--min-contig-identity` / `--min-plasmid-coverage` can be combined as well.

## Visualize

The `visualize` module visualizes matching contigs from draft genomes for each detected plasmid.

By default, contigs are represented by boxes, either on top or bottom of the plasmid center line. The position of the boxes represents a match on either forward or backward strand respectively. A colour gradient is used to indicate the identity between contig and plasmid section, a brighter colorization implies smaller sequence identity. The start of this gradient, where it is the brightest, can be individually set with the `--interval-start` parameter.

```bash
usage: TaDReP visualize [-h] [--plotstyle {bigarrow,arrow,bigbox,box,bigrbox,rbox}] [--labelcolor LABELCOLOR] [--linewidth LINEWIDTH] [--arrow-shaft-ratio ARROW_SHAFT_RATIO] [--size-ratio SIZE_RATIO]
                        [--labelsize LABELSIZE] [--labelrotation LABELROTATION] [--labelhpos {left,center,right}] [--labelha {left,center,right}] [--interval-start [0-100]] [--number-of-intervals [1-100]]
                        [--omit_ratio [0-100]]

optional arguments:
  -h, --help            show this help message and exit

Style:
  --plotstyle {bigarrow,arrow,bigbox,box,bigrbox,rbox}
                        Contig representation in plot
  --labelcolor LABELCOLOR
                        Contig label color
  --linewidth LINEWIDTH
                        Contig edge linewidth
  --arrow-shaft-ratio ARROW_SHAFT_RATIO
                        Size ratio between arrow head and shaft
  --size-ratio SIZE_RATIO
                        Contig size ratio to track

Label:
  --labelsize LABELSIZE
                        Contig label size
  --labelrotation LABELROTATION
                        Contig label rotation
  --labelhpos {left,center,right}
                        Contig label horizontal position
  --labelha {left,center,right}
                        Contig label horizontal alignment

Gradient:
  --interval-start [0-100]
                        Percentage where gradient should stop
  --number-of-intervals [1-100]
                        Number of gradient intervals

Omit:
  --omit_ratio [0-100]  Omit contigs shorter than X percent of plasmid length from plot
```

### Examples

Visualize results from detection in directory `<output-path>` with default settings:

```bash
tadrep -v -o <output-path> visualize
```

Visualize results from detection in directory `sho<output-path>wcase`;

`Brightest colour` of gradient starts at `95.5%` sequence identity (darkest colour is always 100% identity);

`Surround` contig blocks with `1px` line:

```bash
tadrep -v -o <output-path> visualize --interval-start 95.5 --linewidth 1
```

## Issues & Feature Requests

TaDReP is brand new and like in every software, expect some bugs lurking around. So, if you run into any issues with TaDReP, we'd be happy to hear about it.
Therefore, please, execute it in verbose mode (`-v`) and do not hesitate to file an issue including as much information as possible:

- a detailed description of the issue
- command line output
- log file (`tadrep.log`)
- a reproducible example of the issue with an input file that you can share _if possible_

            

Raw data

            {
    "_id": null,
    "home_page": "https://github.com/oschwengers/tadrep",
    "name": "tadrep",
    "maintainer": null,
    "docs_url": null,
    "requires_python": ">=3.8",
    "maintainer_email": null,
    "keywords": "bioinformatics, bacteria, plasmids",
    "author": "Oliver Schwengers",
    "author_email": "oliver.schwengers@computational.bio.uni-giessen.de",
    "download_url": "https://files.pythonhosted.org/packages/d9/a1/41ed7c3e84e4880e1f891082f6a57be522c69e537dc779eebaf0d122d1fc/tadrep-0.9.2.tar.gz",
    "platform": null,
    "description": "[![License: GPL v3](https://img.shields.io/badge/License-GPL%20v3-brightgreen.svg)](https://github.com/oschwengers/tadrep/blob/master/LICENSE)\n![PyPI - Python Version](https://img.shields.io/pypi/pyversions/tadrep.svg)\n![PyPI - Status](https://img.shields.io/pypi/status/tadrep.svg)\n![GitHub release](https://img.shields.io/github/release/oschwengers/tadrep.svg)\n[![PyPI](https://img.shields.io/pypi/v/tadrep.svg)](https://pypi.org/project/tadrep)\n[![Conda](https://img.shields.io/conda/v/bioconda/tadrep.svg)](https://bioconda.github.io/recipes/tadrep/README.html)\n\n# TaDReP: Targeted Detection and Reconstruction of Plasmids\n\nTaDReP is a tool for the rapid targeted detection and reconstruction of plasmids within bacterial draft genomes.\n\n- [Description](#description)\n- [Installation](#installation)\n- [Input & Output](#input-and-output)\n- [Overview](#overview)\n- [Usage](#usage)\n  - [Setup](#setup)\n  - [Database](#database)\n  - [Extract](#extract)\n  - [Characterize](#characterize)\n  - [Cluster](#cluster)\n  - [Detect](#detect)\n  - [Visualize](#visualize)\n- [Issues & Feature Requests](#issues)\n\n## Description\n\nTaDReP facilitates the rapid screening for target plasmids within single or cohorts of draft genomes.\n\nIt detects and reconstructs reference plasmids within bacterial draft assemblies via Blast+ alignments of draft genome contigs that are rigourously filtered for coverage and sequence identity thresholds. Finally, reference plasmids are detected and reconstructed if strict thresholds regarding plasmid-wise coverage and sequence identity are met.\n\n## Installation\n\nTaDRep can be installed via Conda and Pip. However, we encourage to use [Conda](https://conda.io/projects/conda/en/latest/user-guide/install/index.html) to automatically install all required 3rd party dependencies.\n\n### Conda\n\n```bash\nconda install -c conda-forge -c bioconda tadrep\n```\n\n### Pip\n\n```bash\n$ python3 -m pip install --user tadrep\n```\n\n## Input and Output\n\n### Input\n\nTaDReP accepts bacterial draft genome assemblies in (zipped) fasta format. Complete reference plasmid sequences are either extracted from (semi-)closed genomes or plasmid sequence collections, or created from public plasmid databases (RefSeq / PLSDB). For further information how to extract plasmid sequences, please read the [extract](#extract) section below.\n\n### Output\n\nFor each draft genome TaDReP writes a TSV summary file providing all detected reference plasmids and aligned genome contigs. For each reference plasmid that was detected in a draft assembly, ordered and rearranged contigs are exported as `N`-merged scaffolds, as well as mere contigs. Furthermore, for each reconstructed plasmid, the reference plasmid backbone and all contig alignments are visualized (PDF).\n\n- `<genome>-summary.tsv`: detailed per contig alignment summary\n- `<genome>-<plasmid>-contigs.fna`: ordered and rearranged contigs of the reconstructed plasmid\n- `<genome>-<plasmid>-pseudo.fna`: pseudomolecule sequence of the reconstructed plasmid\n- `<genome>-<plasmid>.pdf`: visualization of aligned contigs against the detected reference plasmid\n\nIf multiple genomes are provided, TaDReP also provides a presence/absence matrix of all detected plasmids as a cohort analyses, as well as a short summary of plasmids and which contigs were matched in each genome.\n\n- `plasmids.info`: plasmid characterization summary\n- `plasmids.tsv`: presence/absence table of detected plasmids\n- `summary.tsv`: short summary of matched contigs through all genomes\n- `tadrep.log`: log-file for debugging\n\n## Overview\n\n![TaDReP overview](./images/tadrep.png)\n\n## Usage\n\nTaDReP's workflow comprises seven steps implement in CLI submodules to ease semi-automated multi-step analyses.\n\n```\nusage: TaDReP [--help] [--verbose] [--threads THREADS] [--tmp-dir TMP_DIR] [--version] [--output OUTPUT] [--prefix PREFIX]  ...\n\nTargeted Detection and Reconstruction of Plasmids\n\nGeneral:\n  --help, -h            Show this help message and exit\n  --verbose, -v         Print verbose information\n  --threads THREADS, -t THREADS\n                        Number of threads to use (default = number of available CPUs)\n  --tmp-dir TMP_DIR     Temporary directory to store blast hits\n  --version             show program's version number and exit\n\nGeneral Input / Output:\n  --output OUTPUT, -o OUTPUT\n                        Output directory (default = current working directory)\n  --prefix PREFIX       Prefix for all output files (default = None)\n\nSubmodules:\n  \n    setup               Download and prepare inc-types\n    database            Download and create database for TaDReP\n    extract             Extract unique plasmid sequences\n    characterize        Identify plasmids with GC content, Inc types, conjugation genes\n    cluster             Cluster related plasmids\n    detect              Detect and reconstruct plasmids in draft genomes\n    visualize           Visualize plasmid coverage of contigs\n\nCitation:\nSchwengers et al. (2023)\nTaDReP: Targeted Detection and Reconstruction of Plasmids.\nGitHub https://github.com/oschwengers/tadrep\n```\n\n## Setup\n\nThe `setup` module downloads external databases, *e.g.* PlasmidFinders incompatibility groups that are required to characterize plasmids.\n\n### Example\n\nVerbosely download inc-types:\n\n```bash\ntadrep -v -o <output-path> setup\n```\n\n## Database\n\nThe `database` module downloads public plasmid databases (PLSDB / RefSeq) into a reference plasmid file. This creates a subdirectory in a user specified output directory.\n\nIf you downloaded a database, you can skip the extract step and start with the [characterization](#characterize).\n\n```bash\nusage: TaDReP database [-h] [--type {refseq,plsdb}] [--force]\n\noptions:\n  -h, --help            show this help message and exit\n\nInput / Output:\n  --type {refseq,plsdb}\n                        External DB to import (default = 'refseq')\n  --force, -f           Force download and new setup of database\n```\n\n### Examples\n\nCreate refseq database:\n\n```bash\ntadrep -v -o <output-path> database --type refseq\n```\n\nCreate PLSDB database:\n\n```bash\ntadrep -v -o <output-path> database --type plsdb\n```\n\nOverwrite existing refseq files with newly downloaded data.\n\n```bash\ntadrep -v -o <output-path> database --type refseq -f\n```\n\n## Extract\n\nThe `extract` module Extracts reference plasmid sequences from complete genomes, (semi-)draft genomes or plasmid files.\n\n```bash\nusage: TaDReP extract [-h] [--type {genome,plasmid,draft}] [--header HEADER] [--files FILES [FILES ...]] [--discard-longest DISCARD_LONGEST] [--max-length MAX_LENGTH]\n\noptions:\n  -h, --help            show this help message and exit\n\nInput:\n  --type {genome,plasmid,draft}, -t {genome,plasmid,draft}\n                        Type of input files\n  --header HEADER       Template for header description inside input files: e.g.: header: \">pl1234\" --> --header \"pl\"\n  --files FILES [FILES ...], -f FILES [FILES ...]\n                        File path\n  --discard-longest DISCARD_LONGEST, -d DISCARD_LONGEST\n                        Discard n longest sequences in output\n  --max-length MAX_LENGTH, -m MAX_LENGTH\n                        Max sequence length (default = 1000000 bp)\n```\n\nFor different input types specified via `--type`:\n\n- `genome`: extract all but the longest sequence. This can be adjusted via `--discard-longest`.\n- `draft`: extracts only sequences with specific headers. Headers can be specified via `--header`.\n- `plasmid`: extracts all sequences from a given file without any filtering.\n\nIf you extracted references, you can skip the database step and start with the [characterization](#characterize).\n\n### Examples\n\nExtract all sequences from file `plasmids.fna` ignoring the two longest:\n\n```bash\ntadrep -v --type genome --discard-longest 2 --files plasmids.fna\n```\n\nExtract all sequences from file `plasmids.fna` where `header` contains `pl`:\n\n```bash\ntadrep -v --type draft --header \"pl\" --files plasmids.fna\n```\n\nExtract all potential plasmid sequences (one of 'plasmid', 'complete', 'circular=true' in header) from file `plasmids.fna` ignoring sequences longer than 500000 bp:\n\n```bash\ntadrep -v --type draft --max-length 500000 --files plasmids.fna\n```\n\nExtract all sequences from file `plasmids.fna`:\n\n```bash\ntadrep -v --type plasmid --files plasmids.fna\n```\n\n## Characterize\n\nThe `characterize` module characterizes all reference plasmids by the following features:\n\n- Length\n- GC content\n- Incompatibility types\n- Number of coding sequences\n\nIf you downloaded a reference database this is the step to start with.\n\n```bash\nusage: TaDReP characterize [-h] [--db DATABASE] [--inc-types INC_TYPES]\n\noptional arguments:\n  -h, --help            show this help message and exit\n\nInput:\n  --db DATABASE         Import json file from a given database path into working directory\n  --inc-types INC_TYPES\n                        Import inc-types from given path into working directory\n```\n\n### Examples\n\nCharacterize plasmids in working directory `<output-path>` and import inc-types from `inc-types` folder:\n\n```bash\ntadrep -v -o <output-path> characterize --inc-types inc-types/inc-types.fasta\n```\n\nIf inc-types is already present inside the working directory, the parameter `--inc-types` can be omitted:\n\n```bash\ntadrep -v -o <output-path> characterize\n```\n\nIf you downloaded a database you can import it into the working directory `<output-path>` with the `--db` parameter:\n\n```bash\ntadrep -v -o <output-path> characterize --db databases/plsdb/plsdb.json --inc-types inc-types/inc-types.fasta\n```\n\n## Cluster\n\nThe `cluster` module groups plasmids with similar sequences and features.\n\n```bash\nusage: TaDReP cluster [-h] [--min-sequence-identity [1-100]] [--max-sequence-length-difference [1-1000000]] [--skip]\n\noptions:\n  -h, --help            show this help message and exit\n\nParameter:\n  --min-sequence-identity [1-100]\n                        Minimal plasmid sequence identity (default = 90%)\n  --max-sequence-length-difference [1-1000000]\n                        Maximal plasmid sequence length difference in basepairs (default = 1000)\n  --skip, -s            Skips clustering, one group for each plasmid\n```\n\n### Example\n\n```bash\ntadrep -v cluster\n```\n\n## Detect\n\nThe `detect` module aligns contigs of bacterial draft genomes to reference plasmids using BLAST+. Each match is evaluated by coverage and sequence identity of the aligned plasmid section and can be individualy adjusted by using `--min-plasmid-identity` and `--min-plasmid-coverage`. If various contigs match a plasmid and the combined coverage and identity exceed a certain threshold, the combination of aligned contigs is saved.\n\nEach detected plasmid is reconstructed as a pseudo sequence, where matching contigs are linked by a sequence of `N`. Information on detected & reconstructed plasmids and in which draft genomes they were found in provided in a summary and a presence-absence table.\n\n```bash\nusage: TaDReP detect [-h] [--genome GENOME [GENOME ...]] [--min-contig-coverage [1-100]] [--min-contig-identity [1-100]] [--min-plasmid-coverage [1-100]] [--min-plasmid-identity [1-100]]\n                     [--gap-sequence-length GAP_SEQUENCE_LENGTH]\n\noptional arguments:\n  -h, --help            show this help message and exit\n\nInput / Output:\n  --genome GENOME [GENOME ...], -g GENOME [GENOME ...]\n                        Draft genome path\n\nAnnotation:\n  --min-contig-coverage [1-100]\n                        Minimal contig coverage (default = 90%)\n  --min-contig-identity [1-100]\n                        Maximal contig identity (default = 90%)\n  --min-plasmid-coverage [1-100]\n                        Minimal plasmid coverage (default = 80%)\n  --min-plasmid-identity [1-100]\n                        Minimal plasmid identity (default = 90%)\n  --gap-sequence-length GAP_SEQUENCE_LENGTH\n                        Gap sequence N length (default = 10)\n```\n\n### Examples\n\nDetect reference plasmids from directory `<output-path>` in file `draft.fna` with default settings:\n\n```bash\ntadrep -v -o <output-path> detect --genome draft.fna\n```\n\nDetect reference plasmids from directory `<output-path>` in file `draft.fna`;\n\n`75%` of `contig length` has to be covered by a match;\n\n `Combined contig matches` have to cover at least `95%` of `reference plasmid` length:\n\n```bash\ntadrep -v -o <output-path> detect --genome draft.fna --min-contig-coverage 75 --min-plasmid-coverage 95\n```\n\nDetect reference plasmids from directory `<output-path>` in file `draft.fna`;\n\n`Contig sequence` of a match has to be at least `80%` identical to reference plasmid;\n\n`Combined contig matches` have to sum up to at least `95%` identity of `reference plasmid` sequence:\n\n```bash\ntadrep -v -o <output-path> detect --genome draft.fna --min-contig-identity 80 --min-plasmid-identity 95\n```\n\nNote: `--min-contig-coverage` / `--min-plasmid-identity` and `--min-contig-identity` / `--min-plasmid-coverage` can be combined as well.\n\n## Visualize\n\nThe `visualize` module visualizes matching contigs from draft genomes for each detected plasmid.\n\nBy default, contigs are represented by boxes, either on top or bottom of the plasmid center line. The position of the boxes represents a match on either forward or backward strand respectively. A colour gradient is used to indicate the identity between contig and plasmid section, a brighter colorization implies smaller sequence identity. The start of this gradient, where it is the brightest, can be individually set with the `--interval-start` parameter.\n\n```bash\nusage: TaDReP visualize [-h] [--plotstyle {bigarrow,arrow,bigbox,box,bigrbox,rbox}] [--labelcolor LABELCOLOR] [--linewidth LINEWIDTH] [--arrow-shaft-ratio ARROW_SHAFT_RATIO] [--size-ratio SIZE_RATIO]\n                        [--labelsize LABELSIZE] [--labelrotation LABELROTATION] [--labelhpos {left,center,right}] [--labelha {left,center,right}] [--interval-start [0-100]] [--number-of-intervals [1-100]]\n                        [--omit_ratio [0-100]]\n\noptional arguments:\n  -h, --help            show this help message and exit\n\nStyle:\n  --plotstyle {bigarrow,arrow,bigbox,box,bigrbox,rbox}\n                        Contig representation in plot\n  --labelcolor LABELCOLOR\n                        Contig label color\n  --linewidth LINEWIDTH\n                        Contig edge linewidth\n  --arrow-shaft-ratio ARROW_SHAFT_RATIO\n                        Size ratio between arrow head and shaft\n  --size-ratio SIZE_RATIO\n                        Contig size ratio to track\n\nLabel:\n  --labelsize LABELSIZE\n                        Contig label size\n  --labelrotation LABELROTATION\n                        Contig label rotation\n  --labelhpos {left,center,right}\n                        Contig label horizontal position\n  --labelha {left,center,right}\n                        Contig label horizontal alignment\n\nGradient:\n  --interval-start [0-100]\n                        Percentage where gradient should stop\n  --number-of-intervals [1-100]\n                        Number of gradient intervals\n\nOmit:\n  --omit_ratio [0-100]  Omit contigs shorter than X percent of plasmid length from plot\n```\n\n### Examples\n\nVisualize results from detection in directory `<output-path>` with default settings:\n\n```bash\ntadrep -v -o <output-path> visualize\n```\n\nVisualize results from detection in directory `sho<output-path>wcase`;\n\n`Brightest colour` of gradient starts at `95.5%` sequence identity (darkest colour is always 100% identity);\n\n`Surround` contig blocks with `1px` line:\n\n```bash\ntadrep -v -o <output-path> visualize --interval-start 95.5 --linewidth 1\n```\n\n## Issues & Feature Requests\n\nTaDReP is brand new and like in every software, expect some bugs lurking around. So, if you run into any issues with TaDReP, we'd be happy to hear about it.\nTherefore, please, execute it in verbose mode (`-v`) and do not hesitate to file an issue including as much information as possible:\n\n- a detailed description of the issue\n- command line output\n- log file (`tadrep.log`)\n- a reproducible example of the issue with an input file that you can share _if possible_\n",
    "bugtrack_url": null,
    "license": "GPLv3",
    "summary": "TaDRep: Targeted Detection and Reconstruction of Plasmids",
    "version": "0.9.2",
    "project_urls": {
        "Bug Reports": "https://github.com/oschwengers/tadrep/issues",
        "CI": "https://github.com/oschwengers/tadrep/actions",
        "Documentation": "https://github.com/oschwengers/tadrep/blob/main/README.md",
        "Homepage": "https://github.com/oschwengers/tadrep",
        "Source": "https://github.com/oschwengers/tadrep"
    },
    "split_keywords": [
        "bioinformatics",
        " bacteria",
        " plasmids"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "ac89a53534c70275fe55be1a96962b15bbe9734049e422262c326b6fe7d955ae",
                "md5": "a732e3ef0186de07213edf8c6b8e686c",
                "sha256": "c8fa645c7737591e8b54ef27464c7e1ebaa6177af617f14797f488707605fc83"
            },
            "downloads": -1,
            "filename": "tadrep-0.9.2-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "a732e3ef0186de07213edf8c6b8e686c",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.8",
            "size": 40919,
            "upload_time": "2024-05-07T07:20:33",
            "upload_time_iso_8601": "2024-05-07T07:20:33.877513Z",
            "url": "https://files.pythonhosted.org/packages/ac/89/a53534c70275fe55be1a96962b15bbe9734049e422262c326b6fe7d955ae/tadrep-0.9.2-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "d9a141ed7c3e84e4880e1f891082f6a57be522c69e537dc779eebaf0d122d1fc",
                "md5": "e7d2a8e18e264486401d703658ce40a4",
                "sha256": "11dc23e820d28159d3513a77d983f396d05c9122ba1aaf403ef2829a9246dc4e"
            },
            "downloads": -1,
            "filename": "tadrep-0.9.2.tar.gz",
            "has_sig": false,
            "md5_digest": "e7d2a8e18e264486401d703658ce40a4",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.8",
            "size": 42652,
            "upload_time": "2024-05-07T07:20:35",
            "upload_time_iso_8601": "2024-05-07T07:20:35.146336Z",
            "url": "https://files.pythonhosted.org/packages/d9/a1/41ed7c3e84e4880e1f891082f6a57be522c69e537dc779eebaf0d122d1fc/tadrep-0.9.2.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2024-05-07 07:20:35",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "oschwengers",
    "github_project": "tadrep",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": true,
    "lcname": "tadrep"
}
        
Elapsed time: 0.24014s