deepac

| Field | Value |
|---|---|
| Name | deepac |
| Version | 0.14.1 |
| Home page | https://gitlab.com/dacs-hpi/deepac |
| Summary | Predicting pathogenic potentials of novel DNA with reverse-complement neural networks. |
| Upload time | 2022-12-16 17:39:35 |
| Author | Jakub Bartoszewicz |
| Requires Python | >=3 |
| License | MIT |
| Keywords | deep learning dna sequencing synthetic biology pathogenicity prediction |

            <!-- {#mainpage} -->

# DeePaC

DeePaC is a Python package and a CLI tool for predicting labels (e.g. pathogenic potentials) from short DNA sequences (e.g. Illumina 
reads) with interpretable reverse-complement neural networks. For details, see our preprint on bioRxiv: 
<https://www.biorxiv.org/content/10.1101/535286v3> and the paper in *Bioinformatics*: <https://doi.org/10.1093/bioinformatics/btz541>.
For details regarding the interpretability functionalities of DeePaC, see the preprint here: <https://www.biorxiv.org/content/10.1101/2020.01.29.925354v2>.
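
"Reverse-complement" refers to the fact that a sequenced read may originate from either DNA strand, so a read and its reverse complement describe the same piece of genome and should receive the same label. The sketch below only illustrates what a reverse complement is; `reverse_complement` is a hypothetical helper written for this example and is not part of the DeePaC API.

```python
# Illustrative helper, not part of DeePaC: compute the reverse complement of a read.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Complement every base (A<->T, C<->G, N stays N) and reverse the result."""
    return seq.translate(_COMPLEMENT)[::-1]


read = "ATGCGTNA"
rc = reverse_complement(read)
print(rc)                       # TNACGCAT
print(reverse_complement(rc))   # ATGCGTNA (applying it twice restores the read)
```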

Documentation can be found here:
<https://rki_bioinformatics.gitlab.io/DeePaC/>. 
See also the main repo here: <https://gitlab.com/rki_bioinformatics/DeePaC>.

## Plug-ins
### DeePaC-strain
The basic version of DeePaC comes with built-in models trained to predict pathogenic potentials of NGS reads originating from
novel *bacterial species*. If you want to predict pathogenicity of novel *strains* of *known* species, try the DeePaC-strain plugin available here:
<https://gitlab.com/dacs-hpi/DeePaC-strain>. 

### DeePaC-vir
If you want to detect novel human viruses, try the DeePaC-vir plugin: <https://gitlab.com/dacs-hpi/DeePaC-vir>. 

### DeePaC-Live
If you want to run the predictions in real-time during an Illumina sequencing run, try DeePaC-Live: <https://gitlab.com/dacs-hpi/deepac-live>. 


## Installation

We recommend using Bioconda (based on the `conda` package manager) or custom Docker images based on official TensorFlow images.
Alternatively, a `pip` installation is possible as well. For installation on IBM Power Systems (e.g. AC922), see the separate [installation instructions (experimental)](https://gitlab.com/rki_bioinformatics/DeePaC/-/blob/master/dockerfiles/ppc64le/README.md).

### With Bioconda (recommended)
 [![install with bioconda](https://img.shields.io/badge/install%20with-bioconda-brightgreen.svg?style=flat)](http://bioconda.github.io/recipes/deepac/README.html)

You can install DeePaC with `bioconda`. Set up the [bioconda channel](https://bioconda.github.io/user/install.html#set-up-channels) first (channel ordering is important):

```
conda config --add channels defaults
conda config --add channels bioconda
conda config --add channels conda-forge
```

We recommend setting up an isolated `conda` environment:
```
# python 3.7-3.9 are supported
conda create -n my_env python=3.9
conda activate my_env
```

and then:
```
# For GPU support (recommended) - install tensorflow-gpu from the defaults channel
conda install -c defaults tensorflow-gpu
conda install deepac
# Or: basic installation (CPU-only)
conda install deepac
```

Optional: download and compile the latest deepac-live custom models [![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.4456008.svg)](https://doi.org/10.5281/zenodo.4456008):
```
deepac getmodels --fetch
```

If you want to install the plugins as well, use:

```
conda install deepacvir deepacstrain
```

### With Docker (also recommended)

Requirements: 
* Install [Docker](https://docs.docker.com/get-docker/) on your host machine. 
* For GPU support, install the [NVIDIA Docker support](https://github.com/NVIDIA/nvidia-docker) as well.

See the [TensorFlow Docker installation guide](https://www.tensorflow.org/install/docker) and the 
[NVIDIA Docker support installation guide](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/install-guide.html#docker) 
for details. The guide below assumes you have Docker 19.03 or above.

You can then pull the desired image:
```
# Basic installation - CPU only
docker pull dacshpi/deepac:0.13.5

# For GPU support
docker pull dacshpi/deepac:0.13.5-gpu
```

And run it:
```
# Basic installation - CPU only
docker run -v $(pwd):/deepac -u $(id -u):$(id -g) --rm dacshpi/deepac:0.13.5 deepac --help
docker run -v $(pwd):/deepac -u $(id -u):$(id -g) --rm dacshpi/deepac:0.13.5 deepac test -q

# With GPU support
docker run -v $(pwd):/deepac -u $(id -u):$(id -g) --rm --gpus all dacshpi/deepac:0.13.5-gpu deepac test

# If you want to use the shell inside the container
docker run -it -v $(pwd):/deepac -u $(id -u):$(id -g) --rm --gpus all dacshpi/deepac:0.13.5-gpu bash
```

The image ships the main `deepac` package along with the `deepac-vir` and `deepac-strain` plugins. See the basic usage guide below for more deepac commands.

Optional: download and compile the latest deepac-live custom models [![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.4456008.svg)](https://doi.org/10.5281/zenodo.4456008):
```
docker run -v $(pwd):/deepac -u $(id -u):$(id -g) --rm --gpus all dacshpi/deepac:0.13.5 deepac getmodels --fetch
```

For more information about the usage of the NVIDIA container toolkit (e.g. selecting the GPUs to use),
 consult the [User Guide](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/user-guide.html#user-guide).

The `dacshpi/deepac:latest` tag corresponds to the latest version of the CPU build. We recommend using explicit version tags instead.

### With pip

We recommend setting up an isolated `conda` environment (see above). Alternatively, you can use a `virtualenv` virtual environment (note that deepac requires Python 3):
```
# use -p to use the desired python interpreter (python 3.6 or higher required)
virtualenv -p /usr/bin/python3 my_env
source my_env/bin/activate
```

You can then install DeePaC with `pip`. For GPU support, you need to install CUDA and cuDNN manually first (see the TensorFlow installation guide for details). 
Then install the package:

```
pip install deepac
```

Optional: download and compile the latest deepac-live custom models [![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.4456008.svg)](https://doi.org/10.5281/zenodo.4456008):
```
deepac getmodels --fetch
```

If you want to install the plugins, use:

```
pip install deepacvir deepacstrain
```

### Optional: run tests
Optionally, you can run explicit tests of your installation. Note that it may take some time on a CPU.
```
# Run standard tests
deepac test
# Run quick tests (e.g. on CPUs)
deepac test -q
# Test using specific GPUs (here: /device:GPU:0 and /device:GPU:1) 
deepac test -g 0 1
# Test explainability and gwpa workflows
deepac test -xp
# Full tests
deepac test -a
# Full quick tests (e.g. on GPUs with limited memory)
deepac test -aq
```

### Help

To see help, just use
```
deepac --help
deepac predict --help
deepac train --help
# Etc.
```

## Basic use: prediction

You can predict pathogenic potentials with one of the built-in models out of the box:
```
# A rapid CNN (trained on IMG/M data)
deepac predict -r input.fasta
# A sensitive LSTM (trained on IMG/M data)
deepac predict -s input.fasta
```

The rapid and the sensitive models are trained to predict pathogenic potentials of novel bacterial species.
For details, see <https://doi.org/10.1093/bioinformatics/btz541> or <https://www.biorxiv.org/content/10.1101/535286v3>.

To quickly filter your data according to predicted pathogenic potentials, you can use:
```
deepac predict -r input.fasta
deepac filter input.fasta input_predictions.npy -t 0.5
```
Note that after running `predict`, you can use the `input_predictions.npy` to filter your fasta file with different
thresholds. You can also add pathogenic potentials to the fasta headers in the output files:
```
deepac filter input.fasta input_predictions.npy -t 0.75 -p -o output-75.fasta
deepac filter input.fasta input_predictions.npy -t 0.9 -p -o output-90.fasta
```
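
To make the effect of the threshold concrete, here is a minimal pure-Python sketch of what such a filtering step does conceptually. It is not DeePaC's implementation. It assumes that the predictions hold one score per read in the same order as the FASTA records, that a read is kept when its score is strictly above the threshold, and that the score is appended to the header in a made-up ` | pp=...` format; all three are assumptions for illustration, and `read_fasta` and `filter_by_score` are hypothetical helpers.

```python
from typing import Iterable, Iterator, List, Sequence, Tuple

Record = Tuple[str, str]  # (header without ">", sequence)


def read_fasta(lines: Iterable[str]) -> Iterator[Record]:
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def filter_by_score(records: Sequence[Record], scores: Sequence[float],
                    threshold: float = 0.5, annotate: bool = False) -> List[Record]:
    """Keep records whose score exceeds the threshold (assumed strict comparison)."""
    kept = []
    for (header, seq), score in zip(records, scores):
        if score > threshold:
            if annotate:
                # Hypothetical header format, chosen only for this example.
                header = f"{header} | pp={score:.2f}"
            kept.append((header, seq))
    return kept


fasta = [">read1", "ACGT", ">read2", "GGCC", ">read3", "TTAA"]
records = list(read_fasta(fasta))
scores = [0.95, 0.40, 0.80]  # in practice these would come from the predictions file

for t in (0.5, 0.75, 0.9):
    print(t, [h for h, _ in filter_by_score(records, scores, t)])
```

Because the scores are computed once and only compared against the threshold afterwards, trying several thresholds does not require rerunning the prediction step.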

## Advanced use
### Config templates
To get the config templates in the current working directory, simply use:
```
deepac templates
```
### Preprocessing

For more complex analyses, it can be useful to preprocess the fasta files by converting them to binary numpy arrays. Use:
```
deepac preproc preproc_config.ini
```
See the `config_templates` directory of the GitLab repository (https://gitlab.com/rki_bioinformatics/DeePaC/) for a sample configuration file.

### Training
You can use the built-in architectures to train a new model:
```
deepac train -r -T train_data.npy -t train_labels.npy -V val_data.npy -v val_labels.npy
deepac train -s -T train_data.npy -t train_labels.npy -V val_data.npy -v val_labels.npy

```

To train a new model based on your custom configuration, use
```
deepac train -c nn_train_config.ini
```

If you train an LSTM on a GPU, a CUDNNLSTM implementation will be used. To convert the resulting model to be 
CPU-compatible, use `deepac convert`. You can also use it to save the weights of a model, or recompile a model 
from a set of weights:

```
# Save model weights and convert the model to an equivalent with the same architecture and weights.
# Other config parameters can be adjusted
deepac convert model_config.ini saved_model.h5
# Recompile the model
deepac convert saved_model_config.ini saved_model_weights.h5 -w
```

### Evaluation

To evaluate a trained model, use
```
# Read-by-read performance
deepac eval -r eval_config.ini
# Species-by-species performance
deepac eval -s eval_species_config.ini
# Ensemble performance
deepac eval -e eval_ens_config.ini
```
See the configs directory for sample configuration files. Note that `deepac eval -s` requires precomputed predictions 
and a CSV file with the number of DNA reads for each species in each of the classes.

### TPU (experimental)
If you want to use a TPU, run DeePaC with the `--tpu` parameter:
```
# Test a TPU
deepac --tpu colab test
```

## Interpretability workflows
### Filter visualization
To find the most relevant filters and visualize them, use the following minimum workflow: 
```
# Calculate filter and nucleotide contributions (partial Shapley values) for the first convolutional layer
# using mean-centered weight matrices and "easy" calculation mode
deepac explain fcontribs -m model.h5 -eb -t test_data.npy -N test_nonpatho.fasta -P test_patho.fasta -o fcontribs 

# Create filter ranking
deepac explain franking -f fcontribs/filter_scores -y test_labels.npy -p test_predictions.npy -o franking

# Prepare transfac files for filter visualization (weighted by filter contribution)
deepac explain fa2transfac -i fcontribs/fasta -o fcontribs/transfac -w -W fcontribs/filter_scores

# Visualize nucleotide contribution sequence logos
deepac explain xlogos -i fcontribs/fasta -s fcontribs/nuc_scores -I fcontribs/transfac -t train_data.npy -o xlogos
```
You can browse through other supplementary functionalities and parameters by checking the help:
```
deepac explain -h
deepac explain fcontribs -h
deepac explain xlogos -h
# etc.
```

### Genome-wide phenotype potential analysis (GWPA)
To find interesting regions of a whole genome, use this workflow to generate nucleotide-resolution maps of
predicted phenotype potentials and nucleotide contributions:
```
# Fragment the genomes into pseudoreads
deepac gwpa fragment -g genomes_fasta -o fragmented_genomes

# Predict the pathogenic potential of each pseudoread
deepac predict -r -a fragmented_genomes/sample1_fragmented_genomes.npy -o predictions/sample1_pred.npy

# Create bedgraphs of mean pathogenic potential at each position of the genome
# Can be visualized in IGV
deepac gwpa genomemap -f fragmented_genomes -p predictions -g genomes_genome -o bedgraph

# Rank genes by mean pathogenic potential
deepac gwpa granking -p bedgraph -g genomes_gff -o granking

# Create bedgraphs of mean nucleotide contribution at each position of the genome
# Can be visualized in IGV
deepac gwpa ntcontribs -m model.h5 -f fragmented_genomes -g genomes_genome -o bedgraph_nt
```
You can browse through other supplementary functionalities and parameters by checking the help:
```
deepac gwpa -h
deepac gwpa genomemap -h
deepac gwpa ntcontribs -h
# etc.
```
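
The idea behind the `fragment` and `genomemap` steps above can be sketched in a few lines of plain Python: cut the genome into overlapping pseudoreads, score each one, and average the scores of all pseudoreads covering a given position. This is a simplified illustration, not DeePaC's code. The read length, the shift between pseudoreads, the handling of genome ends, and the helper names are all assumptions made for the example.

```python
from typing import List, Sequence, Tuple


def fragment(genome: str, read_len: int, shift: int) -> List[Tuple[int, str]]:
    """Cut a genome into overlapping pseudoreads; return (start, sequence) pairs."""
    last_start = max(len(genome) - read_len, 0)
    return [(s, genome[s:s + read_len]) for s in range(0, last_start + 1, shift)]


def mean_score_per_position(genome_len: int, starts: Sequence[int],
                            scores: Sequence[float], read_len: int) -> List[float]:
    """Average the scores of all pseudoreads that cover each position."""
    totals = [0.0] * genome_len
    counts = [0] * genome_len
    for start, score in zip(starts, scores):
        for pos in range(start, min(start + read_len, genome_len)):
            totals[pos] += score
            counts[pos] += 1
    return [t / c if c else 0.0 for t, c in zip(totals, counts)]


def to_bedgraph(chrom: str, values: Sequence[float]) -> List[str]:
    """Merge runs of equal values into bedGraph lines (0-based, half-open intervals)."""
    lines, run_start = [], 0
    for pos in range(1, len(values) + 1):
        if pos == len(values) or values[pos] != values[run_start]:
            lines.append(f"{chrom}\t{run_start}\t{pos}\t{values[run_start]}")
            run_start = pos
    return lines


genome = "ACGTACGTAC"
reads = fragment(genome, read_len=4, shift=2)
starts = [s for s, _ in reads]
scores = [0.25, 0.5, 0.75, 1.0]  # placeholder predictions, one per pseudoread
means = mean_score_per_position(len(genome), starts, scores, read_len=4)
print("\n".join(to_bedgraph("chr1", means)))
```

The overlap between pseudoreads is what gives the map its resolution: each position is covered by several pseudoreads, so its value reflects the local sequence context rather than a single window.
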
### Filter enrichment analysis
Finally, you can check for filter enrichment in annotated genes or other genomic features:
```
# Get filter activations, genome-wide
deepac gwpa factiv -m model.h5 -t fragmented_genomes/sample1_fragmented_genomes.npy -f fragmented_genomes/sample1_fragmented_genomes.fasta -o factiv

# Check for enrichment within annotated genomic features
deepac gwpa fenrichment -i factiv -g genomes_gff/sample1.gff -o fenrichment
```

## Supplementary data and scripts
Datasets are available here: [![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.3678563.svg)](https://doi.org/10.5281/zenodo.3678563) (bacteria) and here: [![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.4312525.svg)](https://doi.org/10.5281/zenodo.4312525) (viruses).
In the supplement_paper directory you can find the R scripts and data files used in the papers for dataset preprocessing and benchmarking.

## Erratum
The second sentence in section 2.2.3 of the bacterial DeePaC paper (<https://doi.org/10.1093/bioinformatics/btz541>) is incomplete.

Published text: “All were initialized with He weight initialization (He et al, 2015) and trained…”

Should be: “All were initialized with He weight initialization (He et al, 2015) or Glorot initialization (Glorot & Bengio, 2010) for recurrent and feedforward layers respectively and trained…”

## Known issues
Unfortunately, the following issues are independent of the DeePaC codebase:
* pip installation of pybedtools (a deepac dependency) requires libz-dev and will fail if it is not present on your system. To solve this, install libz-dev or use the bioconda installation.
* A bug in TF 2.2 may cause training to hang when using Keras Sequence input (i.e. if your training config contains
 `Use_TFData = False` and `LoadTrainingByBatch = True`). To solve this, use TF 2.1 or TF 2.3+,
  pre-load your data into memory (`LoadTrainingByBatch = False`) or use TFDataset input (`Use_TFData = True`).
* A bug in TF 2.1 resets the optimizer state when continuing interrupted training. DeePaC will notice that and warn you, but to solve this, upgrade to TF 2.2+.
* h5py>=3.0 is not compatible with TensorFlow at the moment and will cause errors when loading Keras (and DeePaC) models (hence, deepac tests will fail as well). 
  The conda installation takes care of this automatically, but the pip TensorFlow installation does not. To solve it, use the conda installation or install h5py<3.0. 
  This issue should be resolved in a future version of TensorFlow.
* shap 0.38 requires IPython but the pip installer does not install it. Manual installation solves the problem.
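
A quick way to see whether an environment is affected by the version-related issues above is to inspect the installed package versions. The sketch below uses only the standard library (`importlib.metadata`, Python 3.8+). The helper names are hypothetical, the version parsing is deliberately crude, and the h5py check simply mirrors the statement in the list, so it may be outdated for TensorFlow releases that support h5py 3.

```python
from importlib import metadata
from typing import List, Optional, Tuple


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn '2.10.1' or '3.0.0rc1' into a tuple of its leading integer parts."""
    parts = []
    for piece in text.split(".")[:3]:
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def installed(name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


def known_issue_warnings(tf: Optional[str], h5py: Optional[str]) -> List[str]:
    """Map version strings to the issues listed in this section."""
    warnings = []
    if tf is not None:
        tf_minor = parse_version(tf)[:2]
        if tf_minor == (2, 2):
            warnings.append("TF 2.2: training may hang with Keras Sequence input")
        elif tf_minor == (2, 1):
            warnings.append("TF 2.1: optimizer state is reset when resuming training")
    if h5py is not None and parse_version(h5py) >= (3,):
        warnings.append("h5py>=3.0: loading Keras models may fail")
    return warnings


# Prints an empty list when neither package is installed or no issue applies.
print(known_issue_warnings(installed("tensorflow"), installed("h5py")))
```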

## Cite us
If you find DeePaC useful, please cite:

```
@article{10.1093/bioinformatics/btz541,
    author = {Bartoszewicz, Jakub M and Seidel, Anja and Rentzsch, Robert and Renard, Bernhard Y},
    title = "{DeePaC: predicting pathogenic potential of novel DNA with reverse-complement neural networks}",
    journal = {Bioinformatics},
    volume = {36},
    number = {1},
    pages = {81-89},
    year = {2020},
    month = {01},
    issn = {1367-4803},
    doi = {10.1093/bioinformatics/btz541},
    url = {https://doi.org/10.1093/bioinformatics/btz541},
    eprint = {https://academic.oup.com/bioinformatics/article-pdf/36/1/81/31813920/btz541.pdf},
}

@article{10.1093/nargab/lqab004,
    author = {Bartoszewicz, Jakub M and Seidel, Anja and Renard, Bernhard Y},
    title = "{Interpretable detection of novel human viruses from genome sequencing data}",
    journal = {NAR Genomics and Bioinformatics},
    volume = {3},
    number = {1},
    year = {2021},
    month = {02},
    issn = {2631-9268},
    doi = {10.1093/nargab/lqab004},
    url = {https://doi.org/10.1093/nargab/lqab004},
    note = {lqab004},
    eprint = {https://academic.oup.com/nargab/article-pdf/3/1/lqab004/36165658/lqab004.pdf},
}

@article{10.1093/bib/bbab269,
    author = {Bartoszewicz, Jakub M and Genske, Ulrich and Renard, Bernhard Y},
    title = "{Deep learning-based real-time detection of novel pathogens during sequencing}",
    journal = {Briefings in Bioinformatics},
    volume = {22},
    number = {6},
    year = {2021},
    month = {07},
    issn = {1477-4054},
    doi = {10.1093/bib/bbab269},
    url = {https://doi.org/10.1093/bib/bbab269},
    note = {bbab269},
    eprint = {https://academic.oup.com/bib/article-pdf/22/6/bbab269/41088711/bbab269.pdf},
}

@article{10.1101/2021.11.30.470625,
    author = {Bartoszewicz, Jakub M and Nasri, Ferdous and Nowicka, Melania and Renard, Bernhard Y},
    title = {Pathogenic potential prediction for novel fungal DNA based on a curated fungi-hosts data collection},
    year = {2021},
    doi = {10.1101/2021.11.30.470625},
    publisher = {Cold Spring Harbor Laboratory},
    URL = {https://www.biorxiv.org/content/early/2021/12/01/2021.11.30.470625},
    eprint = {https://www.biorxiv.org/content/early/2021/12/01/2021.11.30.470625.full.pdf},
    journal = {bioRxiv}
}

```

Raw data

            {
    "_id": null,
    "home_page": "https://gitlab.com/dacs-hpi/deepac",
    "name": "deepac",
    "maintainer": "",
    "docs_url": null,
    "requires_python": ">=3",
    "maintainer_email": "",
    "keywords": "deep learning DNA sequencing synthetic biology pathogenicity prediction",
    "author": "Jakub Bartoszewicz",
    "author_email": "jakub.bartoszewicz@hpi.de",
    "download_url": "https://files.pythonhosted.org/packages/97/5f/585ce14c34ad8558b547d8e0a6745a436689d60e80cc9b19ebdf2f921126/deepac-0.14.1.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "MIT",
    "summary": "Predicting pathogenic potentials of novel DNA with reverse-complement neural networks.",
    "version": "0.14.1",
    "split_keywords": [
        "deep",
        "learning",
        "dna",
        "sequencing",
        "synthetic",
        "biology",
        "pathogenicity",
        "prediction"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "md5": "618059cbae4d87b8e3f74c104a60fa30",
                "sha256": "ad7b613ba181aeb91b2233ef90157f3970d5585f218243a3a17f9a87d1806584"
            },
            "downloads": -1,
            "filename": "deepac-0.14.1-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "618059cbae4d87b8e3f74c104a60fa30",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3",
            "size": 34841389,
            "upload_time": "2022-12-16T17:38:40",
            "upload_time_iso_8601": "2022-12-16T17:38:40.316697Z",
            "url": "https://files.pythonhosted.org/packages/e2/b5/ed5eecd797290e743be611f7415ca8a6149765fe1f6457a1153affe7b79b/deepac-0.14.1-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "md5": "ea5fce0079523d3a07c25756d836b400",
                "sha256": "965f60467261b05c62f3065660021c0727a64709b366008424e03a618c5fce38"
            },
            "downloads": -1,
            "filename": "deepac-0.14.1.tar.gz",
            "has_sig": false,
            "md5_digest": "ea5fce0079523d3a07c25756d836b400",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3",
            "size": 34818707,
            "upload_time": "2022-12-16T17:39:35",
            "upload_time_iso_8601": "2022-12-16T17:39:35.955045Z",
            "url": "https://files.pythonhosted.org/packages/97/5f/585ce14c34ad8558b547d8e0a6745a436689d60e80cc9b19ebdf2f921126/deepac-0.14.1.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2022-12-16 17:39:35",
    "github": false,
    "gitlab": true,
    "bitbucket": false,
    "gitlab_user": "dacs-hpi",
    "gitlab_project": "deepac",
    "lcname": "deepac"
}

Jakub Bartoszewicz