[![Edwards Lab](https://img.shields.io/badge/Bioinformatics-EdwardsLab-03A9F4)](https://edwards.flinders.edu.au)
[![DOI](https://zenodo.org/badge/403889262.svg)](https://zenodo.org/doi/10.5281/zenodo.8365088)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
![GitHub language count](https://img.shields.io/github/languages/count/linsalrob/spae)
[![](https://img.shields.io/static/v1?label=CLI&message=Snaketool&color=blueviolet)](https://github.com/beardymcjohnface/Snaketool)
![GitHub last commit (branch)](https://img.shields.io/github/last-commit/linsalrob/spae/main)
[![CI](https://github.com/linsalrob/spae/actions/workflows/testing.yml/badge.svg)](https://github.com/linsalrob/spae/actions/workflows/testing.yml)
# Sphae
## Phage toolkit to detect phage candidates for phage therapy
<p align="center">
<img src="sphae.png#gh-light-mode-only" width="300">
<img src="sphaedark.png#gh-dark-mode-only" width="300">
</p>
**Overview**
This Snakemake workflow, built with Snaketool [https://doi.org/10.1371/journal.pcbi.1010705], assembles and annotates phage sequences. It is currently being developed for phage genomes. The steps are:
- Quality control that removes adaptor sequences, low-quality reads and host contamination (optional).
- Assembly
- Contig quality checks: read coverage, viral classification, completeness, and assembly graph components.
- Phage genome annotation
A complete list of the programs used in each step is given in the sphae.CITATION file.
### Install
**Pre-requisites**
- gcc
- conda
- libgl1-mesa-dev (Ubuntu, required by Bandage)
- libxcb-xinerama0 (Ubuntu, required by Bandage)
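Before installing, it can help to confirm that the command-line prerequisites are on your `PATH`. The snippet below is a minimal sketch, not part of sphae: the helper name `missing_tools` is hypothetical, and it only checks `gcc` and `conda` (the two Ubuntu libraries for Bandage are not checked here).

```bash
# Print the name of every command given as an argument that is not on PATH.
# Prints nothing when all of them are found.
# NOTE: missing_tools is a hypothetical helper, not something sphae provides.
missing_tools() {
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            printf '%s\n' "$tool"
        fi
    done
}

missing=$(missing_tools gcc conda)
if [ -n "$missing" ]; then
    echo "Missing prerequisites:" $missing
else
    echo "gcc and conda found"
fi
```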
**Install**
Set up a new conda environment:
```bash
conda create -n sphae python=3.11
conda activate sphae
conda install -n base -c conda-forge mamba #if you don't already have mamba installed
```
Install the sphae workflow:
```bash
#clone sphae repository
git clone https://github.com/linsalrob/sphae.git
#move to sphae folder
cd sphae
#install sphae
pip install -e .
#confirm the workflow is installed by running the below command
sphae --help
```
## Installing databases
Run the command:
```bash
sphae install
```
The databases are installed to the directory `sphae/workflow/databases`.
This workflow requires the following:
- Pfam 35.0 database, used by viral_verify for contig classification
- CheckV database, used to test for phage completeness
- Pharokka databases
- Phynteny models
This step takes approximately 1 hr 30 min and requires 9 GB of storage.
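Since the databases need about 9 GB, you may want to check free disk space before running `sphae install`. The following is a rough sketch under the assumption that it is run from the location where the databases will be written; the helper `has_free_gb` is hypothetical and not part of sphae.

```bash
# Succeed if the filesystem holding directory $1 has at least $2 GB available.
# NOTE: has_free_gb is a hypothetical helper, not something sphae provides.
has_free_gb() {
    # df -Pk prints POSIX-format output in 1 KB blocks; column 4 is "Available".
    avail_kb=$(df -Pk "$1" | awk 'NR==2 {print $4}')
    [ "$avail_kb" -ge $(( $2 * 1024 * 1024 )) ]
}

# Assumption: the current directory is on the filesystem that will hold the databases.
if has_free_gb . 9; then
    echo "Enough space for the databases"
else
    echo "Less than 9 GB free; free up space before running 'sphae install'"
fi
```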
## Running the workflow
The command `sphae run` runs QC, assembly, and annotation.
**Commands to run**
A single command runs all of the steps above: QC, assembly, assembly statistics, and annotation.
```bash
#For Illumina reads, place both the forward and reverse reads in one directory
sphae run --input tests/data/illumina-subset --output example -k
#For Nanopore reads, place the reads in a directory, one file per sample
sphae run --input tests/data/nanopore-subset --sequencing longread --output example -k
#To run either command on a cluster, add --profile slurm. For instance, here is the command for long reads (Nanopore)
#Before running the command below, make sure the slurm config files are set up; here is a tutorial: https://fame.flinders.edu.au/blog/2021/08/02/snakemake-profiles-updated
sphae run --input tests/data/nanopore-subset --preprocess longread --output example --profile slurm -k
```
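For Illumina data, a quick check that every forward-read file has a reverse-read partner can catch problems before a run starts. The sketch below is illustrative only: it assumes pairs are named with `_R1` and `_R2` in otherwise identical file names, which is an assumption rather than something stated above, and `unpaired_reads` is a hypothetical helper.

```bash
# Print every forward-read file in directory $1 that has no reverse-read partner.
# ASSUMPTION: pairs differ only by _R1 / _R2 in the file name.
# NOTE: unpaired_reads is a hypothetical helper, not something sphae provides.
unpaired_reads() {
    for r1 in "$1"/*_R1*; do
        [ -e "$r1" ] || continue    # the glob matched nothing
        # Swap _R1 for _R2 in the file name only, never in the directory part.
        r2=$(printf '%s' "$r1" | sed 's/_R1\([^/]*\)$/_R2\1/')
        [ -e "$r2" ] || printf '%s\n' "$r1"
    done
}

unpaired=$(unpaired_reads tests/data/illumina-subset)
if [ -n "$unpaired" ]; then
    echo "Forward reads without a matching reverse file:"
    echo "$unpaired"
fi
```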
**Output**
Output is saved to the `example/RESULTS` directory, which will contain four files:
- Genome annotations in GenBank format (Phynteny output)
- Genome in FASTA format (either the Pharokka output reoriented to the terminase, or the assembled viral contigs)
- Circular visualization in `png` format (Pharokka output)
- Genome summary file
The genome summary file includes the following information:
- Sample name
- Length of the genome
- Coding density
- If the assembled contig is circular or not (From the assembly graph)
- Completeness (calculated from CheckV)
- Contamination (calculated from CheckV)
- Taxonomy accession ID (Pharokka output, from a mash search of the genome against the INPHARED database)
- Taxa mash: the number of hashes shared between the assembled genome and the predicted accession ID/taxon. The more matching hashes, the more likely the genome is related to the predicted taxon
- Gene searches:
  - Whether an integrase was found (search for the integrase gene in the annotations)
  - Whether antimicrobial resistance (AMR) genes were found (Pharokka search against the AMR database)
  - Whether any virulence factors were found (Pharokka search against the virulence gene database)
  - Whether any CRISPR spacers were found (Pharokka search using MinCED)
## Issues and Questions
This is still a work in progress, so if you come across any issues or errors, report them under Issues.
Raw data
{
"_id": null,
"home_page": "https://github.com/linsalrob/sphae",
"name": "sphae",
"maintainer": "",
"docs_url": null,
"requires_python": "",
"maintainer_email": "",
"keywords": "phage 'phage therapy' bioinformatics microbiology bacteria genome genomics",
"author": "Bhavya Papudeshi",
"author_email": "npbhavya13@gmail.com",
"download_url": "https://files.pythonhosted.org/packages/ca/dd/f0d196e089c83062bf0a32f068a52555a263a275489aa2079492b831071e/sphae-1.31.tar.gz",
"platform": "any",
"bugtrack_url": null,
"license": "The MIT License (MIT)",
"summary": "Assembling pure culture phages from both Illumina and Nanopore sequencing technology",
"version": "1.31",
"project_urls": {
"Homepage": "https://github.com/linsalrob/sphae"
},
"split_keywords": [
"phage",
"'phage",
"therapy'",
"bioinformatics",
"microbiology",
"bacteria",
"genome",
"genomics"
],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "caddf0d196e089c83062bf0a32f068a52555a263a275489aa2079492b831071e",
"md5": "c79e1b03c1f6780826cdf819714357ba",
"sha256": "89be2ea3ecba88fd8133ff2e397f992e0858c35c072f9ff53493289d5b0c81f9"
},
"downloads": -1,
"filename": "sphae-1.31.tar.gz",
"has_sig": false,
"md5_digest": "c79e1b03c1f6780826cdf819714357ba",
"packagetype": "sdist",
"python_version": "source",
"requires_python": null,
"size": 55043,
"upload_time": "2023-11-30T02:26:41",
"upload_time_iso_8601": "2023-11-30T02:26:41.066835Z",
"url": "https://files.pythonhosted.org/packages/ca/dd/f0d196e089c83062bf0a32f068a52555a263a275489aa2079492b831071e/sphae-1.31.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2023-11-30 02:26:41",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "linsalrob",
"github_project": "sphae",
"travis_ci": false,
"coveralls": false,
"github_actions": true,
"requirements": [],
"lcname": "sphae"
}