sphae


| Field | Value |
|---|---|
| Name | sphae |
| Version | 1.31 |
| Download | https://files.pythonhosted.org/packages/ca/dd/f0d196e089c83062bf0a32f068a52555a263a275489aa2079492b831071e/sphae-1.31.tar.gz |
| Home page | https://github.com/linsalrob/sphae |
| Summary | Assembling pure culture phages from both Illumina and Nanopore sequencing technology |
| Upload time | 2023-11-30 02:26:41 |
| Author | Bhavya Papudeshi |
| License | The MIT License (MIT) |
| Keywords | phage, phage therapy, bioinformatics, microbiology, bacteria, genome, genomics |
| Requirements | No requirements were recorded. |

[![Edwards Lab](https://img.shields.io/badge/Bioinformatics-EdwardsLab-03A9F4)](https://edwards.flinders.edu.au)
[![DOI](https://zenodo.org/badge/403889262.svg)](https://zenodo.org/doi/10.5281/zenodo.8365088)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
![GitHub language count](https://img.shields.io/github/languages/count/linsalrob/spae)
[![](https://img.shields.io/static/v1?label=CLI&message=Snaketool&color=blueviolet)](https://github.com/beardymcjohnface/Snaketool)
![GitHub last commit (branch)](https://img.shields.io/github/last-commit/linsalrob/spae/main)
[![CI](https://github.com/linsalrob/spae/actions/workflows/testing.yml/badge.svg)](https://github.com/linsalrob/spae/actions/workflows/testing.yml)

# Sphae 
## Phage toolkit to detect phage candidates for phage therapy
<p align="center">
  <img src="sphae.png#gh-light-mode-only" width="300">
  <img src="sphaedark.png#gh-dark-mode-only" width="300">
</p>



**Overview**

This Snakemake workflow, built with Snaketool (https://doi.org/10.1371/journal.pcbi.1010705), assembles and annotates phage sequences. It is currently being developed for phage genomes. The steps are:

- Quality control that removes adaptor sequences, low-quality reads and, optionally, host contamination
- Assembly
- Contig quality checks: read coverage, viral classification, completeness, and assembly graph components
- Phage genome annotation
  
A complete list of the programs used for each step is given in the sphae.CITATION file.

### Install 

**Pre-requisites**
  - gcc
  - conda
  - libgl1-mesa-dev (Ubuntu, for Bandage)
  - libxcb-xinerama0 (Ubuntu, for Bandage)
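
Before installing, it can help to confirm that the command-line prerequisites are on your `PATH`. The snippet below is a minimal sketch: `check_prereqs` is a hypothetical helper name, not part of sphae, and it only covers tools that provide an executable (the two Ubuntu libraries for Bandage would need a package-manager query instead).

```bash
# Hypothetical helper: report which of the given tools are on PATH.
# Returns 0 if all are found, 1 if any is missing.
check_prereqs() {
    missing=0
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "found:   $tool"
        else
            echo "missing: $tool"
            missing=1
        fi
    done
    return "$missing"
}

check_prereqs gcc conda || echo "Install the missing tools before continuing."
```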

**Install**

Set up a new conda environment:

```bash
conda create -n sphae python=3.11
conda activate sphae
conda install -n base -c conda-forge mamba #if you don't already have mamba installed
```

Install the sphae workflow:

```bash
# Clone the sphae repository
git clone https://github.com/linsalrob/sphae.git

# Move to the sphae folder
cd sphae

# Install sphae
pip install -e .

# Confirm the workflow is installed
sphae --help
```

## Installing databases
Run the command:

```bash
sphae install
```

This installs the databases to the directory `sphae/workflow/databases`.

The workflow requires:
  - Pfam 35.0 database, used by viralVerify for contig classification
  - CheckV database, to test for phage completeness
  - Pharokka databases
  - Phynteny models

This step takes approximately 1 hr 30 min and requires 9 GB of storage.
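
Since the databases need several gigabytes, a quick free-space check before `sphae install` can save a failed download. This is a sketch only: `has_free_space` is a hypothetical helper, and it checks the filesystem holding the current directory, which is assumed here to be the cloned `sphae` folder.

```bash
# Hypothetical helper: succeed if the filesystem holding directory $1
# has at least $2 GB available. Uses POSIX `df -Pk` (sizes in 1024-byte blocks).
has_free_space() {
    dir=$1
    need_gb=$2
    avail_kb=$(df -Pk "$dir" | awk 'NR == 2 {print $4}')
    [ "$avail_kb" -ge $((need_gb * 1024 * 1024)) ]
}

if has_free_space . 9; then
    echo "Enough free space for the databases."
else
    echo "Less than 9 GB free here; free up space before running sphae install."
fi
```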

## Running the workflow

The command `sphae run` runs QC, assembly and annotation.

**Commands to run**

A single command runs all of the above steps: QC, assembly, assembly stats and annotation.

```bash
# For Illumina reads, place the forward and reverse reads in one directory
sphae run --input tests/data/illumina-subset --output example -k

# For Nanopore reads, place the reads in one directory, one file per sample
sphae run --input tests/data/nanopore-subset --sequencing longread --output example -k

# To run either command on a cluster, add --profile slurm. For instance, for long (Nanopore) reads:
# Before running the command below, make sure the Slurm config files are set up; see the tutorial at https://fame.flinders.edu.au/blog/2021/08/02/snakemake-profiles-updated
sphae run --input tests/data/nanopore-subset --sequencing longread --output example --profile slurm -k
```
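
For Illumina data, a missing reverse-read file is an easy mistake to make when copying reads into the input directory. The sketch below lists forward-read files that have no mate. It assumes paired files are named `<sample>_R1<suffix>` and `<sample>_R2<suffix>`; that naming pattern is an assumption for illustration, so check it against your own files. `unpaired_reads` is a hypothetical helper, not a sphae command.

```bash
# Hypothetical helper: print each forward-read file in directory $1
# that has no matching reverse-read file.
# Assumption: pairs are named <sample>_R1<suffix> and <sample>_R2<suffix>.
unpaired_reads() {
    dir=$1
    for r1 in "$dir"/*_R1*; do
        [ -e "$r1" ] || continue                 # glob matched nothing
        name=$(basename "$r1")
        # Swap _R1 for _R2, keeping the sample name and the suffix.
        r2="$dir/${name%%_R1*}_R2${name#*_R1}"
        [ -e "$r2" ] || echo "$r1"
    done
}

missing=$(unpaired_reads tests/data/illumina-subset)
if [ -z "$missing" ]; then
    echo "No forward reads without a mate were found."
else
    echo "No reverse mate found for:"
    echo "$missing"
fi
```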

**Output**

Output is saved to the `example/RESULTS` directory, which will contain four files:
  - Genome annotations in GenBank format (Phynteny output)
  - Genome in FASTA format (either the Pharokka output reoriented to the terminase, or the assembled viral contigs)
  - Circular visualization in `png` format (Pharokka output)
  - Genome summary file
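
After a run, a quick listing of the results directory with file sizes makes it easy to spot a missing or empty output. This is a minimal sketch: `list_results` is a hypothetical helper, it makes no assumption about the file names sphae writes, and it only looks at regular files directly inside the given directory.

```bash
# Hypothetical helper: print each regular file in directory $1
# with its size in bytes, separated by a tab.
list_results() {
    for f in "$1"/*; do
        [ -f "$f" ] || continue                  # skip subdirectories and unmatched globs
        printf '%s\t%s\n' "$(basename "$f")" "$(wc -c < "$f" | tr -d ' ')"
    done
}

list_results example/RESULTS
```

A size of 0 next to any file suggests that step did not produce output and is worth checking in the workflow logs.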

The genome summary file includes the following information:
  - Sample name
  - Length of the genome
  - Coding density
  - Whether the assembled contig is circular (from the assembly graph)
  - Completeness (calculated by CheckV)
  - Contamination (calculated by CheckV)
  - Taxonomy accession ID (Pharokka output; the genome is searched against the INPHARED database using mash)
  - Taxa mash: the number of hashes the assembled genome shares with the accession ID/taxa name. The more matching hashes, the more likely the genome is related to the predicted taxa
  - Gene searches:
    - Whether an integrase was found (search for the integrase gene in the annotations)
    - Whether antimicrobial resistance (AMR) genes were found (Pharokka search against the AMR database)
    - Whether any virulence factors were found (Pharokka search against the virulence gene database)
    - Whether any CRISPR spacers were found (Pharokka search using MinCED)
 
## Issues and Questions

This is still a work in progress, so if you come across any issues or errors, please report them under Issues on the GitHub repository.


            
