baczy


- Name: baczy
- Version: 1.0.0
- Summary: Bacterial assembly and annotation
- Upload time: 2025-02-23 23:43:09
- Requires Python: >=3.9
- License: MIT
- Keywords: bioinformatics, assembly, annotation, bacteria
- Home page, maintainer, docs URL, author: none recorded
[![Edwards Lab](https://img.shields.io/badge/Bioinformatics-EdwardsLab-03A9F4)](https://edwards.flinders.edu.au)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
![GitHub language count](https://img.shields.io/github/languages/count/npbhavya/baczy)
[![](https://img.shields.io/static/v1?label=CLI&message=Snaketool&color=blueviolet)](https://github.com/beardymcjohnface/Snaketool)
![GitHub last commit (branch)](https://img.shields.io/github/last-commit/npbhavya/baczy)
[![CI](https://github.com/npbhavya/baczy/actions/workflows/testing.yml/badge.svg)](https://github.com/npbhavya/baczy/actions/workflows/testing.yml)

# Baczy
<p align="center">
  <img src="baczy.png#gh-light-mode-only" width="300">
  <img src="baczy.png#gh-dark-mode-only" width="300">
</p>

## Overview

**Baczy** is a **Snakemake-based workflow** for **assembling and annotating bacterial host genomes**. It extends **[Sphae](https://github.com/linsalrob/sphae)**, which assembles and annotates phage genomes, by enabling host genome assembly and functional annotation. 

🔹 **Features:**  
✔ **Quality control** ([Fastp](https://github.com/OpenGene/fastp))  
✔ **Genome assembly** ([MEGAHIT](https://github.com/voutcn/megahit), [Hybracter](https://github.com/gbouras13/hybracter))  
✔ **Functional annotation** ([Bakta](https://github.com/oschwengers/bakta))  
✔ **Taxonomic classification** ([GTDB-Tk](https://github.com/Ecogenomics/GTDBTk))  
✔ **Taxonomic tree** ([GTDB-Tk](https://github.com/Ecogenomics/GTDBTk))  
✔ **Defense & resistance profiling** ([Defense-Finder](https://github.com/mdmparis/defense-finder), [AMRFinderPlus](https://github.com/ncbi/amr), [CapsuleDB](https://research.pasteur.fr/en/tool/capsulefinder/))  
✔ **Prophage detection** ([PhiSpy](https://github.com/linsalrob/PhiSpy))  
✔ **Pan-genome analysis** ([Panaroo](https://github.com/gtonkinhill/panaroo))  

## Installation

Steps for installing the workflow:

  ```bash
  #clone repository
  git clone https://github.com/npbhavya/baczy.git

  #move to baczy folder
  cd baczy

  #install
  pip install -e .

  #confirm the workflow is installed by running the below command 
  baczy --help
  ```

### Manual installation 
- Install Singularity/Apptainer, or load it as a module.
    On the deepthought cluster: \
    `module load apptainer`

- Install Miniconda by following the
    [Miniconda Installation Guide](https://docs.anaconda.com/miniconda/install/)

## Database setup

Download and place the required databases in:
  `baczy/workflow/databases`

  - [CheckM2_database](https://github.com/chklovski/CheckM2?tab=readme-ov-file#database)
  - [GTDBTK database](https://ecogenomics.github.io/GTDBTk/installing/index.html)
  - [Capsuledb](https://gitlab.pasteur.fr/gem/capsuledb/-/tree/master/CapsuleFinder_models?ref_type=heads)
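
Before launching a run, it can help to confirm that the databases are where the workflow expects them. The helper below is a minimal sketch, not part of Baczy: `check_databases` is a hypothetical function, and the sub-directory names you pass to it are your own choice, since this README does not state what each database folder must be called.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not shipped with Baczy).
# Usage: check_databases BASE_DIR NAME [NAME ...]
# Prints "missing: <path>" for every absent sub-directory and
# returns 1 if at least one is missing, 0 otherwise.
check_databases() {
  local base="$1"
  shift
  local status=0
  local name
  for name in "$@"; do
    if [ ! -d "$base/$name" ]; then
      echo "missing: $base/$name"
      status=1
    fi
  done
  return "$status"
}
```

For example, `check_databases baczy/workflow/databases checkm2 gtdbtk capsuledb` would report any of those three folders that is absent; the three names here are placeholders only.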

## Running the workflow

Run Baczy using a single command!

**Before starting the run**  
The taxonomic tree is generated using [GTDB-Tk](https://github.com/Ecogenomics/GTDBTk), so update the following `gtdbtk` lines in the configuration file to suit your samples:
  
  ```
  gtdbtk:
    outgroup: "d__Archaea"
    taxa_filter: "d__Bacteria"
  ```

Both values can be narrowed to specific genera, for example:
  
  ```
  gtdbtk:
    outgroup: "g__Escherichia"
    taxa_filter: "g__Achromobacter"
  ```
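
GTDB taxon labels start with a one-letter rank prefix followed by two underscores (`d__` for domain, `p__`, `c__`, `o__`, `f__`, `g__` for genus, `s__` for species). A quick format check before editing the configuration can catch a missing prefix. The function below is a hypothetical sketch, not part of Baczy or GTDB-Tk, and it only checks the shape of the label; it does not confirm that the taxon exists in the GTDB release you downloaded.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not shipped with Baczy or GTDB-Tk).
# Returns 0 if the argument looks like a GTDB taxon label: a rank prefix
# (d, p, c, o, f, g or s), two underscores, then a non-empty name.
is_gtdb_taxon() {
  case "$1" in
    [dpcofgs]__?*) return 0 ;;
    *) return 1 ;;
  esac
}
```

With this in place, `is_gtdb_taxon "g__Achromobacter"` succeeds, while `is_gtdb_taxon "Achromobacter"` fails because the rank prefix is missing.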

**For paired-end reads**

  `baczy run --input sample-data/illumina --cores 32 --use-singularity --sdm apptainer --output test -k --use-conda`

**For long reads**

  `baczy run --input sample-data/nanopore --sequencing longread --cores 32 --use-singularity --sdm apptainer --output test -k --use-conda`
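
The two commands differ only in the `--sequencing longread` option, so a small wrapper can keep the remaining options consistent between runs. The sketch below is a hypothetical helper, not part of Baczy: `baczy_cmd` only prints the command so it can be reviewed before running, and it uses only the options that appear in the commands above.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not shipped with Baczy): print, but do not run,
# the baczy command for a given read type.
# Usage: baczy_cmd READ_TYPE INPUT_DIR OUTPUT_DIR [CORES]
#   READ_TYPE is "paired" or "longread"; CORES defaults to 32,
#   the value used in the README commands.
baczy_cmd() {
  local read_type="$1"
  local input="$2"
  local output="$3"
  local cores="${4:-32}"
  local seq_flag=""
  case "$read_type" in
    paired)   seq_flag="" ;;
    longread) seq_flag=" --sequencing longread" ;;
    *) echo "unknown read type: $read_type" >&2; return 1 ;;
  esac
  echo "baczy run --input ${input}${seq_flag} --cores ${cores} --use-singularity --sdm apptainer --output ${output} -k --use-conda"
}
```

For example, `baczy_cmd longread sample-data/nanopore test 16` prints the long-read command with 16 cores, which can then be copied into a terminal or a job script.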

### Intermediate files 
Stored in:

  `baczy.out/PROCESSING`

### Final Results and Output
Stored in `RESULT-short` for short reads or `RESULTS-long` for long reads

Each folder contains:
  - **{sample} folder**
    - {sample}_amrfinderplus table : AMR genes identified in the genome
    - {sample}_contigs.fa or {sample}_final.fasta : assembled genome for the sample
    - {sample}.faa : identified proteins
    - {sample}.fna : identified genes
    - {sample}.gbff : annotated genome in GenBank format
    - {sample}.gff3 : genome annotation in GFF3 format
    - {sample}.png and {sample}.svg : genome visualization
    - {sample}.txt : summary
    - {sample}_prophage_coordinates.tsv : locations of the prophages identified by PhiSpy
  - **amrfinder_summary.tsv** : a table with all the AMRFinder genes in all the samples
  - **bakta_summary.tsv** : Bakta summary for all the samples saved to one table
  - **checkm2_quality_report.tsv** : CheckM2 completeness results
  - **defensefinder_summary.tsv** : all the defense systems found in all the samples
  - **gtdbtk.bac120_summary.tsv** : GTDB-Tk summary with the predicted taxa for each of the samples
  - **gtdbtk.bac120.decorated.tree**, **gtdbtk.bac120.tree.table** : GTDB-Tk tree and the tree table
    - visualize the tree with iTOL
  - **prophage_regions.tsv** : locations of the prophage regions in all the samples
  - **Panaroo folder**
    - output from running [panaroo](https://github.com/gtonkinhill/panaroo)  
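
Because each sample folder is described above as containing a `{sample}.gbff` file, the annotated samples in a results directory can be listed with a short loop. This is a hypothetical sketch rather than a Baczy command, and it assumes the layout is exactly `RESULTS_DIR/{sample}/{sample}.gbff`; folders without a matching `.gbff`, such as the Panaroo folder, are skipped.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not shipped with Baczy).
# Usage: list_samples RESULTS_DIR
# Prints the name of every sub-directory that contains <name>/<name>.gbff.
# Assumes the layout RESULTS_DIR/{sample}/{sample}.gbff.
list_samples() {
  local results_dir="$1"
  local dir name
  for dir in "$results_dir"/*/; do
    [ -d "$dir" ] || continue        # the glob matched nothing
    name="$(basename "$dir")"
    if [ -f "${dir}${name}.gbff" ]; then
      echo "$name"
    fi
  done
}
```

Running `list_samples` on the short-read or long-read results folder, then comparing the output with your input sample names, is a quick way to spot samples that did not reach the annotation step.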

            

Raw data

            {
    "_id": null,
    "home_page": null,
    "name": "baczy",
    "maintainer": null,
    "docs_url": null,
    "requires_python": ">=3.9",
    "maintainer_email": null,
    "keywords": "bioinformatics, assembly, annotation, bacteria",
    "author": null,
    "author_email": "Bhavya Papudeshi <npbhavya13@gmail.com>",
    "download_url": "https://files.pythonhosted.org/packages/5f/e0/6d8b64db835fa4f404b34693a191c1a9b47a4ca5f3527ebd881c8aefc65a/baczy-1.0.0.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "MIT",
    "summary": "Bacterial assembly and annotation",
    "version": "1.0.0",
    "project_urls": null,
    "split_keywords": [
        "bioinformatics",
        " assembly",
        " annotation",
        " bacteria"
    ],
    "urls": [
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "774823139f5ed5fcecc64ee88d07921c977acd37b9d0b1063edea3a6fe9c40b7",
                "md5": "7891bc6adde3d6bff96d5fd0d6c821c4",
                "sha256": "53fdcbee09c7b48434c445787c1a01acac6b61ce22cb4661b844c0aad6854328"
            },
            "downloads": -1,
            "filename": "baczy-1.0.0-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "7891bc6adde3d6bff96d5fd0d6c821c4",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.9",
            "size": 24570,
            "upload_time": "2025-02-23T23:43:07",
            "upload_time_iso_8601": "2025-02-23T23:43:07.475710Z",
            "url": "https://files.pythonhosted.org/packages/77/48/23139f5ed5fcecc64ee88d07921c977acd37b9d0b1063edea3a6fe9c40b7/baczy-1.0.0-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "5fe06d8b64db835fa4f404b34693a191c1a9b47a4ca5f3527ebd881c8aefc65a",
                "md5": "bac534cf27c631aad85718b25661c813",
                "sha256": "6e201ab22f26bc88e8008c3a4debec876145cd34d28f64ea62f35430ec6445dc"
            },
            "downloads": -1,
            "filename": "baczy-1.0.0.tar.gz",
            "has_sig": false,
            "md5_digest": "bac534cf27c631aad85718b25661c813",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.9",
            "size": 6410145,
            "upload_time": "2025-02-23T23:43:09",
            "upload_time_iso_8601": "2025-02-23T23:43:09.702850Z",
            "url": "https://files.pythonhosted.org/packages/5f/e0/6d8b64db835fa4f404b34693a191c1a9b47a4ca5f3527ebd881c8aefc65a/baczy-1.0.0.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2025-02-23 23:43:09",
    "github": false,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "lcname": "baczy"
}
        