# Baczy
<p align="center">
<img src="baczy.png#gh-light-mode-only" width="300">
<img src="baczy.png#gh-dark-mode-only" width="300">
</p>
## Overview
**Baczy** is a **Snakemake-based workflow** for **assembling and annotating bacterial host genomes**. It extends **[Sphae](https://github.com/linsalrob/sphae)**, which assembles and annotates phage genomes, by enabling host genome assembly and functional annotation.
🔹 **Features:**
✔ **Quality control** ([Fastp](https://github.com/OpenGene/fastp))
✔ **Genome assembly** ([MEGAHIT](https://github.com/voutcn/megahit), [Hybracter](https://github.com/gbouras13/hybracter))
✔ **Functional annotation** ([Bakta](https://github.com/oschwengers/bakta))
✔ **Taxonomic classification** ([GTDB-Tk](https://github.com/Ecogenomics/GTDBTk))
✔ **Taxonomic tree** ([GTDB-Tk](https://github.com/Ecogenomics/GTDBTk))
✔ **Defense & resistance profiling** ([Defense-Finder](https://github.com/mdmparis/defense-finder), [AMRFinderPlus](https://github.com/ncbi/amr), [CapsuleDB](https://research.pasteur.fr/en/tool/capsulefinder/))
✔ **Prophage detection** ([PhiSpy](https://github.com/linsalrob/PhiSpy))
✔ **Pan-genome analysis** ([Panaroo](https://github.com/gtonkinhill/panaroo))
## Installation
Steps for installing the workflow:
```bash
#clone repository
git clone https://github.com/npbhavya/baczy.git
#move to the baczy folder
cd baczy
#install
pip install -e .
#confirm the workflow is installed by running the below command
baczy --help
```
### Manual installation
- Install Singularity/Apptainer, or load the module if your cluster provides one.
  On the deepthought cluster: \
  `module load apptainer`
- Install Miniconda by following the [Miniconda Installation Guide](https://docs.anaconda.com/miniconda/install/).
## Database setup
Download and place the required databases in:
`baczy/workflow/databases`
- [CheckM2_database](https://github.com/chklovski/CheckM2?tab=readme-ov-file#database)
- [GTDBTK database](https://ecogenomics.github.io/GTDBTk/installing/index.html)
- [Capsuledb](https://gitlab.pasteur.fr/gem/capsuledb/-/tree/master/CapsuleFinder_models?ref_type=heads)
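Before starting a run, it can help to confirm that the databases are actually in place. The following is a minimal sketch, not part of Baczy itself: the folder names in `EXPECTED_DATABASES` are assumptions (the list above does not say what each database directory should be called), so adjust them to match your downloads.

```python
from pathlib import Path

# Hypothetical folder names: this README does not specify them,
# so change these to whatever your downloaded databases are called.
EXPECTED_DATABASES = ("checkm2", "gtdbtk", "capsuledb")


def missing_databases(db_root, expected=EXPECTED_DATABASES):
    """Return the expected database folders that are absent or empty."""
    root = Path(db_root)
    missing = []
    for name in expected:
        folder = root / name
        # A folder counts as present only if it exists and holds at least one entry.
        if not folder.is_dir() or not any(folder.iterdir()):
            missing.append(name)
    return missing


if __name__ == "__main__":
    # Path taken from the instructions above.
    print(missing_databases("baczy/workflow/databases"))
```

An empty list means every expected folder exists and is non-empty; anything else names the folders still to be downloaded.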
## Running the workflow
Run Baczy using a single command!
**Before starting the run**
The taxonomic tree is generated using [GTDB-Tk](https://github.com/Ecogenomics/GTDBTk), so set the outgroup and taxa filter in the `gtdbtk` section of the workflow configuration before running:
```
gtdbtk:
  outgroup: "d__Archaea"
  taxa_filter: "d__Bacteria"
```
This can be set to more specific genera:
```
gtdbtk:
  outgroup: "g__Escherichia"
  taxa_filter: "g__Achromobacter"
```
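The `outgroup` and `taxa_filter` values use GTDB's rank prefixes (`d__` for domain through `s__` for species). As a rough sketch, the helper below checks that a value has this shape before a long run is started. The function names are hypothetical and not part of Baczy or GTDB-Tk, and the check covers only the format, not whether the taxon exists in the GTDB release you downloaded.

```python
# GTDB rank prefixes, as used in GTDB taxonomy strings.
GTDB_RANKS = {
    "d": "domain",
    "p": "phylum",
    "c": "class",
    "o": "order",
    "f": "family",
    "g": "genus",
    "s": "species",
}


def gtdb_rank(taxon):
    """Return the rank name for a string such as 'g__Escherichia'."""
    prefix, separator, name = taxon.partition("__")
    if not separator or prefix not in GTDB_RANKS or not name:
        raise ValueError(f"not a GTDB-style taxon: {taxon!r}")
    return GTDB_RANKS[prefix]


def check_gtdbtk_settings(outgroup, taxa_filter):
    """Validate both config values and return their ranks as a tuple."""
    return gtdb_rank(outgroup), gtdb_rank(taxa_filter)


if __name__ == "__main__":
    print(check_gtdbtk_settings("g__Escherichia", "g__Achromobacter"))
```

A value without a prefix, such as `Escherichia`, raises `ValueError`, which catches the most common formatting slip.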
**For paired-end reads**
`baczy run --input sample-data/illumina --cores 32 --use-singularity --sdm apptainer --output test -k --use-conda`
**For long reads**
`baczy run --input sample-data/nanopore --sequencing longread --cores 32 --use-singularity --sdm apptainer --output test -k --use-conda`
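When launching several runs from a script, the two commands above can be assembled programmatically. This is a sketch under the assumption that only the flags shown in this README are needed; `build_baczy_command` is a hypothetical helper, not a Baczy API.

```python
import shlex


def build_baczy_command(input_dir, output_dir, cores=32, long_reads=False):
    """Assemble the `baczy run` argument list using only flags shown above."""
    cmd = ["baczy", "run", "--input", str(input_dir)]
    if long_reads:
        # Long-read runs add the sequencing option; paired-end runs omit it.
        cmd += ["--sequencing", "longread"]
    cmd += [
        "--cores", str(cores),
        "--use-singularity",
        "--sdm", "apptainer",
        "--output", str(output_dir),
        "-k",
        "--use-conda",
    ]
    return cmd


if __name__ == "__main__":
    # Print the command instead of running it, so it can be inspected first.
    print(shlex.join(build_baczy_command("sample-data/nanopore", "test", long_reads=True)))
```

To execute the command rather than print it, pass the list to `subprocess.run(cmd, check=True)`.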
### Intermediate files
Stored in:
`baczy.out/PROCESSING`
### Final Results and Output
Stored in `RESULT-short` for short reads or `RESULTS-long` for long reads
Each folder contains:
- **{sample} folder**
  - {sample}_amrfinderplus table: AMR genes identified in the genome
  - {sample}_contigs.fa or {sample}_final.fasta: assembled genome for the sample
  - {sample}.faa: identified proteins
  - {sample}.fna: identified genes
  - {sample}.gbff
  - {sample}.gff3
  - {sample}.png and {sample}.svg: genome visualization
  - {sample}.txt: summary
  - {sample}_prophage_coordinates.tsv: locations of the prophages identified by PhiSpy
- **amrfinder_summary.tsv**: table of the AMRFinderPlus genes found across all samples
- **bakta_summary.tsv**: Bakta summaries for all samples, combined into one table
- **checkm2_quality_report.tsv**: CheckM2 completeness results
- **defensefinder_summary.tsv**: defense systems found across all samples
- **gtdbtk.bac120_summary.tsv**: GTDB-Tk summary with the predicted taxon for each sample
- **gtdbtk.bac120.decorated.tree**, **gtdbtk.bac120.tree.table**: GTDB-Tk tree and tree table
  - visualize the tree on iTOL
- **prophage_regions.tsv**: locations of the prophage regions in all samples
- **Panaroo folder**
  - output from running [panaroo](https://github.com/gtonkinhill/panaroo)
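The summary tables are tab-separated, so they can be inspected with the Python standard library. The sketch below counts rows per value of one column, for example to see how many AMR genes each sample has in `amrfinder_summary.tsv`. The column name used in the example (`sample`) is an assumption; check the header of your own file first.

```python
import csv
from collections import Counter


def count_by_column(lines, column):
    """Count rows per distinct value of `column` in tab-separated text.

    `lines` is any iterable of text lines, such as an open file handle.
    """
    reader = csv.DictReader(lines, delimiter="\t")
    if column not in (reader.fieldnames or []):
        raise KeyError(f"column {column!r} not found in header {reader.fieldnames}")
    return dict(Counter(row[column] for row in reader))


if __name__ == "__main__":
    # Tiny in-memory stand-in for a summary table; the header is hypothetical.
    demo = ["sample\tgene\n", "A\tblaTEM\n", "A\ttetA\n", "B\tblaTEM\n"]
    print(count_by_column(demo, "sample"))
```

For a real file, open it with `open(path, newline="")` and pass the handle as `lines`.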
Raw data
{
"_id": null,
"home_page": null,
"name": "baczy",
"maintainer": null,
"docs_url": null,
"requires_python": ">=3.9",
"maintainer_email": null,
"keywords": "bioinformatics, assembly, annotation, bacteria",
"author": null,
"author_email": "Bhavya Papudeshi <npbhavya13@gmail.com>",
"download_url": "https://files.pythonhosted.org/packages/5f/e0/6d8b64db835fa4f404b34693a191c1a9b47a4ca5f3527ebd881c8aefc65a/baczy-1.0.0.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": "MIT",
"summary": "Bacterial assembly and annotation",
"version": "1.0.0",
"project_urls": null,
"split_keywords": [
"bioinformatics",
" assembly",
" annotation",
" bacteria"
],
"urls": [
{
"comment_text": null,
"digests": {
"blake2b_256": "774823139f5ed5fcecc64ee88d07921c977acd37b9d0b1063edea3a6fe9c40b7",
"md5": "7891bc6adde3d6bff96d5fd0d6c821c4",
"sha256": "53fdcbee09c7b48434c445787c1a01acac6b61ce22cb4661b844c0aad6854328"
},
"downloads": -1,
"filename": "baczy-1.0.0-py3-none-any.whl",
"has_sig": false,
"md5_digest": "7891bc6adde3d6bff96d5fd0d6c821c4",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.9",
"size": 24570,
"upload_time": "2025-02-23T23:43:07",
"upload_time_iso_8601": "2025-02-23T23:43:07.475710Z",
"url": "https://files.pythonhosted.org/packages/77/48/23139f5ed5fcecc64ee88d07921c977acd37b9d0b1063edea3a6fe9c40b7/baczy-1.0.0-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": null,
"digests": {
"blake2b_256": "5fe06d8b64db835fa4f404b34693a191c1a9b47a4ca5f3527ebd881c8aefc65a",
"md5": "bac534cf27c631aad85718b25661c813",
"sha256": "6e201ab22f26bc88e8008c3a4debec876145cd34d28f64ea62f35430ec6445dc"
},
"downloads": -1,
"filename": "baczy-1.0.0.tar.gz",
"has_sig": false,
"md5_digest": "bac534cf27c631aad85718b25661c813",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.9",
"size": 6410145,
"upload_time": "2025-02-23T23:43:09",
"upload_time_iso_8601": "2025-02-23T23:43:09.702850Z",
"url": "https://files.pythonhosted.org/packages/5f/e0/6d8b64db835fa4f404b34693a191c1a9b47a4ca5f3527ebd881c8aefc65a/baczy-1.0.0.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2025-02-23 23:43:09",
"github": false,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"lcname": "baczy"
}