# PACU
PACU is a workflow for whole-genome sequencing-based phylogeny of Illumina and ONT R9/R10 data.
PACU stands for the Prokaryotic Awesome variant Calling Utility and is named after an omnivorous fish (that eats both
Illumina and ONT reads).
### PACU is also available on our public [Galaxy instance](https://galaxy.sciensano.be/) (registration required).
----
## INSTALLATION
### CONDA installation
`conda install -c bioconda pacu_snp`
**Note:** `MEGA` is currently not available through Conda. Install it manually from the link below, or use
`IQ-Tree` instead.
### Manual installation
The PACU workflow has the following dependencies:
- [BEDTools 2.27.1](https://github.com/arq5x/bedtools2/releases/tag/v2.27.1)
- [bcftools 1.17](https://github.com/samtools/bcftools/releases/tag/1.17)
- [FigTree 1.4.4](http://tree.bio.ed.ac.uk/software/figtree/)
- [samtools 1.17](https://github.com/samtools/samtools/releases/tag/1.17)
- [snp-dists 0.8.2](https://github.com/tseemann/snp-dists)
- [MEGA 10.0.4](https://www.megasoftware.net/)
- [IQ-Tree 2.2.5](https://github.com/iqtree/iqtree2)
The mapping script has the following additional dependencies:
- [Trimmomatic 0.39](https://github.com/usadellab/Trimmomatic)
- [SeqKit 2.3.1](https://github.com/shenwei356/seqkit)
- [Bowtie2 2.5.1](https://github.com/BenLangmead/bowtie2)
- [Minimap2 2.26](https://github.com/lh3/minimap2)
The corresponding binaries should be in your PATH to run the workflow.
Other versions of these tools may work, but have not been tested.
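
Before running the workflow, it can help to confirm that the tools are actually discoverable. The sketch below uses `shutil.which` from the Python standard library to list binaries that are not found on `PATH`. The executable names are assumptions based on the usual names these tools install under (for example, IQ-TREE 2 is often installed as `iqtree2`); adjust them to match your system. Trimmomatic in particular may be installed as a JAR rather than a wrapper script.

```python
import shutil

# Assumed executable names; these are NOT taken from the PACU source code.
WORKFLOW_BINARIES = ["bedtools", "bcftools", "samtools", "snp-dists", "iqtree2"]
MAPPING_BINARIES = ["trimmomatic", "seqkit", "bowtie2", "minimap2"]


def missing_binaries(names):
    """Return the names that cannot be found on the current PATH."""
    return [name for name in names if shutil.which(name) is None]


if __name__ == "__main__":
    missing = missing_binaries(WORKFLOW_BINARIES + MAPPING_BINARIES)
    if missing:
        print("Not found on PATH: " + ", ".join(missing))
    else:
        print("All listed binaries were found on PATH.")
```

This only checks that an executable with the given name exists; it does not verify the installed version.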
The required Python packages are listed in the `requirements.txt` file.
Python 3.9 or 3.10 is recommended for a manual installation.
```
virtualenv pacu_env --python=python3.10;
. pacu_env/bin/activate;
pip install pacu_snp;
```
## USAGE
```
usage: PACU [-h] [--ilmn-in ILMN_IN] [--ont-in ONT_IN] --ref-fasta REF_FASTA [--ref-bed REF_BED] [--dir-working DIR_WORKING] --output OUTPUT
            [--output-html OUTPUT_HTML] [--use-mega] [--include-ref] [--min-snp-af MIN_SNP_AF] [--min-snp-qual MIN_SNP_QUAL]
            [--min-snp-depth MIN_SNP_DEPTH] [--min-snp-dist MIN_SNP_DIST] [--min-global-depth MIN_GLOBAL_DEPTH] [--min-mq-depth MIN_MQ_DEPTH]
            [--bcftools-filt-af1] [--image-width IMAGE_WIDTH] [--image-height IMAGE_HEIGHT] [--threads THREADS] [--version]

options:
  -h, --help            show this help message and exit
  --ilmn-in ILMN_IN     Directory with Illumina input BAM files
  --ont-in ONT_IN       Directory with ONT input BAM files
  --ref-fasta REF_FASTA
                        Reference FASTA file
  --ref-bed REF_BED     BED file with phage regions
  --dir-working DIR_WORKING
                        Working directory
  --output OUTPUT       Output directory
  --output-html OUTPUT_HTML
                        Output report name
  --use-mega            If set, MEGA is used for the construction of the phylogeny (instead of IQ-TREE)
  --include-ref         If set, the reference genome is included in the phylogeny
  --min-snp-af MIN_SNP_AF
                        Minimum allele frequency for variants
  --min-snp-qual MIN_SNP_QUAL
                        Minimum SNP quality
  --min-snp-depth MIN_SNP_DEPTH
                        Minimum SNP depth
  --min-snp-dist MIN_SNP_DIST
                        Minimum distance between SNPs
  --min-global-depth MIN_GLOBAL_DEPTH
                        Minimum depth for all samples to include positions in SNP analysis
  --min-mq-depth MIN_MQ_DEPTH
                        MQ cutoff for samtools depth
  --skip-gubbins        If set, gubbins is skipped
  --bcftools-filt-af1   If enabled, allele frequency filtering also considers the VAF value
  --image-width IMAGE_WIDTH
                        Image width
  --image-height IMAGE_HEIGHT
                        Image height
  --threads THREADS
  --version             Print version and exit
```
**Note:** The location of the temporary directory can be changed by setting the `TMPDIR` environment variable.
### Basic usage example
The PACU workflow requires BAM files with reads mapped to a reference genome as input.
Illumina data can be provided using the `--ilmn-in` option, and ONT data using the `--ont-in` option.
```
PACU \
--ilmn-in in/ilmn/ \
--ont-in in/ont/ \
--ref-fasta ref.fasta \
--output output/ \
--dir-working work/ \
--threads 8
```
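
When PACU is launched from a Python script (for example in a larger pipeline), the same call can be assembled programmatically. The following is a minimal sketch, not part of PACU itself: `build_pacu_command` and `run_pacu` are hypothetical helper names, and only flags listed in the help text above are used. Setting `TMPDIR` in the child environment follows the note on the temporary directory.

```python
import os
import subprocess


def build_pacu_command(ref_fasta, output, ilmn_in=None, ont_in=None,
                       dir_working=None, threads=None):
    """Assemble a PACU argument list from options shown in the help text."""
    cmd = ["PACU", "--ref-fasta", str(ref_fasta), "--output", str(output)]
    optional = [
        ("--ilmn-in", ilmn_in),
        ("--ont-in", ont_in),
        ("--dir-working", dir_working),
        ("--threads", threads),
    ]
    for flag, value in optional:
        if value is not None:
            cmd += [flag, str(value)]
    return cmd


def run_pacu(cmd, tmpdir=None):
    """Run PACU, optionally pointing TMPDIR at another location."""
    env = dict(os.environ)
    if tmpdir is not None:
        env["TMPDIR"] = str(tmpdir)
    # check=True raises CalledProcessError if PACU exits with a non-zero status.
    return subprocess.run(cmd, env=env, check=True)


cmd = build_pacu_command("ref.fasta", "output/", ilmn_in="in/ilmn/",
                         ont_in="in/ont/", dir_working="work/", threads=8)
print(" ".join(cmd))
```

Passing the arguments as a list (rather than a single shell string) avoids quoting problems with paths that contain spaces. Call `run_pacu(cmd, tmpdir="/path/to/scratch")` to actually start the workflow.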
### Read mapping
A script is included to map reads to a reference genome in FASTA format for both ONT and Illumina data.
The resulting BAM files can be used as input for the SNP workflow. The `--trim` option can be used to perform read
trimming before mapping.
*Illumina data*
```
PACU_map \
--ref-fasta genome.fasta \
--data-type illumina \
--fastq-illumina reads_1.fastq.gz reads_2.fastq.gz \
--output mapped.bam \
--threads 4
```
*ONT data*
```
PACU_map \
--ref-fasta genome.fasta \
--data-type ont \
--fastq-ont reads_ont.fastq.gz \
--output mapped.bam \
--threads 4
```
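
Because the SNP workflow takes a directory of BAM files, mapping usually has to be repeated for every sample. The sketch below builds one `PACU_map` command per Illumina sample, using only the options shown in the example above. It assumes a `<sample>_1.fastq.gz` / `<sample>_2.fastq.gz` naming scheme and writes `<sample>.bam` into a single directory; both conventions, and the helper names, are illustrative assumptions rather than PACU requirements.

```python
import tempfile
from pathlib import Path


def pair_illumina_fastqs(fastq_dir):
    """Group paired-end files by sample (assumes *_1.fastq.gz / *_2.fastq.gz)."""
    pairs = {}
    for fwd in sorted(Path(fastq_dir).glob("*_1.fastq.gz")):
        sample = fwd.name[: -len("_1.fastq.gz")]
        rev = fwd.with_name(sample + "_2.fastq.gz")
        if rev.exists():  # samples without a reverse file are skipped
            pairs[sample] = (fwd, rev)
    return pairs


def build_map_command(ref_fasta, fwd, rev, bam_out, threads=4):
    """Build a PACU_map call for one paired-end Illumina sample."""
    return ["PACU_map", "--ref-fasta", str(ref_fasta),
            "--data-type", "illumina",
            "--fastq-illumina", str(fwd), str(rev),
            "--output", str(bam_out), "--threads", str(threads)]


def build_batch(ref_fasta, fastq_dir, bam_dir, threads=4):
    """Return one PACU_map command per complete read pair in fastq_dir."""
    return [
        build_map_command(ref_fasta, fwd, rev,
                          Path(bam_dir) / (sample + ".bam"), threads)
        for sample, (fwd, rev) in pair_illumina_fastqs(fastq_dir).items()
    ]


# Small demonstration with empty placeholder files.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("s1_1.fastq.gz", "s1_2.fastq.gz", "s2_1.fastq.gz"):
        (Path(tmp) / name).touch()
    demo = build_batch("genome.fasta", tmp, "in/ilmn")

for command in demo:
    print(" ".join(command))
```

Each command list can be passed to `subprocess.run(command, check=True)`. The output directory (`in/ilmn` here) can then be given to `PACU` via `--ilmn-in`.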
## TESTING
A test dataset is available under `resources/testdata/bam`. These files contain *Escherichia coli* reads mapped to a
small part of the *E. coli* NC_002695.2 genome. This is not a real dataset and should only be used for testing.
The complete workflow can be tested using the following command:
```
pytest --log-cli-level=DEBUG pacu/tests/test_workflow.py
```
## CONTACT
[Create an issue](https://github.com/BioinformaticsPlatformWIV-ISP/PACU/issues) to report bugs, propose new features, or ask for help.
## CITATION
If you use this tool, please consider citing our [publication](https://pubmed.ncbi.nlm.nih.gov/38441926/).
-----
Copyright - 2024 Bert Bogaerts <bert.bogaerts@sciensano.be>
Raw data
{
"_id": null,
"home_page": "https://github.com/BioinformaticsPlatformWIV-ISP/PACU/",
"name": "pacu-snp",
"maintainer": null,
"docs_url": null,
"requires_python": ">=3",
"maintainer_email": null,
"keywords": "nanopore illumina sequencing phylogeny",
"author": "Bert Bogaerts",
"author_email": "bioit@sciensano.be",
"download_url": "https://files.pythonhosted.org/packages/b2/81/7b204b5908e93eafa064992d42001b571a7743a1e3a987e673bfd017c633/pacu_snp-0.0.6.tar.gz",
"platform": null,
"description": "# PACU\nPACU is a workflow for whole genome sequencing based phylogeny of Illumina and ONT R9/R10 data.\n\nPACU stands for the Prokaryotic Awesome variant Calling Utility and is named after an omnivorous fish (that eats both \nIllumina and ONT reads).\n\n### PACU is also available on our public [Galaxy instance](https://galaxy.sciensano.be/) (registration required).\n\n----\n\n## INSTALLATION\n\n### CONDA installation\n\n`conda install -c bioconda pacu_snp`\n\n**Note:** `MEGA` is currently not available through Conda, it can be installed manually from the link below, or \n`IQ-Tree` can be used instead.\n\n### Manual installation\n\nThe PACU workflow has the following dependencies:\n- [BEDTools 2.27.1](https://github.com/arq5x/bedtools2/releases/tag/v2.27.1)\n- [bcftools 1.17](https://github.com/samtools/bcftools/releases/tag/1.17)\n- [FigTree 1.4.4](http://tree.bio.ed.ac.uk/software/figtree/)\n- [samtools 1.17](https://github.com/samtools/samtools/releases/tag/1.17)\n- [snpdists 0.8.2](https://github.com/tseemann/snp-dists)\n- [MEGA 10.0.4](https://www.megasoftware.net/)\n- [IQ-Tree 2.2.5](https://github.com/iqtree/iqtree2)\n\nThe mapping script has the following additional dependencies:\n- [Trimmomatic 0.39](https://github.com/usadellab/Trimmomatic)\n- [SeqKit 2.3.1](https://github.com/shenwei356/seqkit)\n- [Bowtie2 2.5.1](https://github.com/BenLangmead/bowtie2)\n- [Minimap2 2.26](https://github.com/lh3/minimap2)\n\nThe corresponding binaries should be in your PATH to run the workflow. \nOther versions of these tools may work, but have not been tested.\n\nThe required Python packages are listed in the `requirements.txt` file. \nPython 3.9 or 3.10 is recommended for a manual installation.\n\n```\nvirtualenv pacu_env --python=python3.10;\n. 
pacu_env/bin/activate;\npip install pacu_snp;\n```\n\n## USAGE\n\n```\nusage: PACU [-h] [--ilmn-in ILMN_IN] [--ont-in ONT_IN] --ref-fasta REF_FASTA [--ref-bed REF_BED] [--dir-working DIR_WORKING] --output OUTPUT\n [--output-html OUTPUT_HTML] [--use-mega] [--include-ref] [--min-snp-af MIN_SNP_AF] [--min-snp-qual MIN_SNP_QUAL]\n [--min-snp-depth MIN_SNP_DEPTH] [--min-snp-dist MIN_SNP_DIST] [--min-global-depth MIN_GLOBAL_DEPTH] [--min-mq-depth MIN_MQ_DEPTH]\n [--bcftools-filt-af1] [--image-width IMAGE_WIDTH] [--image-height IMAGE_HEIGHT] [--threads THREADS] [--version]\n\noptions:\n -h, --help show this help message and exit\n --ilmn-in ILMN_IN Directory with Illumina input BAM files\n --ont-in ONT_IN Directory with ONT input BAM files\n --ref-fasta REF_FASTA\n Reference FASTA file\n --ref-bed REF_BED BED file with phage regions\n --dir-working DIR_WORKING\n Working directory\n --output OUTPUT Output directory\n --output-html OUTPUT_HTML\n Output report name\n --use-mega If set, MEGA is used for the construction of the phylogeny (instead of IQ-TREE)\n --include-ref If set, the reference genome is included in the phylogeny\n --min-snp-af MIN_SNP_AF\n Minimum allele frequency for variants\n --min-snp-qual MIN_SNP_QUAL\n Minimum SNP quality\n --min-snp-depth MIN_SNP_DEPTH\n Minimum SNP depth\n --min-snp-dist MIN_SNP_DIST\n Minimum distance between SNPs\n --min-global-depth MIN_GLOBAL_DEPTH\n Minimum depth for all samples to include positions in SNP analysis\n --min-mq-depth MIN_MQ_DEPTH\n MQ cutoff for samtools depth\n --skip-gubbins If set, gubbins is skipped\n --bcftools-filt-af1 If enabled, allele frequency filtering also considers the VAF value\n --image-width IMAGE_WIDTH\n Image width\n --image-height IMAGE_HEIGHT\n Image height\n --threads THREADS\n --version Print version and exit\n```\n\n**Note:** The location of the temporary directory can be changed by setting the `TMPDIR` environment variable.\n\n### Basic usage example\n\nThe PACU workflow requires BAM files 
as input with reads mapped to a reference genome. \nIllumina data can be provided using the `--ilmn-in` option, ONT data can be provided using the `--ont-in` option.\n\n```\nPACU \\\n --ilmn-in in/ilmn/ \\\n --ont-in in/ont/ \\\n --ref-fasta ref.fasta \\\n --output output/ \\\n --dir-working work/ \\\n --threads 8\n```\n\n### Read mapping\n\nA script is included to map reads to a reference genome in FASTA format for both ONT and Illumina data.\nThe resulting BAM files can be used as input for the SNP workflow. The `--trim` option can be used to perform read\ntrimming before mapping.\n\n*Illumina data*\n```\nPACU_map \\\n --ref-fasta genome.fasta \\\n --data-type illumina \\\n --fastq-illumina reads_1.fastq.gz reads_2.fastq.gz \\\n --output mapped.bam \\\n --threads 4\n```\n*ONT data*\n```\nPACU_map \\\n --ref-fasta genome.fasta \\\n --data-type ont \\\n --fastq-ont reads_ont.fastq.gz \\\n --output mapped.bam \\\n --threads 4\n```\n\n## TESTING\n\nA test dataset is available under `resources/testdata/bam`, these files contain *Escherichia coli* reads mapped to a \nsmall part of the *E. coli* NC_002695.2 genome. This is a not a real dataset, and should only be used for testing.\n\nThe complete workflow can be tested using the following command:\n```\npytest --log-cli-level=DEBUG pacu/tests/test_workflow.py\n```\n\n## CONTACT\n[Create an issue](https://github.com/BioinformaticsPlatformWIV-ISP/PACU/issues) to report bugs, propose new functions or ask for help.\n\n## CITATION\nIf you use this tool, please consider citing our [publication](https://pubmed.ncbi.nlm.nih.gov/38441926/).\n\n-----\n\nCopyright - 2024 Bert Bogaerts <bert.bogaerts@sciensano.be>\n\n\n",
"bugtrack_url": null,
"license": "GPL",
"summary": "Workflow for whole genome sequencing based phylogeny of Illumina and ONT data.",
"version": "0.0.6",
"project_urls": {
"Homepage": "https://github.com/BioinformaticsPlatformWIV-ISP/PACU/"
},
"split_keywords": [
"nanopore",
"illumina",
"sequencing",
"phylogeny"
],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "87193e9aafc26bbeeb91a2ac1c5e4c1ff6eacd4786092a7ed61b3f412efe7246",
"md5": "640e2a2967d8f6479bea9f0a1572a7cb",
"sha256": "42ffc056d0e24976448e51712fa9aec94c354afa1d5b5a34bdeeeb1921564460"
},
"downloads": -1,
"filename": "pacu_snp-0.0.6-py3-none-any.whl",
"has_sig": false,
"md5_digest": "640e2a2967d8f6479bea9f0a1572a7cb",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3",
"size": 1991290,
"upload_time": "2024-11-13T13:54:05",
"upload_time_iso_8601": "2024-11-13T13:54:05.914553Z",
"url": "https://files.pythonhosted.org/packages/87/19/3e9aafc26bbeeb91a2ac1c5e4c1ff6eacd4786092a7ed61b3f412efe7246/pacu_snp-0.0.6-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "b2817b204b5908e93eafa064992d42001b571a7743a1e3a987e673bfd017c633",
"md5": "6213ec8b6e9e10defef2cc7de48ef828",
"sha256": "5ba982f196852e380a130897108bedf401a1a256fc1bd262a2ec3dac42cfd8d1"
},
"downloads": -1,
"filename": "pacu_snp-0.0.6.tar.gz",
"has_sig": false,
"md5_digest": "6213ec8b6e9e10defef2cc7de48ef828",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3",
"size": 1965657,
"upload_time": "2024-11-13T13:54:10",
"upload_time_iso_8601": "2024-11-13T13:54:10.001392Z",
"url": "https://files.pythonhosted.org/packages/b2/81/7b204b5908e93eafa064992d42001b571a7743a1e3a987e673bfd017c633/pacu_snp-0.0.6.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2024-11-13 13:54:10",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "BioinformaticsPlatformWIV-ISP",
"github_project": "PACU",
"travis_ci": false,
"coveralls": false,
"github_actions": false,
"requirements": [
{
"name": "PuLP",
"specs": [
[
"==",
"2.7.0"
]
]
},
{
"name": "PyYAML",
"specs": [
[
">=",
"6.0"
]
]
},
{
"name": "UpSetPlot",
"specs": [
[
">=",
"0.8.0"
]
]
},
{
"name": "beautifulsoup4",
"specs": [
[
">=",
"4.11.1"
]
]
},
{
"name": "biopython",
"specs": []
},
{
"name": "matplotlib",
"specs": [
[
">=",
"3.7.1"
]
]
},
{
"name": "numpy",
"specs": [
[
">=",
"1.26.4"
]
]
},
{
"name": "pandas",
"specs": [
[
">=",
"2.1.0"
]
]
},
{
"name": "pytest",
"specs": [
[
">=",
"8.2.2"
]
]
},
{
"name": "pyvcf3",
"specs": [
[
">=",
"1.0.3"
]
]
},
{
"name": "snakemake",
"specs": [
[
"==",
"7.18.2"
]
]
},
{
"name": "yattag",
"specs": [
[
">=",
"1.14.0"
]
]
}
],
"lcname": "pacu-snp"
}