# GPatch
## Assemble contigs into a chromosome-scale pseudo-assembly using alignments to a reference sequence.
Starting with alignments of contigs to a reference genome, produce a chromosome-scale pseudoassembly by patching gaps between mapped contigs with sequences from the reference.
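
To make the idea concrete, here is a toy sketch of gap patching on plain strings. It is an illustration only, not GPatch's implementation: the function name `patch_chromosome`, the tuple layout, and the choice to leave out reference sequence before the first contig and after the last one are all assumptions made for this example.

```python
from typing import List, Tuple

# (ref_start, ref_end, contig_sequence); the reference interval is 0-based, half-open.
Placement = Tuple[int, int, str]


def patch_chromosome(reference: str, placements: List[Placement]):
    """Toy model: join contigs, filling gaps between them with reference bases.

    Assumes the placements do not overlap. Reference sequence outside the
    first and last contig is ignored here (an assumption of this sketch).
    """
    pieces = []    # sequence pieces of the pseudo-assembly, in order
    contigs = []   # contig intervals in patched-genome coordinates
    patches = []   # patch intervals in reference coordinates
    offset = 0     # current length of the patched sequence
    prev_end = None
    for start, end, seq in sorted(placements):
        if prev_end is not None:
            if start < prev_end:
                raise ValueError("placements overlap")
            if start > prev_end:
                # Fill the gap between two contigs with reference sequence.
                patches.append((prev_end, start))
                pieces.append(reference[prev_end:start])
                offset += start - prev_end
        contigs.append((offset, offset + len(seq)))
        pieces.append(seq)
        offset += len(seq)
        prev_end = end
    return "".join(pieces), contigs, patches
```

The two interval lists mirror the distinction the tool draws in its output: contig positions are reported in the patched genome's coordinate frame, patch positions in the reference's.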
## Dependencies
* Python >= v3.7
* samtools (https://github.com/samtools/samtools)
* biopython (https://biopython.org/)
* pysam (https://github.com/pysam-developers/pysam)
* minimap2 (https://github.com/lh3/minimap2)
We recommend minimap2 for alignment, run with the -a option so that it writes SAM output.
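
A minimal sketch of that alignment step, driven from Python. Only the `-a` option comes from the recommendation above; the helper names and file paths are hypothetical, and any further minimap2 options (presets, thread counts) are left to the user.

```python
import subprocess
from typing import List


def minimap2_sam_command(reference_fasta: str, contigs_fasta: str) -> List[str]:
    """Build the argv for aligning contigs to a reference with SAM output (-a)."""
    return ["minimap2", "-a", reference_fasta, contigs_fasta]


def align_to_sam(reference_fasta: str, contigs_fasta: str, out_sam: str) -> None:
    """Run minimap2 and write its standard output (the SAM records) to out_sam."""
    with open(out_sam, "w") as handle:
        subprocess.run(
            minimap2_sam_command(reference_fasta, contigs_fasta),
            stdout=handle,
            check=True,  # raise CalledProcessError if minimap2 fails
        )
```

Calling `align_to_sam("reference.fa", "contigs.fa", "contigs.sam")` would then produce a SAM file suitable for the `-q` argument, assuming minimap2 is on the `PATH`.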
## Installation
We recommend installing with conda, into a new environment:
```
conda create -n GPatch -c conda-forge -c bioconda Bio pysam minimap2 samtools GPatch
```
Install with pip:
```
pip install GPatch
```
Installation from the GitHub repository is not recommended. If you must, follow these steps:
1) git clone https://github.com/adadiehl/GPatch
2) cd GPatch/
3) python3 -m pip install -e .
## Usage
```
usage: GPatch.py [-h] -q SAM/BAM -r FASTA [-x STR] [-b FILENAME] [-m N]
[-w PATH] [-d]
```
Starting with alignments of contigs to a reference genome, produce a chromosome-scale pseudoassembly by patching gaps between mapped contigs with sequences from the reference. Reference chromosomes with no mapped contigs are printed to output unchanged.
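
The usage line can also be assembled programmatically, for example inside a pipeline script. The sketch below uses only the flags listed in this README; the function name `gpatch_command` is hypothetical, and it assumes the `GPatch.py` executable shown in the usage line is on the `PATH` after installation.

```python
from typing import List, Optional


def gpatch_command(
    query_bam: str,
    reference_fasta: str,
    prefix: Optional[str] = None,
    store_final_bam: Optional[str] = None,
    min_qual_score: Optional[int] = None,
    whitelist: Optional[str] = None,
    drop_missing: bool = False,
) -> List[str]:
    """Build a GPatch argv; optional flags are added only when requested."""
    cmd = ["GPatch.py", "-q", query_bam, "-r", reference_fasta]
    if prefix is not None:
        cmd += ["-x", prefix]
    if store_final_bam is not None:
        cmd += ["-b", store_final_bam]
    if min_qual_score is not None:
        cmd += ["-m", str(min_qual_score)]
    if whitelist is not None:
        cmd += ["-w", whitelist]
    if drop_missing:
        cmd.append("-d")
    return cmd
```

The resulting list can be passed to `subprocess.run(cmd, check=True)`. Leaving an option as `None` keeps the tool's own default.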
#### Required Arguments
| Argument | Description |
|---|---|
| __-q SAM/BAM, --query_bam SAM/BAM__ | Path to SAM/BAM file containing non-overlapping contig mappings to the reference genome. |
| __-r FASTA, --reference_fasta FASTA__ | Path to reference genome FASTA. |
#### Optional Arguments
| Argument | Description |
|---|---|
| __-h, --help__ | Show this help message and exit. |
| __-x STR, --prefix STR__ | Prefix to add to output file names. Default=None |
| __-b FILENAME, --store_final_bam FILENAME__ | Store the final set of primary contig alignments to the given file name. Default: Do not store the final BAM. |
| __-m N, --min_qual_score N__ | Minimum mapping quality score to retain an alignment. Default=30 |
| __-w PATH, --whitelist PATH__ | Path to a BED file of whitelist regions, i.e., the inverse of blacklist regions. When supplied, alignments that fall entirely within blacklist regions are excluded. Default=None |
| __-d, --drop_missing__ | Omit unpatched reference chromosome records from the output if no contigs map to them. Default: Unpatched chromosomes are printed to output unchanged. |
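
The combined effect of `-m` and `-w` can be pictured with the sketch below. It is a reading of the option descriptions, not GPatch's actual code: the function names are hypothetical, the threshold is assumed to be inclusive, and an alignment is treated as "entirely within blacklist regions" when it overlaps no whitelist interval.

```python
from typing import Dict, List, Optional, Tuple

Interval = Tuple[int, int]  # 0-based, half-open, as in BED


def overlaps_any(start: int, end: int, regions: List[Interval]) -> bool:
    """True if [start, end) shares at least one base with any region."""
    return any(start < r_end and r_start < end for r_start, r_end in regions)


def keep_alignment(
    chrom: str,
    start: int,
    end: int,
    mapq: int,
    min_qual_score: int = 30,
    whitelist: Optional[Dict[str, List[Interval]]] = None,
) -> bool:
    """Decide whether an alignment passes the MAPQ and whitelist filters."""
    # Assumption: alignments with MAPQ equal to the threshold are retained.
    if mapq < min_qual_score:
        return False
    if whitelist is None:
        return True  # no whitelist supplied, so nothing is blacklisted
    # No overlap with any whitelist region means the alignment lies
    # entirely within the (implicit) blacklist.
    return overlaps_any(start, end, whitelist.get(chrom, []))
```

Under this reading, an alignment that only partly extends into a blacklisted stretch is still retained, because part of it lies in a whitelist region.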
## Output
GPatch produces three output files:
| File | Description |
|---|---|
| __patched.fasta__ | The final patched genome. |
| __contigs.bed__ | Location of contigs in the coordinate frame of the patched genome. |
| __patches.bed__ | Location of patches in the coordinate frame of the reference genome. |
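
The BED outputs are easy to summarize with a few lines of Python. The sketch below relies only on the three standard leading BED columns (chromosome, start, end); any further columns GPatch may write are not assumed, and the function name `bases_per_chromosome` is hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable


def bases_per_chromosome(bed_lines: Iterable[str]) -> Dict[str, int]:
    """Sum interval lengths per chromosome from BED-formatted lines."""
    totals = defaultdict(int)
    for line in bed_lines:
        line = line.strip()
        # Skip blank lines and optional BED header lines.
        if not line or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end = line.split("\t")[:3]
        totals[chrom] += int(end) - int(start)  # BED intervals are half-open
    return dict(totals)
```

Passing an open handle for `contigs.bed` gives the number of contig-derived bases per chromosome of the patched genome. Doing the same for `patches.bed` gives the number of reference-derived bases, in reference coordinates.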
## Citing GPatch
Please use the following citation if you use this software in your work:
CITATION_HERE