# hapROH & hapCon
This Python package contains two software tools for ancient DNA, ***hapROH*** and ***hapCon***.
**For detailed installation instructions and manuals, including Getting Started and Vignettes, please visit the official tutorial:
https://haproh.readthedocs.io**
1) hapROH
This software identifies runs of homozygosity (ROH) in ancient and present-day DNA by using a panel of reference haplotypes. This package contains functions and wrappers to call ROH and functions for downstream analysis and visualization.
For backward compatibility, the package uses `hapsburg` as its module name. After installation, you can import Python functions via
`from hapsburg.XX import YY`
2) hapCon
This software estimates contamination in male samples from X chromosome data, using a panel of reference haplotypes. It has been incorporated into the hapROH package since version 0.4a1; no additional installation is needed.
## Scope
### hapROH
Standard parameters are tuned for human 1240K capture data (ca. 1.2 million SNPs used widely in human aDNA analysis) with 1000 Genomes haplotypes as reference. The software has been tested on a wide range of applications, including both 1240K data and whole genome sequencing data downsampled to 1240K SNPs. Successful cases include the 45k year old Ust Ishim man and a wide range of American, Eurasian and Oceanian ancient DNA. This shows that the method generally works for split times between reference panel and target of up to a few tens of thousands of years, which includes all out-of-Africa populations. (Attention: Neanderthals and Denisovans do not fall into that range. Additionally, some Sub-Saharan hunter-gatherer test cases did not give satisfactory results.)
Currently, hapROH works on 1240K SNP data in unpacked or packed `eigenstrat` format (which is widely used in human ancient DNA). The software assumes pseudo-haploid or diploid genotype data (the mode can be set; the default is pseudo-haploid). The recommended coverage is 400,000 or more 1240K SNPs covered at least once (i.e. at least ca. 0.3x coverage).
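The coverage recommendation above can be checked before running hapROH. The following is a minimal sketch, not part of the package: the function names `count_covered_snps` and `meets_recommendation` are made up for illustration, and it assumes an unpacked eigenstrat `.geno` file, where each line is one SNP, each character is one individual, and `9` marks a missing genotype.

```python
RECOMMENDED_MIN_SNPS = 400_000  # threshold quoted in the text above


def count_covered_snps(geno_lines, column):
    """Count SNPs with a non-missing genotype for one individual.

    geno_lines: iterable of strings, one per SNP, as in an unpacked
                eigenstrat .geno file (one character per individual).
    column:     0-based index of the individual within each line.
    """
    covered = 0
    for line in geno_lines:
        line = line.strip()
        if not line:
            continue  # skip empty lines
        if line[column] != "9":  # '9' encodes a missing genotype
            covered += 1
    return covered


def meets_recommendation(n_covered, minimum=RECOMMENDED_MIN_SNPS):
    """Return True if the number of covered SNPs reaches the threshold."""
    return n_covered >= minimum


# Toy example with 5 SNPs and 2 individuals (hypothetical data)
toy_geno = ["09", "29", "90", "12", "99"]
n_covered = count_covered_snps(toy_geno, column=0)
print(n_covered, meets_recommendation(n_covered))
```

For a real file, the lines can be streamed with `open(path)` instead of a list. Packed eigenstrat files are binary and would need to be unpacked first.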
If you have whole genome data available, you have to downsample and create eigenstrat files for biallelic 1240K SNPs first.
If you are planning applications to other kinds of SNP sets, bigger SNP sets, or even other organisms, the method parameters have to be adjusted (the default parameters are specifically optimized for human 1240K data). You can mirror our procedure to find good parameters (described in the publication). If you contact me for assistance, I am happy to share my own experience.
### hapCon
This software works directly from a BAM file or from samtools mpileup output. We have created two reference panels for hapCon: one for 1240K data and the other for WGS data. The standard parameters are tuned towards these two use cases.
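As an illustration of the mpileup route, the sketch below only assembles a `samtools mpileup` command for the X chromosome; it does not run samtools or hapCon. The helper name `build_mpileup_command`, the file names, and the quality thresholds of 30 are assumptions chosen for this example. The region name must match the BAM header (for example `X` or `chrX`). Please consult the official tutorial for the settings recommended for hapCon and for how to pass the resulting file to it.

```python
import shlex


def build_mpileup_command(bam_path, out_path, region="X",
                          min_mapq=30, min_baseq=30):
    """Assemble a samtools mpileup call as an argument list.

    -r restricts the pileup to one region, -q and -Q set the minimum
    mapping and base quality, and -o names the output file.
    The default thresholds here are illustrative assumptions.
    """
    return [
        "samtools", "mpileup",
        "-r", region,
        "-q", str(min_mapq),
        "-Q", str(min_baseq),
        "-o", out_path,
        bam_path,
    ]


# Hypothetical file names, for illustration only
cmd = build_mpileup_command("sample.bam", "sample.mpileup")
print(" ".join(shlex.quote(arg) for arg in cmd))
```

The list form can be passed to `subprocess.run(cmd, check=True)` on a machine where samtools is installed.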
## Updates
The text file `./change_log.md` describes the updates in each version of this software.
## Dependencies
The basic requirements are kept minimal and only cover the core ROH calling (`numpy`, `pandas`, `scipy`, `numdifftools`, `h5py`). If you want to use the extended analysis and plotting functionality, there are extra Python packages that you need to install (e.g. via `pip` or `conda`).
1) If you want to use the advanced plotting functionality, you need `matplotlib`.
2) For plotting of maps, you will need `basemap` (warning: installing can be tricky on some architectures as C packages are required).
3) If you want to use the effective population size fitting functionality from ROH output, you require the package `statsmodels`.
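To see which of the packages listed above are available in the current environment, a small check like the following can help. This is a sketch using only the Python standard library and is not part of hapROH; the helper `is_installed` is made up for this example. It assumes that `basemap` is imported under the name `mpl_toolkits.basemap`.

```python
from importlib.util import find_spec

# Core requirements for ROH calling, as listed above
CORE = ["numpy", "pandas", "scipy", "numdifftools", "h5py"]

# Optional import names mapped to the functionality they enable
OPTIONAL = {
    "matplotlib": "advanced plotting",
    "mpl_toolkits.basemap": "plotting of maps",
    "statsmodels": "effective population size fitting",
}


def is_installed(module_name):
    """Return True if the module can be located, without fully importing it."""
    try:
        return find_spec(module_name) is not None
    except (ImportError, ValueError):
        # Raised e.g. when the parent package of a dotted name is missing
        return False


report = {name: is_installed(name) for name in CORE + list(OPTIONAL)}

for name in CORE:
    print(f"core      {name}: {'ok' if report[name] else 'MISSING'}")
for name, purpose in OPTIONAL.items():
    status = "ok" if report[name] else "not installed"
    print(f"optional  {name} ({purpose}): {status}")
```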
## C Extension
For performance reasons, the heavy lifting of the algorithm is coded in a C function (`cfunc.c`). This "extension" is built via Cython from `cfunc.pyx`. This should happen automatically via the package `cython` (as `CYTHON=True` in `setup.py` by default).
You can also set `CYTHON=False`; the extension is then compiled from `cfunc.c` directly (experimental, not tested on all platforms).
## Software Development
The code used to develop this package is deposited in the GitHub repository:
https://github.com/hringbauer/hapROH
The package is packed in the folder `./package/`. In addition, there are a large number of notebooks used to test and extensively use the functionality in `./notebooks/`.
## Citation
If you use the software for a scientific publication and want to cite it, you can use:
- **hapROH**: https://doi.org/10.1038/s41467-021-25289-w
- **hapCon**: https://doi.org/10.1093/bioinformatics/btac390
## Contact
If you have bug reports, suggestions or general comments, please contact us. We are happy to hear from you. Bug reports and user suggestions will help us to improve this software, so please do not hesitate to reach out!
harald_ringbauer AT eva mpg de
yilei_huang AT eva.mpg.de
(fill in blanks with dots and AT with @)
## Acknowledgments
Big thank you to the original co-authors Matthias Steinrücken and John Novembre. The project profited immensely from Matthias' deep knowledge about HMMs and from John's extensive experience in developing population genetics software. Countless discussions with both have been key for moving this project forward. Another big thanks goes to Nick Patterson, who informed me about the benefits of working with rescaled HMMs, which substantially improved the runtime of hapROH.
I want to acknowledge users who found and reported software bugs (Mélanie Pruvost, Ke Wang, Ruoyun Hui, Selina Carlhoff, Matthew Mah, Xiaowen Jia) and all users who reached out with general questions and requests (Rosa Fregel, Federico Sanchez). This feedback has helped to remove errors in the program and to improve its usability. Many thanks!
Authors:
Harald Ringbauer, Yilei Huang, 2023
Raw data
{
"_id": null,
"home_page": "https://github.com/hringbauer/hapROH",
"name": "hapROH",
"maintainer": "",
"docs_url": null,
"requires_python": ">=3.6",
"maintainer_email": "",
"keywords": "population genetics,ancient DNA,runs of homozygosity,reference haplotypes,aDNA contamination",
"author": "Harald Ringbauer",
"author_email": "harald_ringbauer@eva.mpg.de",
"download_url": "https://files.pythonhosted.org/packages/55/5a/fbce71c4dc0979387783721cac3f30418e88570c19036f0947e2df25b080/hapROH-0.64.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": "GNU GPLv3",
"summary": "Identify runs of homozygosity (hapROH) and contamination (hapCon) in low coverage ancient human DNA data (1240K SNPs) using modern reference panel",
"version": "0.64",
"split_keywords": [
"population genetics",
"ancient dna",
"runs of homozygosity",
"reference haplotypes",
"adna contamination"
],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "555afbce71c4dc0979387783721cac3f30418e88570c19036f0947e2df25b080",
"md5": "e6b48f3dffc179b47a3cb36867ecec33",
"sha256": "bc67bce2a7b376fa63a90b0383134d8fbfc0347a00eb63e273aea109023eb787"
},
"downloads": -1,
"filename": "hapROH-0.64.tar.gz",
"has_sig": false,
"md5_digest": "e6b48f3dffc179b47a3cb36867ecec33",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.6",
"size": 280841,
"upload_time": "2023-04-14T15:28:56",
"upload_time_iso_8601": "2023-04-14T15:28:56.592683Z",
"url": "https://files.pythonhosted.org/packages/55/5a/fbce71c4dc0979387783721cac3f30418e88570c19036f0947e2df25b080/hapROH-0.64.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2023-04-14 15:28:56",
"github": true,
"gitlab": false,
"bitbucket": false,
"github_user": "hringbauer",
"github_project": "hapROH",
"travis_ci": false,
"coveralls": false,
"github_actions": false,
"lcname": "haproh"
}