# SNPio: A Python API for Population Genomic Data I/O, Filtering, Analysis, and Encoding

**SNPio** is a Python package designed to streamline the process of reading, filtering, encoding, and analyzing genotype alignments. It supports VCF, PHYLIP, STRUCTURE, and GENEPOP file formats, and provides high-level tools for visualization, downstream machine learning analysis, and population genetic inference.

SNPio includes:
- File I/O (VCFReader, PhylipReader, StructureReader, GenePopReader)
- Genotype filtering (NRemover2)
- Genotype encoding for AI & machine learning applications (GenotypeEncoder)
- Population genetic statistics & Principal Component Analysis (PopGenStatistics)
- Experimental: Phylogenetic tree parsing (TreeParser)
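
To give a feel for what genotype encoding means in practice, here is a minimal, standard-library sketch of the common "012" scheme, which counts alternate alleles per diploid call. It does not call SNPio and is not SNPio's implementation. The helper `encode_012` and the `MISSING` placeholder value of `-9` are assumptions made for illustration.

```python
# Illustrative sketch only: this is NOT how SNPio's GenotypeEncoder is implemented.
MISSING = -9  # assumed placeholder for missing or unusable calls


def encode_012(genotype: str) -> int:
    """Count alternate alleles in a diploid, biallelic VCF-style call.

    "0/0" -> 0, "0/1" or "1/0" -> 1, "1/1" -> 2; anything else -> MISSING.
    Phased calls ("0|1") are treated the same as unphased ones.
    """
    alleles = genotype.replace("|", "/").split("/")
    if len(alleles) != 2 or any(a not in {"0", "1"} for a in alleles):
        return MISSING  # missing ("./.") or non-biallelic call
    return sum(int(a) for a in alleles)


calls = ["0/0", "0|1", "1/1", "./."]
encoded = [encode_012(g) for g in calls]
print(encoded)  # [0, 1, 2, -9]
```

A numeric matrix built this way (samples by loci) is the kind of input that downstream machine learning tools typically expect.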
---
## 📖 Full Documentation
Detailed API usage, tutorials, and examples are available in the [documentation](https://snpio.readthedocs.io/en/latest/).
---
## 🔧 Installation
You can install SNPio using one of the following methods:
### ✅ Pip Installation
```bash
python3 -m venv snpio-env
source snpio-env/bin/activate
pip install snpio
```
### ✅ Conda Installation
```bash
conda create -n snpio-env python=3.12
conda activate snpio-env
conda install -c btmartin721 snpio
```
### 🐳 Docker
To run the Docker image interactively in a terminal, run the following commands:
```bash
docker pull btmartin721/snpio:latest
docker run -it btmartin721/snpio:latest
```
To run SNPio in a Jupyter notebook, follow the instructions that the Docker container prints to the terminal.

> **Note:** All three installation methods (pip, conda, Docker) are actively maintained and kept up to date through CI/CD routines.

> **Note:** SNPio supports Unix-based systems. Windows users should install via WSL.
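
The PyPI metadata for release 1.6.0 lists `requires_python` as `<3.13,>=3.11`. Before installing with pip, you can check whether the active interpreter falls inside that range. The sketch below uses only the standard library. The helper name `is_supported` is hypothetical, and the bounds are copied from the 1.6.0 metadata, so other releases may differ.

```python
import sys

# Bounds taken from the 1.6.0 package metadata; other releases may differ.
MIN_VERSION = (3, 11)  # inclusive lower bound
MAX_VERSION = (3, 13)  # exclusive upper bound


def is_supported(version: tuple[int, int]) -> bool:
    """Return True if a (major, minor) version satisfies >=3.11,<3.13."""
    return MIN_VERSION <= version < MAX_VERSION


current = sys.version_info[:2]
print(f"Python {current[0]}.{current[1]} supported: {is_supported(current)}")
```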
---
## 🚀 Getting Started
### Import Modules
```python
from snpio import (
    NRemover2, VCFReader, PhylipReader, StructureReader,
    GenePopReader, GenotypeEncoder, PopGenStatistics
)
```
### Load Genotype Data (VCF Example)
```python
vcf = "snpio/example_data/vcf_files/phylogen_subset14K_sorted.vcf.gz"
popmap = "snpio/example_data/popmaps/phylogen_nomx.popmap"
gd = VCFReader(
    filename=vcf,
    popmapfile=popmap,
    force_popmap=True,
    verbose=True,
    plot_format="png",
    prefix="snpio_example",
)
```
You can also specify `include_pops` and `exclude_pops` to control population-level filtering.
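
To illustrate what population-level filtering amounts to, the sketch below parses a small population map and selects samples by population using only the standard library. It does not call SNPio. It assumes a two-column popmap (sample ID, then population ID, separated by whitespace or a comma) and assumes that `include_pops` and `exclude_pops` are lists of population IDs. The helpers `read_popmap` and `select_samples`, the sample names, and the population labels are all hypothetical. When a population appears in both lists, this sketch lets exclusion win; that choice is an assumption and may not match SNPio's behavior.

```python
import re


def read_popmap(text: str) -> dict[str, str]:
    """Parse 'sample<sep>population' lines into a {sample: population} dict."""
    sample_to_pop = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        # Split on the first run of commas and/or whitespace.
        sample, population = re.split(r"[,\s]+", line, maxsplit=1)
        sample_to_pop[sample] = population
    return sample_to_pop


def select_samples(sample_to_pop, include_pops=None, exclude_pops=None):
    """Keep samples whose population is included and not excluded."""
    include = set(include_pops) if include_pops else None
    exclude = set(exclude_pops or [])
    return [
        sample
        for sample, pop in sample_to_pop.items()
        if (include is None or pop in include) and pop not in exclude
    ]


example = """\
sample1\tEA
sample2\tEA
sample3,GU
sample4 TT
"""

sample_to_pop = read_popmap(example)
kept = select_samples(sample_to_pop, include_pops=["EA", "GU"], exclude_pops=["GU"])
print(kept)  # ['sample1', 'sample2']
```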
---
## 🧪 Development Notes
To run all unit tests:
```bash
pip install "snpio[dev]"
pytest tests/
```
---
## 🧾 License and Citation
SNPio is licensed under the [GPL-3.0 License](https://github.com/btmartin721/SNPio/blob/master/LICENSE).
Please cite any publication(s) when using SNPio in your research. A manuscript is currently in development, and this section will be updated upon acceptance.
---
## 🤝 Contributing
We welcome community contributions!
- Report bugs or request features on [GitHub Issues](https://github.com/btmartin721/snpio/issues)
- Submit a pull request
- See [CONTRIBUTING.md](https://github.com/btmartin721/SNPio/blob/master/CONTRIBUTING.md) for contributing guidelines.
---
## 🙏 Acknowledgments
Thanks for using SNPio. We hope it facilitates your population genomic research. Feel free to reach out with questions or feedback!
---
## Raw Data
{
"_id": null,
"home_page": null,
"name": "snpio",
"maintainer": null,
"docs_url": null,
"requires_python": "<3.13,>=3.11",
"maintainer_email": null,
"keywords": "genomics, bioinformatics, population genetics, SNP, VCF, PHYLIP, STRUCTURE, missing data, filtering, filter, MAF, minor allele frequency, MAC, minor allele count, biallelic, monomorphic, singleton, population structure, d-statistics, Fst, multiqc, encoding",
"author": null,
"author_email": "\"Drs. Bradley T. Martin and Tyler K. Chafin\" <evobio721@gmail.com>",
"download_url": "https://files.pythonhosted.org/packages/89/3d/d78e4220102a82862a67881d9ec041fb59e96647b9194b8c907bf2a49180/snpio-1.6.0.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": "GPL-3.0-or-later",
"summary": "SNPio is a Python API for population genetic file processing, filtering, and analysis. It is designed to be a user-friendly tool for the manipulation of population genetic data in a variety of formats. SNPio can be used to filter data based on missingness, MAF and MAC, singletons, biallelic, and monomorphic sites. It can also generate summary statistics for population genetic analyses.",
"version": "1.6.0",
"project_urls": {
"Bug Tracker": "https://github.com/btmartin721/SNPio/issues",
"Changelog": "https://snpio.readthedocs.io/en/latest/changelog.html",
"Documentation": "https://snpio.readthedocs.io/en/latest/",
"Source Code": "https://github.com/btmartin721/SNPio"
},
"split_keywords": [
"genomics",
" bioinformatics",
" population genetics",
" snp",
" vcf",
" phylip",
" structure",
" missing data",
" filtering",
" filter",
" maf",
" minor allele frequency",
" mac",
" minor allele count",
" biallelic",
" monomorphic",
" singleton",
" population structure",
" d-statistics",
" fst",
" multiqc",
" encoding"
],
"urls": [
{
"comment_text": null,
"digests": {
"blake2b_256": "e46a7fa40bfb7fdcc851210a8c2213329c1c20273cc9ca791d6861b0c10b51f7",
"md5": "72dd919de1aa6ae9bd68d8eb94adfd93",
"sha256": "101ec15e5a9fe30930514d68158ed7de4529af1e420c2f4a1be1a1d5a45927e7"
},
"downloads": -1,
"filename": "snpio-1.6.0-py3-none-any.whl",
"has_sig": false,
"md5_digest": "72dd919de1aa6ae9bd68d8eb94adfd93",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": "<3.13,>=3.11",
"size": 65450078,
"upload_time": "2025-07-25T08:41:01",
"upload_time_iso_8601": "2025-07-25T08:41:01.330834Z",
"url": "https://files.pythonhosted.org/packages/e4/6a/7fa40bfb7fdcc851210a8c2213329c1c20273cc9ca791d6861b0c10b51f7/snpio-1.6.0-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": null,
"digests": {
"blake2b_256": "893dd78e4220102a82862a67881d9ec041fb59e96647b9194b8c907bf2a49180",
"md5": "40234acef8d1e39cbd325c021bcf86b4",
"sha256": "a42560f793d67eaa038f17e364413707605eccdc55f02b2868e78f3b43c8c17f"
},
"downloads": -1,
"filename": "snpio-1.6.0.tar.gz",
"has_sig": false,
"md5_digest": "40234acef8d1e39cbd325c021bcf86b4",
"packagetype": "sdist",
"python_version": "source",
"requires_python": "<3.13,>=3.11",
"size": 63193115,
"upload_time": "2025-07-25T08:41:06",
"upload_time_iso_8601": "2025-07-25T08:41:06.522398Z",
"url": "https://files.pythonhosted.org/packages/89/3d/d78e4220102a82862a67881d9ec041fb59e96647b9194b8c907bf2a49180/snpio-1.6.0.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2025-07-25 08:41:06",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "btmartin721",
"github_project": "SNPio",
"travis_ci": false,
"coveralls": false,
"github_actions": true,
"requirements": [
{
"name": "bokeh",
"specs": []
},
{
"name": "h5py",
"specs": []
},
{
"name": "holoviews",
"specs": []
},
{
"name": "kaleido",
"specs": []
},
{
"name": "kneed",
"specs": []
},
{
"name": "matplotlib",
"specs": []
},
{
"name": "multiqc",
"specs": [
[
">=",
"1.29"
]
]
},
{
"name": "numba",
"specs": []
},
{
"name": "numpy",
"specs": []
},
{
"name": "pandas",
"specs": []
},
{
"name": "panel",
"specs": []
},
{
"name": "plotly",
"specs": []
},
{
"name": "pysam",
"specs": []
},
{
"name": "requests",
"specs": []
},
{
"name": "scikit-learn",
"specs": []
},
{
"name": "scipy",
"specs": []
},
{
"name": "statsmodels",
"specs": []
},
{
"name": "seaborn",
"specs": []
},
{
"name": "toytree",
"specs": []
},
{
"name": "tqdm",
"specs": []
}
],
"lcname": "snpio"
}