Name | taxdumpy |
Version | 0.1.1 |
home_page | None |
Summary | Python package for efficiently parsing NCBI's taxdump database |
upload_time | 2025-08-13 12:15:31 |
maintainer | None |
docs_url | None |
author | None |
requires_python | >=3.10 |
license | MIT License
Copyright (c) 2025 Omega HH
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE. |
keywords | ncbi, metagenomics, taxonomy, tree-of-life |
# Taxdumpy 🧬
#### NCBI Taxonomy Toolkit for Python
*A high-performance parser for NCBI Taxonomy databases with lineage resolution and taxonomy search*



## Features
- **Blazing Fast Parsing**
  Optimized loading of NCBI taxdump files (`nodes.dmp`, `names.dmp`, etc.) with optional pickle caching
- **Comprehensive Taxon Operations**
  - TaxID validation and lineage tracing
  - Scientific name resolution
  - Rank-based filtering (species → kingdom)
  - Merged/deleted node handling
- **Fuzzy Search**
  Rapid approximate name matching using `rapidfuzz` (supports misspellings)
- **Memory Efficient**
  Lazy loading and optimized data structures for large taxonomies
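
To make the parsing step concrete, here is a minimal standard-library sketch of reading taxdump rows. It is not taxdumpy's implementation. It relies only on the NCBI file layout, where fields are separated by `\t|\t` and each row ends with `\t|`. The helper names (`split_dmp_line`, `parse_nodes`, `parse_scientific_names`) are hypothetical, and the sample rows are truncated to the first few columns.

```python
from typing import Dict, Iterable, List, Tuple


def split_dmp_line(line: str) -> List[str]:
    """Split one taxdump row into its fields."""
    line = line.rstrip("\n")
    # Drop the row terminator "\t|" before splitting on the field separator.
    if line.endswith("\t|"):
        line = line[:-2]
    return line.split("\t|\t")


def parse_nodes(lines: Iterable[str]) -> Dict[int, Tuple[int, str]]:
    """Map tax_id -> (parent_tax_id, rank) using the first three columns."""
    nodes: Dict[int, Tuple[int, str]] = {}
    for line in lines:
        fields = split_dmp_line(line)
        nodes[int(fields[0])] = (int(fields[1]), fields[2])
    return nodes


def parse_scientific_names(lines: Iterable[str]) -> Dict[int, str]:
    """Map tax_id -> scientific name, skipping synonyms and common names."""
    names: Dict[int, str] = {}
    for line in lines:
        tax_id, name, _unique_name, name_class = split_dmp_line(line)[:4]
        if name_class == "scientific name":
            names[int(tax_id)] = name
    return names


# Truncated sample rows (real nodes.dmp rows carry more columns).
NODES_SAMPLE = [
    "1\t|\t1\t|\tno rank\t|",
    "9605\t|\t9604\t|\tgenus\t|",
    "9606\t|\t9605\t|\tspecies\t|",
]
NAMES_SAMPLE = [
    "9606\t|\tHomo sapiens\t|\t\t|\tscientific name\t|",
    "9606\t|\thuman\t|\t\t|\tgenbank common name\t|",
]

nodes = parse_nodes(NODES_SAMPLE)
names = parse_scientific_names(NAMES_SAMPLE)
print(nodes[9606], names[9606])
```

Keeping only the scientific name per TaxID is one simple way to keep the name table small; a real parser would likely also index synonyms for search.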
## Installation
```bash
pip install taxdumpy
```
Or from source:
```bash
git clone https://github.com/omegahh/taxdumpy.git
cd taxdumpy
pip install -e .
```
## Quick Start
```python
from taxdumpy import TaxDb, Taxon

# Initialize database (auto-downloads if needed)
taxdb = TaxDb("/path/to/taxdump")

# Create taxon object
human = Taxon(9606, taxdb)  # Homo sapiens

# Access lineage
print(human.name_lineage)
# ['Homo sapiens', 'Homo', 'Hominidae', ..., 'cellular organisms']

# Search organisms
taxdb._rapid_fuzz("Influenza", limit=5)
```
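
For readers curious how a lineage such as `name_lineage` can be derived, the following is a rough sketch that walks parent pointers up to the root. It does not use taxdumpy. The toy table is heavily abbreviated (real lineages contain many more ranks), the function name is hypothetical, and dropping the root entry is an assumption made only to mirror the output shown above.

```python
from typing import Dict, List, Tuple

# Abbreviated toy taxonomy: tax_id -> (parent_tax_id, scientific name).
# Intermediate ranks are omitted on purpose; this is not real nodes.dmp content.
TOY_NODES: Dict[int, Tuple[int, str]] = {
    1: (1, "root"),
    131567: (1, "cellular organisms"),
    9604: (131567, "Hominidae"),
    9605: (9604, "Homo"),
    9606: (9605, "Homo sapiens"),
}


def toy_name_lineage(tax_id: int, nodes: Dict[int, Tuple[int, str]]) -> List[str]:
    """Return names from the given taxon up to (but excluding) the root."""
    lineage: List[str] = []
    seen = set()
    while True:
        if tax_id in seen:
            raise ValueError(f"cycle detected at tax_id {tax_id}")
        seen.add(tax_id)
        parent, name = nodes[tax_id]
        if parent == tax_id:  # the root node is its own parent
            break
        lineage.append(name)
        tax_id = parent
    return lineage


print(toy_name_lineage(9606, TOY_NODES))
# ['Homo sapiens', 'Homo', 'Hominidae', 'cellular organisms']
```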
## Command Line Interface
```bash
# Cache full database
taxdumpy cache -d /path/to/taxdump

# Search organism
taxdumpy search --fast "Escherichia coli"

# Trace lineage
taxdumpy lineage --fast 511145  # E. coli K-12
```
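
The idea behind approximate name search can be illustrated without the package. The sketch below swaps in `difflib` from the standard library in place of `rapidfuzz`, so scores and ranking will differ from what taxdumpy returns. The function name and the small name table are illustrative assumptions.

```python
import difflib
from typing import Dict, List, Tuple

# Small illustrative name table: scientific name -> tax_id.
NAME_TO_TAXID: Dict[str, int] = {
    "Escherichia coli": 562,
    "Homo sapiens": 9606,
    "Influenza A virus": 11320,
}


def fuzzy_search(
    query: str, names: Dict[str, int], limit: int = 5, cutoff: float = 0.6
) -> List[Tuple[str, int]]:
    """Return up to `limit` (name, tax_id) pairs, best match first."""
    # Compare case-insensitively, then map hits back to the original spelling.
    lowered = {name.lower(): name for name in names}
    hits = difflib.get_close_matches(query.lower(), list(lowered), n=limit, cutoff=cutoff)
    return [(lowered[hit], names[lowered[hit]]) for hit in hits]


# A misspelled query still finds the intended organism as the top hit.
matches = fuzzy_search("Eschericia coli", NAME_TO_TAXID)
print(matches[0])
```

`difflib` is convenient for a sketch but slow on millions of names, which is presumably why the package relies on `rapidfuzz`.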
## Database Setup
1. Download NCBI taxdump:
```bash
mkdir -p ~/.taxonkit
wget ftp://ftp.ncbi.nlm.nih.gov/pub/taxonomy/taxdump.tar.gz -P ~/.taxonkit
tar -xzf ~/.taxonkit/taxdump.tar.gz -C ~/.taxonkit
```
2. (Optional) Create optimized cache:
```bash
taxdumpy cache -d ~/.taxonkit
```
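
If you would rather script step 1 in Python, a sketch along these lines should work. The function names are hypothetical, and the HTTPS URL is the same NCBI location as the FTP address above. The demo at the end builds a tiny stand-in archive so that it runs offline; it never contacts NCBI.

```python
import tarfile
import tempfile
import urllib.request
from pathlib import Path
from typing import List

TAXDUMP_URL = "https://ftp.ncbi.nlm.nih.gov/pub/taxonomy/taxdump.tar.gz"


def fetch_taxdump(dest: Path, url: str = TAXDUMP_URL) -> Path:
    """Download the taxdump archive into `dest` and return its path."""
    dest.mkdir(parents=True, exist_ok=True)
    archive = dest / "taxdump.tar.gz"
    urllib.request.urlretrieve(url, archive)
    return archive


def extract_taxdump(archive: Path, dest: Path) -> List[str]:
    """Extract the archive into `dest` and return the member names."""
    dest.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive, "r:gz") as tar:
        # Use the safer "data" extraction filter where the interpreter supports it.
        kwargs = {"filter": "data"} if hasattr(tarfile, "data_filter") else {}
        tar.extractall(dest, **kwargs)
        return sorted(tar.getnames())


# Offline demo: build a one-file stand-in archive and extract it.
with tempfile.TemporaryDirectory() as tmp:
    tmp_path = Path(tmp)
    (tmp_path / "nodes.dmp").write_text("1\t|\t1\t|\tno rank\t|\n")
    demo_archive = tmp_path / "taxdump.tar.gz"
    with tarfile.open(demo_archive, "w:gz") as tar:
        tar.add(tmp_path / "nodes.dmp", arcname="nodes.dmp")
    extracted = extract_taxdump(demo_archive, tmp_path / "out")
    extracted_ok = (tmp_path / "out" / "nodes.dmp").exists()

print(extracted)
```

For the real download, something like `extract_taxdump(fetch_taxdump(Path.home() / ".taxonkit"), Path.home() / ".taxonkit")` would mirror the shell commands above.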
## Advanced Usage
### Custom Caching
```python
# Fast cache with specific taxids: write one TaxID per line
with open("important_taxids.txt", "w") as f:
    f.write("\n".join(["9606", "511145"]))

# Then build the fast cache from the shell (a CLI command, not Python):
#   taxdumpy fast-cache -d ~/.taxonkit -f important_taxids.txt
```
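
One plausible way to think about a cache restricted to specific TaxIDs is to keep each requested taxon plus all of its ancestors, then pickle that subset. The sketch below illustrates the idea only; how taxdumpy actually builds its fast cache is not documented here. The parent table is an abbreviated toy and the function name is hypothetical.

```python
import pickle
import tempfile
from pathlib import Path
from typing import Dict, Iterable

# Abbreviated toy parent table: tax_id -> parent_tax_id (root points to itself).
TOY_PARENTS: Dict[int, int] = {
    1: 1,
    131567: 1,
    9605: 131567,
    9606: 9605,
    2: 131567,
    562: 2,
    511145: 562,
    10239: 1,
    11320: 10239,
}


def ancestor_closure(tax_ids: Iterable[int], parents: Dict[int, int]) -> Dict[int, int]:
    """Keep only the requested taxa and every ancestor up to the root."""
    keep: Dict[int, int] = {}
    for tax_id in tax_ids:
        # Stop climbing as soon as we reach a node that is already kept.
        while tax_id not in keep:
            parent = parents[tax_id]
            keep[tax_id] = parent
            tax_id = parent
    return keep


subset = ancestor_closure([9606, 511145], TOY_PARENTS)

# Round-trip the subset through a pickle file, as a cache would.
with tempfile.TemporaryDirectory() as tmp:
    cache_file = Path(tmp) / "subset.pkl"
    cache_file.write_bytes(pickle.dumps(subset))
    restored = pickle.loads(cache_file.read_bytes())

print(sorted(restored))
```

Because every kept node still has its parent in the subset, lineage tracing works on the small cache exactly as on the full table.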
### API Reference
```python
from typing import List


class Taxon:
    """Represents a taxonomic unit"""

    @property
    def lineage(self) -> List["Node"]: ...  # root-ward chain of nodes

    @property
    def rank_lineage(self) -> List[str]: ...  # rank of each lineage entry

    @property
    def is_legacy(self) -> bool: ...
```
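
The README does not spell out what `is_legacy` means. A reasonable guess is that it flags TaxIDs that NCBI has merged into another node. The sketch below shows how such a lookup could work using the old-to-new mapping found in `merged.dmp` and the ID list in `delnodes.dmp`. All names are hypothetical, and the two 999xxx IDs are invented for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Set


@dataclass(frozen=True)
class ResolvedTaxId:
    tax_id: int      # current TaxID to use for lookups
    is_legacy: bool  # True when the requested ID had been merged


def resolve_tax_id(
    tax_id: int, current: Set[int], merged: Dict[int, int], deleted: Set[int]
) -> ResolvedTaxId:
    """Translate a possibly outdated TaxID into a current one."""
    if tax_id in current:
        return ResolvedTaxId(tax_id, False)
    if tax_id in merged:
        return ResolvedTaxId(merged[tax_id], True)
    if tax_id in deleted:
        raise ValueError(f"TaxID {tax_id} was deleted from the taxonomy")
    raise ValueError(f"TaxID {tax_id} is unknown")


CURRENT = {1, 9606}
MERGED = {999001: 9606}  # invented old ID -> current ID
DELETED = {999002}       # invented deleted ID

print(resolve_tax_id(999001, CURRENT, MERGED, DELETED))
```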
## Performance Tips
- Use `fast=True` when loading for ~3x speedup (requires pre-caching)
- For batch processing, reuse `TaxDb` instances
- Set `TAXDB_PATH` environment variable to avoid path repetition
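
The last two tips can be combined in a small helper, sketched below. It does not call taxdumpy: `load_db` is a stand-in for constructing a `TaxDb`. The fallback directory `~/.taxonkit` is taken from the setup section and is only an assumed default. How the package itself reads `TAXDB_PATH` may differ.

```python
import os
from functools import lru_cache
from pathlib import Path
from typing import Optional

# Assumed fallback, matching the directory used in "Database Setup".
DEFAULT_TAXDUMP_DIR = Path.home() / ".taxonkit"


def resolve_taxdump_dir(explicit: Optional[str] = None) -> Path:
    """Pick the taxdump directory: explicit argument, then TAXDB_PATH, then default."""
    if explicit:
        return Path(explicit).expanduser()
    env_value = os.environ.get("TAXDB_PATH")
    return Path(env_value).expanduser() if env_value else DEFAULT_TAXDUMP_DIR


@lru_cache(maxsize=None)
def load_db(path: Path) -> dict:
    """Stand-in for an expensive load; the cache ensures one load per path."""
    return {"path": path}


db_a = load_db(resolve_taxdump_dir("/data/taxdump"))
db_b = load_db(resolve_taxdump_dir("/data/taxdump"))
print(db_a is db_b)  # the second call reuses the first instance
```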
## Contributing
PRs welcome! Please:
1. Format with `black`
2. Include type hints
3. Add tests under `/tests`
## License
MIT © 2025 [Omega HH](https://github.com/omegahh)