# symclatron: symbiont classifier
**ML-based classification of microbial symbiotic lifestyles**
symclatron is a tool that classifies microbial genomes, provided as protein FASTA files (`.faa`), into three symbiotic lifestyle categories:
- **Free-living**
- **Symbiont; Host-associated**
- **Symbiont; Obligate-intracellular**
## Installation and quick start
### Step 1: Install `pixi` (Requirement ⚠️)
```sh
curl -fsSL https://pixi.sh/install.sh | sh
```
More information about `pixi` can be found in the [pixi documentation](https://pixi.sh/).
### Step 2: Install `symclatron`
```sh
pixi global install -c conda-forge -c bioconda -c https://repo.prefix.dev/astrogenomics symclatron
symclatron setup
```
### Test the installation
```sh
symclatron test
```
## Setup data (required)
Before using `symclatron` for the first time, you need to download the required database files. This only needs to be done once.
```bash
symclatron setup
```
## Input file requirements
- **Input file format**: Protein FASTA files (`.faa`)
- **Quality**: Complete or near-complete genomes are recommended, but good performance is also expected for medium-quality (MQ) metagenome-assembled genomes (MAGs)
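Before a long run, it can help to confirm that the input directory actually contains protein FASTA files. The sketch below is a hypothetical pre-flight check (`check_faa_dir` is not part of symclatron). It assumes, as the `--genome-dir` option description suggests, that inputs end in `.faa`, and it only checks that each file is non-empty and starts with a `>` header. It does not verify that the sequences are amino acids rather than nucleotides.

```bash
# Hypothetical pre-flight check, not part of symclatron.
# Usage: check_faa_dir DIR
# Reports problems on stderr; returns 0 only if DIR holds at least one .faa
# file and every .faa file is non-empty and starts with a '>' header.
check_faa_dir() {
  local dir="$1" n=0 bad=0 f
  if [[ ! -d "$dir" ]]; then
    echo "not a directory: $dir" >&2
    return 2
  fi
  for f in "$dir"/*.faa; do
    [[ -e "$f" ]] || continue            # glob matched nothing
    n=$((n + 1))
    if [[ ! -s "$f" ]]; then
      echo "empty file: $f" >&2
      bad=$((bad + 1))
    elif [[ "$(head -c 1 "$f")" != ">" ]]; then
      echo "no FASTA header at start of file: $f" >&2
      bad=$((bad + 1))
    fi
  done
  if (( n == 0 )); then
    echo "no .faa files found in $dir" >&2
    return 1
  fi
  echo "$n .faa file(s) checked, $bad problem(s)"
  (( bad == 0 ))
}
```

For example, `check_faa_dir genomes/ && symclatron classify --genome-dir genomes/ --output-dir results/` only starts classification when the check passes.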
### Classify your genomes
```bash
symclatron classify --genome-dir /path/to/genomes/ --output-dir results/
```
### Getting help
```bash
symclatron --help
# Command-specific help
symclatron classify --help
symclatron setup --help
# Show version and information
symclatron --version
```
### Classification command
The main classification command with all options:
```bash
symclatron classify [OPTIONS]
```
**Options:**
- `--genome-dir, -i`: Directory containing genome FASTA files (.faa) [default: input_genomes]
- `--output-dir, -o`: Output directory for results [default: output_symclatron]
- `--keep-tmp`: Keep temporary files for debugging
- `--threads, -t`: Number of threads for HMMER searches [default: 2]
- `--quiet, -q`: Suppress progress messages
- `--verbose`: Show detailed progress information
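`--threads` defaults to 2 and controls the HMMER searches. On a larger machine you may want to match it to the available cores. A minimal sketch, assuming `getconf _NPROCESSORS_ONLN` is available (it is on Linux and macOS); `pick_threads` is a hypothetical helper, not a symclatron option.

```bash
# Hypothetical helper: print the number of online CPUs, optionally capped at MAX.
# Usage: pick_threads [MAX]
pick_threads() {
  local max="${1:-0}" n
  # Fall back to symclatron's documented default of 2 if getconf fails
  n="$(getconf _NPROCESSORS_ONLN 2>/dev/null || echo 2)"
  if (( max > 0 && n > max )); then
    n="$max"
  fi
  echo "$n"
}

# Example: use every core, but never more than 16 threads
# symclatron classify -i genomes/ -o results/ --threads "$(pick_threads 16)"
```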
**Examples:**
```bash
# Basic usage
symclatron classify --genome-dir genomes/ --output-dir results/
# With more threads and keeping temporary files
symclatron classify -i genomes/ -o results/ --threads 8 --keep-tmp
# Quiet mode
symclatron classify --genome-dir genomes/ --quiet
# Verbose mode with detailed progress
symclatron classify --genome-dir genomes/ --verbose
```
## Results
The classification results are saved in the specified output directory:
### Main output files
1. **`symclatron_results.tsv`** - Main classification results with columns:
   - `taxon_oid` - Genome identifier
   - `completeness_UNI56` - Completeness metric based on universal marker genes
   - `confidence` - Overall confidence score for the classification
   - `classification` - Final classification label:
     - `Free-living`
     - `Symbiont;Host-associated`
     - `Symbiont;Obligate-intracellular`
2. **`classification_summary.txt`** - Summary report with statistics
3. **Log files** - Detailed execution logs with timestamps
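For a quick look at `symclatron_results.tsv` without leaving the shell, the sketch below counts genomes per label and filters rows by confidence. Both functions are hypothetical helpers, not part of symclatron. They assume the file is tab-separated with a header row naming the columns listed above, and the confidence threshold you pass must be on whatever scale symclatron reports.

```bash
# Hypothetical helpers, not part of symclatron. They assume symclatron_results.tsv
# is tab-separated with a header row containing the column names listed above.

# Count genomes per classification label.
# Usage: count_by_class results/symclatron_results.tsv
count_by_class() {
  awk -F'\t' '
    NR == 1 { for (i = 1; i <= NF; i++) if ($i == "classification") c = i; next }
    c { n[$c]++ }
    END {
      if (!c) { print "no classification column" > "/dev/stderr"; exit 1 }
      for (k in n) print k "\t" n[k]
    }' "$1" | LC_ALL=C sort
}

# Print the header plus every row whose confidence is at least MIN.
# Usage: filter_by_confidence results/symclatron_results.tsv MIN
filter_by_confidence() {
  awk -F'\t' -v min="$2" '
    NR == 1 { for (i = 1; i <= NF; i++) if ($i == "confidence") c = i; print; next }
    c && ($c + 0) >= (min + 0)' "$1"
}
```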
### Debug files
When using `--keep-tmp`, intermediate files are preserved in the `tmp/` directory for analysis.
## Performance
symclatron is designed for efficiency:
- **<2 minutes per genome** on consumer-level laptops
- **Most recent benchmark**: 306 genomes in ~162 minutes (~0.53 min/genome, i.e. ~1.9 genomes/min)
- **Memory efficient** - suitable for standard workstations
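The benchmark converts to per-genome time as shown below. `estimate_minutes` is a hypothetical helper that extrapolates linearly from that single benchmark, so treat its output as a rough guide; actual runtime depends on hardware, `--threads`, and genome size.

```bash
# Per-genome time from the benchmark: 162 min / 306 genomes
awk 'BEGIN { printf "%.2f min/genome, %.2f genomes/min\n", 162/306, 306/162 }'
# prints: 0.53 min/genome, 1.89 genomes/min

# Hypothetical helper: linear runtime estimate (minutes) for N genomes,
# based only on the 306-genome / 162-minute benchmark.
estimate_minutes() {
  awk -v n="$1" 'BEGIN { printf "%.0f\n", n * 162 / 306 }'
}
```

For example, `estimate_minutes 1000` prints `529`, i.e. roughly 9 hours for 1,000 genomes on comparable hardware.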
## Container usage
### Apptainer/Singularity
Pull the latest container:
```bash
apptainer pull docker://docker.io/jvillada/symclatron:latest
```
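By default, `apptainer pull` saves the image as `symclatron_latest.sif` in the current directory. The sketch below runs the classifier from that image. It assumes the image puts `symclatron` on its `PATH` and already has access to the database that `symclatron setup` downloads; neither is documented here, so check `apptainer exec symclatron_latest.sif symclatron --help` first. `run_symclatron_sif` and the `SIF` variable are hypothetical.

```bash
# Hypothetical wrapper around the pulled image; assumes `symclatron` is on the
# image's PATH. Apptainer typically binds the current directory by default,
# so relative paths under $PWD are visible inside the container.
# Usage: run_symclatron_sif [symclatron classify options...]
run_symclatron_sif() {
  local sif="${SIF:-symclatron_latest.sif}"
  if [[ ! -f "$sif" ]]; then
    echo "image not found: $sif (run apptainer pull first)" >&2
    return 1
  fi
  apptainer exec "$sif" symclatron classify "$@"
}

# Example:
# run_symclatron_sif --genome-dir genomes/ --output-dir results/ --threads 8
```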
## Citation
If you use symclatron in your research, please cite:
A genomic catalog of Earth’s bacterial and archaeal symbionts.
Juan C. Villada, Yumary M. Vasquez, Gitta Szabo, Ewan Whittaker-Walker, Miguel F. Romero, Sarina Qin, Neha Varghese, Emiley A. Eloe-Fadrosh, Nikos C. Kyrpides, SymGs data consortium, Axel Visel, Tanja Woyke, Frederik Schulz
bioRxiv 2025.05.29.656868; doi: https://doi.org/10.1101/2025.05.29.656868
## Support
- **Repository**: [https://github.com/NeLLi-team/symclatron](https://github.com/NeLLi-team/symclatron)
- **Issues**: [https://github.com/NeLLi-team/symclatron/issues](https://github.com/NeLLi-team/symclatron/issues)
- **Author**: Juan C. Villada <jvillada@lbl.gov>
Raw data
{
"_id": null,
"home_page": null,
"name": "symclatron",
"maintainer": null,
"docs_url": null,
"requires_python": ">=3.12",
"maintainer_email": null,
"keywords": "bioinformatics, machine-learning, symbiosis, microbiology, classification, genomics",
"author": null,
"author_email": "\"Juan C. Villada\" <jvillada@lbl.gov>",
"download_url": "https://files.pythonhosted.org/packages/f9/52/041d3e66fad23acf1ff8ebeb264a96e67c907a280c4e87e63bfb53b11179/symclatron-0.6.4.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": null,
"summary": "symclatron: symbiont classifier",
"version": "0.6.4",
"project_urls": {
"Bug Tracker": "https://github.com/NeLLi-team/symclatron/issues",
"Documentation": "https://github.com/NeLLi-team/symclatron#readme",
"Homepage": "https://github.com/NeLLi-team/symclatron",
"Repository": "https://github.com/NeLLi-team/symclatron"
},
"split_keywords": [
"bioinformatics",
" machine-learning",
" symbiosis",
" microbiology",
" classification",
" genomics"
],
"urls": [
{
"comment_text": null,
"digests": {
"blake2b_256": "c9aafbcf56eaeaf43a898c5d022ee36575356d68927c1f0f50a63eaf27c52f50",
"md5": "6aa70544deac15e1406157d39868e857",
"sha256": "f76d5e23075fc631d68812707e75484a16898db89e0ebb9b41259dad5182293b"
},
"downloads": -1,
"filename": "symclatron-0.6.4-py3-none-any.whl",
"has_sig": false,
"md5_digest": "6aa70544deac15e1406157d39868e857",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.12",
"size": 23475,
"upload_time": "2025-08-25T22:07:27",
"upload_time_iso_8601": "2025-08-25T22:07:27.051051Z",
"url": "https://files.pythonhosted.org/packages/c9/aa/fbcf56eaeaf43a898c5d022ee36575356d68927c1f0f50a63eaf27c52f50/symclatron-0.6.4-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": null,
"digests": {
"blake2b_256": "f952041d3e66fad23acf1ff8ebeb264a96e67c907a280c4e87e63bfb53b11179",
"md5": "79ba027c8732bc2cd9dbeb7b0e604901",
"sha256": "9f34cf03f7464efdfd2c2fb0b0716122aba053913ad67a9ceb9e571ac7230ff3"
},
"downloads": -1,
"filename": "symclatron-0.6.4.tar.gz",
"has_sig": false,
"md5_digest": "79ba027c8732bc2cd9dbeb7b0e604901",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.12",
"size": 24638,
"upload_time": "2025-08-25T22:07:28",
"upload_time_iso_8601": "2025-08-25T22:07:28.321439Z",
"url": "https://files.pythonhosted.org/packages/f9/52/041d3e66fad23acf1ff8ebeb264a96e67c907a280c4e87e63bfb53b11179/symclatron-0.6.4.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2025-08-25 22:07:28",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "NeLLi-team",
"github_project": "symclatron",
"travis_ci": false,
"coveralls": false,
"github_actions": false,
"lcname": "symclatron"
}