# CountMut
> **Ultra-fast strand-aware mutation counter for bisulfite sequencing analysis**
CountMut is a high-performance tool for counting mutations from bisulfite sequencing BAM files (BS-seq, CAM-seq, GLORI-seq, eTAM-seq). It features parallel processing, quality-based mate overlap deduplication, and optimized file I/O for maximum speed.
## Features
- 🚀 **Ultra-Fast**: Calls mutations without a read pileup step
- 🧬 **Bisulfite Support**: NS, Zf, Yf tag filtering for conversion analysis
- 🎯 **Accurate**: Quality-based mate overlap deduplication prevents double-counting
- ⚡ **Parallel**: Multi-threaded genomic window processing
- 🔧 **Flexible**: Configurable filtering, strand-specific processing, auto-indexing
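To illustrate what quality-based mate overlap deduplication means, here is a minimal sketch, assuming the rule is "when both mates of a fragment cover the same position, keep only the call with the higher base quality". The README does not spell out the exact rule or tie-breaking, so `dedup_overlap` and its input tuples are hypothetical and are not CountMut's implementation.

```python
def dedup_overlap(calls):
    """Keep one base call per (fragment, position), preferring higher quality.

    `calls` is an iterable of (fragment_name, position, base, quality) tuples.
    On a quality tie the first call seen is kept (an arbitrary choice).
    """
    best = {}
    for name, pos, base, qual in calls:
        key = (name, pos)
        if key not in best or qual > best[key][1]:
            best[key] = (base, qual)
    return best


calls = [
    ("frag1", 100, "A", 30),  # mate 1
    ("frag1", 100, "G", 12),  # mate 2 covers the same position
    ("frag1", 101, "A", 25),  # position covered by one mate only
]
print(dedup_overlap(calls))
# {('frag1', 100): ('A', 30), ('frag1', 101): ('A', 25)}
```

Each fragment then contributes at most one observation per position, which is what prevents double-counting.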
## Installation
CountMut requires Python 3.10 or later.
```bash
pip install countmut
```
## Quick Start
```bash
# Basic usage - auto-creates indices if needed
countmut -i input.bam -r reference.fa -o mutations.tsv
# Count T→C mutations (common in bisulfite sequencing)
countmut -i input.bam -r reference.fa -o mutations.tsv --ref-base T --mut-base C
# With custom threads and filtering
countmut -i input.bam -r reference.fa -o mutations.tsv -t 8 --max-unc 5 --min-con 2
```
## Options
**Input/Output**
```bash
-i, --input PATH Input BAM file (coordinate-sorted) [required]
-r, --reference PATH Reference FASTA file [required]
-o, --output PATH Output TSV file [default: stdout]
-f, --force Overwrite output without prompting
```
**Mutation Analysis**
```bash
--ref-base TEXT Reference base to count from [default: A]
--mut-base TEXT Mutation base to count [default: G]
--strand TEXT Strand: both/forward/reverse [default: both]
--region TEXT Genomic region (e.g., 'chr1:1000000-2000000')
```
**Performance**
```bash
-t, --threads INTEGER Number of parallel threads [default: auto]
-b, --bin-size INTEGER Genomic bin size in bp [default: 10000]
```
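One way to picture `--bin-size`: a region is cut into fixed-size windows that can be processed independently by parallel workers. The sketch below shows only that general idea and is an assumption about the approach, not CountMut's code. `make_bins` is a hypothetical helper, and coordinates are treated as 0-based, half-open intervals.

```python
def make_bins(start: int, end: int, bin_size: int = 10000) -> list[tuple[int, int]]:
    """Split the half-open interval [start, end) into consecutive bins."""
    if bin_size <= 0:
        raise ValueError("bin_size must be positive")
    # The last bin is clipped to `end` when the region is not a multiple of bin_size.
    return [(s, min(s + bin_size, end)) for s in range(start, end, bin_size)]


print(make_bins(0, 25000))
# [(0, 10000), (10000, 20000), (20000, 25000)]
```

Smaller bins give finer-grained work units at the cost of more per-bin overhead; the best value depends on the data.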
**Alternative Mutation Tagging**
```bash
--ref-base2 TEXT Alternative reference base for tagging (e.g., 'C')
--mut-base2 TEXT Alternative mutation base for tagging (e.g., 'T')
--output-bam PATH Output BAM with alternative tags (Yc, Zc)
```
**Quality Filters**
```bash
--min-baseq INTEGER Min base quality (Phred score) [default: 20]
--min-mapq INTEGER Min mapping quality (MAPQ) [default: 0]
--max-sub INTEGER Max substitutions (NS tag) [default: 1]
--trim-start INTEGER Trim N bases from read 5' end (fragment orientation) [default: 2]
--trim-end INTEGER Trim N bases from read 3' end (fragment orientation) [default: 2]
--max-unc INTEGER Max unconverted (Zf tag) [default: 3]
--min-con INTEGER Min converted (Yf tag) [default: 1]
```
**Output Records**
```bash
-p, --pad INTEGER Motif window half-size [default: 15]
-s, --save-rest Include other bases (o0, o1, o2 columns)
```
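The `--pad` value determines the motif length: a half-size of `pad` gives a window of 2×pad+1 bases centred on the site. A minimal sketch of that arithmetic follows. `motif_window` is a hypothetical helper, filling with `N` at sequence edges is an assumption, and strand handling (for example reverse-complementing sites on the minus strand) is left out.

```python
def motif_window(seq: str, pos: int, pad: int = 15) -> str:
    """Return the 2*pad+1 bp window centred on 1-based position `pos`.

    Positions falling outside the sequence are filled with 'N'
    (an assumption; the README does not say how edges are handled).
    """
    center = pos - 1  # convert to a 0-based index
    if not 0 <= center < len(seq):
        raise ValueError("pos outside sequence")
    left = max(center - pad, 0)
    right = min(center + pad + 1, len(seq))
    # Pad with N where the window runs off either end of the sequence.
    left_fill = "N" * (pad - (center - left))
    right_fill = "N" * (pad - (right - 1 - center))
    return left_fill + seq[left:right] + right_fill


print(motif_window("ACGTACGTAC", pos=5, pad=2))  # GTACG
print(motif_window("ACGTACGTAC", pos=1, pad=2))  # NNACG
```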
> **Note**: BAM files must have **NS**, **Zf**, and **Yf** tags (essential for bisulfite analysis).
> Indices (.bai, .fai) are created automatically if missing.
## Output Format
TSV file with the following columns:
| Column | Description |
|--------|-------------|
| `chrom` | Chromosome name |
| `pos` | Genomic position (1-based) |
| `strand` | Strand (+ or -) |
| `motif` | Sequence context (2×pad+1 bp window) |
| `u0`, `u1`, `u2` | **Unconverted** (reference base) counts |
| `m0`, `m1`, `m2` | **Mutation** (mutation base only) counts |
| `o0`, `o1`, `o2` | **Other bases** counts (with `--save-rest`) |
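As a sketch of downstream use, the snippet below reads such a table with Python's standard `csv` module and computes a per-site mutation ratio from the `u1` and `m1` columns. It assumes the file starts with a header row whose names match the table above, which the README does not confirm. The sample rows are made up for illustration (with shortened motifs), and `mutation_ratios` is a hypothetical helper.

```python
import csv
import io

# Made-up rows for illustration only; real output comes from `countmut -o`.
sample = (
    "chrom\tpos\tstrand\tmotif\tu0\tu1\tu2\tm0\tm1\tm2\n"
    "chr1\t1000\t+\tACGTA\t1\t30\t2\t0\t10\t1\n"
    "chr1\t2000\t-\tTTGCA\t0\t0\t3\t0\t0\t0\n"
)


def mutation_ratios(handle):
    """Yield (chrom, pos, strand, ratio), where ratio = m1 / (u1 + m1)."""
    for row in csv.DictReader(handle, delimiter="\t"):
        u1, m1 = int(row["u1"]), int(row["m1"])
        depth = u1 + m1
        ratio = m1 / depth if depth else None  # None when there is no usable coverage
        yield row["chrom"], int(row["pos"]), row["strand"], ratio


for record in mutation_ratios(io.StringIO(sample)):
    print(record)
# ('chr1', 1000, '+', 0.25)
# ('chr1', 2000, '-', None)
```

To read a real file, pass an open file handle instead of the `io.StringIO` wrapper.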
**Count categories** (`x0`, `x1`, `x2`, where `x` is `u`, `m`, or `o`):
- **x0 (low quality)**: Bases failing quality filters (inside a trim region, or failing `--max-sub`, `--min-mapq`, or `--min-baseq`)
- **x1 (high conversion)**: Bases from reads with high conversion efficiency (low Zf and high Yf, i.e. passing `--max-unc` and `--min-con`)
- **x2 (insufficient conversion)**: Bases from reads with insufficient conversion efficiency (high Zf or low Yf, i.e. failing `--max-unc` or `--min-con`)
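The three categories can be read as a small decision rule. The sketch below is one interpretation of the descriptions above, not CountMut's actual code. It assumes the thresholds are inclusive and that the quality checks are applied before the conversion checks. `Thresholds` and `count_category` are hypothetical names, and the default values are copied from the Options section.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Thresholds:
    # Defaults copied from the Options section.
    min_baseq: int = 20
    min_mapq: int = 0
    max_sub: int = 1
    max_unc: int = 3
    min_con: int = 1


def count_category(baseq, mapq, ns, zf, yf, in_trim=False, t=Thresholds()):
    """Return 0, 1 or 2 for one base, following the category descriptions.

    ns, zf and yf are the read's NS, Zf and Yf tag values; in_trim marks a
    base that falls inside the trimmed read ends.
    """
    # Category 0: the base fails a quality filter.
    if in_trim or baseq < t.min_baseq or mapq < t.min_mapq or ns > t.max_sub:
        return 0
    # Category 1: the read looks well converted (low Zf and high Yf).
    if zf <= t.max_unc and yf >= t.min_con:
        return 1
    # Category 2: insufficient conversion (high Zf or low Yf).
    return 2


print(count_category(baseq=30, mapq=60, ns=0, zf=1, yf=5))  # 1
print(count_category(baseq=10, mapq=60, ns=0, zf=1, yf=5))  # 0
print(count_category(baseq=30, mapq=60, ns=0, zf=5, yf=5))  # 2
```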
<p align="center">
<img
src="https://raw.githubusercontent.com/y9c/y9c/master/resource/footer_line.svg?sanitize=true"
/>
</p>
<p align="center">
Copyright © 2025-present
<a href="https://github.com/y9c" target="_blank">Chang Y</a>
</p>
<p align="center">
<a href="https://github.com/y9c/countmut/blob/main/LICENSE">
<img src="https://img.shields.io/static/v1.svg?style=for-the-badge&label=License&message=MIT&logoColor=d9e0ee&colorA=282a36&colorB=c678dd" />
</a>
</p>