countmut

| Field | Value |
|-------|-------|
| Name | countmut |
| Version | 0.0.6 |
| Summary | Ultra-fast strand-aware mutation counter |
| Upload time | 2025-10-24 07:27:37 |
| Requires Python | >=3.10 |
| Keywords | bioinformatics, bam, pileup, mutation, bisulfite, sequencing, genomics |
| Requirements | No requirements were recorded. |

# CountMut

[![Pypi Releases](https://img.shields.io/pypi/v/countmut.svg)](https://pypi.python.org/pypi/countmut)
[![Downloads](https://img.shields.io/pepy/dt/countmut)](https://pepy.tech/project/countmut)
[![Development Status](https://img.shields.io/badge/status-alpha-orange.svg)](https://github.com/y9c/countmut)

> **Ultra-fast strand-aware mutation counter for bisulfite sequencing analysis**

CountMut is a high-performance tool for counting mutations from bisulfite sequencing BAM files (BS-seq, CAM-seq, GLORI-seq, eTAM-seq). It features parallel processing, quality-based mate overlap deduplication, and optimized file I/O for maximum speed.

## Features

- 🚀 **Ultra-Fast**: Counts mutations without building a read pileup
- 🧬 **Bisulfite Support**: NS, Zf, Yf tag filtering for conversion analysis
- 🎯 **Accurate**: Quality-based mate overlap deduplication prevents double-counting
- ⚡ **Parallel**: Multi-threaded genomic window processing
- 🔧 **Flexible**: Configurable filtering, strand-specific processing, auto-indexing

## Installation

```bash
pip install countmut
```

## Quick Start

```bash
# Basic usage - auto-creates indices if needed
countmut -i input.bam -r reference.fa -o mutations.tsv

# Count T→C mutations (common in bisulfite sequencing)
countmut -i input.bam -r reference.fa -o mutations.tsv --ref-base T --mut-base C

# With custom threads and filtering
countmut -i input.bam -r reference.fa -o mutations.tsv -t 8 --max-unc 5 --min-con 2
```

## Options

**Input/Output**
```bash
-i, --input PATH       Input BAM file (coordinate-sorted) [required]
-r, --reference PATH   Reference FASTA file [required]
-o, --output PATH      Output TSV file (default: stdout)
-f, --force            Overwrite output without prompting
```

**Mutation Analysis**
```bash
--ref-base TEXT        Reference base to count from [default: A]
--mut-base TEXT        Mutation base to count [default: G]
--strand TEXT          Strand: both/forward/reverse [default: both]
--region TEXT          Genomic region (e.g., 'chr1:1000000-2000000')
```

**Performance**
```bash
-t, --threads INTEGER  Number of parallel threads [default: auto]
-b, --bin-size INTEGER Genomic bin size in bp [default: 10000]
```

**Alternative Mutation Tagging**
```bash
--ref-base2 TEXT       Alternative reference base for tagging (e.g., 'C')
--mut-base2 TEXT       Alternative mutation base for tagging (e.g., 'T')
--output-bam PATH      Output BAM with alternative tags (Yc, Zc)
```

**Quality Filters**
```bash
--min-baseq INTEGER    Min base quality (Phred score) [default: 20]
--min-mapq INTEGER     Min mapping quality (MAPQ) [default: 0]
--max-sub INTEGER      Max substitutions (NS tag) [default: 1]
--trim-start INTEGER   Trim N bases from read 5' end (fragment orientation) [default: 2]
--trim-end INTEGER     Trim N bases from read 3' end (fragment orientation) [default: 2]
--max-unc INTEGER      Max unconverted (Zf tag) [default: 3]
--min-con INTEGER      Min converted (Yf tag) [default: 1]
```

**Output Records**
```bash
-p, --pad INTEGER      Motif window half-size [default: 15]
-s, --save-rest        Include other bases (o0, o1, o2 columns)
```
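
To make the `--pad` setting concrete, here is a minimal Python sketch of a `2×pad+1` window centred on a 1-based position. The function name `motif_window`, the `N` padding at sequence edges, and the absence of any reverse-strand handling are assumptions made for illustration; the README does not say how CountMut treats chromosome ends or minus-strand motifs.

```python
def motif_window(reference: str, pos: int, pad: int = 15, fill: str = "N") -> str:
    """Return the (2*pad + 1)-base window centred on a 1-based position.

    Hypothetical helper: edge positions are padded with `fill`, which is an
    assumption and not documented CountMut behaviour.
    """
    center = pos - 1  # convert the 1-based position to a 0-based index
    if not 0 <= center < len(reference):
        raise ValueError("position outside reference")
    start, end = center - pad, center + pad + 1
    left = fill * max(0, -start)                  # padding needed before the sequence start
    right = fill * max(0, end - len(reference))   # padding needed past the sequence end
    return left + reference[max(0, start):min(len(reference), end)] + right


seq = "ACGTACGTAC"
print(motif_window(seq, pos=5, pad=2))  # prints GTACG (bases 3 to 7)
print(motif_window(seq, pos=1, pad=2))  # prints NNACG (left edge padded)
```

With the default `pad` of 15, every window in this sketch is 31 bases long.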

> **Note**: BAM files must have **NS**, **Zf**, and **Yf** tags (essential for bisulfite analysis).
> Indices (.bai, .fai) are created automatically if missing.

## Output Format

TSV file with the following columns:

| Column | Description |
|--------|-------------|
| `chrom` | Chromosome name |
| `pos` | Genomic position (1-based) |
| `strand` | Strand (+ or -) |
| `motif` | Sequence context (2×pad+1 bp window) |
| `u0`, `u1`, `u2` | **Unconverted** (reference base) counts |
| `m0`, `m1`, `m2` | **Mutation** (mutation base only) counts |
| `o0`, `o1`, `o2` | **Other bases** counts (with `--save-rest`) |
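
As a rough illustration of working with this table, the Python sketch below reads the TSV and computes a per-site mutation ratio from the `u1` and `m1` columns. It assumes the file begins with a header row using the column names above, and the ratio `m1 / (u1 + m1)` is just one plausible summary, not something CountMut prescribes. The sample rows are made up.

```python
import csv
import io

# Made-up rows for illustration only; a real file would come from
# `countmut -i input.bam -r reference.fa -o mutations.tsv`.
# Assumption: the file starts with a header row naming the columns.
SAMPLE = (
    "chrom\tpos\tstrand\tmotif\tu0\tu1\tu2\tm0\tm1\tm2\n"
    "chr1\t1000\t+\tACGTA\t1\t18\t2\t0\t6\t1\n"
    "chr1\t2000\t-\tTTGCA\t0\t0\t3\t0\t0\t0\n"
)


def mutation_ratios(handle, min_depth=1):
    """Yield (chrom, pos, strand, ratio) using only the x1 counts."""
    for row in csv.DictReader(handle, delimiter="\t"):
        u1, m1 = int(row["u1"]), int(row["m1"])
        depth = u1 + m1
        if depth < min_depth:
            continue  # skip sites without enough x1 coverage
        yield row["chrom"], int(row["pos"]), row["strand"], m1 / depth


results = list(mutation_ratios(io.StringIO(SAMPLE)))
for chrom, pos, strand, ratio in results:
    print(chrom, pos, strand, f"{ratio:.2f}")  # prints: chr1 1000 + 0.25
```

To read a real file, replace `io.StringIO(SAMPLE)` with `open("mutations.tsv", newline="")`.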

**Count categories** (x0, x1, x2):
- **x0 (low quality)**: Bases failing quality filters (trim region, max-sub, min-mapq, min-baseq)
- **x1 (high conversion)**: Bases from reads with high conversion efficiency (low Zf and high Yf)
- **x2 (insufficient conversion)**: Bases from reads with insufficient conversion efficiency (high Zf or low Yf)
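
The categories above can be read as a small decision rule. The Python sketch below is one interpretation, using the documented defaults for the filter options. The names `Filters` and `count_category`, the order of the checks, and the inclusive comparisons (`Zf <= max-unc`, `Yf >= min-con`) are assumptions; the actual implementation may differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Filters:
    # Defaults copied from the Quality Filters options.
    min_baseq: int = 20
    min_mapq: int = 0
    max_sub: int = 1
    max_unc: int = 3
    min_con: int = 1


def count_category(baseq, mapq, ns, zf, yf, in_trim_region, f=Filters()):
    """Return 0, 1 or 2, i.e. the suffix of the u/m/o column a base would go to.

    Hypothetical helper: whether the thresholds are inclusive is an assumption.
    """
    # x0: the base or its read fails a quality filter.
    if in_trim_region or baseq < f.min_baseq or mapq < f.min_mapq or ns > f.max_sub:
        return 0
    # x1: the read looks well converted (low Zf and high Yf).
    if zf <= f.max_unc and yf >= f.min_con:
        return 1
    # x2: the read passes quality filters but shows insufficient conversion.
    return 2


print(count_category(baseq=30, mapq=60, ns=0, zf=1, yf=5, in_trim_region=False))  # prints 1
print(count_category(baseq=10, mapq=60, ns=0, zf=1, yf=5, in_trim_region=False))  # prints 0
print(count_category(baseq=30, mapq=60, ns=0, zf=4, yf=5, in_trim_region=False))  # prints 2
```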


 

<p align="center">
  <img
    src="https://raw.githubusercontent.com/y9c/y9c/master/resource/footer_line.svg?sanitize=true"
  />
</p>
<p align="center">
  Copyright &copy; 2025-present
  <a href="https://github.com/y9c" target="_blank">Chang Y</a>
</p>
<p align="center">
  <a href="https://github.com/y9c/countmut/blob/main/LICENSE">
    <img src="https://img.shields.io/static/v1.svg?style=for-the-badge&label=License&message=MIT&logoColor=d9e0ee&colorA=282a36&colorB=c678dd" />
  </a>
</p>

            

Raw data

{
    "_id": null,
    "home_page": null,
    "name": "countmut",
    "maintainer": null,
    "docs_url": null,
    "requires_python": ">=3.10",
    "maintainer_email": null,
    "keywords": "bioinformatics, bam, pileup, mutation, bisulfite, sequencing, genomics",
    "author": null,
    "author_email": "Ye Chang <yech1990@gmail.com>",
    "download_url": "https://files.pythonhosted.org/packages/ce/57/f64667852307a9897e2e78579e3f0ee8d81c0e58e23f9e70441888f9cd1b/countmut-0.0.6.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": null,
    "summary": "Ultra-fast strand-aware mutation counter",
    "version": "0.0.6",
    "project_urls": null,
    "split_keywords": [
        "bioinformatics",
        " bam",
        " pileup",
        " mutation",
        " bisulfite",
        " sequencing",
        " genomics"
    ],
    "urls": [
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "7908d2ea76ae49e7744a7d0f8f1b23f9e023855b3bc8495c3abdce61a24c696f",
                "md5": "631040a6e43981ae179c52c67afb85f8",
                "sha256": "79b2ec7ec4fd0b125ff19d6aa8fcc6599860071b8c1725d515843298bf715936"
            },
            "downloads": -1,
            "filename": "countmut-0.0.6-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "631040a6e43981ae179c52c67afb85f8",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.10",
            "size": 24157,
            "upload_time": "2025-10-24T07:27:36",
            "upload_time_iso_8601": "2025-10-24T07:27:36.464484Z",
            "url": "https://files.pythonhosted.org/packages/79/08/d2ea76ae49e7744a7d0f8f1b23f9e023855b3bc8495c3abdce61a24c696f/countmut-0.0.6-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "ce57f64667852307a9897e2e78579e3f0ee8d81c0e58e23f9e70441888f9cd1b",
                "md5": "0f04c9aef84e092a7f5d2b30d1a9dbe4",
                "sha256": "9ccfe7486015ae85d4fea78bbc23da1d7a16dc290d5fe8790111be7de90c7b01"
            },
            "downloads": -1,
            "filename": "countmut-0.0.6.tar.gz",
            "has_sig": false,
            "md5_digest": "0f04c9aef84e092a7f5d2b30d1a9dbe4",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.10",
            "size": 27326,
            "upload_time": "2025-10-24T07:27:37",
            "upload_time_iso_8601": "2025-10-24T07:27:37.554680Z",
            "url": "https://files.pythonhosted.org/packages/ce/57/f64667852307a9897e2e78579e3f0ee8d81c0e58e23f9e70441888f9cd1b/countmut-0.0.6.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2025-10-24 07:27:37",
    "github": false,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "lcname": "countmut"
}
        