decontx-python


- Name: decontx-python
- Version: 0.2.0
- Summary: Python implementation of DecontX for removing ambient RNA contamination from single-cell RNA-seq data
- Upload time: 2025-09-08 13:23:10
- Requires Python: >=3.8
- Keywords: rna-seq, ambient-rna, bayesian, bioinformatics, decontamination, scanpy, single-cell

# DecontX Python

A Python implementation of DecontX for removing ambient RNA contamination from single-cell RNA-seq data, designed for seamless integration with scanpy workflows.

## Overview

DecontX is a Bayesian method that estimates and removes ambient RNA cross-contamination in droplet-based single-cell RNA-seq data. This Python implementation closely matches the original R version (reported correlation > 0.999) and runs without any R dependency.

**Key Features:**
- 🐍 Pure Python implementation (no R required)
- 🔬 Seamless scanpy integration
- ⚡ Numba-accelerated performance
- 📊 Bayesian contamination estimation per cell
- 🎯 Validated against original R implementation

## Installation

```bash
pip install decontx-python
```

## Quick Start

```python
import scanpy as sc
import decontx

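# Load and filter data with scanpy
adata = sc.read_h5ad("pbmc.h5ad")
sc.pp.filter_cells(adata, min_genes=200)
sc.pp.filter_genes(adata, min_cells=3)

# Keep the raw counts: decontx expects them in .X (see API Reference)
adata.layers["counts"] = adata.X.copy()

# Normalize and cluster to obtain the labels used by decontx
sc.pp.normalize_total(adata)
sc.pp.log1p(adata)
sc.pp.highly_variable_genes(adata)
sc.pp.pca(adata)
sc.pp.neighbors(adata)
sc.tl.leiden(adata)

# Restore raw counts, then remove ambient RNA contamination
adata.X = adata.layers["counts"].copy()
decontx.decontx(adata, cluster_key="leiden")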

# Access results
contamination = adata.obs['decontX_contamination']
clean_counts = adata.layers['decontX_counts']

print(f"Mean contamination: {contamination.mean():.1%}")
print(f"Highly contaminated cells (>50%): {(contamination > 0.5).sum()}")
```
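
The API Reference states that `decontx` expects raw counts in `.X`, which is easy to violate once `normalize_total` and `log1p` have been applied. A quick heuristic check before the call can catch this. The helper below is a sketch and not part of the package: `looks_like_raw_counts` and its `n_check` parameter are hypothetical names, and the check only verifies that a sample of values consists of non-negative integers.

```python
import numpy as np

def looks_like_raw_counts(X, n_check=1000):
    """Heuristic check: raw UMI counts are non-negative integers.

    Hypothetical helper (not part of decontx). Accepts a dense array or
    a scipy sparse matrix; for sparse input only stored values are checked.
    """
    # scipy sparse matrices expose `nnz` and keep stored values in `.data`
    values = np.asarray(X.data if hasattr(X, "nnz") else X).ravel()
    values = values[:n_check]  # a sample is enough for a sanity check
    return bool(np.all(values >= 0) and np.allclose(values, np.round(values)))

counts = np.array([[0, 3, 1], [2, 0, 5]])
print(looks_like_raw_counts(counts))            # integer counts
print(looks_like_raw_counts(np.log1p(counts)))  # log-transformed values
```

With an AnnData object this could be called as `looks_like_raw_counts(adata.X)` right before `decontx.decontx(...)`.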

## Why DecontX?

Ambient RNA contamination occurs when mRNA released from lysed or stressed cells is captured in droplets together with other cells, causing:
- Cross-contamination between cell types
- Blurred cell type boundaries  
- False positive marker gene expression
- Reduced clustering quality

DecontX models each cell as a mixture of:
1. **Native transcripts** from the cell's true type
2. **Contaminating transcripts** from other cell types in the sample
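
The two-component mixture above can be illustrated with a small simulation. This is a simplified sketch, not the package's algorithm: it assumes the native profile `phi` and the contamination profile `eta` are already known, and it estimates only a single cell's contamination fraction with a plain EM loop. The package additionally uses a Beta prior (`delta`) and cluster labels, neither of which is modeled here. All names (`phi`, `eta`, `estimate_contamination`) are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
n_genes = 50

# Hypothetical expression profiles; each is a probability vector over genes.
phi = rng.dirichlet(np.ones(n_genes))  # native profile of the cell's own type
eta = rng.dirichlet(np.ones(n_genes))  # contamination profile from other types

true_contamination = 0.2
n_umis = 5000

# Generative step: each transcript is contamination with probability c,
# native otherwise, so the observed profile is a convex mixture.
mixture = (1 - true_contamination) * phi + true_contamination * eta
counts = rng.multinomial(n_umis, mixture)

def estimate_contamination(counts, phi, eta, max_iter=500, tol=1e-6):
    """EM estimate of the contamination fraction for one cell."""
    c = 0.5  # starting guess
    for _ in range(max_iter):
        # E-step: per gene, probability that a transcript is contamination
        contam = c * eta
        resp = contam / (contam + (1 - c) * phi)
        # M-step: expected share of contaminating transcripts
        c_new = float((counts * resp).sum() / counts.sum())
        if abs(c_new - c) < tol:
            return c_new
        c = c_new
    return c

est = estimate_contamination(counts, phi, eta)
print(f"true: {true_contamination:.2f}, estimated: {est:.2f}")
```

The estimate is identifiable only because `phi` and `eta` differ; the more similar the two profiles are, the less information the counts carry about the contamination fraction.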

## Method Comparison

Based on our benchmarking study:

| Method | Ambient RNA Removed | Precision | Conservativeness |
|--------|-------------------|-----------|------------------|
| **SoupX** | ~65% | High | Very conservative |
| **DecontX** | ~90% | Medium-High | Balanced |
| **CellBender** | ~90% | Medium | More aggressive |

**Recommendation**: 
- Use **SoupX** for maximum safety and minimal false positives
- Use **DecontX** for balanced contamination removal in standard workflows  
- Use **CellBender** when you can replace your entire preprocessing pipeline

## API Reference

### Main Function

```python
decontx.decontx(
    adata,
    cluster_key="leiden",
    max_iter=500,
    delta=(10.0, 10.0),
    estimate_delta=True,
    convergence=0.001,
    copy=False
)
```

**Parameters:**
- `adata`: AnnData object with raw counts in `.X`
- `cluster_key`: Column in `.obs` containing cluster labels
- `max_iter`: Maximum EM iterations (default: 500)
- `delta`: Beta prior parameters for contamination (default: (10.0, 10.0))
- `estimate_delta`: Whether to estimate delta parameters (default: True)
- `convergence`: Convergence threshold (default: 0.001)
- `copy`: If True, return a modified copy; if False, modify `adata` in place (default: False)
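
To get a feel for what the `delta` default implies, the mean and standard deviation of a Beta distribution can be computed directly from its two parameters. This is a worked sketch using the standard Beta formulas; `beta_summary` is a hypothetical helper, and which element of `delta` corresponds to native versus contaminating counts is not stated here, so the asymmetric example is purely illustrative.

```python
import math

def beta_summary(a, b):
    """Mean and standard deviation of a Beta(a, b) distribution."""
    mean = a / (a + b)
    variance = a * b / ((a + b) ** 2 * (a + b + 1))
    return mean, math.sqrt(variance)

# Default prior, a flat prior, and an asymmetric prior for comparison
for delta in [(10.0, 10.0), (1.0, 1.0), (9.0, 1.0)]:
    mean, sd = beta_summary(*delta)
    print(f"delta={delta}: mean={mean:.3f}, sd={sd:.3f}")
```

The symmetric default is centered on 0.5 with a moderate spread, so it acts as a weak prior; when `estimate_delta=True`, the parameters are estimated rather than held at these values.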

**Returns:**
Results stored in `adata`:
- `adata.obs['decontX_contamination']`: Per-cell contamination estimates
- `adata.layers['decontX_counts']`: Decontaminated count matrix
- `adata.uns['decontX']`: Model parameters and metadata

### Utility Functions

```python
# Get decontaminated counts as array
clean_counts = decontx.get_decontx_counts(adata)

# Get contamination estimates
contamination = decontx.get_decontx_contamination(adata)

# Simple simulation for testing
sim_data = decontx.simulate_contamination(n_cells=1000, n_genes=2000)
```

## Performance Notes

- The Python implementation is ~5-6x slower than the R version
- Performance is acceptable for typical datasets (<50k cells)
- The first call includes Numba JIT compilation overhead; subsequent calls run significantly faster
- Memory usage scales linearly with dataset size
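
As a rough illustration of linear memory scaling, the footprint of a dense matrix can be estimated from its shape. This is back-of-the-envelope arithmetic under stated assumptions (dense storage, 8 bytes per value); whether the package keeps dense or sparse matrices internally is not stated here, and `dense_matrix_gib` is a hypothetical helper.

```python
def dense_matrix_gib(n_cells, n_genes, bytes_per_value=8):
    """Size in GiB of a dense cells x genes matrix (float64 by default)."""
    return n_cells * n_genes * bytes_per_value / 1024**3

for n_cells in (10_000, 25_000, 50_000):
    size = dense_matrix_gib(n_cells, n_genes=20_000)
    print(f"{n_cells:>6} cells x 20000 genes: {size:.2f} GiB")
```

Doubling the number of cells doubles the estimate, and sparse storage of typical single-cell count data would usually need far less.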

## Integration with Existing Workflows

DecontX fits naturally into scanpy workflows:

```python
# Standard scanpy analysis (neighbors graph already computed)
sc.tl.leiden(adata, resolution=0.5)
sc.tl.rank_genes_groups(adata, 'leiden')

# Add decontamination (decontx expects raw counts in .X)
decontx.decontx(adata, cluster_key='leiden')

# Continue with decontaminated data
adata.X = adata.layers['decontX_counts'].copy()
sc.pp.normalize_total(adata)  # Re-normalize clean counts
sc.pp.log1p(adata)  # Re-log transform clean counts
sc.pp.scale(adata)
sc.tl.pca(adata)
sc.pl.pca_variance_ratio(adata, n_pcs=50)
```

## Citation

If you use DecontX in your research, please cite:

> Yang, S., Corbett, S.E., Koga, Y. et al. Decontamination of ambient RNA in single-cell RNA-seq with DecontX. Genome Biol 21, 57 (2020). https://doi.org/10.1186/s13059-020-1950-6

## Issues and Support

- 🐛 Report bugs: [GitHub Issues](https://github.com/NiRuff/decontx-python/issues)
- 📖 Documentation: [Read the Docs](https://decontx-python.readthedocs.io)
- 💬 Questions: [GitHub Discussions](https://github.com/NiRuff/decontx-python/discussions)

## License

MIT License - see LICENSE file for details.
            

Raw data

            {
    "_id": null,
    "home_page": null,
    "name": "decontx-python",
    "maintainer": null,
    "docs_url": null,
    "requires_python": ">=3.8",
    "maintainer_email": "Nicolas Ruffini <nicolas.ruffini@posteo.de>",
    "keywords": "RNA-seq, ambient-RNA, bayesian, bioinformatics, decontamination, scanpy, single-cell",
    "author": null,
    "author_email": "Nicolas Ruffini <nicolas.ruffini@posteo.de>",
    "download_url": "https://files.pythonhosted.org/packages/fb/e9/42942e5895bcf8adf76a63cec9de98ef06d4b6faa1960f1032d34cfaf315/decontx_python-0.2.0.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": null,
    "summary": "Python implementation of DecontX for removing ambient RNA contamination from single-cell RNA-seq data",
    "version": "0.2.0",
    "project_urls": {
        "Bug Tracker": "https://github.com/NiRuff/decontx-python/issues",
        "Changelog": "https://github.com/NiRuff/decontx-python/blob/main/CHANGELOG.md",
        "Documentation": "https://github.com/NiRuff/decontx-python",
        "Homepage": "https://github.com/NiRuff/decontx-python",
        "Repository": "https://github.com/NiRuff/decontx-python"
    },
    "split_keywords": [
        "rna-seq",
        " ambient-rna",
        " bayesian",
        " bioinformatics",
        " decontamination",
        " scanpy",
        " single-cell"
    ],
    "urls": [
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "30ecb6c047c2933c8937f6c353fc82a1e2fa7f8fb2101030a128850cb3d2c0da",
                "md5": "abef0e7bb7205473b5b55ac2f5cdec1b",
                "sha256": "6675845cc3ad856c4c878640260529c101ad0964decbc62343c991a8926616d2"
            },
            "downloads": -1,
            "filename": "decontx_python-0.2.0-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "abef0e7bb7205473b5b55ac2f5cdec1b",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.8",
            "size": 15167,
            "upload_time": "2025-09-08T13:23:09",
            "upload_time_iso_8601": "2025-09-08T13:23:09.195803Z",
            "url": "https://files.pythonhosted.org/packages/30/ec/b6c047c2933c8937f6c353fc82a1e2fa7f8fb2101030a128850cb3d2c0da/decontx_python-0.2.0-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "fbe942942e5895bcf8adf76a63cec9de98ef06d4b6faa1960f1032d34cfaf315",
                "md5": "6d9ec1ec7ff75fe0405d2ba28fbc4a74",
                "sha256": "74b1a049ae9633fa7052b291069caa06d71a80ac98636abc6d92d207f6dd8a10"
            },
            "downloads": -1,
            "filename": "decontx_python-0.2.0.tar.gz",
            "has_sig": false,
            "md5_digest": "6d9ec1ec7ff75fe0405d2ba28fbc4a74",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.8",
            "size": 18721,
            "upload_time": "2025-09-08T13:23:10",
            "upload_time_iso_8601": "2025-09-08T13:23:10.945880Z",
            "url": "https://files.pythonhosted.org/packages/fb/e9/42942e5895bcf8adf76a63cec9de98ef06d4b6faa1960f1032d34cfaf315/decontx_python-0.2.0.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2025-09-08 13:23:10",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "NiRuff",
    "github_project": "decontx-python",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": false,
    "lcname": "decontx-python"
}
        