# DecontX Python
A Python implementation of DecontX for removing ambient RNA contamination from single-cell RNA-seq data, designed for seamless integration with scanpy workflows.
## Overview
DecontX is a Bayesian method to estimate and remove cross-contamination from ambient RNA in droplet-based single-cell RNA-seq data. This Python implementation provides near-perfect parity with the original R version (correlation > 0.999) while enabling pure Python workflows without R dependencies.
**Key Features:**
- 🐍 Pure Python implementation (no R required)
- 🔬 Seamless scanpy integration
- ⚡ Numba-accelerated performance
- 📊 Bayesian contamination estimation per cell
- 🎯 Validated against original R implementation
## Installation
```bash
pip install decontx-python
```
## Quick Start
```python
import scanpy as sc
import decontx
# Load data and apply basic filtering
adata = sc.read_h5ad("pbmc.h5ad")
sc.pp.filter_cells(adata, min_genes=200)
sc.pp.filter_genes(adata, min_cells=3)
# Cluster on a normalized copy so that adata.X keeps the raw counts
# that decontx expects (see API Reference)
norm = adata.copy()
sc.pp.normalize_total(norm)
sc.pp.log1p(norm)
sc.pp.highly_variable_genes(norm)
sc.pp.pca(norm)
sc.pp.neighbors(norm)
sc.tl.leiden(norm)
adata.obs["leiden"] = norm.obs["leiden"]
# Remove ambient RNA contamination
decontx.decontx(adata, cluster_key="leiden")
# Access results
contamination = adata.obs['decontX_contamination']
clean_counts = adata.layers['decontX_counts']
print(f"Mean contamination: {contamination.mean():.1%}")
print(f"Highly contaminated cells (>50%): {(contamination > 0.5).sum()}")
```
## Why DecontX?
Ambient RNA contamination occurs when mRNA released by lysed or stressed cells is captured in droplets together with other cells, causing:
- Cross-contamination between cell types
- Blurred cell type boundaries
- False positive marker gene expression
- Reduced clustering quality
DecontX models each cell as a mixture of:
1. **Native transcripts** from the cell's true type
2. **Contaminating transcripts** from other cell types in the sample
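The mixture idea can be illustrated with a toy calculation. This is a minimal sketch, not the package's inference code: the gene profiles and the contamination fraction `theta` are made-up values, and the cell's expected profile is simply taken as a weighted average of a native and a contaminating distribution.

```python
import numpy as np

# Hypothetical expression profiles over 4 genes (each sums to 1).
native = np.array([0.70, 0.20, 0.10, 0.00])  # the cell's own type
contam = np.array([0.10, 0.10, 0.20, 0.60])  # other cell types in the sample

theta = 0.25  # assumed fraction of contaminating transcripts in this cell

# Expected observed profile: a mixture of the two distributions.
observed = (1 - theta) * native + theta * contam

# Probability that a transcript of each gene is native rather than
# contaminating, given the mixture above.
p_native = (1 - theta) * native / observed

print("observed profile:", observed)
print("P(native | gene):", p_native)
```

In this toy case the fourth gene is absent from the native profile, so every transcript observed for it is attributed to contamination.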
## Method Comparison
Based on our benchmarking study:
| Method | Ambient RNA Removed | Precision | Conservativeness |
|--------|-------------------|-----------|------------------|
| **SoupX** | ~65% | High | Very conservative |
| **DecontX** | ~90% | Medium-High | Balanced |
| **CellBender** | ~90% | Medium | More aggressive |
**Recommendation**:
- Use **SoupX** for maximum safety and minimal false positives
- Use **DecontX** for balanced contamination removal in standard workflows
- Use **CellBender** when you can replace your entire preprocessing pipeline
## API Reference
### Main Function
```python
decontx.decontx(
adata,
cluster_key="leiden",
max_iter=500,
delta=(10.0, 10.0),
estimate_delta=True,
convergence=0.001,
copy=False
)
```
**Parameters:**
- `adata`: AnnData object with raw counts in `.X`
- `cluster_key`: Column in `.obs` containing cluster labels
- `max_iter`: Maximum EM iterations (default: 500)
- `delta`: Beta prior parameters for contamination (default: `(10.0, 10.0)`)
- `estimate_delta`: Whether to estimate delta parameters (default: True)
- `convergence`: Convergence threshold (default: 0.001)
- `copy`: If `True`, return a modified copy; if `False`, modify `adata` in place (default: False)
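To get a feel for what a choice of `delta` implies, the mean and spread of a Beta distribution can be computed directly. The sketch below uses only the standard Beta formulas; `beta_mean_sd` is a hypothetical helper and not part of the package, and which of the two entries corresponds to the contaminated fraction is not stated here, so treat the ordering as an assumption.

```python
import math

def beta_mean_sd(a, b):
    """Mean and standard deviation of a Beta(a, b) distribution."""
    mean = a / (a + b)
    var = a * b / ((a + b) ** 2 * (a + b + 1))
    return mean, math.sqrt(var)

# Compare the default prior with a flat and a skewed alternative.
for delta in [(10.0, 10.0), (1.0, 1.0), (2.0, 18.0)]:
    mean, sd = beta_mean_sd(*delta)
    print(f"delta={delta}: mean={mean:.3f}, sd={sd:.3f}")
```

Larger values of `a + b` concentrate the prior around its mean, so the default `(10.0, 10.0)` is more informative than a flat `(1.0, 1.0)` prior.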
**Returns:**
Results stored in `adata`:
- `adata.obs['decontX_contamination']`: Per-cell contamination estimates
- `adata.layers['decontX_counts']`: Decontaminated count matrix
- `adata.uns['decontX']`: Model parameters and metadata
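Once the results are stored, a per-cluster summary is often a useful first check. The following sketch uses small stand-in arrays rather than a real `AnnData` object; in practice the values would be read from `adata.obs['decontX_contamination']` and the cluster column, and the 0.5 cutoff is an arbitrary choice for illustration.

```python
import numpy as np

# Stand-in values; in practice these would come from
# adata.obs['decontX_contamination'] and adata.obs['leiden'].
contamination = np.array([0.05, 0.10, 0.60, 0.20, 0.55, 0.02])
clusters = np.array(["0", "0", "1", "1", "2", "2"])

threshold = 0.5  # arbitrary cutoff for "highly contaminated"

# Mean contamination and number of cells above the cutoff, per cluster.
summary = {}
for label in np.unique(clusters):
    values = contamination[clusters == label]
    summary[str(label)] = (float(values.mean()), int((values > threshold).sum()))

for label, (mean, n_high) in summary.items():
    print(f"cluster {label}: mean={mean:.1%}, cells above {threshold:.0%}: {n_high}")
```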
### Utility Functions
```python
# Get decontaminated counts as array
clean_counts = decontx.get_decontx_counts(adata)
# Get contamination estimates
contamination = decontx.get_decontx_contamination(adata)
# Simple simulation for testing
sim_data = decontx.simulate_contamination(n_cells=1000, n_genes=2000)
```
## Performance Notes
- The Python implementation is roughly 5-6x slower than the R version
- Performance is acceptable for typical datasets (<50k cells)
- The first call pays a one-time Numba JIT compilation cost; subsequent calls are significantly faster
- Memory usage scales linearly with dataset size
## Integration with Existing Workflows
DecontX fits naturally into scanpy workflows:
```python
import scanpy as sc
import decontx
# Standard scanpy analysis
sc.tl.leiden(adata, resolution=0.5)
sc.tl.rank_genes_groups(adata, 'leiden')
# Add decontamination (decontx expects raw counts in adata.X)
decontx.decontx(adata, cluster_key='leiden')
# Continue with decontaminated data
adata.X = adata.layers['decontX_counts']
sc.pp.normalize_total(adata)  # Re-normalize the clean counts
sc.pp.log1p(adata)  # Re-log transform clean counts
sc.pp.scale(adata)
sc.tl.pca(adata)
sc.pl.pca_variance_ratio(adata, n_pcs=50)
```
## Citation
If you use DecontX in your research, please cite:
> Yang, S., Corbett, S.E., Koga, Y. et al. Decontamination of ambient RNA in single-cell RNA-seq with DecontX. Genome Biol 21, 57 (2020). https://doi.org/10.1186/s13059-020-1950-6
## Issues and Support
- 🐛 Report bugs: [GitHub Issues](https://github.com/NiRuff/decontx-python/issues)
- 📖 Documentation: [Read the Docs](https://decontx-python.readthedocs.io)
- 💬 Questions: [GitHub Discussions](https://github.com/NiRuff/decontx-python/discussions)
## License
MIT License - see LICENSE file for details.