IRescue


Name: IRescue
Version: 1.2.0
Summary: Uncertainty-aware quantification of transposable elements expression in scRNA-seq
Upload time: 2025-10-30 11:20:33
Requires Python: >=3.10
License: MIT License. Copyright (c) 2022-2025 Benedetto Polimeni. Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
Keywords: bioinformatics, transposable-elements, scrna-seq, single-cell, single-cell-rna-seq

![GitHub Workflow Status](https://img.shields.io/github/actions/workflow/status/bodegalab/irescue/python-publish.yml?logo=github&label=build)
[![PyPI](https://img.shields.io/pypi/v/irescue?logo=python)](https://pypi.org/project/irescue/)
[![container](https://img.shields.io/badge/dynamic/json?url=https%3A%2F%2Fquay.io%2Fapi%2Fv1%2Frepository%2Fbiocontainers%2Firescue%2Ftag%2F&query=%24.tags.0.name&logo=docker&label=docker%2Fsingularity&color=%231D63ED)](#container)
[![install with bioconda](https://img.shields.io/badge/install%20with-bioconda-brightgreen.svg?style=flat&logo=anaconda)](https://bioconda.github.io/recipes/irescue/README.html)
[![paper](https://img.shields.io/badge/Nucleic%20Acids%20Res-10.1093%2Fnar%2Fgkae793-orange)](https://doi.org/10.1093/nar/gkae793)
[![zenodo](https://img.shields.io/badge/Zenodo-10.5281/zenodo.13479363-blue)](https://doi.org/10.5281/zenodo.13479363)

# IRescue - <ins>I</ins>nterspersed <ins>Re</ins>peats <ins>s</ins>ingle-<ins>c</ins>ell q<ins>u</ins>antifi<ins>e</ins>r

<img align="right" height="160" src="docs/logo.png">
IRescue quantifies the expression of transposable element (TE) subfamilies in single-cell RNA sequencing (scRNA-seq) data. It performs UMI deduplication with sequencing-error correction (for 10X-like libraries) or read quantification (for UMI-less libraries, e.g. SMART-seq), followed by probabilistic assignment of multi-mapping reads through an Expectation-Maximization (EM) procedure. The output is written as a sparse matrix compatible with Seurat, Scanpy and other toolkits.
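
The idea behind the EM step can be illustrated with a minimal, generic sketch. It assumes that each read (or deduplicated UMI) has already been reduced to the set of TE subfamilies it is compatible with. It is not IRescue's actual implementation, and `em_redistribute` is a hypothetical name.

```python
from collections import Counter


def em_redistribute(compat_sets, max_iter=1000, tol=1e-9):
    """Generic EM for multi-mapping reads (hypothetical helper, not IRescue's code).

    compat_sets: list of sets; each set holds the features one read is compatible with.
    Returns {feature: expected (possibly fractional) read count}.
    """
    if not compat_sets:
        return {}
    features = sorted(set().union(*compat_sets))
    n_reads = len(compat_sets)
    # Start from uniform relative abundances.
    abundance = {f: 1.0 / len(features) for f in features}
    for _ in range(max_iter):
        counts = Counter()
        # E-step: split each read among its compatible features,
        # proportionally to the current abundance estimates.
        for comp in compat_sets:
            total = sum(abundance[f] for f in comp)
            for f in comp:
                counts[f] += abundance[f] / total
        # M-step: re-estimate relative abundances from the expected counts.
        new_abundance = {f: counts[f] / n_reads for f in features}
        delta = max(abs(new_abundance[f] - abundance[f]) for f in features)
        abundance = new_abundance
        if delta < tol:
            break
    # Convert relative abundances back to expected read counts.
    return {f: abundance[f] * n_reads for f in features}
```

For example, with 3 reads unique to `A`, 1 unique to `B` and 4 compatible with both, `em_redistribute([{"A"}] * 3 + [{"B"}] + [{"A", "B"}] * 4)` splits the ambiguous reads 3:1 in favour of `A` and converges to about 6 and 2 expected reads. Note that the resulting counts can be fractional.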

## Content

- [Installation](#installation)
  - [Using conda](#conda)
  - [Using pip](#pip)
  - [Install a pre-release version](#pre)
  - [Build from source](#src)
  - [Container (Docker/Singularity)](#container)
- [Usage](#usage)
  - [Required inputs](#reqin)
  - [Custom annotation](#annot)
  - [UMI-less libraries (e.g. SMART-seq)](#noumi)
  - [Best practices](#best)
  - [Output files](#output_files)
  - [Load IRescue data with Seurat](#seurat)
- [Cite](#cite)

## <a name="installation"></a>Installation

### <a name="conda"></a>Using conda (recommended)

Use `conda` (or `mamba` or `micromamba`) to install IRescue with all its dependencies.

```bash
conda create -n irescue -c conda-forge -c bioconda irescue
```

### <a name="pip"></a>Using pip

If installing with `pip`, the following requirements must be installed manually: `python>=3.10`, `samtools>=1.12`, `bedtools>=2.30.0`, and reasonably recent versions of the GNU utilities (tested with `gawk>=5.0.1`, `coreutils>=8.30` and `gzip>=1.10`).

```bash
pip install irescue
```
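
Before installing with `pip`, a quick check that the external tools are reachable can save a failed run later. The minimal sketch below (the helper name `check_requirements` is hypothetical) only verifies that the executables listed above are on `PATH`, using `shutil.which`, and that the running Python meets the `>=3.10` requirement from the package metadata. It does not parse tool versions, which you can compare by hand (e.g. `samtools --version`); `coreutils` is omitted for brevity.

```python
import shutil
import sys

# Executables named in the pip requirements (coreutils omitted for brevity).
REQUIRED_TOOLS = ("samtools", "bedtools", "gawk", "gzip")


def check_requirements(tools=REQUIRED_TOOLS, min_python=(3, 10)):
    """Return {"python_ok": bool, tool: path_or_None} (hypothetical helper)."""
    report = {"python_ok": sys.version_info[:2] >= min_python}
    for tool in tools:
        # shutil.which returns the full path, or None if the tool is not on PATH.
        report[tool] = shutil.which(tool)
    return report
```

Calling `print(check_requirements())` lists each tool with its resolved path; any `None` value marks a tool that still needs to be installed.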

#### <a name="pre"></a>Install a pre-release version

You can install or upgrade to a pre-release using `pip`:
```bash
pip install -U --pre irescue
```

### <a name="src"></a>Build from source

Building the package directly from source lets you try out features and bug fixes ahead of the next release. As with `pip`, some requirements must be installed manually. Be aware that builds from development branches may be unstable.

```bash
git clone https://github.com/bodegalab/irescue
cd irescue
pip install .
```

### <a name="container"></a>Container (Docker/Singularity)

Docker and Singularity containers are available for each conda release of IRescue. Choose the `TAG` corresponding to the desired IRescue version [from the Biocontainers repository](https://quay.io/repository/biocontainers/irescue?tab=tags) and pull or execute the container with Docker or Singularity:

```bash
# Get latest biocontainers tag (with curl and python3, otherwise check the above link for the desired version/tag)
TAG=$(curl -s -X GET https://quay.io/api/v1/repository/biocontainers/irescue/tag/ | python3 -c 'import json,sys;obj=json.load(sys.stdin);print(obj["tags"][0]["name"])')

# Run with Docker
docker run quay.io/biocontainers/irescue:$TAG irescue --help

# Run with Singularity
singularity exec https://depot.galaxyproject.org/singularity/irescue:$TAG irescue --help
```

## <a name="usage"></a>Usage

Inspect all parameters:
```sh
irescue --help
```

Quick start:
```sh
irescue -b genome_alignments.bam -g hg38
```

### <a name="reqin"></a>Required inputs

A BAM file, sorted by coordinate and indexed, with cell barcodes and (optionally) UMI sequences stored as tags (e.g. `CB` and `UR`, configurable with `--cb-tag` and `--umi-tag`).

Such a BAM file can be obtained by aligning reads with [STARsolo](https://github.com/alexdobin/STAR/blob/master/docs/STARsolo.md). It is highly recommended to keep secondary alignments in the BAM file, since the EM procedure uses them to redistribute multi-mapping reads (use at least `--outFilterMultimapNmax 100 --winAnchorMultimapNmax 100`). Also remember to output all the required SAM attributes (e.g. `--outSAMattributes NH HI AS nM NM MD jM jI XS MC ch cN CR CY UR UY GX GN CB UB sM sS sQ`).
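
Before a long run, it can help to check that the BAM actually carries the expected tags and secondary alignments. The following is a minimal sketch, assuming the alignments are streamed as SAM text (for example with `samtools view genome_alignments.bam | head -n 100000`). The helper `summarize_sam` is hypothetical, and the default tag names simply mirror the examples above.

```python
def summarize_sam(lines, cb_tag="CB", umi_tag="UR"):
    """Count tag presence and multi-mapping in SAM text records (hypothetical helper).

    Counts are per alignment record, not per unique read.
    """
    stats = {"records": 0, "with_cb": 0, "with_umi": 0,
             "multimapped": 0, "secondary": 0}
    for line in lines:
        # Skip header lines and blank lines.
        if not line.strip() or line.startswith("@"):
            continue
        fields = line.rstrip("\n").split("\t")
        # Optional fields start at column 12 and look like TAG:TYPE:VALUE.
        tags = {}
        for field in fields[11:]:
            parts = field.split(":", 2)
            if len(parts) == 3:
                tags[parts[0]] = parts[2]
        stats["records"] += 1
        # A value of "-" is treated as missing (some aligners write it for unassigned reads).
        if tags.get(cb_tag, "-") != "-":
            stats["with_cb"] += 1
        if tags.get(umi_tag, "-") != "-":
            stats["with_umi"] += 1
        # NH: number of reported alignments for this read (SAM specification).
        if int(tags.get("NH", "1")) > 1:
            stats["multimapped"] += 1
        # FLAG bit 0x100 marks a secondary alignment.
        if int(fields[1]) & 0x100:
            stats["secondary"] += 1
    return stats
```

Passing the piped lines (e.g. `sys.stdin`) to `summarize_sam` gives a quick overview: a `with_cb` count near zero suggests that `--cb-tag` needs changing, and a `secondary` count of zero suggests that secondary alignments were not kept.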

### <a name="annot"></a>Custom annotation

A custom repeat annotation can be provided as a BED file with at least four columns (e.g. `-r TE.bed`), where the fourth column holds the TE feature name (e.g. the subfamily name).
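
A quick sanity check of a custom annotation can catch malformed lines before a long run. This minimal sketch, with the hypothetical helper `validate_bed`, only checks the four-column requirement stated above plus standard BED coordinate rules (0-based, half-open intervals). Requiring `start < end` (non-empty intervals) is an assumption, and the sketch does not reproduce IRescue's own parsing.

```python
def validate_bed(lines):
    """Return a list of (line_number, problem) tuples for a BED annotation (hypothetical helper)."""
    problems = []
    for lineno, line in enumerate(lines, start=1):
        line = line.rstrip("\n")
        # Skip blank lines and common BED header/comment lines.
        if not line or line.startswith(("#", "track", "browser")):
            continue
        cols = line.split("\t")
        if len(cols) < 4:
            problems.append((lineno, "fewer than 4 columns"))
            continue
        chrom, start, end, name = cols[:4]
        if not chrom:
            problems.append((lineno, "empty chromosome"))
        try:
            s, e = int(start), int(end)
        except ValueError:
            problems.append((lineno, "non-integer coordinates"))
        else:
            # Assumption: every TE interval is non-empty.
            if s < 0 or s >= e:
                problems.append((lineno, "expected 0 <= start < end"))
        if not name.strip():
            problems.append((lineno, "empty feature name in column 4"))
    return problems
```

For example, `with open("TE.bed") as fh: print(validate_bed(fh))` prints an empty list when no problems are found.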

### <a name="noumi"></a>UMI-less libraries (e.g. SMART-seq)

**Requires version `1.2.0b2` (pre-release) or later.**

Use `--no-umi` to ignore UMI sequences and skip UMI deduplication entirely:
```bash
irescue -b genome_alignments.bam -g hg38 --no-umi
```

NB: in some datasets the cell barcode is stored in the `RG` (read group) tag instead of `CB`. In that case, add `--cb-tag RG`.
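
For context, UMI deduplication with error correction is the step that `--no-umi` turns off. The sketch below is a generic illustration, not IRescue's exact algorithm: it merges a UMI into a more abundant neighbour that differs at exactly one position when `count(parent) >= 2 * count(child) - 1`, a rule borrowed from the "directional" method of UMI-tools. The helpers `hamming_is_one` and `dedup_umis` are hypothetical.

```python
def hamming_is_one(a, b):
    """True if two equal-length UMIs differ at exactly one position."""
    return len(a) == len(b) and sum(x != y for x, y in zip(a, b)) == 1


def dedup_umis(umi_counts):
    """Collapse the UMIs of one cell/feature pair (hypothetical helper).

    umi_counts: {umi_sequence: read_count}
    Returns {representative_umi: merged_read_count}, one entry per molecule.
    """
    # Visit UMIs from most to least abundant (ties broken alphabetically).
    order = sorted(umi_counts, key=lambda u: (-umi_counts[u], u))
    assigned = set()
    molecules = {}
    for head in order:
        if head in assigned:
            continue
        # Start a new molecule and absorb UMIs reachable through directional edges.
        assigned.add(head)
        molecules[head] = 0
        stack = [head]
        while stack:
            node = stack.pop()
            molecules[head] += umi_counts[node]
            for other in order:
                if (other not in assigned and hamming_is_one(node, other)
                        and umi_counts[node] >= 2 * umi_counts[other] - 1):
                    assigned.add(other)
                    stack.append(other)
    return molecules
```

For example, `{"AAAA": 10, "AAAT": 2, "AATT": 1, "GGGG": 3}` collapses to two molecules, since `AAAT` and `AATT` are treated as sequencing errors of `AAAA`. The pairwise scan is quadratic, which is fine for a sketch but not for production data.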

### <a name="best"></a>Best practices

- If you have already obtained gene-level counts (using STARsolo, Cell Ranger, Alevin, Kallisto or other tools), it is advisable to provide the list of whitelisted cell barcodes as a text file (`-w barcodes.tsv`). Processing only viable cells significantly improves performance.

- For optimal run time, use at least 4-8 CPUs, e.g. `-p 8`.
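
A whitelist can usually be taken from the filtered barcode file of the gene-level run. This minimal sketch, with the hypothetical helpers `read_barcodes` and `write_whitelist`, reads a plain or gzip-compressed `barcodes.tsv`, keeps the first column, drops duplicates while preserving order, and writes a plain-text file with one barcode per line. It assumes the barcodes must match the BAM cell barcode tag exactly, so check for suffixes such as the `-1` that Cell Ranger usually appends.

```python
import gzip


def read_barcodes(path):
    """Read unique barcodes (first column) from a plain or gzipped barcodes.tsv."""
    opener = gzip.open if path.endswith(".gz") else open
    seen, barcodes = set(), []
    with opener(path, "rt") as fh:
        for line in fh:
            barcode = line.strip().split("\t")[0]
            if barcode and barcode not in seen:
                seen.add(barcode)
                barcodes.append(barcode)
    return barcodes


def write_whitelist(src, dst):
    """Write one barcode per line to dst; return the number of barcodes written."""
    barcodes = read_barcodes(src)
    with open(dst, "w") as out:
        out.writelines(barcode + "\n" for barcode in barcodes)
    return len(barcodes)
```

Typical sources are `Solo.out/Gene/filtered/barcodes.tsv` (STARsolo) or `outs/filtered_feature_bc_matrix/barcodes.tsv.gz` (Cell Ranger); the resulting file is then passed as, e.g., `irescue -b genome_alignments.bam -g hg38 -w whitelist.txt -p 8`.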

### <a name="output_files"></a>Output files

IRescue writes TE counts as a sparse matrix, readable by [Seurat](https://github.com/satijalab/seurat) or [Scanpy](https://github.com/scverse/scanpy), into the `counts/` subdirectory. Optional outputs include `ec_dump.tsv.gz`, a description of the equivalence classes with UMI deduplication statistics, and `tmp/`, a subdirectory of temporary files for debugging (kept only with the `--keeptmp` parameter). Highly detailed logging, printed to stderr, can be enabled with `-vv`.

```
irescue_out/
├── counts/
│   ├── barcodes.tsv.gz
│   ├── features.tsv.gz
│   └── matrix.mtx.gz
├── ec_dump.tsv.gz
└── tmp/
```
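
To inspect the counts without Seurat or Scanpy, the `counts/` directory can be read with the Python standard library alone. This minimal sketch assumes the usual 10x-style layout: a Matrix Market coordinate file with features as rows and barcodes as columns, and feature names in the first column of `features.tsv.gz` (consistent with `gene.column = 1` in the Seurat example, but not guaranteed). `read_counts` is a hypothetical helper; for real analyses, tools such as `scipy.io.mmread` or Scanpy are the usual route.

```python
import gzip
import os
from collections import defaultdict


def read_counts(counts_dir):
    """Return {barcode: {feature: count}} from a 10x-style counts/ directory (hypothetical helper)."""
    def read_lines(name):
        with gzip.open(os.path.join(counts_dir, name), "rt") as fh:
            return [line.rstrip("\n") for line in fh if line.strip()]

    # Names are taken from the first column of each file.
    features = [line.split("\t")[0] for line in read_lines("features.tsv.gz")]
    barcodes = [line.split("\t")[0] for line in read_lines("barcodes.tsv.gz")]
    matrix = defaultdict(dict)
    size_line_seen = False
    for line in read_lines("matrix.mtx.gz"):
        if line.startswith("%"):
            continue  # Matrix Market banner and comment lines
        if not size_line_seen:
            size_line_seen = True  # "n_rows n_cols n_entries" size line
            continue
        row, col, value = line.split()
        # Matrix Market indices are 1-based; rows are features, columns are barcodes.
        # Values are parsed as float in case counts are fractional.
        matrix[barcodes[int(col) - 1]][features[int(row) - 1]] = float(value)
    return dict(matrix)
```

For example, `read_counts("irescue_out/counts")` returns a nested dictionary; cells without any non-zero entry do not appear in it.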

### <a name="seurat"></a>Load IRescue data with Seurat

Seurat's support for multiple assays makes it possible to add TE counts to an existing Seurat object that contains gene expression data:

```R
# import TE counts from the IRescue output (the directory containing matrix.mtx.gz)
te.data <- Seurat::Read10X('./irescue_out/counts/', gene.column = 1, cell.column = 1)

# create a Seurat assay from TE counts
te.assay <- Seurat::CreateAssayObject(te.data)

# keep only the cells already present in the Seurat object (in case it has been filtered)
te.assay <- subset(te.assay, cells = intersect(colnames(te.assay), colnames(seurat_object)))

# add the assay to the Seurat object
seurat_object[['TE']] <- te.assay
```

The result will be something like this:
```
An object of class Seurat 
32276 features across 42513 samples within 2 assays 
Active assay: RNA (31078 features, 0 variable features)
 1 other assay present: TE
```

From here, TE expression can be normalized on its own. To normalize TE counts relative to gene counts or to combined TE+gene counts, either normalize manually or merge the assays. Dimensionality reductions can then be computed from TE, gene or combined TE+gene expression.
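
As one example of normalizing "manually" against combined TE+gene counts, the sketch below applies the same formula as Seurat's `LogNormalize` (each count divided by the cell's total, multiplied by a scale factor of 10,000, then `log1p`), but uses the gene total plus the TE total as the per-cell library size. Using the combined total is an assumption for illustration, not a recommendation from the IRescue authors, and `log_normalize_te` is a hypothetical helper working on plain dictionaries.

```python
import math


def log_normalize_te(te_counts, gene_totals, scale_factor=1e4):
    """Log-normalize TE counts per cell using gene + TE counts as library size.

    te_counts:   {cell: {te_feature: count}}
    gene_totals: {cell: total gene (RNA) count for that cell}
    """
    normalized = {}
    for cell, feats in te_counts.items():
        # Assumed library size: gene counts plus TE counts for this cell.
        library_size = gene_totals.get(cell, 0) + sum(feats.values())
        if library_size == 0:
            normalized[cell] = {f: 0.0 for f in feats}
            continue
        normalized[cell] = {
            f: math.log1p(c / library_size * scale_factor) for f, c in feats.items()
        }
    return normalized
```

For example, a cell with 10 `L1HS` counts and 990 gene counts has a library size of 1,000, so its normalized `L1HS` value is `log1p(100)`.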

## <a name="cite"></a>Cite

Benedetto Polimeni, Federica Marasca, Valeria Ranzani, Beatrice Bodega, **IRescue: uncertainty-aware quantification of transposable elements expression at single cell level**, *Nucleic Acids Research*, 2024; https://doi.org/10.1093/nar/gkae793

            

Raw data

            {
    "_id": null,
    "home_page": null,
    "name": "IRescue",
    "maintainer": null,
    "docs_url": null,
    "requires_python": ">=3.10",
    "maintainer_email": null,
    "keywords": "bioinformatics, transposable-elements, scrna-seq, single-cell, single-cell-rna-seq",
    "author": null,
    "author_email": "Benedetto Polimeni <polimeni@ingm.org>",
    "download_url": "https://files.pythonhosted.org/packages/73/12/ccaa65d4a25b68c29e7e606bd77ef6db2a4f51dc63c934c6b91aac966725/irescue-1.2.0.tar.gz",
    "platform": null,
    "description": "![GitHub Workflow Status](https://img.shields.io/github/actions/workflow/status/bodegalab/irescue/python-publish.yml?logo=github&label=build)\n[![PyPI](https://img.shields.io/pypi/v/irescue?logo=python)](https://pypi.org/project/irescue/)\n[![container](https://img.shields.io/badge/dynamic/json?url=https%3A%2F%2Fquay.io%2Fapi%2Fv1%2Frepository%2Fbiocontainers%2Firescue%2Ftag%2F&query=%24.tags.0.name&logo=docker&label=docker%2Fsingularity&color=%231D63ED)](#container)\n[![install with bioconda](https://img.shields.io/badge/install%20with-bioconda-brightgreen.svg?style=flat&logo=anaconda)](https://bioconda.github.io/recipes/irescue/README.html)\n[![paper](https://img.shields.io/badge/Nucleic%20Acids%20Res-10.1093%2Fnar%2Fgkae793-orange)](https://doi.org/10.1093/nar/gkae793)\n[![zenodo](https://img.shields.io/badge/Zenodo-10.5281/zenodo.13479363-blue)](https://doi.org/10.5281/zenodo.13479363)\n\n# IRescue - <ins>I</ins>nterspersed <ins>Re</ins>peats <ins>s</ins>ingle-<ins>c</ins>ell q<ins>u</ins>antifi<ins>e</ins>r\n\n<img align=\"right\" height=\"160\" src=\"docs/logo.png\">\nIRescue quantifies the expression fo transposable elements (TEs) subfamilies in single cell RNA sequencing (scRNA-seq) data, performing UMI-deduplication with sequencing errors correction (for 10X-like libraries) or read quantification (for UMI-less libraries, e.g. SMART-seq) followed by probabilistic assignment of multi-mapping reads by an Expectation-Maximization (EM) procedure. The output is written on a sparse matrix compatible with Seurat, Scanpy and other toolkits.\n\n## Content\n\n- [Installation](#installation)\n  - [Using conda](#conda)\n  - [Using pip](#pip)\n  - [Install a pre-release version](#pre)\n  - [Build from source](#src)\n  - [Container (Docker/Singularity)](#container)\n- [Usage](#usage)\n  - [Required inputs](#reqin)\n  - [Custom annotation](#annot)\n  - [UMI-less libraries (e.g. 
SMART-seq)](#noumi)\n  - [Best practices](#best)\n  - [Output files](#output_files)\n  - [Load IRescue data with Seurat](#seurat)\n- [Cite](#cite)\n\n## <a name=\"installation\"></a>Installation\n\n### <a name=\"conda\"></a>Using conda (recommended)\n\nUse `conda` (or `mamba` or `micromamba`) to install IRescue with all its dependencies.\n\n```bash\nconda create -n irescue -c conda-forge -c bioconda irescue\n```\n\n### <a name=\"pip\"></a>Using pip\n\nIf installing with `pip`, the following requirements must be installed manually: `python>=3.9`, `samtools>=1.12`, `bedtools>=2.30.0`, and fairly recent versions of the GNU utilities are required (tested on `gawk>=5.0.1`, `coreutils>=8.30` and `gzip>=1.10`).\n\n```bash\npip install irescue\n```\n\n#### <a name=\"pre\"></a>Install a pre-release version\n\nYou can install or upgrade to a pre-release using `pip`:\n```bash\npip install -U --pre irescue\n```\n\n### <a name=\"src\"></a>Build from source\n\nBy building the package directly from the source, you can try out the features and bug fixes that will be implemented in the future release. As above, you need to install some requirements manually. Be aware that builds from the development branches may be unstable.\n\n```bash\ngit clone https://github.com/bodegalab/irescue\ncd irescue\npip install .\n```\n\n### <a name=\"container\"></a>Container (Docker/Singularity)\n\nDocker and Singularity containers are available for each conda release of IRescue. 
Choose the `TAG` corresponding to the desired IRescue version [from the Biocontainers repository](https://quay.io/repository/biocontainers/irescue?tab=tags) and pull or execute the container with Docker or Singularity:\n\n```bash\n# Get latest biocontainers tag (with curl and python3, otherwise check the above link for the desired version/tag)\nTAG=$(curl -s -X GET https://quay.io/api/v1/repository/biocontainers/irescue/tag/ | python3 -c 'import json,sys;obj=json.load(sys.stdin);print(obj[\"tags\"][0][\"name\"])')\n\n# Run with Docker\ndocker run quay.io/biocontainers/irescue:$TAG irescue --help\n\n# Run with Singularity\nsingularity exec https://depot.galaxyproject.org/singularity/irescue:$TAG irescue --help\n```\n\n## <a name=\"usage\"></a>Usage\n\nInspect all parameters:\n```sh\nirescue --help\n```\n\nQuick start:\n```sh\nirescue -b genome_alignments.bam -g hg38\n```\n\n### <a name=\"reqin\"></a>Required inputs\n\nBAM file sorted by coordinate, indexed and annotated with cell barcode and, optionally, UMI sequences as tags (e.g. `CB` and `UR` tags, configurable with `--cb-tag` and `--umi-tag`)\n\nIt can be obtained by aligning reads using [STARsolo](https://github.com/alexdobin/STAR/blob/master/docs/STARsolo.md). It is highly recommended to keep secondary alignments in BAM file, that will be used in the EM procedure to redistribute multi-mapping reads (at least `--outFilterMultimapNmax 100 --winAnchorMultimapNmax 100`), and remember to output all the needed SAM attributes (e.g. `--outSAMattributes NH HI AS nM NM MD jM jI XS MC ch cN CR CY UR UY GX GN CB UB sM sS sQ`).\n\n### <a name=\"annot\"></a>Custom annotation\n\nA custom repeats annotation can be provided in BED format (e.g. `-r TE.bed`) of at least four columns, with the fourth column being the TE feature name (e.g. subfamily name).\n\n### <a name=\"noumi\"></a>UMI-less libraries (e.g. 
SMART-seq)\n\n**Only in pre-release version `1.2.0b2` or later.**\n\nYou can ignore the UMI sequence (thus skipping UMI-deduplication entirely) with `--no-umi`:\n```bash\nirescue -b genome_alignments.bam -g hg38 --no-umi\n```\n\nNB: the BAM tag for cell barcodes sometimes is `RG` instead of `CB`. In such case, add the parameter `--cb-tag RG`.\n\n### <a name=\"best\"></a>Best practices\n\n- If you already obtained gene-level counts (using STARsolo, Cell Ranger, Alevin, Kallisto or other tools), it is advised to provide the whitelisted cell barcodes list as a text file (`-w barcodes.tsv`). This will significantly improve performance by processing viable cells only.\n\n- For optimal run time, use at least 4-8 cpus, e.g.: `-p 8`.\n\n### <a name=\"output_files\"></a>Output files\n\nIRescue generates TE counts in a sparse matrix readable by [Seurat](https://github.com/satijalab/seurat) or [Scanpy](https://github.com/scverse/scanpy) into a `counts/` subdirectory. Optional outputs include a description of equivalence classes with UMI deduplication stats `ec_dump.tsv.gz` and a subdirectory of temporary files `tmp/` for debugging purpose (only kept with the `--keeptmp` parameter). 
You can enable a highly detailed logging with `-vv` (printed to the terminal's stderr).\n\n```\nirescue_out/\n\u251c\u2500\u2500 counts/\n\u2502\u00a0\u00a0 \u251c\u2500\u2500 barcodes.tsv.gz\n\u2502\u00a0\u00a0 \u251c\u2500\u2500 features.tsv.gz\n\u2502\u00a0\u00a0 \u2514\u2500\u2500 matrix.mtx.gz\n\u251c\u2500\u2500 ec_dump.tsv.gz\n\u2514\u2500\u2500 tmp/\n```\n\n### <a name=\"seurat\"></a>Load IRescue data with Seurat\n\nMultiple assays can be exploited to integrate TE counts into an existing Seurat object containing gene expression data:\n\n```R\n# import TE counts from IRescue output directory\nte.data <- Seurat::Read10X('./IRescue_out/', gene.column = 1, cell.column = 1)\n\n# create Seurat assay from TE counts\nte.assay <- Seurat::CreateAssayObject(te.data)\n\n# subset the assay by the cells already present in the Seurat object (in case it has been filtered)\nte.assay <- subset(te.assay, colnames(te.assay)[which(colnames(te.assay) %in% colnames(seurat_object))])\n\n# add the assay in the Seurat object\nseurat_object[['TE']] <- irescue.assay\n```\n\nThe result will be something like this:\n```\nAn object of class Seurat \n32276 features across 42513 samples within 2 assays \nActive assay: RNA (31078 features, 0 variable features)\n 1 other assay present: TE\n```\n\nFrom here, TE expression can be normalized. To normalize according to gene counts or TE+gene counts, normalize manually or merge the assays. Reductions can be made using TE, gene or TE+gene expression.\n\n## <a name=\"cite\"></a>Cite\n\nBenedetto Polimeni, Federica Marasca, Valeria Ranzani, Beatrice Bodega, **IRescue: uncertainty-aware quantification of transposable elements expression at single cell level**, *Nucleic Acids Research*, 2024; https://doi.org/10.1093/nar/gkae793\n",
    "bugtrack_url": null,
    "license": "MIT License\n        \n        Copyright (c) 2022-2025 Benedetto Polimeni\n        \n        Permission is hereby granted, free of charge, to any person obtaining a copy\n        of this software and associated documentation files (the \"Software\"), to deal\n        in the Software without restriction, including without limitation the rights\n        to use, copy, modify, merge, publish, distribute, sublicense, and/or sell\n        copies of the Software, and to permit persons to whom the Software is\n        furnished to do so, subject to the following conditions:\n        \n        The above copyright notice and this permission notice shall be included in all\n        copies or substantial portions of the Software.\n        \n        THE SOFTWARE IS PROVIDED \"AS IS\", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR\n        IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,\n        FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE\n        AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER\n        LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,\n        OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE\n        SOFTWARE.\n        ",
    "summary": "Uncertainty-aware quantification of transposable elements expression in scRNA-seq",
    "version": "1.2.0",
    "project_urls": {
        "Bug Tracker": "https://github.com/bodegalab/irescue/issues",
        "Documentation": "https://github.com/bodegalab/irescue#readme",
        "Source Code": "https://github.com/bodegalab/irescue"
    },
    "split_keywords": [
        "bioinformatics",
        " transposable-elements",
        " scrna-seq",
        " single-cell",
        " single-cell-rna-seq"
    ],
    "urls": [
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "17744fa4c30132ea3628d01107d047412c2b9601a553ba12d5c3f218563abea1",
                "md5": "53f8dfa1645b2db8f47356f603cf5881",
                "sha256": "2e707fd6543623e31d0e793c4af1b8b45285868f0711e3bf2d12e094ff2c28c9"
            },
            "downloads": -1,
            "filename": "irescue-1.2.0-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "53f8dfa1645b2db8f47356f603cf5881",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.10",
            "size": 24572,
            "upload_time": "2025-10-30T11:20:32",
            "upload_time_iso_8601": "2025-10-30T11:20:32.285694Z",
            "url": "https://files.pythonhosted.org/packages/17/74/4fa4c30132ea3628d01107d047412c2b9601a553ba12d5c3f218563abea1/irescue-1.2.0-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "7312ccaa65d4a25b68c29e7e606bd77ef6db2a4f51dc63c934c6b91aac966725",
                "md5": "5f69469e1a1f756bd0ed545f37dfe436",
                "sha256": "79c0902df3df842a2b4b9d5760d79a0b8808002fcdca11d4c46322bc1812acdf"
            },
            "downloads": -1,
            "filename": "irescue-1.2.0.tar.gz",
            "has_sig": false,
            "md5_digest": "5f69469e1a1f756bd0ed545f37dfe436",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.10",
            "size": 21730,
            "upload_time": "2025-10-30T11:20:33",
            "upload_time_iso_8601": "2025-10-30T11:20:33.557640Z",
            "url": "https://files.pythonhosted.org/packages/73/12/ccaa65d4a25b68c29e7e606bd77ef6db2a4f51dc63c934c6b91aac966725/irescue-1.2.0.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2025-10-30 11:20:33",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "bodegalab",
    "github_project": "irescue",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": true,
    "lcname": "irescue"
}
        