scib

- Name: scib
- Version: 1.1.5
- Home page: https://github.com/theislab/scib
- Summary: Evaluating single-cell data integration methods
- Upload time: 2024-04-01 15:53:58
- Maintainer: None
- Docs URL: None
- Author: Malte D. Luecken, Maren Buettner, Daniel C. Strobl, Michaela F. Mueller
- Requires Python: >=3.8
- License: MIT
- Keywords: benchmarking, single cell, data integration
- Requirements: No requirements were recorded.

[![Stars](https://img.shields.io/github/stars/theislab/scib?logo=GitHub&color=yellow)](https://github.com/theislab/scib/stargazers)
[![PyPI](https://img.shields.io/pypi/v/scib?logo=PyPI)](https://pypi.org/project/scib)
[![PyPIDownloads](https://pepy.tech/badge/scib)](https://pepy.tech/project/scib)
[![Build Status](https://github.com/theislab/scib/actions/workflows/test.yml/badge.svg)](https://github.com/theislab/scib/actions/workflows/test.yml)
[![Documentation](https://readthedocs.org/projects/scib/badge/?version=latest)](https://scib.readthedocs.io/en/latest/?badge=latest)
[![codecov](https://codecov.io/gh/theislab/scib/branch/main/graph/badge.svg?token=M1nuTpAxyS)](https://codecov.io/gh/theislab/scib)
[![pre-commit](https://img.shields.io/badge/pre--commit-enabled-brightgreen?logo=pre-commit&logoColor=white)](https://github.com/pre-commit/pre-commit)

# Benchmarking atlas-level data integration in single-cell genomics

This repository contains the code for the `scib` package used in our benchmarking study of data integration tools.
In [our study](https://doi.org/10.1038/s41592-021-01336-8), we benchmark 16 methods (see Integration Tools) with 4 combinations of
preprocessing steps, leading to 68 method combinations, on 85 batches of gene expression and chromatin accessibility
data.

![Workflow](https://raw.githubusercontent.com/theislab/scib/main/docs/source/_static/figure.png)

## Resources

- The git repository of the [`scib` package](https://github.com/theislab/scib) and
  its [documentation](https://scib.readthedocs.io/).
- The reusable pipeline we used in the study can be found in the
  separate [scib pipeline](https://github.com/theislab/scib-pipeline.git) repository. It is reproducible and automates
  the computation of preprocessing combinations, integration methods and benchmarking metrics.
- On our [website](https://theislab.github.io/scib-reproducibility) we visualise the results of the study.
- For reproducibility and visualisation we have a dedicated
  repository: [scib-reproducibility](https://github.com/theislab/scib-reproducibility).

### Please cite:

Luecken, M.D., Büttner, M., Chaichoompu, K. et al. Benchmarking atlas-level data integration in single-cell genomics.
Nat Methods 19, 41–50 (2022). [https://doi.org/10.1038/s41592-021-01336-8](https://doi.org/10.1038/s41592-021-01336-8)

## Package: scib

We created the Python package `scib`, which uses `scanpy` to streamline the integration of single-cell datasets and
the evaluation of the results. The package contains several modules for preprocessing an `anndata` object, running integration
methods and evaluating the results using a number of metrics. For preprocessing, `scib.preprocessing` (or `scib.pp`)
contains functions for normalising, scaling or batch-aware selection of highly variable genes. Functions for the
integration methods are in `scib.integration` (or `scib.ig`), and metrics are in
`scib.metrics` (or `scib.me`).

The `scib` Python package is available on [PyPI](https://pypi.org/) and can be installed with:

```commandline
pip install scib
```

Import `scib` in Python:

```python
import scib
```
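
To confirm that the installation worked, a short check along the following lines can help. This is a minimal sketch using only the standard library; the helper names `scib_version` and `short_aliases_match` are hypothetical and not part of `scib`. It assumes that the short names `pp`, `ig` and `me` are attributes of the top-level `scib` module, as described above.

```python
from importlib import import_module
from importlib.metadata import PackageNotFoundError, version


def scib_version():
    """Return the installed scib version string, or None if scib is not installed."""
    try:
        return version("scib")
    except PackageNotFoundError:
        return None


def short_aliases_match():
    """Check that scib.pp, scib.ig and scib.me refer to the full module names.

    Returns None when scib cannot be imported.
    """
    try:
        scib = import_module("scib")
    except ImportError:
        return None
    # Alias -> full module name, as stated in the README text (assumption).
    pairs = {"pp": "preprocessing", "ig": "integration", "me": "metrics"}
    return all(
        getattr(scib, full, None) is not None
        and getattr(scib, short, None) is getattr(scib, full)
        for short, full in pairs.items()
    )


if __name__ == "__main__":
    print("scib version:", scib_version())
    print("aliases match:", short_aliases_match())
```

Both helpers return `None` instead of raising when `scib` is missing, so the check can be run safely in any environment.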

### Optional Dependencies

The package has optional dependencies that need to be installed manually when required.
These include R dependencies (`rpy2`, `anndata2ri`), which additionally require the R packages of the integration methods to be installed.
All optional dependencies are listed in `setup.cfg` under `[options.extras_require]` and can be installed through pip.

For example, to install the `rpy2` and `bbknn` dependencies:
```commandline
pip install 'scib[rpy2,bbknn]'
```
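
Before running a method that relies on an optional dependency, it can be useful to check which optional Python modules are importable. The following is a minimal sketch using only the standard library; the helper `missing_optional_modules` is hypothetical, and the module list is an assumption taken from the names mentioned above, not the full content of `setup.cfg`.

```python
from importlib.util import find_spec

# Import names mentioned in this README (assumption: import name == package name).
OPTIONAL_MODULES = ("rpy2", "anndata2ri", "bbknn")


def missing_optional_modules(names=OPTIONAL_MODULES):
    """Return the names from `names` that cannot be found for import."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_optional_modules()
    if missing:
        print("Missing optional modules:", ", ".join(missing))
    else:
        print("All listed optional modules are available.")
```

`find_spec` only locates a module without importing it, so the check stays cheap even for heavy packages such as `rpy2`.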

Optional dependencies outside of Python need to be installed separately.
For instance, in order to run kBET, install it via the following commands in R:

```R
install.packages('remotes')
remotes::install_github('theislab/kBET')
```

## Metrics

We implemented different metrics for evaluating batch correction and biological conservation in the `scib.metrics`
module.

<table class="docutils align-default">
  <colgroup>
    <col style="width: 50%" />
    <col style="width: 50%" />
  </colgroup>
  <thead>
    <tr class="row-odd"><th class="head"><p>Biological Conservation</p></th>
      <th class="head"><p>Batch Correction</p></th>
    </tr>
  </thead>
  <tbody>
    <tr class="row-even" >
      <td><ul class="simple">
        <li><p>Cell type ASW</p></li>
        <li><p>Cell cycle conservation</p></li>
        <li><p>Graph cLISI</p></li>
        <li><p>Adjusted Rand index (ARI) for cell label</p></li>
        <li><p>Normalised mutual information (NMI) for cell label</p></li>
        <li><p>Highly variable gene conservation</p></li>
        <li><p>Isolated label ASW</p></li>
        <li><p>Isolated label F1</p></li>
        <li><p>Trajectory conservation</p></li>
      </ul></td>
      <td><ul class="simple">
        <li><p>Batch ASW</p></li>
        <li><p>Principal component regression</p></li>
        <li><p>Graph iLISI</p></li>
        <li><p>Graph connectivity</p></li>
        <li><p>kBET (k-nearest neighbour batch effect test)</p></li>
      </ul></td>
    </tr>
  </tbody>
</table>

For a detailed description of the metrics implemented in this package, please see our
[publication](https://doi.org/10.1038/s41592-021-01336-8) and the package [documentation](https://scib.readthedocs.io/).
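
To give an intuition for one of the label-based metrics listed above, the adjusted Rand index (ARI) compares a clustering with known cell labels and corrects the agreement for chance. The following is a minimal pure-Python sketch of the standard ARI formula; it is an illustration only and not the implementation used in `scib.metrics`, and the function name `adjusted_rand_index` is hypothetical.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_a, labels_b):
    """Compute the adjusted Rand index between two labelings of the same cells."""
    if len(labels_a) != len(labels_b):
        raise ValueError("Both labelings must have the same length.")

    total_pairs = comb(len(labels_a), 2)
    if total_pairs == 0:
        return 1.0  # fewer than two cells: trivially identical partitions

    # Pairs of cells that share a group in both labelings (contingency table cells).
    sum_cells = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    # Pairs of cells that share a group within each labeling separately.
    sum_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    sum_b = sum(comb(c, 2) for c in Counter(labels_b).values())

    expected = sum_a * sum_b / total_pairs  # agreement expected by chance
    max_index = (sum_a + sum_b) / 2         # agreement of a perfect match
    if max_index == expected:
        return 1.0  # degenerate case, e.g. a single group in both labelings
    return (sum_cells - expected) / (max_index - expected)


if __name__ == "__main__":
    cell_types = ["T", "T", "B", "B", "NK"]
    clusters = [0, 0, 1, 1, 2]
    print(adjusted_rand_index(cell_types, clusters))
```

A score of 1 means the clustering reproduces the cell labels exactly (up to renaming of the groups), values near 0 correspond to chance-level agreement, and negative values indicate less agreement than expected by chance.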

## Integration Tools

Tools that are compared include:

- [BBKNN](https://github.com/Teichlab/bbknn) 1.3.9
- [Combat](https://scanpy.readthedocs.io/en/stable/api/scanpy.pp.combat.html) [paper](https://academic.oup.com/biostatistics/article/8/1/118/252073)
- [Conos](https://github.com/hms-dbmi/conos) 1.3.0
- [DESC](https://github.com/eleozzr/desc) 2.0.3
- [FastMNN](https://bioconductor.org/packages/batchelor/) (batchelor 1.4.0)
- [Harmony](https://github.com/immunogenomics/harmony) 1.0
- [LIGER](https://github.com/MacoskoLab/liger) 0.5.0
- [MNN](https://github.com/chriscainx/mnnpy) 0.1.9.5
- [SAUCIE](https://github.com/KrishnaswamyLab/SAUCIE)
- [Scanorama](https://github.com/brianhie/scanorama) 1.7.0
- [scANVI](https://github.com/chenlingantelope/HarmonizationSCANVI) (scVI 0.6.7)
- [scGen](https://github.com/theislab/scgen) 1.1.5
- [scVI](https://github.com/YosefLab/scVI) 0.6.7
- [Seurat v3](https://github.com/satijalab/seurat) 3.2.0 CCA (default) and RPCA
- [TrVae](https://github.com/theislab/trvae) 0.0.1
- [TrVaep](https://github.com/theislab/trvaep) 0.1.0

## Development

For developing this package, please make sure to install additional dependencies so that you can use `pytest` and
`pre-commit`.

```shell
pip install -e '.[test,dev]'
```

Please refer to `setup.cfg` for more optional dependencies.

Install the `pre-commit` hooks into the repository so that they run automatically every time you commit in git.

```shell
pre-commit install
```

            

Raw data

            {
    "_id": null,
    "home_page": "https://github.com/theislab/scib",
    "name": "scib",
    "maintainer": null,
    "docs_url": null,
    "requires_python": ">=3.8",
    "maintainer_email": null,
    "keywords": "benchmarking, single cell, data integration",
    "author": "Malte D. Luecken, Maren Buettner, Daniel C. Strobl, Michaela F. Mueller",
    "author_email": "malte.luecken@helmholtz-muenchen.de, michaela.mueller@helmholtz-muenchen.de",
    "download_url": "https://files.pythonhosted.org/packages/07/1e/74d194a4597bc6c3adae7e286a75fa102bf4e4b2094439df2cf01c77ba76/scib-1.1.5.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "MIT",
    "summary": "Evaluating single-cell data integration methods",
    "version": "1.1.5",
    "project_urls": {
        "Bug Tracker": "https://github.com/theislab/scib/issues",
        "Homepage": "https://github.com/theislab/scib",
        "Pipeline": "https://github.com/theislab/scib-pipeline",
        "Reproducibility": "https://theislab.github.io/scib-reproducibility"
    },
    "split_keywords": [
        "benchmarking",
        " single cell",
        " data integration"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "c5f44a27b5bec99be3f24a0634761deba4cd336962c105b70888294762fb3bf0",
                "md5": "5c44b688794b6b20f4fc50bdc8cdc070",
                "sha256": "e5aec8037bb001a5f1b920ea81bac5288758e6013b2f89f1fbc3dcbd1e6c4e47"
            },
            "downloads": -1,
            "filename": "scib-1.1.5-1-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "5c44b688794b6b20f4fc50bdc8cdc070",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.8",
            "size": 79544,
            "upload_time": "2024-04-01T15:53:56",
            "upload_time_iso_8601": "2024-04-01T15:53:56.982790Z",
            "url": "https://files.pythonhosted.org/packages/c5/f4/4a27b5bec99be3f24a0634761deba4cd336962c105b70888294762fb3bf0/scib-1.1.5-1-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "071e74d194a4597bc6c3adae7e286a75fa102bf4e4b2094439df2cf01c77ba76",
                "md5": "3bcff3b4483f3a64b2dd261c5651d71c",
                "sha256": "7ab3183065f2d861b64f88823a55cec767327a37ad6d0eaccce7b43c996293ad"
            },
            "downloads": -1,
            "filename": "scib-1.1.5.tar.gz",
            "has_sig": false,
            "md5_digest": "3bcff3b4483f3a64b2dd261c5651d71c",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.8",
            "size": 74372,
            "upload_time": "2024-04-01T15:53:58",
            "upload_time_iso_8601": "2024-04-01T15:53:58.407612Z",
            "url": "https://files.pythonhosted.org/packages/07/1e/74d194a4597bc6c3adae7e286a75fa102bf4e4b2094439df2cf01c77ba76/scib-1.1.5.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2024-04-01 15:53:58",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "theislab",
    "github_project": "scib",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": true,
    "lcname": "scib"
}
        