PopV

Name	PopV
Version	0.4.2
home_page	None
Summary	Consensus prediction of cell type labels with popV
upload_time	2024-02-15 19:11:52
maintainer	None
docs_url	None
author	Can Ergen
requires_python	>=3.9
license	MIT License Copyright (c) 2023, Can Ergen Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

[![Stars](https://img.shields.io/github/stars/yoseflab/popv?logo=GitHub&color=yellow)](https://github.com/YosefLab/popv/stargazers)
[![PyPI](https://img.shields.io/pypi/v/popv.svg)](https://pypi.org/project/popv)
[![PopV](https://github.com/YosefLab/PopV/actions/workflows/test.yml/badge.svg)](https://github.com/YosefLab/PopV/actions/workflows/test.yml)
[![Coverage](https://codecov.io/gh/YosefLab/popv/branch/main/graph/badge.svg?token=KuSsL5q3l7)](https://codecov.io/gh/YosefLab/popv)
[![Code Style](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/python/black)
[![Downloads](https://pepy.tech/badge/popv)](https://pepy.tech/project/popv)

# PopV

PopV uses a popular vote across a variety of cell-type transfer tools to classify cell types in a query dataset based on a reference dataset.
We compute the agreement between these algorithms and use it to predict which cell types in the query are, with high likelihood, the same cell types observed in the reference.

## Algorithms

Currently implemented algorithms are:

-   K-nearest neighbor classification after dataset integration with [BBKNN](https://github.com/Teichlab/bbknn)
-   K-nearest neighbor classification after dataset integration with [SCANORAMA](https://github.com/brianhie/scanorama)
-   K-nearest neighbor classification after dataset integration with [scVI](https://github.com/scverse/scvi-tools)
-   K-nearest neighbor classification after dataset integration with [Harmony](https://github.com/lilab-bcb/harmony-pytorch)
-   Random forest classification
-   Support vector machine classification
-   [OnClass](https://github.com/wangshenguiuc/OnClass) cell type classification
-   [scANVI](https://github.com/scverse/scvi-tools) label transfer
-   [Celltypist](https://www.celltypist.org) cell type classification

All algorithms are implemented as classes in [popv/algorithms](popv/algorithms/__init__.py).
To implement a new method, a class must provide the following methods:

-   algorithm.compute_integration: Computes dataset integration to yield an integrated latent space.
-   algorithm.predict: Computes cell-type labels based on the specific classifier.
-   algorithm.compute_embedding: Computes UMAP embedding of previously computed integrated latent space.

We illustrate the implementation of a new classifier in a [scaffold](popv/algorithms/_scaffold.py). Adding a new class with those methods automatically tells PopV to include it among its classifiers and to use it as another expert in the vote.
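
As a rough illustration of the three-method interface described above, the following sketch defines a toy classifier. It is not taken from the PopV code base: the class name `NearestCentroidExpert`, the method arguments, and the use of plain Python lists instead of an AnnData object are all assumptions made to keep the example self-contained. The integration step only centers the features, and the embedding step keeps the first two latent dimensions rather than running UMAP. The scaffold linked above remains the authoritative template.

```python
import math


class NearestCentroidExpert:
    """Hypothetical toy classifier exposing the three methods named above."""

    def __init__(self):
        self.latent = None     # filled by compute_integration
        self.embedding = None  # filled by compute_embedding

    def compute_integration(self, cells):
        # Stand-in for dataset integration: center every feature across all
        # cells. A real method would also correct for batch effects.
        n = len(cells)
        means = [sum(col) / n for col in zip(*cells)]
        self.latent = [[v - m for v, m in zip(row, means)] for row in cells]
        return self.latent

    def predict(self, ref_idx, ref_labels, query_idx):
        # Average the latent vectors of the reference cells per label ...
        groups = {}
        for i, label in zip(ref_idx, ref_labels):
            groups.setdefault(label, []).append(self.latent[i])
        centroids = {
            label: [sum(col) / len(rows) for col in zip(*rows)]
            for label, rows in groups.items()
        }
        # ... and give each query cell the label of its nearest centroid.
        return [
            min(centroids, key=lambda lab: math.dist(self.latent[i], centroids[lab]))
            for i in query_idx
        ]

    def compute_embedding(self):
        # Stand-in for UMAP: keep the first two latent dimensions.
        self.embedding = [row[:2] for row in self.latent]
        return self.embedding


# Rows 0-3 are reference cells, rows 4-5 are query cells (made-up values).
cells = [
    [0.0, 0.1, 0.0], [0.1, 0.0, 0.1],
    [5.0, 5.1, 5.0], [5.1, 5.0, 4.9],
    [0.05, 0.05, 0.0], [5.0, 5.0, 5.0],
]
expert = NearestCentroidExpert()
expert.compute_integration(cells)
labels = expert.predict([0, 1, 2, 3], ["T cell", "T cell", "B cell", "B cell"], [4, 5])
embedding = expert.compute_embedding()
print(labels)
```

The point of the sketch is the division of labor: integration produces a shared latent space, prediction works only on that space, and the embedding is derived from it for visualization.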

All algorithms that allow for pre-training are pre-trained. This excludes by design BBKNN, Harmony and SCANORAMA, as each of them constructs a new embedding space. Pretrained models are stored on [Zenodo](https://zenodo.org/record/7580707) and are automatically downloaded in the Colab notebook linked below. We encourage pre-training models when implementing new classes.

All input parameters are defined during the initial call to [Process_Query](popv/preprocessing.py) and are stored in the unstructured field (`.uns`) of the generated AnnData object. PopV has three levels of prediction complexity:

-   `retrain` trains all classifiers from scratch. For 50k cells this takes up to an hour of computing time using a GPU.
-   `inference` uses pretrained classifiers to annotate query as well as reference cells and constructs a joint embedding using all integration methods listed above. For 50k cells this takes, in our hands, up to half an hour of computing time using a GPU.
-   `fast` uses only methods with pretrained classifiers and annotates only query cells. For 50k cells this takes 5 minutes without a GPU (without UMAP embedding).

A user-defined selection of classification algorithms can be passed when calling [annotate_data](popv/annotation.py). Advanced users can also set non-standard parameters for the integration methods and the classifiers there.

## Output

PopV outputs a cell-type classification for each classifier used, as well as the majority vote across all classifiers. For the OnClass prediction, PopV additionally uses the ontology and traverses the full set of ontology descendants (disabled in fast mode). This method will be further described when PopV is published.

PopV also outputs a score that counts the number of classifiers agreeing with the PopV prediction. This score can be read as the certainty that the prediction is correct for each cell in the query data. We generally found predictions with a single dissenting expert to be still highly reliable, while disagreement of more than 2 classifiers signifies less reliable results. The aim of PopV is not to fully annotate a dataset but to highlight cells that potentially benefit from further careful manual annotation.
Additionally, PopV outputs UMAP embeddings of all integrated latent spaces if _compute_embedding==True_ in [Process_Query](popv/preprocessing.py) and computes certainties for every used classifier if _return_probabilities==True_ in [Process_Query](popv/preprocessing.py).
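
The majority vote and agreement score described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not PopV's implementation: the function name `consensus`, the classifier names, and the labels are made up, ties are broken by classifier order (an arbitrary choice for this sketch), and the ontology-aware handling of OnClass predictions is not reproduced.

```python
from collections import Counter


def consensus(predictions):
    """Return (majority label, number of agreeing classifiers) per cell.

    `predictions` maps a classifier name to its list of per-cell labels.
    On a tie, the label voted for by the earliest classifier wins.
    """
    results = []
    # zip(*...) regroups the per-classifier lists into per-cell vote tuples.
    for votes in zip(*predictions.values()):
        label, count = Counter(votes).most_common(1)[0]
        results.append((label, count))
    return results


# Hypothetical predictions of three classifiers for three query cells.
predictions = {
    "knn_on_scvi": ["T cell", "B cell", "T cell"],
    "random_forest": ["T cell", "B cell", "NK cell"],
    "svm": ["T cell", "T cell", "B cell"],
}
result = consensus(predictions)
for label, score in result:
    print(label, score)
```

In this toy input the first cell has full agreement, the second has one dissenting classifier, and the third has no agreement at all, which is the kind of cell that the score is meant to flag for manual inspection.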

## Installation

We suggest using a package manager like conda or mamba to install the package. OnClass files for annotation based on Tabula sapiens are deposited in popv/ontology, and PopV will automatically look for the ontology in this folder. We use [Cell Ontology](https://obofoundry.org/ontology/cl.html) as an ontology throughout our experiments. If you want to use your own edited ontology, we will provide notebooks to create the Natural Language Model used in OnClass for this user-defined ontology.

    conda create -n yourenv python=3.9
    conda activate yourenv
    pip install git+https://github.com/czbiohub/PopV

## Example notebook

We provide an example notebook in Google Colab:

-   [Tutorial demonstrating use of Tabula sapiens as a reference](docs/notebooks/tabula_sapiens_tutorial.ipynb)

This notebook will guide you through annotating a dataset based on the annotated [Tabula sapiens reference](https://tabula-sapiens-portal.ds.czbiohub.org) and demonstrates how to run annotation on your own query dataset. This notebook requires that all cells are annotated based on a cell ontology. We strongly encourage the use of a common cell ontology, see also [Osumi-Sutherland et al](https://www.nature.com/articles/s41556-021-00787-7). Using a cell ontology is a requirement to run OnClass as a prediction algorithm.

PopV can also be run without a cell ontology; in that case OnClass cannot be used as a prediction algorithm.

Raw data

{
    "_id": null,
    "home_page": null,
    "name": "PopV",
    "maintainer": null,
    "docs_url": null,
    "requires_python": ">=3.9",
    "maintainer_email": "Can Ergen <canergen.ac@gmail.com>",
    "keywords": null,
    "author": "Can Ergen",
    "author_email": null,
    "download_url": "https://files.pythonhosted.org/packages/af/59/ee896a95d17d481d6e5d5cef163420c35b6ab8ed24069520ad07c45ab326/popv-0.4.2.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "MIT License\n        \n        Copyright (c) 2023, Can Ergen\n        \n        Permission is hereby granted, free of charge, to any person obtaining a copy\n        of this software and associated documentation files (the \"Software\"), to deal\n        in the Software without restriction, including without limitation the rights\n        to use, copy, modify, merge, publish, distribute, sublicense, and/or sell\n        copies of the Software, and to permit persons to whom the Software is\n        furnished to do so, subject to the following conditions:\n        \n        The above copyright notice and this permission notice shall be included in all\n        copies or substantial portions of the Software.\n        \n        THE SOFTWARE IS PROVIDED \"AS IS\", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR\n        IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,\n        FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE\n        AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER\n        LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,\n        OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE\n        SOFTWARE.",
    "summary": "Consensus prediction of cell type labels with popV",
    "version": "0.4.2",
    "project_urls": {
        "Documentation": "https://PopV.readthedocs.io/",
        "Home-page": "https://github.com/YosefLab/PopV.git",
        "Source": "https://github.com/YosefLab/PopV.git"
    },
    "split_keywords": [],
    "urls": [
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "fce83c2de90739415dd715d5ec08e927d92348d186595a20616e6cebb8065412",
                "md5": "4fa7289f5dd4eb905d09877e5f4be563",
                "sha256": "cd6772d1583050d46f003e6c30ff0d852833c0ce436d3330f6ad2f090b07674a"
            },
            "downloads": -1,
            "filename": "popv-0.4.2-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "4fa7289f5dd4eb905d09877e5f4be563",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.9",
            "size": 37071,
            "upload_time": "2024-02-15T19:11:40",
            "upload_time_iso_8601": "2024-02-15T19:11:40.921139Z",
            "url": "https://files.pythonhosted.org/packages/fc/e8/3c2de90739415dd715d5ec08e927d92348d186595a20616e6cebb8065412/popv-0.4.2-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "af59ee896a95d17d481d6e5d5cef163420c35b6ab8ed24069520ad07c45ab326",
                "md5": "d32401cfabd614681b524ffebea1eadc",
                "sha256": "8fc2bb6a10dc01e61640797ffd5a12767e0aa4414a43abbabe8bbf3ee4cd5439"
            },
            "downloads": -1,
            "filename": "popv-0.4.2.tar.gz",
            "has_sig": false,
            "md5_digest": "d32401cfabd614681b524ffebea1eadc",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.9",
            "size": 8450476,
            "upload_time": "2024-02-15T19:11:52",
            "upload_time_iso_8601": "2024-02-15T19:11:52.900512Z",
            "url": "https://files.pythonhosted.org/packages/af/59/ee896a95d17d481d6e5d5cef163420c35b6ab8ed24069520ad07c45ab326/popv-0.4.2.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2024-02-15 19:11:52",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "YosefLab",
    "github_project": "PopV",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": true,
    "lcname": "popv"
}
