scAnnot


-   Name: scAnnot
-   Version: 0.0.7
-   Home page: https://github.com/changebio/scAnnot
-   Summary: single cell annotation
-   Upload time: 2024-09-24 08:47:41
-   Author: Yin Huang
-   Requires Python: >=3.7
-   License: Apache Software License 2.0
-   Keywords: nbdev, jupyter, notebook, python

**scAnnot User Documentation**
================

> scAnnot is a tool for performing hierarchical multi-level annotation
> of single-cell datasets using pre-trained scANVI models.
>
> This document guides you through installation, explains the workflow
> and usage of the tool, and provides additional details.

## Overview

scAnnot uses scANVI models that are pre-trained on a core reference
dataset and organized by annotation hierarchy to recursively annotate
single-cell datasets at multiple levels (e.g. cell type \> subtype).

It predicts labels from the cell representations learned on the
reference dataset. The modular workflow is highly customizable and
scalable to large datasets.

## Dependencies

    numpy
    pandas
    matplotlib
    scanpy
    anndata
    scipy
    tqdm
    pathlib
    fastcore
    scvi-tools
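
Before installing, it can help to see which of these packages are already importable. The snippet below is a minimal sketch using only the standard library; it is not part of scAnnot, and the helper name `missing_modules` is made up for illustration. Note that `scvi-tools` is imported under the module name `scvi`.

```python
from importlib.util import find_spec

# Import names for the packages listed above; scvi-tools is imported as `scvi`.
DEPENDENCY_MODULES = [
    "numpy", "pandas", "matplotlib", "scanpy", "anndata",
    "scipy", "tqdm", "pathlib", "fastcore", "scvi",
]


def missing_modules(module_names):
    """Return the top-level modules that cannot be found in this environment."""
    return [name for name in module_names if find_spec(name) is None]


missing = missing_modules(DEPENDENCY_MODULES)
print("missing:", ", ".join(missing) if missing else "none")
```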

## Installation

To install scAnnot, please follow these steps:

**Step 1:** Ensure that you have Python 3.7 or higher installed on your
system.

**Step 2:** Open a terminal or command prompt.

**Step 3:** Install scAnnot using pip by executing the following
command:

    pip install scAnnot

**Step 4:** Once the installation is complete, you can verify it by
running the following command:

    scAnnot --version

If the installation was successful, the tool’s version number will be
displayed.
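
The installed version can also be checked from Python. This is a small sketch that relies on `importlib.metadata`, which is in the standard library from Python 3.8 onward (on Python 3.7 a backport would be needed); the helper name `installed_version` is an illustration, not part of scAnnot.

```python
from importlib import metadata  # standard library from Python 3.8 onward


def installed_version(distribution_name):
    """Return the installed version string, or None if it is not installed."""
    try:
        return metadata.version(distribution_name)
    except metadata.PackageNotFoundError:
        return None


# Prints the version string if scAnnot is installed, otherwise None.
print(installed_version("scAnnot"))
```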

## Annotation Workflow

The main steps in the workflow are:

### 1. Data Preprocessing

The input AnnData object containing expression values is preprocessed by
restricting it to the genes it shares with the reference dataset and
applying optional normalization steps.

This is handled by the `preprocess_data()` function.
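
The idea behind this step can be sketched in plain Python. The code below is not the `preprocess_data()` implementation; the helper names, the toy gene lists, and the choice of library-size scaling followed by `log(1 + x)` as the "optional normalization" are all assumptions made for illustration.

```python
import math


def common_genes(query_genes, reference_genes):
    """Genes present in both datasets, kept in reference order."""
    query_set = set(query_genes)
    return [gene for gene in reference_genes if gene in query_set]


def subset_columns(matrix, gene_names, keep):
    """Reorder and subset the columns of a cells x genes matrix to `keep`."""
    column_of = {gene: i for i, gene in enumerate(gene_names)}
    return [[row[column_of[gene]] for gene in keep] for row in matrix]


def log_normalize(matrix, target_sum=1e4):
    """Scale each cell to `target_sum` total counts, then apply log(1 + x)."""
    normalized = []
    for row in matrix:
        total = sum(row)
        scale = target_sum / total if total > 0 else 0.0
        normalized.append([math.log1p(value * scale) for value in row])
    return normalized


# Toy data: two cells, four query genes, three reference genes.
query_genes = ["GAPDH", "SNAP25", "GFAP", "XIST"]
reference_genes = ["GFAP", "GAPDH", "AQP4"]
counts = [[10, 0, 5, 1], [2, 8, 0, 0]]

shared = common_genes(query_genes, reference_genes)
filtered = subset_columns(counts, query_genes, shared)
print(shared)    # ['GFAP', 'GAPDH']
print(filtered)  # [[5, 10], [0, 2]]
```

With a real AnnData object, the equivalent column subset is usually written as `adata[:, shared].copy()`, where `shared` is built from `adata.var_names` in the same way.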

### 2. Model Loading

Trained scANVI models, one per annotation level, are stored in a
directory structure that mirrors the annotation hierarchy.

#### Model Directory Structure

The `model.pt` file in the root directory holds the model for Level 1
(primary cell type) annotation.

Each subdirectory is named after a cell type at that level and holds the
nested models for its subtypes:

    /root
    ├── model.pt
    ├── L1_celltype1
    │   ├── model.pt
    │   ├── L2_celltype2
    │   │   └── model.pt
    │   └── L2_celltype3
    │       └── model.pt
    ├── L1_celltype2
    │   └── model.pt
    └── L1_celltype3
        └── model.pt

Each subdirectory contains the model that annotates the next level,
i.e. the subtypes of the cell type the directory is named after.

This layout encodes the annotation schema and enables the recursive
modeling approach.
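
A layout like this can be discovered with `pathlib` alone. The sketch below is an illustration rather than scAnnot's own loader: the function name `find_models` is hypothetical, and it only locates the `model.pt` files without loading them. It builds a temporary directory tree with the same shape as the example so that it runs anywhere.

```python
import tempfile
from pathlib import Path


def find_models(root, model_filename="model.pt"):
    """Map each hierarchy path (tuple of cell-type names) to its model file."""
    root = Path(root)
    models = {}
    for model_file in sorted(root.rglob(model_filename)):
        # () is the Level 1 model; ("L1_celltype1",) is its subtype model, etc.
        key = model_file.parent.relative_to(root).parts
        models[key] = model_file
    return models


# Build a throwaway tree that mimics part of the layout shown above.
with tempfile.TemporaryDirectory() as tmp:
    for sub in ["", "L1_celltype1", "L1_celltype1/L2_celltype2", "L1_celltype2"]:
        folder = Path(tmp, sub)
        folder.mkdir(parents=True, exist_ok=True)
        (folder / "model.pt").touch()

    for key in find_models(tmp):
        print("/".join(key) or "<root>")
```

Keying the models by their path from the root makes the depth of each model explicit: the length of the tuple is the number of levels already annotated.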

### 3. Recursive Annotation

Annotation begins with the top-level (primary) cell types, using the
model in the `root/model.pt` file.

Deeper levels are then recursively annotated by:

1.  Filtering the data to the cells that received the current label
2.  Looping through the sub-directories, which represent the next level
    down
3.  Loading the model for that sub-directory
4.  Predicting labels for the filtered cells
5.  Repeating on each resulting subset for the next level

This is implemented by the `annotate_levels()` and
`annotate_deeper_levels()` functions. Because the recursion follows the
directory tree, it can annotate hierarchies of arbitrary depth in
single-cell datasets.
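
The recursion described above can be sketched without any model files. In this illustration each "model" is replaced by a plain Python function, and the tree is a nested dictionary standing in for the directory structure; the function `annotate`, the node layout, and the marker-gene rules are all hypothetical and are not the code behind `annotate_levels()` or `annotate_deeper_levels()`.

```python
def annotate(cells, node, level=1, labels=None):
    """Recursively label `cells` (dict: cell id -> features) using `node`.

    `node` is a stand-in for one directory of the model tree:
    {"predict": callable, "children": {label: child_node}}.
    """
    if labels is None:
        labels = {cell_id: {} for cell_id in cells}

    # Predict a label at the current level for every cell in this subset.
    for cell_id, features in cells.items():
        labels[cell_id][f"level{level}"] = node["predict"](features)

    # For each child, keep only the cells that received its label and recurse.
    for label, child in node.get("children", {}).items():
        subset = {
            cell_id: features
            for cell_id, features in cells.items()
            if labels[cell_id][f"level{level}"] == label
        }
        if subset:
            annotate(subset, child, level + 1, labels)
    return labels


# Toy hierarchy: level 1 splits neuron/glia, level 2 refines neurons only.
tree = {
    "predict": lambda f: "neuron" if f["SNAP25"] > 0 else "glia",
    "children": {
        "neuron": {
            "predict": lambda f: "excitatory" if f["SLC17A7"] > 0 else "inhibitory",
        },
    },
}
cells = {
    "c1": {"SNAP25": 5, "SLC17A7": 3},
    "c2": {"SNAP25": 2, "SLC17A7": 0},
    "c3": {"SNAP25": 0, "SLC17A7": 0},
}

result = annotate(cells, tree)
for cell_id, cell_labels in result.items():
    print(cell_id, cell_labels)
```

Cells whose label has no child node (here `glia`) simply stop at that level, which mirrors a cell-type directory that contains no further sub-directories.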

## Usage

The scAnnot tool provides a command-line interface (CLI) and an
interactive interface (e.g., a Jupyter notebook) for annotating
single-cell datasets. The general form of the command is:

    scAnnot --input <input_file> --reference <reference_file> --model_dir <model_dir> --output <output_file>

-   `<input_file>`: Path to the single-cell data file.
-   `<reference_file>`: Path to the reference dataset file.
-   `<model_dir>`: Path to the directory containing trained scANVI
    models.
-   `<output_file>`: Path to the output file where annotated data will
    be saved.
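
When the tool is driven from a script, the command can be assembled as an argument list. The sketch below assumes the flag form shown in the synopsis above; `build_command` is a hypothetical helper and `models` is a placeholder directory name. It only builds and prints the command, it does not run it.

```python
import shlex


def build_command(input_file, reference_file, model_dir, output_file):
    """Build the scAnnot invocation as an argument list (flag form assumed)."""
    return [
        "scAnnot",
        "--input", str(input_file),
        "--reference", str(reference_file),
        "--model_dir", str(model_dir),
        "--output", str(output_file),
    ]


cmd = build_command("test.h5ad", "ref.h5ad", "models", "test.csv")
print(shlex.join(cmd))
# To execute it: import subprocess; subprocess.run(cmd, check=True)
```

Passing a list to `subprocess.run` avoids shell quoting problems with paths that contain spaces.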

### On the command line

**1. Output the table of predicted labels in CSV format**

    scAnnot test.h5ad --reference ref.h5ad --output test.csv

**2. Output the AnnData object with predicted labels in H5AD format**

    scAnnot test.h5ad --reference ref.h5ad --output test.h5ad

### In a Jupyter notebook

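    # Assumes `scAnnot` has already been imported from the package.
    # Annotate the query data against the reference.
    ad = scAnnot('test.h5ad', 'ref.h5ad')

    # Same call, also showing a UMAP of the latent space.
    ad = scAnnot('test.h5ad', 'ref.h5ad', show=True)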

The UMAP plot for the Level 1 annotation:
![level1](https://github.com/rnacentre/scAnnot/blob/master/img/level1.png)

The UMAP plot for the original annotation:
![original](https://github.com/rnacentre/scAnnot/blob/master/img/original.png)

## Additional Details

-   **Supported Data Formats:** scAnnot currently supports only the H5AD
    format. Ensure that your data is formatted correctly before running
    the tool.
-   **Reference Dataset:** For accurate annotation, provide a reference
    dataset that closely matches your scRNA-seq data. The reference
    dataset should contain annotated cells from a range of cell types.
-   **Pre-trained scANVI models:** The models are trained on the
    reference dataset, one for the cell types at each level, and saved
    in a directory structure matching the annotation hierarchy.
-   **Output Formats:** The output format is chosen from the output
    file name. CSV is the default, but you can specify other formats
    such as TXT or H5AD.
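
Choosing a format from the output file name can be pictured as a lookup on the file extension. How scAnnot itself makes this decision is not shown in this document, so the function below is only a sketch of the idea; `output_format` and the fallback to CSV for unknown extensions are assumptions.

```python
from pathlib import Path

SUPPORTED_FORMATS = {"csv", "txt", "h5ad"}


def output_format(output_path, default="csv"):
    """Pick an output format from the file extension, falling back to CSV."""
    suffix = Path(output_path).suffix.lower().lstrip(".")
    return suffix if suffix in SUPPORTED_FORMATS else default


for name in ["test.csv", "test.h5ad", "labels.TXT", "results"]:
    print(name, "->", output_format(name))
```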

## Citation

Chen, X., Huang, Y., Huang, L. et al. A brain cell atlas integrating
single-cell transcriptomes across human brain regions. Nat Med 30,
2679–2691 (2024). https://doi.org/10.1038/s41591-024-03150-z

## Issues

Please submit any issues or questions as GitHub issues.



            
