# iperturb

- Version: 0.2.9
- Home page: https://github.com/BillyChen123/iPerturb
- Summary: Atlas-level data integration in multi-condition single-cell genomics
- Author: Billy Chen
- Requires Python: >=3.6
- Upload time: 2024-08-12 04:58:09
# iPerturb: An Introduction

## Overview

iPerturb is an efficient tool for integrating single-cell RNA sequencing (scRNA-seq) data across multiple samples and conditions, focusing on removing batch effects while retaining condition-specific changes in gene expression. This document introduces how to use iPerturb to process and analyze scRNA-seq datasets from different experimental conditions.

## Installation

Install using pip:

```bash
pip install iperturb
```

## Dataset Description

We applied iPerturb to analyze droplet-based scRNA-seq data from peripheral blood mononuclear cells (PBMCs). The dataset consists of two groups: one group includes peripheral blood cells treated with interferon-β (IFN-β), and the other group includes untreated control cells. You can download the dataset from [here](https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE96583).

Specifically, gene expression levels were measured from 8 experimental samples treated with IFN-β (stimulated group; N = 7466 cells) and 8 control samples (control group; N = 6573 cells) to assess condition-specific changes in gene expression.

## Loading

Let's start by loading the required packages. We import the `iperturb` package as `iPerturb`. We also need the following two packages for iPerturb to work properly:

- `scanpy`: iPerturb is built on the `scanpy` framework and accepts single-cell data files in the h5ad format.
- `torch`: iPerturb uses PyTorch to build its variational autoencoder and CUDA to accelerate inference, so we first check whether CUDA is available.

```python
import iperturb as iPerturb
import torch
import scanpy as sc

cuda = torch.cuda.is_available()
if cuda:
    print('cuda is available')
else:
    print('cuda is not available')

anndata = sc.read_h5ad('/data/chenyz/iPerturb_project/data/PBMC.h5ad')  # replace with the path to your local copy
```
## Preprocessing

Data preprocessing of `anndata` includes the following steps:

1. **Quality Control**: Removal of low-quality cells and genes (default: min_genes=200, min_cells=3).
   
2. **Normalization**: Standardizing gene expression data (default: `normalize_total()`, `log1p()`).
   
3. **Dataset initiation**: Batch, condition, and groundtruth (optional) information is added to `anndata.obs` and cast to the category dtype.

4. **Find highly variable genes**: Annotate highly variable genes to accelerate integration (default: n_top_genes=4000).

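For orientation, the sketch below shows roughly equivalent standard `scanpy` calls for these four steps. It is an illustration under assumptions, not iPerturb's actual implementation: the column names `batch_2` and `batch` are taken from this tutorial's dataset, the file path is hypothetical, and `preprocess.data_load()` may differ in its details.

```python
import scanpy as sc

adata = sc.read_h5ad('PBMC.h5ad')  # hypothetical local path

# 1. Quality control: drop low-quality cells and rarely detected genes
sc.pp.filter_cells(adata, min_genes=200)
sc.pp.filter_genes(adata, min_cells=3)
adata.layers['counts'] = adata.X.copy()  # keep raw counts for count models

# 2. Normalization: library-size scaling followed by log1p
sc.pp.normalize_total(adata)
sc.pp.log1p(adata)
adata.layers['lognorm'] = adata.X.copy()

# 3. Dataset initiation: cast batch/condition labels to category dtype
adata.obs['batch_2'] = adata.obs['batch_2'].astype('category')
adata.obs['batch'] = adata.obs['batch'].astype('category')

# 4. Highly variable genes: annotate and subset to the top 4000
sc.pp.highly_variable_genes(adata, n_top_genes=4000)
adata = adata[:, adata.var['highly_variable']].copy()
```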

In iPerturb, the unified function `preprocess.data_load()` accomplishes all of these steps in one call:
```python
# Load necessary datasets and parameters such as batch_key, condition_key, and groundtruth_key (optional)
batch_key = 'batch_2'
condition_key = 'batch'
groundtruth_key = 'groundtruth'  # Used for calculating ARI
datasets, raw, var_names, index_names = iPerturb.preprocess.data_load(anndata, batch_key=batch_key, condition_key=condition_key, groundtruth_key=groundtruth_key, n_top_genes=4000)

datasets.X = datasets.layers['counts']  # use raw counts if mode='possion'
# datasets.X = datasets.layers['lognorm']  # use log-normalized values if mode='lognorm'
```
## Model initialization
After preprocessing the data, the next step is to initialize the model. The steps to set up the iPerturb model are:

1. **Create Hyperparameters**: We start by creating the hyperparameters for the model using the `utils.create_hyper()` function.

2. **Define Training Parameters**: We define the training parameters including the number of epochs and the optimizer. In this example, we use the Adam optimizer.

3. **Initialize the Model**: Next, we initialize the iPerturb model using the `model.model_init_()` function. This function sets up the model with the specified hyperparameters, latent dimensions, optimizer, learning rate, and other parameters. The parameters are explained below:

- `hyper`: The hyperparameters created in step 1.
   
- `latent_dim1`, `latent_dim2`, `latent_dim3`: Dimensions of the latent variables Z, Z_t, and Z_s.
   
- `optimizer`: The optimizer used for training, in this case, Adam.
   
- `lr`: Learning rate for the optimizer.
   
- `gamma`: Learning rate decay factor.
   
- `milestones`: Epochs at which the learning rate is decayed.
   
- `set_seed`: Random seed for reproducibility.
   
- `cuda`: Boolean indicating whether to use GPU for training.
   
- `alpha`: Regularization parameter.

Finally, we get three key components as output to start VAE inference (built on [Pyro](https://pyro.ai/)):
- `svi`: Stochastic Variational Inference (SVI) object used for optimizing the variational inference objective in the variational autoencoder (VAE) model.
  
- `scheduler`: Learning rate scheduler that adjusts the learning rate during training based on the specified milestones and gamma.
  
- `iPerturb_model`: The initialized iPerturb model, which includes the variational autoencoder (VAE) architecture configured with the specified hyperparameters and settings.

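To make these three outputs concrete, here is a minimal, self-contained Pyro sketch of how an SVI object and a milestone-based learning-rate scheduler are typically assembled. The toy `model`/`guide` pair and the `demo_*` names are invented purely for illustration; `MultiStepLR`, `SVI`, and `Trace_ELBO` are real Pyro APIs, but iPerturb's actual VAE and `model_init_()` internals are far richer and may differ.

```python
import torch
import pyro
import pyro.distributions as dist
from pyro.infer import SVI, Trace_ELBO
from pyro.optim import MultiStepLR

# Toy model: a single learnable mean for Gaussian observations.
def model(x):
    loc = pyro.param('loc', torch.zeros(1))
    with pyro.plate('data', x.shape[0]):
        pyro.sample('obs', dist.Normal(loc, 1.0), obs=x)

def guide(x):
    pass  # the toy model has no latent sample sites, so the guide is empty

# A MultiStepLR-wrapped Adam optimizer plus an SVI object, mirroring the
# kind of (svi, scheduler) pair that model_init_() returns.
demo_scheduler = MultiStepLR({'optimizer': torch.optim.Adam,
                              'optim_args': {'lr': 0.006},
                              'milestones': [20], 'gamma': 0.2})
demo_svi = SVI(model, guide, demo_scheduler, loss=Trace_ELBO())

x = torch.randn(100)
for epoch in range(15):
    loss = demo_svi.step(x)   # one gradient step on the ELBO
    demo_scheduler.step()     # decays the learning rate at milestone epochs
```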
```python
# create hyperparameters
hyper = iPerturb.utils.create_hyper(datasets, var_names, index_names)
# training settings
epochs = 15
optimizer = torch.optim.Adam

svi, scheduler, iPerturb_model = iPerturb.model.model_init_(hyper, latent_dim1=100, latent_dim2=20, latent_dim3=20, 
                                                            optimizer=optimizer, lr=0.006, gamma=0.2, milestones=[20], 
                                                            set_seed=123, cuda=cuda, alpha=1e-4, mode='possion')
```
## Model training
Once the iPerturb model is initialized, we train it with the `model.RUN()` function. The parameter `if_likelihood` controls whether the model's t_logits are computed. The function returns two results: `x_pred`, the corrected expression matrix, and `reconstruct_data`, the corrected AnnData.

iPerturb is computationally efficient, with a typical runtime of approximately 5-10 minutes in this example.
```python
import os

x_pred, reconstruct_data = iPerturb.model.RUN(datasets, iPerturb_model, svi, scheduler, epochs, hyper, raw, cuda, batch_size=100, if_likelihood=True)

savepath = './results'  # output directory of your choice
os.makedirs(savepath, exist_ok=True)
reconstruct_data.write(os.path.join(savepath, 'iPerturb.h5ad'))
```
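Since `groundtruth_key` was supplied above for calculating ARI, one way to sanity-check the correction is to cluster the corrected AnnData and compare the clusters against the groundtruth labels. The snippet below is a hedged sketch, not part of the iPerturb API: it assumes the `groundtruth` column was carried over into `reconstruct_data.obs`, and it uses standard `scanpy` and `scikit-learn` calls (Leiden clustering additionally requires the `leidenalg` package).

```python
import scanpy as sc
from sklearn.metrics import adjusted_rand_score

# Cluster the corrected data with a standard scanpy pipeline
sc.pp.pca(reconstruct_data, n_comps=30)
sc.pp.neighbors(reconstruct_data)
sc.tl.leiden(reconstruct_data)

# Adjusted Rand Index between the clusters and the groundtruth labels
ari = adjusted_rand_score(reconstruct_data.obs['groundtruth'],
                          reconstruct_data.obs['leiden'])
print(f'ARI after correction: {ari:.3f}')
```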
    