# GEARS: Predicting transcriptional outcomes of novel multi-gene perturbations
This repository hosts the official implementation of GEARS, a method that predicts transcriptional responses to both single- and multi-gene perturbations using single-cell RNA-sequencing data from perturbational screens.
<p align="center"><img src="https://github.com/snap-stanford/GEARS/blob/master/img/gears.png" alt="gears" width="900px" /></p>
### Installation
Install [PyG](https://pytorch-geometric.readthedocs.io/en/latest/notes/installation.html), then run `pip install cell-gears`.
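To confirm that both installation steps took effect, a quick check along these lines may help. This is a minimal sketch using only the Python standard library; the helper name `installed_version` is hypothetical and not part of GEARS, and `torch-geometric` is the distribution name under which PyG is published on PyPI.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


# Report the status of the two packages the installation step refers to.
for dist in ("torch-geometric", "cell-gears"):
    found = installed_version(dist)
    print(f"{dist}: {found if found is not None else 'not installed'}")
```

If either line reports `not installed`, repeat the corresponding installation step in the same environment you use to run GEARS.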
### [New] Updates in v0.1.1
- Fixed training breakpoint bug from v0.1.0
- Preprocessed dataloader now available for Replogle 2022 RPE1 and K562 essential datasets
- Added custom split, fixed no-test split
### Core API
Using the API, you can (1) reproduce the results in our paper and (2) train GEARS on your own perturbation dataset in a few lines of code.
```python
from gears import PertData, GEARS
# get data
pert_data = PertData('./data')
# load one of the datasets used in the paper: 'norman', 'adamson', or 'dixit'
pert_data.load(data_name = 'norman')
# specify data split
pert_data.prepare_split(split = 'simulation', seed = 1)
# get dataloader with batch size
pert_data.get_dataloader(batch_size = 32, test_batch_size = 128)
# set up and train a model
# ('cuda:8' selects the GPU with index 8; change it to a device available on your machine)
gears_model = GEARS(pert_data, device = 'cuda:8')
gears_model.model_initialize(hidden_size = 64)
gears_model.train(epochs = 20)
# save/load model
gears_model.save_model('gears')
gears_model.load_pretrained('gears')
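# predict the response to a two-gene perturbation (CBL + CNN1) and a single-gene perturbation (FEV)
gears_model.predict([['CBL', 'CNN1'], ['FEV']])
# predict the genetic interaction (GI) for a gene pair
gears_model.GI_predict(['CBL', 'CNN1'], GI_genes_file=None)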
```
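The `device` argument in the example above is hard-coded to `'cuda:8'`, which only exists on machines with at least nine GPUs. One way to choose a device string that is valid on the current machine is sketched below. The helper `pick_device` is hypothetical and not part of GEARS, and the sketch assumes that `GEARS` accepts any standard PyTorch device string, including `'cpu'`.

```python
def pick_device(preferred_index=0):
    """Return 'cuda:<index>' if that GPU is available, otherwise 'cpu'."""
    try:
        import torch
    except ImportError:
        # torch is a GEARS dependency; this branch only keeps the helper usable without it.
        return "cpu"
    if torch.cuda.is_available() and 0 <= preferred_index < torch.cuda.device_count():
        return f"cuda:{preferred_index}"
    return "cpu"


device = pick_device(preferred_index=0)
print(device)
```

The resulting string can then be passed as `GEARS(pert_data, device = device)`.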
To use your own dataset, create an AnnData object (`adata`, as used by scanpy) with a `gene_name` column in `adata.var` and two columns, `condition` and `cell_type`, in `adata.obs`. Then run:
```python
pert_data.new_data_process(dataset_name = 'XXX', adata = adata)
# to load the processed data
pert_data.load(data_path = './data/XXX')
```
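Before calling `new_data_process`, it can be useful to verify that the required columns are present. The following is a minimal sketch: the helper `missing_columns` is hypothetical and not part of GEARS, and it only checks the three column names listed above, not their contents. It relies on `adata.var.columns` and `adata.obs.columns`, which a real AnnData object exposes through its pandas DataFrames; a stand-in object is used here so that the example runs without extra dependencies.

```python
from types import SimpleNamespace

REQUIRED_VAR = ("gene_name",)
REQUIRED_OBS = ("condition", "cell_type")


def missing_columns(adata):
    """List the required columns that are absent from adata.var and adata.obs."""
    return {
        "var": [c for c in REQUIRED_VAR if c not in adata.var.columns],
        "obs": [c for c in REQUIRED_OBS if c not in adata.obs.columns],
    }


# Stand-in for demonstration only; pass your real AnnData object instead.
fake_adata = SimpleNamespace(
    var=SimpleNamespace(columns=["gene_name"]),
    obs=SimpleNamespace(columns=["condition"]),
)
print(missing_columns(fake_adata))  # → {'var': [], 'obs': ['cell_type']}
```

An empty list under both keys means the column requirements described above are met.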
### Demos
| Name | Description |
|-----------------|-------------|
| [Dataset Tutorial](demo/data_tutorial.ipynb) | Tutorial on how to use the dataset loader and read custom data |
| [Model Tutorial](demo/model_tutorial.ipynb) | Tutorial on how to train GEARS |
| [Plot top 20 DE genes](demo/tutorial_plot_top20_DE.ipynb) | Tutorial on how to plot the top 20 DE genes|
| [Uncertainty](demo/tutorial_uncertainty.ipynb) | Tutorial on how to train an uncertainty-aware GEARS model |
### Colab
| Name | Description |
|-----------------|-------------|
| [Using Trained Model](https://colab.research.google.com/drive/11LlzGEUGoBk_Uj6DzlzizAeWse5_E9MK?usp=sharing) | Use a model trained on Norman et al. 2019 to make predictions (Needs Colab Pro)|
### Cite Us
```
@article{roohani2023predicting,
title={Predicting transcriptional outcomes of novel multigene perturbations with {GEARS}},
author={Roohani, Yusuf and Huang, Kexin and Leskovec, Jure},
journal={Nature Biotechnology},
year={2023},
publisher={Nature Publishing Group US New York}
}
```
Paper: [Link](https://www.nature.com/articles/s41587-023-01905-6)
Code for reproducing figures: [Link](https://github.com/yhr91/gears_misc)