<img src="images/G2G_logo_new.png" alt="Image" width="180" height="150">
# Genes2Genes
Project page: https://teichlab.github.io/Genes2Genes
## A new framework for aligning single-cell trajectories of gene expression
G2G aims to guide downstream comparative analysis of single-cell reference and query systems along any axis of progression (e.g. pseudotime).
It does so with a new dynamic programming (DP) alignment algorithm that unifies dynamic time warping (DTW) and gap modelling to capture both matches and mismatches between time points. Our DP algorithm
incorporates a Bayesian information-theoretic scoring scheme with a five-state probabilistic machine to generate an optimal alignment between the reference and query trajectories of a given gene, based on their scRNA-seq expression.
We can use the G2G framework to perform comparisons across pseudotime such as:
<ul>
<li>Organoid vs. Reference tissue
<li>Control vs. Treatment
<li>Healthy vs. Disease
</ul>
by inferring fully-descriptive gene-specific alignments and single-aggregate alignments.
These alignment results enable us to pinpoint dynamic similarities and differences in gene expression between a reference and query, as well as to group genes with similar alignment patterns.
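
The toy sketch below illustrates the general idea of combining DTW-style warps with gap states in a single DP recurrence. It is not the G2G algorithm: the absolute-difference match cost and the fixed `gap_cost` are assumptions standing in for G2G's Bayesian information-theoretic scoring, the function name `toy_align` is hypothetical, and the five state letters are labels chosen for this sketch only.

```python
def toy_align(ref, query, gap_cost=1.0):
    """Align two numeric sequences; return (total_cost, state_string).

    Illustrative states (labels are this sketch's own):
      M = one-to-one match
      V = reference advances and is matched to the same query point again (warp)
      W = query advances and is matched to the same reference point again (warp)
      D = reference point left unmatched (gap)
      I = query point left unmatched (gap)
    """
    n, m = len(ref), len(query)
    inf = float("inf")
    # cost[i][j] = best cost of aligning ref[:i] with query[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    back = [[None] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0

    for i in range(n + 1):
        for j in range(m + 1):
            if i == 0 and j == 0:
                continue
            candidates = []
            if i > 0 and j > 0:
                d = abs(ref[i - 1] - query[j - 1])  # toy mismatch cost
                candidates.append((cost[i - 1][j - 1] + d, "M", i - 1, j - 1))
                candidates.append((cost[i - 1][j] + d, "V", i - 1, j))
                candidates.append((cost[i][j - 1] + d, "W", i, j - 1))
            if i > 0:
                candidates.append((cost[i - 1][j] + gap_cost, "D", i - 1, j))
            if j > 0:
                candidates.append((cost[i][j - 1] + gap_cost, "I", i, j - 1))
            best = min(candidates, key=lambda c: c[0])
            cost[i][j] = best[0]
            back[i][j] = best[1:]

    # Trace back from the end of both sequences to recover the state string.
    states = []
    i, j = n, m
    while (i, j) != (0, 0):
        state, i, j = back[i][j]
        states.append(state)
    return cost[n][m], "".join(reversed(states))


print(toy_align([1, 2, 3], [1, 2, 2, 3]))  # a repeated query value is absorbed by a warp
print(toy_align([1, 2, 3], [1, 9, 2, 3]))  # an outlying query value is left as a gap
```

In this toy setting a warp is preferred when a time point resembles its neighbour's partner, and a gap is preferred when matching would cost more than `gap_cost`.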
### Manuscript preprint
***"Gene-level alignment of single cell trajectories"*** <br>
**Authors**: Dinithi Sumanaweera†, Chenqu Suo†, Ana-Maria Cujba, Daniele Muraro, Emma Dann, Krzysztof Polanski, Alexander S. Steemers, Woochan Lee, Amanda J. Oliver, Jong-Eun Park, Kerstin B. Meyer, Bianca Dumitrascu, Sarah A. Teichmann* <br>
Available at: https://doi.org/10.1101/2023.03.08.531713
### **Installing G2G**
For now, G2G needs to be installed from GitHub in a Python >=3.8 environment. We recommend creating a new Conda environment before installing G2G, to avoid any version conflicts and dependency issues.
```bash
conda create --name g2g_env python=3.8
conda activate g2g_env
pip install git+https://github.com/Teichlab/Genes2Genes.git
```
The package will be made available on PyPI soon.
### **Input to G2G**
(1) Reference anndata object (with `adata_ref.X` storing log1p normalised gene expression),
(2) Query anndata object (with `adata_query.X` storing log1p normalised gene expression), and
(3) Pseudotime estimates stored in each anndata object under `adata_ref.obs['time']` and `adata_query.obs['time']`.
**Note:** Please ensure that you have reasonable pseudotime estimates that fairly represent the trajectories, as the accuracy and reliability of trajectory alignment depend entirely on the accuracy and reliability of your pseudotime estimation. We recommend that users inspect whether the cell density distribution along the estimated pseudotime (in terms of meta attributes such as the annotated cell type, sampling time points, etc., where applicable) represents each trajectory of interest well. Users can choose the best pseudotime estimates to compare after testing several different pseudotime estimation tools on their datasets.
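
As a rough illustration of the input requirements above, the following sketch runs a few sanity checks before alignment. The helper `check_g2g_input` is hypothetical and not part of G2G. To keep the sketch self-contained, `SimpleNamespace` objects with a dense list-of-lists `X` stand in for real anndata objects; a sparse `X` would need adapting.

```python
import math
from types import SimpleNamespace


def check_g2g_input(adata, time_key="time"):
    """Return a list of problems found (an empty list means none were found)."""
    if time_key not in adata.obs:
        return [f"obs[{time_key!r}] is missing"]

    problems = []
    times = [float(t) for t in adata.obs[time_key]]
    if len(times) != len(adata.X):
        problems.append("number of pseudotime values differs from number of cells")
    if not all(math.isfinite(t) for t in times):
        problems.append("pseudotime contains NaN or infinite values")
    # log1p of non-negative normalised counts can never be negative.
    if any(value < 0 for row in adata.X for value in row):
        problems.append("negative expression values found in X")
    return problems


# Stand-ins for adata_ref / adata_query (assumption: dense rows, one per cell).
toy_ref = SimpleNamespace(
    X=[[0.0, 1.1], [0.7, 0.0], [2.3, 0.4]],
    obs={"time": [0.0, 0.5, 1.0]},
)
toy_bad = SimpleNamespace(X=[[0.0, -1.0]], obs={"time": [float("nan")]})

print(check_g2g_input(toy_ref))  # prints []
print(check_g2g_input(toy_bad))
```

Checks like these only catch formatting problems; they say nothing about whether the pseudotime itself is biologically reasonable.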
### **Tutorial**
Please refer to the notebook [`notebooks/Tutorial.ipynb`](https://github.com/Teichlab/Genes2Genes/blob/main/notebooks/Tutorial.ipynb), which gives an example analysis between a reference and query dataset from the literature.
Please also refer to https://teichlab.github.io/Genes2Genes for how to read a trajectory alignment output generated by G2G. <br>
### **Runtime**
The runtime of the G2G algorithm depends on the number of cells in the reference and query datasets, the number of interpolation time points, and the number of genes to align.
To give an idea, below is a simple runtime analysis of G2G for 89 genes of the reference (N<sub>R</sub> = 179 cells) and query (N<sub>Q</sub> = 290 cells) datasets from the literature used in our tutorial. Note: the number of interpolation points = 14 for the middle plot. (Reference: [`notebooks/Supplementary_notebook1.ipynb`](https://github.com/Teichlab/Genes2Genes/blob/main/notebooks/Supplementary_notebook1.ipynb))
<div style="display: flex; justify-content: space-between;">
<p align="center">
<img src="images/n_interpolation_points_vs_time_PAM_LPS_G2G_alignment.png" alt="Image" width="300" height="200">
<img src="images/cell_numbers_vs_approx_time_PAM_LPS_G2G_alignment.png" alt="Image" width="500" height="200">
</p>
</div><br>
**Further examples from the case studies of our manuscript:** <br>
(Reference: [`notebooks/Supplementary_notebook2.ipynb`](https://github.com/Teichlab/Genes2Genes/blob/main/notebooks/Supplementary_notebook2.ipynb))
It took approximately 12 min to align 1,371 gene trajectories of 20,327 reference cells and 17,176 query cells under 14 interpolation time points, and approximately 4.5 min to align 994 gene trajectories of 3,157 reference cells and 890 query cells under 13 interpolation time points.
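
For a back-of-the-envelope feel for these figures, the sketch below converts the two reported totals into an average time per gene. The helper name `seconds_per_gene` is hypothetical. The averages are indicative only, since runtime also depends on cell numbers and interpolation points.

```python
def seconds_per_gene(total_minutes, n_genes):
    """Average wall-clock seconds spent per aligned gene."""
    return total_minutes * 60 / n_genes


# (total minutes, number of genes) as reported for the two case studies.
case_studies = {
    "20,327 ref / 17,176 query cells, 14 points": (12.0, 1371),
    "3,157 ref / 890 query cells, 13 points": (4.5, 994),
}

for label, (minutes, n_genes) in case_studies.items():
    print(f"{label}: ~{seconds_per_gene(minutes, n_genes):.2f} s per gene")
```

This works out to roughly 0.53 s and 0.27 s per gene for the larger and smaller dataset respectively.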
G2G can also utilize concurrency through Python multiprocessing by creating a number of processes equal to the number of cores in the system, where each process performs a single gene-level alignment at a time. However, we note that sequential processing (the default setting of G2G) seems to be more efficient than parallel processing: multiprocessing seems to add overhead when allocating and sharing resources among processes, which doubles the runtime.
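
The general pattern described above (one worker process per core, each handling one gene at a time) can be sketched with the standard library as follows. This is a minimal illustration and not G2G's own code: `align_one_gene` is a placeholder that returns a dummy value, and `align_genes` and its `parallel` flag are hypothetical names.

```python
import multiprocessing as mp


def align_one_gene(gene):
    """Placeholder for one gene-level alignment (dummy result, not G2G's API)."""
    return gene, len(gene)


def align_genes(genes, parallel=False):
    """Align genes one after another, or spread them over one process per core."""
    if not parallel:
        return dict(align_one_gene(gene) for gene in genes)
    # Each worker process picks up genes from the list and aligns them one at a time.
    with mp.Pool(processes=mp.cpu_count()) as pool:
        return dict(pool.map(align_one_gene, genes))
```

When run as a script, `align_genes(genes, parallel=True)` should be called under an `if __name__ == "__main__":` guard so that worker processes can import the module safely. For cheap per-gene work, the cost of starting processes and moving data between them can easily outweigh any gain, which is consistent with the observation above.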
### Funding Acknowledgement
Marie Skłodowska-Curie grant agreement No: 101026506 (Marie Curie Individual Fellowship) under the European Union’s Horizon 2020 research and innovation programme; Wellcome Trust Ph.D. Fellowship for Clinicians; Wellcome Trust (WT206194); ERC Consolidator Grant (646794); Wellcome Sanger Institute’s Translation Committee Fund.