# scAGDE
[![PyPI badge](https://img.shields.io/badge/pypi_package-0.0.17-blue)](https://pypi.org/project/scAGDE/)
[![License](https://img.shields.io/badge/License-MIT-green.svg)](https://opensource.org/licenses/MIT)
[![zenodo](https://zenodo.org/badge/DOI/10.5281/zenodo.12176520.svg)](https://zenodo.org/records/12176520)
![logo](https://github.com/Hgy1014/scAGDE/assets/64194550/867c48cc-c777-4a08-9886-eb6fdb214cc5)
`scAGDE` is a Python implementation of a model-based deep graph representation learning method for single-cell chromatin accessibility data. It learns feature representations and
cluster assignments simultaneously by explicitly modeling how single-cell ATAC-seq data are generated.
- [Briefly](#briefly)
- [Overview](#overview)
- [System Requirements](#system-requirements)
- [Installation Guide](#installation-guide)
- [Usage](#usage)
- [Data Availability](#data-availability)
- [License](#license)
# Briefly
Single-cell ATAC-seq technology has significantly advanced our understanding of cellular heterogeneity by enabling the exploration
of epigenetic landscapes and regulatory elements at the single-cell level. A major challenge in analyzing high-throughput single-cell
ATAC-seq data is its inherently low copy number, which makes the data sparse and high-dimensional and significantly limits the elucidation and characterization of gene regulatory elements. To address these limitations, we developed scAGDE, a model-based deep graph representation learning method for single-cell chromatin accessibility data that simultaneously learns feature representations and
cluster assignments by explicitly modeling how single-cell ATAC-seq data are generated. scAGDE first trains a chromatin accessibility-based
autoencoder, which identifies key patterns in single-cell ATAC-seq data, eliminates less relevant peaks, and constructs a cell
graph that captures the topological connections among individual cells. scAGDE then uses a Graph Convolutional Network
(GCN) as an encoder to extract essential structural information from both the ATAC-seq count matrix and the cell graph, coupled
with a Bernoulli-based decoder that characterizes the global probabilistic structure of the data. In addition, the graph embedding
process independently generates soft labels that guide self-supervised deep clustering, and the clustering results are refined iteratively.
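
To make the role of the Bernoulli-based decoder concrete, the following is a minimal NumPy sketch of a Bernoulli negative log-likelihood on a binary cell-by-peak matrix. It is an illustration only: the function name `bernoulli_nll`, the toy data, and the use of raw logits are assumptions, and the exact loss implemented in scAGDE may differ.

```python
import numpy as np


def bernoulli_nll(x, logits):
    """Mean negative Bernoulli log-likelihood of binary data x given decoder logits.

    Hypothetical helper for illustration; not part of the scAGDE API.
    """
    x = np.asarray(x, dtype=float)
    logits = np.asarray(logits, dtype=float)
    # Numerically stable form of -[x*log(p) + (1-x)*log(1-p)] with p = sigmoid(logits).
    per_entry = np.maximum(logits, 0.0) - logits * x + np.log1p(np.exp(-np.abs(logits)))
    return float(per_entry.mean())


# Toy binary cell-by-peak matrix: 4 cells, 6 peaks, roughly 20% accessible entries.
rng = np.random.default_rng(0)
x = (rng.random((4, 6)) < 0.2).astype(float)

# A decoder that assigns high probability to the observed state scores a low loss;
# a decoder that predicts the opposite state scores a high loss.
good_logits = np.where(x == 1, 4.0, -4.0)
bad_logits = -good_logits
print(bernoulli_nll(x, good_logits), bernoulli_nll(x, bad_logits))
```

Treating each peak as an accessible/inaccessible outcome matches the binary nature of the input, and the sigmoid of the logits can be read as the estimated probability that a peak is accessible in a given cell.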
# Overview
Overview of the scAGDE framework. (a) Graphical summary of the scAGDE workflow. scAGDE first passes the binary cell-by-peak matrix through
a chromatin accessibility-based autoencoder and then performs graph embedding learning. (b) The chromatin accessibility-based autoencoder maps the data into a latent
space, where each cell is connected to its nearest cells as neighbours to construct a cell graph. The variation in the encoder's weights is translated into importance scores
for the peaks, which drive the peak selection procedure. (c) The cell graph and the filtered data are processed together by a two-layer GCN encoder (i) and mapped into the
latent space (ii). On the one hand, the latent embedding serves as input to dual decoders (iii): a graph decoder module that reconstructs the graph from the embedding, and a
Bernoulli-based decoder module that estimates the probability of each peak being accessible, which serves as an estimate of the true chromatin landscape in each cell. On the other hand,
dual clustering optimizations are introduced (iv): a cluster layer, initialized with K-means results on the embedding, infers soft clustering labels.
The target distribution and one-hot pseudo-labels are then calculated in sequence and used for the label prediction loss and the distribution alignment loss. (d) scAGDE supports key
downstream applications: clustering, visualization, imputation, enrichment analysis and discovery of regulators.
![framework](https://github.com/Hgy1014/images/blob/main/scAGDE/framework.png)
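
Steps (b) and (c) can be sketched in a few lines of NumPy: build a nearest-neighbour cell graph from a latent embedding, normalize its adjacency matrix, and propagate features through two graph convolution layers. This is a sketch under stated assumptions, not the scAGDE implementation: Euclidean distance, the value of `k`, the symmetric normalization of Kipf and Welling, ReLU activation, random weights, and all function names are chosen here for illustration.

```python
import numpy as np


def knn_adjacency(z, k):
    """Symmetric 0/1 adjacency linking each row of z to its k nearest rows (Euclidean)."""
    sq = np.sum(z ** 2, axis=1)
    dist = sq[:, None] + sq[None, :] - 2.0 * (z @ z.T)  # squared pairwise distances
    np.fill_diagonal(dist, np.inf)                       # a cell is not its own neighbour
    nearest = np.argsort(dist, axis=1)[:, :k]
    n = z.shape[0]
    adj = np.zeros((n, n))
    adj[np.repeat(np.arange(n), k), nearest.ravel()] = 1.0
    return np.maximum(adj, adj.T)                        # symmetrize: undirected graph


def normalize_adjacency(adj):
    """Compute D^{-1/2} (A + I) D^{-1/2}, with D the degree matrix of A + I."""
    a_hat = adj + np.eye(adj.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    return a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def gcn_layer(a_norm, h, w, activation=True):
    """One graph convolution: aggregate neighbour features, then apply a linear map."""
    out = a_norm @ h @ w
    return np.maximum(out, 0.0) if activation else out


# Toy data: 8 cells, 20 peaks, and a 3-dimensional latent space standing in
# for the output of the first-stage autoencoder (an assumption for this demo).
rng = np.random.default_rng(0)
n_cells, n_peaks = 8, 20
x = (rng.random((n_cells, n_peaks)) < 0.2).astype(float)
z_pre = rng.normal(size=(n_cells, 3))

a_norm = normalize_adjacency(knn_adjacency(z_pre, k=3))
w1 = rng.normal(scale=0.1, size=(n_peaks, 8))   # untrained weights, for shape only
w2 = rng.normal(scale=0.1, size=(8, 4))
hidden = gcn_layer(a_norm, x, w1)
embedding = gcn_layer(a_norm, hidden, w2, activation=False)
print(embedding.shape)  # (8, 4)
```

Adding self-loops before normalizing keeps each cell's own features in the aggregation, so a cell's embedding mixes its own accessibility profile with those of its graph neighbours.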
<!-- ![framework](https://github.com/Hgy1014/scAGDE/assets/64194550/79b02f20-7bde-4849-abc2-89d5bae66ce3) -->
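
The dual clustering optimization in (iv) can be illustrated with the formulation popularized by Deep Embedded Clustering (DEC). The sketch below assumes a Student's t kernel for the soft labels, the squared-and-renormalized target distribution, a KL divergence as the distribution alignment loss, and a cross-entropy against one-hot pseudo-labels as the label prediction loss. The caption above does not specify these formulas, so treat every function and the pairing of losses as assumptions rather than as scAGDE's actual code.

```python
import numpy as np


def soft_assign(z, centers, alpha=1.0):
    """Soft cluster labels q[i, j] from a Student's t kernel on embedding distances."""
    d2 = ((z[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    q = (1.0 + d2 / alpha) ** (-(alpha + 1.0) / 2.0)
    return q / q.sum(axis=1, keepdims=True)


def target_distribution(q):
    """Sharpen q: square it, divide by soft cluster sizes, renormalize per cell."""
    weight = q ** 2 / q.sum(axis=0)
    return weight / weight.sum(axis=1, keepdims=True)


def kl_divergence(p, q):
    """Mean KL(P || Q) over cells (assumed form of the distribution alignment loss)."""
    return float(np.sum(p * np.log(p / q)) / p.shape[0])


def cross_entropy(q, labels):
    """Mean cross-entropy of q against hard labels (assumed label prediction loss)."""
    return float(-np.mean(np.log(q[np.arange(len(labels)), labels])))


# Toy embedding with two well-separated groups; the centers stand in for
# K-means centroids computed on the embedding.
z = np.array([[0.0, 0.0], [0.2, 0.0], [4.0, 4.0], [4.2, 4.0]])
centers = np.array([[0.0, 0.5], [4.0, 3.5]])

q = soft_assign(z, centers)
p = target_distribution(q)
pseudo = p.argmax(axis=1)                 # one-hot pseudo-labels, stored as indices
alignment_loss = kl_divergence(p, q)
prediction_loss = cross_entropy(q, pseudo)
print(pseudo, alignment_loss, prediction_loss)
```

Because the target distribution is sharper than the soft labels it is derived from, minimizing both losses pushes cells toward their most confident cluster, and recomputing the target during training gives the iterative refinement.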
# System Requirements
## Hardware requirements
The `scAGDE` package requires only a standard computer with enough RAM to support the in-memory operations.
## Software requirements
### OS Requirements
This package is supported on *Linux*. It has been tested on the following systems:
+ Linux: Ubuntu 18.04
### Python Dependencies
`scAGDE` mainly depends on the Python scientific stack.
```
numpy
scipy
torch
scikit-learn
pandas
scanpy
anndata
rpy2
```
For the exact settings, please see <a href="requirements.txt">requirements.txt</a>.
### R Dependencies
scAGDE requires R and the R package `mclust` to be installed in your environment.
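
One way to confirm the R side before running scAGDE is to ask `Rscript` whether `mclust` can be loaded. The helper below is a hypothetical convenience script, not part of the package; it assumes `Rscript` is on your `PATH` and uses only the Python standard library.

```python
import shutil
import subprocess


def r_package_available(package="mclust"):
    """Return True if Rscript is on PATH and R can find the given package."""
    rscript = shutil.which("Rscript")
    if rscript is None:
        return False  # R itself is not installed or not on PATH
    # Exit with status 0 when the package namespace can be loaded, 1 otherwise.
    expr = (
        f'quit(save = "no", status = '
        f'if (requireNamespace("{package}", quietly = TRUE)) 0 else 1)'
    )
    result = subprocess.run([rscript, "-e", expr], capture_output=True)
    return result.returncode == 0


print("mclust available:", r_package_available("mclust"))
```

If this prints `False`, install R and `mclust` first, for example with the `conda install r-mclust` step from the installation guide.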
# Installation Guide
The following commands create a conda environment with pinned dependency versions and install scAGDE into it:
```
conda create -n scagde python=3.9.13
conda activate scagde
pip install torch==2.0.1
pip install numpy==1.23.5
pip install rpy2==3.5.16
pip install scanpy==1.9.3
pip install matplotlib==3.5.0
pip install leidenalg==0.10.2
conda install r-mclust
pip install scAGDE
```
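
After installation, a short standard-library script can compare the installed versions with the pins used above. This is an optional sanity check written for this guide, not a tool shipped with scAGDE; the `PINNED` table simply repeats the versions from the commands, and the function name is hypothetical.

```python
from importlib import metadata

# Versions pinned in the installation commands above.
PINNED = {
    "torch": "2.0.1",
    "numpy": "1.23.5",
    "rpy2": "3.5.16",
    "scanpy": "1.9.3",
    "matplotlib": "3.5.0",
    "leidenalg": "0.10.2",
}


def check_environment(pinned=PINNED):
    """Map each distribution name to (wanted, installed, matches)."""
    report = {}
    for name, wanted in pinned.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None
        # Ignore local build suffixes such as "+cu117" when comparing.
        matches = installed is not None and installed.split("+")[0] == wanted
        report[name] = (wanted, installed, matches)
    return report


for name, (wanted, installed, matches) in check_environment().items():
    status = "ok" if matches else "MISMATCH"
    print(f"{name:12s} wanted {wanted:8s} installed {installed}  [{status}]")
```

A mismatch does not necessarily mean scAGDE will fail, but it is the first thing to check when results differ from the tutorials.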
# Usage
Detailed usage guidelines are provided in the `tutorials` folder. `Tutorial 1` shows how to run scAGDE end to end, and `Tutorial 2` walks through scAGDE step by step, with detailed instructions for each step. `Tutorial 3` and `Tutorial 4` provide R scripts for the experimental analysis of imputation and peak selection preferences. `Tutorial 5` illustrates how scAGDE uses batch training to speed up training on large-scale data.
You can also consult the online documentation at <a href="https://scagde-tutorial.readthedocs.io/en/latest/index.html">https://scagde-tutorial.readthedocs.io/en/latest/index.html</a>.
# Data Availability
All simulated and real datasets used in our study, including the human brain dataset, can be downloaded <a href="https://zenodo.org/records/12176520">here</a>.
# License
This project is covered under the **MIT License**.
# Citation
```
```
Raw data
{
"_id": null,
"home_page": "https://github.com/Hgy1014/scAGDE",
"name": "scAGDE",
"maintainer": null,
"docs_url": null,
"requires_python": null,
"maintainer_email": null,
"keywords": "python, first package",
"author": "Gaoyang Hao",
"author_email": "<haogy22@mails.jlu.edu.cn>",
"download_url": "https://files.pythonhosted.org/packages/4f/8f/580046cc111e41a6799238e1439165e490c033f32ebf0f522a93ac963da3/scagde-0.0.17.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": null,
"summary": "scAGDE Python package",
"version": "0.0.17",
"project_urls": {
"Homepage": "https://github.com/Hgy1014/scAGDE"
},
"split_keywords": [
"python",
" first package"
],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "9ce4fd136a17f542e0212d01aae022c1e9aca99f2ddea6c824a994ae876ced32",
"md5": "041f4eab5339b6e477c9363fb243e537",
"sha256": "a743a00202a0048ecb6e4584b1cfd8c1c7df312455d9ce5610402ff2f49651c8"
},
"downloads": -1,
"filename": "scAGDE-0.0.17-py3-none-any.whl",
"has_sig": false,
"md5_digest": "041f4eab5339b6e477c9363fb243e537",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": null,
"size": 19747,
"upload_time": "2024-11-23T09:36:52",
"upload_time_iso_8601": "2024-11-23T09:36:52.841580Z",
"url": "https://files.pythonhosted.org/packages/9c/e4/fd136a17f542e0212d01aae022c1e9aca99f2ddea6c824a994ae876ced32/scAGDE-0.0.17-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "4f8f580046cc111e41a6799238e1439165e490c033f32ebf0f522a93ac963da3",
"md5": "09958df9bc27f029368d3aba6c7b8c77",
"sha256": "16bc22ec4390c4ccbd131807a8c5f35ef317baefc82c42aba4c59f3c3275c7cd"
},
"downloads": -1,
"filename": "scagde-0.0.17.tar.gz",
"has_sig": false,
"md5_digest": "09958df9bc27f029368d3aba6c7b8c77",
"packagetype": "sdist",
"python_version": "source",
"requires_python": null,
"size": 10490891,
"upload_time": "2024-11-23T09:36:58",
"upload_time_iso_8601": "2024-11-23T09:36:58.026782Z",
"url": "https://files.pythonhosted.org/packages/4f/8f/580046cc111e41a6799238e1439165e490c033f32ebf0f522a93ac963da3/scagde-0.0.17.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2024-11-23 09:36:58",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "Hgy1014",
"github_project": "scAGDE",
"travis_ci": false,
"coveralls": false,
"github_actions": false,
"requirements": [],
"lcname": "scagde"
}