STMiner


| Field       | Value                                                                     |
|-------------|---------------------------------------------------------------------------|
| Name        | STMiner                                                                   |
| Version     | 0.0.0                                                                     |
| Home page   | https://github.com/PSSUN/STMiner                                          |
| Summary     | Python package for spatial transcriptomics data analysis                  |
| Upload time | 2024-04-29 07:45:30                                                       |
| Author      | Peisen Sun                                                                |
| License     | MIT License                                                               |
| Keywords    | stminer, bioinformatics, gmm, hellinger distance, spatial transcriptomics |

![Static Badge](https://img.shields.io/badge/License-MIT-blue)
![Static Badge](https://img.shields.io/badge/readthedocs-blue?logo=readthedocs&label=Documents)
![Static Badge](https://img.shields.io/badge/3.10-green?logo=python&label=Python&labelColor=yellow)
![Static Badge](https://img.shields.io/badge/Linux-blue?logo=Linux&logoColor=white)
![Static Badge](https://img.shields.io/badge/Windows-blue?logo=Windows&logoColor=white)
![Static Badge](https://img.shields.io/badge/macos-blue?logo=apple&logoColor=white)

<div align=center><img src="./pic/logo.png" height = "200"/></div>

# Introduction

Spatial transcriptomics revolutionizes transcriptomics by incorporating positional information. However, an emerging
problem is how to identify the gene expression patterns that reveal specific regions within a tissue, and how to find
the genes that are expressed only in those regions.

![STMiner](./pic/fig1.png)

Here we propose “STMiner”, a method based on the Gaussian mixture model (GMM), to solve this problem. STMiner is a
bottom-up algorithm. It first fits a parametric model to the spatial distribution of each gene, then builds a distance
array between the fitted models using the Hellinger distance. Genes are then clustered on these distances, which reveals
spatial co-expression patterns across distinct gene classes.
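
As a rough illustration of the distance used here, the sketch below computes the Hellinger distance between two discrete distributions defined on the same support. It is not STMiner's implementation: STMiner compares fitted GMMs, and how it evaluates the distance between them is not described in this README. The function name `hellinger` and the toy inputs are illustrative only.

```python
import math


def hellinger(p, q):
    """Hellinger distance between two discrete distributions on the same support.

    Inputs are non-negative weights; each is normalised to sum to 1 first.
    The result lies in [0, 1]: 0 for identical distributions, 1 for disjoint ones.
    """
    if len(p) != len(q):
        raise ValueError("p and q must have the same length")
    total_p, total_q = sum(p), sum(q)
    if total_p <= 0 or total_q <= 0:
        raise ValueError("each distribution needs positive total weight")
    squared = sum(
        (math.sqrt(a / total_p) - math.sqrt(b / total_q)) ** 2
        for a, b in zip(p, q)
    )
    return math.sqrt(squared / 2)


# Toy example: identical, partially overlapping, and disjoint distributions.
same = hellinger([1, 2, 1], [1, 2, 1])
partial = hellinger([3, 1], [1, 3])
disjoint = hellinger([1, 0], [0, 1])
```

Because the distance is bounded and symmetric, it works well as the entry of a pairwise distance array between genes.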

**Please visit the STMiner [documentation](https://stminerdoc.readthedocs.io/en/latest/Introduction/Introduction.html)
for details.**

# Quick start by example

## Import package

```python
from STMiner import SPFinder
```

## Load data

You can download test data [here](https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSM4838133).

```python
sp = SPFinder()
file_path = 'D://10X_Visium_hunter2021spatially_sample_C_data.h5ad'
sp.read_h5ad(file=file_path)
```

## Find spatially highly variable genes

```python
sp.get_genes_csr_array(min_cells=500, log1p=False)
sp.spatial_high_variable_genes()
```

You can check the distance of each gene with:

```python
sp.global_distance
```

| Gene  | Distance |
|-------|----------|
| geneA | 9998     |
| geneB | 9994     |
| ...   | ...      |
| geneC | 8724     |

## Preprocess and Fit GMM

```python
sp.fit_pattern(n_comp=20, gene_list=list(sp.global_distance[:1000]['Gene']))
```

With `n_comp=20`, the GMM fitted for each gene has 20 components.
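
This README does not say how expression values are prepared before the GMM is fitted. One common approach, assumed here and not confirmed for STMiner, is to turn a gene's count matrix into a list of spatial points in which each spot appears once per count, so that densely expressed regions carry more weight. The helper `counts_to_points` and the toy grid are hypothetical.

```python
def counts_to_points(counts):
    """Expand a 2-D grid of integer counts into (row, col) points.

    A spot with count n contributes n identical points, so a density model
    fitted to the points is weighted by expression level.
    """
    points = []
    for r, row in enumerate(counts):
        for c, n in enumerate(row):
            points.extend([(r, c)] * int(n))
    return points


# Toy 2 x 3 expression grid for a single gene (made-up values).
toy_counts = [
    [0, 2, 0],
    [1, 0, 3],
]
points = counts_to_points(toy_counts)
```

A point list like this could then be passed to a mixture-model fitter such as `sklearn.mixture.GaussianMixture(n_components=...).fit(points)`, with the number of components no larger than the number of points.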

## Build distance matrix & clustering

```python
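# Build the pairwise distance array between the fitted GMMs
sp.build_distance_array()
# Embed the genes with MDS, then cluster them with k-means
sp.cluster_gene(n_clusters=6, mds_components=20)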
```
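
To show what the MDS step does with a distance array, here is a minimal NumPy sketch of classical (Torgerson) MDS, which turns pairwise distances into coordinates that a clustering method such as k-means can use. This is a swapped-in textbook variant for illustration and may differ from the MDS that `cluster_gene` uses internally. The function `classical_mds` and the toy matrix are assumptions, not part of STMiner.

```python
import numpy as np


def classical_mds(dist, n_components=2):
    """Embed items so that Euclidean distances approximate `dist`."""
    d = np.asarray(dist, dtype=float)
    n = d.shape[0]
    # Double-centre the squared distances: B = -1/2 * J * D^2 * J
    centering = np.eye(n) - np.ones((n, n)) / n
    b = -0.5 * centering @ (d ** 2) @ centering
    # eigh returns eigenvalues in ascending order; keep the largest ones
    vals, vecs = np.linalg.eigh(b)
    order = np.argsort(vals)[::-1][:n_components]
    # Clip tiny negative eigenvalues caused by floating-point error
    scale = np.sqrt(np.clip(vals[order], 0.0, None))
    return vecs[:, order] * scale


# Toy distance matrix: three items lying on a line at positions 0, 1 and 3.
toy_dist = np.array([
    [0.0, 1.0, 3.0],
    [1.0, 0.0, 2.0],
    [3.0, 2.0, 0.0],
])
embedding = classical_mds(toy_dist, n_components=1)
# Pairwise distances recomputed from the 1-D embedding
recovered = np.abs(embedding - embedding.T)
```

For this toy input the recovered distances match the original matrix up to floating-point error, since the three items fit exactly on one axis.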

## Result & Visualization

The result is stored in **genes_labels**:

```python
sp.genes_labels
```

The output looks like the following:

|    | gene_id        | labels |
|----|----------------|--------|
| 0  | Cldn5          | 2      |
| 1  | Fyco1          | 2      |
| 2  | Pmepa1         | 2      |
| 3  | Arhgap5        | 0      |
| 4  | Apc            | 5      |
| .. | ...            | ...    |
| 95 | Cyp2a5         | 0      |
| 96 | X5730403I07Rik | 0      |
| 97 | Ltbp2          | 2      |
| 98 | Rbp4           | 4      |
| 99 | Hist1h1e       | 4      |

### To visualize the patterns:

```python
sp.get_pattern_array(vote_rate=0.3)
sp.plot.plot_pattern(vmax=99,
                     heatmap=False,
                     s=5,
                     reverse_y=True,
                     reverse_x=True,
                     image_path='E://cut_img.png',
                     rotate_img=True,
                     k=4,
                     aspect=0.55)
```

<div  align="center">    
  <img src="./pic/scatterplot.png" width = "600" align=center />
</div>

### To visualize the intersections between patterns 0 & 1:

```python
sp.plot.plot_intersection(pattern_list=[0, 1],
                          image_path='E://OneDrive - stu.xjtu.edu.cn/paper/cut_img.png',
                          reverse_y=True,
                          reverse_x=True,
                          aspect=0.55,
                          s=20)
```

<div  align="center">    
  <img src="./pic/scatterplot_mx.png" width = "300" align=center />
</div>

### To visualize the gene expression by labels:

```python
sp.plot.plot_genes(label=0, vmax=99)
```

## Attributes of the STMiner.SPFinder object

| Attribute            | Type         | Description                             |
|----------------------|--------------|-----------------------------------------|
| adata                | AnnData      | AnnData for loaded spatial data         |
| global_distance      | pd.DataFrame | OT distance between gene and background |
| genes_labels         | pd.DataFrame | Gene names and their pattern labels     |
| genes_patterns       | dict         | GMM model for each gene                 |
| genes_distance_array | pd.DataFrame | Distance between each GMM               |
| kmeans_fit_result    | obj          | Result of k-means                       |
| mds_features         | pd.DataFrame | Embedding features after MDS            |


            

        