splotch-st


Namesplotch-st JSON
Version 0.1.1a0 PyPI version JSON
download
home_pagehttps://github.com/tare/Splotch
SummarySplotch is a hierarchical generative probabilistic model for analyzing Spatial Transcriptomics (ST) data
upload_time2023-12-22 18:34:53
maintainer
docs_urlNone
authorTarmo Äijö
requires_python>=3.9,<3.13
licenseMIT
keywords
VCS
bugtrack_url
requirements No requirements were recorded.
Travis-CI No Travis.
coveralls test coverage No coveralls.
            # Splotch

Splotch is a hierarchical generative probabilistic model for analyzing Spatial Transcriptomics (ST) [[1]](#references) data.

## Features

- Supports complex hierarchical experimental designs and model-based analysis of replicates
- Full Bayesian inference with Hamiltonian Monte Carlo (HMC) using the adaptive HMC sampler as implemented in NumPyro [[2]](#references)
  - CPU, GPU, and TPU support
- Analysis of expression differences between anatomical regions and conditions using posterior samples
- Different anatomical annotated regions (AARs) are modeled using a linear model
- Zero-inflated Poisson or Poisson likelihood
- Gaussian Process prior for spatial random effect

The Splotch code in this repository supports single-, two-, and three-level experimental designs.

## Installation

### PyPI

```console
$ pip install splotch-st
```

### GitHub

```console
$ pip install git+https://git@github.com/tare/Splotch.git
```

#### CUDA

To install JAX with NVIDIA support, please see [this](https://jax.readthedocs.io/en/latest/installation.html#pip-installation-gpu-cuda-installed-via-pip-easier) page for instructions.

## Usage

The main steps of Splotch analysis are the following:

1. [Preparation of count files](#preparation-of-count-files)
2. [Annotation of ST spots](#annotation-of-st-spots)
3. [Preparation of metadata table](#preparation-of-metadata-table)
4. [Splotch analysis](#splotch-analysis)

### Preparation of count files

The count files have the following tab-separated values (TSV) file format

|               | 32.06_2.04 | 31.16_2.04 | 14.07_2.1 | …   | 28.16_33.01 |
| ------------- | ---------- | ---------- | --------- | --- | ----------- |
| A130010J15Rik | 0          | 0          | 0         | …   | 0           |
| A230046K03Rik | 0          | 0          | 0         | …   | 0           |
| A230050P20Rik | 0          | 0          | 0         | …   | 0           |
| A2m           | 0          | 1          | 0         | …   | 0           |
| ⋮             | ⋮          | ⋮          | ⋮         | ⋱   | ⋮           |
| Zzz3          | 0          | 1          | 0         | …   | 0           |

The rows and columns have gene identifiers and ST spot coordinates (X and Y coordinates are separated by an underscore), respectively.

### Annotation of ST spots

To get the most out of the statistical model of Splotch one has to annotate the ST spots based on their tissue context. These annotations will allow the model to share information across tissue sections, resulting in more robust conclusions.

To make the annotation step slightly less tedious, we have implemented a light-weight javascript tool called [Span](https://github.com/tare/Span).

The annotation files have the following TSV file format

|                | 32.06_2.04 | 31.16_2.04 | 14.07_2.1 | …   | 28.16_33.01 |
| -------------- | ---------- | ---------- | --------- | --- | ----------- |
| Vent_Med_White | 0          | 0          | 0         | …   | 0           |
| Vent_Horn      | 1          | 1          | 0         | …   | 0           |
| Vent_Lat_White | 0          | 0          | 0         | …   | 0           |
| Med_Grey       | 0          | 0          | 0         | …   | 0           |
| Dors_Horn      | 0          | 0          | 0         | …   | 0           |
| Dors_Edge      | 0          | 0          | 0         | …   | 1           |
| Med_Lat_White  | 0          | 0          | 0         | …   | 0           |
| Vent_Edge      | 0          | 0          | 1         | …   | 0           |
| Dors_Med_White | 0          | 0          | 0         | …   | 0           |
| Cent_Can       | 0          | 0          | 0         | …   | 0           |
| Lat_Edge       | 0          | 0          | 0         | …   | 0           |

The rows and columns correspond to the user-define anatomical annotation regions (AAR) and ST spot coordinates (X and Y coordinates are separated by an underscore), respectively. For instance, the spot 32.06_2.04 has the Vent_Horn annotation (i.e. located in ventral horn). The annotation category of each ST spot is **one-hot encoded** and we do not currently support more than one annotation category per ST spot.

ST spots without annotation categories are discarded in the analysis. This behaviour can be useful when you want to discard some ST spots from the analysis based on the tissue histology.

### Preparation of metadata table

The metadata table contains information about the samples (i.e. count files). Additionally, the metadata table is used for matching count and annotation files.

The metadata table has the following TSV file format

| name      | level_1   | level_2 | level_3 | count_file                                                       | annotation_file           | image_file              |
| --------- | --------- | ------- | ------- | ---------------------------------------------------------------- | ------------------------- | ----------------------- |
| L7CN36_C1 | G93A p120 | F       | 1394    | count_tables/L7CN36_C1_stdata_aligned_counts_IDs.txt.unified.tsv | annotations/L7CN36_C1.tsv | images/L7CN36_C1_HE.jpg |
| L7CN36_C2 | G93A p120 | F       | 1394    | count_tables/L7CN36_C2_stdata_aligned_counts_IDs.txt.unified.tsv | annotations/L7CN36_C2.tsv | images/L7CN36_C2_HE.jpg |
| L7CN30_C1 | WT p120   | M       | 2967    | count_tables/L7CN30_C1_stdata_aligned_counts_IDs.txt.unified.tsv | annotations/L7CN30_C1.tsv | images/L7CN30_C1_HE.jpg |
| L7CN30_C2 | WT p120   | M       | 2967    | count_tables/L7CN30_C2_stdata_aligned_counts_IDs.txt.unified.tsv | annotations/L7CN30_C2.tsv | images/L7CN30_C2_HE.jpg |
| L7CN69_D1 | WT p120   | M       | 1310    | count_tables/L7CN69_D1_stdata_aligned_counts_IDs.txt.unified.tsv | annotations/L7CN69_D1.tsv | images/L7CN69_D1_HE.jpg |
| L7CN69_D2 | WT p120   | M       | 1310    | count_tables/L7CN69_D2_stdata_aligned_counts_IDs.txt.unified.tsv | annotations/L7CN69_D2.tsv | images/L7CN69_D2_HE.jpg |
| CN96_E1   | WT p120   | F       | 1040    | count_tables/CN96_E1_stdata_aligned_counts_IDs.txt.unified.tsv   | annotations/CN96_E1.tsv   | images/CN96_E1_HE.jpg   |
| CN96_E2   | WT p120   | F       | 1040    | count_tables/CN96_E2_stdata_aligned_counts_IDs.txt.unified.tsv   | annotations/CN96_E2.tsv   | images/CN96_E2_HE.jpg   |
| CN93_E1   | G93A p120 | M       | 975     | count_tables/CN93_E1_stdata_aligned_counts_IDs.txt.unified.tsv   | annotations/CN93_E1.tsv   | images/CN93_E1_HE.jpg   |
| CN93_E2   | G93A p120 | M       | 975     | count_tables/CN93_E2_stdata_aligned_counts_IDs.txt.unified.tsv   | annotations/CN93_E2.tsv   | images/CN93_E2_HE.jpg   |

Each sample (i.e. slide) has its own row in the metadata table. The columns `level_1`, `level_2`, and `level_3` define how the samples are analyzed using the linear hierarchical AAR model. The columns `level_1`, `count_file`, and `annotation_file` are mandatory. The column `level_2` is mandatory when using the two-level model. Similarly, the columns `level_2` and `level_3` are mandatory when using the three-level model. At the moment we only support categorical variables.

If a given slide contains tissue sections from multiple biological conditions in terms of the explanatory variables, then it is recommended to split the tissue sections into multiple count files so that the design matrix can be defined accordingly.

The user can include additional columns at their own discretion. For instance, we will use the column `image_file` in the [tutorials](tutorials/).

### Example data

In the `tutorials` directory, we have two example ST data sets

1. [ALS](tutorials/als_st/) [[3]](#references)
2. [Olfactory Bulb](tutorials/olfactory_bulb_st/) [[1]](#references)

### Splotch analysis

Please see the [ALS](tutorials/als_st/Tutorial.ipynb) and [Olfactory Bulb](tutorials/olfactory_bulb_st/Tutorial.ipynb) tutorials.

In the simplest setting, the following lines would be enough to run Splotch on a single gene

```python
# read input data
splotch_input_data = get_input_data("metadata.tsv")

# run Splotch on the Gfap gene
key = random.PRNGKey(0)
key, key_ = random.split(key)
splotch_result_nuts = run_nuts(key_, ["Gfap"], splotch_input_data)
```

### References

[1] Ståhl, Patrik L., et al. ["Visualization and analysis of gene expression in tissue sections by spatial transcriptomics."](https://science.sciencemag.org/content/353/6294/78) _Science_ 353.6294 (2016): 78-82.

[2] Phan, Du, et al. ["Composable Effects for Flexible and Accelerated Probabilistic Programming in NumPyro."](https://www.jstatsoft.org/article/view/v076i01) _arXiv preprint_ 1912.11554 (2019).

[3] Maniatis, Silas, et al. ["Spatiotemporal dynamics of molecular pathology in amyotrophic lateral sclerosis."](https://science.sciencemag.org/content/364/6435/89) _Science_ 364.6435 (2019): 89-93.

            

Raw data

            {
    "_id": null,
    "home_page": "https://github.com/tare/Splotch",
    "name": "splotch-st",
    "maintainer": "",
    "docs_url": null,
    "requires_python": ">=3.9,<3.13",
    "maintainer_email": "",
    "keywords": "",
    "author": "Tarmo \u00c4ij\u00f6",
    "author_email": "tarmo.aijo@gmail.com",
    "download_url": "https://files.pythonhosted.org/packages/51/80/49262abbbe569b1c8fccb99874367290a7cede656cc771521eb59448eabe/splotch_st-0.1.1a0.tar.gz",
    "platform": null,
    "description": "# Splotch\n\nSplotch is a hierarchical generative probabilistic model for analyzing Spatial Transcriptomics (ST) [[1]](#references) data.\n\n## Features\n\n- Supports complex hierarchical experimental designs and model-based analysis of replicates\n- Full Bayesian inference with Hamiltonian Monte Carlo (HMC) using the adaptive HMC sampler as implemented in NumPyro [[2]](#references)\n  - CPU, GPU, and TPU support\n- Analysis of expression differences between anatomical regions and conditions using posterior samples\n- Different anatomical annotated regions (AARs) are modeled using a linear model\n- Zero-inflated Poisson or Poisson likelihood\n- Gaussian Process prior for spatial random effect\n\nThe Splotch code in this repository supports single-, two-, and three-level experimental designs.\n\n## Installation\n\n### PyPI\n\n```console\n$ pip install splotch-st\n```\n\n### GitHub\n\n```console\n$ pip install git+https://git@github.com/tare/Splotch.git\n```\n\n#### CUDA\n\nTo install JAX with NVIDIA support, please see [this](https://jax.readthedocs.io/en/latest/installation.html#pip-installation-gpu-cuda-installed-via-pip-easier) page for instructions.\n\n## Usage\n\nThe main steps of Splotch analysis are the following:\n\n1. [Preparation of count files](#preparation-of-count-files)\n2. [Annotation of ST spots](#annotation-of-st-spots)\n3. [Preparation of metadata table](#preparation-of-metadata-table)\n4. [Splotch analysis](#splotch-analysis)\n\n### Preparation of count files\n\nThe count files have the following tab-separated values (TSV) file format\n\n|               | 32.06_2.04 | 31.16_2.04 | 14.07_2.1 | \u2026   | 28.16_33.01 |\n| ------------- | ---------- | ---------- | --------- | --- | ----------- |\n| A130010J15Rik | 0          | 0          | 0         | \u2026   | 0           |\n| A230046K03Rik | 0          | 0          | 0         | \u2026   | 0           |\n| A230050P20Rik | 0          | 0          | 0         | \u2026   | 0           |\n| A2m           | 0          | 1          | 0         | \u2026   | 0           |\n| \u22ee             | \u22ee          | \u22ee          | \u22ee         | \u22f1   | \u22ee           |\n| Zzz3          | 0          | 1          | 0         | \u2026   | 0           |\n\nThe rows and columns have gene identifiers and ST spot coordinates (X and Y coordinates are separated by an underscore), respectively.\n\n### Annotation of ST spots\n\nTo get the most out of the statistical model of Splotch one has to annotate the ST spots based on their tissue context. These annotations will allow the model to share information across tissue sections, resulting in more robust conclusions.\n\nTo make the annotation step slightly less tedious, we have implemented a light-weight javascript tool called [Span](https://github.com/tare/Span).\n\nThe annotation files have the following TSV file format\n\n|                | 32.06_2.04 | 31.16_2.04 | 14.07_2.1 | \u2026   | 28.16_33.01 |\n| -------------- | ---------- | ---------- | --------- | --- | ----------- |\n| Vent_Med_White | 0          | 0          | 0         | \u2026   | 0           |\n| Vent_Horn      | 1          | 1          | 0         | \u2026   | 0           |\n| Vent_Lat_White | 0          | 0          | 0         | \u2026   | 0           |\n| Med_Grey       | 0          | 0          | 0         | \u2026   | 0           |\n| Dors_Horn      | 0          | 0          | 0         | \u2026   | 0           |\n| Dors_Edge      | 0          | 0          | 0         | \u2026   | 1           |\n| Med_Lat_White  | 0          | 0          | 0         | \u2026   | 0           |\n| Vent_Edge      | 0          | 0          | 1         | \u2026   | 0           |\n| Dors_Med_White | 0          | 0          | 0         | \u2026   | 0           |\n| Cent_Can       | 0          | 0          | 0         | \u2026   | 0           |\n| Lat_Edge       | 0          | 0          | 0         | \u2026   | 0           |\n\nThe rows and columns correspond to the user-define anatomical annotation regions (AAR) and ST spot coordinates (X and Y coordinates are separated by an underscore), respectively. For instance, the spot 32.06_2.04 has the Vent_Horn annotation (i.e. located in ventral horn). The annotation category of each ST spot is **one-hot encoded** and we do not currently support more than one annotation category per ST spot.\n\nST spots without annotation categories are discarded in the analysis. This behaviour can be useful when you want to discard some ST spots from the analysis based on the tissue histology.\n\n### Preparation of metadata table\n\nThe metadata table contains information about the samples (i.e. count files). Additionally, the metadata table is used for matching count and annotation files.\n\nThe metadata table has the following TSV file format\n\n| name      | level_1   | level_2 | level_3 | count_file                                                       | annotation_file           | image_file              |\n| --------- | --------- | ------- | ------- | ---------------------------------------------------------------- | ------------------------- | ----------------------- |\n| L7CN36_C1 | G93A p120 | F       | 1394    | count_tables/L7CN36_C1_stdata_aligned_counts_IDs.txt.unified.tsv | annotations/L7CN36_C1.tsv | images/L7CN36_C1_HE.jpg |\n| L7CN36_C2 | G93A p120 | F       | 1394    | count_tables/L7CN36_C2_stdata_aligned_counts_IDs.txt.unified.tsv | annotations/L7CN36_C2.tsv | images/L7CN36_C2_HE.jpg |\n| L7CN30_C1 | WT p120   | M       | 2967    | count_tables/L7CN30_C1_stdata_aligned_counts_IDs.txt.unified.tsv | annotations/L7CN30_C1.tsv | images/L7CN30_C1_HE.jpg |\n| L7CN30_C2 | WT p120   | M       | 2967    | count_tables/L7CN30_C2_stdata_aligned_counts_IDs.txt.unified.tsv | annotations/L7CN30_C2.tsv | images/L7CN30_C2_HE.jpg |\n| L7CN69_D1 | WT p120   | M       | 1310    | count_tables/L7CN69_D1_stdata_aligned_counts_IDs.txt.unified.tsv | annotations/L7CN69_D1.tsv | images/L7CN69_D1_HE.jpg |\n| L7CN69_D2 | WT p120   | M       | 1310    | count_tables/L7CN69_D2_stdata_aligned_counts_IDs.txt.unified.tsv | annotations/L7CN69_D2.tsv | images/L7CN69_D2_HE.jpg |\n| CN96_E1   | WT p120   | F       | 1040    | count_tables/CN96_E1_stdata_aligned_counts_IDs.txt.unified.tsv   | annotations/CN96_E1.tsv   | images/CN96_E1_HE.jpg   |\n| CN96_E2   | WT p120   | F       | 1040    | count_tables/CN96_E2_stdata_aligned_counts_IDs.txt.unified.tsv   | annotations/CN96_E2.tsv   | images/CN96_E2_HE.jpg   |\n| CN93_E1   | G93A p120 | M       | 975     | count_tables/CN93_E1_stdata_aligned_counts_IDs.txt.unified.tsv   | annotations/CN93_E1.tsv   | images/CN93_E1_HE.jpg   |\n| CN93_E2   | G93A p120 | M       | 975     | count_tables/CN93_E2_stdata_aligned_counts_IDs.txt.unified.tsv   | annotations/CN93_E2.tsv   | images/CN93_E2_HE.jpg   |\n\nEach sample (i.e. slide) has its own row in the metadata table. The columns `level_1`, `level_2`, and `level_3` define how the samples are analyzed using the linear hierarchical AAR model. The columns `level_1`, `count_file`, and `annotation_file` are mandatory. The column `level_2` is mandatory when using the two-level model. Similarly, the columns `level_2` and `level_3` are mandatory when using the three-level model. At the moment we only support categorical variables.\n\nIf a given slide contains tissue sections from multiple biological conditions in terms of the explanatory variables, then it is recommended to split the tissue sections into multiple count files so that the design matrix can be defined accordingly.\n\nThe user can include additional columns at their own discretion. For instance, we will use the column `image_file` in the [tutorials](tutorials/).\n\n### Example data\n\nIn the `tutorials` directory, we have two example ST data sets\n\n1. [ALS](tutorials/als_st/) [[3]](#references)\n2. [Olfactory Bulb](tutorials/olfactory_bulb_st/) [[1]](#references)\n\n### Splotch analysis\n\nPlease see the [ALS](tutorials/als_st/Tutorial.ipynb) and [Olfactory Bulb](tutorials/olfactory_bulb_st/Tutorial.ipynb) tutorials.\n\nIn the simplest setting, the following lines would be enough to run Splotch on a single gene\n\n```python\n# read input data\nsplotch_input_data = get_input_data(\"metadata.tsv\")\n\n# run Splotch on the Gfap gene\nkey = random.PRNGKey(0)\nkey, key_ = random.split(key)\nsplotch_result_nuts = run_nuts(key_, [\"Gfap\"], splotch_input_data)\n```\n\n### References\n\n[1] St\u00e5hl, Patrik L., et al. [\"Visualization and analysis of gene expression in tissue sections by spatial transcriptomics.\"](https://science.sciencemag.org/content/353/6294/78) _Science_ 353.6294 (2016): 78-82.\n\n[2] Phan, Du, et al. [\"Composable Effects for Flexible and Accelerated Probabilistic Programming in NumPyro.\"](https://www.jstatsoft.org/article/view/v076i01) _arXiv preprint_ 1912.11554 (2019).\n\n[3] Maniatis, Silas, et al. [\"Spatiotemporal dynamics of molecular pathology in amyotrophic lateral sclerosis.\"](https://science.sciencemag.org/content/364/6435/89) _Science_ 364.6435 (2019): 89-93.\n",
    "bugtrack_url": null,
    "license": "MIT",
    "summary": "Splotch is a hierarchical generative probabilistic model for analyzing Spatial Transcriptomics (ST) data",
    "version": "0.1.1a0",
    "project_urls": {
        "Bug Tracker": "https://github.com/tare/Splotch/issues",
        "Homepage": "https://github.com/tare/Splotch",
        "Repository": "https://github.com/tare/Splotch"
    },
    "split_keywords": [],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "49d472e00d80bc9ec31a7798d5cf87bf3b710383e1d7b2706124eb2e3fbc5b2f",
                "md5": "8ae24397138a7478a8fcdece865d8948",
                "sha256": "8f540ec147ae068de1ae3040ed4f26d9236a924f3361517cdc3a3d96475da823"
            },
            "downloads": -1,
            "filename": "splotch_st-0.1.1a0-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "8ae24397138a7478a8fcdece865d8948",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.9,<3.13",
            "size": 22821,
            "upload_time": "2023-12-22T18:34:51",
            "upload_time_iso_8601": "2023-12-22T18:34:51.332455Z",
            "url": "https://files.pythonhosted.org/packages/49/d4/72e00d80bc9ec31a7798d5cf87bf3b710383e1d7b2706124eb2e3fbc5b2f/splotch_st-0.1.1a0-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "518049262abbbe569b1c8fccb99874367290a7cede656cc771521eb59448eabe",
                "md5": "c1a28036dedc8bc78b4636fdce8faad6",
                "sha256": "6f077d9c23861e908ceed947ffb60bdc24556e49ad2e940730fadc011d2349e1"
            },
            "downloads": -1,
            "filename": "splotch_st-0.1.1a0.tar.gz",
            "has_sig": false,
            "md5_digest": "c1a28036dedc8bc78b4636fdce8faad6",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.9,<3.13",
            "size": 23396,
            "upload_time": "2023-12-22T18:34:53",
            "upload_time_iso_8601": "2023-12-22T18:34:53.511374Z",
            "url": "https://files.pythonhosted.org/packages/51/80/49262abbbe569b1c8fccb99874367290a7cede656cc771521eb59448eabe/splotch_st-0.1.1a0.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2023-12-22 18:34:53",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "tare",
    "github_project": "Splotch",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": false,
    "requirements": [],
    "lcname": "splotch-st"
}
        
Elapsed time: 0.22059s