paste-bio


* Name: paste-bio
* Version: 1.4.0
* Home page: https://github.com/raphael-group/paste
* Summary: A computational method to align and integrate spatial transcriptomics experiments.
* Upload time: 2023-05-26 20:41:42
* Author: Max Land
* Requires Python: >=3.6

[![PyPI](https://img.shields.io/pypi/v/paste-bio.svg)](https://pypi.org/project/paste-bio)
[![Downloads](https://pepy.tech/badge/paste-bio)](https://pepy.tech/project/paste-bio)
[![Documentation Status](https://readthedocs.org/projects/paste-bio/badge/?version=latest)](https://paste-bio.readthedocs.io/en/stable/?badge=stable)
[![Anaconda](https://anaconda.org/bioconda/paste-bio/badges/version.svg)](https://anaconda.org/bioconda/paste-bio)
[![bioconda-downloads](https://anaconda.org/bioconda/paste-bio/badges/downloads.svg)](https://anaconda.org/bioconda/paste-bio)

# PASTE

![PASTE Overview](https://github.com/raphael-group/paste/blob/main/docs/source/_static/images/paste_overview.png)

PASTE is a computational method that leverages both gene expression similarity and spatial distances between spots to align and integrate spatial transcriptomics data. It provides two methods:
1. `pairwise_align`: aligns spots between a pair of slices.
2. `center_align`: integrates multiple slices into one center slice.
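
To make the combination of the two signals concrete, here is a minimal NumPy sketch of a fused objective of the kind described in the paper: an expression-dissimilarity term and a spatial-distortion term, balanced by `alpha`. The function name `fused_objective` and the toy inputs are illustrative assumptions and are not part of the PASTE API; refer to the paper for the exact formulation.

```python
import numpy as np

def fused_objective(M, D_a, D_b, pi, alpha=0.1):
    """Score a candidate alignment `pi` between slice A and slice B.

    M   : (n, m) expression dissimilarity between spots of A and spots of B
    D_a : (n, n) pairwise spatial distances within slice A
    D_b : (m, m) pairwise spatial distances within slice B
    pi  : (n, m) non-negative alignment matrix whose entries sum to 1
    """
    expression_term = np.sum(M * pi)
    # diff[i, j, k, l] = D_a[i, k] - D_b[j, l]
    diff = D_a[:, None, :, None] - D_b[None, :, None, :]
    # Sum over i, j, k, l of diff**2 * pi[i, j] * pi[k, l]
    spatial_term = np.einsum("ijkl,ij,kl->", diff ** 2, pi, pi)
    return (1 - alpha) * expression_term + alpha * spatial_term

# Toy data (hypothetical): two identical slices with three spots each.
coords = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]])
expr = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
D = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
M = ((expr[:, None, :] - expr[None, :, :]) ** 2).sum(axis=-1)

identity_pi = np.eye(3) / 3                      # each spot mapped to itself
swapped_pi = np.eye(3)[[1, 0, 2]] / 3            # spots 0 and 1 exchanged

score_identity = fused_objective(M, D, D, identity_pi)
score_swapped = fused_objective(M, D, D, swapped_pi)
```

For identical slices the identity mapping incurs no expression or spatial cost, so it scores lower than the swapped mapping.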

You can read the full paper [here](https://www.nature.com/articles/s41592-022-01459-6).

Additional examples and the code to reproduce the paper's analyses can be found [here](https://github.com/raphael-group/paste_reproducibility). Preprocessed datasets used in the paper can be found on [zenodo](https://doi.org/10.5281/zenodo.6334774).

### Recent News

* PASTE is now published in [Nature Methods](https://www.nature.com/articles/s41592-022-01459-6)!

* The code to reproduce the analyses can be found [here](https://github.com/raphael-group/paste_reproducibility).

* As of version 1.2.0, PASTE supports GPU computation via PyTorch. For more details, see the GPU section of the [Tutorial notebook](docs/source/notebooks/getting-started.ipynb).

### Installation

The easiest way is to install PASTE from PyPI: https://pypi.org/project/paste-bio/.

`pip install paste-bio` 

You can also install PASTE from Bioconda: https://anaconda.org/bioconda/paste-bio.

`conda install -c bioconda paste-bio`

Check out Tutorial.ipynb for an example of how to use PASTE.

Alternatively, you can clone the repository and try the following example in a
notebook or the command line.

### Quick Start

PASTE requires at least two slices of spatial-omics data (both
expression and coordinates) in AnnData format
(e.g. as read in by scanpy or squidpy). We have included a breast
cancer dataset from [1] in the [sample_data folder](sample_data/) of this repo,
which the example below uses to show how to run PASTE.

```python
import matplotlib.pyplot as plt
import matplotlib.patches as mpatches
import numpy as np
import scanpy as sc
import paste as pst

# Load Slices
data_dir = './sample_data/' # change this path to the data you wish to analyze

# Assume that the coordinates of slices are named slice_name + "_coor.csv"
def load_slices(data_dir, slice_names=["slice1", "slice2"]):
    slices = []  
    for slice_name in slice_names:
        slice_i = sc.read_csv(data_dir + slice_name + ".csv")
        slice_i_coor = np.genfromtxt(data_dir + slice_name + "_coor.csv", delimiter = ',')
        slice_i.obsm['spatial'] = slice_i_coor
        # Preprocess slices
        sc.pp.filter_genes(slice_i, min_counts = 15)
        sc.pp.filter_cells(slice_i, min_counts = 100)
        slices.append(slice_i)
    return slices

slices = load_slices(data_dir)
slice1, slice2 = slices

# Pairwise align the slices
pi12 = pst.pairwise_align(slice1, slice2)

# To visualize the alignment you can stack the slices 
# according to the alignment pi
slices, pis = [slice1, slice2], [pi12]
new_slices = pst.stack_slices_pairwise(slices, pis)

slice_colors = ['#e41a1c','#377eb8']
plt.figure(figsize=(7,7))
for i in range(len(new_slices)):
    pst.plot_slice(new_slices[i],slice_colors[i],s=400)
plt.legend(handles=[mpatches.Patch(color=slice_colors[0], label='1'),mpatches.Patch(color=slice_colors[1], label='2')])
plt.gca().invert_yaxis()
plt.axis('off')
plt.show()

# Center align slices
## We have to reload the slices as pairwise_alignment modifies the slices.
slices = load_slices(data_dir)
slice1, slice2 = slices

# Construct a center slice
## choose one of the slices as the coordinate reference for the center slice,
## i.e. the center slice will have the same number of spots as this slice and
## the same coordinates.
initial_slice = slice1.copy()    
slices = [slice1, slice2]
lmbda = len(slices)*[1/len(slices)] # set hyperparameter to be uniform

## Possible to pass in an initial pi (as keyword argument pis_init) 
## to improve performance, see Tutorial.ipynb notebook for more details.
center_slice, pis = pst.center_align(initial_slice, slices, lmbda) 

## The low dimensional representation of our center slice is held 
## in the matrices W and H, which can be used for downstream analyses
W = center_slice.uns['paste_W']
H = center_slice.uns['paste_H']
```
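
Once an alignment has been computed, a common next step is to read off which spot in the second slice each spot in the first slice maps to most strongly. The sketch below assumes the alignment is available as a plain NumPy array with rows for the first slice and columns for the second; the toy matrix and the helper names `best_matches` and `row_confidence` are hypothetical and not part of PASTE.

```python
import numpy as np

# Toy alignment: 3 spots in slice 1 (rows), 4 spots in slice 2 (columns).
pi = np.array([
    [0.20, 0.05, 0.00, 0.00],
    [0.00, 0.25, 0.05, 0.00],
    [0.00, 0.00, 0.10, 0.35],
])

def best_matches(pi):
    """For each row spot, index of the column spot receiving the most mass."""
    return np.argmax(pi, axis=1)

def row_confidence(pi):
    """Fraction of each row's total mass that goes to its best match."""
    return pi.max(axis=1) / pi.sum(axis=1)

matches = best_matches(pi)          # one column index per row spot
confidence = row_confidence(pi)     # values in (0, 1]
```

A confidence close to 1 means a spot maps almost entirely to a single partner, while lower values indicate mass spread over several spots.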

### GPU implementation
PASTE is compatible with GPUs via PyTorch. To use one, pass the following two parameters to the main functions:
```python
import ot  # Python Optimal Transport (POT), which provides TorchBackend

pi12 = pst.pairwise_align(slice1, slice2, backend = ot.backend.TorchBackend(), use_gpu = True)

center_slice, pis = pst.center_align(initial_slice, slices, lmbda, backend = ot.backend.TorchBackend(), use_gpu = True) 
```
For more details, see the GPU section of the [Tutorial notebook](docs/source/notebooks/getting-started.ipynb).
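
If the same script has to run on machines with and without a GPU, one option is to build the two extra arguments conditionally. This is only a sketch: the helper `gpu_kwargs` is hypothetical, and it assumes that omitting both arguments leaves PASTE on its default CPU path.

```python
def gpu_kwargs():
    """Return the extra GPU keyword arguments, or {} to keep the defaults."""
    try:
        import ot
        import torch
    except ImportError:
        return {}  # POT or PyTorch is not installed
    if not torch.cuda.is_available():
        return {}  # no CUDA device is visible to PyTorch
    return {"backend": ot.backend.TorchBackend(), "use_gpu": True}

extra = gpu_kwargs()
# Usage with the objects from the Quick Start (not executed here):
# pi12 = pst.pairwise_align(slice1, slice2, **extra)
```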

### Command Line

We provide the option of running PASTE from the command line. 

First, clone the repository:

`git clone https://github.com/raphael-group/paste.git`

Each slice is initialized from two separate files, both in .csv format: the gene expression data followed by the spatial coordinates. List the files in that order for every slice.

Sample execution (based on this repo): `python paste-cmd-line.py -m center -f ./sample_data/slice1.csv ./sample_data/slice1_coor.csv ./sample_data/slice2.csv ./sample_data/slice2_coor.csv ./sample_data/slice3.csv ./sample_data/slice3_coor.csv`

Note: `pairwise` will return pairwise alignment between each consecutive pair of slices (e.g. \[slice1,slice2\], \[slice2,slice3\]).
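
The file ordering and the consecutive pairing described above can be sketched in a few lines of plain Python. The helpers `pair_slice_files` and `consecutive_pairs` are hypothetical illustrations of the convention, not functions from `paste-cmd-line.py`.

```python
def pair_slice_files(paths):
    """Group a flat list of paths into (expression_csv, coordinates_csv) pairs."""
    if len(paths) % 2 != 0:
        raise ValueError("expected two files per slice: expression, then coordinates")
    return [(paths[i], paths[i + 1]) for i in range(0, len(paths), 2)]

def consecutive_pairs(n_slices):
    """Slice index pairs aligned in pairwise mode: (0, 1), (1, 2), ..."""
    return [(i, i + 1) for i in range(n_slices - 1)]

files = [
    "./sample_data/slice1.csv", "./sample_data/slice1_coor.csv",
    "./sample_data/slice2.csv", "./sample_data/slice2_coor.csv",
    "./sample_data/slice3.csv", "./sample_data/slice3_coor.csv",
]
slice_files = pair_slice_files(files)
pairs = consecutive_pairs(len(slice_files))
```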

| Flag | Name | Description | Default Value |
| --- | --- | --- | --- |
| -m | mode | Select either `pairwise` or `center` | (str) `pairwise` |
| -f | files | Path to data files (.csv) | None |
| -d | direc | Directory to store output files | Current Directory |
| -a | alpha | Alpha parameter for PASTE | (float) `0.1` |
| -c | cost | Expression dissimilarity cost (`kl` or `Euclidean`) | (str) `kl` |
| -p | n_components | n_components for NMF step in `center_align` | (int) `15` |
| -l | lmbda | Lambda parameter in `center_align` | (floats) probability vector of length `n`  |
| -i | initial_slice | Specify which file is also the initial slice in `center_align` | (int) `1` |
| -t | threshold | Convergence threshold for `center_align` | (float) `0.001` |
| -x | coordinates | Output new coordinates (toggle to turn on) | `False` |
| -w | weights | Weights files of spots in each slice (.csv) | None |
| -s | start | Initial alignments for OT. If not given uses uniform (.csv structure similar to alignment output) | None |
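
The lambda parameter is a probability vector with one weight per slice. As a small sketch for the Python API, the helper below (a hypothetical name, `normalize_weights`) turns arbitrary non-negative weights into such a vector; how the values are passed on the command line is not shown here.

```python
import math

def normalize_weights(raw_weights):
    """Scale non-negative weights so that they sum to 1."""
    if any(w < 0 for w in raw_weights):
        raise ValueError("weights must be non-negative")
    total = sum(raw_weights)
    if total <= 0:
        raise ValueError("at least one weight must be positive")
    return [w / total for w in raw_weights]

uniform = normalize_weights([1, 1, 1])    # every slice counts equally
weighted = normalize_weights([2, 1, 1])   # first slice counts twice as much
```

The uniform case matches the `len(slices)*[1/len(slices)]` expression used in the Quick Start.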

`pairwise_align` outputs a .csv file containing the mapping of spots between each consecutive pair of slices. Rows correspond to spots of the first slice and columns to spots of the second.

`center_align` outputs two files containing the low-dimensional representation (NMF decomposition) of the center slice gene expression, plus files containing the mapping of spots from the center slice (rows) to each input slice (columns).
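
As a rough illustration of how the two NMF factors can be used downstream, the sketch below multiplies them back into an approximate expression matrix and assigns each spot to its dominant component. It assumes `W` has one row per spot and `H` has one column per gene; the random matrices are stand-ins for the real outputs, and the shapes are chosen only for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
n_spots, n_components, n_genes = 6, 3, 10

# Stand-ins for the W and H factors of the center slice (assumed shapes).
W = rng.random((n_spots, n_components))
H = rng.random((n_components, n_genes))

# Low-rank approximation of the center slice expression (spots x genes).
approx_expression = W @ H

# A simple grouping of spots: the component with the largest loading per spot.
dominant_component = W.argmax(axis=1)
```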

### Sample Dataset

The repository includes a sample spatial transcriptomics dataset consisting of four breast cancer slices, courtesy of:

[1] Ståhl, Patrik & Salmén, Fredrik & Vickovic, Sanja & Lundmark, Anna & Fernandez Navarro, Jose & Magnusson, Jens & Giacomello, Stefania & Asp, Michaela & Westholm, Jakub & Huss, Mikael & Mollbrink, Annelie & Linnarsson, Sten & Codeluppi, Simone & Borg, Åke & Pontén, Fredrik & Costea, Paul & Sahlén, Pelin Akan & Mulder, Jan & Bergmann, Olaf & Frisén, Jonas. (2016). Visualization and analysis of gene expression in tissue sections by spatial transcriptomics. Science. 353. 78-82. 10.1126/science.aaf2403. 

Note: The original data is in .tsv format, but we converted it to .csv.
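
For readers who want to prepare their own tab-separated data in the same way, a conversion along these lines is one possibility. This is a generic standard-library sketch, not the script used to produce the sample data, and the example content is made up.

```python
import csv
import io

def tsv_to_csv(tsv_text):
    """Convert tab-separated text to comma-separated text."""
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t")
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerows(reader)
    return out.getvalue()

# Hypothetical two-line table: a header row and one spot.
example = "spot\tGENE1\tGENE2\n1x1\t3\t0\n"
converted = tsv_to_csv(example)
```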

### References

Ron Zeira, Max Land, Alexander Strzalkowski and Benjamin J. Raphael. "Alignment and integration of spatial transcriptomics data". Nature Methods (2022). https://doi.org/10.1038/s41592-022-01459-6

            
