cvapipe-analysis

Name	cvapipe-analysis
Version	0.2.0
home_page	https://github.com/AllenCellModeling/cvapipe_analysis
Summary	Analysis pipeline used in Integrated intracellular organization and its variations in human iPS cells
upload_time	2024-03-28 23:48:28
maintainer	None
docs_url	None
author	Matheus Viana
requires_python	>=3.9
license	Allen Institute Software License
keywords	cvapipe_analysis
requirements	No requirements were recorded.

# cvapipe_analysis

>[!IMPORTANT]  
For reproducing the analysis and figures in our paper [1], please use this version of the code:

https://github.com/AllenCell/cvapipe_analysis/tree/nature-paper

[1] - Viana, M. P., Chen, J., Knijnenburg, T. A., Vasan, R., Yan, C., Arakaki, J. E., ... & Rafelski, S. M. (2023). Integrated intracellular organization and its variations in human iPS cells. Nature, 613(7943), 345-354.

## Analysis Pipeline for Cell Variance

[![Build Status](https://github.com/AllenCell/cvapipe_analysis/workflows/Build%20Main/badge.svg)](https://github.com/AllenCell/cvapipe_analysis/actions)
[![Documentation](https://github.com/AllenCell/cvapipe_analysis/workflows/Documentation/badge.svg)](https://AllenCell.github.io/cvapipe_analysis/)


![Shape modes](docs/logo.png)

---

## Installation

First, create a conda environment for this project:

```
conda create --name cvapipe python=3.9
conda activate cvapipe
```

then clone this repo

```
git clone https://github.com/AllenCell/cvapipe_analysis.git
```

and install it with

```
cd cvapipe_analysis
pip install -e .
```

Alternatively, install the latest stable version from PyPI by running

```
pip install cvapipe_analysis
```

## Types of usage

This package can be used to reproduce the main results shown in [1] or to generate similar results from your own data. Before applying it to your own dataset, we highly recommend that you first run it on our test dataset to understand how the package works.

## The YAML configuration file

This package is fully configured through the file `config.yaml`. This file is divided into sections that map more or less one-to-one onto the existing workflow steps. Here are the main things you need to know about the configuration file:

**Project**

```
appName: cvapipe_analysis
project:
    # Suffix to append to local_staging
    local_staging: "path_to_your/local_staging"
    overwrite: on
```

Set `local_staging` to the full path where you want data and results to be stored.

**Data**

```
data:
    nucleus:
        channel: "dna_segmentation"
        alias: "NUC"
        color: "#3AADA7"
    cell:
        channel: "membrane_segmentation"
        alias: "MEM"
        color: "#F200FF"
    structure:
        channel: "struct_segmentation_roof"
        alias: "STR"
        color: "#000000"
    structure-raw:
        channel: "structure"
        alias: "STRRAW"
        color: "#000000"
```

Here we provide a description of the data. Aliases must be unique; they are used in the rest of the configuration file to specify which data we are referring to. If you are using this package on your own data, be aware that the values used in the field `channel` must be found in the column `name_dict` of your input manifest file (see the section "Running the pipeline on your own data").
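The uniqueness requirement on aliases can be checked before running any step. The sketch below assumes the `data` section has already been parsed into a plain Python dictionary (for example with a YAML parser); `duplicated_aliases` and `alias_to_channel` are hypothetical helpers and are not part of `cvapipe_analysis`.

```python
from collections import Counter

# Hypothetical stand-in for the parsed `data` section of config.yaml.
data_section = {
    "nucleus": {"channel": "dna_segmentation", "alias": "NUC", "color": "#3AADA7"},
    "cell": {"channel": "membrane_segmentation", "alias": "MEM", "color": "#F200FF"},
    "structure": {"channel": "struct_segmentation_roof", "alias": "STR", "color": "#000000"},
    "structure-raw": {"channel": "structure", "alias": "STRRAW", "color": "#000000"},
}


def duplicated_aliases(data):
    """Return the sorted list of aliases that appear more than once."""
    counts = Counter(entry["alias"] for entry in data.values())
    return sorted(alias for alias, n in counts.items() if n > 1)


def alias_to_channel(data):
    """Map each alias to its channel name, refusing duplicated aliases."""
    duplicates = duplicated_aliases(data)
    if duplicates:
        raise ValueError(f"Aliases must be unique, found duplicates: {duplicates}")
    return {entry["alias"]: entry["channel"] for entry in data.values()}


# Show which channel each alias refers to.
print(alias_to_channel(data_section))
```

The resulting mapping is also a convenient way to see which channel names must appear in the `name_dict` column of your manifest.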

**Features**

```
features:
    aliases: ["NUC", "MEM", "STR"]
    # SHE - Spherical harmonics expansion
    SHE:
        alignment:
            align: on
            unique: off
            reference: "cell"
        aliases: ["NUC", "MEM"]
        # Size of Gaussian kernel used to smooth the
        # images before SHE coefficients calculation
        sigma: 2
        # Number of SHE coefficients used to describe cell
        # and nuclear shape
        lmax: 16
```

This section specifies which aliases we should compute features on, which aliases we should calculate the spherical harmonics coefficients on, and which type of alignment should be used.

**Pre-processing**

```
preprocessing:
    remove_mitotics: on
    remove_outliers: on
```

Here we set whether or not to remove mitotic cells or outliers from the dataset. You can turn these options off when running `cvapipe_analysis` on your own data.

**Shape Space**

```
shapespace:
    # Specify a set of aliases here
    aliases: ["NUC", "MEM"]
    # Sort shape modes by the volume of this alias
    sorter: "MEM"
    # Percentage of extreme points to be removed
    removal_pct: 1.0
    # Number of principal components to be calculated
    number_of_shape_modes: 8
    # Map points
    map_points: [-2.0, -1.5, -1.0, -0.5, 0.0, 0.5, 1.0, 1.5, 2.0]
    plot:
        swapxy_on_zproj: off
        # limits of x and y axes in the animated GIFs
        limits: [-150, 150, -80, 80]
```

Here we specify which aliases should be used to create a shape space. This must be a subset of the aliases specified above to have their spherical harmonics coefficients computed. For small datasets with only hundreds of cells, you may want to reduce the number of map points of your shape space. The number of map points must be odd.
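The two constraints described above (the shape space aliases are a subset of the SHE aliases, and the number of map points is odd) can be checked with a few lines of Python. This is a minimal sketch on already-parsed configuration values; `validate_shapespace` is a hypothetical helper, not a function provided by the package.

```python
def validate_shapespace(shapespace, she_aliases):
    """Return a list of human-readable problems (empty when the section is valid)."""
    problems = []
    # Shape space aliases must have SHE coefficients computed for them.
    extra = sorted(set(shapespace["aliases"]) - set(she_aliases))
    if extra:
        problems.append(f"aliases without SHE coefficients: {extra}")
    # The number of map points must be odd.
    if len(shapespace["map_points"]) % 2 == 0:
        problems.append("the number of map points must be odd")
    return problems


# Values copied from the configuration shown above.
shapespace = {
    "aliases": ["NUC", "MEM"],
    "map_points": [-2.0, -1.5, -1.0, -0.5, 0.0, 0.5, 1.0, 1.5, 2.0],
}
print(validate_shapespace(shapespace, she_aliases=["NUC", "MEM"]))

# A reduced set of map points for a small dataset, still odd in number.
small = {"aliases": ["NUC", "MEM"], "map_points": [-1.0, 0.0, 1.0]}
print(validate_shapespace(small, she_aliases=["NUC", "MEM"]))
```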

**Intensity Parameterization**

```
parameterization:
    inner: "NUC"
    outer: "MEM"
    parameterize: ["STRRAW", "STR"]
    number_of_interpolating_points: 32
```

Here we specify which aliases should be used as the inner and outer references, and the aliases that we obtain parameterizations for.

**Structures**

```
structures:
    "FBL": ["nucleoli (DFC)", "#A9D1E5", "{'raw': (420, 2610), 'seg': (0,30), 'avgseg': (80,160)}"]
    "NPM1": ["nucleoli (GC)", "#88D1E5", "{'raw': (480, 8300), 'seg': (0,30), 'avgseg': (80,160)}"]
    "SON": ["nuclear speckles", "#3292C9", "{'raw': (420, 1500), 'seg': (0,10), 'avgseg': (10,60)}"]
    "SMC1A": ["cohesins", "#306598", "{'raw': (450, 630), 'seg': (0,2), 'avgseg': (0,15)}"]
    "HIST1H2BJ": ["histones", "#305098", "{'raw': (450, 2885), 'seg': (0,30), 'avgseg': (10,100)}"]
    "LMNB1": ["nuclear envelope", "#084AE7", "{'raw': (475,1700), 'seg': (0,30), 'avgseg': (0,60)}"]
    "NUP153": ["nuclear pores", "#0840E7", "{'raw': (420, 600), 'seg': (0,15), 'avgseg': (0,50)}"]
    "SEC61B": ["ER (Sec61 beta)", "#FFFFB5", "{'raw': (490,1070), 'seg': (0,30), 'avgseg': (0,100)}"]
    "ATP2A2": ["ER (SERCA2)", "#FFFFA0", "{'raw': (430,670), 'seg': (0,25), 'avgseg': (0,80)}"]
    "SLC25A17": ["peroxisomes", "#FFD184", "{'raw': (400,515), 'seg': (0,7), 'avgseg': (0,15)}"]
    "RAB5A": ["endosomes", "#FFC846", "{'raw': (420,600), 'seg': (0,7), 'avgseg': (0,10)}"]
    "TOMM20": ["mitochondria", "#FFBE37", "{'raw': (410,815), 'seg': (0,27), 'avgseg': (0,50)}"]
    "LAMP1": ["lysosomes", "#AD952A", "{'raw': (440,800), 'seg': (0,27), 'avgseg': (0,30)}"]
    "ST6GAL1": ["Golgi", "#B7952A", "{'raw': (400,490), 'seg': (0,17), 'avgseg': (0,30)}"]
    "TUBA1B": ["microtubules", "#9D7000", "{'raw': (1100,3200), 'seg': (0,22), 'avgseg': (0,60)}"]
    "CETN2": ["centrioles", "#C8E1AA", "{'raw': (440,800), 'seg': (0, 2), 'avgseg': (0,2)}"]
    "GJA1": ["gap junctions", "#BEE18C", "{'raw': (420,2200), 'seg': (0,4), 'avgseg': (0,8)}"]
    "TJP1": ["tight junctions", "#B4C878", "{'raw': (420,1500), 'seg': (0,8), 'avgseg': (0,20)}"]
    "DSP": ["desmosomes", "#B4C864", "{'raw': (410,620), 'seg': (0,5), 'avgseg': (0,3)}"]
    "CTNNB1": ["adherens junctions", "#96AA46", "{'raw': (410,750), 'seg': (0,22), 'avgseg': (5,40)}"]
    "AAVS1": ["plasma membrane", "#FFD2FF", "{'raw': (505,2255), 'seg': (0,30), 'avgseg': (10,120)}"]
    "ACTB": ["actin filaments", "#E6A0FF", "{'raw': (550,1300), 'seg': (0,18), 'avgseg': (0,35)}"]
    "ACTN1": ["actin bundles", "#E696FF", "{'raw': (440,730), 'seg': (0,13), 'avgseg': (0,25)}"]
    "MYH10": ["actomyosin bundles", "#FF82FF", "{'raw': (440,900), 'seg': (0,13), 'avgseg': (0,25)}"]
    "PXN": ["matrix adhesions", "#CB1CCC", "{'raw': (410,490), 'seg': (0,5), 'avgseg': (0,5)}"]
```

Here we specify a dictionary with the gene name, description and color for each structure. Again, if you are applying the package to your own data, make sure you specify here the values you use in the column `structure_name` of your manifest file (see the section "Running the pipeline on your own data"). A dictionary of contrast values (min, max) for each structure is also specified here and is used by the plotting functions to display single-cell images of raw data (`raw`), segmentation (`seg`) or average morphed cells (`avgseg`).
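Each entry stores its contrast values as a string that looks like a Python dictionary literal. As an illustration of how such an entry can be unpacked, the sketch below uses `ast.literal_eval` from the standard library. This is an assumption about a reasonable way to read the string, not a description of how `cvapipe_analysis` parses it internally, and `parse_structure` is a hypothetical helper.

```python
import ast


def parse_structure(gene, entry):
    """Split one `structures` entry into description, color and contrast limits."""
    description, color, contrast_text = entry
    # The third element is a string holding a dict literal of (min, max) tuples.
    contrast = ast.literal_eval(contrast_text)
    for kind, (low, high) in contrast.items():
        if low >= high:
            raise ValueError(f"{gene}: contrast for '{kind}' must satisfy min < max")
    return {"gene": gene, "description": description, "color": color, "contrast": contrast}


# Entry copied from the configuration shown above.
tomm20 = parse_structure(
    "TOMM20",
    ["mitochondria", "#FFBE37", "{'raw': (410,815), 'seg': (0,27), 'avgseg': (0,50)}"],
)
print(tomm20["description"], tomm20["contrast"]["raw"])
```

`ast.literal_eval` only accepts Python literals, so it is a safer choice than `eval` for strings that come from a configuration file.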

## Running the pipeline to reproduce the paper

This analysis is currently not configured to run as a workflow. Please run steps individually.

### 1. Download the single-cell image dataset manifest including raw GFP and segmented cropped images

```
cvapipe_analysis loaddata run
```

This command downloads the whole dataset of ~7 TB. For each cell in the dataset, we provide a raw 3-channel image containing fiducial markers for the cell membrane and nucleus, together with an FP marker for one intracellular structure. We also provide segmentations for each cell in the form of 5-channel binary images. The two extra channels correspond to roof-augmented versions of the cell and intracellular structure segmentations. For more information, please refer to our paper [1]. Metadata about each cell can be found in the file `manifest.csv`, a table in which each row corresponds to a cell.

**Importantly**, you can download a _small test dataset composed of 300 cells chosen at random_ from the main dataset. To do so, please run

```
cvapipe_analysis loaddata run --test
```

This step saves the single-cell images in the folders `local_staging/loaddata/crop_raw` and `local_staging/loaddata/crop_seg`.
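Once the manifest is available, a quick sanity check is to count how many cells were downloaded per tagged structure. The sketch below uses only the standard library and assumes that the manifest contains a `structure_name` column, as required for user-provided manifests; the exact location of `manifest.csv` inside `local_staging` is not stated here, so the path is left as a parameter. Both helper names are hypothetical.

```python
import csv
from collections import Counter


def count_cells_per_structure(lines):
    """Count rows per `structure_name` from an iterable of CSV text lines."""
    reader = csv.DictReader(lines)
    return Counter(row["structure_name"] for row in reader)


def count_cells_in_file(manifest_path):
    """Same as above, reading the CSV from a file on disk."""
    with open(manifest_path, newline="") as handle:
        return count_cells_per_structure(handle)


# Tiny in-memory example with made-up cell IDs.
example = ["CellId,structure_name", "1,TOMM20", "2,TOMM20", "3,FBL"]
for structure, n_cells in count_cells_per_structure(example).most_common():
    print(structure, n_cells)
```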

### 2. Compute single-cell features

```
cvapipe_analysis computefeatures run
```

This step extracts single-cell features, including cell, nuclear and intracellular volumes and other basic features. Here we also use `aics-shparam` [(link)](https://github.com/AllenCell/aics-shparam) to compute the spherical harmonics coefficients for cell and nuclear shape. This step depends on step 1.

This step saves the features in the file `local_staging/computefeatures/manifest.csv`.

### 3. Pre-processing dataset

```
cvapipe_analysis preprocessing run
```

This step removes outliers and mitotic cells from the single cell dataset. This step depends on step 2.

This step saves results in the file `local_staging/preprocessing/manifest.csv` and in the following folder:

**Folder: `local_staging/preprocessing/outliers/`**

-   `xx.png`: Diagnostic plots for outlier detection.

### 4. Compute shapemodes

```
cvapipe_analysis shapemode run
```

Here we implement a few pre-processing steps. First, all mitotic cells are removed from the dataset. Next, we use feature-based outlier detection to detect and remove outliers from the dataset. The remaining dataset is used as input for principal component analysis. Finally, we compute cell and nuclear shape modes. This step depends on step 3.

Two output folders are produced by this step:

**Folder: `local_staging/shapemode/pca/`**

-   `explained_variance.png`: Explained variance by each principal component.
-   `feature_importance.txt`: Importance of first few features of each principal component.
-   `pairwise_correlations.png`: Pairwise correlations between all principal components.

**Folder: `local_staging/shapemode/avgshape/`**

-   `xx.vtk`: vtkPolyData files corresponding to 3D cell and nuclear meshes. We recommend [Paraview](https://www.paraview.org) to open these files.
-   `xx.gif`: Animated GIF illustrating cell and nuclear shape modes from 3 different projections.
-   `combined.tif`: Multichannel TIF that combines all animated GIFs in the same image.

### 5. Create the parameterized intracellular location representation (PILR)

```
cvapipe_analysis parameterization run
```

Here we use `aics-cytoparam` [(link)](https://github.com/AllenCell/aics-cytoparam) to create parameterizations for all of the single-cell data. This step depends on steps 3 and 4.

One output folder is produced by this step:

**Folder: `local_staging/parameterization/representations/`**

-   `xx.tif`: Multichannel TIFF image with the cell PILR.

### 6. Create average PILRs

```
cvapipe_analysis aggregation run
```

This step averages multiple cell PILRs and morphs them into idealized shapes from the shape space. This step depends on step 5.

Two output folders are produced by this step:

**Folder: `local_staging/aggregation/repsagg/`**

-   `avg-SEG-TUBA1B-DNA_MEM_PC4-B5-CODE.tif`: Example of file generated. This represents the average PILR from segmented images of all TUBA1B cells that fall into bin number 5 from shape mode 4.

**Folder: `local_staging/aggregation/aggmorph/`**

-   `avg-SEG-TUBA1B-DNA_MEM_PC4-B5.tif`: Same as above but the PILR has been morphed into the cell shape corresponding to bin number 5 of shape mode 4.
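When collecting these files for further analysis, it can help to split the file names into their parts. The pattern below is inferred only from the two example names above, so treat it as an assumption: real outputs may use other aggregation types or suffixes. `parse_aggregate_name` is a hypothetical helper.

```python
import re

# Pattern inferred from the two example file names; not an official naming spec.
AGG_NAME = re.compile(
    r"(?P<aggregation>[a-z]+)-(?P<kind>[A-Z]+)-(?P<gene>[A-Z0-9]+)-"
    r"(?P<shape_mode>\w+?)-B(?P<bin>\d+)(?P<code>-CODE)?\.tif"
)


def parse_aggregate_name(filename):
    """Return the parts of an aggregated PILR file name, or None if it does not match."""
    match = AGG_NAME.fullmatch(filename)
    if match is None:
        return None
    parts = match.groupdict()
    parts["bin"] = int(parts["bin"])
    # Only the first example name carries the -CODE suffix.
    parts["code"] = parts["code"] is not None
    return parts


print(parse_aggregate_name("avg-SEG-TUBA1B-DNA_MEM_PC4-B5-CODE.tif"))
print(parse_aggregate_name("avg-SEG-TUBA1B-DNA_MEM_PC4-B5.tif"))
```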

### 7. Correlate single-cell PILRs

```
cvapipe_analysis correlation run
```

This step computes the pair-wise correlation between PILRs of cells. This step depends on step 5.

One output folder is produced by this step:

**Folder: `local_staging/correlation/values/`**

-   `avg-STR-NUC_MEM_PC8-1.tif`: Example of file generated. Correlation matrix between the PILRs of all cells that fall into bin number 1 of shape mode 8.
-   `avg-STR-NUC_MEM_PC8-1.csv`: Example of file generated. Provides the cell indices for the correlation matrix above.
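To make the idea of a pairwise correlation matrix concrete, here is a small pure-Python sketch. It assumes that each PILR has been flattened into a list of numbers and that the Pearson correlation coefficient is the measure of interest; both are assumptions made for illustration, and the package's own implementation may differ. The functions `pearson` and `correlation_matrix` are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


def correlation_matrix(pilrs):
    """Symmetric matrix whose entry (i, j) is the correlation of cells i and j."""
    n = len(pilrs)
    matrix = [[1.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            matrix[i][j] = matrix[j][i] = pearson(pilrs[i], pilrs[j])
    return matrix


# Three toy "flattened PILRs" with made-up intensities.
toy_pilrs = [[1, 2, 3, 4], [2, 4, 6, 8], [4, 3, 2, 1]]
for row in correlation_matrix(toy_pilrs):
    print([round(value, 3) for value in row])
```

Row `i` of the matrix would correspond to the `i`-th cell index listed in the accompanying CSV file.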

### 8. Stereotypy analysis

```
cvapipe_analysis stereotypy run
```

This step calculates the extent to which a structure’s individual location varies. This step depends on step 5.

Two output folders are produced by this step:

**Folder: `local_staging/stereotypy/values`**

-   `*.csv*`: Stereotypy values.

**Folder: `local_staging/stereotypy/plots`**

-   Resulting plots.

### 9. Concordance analysis

```
cvapipe_analysis concordance run
```

This step calculates the extent to which a structure is localized relative to all the other cellular structures. This step depends on step 6.

Two output folders are produced by this step:

**Folder: `local_staging/concordance/values/`**

-   `*.csv*`: Concordance values

**Folder: `local_staging/concordance/plots/`**

-   Resulting plots.
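Because the steps have to be run individually, it can be useful to write down the dependencies stated in steps 1 to 9 and derive a valid execution order from them. The sketch below does this with `graphlib` from the standard library (Python 3.9+). The names `DEPENDS_ON`, `run_order` and `steps_needed_for` are hypothetical and not part of the package.

```python
from graphlib import TopologicalSorter

# Each step maps to the steps it depends on, as stated in the sections above.
DEPENDS_ON = {
    "loaddata": set(),
    "computefeatures": {"loaddata"},
    "preprocessing": {"computefeatures"},
    "shapemode": {"preprocessing"},
    "parameterization": {"preprocessing", "shapemode"},
    "aggregation": {"parameterization"},
    "correlation": {"parameterization"},
    "stereotypy": {"parameterization"},
    "concordance": {"aggregation"},
}


def run_order(dependencies):
    """Return all steps in an order where dependencies always come first."""
    return list(TopologicalSorter(dependencies).static_order())


def steps_needed_for(target, dependencies):
    """Return only the steps required to produce `target`, in a valid order."""
    needed, stack = set(), [target]
    while stack:
        step = stack.pop()
        if step not in needed:
            needed.add(step)
            stack.extend(dependencies[step])
    return [step for step in run_order(dependencies) if step in needed]


# Print the commands needed to reach the stereotypy analysis.
for step in steps_needed_for("stereotypy", DEPENDS_ON):
    print(f"cvapipe_analysis {step} run")
```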

## Running the pipeline on your own data

You need to specify the format of your data using a `manifest.csv` file. Each row of this file corresponds to a cell in your dataset. This file is required to have the following columns:

`CellId`: Unique ID of the cell. Example: `AB98765`.

`structure_name`: FP structure tagged in the cell. Add something like "NA" if you don't have anything tagged for the cell. Example: `TOMM20`.

`crop_seg`: Full path to the multichannel single cell segmentation.

`crop_raw`: Full path to the multichannel single cell raw image.

`name_dict`: Dictionary that specifies the names of each channel in the two images above. Example: `"{'crop_raw': ['dna_dye', 'membrane', 'gfp'], 'crop_seg': ['dna_seg', 'cell_seg', 'gfp_seg', 'gfp_seg2']}"`. In this case, your `crop_raw` images must have 3 channels, since this is the number of names you provide in `name_dict`. Similarly, `crop_seg` must have 4 channels in this example.

You are ready to start using `cvapipe_analysis` once you have created this manifest file. To do so, run the step `loaddata` with the additional flag `--csv path_to_manifest`, where `path_to_manifest` is the full path to the manifest file that you just created:

`cvapipe_analysis loaddata run --csv path_to_manifest`

All the other steps can be run without modifications.
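Before calling `loaddata`, it may be worth checking that each manifest row has the required columns and that every channel named in your `config.yaml` appears in `name_dict`. The following is a minimal sketch under the assumption that `name_dict` is stored as a Python dictionary literal, as in the example above; `check_manifest_row` is a hypothetical helper and the file paths are made up.

```python
import ast

REQUIRED_COLUMNS = ("CellId", "structure_name", "crop_seg", "crop_raw", "name_dict")


def check_manifest_row(row, configured_channels):
    """Return a list of problems found in one manifest row (empty if none)."""
    problems = [f"missing column: {col}" for col in REQUIRED_COLUMNS if col not in row]
    if "name_dict" in row:
        name_dict = ast.literal_eval(row["name_dict"])
        # Gather the channel names declared for both images.
        available = {
            name for key in ("crop_raw", "crop_seg") for name in name_dict.get(key, [])
        }
        problems += [
            f"channel not in name_dict: {channel}"
            for channel in configured_channels
            if channel not in available
        ]
    return problems


# Example row using the values from the column descriptions above.
row = {
    "CellId": "AB98765",
    "structure_name": "TOMM20",
    "crop_seg": "/data/AB98765_seg.tif",  # made-up path
    "crop_raw": "/data/AB98765_raw.tif",  # made-up path
    "name_dict": "{'crop_raw': ['dna_dye', 'membrane', 'gfp'], "
                 "'crop_seg': ['dna_seg', 'cell_seg', 'gfp_seg', 'gfp_seg2']}",
}
print(check_manifest_row(row, ["dna_seg", "cell_seg", "gfp_seg2", "gfp"]))
```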

## Running the pipeline on a cluster with `sbatch` capabilities

If you are running `cvapipe_analysis` on a Slurm cluster or any other cluster with `sbatch` capabilities, each step can be called with a flag `--distribute`. This will spawn many jobs to run in parallel in the cluster. Specific parameters can be set in the `resources` section of the YAML config file.
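If you drive the pipeline from a script, the command line for each step can be assembled programmatically. The sketch below only uses the flags mentioned in this document (`--test`, `--csv` and `--distribute`); `build_command` is a hypothetical helper, and restricting `--test` and `--csv` to `loaddata` reflects the fact that they are only documented for that step.

```python
import shlex


def build_command(step, distribute=False, test=False, csv_path=None):
    """Build the argument list for one pipeline step."""
    if (test or csv_path is not None) and step != "loaddata":
        raise ValueError("--test and --csv are only documented for the loaddata step")
    argv = ["cvapipe_analysis", step, "run"]
    if test:
        argv.append("--test")
    if csv_path is not None:
        argv += ["--csv", str(csv_path)]
    if distribute:
        argv.append("--distribute")
    return argv


# Print shell-ready command lines for two steps.
print(shlex.join(build_command("loaddata", test=True)))
print(shlex.join(build_command("shapemode", distribute=True)))
```

The returned list can be passed directly to `subprocess.run` if you want the script to launch the step itself.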

**_Free software: Allen Institute Software License_**
