actk


| Field | Value |
|---|---|
| Name | actk |
| Version | 0.2.1 |
| home_page | https://github.com/AllenCellModeling/actk |
| Summary | Automated Cell Toolkit |
| upload_time | 2020-12-15 19:09:23 |
| author | Jackson Maxfield Brown |
| requires_python | >=3.6 |
| license | Allen Institute Software License |
| keywords | actk, computational biology, workflow, cell, microscopy |
| requirements | No requirements were recorded. |

# actk

[![Build Status](https://github.com/AllenCellModeling/actk/workflows/Build%20Master/badge.svg)](https://github.com/AllenCellModeling/actk/actions)
[![Documentation](https://github.com/AllenCellModeling/actk/workflows/Documentation/badge.svg)](https://AllenCellModeling.github.io/actk)
[![Code Coverage](https://codecov.io/gh/AllenCellModeling/actk/branch/master/graph/badge.svg)](https://codecov.io/gh/AllenCellModeling/actk)
[![Published Data](https://img.shields.io/badge/Data-Published-Success)](https://open.quiltdata.com/b/allencell/tree/aics/actk/)

Automated Cell Toolkit

A pipeline to process field-of-view (FOV) microscopy images and generate data and
render-ready products for the cells in each field. Of note, the data produced by this
pipeline is used for the [Cell Feature Explorer](https://cfe.allencell.org/).

![workflow as an image](./images/header.png)

---

## Features
All steps and functionality in this package can be run as single steps or all together
by using the command line.

In general, all commands for this package will follow the format:
`actk {step} {command}`

* `step` is the name of the step, such as "StandardizeFOVArray" or "SingleCellFeatures"
* `command` is what you want that step to do, such as "run" or "push"

Each step will check that the dataset provided contains the required fields prior to
processing. For details and definitions on each field, see our
[dataset fields documentation](https://AllenCellModeling.github.io/actk/dataset_fields.html).

An example dataset can be seen [here](https://open.quiltdata.com/b/aics-modeling-packages-test-resources/tree/actk/test_data/data/example_dataset.csv).
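
The required-field check that each step performs can be approximated before launching a run. The following is a minimal sketch using only the standard library, not the check `actk` itself runs. The function name `missing_fields` is hypothetical, and the list of required column names must be taken from the dataset fields documentation; none are hard-coded here.

```python
import csv
from typing import Iterable, List


def missing_fields(csv_path: str, required: Iterable[str]) -> List[str]:
    """Return the required column names that are absent from the CSV header."""
    with open(csv_path, newline="") as f:
        # Only the first row (the header) is needed for this check.
        header = next(csv.reader(f), [])
    present = set(header)
    return [name for name in required if name not in present]
```

An empty list means every required column is present in the dataset header; any names returned are the columns to add before running a step.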

### Pipeline
To run the entire pipeline from start to finish you can simply run:

```bash
actk all run --dataset {path to dataset}
```

Step-specific parameters can be passed by appending them to the command.
For example, the step `SingleCellFeatures` has a `cell_ceiling_adjustment` parameter,
which can be set both on an individual step run and for the entire pipeline:

```bash
actk all run --dataset {path to dataset} --cell_ceiling_adjustment {integer}
```

See the [steps module in our documentation](https://AllenCellModeling.github.io/actk/actk.steps.html)
for a full list of parameters for each step.
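
When the pipeline is launched from a script, the `actk {step} {command}` form with appended parameters can be assembled programmatically. This is a sketch only: `build_actk_command` is a hypothetical helper, `dataset.csv` is a placeholder path, and the value `7` is arbitrary. It builds the argument list but does not run anything.

```python
from typing import List, Union


def build_actk_command(
    step: str,
    command: str,
    dataset: str,
    **params: Union[int, float, str],
) -> List[str]:
    """Assemble an `actk {step} {command}` argument list with appended parameters."""
    args = ["actk", step, command, "--dataset", dataset]
    for name, value in params.items():
        # Each keyword becomes `--name value`, mirroring the CLI examples above.
        args.extend([f"--{name}", str(value)])
    return args


cmd = build_actk_command("all", "run", "dataset.csv", cell_ceiling_adjustment=7)
print(" ".join(cmd))
# prints: actk all run --dataset dataset.csv --cell_ceiling_adjustment 7
```

Returning a list rather than a single string keeps the result usable with `subprocess.run` without shell quoting concerns.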

#### Pipeline Config

A configuration file can be provided to the underlying `datastep` library that manages
the data storage and upload of the steps in this workflow.

The config file should be named `workflow_config.json` and be located in the directory
you run `actk` from. If this config is not found in the current working directory,
defaults are selected by the `datastep` package.

Here is an example of our production config:

```json
{
    "quilt_storage_bucket": "s3://allencell",
    "project_local_staging_dir": "/allen/aics/modeling/jacksonb/results/actk"
}
```

You can also attach step-specific configuration in this file, keyed by the name of the
step:

```json
{
    "quilt_storage_bucket": "s3://example_config_7",
    "project_local_staging_dir": "example/config/7",
    "example": {
        "step_local_staging_dir": "example/step/local/staging/"
    }
}
```
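
If the config needs to be generated rather than written by hand, a short script can produce it. The sketch below uses only the keys shown in the examples above; `write_workflow_config` is a hypothetical helper and is not part of `actk` or `datastep`.

```python
import json
from pathlib import Path
from typing import Any, Dict, Optional


def write_workflow_config(
    directory: str,
    quilt_storage_bucket: str,
    project_local_staging_dir: str,
    step_overrides: Optional[Dict[str, Dict[str, Any]]] = None,
) -> Path:
    """Write workflow_config.json into `directory` and return its path."""
    config: Dict[str, Any] = {
        "quilt_storage_bucket": quilt_storage_bucket,
        "project_local_staging_dir": project_local_staging_dir,
    }
    # Step-specific sections are keyed by step name, as in the example above.
    config.update(step_overrides or {})
    path = Path(directory) / "workflow_config.json"
    path.write_text(json.dumps(config, indent=4))
    return path
```

Write the file into the directory from which `actk` will be run, since that is where the config is looked up.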

#### AICS Distributed Computing

Members of the AICS team can run in distributed mode across the SLURM cluster by adding
the `--distributed` flag to the pipeline call.

To set distributed cluster and worker parameters, you can additionally add the flags:
* `--n_workers {int}` (e.g. `--n_workers 100`)
* `--worker_cpu {int}` (e.g. `--worker_cpu 2`)
* `--worker_mem {str}` (e.g. `--worker_mem 100GB`)
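
Unlike the value-carrying worker flags, `--distributed` is a bare switch, so a launcher script has to treat it separately. A minimal sketch, assuming the flags behave exactly as listed above; `distributed_args` is a hypothetical helper and `dataset.csv` is a placeholder path.

```python
from typing import List, Optional


def distributed_args(
    n_workers: Optional[int] = None,
    worker_cpu: Optional[int] = None,
    worker_mem: Optional[str] = None,
) -> List[str]:
    """Return `--distributed` plus whichever worker flags were provided."""
    args = ["--distributed"]
    for flag, value in (
        ("--n_workers", n_workers),
        ("--worker_cpu", worker_cpu),
        ("--worker_mem", worker_mem),
    ):
        # Omitted settings are left out so the defaults apply.
        if value is not None:
            args.extend([flag, str(value)])
    return args


base = ["actk", "all", "run", "--dataset", "dataset.csv"]  # placeholder dataset path
print(" ".join(base + distributed_args(n_workers=100, worker_cpu=2, worker_mem="100GB")))
```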

### Individual Steps
* `actk standardizefovarray run --dataset {path to dataset}`: Generate standardized,
  ordered, and normalized FOV images as OME-Tiffs.
* `actk singlecellfeatures run --dataset {path to dataset}`: Generate a features JSON
  file for each cell in the dataset.
* `actk singlecellimages run --dataset {path to dataset}`: Generate bounded 3D images
  and 2D projections for each cell in the dataset.
* `actk diagnosticsheets run --dataset {path to dataset}`: Generate diagnostic sheets
  for single cell images. Useful for quality control.
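
The individual steps can also be driven from a script. The sketch below runs them one after another with `subprocess`; the step order simply follows the list above and is an assumption, as is the expectation that `actk` is installed and on the `PATH`. The names `STEPS`, `step_commands`, and `run_steps` are hypothetical.

```python
import subprocess
from typing import List

# Step names as they appear in the commands above. Running them in this
# order is an assumption based on the order of the list.
STEPS = [
    "standardizefovarray",
    "singlecellfeatures",
    "singlecellimages",
    "diagnosticsheets",
]


def step_commands(dataset: str) -> List[List[str]]:
    """Build one `actk {step} run --dataset ...` argument list per step."""
    return [["actk", step, "run", "--dataset", dataset] for step in STEPS]


def run_steps(dataset: str) -> None:
    """Run each step in turn, stopping at the first failure."""
    for cmd in step_commands(dataset):
        # check=True raises CalledProcessError if a step exits non-zero.
        subprocess.run(cmd, check=True)
```

For a full run, `actk all run` remains the simpler option; a loop like this is mainly useful when something needs to happen between steps.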

## Installation
**Install Requires:** The Python package `numpy` must be installed prior to the
installation of this package: `pip install numpy`

**Stable Release:** `pip install actk`<br>
**Development Head:** `pip install git+https://github.com/AllenCellModeling/actk.git`

## Documentation
For full package documentation please visit
[allencellmodeling.github.io/actk](https://allencellmodeling.github.io/actk/index.html).

## Published Data

For a large-scale example of what this library is capable of, please see the data
produced by this pipeline after running our largest cell dataset through it. The data
from the Allen Institute for Cell Science created from this pipeline can be found
[here](https://open.quiltdata.com/b/allencell/tree/aics/actk/).

This package contains the source microscopy images, segmentation files, pre-processed
single cell images and features, and diagnostic sheets.

Our source images are of endogenously tagged hiPSCs, grown for 4 days on Matrigel-coated
96-well, glass-bottom imaging plates. Each field of view (FOV) includes 4 channels (BF,
EGFP, DNA, cell membrane) collected either interwoven with one camera (Workflow
Pipeline 4.0 - 4.2) or simultaneously with two cameras (Workflow Pipeline 4.4). You can
use the file metadata of each image to target the specific channel you are interested
in. FOVs were either selected randomly (mode A), enriched for mitotic events (mode B),
or sampled from 3 different areas of a colony (edge, ridge, center) using a
photoprotective cocktail (mode C). The images cataloged in this dataset come in several
flavors:

* Field of view (FOV) images with channels\*:
  * Brightfield
  * EGFP
  * DNA
  * Cell Membrane
* Segmentation files with channels:
  * Nucleus Segmentation
  * Nucleus Contour
  * Membrane Segmentation
  * Membrane Contour

_* Some FOV images contain seven channels rather than four. The extra three channels
are "dummy" channels added during acquisition that can be ignored._

The full details of the Allen Institute cell workflow are available on our website
[here](https://www.allencell.org/methods-for-cells-in-the-lab.html).<br>
The full details of the Allen Institute microscopy workflow are available on our
website [here](https://www.allencell.org/methods-for-microscopy.html).

The following is provided for each cell:
* Cell Id
* Cell Index (from within the FOV's segmentation)
* Metadata (Cell line, Labeled protein name, segmented region index, gene, etc.)
* 3D cell and nuclear segmentations, plus DNA, membrane, and structure channels
* 2D max projections for dimension pairs (XY, ZX, and ZY) of the above 3D images
* A set of computed features for each cell
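
To make the three max projections concrete: each one collapses the 3D image along one axis by keeping the largest value. The sketch below is pure Python on nested lists and is illustrative only, not the pipeline's implementation. It assumes the volume is indexed as `volume[z][y][x]`; with NumPy arrays the same result comes from `volume.max(axis=...)`.

```python
from typing import Dict, List

Volume = List[List[List[int]]]  # assumed indexing: volume[z][y][x]
Image = List[List[int]]


def max_projections(volume: Volume) -> Dict[str, Image]:
    """Collapse a 3D volume along each axis by taking the maximum value."""
    nz, ny, nx = len(volume), len(volume[0]), len(volume[0][0])
    return {
        # Collapse Z: one value per (y, x).
        "XY": [[max(volume[z][y][x] for z in range(nz)) for x in range(nx)]
               for y in range(ny)],
        # Collapse Y: one value per (z, x).
        "ZX": [[max(volume[z][y][x] for y in range(ny)) for x in range(nx)]
               for z in range(nz)],
        # Collapse X: one value per (z, y).
        "ZY": [[max(volume[z][y][x] for x in range(nx)) for y in range(ny)]
               for z in range(nz)],
    }
```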

For the 3D single cell images the channel ordering is:
* Segmented DNA
* Segmented Membrane
* DNA (Hoechst)
* Membrane (CellMask)
* Labeled Structure (GFP)
* Transmitted Light
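
When reading these images in code, naming the channel positions avoids magic numbers. The sketch below encodes the ordering listed above as an enum. Zero-based indices and a channel-first layout are assumptions, and `SingleCellChannel` and `select_channel` are hypothetical names.

```python
from enum import IntEnum


class SingleCellChannel(IntEnum):
    """Channel positions in the order listed above (zero-based is an assumption)."""

    SEGMENTED_DNA = 0
    SEGMENTED_MEMBRANE = 1
    DNA = 2                # Hoechst
    MEMBRANE = 3           # CellMask
    LABELED_STRUCTURE = 4  # GFP
    TRANSMITTED_LIGHT = 5


def select_channel(image, channel: SingleCellChannel):
    """Pick one channel from a channel-first sequence or array."""
    return image[channel]
```

Because `IntEnum` members behave as integers, they can index lists and arrays directly, for example `select_channel(image, SingleCellChannel.DNA)`.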

To interact with this dataset please see the
[Quilt Documentation](https://docs.quiltdata.com/).

## Development
See
[CONTRIBUTING.md](https://github.com/AllenCellModeling/actk/blob/master/CONTRIBUTING.md)
for information related to developing the code.

For more details on how this pipeline is constructed please see
[cookiecutter-stepworkflow](https://github.com/AllenCellModeling/cookiecutter-stepworkflow)
and [datastep](https://github.com/AllenCellModeling/datastep).

To add new steps to this pipeline, run `make_new_step` and follow the instructions in
[CONTRIBUTING.md](https://github.com/AllenCellModeling/actk/blob/master/CONTRIBUTING.md).

### Developer Installation
The following two commands will install the package with dev dependencies in editable
mode and download all resources required for testing.

```bash
pip install -e .[dev]
python scripts/download_test_data.py
```

### AICS Developer Instructions
If you want to run this pipeline with the Pipeline Integrated Cell dataset
(`pipeline 4.*`), run the following commands:

```bash
pip install -e .[all]
python scripts/download_aics_dataset.py
```

The available options for this script can be viewed with:
`python scripts/download_aics_dataset.py --help`

## Acknowledgments

A previous iteration of this pipeline was created and managed by
[Gregory Johnson](https://github.com/gregjohnso) for work with
[PyTorch Integrated Cell](https://github.com/AllenCellModeling/pytorch_integrated_cell).

This version of the pipeline is more generalized and, while still used for the
Integrated Cell model, can be used to pre-process a variety of microscopy image
datasets.

The previous version of this pipeline produced the
[pipeline_integrated_single_cell dataset](https://open.quiltdata.com/b/allencell/tree/aics/pipeline_integrated_single_cell/).

***Free software: Allen Institute Software License***



            

Raw data

{
    "_id": null,
    "home_page": "https://github.com/AllenCellModeling/actk",
    "name": "actk",
    "maintainer": "",
    "docs_url": null,
    "requires_python": ">=3.6",
    "maintainer_email": "",
    "keywords": "actk,computational biology,workflow,cell,microscopy",
    "author": "Jackson Maxfield Brown",
    "author_email": "jacksonb@alleninstitute.org",
    "download_url": "https://files.pythonhosted.org/packages/ac/27/1f1a29edc0d062e615e801c306cd21fee9f0d1a0d707e52ebfe9202ae001/actk-0.2.1.tar.gz",
    "platform": "",
    "bugtrack_url": null,
    "license": "Allen Institute Software License",
    "summary": "Automated Cell Toolkit",
    "version": "0.2.1",
    "split_keywords": [
        "actk",
        "computational biology",
        "workflow",
        "cell",
        "microscopy"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "md5": "3171436c1cbb86fbb069fb999158ddae",
                "sha256": "51dfb2d92f0aa6786b54e0881449793715fbf08686a42393577990f4221a0dfd"
            },
            "downloads": -1,
            "filename": "actk-0.2.1-py2.py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "3171436c1cbb86fbb069fb999158ddae",
            "packagetype": "bdist_wheel",
            "python_version": "py2.py3",
            "requires_python": ">=3.6",
            "size": 34293,
            "upload_time": "2020-12-15T19:09:22",
            "upload_time_iso_8601": "2020-12-15T19:09:22.242236Z",
            "url": "https://files.pythonhosted.org/packages/ef/b8/7f31fcdb17481a6ea0545199f808be93ab5f878a45af8c214b3d919803f6/actk-0.2.1-py2.py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "md5": "0513aeb4e555fd982f3eedd18c904f82",
                "sha256": "48e8681cf877d789adde25cd67fecabc22ebc03049b5004ca0dece288d5cc75b"
            },
            "downloads": -1,
            "filename": "actk-0.2.1.tar.gz",
            "has_sig": false,
            "md5_digest": "0513aeb4e555fd982f3eedd18c904f82",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.6",
            "size": 32929,
            "upload_time": "2020-12-15T19:09:23",
            "upload_time_iso_8601": "2020-12-15T19:09:23.764831Z",
            "url": "https://files.pythonhosted.org/packages/ac/27/1f1a29edc0d062e615e801c306cd21fee9f0d1a0d707e52ebfe9202ae001/actk-0.2.1.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2020-12-15 19:09:23",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "github_user": null,
    "github_project": "AllenCellModeling",
    "error": "Could not fetch GitHub repository",
    "lcname": "actk"
}