Cell Data Loader
================
Cell Data Loader is a simple AI support tool in Python that takes in images of cells (or other image types) and outputs them, with minimal effort, in formats that can be read by PyTorch (tensors) or TensorFlow (NumPy arrays). With Cell Data Loader, users can output their cell images as whole images, as sliced images, or, with the support of [CellPose](https://github.com/MouseLand/cellpose), as individually segmented cells.

It can also be used for general computer vision research, which is why CellPose is not a strict dependency.
Quick Start
-----------
To install Cell Data Loader, run the following in a standard UNIX terminal:

    pip install cell-data-loader

To try a quick Jupyter Notebook example, navigate to [example/CellDataLoaderPlayground.ipynb](https://github.com/mleming/CellDataLoader/blob/main/example/CellDataLoaderPlayground.ipynb), download it, and run it in [Jupyter Notebook](https://jupyterlab.readthedocs.io/en/stable/getting_started/installation.html). On a Mac, it can be run with the following command:

    python3 -m jupyterlab ~/Downloads/CellDataLoaderPlayground.ipynb

Python
------
The simplest way to use Cell Data Loader is to instantiate a dataloader as follows:
~~~python
from cell_data_loader import CellDataloader
imfolder = '/path/to/my/images'
dataloader = CellDataloader(imfolder)
for image in dataloader:
    ...
~~~
And voilà!
Lists of files are also supported:
~~~python
imfiles = ['/path/to/image1.png','/path/to/image2.png','/path/to/image3.png']
dataloader = CellDataloader(imfiles)
for image in dataloader:
    ...
~~~
Labels
------
Cell Data Loader supports image labels in a few ways. The simplest is to place whole images in different folders, with each folder representing a label:
~~~python
imfolder1 = '/path/to/my/images'
imfolder2 = '/path/to/your/images'
dataloader = CellDataloader(imfolder1,imfolder2)
for label,image in dataloader:
    ...
~~~
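
To make the folder-per-label idea concrete, here is a minimal standalone sketch that does not use Cell Data Loader itself. It assumes labels are integer indices in the order the folders are passed, which the text above does not state; `label_from_folder` is a hypothetical helper, not part of the package.

~~~python
from pathlib import PurePosixPath

def label_from_folder(path, folders):
    """Return the index of the first folder in `folders` that contains `path`.

    Hypothetical helper for illustration only; the integer-index labeling
    scheme is an assumption, not confirmed behavior of Cell Data Loader.
    """
    p = PurePosixPath(path)
    for index, folder in enumerate(folders):
        if PurePosixPath(folder) in p.parents:
            return index
    return None  # the file is not inside any of the listed folders

folders = ['/path/to/my/images', '/path/to/your/images']
files = ['/path/to/my/images/a.png', '/path/to/your/images/b.png']
labels = [label_from_folder(f, folders) for f in files]
~~~

Under this assumption, every image under the first folder shares one label and every image under the second folder shares another.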
Alternatively, if you have one folder or file list in which the filenames encode the label, labels can be assigned by regex match:
~~~python
imfiles = ['/path/to/CANCER_image1.png',
           '/path/to/CANCER_image2.png',
           '/path/to/CANCER_image3.png',
           '/path/to/HEALTHY_image1.png',
           '/path/to/HEALTHY_image2.png',
           '/path/to/HEALTHY_image3.png']
dataloader = CellDataloader(imfiles,label_regex = ["CANCER","HEALTHY"])
for label,image in dataloader:
    ...
~~~
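
As a rough illustration of how a list of patterns can map filenames to labels, the sketch below uses only the standard `re` module. It assumes the label is the index of the first pattern that matches anywhere in the path; this is an assumption about the behavior, and `label_from_regex` is a hypothetical helper rather than a function from the package.

~~~python
import re

def label_from_regex(path, patterns):
    """Return the index of the first pattern found in `path`, or None.

    Hypothetical helper; the first-match, integer-index scheme is an
    assumption made for illustration.
    """
    for index, pattern in enumerate(patterns):
        if re.search(pattern, path):
            return index
    return None  # no pattern matched this file

patterns = ["CANCER", "HEALTHY"]
imfiles = ['/path/to/CANCER_image1.png', '/path/to/HEALTHY_image1.png']
labels = [label_from_regex(f, patterns) for f in imfiles]
~~~

Because the patterns are regular expressions, something like `r"CANCER_\d+"` could be used in the same way when a plain substring is too loose.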
Boxes
-----
If you need to cut individual cells out of an image and already have a file of their coordinates, `CellDataloader` accepts an argument, `cell_box_filelist`: a list of CSV files, corresponding to the input images, that mark the coordinates and labels of the cells. The format of each CSV is as follows:
| X | Y | W | H | Label |
| - | - | - | - | ----- |
| 14 | 13 | 5 | 6 | 0 |
| 20 | 25 | 15 | 5 | 1 |
~~~python
dataloader = CellDataloader('/path/to/file.svs',cell_box_filelist=['/path/to/boxfile.csv'])
~~~
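
The sketch below shows one way a box file in this layout could be read and applied, using only the standard library. It assumes that the CSV has a header row matching the table, that `X` and `Y` are the top-left corner of the box, and that `W` and `H` are its width and height in pixels; none of this is confirmed by the text above. `read_boxes` and `crop` are hypothetical helpers.

~~~python
import csv
import io

# The rows from the table above as CSV text (the header row is an assumption).
box_csv = "X,Y,W,H,Label\n14,13,5,6,0\n20,25,15,5,1\n"

def read_boxes(handle):
    """Parse each CSV row into an (x, y, w, h, label) tuple of integers."""
    reader = csv.DictReader(handle)
    return [
        (int(r["X"]), int(r["Y"]), int(r["W"]), int(r["H"]), int(r["Label"]))
        for r in reader
    ]

def crop(image, box):
    """Cut a box out of an image stored as a list of pixel rows.

    Assumes (x, y) is the top-left corner of the box.
    """
    x, y, w, h, _ = box
    return [row[x:x + w] for row in image[y:y + h]]

# Toy 40x40 image in which each pixel records its own (row, column) position.
image = [[(r, c) for c in range(40)] for r in range(40)]
boxes = read_boxes(io.StringIO(box_csv))
crops = [(crop(image, b), b[4]) for b in boxes]  # (pixels, label) pairs
~~~

Each entry of `crops` pairs one cut-out cell with its label, which mirrors the `label,image` pairs yielded in the earlier examples.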
Arguments
---------
Additional arguments taken by Cell Data Loader include:
~~~python
imfolder = '/path/to/folder'
dataloader = CellDataloader(imfolder,
    dim = (64,64),
    batch_size = 64,
    dtype = "torch", # Can also be "numpy"
    label_regex = None,
    verbose = True,
    segment_image = "whole", # "whole" outputs the whole image, resized
        # to dim; "sliced" cuts the image in a checkerboard pattern into
        # dim-shaped outputs, so it's suitable for large images; "cell"
        # segments cells from the image using CellPose, though it throws
        # an error if CellPose is not installed properly. CellPose is
        # not included by default in the dependencies and needs to be
        # installed separately by the user.
    n_channels = 3, # Detected in first image by default; re-samples all
        # images to force this number of channels
    augment_image = True, # Augments the output image in the standard
        # ways -- rotation, color jittering, etc.
    label_balance = True, # Outputs proportional amounts of each label
        # in the dataset
    gpu_ids = None, # GPUs that the outputs are loaded onto, if present.
    channels_first = True # Places channels either first, before the
        # batch dimension, or last
    )
~~~
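
To illustrate what checkerboard slicing and channel ordering mean in practice, here is a minimal NumPy sketch that is independent of Cell Data Loader. It assumes non-overlapping tiles, drops partial tiles at the right and bottom edges, and treats "channels first" as the common `(batch, channels, height, width)` layout; the package may handle any of these differently. `slice_image` is a hypothetical helper.

~~~python
import numpy as np

def slice_image(image, dim):
    """Cut an H x W x C array into non-overlapping tiles of shape dim + (C,).

    Partial tiles at the right and bottom edges are dropped, which is an
    assumption made for this sketch.
    """
    tile_h, tile_w = dim
    rows = image.shape[0] // tile_h
    cols = image.shape[1] // tile_w
    tiles = [
        image[r * tile_h:(r + 1) * tile_h, c * tile_w:(c + 1) * tile_w]
        for r in range(rows)
        for c in range(cols)
    ]
    return np.stack(tiles)  # shape: (n_tiles, tile_h, tile_w, C)

image = np.zeros((200, 130, 3), dtype=np.uint8)  # stand-in for a large image
tiles = slice_image(image, (64, 64))             # channels-last batch
tiles_cf = np.transpose(tiles, (0, 3, 1, 2))     # channels-first batch
~~~

A 200 x 130 image yields 3 rows and 2 columns of 64 x 64 tiles here, so both arrays hold six tiles and differ only in where the channel axis sits.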
Raw data
{
"_id": null,
"home_page": null,
"name": "cell-data-loader",
"maintainer": null,
"docs_url": null,
"requires_python": ">=3.8",
"maintainer_email": "Matt Leming <mleming@mgh.harvard.edu>",
"keywords": "biomedical, cell, cell image, cellpose, csz, dataloader",
"author": null,
"author_email": "Matt Leming <mleming@mgh.harvard.edu>",
"download_url": "https://files.pythonhosted.org/packages/1e/3f/6e4c6107d879a7674040a36e6701b763f5f7d7dea9421363583dcec26251/cell_data_loader-0.0.34.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": "MIT License",
"summary": "Converts general images of cells into formats and labels for deep learning pipelines",
"version": "0.0.34",
"project_urls": {
"Homepage": "https://github.com/mleming/CellDataLoader"
},
"split_keywords": [
"biomedical",
" cell",
" cell image",
" cellpose",
" csz",
" dataloader"
],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "942aee42c29a7d4b810e34e8637be0a9443ef443939a211f9a8299b0e6b2eb1a",
"md5": "c9510abd80a38138d5c4d27f533a4cbc",
"sha256": "fe5e3c7e2b5d119db927797bb198c494eb13302dd0e204f5fae81c946266859f"
},
"downloads": -1,
"filename": "cell_data_loader-0.0.34-py3-none-any.whl",
"has_sig": false,
"md5_digest": "c9510abd80a38138d5c4d27f533a4cbc",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.8",
"size": 18573,
"upload_time": "2025-01-22T20:02:19",
"upload_time_iso_8601": "2025-01-22T20:02:19.581109Z",
"url": "https://files.pythonhosted.org/packages/94/2a/ee42c29a7d4b810e34e8637be0a9443ef443939a211f9a8299b0e6b2eb1a/cell_data_loader-0.0.34-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "1e3f6e4c6107d879a7674040a36e6701b763f5f7d7dea9421363583dcec26251",
"md5": "f2096c8326352f560aba5a10e35549ac",
"sha256": "295a15424bca636ac6499df78abc100f511812368ff1e4cf2da4b44379bb353f"
},
"downloads": -1,
"filename": "cell_data_loader-0.0.34.tar.gz",
"has_sig": false,
"md5_digest": "f2096c8326352f560aba5a10e35549ac",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.8",
"size": 51968,
"upload_time": "2025-01-22T20:02:20",
"upload_time_iso_8601": "2025-01-22T20:02:20.935508Z",
"url": "https://files.pythonhosted.org/packages/1e/3f/6e4c6107d879a7674040a36e6701b763f5f7d7dea9421363583dcec26251/cell_data_loader-0.0.34.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2025-01-22 20:02:20",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "mleming",
"github_project": "CellDataLoader",
"travis_ci": false,
"coveralls": false,
"github_actions": true,
"requirements": [
{
"name": "numpy",
"specs": []
},
{
"name": "torch",
"specs": []
},
{
"name": "torchvision",
"specs": []
},
{
"name": "opencv-python",
"specs": [
[
">=",
"4.5.4"
]
]
},
{
"name": "slideio",
"specs": [
[
"==",
"2.4.1"
]
]
},
{
"name": "scipy",
"specs": []
},
{
"name": "scikit-image",
"specs": []
},
{
"name": "pillow",
"specs": []
}
],
"lcname": "cell-data-loader"
}