| Field           | Value |
| --------------- | ----- |
| Name            | cell-data-loader |
| Version         | 0.0.37 |
| Summary         | Converts general images of cells into formats and labels for deep learning pipelines |
| Upload time     | 2025-08-13 20:22:24 |
| Requires Python | >=3.8 |
| License         | MIT License |
| Keywords        | biomedical, cell, cell image, cellpose, csz, dataloader |
| Requirements    | numpy, torch, torchvision, opencv-python, slideio, scipy, scikit-image, pillow |

Cell Data Loader
================

Cell Data Loader is a simple AI support tool in Python that takes in images of cells (or other image types) and, with minimal effort, outputs them in formats that PyTorch (tensors) or TensorFlow (NumPy arrays) can read. With Cell Data Loader, users can output their cell images as whole images, as sliced images, or, with the support of [CellPose](https://github.com/MouseLand/cellpose), as individual cells segmented from each image.

It can also be used for normal computer vision research, which is why CellPose is not a strict dependency.

Quick Start
-----------

To install Cell Data Loader, run the following in a standard UNIX terminal:

    pip install cell-data-loader

To try a quick Jupyter Notebook example, navigate to [example/CellDataLoaderPlayground.ipynb](https://github.com/mleming/CellDataLoader/blob/main/example/CellDataLoaderPlayground.ipynb), download it, and run it in [Jupyter Notebook](https://jupyterlab.readthedocs.io/en/stable/getting_started/installation.html). On a Mac, the notebook can be opened with the following command:

	python3 -m jupyterlab ~/Downloads/CellDataLoaderPlayground.ipynb


Python
------

The simplest way to use Cell Data Loader is to instantiate a dataloader as follows:

~~~python
from cell_data_loader import CellDataloader

imfolder = '/path/to/my/images'

dataloader = CellDataloader(imfolder)

for image in dataloader:
	...
~~~

And voilà!

Lists of files are also supported:

~~~python

imfiles = ['/path/to/image1.png','/path/to/image2.png','/path/to/image3.png']

dataloader = CellDataloader(imfiles)

for image in dataloader:
	...
~~~

Labels
------

Cell Data Loader has a few ways to support image labels. The simplest is to keep whole images in separate folders, with each folder representing one label, and pass each folder as its own argument:

~~~python
imfolder1 = '/path/to/my/images'
imfolder2 = '/path/to/your/images'

dataloader = CellDataloader(imfolder1,imfolder2)

for label,image in dataloader:
	...
~~~
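As a rough illustration of the folder-per-label idea, the sketch below pairs each file with the position of the folder it came from. It uses only the standard library; the helper name `files_with_labels` is hypothetical, and the assumption that a label equals the folder's argument position is ours, not something the library documents here.

```python
import tempfile
from pathlib import Path

def files_with_labels(*folders):
    """Pair every file in each folder with that folder's position.

    Hypothetical helper for illustration; not part of cell_data_loader.
    """
    pairs = []
    for label, folder in enumerate(folders):
        for path in sorted(Path(folder).iterdir()):
            if path.is_file():
                pairs.append((label, path.name))
    return pairs

# Build two throwaway folders holding empty placeholder files.
with tempfile.TemporaryDirectory() as root:
    mine, yours = Path(root, "my_images"), Path(root, "your_images")
    for folder, names in ((mine, ["a.png", "b.png"]), (yours, ["c.png"])):
        folder.mkdir()
        for name in names:
            (folder / name).touch()
    pairs = files_with_labels(mine, yours)

print(pairs)  # [(0, 'a.png'), (0, 'b.png'), (1, 'c.png')]
```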

Alternatively, if you have a single folder or file list in which the file names indicate the label, you can assign labels with a regex match:

~~~python
imfiles = ['/path/to/CANCER_image1.png',
			'/path/to/CANCER_image2.png',
			'/path/to/CANCER_image3.png',
			'/path/to/HEALTHY_image1.png',
			'/path/to/HEALTHY_image2.png',
			'/path/to/HEALTHY_image3.png']

dataloader = CellDataloader(imfiles,label_regex = ["CANCER","HEALTHY"])
for label,image in dataloader:
	...
~~~
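To make the matching concrete, here is a minimal sketch of how a list of patterns like the one above could turn file names into integer labels. It relies only on the standard `re` module. The helper name `label_from_regex` is hypothetical, and Cell Data Loader's own matching may differ in details such as pattern ordering or how unmatched files are handled.

```python
import re

def label_from_regex(path, label_regex):
    """Return the index of the first pattern found in `path`, or None.

    Hypothetical helper for illustration; not part of cell_data_loader.
    """
    for index, pattern in enumerate(label_regex):
        if re.search(pattern, path):
            return index
    return None

imfiles = ['/path/to/CANCER_image1.png',
           '/path/to/HEALTHY_image1.png',
           '/path/to/unlabeled_image1.png']

labels = [label_from_regex(f, ["CANCER", "HEALTHY"]) for f in imfiles]
print(labels)  # [0, 1, None]
```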

Boxes
-----

When you need to cut individual cells out of an image and have a file of their coordinates, CellDataloader accepts an argument, file_to_label_regex, that translates each image file name into the path of the CSV file marking the coordinates and labels of the cells in that image. The CSV has the following format:

| X  | Y   | W  | H | Label |
| -  | -   | -  | - | ----- |
| 14 | 13  | 5  | 6 | 0     |
| 20 | 25  | 15 | 5 | 1     |
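For readers who want to inspect such a file themselves, the sketch below parses the table above and cuts the boxes out of a stand-in image. It assumes a comma-delimited file with a header row named exactly as in the table, and that (X, Y) is the top-left corner of each box in pixels; neither assumption is confirmed by the library's documentation. The helpers `read_boxes` and `crop` are hypothetical.

```python
import csv
import io

def read_boxes(csv_text):
    """Parse rows of X, Y, W, H, Label into tuples of ints."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [(int(r["X"]), int(r["Y"]), int(r["W"]), int(r["H"]), int(r["Label"]))
            for r in reader]

def crop(image, x, y, w, h):
    """Cut a w-by-h box out of a row-major image (a list of rows).

    Assumes (x, y) is the top-left corner of the box.
    """
    return [row[x:x + w] for row in image[y:y + h]]

csv_text = "X,Y,W,H,Label\n14,13,5,6,0\n20,25,15,5,1\n"
boxes = read_boxes(csv_text)

# A blank 40x40 stand-in for a real image.
image = [[0] * 40 for _ in range(40)]

for x, y, w, h, label in boxes:
    cell = crop(image, x, y, w, h)
    print(label, len(cell), len(cell[0]))  # label, height, width
```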

The file_to_label_regex argument is a tuple of two regexes: a matching expression and a replacement expression. So if an image is named '/path/to/AF647-1.tif' and its label file is named '/path/to/For_DL_AF647-1.csv', the following expression would be appropriate:

~~~python
import os
dataloader = CellDataloader('/path/to/AF647-1.tif',file_to_label_regex=(r'%s([^%s]+)\.tif' % (os.sep,os.sep),r'%sFor_DL_\1.csv' % os.sep))
~~~
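The sketch below shows what such a pair does to a path, assuming the two expressions are applied in the manner of `re.sub` (how the library applies them internally is not stated here). The helper `label_path_for` is hypothetical, and POSIX-style separators are hard-coded so the result is the same on every platform.

```python
import re

def label_path_for(image_path, file_to_label_regex):
    """Apply a (match, replacement) pair to an image path with re.sub.

    Hypothetical helper for illustration; not part of cell_data_loader.
    """
    match_expr, replace_expr = file_to_label_regex
    return re.sub(match_expr, replace_expr, image_path)

# The capture group grabs the file stem; \1 re-inserts it after "For_DL_".
pair = (r'/([^/]+)\.tif$', r'/For_DL_\1.csv')
print(label_path_for('/path/to/AF647-1.tif', pair))
# /path/to/For_DL_AF647-1.csv
```

Testing a pair this way before passing it to the dataloader is a cheap check that every image resolves to an existing CSV.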



Arguments
---------

Additional arguments taken by Cell Data Loader include:

~~~python

imfolder = '/path/to/folder'

dataloader = CellDataloader(imfolder,
			dim = (64,64),
			batch_size = 64,
			dtype = "torch", # Can also be "numpy"
			label_regex = None,
			verbose = True,
			segment_image = "whole", # "whole" outputs the whole image, resized
				# to dim; "sliced" cuts the image in a checkerboard pattern
				# into dim-shaped outputs, so it's suitable for large images;
				# "cell" segments cells from the image using CellPose, and
				# raises an error if CellPose is not installed properly.
				# CellPose is not included in the dependencies by default and
				# needs to be installed separately by the user.
			n_channels = 3, # Detected from the first image by default;
				# re-samples all images to force this number of channels
			augment_image = True, # Augments the output image in the standard
				# ways -- rotation, color jittering, etc.
			label_balance = True, # Outputs proportional amounts of each label
				# in the dataset
			gpu_ids = None, # GPUs that the outputs are loaded onto, if present
			channels_first = True # Places channels either first, before the
				# batch dimension, or last
			)
~~~
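To give a feel for what the "sliced" mode implies, the following sketch lists the boxes of a non-overlapping, dim-sized grid over an image. It is not the library's implementation: the helper `tile_boxes` is hypothetical, `dim` is assumed to be (height, width), and dropping partial tiles at the edges is an assumption of the sketch.

```python
def tile_boxes(height, width, dim):
    """List (top, left, bottom, right) boxes of non-overlapping tiles.

    Partial tiles at the right and bottom edges are dropped here;
    that edge policy is an assumption of this sketch.
    """
    tile_h, tile_w = dim
    return [(top, left, top + tile_h, left + tile_w)
            for top in range(0, height - tile_h + 1, tile_h)
            for left in range(0, width - tile_w + 1, tile_w)]

boxes = tile_boxes(height=200, width=130, dim=(64, 64))
print(len(boxes))  # 6
print(boxes[0])    # (0, 0, 64, 64)
print(boxes[-1])   # (128, 64, 192, 128)
```

A 200x130 image therefore yields three rows and two columns of 64x64 tiles, which is why this mode suits images much larger than `dim`.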


            

Raw data

            {
    "_id": null,
    "home_page": null,
    "name": "cell-data-loader",
    "maintainer": null,
    "docs_url": null,
    "requires_python": ">=3.8",
    "maintainer_email": "Matt Leming <mleming@mgh.harvard.edu>",
    "keywords": "biomedical, cell, cell image, cellpose, csz, dataloader",
    "author": null,
    "author_email": "Matt Leming <mleming@mgh.harvard.edu>",
    "download_url": "https://files.pythonhosted.org/packages/86/2b/557c9f3c51cb1e8d0c799155c48bd45a1bf12445f6167a4d68a5de671d02/cell_data_loader-0.0.37.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "MIT License",
    "summary": "Converts general images of cells into formats and labels for deep learning pipelines",
    "version": "0.0.37",
    "project_urls": {
        "Homepage": "https://github.com/mleming/CellDataLoader"
    },
    "split_keywords": [
        "biomedical",
        " cell",
        " cell image",
        " cellpose",
        " csz",
        " dataloader"
    ],
    "urls": [
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "7e5e71f027aca5ede0911dabdfe9c6c8f46be622307b3371fe572b4c458a6831",
                "md5": "f8569ad6d7811d5fbb112e18b3baded7",
                "sha256": "113f5598da5752ad77a5f9154a99981f1baecbefacafaad607e503dc49893f47"
            },
            "downloads": -1,
            "filename": "cell_data_loader-0.0.37-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "f8569ad6d7811d5fbb112e18b3baded7",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.8",
            "size": 22580,
            "upload_time": "2025-08-13T20:22:21",
            "upload_time_iso_8601": "2025-08-13T20:22:21.407533Z",
            "url": "https://files.pythonhosted.org/packages/7e/5e/71f027aca5ede0911dabdfe9c6c8f46be622307b3371fe572b4c458a6831/cell_data_loader-0.0.37-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "862b557c9f3c51cb1e8d0c799155c48bd45a1bf12445f6167a4d68a5de671d02",
                "md5": "a58a3cdf86a2312b95eb19c7ac830f33",
                "sha256": "5415106846973ee44461029dc8024fcc658d7dea4f477e3d12df74c3c1bd7861"
            },
            "downloads": -1,
            "filename": "cell_data_loader-0.0.37.tar.gz",
            "has_sig": false,
            "md5_digest": "a58a3cdf86a2312b95eb19c7ac830f33",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.8",
            "size": 28105073,
            "upload_time": "2025-08-13T20:22:24",
            "upload_time_iso_8601": "2025-08-13T20:22:24.014813Z",
            "url": "https://files.pythonhosted.org/packages/86/2b/557c9f3c51cb1e8d0c799155c48bd45a1bf12445f6167a4d68a5de671d02/cell_data_loader-0.0.37.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2025-08-13 20:22:24",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "mleming",
    "github_project": "CellDataLoader",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": true,
    "requirements": [
        {
            "name": "numpy",
            "specs": []
        },
        {
            "name": "torch",
            "specs": []
        },
        {
            "name": "torchvision",
            "specs": []
        },
        {
            "name": "opencv-python",
            "specs": [
                [
                    ">=",
                    "4.5.4"
                ]
            ]
        },
        {
            "name": "slideio",
            "specs": [
                [
                    "==",
                    "2.4.1"
                ]
            ]
        },
        {
            "name": "scipy",
            "specs": []
        },
        {
            "name": "scikit-image",
            "specs": []
        },
        {
            "name": "pillow",
            "specs": []
        }
    ],
    "lcname": "cell-data-loader"
}
        