:Name: ndsampler
:Version: 0.7.8
:Summary: Fast sampling from large images
:Home page: https://gitlab.kitware.com/computer-vision/ndsampler
:Author: Jon Crall
:Requires Python: >=3.6
:License: Apache 2
:Uploaded: 2024-03-20 02:48:52
ndsampler
=========

|GitlabCIPipeline| |GitlabCICoverage| |Pypi| |PypiDownloads| |ReadTheDocs|


+------------------+---------------------------------------------------------+
| Read the docs    | https://ndsampler.readthedocs.io                        |
+------------------+---------------------------------------------------------+
| Gitlab (main)    | https://gitlab.kitware.com/computer-vision/ndsampler    |
+------------------+---------------------------------------------------------+
| Github (mirror)  | https://github.com/Kitware/ndsampler                    |
+------------------+---------------------------------------------------------+
| Pypi             | https://pypi.org/project/ndsampler                      |
+------------------+---------------------------------------------------------+

The main webpage for this project is: https://gitlab.kitware.com/computer-vision/ndsampler

Fast random access to small regions in large images.

Random access is amortized by converting images into an efficient backend
format; current backends include cloud-optimized GeoTIFFs (COG) and numpy
array files (npy). If images are already in COG format, no conversion is
needed.
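The payoff of an uncompressed backend can be sketched with the standard
library alone: once pixels live in a flat file, a small window can be read
with O(window) I/O instead of decoding the whole image. A minimal sketch using
``mmap`` (the flat uint8 row-major layout here is an assumption for
illustration; the real "npy" backend uses numpy's on-disk format):

```python
import mmap
import os
import tempfile

H, W = 1000, 1000  # pretend image dimensions, 1 byte per pixel

# Write a fake uint8 image where pixel (r, c) stores (r + c) % 256.
fpath = os.path.join(tempfile.mkdtemp(), 'image.raw')
with open(fpath, 'wb') as f:
    for r in range(H):
        f.write(bytes((r + c) % 256 for c in range(W)))

def read_window(fpath, r0, c0, h, w):
    """Read an h x w window without loading the full image into memory."""
    with open(fpath, 'rb') as f, \
            mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
        rows = []
        for r in range(r0, r0 + h):
            start = r * W + c0  # byte offset of (r, c0) in row-major layout
            rows.append(mm[start:start + w])
        return rows

window = read_window(fpath, 200, 300, 2, 4)
print([list(row) for row in window])  # -> [[244, 245, 246, 247], [245, 246, 247, 248]]
```

Only the touched pages are paged in, which is the same reason the "npy"
backend gives fast windowed reads at the cost of uncompressed storage.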

The ndsampler module was built with detection, segmentation, and classification
tasks in mind, but it is not limited to these use cases.

The basic idea is to ensure your data is in MS-COCO format; the CocoSampler
class then lets you sample positive and negative regions.

For classification tasks, the MS-COCO data could simply give every image an
annotation that takes up the entire image.
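For instance, a pure-Python sketch of such a classification-style COCO dataset
(the file names, sizes, and category names are made up for illustration):

```python
images = [
    {'id': 1, 'file_name': 'cat1.png', 'width': 640, 'height': 480},
    {'id': 2, 'file_name': 'dog1.png', 'width': 800, 'height': 600},
]
categories = [{'id': 1, 'name': 'cat'}, {'id': 2, 'name': 'dog'}]
labels = {1: 1, 2: 2}  # image id -> category id

# One full-image annotation per image expresses classification in COCO form.
annotations = [
    {'id': i, 'image_id': img['id'], 'category_id': labels[img['id']],
     'bbox': [0, 0, img['width'], img['height']]}  # xywh spanning the image
    for i, img in enumerate(images, start=1)
]
dataset = {'images': images, 'annotations': annotations,
           'categories': categories}
print(dataset['annotations'][0]['bbox'])  # -> [0, 0, 640, 480]
```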

The aspiration of this module is to let you access your data in situ (i.e.
with no pre-processing), although the cost of that depends on how efficiently
your data can be accessed. A faster cache can be built at the cost of disk
space. Currently we have "cog" and "npy" backends. Help is wanted to integrate
backends for HDF5 and other medical / domain-specific formats.

Installation
------------

The `ndsampler <https://pypi.org/project/ndsampler/>`_ package can be installed via pip:

.. code-block:: bash

    pip install ndsampler


Note that ndsampler depends on `kwimage <https://pypi.org/project/kwimage/>`_,
where there is a known compatibility issue between `opencv-python <https://pypi.org/project/opencv-python/>`_
and `opencv-python-headless <https://pypi.org/project/opencv-python-headless/>`_. Please ensure that one
or the other (but not both) is installed as well:

.. code-block:: bash

    pip install opencv-python-headless

    # OR

    pip install opencv-python


Lastly, to fully leverage ndsampler's features, GDAL must be installed
(although much of ndsampler works without it). Kitware has a pypi index that
hosts GDAL wheels for linux systems, but other systems will need to find some
way of installing GDAL (conda is a safe choice).

.. code-block:: bash

    pip install --find-links https://girder.github.io/large_image_wheels GDAL


Features
--------

* CocoDataset for managing and manipulating annotated image datasets
* Amortized O(1) sampling of N-dimensional space-time data (with respect to a constant window size), e.g. images and video.
* Hierarchical or mutually exclusive category management.
* Random negative window sampling.
* Coverage-based positive sampling.
* Dynamic toydata generator.
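As an illustration of the random negative window sampling idea, rejection
sampling against the annotation boxes can be sketched in plain Python (the
box format and overlap test below are illustrative, not ndsampler's actual
implementation):

```python
import random

def boxes_overlap(a, b):
    """Axis-aligned overlap test for (x, y, w, h) boxes."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    return ax < bx + bw and bx < ax + aw and ay < by + bh and by < ay + ah

def sample_negative(img_w, img_h, win, annots, rng, max_tries=100):
    """Rejection-sample a square window that misses every annotation box."""
    for _ in range(max_tries):
        x = rng.randrange(0, img_w - win + 1)
        y = rng.randrange(0, img_h - win + 1)
        cand = (x, y, win, win)
        if not any(boxes_overlap(cand, ann) for ann in annots):
            return cand
    return None  # give up if the image is densely annotated

rng = random.Random(0)
annots = [(100, 100, 50, 50)]
neg = sample_negative(512, 512, 64, annots, rng)
print(neg)
```

Rejection sampling is cheap when annotations are sparse; denser scenes need a
smarter strategy, which is what coverage-based sampling addresses.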


Installing ndsampler also installs the kwcoco package and its CLI tool.


Usage
-----

The main pattern of usage is:

1. Use kwcoco to load a json-based COCO dataset (or create a ``kwcoco.CocoDataset``
   programmatically).

2. Pass that dataset to an ``ndsampler.CocoSampler`` object, which wraps the
   json structure that holds your images and annotations and lets you sample
   patches from those images efficiently.

3. You can either manually specify an image + region, or specify an annotation
   id, in which case the region corresponding to that annotation is loaded.
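Steps 2 and 3 ultimately resolve a target dictionary into a pixel region. A
hedged sketch of that resolution for the center + size convention used in the
example below (``target_to_slices`` is a hypothetical helper, and clipping at
image borders is a simplifying assumption):

```python
def target_to_slices(target, img_h, img_w):
    """Convert a center + size target into clipped row/column slices."""
    cx, cy = target['cx'], target['cy']
    w, h = target['width'], target['height']
    x0 = max(0, int(cx - w / 2))
    y0 = max(0, int(cy - h / 2))
    x1 = min(img_w, x0 + w)
    y1 = min(img_h, y0 + h)
    return slice(y0, y1), slice(x0, x1)

target = {'gid': 0, 'cx': 200, 'cy': 200, 'width': 100, 'height': 100}
rows, cols = target_to_slices(target, img_h=512, img_w=512)
print(rows, cols)  # -> slice(150, 250, None) slice(150, 250, None)
```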


Example
--------

This example shows how you can efficiently load subregions from images.

.. code-block:: python

    >>> # Imagine you have some images
    >>> import kwimage
    >>> image_paths = [
    >>>     kwimage.grab_test_image_fpath('astro'),
    >>>     kwimage.grab_test_image_fpath('carl'),
    >>>     kwimage.grab_test_image_fpath('airport'),
    >>> ]  # xdoc: +IGNORE_WANT
    ['~/.cache/kwimage/demodata/KXhKM72.png',
     '~/.cache/kwimage/demodata/flTHWFD.png',
     '~/.cache/kwimage/demodata/Airport.jpg']
    >>> # And you want to randomly load subregions of them in O(1) time
    >>> import ndsampler
    >>> import kwcoco
    >>> # First make a COCO dataset that refers to your images (and possibly annotations)
    >>> dataset = {
    >>>     'images': [{'id': i, 'file_name': fpath} for i, fpath in enumerate(image_paths)],
    >>>     'annotations': [],
    >>>     'categories': [],
    >>> }
    >>> coco_dset = kwcoco.CocoDataset(dataset)
    >>> print(coco_dset)
    <CocoDataset(tag=None, n_anns=0, n_imgs=3, n_cats=0)>
    >>> # Now pass the dataset to a sampler and tell it where it can store temporary files
    >>> import ubelt as ub
    >>> workdir = ub.Path.appdir('ndsampler/demo').ensuredir()
    >>> sampler = ndsampler.CocoSampler(coco_dset, workdir=workdir)
    >>> # Now you can load arbitrary samples by specifying a target dictionary
    >>> # with an image_id (gid), center location (cx, cy), and width / height.
    >>> target = {'gid': 0, 'cx': 200, 'cy': 200, 'width': 100, 'height': 100}
    >>> sample = sampler.load_sample(target)
    >>> # The sample contains the image data, any visible annotations, a reference
    >>> # to the original target, and params of the transform used to sample this
    >>> # patch
    >>> print(sorted(sample.keys()))
    ['annots', 'im', 'params', 'tr']
    >>> im = sample['im']
    >>> print(im.shape)
    (100, 100, 3)
    >>> # The load sample function is at the core of what ndsampler does
    >>> # There are other helper functions like load_positive / load_negative
    >>> # which deal with annotations. See those for more details.
    >>> # For random negative sampling see coco_regions.

A Note On COGs
--------------
COGs (cloud-optimized GeoTIFFs) are the backbone of efficient sampling in the
ndsampler library.

To perform deep learning efficiently you need to be able to randomly sample
cropped regions from images, so when ``ndsampler.Sampler`` (more accurately,
the ``FramesSampler`` belonging to the base ``Sampler`` object) is in "cog"
mode, it caches all images larger than 512x512 in COG format.

I've noticed significant speedups even for "small" 1024x1024 images. I haven't
made effective use of the overviews feature yet, but I plan to in the future,
as I want to allow ndsampler to sample in scale as well as in space.

It's possible to obtain this speedup with the "npy" backend, which supports
true random sampling, but it is an uncompressed format that can require a
large amount of disk space. With the "None" backend, loading a small windowed
region requires loading the entire image first (which can be acceptable for
some applications).

Using COGs requires that GDAL is installed. Installing GDAL can be a pain, though.

https://gist.github.com/cspanring/5680334

Using conda is relatively simple:

.. code-block:: bash

    conda install gdal

    # Test that this works
    python -c "from osgeo import gdal; print(gdal)"


It is also possible to use system packages:

.. code-block:: bash

    # References:
    # https://gis.stackexchange.com/questions/28966/python-gdal-package-missing-header-file-when-installing-via-pip
    # https://gist.github.com/cspanring/5680334


    # Install GDAL system libs
    sudo apt install libgdal-dev

    GDAL_VERSION=`gdal-config --version`
    echo "GDAL_VERSION = $GDAL_VERSION"
    pip install --global-option=build_ext --global-option="-I/usr/include/gdal" GDAL==$GDAL_VERSION


    # Test that this works
    python -c "from osgeo import gdal; print(gdal)"


Kitware also has a pypi index that hosts GDAL wheels for linux systems:

.. code-block:: bash

    pip install --find-links https://girder.github.io/large_image_wheels GDAL


TODO
----

- [ ] Currently only image-based detection tasks are supported, but not much
  work is needed to extend to video. The code was originally based on sampling
  code for video, so n-dimensional support is built into most of the code.
  However, there are currently no test cases demonstrating that this library
  works with video. We should (a) port the video toydata code from irharn to
  test the n-dimensional cases and (b) fix the code to work for both still
  images and video where things break.

- [ ] Currently we are good at loading many small objects from 2D images.
  However, we are bad at loading images with one single large object that
  needs to be downsampled (e.g. loading an entire 1024x1024 image and
  downsampling it to 224x224). We should find a way to mitigate this using
  pyramid overviews in the backend COG files.
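The overview mitigation boils down to picking the pyramid level whose
downsample factor best matches the requested output size. A sketch under the
common convention that COG overviews halve resolution at each level
(``pick_overview`` is a hypothetical helper, not part of ndsampler):

```python
import math

def pick_overview(full_size, target_size, num_levels):
    """Pick the power-of-two overview level closest to, but not coarser
    than, the resolution needed to produce target_size pixels."""
    scale = full_size / target_size            # e.g. 1024 / 224 ~= 4.57
    level = int(math.floor(math.log2(scale)))  # largest 2**level <= scale
    return max(0, min(level, num_levels - 1))

# Downsampling a 1024-pixel image to 224 pixels: read from level 2 (4x),
# leaving only a small residual resize instead of a full-resolution read.
level = pick_overview(1024, 224, num_levels=4)
print(level, 2 ** level)  # -> 2 4
```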


.. |Pypi| image:: https://img.shields.io/pypi/v/ndsampler.svg
   :target: https://pypi.python.org/pypi/ndsampler

.. |PypiDownloads| image:: https://img.shields.io/pypi/dm/ndsampler.svg
   :target: https://pypistats.org/packages/ndsampler

.. |ReadTheDocs| image:: https://readthedocs.org/projects/ndsampler/badge/?version=latest
    :target: http://ndsampler.readthedocs.io/en/latest/

.. # See: https://ci.appveyor.com/project/jon.crall/ndsampler/settings/badges
.. .. |Appveyor| image:: https://ci.appveyor.com/api/projects/status/py3s2d6tyfjc8lm3/branch/master?svg=true
.. :target: https://ci.appveyor.com/project/jon.crall/ndsampler/branch/master

.. |GitlabCIPipeline| image:: https://gitlab.kitware.com/computer-vision/ndsampler/badges/master/pipeline.svg
   :target: https://gitlab.kitware.com/computer-vision/ndsampler/-/jobs

.. |GitlabCICoverage| image:: https://gitlab.kitware.com/computer-vision/ndsampler/badges/master/coverage.svg
    :target: https://gitlab.kitware.com/computer-vision/ndsampler/commits/master

            
