multidim-indexing

Name	multidim-indexing
Version	0.9.1
Summary	Multidimensional batch indexing of pytorch tensors and numpy arrays
upload_time	2023-09-27 21:32:49
docs_url	None
requires_python	>=3.6
license	Copyright (c) 2023 Sheng Zhong Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
keywords	numpy pytorch indexing
requirements	No requirements were recorded.
Travis-CI	No Travis.
coveralls test coverage	No coveralls.

This repository documents the syntax for multidimensional indexing in PyTorch and NumPy, and offers classes that
encapsulate the process and add features for data that represents a coordinate grid.
You can follow along with the code blocks here using the included Jupyter notebook.

## Multidimensional Indexing

Suppose we have a multidimensional tensor, such as a cached voxel grid or a batch of images
(its values are consecutive integers, which makes it clear how the indexing works):

```python
import torch

B = 256  # batch size (optional)
shape = (B, 64, 64)
high = torch.prod(torch.tensor(shape)).to(dtype=torch.long)
data = torch.arange(0, high).reshape(shape)
```
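Because `data` is filled with consecutive integers in row-major order, the value stored at `[i, j, k]` equals its flat position `i*64*64 + j*64 + k`, so a printed value tells you which index it came from. A quick NumPy check of that property (NumPy is used here only so the snippet runs without PyTorch; `np.ravel_multi_index` computes the flat position):

```python
import numpy as np

B = 256
shape = (B, 64, 64)
data = np.arange(np.prod(shape)).reshape(shape)

# in row-major order, element [i, j, k] sits at flat position i*64*64 + j*64 + k
i, j, k = 124, 5, 52
flat = np.ravel_multi_index((i, j, k), shape)
print(data[i, j, k], flat)  # 508276 508276
```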

A key operation on this tensor is indexing it for querying and assignment. Indexing a single
value or particular groupings of dimensions is straightforward:

```python
# index a single element
print(data[124, 5, 52])

# select everything where the first index is 0 (the following are equivalent)
print(data[0])
print(data[0, :, :])
print(data[0, ...])  # ... (Ellipsis) stands for all remaining dimensions; NumPy supports it too

# select everything where the last index is 5 (the following are equivalent)
print(data[..., 5])
print(data[:, :, 5])
```
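The `...` (Ellipsis) syntax is not specific to PyTorch; NumPy gives it the same meaning. A small NumPy check of the equivalences above, assuming a reduced batch of 8 to keep it light:

```python
import numpy as np

data = np.arange(8 * 64 * 64).reshape(8, 64, 64)

# all three select the first 64x64 slice
assert np.array_equal(data[0], data[0, :, :])
assert np.array_equal(data[0], data[0, ...])

# both select index 5 of the last dimension, leaving shape (8, 64)
assert np.array_equal(data[..., 5], data[:, :, 5])
print(data[..., 5].shape)  # (8, 64)
```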

It is also straightforward to batch index along a single dimension:

```python
idx = [4, 8, 15, 16, 23, 42]

# select along the first dimension at the positions in idx (the following are equivalent)
print(data[idx].shape)  # (len(idx), 64, 64)
print(data[idx, ...].shape)
print(data[idx, :, :].shape)

# select along the second dimension at the positions in idx (the following are equivalent)
print(data[:, idx].shape)  # (B, len(idx), 64)
print(data[:, idx, :].shape)
```
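Indexing one dimension with a list is the same as taking those positions along that axis, which NumPy also exposes as `np.take` (the closest PyTorch counterpart is `torch.index_select`, which expects an index tensor). A NumPy sketch, again assuming a reduced batch of 8:

```python
import numpy as np

data = np.arange(8 * 64 * 64).reshape(8, 64, 64)
idx = [4, 8, 15, 16, 23, 42]

# a list index on the second dimension keeps the other dimensions whole
sel = data[:, idx]
print(sel.shape)  # (8, 6, 64)
assert np.array_equal(sel, np.take(data, idx, axis=1))
```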

It is also reasonable to batch index along multiple dimensions. The index lists are paired elementwise, so `idx` and `idx2`
must have the same length (strictly, broadcastable shapes); mismatched lengths raise an `IndexError`.

```python
idx = [4, 8, 15, 16, 23, 42]
idx2 = [5, 2, 7, 1, 32, 4]

# index the last dimension when the first two are (4,5), (8,2), (15,7), (16,1), (23,32), and (42,4)
print(data[idx, idx2].shape)  # (len(idx), 64)
```
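To make the pairing concrete: row `k` of `data[idx, idx2]` is `data[idx[k], idx2[k]]`, and lists of different lengths cannot be paired. If you instead want every combination of the two lists, NumPy's `np.ix_` builds that outer-product index. A NumPy sketch:

```python
import numpy as np

data = np.arange(256 * 64 * 64).reshape(256, 64, 64)
idx = [4, 8, 15, 16, 23, 42]
idx2 = [5, 2, 7, 1, 32, 4]

pairs = data[idx, idx2]
print(pairs.shape)  # (6, 64)
# row k of the result is data[idx[k], idx2[k]]: the lists are zipped, not crossed
for k, (a, b) in enumerate(zip(idx, idx2)):
    assert np.array_equal(pairs[k], data[a, b])

# mismatched lengths cannot be paired up
mismatch_raised = False
try:
    data[idx, idx2[:5]]
except IndexError as err:
    mismatch_raised = True
    print("IndexError:", err)

# every combination of idx and idx2 instead: shape (6, 6, 64)
print(data[np.ix_(idx, idx2)].shape)
```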

It is also common to have a list of entries, each given by its full index, that we'd like to query as a batch.

```python
# indices of 5 entries
idx3 = [[0, 5, 3],
        [2, 7, 5],
        [100, 23, 45],
        [3, 6, 4],
        [4, 2, 1]]
```

Directly indexing the tensor with a multidimensional index does not do what you want:

```python
print(data[idx3])  # raises IndexError: the nested list is read as 5 separate per-dimension indices
```
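One way to see what goes wrong: once the nested list is converted to an integer array (or `torch.long` tensor), it is treated as a single index into the **first** dimension only, so every number is read as a batch index and whole 64x64 slices come back. A NumPy sketch:

```python
import numpy as np

data = np.arange(256 * 64 * 64).reshape(256, 64, 64)
idx3 = np.array([[0, 5, 3], [2, 7, 5], [100, 23, 45], [3, 6, 4], [4, 2, 1]])

# a (5, 3) integer array indexes only the first axis: 15 whole 64x64 slices
wrong = data[idx3]
print(wrong.shape)  # (5, 3, 64, 64)
```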

Instead, **split the indices up by dimension**, either manually or with `torch.unbind`:

```python
# easier to convert it to something that allows column indexing first
idx4 = torch.tensor(idx3)
print(data[idx4[:, 0], idx4[:, 1], idx4[:, 2]])  # returns the 5 entries as desired
print(data[torch.unbind(idx4, -1)])              # can also use unbind
```
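The NumPy equivalent of the split is to transpose the `(N, 3)` index array into three index arrays and pass them as a tuple. Since `data` holds consecutive integers, each selected entry should equal its flat index, which `np.ravel_multi_index` lets us check:

```python
import numpy as np

shape = (256, 64, 64)
data = np.arange(np.prod(shape)).reshape(shape)
idx4 = np.array([[0, 5, 3], [2, 7, 5], [100, 23, 45], [3, 6, 4], [4, 2, 1]])

# transpose to (3, 5) and turn into a tuple: one index array per dimension
entries = data[tuple(idx4.T)]
print(entries.tolist())  # [323, 8645, 411117, 12676, 16513]
# since data holds consecutive integers, each entry equals its flat index
assert np.array_equal(entries, np.ravel_multi_index(tuple(idx4.T), shape))
```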

## How can it be improved?

It may not be obvious why simply doing `data[idx3]` does not work, or what the correct syntax is; reading up to here
should resolve most questions about indexing a multidimensional tensor with a batch of indices.
On top of that, this library provides `MultidimView` variants (for PyTorch and NumPy) that wrap these tensors with features
specialized for multidimensional tensors that represent values on a coordinate grid:

- direct indexing, so `data[idx3]` does what you want
- optional indexing by coordinate values, if you specify value ranges
    - the value resolution is implied by the size of the source and the value range
- optional safety checking for out-of-bounds values or indices
    - returns a default value for out-of-bounds queries instead of raising an exception
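To illustrate how indexing by value with an implied resolution and a default for out-of-bounds queries can work, here is a minimal NumPy sketch. It is not the library's implementation: `value_to_index` and `query` are hypothetical helpers, and they assume each range `[lo, hi]` maps inclusively onto the grid (`lo` to index 0, `hi` to the last index) with nearest-neighbor rounding, which agrees with the boundary cases in the 3D example in the Usage section.

```python
import numpy as np

def value_to_index(values, value_ranges, shape):
    """Hypothetical helper: map (N, D) coordinate values to nearest (N, D) grid indices.

    Assumes each [lo, hi] range spans the grid inclusively (lo -> 0, hi -> size - 1).
    """
    ranges = np.asarray(value_ranges, dtype=float)
    lo, hi = ranges[:, 0], ranges[:, 1]
    # resolution is implied by the range width and the number of grid points
    resolution = (hi - lo) / (np.asarray(shape) - 1)
    idx = np.rint((values - lo) / resolution).astype(np.int64)
    # a row is valid only if every coordinate lies inside its range
    valid = np.all((values >= lo) & (values <= hi), axis=-1)
    return idx, valid

def query(data, values, value_ranges, invalid_value=-1):
    """Hypothetical helper: nearest-grid lookup that returns invalid_value when out of bounds."""
    values = np.asarray(values, dtype=float)
    idx, valid = value_to_index(values, value_ranges, data.shape)
    out = np.full(values.shape[0], invalid_value, dtype=data.dtype)
    # only the valid rows are split by dimension and looked up
    out[valid] = data[tuple(idx[valid].T)]
    return out

data = np.arange(256 * 64 * 64).reshape(256, 64, 64)
result = query(data, [[-2.5, 0., 0.], [-2.51, 0.5, 0.], [5., 4., 10.]],
               [[-2.5, 5], [0, 4], [0, 10]])
print(result)  # first entry, invalid_value, last entry
```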

## Installation

NumPy only:
```shell
pip install "multidim-indexing[numpy]"
```
PyTorch only:
```shell
pip install "multidim-indexing[torch]"
```
Both:
```shell
pip install "multidim-indexing[all]"
```

## Usage

Continuing with `data` and the indices defined above:

```python
from multidim_indexing import torch_view as view

# for NumPy, import numpy_view and use NumpyMultidimView instead

# simple wrapper with bounds checking
data_multi = view.TorchMultidimView(data)
# another view of the data, treating it as a batch of 2D grids with X in [-5, 5] and Y in [0, 10]
# invalid_value is returned for out-of-bounds queries (defaults to -1)
# it must have the same type as the source (integers here), so e.g. float('inf') cannot be used
data_batch = view.TorchMultidimView(data, value_ranges=[[0, B], [-5, 5], [0, 10]], invalid_value=-1)
# another view of the data, treating it as a 3D grid with X in [-2.5, 5], Y in [0, 4], and Z in [0, 10]
data_3d = view.TorchMultidimView(data, value_ranges=[[-2.5, 5], [0, 4], [0, 10]])
```
By default, the nearest grid value is returned. You can instead use linear interpolation, similar to SciPy's `interpn`,
by passing `method='linear'` to the constructor:

```python
data_3d = view.TorchMultidimView(data, value_ranges=[[-2.5, 5], [0, 4], [0, 10]], method='linear')
```
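In one dimension the difference between the two methods is easy to see. The sketch below uses a hypothetical 6-sample grid over `[0, 10]` and NumPy's `np.interp` for the linear case; it only illustrates the idea and is not how the library interpolates across several dimensions.

```python
import numpy as np

# hypothetical 1-D grid: 6 samples spanning the value range [0, 10]
coords = np.linspace(0, 10, 6)  # 0, 2, 4, 6, 8, 10
values = np.array([0., 10., 20., 30., 40., 50.])
x = 3.4

# nearest: snap x to the closest grid coordinate (4.0 here)
nearest = values[np.argmin(np.abs(coords - x))]
# linear: weight the two neighboring samples (at 2.0 and 4.0) by distance
linear = np.interp(x, coords, values)
print(nearest, linear)  # 20.0 and approximately 17.0
```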

We can then use them like:

```python
# convert index to the corresponding type (pytorch vs numpy)
key = torch.tensor(idx3, dtype=torch.long)
print(data_multi[key])  # returns the 5 entries as desired
```
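Conceptually, `data_multi[key]` performs the column split shown earlier and adds a bounds check. The class below is a hypothetical, minimal NumPy stand-in for that idea (the name `SimpleMultidimView` and its behavior are assumptions for illustration, not the library's API): it accepts an `(N, D)` array of full indices and returns `invalid_value` for rows that fall outside the source.

```python
import numpy as np

class SimpleMultidimView:
    """Hypothetical minimal view: index with (N, D) full indices, bounds-checked."""

    def __init__(self, source, invalid_value=-1):
        self.source = source
        self.invalid_value = invalid_value

    def __getitem__(self, key):
        key = np.asarray(key, dtype=np.int64)
        shape = np.asarray(self.source.shape)
        # a row is valid only if every coordinate is within [0, size)
        valid = np.all((key >= 0) & (key < shape), axis=-1)
        out = np.full(key.shape[0], self.invalid_value, dtype=self.source.dtype)
        # split the valid rows by dimension: one index array per axis
        out[valid] = self.source[tuple(key[valid].T)]
        return out

data = np.arange(256 * 64 * 64).reshape(256, 64, 64)
view = SimpleMultidimView(data)
print(view[[[0, 5, 3], [2, 7, 5], [300, 0, 0]]])  # last row is out of bounds
```

Negative indices are treated as out of bounds here rather than wrapping around to the end, which is the safer choice when indices come from computed coordinates.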

```python
# query the other views using grid values
# first, let's try keying the same 2D values across all batches
value_key_per_batch = torch.tensor([[-3.5, 0.2],
                                    [-4, 0.1],
                                    [-7, 0.5],  # this is out of bounds
                                    [3, 2]])
# number of entries to query
N = value_key_per_batch.shape[0]
print(torch.arange(B, dtype=value_key_per_batch.dtype).reshape(B, 1, 1).repeat(1, N, 1).shape)
# make the indices for all batches
value_key_batch = torch.cat(
    (torch.arange(B, dtype=value_key_per_batch.dtype).reshape(B, 1, 1).repeat(1, N, 1),
     value_key_per_batch.repeat(B, 1, 1)), dim=-1)
# keys can have additional leading batch dimensions
print(value_key_batch.shape)  # (B, N, 3)
# these 2 should be the same apart from the first (batch index) column
print(value_key_batch[0])
print(value_key_batch[12])

# should see some -1 to indicate invalid values
print(data_batch[value_key_batch])

# there is also a shorthand that takes the per-batch keys directly
print(data_batch[value_key_per_batch.repeat(B, 1, 1)])  # should be the same as above
```
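The batch keys can also be assembled with broadcasting instead of calling `repeat` on each piece; the final concatenation still allocates the `(B, N, 3)` result. A NumPy sketch of the same construction (in PyTorch, `Tensor.expand` plays the role of `np.broadcast_to`):

```python
import numpy as np

B = 256
per_batch = np.array([[-3.5, 0.2], [-4, 0.1], [-7, 0.5], [3, 2]])
N = per_batch.shape[0]

# batch index column, broadcast (not copied) to shape (B, N, 1)
batch_idx = np.broadcast_to(np.arange(B, dtype=per_batch.dtype).reshape(B, 1, 1), (B, N, 1))
# the same N per-batch keys for every batch, shape (B, N, 2)
tiled = np.broadcast_to(per_batch, (B, N, 2))
keys = np.concatenate((batch_idx, tiled), axis=-1)
print(keys.shape)  # (256, 4, 3)
print(keys[12])    # batch index 12 followed by the per-batch keys
```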

```python
value_key_3d = torch.tensor([[-2.5, 0., 0.],   # right on the boundary of validity
                             [-2.51, 0.5, 0],  # out of bounds
                             [5, 4, 10]])      # right on the boundary
print(data_3d[value_key_3d])  # (0, -1 for invalid, high - 1)
print(torch.prod(torch.tensor(data.shape)) - 1)
print(high - 1)
```

The views support assignment as well as querying. Assignments at out-of-bounds indices are
ignored.
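A minimal sketch of what ignoring out-of-bounds indices on assignment can look like in NumPy. `masked_assign` is a hypothetical helper, not part of the library: it drops the out-of-bounds rows of an `(N, D)` key before assigning.

```python
import numpy as np

def masked_assign(data, key, values):
    """Hypothetical helper: assign values at in-bounds rows of key (N, D), skip the rest."""
    key = np.asarray(key, dtype=np.int64)
    values = np.asarray(values, dtype=data.dtype)
    # negative indices count as out of bounds instead of wrapping around
    valid = np.all((key >= 0) & (key < np.asarray(data.shape)), axis=-1)
    data[tuple(key[valid].T)] = values[valid]
    return valid

data = np.zeros((4, 4), dtype=np.int64)
valid = masked_assign(data, [[0, 0], [3, 3], [4, 0], [-1, 2]], [7, 8, 9, 10])
print(valid)  # the last two rows were skipped
print(data)
```

Masking explicitly matters because plain NumPy or PyTorch indexing would raise on indices past the end and would silently wrap negative ones.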

Raw data

            {
    "_id": null,
    "home_page": "",
    "name": "multidim-indexing",
    "maintainer": "",
    "docs_url": null,
    "requires_python": ">=3.6",
    "maintainer_email": "Sheng Zhong <zhsh@umich.edu>",
    "keywords": "numpy,pytorch,indexing",
    "author": "",
    "author_email": "Sheng Zhong <zhsh@umich.edu>",
    "download_url": "https://files.pythonhosted.org/packages/ce/91/47f3f82b0885843b77ee3c59c92f4c0a286ae1063f3b48fba8f6233c62b5/multidim_indexing-0.9.1.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "Copyright (c) 2023 Sheng Zhong  Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the \"Software\"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:  The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.  THE SOFTWARE IS PROVIDED \"AS IS\", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.  ",
    "summary": "Multidimensional batch indexing of pytorch tensors and numpy arrays",
    "version": "0.9.1",
    "project_urls": {
        "Bug Reports": "https://github.com/LemonPi/multidim_indexing/issues",
        "Homepage": "https://github.com/LemonPi/multidim_indexing",
        "Source": "https://github.com/LemonPi/multidim_indexing"
    },
    "split_keywords": [
        "numpy",
        "pytorch",
        "indexing"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "61032c7a5f4c29ec515a293c4d2e3aa91f9cff2fa0f45bbaa0a612af84bd63fd",
                "md5": "c688cc2980dc55e5900c2008c962b377",
                "sha256": "5657ecb45d6d4f89f6869e7394358b4844e88bd8e9d4a56f2dd33c2ac4f4ddb3"
            },
            "downloads": -1,
            "filename": "multidim_indexing-0.9.1-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "c688cc2980dc55e5900c2008c962b377",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.6",
            "size": 10015,
            "upload_time": "2023-09-27T21:32:45",
            "upload_time_iso_8601": "2023-09-27T21:32:45.168592Z",
            "url": "https://files.pythonhosted.org/packages/61/03/2c7a5f4c29ec515a293c4d2e3aa91f9cff2fa0f45bbaa0a612af84bd63fd/multidim_indexing-0.9.1-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "ce9147f3f82b0885843b77ee3c59c92f4c0a286ae1063f3b48fba8f6233c62b5",
                "md5": "9651f99502f92a1d1843d553dbd3d6a7",
                "sha256": "8fdc3f93836ab3cf53dbf4c6342bf115c0e55e2e2c2922dc4b349c75a0421af0"
            },
            "downloads": -1,
            "filename": "multidim_indexing-0.9.1.tar.gz",
            "has_sig": false,
            "md5_digest": "9651f99502f92a1d1843d553dbd3d6a7",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.6",
            "size": 15348,
            "upload_time": "2023-09-27T21:32:49",
            "upload_time_iso_8601": "2023-09-27T21:32:49.245210Z",
            "url": "https://files.pythonhosted.org/packages/ce/91/47f3f82b0885843b77ee3c59c92f4c0a286ae1063f3b48fba8f6233c62b5/multidim_indexing-0.9.1.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2023-09-27 21:32:49",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "LemonPi",
    "github_project": "multidim_indexing",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": false,
    "lcname": "multidim-indexing"
}