tensorloader

- Version: 0.1.0
- Summary: A faster dataloader for tensor data.
- Upload time: 2023-01-26 15:33:42
- Requires Python: >=3.8
- License: Apache-2.0
- Keywords: pytorch
- Requirements: none recorded
# Tensor Loader

![PyPI](https://img.shields.io/pypi/v/tensorloader?logo=pypi&logoColor=white) ![PyPI - Python Version](https://img.shields.io/pypi/pyversions/tensorloader?logo=python&logoColor=white) ![PyPI - License](https://img.shields.io/pypi/l/tensorloader)

`TensorLoader` is similar to the combination of PyTorch's `TensorDataset` and `DataLoader`. It is faster and has better type hints.
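A rough side-by-side sketch of the two styles (values are illustrative; `TensorLoader` is shown accepting `batch_size` and `shuffle` keywords, as the benchmark later in this README passes them):

```python
import torch
from torch.utils.data import TensorDataset, DataLoader
from tensorloader import TensorLoader

X = torch.randn(1000, 10)
Y = torch.randn(1000)

# PyTorch's built-in combination: wrap tensors in a dataset, then a loader
for x, y in DataLoader(TensorDataset(X, Y), batch_size=32, shuffle=True):
    ...

# The TensorLoader equivalent: one object takes the tensors directly
for x, y in TensorLoader((X, Y), batch_size=32, shuffle=True):
    ...
```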

## Installation

Install from PyPI:

```shell
pip install tensorloader
```

Install from source:

```shell
git clone https://github.com/zhb2000/tensorloader.git
cd tensorloader
pip install .
```

## Usage

This package contains only a `TensorLoader` class.

```python
from tensorloader import TensorLoader
```

Use a single tensor as data:

```python
import torch

X = torch.tensor(...)  # placeholder for your data tensor
dataloader = TensorLoader(X)
for x in dataloader:
    ...
```

Use a tuple of tensors as data:

```python
X = torch.tensor(...)
Y = torch.tensor(...)
dataloader = TensorLoader((X, Y))
for x, y in dataloader:  # unpack the batch tuple as x, y
    ...
```

Use a namedtuple of tensors as data:

```python
from collections import namedtuple

Batch = namedtuple('Batch', ['x', 'y'])
X = torch.tensor(...)
Y = torch.tensor(...)
# set unpack_args=True when using a namedtuple as data
dataloader = TensorLoader(Batch(X, Y), unpack_args=True)
for batch in dataloader:
    assert isinstance(batch, Batch)
    assert isinstance(batch.x, torch.Tensor)
    assert isinstance(batch.y, torch.Tensor)
    x, y = batch
    ...
```

PS: Namedtuples behave like plain tuples, but they also allow field access by name, which makes code more readable. For more information, see the [documentation](https://docs.python.org/3/library/collections.html#collections.namedtuple) of namedtuple.
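For instance, with the standard library alone:

```python
from collections import namedtuple

Point = namedtuple('Point', ['x', 'y'])  # a toy example, unrelated to this package
p = Point(1, 2)
assert p.x == 1 and p[0] == 1  # access fields by name or by index
x, y = p                       # unpacks like a plain tuple
```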

## Speed Test

`TensorLoader` is much faster than `TensorDataset` + `DataLoader` because it builds batches with vectorized tensor operations instead of collecting individual samples into costly Python lists.

```python
import timeit
import torch
from torch.utils.data import TensorDataset, DataLoader
from tensorloader import TensorLoader

def speed_test(epoch_num: int, **kwargs):
    sample_num = int(1e6)
    X = torch.zeros(sample_num, 10)
    Y = torch.zeros(sample_num)
    tensorloader = TensorLoader((X, Y), **kwargs)
    torchloader = DataLoader(TensorDataset(X, Y), **kwargs)

    def loop(loader):
        for _ in loader:
            pass

    t1 = timeit.timeit(lambda: loop(tensorloader), number=epoch_num)
    t2 = timeit.timeit(lambda: loop(torchloader), number=epoch_num)
    print(f'TensorLoader: {t1:.4g}s, TensorDataset + DataLoader: {t2:.4g}s.')
```

```
>>> speed_test(epoch_num=10, batch_size=128, shuffle=False)
TensorLoader: 0.363s, TensorDataset + DataLoader: 54.39s.
>>> speed_test(epoch_num=10, batch_size=128, shuffle=True)
TensorLoader: 0.9296s, TensorDataset + DataLoader: 56.54s.
>>> speed_test(epoch_num=10, batch_size=10000, shuffle=False)
TensorLoader: 0.005262s, TensorDataset + DataLoader: 55.57s.
>>> speed_test(epoch_num=10, batch_size=10000, shuffle=True)
TensorLoader: 0.5682s, TensorDataset + DataLoader: 57.71s.
```
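
The speedup comes from producing each batch with a single tensor slice or index operation rather than fetching samples one by one and collating them into a list. A minimal sketch of that idea (hypothetical code, not the library's actual implementation):

```python
import torch

def iterate_batches(X: torch.Tensor, Y: torch.Tensor, batch_size: int, shuffle: bool = False):
    """Yield (x, y) batches by slicing or indexing whole tensors at once."""
    n = X.shape[0]
    order = torch.randperm(n) if shuffle else None
    for start in range(0, n, batch_size):
        end = min(start + batch_size, n)
        if order is None:
            # contiguous slice: no per-sample Python loop, no list of samples
            yield X[start:end], Y[start:end]
        else:
            # advanced indexing gathers a shuffled batch in one vectorized op
            idx = order[start:end]
            yield X[idx], Y[idx]
```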
