| Name | tensorloader |
| --- | --- |
| Version | 0.1.0 |
| Summary | A faster dataloader for tensor data. |
| upload_time | 2023-01-26 15:33:42 |
| home_page | |
| maintainer | |
| docs_url | None |
| author | |
| requires_python | >=3.8 |
| license | Apache-2.0 |
| keywords | pytorch |
| VCS | |
| bugtrack_url | |
| requirements | No requirements were recorded. |
| Travis-CI | No Travis. |
| coveralls test coverage | No coveralls. |

# Tensor Loader
  
`TensorLoader` is similar to the combination of PyTorch's `TensorDataset` and `DataLoader`. It is faster and has better type hints.
## Installation
Install from PyPI:
```shell
pip install tensorloader
```
Install from source:
```shell
git clone https://github.com/zhb2000/tensorloader.git
cd tensorloader
pip install .
```
## Usage
This package only contains a `TensorLoader` class.
```python
import torch
from tensorloader import TensorLoader
```
Use a single tensor as data:
```python
X = torch.tensor(...)
dataloader = TensorLoader(X)
for x in dataloader:
    ...
```
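The constructor options are not listed here, but the speed test below forwards `batch_size` and `shuffle` straight to `TensorLoader`, so a batched, shuffled loader presumably looks like this (a sketch under that assumption):
```python
# batch_size and shuffle are assumed keyword arguments; the speed test below
# passes these same options to TensorLoader.
X = torch.tensor(...)
dataloader = TensorLoader(X, batch_size=128, shuffle=True)
for x in dataloader:
    ...  # x is a batch of at most 128 rows of X
```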
Use a tuple of tensors as data:
```python
X = torch.tensor(...)
Y = torch.tensor(...)
dataloader = TensorLoader((X, Y))
for x, y in dataloader:  # unpack the batch tuple as x, y
    ...
```
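For comparison, the same tuple example written with plain PyTorch wraps the tensors in a `TensorDataset` and iterates through a `DataLoader` (standard PyTorch APIs; this is the combination `TensorLoader` is meant to replace):
```python
from torch.utils.data import TensorDataset, DataLoader

dataset = TensorDataset(X, Y)     # pairs X[i] with Y[i] sample by sample
dataloader = DataLoader(dataset)  # fetches samples and collates them into batches
for x, y in dataloader:
    ...
```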
Use a namedtuple of tensors as data:
```python
from collections import namedtuple
Batch = namedtuple('Batch', ['x', 'y'])
X = torch.tensor(...)
Y = torch.tensor(...)
# set unpack_args=True when using a namedtuple as data
dataloader = TensorLoader(Batch(X, Y), unpack_args=True)
for batch in dataloader:
    assert isinstance(batch, Batch)
    assert isinstance(batch.x, torch.Tensor)
    assert isinstance(batch.y, torch.Tensor)
    x, y = batch
    ...
```
PS: Namedtuples are similar to plain tuples, but they also allow field access by name, which makes code more readable. For more information, see the [documentation](https://docs.python.org/3/library/collections.html#collections.namedtuple) of `namedtuple`.
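As a quick illustration (standard-library Python, independent of this package), a namedtuple supports both positional and named access:
```python
from collections import namedtuple

Point = namedtuple('Point', ['x', 'y'])
p = Point(1, 2)
assert p[0] == p.x == 1    # index access and field access are interchangeable
assert tuple(p) == (1, 2)  # still behaves like an ordinary tuple
x, y = p                   # and it unpacks like one
```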
## Speed Test
`TensorLoader` is much faster than `TensorDataset` + `DataLoader` because it uses vectorized operations instead of building costly Python lists.
```python
import timeit
import torch
from torch.utils.data import TensorDataset, DataLoader
from tensorloader import TensorLoader

def speed_test(epoch_num: int, **kwargs):
    sample_num = int(1e6)
    X = torch.zeros(sample_num, 10)
    Y = torch.zeros(sample_num)
    tensorloader = TensorLoader((X, Y), **kwargs)
    torchloader = DataLoader(TensorDataset(X, Y), **kwargs)

    def loop(loader):
        for _ in loader:
            pass

    t1 = timeit.timeit(lambda: loop(tensorloader), number=epoch_num)
    t2 = timeit.timeit(lambda: loop(torchloader), number=epoch_num)
    print(f'TensorLoader: {t1:.4g}s, TensorDataset + DataLoader: {t2:.4g}s.')
```
```
>>> speed_test(epoch_num=10, batch_size=128, shuffle=False)
TensorLoader: 0.363s, TensorDataset + DataLoader: 54.39s.
>>> speed_test(epoch_num=10, batch_size=128, shuffle=True)
TensorLoader: 0.9296s, TensorDataset + DataLoader: 56.54s.
>>> speed_test(epoch_num=10, batch_size=10000, shuffle=False)
TensorLoader: 0.005262s, TensorDataset + DataLoader: 55.57s.
>>> speed_test(epoch_num=10, batch_size=10000, shuffle=True)
TensorLoader: 0.5682s, TensorDataset + DataLoader: 57.71s.
```
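The gap comes from how the batches are assembled: the `TensorDataset` + `DataLoader` pair fetches samples one at a time and collates them through Python lists, whereas slicing the full tensors with an index tensor builds each batch in a single vectorized operation. A minimal sketch of the vectorized idea (not the package's actual implementation) is:
```python
import torch

def iter_batches(X, Y, batch_size=128, shuffle=False):
    """Yield (x, y) batches by indexing the full tensors directly."""
    n = len(X)
    order = torch.randperm(n) if shuffle else torch.arange(n)
    for start in range(0, n, batch_size):
        idx = order[start:start + batch_size]
        yield X[idx], Y[idx]  # one vectorized gather per batch, no Python lists
```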
## Raw data

```json
{
"_id": null,
"home_page": "",
"name": "tensorloader",
"maintainer": "",
"docs_url": null,
"requires_python": ">=3.8",
"maintainer_email": "",
"keywords": "pytorch",
"author": "",
"author_email": "ZHB <zhb2000@zju.edu.cn>",
"download_url": "https://files.pythonhosted.org/packages/8f/40/ee6150f7986d784c9d70f0af58a0eda09f936c9d9359f1508afb3cc8c198/tensorloader-0.1.0.tar.gz",
"platform": null,
"description": "# Tensor Loader\r\n\r\n  \r\n\r\n`TensorLoader` is similar to the combination of PyTorch's `TensorDataset` and `DataLoader`. It is faster and has better type hints.\r\n\r\n## Installation\r\n\r\nInstall from PyPi:\r\n\r\n```shell\r\npip install tensorloader\r\n```\r\n\r\nInstall from source:\r\n\r\n```shell\r\ngit clone https://github.com/zhb2000/tensorloader.git\r\ncd tensorloader\r\npip install .\r\n```\r\n\r\n## Usage\r\n\r\nThis package only contains a `TensorLoader` class.\r\n\r\n```python\r\nfrom tensorloader import TensorLoader\r\n```\r\n\r\nUse a single tensor as data:\r\n\r\n```python\r\nX = torch.tensor(...)\r\ndataloader = TensorLoader(X)\r\nfor x in dataloader:\r\n ...\r\n```\r\n\r\nUse a tuple of tensor as data:\r\n\r\n```python\r\nX = torch.tensor(...)\r\nY = torch.tensor(...)\r\ndataloader = TensorLoader((X, Y))\r\nfor x, y in dataloader: # unpack the batch tuple as x, y\r\n ...\r\n```\r\n\r\nUse a namedtuple of tensor as data:\r\n\r\n```python\r\nfrom collections import namedtuple\r\n\r\nBatch = namedtuple('Batch', ['x', 'y'])\r\nX = torch.tensor(...)\r\nY = torch.tensor(...)\r\n# set unpack_args=True when using a namedtuple as data\r\ndataloader = TensorLoader(Batch(X, Y), unpack_args=True)\r\nfor batch in dataloader:\r\n assert isinstance(batch, Batch)\r\n assert isinstance(batch.x, torch.Tensor)\r\n assert isinstance(batch.y, torch.Tensor)\r\n x, y = batch\r\n ...\r\n```\r\n\r\nPS: Namedtuples are similar to common tuples and they allow field access by name which makes code more readable. For more information, see the [documentation](https://docs.python.org/3/library/collections.html#collections.namedtuple) of namedtuple.\r\n\r\n## Speed Test\r\n\r\n`TensorLoader` is much faster than `TensorDataset` + `DataLoader`, for it uses vectorized operations instead of creating costly Python lists.\r\n\r\n```python\r\nimport timeit\r\nimport torch\r\nfrom torch.utils.data import TensorDataset, DataLoader\r\nfrom tensorloader import TensorLoader\r\n\r\ndef speed_test(epoch_num: int, **kwargs):\r\n sample_num = int(1e6)\r\n X = torch.zeros(sample_num, 10)\r\n Y = torch.zeros(sample_num)\r\n tensorloader = TensorLoader((X, Y), **kwargs)\r\n torchloader = DataLoader(TensorDataset(X, Y), **kwargs)\r\n\r\n def loop(loader):\r\n for _ in loader:\r\n pass\r\n\r\n t1 = timeit.timeit(lambda: loop(tensorloader), number=epoch_num)\r\n t2 = timeit.timeit(lambda: loop(torchloader), number=epoch_num)\r\n print(f'TensorLoader: {t1:.4g}s, TensorDatset + DataLoader: {t2:.4g}s.')\r\n```\r\n\r\n```\r\n>>> speed_test(epoch_num=10, batch_size=128, shuffle=False)\r\nTensorLoader: 0.363s, TensorDatset + DataLoader: 54.39s.\r\n>>> speed_test(epoch_num=10, batch_size=128, shuffle=True)\r\nTensorLoader: 0.9296s, TensorDatset + DataLoader: 56.54s.\r\n>>> speed_test(epoch_num=10, batch_size=10000, shuffle=False)\r\nTensorLoader: 0.005262s, TensorDatset + DataLoader: 55.57s.\r\n>>> speed_test(epoch_num=10, batch_size=10000, shuffle=True)\r\nTensorLoader: 0.5682s, TensorDatset + DataLoader: 57.71s.\r\n```\r\n",
"bugtrack_url": null,
"license": "Apache-2.0",
"summary": "A faster dataloader for tensor data.",
"version": "0.1.0",
"split_keywords": [
"pytorch"
],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "8f40ee6150f7986d784c9d70f0af58a0eda09f936c9d9359f1508afb3cc8c198",
"md5": "133cf138388e44bcce72332231085fe5",
"sha256": "3e9b8eb224ef90807538fb92fef183430b883592eb47f71086e1935bae058ab5"
},
"downloads": -1,
"filename": "tensorloader-0.1.0.tar.gz",
"has_sig": false,
"md5_digest": "133cf138388e44bcce72332231085fe5",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.8",
"size": 7698,
"upload_time": "2023-01-26T15:33:42",
"upload_time_iso_8601": "2023-01-26T15:33:42.527732Z",
"url": "https://files.pythonhosted.org/packages/8f/40/ee6150f7986d784c9d70f0af58a0eda09f936c9d9359f1508afb3cc8c198/tensorloader-0.1.0.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2023-01-26 15:33:42",
"github": false,
"gitlab": false,
"bitbucket": false,
"lcname": "tensorloader"
}
```