torchdata-nightly


Name: torchdata-nightly
Version: 1597449981
Home page: https://github.com/pypa/torchdata
Summary: PyTorch based library focused on data processing and input pipelines in general.
Upload time: 2020-08-15 00:06:29
Author: Szymon Maszke
Maintainer: (not recorded)
Docs URL: None
Requires Python: >=3.6
License: MIT
Keywords: pytorch, torch, data, datasets, map, cache, memory, disk, apply, database
Requirements: No requirements were recorded.
            <img align="left" width="256" height="256" src="https://github.com/szymonmaszke/torchdata/blob/master/assets/logos/medium.png">

* Use `map`, `apply`, `reduce` or `filter` directly on `Dataset` objects
* `cache` data in RAM/disk or via your own method (partial caching supported)
* Full support for PyTorch's [`Dataset`](https://pytorch.org/docs/stable/data.html#torch.utils.data.Dataset) and [`IterableDataset`](https://pytorch.org/docs/stable/data.html#torch.utils.data.IterableDataset)
* General `torchdata.maps` like `Flatten` or `Select`
* Extensible interface (your own cache methods, cache modifiers, maps etc.)
* Useful `torchdata.datasets` classes designed for general tasks (e.g. file reading)
* Support for `torchvision` datasets (e.g. `ImageFolder`, `MNIST`, `CIFAR10`) via `td.datasets.WrapDataset`
* Minimal overhead (single call to `super().__init__()`)

| Version | Docs | Tests | Coverage | Style | PyPI | Python | PyTorch | Docker | Roadmap |
|---------|------|-------|----------|-------|------|--------|---------|--------|---------|
| [![Version](https://img.shields.io/static/v1?label=&message=0.2.0&color=377EF0&style=for-the-badge)](https://github.com/szymonmaszke/torchdata/releases) | [![Documentation](https://img.shields.io/static/v1?label=&message=docs&color=EE4C2C&style=for-the-badge)](https://szymonmaszke.github.io/torchdata/)  | ![Tests](https://github.com/szymonmaszke/torchdata/workflows/test/badge.svg) | ![Coverage](https://img.shields.io/codecov/c/github/szymonmaszke/torchdata?label=%20&logo=codecov&style=for-the-badge) | [![codebeat](https://img.shields.io/static/v1?label=&message=CB&color=27A8E0&style=for-the-badge)](https://codebeat.co/projects/github-com-szymonmaszke-torchdata-master) | [![PyPI](https://img.shields.io/static/v1?label=&message=PyPI&color=377EF0&style=for-the-badge)](https://pypi.org/project/torchdata/) | [![Python](https://img.shields.io/static/v1?label=&message=3.6&color=377EF0&style=for-the-badge&logo=python&logoColor=F8C63D)](https://www.python.org/) | [![PyTorch](https://img.shields.io/static/v1?label=&message=>=1.2.0&color=EE4C2C&style=for-the-badge)](https://pytorch.org/) | [![Docker](https://img.shields.io/static/v1?label=&message=docker&color=309cef&style=for-the-badge)](https://hub.docker.com/r/szymonmaszke/torchdata) | [![Roadmap](https://img.shields.io/static/v1?label=&message=roadmap&color=009688&style=for-the-badge)](https://github.com/szymonmaszke/torchdata/blob/master/ROADMAP.md) |

# :bulb: Examples

__Check documentation here:__
[https://szymonmaszke.github.io/torchdata](https://szymonmaszke.github.io/torchdata)

## General example

- Create an image dataset, convert it to Tensors, cache it, and concatenate it with smoothed labels:

```python
import pathlib

from PIL import Image

import torchdata as td
import torchvision

class Images(td.Dataset): # Inherit from td.Dataset instead of torch.utils.data.Dataset
    def __init__(self, path: str):
        super().__init__() # This is the only change
        self.files = [file for file in pathlib.Path(path).glob("*")]

    def __getitem__(self, index):
        return Image.open(self.files[index])

    def __len__(self):
        return len(self.files)


images = Images("./data").map(torchvision.transforms.ToTensor()).cache()
```

You can concatenate the above dataset with another one (say `labels`) and iterate over them as usual:

```python
for data, label in images | labels:
    ...  # Do whatever you want with your data
```
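
The `labels` object here is just another `torchdata` dataset. A minimal sketch of one (hypothetical names, with simple binary label smoothing applied via `map`) could look like this:

```python
import torchdata as td


class Labels(td.Dataset):
    """Hypothetical label dataset backed by a plain Python list."""

    def __init__(self, labels):
        super().__init__()
        self.labels = labels

    def __getitem__(self, index):
        return self.labels[index]

    def __len__(self):
        return len(self.labels)


smoothing = 0.1  # assumed smoothing factor
# Smooth hard 0/1 labels towards 0.5, e.g. 1 -> 0.95 and 0 -> 0.05
labels = Labels([0, 1, 1, 0]).map(lambda y: y * (1 - smoothing) + smoothing / 2)
```

Iterating over `images | labels` then yields `(image_tensor, smoothed_label)` pairs as in the loop above.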

- Cache the first `1000` samples in memory and save the rest on disk in the `./cache` folder:

```python
images = (
    Images("./data")  # Reusing the Images dataset defined above
    .map(torchvision.transforms.ToTensor())
    # First 1000 samples in memory
    .cache(td.modifiers.UpToIndex(1000, td.cachers.Memory()))
    # Samples from 1000 to the end saved with Pickle on disk
    .cache(td.modifiers.FromIndex(1000, td.cachers.Pickle("./cache")))
    # You can define your own cachers and modifiers, see docs
)
```
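
Cachers are pluggable, as the last comment hints. A rough sketch of a custom one (assuming the dict-like protocol of `__contains__`, `__setitem__` and `__getitem__` keyed by sample index; check the cachers documentation for the exact interface) might look like:

```python
class DictCacher:
    """Hypothetical cacher keeping samples in a plain dict.

    Assumes a cacher only needs index-keyed __contains__/__setitem__/__getitem__;
    see the torchdata.cachers docs for the authoritative interface.
    """

    def __init__(self):
        self.data = {}

    def __contains__(self, index):
        return index in self.data

    def __setitem__(self, index, sample):
        self.data[index] = sample

    def __getitem__(self, index):
        return self.data[index]


# Images is the dataset class defined in the first example
cached_images = Images("./data").cache(DictCacher())
```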

To see what else you can do, please check the [**torchdata documentation**](https://szymonmaszke.github.io/torchdata/).

## Integration with `torchvision`

Using `torchdata` you can easily split `torchvision` datasets and apply augmentation
only to the training part of the data:

```python
import torch
import torchvision

import torchdata as td

# Wrap torchvision dataset with WrapDataset
dataset = td.datasets.WrapDataset(torchvision.datasets.ImageFolder("./images"))

# Split dataset (split lengths have to sum to len(dataset))
train_size, validation_size = int(0.6 * len(dataset)), int(0.2 * len(dataset))
test_size = len(dataset) - train_size - validation_size
train_dataset, validation_dataset, test_dataset = torch.utils.data.random_split(
    dataset, (train_size, validation_size, test_size)
)

# Apply torchvision mappings ONLY to train dataset
train_dataset.map(
    td.maps.To(
        torchvision.transforms.Compose(
            [
                torchvision.transforms.RandomResizedCrop(224),
                torchvision.transforms.RandomHorizontalFlip(),
                torchvision.transforms.ToTensor(),
                torchvision.transforms.Normalize(
                    mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]
                ),
            ]
        )
    ),
    # Apply the transformation only to the zeroth element of each sample (the image);
    # the first element is the label
    0,
)
```

Note that you can use `td.datasets.WrapDataset` with any existing `torch.utils.data.Dataset`
instance to give it additional caching and mapping powers!
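
For example, a minimal sketch (toy data, hypothetical names) wrapping a plain `TensorDataset` to gain `map` and `cache`:

```python
import torch

import torchdata as td

# Any existing torch.utils.data.Dataset works; a toy TensorDataset is used here
plain = torch.utils.data.TensorDataset(
    torch.randn(100, 3), torch.randint(0, 2, (100,))
)

# Wrapping adds torchdata's chaining methods to the dataset
wrapped = td.datasets.WrapDataset(plain)
wrapped = wrapped.map(lambda sample: (sample[0] * 2.0, sample[1])).cache()
```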

# :wrench: Installation

## :snake: [pip](<https://pypi.org/project/torchdata/>)

### Latest release:

```shell
pip install --user torchdata
```

### Nightly:

```shell
pip install --user torchdata-nightly
```

## :whale2: [Docker](https://hub.docker.com/r/szymonmaszke/torchdata)

Standalone __CPU__ and various __GPU enabled__ images are available
on [Docker Hub](https://hub.docker.com/r/szymonmaszke/torchdata/tags).

For CPU quickstart, issue:

```shell
docker pull szymonmaszke/torchdata:18.04
```

Nightly builds are also available, just prefix the tag with `nightly_`. If you are going for a `GPU` image, make sure you have
[nvidia/docker](https://github.com/NVIDIA/nvidia-docker) installed and its runtime set.
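
For example, the nightly counterpart of the CPU image pulled above would be (assuming the same `18.04` base tag is published as a nightly build):

```shell
docker pull szymonmaszke/torchdata:nightly_18.04
```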

# :question: Contributing

If you find any issue or you think some functionality may be useful to others and fits this library, please [open a new Issue](https://help.github.com/en/articles/creating-an-issue) or [create a Pull Request](https://help.github.com/en/articles/creating-a-pull-request-from-a-fork).

To get an overview of things one can do to help this project, see the [Roadmap](https://github.com/szymonmaszke/torchdata/blob/master/ROADMAP.md).



            
