merlin-dataloader


- Name: merlin-dataloader
- Version: 23.8.0
- Home page: https://github.com/NVIDIA-Merlin/dataloader
- Summary: Merlin Dataloader
- Upload time: 2023-08-29 16:35:49
- Author: NVIDIA Corporation
- Requires Python: >=3.8
- License: Apache 2.0

# [Merlin Dataloader](https://github.com/NVIDIA-Merlin/dataloader)

![PyPI - Python Version](https://img.shields.io/pypi/pyversions/merlin-dataloader)
[![PyPI version shields.io](https://img.shields.io/pypi/v/merlin-dataloader.svg)](https://pypi.python.org/pypi/merlin-dataloader/)
![GitHub License](https://img.shields.io/github/license/NVIDIA-Merlin/dataloader)
[![Documentation](https://img.shields.io/badge/documentation-blue.svg)](https://nvidia-merlin.github.io/dataloader/stable/README.html)

The merlin-dataloader lets you quickly train recommender models for TensorFlow, PyTorch and JAX. It addresses data loading, often the biggest bottleneck in training recommender models, by providing GPU-optimized dataloaders that read data directly into GPU memory and then hand it to TensorFlow and PyTorch as a zero-copy transfer using [dlpack](https://github.com/dmlc/dlpack).
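To make the zero-copy idea concrete, here is a minimal standard-library sketch. It does not use DLPack or Merlin; it relies on Python's buffer protocol (`memoryview`), a different mechanism built on the same principle: a consumer receives a view onto the producer's memory instead of a duplicate. The names `producer`, `view`, and `copied` are illustrative only.

```python
# Zero-copy illustration with Python's buffer protocol (an analogy, not DLPack).
producer = bytearray(8)        # stands in for memory owned by the data loader

view = memoryview(producer)    # zero-copy: a view onto the same memory
copied = bytes(producer)       # copy: an independent snapshot of the data

producer[0] = 255              # mutate the original buffer in place

shares_memory = view[0] == 255   # the view observes the change
copy_is_stale = copied[0] == 0   # the copy still holds the old value
print(shares_memory, copy_is_stale)
```

With array libraries the exchange is between tensor objects rather than byte buffers, but the benefit is the same: no per-batch duplication of data that is already in place.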

The benefits of the Merlin Dataloader include:

- More than 10x speedup over native framework dataloaders
- Handles larger-than-memory datasets
- Per-epoch shuffling
- Distributed training
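
The last three items can be illustrated with a small standard-library sketch. This is not Merlin's implementation: `iter_batches`, its parameters, and the in-memory `chunks` list are hypothetical stand-ins. It shows one common way to combine the ideas: process one chunk at a time (so the full dataset never has to be materialized at once), reshuffle with an epoch-dependent seed, and give each worker a disjoint slice of the chunks.

```python
import random
from typing import Iterator, List, Sequence


def iter_batches(
    chunks: Sequence[Sequence[int]],
    batch_size: int,
    epoch: int,
    seed: int = 0,
    rank: int = 0,
    world_size: int = 1,
) -> Iterator[List[int]]:
    """Yield shuffled batches for one epoch, one chunk at a time."""
    # Every worker seeds identically, so all workers agree on the chunk order.
    rng = random.Random(seed + epoch)
    order = list(range(len(chunks)))
    rng.shuffle(order)

    # Each worker takes every world_size-th chunk: disjoint shards.
    for idx in order[rank::world_size]:
        rows = list(chunks[idx])  # only this chunk is held as a working copy
        rng.shuffle(rows)         # shuffle rows within the chunk
        for start in range(0, len(rows), batch_size):
            yield rows[start:start + batch_size]


# Hypothetical data: 4 chunks of 4 row ids each.
chunks = [list(range(i * 4, i * 4 + 4)) for i in range(4)]
for epoch in range(2):
    print(epoch, list(iter_batches(chunks, batch_size=3, epoch=epoch)))
```

Because batches never span chunks in this sketch, the last batch of each chunk may be smaller than `batch_size`.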

## Installation

Merlin-dataloader requires Python version 3.8+. Additionally, GPU support requires CUDA 11.0+.

To install using Conda:

```
conda install -c nvidia -c rapidsai -c numba -c conda-forge merlin-dataloader python=3.8 cudatoolkit=11.2
```

To install from PyPI:

```
pip install merlin-dataloader
```

There are also [Docker containers on NGC](https://nvidia-merlin.github.io/Merlin/stable/containers.html) that include the merlin-dataloader and its dependencies.

## Basic Usage

```python
import merlin.io
import tensorflow as tf
from merlin.dataloader.tensorflow import Loader

# Build a Merlin dataset from a set of Parquet files.
# PARQUET_FILE_PATHS is a placeholder for your own file paths.
dataset = merlin.io.Dataset(PARQUET_FILE_PATHS, engine="parquet")

# Create a TensorFlow dataloader that yields 65,536 rows per batch
loader = Loader(dataset, batch_size=65536)

# Get a single batch of data. `inputs` is a dictionary mapping column
# names to TensorFlow tensors
inputs, target = next(loader)

# Train a Keras model with the dataloader
model = tf.keras.Model( ... )
model.fit(loader, epochs=5)
```



            

Raw data

            {
    "_id": null,
    "home_page": "https://github.com/NVIDIA-Merlin/dataloader",
    "name": "merlin-dataloader",
    "maintainer": "",
    "docs_url": null,
    "requires_python": ">=3.8",
    "maintainer_email": "",
    "keywords": "",
    "author": "NVIDIA Corporation",
    "author_email": "",
    "download_url": "https://files.pythonhosted.org/packages/b5/89/5a97dceddec86fa1b4510a1e77b41674007d6dfd9dba928bd1f7cc511073/merlin-dataloader-23.8.0.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "Apache 2.0",
    "summary": "Merlin Dataloader",
    "version": "23.8.0",
    "project_urls": {
        "Homepage": "https://github.com/NVIDIA-Merlin/dataloader"
    },
    "split_keywords": [],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "b5895a97dceddec86fa1b4510a1e77b41674007d6dfd9dba928bd1f7cc511073",
                "md5": "4326030cf02146e3a4aec433215c4631",
                "sha256": "5b2199ab82f9aeaf6cbf728cffe03827547c6af6a780e13e42e81a617f73507b"
            },
            "downloads": -1,
            "filename": "merlin-dataloader-23.8.0.tar.gz",
            "has_sig": false,
            "md5_digest": "4326030cf02146e3a4aec433215c4631",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.8",
            "size": 46868,
            "upload_time": "2023-08-29T16:35:49",
            "upload_time_iso_8601": "2023-08-29T16:35:49.350805Z",
            "url": "https://files.pythonhosted.org/packages/b5/89/5a97dceddec86fa1b4510a1e77b41674007d6dfd9dba928bd1f7cc511073/merlin-dataloader-23.8.0.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2023-08-29 16:35:49",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "NVIDIA-Merlin",
    "github_project": "dataloader",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": true,
    "tox": true,
    "lcname": "merlin-dataloader"
}
        