metabatch

Name: metabatch
Version: 0.9.0
Summary: MetaBatch: A micro-framework for efficient batching of tasks in PyTorch.
Author: Théo Morales <moralest@tcd.ie>
Homepage: https://github.com/DubiousCactus/metabatch
Upload time: 2023-07-04 15:36:29
Requires Python: >=3.7
License: MIT
Keywords: MAML, dataset, deep, few-shot, learning, meta, meta-learning, pytorch, task, taskset
# Introduction

MetaBatch is a micro-framework for meta-learning in PyTorch. It provides convenient `TaskSet` and
`TaskLoader` classes for **batch-aware online task creation for meta-learning**.

## Efficient batching

Training meta-learning models efficiently can be a challenge, especially when it comes to creating
random tasks of a consistent shape in one batch. The task creation process can be time-consuming
and typically requires all tasks in the batch to have the same amount of context and target points.
This can be a bottleneck during training:

```python
# Sample code for creating a batch of tasks with the traditional approach
import random

from torch.nn import Module
from torch.utils.data import DataLoader, Dataset

class MyTaskDataset(Dataset):
    ...
    def __getitem__(self, idx):
        task = self.task_data[idx]
        return task

class Model(Module):
    ...
    def forward(self, tasks):
        ctx_batch = tasks['context']
        tgt_batch = tasks['target']
        ...

# create dataset
task_data = [{'images': [...], 'label': 'dog'},
             {'images': [...], 'label': 'cat'}, ...]
dataset = MyTaskDataset(task_data)
dataloader = DataLoader(dataset, batch_size=16, num_workers=8)

for batch in dataloader:
    ...
    # Construct batch of random tasks in the training loop (bottleneck!)
    n_context = random.randint(1, 5)
    n_target = random.randint(1, 10)
    tasks = {'context': [], 'target': []}
    for task in batch:
        context_images = sample_n_images(task['images'], n_context)
        target_images = sample_n_images(task['images'], n_target)
        tasks['context'].append(context_images)
        tasks['target'].append(target_images)
    model(tasks)
    ...
```
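The shape-consistency requirement above is easy to see in isolation: ragged per-task collections cannot be collated into a rectangular batch (this is what `torch.stack` would reject). The following minimal, library-free sketch uses a hypothetical `collate` helper that is not part of MetaBatch:

```python
# Hypothetical helper illustrating the batching constraint: every task in
# a batch must contribute the same number of points, or collation fails.
def collate(batch):
    width = len(batch[0])
    if any(len(row) != width for row in batch):
        raise ValueError("all tasks in a batch must have the same size")
    return [list(row) for row in batch]

collate([[1, 2, 3], [4, 5, 6]])   # consistent shape: fine
try:
    collate([[1, 2, 3], [4, 5]])  # ragged: cannot form a rectangular batch
except ValueError as e:
    print(e)
```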

### Multiprocessing
Wouldn't it be better to offload the task creation to the dataloader, so that it can be done in
parallel on multiple cores?
With **MetaBatch**, we simplify the process by allowing you to do just that.
We provide a `TaskSet` wrapper in which you implement the `__gettask__(self, index, n_context,
n_target)` method instead of PyTorch's `__getitem__(self, index)`. Our `TaskLoader` and
custom sampler take care of synchronizing `n_context` and `n_target` for each batch element
dispatched to all workers. With **MetaBatch**, the training bottleneck can be removed from the
above example:
```python
# Sample code for creating a batch of tasks with MetaBatch
from torch.nn import Module

from metabatch import TaskSet, TaskLoader

class MyTaskSet(TaskSet):
    ...
    def __gettask__(self, idx, n_context, n_target):
        data = self.task_data[idx]
        context_images = sample_n_images(data['images'], n_context)
        target_images = sample_n_images(data['images'], n_target)
        return {
            'context': context_images,
            'target': target_images
        }

class Model(Module):
    ...
    def forward(self, tasks):
        ctx_batch = tasks['context']
        tgt_batch = tasks['target']
        ...

# create dataset
task_data = [{'images': [...], 'label': 'dog'},
             {'images': [...], 'label': 'cat'}, ...]
dataset = MyTaskSet(task_data, min_pts=1, max_ctx_pts=5, max_tgt_pts=10)
dataloader = TaskLoader(dataset, batch_size=16, workers=8)

for batch in dataloader:
    ...
    # Simply access the batch of constructed tasks (no bottleneck!)
    model(batch)
    ...

```
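The synchronization performed by the custom sampler can be sketched in plain Python. This is a hypothetical, simplified version (not MetaBatch's actual implementation): the random sizes are drawn once per batch and attached to every index in that batch, so each worker builds tasks of identical shape.

```python
import random

def task_batch_sampler(n_tasks, batch_size, max_ctx, max_tgt, seed=0):
    """Yield batches of (index, n_context, n_target) tuples where every
    element of a batch shares the same (n_context, n_target)."""
    rng = random.Random(seed)
    indices = list(range(n_tasks))
    rng.shuffle(indices)
    for start in range(0, n_tasks, batch_size):
        # Draw the task sizes once per batch...
        n_context = rng.randint(1, max_ctx)
        n_target = rng.randint(1, max_tgt)
        # ...and attach them to every index dispatched to the workers.
        yield [(idx, n_context, n_target)
               for idx in indices[start:start + batch_size]]

for batch in task_batch_sampler(n_tasks=6, batch_size=3, max_ctx=5, max_tgt=10):
    assert len({(c, t) for _, c, t in batch}) == 1  # sizes are synchronized
```

Because the sizes travel with the indices, each worker can build its task independently while the resulting batch still stacks into tensors of one consistent shape.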

## Installation & usage

Install it: `pip install metabatch`

Requirements:
- `pytorch`

Look at the example above for an idea of how to use `TaskLoader` with `TaskSet`, or go through the
examples in `examples/` (**TODO**).



## Advantages

- MetaBatch allows for efficient task creation and batching during training, resulting in more task
    variations since you are no longer limited to precomputed tasks.
- Reduces boilerplate needed to precompute and load tasks.

By simplifying task creation and enabling efficient batching, MetaBatch can speed up meta-training,
making it a useful tool for researchers and engineers working on meta-learning projects.

## How much faster?

**TODO**: benchmark MAML and CNP examples with typical implementation and other repos.


## License

MetaBatch is released under the MIT License. See the LICENSE file for more information.

            
