# RotoGrad
[![Documentation](https://img.shields.io/badge/docs-stable-informational.svg)](https://rotograd.readthedocs.io/en/stable/index.html)
[![Package](https://img.shields.io/badge/pypi-rotograd-informational.svg)](https://pypi.org/project/rotograd/)
[![Paper](http://img.shields.io/badge/paper-arxiv.2103.02631-9cf.svg)](https://arxiv.org/abs/2103.02631)
[![License](https://img.shields.io/badge/license-MIT-yellow.svg)](https://github.com/adrianjav/rotograd/blob/main/LICENSE)
> A PyTorch library for dynamic gradient homogenization in multitask learning
## Installation
Installing this library is as simple as running the following in your terminal:
```bash
pip install rotograd
```
The code has been tested with PyTorch 1.7.0, but it should work with most versions. Feel free to open an issue
if that is not the case.
## Overview
This is the official PyTorch implementation of RotoGrad, an algorithm that reduces negative transfer caused
by gradient conflict with respect to the shared parameters when the different tasks of a multitask learning
system compete for the shared resources.
Let's say you have a hard-parameter-sharing architecture with a `backbone` model shared across tasks, and
two different tasks you want to solve. These tasks take the output of the backbone, `z = backbone(x)`, and feed
it to a task-specific model (`head1` and `head2`) to obtain the predictions for their tasks, that is,
`y1 = head1(z)` and `y2 = head2(z)`.
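For concreteness, here is a minimal, purely illustrative setup; the input dimension, hidden sizes, and `size_z` below are arbitrary choices, and any `nn.Module`s with compatible shapes would do:

```python
import torch.nn as nn

size_z = 64  # dimensionality of the shared representation z (illustrative)

# Shared feature extractor (backbone) and two task-specific heads (toy examples).
backbone = nn.Sequential(nn.Linear(10, 128), nn.ReLU(), nn.Linear(128, size_z))
head1 = nn.Linear(size_z, 1)  # e.g., a regression head for task 1
head2 = nn.Linear(size_z, 1)  # e.g., a regression head for task 2
```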
Then you can simply use RotateOnly, RotoGrad, or RotoGradNorm (RotateOnly + GradNorm) by putting all the parts together in a single model.
```python
from rotograd import RotoGrad
model = RotoGrad(backbone, [head1, head2], size_z, normalize_losses=True)
```
where you can recover the backbone and the i-th head by simply calling `model.backbone` and `model.heads[i]`. What's
more, you can obtain the end-to-end model for a single task (that is, backbone + head) by indexing `model[i]`.
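For instance, continuing with the toy setup above (variable names are hypothetical):

```python
shared = model.backbone       # the shared backbone module
first_head = model.heads[0]   # task-specific head of the first task
task0_model = model[0]        # end-to-end model (backbone + head) for the first task
```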
As discussed in the paper, it is advisable to use a smaller learning rate for the parameters of RotoGrad
and GradNorm. This is as simple as:
```python
from torch import optim

optimizer = optim.Adam(
    [{'params': m.parameters()} for m in [backbone, head1, head2]] +
    [{'params': model.parameters(), 'lr': learning_rate_rotograd}],
    lr=learning_rate_model)
```
Finally, we can train the model on all tasks using a simple step function:
```python
import rotograd
def step(x, y1, y2):
    model.train()
    optimizer.zero_grad()

    with rotograd.cached():  # Speeds up computations by caching RotoGrad's parameters
        pred1, pred2 = model(x)
        loss1, loss2 = loss_task1(pred1, y1), loss_task2(pred2, y2)
        model.backward([loss1, loss2])

    optimizer.step()
    return loss1, loss2
```
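Putting the pieces together, a minimal, purely illustrative training loop might look like the sketch below; the losses and random tensors are stand-ins for your actual task losses and data, and the shapes match the toy backbone and heads defined earlier:

```python
import torch
import torch.nn.functional as F

# Placeholder losses; substitute whatever your tasks actually use.
loss_task1 = F.mse_loss
loss_task2 = F.mse_loss

# Synthetic data just to illustrate the call pattern.
x = torch.randn(32, 10)   # batch of inputs (matches the toy backbone above)
y1 = torch.randn(32, 1)   # targets for task 1
y2 = torch.randn(32, 1)   # targets for task 2

for it in range(100):
    loss1, loss2 = step(x, y1, y2)
    if it % 10 == 0:
        print(f'iter {it}: loss1={loss1.item():.3f}, loss2={loss2.item():.3f}')
```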
## Example
You can find a working example in the folder `example`. However, it requires some other dependencies to run (e.g.,
ignite and seaborn). The example shows how to use RotoGrad on one of the regression problems from the manuscript.
![image](_assets/toy.gif)
## Citing
Consider citing the following paper if you use RotoGrad:
```bibtex
@inproceedings{javaloy2022rotograd,
    title={RotoGrad: Gradient Homogenization in Multitask Learning},
    author={Adri{\'a}n Javaloy and Isabel Valera},
    booktitle={International Conference on Learning Representations},
    year={2022},
    url={https://openreview.net/forum?id=T8wHz4rnuGL}
}
```