| Name | momo-opt |
| Version | 0.1.0 |
| home_page | https://github.com/fabian-sp/MoMo |
| Summary | MoMo: Momentum Models for Adaptive Learning Rates |
| upload_time | 2023-05-13 08:51:19 |
| maintainer | |
| docs_url | None |
| author | Fabian Schaipp |
| requires_python | >=3.8.0 |
| license | |
| keywords | |
| VCS | |
| bugtrack_url | |
| requirements | No requirements were recorded. |
| Travis-CI | No Travis. |
| coveralls test coverage | No coveralls. |
# MoMo
PyTorch implementation of the MoMo methods: adaptive learning rates for SGD with momentum (SGD-M) and Adam.
## Installation
You can install the package with
```
pip install momo-opt
```
## Usage
Import the optimizers in Python with
``` python
from momo import Momo
opt = Momo(model.parameters(), lr=1)
```
or
``` python
from momo import MomoAdam
opt = MomoAdam(model.parameters(), lr=1e-2)
```
**Note that Momo needs access to the value of the batch loss.**
In the ``.step()`` method, you need to pass either
* the loss tensor to the argument `loss` (if `backward()` has already been called), or
* a callable ``closure`` to the argument `closure` that computes the gradients and returns the loss.
For example:
``` python
def compute_loss(output, labels):
    # compute the batch loss, backpropagate, and return the loss value
    loss = criterion(output, labels)
    loss.backward()
    return loss

# in each training step, use:
closure = lambda: compute_loss(output, labels)
opt.step(closure=closure)
```
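Alternatively, if you have already called `backward()` yourself, you can pass the loss tensor directly to `.step()`. A minimal sketch, assuming the same `criterion`, `output`, and `labels` as in the snippet above:

``` python
# compute the loss and gradients manually, then hand the loss value to the optimizer
loss = criterion(output, labels)
loss.backward()
opt.step(loss=loss)
```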
**For more details, see [a full example script](example.py).**
## Examples
### ResNet110 for CIFAR100
<p float="left">
<img src="png/cifar100_resnet110.png" width="320" />
<img src="png/cifar100_resnet110_training.png" width="305" />
</p>
### ResNet20 for CIFAR10
<p float="left">
<img src="png/cifar10_resnet20.png" width="320" />
<img src="png/cifar10_resnet20_training.png" width="305" />
</p>
## Recommendations
In general, if you expect SGD-M to work well on your task, use Momo; if you expect Adam to work well, use MomoAdam.
* The options `lr` and `weight_decay` have the same meaning as in standard optimizers. Because Momo and MomoAdam adapt the learning rate automatically, you should get good performance without heavy tuning of `lr` or setting a schedule; a constant `lr` should work fine. In our experiments, `lr=1` works well for Momo, and `lr=1e-2` (or slightly smaller) works well for MomoAdam.
**One of the main goals of Momo optimizers is to reduce the tuning effort for the learning-rate schedule and get good performance for a wide range of learning rates.**
* For Momo, the argument `beta` is the momentum parameter; the default is `beta=0.9`. For MomoAdam, `(beta1, beta2)` play the same role as in Adam.
* The option `lb` is a lower bound of your loss function. In many cases, `lb=0` is a good enough estimate. If your loss converges to a large positive number (and you roughly know that value), set `lb` to this value (or slightly smaller).
* If you cannot estimate a lower bound before training, use the option `use_fstar=True`. This activates an online estimation of the lower bound. A sketch combining these options is shown after this list.
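A minimal sketch of how these options might be combined, assuming `lr`, `weight_decay`, `beta`, `lb`, and `use_fstar` are all passed as constructor keyword arguments (the exact signatures may differ; `model` is your `torch.nn.Module`):

``` python
from momo import Momo, MomoAdam

# Momo with an explicit lower bound: constant lr=1 and beta=0.9 (the default)
# follow the recommendations above; lb=0 is a reasonable estimate for many losses.
opt = Momo(model.parameters(), lr=1, weight_decay=0.0, beta=0.9, lb=0.0)

# If no lower bound is known beforehand, let Momo estimate it online.
opt = Momo(model.parameters(), lr=1, use_fstar=True)

# MomoAdam with the recommended constant learning rate.
opt = MomoAdam(model.parameters(), lr=1e-2, weight_decay=0.0)
```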
Raw data
{
"_id": null,
"home_page": "https://github.com/fabian-sp/MoMo",
"name": "momo-opt",
"maintainer": "",
"docs_url": null,
"requires_python": ">=3.8.0",
"maintainer_email": "",
"keywords": "",
"author": "Fabian Schaipp",
"author_email": "fabian.schaipp@gmail.com",
"download_url": "https://files.pythonhosted.org/packages/ee/09/23651f542e8ac27e2ae63aa7b38365ff1d6b289b39254ada4c58f39e0e57/momo-opt-0.1.0.tar.gz",
"platform": null,
"description": "# MoMo\nPytorch implementation of MoMo methods. Adaptive learning rates for SGD with momentum (SGD-M) and Adam. \n\n## Installation\n\nYou can install the package with\n\n```\npip install momo-opt\n```\n\n## Usage\n\nImport the optimizers in Python with\n\n``` python\nfrom momo import Momo\nopt = Momo(model.parameters(), lr=1)\n```\nor\n\n``` python\nfrom momo import MomoAdam\nopt = MomoAdam(model.parameters(), lr=1e-2)\n```\n\n**Note that Momo needs access to the value of the batch loss.** \nIn the ``.step()`` method, you need to pass either \n* the loss tensor (when backward has already been done) to the argument `loss`\n* or a callable ``closure`` to the argument `closure` that computes gradients and returns the loss. \n\nFor example:\n\n``` python\ndef compute_loss(output, labels):\n loss = criterion(output, labels)\n loss.backward()\n return loss\n\n# in each training step, use:\nclosure = lambda: compute_loss(output,labels)\nopt.step(closure=closure)\n```\n**For more details, see [a full example script](example.py).**\n\n\n\n\n## Examples\n\n### ResNet110 for CIFAR100\n\n<p float=\"left\">\n <img src=\"png/cifar100_resnet110.png\" width=\"320\" />\n <img src=\"png/cifar100_resnet110_training.png\" width=\"305\" />\n</p>\n\n### ResNet20 for CIFAR10\n\n\n<p float=\"left\">\n <img src=\"png/cifar10_resnet20.png\" width=\"320\" />\n <img src=\"png/cifar10_resnet20_training.png\" width=\"305\" />\n</p>\n\n\n## Recommendations\n\nIn general, if you expect SGD-M to work well on your task, then use Momo. If you expect Adam to work well on your problem, then use MomoAdam.\n\n* The option `lr` and `weight_decay` are the same as in standard optimizers. As Momo and MomoAdam automatically adapt the learning rate, you should get good preformance without heavy tuning of `lr` and setting a schedule. Setting `lr` constant should work fine. For Momo, our experiments work well with `lr=1`, for MomoAdam `lr=1e-2` (or slightly smaller) should work well.\n\n**One of the main goals of Momo optimizers is to reduce the tuning effort for the learning-rate schedule and get good performance for a wide range of learning rates.**\n\n* For Momo, the argument `beta` refers to the momentum parameter. The default is `beta=0.9`. For MomoAdam, `(beta1,beta2)` have the same role as in Adam.\n\n* The option `lb` refers to a lower bound of your loss function. In many cases, `lb=0` will be a good enough estimate. If your loss converges to a large positive number (and you roughly know the value), then set `lb` to this value (or slightly smaller). \n\n* If you can not estimate a lower bound before training, use the option `use_fstar=True`. This will activate an online estimation of the lower bound.\n\n\n",
"bugtrack_url": null,
"license": "",
"summary": "MoMo: Momentum Models for Adaptive Learning Rates",
"version": "0.1.0",
"project_urls": {
"Homepage": "https://github.com/fabian-sp/MoMo"
},
"split_keywords": [],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "f3f604626a49f15cb3608f02ab84bcebdd7ca647f0b92fef7c2c5fe6c57eb4cd",
"md5": "c451de4700cc5a3029d310d80c595d79",
"sha256": "90648b8189bfc34cf183d8f2f286baa78c2ca1f0541ef332d3ff13cde77728c1"
},
"downloads": -1,
"filename": "momo_opt-0.1.0-py3-none-any.whl",
"has_sig": false,
"md5_digest": "c451de4700cc5a3029d310d80c595d79",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.8.0",
"size": 8632,
"upload_time": "2023-05-13T08:51:17",
"upload_time_iso_8601": "2023-05-13T08:51:17.230856Z",
"url": "https://files.pythonhosted.org/packages/f3/f6/04626a49f15cb3608f02ab84bcebdd7ca647f0b92fef7c2c5fe6c57eb4cd/momo_opt-0.1.0-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "ee0923651f542e8ac27e2ae63aa7b38365ff1d6b289b39254ada4c58f39e0e57",
"md5": "815578882ce61c45a029b03a5dcdecee",
"sha256": "4c4e9336652d68d0cad4dfddbc8f7a38acaa9e4e6fd8e83262294d547f737352"
},
"downloads": -1,
"filename": "momo-opt-0.1.0.tar.gz",
"has_sig": false,
"md5_digest": "815578882ce61c45a029b03a5dcdecee",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.8.0",
"size": 7004,
"upload_time": "2023-05-13T08:51:19",
"upload_time_iso_8601": "2023-05-13T08:51:19.585171Z",
"url": "https://files.pythonhosted.org/packages/ee/09/23651f542e8ac27e2ae63aa7b38365ff1d6b289b39254ada4c58f39e0e57/momo-opt-0.1.0.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2023-05-13 08:51:19",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "fabian-sp",
"github_project": "MoMo",
"github_not_found": true,
"lcname": "momo-opt"
}