adam-lr-decay


Name: adam-lr-decay
Version: 0.0.8
Home page: https://github.com/OrigamiDream/adam-lr-decay
Summary: Adam Layer-wise LR Decay
Upload time: 2023-10-18 16:56:44
Author: OrigamiDream
Requires Python: >=3.8,<3.11
License: MIT
Keywords: machine-learning, deep-learning, tensorflow, optimizers
# Adam Layer-wise LR Decay

In [ELECTRA](https://arxiv.org/abs/2003.10555),
published by Stanford University and Google Brain,
the authors used a layer-wise LR decay technique with the Adam optimizer to prevent catastrophic forgetting of the pre-trained model.

This repo contains an implementation of layer-wise LR decay for Adam, built on the new Optimizer API introduced in TensorFlow 2.11.

## Usage

Installation:
```bash
$ pip install adam-lr-decay  # this method does not install tensorflow
```
For CPU:
```bash
$ pip install adam-lr-decay[cpu]  # this method installs tensorflow-cpu>=2.11
```
For GPU:
```bash
$ pip install adam-lr-decay[gpu]  # this method installs tensorflow>=2.11
```

```python
from tensorflow.keras import layers, models
from adam_lr_decay import AdamLRDecay

# ... prepare training data

# model definition
model = models.Sequential([
    layers.Dense(3, input_shape=(2,), name='hidden_dense'),
    layers.Dense(1, name='output')
])

# optimizer definition with layerwise lr decay
adam = AdamLRDecay(learning_rate=1e-3)
adam.apply_layerwise_lr_decay(var_name_dicts={
    'hidden_dense': 0.1,
    'output': 0.
})
# this config decays the layers matched by each key name by the given value,
# i.e. the effective learning rate becomes (lr * (1. - decay_rate))

# compile the model
model.compile(optimizer=adam)

# ... training loop
```
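For reference, the effective per-layer learning rate implied by the config above can be reproduced by hand. The short sketch below uses a hypothetical `effective_lr` helper (not part of the package) just to spell out the `lr * (1. - decay_rate)` arithmetic:
```python
# Hypothetical helper, not part of adam-lr-decay: reproduces the
# per-layer learning rates implied by the config above.
def effective_lr(base_lr, decay_rate):
    return base_lr * (1. - decay_rate)

base_lr = 1e-3
print(effective_lr(base_lr, 0.1))  # hidden_dense -> 0.0009
print(effective_lr(base_lr, 0.0))  # output       -> 0.001 (no decay)
```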

In the official [ELECTRA repo](https://github.com/google-research/electra/blob/8a46635f32083ada044d7e9ad09604742600ee7b/model/optimization.py#L181),
the decay rates are defined in code. An adapted version is as follows:
```python
import collections
from adam_lr_decay import AdamLRDecay

def _get_layer_lrs(layer_decay, n_layers):
    # Map variable-name prefixes to depths, then turn each depth into a
    # decay rate: layers closer to the output get smaller decay rates.
    key_to_depths = collections.OrderedDict({
        '/embeddings/': 0,
        '/embeddings_project/': 0,
        'task_specific/': n_layers + 2,
    })
    for layer in range(n_layers):
        key_to_depths['encoder/layer_' + str(layer) + '/'] = layer + 1
    return {
        key: 1. - (layer_decay ** (n_layers + 2 - depth))
        for key, depth in key_to_depths.items()
    }

# ... ELECTRA model definition

adam = AdamLRDecay(learning_rate=1e-3)
adam.apply_layerwise_lr_decay(var_name_dicts=_get_layer_lrs(0.9, 8))

# ... custom training loop
```

The generated decay rates should look like the following: `0.0` means no decay, and `1.0` means a zero learning rate (non-trainable).
```json
{
  "/embeddings/": 0.6513215599,
  "/embeddings_project/": 0.6513215599, 
  "task_specific/": 0.0, 
  "encoder/layer_0/": 0.6125795109999999, 
  "encoder/layer_1/": 0.5695327899999999, 
  "encoder/layer_2/": 0.5217030999999999, 
  "encoder/layer_3/": 0.46855899999999995, 
  "encoder/layer_4/": 0.40950999999999993, 
  "encoder/layer_5/": 0.3439, 
  "encoder/layer_6/": 0.2709999999999999, 
  "encoder/layer_7/": 0.18999999999999995
}
```
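
As a quick sanity check, the values above follow directly from the `1. - layer_decay ** (n_layers + 2 - depth)` formula with `layer_decay=0.9` and `n_layers=8`. A minimal sketch that recomputes a few of them:
```python
layer_decay, n_layers = 0.9, 8

def rate(depth):
    # Same formula as in _get_layer_lrs above.
    return 1. - layer_decay ** (n_layers + 2 - depth)

print(rate(0))             # /embeddings/     -> 0.6513215599...
print(rate(1))             # encoder/layer_0/ -> 0.6125795109...
print(rate(n_layers + 2))  # task_specific/   -> 0.0
```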

## Citation
```bibtex
@article{clark2020electra,
  title={Electra: Pre-training text encoders as discriminators rather than generators},
  author={Clark, Kevin and Luong, Minh-Thang and Le, Quoc V and Manning, Christopher D},
  journal={arXiv preprint arXiv:2003.10555},
  year={2020}
}
```

            
