# Adam Layer-wise LR Decay
In [ELECTRA](https://arxiv.org/abs/2003.10555),
published by Stanford University and Google Brain,
the authors used a layer-wise LR decay technique with the Adam optimizer to prevent catastrophic forgetting of the pre-trained model.
This repo contains an implementation of layer-wise LR decay for Adam, built on the new Optimizer API introduced in TensorFlow 2.11.
## Usage
Installation:
```bash
$ pip install adam-lr-decay # this method does not install tensorflow
```
For CPU:
```bash
$ pip install adam-lr-decay[cpu] # this method installs tensorflow-cpu>=2.11
```
For GPU:
```bash
$ pip install adam-lr-decay[gpu] # this method installs tensorflow>=2.11
```
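To verify the installation, a quick sanity check (a minimal sketch; it only assumes TensorFlow and the package import cleanly) is:
```python
# both imports should succeed on TensorFlow >= 2.11
import tensorflow as tf
from adam_lr_decay import AdamLRDecay

print(tf.__version__)                    # expected: 2.11 or newer
print(AdamLRDecay(learning_rate=1e-3))   # constructs the optimizer without error
```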
```python
from tensorflow.keras import layers, models
from adam_lr_decay import AdamLRDecay

# ... prepare training data

# model definition
model = models.Sequential([
    layers.Dense(3, input_shape=(2,), name='hidden_dense'),
    layers.Dense(1, name='output')
])

# optimizer definition with layer-wise lr decay
adam = AdamLRDecay(learning_rate=1e-3)
adam.apply_layerwise_lr_decay(var_name_dicts={
    'hidden_dense': 0.1,
    'output': 0.
})
# this config scales the learning rate of each named layer by its decay rate:
# the effective learning rate becomes lr * (1. - decay_rate)

# compile the model
model.compile(optimizer=adam)

# ... training loop
```
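As a minimal sketch of the elided training loop (assuming synthetic NumPy data and an `mse` loss, neither of which the snippet above specifies), training could look like:
```python
import numpy as np

# synthetic regression data matching the (2,) input shape above
x_train = np.random.rand(256, 2).astype('float32')
y_train = np.random.rand(256, 1).astype('float32')

# recompile with an explicit loss so fit() can run; 'mse' is an arbitrary choice here
model.compile(optimizer=adam, loss='mse')
model.fit(x_train, y_train, batch_size=32, epochs=3)
```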
In the official [ELECTRA repo](https://github.com/google-research/electra/blob/8a46635f32083ada044d7e9ad09604742600ee7b/model/optimization.py#L181),
the decay rates are defined directly in the code. An adapted version is as follows:
```python
import collections
from adam_lr_decay import AdamLRDecay

def _get_layer_lrs(layer_decay, n_layers):
    key_to_depths = collections.OrderedDict({
        '/embeddings/': 0,
        '/embeddings_project/': 0,
        'task_specific/': n_layers + 2,
    })
    for layer in range(n_layers):
        key_to_depths['encoder/layer_' + str(layer) + '/'] = layer + 1
    return {
        key: 1. - (layer_decay ** (n_layers + 2 - depth))
        for key, depth in key_to_depths.items()
    }

# ... ELECTRA model definition

adam = AdamLRDecay(learning_rate=1e-3)
adam.apply_layerwise_lr_decay(var_name_dicts=_get_layer_lrs(0.9, 8))

# ... custom training loop
```
The generated decay rates should look like this. `0.0` means no decay, and `1.0` means a zero learning rate (i.e. the layer is effectively non-trainable).
```json
{
"/embeddings/": 0.6513215599,
"/embeddings_project/": 0.6513215599,
"task_specific/": 0.0,
"encoder/layer_0/": 0.6125795109999999,
"encoder/layer_1/": 0.5695327899999999,
"encoder/layer_2/": 0.5217030999999999,
"encoder/layer_3/": 0.46855899999999995,
"encoder/layer_4/": 0.40950999999999993,
"encoder/layer_5/": 0.3439,
"encoder/layer_6/": 0.2709999999999999,
"encoder/layer_7/": 0.18999999999999995
}
```
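Because the effective learning rate for each matched prefix is `lr * (1. - decay_rate)`, these values translate directly into per-layer learning rates. A minimal sketch (reusing `_get_layer_lrs` from above and assuming the base `learning_rate=1e-3`) that prints them:
```python
# effective per-layer learning rates implied by the decay rates above,
# following the lr * (1. - decay_rate) rule
base_lr = 1e-3
for prefix, decay in _get_layer_lrs(0.9, 8).items():
    print(f'{prefix:<22} {base_lr * (1. - decay):.6f}')
```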
## Citation
```bibtex
@article{clark2020electra,
  title={{ELECTRA}: Pre-training text encoders as discriminators rather than generators},
  author={Clark, Kevin and Luong, Minh-Thang and Le, Quoc V and Manning, Christopher D},
  journal={arXiv preprint arXiv:2003.10555},
  year={2020}
}
```