# Adapter-LoRa for Quantization
<div align="center">
<img src="assets/LoRa.png" alt="LoRa-Logo" width="200">
[![Made With Love](https://img.shields.io/badge/Made%20With-Love-orange.svg)](https://github.com/youness-elbrag/AdapterLoRa/)
[![GitHub issues](https://img.shields.io/github/issues/youness-elbrag/AdapterLoRa)](https://github.com/youness-elbrag/AdapterLoRa/issues)
[![GitHub forks](https://img.shields.io/github/forks/youness-elbrag/AdapterLoRa)](https://github.com/youness-elbrag/AdapterLoRa/network)
[![GitHub stars](https://img.shields.io/github/stars/youness-elbrag/AdapterLoRa)](https://github.com/youness-elbrag/AdapterLoRa/stargazers) [![GitHub license](https://img.shields.io/github/license/youness-elbrag/AdapterLoRa)](https://github.com/youness-elbrag/AdapterLoRa/blob/master/LICENSE)
</div>
## Comparative Features of "loralib" and "loratorch" Implementations
**Distinguishing the "loralib" and "loratorch" Approaches**
The ``loralib`` and ``loratorch`` implementations follow distinct approaches, which is easiest to see with the example of `nn.Linear`. The underlying mathematical formulations are:
1. For ``loralib``,

   $h = x W_0^\top + \frac{\alpha}{r} x(BA)^\top,$

   where $x\in\mathbb{R}^{k\times n}$ is the input matrix, $W_0\in\mathbb{R}^{m\times n}$ is the pre-trained weight matrix, $r$ is the predefined LoRA rank, $B\in\mathbb{R}^{m\times r}$ and $A\in\mathbb{R}^{r\times n}$ are the LoRA matrices, and $\alpha$ is a hyper-parameter.

2. For ``loratorch``,

   $h = x (W_0 + \frac{\alpha}{r} BA)^\top.$
``loralib`` computes $xW_0^\top$ and $x(BA)^\top$ separately and then sums the results, while ``loratorch`` merges the pre-trained weight $W_0$ with its LoRA weight $BA$ and then computes the result with a single call to ``nn.Linear.forward()``. For linear layers the two approaches are equivalent. However, for some non-linear or complex layers it is not guaranteed that $L(x, W_0)+L(x, BA) = L(x, W_0+BA)$, so extending LoRA to such layers with ``loralib`` is difficult. In contrast, the merge-weights-first idea of ``loratorch`` is more general and extensible: you simply call ``merge_lora_param()`` in ``loratorch`` to merge the weights and then call ``forward()`` of the original layer to compute the result. With the help of ``loratorch``, you can apply LoRA to any type of layer in ``torch.nn``. The sketch below illustrates the linear-layer equivalence and a simple non-linear counterexample.
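The following is a minimal sketch in plain PyTorch (no LoRA library required; all shapes and hyper-parameters are arbitrary examples) that checks the two formulations coincide for a linear layer but not for a layer with an internal non-linearity.

```python
import torch

# Dimensions follow the notation above: x is (k, n), W0 is (m, n), B is (m, r), A is (r, n).
k, n, m, r, alpha = 8, 16, 32, 4, 8.0
x  = torch.randn(k, n)
W0 = torch.randn(m, n)
B  = torch.randn(m, r)
A  = torch.randn(r, n)
scale = alpha / r

# Linear layer: compute separately (loralib-style) vs. merge weights first (loratorch-style).
h_separate = x @ W0.T + scale * x @ (B @ A).T
h_merged   = x @ (W0 + scale * B @ A).T
print(torch.allclose(h_separate, h_merged, atol=1e-4))  # True (up to floating-point error)

# A layer with an internal non-linearity, e.g. f(x, W) = relu(x W^T), breaks the identity.
f = lambda x, W: torch.relu(x @ W.T)
g_separate = f(x, W0) + f(x, B @ A)
g_merged   = f(x, W0 + B @ A)
print(torch.allclose(g_separate, g_merged))  # Generally False: L(x, W0) + L(x, BA) != L(x, W0 + BA)
```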
## Supported Layers
| Layer                     | ``loralib``    | ``loratorch``  | Example                                            |
| ------------------------- |:--------------:|:--------------:| -------------------------------------------------- |
| ``nn.Linear`` | ✓ | ✓ | [linear.ipynb](https://github.com/Baijiong-Lin/LoRA-Torch/blob/main/examples/linear.ipynb) |
| ``nn.Embedding`` | ✓ | ✓ | [embedding.ipynb](https://github.com/Baijiong-Lin/LoRA-Torch/blob/main/examples/embedding.ipynb) |
| ``nn.Conv1d`` | ✓ | ✓ | |
| ``nn.Conv2d`` | ✓ | ✓ | |
| ``nn.Conv3d`` | ✓ | ✓ | |
| ``nn.MultiheadAttention`` | ✘ | ✓ | |
| ``MergedLinear`` | ✓ (Error) | ✓ | [mergedlinear.ipynb](https://github.com/Baijiong-Lin/LoRA-Torch/blob/main/examples/mergedlinear.ipynb) |
| $\cdots$ | hard to extend | easy to extend | |
*We compare the results of ``loralib`` and ``loratorch`` in [examples](./examples) to demonstrate the correctness of the implementation in ``loratorch``.*
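As a quick illustration of the drop-in layer pattern used by both libraries, here is a minimal sketch using ``loralib``'s documented API (``lora.Linear`` and ``lora.mark_only_lora_as_trainable``); the layer sizes and rank are arbitrary examples.

```python
import torch
import loralib as lora

# lora.Linear is a drop-in replacement for nn.Linear with low-rank adapters of rank r.
layer = lora.Linear(in_features=128, out_features=64, r=4)

# Freeze everything except the LoRA matrices (parameters named lora_A / lora_B).
lora.mark_only_lora_as_trainable(layer)

x = torch.randn(2, 128)
print(layer(x).shape)  # torch.Size([2, 64])
```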
## Quick Start
**The usage of ``AdapterLoRa``**
1. Install the ``LoRA-Torch`` dependency.

   ```bash
   pip install git+https://github.com/Baijiong-Lin/LoRA-Torch
   ```

2. Install ``AdapterLoRa``.

   ```bash
   pip install AdapterLoRa
   ```
### Using the AdapterLoRa Tool
```python
import torch
import torch.nn as nn

from core.Quantized import AdapterLoRa

model = nn.TransformerEncoderLayer(d_model=512, nhead=8)

adapter_model = AdapterLoRa(model, method="LoRa", Rank=4)

# Register the layers where AdapterLoRa should be applied by using the add_layer
# function (here: the self-attention and the two feed-forward linear layers).
adapter_model.add_layer("self_attn")
adapter_model.add_layer("linear1")
adapter_model.add_layer("linear2")

# Reconstruct the model with the registered layers replaced
adapter_model.reconstruct_model()

# Implement the LoRA method
model = adapter_model.implement_lora(verbose=True)
# Total trainable parameters before LoRA: 3176960
# Total trainable parameters after LoRA: 24576
# This sets requires_grad to False for all parameters without the string "lora_" in their names

# Training loop (dataloader is a placeholder for your own DataLoader)
model.train()
for batch in dataloader:
    ...  # forward pass, loss, backward pass, optimizer step
```
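To double-check what ``implement_lora`` left trainable, a quick inspection with plain PyTorch (not part of the AdapterLoRa API) can list the remaining trainable parameters; per the comment above, these should be exactly the parameters whose names contain ``lora_``.

```python
# Plain PyTorch inspection: list trainable parameters and count frozen ones.
trainable = [(name, p.numel()) for name, p in model.named_parameters() if p.requires_grad]
frozen = sum(p.numel() for p in model.parameters() if not p.requires_grad)

for name, numel in trainable:
    print(f"trainable: {name} ({numel} parameters)")
print(f"frozen parameters: {frozen}")
```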
### Saving Model Weights
* Save the LoRA model (only the LoRA matrices will be saved).
```python
import torch
import loralib as lora

# ===== Before =====
# torch.save(model.state_dict(), checkpoint_path)
# ===== After =====
torch.save(lora.lora_state_dict(model), checkpoint_path)
```
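As an optional sanity check (plain PyTorch, assuming ``lora_state_dict``'s default ``bias='none'`` behaviour), the LoRA-only state dict should contain just the parameters whose names include ``lora_`` and be far smaller than the full model's state dict.

```python
lora_sd = lora.lora_state_dict(model)
print(len(lora_sd), "LoRA tensors, e.g.:", list(lora_sd.keys())[:4])
assert all("lora_" in key for key in lora_sd)
```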
### Loading the Pre-Trained Model
* Load the LoRA model (the pre-trained checkpoint must be loaded first).
```python
import torch
import loralib as lora

# Load the pre-trained checkpoint first
model.load_state_dict(torch.load('ckpt_pretrained.pt'), strict=False)
# Then load the LoRA checkpoint
model.load_state_dict(torch.load('ckpt_lora.pt'), strict=False)
```
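`strict=False` tells `load_state_dict` not to raise an error for missing or unexpected keys, which is needed here because the pre-trained checkpoint lacks the LoRA matrices and the LoRA checkpoint contains nothing else.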
- <img src="assets/rocket.gif" width="32" height="32"/> Quantized Model <img src="assets/rocket.gif" width="32" height="32"/>
- <img src="assets/time.gif" width="32" height="32"/> Time to Train <img src="assets/time.gif" width="32" height="32"/>
- <img src="assets/money.gif" width="32" height="32"/> Cost to Train <img src="assets/money.gif" width="32" height="32"/>
## What's in it for you?
For each of the pillars above, we share our codebase and insights to:
- Help you leverage Transformer-based models for your machine learning needs and challenges
- Boost reproducibility efforts, which are becoming increasingly difficult with Transformers
This repository provides ready-to-use tools for quantizing and adapting models:
- Fine-tuning Transformer-based models on your proprietary dataset via PEFT methodologies such as LoRA and QLoRA
- Performing hyperparameter optimization to get the maximum performance out of these models
## What's the best way to use this repository?
Go to the Transformer-specific directory that you are interested in and open its `README.md`. We have included details about the models, followed by performance results on open-source datasets!
## Roadmap
Our plan is to perform these experiments on all the Transformer-based models below. To that end, this is a tentative roadmap of the models that we aim to cover:
- [x] TransformerEncoder
- [x] TransformerDecoder
- [x] Vision Transformer
- [x] minGPT
- [x] OpenAI GPT-2
- [ ] Inflection Pi **In Progress**
## Contributor
``AdapterLoRa`` is developed and maintained by
**Youness El Brag** ([Email](mailto:younsselbrag@gmail.com) | [LinkedIn](https://www.linkedin.com/in/youness-el-brag-b13628203/))