# MLorc - Momentum Low-Rank Compression for Memory-Efficient LLM Fine-tuning
Unofficial implementation of "MLorc: Momentum Low-rank Compression for Large Language Model Adaptation"
This repository introduces **MLorc (Momentum Low-rank Compression)**, a memory-efficient paradigm that substantially reduces the memory footprint of full-parameter fine-tuning for large language models. Based on the paper "[MLorc: Momentum Low-rank Compression for Large Language Model Adaptation](https://arxiv.org/abs/2506.01897)", this method offers a compelling alternative to existing memory-efficient techniques.
<img width="1385" height="469" alt="image" src="https://github.com/user-attachments/assets/7bcab5ec-beaf-4d1a-b115-81ab1a7d4b18" />
---
### How MLorc Works
MLorc's core innovation lies in its approach to **momentum compression and reconstruction**:
* **Direct Momentum Compression:** At each optimization step, MLorc compresses the first- and second-order momentum into low-rank factors using **Randomized SVD (RSVD)** and reconstructs them when the update is applied.
* **Adaptive Second-Order Momentum Handling:** To keep the reconstructed second-order momentum non-negative and stable, MLorc applies ReLU during reconstruction and adaptively adds a small constant to the zero entries that the ReLU introduces (see the sketch below).
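
A minimal sketch of this compress/reconstruct cycle, using `torch.svd_lowrank` as the RSVD routine (the function names and the `eps` handling here are illustrative, not the exact implementation in this repository):

```python
import torch

def compress(moment: torch.Tensor, rank: int = 4):
    """Compress a 2-D momentum matrix into low-rank factors via randomized SVD."""
    U, S, V = torch.svd_lowrank(moment, q=rank, niter=2)
    # Store two thin factors, (m x r) and (n x r), instead of the full (m x n) matrix.
    return U * S, V

def reconstruct(US: torch.Tensor, V: torch.Tensor, nonneg: bool = False, eps: float = 1e-8):
    """Rebuild the momentum matrix; keep the second moment non-negative."""
    m = US @ V.T
    if nonneg:
        # ReLU clamps negative reconstruction error; a small constant is then
        # added to the zeros the clamp introduces, for numerical stability.
        m = torch.relu(m)
        m = torch.where(m == 0, torch.full_like(m, eps), m)
    return m
```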
---
### Key Advantages of MLorc
MLorc is broadly applicable to any momentum-based optimizer (e.g., Adam, Lion) and delivers superior performance:
* **State-of-the-Art Performance:** Empirically, MLorc consistently **outperforms other memory-efficient methods like LoRA and GaLore** in terms of validation accuracy. It can even match or **exceed the performance of full fine-tuning** with a small rank (e.g., `rank=4`).
* **Memory and Time Efficiency:** It maintains **comparable memory efficiency to LoRA** while demonstrating **improved time efficiency compared to GaLore**.
* **Theoretical Guarantees:** MLorc offers a **theoretical guarantee for convergence**, matching the convergence rate of the original Lion optimizer under reasonable assumptions.
<img width="1403" height="602" alt="image" src="https://github.com/user-attachments/assets/ad76a8ab-966d-4121-b010-28a2ddb6e28d" />
---
### Included MLorc-Integrated Optimizers
This repository integrates MLorc into six momentum-based optimizers, each with additional enhancements for improved performance and stability (a usage sketch follows the list):
1. **`MLorc_AdamW`**: AdamW with MLorc compression, featuring:
* **Fused Backward Pass**
* **[Gradient Descent with Adaptive Momentum Scaling (Grams)](https://github.com/Gunale0926/Grams)**: For better performance and faster convergence.
* **[`atan2` smoothing & scaling](https://github.com/lucidrains/adam-atan2-pytorch)**: A robust replacement for `eps` (no tuning required), which also incorporates gradient clipping. (If enabled, `eps` is ignored.)
   * **[OrthoGrad](https://github.com/LucasPrietoAl/grokking-at-the-edge-of-numerical-stability)**: Prevents "naïve loss minimization" (NLM), which can lead to overfitting, by removing the gradient component parallel to the weight, thus improving generalization (see the projection sketch after this list).
2. **`MLorc_Prodigy`**:
* **Same Features as `MLorc_AdamW`**
* Incorporates MLorc with the [**Prodigy adaptive method**](https://github.com/konstmish/prodigy) and its associated features.
3. **`MLorc_Lion`**: Lion with MLorc compression, featuring:
* **Fused Backward Pass**
* **OrthoGrad**
   * **[`use_cautious`](https://github.com/kyleliang919/C-Optim)**: Use the cautious variant of Lion.
   * **`clip_threshold`**: Clips the per-parameter gradient norm, as proposed in **[Lions and Muons: Optimization via Stochastic Frank-Wolfe](https://arxiv.org/abs/2506.04192)**, to make Lion more stable (default: 5.0, from the paper).
4. **`MLorc_DAdapt_Lion`**:
* **Same Features as `MLorc_Lion`**
   * Integrates MLorc with the [**D-Adaptation**](https://github.com/facebookresearch/dadaptation) adaptive method for **Lion**, and includes the `slice_p` feature (from Prodigy).
5. **`MLorc_Adopt`**:
* **Same Features as `MLorc_AdamW`**
* Implements the method of **[ADOPT: Modified Adam Can Converge with Any β_2 with the Optimal Rate](https://arxiv.org/abs/2411.02853)**.
6. **`MLorc_CAME`**:
* **Same Features as `MLorc_AdamW`**
* The first moment (momentum) is compressed using the low-rank factorization from MLorc, while the adaptive pre-conditioning and confidence-guided updates are from **[CAME: Confidence-guided Adaptive Memory Efficient Optimization](https://arxiv.org/abs/2307.02047)**.
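
A hypothetical usage sketch follows. The import path, optimizer name, and constructor arguments (`rank` in particular) are assumptions drawn from the feature list above; check the package itself for the actual API, especially if the fused backward pass changes how `step()` is invoked:

```python
import torch
from mlorc_optim import MLorc_AdamW  # import path is a guess; adjust to the installed package

model = torch.nn.Linear(1024, 1024)
# `rank` would control the low-rank compression of the momentum states
# (the paper reports strong results with ranks as small as 4).
optimizer = MLorc_AdamW(model.parameters(), lr=1e-4, rank=4)  # argument names are assumptions

for _ in range(10):
    loss = model(torch.randn(8, 1024)).pow(2).mean()
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```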
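
For reference, the OrthoGrad projection mentioned in the feature lists amounts to removing the component of the gradient that points along the weight. A small sketch, with the final norm-rescaling step included as an assumption about the reference implementation:

```python
import torch

def orthogonalize_gradient(weight: torch.Tensor, grad: torch.Tensor) -> torch.Tensor:
    """Drop the gradient component parallel to the weight (OrthoGrad-style projection)."""
    w, g = weight.flatten(), grad.flatten()
    # Projection coefficient of g onto w.
    coeff = torch.dot(w, g) / (torch.dot(w, w) + 1e-30)
    g_orth = g - coeff * w
    # Rescale to the original gradient norm (assumption about the reference implementation).
    g_orth = g_orth * (g.norm() / (g_orth.norm() + 1e-30))
    return g_orth.view_as(grad)
```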