MLorc-optim

Name: MLorc-optim
Version: 0.1.4
Home page: https://github.com/Koratahiu/MLorc
Summary: Unofficial implementation of Momentum Low-Rank Compression (MLorc) for memory-efficient LLM fine-tuning
Upload time: 2025-08-22 07:28:05
Maintainer: None
Docs URL: None
Author: Koratahiu
Requires Python: >=3.8
License: Apache 2.0
Keywords: llm, fine-tuning, memory-efficient, low-rank, compression, pytorch, optimizer, adam, lion
Requirements: No requirements were recorded.
            
# MLorc - Momentum Low-Rank Compression for Memory-Efficient LLM Fine-tuning
Unofficial implementation of "MLorc: Momentum Low-rank Compression for Large Language Model Adaptation"

This repository introduces **MLorc (Momentum Low-rank Compression)**, a novel and highly memory-efficient paradigm designed to significantly reduce the memory footprint of full-parameter fine-tuning for large language models. Based on the paper "[MLorc: Momentum Low-rank Compression for Large Language Model Adaptation](https://arxiv.org/abs/2506.01897)", this method offers a compelling alternative to existing memory-efficient techniques.

<img width="1385" height="469" alt="image" src="https://github.com/user-attachments/assets/7bcab5ec-beaf-4d1a-b115-81ab1a7d4b18" />

---

### How MLorc Works

MLorc's core innovation lies in its approach to **momentum compression and reconstruction**:

* **Direct Momentum Compression:** It directly compresses and reconstructs both the first- and second-order momentum using **Randomized SVD (RSVD)** at each optimization step.
* **Adaptive Second-Order Momentum Handling:** To keep the reconstructed second-order momentum non-negative and stable, MLorc applies ReLU during reconstruction and adaptively adds a small constant to the entries that ReLU zeroes out.
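
For intuition, the compress/reconstruct cycle can be sketched with PyTorch's built-in randomized SVD (`torch.svd_lowrank`). This is a minimal illustration of the idea, not the package's internal code; the function names, the rank, and the `eps` constant are illustrative assumptions.

```python
import torch

def compress(momentum: torch.Tensor, rank: int = 4):
    """Compress a 2-D momentum matrix into low-rank factors via randomized SVD."""
    U, S, V = torch.svd_lowrank(momentum, q=rank)
    return U, S, V  # only these factors are kept in the optimizer state

def reconstruct(U, S, V, second_order: bool = False, eps: float = 1e-8):
    """Rebuild an approximate momentum matrix from its low-rank factors."""
    M = U @ torch.diag(S) @ V.T
    if second_order:
        # Second-order momentum must stay non-negative: clamp with ReLU,
        # then add a small constant to the entries ReLU zeroed out.
        M = torch.relu(M)
        M = torch.where(M == 0, torch.full_like(M, eps), M)
    return M

# Example: a 1024x1024 second-moment matrix stored as rank-4 factors.
v = torch.randn(1024, 1024).abs()
U, S, V = compress(v, rank=4)
v_hat = reconstruct(U, S, V, second_order=True)
```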

---

### Key Advantages of MLorc

MLorc is broadly applicable to any momentum-based optimizer (e.g., Adam, Lion) and delivers superior performance:

* **State-of-the-Art Performance:** Empirically, MLorc consistently **outperforms other memory-efficient methods like LoRA and GaLore** in terms of validation accuracy. It can even match or **exceed the performance of full fine-tuning** with a small rank (e.g., `rank=4`).
* **Memory and Time Efficiency:** It maintains **comparable memory efficiency to LoRA** while demonstrating **improved time efficiency compared to GaLore**.
* **Theoretical Guarantees:** MLorc offers a **theoretical guarantee for convergence**, matching the convergence rate of the original Lion optimizer under reasonable assumptions.

<img width="1403" height="602" alt="image" src="https://github.com/user-attachments/assets/ad76a8ab-966d-4121-b010-28a2ddb6e28d" />

---

### Included MLorc-Integrated Optimizers

This repository integrates MLorc into six momentum-based optimizers, each with additional enhancements for improved performance and stability:

1.  **`MLorc_AdamW`**: AdamW with MLorc compression, featuring:
    * **Fused Backward Pass**
    * **[Gradient Descent with Adaptive Momentum Scaling (Grams)](https://github.com/Gunale0926/Grams)**: For better performance and faster convergence.
    * **[`atan2` smoothing & scaling](https://github.com/lucidrains/adam-atan2-pytorch)**: A robust replacement for `eps` (no tuning required), which also incorporates gradient clipping. (If enabled, `eps` is ignored.)
    * **[OrthoGrad](https://github.com/LucasPrietoAl/grokking-at-the-edge-of-numerical-stability)**: Prevents "naïve loss minimization" (NLM), which can lead to overfitting, by removing the gradient component parallel to the weight, thus improving generalization. (A short sketch of these shared enhancements appears after this list.)

2.  **`MLorc_Prodigy`**:
    * **Same Features as `MLorc_AdamW`**
    * Incorporates MLorc with the [**Prodigy adaptive method**](https://github.com/konstmish/prodigy) and its associated features.

3.  **`MLorc_Lion`**: Lion with MLorc compression, featuring:
    * **Fused Backward Pass**
    * **OrthoGrad**
    * **[`use_cautious`](https://github.com/kyleliang919/C-Optim)**: Uses the cautious variant of Lion.
    * **`clip_threshold`**: Clips the per-parameter gradient norm, as proposed in **[Lions and Muons: Optimization via Stochastic Frank-Wolfe](https://arxiv.org/abs/2506.04192)**, to make Lion more stable (default: 5.0, from the paper); see the sketch after this list.

4.  **`MLorc_DAdapt_Lion`**:
    * **Same Features as `MLorc_Lion`**
    * Integrates MLorc with the [**D-Adaptation**](https://github.com/facebookresearch/dadaptation) adaptive learning-rate method for **Lion**, and includes the `slice_p` feature (from Prodigy).

5.  **`MLorc_Adopt`**:
    * **Same Features as `MLorc_AdamW`**
    * Implements the method of **[ADOPT: Modified Adam Can Converge with Any β_2 with the Optimal Rate](https://arxiv.org/abs/2411.02853)**.
  
6.  **`MLorc_CAME`**:
    * **Same Features as `MLorc_AdamW`**
    * The first moment (momentum) is compressed using the low-rank factorization from MLorc, while the adaptive pre-conditioning and confidence-guided updates are from **[CAME: Confidence-guided Adaptive Memory Efficient Optimization](https://arxiv.org/abs/2307.02047)**.
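
The shared enhancements above reduce to short tensor operations. The sketch below assumes the standard formulations from the cited papers and repositories (it is not code taken from this package) and illustrates the OrthoGrad projection, the `atan2` replacement for `eps`, and the per-parameter gradient-norm clipping used to stabilize Lion.

```python
import torch

def orthograd(weight: torch.Tensor, grad: torch.Tensor) -> torch.Tensor:
    """OrthoGrad: remove the gradient component parallel to the weight."""
    w, g = weight.flatten(), grad.flatten()
    proj = torch.dot(w, g) / (torch.dot(w, w) + 1e-30)
    return (g - proj * w).view_as(grad)

def atan2_update(exp_avg: torch.Tensor, exp_avg_sq: torch.Tensor, scale: float = 1.0) -> torch.Tensor:
    """adam-atan2 style direction: atan2(m, sqrt(v)) replaces m / (sqrt(v) + eps).

    atan2 is bounded, so it needs no eps and implicitly clips extreme updates;
    the exact scaling constant used upstream is not reproduced here.
    """
    return scale * torch.atan2(exp_avg, exp_avg_sq.sqrt())

def clip_per_parameter(grad: torch.Tensor, threshold: float = 5.0) -> torch.Tensor:
    """Per-parameter gradient-norm clipping (Lion stabilization, default 5.0)."""
    norm = grad.norm()
    return grad * (threshold / norm) if norm > threshold else grad
```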

            
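The package installs from PyPI as `MLorc-optim`. The snippet below is a hypothetical usage sketch: the import path and constructor arguments are assumptions inferred from the feature list above, so check the repository for the actual signatures.

```python
# pip install MLorc-optim
import torch
from mlorc_optim import MLorc_AdamW  # assumed module name

model = torch.nn.Linear(128, 128)
optimizer = MLorc_AdamW(
    model.parameters(),
    lr=1e-4,
    rank=4,  # assumed name for the low-rank compression rank
)

loss = model(torch.randn(8, 128)).pow(2).mean()
loss.backward()
optimizer.step()
optimizer.zero_grad()
```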
