# Advanced Lion Optimizer
This repository provides an enhanced implementation of the Lion optimizer, incorporating several state-of-the-art techniques to improve performance, stability, and memory efficiency. It includes two base variants: the original **[Lion](https://github.com/lucidrains/lion-pytorch)** and **[D-Adapt-Lion](https://github.com/facebookresearch/dadaptation)**.
---
## Features
### 1. Fused Backward Pass
Reduces memory overhead by updating each parameter through the `step_parameter` method as soon as its gradient becomes available during the backward pass, so gradients for the whole model never need to be held in memory at once.
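
A minimal sketch of the mechanism, using PyTorch's `Tensor.register_post_accumulate_grad_hook` (available since PyTorch 2.1). The `step_parameter(param)` call signature is assumed here for illustration and may differ from this package's actual API:

```python
import torch

def attach_fused_backward(model: torch.nn.Module, optimizer) -> None:
    """Update each parameter as soon as its gradient is accumulated,
    then free that gradient immediately."""
    def hook(param: torch.Tensor) -> None:
        optimizer.step_parameter(param)  # per-parameter step (signature assumed)
        param.grad = None                # release the gradient right away

    for p in model.parameters():
        if p.requires_grad:
            # Fires once the gradient for `p` is fully accumulated.
            p.register_post_accumulate_grad_hook(hook)
```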
***
### 2. Stochastic Rounding for BF16 Training
Recovers **FP32-level accuracy** in BF16 training by applying stochastic rounding to the final parameter update, enabling faster, lower-memory training in a 16-bit format without the usual loss in accuracy.
- **References**:
- "[Revisiting BFloat16 Training](https://arxiv.org/abs/2010.06192)"
- "[Stochastic Rounding for LLM Training: Theory and Practice](https://arxiv.org/abs/2502.20566)"
***
### 3. Gradient Orthogonalization
Improves **model generalization** and enhances **numerical stability** by projecting each gradient onto the subspace orthogonal to its parameter tensor, removing the component that merely rescales the weights.
- **Reference**: "[Grokking at the Edge of Numerical Stability](https://arxiv.org/abs/2501.04697)"
***
### 4. Variance Reduction
Theoretically accelerates the **convergence speed of Lion by 33.33%** while making training more stable in noisy, small-batch regimes. The main trade-off is one extra optimizer state per parameter, used to store the previous step's gradient; a schematic of one such scheme follows the reference below.
- **Reference**: "[Convergence Analysis of the Lion Optimizer in Centralized and Distributed Settings](https://arxiv.org/abs/2508.12327)"
***
### 5. Cautious Lion Variant
Includes the "Cautious" variant of Lion, an approach introduced to refine the optimization process and improve training outcomes.
- **Reference**: "[Cautious Optimizers: Improving Training with One Line of Code](https://arxiv.org/abs/2411.16085)"
***
### 6. Per-Parameter Gradient Norm Clipping
Enhances training stability by clipping the gradient norm of each parameter tensor individually, preventing outlier gradients in one tensor from producing erratic updates.
- **Reference**: "[Lions and Muons: Optimization via Stochastic Frank-Wolfe](https://arxiv.org/abs/2506.04192)" (The paper uses a clipping value of 4-5).