# Advanced Lion Optimizer
This repository provides an enhanced implementation of the Lion optimizer, incorporating several state-of-the-art techniques to improve performance, stability, and memory efficiency. It includes two base variants: the original **[Lion](https://github.com/lucidrains/lion-pytorch)** and **[D-Adapt-Lion](https://github.com/facebookresearch/dadaptation)**.
---
## Features
### 1. Fused Backward Pass
Reduces memory overhead by updating each parameter through the `step_parameter` method as soon as its gradient becomes available during the backward pass, so gradients for the whole model never need to be held in memory at once.
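
A minimal sketch of the mechanism, using PyTorch's `Tensor.register_post_accumulate_grad_hook` (available since PyTorch 2.1). The `step_parameter(param)` call signature is assumed here for illustration and may differ from this package's actual API:

```python
import torch

def attach_fused_backward(model: torch.nn.Module, optimizer) -> None:
    """Update each parameter as soon as its gradient is accumulated,
    then free that gradient immediately."""
    def hook(param: torch.Tensor) -> None:
        optimizer.step_parameter(param)  # per-parameter step (signature assumed)
        param.grad = None                # release the gradient right away

    for p in model.parameters():
        if p.requires_grad:
            # Fires once the gradient for `p` is fully accumulated.
            p.register_post_accumulate_grad_hook(hook)
```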
***
### 2. Stochastic Rounding for BF16 Training
Recovers **FP32-level accuracy** in BF16 training by applying stochastic rounding to the final parameter update, enabling faster, lower-memory training in a 16-bit format without the usual loss in accuracy.
- **References**:
- "[Revisiting BFloat16 Training](https://arxiv.org/abs/2010.06192)"
- "[Stochastic Rounding for LLM Training: Theory and Practice](https://arxiv.org/abs/2502.20566)"
***
### 3. Gradient Orthogonalization
Improves **model generalization** and enhances **numerical stability** by projecting each gradient onto the subspace orthogonal to its parameter tensor, removing the component that merely rescales the weights.
- **Reference**: "[Grokking at the Edge of Numerical Stability](https://arxiv.org/abs/2501.04697)"
***
### 4. Variance Reduction
Theoretically accelerates the **convergence speed of Lion by 33.33%** while making training more stable in noisy, small-batch regimes. The main trade-off is one extra optimizer state per parameter, used to store the previous step's gradient; a schematic of one such scheme follows the reference below.
- **Reference**: "[Convergence Analysis of the Lion Optimizer in Centralized and Distributed Settings](https://arxiv.org/abs/2508.12327)"
***
### 5. Cautious Lion Variant
Includes the "Cautious" variant of Lion, an approach introduced to refine the optimization process and improve training outcomes.
- **Reference**: "[Cautious Optimizers: Improving Training with One Line of Code](https://arxiv.org/abs/2411.16085)"
***
### 6. Per-Parameter Gradient Norm Clipping
Enhances training stability by clipping the gradient norm of each parameter tensor individually, preventing outlier gradients in one tensor from producing erratic updates.
- **Reference**: "[Lions and Muons: Optimization via Stochastic Frank-Wolfe](https://arxiv.org/abs/2506.04192)" (The paper uses a clipping value of 4-5).