# PSGD-QUAD
An implementation of PSGD-QUAD for PyTorch.
```python
import torch
from quad_torch import QUAD
model = torch.nn.Linear(10, 10)
optimizer = QUAD(
    model.parameters(),
    lr=0.001,
    lr_style="adam",  # "adam", "mu-p", or None
    momentum=0.95,
    weight_decay=0.1,
    preconditioner_lr=0.7,
    max_size_dense=8192,
    max_skew_dense=1.0,
    normalize_grads=False,
    dtype=torch.bfloat16,
)
```
`lr_style` can be `"adam"` for Adam-style scaling, `"mu-p"` for muP-style scaling based on `sqrt(G.shape[-2])`, or `None` for PSGD's native scaling to RMS=1.0.
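QUAD follows the standard `torch.optim.Optimizer` step protocol, so it drops into an ordinary PyTorch training loop. The sketch below uses `torch.optim.SGD` as a stand-in so it runs without quad-torch installed; in practice you would substitute the `QUAD(...)` constructor shown above:

```python
import torch

# Stand-in setup; replace SGD with the QUAD constructor above in real use
model = torch.nn.Linear(10, 10)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
loss_fn = torch.nn.MSELoss()

x = torch.randn(32, 10)
y = torch.randn(32, 10)

losses = []
for step in range(20):
    optimizer.zero_grad()          # clear gradients from the previous step
    loss = loss_fn(model(x), y)
    loss.backward()                # populate .grad on each parameter
    optimizer.step()               # apply the (preconditioned) update
    losses.append(loss.item())
```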
## Resources
Xi-Lin Li's repo: https://github.com/lixilinx/psgd_torch
PSGD papers and resources, as listed in Xi-Lin's repo:
1) Xi-Lin Li. Preconditioned stochastic gradient descent, [arXiv:1512.04202](https://arxiv.org/abs/1512.04202), 2015. (General ideas of PSGD, preconditioner fitting losses and Kronecker product preconditioners.)
2) Xi-Lin Li. Preconditioner on matrix Lie group for SGD, [arXiv:1809.10232](https://arxiv.org/abs/1809.10232), 2018. (Focus on preconditioners with the affine Lie group.)
3) Xi-Lin Li. Black box Lie group preconditioners for SGD, [arXiv:2211.04422](https://arxiv.org/abs/2211.04422), 2022. (Mainly about the LRA preconditioner. See [these supplementary materials](https://drive.google.com/file/d/1CTNx1q67_py87jn-0OI-vSLcsM1K7VsM/view) for detailed math derivations.)
4) Xi-Lin Li. Stochastic Hessian fittings on Lie groups, [arXiv:2402.11858](https://arxiv.org/abs/2402.11858), 2024. (Some theoretical works on the efficiency of PSGD. The Hessian fitting problem is shown to be strongly convex on set ${\rm GL}(n, \mathbb{R})/R_{\rm polar}$.)
5) Omead Pooladzandi, Xi-Lin Li. Curvature-informed SGD via general purpose Lie-group preconditioners, [arXiv:2402.04553](https://arxiv.org/abs/2402.04553), 2024. (Plenty of benchmark results and analyses for PSGD vs. other optimizers.)
## License
[![CC BY 4.0][cc-by-image]][cc-by]
This work is licensed under a [Creative Commons Attribution 4.0 International License][cc-by].
© 2024 Evan Walters, Omead Pooladzandi, Xi-Lin Li
[cc-by]: http://creativecommons.org/licenses/by/4.0/
[cc-by-image]: https://licensebuttons.net/l/by/4.0/88x31.png
[cc-by-shield]: https://img.shields.io/badge/License-CC%20BY%204.0-lightgrey.svg