Name: ncg-optimizer
Version: 0.1.1
Home page: https://github.com/RyunMi/NCG-optimizer
Summary: PyTorch optimizer based on nonlinear conjugate gradient method
Upload time: 2023-08-21 14:19:36
Author: Kerun Mi
License: Apache 2
Keywords: ncg-optimizer, pytorch, LCG, FR, PRP, HS, CD, DY, LS, HZ, HS-DY, Armijo, Strong Wolfe
Requirements: none recorded

=============
NCG-Optimizer
=============
.. image:: https://github.com/RyunMi/NCG-optimizer/workflows/CI/badge.svg
    :target: https://github.com/RyunMi/NCG-optimizer/actions?query=workflow%3ACI
    :alt: GitHub Actions status for master branch
.. image:: https://img.shields.io/pypi/pyversions/ncg-optimizer.svg
    :target: https://pypi.org/project/ncg-optimizer
.. image:: https://img.shields.io/pypi/v/ncg-optimizer.svg
    :target: https://pypi.python.org/pypi/ncg-optimizer
.. image:: https://static.pepy.tech/badge/ncg-optimizer
    :target: https://pepy.tech/project/ncg-optimizer
.. image:: https://img.shields.io/badge/License-Apache_2.0-blue.svg
    :target: https://opensource.org/licenses/Apache-2.0

**NCG-Optimizer** is a set of *nonlinear conjugate gradient* optimizers for PyTorch.

Inspired by `jettify <https://github.com/jettify/pytorch-optimizer>`__ and `kozistr <https://github.com/kozistr/pytorch_optimizer>`__.

Install
=======

::

    $ pip install ncg_optimizer

Supported Optimizers
====================

Basic Methods
-------------

The theoretical analysis and implementation of all basic methods are based on "Nonlinear Conjugate Gradient Methods" [#NCGM]_, "Numerical Optimization" ([#NO1]_ [#NO2]_), and "Conjugate Gradient Algorithms in Nonconvex Optimization" [#CGNO]_.

Linear Conjugate Gradient
^^^^^^^^^^^^^^^^^^^^^^^^^

The Linear Conjugate Gradient (**LCG**) method applies only to solving systems of linear equations. It recasts the linear system as an equivalent quadratic minimization problem, so the solution can be found iteratively without inverting the coefficient matrix.

.. image:: https://raw.githubusercontent.com/RyunMi/NCG-optimizer/master/docs/LCG.png
        :width: 800px

.. code-block:: python

        import ncg_optimizer as optim

        # model = Your Model
        # loss, input, target = your criterion, inputs and labels

        optimizer = optim.LCG(model.parameters(), eps=1e-5)

        def closure():
            optimizer.zero_grad()
            loss_value = loss(model(input), target)
            loss_value.backward()
            return loss_value

        optimizer.step(closure)
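
For intuition, the quadratic reformulation can be seen on a tiny system $Ax = b$ with a symmetric positive definite $A$: minimizing $\\frac{1}{2} x^T A x - b^T x$ with LCG recovers the solution without forming the inverse of $A$. The sketch below is illustrative only: the matrix, right-hand side, and parameter tensor are made up, and it assumes LCG accepts a plain list of parameter tensors and that each step(closure) call performs one iteration, as with standard PyTorch optimizers.

.. code-block:: python

        import torch
        import ncg_optimizer as optim

        # Illustrative symmetric positive definite system A x = b.
        A = torch.tensor([[4.0, 1.0], [1.0, 3.0]])
        b = torch.tensor([1.0, 2.0])
        x = torch.zeros(2, requires_grad=True)

        optimizer = optim.LCG([x], eps=1e-5)

        def closure():
            optimizer.zero_grad()
            # Quadratic whose minimizer solves A x = b.
            quad = 0.5 * x @ A @ x - b @ x
            quad.backward()
            return quad

        for _ in range(10):
            optimizer.step(closure)
        # x now approximates the solution of A x = b.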

Nonlinear Conjugate Gradient
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
.. image:: https://raw.githubusercontent.com/RyunMi/NCG-optimizer/master/docs/NCG.png
        :width: 800px

Fletcher-Reeves Method
""""""""""""""""""""""
The Fletcher-Reeves (**FR**) conjugate gradient method is the earliest nonlinear conjugate gradient method. 
It was obtained by Fletcher and Reeves in 1964 by extending the conjugate gradient method for linear systems to general optimization problems. 

The scalar parameter update formula of the FR method is as follows:

$$ \\beta_k^{FR}=\\frac{g_{k}^T g_{k}}{g_{k-1}^T g_{k-1}}$$

The convergence analysis of the FR method is closely tied to the line search it uses. 
With exact line search, the FR method converges for general nonconvex functions. 
With strong Wolfe inexact line search and $c_2 \\leq 0.5$, the FR method converges globally for general nonconvex functions. 
The FR method with generalized Wolfe or Armijo inexact line search is likewise globally convergent for general nonconvex functions.

.. code-block:: python

        
        optimizer = optim.BASIC(
            model.parameters(), method = 'FR',
            line_search = 'Strong_Wolfe', c1 = 1e-4, 
            c2 = 0.5, lr = 0.2, max_ls = 25)
        def closure():
            optimizer.zero_grad()
            loss_value = loss(model(input), target)
            loss_value.backward()
            return loss_value
        optimizer.step(closure)
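
For intuition only, the scalar $\\beta_k^{FR}$ and the standard conjugate gradient direction update $d_k = -g_k + \\beta_k d_{k-1}$ can be sketched on flattened gradient tensors as below. The tensors g_k, g_prev, and d_prev are hypothetical placeholders; this is not the package's internal implementation.

.. code-block:: python

        import torch

        def fr_beta(g_k: torch.Tensor, g_prev: torch.Tensor) -> torch.Tensor:
            # beta_k^FR = (g_k^T g_k) / (g_{k-1}^T g_{k-1})
            return g_k.dot(g_k) / g_prev.dot(g_prev)

        # Hypothetical flattened gradients from two consecutive iterations.
        g_prev = torch.tensor([1.0, -2.0])
        g_k = torch.tensor([0.5, 1.0])
        d_prev = -g_prev  # the first search direction is steepest descent

        # Standard nonlinear CG direction update: d_k = -g_k + beta_k * d_{k-1}
        d_k = -g_k + fr_beta(g_k, g_prev) * d_prev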

Polak-Ribiere-Polyak Method
"""""""""""""""""""""""""""

The Polak-Ribiere-Polyak (**PRP**) method is a nonlinear conjugate gradient method proposed independently by Polak, Ribiere, and Polyak in 1969. 
The PRP method is one of the conjugate gradient methods with the best numerical performance. 
When the algorithm takes a small step, the search direction $d_k$ defined by the PRP method automatically approaches the negative gradient direction, 
which effectively avoids the drawback of the FR method that small steps may be produced repeatedly.

The scalar parameter update formula of the PRP method is as follows:

$$ \\beta_k^{PRP}=\\frac{g_{k}^{T}(g_{k}-g_{k-1})}{\\lVert g_{k-1}\\rVert^2}$$

The convergence analysis of the PRP method is closely tied to the line search it uses. When the step $s_k = x_{k+1} - x_{k} \\to 0$ is taken as the measure of global convergence, 
the PRP method with exact line search converges for uniformly convex functions under this criterion. 
The PRP method with Armijo-type inexact line search converges globally for general nonconvex functions. 
The PRP $^+$ method with strong Wolfe inexact line search ( $0 < c_2 < \\frac{1}{4}$ ) converges globally for general nonconvex functions. 
The PRP method with a suitable constant step-size factor ( involving the Lipschitz constant ) also converges globally for general nonconvex functions.

.. code-block:: python


        optimizer = optim.BASIC(
            model.parameters(), method = 'PRP',
            line_search = 'Armijo', c1 = 1e-4, 
            c2 = 0.9, lr = 1, rho = 0.5,)
        def closure():
            optimizer.zero_grad()
            loss_value = loss(model(input), target)
            loss_value.backward()
            return loss_value
        optimizer.step(closure)

Hestenes-Stiefel Method
"""""""""""""""""""""""

Another well-known conjugate gradient method, the Hestenes-Stiefel ( **HS** ) method, was proposed by Hestenes and Stiefel.
The scalar parameter update formula of the HS method is as follows:

$$ \\beta_{k}^{HS}=\\frac{g_{k}^{T}(g_{k}-g_{k-1})}{(g_{k}-g_{k-1})^Td_{k-1}} $$

Compared with the PRP method, an important property of the HS method is that the conjugate relation 
$d_k^T(g_{k}-g_{k-1}) = 0$ always holds regardless of the exactness of the line search. 
However, the theoretical properties and computational performance of the HS method are similar to those of the PRP method.

The convergence analysis of the HS method is closely tied to the line search it uses. 
If the level set of $f(x)$ is bounded, its gradient is Lipschitz continuous, and the search direction satisfies the sufficient descent condition, 
then the HS method with Wolfe inexact line search is globally convergent. 
The HS $^+$ method with strong Wolfe inexact line search ( $0 < c_2 < \\frac{1}{3}$ ) converges globally for general nonconvex functions.

.. code-block:: python


        optimizer = optim.BASIC(
            model.parameters(), method = 'HS',
            line_search = 'Strong_Wolfe', c1 = 1e-4, 
            c2 = 0.4, lr = 0.2, max_ls = 25,)
        def closure():
            optimizer.zero_grad()
            loss_value = loss(model(input), target)
            loss_value.backward()
            return loss_value
        optimizer.step(closure)

Conjugate Descent Method
""""""""""""""""""""""""
Conjugate Descent ( **CD** ) was first introduced by Fletcher in 1987. 
Under certain conditions it avoids the ascent search directions that the PRP and FR methods may produce at some iterations.

The scalar parameter update formula of the CD method is as follows:

$$ \\beta_{k}^{CD}=\\frac{g_{k}^T g_{k}}{-(g_{k-1})^T d_{k-1}} $$

The convergence analysis of the CD method is closely tied to the line search it uses. 
The CD method with strong Wolfe inexact line search ( $c_2 < 1$ ) converges globally for general nonconvex functions, 
although the attainable accuracy cannot be guaranteed. 
The CD method with Armijo inexact line search converges globally for general nonconvex functions.

.. code-block:: python


        optimizer = optim.BASIC(
            model.parameters(), method = 'CD',
            line_search = 'Armijo', c1 = 1e-4, 
            c2 = 0.9, lr = 1, rho = 0.5,)
        def closure():
            optimizer.zero_grad()
            loss_value = loss(model(input), target)
            loss_value.backward()
            return loss_value
        optimizer.step(closure)

Liu-Storey Method
"""""""""""""""""
The Liu-Storey ( **LS** ) conjugate gradient method is a nonlinear conjugate gradient method 
proposed by Liu and Storey in 1991; it has good numerical performance.

The scalar parameter update formula of the LS method is as follows:

$$ \\beta_{k}^{LS}=\\frac{g_{k}^T (g_{k} - g_{k-1})}{ - g_{k-1}^T d_{k-1}} $$

The convergence analysis of the LS method is closely tied to the line search it uses. 
The LS method with strong Wolfe inexact line search is globally convergent ( under a Lipschitz condition ). 
The LS method with Armijo-type inexact line search converges globally for general nonconvex functions.

.. code-block:: python


        optimizer = optim.BASIC(
            model.parameters(), method = 'LS',
            line_search = 'Armijo', c1 = 1e-4, 
            c2 = 0.9, lr = 1, rho = 0.5,)
        def closure():
            optimizer.zero_grad()
            loss_value = loss(model(input), target)
            loss_value.backward()
            return loss_value
        optimizer.step(closure)



Dai-Yuan Method
"""""""""""""""

The Dai-Yuan ( **DY** ) method, first proposed by Yuhong Dai and Yaxiang Yuan in 1995, always produces a descent search direction under weaker line search conditions and is globally convergent. 
In addition, good convergence results can be obtained with the ordinary Wolfe inexact line search, without requiring the strong Wolfe line search.

The scalar parameter update formula of the DY method is as follows:

$$ \\beta_{k}^{DY}=\\frac{g_{k}^T g_{k}}{(g_{k} - g_{k-1})^T d_{k-1}} $$

The convergence analysis of the DY method is closely tied to the line search it uses. 
The DY method with strong Wolfe inexact line search guarantees sufficient descent and global convergence for general nonconvex functions. 
The DY method with Wolfe inexact line search converges globally for general nonconvex functions.

.. code-block:: python


        optimizer = optim.BASIC(
            model.parameters(), method = 'DY',
            line_search = 'Strong_Wolfe', c1 = 1e-4, 
            c2 = 0.9, lr = 0.2, max_ls = 25,)
        def closure():
            optimizer.zero_grad()
            loss_value = loss(model(input), target)
            loss_value.backward()
            return loss_value
        optimizer.step(closure)

Hager-Zhang Method [#HZ]_
"""""""""""""""""""""""""
The Hager-Zhang ( **HZ** ) method is a nonlinear conjugate gradient method proposed by Hager and Zhang in 2005. 
It satisfies the sufficient descent condition, is globally convergent for strongly convex functions, 
and its search direction approaches that of the memoryless BFGS quasi-Newton method.

The scalar parameter update formula of the HZ method is as follows:

$$
\\beta_k^{HZ}=\\frac{1}{d_{k-1}^T (g_{k} - g_{k-1})}((g_{k} - g_{k-1})-2 d_{k-1} \\frac{\\|(g_{k} - g_{k-1}) \\|^2}{d_{k-1}^T (g_{k} - g_{k-1})})^T{g}_{k}
$$

The convergence analysis of the HZ method is closely tied to the line search it uses. 
The HZ method with ( strong ) Wolfe inexact line search converges globally for general nonconvex functions. 
The HZ $^+$ method with Armijo inexact line search converges globally for general nonconvex functions.

.. code-block:: python


        optimizer = optim.BASIC(
            model.parameters(), method = 'HZ',
            line_search = 'Strong_Wolfe', c1 = 1e-4, 
            c2 = 0.9, lr = 0.2, max_ls = 25,)
        def closure():
            optimizer.zero_grad()
            loss_value = loss(model(input), target)
            loss_value.backward()
            return loss_value
        optimizer.step(closure)
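
Because the HZ coefficient is more involved than the other update formulas, the following sketch spells it out on flat tensors, writing $y_{k-1} = g_k - g_{k-1}$. The tensors g_k, g_prev, and d_prev are hypothetical placeholders; this is not the package's internal code.

.. code-block:: python

        import torch

        def hz_beta(g_k: torch.Tensor, g_prev: torch.Tensor, d_prev: torch.Tensor) -> torch.Tensor:
            y = g_k - g_prev        # y_{k-1} = g_k - g_{k-1}
            dy = d_prev.dot(y)      # d_{k-1}^T y_{k-1}
            # beta_k^HZ = (y - 2 d_{k-1} ||y||^2 / (d_{k-1}^T y))^T g_k / (d_{k-1}^T y)
            return (y - 2.0 * d_prev * y.dot(y) / dy).dot(g_k) / dy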


Hybrid HS-DY Method
"""""""""""""""""""
Dai and Yuan also studied the **HS-DY** hybrid conjugate gradient method. 
Compared with other hybrid conjugate gradient methods ( such as the FR-PRP hybrid ), 
the advantage of this hybrid is that it does not require the line search to satisfy the strong Wolfe condition, only the Wolfe condition. 
Their numerical experiments show that the HS-DY hybrid conjugate gradient method performs very well on difficult problems.

The scalar parameter update formula of the HS-DY method is as follows:

$$
\\beta_k^{HS-DY}=\\max (0, \\min (\\beta_k^{HS}, \\beta_k^{DY}))
$$

Regarding the convergence of the HS-DY method, 
the variant with Wolfe inexact line search is globally convergent for general nonconvex functions, 
and in practice it also outperforms the PRP method.

.. code-block:: python


        optimizer = optim.BASIC(
            model.parameters(), method = 'HS-DY',
            line_search = 'Armijo', c1 = 1e-4, 
            c2 = 0.9, lr = 1, rho = 0.5,)
        def closure():
            optimizer.zero_grad()
            loss_value = loss(model(input), target)
            loss_value.backward()
            return loss_value
        optimizer.step(closure)
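
To make the hybrid rule concrete, here is a one-line sketch of the clipping, with made-up scalar values for $\\beta_k^{HS}$ and $\\beta_k^{DY}$ ( not taken from the package ):

.. code-block:: python

        import torch

        def hs_dy_beta(beta_hs: torch.Tensor, beta_dy: torch.Tensor) -> torch.Tensor:
            # beta_k^{HS-DY} = max(0, min(beta_k^HS, beta_k^DY))
            return torch.clamp(torch.minimum(beta_hs, beta_dy), min=0.0)

        beta = hs_dy_beta(torch.tensor(0.8), torch.tensor(1.2))       # -> tensor(0.8000)
        beta_neg = hs_dy_beta(torch.tensor(-0.3), torch.tensor(0.5))  # -> tensor(0.)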

Line Search
^^^^^^^^^^^
Armijo Line Search
""""""""""""""""""
To ensure that the decrease in the objective function is at least proportional to the decrease predicted by its tangent, 
the step size $\\alpha_k$ is required to satisfy:

$$
f\\left(x_k+\\alpha_k d_k\\right) \\leqslant f\\left(x_k\\right)+c_1 \\alpha_k d_k^T g_k
$$

Here $c_1\\in (0,1)$; a common choice is $c_1 = 10^{-4}$.

.. image:: https://raw.githubusercontent.com/RyunMi/NCG-optimizer/master/docs/ArmijoLS.png
        :width: 800px
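
The criterion is typically enforced by backtracking: start from an initial step and shrink it by a factor until the inequality holds. The sketch below is only an illustration of the condition ( mirroring the c1, lr, and rho parameters used in the examples above ), not the package's implementation:

.. code-block:: python

        import torch

        def armijo_backtracking(f, x, g, d, lr=1.0, c1=1e-4, rho=0.5, max_iter=50):
            """Shrink alpha until f(x + alpha d) <= f(x) + c1 * alpha * d^T g."""
            alpha = lr
            fx = f(x)
            slope = d.dot(g)  # d_k^T g_k, negative for a descent direction
            for _ in range(max_iter):
                if f(x + alpha * d) <= fx + c1 * alpha * slope:
                    break
                alpha *= rho
            return alpha

        # Example on f(x) = ||x||^2 starting at x = (3,), with d the steepest descent direction.
        f = lambda v: (v * v).sum()
        x = torch.tensor([3.0])
        g = torch.tensor([6.0])  # gradient of f at x
        alpha = armijo_backtracking(f, x, g, -g)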

Wolfe Line Search(Coming Soon...)
"""""""""""""""""""""""""""""""""
In the following two formulas, the first inequality restates the Armijo criterion. 
The second inequality ( the curvature condition ) is added to prevent the step size from being too small 
and to ensure that the objective function decreases sufficiently:

$$
f\\left(x_k+\\alpha_k d_k\\right) \\leqslant f\\left(x_k\\right)+c_1 \\alpha_k d_k^T g_k
$$

$$
\\nabla f\\left(x_k+\\alpha d_k\\right)^T d_k \\geq c_2 \\nabla f_k^T d_k  
$$

where $c_2 \\in (c_1, 1)$.
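
Until this line search lands in the package, the two conditions can at least be checked for a candidate step, as in the hedged sketch below; the function and its arguments are hypothetical and unrelated to the package's API:

.. code-block:: python

        import torch

        def satisfies_wolfe(f, grad, x, d, alpha, c1=1e-4, c2=0.9):
            """Check the sufficient decrease and curvature conditions for step alpha."""
            g = grad(x)
            sufficient_decrease = f(x + alpha * d) <= f(x) + c1 * alpha * d.dot(g)
            curvature = grad(x + alpha * d).dot(d) >= c2 * g.dot(d)
            return bool(sufficient_decrease) and bool(curvature)

        # Example: quadratic f(x) = ||x||^2 with analytic gradient 2x.
        f = lambda v: (v * v).sum()
        grad = lambda v: 2.0 * v
        x = torch.tensor([3.0])
        d = -grad(x)
        ok = satisfies_wolfe(f, grad, x, d, alpha=0.5)  # True for this step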

Strong Wolfe Line Search [#NO1]_ [#MF]_
"""""""""""""""""""""""""""""""""""""""
The strong Wolfe criterion tightens the curvature condition of the Wolfe criterion by bounding its magnitude, so that the accepted step better approximates an exact line search:

$$
f\\left(x_k+\\alpha_k d_k\\right) \\leqslant f\\left(x_k\\right)+c_1 \\alpha_k d_k^T g_k
$$

$$
\|\\nabla f\\left(x_k+\\alpha d_k\\right)^T d_k\| \\leq -c_2 \\nabla f_k^T d_k  
$$

.. image:: https://raw.githubusercontent.com/RyunMi/NCG-optimizer/master/docs/Strong_Wolfe.png
        :width: 800px

.. image:: https://raw.githubusercontent.com/RyunMi/NCG-optimizer/master/docs/Zoom.png
        :width: 800px

Citation
========
Please cite the original authors of the optimization algorithms. If you use this software, you can cite it as:

::

    @software{Mi_ncgoptimizer,
    	title        = {{NCG-Optimizer -- a set of optimizer about nonlinear conjugate gradient in PyTorch.}},
    	author       = {Kerun Mi},
    	year         = 2023,
    	month        = 2,
    	version      = {0.1.0}}

Alternatively, use the "Cite this repository" button on the GitHub page.


References
==========

.. [#NCGM] Y.H. Dai and Y. Yuan (2000), Nonlinear Conjugate Gradient Methods, Shanghai Scientific and Technical Publishers, Shanghai. (in Chinese)
.. [#NO1] Nocedal J, Wright S J. Line search methods[J]. Numerical optimization, 2006: 30-65.
.. [#NO2] Nocedal J, Wright S J. Conjugate gradient methods[J]. Numerical optimization, 2006: 101-134. 
.. [#CGNO] Pytlak R. Conjugate gradient algorithms in nonconvex optimization[M]. Springer Science & Business Media, 2008.
.. [#HZ] Hager W W, Zhang H. A new conjugate gradient method with guaranteed descent and an efficient line search[J]. SIAM Journal on optimization, 2005, 16(1): 170-192.
.. [#MF] Schmidt M. minFunc: unconstrained differentiable multivariate optimization in Matlab[J]. Software available at https://www.cs.ubc.ca/~schmidtm/Software/minFunc.html, 2005.

Changes
=======

0.1.0 (2023-02-05)
------------------
* Initial release.
* Added support for the Linear Conjugate Gradient (LCG) method, the basic nonlinear conjugate gradient methods (FR, PRP, HS, CD, DY, LS, HZ, HS-DY), and two line search functions (Armijo & Strong Wolfe).

            
