gradient-ascent

Name: gradient-ascent
Version: 0.0.2
Home page: https://github.com/kyegomez/gradient-ascent
Summary: Gradient Ascent - Pytorch
Upload time: 2023-10-21 16:47:33
Author: Kye Gomez
Requires Python: >=3.6,<4.0
License: MIT
Keywords: artificial intelligence, deep learning, optimizers, prompt engineering
Requirements: none recorded
[![Multi-Modality](agorabanner.png)](https://discord.gg/qUtxnK2NMf)

# Gradient Ascent
Gradient Ascent is just the opposite of Gradient Descent. While Gradient Descent adjusts the parameters in the opposite direction of the gradient to minimize a loss function, Gradient Ascent adjusts the parameters in the direction of the gradient to maximize some objective function.


I got the idea for this while playing basketball; I don't know why or how, but this is my attempt to implement it.



# Appreciation
* Lucidrains
* Agorians

# Install
`pip install gradient-ascent`

# Usage
```python
import torch
from gradient_ascent.main import GradientAscent


class SimpleModel(torch.nn.Module):
    def __init__(self):
        super(SimpleModel, self).__init__()
        self.fc = torch.nn.Linear(1, 1)

    def forward(self, x):
        return self.fc(x)


# Test the optimizer
model = SimpleModel()
optimizer = GradientAscent(model.parameters(), lr=0.01)

# Generate some sample data
data = torch.tensor([[2.0]])
target = torch.tensor([[3.0]])

for _ in range(1000):
    optimizer.zero_grad()
    output = model(data)

    # Negative loss as we are maximizing
    loss = -torch.nn.functional.mse_loss(output, target)
    loss.backward()
    optimizer.step()

print("Final output after training: ", model(data))

```

# Architecture

### Theoretical Overview
For a function \( f(\theta) \), the update step in gradient ascent is given by:

\[ \theta_{new} = \theta_{old} + \alpha \nabla f(\theta_{old}) \]

Where:
- \( \theta \) are the parameters.
- \( \alpha \) is the learning rate.
- \( \nabla f(\theta_{old}) \) is the gradient of the function with respect to the parameters.
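
To make the update concrete, here is a tiny, self-contained illustration (not part of the package) of a single ascent step on an assumed toy objective \( f(\theta) = -(\theta - 3)^2 \):

```python
import torch

# Toy objective to maximize: f(theta) = -(theta - 3)^2, maximized at theta = 3.
theta = torch.tensor(0.0, requires_grad=True)
alpha = 0.1  # learning rate

f = -(theta - 3.0) ** 2
f.backward()  # theta.grad now holds df/dtheta = -2 * (theta - 3) = 6.0

with torch.no_grad():
    theta += alpha * theta.grad  # ascent step: theta_new = 0.0 + 0.1 * 6.0 = 0.6

print(theta.item())  # 0.6, moving toward the maximizer at theta = 3
```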

### Gradient Ascent Pseudocode

```
Algorithm: GradientAscentOptimizer

1. Input: 
   - Objective function f(θ)
   - Initial parameters θ₀
   - Learning rate α
   - Maximum iterations max_iter

2. For iteration = 1 to max_iter:
   a. Compute gradient: ∇θ = gradient of f(θ) w.r.t θ
   b. Update parameters: θ = θ + α * ∇θ

3. Return final parameters θ
```
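
The pseudocode maps naturally onto a `torch.optim.Optimizer` subclass. The following is only a minimal sketch for illustration (the class name `SimpleGradientAscent` is hypothetical, and the package's own `GradientAscent` may differ in its details):

```python
import torch
from torch.optim import Optimizer


class SimpleGradientAscent(Optimizer):
    """Minimal gradient ascent: theta <- theta + lr * grad f(theta)."""

    def __init__(self, params, lr=0.01):
        if lr <= 0.0:
            raise ValueError(f"Invalid learning rate: {lr}")
        super().__init__(params, defaults=dict(lr=lr))

    @torch.no_grad()
    def step(self, closure=None):
        loss = None
        if closure is not None:
            with torch.enable_grad():
                loss = closure()
        for group in self.param_groups:
            for p in group["params"]:
                if p.grad is None:
                    continue
                # Ascent: add the gradient instead of subtracting it.
                p.add_(p.grad, alpha=group["lr"])
        return loss
```

Under this sketch, calling `backward()` on the objective you want to maximize and then `step()` applies the θ = θ + α∇f(θ) update directly.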

### 3 New Features
1. Non-Convexity:
Many problems in deep learning involve non-convex optimization landscapes. Gradient ascent, like gradient descent, can get stuck in local maxima on such landscapes, so mechanisms for escaping these local optima can be necessary.

2. Momentum:
Momentum can be integrated to accelerate gradient vectors in any consistent direction, which can help in faster convergence and also in avoiding getting stuck in shallow local maxima.

3. Adaptive Learning Rates:
The learning rate might need to adapt based on the recent history of gradients, allowing the optimization to move faster during the early stages and slow down during fine-tuning. This is seen in optimizers like AdaGrad, RMSProp, and Adam.
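
As a rough sketch of how momentum and an RMSProp-style adaptive step size could be folded into the ascent update (illustrative only, with hypothetical names; not the package's actual implementation):

```python
import torch


def ascent_step(param, momentum_buf, sq_avg, lr=0.01, beta=0.9, decay=0.99, eps=1e-8):
    """One gradient-ascent step with momentum and an adaptive per-parameter rate.

    param:        tensor whose .grad holds the gradient of the objective to maximize
    momentum_buf: running average of gradients (same shape as param)
    sq_avg:       running average of squared gradients (same shape as param)
    """
    grad = param.grad
    # Momentum: accumulate a velocity along any consistent ascent direction.
    momentum_buf.mul_(beta).add_(grad, alpha=1 - beta)
    # Adaptive scaling: track the squared-gradient magnitude per parameter.
    sq_avg.mul_(decay).addcmul_(grad, grad, value=1 - decay)
    with torch.no_grad():
        # Ascend: step size shrinks where gradients have recently been large.
        param.addcdiv_(momentum_buf, sq_avg.sqrt().add_(eps), value=lr)
```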



# Applications
Gradient ascent with features like momentum and adaptive learning rates, as discussed, is tailored to the challenges of non-convex optimization landscapes. Here are some tasks and scenarios where this optimizer would be particularly beneficial:

1. **Maximizing Likelihoods**:
   - Many models in machine learning are framed as maximum likelihood estimation (MLE) problems, where the goal is to adjust parameters to maximize the likelihood of the observed data. Examples include Gaussian Mixture Models, Hidden Markov Models, and some neural network configurations. Gradient ascent is directly applicable in such cases; a minimal sketch appears after this list.

2. **Generative Adversarial Networks (GANs)**:
   - The generator in a GAN tries to maximize an objective function where it fools the discriminator. While traditional GANs use gradient descent with a flipped objective, using gradient ascent can be a more direct way to express this maximization problem.

3. **Game Theoretic Frameworks**:
   - In scenarios where multiple agents are in competition or cooperation, and their objectives are to maximize certain rewards, gradient ascent can be used. This applies to multi-agent reinforcement learning and certain types of equilibrium-seeking networks.

4. **Policy Gradient Methods in Reinforcement Learning**:
   - Policy gradient methods aim to maximize the expected reward by adjusting a policy in the direction that increases expected returns. Gradient ascent is a natural fit for this optimization problem.

5. **Eigenproblems**:
   - In tasks where the goal is to find the maximum eigenvalue of a matrix or the corresponding eigenvector, gradient ascent techniques can be applied.

6. **Feature Extraction and Representation Learning**:
   - When the goal is to learn features that maximize the variance or mutual information (e.g., Principal Component Analysis or some Information Maximization approaches), gradient ascent can be used to optimize the objective directly.

7. **Sparse Coding**:
   - The aim here is to find a representation that maximizes the sparsity under certain constraints. The problem can be reframed as a maximization problem solvable with gradient ascent.
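
For instance, here is a minimal sketch of application 1, fitting the mean of a Gaussian by ascending its log-likelihood directly (illustrative only; the data and variable names are assumptions):

```python
import torch

# Observed data assumed drawn from a Gaussian with unknown mean and known std = 1.
data = torch.tensor([2.1, 1.9, 2.4, 2.0, 1.8])
mu = torch.tensor(0.0, requires_grad=True)
lr = 0.1

for _ in range(200):
    # Log-likelihood of the data under N(mu, 1), up to an additive constant.
    log_lik = -0.5 * ((data - mu) ** 2).sum()
    if mu.grad is not None:
        mu.grad.zero_()
    log_lik.backward()
    with torch.no_grad():
        mu += lr * mu.grad  # ascend the likelihood surface

print(mu.item())  # converges to the sample mean (about 2.04)
```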

For scenarios with non-convex landscapes, features like **momentum** help escape shallow local maxima, and **adaptive learning rates** allow efficient traversal of the optimization landscape by adapting step sizes to the recent history of gradients.

However, while this optimizer can be effective in the above scenarios, one should always consider the specific nuances of the problem. It's essential to remember that no optimizer is universally the best, and empirical testing is often necessary to determine the most effective optimizer for a particular task.

# Benchmarks
`python benchmarks.py`

```
Benchmark 1: 9.999994277954102
Benchmark 2: 1.375625112855263e-23
Benchmark 3: -131395.9375
Benchmark 4: -333186848.0
Benchmark 5: -166376013824.0
Benchmark 6: 0.31278279423713684
Benchmark 7: [1.375625112855263e-23, 1.375625112855263e-23]
Benchmark 8: -28.793724060058594
Benchmark 9: 1.0
Benchmark 10: 0.8203693628311157
```

# License
MIT



# Todo
- Provide metric logging + make more dynamic
- Add more benchmarks
- Validate by training a small Hidden Markov Model or another model
            

Raw data

            {
    "_id": null,
    "home_page": "https://github.com/kyegomez/gradient-ascent",
    "name": "gradient-ascent",
    "maintainer": "",
    "docs_url": null,
    "requires_python": ">=3.6,<4.0",
    "maintainer_email": "",
    "keywords": "artificial intelligence,deep learning,optimizers,Prompt Engineering",
    "author": "Kye Gomez",
    "author_email": "kye@apac.ai",
    "download_url": "https://files.pythonhosted.org/packages/ad/4a/3a34b23e08e0e6c9b2bb46327e8240af60aa04ae1d13a0ae4cbacc646de8/gradient_ascent-0.0.2.tar.gz",
    "platform": null,
    "description": "[![Multi-Modality](agorabanner.png)](https://discord.gg/qUtxnK2NMf)\n\n# Gradient Ascent\nGradient Ascent is just the opposite of Gradient Descent. While Gradient Descent adjusts the parameters in the opposite direction of the gradient to minimize a loss function, Gradient Ascent adjusts the parameters in the direction of the gradient to maximize some objective function.\n\n\nI got the idea for this while playing basketball, I don't know why or how but this is my attempt to implement it.\n\n\n\n# Appreciation\n* Lucidrains\n* Agorians\n\n# Install\n`pip install gradient-ascent`\n\n# Usage\n```python\nimport torch\nfrom gradient_ascent.main import GradientAscent\n\n\nclass SimpleModel(torch.nn.Module):\n    def __init__(self):\n        super(SimpleModel, self).__init__()\n        self.fc = torch.nn.Linear(1, 1)\n\n    def forward(self, x):\n        return self.fc(x)\n\n\n# Test the optimizer\nmodel = SimpleModel()\noptimizer = GradientAscent(model.parameters(), lr=0.01)\n\n# General some sample data\ndata = torch.tensor([[2.0]])\ntarget = torch.tensor([[3.0]])\n\nfor _ in range(1000):\n    optimizer.zero_grad()\n    output = model(data)\n\n    # Negative loss as we are maximizing]\n    loss = -torch.nn.functional.mse_loss(output, target)\n    loss.backward()\n    optimizer.step()\n\nprint(\"Final output after training: \", model(data))\n\n```\n\n# Architecture\n\n### Theoretical Overview\nFor a function \\( f(\\theta) \\), the update step in gradient ascent is given by:\n\n\\[ \\theta_{new} = \\theta_{old} + \\alpha \\nabla f(\\theta_{old}) \\]\n\nWhere:\n- \\( \\theta \\) are the parameters.\n- \\( \\alpha \\) is the learning rate.\n- \\( \\nabla f(\\theta_{old}) \\) is the gradient of the function with respect to the parameters.\n\n### Gradient Ascent Pseudocode\n\n```\nAlgorithm: GradientAscentOptimizer\n\n1. Input: \n   - Objective function f(\u03b8)\n   - Initial parameters \u03b8\u2080\n   - Learning rate \u03b1\n   - Maximum iterations max_iter\n\n2. For iteration = 1 to max_iter:\n   a. Compute gradient: \u2207\u03b8 = gradient of f(\u03b8) w.r.t \u03b8\n   b. Update parameters: \u03b8 = \u03b8 + \u03b1 * \u2207\u03b8\n\n3. Return final parameters \u03b8\n```\n\n### 3 New Features\n1. Non-Convexity:\nMany problems in deep learning involve non-convex optimization landscapes. Gradient ascent, like gradient descent, can get stuck in local maxima when dealing with such landscapes. Adding mechanisms to escape from these local optima can be necessary.\n\n2. Momentum:\nMomentum can be integrated to accelerate gradient vectors in any consistent direction, which can help in faster convergence and also in avoiding getting stuck in shallow local maxima.\n\n3. Adaptive Learning Rates:\nThe learning rate might need to adapt based on the recent history of gradients, allowing the optimization to move faster during the early stages and slow down during fine-tuning. This is seen in optimizers like AdaGrad, RMSProp, and Adam.\n\n\n\n# Applications:\nThe Gradient Ascent with features like momentum and adaptive learning rates, as discussed, is tailored to handle challenges in non-convex optimization landscapes. Here are some tasks and scenarios where this optimizer would be particularly beneficial:\n\n1. **Maximizing Likelihoods**: \n   - Many models in machine learning are framed as maximum likelihood estimation (MLE) problems, where the goal is to adjust parameters to maximize the likelihood of the observed data. 
Examples include Gaussian Mixture Models, Hidden Markov Models, and some Neural Network configurations. Gradient ascent is directly applicable in such cases.\n\n2. **Generative Adversarial Networks (GANs)**:\n   - The generator in a GAN tries to maximize an objective function where it fools the discriminator. While traditional GANs use gradient descent with a flipped objective, using gradient ascent can be a more direct way to express this maximization problem.\n\n3. **Game Theoretic Frameworks**:\n   - In scenarios where multiple agents are in competition or cooperation, and their objectives are to maximize certain rewards, gradient ascent can be used. This applies to multi-agent reinforcement learning and certain types of equilibrium-seeking networks.\n\n4. **Policy Gradient Methods in Reinforcement Learning**:\n   - Policy gradient methods aim to maximize the expected reward by adjusting a policy in the direction that increases expected returns. Gradient ascent is a natural fit for this optimization problem.\n\n5. **Eigenproblems**:\n   - In tasks where the goal is to find the maximum eigenvalue of a matrix or the corresponding eigenvector, gradient ascent techniques can be applied.\n\n6. **Feature Extraction and Representation Learning**:\n   - When the goal is to learn features that maximize the variance or mutual information (e.g., Principal Component Analysis or some Information Maximization approaches), gradient ascent can be used to optimize the objective directly.\n\n7. **Sparse Coding**:\n   - The aim here is to find a representation that maximizes the sparsity under certain constraints. The problem can be reframed as a maximization problem solvable with gradient ascent.\n\nFor scenarios with non-convex landscapes, the features like **momentum** help escape shallow local maxima, and **adaptive learning rates** ensure efficient traversal of the optimization landscape, adapting the step sizes based on the gradient's recent history.\n\nHowever, while this optimizer can be effective in the above scenarios, one should always consider the specific nuances of the problem. It's essential to remember that no optimizer is universally the best, and empirical testing is often necessary to determine the most effective optimizer for a particular task.\n\n# Benchmarks\n`python benchmarks.py`\n\n```\nBenchmark 1: 9.999994277954102\nBenchmark 2: 1.375625112855263e-23\nBenchmark 3: -131395.9375\nBenchmark 4: -333186848.0\nBenchmark 5: -166376013824.0\nBenchmark 6: 0.31278279423713684\nBenchmark 7: [1.375625112855263e-23, 1.375625112855263e-23]\nBenchmark 8: -28.793724060058594\nBenchmark 9: 1.0\nBenchmark 10: 0.8203693628311157\n```\n\n# License\nMIT\n\n\n\n# Todo\n- Provide metric logging + make more dynamic\n- Add more benchmarks\n- Validate by training a small Hidden Markov Model or another model",
    "bugtrack_url": null,
    "license": "MIT",
    "summary": "Gradient Ascent - Pytorch",
    "version": "0.0.2",
    "project_urls": {
        "Homepage": "https://github.com/kyegomez/gradient-ascent",
        "Repository": "https://github.com/kyegomez/gradient-ascent"
    },
    "split_keywords": [
        "artificial intelligence",
        "deep learning",
        "optimizers",
        "prompt engineering"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "caab9cc7bf410c14a0a564fd9c74f6deceb92a7821c0c09eabab256d7d601a7f",
                "md5": "fffde81149310ea8150907ca2b97d907",
                "sha256": "69372838d62e647becd6854abaeaba98815cbd6fa8c2dd841202ee0175249779"
            },
            "downloads": -1,
            "filename": "gradient_ascent-0.0.2-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "fffde81149310ea8150907ca2b97d907",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.6,<4.0",
            "size": 6060,
            "upload_time": "2023-10-21T16:47:31",
            "upload_time_iso_8601": "2023-10-21T16:47:31.746131Z",
            "url": "https://files.pythonhosted.org/packages/ca/ab/9cc7bf410c14a0a564fd9c74f6deceb92a7821c0c09eabab256d7d601a7f/gradient_ascent-0.0.2-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "ad4a3a34b23e08e0e6c9b2bb46327e8240af60aa04ae1d13a0ae4cbacc646de8",
                "md5": "4302747a3c1d2fe05bddcd0a1bf95a76",
                "sha256": "aebdb82c39b0c1dfe04046e65f4a0fa1023c81f1656a6d4cfd22ae66fa4a01d2"
            },
            "downloads": -1,
            "filename": "gradient_ascent-0.0.2.tar.gz",
            "has_sig": false,
            "md5_digest": "4302747a3c1d2fe05bddcd0a1bf95a76",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.6,<4.0",
            "size": 6360,
            "upload_time": "2023-10-21T16:47:33",
            "upload_time_iso_8601": "2023-10-21T16:47:33.319909Z",
            "url": "https://files.pythonhosted.org/packages/ad/4a/3a34b23e08e0e6c9b2bb46327e8240af60aa04ae1d13a0ae4cbacc646de8/gradient_ascent-0.0.2.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2023-10-21 16:47:33",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "kyegomez",
    "github_project": "gradient-ascent",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": true,
    "requirements": [],
    "lcname": "gradient-ascent"
}
        