bayesian-lora

Name: bayesian-lora
Version: 0.0.5
Summary: Bayesian LoRA adapters for Language Models
Author: Maxime Robeyns <dev@maximerobeyns.com>
Homepage: https://github.com/MaximeRobeyns/bayesian_lora
Documentation: https://maximerobeyns.github.io/bayesian_lora/
Requires Python: >=3.8
License: Apache-2.0
Keywords: bayes, llm, lora, machine learning, uncertainty
Upload time: 2024-02-19 10:19:07
# Bayesian LoRA

Code for the paper [Bayesian Low-Rank Adaptation for Large Language Models](https://openreview.net/forum?id=FJiUyzOF1m).

See the explanatory [blog post](https://maximerobeyns.com/bayesian_lora) and [documentation](https://maximerobeyns.github.io/bayesian_lora/).

## Installation

```bash
pip install bayesian-lora
```

# Example

We provide a comprehensive example in `examples/example_usage.py`, running
through the main methods using Phi-2 on ARC-E.

Note that running this requires a local installation with a few extra
dependencies. Run:
```bash
git clone https://github.com/MaximeRobeyns/bayesian_lora
cd bayesian_lora
pip install -e ".[examples]"
```
and then
```bash
python ./examples/example_usage.py
```

The main functions this library provides are for calculating Kronecker factors,
the marginal likelihood, and the posterior predictive distribution. We show how
to use these in the examples below.

## Calculating (low-rank) Kronecker factors

First, wrap your model call in a function that takes a batch from your data
loader and returns the relevant logits. For a CausalLM from HuggingFace, this
might look like:

```python
from typing import Any

import torch as t
import torch.nn as nn

def fwd_call(model: nn.Module, batch_prompts: Any) -> t.Tensor:
    # `tokenizer` and `device` are assumed to be defined in the surrounding scope
    inputs = tokenizer(batch_prompts, return_tensors="pt", padding=True).to(device)
    outputs = model(**inputs)
    logits = outputs.logits[:, -1]  # last-token logits (assumes left padding for batched prompts)
    return logits
```
You can now call our `calculate_kronecker_factors` function:
```python
from bayesian_lora import calculate_kronecker_factors

factors = calculate_kronecker_factors(
    model,            # Your model (not necessarily PEFT)
    fwd_call,         # Model call wrapper, defined above
    train_loader,     # Your training data loader
    cfg.n_kfac,       # (Optional) rank to use
    cfg.lr_threshold, # (Optional) threshold for low-rank approximation
    ["lora"],         # modules to target
    use_tqdm=True,    # (Optional) use tqdm for progress bar
)
```
In the above, the `["lora"]` argument is a case-insensitive list of keywords
used to identify the modules to target. Since we're working with a LoRA model,
we pass `"lora"` to target the adapter modules (e.g. `layers.0.q_proj.lora_A`, etc.).
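
As a quick sanity check (using standard `torch.nn.Module` introspection rather
than a function from this library), you can preview which module names a
keyword would match:

```python
# List module names containing the (case-insensitive) keyword "lora".
matched = [name for name, _ in model.named_modules() if "lora" in name.lower()]
print(matched[:5])
```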

The returned `factors` is a dictionary whose keys are the full names of the
targeted modules, and whose values are tuples of two tensors: the first is the
(possibly low-rank) Kronecker factor corresponding to the input activations,
and the second is the (possibly low-rank) factor corresponding to the output
gradients.
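
For instance, a minimal sketch of inspecting the returned factors (the exact
shapes depend on the module sizes and the rank `n_kfac` you chose):

```python
# Print the shapes of the activation and output-gradient factors per targeted module.
for name, (A, S) in factors.items():
    print(f"{name}: activation factor {tuple(A.shape)}, gradient factor {tuple(S.shape)}")
```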

See [the K-FAC docs](https://maximerobeyns.github.io/bayesian_lora/kfac.html)
for more detail.

## Model Evidence

We provide a function called `model_evidence`, which returns the model
evidence (marginal likelihood).

```python
from bayesian_lora import model_evidence

evidence = model_evidence(
    model,           # Your model
    log_likelihood,  # A Tensor with model's log likelihood on some eval dataset
    factors,         # Kronecker factors, as calculated above
    n_lora,          # rank used in the LoRA adapters
    n_kfac,          # rank used in the Kronecker factors
    prior_var,       # prior variance hyperparameter, as a tensor
)
```

You can then use `evidence` as the objective in a normal training loop,
provided the parameters you want to tune (e.g. `prior_var`) have gradients.
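
For example, here is a minimal sketch (not the project's training script) of
tuning the prior variance by maximising the model evidence; it assumes `model`,
`log_likelihood`, `factors`, `n_lora`, and `n_kfac` are defined as above, and
optimises the log of `prior_var` to keep it positive:

```python
import torch as t

log_prior_var = t.tensor(0.0, requires_grad=True)  # optimise the log to keep the variance positive
opt = t.optim.Adam([log_prior_var], lr=1e-2)

for _ in range(100):
    opt.zero_grad()
    # Maximise the evidence by minimising its negative
    neg_evidence = -model_evidence(
        model, log_likelihood, factors, n_lora, n_kfac, log_prior_var.exp()
    )
    neg_evidence.backward()
    opt.step()
```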

## Posterior Predictive Distribution

To get the parameters of the Gaussian over the logits, use
the `jacobian_mean` and `variance` functions.

```python
from bayesian_lora import jacobian_mean, variance  # assumed exported from the top-level package, like the imports above

with t.no_grad():
    for batch in validation_loader:
        prompts, classes = batch

        batch_inputs = tokenizer(prompts)

        # Predict the output logit locations
        # target_ids is a tensor containing the indices of the target tokens
        # e.g. [354, 355, 356].
        jacobian, f_mu = jacobian_mean(
            model, batch_inputs, target_ids
        )

        # Predict the output logit variances
        f_var = variance(
            batch_inputs,     # inputs
            jacobian,         # the Jacobian dictionary, obtained above
            factors,          # Kronecker factors, as calculated above
            prior_var,        # prior variance hyperparameter, as a tensor
            classes.size(-1), # number of classes to predict
            n_lora,           # rank of the LoRA adapters
            n_kfac,           # rank of the Kronecker factors
            device,           # device to use
        )

        # Now use the parameters to e.g. sample logits from the Gaussian
        # predictive, parametrised by f_mu, f_var
        L = t.linalg.cholesky(f_var)
        samples = 100_000
        f_mu = f_mu.expand(samples, *f_mu.shape)
        L = L.expand(samples, *L.shape)
        eps = t.randn_like(f_mu)
        logits = (f_mu + L @ eps).squeeze(-1).softmax(-1).mean(0)
```

The above is a minimal example; see [this
section](https://maximerobeyns.github.io/bayesian_lora/bayesian_lora.html#posterior-predictive)
of the documentation for more detail.
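
If you just need point predictions, the Monte-Carlo averaged probabilities
computed at the end of the loop above (stored in `logits`, which after the
softmax actually holds class probabilities) can be used directly, for example:

```python
# Class with the highest posterior-predictive probability, per example.
preds = logits.argmax(-1)            # shape: (batch,)
confidences = logits.max(-1).values  # averaged predictive probability of that class
```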

# Development

This library is intentionally very small and hackable. It has two main files
and three dependencies (`torch`, `tqdm`, and `jaxtyping`).

- `main.py` contains methods specific to [the paper](https://openreview.net/forum?id=FJiUyzOF1m), and
- `kfac.py` contains relatively portable K-FAC methods.

Feel free to directly copy the code into your projects and hack on it.

            
