# Bayesian LoRA
Code for the paper [Bayesian Low-Rank Adaptation for Large Language Models](https://openreview.net/forum?id=FJiUyzOF1m).
See the explanatory [blog post](https://maximerobeyns.com/bayesian_lora) and [documentation](https://maximerobeyns.github.io/bayesian_lora/).
## Installation
```bash
pip install bayesian-lora
```
# Example
We provide a comprehensive example in `examples/example_usage.py`, running
through the main methods using Phi-2 on ARC-E.
Note that running this requires a local installation with a few extra
dependencies. Run:
```bash
git clone https://github.com/MaximeRobeyns/bayesian_lora
cd bayesian_lora
pip install -e ".[examples]"
```
and then
```bash
python ./examples/example_usage.py
```
The main functions this library provides are for calculating Kronecker factors,
the marginal likelihood, and the posterior predictive distribution. We show how
to use these in the examples below.
## Calculating (low-rank) Kronecker factors
First, wrap your model call in a function that takes a batch from your data
loader, and returns the relevant logits. For a CausalLM from HuggingFace:
```python
import torch as t
from torch import nn
from typing import Any

def fwd_call(model: nn.Module, batch_prompts: Any) -> t.Tensor:
    # NOTE: the tokenizer must return PyTorch tensors for the call below to work
    inputs = tokenizer(batch_prompts, padding=True, return_tensors="pt").to(device)
    outputs = model(**inputs)
    logits = outputs.logits[:, -1]  # logits at the last token position
    return logits
```
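The snippet above assumes that a `tokenizer`, a `device`, and a LoRA-adapted `model` are already in scope. A minimal, illustrative setup might look like the following; the model name, LoRA configuration, and prompt data are placeholders rather than part of this library:
```python
import torch as t
from torch.utils.data import DataLoader
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

device = "cuda" if t.cuda.is_available() else "cpu"
model_name = "microsoft/phi-2"  # placeholder; any HuggingFace CausalLM works

tokenizer = AutoTokenizer.from_pretrained(model_name)
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token  # enable batch padding

model = AutoModelForCausalLM.from_pretrained(model_name).to(device)
model = get_peft_model(
    model, LoraConfig(task_type="CAUSAL_LM", r=8, target_modules=["q_proj", "v_proj"])
)

# Placeholder data loader over raw prompt strings; substitute your own dataset
train_prompts = ["Question: ...\nAnswer:", "Question: ...\nAnswer:"]
train_loader = DataLoader(train_prompts, batch_size=2)
```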
You can now call our `calculate_kronecker_factors` function:
```python
from bayesian_lora import calculate_kronecker_factors
factors = calculate_kronecker_factors(
    model,            # Your model (not necessarily PEFT)
    fwd_call,         # Model call wrapper, defined above
    train_loader,     # Your training data loader
    cfg.n_kfac,       # (Optional) rank to use
    cfg.lr_threshold, # (Optional) threshold for low-rank approximation
    ["lora"],         # Modules to target
    use_tqdm=True,    # (Optional) use tqdm for progress bar
)
```
In the above, the `["lora"]` argument is a case-insensitive list of keywords
used to identify the modules to target. Since we're working with a LoRA model,
we use `"lora"` to match the adapter modules (e.g. `layers.0.q_proj.lora_A`, etc.).
The returned `factors` is a dictionary whose keys are the full names of the
targeted modules, and whose values are tuples of two tensors: the first is the
(possibly low-rank) Kronecker factor corresponding to the input activations,
and the second is the (possibly low-rank) factor corresponding to the output
gradients.
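As a quick sanity check, you can iterate over this dictionary and inspect the factor shapes (a minimal sketch; the keys depend on your model's module names):
```python
for name, (act_factor, grad_factor) in factors.items():
    # First tensor: input-activation factor; second: output-gradient factor
    print(f"{name}: {tuple(act_factor.shape)}, {tuple(grad_factor.shape)}")
```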
See [the K-FAC docs](https://maximerobeyns.github.io/bayesian_lora/kfac.html)
for more detail.
## Model Evidence
We provide a function called `model_evidence` which returns the evidence /
marginal likelihood.
```python
from bayesian_lora import model_evidence
evidence = model_evidence(
    model,          # Your model
    log_likelihood, # A Tensor with model's log likelihood on some eval dataset
    factors,        # Kronecker factors, as calculated above
    n_lora,         # rank used in the LoRA adapters
    n_kfac,         # rank used in the Kronecker factors
    prior_var,      # prior variance hyperparameter, as a tensor
)
```
You can then use `evidence` (or its negative, if your optimiser minimises) as
the objective in a normal training loop, provided the parameters you want to
optimise (e.g. `prior_var`) have gradients enabled.
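For instance, a minimal illustrative loop for tuning the prior variance by maximising the evidence might look like this (the optimiser, learning rate, and iteration count are placeholders; `log_likelihood`, `factors`, `n_lora`, and `n_kfac` are assumed to be set up as above):
```python
import torch as t

prior_var = t.tensor(1.0, requires_grad=True)
opt = t.optim.Adam([prior_var], lr=1e-2)

for _ in range(100):
    opt.zero_grad()
    # Maximise the evidence by minimising its negative
    loss = -model_evidence(model, log_likelihood, factors, n_lora, n_kfac, prior_var)
    loss.backward()
    opt.step()
```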
## Posterior Predictive Distribution
To get the parameters of the Gaussian over the logits, use
the `jacobian_mean` and `variance` functions.
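Roughly speaking (this is a sketch in standard linearised-Laplace notation, not the library's exact K-FAC expressions), the predictive over the logits at an input $x$ is Gaussian,
```math
f(x) \sim \mathcal{N}\big(f_\mu(x),\, f_\Sigma(x)\big), \qquad
f_\mu(x) = f_{\theta_{\mathrm{MAP}}}(x), \qquad
f_\Sigma(x) = \mathcal{J}(x)\, \Sigma\, \mathcal{J}(x)^\top,
```
where $\mathcal{J}(x)$ is the Jacobian of the logits with respect to the (LoRA) parameters at the MAP estimate, and $\Sigma$ is the approximate posterior covariance. `jacobian_mean` returns $\mathcal{J}$ and $f_\mu$, while `variance` computes $f_\Sigma$.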
```python
with t.no_grad():
    for batch in validation_loader:
        prompts, classes = batch

        batch_inputs = tokenizer(prompts)

        # Predict the output logit locations
        # target_ids is a tensor containing the indices of the target tokens
        # e.g. [354, 355, 356].
        jacobian, f_mu = jacobian_mean(
            model, batch_inputs, target_ids
        )

        # Predict the output logit variances
        f_var = variance(
            batch_inputs,     # inputs
            jacobian,         # the Jacobian dictionary, obtained above
            factors,          # Kronecker factors, as calculated above
            prior_var,        # prior variance hyperparameter, as a tensor
            classes.size(-1), # number of classes to predict
            n_lora,           # rank of the LoRA adapters
            n_kfac,           # rank of the Kronecker factors
            device,           # device to use
        )

        # Now use the parameters to e.g. sample logits from the Gaussian
        # predictive, parametrised by f_mu, f_var
        L = t.linalg.cholesky(f_var)
        samples = 100_000
        f_mu = f_mu.expand(samples, *f_mu.shape)
        L = L.expand(samples, *L.shape)
        eps = t.randn_like(f_mu)
        logits = (f_mu + L @ eps).squeeze(-1).softmax(-1).mean(0)
```
The above is a minimal example; see [this
section](https://maximerobeyns.github.io/bayesian_lora/bayesian_lora.html#posterior-predictive)
of the documentation for more detail.
# Development
This library is intentionally small and hackable. It has two main files and
three dependencies (`torch`, `tqdm`, and `jaxtyping`):
- `main.py` contains methods specific to [the paper](https://openreview.net/forum?id=FJiUyzOF1m),
- `kfac.py` contains relatively portable K-FAC methods.
Feel free to directly copy the code into your projects and hack on it.