<h4 align="center">
<p>
<a href="https://transformer-heads.readthedocs.io/en/latest/">Documentation</a> |
<a href="docs/source/getting_started.md">Getting Started</a> |
<a href="https://www.reddit.com/r/LocalLLaMA/comments/1bnd621/new_library_transformerheads_for_attaching_heads/">Reddit Post with more info</a>
</p>
</h4>
# Transformer Heads
This library aims to be an all-round toolkit for attaching, training, saving, and loading new heads for transformer models.
A new head could be:
* A [linear probe](https://arxiv.org/pdf/1610.01644.pdf) used to get an understanding of the information processing in a transformer architecture
* A head to be finetuned jointly with the weights of a pretrained transformer model to perform a completely different kind of task.
 - E.g., a transformer pretrained for causal language modelling could have a sequence classification head attached and be finetuned for sentiment classification.
- Or one could attach a regression head to turn a large language model into a value function for a reinforcement learning problem.
On top of that, attaching multiple heads at once makes multi-task learning easy and enables training very general models.
## Installation
Install from PyPI: `pip install transformer-heads`.
Or clone this repo and run, from its root:
`pip install -e .`
## Usage
Create head configurations:
```python
from transformer_heads import HeadConfig

head_config = HeadConfig(
    name="imdb_head_3",
    layer_hook=-3,  # Attach at the output of the third-to-last transformer block
    in_size=hidden_size,
    output_activation="linear",
    pred_for_sequence=True,
    loss_fct="cross_entropy",
    num_outputs=2,
    target="label",  # The name of the ground-truth column in the dataset
)
```
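Note that `in_size` must match the hidden dimension of the transformer blocks you hook into. A minimal sketch of one way to look it up, assuming the Llama 2 checkpoint used below:
```python
from transformers import AutoConfig

# The head's in_size must match the base model's hidden dimension.
hidden_size = AutoConfig.from_pretrained("meta-llama/Llama-2-7b-hf").hidden_size
```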
Create a model with your head from a pretrained transformer model:
```python
from transformers import LlamaForCausalLM

from transformer_heads import load_headed

model = load_headed(
    LlamaForCausalLM,
    "meta-llama/Llama-2-7b-hf",
    head_configs=[head_config],
)
```
Train your model using (for example) the simple-to-use Hugging Face *Trainer* interface:
```python
from transformers import Trainer

trainer = Trainer(
    model,
    args=args,
    train_dataset=imdb_dataset["train"],
    data_collator=collator,
)
```
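Here, `args` is a standard `transformers.TrainingArguments` and `imdb_dataset`/`collator` come from the usual Hugging Face tooling; a minimal sketch with hypothetical hyperparameters:
```python
from transformers import TrainingArguments

# Hypothetical settings; tune for your hardware and dataset.
args = TrainingArguments(
    output_dir="imdb_head_out",
    per_device_train_batch_size=4,
    learning_rate=2e-5,
    num_train_epochs=1,
)
```
Training then starts with the usual `trainer.train()` call.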
For a more in-depth introduction and a fully working example, check the [linear probe notebook](notebooks/gpt2/linear_probe.ipynb).
## Joint training of multiple linear probes
![_images/multi_linear_probe.svg](_images/multi_linear_probe.svg)
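Conceptually, the setup in the figure is just a list of head configurations, one per hooked layer, passed to `load_headed`. A sketch under the same assumptions as the Usage example above (names and layer choices hypothetical):
```python
from transformers import LlamaForCausalLM

from transformer_heads import HeadConfig, load_headed

hidden_size = 4096  # Hidden dimension of Llama-2-7b

# One linear probe per hooked transformer block.
probe_configs = [
    HeadConfig(
        name=f"imdb_head_{i}",
        layer_hook=-i,  # Hook the i-th block counted from the end
        in_size=hidden_size,
        output_activation="linear",
        pred_for_sequence=True,
        loss_fct="cross_entropy",
        num_outputs=2,
        target="label",
    )
    for i in range(1, 4)
]
model = load_headed(
    LlamaForCausalLM,
    "meta-llama/Llama-2-7b-hf",
    head_configs=probe_configs,
)
```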
## Notebooks
This repository contains multiple Jupyter notebooks that serve as tutorials and illustrations of how to do certain things with this library. Here is an overview of which notebook to check out, depending on the use case you are interested in.
* Linear Probes (understanding the inner workings of transformers)
- Basic example with one probe for causal LM: [notebooks/gpt2/linear_probe.ipynb](notebooks/gpt2/linear_probe.ipynb)
- Train many probes for causal LM at once: [notebooks/gpt2/multi_linear_probe.ipynb](notebooks/gpt2/multi_linear_probe.ipynb)
- Train many probes for text classification at once: [notebooks/gpt2/text_classification_linear_probe.ipynb](notebooks/gpt2/text_classification_linear_probe.ipynb)
* Finetuning on a new type of task (with a new head)
- QLoRA: [notebooks/gpt2/text_classification_qlora.ipynb](notebooks/gpt2/text_classification_qlora.ipynb)
- Full finetuning: [notebooks/gpt2/text_classification_full_finetune.ipynb](notebooks/gpt2/text_classification_full_finetune.ipynb)
* Joint multi-task learning
- Many heads doing completely different tasks + QLoRA, all trained at the same time: [notebooks/gpt2/joint_multitask_learning.ipynb](notebooks/gpt2/joint_multitask_learning.ipynb)
* Regression with pretrained transformers
- Check the regression heads of this notebook: [notebooks/gpt2/joint_multitask_learning.ipynb](notebooks/gpt2/joint_multitask_learning.ipynb)
* Saving and loading
- Notebook: [notebooks/gpt2/saving_and_loading.ipynb](notebooks/gpt2/saving_and_loading.ipynb)
- Tests: [transformer_heads/tests/test_load_model.py](transformer_heads/tests/test_load_model.py)
## Joint multi-task training with different types of heads and QLoRA
![_images/example_architecture.svg](_images/example_architecture.svg)
## More custom loss functions and models
At the time of writing, only a subset of loss functions and models is supported out of the box: the supported models are `Mistral-7b`, `LLaMA 2` (all model sizes), and `gpt2`. Check [transformer_heads/constants.py](transformer_heads/constants.py) for more up-to-date info.
However, it is not hard to add or use different loss functions and models. You just need to add the relevant entries to `loss_fct_map` and `model_type_map`, both of which can be imported from `transformer_heads.constants`. To add a loss function, add a mapping from a string name to a torch loss class. To add a model, add a mapping from the model type to a 2-tuple of the attribute name under which the base model is stored in the model class and the base model class itself. That may sound confusing, but it amounts to just the following:
```python
import torch.nn as nn
from transformers import MistralModel

from transformer_heads.constants import loss_fct_map, model_type_map

# Register a new loss function under the name "bce"
loss_fct_map["bce"] = nn.BCELoss()
# MistralForCausalLM stores its base model in the `model` attribute
model_type_map["mistral"] = ("model", MistralModel)
```
## Can my transformer architecture be supported?
One of the basic assumptions of this library is that there is a transformer class, such as Hugging Face's LlamaForCausalLM, that has an [attribute pointing to a base model that outputs raw hidden states](https://github.com/huggingface/transformers/blob/7eb3ba82241c927053689270a0751f4ff5d33c54/src/transformers/models/llama/modeling_llama.py#L1116). If your transformer model is built in a similar way, adding support may be as easy as adding an entry to the [model_type_map](https://github.com/center-for-humans-and-machines/transformer-heads/blob/8ea0805ab95ca01dff7ea73ed9c844df946c17cb/transformer_heads/constants.py#L20) with the name of the attribute and the class of the base model. You can either do that by importing from [constants.py](transformer_heads/constants.py) or by adding it directly and creating a pull request.
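For example, `GPTNeoXForCausalLM` stores its base model in a `gpt_neox` attribute, so support for it might look like the following (an untested sketch, not part of the shipped map):
```python
from transformers import GPTNeoXModel

from transformer_heads.constants import model_type_map

# GPTNeoXForCausalLM keeps its base GPTNeoXModel in the `gpt_neox` attribute.
model_type_map["gpt_neox"] = ("gpt_neox", GPTNeoXModel)
```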