<h4 align="center">
<p>
<a href="https://transformer-heads.readthedocs.io/en/latest/">Documentation</a> |
<a href="docs/source/getting_started.md">Getting Started</a> |
<a href="https://www.reddit.com/r/LocalLLaMA/comments/1bnd621/new_library_transformerheads_for_attaching_heads/">Reddit Post with more info</a>
</p>
</h4>
# Transformer Heads
This library aims to be an all-round toolkit for attaching, training, saving, and loading new heads for transformer models.
A new head could be:
* A [linear probe](https://arxiv.org/pdf/1610.01644.pdf) used to get an understanding of the information processing in a transformer architecture
* A head to be finetuned jointly with the weights of a pretrained transformer model to perform a completely different kind of task.
- E.g. a transformer pretrained to do causal language modelling could get a sequence classification head attached and be finetuned to do sentiment classification.
- Or one could attach a regression head to turn a large language model into a value function for a reinforcement learning problem.
On top of that, attaching multiple heads at once can make multi-task learning easy, making it possible to train very general models.
## Installation
Install from PyPI: `pip install transformer-heads`.
Or clone this repo and run the following from its root:
`pip install -e .`
## Usage
Create a head configuration:
```python
head_config = HeadConfig(
    name="imdb_head_3",
    layer_hook=-3,  # attach at the output of the third-to-last transformer block
    in_size=hidden_size,
    output_activation="linear",
    pred_for_sequence=True,
    loss_fct="cross_entropy",
    num_outputs=2,
    target="label",  # name of the ground-truth column in the dataset
)
```
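`hidden_size` has to match the base model's hidden dimension, and more than one head can be attached at once. Below is a hedged sketch showing one way to obtain `hidden_size` plus a second, purely illustrative head: a single-output regression head, e.g. for the value-function use case mentioned above. The `"mse"` loss key and the `"reward"` target column are assumptions, not names taken from the library; check [transformer_heads/constants.py](transformer_heads/constants.py) and your dataset for the real ones.
```python
from transformers import AutoConfig

# hidden_size must match the base model's hidden dimension (4096 for Llama-2-7B).
hidden_size = AutoConfig.from_pretrained("meta-llama/Llama-2-7b-hf").hidden_size

# Illustrative second head: a single-output regression head at the last block,
# e.g. to use the model as a value function. "mse" and "reward" are assumptions.
value_head_config = HeadConfig(
    name="value_head",
    layer_hook=-1,  # output of the last transformer block
    in_size=hidden_size,
    output_activation="linear",
    pred_for_sequence=True,
    loss_fct="mse",
    num_outputs=1,
    target="reward",  # hypothetical ground-truth column
)
```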
Create a model with your head from a pretrained transformer model:
```python
model = load_headed(
    LlamaForCausalLM,
    "meta-llama/Llama-2-7b-hf",
    head_configs=[head_config],
)
```
Train your model using (for example) the simple-to-use Hugging Face *Trainer* interface:
```python
trainer = Trainer(
    model,
    args=args,
    train_dataset=imdb_dataset["train"],
    data_collator=collator,
)
```
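The snippet above assumes `args` and `collator` already exist. A minimal sketch of what they could look like follows; the hyperparameters are placeholders and the padding collator is a generic stand-in, while the notebooks show the exact setup used with this library.
```python
from transformers import AutoTokenizer, DataCollatorWithPadding, TrainingArguments

tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")

# Placeholder hyperparameters; adjust for your hardware and dataset.
args = TrainingArguments(
    output_dir="imdb_head_out",
    per_device_train_batch_size=4,
    num_train_epochs=1,
    learning_rate=2e-4,
    logging_steps=50,
    report_to="none",
)

# A generic padding collator; the notebooks show the collator that matches
# this library's extra target columns.
collator = DataCollatorWithPadding(tokenizer)
```
With these defined, `trainer.train()` runs fine-tuning as with any Hugging Face model.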
For a more in-depth introduction and a fully working example, check the [linear probe notebook](notebooks/gpt2/linear_probe.ipynb).
## Joint training of multiple linear probes
![_images/multi_linear_probe.svg](_images/multi_linear_probe.svg)
## Notebooks
This repository contains multiple Jupyter notebooks that serve as tutorials/illustrations of how to do certain things with this library. Here is an overview of which notebook to check out depending on the use case you are interested in.
* Linear Probes (understanding the inner workings of transformers)
- Basic example with one probe for causal LM: [notebooks/gpt2/linear_probe.ipynb](notebooks/gpt2/linear_probe.ipynb)
- Train many probes for causal LM at once: [notebooks/gpt2/multi_linear_probe.ipynb](notebooks/gpt2/multi_linear_probe.ipynb)
- Train many probes for text classification at once: [notebooks/gpt2/text_classification_linear_probe.ipynb](notebooks/gpt2/text_classification_linear_probe.ipynb)
* Finetuning on a new type of task (with a new head)
- QLoRA: [notebooks/gpt2/text_classification_qlora.ipynb](notebooks/gpt2/text_classification_qlora.ipynb)
- Full finetuning: [notebooks/gpt2/text_classification_full_finetune.ipynb](notebooks/gpt2/text_classification_full_finetune.ipynb)
* Joint multi-task learning
- Many heads doing completely different tasks + QLoRA, all trained at the same time: [notebooks/gpt2/joint_multitask_learning.ipynb](notebooks/gpt2/joint_multitask_learning.ipynb)
* Regression with pretrained transformers
- Check the regression heads of this notebook: [notebooks/gpt2/joint_multitask_learning.ipynb](notebooks/gpt2/joint_multitask_learning.ipynb)
* Saving and loading
- Notebook: [notebooks/gpt2/saving_and_loading.ipynb](notebooks/gpt2/saving_and_loading.ipynb)
- Tests: [transformer_heads/tests/test_load_model.py](transformer_heads/tests/test_load_model.py)
## Joint multi-task training with different types of heads and QLoRA
![_images/example_architecture.svg](_images/example_architecture.svg)
## Adding custom loss functions and models
At the time of writing, only a subset of loss functions is supported out of the box. Check [transformer_heads/constants.py](transformer_heads/constants.py) for up-to-date info.

However, adding different loss functions or models is not hard. You just need to add the respective information to `loss_fct_map` and `model_type_map`, both importable from `transformer_heads.constants`. To add a loss function, add a mapping from a string key to a torch loss instance. To add a model, add a mapping from its model type to a 2-tuple of the attribute name under which the base model is stored in the model class and the base model class itself. In code, that looks like this:
```python
from transformer_heads.constants import model_type_map, loss_fct_map
import torch.nn as nn
from transformers import MistralModel
loss_fct_map["bce"] = nn.BCELoss()
model_type_map["mistral"] = ("model", MistralModel)
```
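As a hedged follow-up sketch (the model name, field values, and import paths here are illustrative, not taken from the library's docs), the newly registered entries could then be used just like in the earlier example:
```python
from transformers import MistralForCausalLM
from transformer_heads import HeadConfig, load_headed  # import path assumed; see the notebooks

binary_head = HeadConfig(
    name="binary_head",
    layer_hook=-1,
    in_size=4096,  # hidden size of Mistral-7B
    output_activation="linear",
    pred_for_sequence=True,
    loss_fct="bce",  # the key registered above; nn.BCEWithLogitsLoss may be the
                     # safer registration if the head outputs raw logits
    num_outputs=1,
    target="label",
)

model = load_headed(
    MistralForCausalLM,
    "mistralai/Mistral-7B-v0.1",
    head_configs=[binary_head],
)
```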
## Can my transformer architecture be supported?
One of the basic assumptions of my library is that there is a transformer class, such as Hugging Face's LlamaForCausalLM, that has an [attribute pointing to a base model that outputs raw hidden states](https://github.com/huggingface/transformers/blob/7eb3ba82241c927053689270a0751f4ff5d33c54/src/transformers/models/llama/modeling_llama.py#L1116). If your transformer model is built up in a similar way, adding support may be as easy as adding an entry to the [model_type_map](https://github.com/center-for-humans-and-machines/transformer-heads/blob/8ea0805ab95ca01dff7ea73ed9c844df946c17cb/transformer_heads/constants.py#L20) with the name of the attribute and the class of the base model. You can either do that by importing from [constants.py](transformer_heads/constants.py) or by adding it directly and creating a pull request.
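For example, a hedged sketch of a hypothetical GPT-NeoX entry is shown below; verify the attribute name against the `transformers` source for your version (in recent versions, `GPTNeoXForCausalLM` stores its base model in the `gpt_neox` attribute):
```python
from transformer_heads.constants import model_type_map
from transformers import GPTNeoXModel

# Key: the model type string from the model's config.
# Value: (attribute holding the base model, base model class).
model_type_map["gpt_neox"] = ("gpt_neox", GPTNeoXModel)
```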
## Q&A
* Is Llama-3 supported? YES! Check [here](https://github.com/center-for-humans-and-machines/transformer-heads/issues/3)
* How do I use my model for inference? Check the notebooks or [this](https://github.com/center-for-humans-and-machines/transformer-heads/issues/2) issue to get started.