transformer-heads

Name: transformer-heads
Version: 0.1.0
Summary: Attach custom heads to transformer models.
Upload time: 2024-04-28 21:42:43
Requires Python: >=3.10
License: MIT License
Keywords: linear probe, qlora, transformer
<h4 align="center">
    <p>
        <a href="https://transformer-heads.readthedocs.io/en/latest/">Documentation</a> |
        <a href="docs/source/getting_started.md">Getting Started</a> |
        <a href="https://www.reddit.com/r/LocalLLaMA/comments/1bnd621/new_library_transformerheads_for_attaching_heads/">Reddit Post with more info</a>
    </p>
</h4>

# Transformer Heads
This library aims to be an all-round toolkit for attaching, training, saving, and loading new heads for transformer models.  
A new head could be: 
* A [linear probe](https://arxiv.org/pdf/1610.01644.pdf) used to get an understanding of the information processing in a transformer architecture
* A head to be finetuned jointly with the weights of a pretrained transformer model to perform a completely different kind of task.
    - E.g. a transformer pretrained to do causal language modelling could get a sequence classification head attached and be finetuned to do sentiment classification.
    - Or one could attach a regression head to turn a large language model into a value function for a reinforcement learning problem.

On top of that, attaching multiple heads at once can make multi-task learning easy, making it possible to train very general models.

## Installation
Install from PyPI: `pip install transformer-heads`.

Or, clone this repo and run the following from its root:
`pip install -e .`

## Usage
Create head configurations:
```python
head_config = HeadConfig(
    name=f"imdb_head_3",
    layer_hook=-3,  # Attach at the output of the third-to-last transformer-block
    in_size=hidden_size,
    output_activation="linear",
    pred_for_sequence=True,
    loss_fct="cross_entropy",
    num_outputs=2,
    target="label" # The name of the ground-truth column in the dataset
)
```
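
The snippet above assumes `hidden_size` is already known. One way to look it up is with the standard Hugging Face `AutoConfig` API (the model name here just mirrors the next snippet):
```python
from transformers import AutoConfig

# Hidden size of the base model the head will attach to (4096 for Llama-2-7b).
hidden_size = AutoConfig.from_pretrained("meta-llama/Llama-2-7b-hf").hidden_size
```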
Create a model with your head from a pretrained transformer model:
```python
model = load_headed(
    LlamaForCausalLM,
    "meta-llama/Llama-2-7b-hf",
    head_configs=[head_config],
)
```
Train your model using (for example) the easy-to-use Hugging Face *Trainer* interface:
```python
trainer = Trainer(
    model,
    args=args,
    train_dataset=imdb_dataset["train"],
    data_collator=collator,
)
```
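
The `args`, `imdb_dataset` and `collator` above are left undefined. As a rough sketch, they could be prepared with standard Hugging Face APIs along the following lines; the specific names and the use of `DataCollatorWithPadding` are assumptions (the library may provide its own collator for head targets), not documented API:
```python
from datasets import load_dataset
from transformers import AutoTokenizer, DataCollatorWithPadding, TrainingArguments

tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")
tokenizer.pad_token = tokenizer.eos_token  # Llama-2 has no pad token by default

# Tokenize the IMDB reviews; the "label" column matches the head's `target`.
imdb_dataset = load_dataset("imdb").map(
    lambda batch: tokenizer(batch["text"], truncation=True), batched=True
)

collator = DataCollatorWithPadding(tokenizer)
args = TrainingArguments(
    output_dir="imdb_head_out",
    per_device_train_batch_size=4,
    num_train_epochs=1,
    remove_unused_columns=False,  # custom head targets may not appear in the base forward signature
)
```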

For a more in-depth introduction and a fully working example, check the [linear probe notebook](notebooks/gpt2/linear_probe.ipynb).

## Joint training of multiple linear probes
![_images/multi_linear_probe.svg](_images/multi_linear_probe.svg)

## Notebooks
This repository contains multiple Jupyter notebooks that serve as tutorials/illustrations of how to do certain things with this library. Here is an overview of which notebook to check out depending on the use case you are interested in.
* Linear Probes (understanding the inner workings of transformers)
    - Basic example with one probe for causal LM: [notebooks/gpt2/linear_probe.ipynb](notebooks/gpt2/linear_probe.ipynb)
    - Train many probes for causal LM at once: [notebooks/gpt2/multi_linear_probe.ipynb](notebooks/gpt2/multi_linear_probe.ipynb)
    - Train many probes for text classification at once: [notebooks/gpt2/text_classification_linear_probe.ipynb](notebooks/gpt2/text_classification_linear_probe.ipynb)
* Finetuning on a new type of task (with a new head)
    - QLoRA: [notebooks/gpt2/text_classification_qlora.ipynb](notebooks/gpt2/text_classification_qlora.ipynb)
    - Full finetuning: [notebooks/gpt2/text_classification_full_finetune.ipynb](notebooks/gpt2/text_classification_full_finetune.ipynb)
* Joint multi-task learning
    - Many heads doing completely different tasks + QLoRA, all trained at the same time: [notebooks/gpt2/joint_multitask_learning.ipynb](notebooks/gpt2/joint_multitask_learning.ipynb)
* Regression with pretrained transformers
    - Check the regression heads of this notebook: [notebooks/gpt2/joint_multitask_learning.ipynb](notebooks/gpt2/joint_multitask_learning.ipynb)
* Saving and loading
    - Notebook: [notebooks/gpt2/saving_and_loading.ipynb](notebooks/gpt2/saving_and_loading.ipynb)
    - Tests: [transformer_heads/tests/test_load_model.py](transformer_heads/tests/test_load_model.py)

## Joint multi-task training with different types of heads and QLoRA
![_images/example_architecture.svg](_images/example_architecture.svg)

## More custom loss functions and models
At the time of writing, only a subset of loss functions and models is supported out of the box: the supported models are `Mistral-7b`, `LLaMA 2` (all model sizes) and `gpt2`. Check [transformer_heads/constants.py](transformer_heads/constants.py) for more up-to-date info.

However, adding support for other loss functions or models is not hard. You just need to add the respective information to `loss_fct_map` and `model_type_map`, both importable from `transformer_heads.constants`. To add a loss function, add a mapping from a string name to a torch loss class. To add a model, add a mapping from the model type to a 2-tuple of the attribute name under which the base model is stored in the model class and the base model class itself. That may sound confusing, but in practice it just means the following:

```python
from transformer_heads.constants import model_type_map, loss_fct_map
import torch.nn as nn
from transformers import MistralModel

loss_fct_map["bce"] = nn.BCELoss()
model_type_map["mistral"] = ("model",MistralModel)
```
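
Once registered, the new entries can be referenced like the built-in ones. Below is a hypothetical head config that uses the `"bce"` key registered above together with a Mistral base model; the field values, the `sigmoid` activation and the top-level import path are illustrative assumptions, not documented API:
```python
from transformers import MistralForCausalLM

# Import path assumed; mirrors the usage shown in the snippets above.
from transformer_heads import HeadConfig, load_headed

toxicity_head = HeadConfig(
    name="toxicity_head",
    layer_hook=-1,
    in_size=4096,  # hidden size of Mistral-7B
    output_activation="sigmoid",  # assumed option; nn.BCELoss expects probabilities in [0, 1]
    pred_for_sequence=True,
    loss_fct="bce",  # the key registered above
    num_outputs=1,
    target="toxic",  # hypothetical label column in the dataset
)

model = load_headed(
    MistralForCausalLM,
    "mistralai/Mistral-7B-v0.1",
    head_configs=[toxicity_head],
)
```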
## Can my transformer architecture be supported?
One of the basic assumptions of my library is that there is a transformer class, such as Hugging Face's LlamaForCausalLM, that has an [attribute pointing to a base model that outputs raw hidden states](https://github.com/huggingface/transformers/blob/7eb3ba82241c927053689270a0751f4ff5d33c54/src/transformers/models/llama/modeling_llama.py#L1116). If your transformer model is built up in a similar way, adding support may be as easy as adding an entry to the [model_type_map](https://github.com/center-for-humans-and-machines/transformer-heads/blob/8ea0805ab95ca01dff7ea73ed9c844df946c17cb/transformer_heads/constants.py#L20) with the name of that attribute and the class of the base model. You can either do that by importing from [constants.py](transformer_heads/constants.py) or by adding it directly and creating a pull request.
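
For example, to experiment with a GPT-NeoX-style model (not in the supported list above, so purely illustrative): Hugging Face's `GPTNeoXForCausalLM` keeps its base `GPTNeoXModel` in the `gpt_neox` attribute, so the corresponding entry would look like this:
```python
from transformers import GPTNeoXModel

from transformer_heads.constants import model_type_map

# "gpt_neox" is the `model_type` declared in GPT-NeoX checkpoint configs;
# GPTNeoXForCausalLM stores its base model under the `gpt_neox` attribute.
model_type_map["gpt_neox"] = ("gpt_neox", GPTNeoXModel)
```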

            
