# Pay attention pipeline
This repository provides a pipeline for computing **Influence** and performing **Generation with GUIDE** to enhance transformer model explainability and performance. The pipeline leverages attention scores and embedding vectors to assess the importance of specific subsequences and applies various levels of instruction enhancement to improve model responses.
## Features
- **Generation with GUIDE**: Use guided instruction to generate more accurate and contextually relevant outputs from transformer models.
- **Influence Calculation**: Assess the impact of specific subsequences on the model's predictions using attention scores and embedding vectors.
# Motivation
Large Language Models (LLMs) are currently the state of the art in most NLP tasks. Despite their success, pretrained LLMs sometimes struggle to accurately interpret the diverse instructions users give them and may generate outputs that do not align with human expectations. Additionally, LLMs can produce biased or hallucinated facts, which can limit their practical usefulness.
Prior work indicates that transformers are less likely to align with instructions as the context length grows. In such cases, rather than fulfilling the user's request, the model generates nonsensical text or repeats segments of the prompt.
Common solutions to this problem are Supervised Fine Tuning (SFT) and Reinforcement Learning. However, these approaches are resource-intensive, time-consuming, and sensitive to the specific data and tasks. Ideally, a more efficient approach would require no additional training once implemented.
Because of its low cost and broad accessibility, prompt engineering is widely used to align the outputs of LLMs with user preferences. However, it does not always produce consistent results and can be highly unstable.
We present GUIDE (**G**uided **U**nderstanding with **I**nstruction-**D**riven **E**nhancements): a systematic approach that allows users to emphasize instructions in their prompts.
# GUIDE
GUIDE is a novel and systematic approach that enables users to highlight critical instructions within the text input provided to an LLM. This pipeline implements GUIDE: users can increase the attention given to specific tokens by enclosing the important text in tags such as `<!-> <-!>` (as shown below). This is achieved by adding a bias, denoted by $\Delta$, to the attention logits of the highlighted tokens, i.e., $\bar{w}_{k,i}^{(\ell)} = w_{k,i}^{(\ell)} + \Delta$ for every token $i$ marked by the user, where $\ell$ indexes the layer. The attention matrices below illustrate this; each entry represents the impact of a past token (x-axis) on the current token (y-axis).
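To make the effect of $\Delta$ concrete, here is a minimal pure-Python sketch of the idea for a single attention map. It is not the package's implementation, and the function names `softmax` and `guided_causal_attention` are hypothetical: $\Delta$ is added to the logits of the user-marked key positions, future positions are masked out, and each row is normalized with a softmax.

```python
import math
from typing import List, Sequence, Set


def softmax(row: Sequence[float]) -> List[float]:
    """Numerically stable softmax; entries equal to -inf receive zero weight."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def guided_causal_attention(
    logits: Sequence[Sequence[float]], marked: Set[int], delta: float
) -> List[List[float]]:
    """Causal attention weights after adding `delta` to the logits of marked keys.

    logits[k][i] plays the role of w_{k,i}: k is the current token (query),
    i is a past token (key). `marked` holds the positions of highlighted tokens.
    """
    T = len(logits)
    weights = []
    for k in range(T):
        row = []
        for i in range(T):
            if i > k:
                row.append(-math.inf)  # causal mask: a token cannot attend to future tokens
            else:
                # GUIDE-style bias: w_bar = w + delta for highlighted tokens only
                row.append(logits[k][i] + (delta if i in marked else 0.0))
        weights.append(softmax(row))
    return weights


if __name__ == "__main__":
    logits = [[0.0] * 4 for _ in range(4)]  # uniform logits over 4 tokens
    base = guided_causal_attention(logits, {1}, delta=0.0)
    guided = guided_causal_attention(logits, {1}, delta=2.0)
    # attention of the last token on token 1, without and with the bias
    print(base[3][1], guided[3][1])
```

With uniform logits the last token spreads its attention evenly (1/4 per token); adding $\Delta = 2$ to token 1 shifts a much larger share of that row onto it, while the causal mask still keeps every weight on future tokens at zero.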
![GUIDE](img/PayAttentionToWhatMatters-Workshop-extended.drawio.png)
Our results show that GUIDE substantially improves the accuracy of following certain instructions, outperforming natural prompting alternatives and Supervised Fine Tuning up to 1M tokens.
## Installation
To set up the environment for this project, install the Python package from PyPI:
```bash
pip install pay-attention-pipeline
```
## Usage
Normally, you would load a standard Hugging Face pipeline as shown below:
```python
from transformers import pipeline

pipe = pipeline(
    "text-generation",
    model="your_model_name",
)

prompt = '''
The Eiffel Tower, an iconic symbol of Paris and France, was completed in 1889 as the centerpiece of the Exposition Universelle, a world’s fair celebrating the centennial of the French Revolution...
'''

out = pipe("Rewrite in French" + prompt, max_new_tokens=100)
```
With this repository, you can instead use our custom `PayAttentionPipeline` to take advantage of the features it provides: GUIDE and Influence.
If your prompt does not contain any of the tags `<?-> <-?>`, `<!-> <-!>`, `<!!-> <-!!>` or `<!!!-> <-!!!>`, our pipeline behaves exactly like the standard Hugging Face pipeline.
The steps below show how to load the pipeline and generate with GUIDE; computing Influence is covered in the [Influence](#influence) section.
1. **Load the Pipeline**
```python
from transformers import pipeline
from transformers.pipelines import PIPELINE_REGISTRY
from pay_attention_pipeline import PayAttentionPipeline

pipe = pipeline(
    "pay-attention",
    model="mistralai/Mistral-7B-Instruct-v0.1",
)

PIPELINE_REGISTRY.check_task("pay-attention")  # raises a KeyError if the "pay-attention" task is not registered
prompt = "Add your prompt here"
```
2. **Apply GUIDE Levels**
Emphasize the instruction at increasing levels of enhancement by wrapping it in `<!-> <-!>`, `<!!-> <-!!>` or `<!!!-> <-!!!>`:
```python
message_1 = [{'role': 'user', 'content': "<!-> Rewrite in French: <-!>" + prompt}]
out_1 = pipe(message_1, max_new_tokens=100)
message_2 = [{'role': 'user', 'content': "<!!-> Rewrite in French: <-!!>" + prompt}]
out_2 = pipe(message_2, max_new_tokens=100)
message_3 = [{'role': 'user', 'content': "<!!!-> Rewrite in French: <-!!!>" + prompt}]
out_3 = pipe(message_3, max_new_tokens=100)
```
Each additional `!` corresponds to a stronger enhancement (a larger $\Delta$); choose the level that works best for your task.
3. **Customizing (Optional)**
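To experiment with other values of $\Delta$, pass `delta_mid` when creating the pipeline:
```python
from transformers import AutoTokenizer, pipeline

model_name = "mistralai/Mistral-7B-Instruct-v0.1"
tokenizer = AutoTokenizer.from_pretrained(model_name)

custom_pipe = pipeline(
    "pay-attention",
    model=model_name,
    tokenizer=tokenizer,
    model_kwargs=dict(cache_dir="/Data"),  # optional: directory where model weights are cached
    delta_mid=10,  # custom value of Δ, used here with the <!!-> <-!!> tag
)
message = [{'role': 'user', 'content': "<!!-> Rewrite in French: <-!!>" + prompt}]
out = custom_pipe(message, max_new_tokens=100)
```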
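The tags are ordinary text inside the prompt, so they have to be located and stripped before generation. As a minimal sketch of that step (an assumption about how it could be done, not the package's actual code; `TAG_RE` and `parse_guide_tags` are hypothetical names), a regular expression with a backreference can match `<!-> <-!>`, `<!!-> <-!!>` and `<!!!-> <-!!!>`, remove them, and record each highlighted span with its level, i.e. the number of `!`:

```python
import re
from typing import List, Tuple

# Matches <!-> ... <-!>, <!!-> ... <-!!> and <!!!-> ... <-!!!>.
# The backreference \1 forces the closing tag to repeat the same number of "!".
TAG_RE = re.compile(r"<(!{1,3})->(.*?)<-\1>", re.DOTALL)


def parse_guide_tags(text: str) -> Tuple[str, List[Tuple[int, int, int]]]:
    """Strip GUIDE tags and return (clean_text, [(start, end, level), ...]).

    start/end are character offsets into clean_text (end is exclusive);
    level is the number of "!" in the tag.
    """
    pieces: List[str] = []
    spans: List[Tuple[int, int, int]] = []
    cursor = 0   # position in the original text
    out_len = 0  # length of the clean text produced so far
    for match in TAG_RE.finditer(text):
        before = text[cursor:match.start()]
        pieces.append(before)
        out_len += len(before)
        inner = match.group(2)
        spans.append((out_len, out_len + len(inner), len(match.group(1))))
        pieces.append(inner)
        out_len += len(inner)
        cursor = match.end()
    pieces.append(text[cursor:])  # text after the last tag
    return "".join(pieces), spans


if __name__ == "__main__":
    clean, spans = parse_guide_tags("<!!-> Rewrite in French: <-!!> The Eiffel Tower")
    print(repr(clean), spans)
```

In a real pipeline, these character spans would still have to be mapped to token positions (for example with the `offset_mapping` that a Hugging Face fast tokenizer returns when called with `return_offsets_mapping=True`) before $\Delta$ is added to the corresponding attention logits. The sketch ignores the `<?-> <-?>` Influence tags.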
# Influence
While GUIDE does not require additional training, it does require choosing how much to increase the attention weights. In our study, we propose default values for certain tasks, but we also recognize the need to quantify these adjustments. To address this, we introduce a novel metric called *Influence*, which measures the importance of specific tokens relative to the instruction tokens in the text; we use it to determine reasonable values for the increase in attention weights.
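As a back-of-the-envelope illustration of why the size of $\Delta$ matters, assume (as in the GUIDE formula) that the same $\Delta$ is added to the logits of every highlighted token $i \in S$ for a query $k$ at layer $\ell$. Writing $a_{k,i}^{(\ell)}$ for the original softmax weights and $A_k^{(\ell)} = \sum_{i \in S} a_{k,i}^{(\ell)}$ for the attention mass the highlighted tokens receive without GUIDE, dividing numerator and denominator by $\sum_{j \le k} \exp\big(w_{k,j}^{(\ell)}\big)$ gives a closed form for the mass after the bias:

$$
\bar{A}_k^{(\ell)}
= \frac{\sum_{i \in S} \exp\big(w_{k,i}^{(\ell)} + \Delta\big)}{\sum_{i \in S} \exp\big(w_{k,i}^{(\ell)} + \Delta\big) + \sum_{j \le k,\, j \notin S} \exp\big(w_{k,j}^{(\ell)}\big)}
= \frac{e^{\Delta} A_k^{(\ell)}}{e^{\Delta} A_k^{(\ell)} + 1 - A_k^{(\ell)}}.
$$

For example, with an illustrative $A_k^{(\ell)} = 0.01$ (a short instruction in a long context), $\Delta = 2$ raises the share to $0.01\,e^{2} / (0.01\,e^{2} + 0.99) \approx 0.069$, whereas a very large $\Delta$ drives $\bar{A}_k^{(\ell)}$ toward 1 and lets the instruction crowd out the rest of the context.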
1. **Compute Influence**
To compute the importance of specific text within a given context, wrap the text in `<?->` and `<-?>` tags. The pipeline output then contains an `influence` entry: a dictionary of tensors indexed by layer, where each tensor represents the importance of the enclosed text across the context length.
We provide two metrics to measure importance:
1. **Influence** (default)
2. **Attention Rollout**
By default, **Influence** is used to compute the importance. To compute it with a custom bias $\Delta \neq 0$, pass the parameter `delta_influence`; to use Attention Rollout instead, pass `metric='attention_rollout'`.
```python
prompt = '''
The Eiffel Tower, an iconic symbol of Paris and France, was completed in 1889 as the centerpiece of the Exposition Universelle, a world’s fair celebrating the centennial of the French Revolution...
'''
out1 = pipe("<?-> Rewrite in French <-?>" + prompt, max_new_tokens=100)  # Influence (default)
out2 = pipe("<?-> Rewrite in French <-?>" + prompt, max_new_tokens=100, delta_influence=1)  # Influence with Δ = 1
out3 = pipe("<?-> Rewrite in French <-?>" + prompt, max_new_tokens=100, metric='attention_rollout')  # Attention Rollout
out_caps = pipe("<?-> REWRITE IN FRENCH <-?>" + prompt, max_new_tokens=100)  # uppercase instruction, for comparison

influence = out1['influence']
influence_delta = out2['influence']
rollout = out3['influence']
influence_caps = out_caps['influence']
```
You can visualize the influence at different layers, as in the example plot below. We also provide [an example notebook](examples/influence.ipynb) that plots the influence with HoloViews.
<!--
```python
import torch
import torch.nn.functional as F
import matplotlib.pyplot as plt

def rolling_mean(x, window_size):
    # Moving average over a 1-D tensor ("valid" window: output length is len(x) - window_size + 1)
    x = x.float().cpu()
    kernel = torch.ones(1, 1, window_size) / window_size
    return F.conv1d(x.reshape(1, 1, -1), kernel).reshape(-1)

layers_to_plot = [0, 15, 31]  # first, middle and last layer of a 32-layer model such as Mistral-7B
n_plots = len(layers_to_plot)
fig, axes = plt.subplots(1, n_plots, figsize=(n_plots * 5, 4))

curves = {
    "Normal": influence,
    "Uppercase": influence_caps,
    r"$\Delta = 1$": influence_delta,
}
for plot_idx, layer_idx in enumerate(layers_to_plot):
    ax = axes[plot_idx]
    for label, values in curves.items():
        ax.plot(rolling_mean(torch.log(values[layer_idx]), 10)[10:], label=label)
    ax.set_title(f"Layer {layer_idx + 1}")
    ax.grid()
    ax.set_xlabel("context length")
    ax.set_ylabel("log influence")
    ax.legend()
plt.show()
``` -->
![Influence Plot](img/example_influence.png)
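For intuition about the second metric, the sketch below implements standard Attention Rollout (Abnar & Zuidema, 2020) in pure Python. It is not necessarily how this package computes `metric='attention_rollout'`, and it is not the Influence metric. It assumes head-averaged attention matrices whose rows sum to 1; `attention_rollout`, `span_importance` and `matmul` are hypothetical helper names.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product a @ b."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def attention_rollout(layer_attn: Sequence[Matrix]) -> List[Matrix]:
    """Cumulative attention rollout after each layer.

    layer_attn[l] is a T x T attention matrix for layer l, averaged over heads,
    with each row (query) summing to 1.
    """
    T = len(layer_attn[0])
    rollout = [[1.0 if i == j else 0.0 for j in range(T)] for i in range(T)]
    per_layer = []
    for attn in layer_attn:
        # Model the residual connection as 0.5 * A + 0.5 * I, then renormalize each row.
        mixed = []
        for i in range(T):
            row = [0.5 * attn[i][j] + (0.5 if i == j else 0.0) for j in range(T)]
            total = sum(row)
            mixed.append([v / total for v in row])
        # Rollout up to layer l = mixed attention of layer l @ rollout up to layer l - 1.
        rollout = matmul(mixed, rollout)
        per_layer.append(rollout)
    return per_layer


def span_importance(rollout: Matrix, start: int, end: int) -> List[float]:
    """For each position k, the rollout mass that traces back to tokens [start, end)."""
    return [sum(row[start:end]) for row in rollout]


if __name__ == "__main__":
    causal_uniform = [[1.0, 0.0, 0.0], [0.5, 0.5, 0.0], [1 / 3, 1 / 3, 1 / 3]]
    layers = attention_rollout([causal_uniform, causal_uniform])
    # share of each position's representation traced back to token 0
    print(span_importance(layers[-1], 0, 1))
```

Summing the rollout columns of the highlighted span gives one importance value per position, which is the same "importance across the context length" view as the plot above; taking `per_layer[l]` for each layer yields one such curve per layer.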
## Citation
If you use this for research, please cite our paper.
```bibtex
@misc{silva2024payattention,
title={Pay attention to what matters},
author={Silva, Pedro Luiz and Ayed, Fadhel and de Domenico, Antonio and Maatouk, Ali},
year={2024},
}
```
## License
This project is licensed under the MIT License.
## Contributing
Contributions are welcome! Please submit a pull request or open an issue to discuss potential improvements.