# Tuned Lens 🔎
<a target="_blank" href="https://colab.research.google.com/github/AlignmentResearch/tuned-lens/blob/main/notebooks/interactive.ipynb">
<img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/>
</a>
<a target="_blank" href="https://huggingface.co/spaces/AlignmentResearch/tuned-lens">
<img src="https://huggingface.co/datasets/huggingface/badges/resolve/main/open-in-hf-spaces-sm-dark.svg" alt="Open in Spaces">
</a>
Tools for understanding how transformer predictions are built layer-by-layer.
<img src="https://user-images.githubusercontent.com/12176390/224879115-8bc95f26-68e4-4f43-9b4c-06ca5934a29d.png">
This package provides a simple interface for training and evaluating __tuned lenses__. A tuned lens allows us to peek at the iterative computations a transformer uses to compute the next token.
## What is a Lens?
<img alt="A diagram showing how a translator within the lens allows you to skip intermediate layers." src="https://user-images.githubusercontent.com/12176390/227057947-1ef56811-f91f-48ff-8d2d-ff04cc599125.png" width="400"/>
A lens into a transformer with _n_ layers allows you to replace the last _m_ layers of the model with an [affine transformation](https://pytorch.org/docs/stable/generated/torch.nn.Linear.html) (we call these affine translators). Each affine translator is trained to minimize the KL divergence between its prediction and the final output distribution of the original model. This means that after training, the tuned lens allows you to skip over these last few layers and see the best prediction that can be made from the model's intermediate representations, i.e., the residual stream, at layer _n - m_.
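To make the training objective concrete, here is a dependency-free toy sketch. Everything in it is made up for illustration (a 3-token vocabulary, a 2-d residual stream, a hand-picked "unembedding" matrix); it is not the library's implementation. It fits an affine map by finite-difference gradient descent so that unembedding its output minimizes the KL divergence to the final-layer distribution, exactly the objective described above:

```python
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def kl(p, q):
    """KL divergence D(p || q) between two discrete distributions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Toy setup: a 3-token vocabulary decoded from a 2-d residual stream.
W_U = [[1.0, 0.0], [0.0, 1.0], [0.5, -0.5]]  # made-up unembedding (vocab x d)

def unembed(h):
    return [sum(w * x for w, x in zip(row, h)) for row in W_U]

# Pretend the final hidden state is a rotated and shifted copy of an
# intermediate one, so the translator has real drift to undo.
h_mid = [1.2, -0.7]
h_final = [0.9 * h_mid[0] + 0.3 * h_mid[1] + 0.1,
           -0.3 * h_mid[0] + 0.9 * h_mid[1] - 0.2]
target = softmax(unembed(h_final))  # the model's actual output distribution

# Affine translator h -> A @ h + b, initialized to the identity.
params = [1.0, 0.0, 0.0, 1.0, 0.0, 0.0]  # A in row-major order, then b

def loss(ps):
    a11, a12, a21, a22, b1, b2 = ps
    h = [a11 * h_mid[0] + a12 * h_mid[1] + b1,
         a21 * h_mid[0] + a22 * h_mid[1] + b2]
    return kl(target, softmax(unembed(h)))

initial_kl = loss(params)
eps, lr = 1e-5, 0.5
for _ in range(2000):  # finite-difference gradient descent on the KL objective
    grads = []
    for i in range(len(params)):
        bumped = list(params)
        bumped[i] += eps
        grads.append((loss(bumped) - loss(params)) / eps)
    params = [p - lr * g for p, g in zip(params, grads)]

print(initial_kl, loss(params))  # the trained translator's KL should be far smaller
```

In the real library the translator for each layer is trained the same way in spirit, but with autograd over many tokens rather than finite differences over one vector.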
The reason we need to train an affine translator is that the representations may be rotated, shifted, or stretched from layer to layer. This training differentiates this method from simpler approaches that unembed the residual stream of the network directly using the unembedding matrix, i.e., the [logit lens](https://www.lesswrong.com/posts/AcKRB8wDpdaN6v6ru/interpreting-gpt-the-logit-lens). We explain this process and its applications in the paper [Eliciting Latent Predictions from Transformers with the Tuned Lens](https://arxiv.org/abs/2303.08112).
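To see why the untrained logit lens can mislead, here is a toy sketch under the same made-up setup as before (a hand-picked unembedding, nothing from a real model): decoding an intermediate residual vector directly, with no learned translator, flips the top token when the representation is merely rotated between layers.

```python
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

# Made-up unembedding over a 3-token vocabulary and a 2-d residual stream.
W_U = [[1.0, 0.0], [0.0, 1.0], [0.5, -0.5]]

def logit_lens(h):
    """Decode a residual-stream vector directly: softmax(W_U @ h)."""
    return softmax([sum(w * x for w, x in zip(row, h)) for row in W_U])

h_mid = [1.2, -0.7]    # hypothetical intermediate hidden state
h_final = [-0.7, 1.2]  # the same state after a 90-degree rotation

p_mid = logit_lens(h_mid)
p_final = logit_lens(h_final)
print(p_mid, p_final)  # the argmax token differs between the two decodes
```

Because the rotation carries no change in "meaning" here, a trained affine translator would recover the final distribution from `h_mid`, while the raw logit lens reports a different top token.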
### Acknowledgments
Originally conceived by [Igor Ostrovsky](https://twitter.com/igoro) and [Stella Biderman](https://www.stellabiderman.com/) at [EleutherAI](https://www.eleuther.ai/), this library was built as a collaboration between FAR and EleutherAI researchers.
## Install Instructions
### Installing from PyPI
First, you will need to install the basic prerequisites into a virtual environment:
* Python 3.9+
* PyTorch 1.13.0+
Then you can install the package using pip:
```
pip install tuned-lens
```
### Installing the container
If you prefer to run the training scripts from within a container, you can use the provided Docker image:
```
docker pull ghcr.io/alignmentresearch/tuned-lens:latest
docker run --rm ghcr.io/alignmentresearch/tuned-lens:latest tuned-lens --help
```
## Contributing
Make sure to install the dev dependencies and set up the pre-commit hooks:
```
$ git clone https://github.com/AlignmentResearch/tuned-lens.git
$ pip install -e ".[dev]"
$ pre-commit install
```
## Citation
If you find this library useful, please cite it as:
```bibtex
@article{belrose2023eliciting,
title={Eliciting Latent Predictions from Transformers with the Tuned Lens},
  author={Belrose, Nora and Furman, Zach and Smith, Logan and Halawi, Danny and McKinney, Lev and Ostrovsky, Igor and Biderman, Stella and Steinhardt, Jacob},
journal={to appear},
year={2023}
}
```
> **Warning**
> This package has not reached 1.0. Expect the public interface to change regularly and without major version bumps.