# Steering Vectors
[![ci](https://img.shields.io/github/actions/workflow/status/steering-vectors/steering-vectors/ci.yaml?branch=main)](https://github.com/steering-vectors/steering-vectors)
[![PyPI](https://img.shields.io/pypi/v/steering-vectors?color=blue)](https://pypi.org/project/steering-vectors/)
Steering vectors / representation engineering for transformer language models in PyTorch / Hugging Face
Check out our [example notebook](examples/caa_sycophancy.ipynb). <a target="_blank" href="https://colab.research.google.com/github/steering-vectors/steering-vectors/blob/main/examples/caa_sycophancy.ipynb">
<img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/>
</a>
Full docs: https://steering-vectors.github.io/steering-vectors
## About
This library provides utilities for training and applying steering vectors to language models (LMs) from [Hugging Face](https://huggingface.co/), such as GPT, LLaMa, Gemma, Mistral, Pythia, and many more!
For more info on steering vectors and representation engineering, check out the following work:
- [Steering Llama 2 via Contrastive Activation Addition](https://arxiv.org/abs/2312.06681) Rimsky et al., 2023
- [Representation Engineering: A Top-Down Approach to AI Transparency](https://arxiv.org/abs/2310.01405) Zou et al., 2023
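As a rough illustration of the idea behind contrastive activation addition described in the first paper above, here is a minimal plain-Python sketch. It is not this library's API: the function names `mean_difference` and `apply_steering` and the toy activation values are hypothetical, made up purely for illustration. It assumes a steering vector is the mean difference between activations from positive and negative prompts, which is then added back to a hidden state with a multiplier.

```python
# Conceptual sketch only: a plain-Python stand-in for contrastive activation addition.
# None of these names come from the steering-vectors library; they are hypothetical.


def mean_difference(pos_acts, neg_acts):
    """Average the per-pair differences between positive and negative activations."""
    if len(pos_acts) != len(neg_acts) or not pos_acts:
        raise ValueError("need equal, non-empty lists of activations")
    dim = len(pos_acts[0])
    total = [0.0] * dim
    for pos, neg in zip(pos_acts, neg_acts):
        for i in range(dim):
            total[i] += pos[i] - neg[i]
    return [t / len(pos_acts) for t in total]


def apply_steering(activation, steering_vector, multiplier=1.0):
    """Add the scaled steering vector to a single hidden-state vector."""
    return [a + multiplier * s for a, s in zip(activation, steering_vector)]


# Toy 3-dimensional "activations" for two contrastive prompt pairs (made-up numbers).
positive = [[1.0, 2.0, 0.0], [3.0, 2.0, 1.0]]
negative = [[0.0, 2.0, 1.0], [1.0, 1.0, 1.0]]

vector = mean_difference(positive, negative)
steered = apply_steering([0.5, 0.5, 0.5], vector, multiplier=2.0)
print(vector)
print(steered)
```

In a real model the activations would be hidden states read from a chosen layer, and the addition would happen during the forward pass; see the full documentation for how this library handles that.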
## Installation
```
pip install steering-vectors
```
Check out the [full documentation](https://steering-vectors.github.io/steering-vectors/) for more usage info.
## Contributing
Any contributions to improve this project are welcome! Please open an issue or pull request in this repo with any bugfixes / changes / improvements you have.
This project uses [Ruff](https://docs.astral.sh/ruff/) for code formatting and linting, [MyPy](https://mypy.readthedocs.io/en/stable/) for type checking, and [Pytest](https://docs.pytest.org/) for tests. Make sure any changes you submit pass these code checks in your PR. If you have trouble getting these to run, feel free to open a pull request regardless and we can discuss further in the PR.
## License
This code is released under the MIT license.
Raw data
{
"_id": null,
"home_page": "https://steering-vectors.github.io/steering-vectors",
"name": "steering-vectors",
"maintainer": null,
"docs_url": null,
"requires_python": "<4.0,>=3.10",
"maintainer_email": null,
"keywords": null,
"author": "David Chanin",
"author_email": "chanindav@gmail.com",
"download_url": "https://files.pythonhosted.org/packages/91/f5/6e1f61a8d9b3b3852cad8801b8cecf089fc15ca8d48ce332ac30aac6c7b3/steering_vectors-0.10.2.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": "MIT",
"summary": "Steering vectors for transformer language models in Pytorch / Huggingface",
"version": "0.10.2",
"project_urls": {
"Homepage": "https://steering-vectors.github.io/steering-vectors",
"Repository": "https://github.com/steering-vectors/steering-vectors"
},
"split_keywords": [],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "ab36fd04b696fa578efeb3daec744ab2f06f71b24f8607b7cf2afca3a4719638",
"md5": "eea19b22c86953eb950c62481bc1882f",
"sha256": "32d052f22f33fc4d7f42fc39441cedc8f4cd70c89fb0f117db88bec6d9498254"
},
"downloads": -1,
"filename": "steering_vectors-0.10.2-py3-none-any.whl",
"has_sig": false,
"md5_digest": "eea19b22c86953eb950c62481bc1882f",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": "<4.0,>=3.10",
"size": 14340,
"upload_time": "2024-04-02T10:37:58",
"upload_time_iso_8601": "2024-04-02T10:37:58.403338Z",
"url": "https://files.pythonhosted.org/packages/ab/36/fd04b696fa578efeb3daec744ab2f06f71b24f8607b7cf2afca3a4719638/steering_vectors-0.10.2-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "91f56e1f61a8d9b3b3852cad8801b8cecf089fc15ca8d48ce332ac30aac6c7b3",
"md5": "798b8ab74385c8f98bdf55159e186e92",
"sha256": "9924a228f06e80efd49b004b7e8769db343da38263c1e73429a1ad0e281f4618"
},
"downloads": -1,
"filename": "steering_vectors-0.10.2.tar.gz",
"has_sig": false,
"md5_digest": "798b8ab74385c8f98bdf55159e186e92",
"packagetype": "sdist",
"python_version": "source",
"requires_python": "<4.0,>=3.10",
"size": 12024,
"upload_time": "2024-04-02T10:37:59",
"upload_time_iso_8601": "2024-04-02T10:37:59.993039Z",
"url": "https://files.pythonhosted.org/packages/91/f5/6e1f61a8d9b3b3852cad8801b8cecf089fc15ca8d48ce332ac30aac6c7b3/steering_vectors-0.10.2.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2024-04-02 10:37:59",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "steering-vectors",
"github_project": "steering-vectors",
"travis_ci": false,
"coveralls": false,
"github_actions": true,
"lcname": "steering-vectors"
}