| Field | Value |
|---|---|
| Name | repe |
| Version | 0.1.4 |
| Summary | Representation Engineering |
| upload_time | 2024-08-14 02:03:43 |
| requires_python | >=3.9 |
| home_page | None |
| maintainer | None |
| docs_url | None |
| author | None |
| license | None |
| requirements | No requirements were recorded. |
| Travis-CI | No Travis. |
| coveralls test coverage | No coveralls. |
# Representation Engineering (RepE)
This is the official repository for "[Representation Engineering: A Top-Down Approach to AI Transparency](https://arxiv.org/abs/2310.01405)"
by [Andy Zou](https://andyzoujm.github.io/), [Long Phan](https://longphan.ai/), [Sarah Chen](https://www.linkedin.com/in/sarah-chen1/), [James Campbell](https://www.linkedin.com/in/jamescampbell57), [Phillip Guo](https://www.linkedin.com/in/phillip-guo), [Richard Ren](https://github.com/notrichardren), [Alexander Pan](https://aypan17.github.io/), [Xuwang Yin](https://xuwangyin.github.io/), [Mantas Mazeika](https://www.linkedin.com/in/mmazeika), [Ann-Kathrin Dombrowski](https://scholar.google.com/citations?user=YoNVKCYAAAAJ&hl=en), [Shashwat Goel](https://in.linkedin.com/in/shashwatgoel42), [Nathaniel Li](https://nat.quest/), [Michael J. Byun](https://www.linkedin.com/in/michael-byun), [Zifan Wang](https://sites.google.com/west.cmu.edu/zifan-wang/home), [Alex Mallen](https://www.linkedin.com/in/alex-mallen-815b01176), [Steven Basart](https://stevenbas.art/), [Sanmi Koyejo](https://cs.stanford.edu/~sanmi/), [Dawn Song](https://dawnsong.io/), [Matt Fredrikson](https://www.cs.cmu.edu/~mfredrik/), [Zico Kolter](https://zicokolter.com/), and [Dan Hendrycks](https://people.eecs.berkeley.edu/~hendrycks/).
Check out our [website and demo here](https://www.ai-transparency.org/).
<img align="center" src="assets/repe_splash.png" width="750">
## Introduction
In this paper, we introduce and characterize the emerging area of representation engineering (RepE), an approach to enhancing the transparency of AI systems that draws on insights from cognitive neuroscience. RepE places population-level representations, rather than neurons or circuits, at the center of analysis, equipping us with novel methods for monitoring and manipulating high-level cognitive phenomena in deep neural networks (DNNs). We provide baselines and an initial analysis of RepE techniques, showing that they offer simple yet effective solutions for improving our understanding and control of large language models. We showcase how these methods can provide traction on a wide range of safety-relevant problems, including truthfulness, memorization, power-seeking, and more, demonstrating the promise of representation-centered transparency research. We hope that this work catalyzes further exploration of RepE and fosters advancements in the transparency and safety of AI systems.
## Installation
To install `repe` from the main branch of the GitHub repository, run:
```bash
git clone https://github.com/andyzoujm/representation-engineering.git
cd representation-engineering
pip install -e .
```
## Quickstart
Our RepReading and RepControl pipelines inherit from the [🤗 Hugging Face pipelines](https://huggingface.co/docs/transformers/main_classes/pipelines) for both classification and generation.
```python
from transformers import pipeline  # Hugging Face pipeline factory used below
from repe import repe_pipeline_registry

# Register the 'rep-reading' and 'rep-control' tasks with Hugging Face pipelines
repe_pipeline_registry()
# ... initializing model and tokenizer ....
rep_reading_pipeline = pipeline("rep-reading", model=model, tokenizer=tokenizer)
# control_kwargs: keyword arguments for the control pipeline, supplied by the caller
rep_control_pipeline = pipeline("rep-control", model=model, tokenizer=tokenizer, **control_kwargs)
```
## RepReading and RepControl Experiments
Check out the [example frontiers](./examples) of Representation Engineering (RepE), which contain both RepControl and RepReading implementations. We welcome community contributions as well!
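For intuition about what "reading" and "control" mean at the representation level, here is a toy sketch in plain Python. It is not the `repe` implementation and does not use the `repe` API: the hidden states are synthetic 8-dimensional vectors rather than model activations, the direction is found with a simple difference of class means (a stand-in chosen for brevity), and every function name below is hypothetical.

```python
import random

def mean_vector(rows):
    """Element-wise mean of a list of equal-length vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]

def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def reading_direction(pos, neg):
    """Unit vector pointing from the mean of `neg` to the mean of `pos`."""
    diff = [p - q for p, q in zip(mean_vector(pos), mean_vector(neg))]
    norm = dot(diff, diff) ** 0.5
    return [d / norm for d in diff]

def read(hidden, direction):
    """Reading: scalar score of one hidden state along the direction."""
    return dot(hidden, direction)

def control(hidden, direction, alpha):
    """Control: shift a hidden state by `alpha` along the direction."""
    return [h + alpha * d for h, d in zip(hidden, direction)]

# Synthetic "hidden states" (an assumption for illustration):
# the concept is encoded along axis 0, with small Gaussian noise elsewhere.
rng = random.Random(0)

def fake_hidden(concept_on):
    vec = [rng.gauss(0.0, 0.1) for _ in range(8)]
    vec[0] += 1.0 if concept_on else -1.0
    return vec

pos = [fake_hidden(True) for _ in range(50)]
neg = [fake_hidden(False) for _ in range(50)]
direction = reading_direction(pos, neg)

sample = fake_hidden(False)                      # concept "off"
steered = control(sample, direction, alpha=3.0)  # push toward "on"
print(read(sample, direction), read(steered, direction))
```

Because the direction has unit length, steering with `alpha=3.0` raises the reading score by exactly 3.0 (up to floating-point error). With a real model, the vectors would come from a chosen layer's hidden states, and the notebooks in `./examples` show the actual pipelines.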
## RepE_eval
We also release a language model evaluation framework, [RepE_eval](./repe_eval), based on RepReading, which can serve as an additional baseline alongside zero-shot and few-shot evaluation on standard benchmarks. Please check out our [paper](https://arxiv.org/abs/2310.01405) for more details.
## Citation
If you find this useful in your research, please consider citing:
```
@misc{zou2023transparency,
title={Representation Engineering: A Top-Down Approach to AI Transparency},
author={Andy Zou and Long Phan and Sarah Chen and James Campbell and Phillip Guo and Richard Ren and Alexander Pan and Xuwang Yin and Mantas Mazeika and Ann-Kathrin Dombrowski and Shashwat Goel and Nathaniel Li and Michael J. Byun and Zifan Wang and Alex Mallen and Steven Basart and Sanmi Koyejo and Dawn Song and Matt Fredrikson and Zico Kolter and Dan Hendrycks},
year={2023},
eprint={2310.01405},
archivePrefix={arXiv},
primaryClass={cs.CL}
}
```
Raw data
{
"_id": null,
"home_page": null,
"name": "repe",
"maintainer": null,
"docs_url": null,
"requires_python": ">=3.9",
"maintainer_email": null,
"keywords": null,
"author": null,
"author_email": null,
"download_url": "https://files.pythonhosted.org/packages/ff/91/246656dfed091069bee837f0a8b90b48f3e0f09ecaae0a7849102f8a21d5/repe-0.1.4.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": null,
"summary": "Representation Engineering",
"version": "0.1.4",
"project_urls": {
"Homepage": "https://github.com/andyzoujm/representation-engineering",
"Issues": "https://github.com/andyzoujm/representation-engineering/issues"
},
"split_keywords": [],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "1d8a5ff21a7d4965a62fa6433ee3811cfefda384bf38f58212354b5986e7ad2e",
"md5": "e4404bbf1af1cd2daff189b41524f0b7",
"sha256": "5e6140114d2a907fc7cc74e8709f6356d2436f0153d0a853e0678fb9eb8acf8a"
},
"downloads": -1,
"filename": "repe-0.1.4-py3-none-any.whl",
"has_sig": false,
"md5_digest": "e4404bbf1af1cd2daff189b41524f0b7",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.9",
"size": 17468,
"upload_time": "2024-08-14T02:03:42",
"upload_time_iso_8601": "2024-08-14T02:03:42.216766Z",
"url": "https://files.pythonhosted.org/packages/1d/8a/5ff21a7d4965a62fa6433ee3811cfefda384bf38f58212354b5986e7ad2e/repe-0.1.4-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "ff91246656dfed091069bee837f0a8b90b48f3e0f09ecaae0a7849102f8a21d5",
"md5": "bafbeb340e4feec3a886487850a7b5d8",
"sha256": "ff710d408ec8db95c96ff25e281ef06b4b29266a529a64fed19e701f3f5b3460"
},
"downloads": -1,
"filename": "repe-0.1.4.tar.gz",
"has_sig": false,
"md5_digest": "bafbeb340e4feec3a886487850a7b5d8",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.9",
"size": 33396,
"upload_time": "2024-08-14T02:03:43",
"upload_time_iso_8601": "2024-08-14T02:03:43.561777Z",
"url": "https://files.pythonhosted.org/packages/ff/91/246656dfed091069bee837f0a8b90b48f3e0f09ecaae0a7849102f8a21d5/repe-0.1.4.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2024-08-14 02:03:43",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "andyzoujm",
"github_project": "representation-engineering",
"travis_ci": false,
"coveralls": false,
"github_actions": false,
"lcname": "repe"
}