# Least-Squares Concept Erasure (LEACE)
Concept erasure aims to remove specified features from a representation. It can be used to improve fairness (e.g. preventing a classifier from using gender or race) and interpretability (e.g. removing a concept to observe changes in model behavior). This is the repo for **LEAst-squares Concept Erasure (LEACE)**, a closed-form method which provably prevents all linear classifiers from detecting a concept while inflicting the least possible damage to the representation. You can check out the paper [here](https://arxiv.org/abs/2306.03819).
# Installation
We require Python 3.10 or later. You can install the package from PyPI:
```bash
pip install concept-erasure
```
# Usage
The two main classes in this repo are `LeaceFitter` and `LeaceEraser`.
- `LeaceFitter` keeps track of the covariance and cross-covariance statistics needed to compute the LEACE erasure function. These statistics can be updated in an incremental fashion with `LeaceFitter.update()`. The erasure function is lazily computed when the `.eraser` property is accessed. This class uses O(_d<sup>2</sup>_) memory, where _d_ is the dimensionality of the representation, so you may want to discard it after computing the erasure function.
- `LeaceEraser` is a compact representation of the LEACE erasure function, using only O(_dk_) memory, where _k_ is the number of classes in the concept you're trying to erase (or equivalently, the _dimensionality_ of the concept if it's not categorical).
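To make the roles of these statistics concrete, here is a minimal NumPy sketch of a closed-form eraser built from a covariance matrix and a cross-covariance matrix, following the whitening-and-projection formula in the paper. It is an illustration under stated assumptions, not the library's implementation: the function names `fit_leace_sketch` and `apply_leace_sketch`, the `tol` parameter, and the variable names `proj_left` and `proj_right` are all hypothetical. Storing the two thin factors rather than a full _d_ × _d_ matrix is one way to see where an O(_dk_) memory footprint can come from.

```python
import numpy as np


def fit_leace_sketch(X, Z, tol=1e-10):
    """Return (mean, proj_left, proj_right) for a LEACE-style eraser.

    X: (n, d) features; Z: (n, k) one-hot labels or a real-valued concept.
    All names here are illustrative, not part of concept_erasure.
    """
    n = X.shape[0]
    mu = X.mean(axis=0)
    Xc = X - mu
    Zc = Z - Z.mean(axis=0)

    sigma_xx = Xc.T @ Xc / n  # (d, d) covariance of X
    sigma_xz = Xc.T @ Zc / n  # (d, k) cross-covariance of X and Z

    # Whitening matrix W = sigma_xx^{-1/2} and its pseudo-inverse,
    # restricted to directions with non-negligible variance.
    evals, evecs = np.linalg.eigh(sigma_xx)
    keep = evals > tol * evals.max()
    V, lam = evecs[:, keep], evals[keep]
    W = (V / np.sqrt(lam)) @ V.T
    W_pinv = (V * np.sqrt(lam)) @ V.T

    # Orthonormal basis for the column space of W @ sigma_xz.
    U, s, _ = np.linalg.svd(W @ sigma_xz, full_matrices=False)
    U = U[:, s > tol * s.max()]

    proj_left = W_pinv @ U  # (d, r) with r <= k
    proj_right = U.T @ W    # (r, d)
    return mu, proj_left, proj_right


def apply_leace_sketch(X, mu, proj_left, proj_right):
    """Compute x - proj_left @ proj_right @ (x - mu) for each row of X."""
    return X - ((X - mu) @ proj_right.T) @ proj_left.T
```

After applying the sketch to the data it was fitted on, the sample cross-covariance between the erased features and `Z` should vanish up to floating-point error, which is the condition that leaves a linear classifier with nothing to pick up.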
## Batch usage
In most cases, you probably have a batch of feature vectors `X` and concept labels `Z` (named `Y` in the example below) and want to erase the concept from `X`. The easiest way to do this is with the `LeaceEraser.fit()` convenience method:
```python
import torch
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from concept_erasure import LeaceEraser
n, d, k = 2048, 128, 2
X, Y = make_classification(
    n_samples=n,
    n_features=d,
    n_classes=k,
    random_state=42,
)
X_t = torch.from_numpy(X)
Y_t = torch.from_numpy(Y)
# Logistic regression does learn something before concept erasure
real_lr = LogisticRegression(max_iter=1000).fit(X, Y)
beta = torch.from_numpy(real_lr.coef_)
assert beta.norm(p=torch.inf) > 0.1
eraser = LeaceEraser.fit(X_t, Y_t)
X_ = eraser(X_t)
# But learns nothing after
null_lr = LogisticRegression(max_iter=1000, tol=0.0).fit(X_.numpy(), Y)
beta = torch.from_numpy(null_lr.coef_)
assert beta.norm(p=torch.inf) < 1e-4
```
## Streaming usage
If you have a **stream** of data, you can use `LeaceFitter.update()` to update the statistics. This is useful if you have a large dataset and want to avoid storing it all in memory.
```python
from concept_erasure import LeaceFitter
from sklearn.datasets import make_classification
import torch
n, d, k = 2048, 128, 2
X, Y = make_classification(
    n_samples=n,
    n_features=d,
    n_classes=k,
    random_state=42,
)
X_t = torch.from_numpy(X)
Y_t = torch.from_numpy(Y)
# Binary labels are passed as a single concept dimension, hence the 1
fitter = LeaceFitter(d, 1, dtype=X_t.dtype)
# Compute cross-covariance matrix using batched updates
for x, y in zip(X_t.chunk(2), Y_t.chunk(2)):
    fitter.update(x, y)
# Erase the concept from the data
x_ = fitter.eraser(X_t[0])
```
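For intuition about what an incremental update involves, the sketch below accumulates a mean, a covariance, and a cross-covariance one batch at a time using a batched Welford-style rule. It is written in plain NumPy and is not the library's code; the class name `StreamingStats` and its attributes are hypothetical and chosen only for this illustration.

```python
import numpy as np


class StreamingStats:
    """Running mean, covariance, and cross-covariance (hypothetical helper)."""

    def __init__(self, d, k):
        self.n = 0
        self.mean_x = np.zeros(d)
        self.mean_z = np.zeros(k)
        self.m_xx = np.zeros((d, d))  # sum of centered outer products of X
        self.m_xz = np.zeros((d, k))  # sum of centered products of X and Z

    def update(self, x, z):
        """Fold a batch x of shape (m, d) and z of shape (m, k) into the totals."""
        self.n += x.shape[0]

        # Deviations from the old means, then move the means forward.
        dx_old = x - self.mean_x
        self.mean_x = self.mean_x + dx_old.sum(axis=0) / self.n
        self.mean_z = self.mean_z + (z - self.mean_z).sum(axis=0) / self.n

        # Deviations from the new means complete the Welford-style update.
        self.m_xx += dx_old.T @ (x - self.mean_x)
        self.m_xz += dx_old.T @ (z - self.mean_z)

    @property
    def sigma_xx(self):
        return self.m_xx / self.n

    @property
    def sigma_xz(self):
        return self.m_xz / self.n
```

Only the running totals are kept between batches, so memory use depends on _d_ and _k_ but not on the number of samples seen, and the result matches what a single pass over the full dataset would give.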
# Paper replication
Scripts used to generate the part-of-speech tags for the concept scrubbing experiments can be found in [this repo](https://github.com/EleutherAI/tagged-pile). We plan to upload the tagged datasets to the HuggingFace Hub shortly.
## Concept scrubbing
The concept scrubbing code is a bit messy right now, and will probably be refactored soon. We found it necessary to write bespoke implementations for different HuggingFace model families. So far we've implemented LLaMA and GPT-NeoX. These can be found in the `concept_erasure.scrubbing` submodule.