# SAE Probes Benchmark
[PyPI](https://pypi.org/project/sae-probes/)
[MIT License](https://opensource.org/licenses/MIT)
[CI](https://github.com/sae-probes/sae-probes/actions/workflows/ci.yaml)
This repository contains the code for the paper [_Are Sparse Autoencoders Useful? A Case Study in Sparse Probing_](https://arxiv.org/pdf/2502.16681), but has been reformatted into a Python package that will work with any SAE that can be loaded in [SAELens](https://github.com/jbloomAus/SAELens). This makes it easy to use the sparse probing tasks from the paper as a standalone SAE benchmark.
# Installation
```
pip install sae-probes
```
## Running evaluations
You can run benchmarks directly; any missing model activations are generated on demand. If you don't pass a `model_cache_path`, a temporary directory is used and cleaned up when the function completes. To persist activations across runs (recommended for repeated experiments), provide a `model_cache_path`.
## Training Probes
Probes can be trained directly on the model activations (baselines) or on SAE activations. In both cases, the following test data-balance settings are available: `"normal"`, `"scarcity"`, and `"imbalance"`. For more details about these settings, see the original paper. For the most standard sparse-probing benchmark, use the `normal` setting.
### SAE Probes
The most standard use of this library is as a sparse probing benchmark for SAEs using the `normal` setting. This is demonstrated below:
```python
from sae_probes import run_sae_evals
from sae_lens import SAE
# run the benchmark on a Gemma Scope SAE
release = "gemma-scope-2b-pt-res-canonical"
sae_id = "layer_12/width_16k/canonical"
sae = SAE.from_pretrained(release, sae_id)
run_sae_evals(
    sae=sae,
    model_name="gemma-2-2b",
    hook_name="blocks.12.hook_resid_post",
    reg_type="l1",
    setting="normal",
    results_path="/results/output/path",
    # model_cache_path is optional; if omitted, a temp dir is used and cleared after
    model_cache_path="/path/to/saved/activations",
    ks=[1, 16],
)
```
The sparse probing results are saved to `results_path` as one JSON file per dataset.
### Baseline Probes
You can run baseline probes using a unified API that matches the SAE evaluation interface:
```python
from sae_probes import run_baseline_evals
# Run baseline probes with consistent API
run_baseline_evals(
    model_name="gemma-2-2b",
    hook_name="blocks.12.hook_resid_post",
    setting="normal",  # or "scarcity", "imbalance"
    results_path="/results/output/path",
    # model_cache_path is optional; if omitted, a temp dir is used and cleared after
    model_cache_path="/path/to/saved/activations",
)
```
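To cover all three data-balance settings described above, one option is to loop over them. This is a minimal sketch built on the call above; the loop itself is not part of the package API:

```python
from sae_probes import run_baseline_evals

# Sweep the three data-balance settings from the paper.
for setting in ["normal", "scarcity", "imbalance"]:
    run_baseline_evals(
        model_name="gemma-2-2b",
        hook_name="blocks.12.hook_resid_post",
        setting=setting,
        results_path="/results/output/path",
        # Reuse one cache so activations are only generated once.
        model_cache_path="/path/to/saved/activations",
    )
```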
#### Output Format
Both SAE and baseline probes save results as **JSON files** with a consistent structure:
- **SAE results**: `sae_probes_{model_name}/{setting}_setting/{dataset}_{hook_name}_{reg_type}.json`
- **Baseline results**: `baseline_results_{model_name}/{setting}_setting/{dataset}_{hook_name}_{method}.json`
Each JSON file contains a list with metrics and metadata for easy comparison between SAE and baseline approaches.
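As a sketch of how you might inspect and compare the two sets of results, the snippet below relies only on the directory layout above; the exact metric keys depend on the package version, so it prints whatever each record contains rather than assuming field names:

```python
import json
from pathlib import Path

results_path = Path("/results/output/path")

# Walk all result files written by either eval; each file holds a list of records.
for result_file in sorted(results_path.rglob("*.json")):
    records = json.loads(result_file.read_text())
    for record in records:
        # Print all available metric/metadata keys instead of assuming names.
        print(result_file.relative_to(results_path), {k: record[k] for k in sorted(record)})
```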
#### Optional: Pre-generating model activations
Pre-generating can speed up repeated runs and lets you inspect the saved tensors. It is optional: benchmarks auto-generate any missing activations on their first run.
```python
from sae_probes import generate_dataset_activations
generate_dataset_activations(
    model_name="gemma-2-2b",  # the TransformerLens name of the model
    hook_names=["blocks.12.hook_resid_post"],  # any TransformerLens hook names
    batch_size=64,
    device="cuda",
    model_cache_path="/path/to/save/activations",
)
```
If you skip pre-generation, the benchmarks will create any missing activations automatically. Passing a `model_cache_path` persists them; if omitted, activations will be written to a temporary directory that is deleted after the run.
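If you do persist a cache and want a quick sanity check, you can simply list what was written. The file layout inside the cache is an internal detail of the package, so this sketch only enumerates paths and sizes rather than assuming any file names:

```python
from pathlib import Path

cache = Path("/path/to/save/activations")

# List everything the activation-generation step wrote, largest files first.
files = [p for p in cache.rglob("*") if p.is_file()]
for path in sorted(files, key=lambda p: p.stat().st_size, reverse=True):
    print(f"{path.stat().st_size / 1e6:8.1f} MB  {path.relative_to(cache)}")
```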
## Citation
If you use this code in your research, please cite:
```
@inproceedings{kantamnenisparse,
  title={Are Sparse Autoencoders Useful? A Case Study in Sparse Probing},
  author={Kantamneni, Subhash and Engels, Joshua and Rajamanoharan, Senthooran and Tegmark, Max and Nanda, Neel},
  booktitle={Forty-second International Conference on Machine Learning},
  year={2025}
}
```