sae-probes


Name: sae-probes
Version: 0.1.3
Summary: Sparse probing benchmark for Sparse Autoencoders derived from the paper "Are Sparse Autoencoders Useful? A Case Study in Sparse Probing"
Upload time: 2025-08-16 22:50:03
Author: Subhash Kantamneni
Requires Python: <4.0,>=3.10
License: MIT
Keywords: deep-learning, sparse-autoencoders, mechanistic-interpretability, pytorch
Requirements: No requirements were recorded.

# SAE Probes Benchmark

[![PyPI](https://img.shields.io/pypi/v/sae-probes?color=blue)](https://pypi.org/project/sae-probes/)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
[![build](https://github.com/sae-probes/sae-probes/actions/workflows/ci.yaml/badge.svg)](https://github.com/sae-probes/sae-probes/actions/workflows/ci.yaml)

This repository contains the code for the paper [_Are Sparse Autoencoders Useful? A Case Study in Sparse Probing_](https://arxiv.org/pdf/2502.16681), reformatted into a Python package that works with any SAE that can be loaded in [SAELens](https://github.com/jbloomAus/SAELens). This makes it easy to use the sparse probing tasks from the paper as a standalone SAE benchmark.

# Installation

```
pip install sae-probes
```

## Running evaluations

You can run benchmarks directly; any missing model activations are generated on demand. If you don't pass a `model_cache_path`, a temporary directory is used and cleaned up when the function completes. To persist activations across runs (recommended for repeated experiments), provide a `model_cache_path`.
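
For example, here is a minimal one-off run that relies on the temporary cache, using `run_baseline_evals` (documented below); the paths are placeholders:

```python
from sae_probes import run_baseline_evals

# No model_cache_path is given, so activations are generated into a
# temporary directory and removed once the call finishes.
run_baseline_evals(
  model_name="gemma-2-2b",
  hook_name="blocks.12.hook_resid_post",
  setting="normal",
  results_path="/results/output/path",
)
```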

## Training Probes

Probes can be trained directly on the model activations (baselines) or on SAE activations. In both cases, three dataset settings are available: `"normal"`, `"scarcity"`, and `"imbalance"`; see the original paper for details on each. For the most standard sparse-probing benchmark, use the `normal` setting.
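
As a minimal sketch, all three settings can be swept with the baseline runner described later in this README, reusing a single activation cache; the paths are placeholders:

```python
from sae_probes import run_baseline_evals

# Run the baseline probes under each data-balance setting.
for setting in ["normal", "scarcity", "imbalance"]:
  run_baseline_evals(
    model_name="gemma-2-2b",
    hook_name="blocks.12.hook_resid_post",
    setting=setting,
    results_path="/results/output/path",
    model_cache_path="/path/to/saved/activations",  # reuse one cache across settings
  )
```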

### SAE Probes

The most standard use of this library is as a sparse probing benchmark for SAEs using the `normal` setting. This is demonstrated below:

```python
from sae_probes import run_sae_evals
from sae_lens import SAE

# run the benchmark on a Gemma Scope SAE
release = "gemma-scope-2b-pt-res-canonical"
sae_id = "layer_12/width_16k/canonical"
sae = SAE.from_pretrained(release, sae_id)

run_sae_evals(
  sae=sae,
  model_name="gemma-2-2b",
  hook_name="blocks.12.hook_resid_post",
  reg_type="l1",
  setting="normal",
  results_path="/results/output/path",
  # model_cache_path is optional; if omitted, a temp dir is used and cleared after
  model_cache_path="/path/to/saved/activations",
  ks=[1, 16],  # number of SAE latents (k) each sparse probe may use
)
```

The sparse probing results for each dataset will be saved to `results_path` as a JSON file per dataset.

### Baseline Probes

Baseline probes can be run with a unified API that matches the SAE evaluation interface:

```python
from sae_probes import run_baseline_evals

# Run baseline probes with consistent API
run_baseline_evals(
  model_name="gemma-2-2b",
  hook_name="blocks.12.hook_resid_post",
  setting="normal",  # or "scarcity", "imbalance"
  results_path="/results/output/path",
  # model_cache_path is optional; if omitted, a temp dir is used and cleared after
  model_cache_path="/path/to/saved/activations",
)
```

#### Output Format

Both SAE and baseline probes save results as **JSON files** under `results_path`, with a consistent structure:

- **SAE results**: `sae_probes_{model_name}/{setting}_setting/{dataset}_{hook_name}_{reg_type}.json`
- **Baseline results**: `baseline_results_{model_name}/{setting}_setting/{dataset}_{hook_name}_{method}.json`

Each JSON file contains a list of result records with metrics and metadata, making it easy to compare SAE and baseline approaches.
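
As a sketch of how the saved results might be collected for comparison (assuming the directories above are created under `results_path`; the exact metric keys inside each record are not listed here, so this simply walks the tree and prints what it finds):

```python
import json
from pathlib import Path

results_path = Path("/results/output/path")  # same path passed to the eval functions

# Gather every per-dataset result file, SAE and baseline alike.
for result_file in results_path.glob("**/*.json"):
  with open(result_file) as f:
    records = json.load(f)  # each file holds a list of result records
  for record in records:
    print(result_file, record)
```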

#### Optional: Pre-generating model activations

Pre-generating activations can speed up repeated runs and lets you inspect the saved tensors. It is optional: the benchmarks auto-generate any missing activations on their first run.

```python
from sae_probes import generate_dataset_activations

generate_dataset_activations(
  model_name="gemma-2-2b", # the TransformerLens name of the model
  hook_names=["blocks.12.hook_resid_post"], # Any TLens hook names
  batch_size=64,
  device="cuda",
  model_cache_path="/path/to/save/activations",
)
```

If you skip pre-generation, the benchmarks will create any missing activations automatically. Passing a `model_cache_path` persists them; if omitted, activations will be written to a temporary directory that is deleted after the run.
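
For example, a minimal sketch of reusing the pre-generated cache from the snippet above in a subsequent SAE benchmark run (the SAE is the same Gemma Scope SAE loaded earlier):

```python
from sae_lens import SAE
from sae_probes import run_sae_evals

sae = SAE.from_pretrained(
  "gemma-scope-2b-pt-res-canonical", "layer_12/width_16k/canonical"
)

# model_cache_path matches the generate_dataset_activations call above,
# so the cached activations are loaded instead of being recomputed.
run_sae_evals(
  sae=sae,
  model_name="gemma-2-2b",
  hook_name="blocks.12.hook_resid_post",
  reg_type="l1",
  setting="normal",
  results_path="/results/output/path",
  model_cache_path="/path/to/save/activations",
  ks=[1, 16],
)
```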

## Citation

If you use this code in your research, please cite:

```
@inproceedings{kantamnenisparse,
  title={Are Sparse Autoencoders Useful? A Case Study in Sparse Probing},
  author={Kantamneni, Subhash and Engels, Joshua and Rajamanoharan, Senthooran and Tegmark, Max and Nanda, Neel},
  booktitle={Forty-second International Conference on Machine Learning}
}
```


            
