interp-embed

Name: interp-embed
Version: 0.1.0
Summary: Toolkit for analyzing unstructured datasets with sparse autoencoders
Upload time: 2025-10-23 05:38:15
Author: Nick Jiang
Requires Python: >=3.10
License: MIT
Keywords: sparse-autoencoders, sae, interpretability, machine-learning, dataset-analysis, embeddings
# InterpEmbed

`interp_embed` is a toolkit for analyzing unstructured (e.g. text) datasets with sparse autoencoders (SAEs). It can quickly compute and efficiently store feature activations for data analysis. Given a dataset of documents, `interp_embed` creates sparse, high-dimensional, interpretable embeddings in which each dimension maps to a concept (such as a syntactic pattern or a topic), supporting downstream analysis tasks like dataset diffing, concept correlations, and directed clustering.

## Setup

**With uv (recommended):**
```bash
uv sync  # To install uv, see https://docs.astral.sh/uv/getting-started/installation/
```

**Without uv (using pip):**
```bash
pip install -r requirements.txt
```

Create a `.env` file containing `OPENROUTER_API_KEY` and `OPENAI_KEY`. These keys are used to call LLMs that generate feature labels when the SAE does not already provide them.
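A minimal `.env` might look like this (placeholder values; substitute your own keys):
```bash
OPENROUTER_API_KEY=your-openrouter-key
OPENAI_KEY=your-openai-key
```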

## Quickstart
First, create a dataset object. We currently support SAEs from SAELens (`LocalSAE`) and Goodfire (`GoodfireSAE`).

```python
from interp_embed import Dataset
from interp_embed.saes import GoodfireSAE
import pandas as pd

# 1. Load a Goodfire SAE or SAE supported through the SAELens package
sae = GoodfireSAE(
    variant_name="Llama-3.1-8B-Instruct-SAE-l19",  # or "Llama-3.3-70B-Instruct-SAE-l50" for higher quality features
    device="cuda:0", # optional
    quantize=True # optional
)

# 2. Prepare your data as a DataFrame
df = pd.DataFrame({
    "text": ["Good morning!", "Hello there!", "Good afternoon."],
    "date": ["2022-01-10", "2021-08-23", "2023-03-14"] # Metadata column
})

# 3. Create dataset - computes and saves feature activations
dataset = Dataset(
    data=df,
    sae=sae,
    field="text",  # Optional. Column containing text to analyze
    save_path="my_dataset.pkl"  # Optional. Auto-saves progress, which enables recovery if computations fail
)

# 4. Later, load the saved dataset to skip expensive recomputation.
dataset = Dataset.load_from_file("my_dataset.pkl")  # If some activations failed, use resume=True to continue.
```
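If a previous run was interrupted partway through, reload from the save path and resume, as the comment above suggests (a minimal sketch, assuming `load_from_file` accepts the `resume` keyword):
```python
# Resume computing any activations that failed in the earlier run
dataset = Dataset.load_from_file("my_dataset.pkl", resume=True)
```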

Here are some commonly used methods.
```python
# Get feature activations as a sparse matrix of shape (N = # documents, F = # features)
embeddings = dataset.latents()

# Get the feature labels, if the SAE provides them
labels = dataset.feature_labels()

# Pass in a feature index to generate a more accurate label (async method)
new_label = await dataset.label_feature(feature=65478)  # example: "Friendly greetings"

# Annotate a document for a given feature, marking activating tokens with << >>.
annotated_document = dataset[0].token_activations(feature=65478)

# Extract a list of top documents for a given feature
top_documents = dataset.top_documents_for_feature(feature=65478)
```
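Because `latents()` returns a sparse N x F matrix, the embeddings plug directly into standard sparse-matrix tooling. Below is a minimal sketch, assuming a SciPy sparse matrix and list-like labels from `feature_labels()`, that ranks features by how often they activate across the dataset:
```python
import numpy as np

embeddings = dataset.latents()        # sparse matrix, shape (N documents, F features)
labels = dataset.feature_labels()     # assumed list-like, indexed by feature id

# Fraction of documents in which each feature activates at least once
activation_rate = np.asarray((embeddings > 0).mean(axis=0)).ravel()

# Ten most frequently activating features and their labels
for f in np.argsort(activation_rate)[::-1][:10]:
    print(f, labels[f], f"{activation_rate[f]:.2%}")
```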

For analyses (e.g. dataset diffing, correlations) done on example datasets, see the `examples/` folder.

## How does this work?

To embed a document, we pass its text through a "reader" LLM and use a sparse autoencoder (SAE) to decompose the model's internal representation into interpretable concepts known as "features". The number of features per SAE ranges from roughly 1,000 to 100,000. An SAE produces a sparse, high-dimensional vector of feature activations for each token, which we aggregate into a single document embedding.
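As a rough illustration of that last step, here is a minimal sketch of turning per-token activations into one document embedding (hypothetical shapes and values; the actual pooling used by `interp_embed` is not specified here):
```python
import numpy as np

# Hypothetical per-token SAE activations for one 5-token document:
# shape (T tokens, F features), mostly zero because activations are sparse
T, F = 5, 65536
token_activations = np.zeros((T, F))
token_activations[0, 65478] = 3.2  # e.g. a "friendly greetings" feature firing on the first token
token_activations[1, 65478] = 1.1

# One simple aggregation: max-pool over tokens to get a single (F,) document embedding
document_embedding = token_activations.max(axis=0)
print(document_embedding.shape, document_embedding[65478])  # (65536,) 3.2
```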

            
