# taker

`taker` is a Python library for studying and manipulating Language Models (LLMs), with support for multi-GPU inference, editing, and quantized inference. It provides tools for analyzing model activations and pruning different parts of the model based on those activations.

## Features

- Support for various popular LLM architectures
- Multi-GPU inference and editing
- Quantized inference
- Activation analysis and model pruning
- Hooks for manipulating model behavior

## Installation

```bash
pip install taker
```

## Supported Models

- EleutherAI's Pythia and GPT-NeoX
- Meta's OPT, Galactica, Llama, Llama 2, and Llama 3 models
- MistralAI's Mistral 7B
- OpenAI's GPT-2
- RoBERTa
- Google's Vision Transformer (ViT)
- Google's Gemma and Gemma 2 models

## Quick Start

### Loading a Model

```python
from taker import Model

# Load a model from Hugging Face
# You can choose your level of quantization
# Note that some quantized dtypes do not support multi-GPU; for those, use device_map='cuda:0'
dtype = "fp16" # "fp32", "bf16", "hqq8", "int8", "hqq4", "nf4", "int4"
model = Model("facebook/opt-125m", dtype=dtype)

# Access model attributes
print(f"Number of layers: {model.cfg.n_layer}")
print(f"Model width: {model.cfg.d_model}")
print(f"MLP width: {model.cfg.d_mlp}")
print(f"Number of heads: {model.cfg.n_heads}")
print(f"Head size: {model.cfg.d_head}")
print(f"Vocabulary size: {model.cfg.d_vocab}")
```
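
For the quantized dtypes that do not shard across multiple GPUs, the note above suggests pinning the model to a single device. A sketch, assuming the `device_map` argument is forwarded to the underlying Hugging Face loader:

```python
# Assumed usage per the note above: pin a quantized model to one GPU.
# (device_map is presumably passed through to the Hugging Face loader.)
model = Model("facebook/opt-125m", dtype="int8", device_map="cuda:0")
```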

### Text Generation

```python
prompt = "I am hungry and want to"
output = model.generate(prompt, num=10)
print(output)
```

### Analyzing Residual Stream Activations

```python
residual_stream = model.get_residual_stream("I am hungry")
print(residual_stream.shape)
# Example output, from a 6-layer model with d_model=288 (e.g. nickypro/tinyllama-15m):
# torch.Size([13, 5, 288])
# Format: [n_points, n_tokens, d_model], where n_points = 2 * n_layers + 1
# Points: [input], [attn_0_output], [mlp_0_output], [attn_1_input], ...

residual_stream_decoder = model.get_residual_stream_decoder("I am hungry")
print(residual_stream_decoder.shape)
# Example output, same 6-layer model: torch.Size([7, 5, 288])
# Format: [n_points, n_tokens, d_model], where n_points = n_layers + 1
# Points: [decoder_0_input], [decoder_1_input], ..., [decoder_n_output]
```
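
As a quick example of working with these tensors, here is a small sketch (using only the call above plus plain PyTorch tensor methods) that tracks how the residual-stream norm grows with depth:

```python
residual_stream = model.get_residual_stream("I am hungry")

# One mean L2 norm per collection point, averaged over token positions
norms = residual_stream.norm(dim=-1).mean(dim=-1)
for i, norm in enumerate(norms):
    print(f"stream point {i}: mean norm {norm.item():.2f}")
```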

## Model Maps

Model maps in `taker` provide a unified interface for interacting with different model architectures, abstracting away implementation details and allowing consistent access across various models.

### Example Usage

Here's an expanded look at how you can use model maps in `taker`:

```python
from taker import Model

# Initialize a model
model = Model("facebook/opt-125m")

# Accessing high-level model components
embedding = model["embed"]
unembed = model["unembed"]
ln_final = model["ln_final"]

# Working with layers
layer_0 = model.layers[0]
layer_1 = model.layers[1]

# Accessing attention components
attn_weights_q = layer_0["attn.W_Q"]
attn_weights_k = layer_0["attn.W_K"]
attn_weights_v = layer_0["attn.W_V"]
attn_weights_o = layer_0["attn.W_O"]

# Accessing MLP components
mlp_in_weights = layer_0["mlp.W_in"]
mlp_out_weights = layer_0["mlp.W_out"]

# Accessing normalization layers
attn_ln = layer_0["attn.ln_in"]
mlp_ln = layer_0["mlp.ln_in"]

# Modifying model components
import torch

# Change the embedding for a specific token
model["embed.W_E"][1000] = torch.randn_like(model["embed.W_E"][1000])

# Zero out some attention weights
layer_0["attn.W_Q"][:, :10] = 0

# Print model configuration
print(f"Model has {len(model.layers)} layers")
print(f"Hidden size: {model.cfg.d_model}")
print(f"Number of attention heads: {model.cfg.n_heads}")

# Accessing and modifying biases (if present in the model)
if "attn.b_Q" in layer_0:
    attn_bias_q = layer_0["attn.b_Q"]
    layer_0["attn.b_Q"] += 0.1  # Add a small bias

# Iterating over all layers
for i, layer in enumerate(model.layers):
    print(f"Layer {i} attention output shape: {layer['attn.W_O'].shape}")
```

This example demonstrates how to access and modify various components of the model using the model map interface. It shows operations on embeddings, attention weights, MLP weights, and layer normalization parameters.

### Types of Model Maps

`taker` uses two types of model maps:

1. **Model-level maps**: For high-level model components.
2. **Layer-level maps**: For the internal structure of individual layers.

### Customization

Advanced users can extend model maps for new architectures or modify existing
ones. For this, see `src/taker/model_maps.py`.
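
To give a flavour of the idea, here is a purely hypothetical sketch of what a layer-level map might contain, using real Hugging Face OPT attribute names; the actual map format is defined in `src/taker/model_maps.py` and may differ:

```python
# Hypothetical illustration only -- the real format lives in
# src/taker/model_maps.py. The core idea: translate taker's unified
# names into the attribute paths of one specific architecture.
illustrative_opt_layer_map = {
    "attn.W_Q": "self_attn.q_proj.weight",
    "attn.W_K": "self_attn.k_proj.weight",
    "attn.W_V": "self_attn.v_proj.weight",
    "attn.W_O": "self_attn.out_proj.weight",
    "mlp.W_in": "fc1.weight",
    "mlp.W_out": "fc2.weight",
}
```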

## Advanced Hook Operations

`taker` provides powerful hooks for manipulating model behavior. Here's an in-depth look at Activation Addition, Neuron Replacement, and related operations.

### Hook Points in taker

The `taker` library provides several hook points for manipulating and analyzing model behavior. Here's a comprehensive list of the hook points available:

1. **pre_attn**: Before the attention mechanism
   - Applied to the input of the attention layer

2. **attn_pre_out**: Before the attention output projection
   - Applied to the output of the attention mechanism before the final projection

3. **post_attn**: After the attention mechanism
   - Applied to the output of the attention layer

4. **pre_mlp**: Before the MLP (feedforward) layer
   - Applied to the input of the MLP layer

5. **mlp_pre_out**: Before the MLP output projection
   - Applied to the intermediate output of the MLP before the final projection

6. **post_mlp**: After the MLP layer
   - Applied to the output of the MLP layer

7. **pre_decoder**: Before the entire decoder block
   - Applied to the input of a full decoder layer (including attention and MLP)

8. **post_decoder**: After the entire decoder block
   - Applied to the output of a full decoder layer

- These hook points can be used with various hook types such as `collect`, `mask`, `actadd`, `postbias`, `offset`, `replace`, and `whiten`.
- Hook points can be applied to specific layers or to all layers using the `'all'` specifier.
- The exact behavior and effect of each hook point may vary depending on the specific model architecture.

**Example Configuration**

Here is the default configuration for these hook points:

```python
from taker.hooks import HookConfig  # assumed import path

config_string = """
pre_decoder: collect
post_decoder: collect
pre_attn: collect
attn_pre_out: offset, mask, replace, collect, unoffset
post_attn: collect
pre_mlp: collect
mlp_pre_out: offset, mask, replace, collect, unoffset
post_mlp: collect
"""
hook_config = HookConfig().from_string(config_string)
```

This can be loaded into the model at initialisation, or after the fact.

```python
# At initialisation:
model = Model("nickypro/tinyllama-15m", hook_config=config_string)
# Or after the fact (this clears any existing hooks):
model.set_hook_config(config_string)
```

This configuration sets up various hook types at different hook points throughout the model.

### Neuron Replacement Hook

The Neuron Replacement hook in `taker` allows you to completely replace neuron activations at specific token positions. This powerful feature is useful for studying how changes in intermediate activations affect the model's output.

**Implementation and Usage**

The `NeuronReplace` class stores its state in a `torch.nn.ParameterDict` called `param`. Each key in this dictionary is a string representation of a token index, and the corresponding value is a `torch.nn.Parameter` containing the replacement activation for that token.

```python
from taker import Model
import torch

model = Model("facebook/opt-125m")

# Basic usage: Replace neuron activations at a specific token index
replacement_activation = torch.randn(model.cfg.d_mlp)
model.hooks.neuron_replace["layer_2_mlp_pre_out"].add_token(token_index=1, value=replacement_activation)

# Access the NeuronReplace instance for a specific layer
neuron_replace_hook = model.hooks.neuron_replace["layer_2_mlp_pre_out"]

# View the current state
print(neuron_replace_hook.param)

# Manually add or modify a replacement
token_index = 3
new_replacement = torch.randn(model.cfg.d_mlp)
neuron_replace_hook.param[str(token_index)] = torch.nn.Parameter(new_replacement)

# Remove a replacement
del neuron_replace_hook.param[str(token_index)]

# Modify an existing replacement
existing_index = "1"  # Note: keys are stored as strings
if existing_index in neuron_replace_hook.param:
    neuron_replace_hook.param[existing_index].data += 0.1  # Add a small perturbation

# Clear all replacements
neuron_replace_hook.param.clear()

# Set multiple replacements at once
replacements = {
    "0": torch.randn(model.cfg.d_mlp),
    "2": torch.randn(model.cfg.d_mlp),
    "4": torch.randn(model.cfg.d_mlp)
}
for idx, replacement in replacements.items():
    neuron_replace_hook.param[idx] = torch.nn.Parameter(replacement)

# Update max_tokens if necessary
neuron_replace_hook.max_tokens = max(
    neuron_replace_hook.max_tokens,
    max(int(idx) for idx in neuron_replace_hook.param.keys()) + 1,
)

# Reset the hook (clear all replacements and reset counters)
neuron_replace_hook.reset()

# Restart the token counter
# Note: This is typically handled automatically during generation tasks
# Manual restart is only necessary in specific scenarios
neuron_replace_hook.restart()

# Apply neuron replacements across multiple layers
for layer in range(model.cfg.n_layers):
    hook_name = f"layer_{layer}_mlp_pre_out"
    model.hooks.neuron_replace[hook_name].add_token(token_index=0, value=torch.randn(model.cfg.d_mlp))

# Reset all neuron replace hooks across the model
model.hooks.reset_neuron_replace()
```

**Key Features and Concepts**

1. **State Storage**: The state is stored in `self.param`, a `ParameterDict` where keys are token indices (as strings) and values are replacement activations.

2. **Adding Replacements**: Use `add_token(token_index, value)` to add a replacement for a specific token.

3. **Manual Modification**: Access and modify the `param` dictionary directly for fine-grained control.

4. **Resetting**: The `reset()` method clears all replacements and resets internal counters.

5. **Restarting**: The `restart()` method resets the token counter. This is typically handled automatically during generation tasks but can be called manually if needed.

6. **Automatic Behavior**: The hook uses an "autorestart" feature, which automatically resets the token counter when it detects a new sequence (inferred from input size).

7. **Multi-layer Application**: You can apply replacements to multiple layers independently.

8. **Global Reset**: Use `model.hooks.reset_neuron_replace()` to reset all neuron replacement hooks across the entire model.

**Use Cases and Considerations**

- **Activation Study**: Replace activations at specific positions to study their impact on model output.
- **Ablation Experiments**: Systematically replace activations to identify critical neurons or patterns.
- **Intervention Analysis**: Modify intermediate representations to test hypotheses about model behavior.
- **Generation Tasks**: The autorestart feature ensures proper handling of multiple generated sequences without manual intervention in most cases.

By leveraging the Neuron Replacement hook, researchers and developers can conduct detailed analyses of neural network behavior, test interventions, and gain insights into the model's internal representations and decision-making processes.
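
For example, a minimal ablation loop using only the calls shown above might look like this (the zero-replacement value and the prompt are arbitrary choices):

```python
import torch
from taker import Model

model = Model("facebook/opt-125m")
prompt = "I am hungry and want to"
baseline = model.generate(prompt, num=10)

# Zero-ablate the mlp_pre_out activations at token position 0, one layer
# at a time, and check whether the generated continuation changes.
# Note: generation may be stochastic, so for rigorous comparisons fix a
# seed or compare logits rather than sampled text.
for layer in range(model.cfg.n_layers):
    model.hooks.reset_neuron_replace()  # clear the previous layer's edit
    hook_name = f"layer_{layer}_mlp_pre_out"
    model.hooks.neuron_replace[hook_name].add_token(
        token_index=0, value=torch.zeros(model.cfg.d_mlp))
    ablated = model.generate(prompt, num=10)
    print(f"layer {layer}: {'changed' if ablated != baseline else 'unchanged'}")
```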

### Activation Addition

Activation Addition lets you modify neuron activations at specific positions
in the input sequence. It works like Neuron Replacement, except that the
supplied value is added to the original activation rather than replacing it.

```python
from taker import Model
import torch

model = Model("facebook/opt-125m")

# Add a custom activation to a specific token position
custom_activation = torch.randn(model.cfg.d_mlp)
model.hooks["mlp_pre_out"][2].add_token(token_index=0, value=custom_activation)

# Add activations for multiple tokens
token_activations = torch.randn(5, model.cfg.d_mlp)  # 5 tokens
for i in range(5):
    model.hooks["mlp_pre_out"][2].add_token(token_index=i, value=token_activations[i])

# Reset the activation additions
model.hooks["mlp_pre_out"][2].reset()

# Manually set all activation additions
all_token_activations = torch.randn(10, model.cfg.d_mlp)  # 10 tokens
model.hooks["mlp_pre_out"][2].set_actadd(all_token_activations)

# Restart the token counter (useful for processing multiple sequences)
model.hooks["mlp_pre_out"][2].restart()
```

### Neuron Masking

Neuron Masking allows you to selectively zero out certain neurons.

```python
# Create a mask for MLP neurons in layer 2
neurons_to_keep = torch.ones(model.cfg.d_mlp)
neurons_to_keep[:10] = 0  # Mask the first 10 neurons

model.hooks["mlp_pre_out"][2].delete_neurons(keep_indices=neurons_to_keep)

# Manually set the mask
new_mask = torch.randint(0, 2, (model.cfg.d_mlp,)).float()
model.hooks["mlp_pre_out"][2]["mask"].set_mask(new_mask)

# Set an offset for masked neurons
offset = torch.randn(model.cfg.d_mlp)
model.hooks["mlp_pre_out"][2]["mask"].set_offset(offset)
```

### Neuron Offsetting

Neuron Offsetting allows you to add a constant offset to neuron activations.

```python
# Set an offset for all neurons in a layer
offset = torch.randn(model.cfg.d_mlp)
model.hooks["mlp_pre_out"][2]["offset"].set_offset(offset)

# Set offsets for all layers at once
all_layer_offsets = torch.randn(model.cfg.n_layers, model.cfg.d_mlp)
model.hooks["mlp_pre_out"].set_offsets(all_layer_offsets)
```

### Collecting Activations

You can use hooks to collect activations from specific parts of the model.

```python
# Enable collection for specific components and layers
model.hooks.enable_collect_hooks(components=["mlp_pre_out", "attn_pre_out"], layers=[0, 1, 2])

# Run a forward pass
_ = model.get_logits("Hello, world!")

# Retrieve collected activations
mlp_activations = model.hooks["mlp_pre_out"]["collect"]
attn_activations = model.hooks["attn_pre_out"]["collect"]

# Disable all collect hooks
model.hooks.disable_all_collect_hooks()
```

### Global Hook Operations

You can perform operations on all hooks of a certain type across the model.

```python
# Delete neurons across all MLP layers
remove_indices = torch.randint(0, 2, (model.cfg.n_layers, model.cfg.d_mlp)).bool()
model.hooks.delete_mlp_neurons(remove_indices)

# Delete neurons in attention layers
attn_remove_indices = torch.randint(0, 2, (model.cfg.n_layers, model.cfg.n_heads * model.cfg.d_head)).bool()
model.hooks.delete_attn_neurons(attn_remove_indices)
```

These advanced hook operations provide fine-grained control over the model's
behavior, allowing for detailed analysis and manipulation of the model's
internal representations.

## Example: Pruning Based on Capabilities
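
Roughly speaking, this configuration scores feedforward neurons by absolute activation (`ff_scoring="abs"`) and prunes the 2% most implicated (`ff_frac=0.02`), aiming to remove capability on the `cripple` dataset (`code`) while preserving it on the `focus` dataset (`pile_codeless`):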

```python
from taker.data_classes import PruningConfig
from taker.prune import run_pruning

config = PruningConfig(
    wandb_project="my_project",
    model_repo="facebook/opt-125m",
    token_limit=1000,
    run_pre_test=True,
    ff_scoring="abs",
    ff_frac=0.02,
    ff_eps=0.001,
    attn_scoring="abs",
    attn_frac=0.00,
    attn_eps=1e-4,
    focus="pile_codeless",
    cripple="code",
)

pruned_model, history = run_pruning(config)
```

## Contributing

Contributions to `taker` are welcome! Please check out our [contributing guidelines](CONTRIBUTING.md) for more information.

## License

This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.

            
