| Field | Value |
| --- | --- |
| Name | tinystoriesmodel |
| Version | 0.1.4.post9 |
| Summary | A small TinyStories LM with SAEs and transcoders |
| Author | Noa Nabeshima |
| Maintainer | None |
| Home page | None |
| Docs URL | None |
| Requires Python | <4.0,>=3.11 |
| License | None |
| Keywords | None |
| Requirements | No requirements were recorded. |
| Upload time | 2024-06-25 04:55:46 |
# [TinyModel](https://github.com/noanabeshima/tiny_model)
TinyModel is a 4-layer, 44M-parameter model trained on [TinyStories V2](https://arxiv.org/abs/2305.07759) for mechanistic interpretability. It uses ReLU activations and no layernorms, and it comes with trained SAEs and transcoders.
It can be installed with `pip install tinystoriesmodel`.
```
from tiny_model import TinyModel, tokenizer
lm = TinyModel()
# for inference
tok_ids, attn_mask = tokenizer(['Once upon a time', 'In the forest'])
logprobs = lm(tok_ids)
# Get SAE/transcoder acts
# See 'SAEs/Transcoders' section for more information.
feature_acts = lm['M1N123'](tok_ids)
all_feat_acts = lm['M2'](tok_ids)
# Generation
lm.generate('Once upon a time, Ada was happily walking through a magical forest with')
# To decode tok_ids you can use
tokenizer.decode(tok_ids)
```
The model was trained for 3 epochs on a [preprocessed version of TinyStories V2](https://huggingface.co/datasets/noanabeshima/TinyStoriesV2). A pre-tokenized dataset is available [here](https://huggingface.co/datasets/noanabeshima/TinyModelTokIds); I recommend using it for getting SAE/transcoder activations.
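If you want to pull that pre-tokenized dataset programmatically, a minimal sketch with the Hugging Face `datasets` library looks roughly like this; the split and column names below are assumptions, so check the dataset card for the real ones.
```
import torch
from datasets import load_dataset
from tiny_model import TinyModel

lm = TinyModel()

# Assumed split/column names ('train', 'tok_ids') -- check the dataset card.
ds = load_dataset('noanabeshima/TinyModelTokIds', split='train')
batch = torch.tensor(ds[:8]['tok_ids'])   # (batch, seq) token ids

logprobs = lm(batch)            # next-token log-probabilities
feature_acts = lm['M2'](batch)  # sparse MLP/transcoder acts at layer 2
```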
# SAEs/transcoders
Some sparse SAEs/transcoders are provided along with the model.
For example, `acts = lm['M2N100'](tok_ids)` returns the activations of neuron 100 of the layer-2 sparse MLP.
To get sparse acts, choose which part of the transformer block you want to look at. Currently the [sparse MLP](https://www.lesswrong.com/posts/MXabwqMwo3rkGqEW8/sparse-mlp-distillation)/[transcoder](https://www.alignmentforum.org/posts/YmkjnWtZGLbHRbzrP/transcoders-enable-fine-grained-interpretable-circuit) and SAEs on attention out are available, under the tags `'M'` and `'A'` respectively. Residual stream and MLP-out SAEs exist but haven't been added yet; bug me on e.g. Twitter if you want this to happen fast.
Then add the layer: a sparse MLP at layer 2 would be `'M2'`.
Finally, optionally add a particular neuron, e.g. `'M0N10000'` for neuron 10000 of the layer-0 sparse MLP.
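Putting the convention together, a few illustrative lookups (a minimal sketch; the shape comments describe what I'd expect back, not documented behavior):
```
from tiny_model import TinyModel, tokenizer

lm = TinyModel()
tok_ids, attn_mask = tokenizer(['Once upon a time'])

mlp_acts  = lm['M2'](tok_ids)      # all sparse-MLP/transcoder features at layer 2
attn_acts = lm['A0'](tok_ids)      # all attention-out SAE features at layer 0
one_feat  = lm['M1N123'](tok_ids)  # only feature 123 of the layer-1 sparse MLP
```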
# Tokenization
Tokenization is done as follows (a rough sketch in code follows the list):
- The top-10K most frequent tokens under the GPT-NeoX tokenizer are selected and sorted by frequency.
- To tokenize a document, first tokenize with the GPT-NeoX tokenizer. Then replace tokens not in the top 10K with a special `[UNK]` token id. All token ids are then remapped to fall between 1 and 10K, roughly sorted from most frequent to least.
- Finally, a `[BEGIN]` token id is prepended to the document.
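For intuition, here is a rough sketch of that scheme. It is not the package's actual implementation: the frequency counts come from a toy corpus instead of TinyStories V2, and the `[UNK]`/`[BEGIN]` ids are arbitrary placeholders.
```
from collections import Counter
from transformers import AutoTokenizer

neox_tok = AutoTokenizer.from_pretrained('EleutherAI/gpt-neox-20b')

# Toy stand-in for the real top-10K vocabulary: rank token ids by frequency
# over a tiny sample corpus instead of the full TinyStories V2 training set.
sample_corpus = ['Once upon a time there was a little girl.',
                 'In the forest lived a small fox.']
counts = Counter(t for doc in sample_corpus for t in neox_tok.encode(doc))
ranked = [tok_id for tok_id, _ in counts.most_common(10_000)]

# Hypothetical special ids; the real [UNK]/[BEGIN] values may differ.
UNK_ID = 0
BEGIN_ID = len(ranked) + 1
remap = {neox_id: new_id for new_id, neox_id in enumerate(ranked, start=1)}

def tokenize(doc):
    """Tokenize with GPT-NeoX, map into the reduced vocab, prepend [BEGIN]."""
    return [BEGIN_ID] + [remap.get(i, UNK_ID) for i in neox_tok.encode(doc)]

print(tokenize('Once upon a time'))
```
In practice the package's own `tokenizer` (imported in the earlier example) handles all of this for you.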