tinystoriesmodel

Name: tinystoriesmodel
Version: 0.1.4.post9
Summary: A small TinyStories LM with SAEs and transcoders
Author: Noa Nabeshima
Maintainer: None
Home-page: None
Docs URL: None
License: None
Keywords: None
Requires-Python: <4.0,>=3.11
Upload time: 2024-06-25 04:55:46
# [TinyModel](https://github.com/noanabeshima/tiny_model)
TinyModel is a 4-layer, 44M-parameter language model trained on [TinyStories V2](https://arxiv.org/abs/2305.07759) for mechanistic interpretability. It uses ReLU activations and no layer norms, and it comes with trained SAEs and transcoders.

It can be installed with `pip install tinystoriesmodel`


```python
from tiny_model import TinyModel, tokenizer

lm = TinyModel()

# Inference: tokenize a batch of prompts and get log-probabilities
tok_ids, attn_mask = tokenizer(['Once upon a time', 'In the forest'])
logprobs = lm(tok_ids)

# Get SAE/transcoder acts
# See 'SAEs/Transcoders' section for more information.
feature_acts = lm['M1N123'](tok_ids)
all_feat_acts = lm['M2'](tok_ids)

# Generation
lm.generate('Once upon a time, Ada was happily walking through a magical forest with')

# Decode tok_ids back into text
tokenizer.decode(tok_ids)
```

The model was trained for 3 epochs on a [preprocessed version of TinyStoriesV2](https://huggingface.co/datasets/noanabeshima/TinyStoriesV2). A pre-tokenized dataset is available [here](https://huggingface.co/datasets/noanabeshima/TinyModelTokIds); I recommend using it when collecting SAE/transcoder activations.
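For instance, here is a minimal sketch of that workflow. It assumes the pre-tokenized dataset has a `train` split and stores token ids in a column named `tok_ids` (both are assumptions, check the dataset card), and uses the Hugging Face `datasets` library:

```python
# Sketch only: load the pre-tokenized dataset and collect sparse-MLP activations.
# The split name, the 'tok_ids' column name, and the batch size are assumptions.
import torch
from datasets import load_dataset
from tiny_model import TinyModel

lm = TinyModel()
ds = load_dataset('noanabeshima/TinyModelTokIds', split='train')

batch = torch.tensor(ds[:8]['tok_ids'])  # (batch, seq_len) token ids
with torch.no_grad():
    mlp_acts = lm['M2'](batch)           # all layer-2 sparse-MLP/transcoder acts
print(mlp_acts.shape)
```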



# SAEs/transcoders
Some sparse SAEs/transcoders are provided along with the model.

For example, `acts = lm['M2N100'](tok_ids)`

To get sparse acts, choose which part of the transformer block you want to look at. Currently a [sparse MLP](https://www.lesswrong.com/posts/MXabwqMwo3rkGqEW8/sparse-mlp-distillation)/[transcoder](https://www.alignmentforum.org/posts/YmkjnWtZGLbHRbzrP/transcoders-enable-fine-grained-interpretable-circuit) (tag `'M'`) and SAEs on attention out (tag `'A'`) are available. Residual-stream and MLP-out SAEs exist but haven't been added yet; bug me on e.g. Twitter if you want this to happen quickly.

Then, add the layer index: the sparse MLP at layer 2 is `'M2'`.
Finally, optionally add a particular neuron index, for example `'M0N10000'`.
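Putting the pieces together, the tag grammar is `<part><layer>` with an optional `N<neuron>` suffix. The specific layer and neuron indices below are only illustrative:

```python
# Tag grammar: <part><layer>[N<neuron>], e.g. 'M2', 'A0', 'M1N123'
mlp_acts  = lm['M2'](tok_ids)      # all sparse-MLP/transcoder features at layer 2
attn_acts = lm['A0'](tok_ids)      # all attention-out SAE features at layer 0
one_feat  = lm['M1N123'](tok_ids)  # a single feature/neuron's activations
```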

# Tokenization
Tokenization is done as follows (a rough sketch of the pipeline appears after this list):
- The top 10K most frequent tokens under the GPT-NeoX tokenizer are selected and sorted by frequency.
- To tokenize a document, first tokenize it with the GPT-NeoX tokenizer, then replace any token not in the top 10K with a special [UNK] token id. All token ids are then remapped to lie between 1 and 10K, roughly ordered from most frequent to least frequent.
- Finally, a [BEGIN] token id is prepended to the document.
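For concreteness, here is a rough sketch of that scheme. The frequency table, the special token ids, and the helper name are all assumptions for illustration, not the package's actual internals:

```python
# Illustrative sketch of the tokenization scheme described above.
from transformers import AutoTokenizer

neox_tok = AutoTokenizer.from_pretrained('EleutherAI/gpt-neox-20b')  # GPT-NeoX tokenizer

# Assumed: ranks 1..10_000 for the 10K most frequent GPT-NeoX token ids,
# computed from corpus frequency counts (most frequent -> smallest id).
neox_id_to_rank: dict[int, int] = {}  # placeholder; fill from frequency counts
UNK_ID, BEGIN_ID = 10_001, 0          # assumed special token ids

def to_tiny_ids(doc: str) -> list[int]:
    neox_ids = neox_tok(doc)['input_ids']
    tiny_ids = [neox_id_to_rank.get(i, UNK_ID) for i in neox_ids]
    return [BEGIN_ID] + tiny_ids      # prepend [BEGIN]
```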


            

Raw data

{
    "_id": null,
    "home_page": null,
    "name": "tinystoriesmodel",
    "maintainer": null,
    "docs_url": null,
    "requires_python": "<4.0,>=3.11",
    "maintainer_email": null,
    "keywords": null,
    "author": "Noa Nabeshima",
    "author_email": "noanabeshima@gmail.com",
    "download_url": "https://files.pythonhosted.org/packages/03/64/0a56d057b0f42a439f0d0632f0ed43f09f73a38e80859e6304f28ac4e7af/tinystoriesmodel-0.1.4.post9.tar.gz",
    "platform": null,
    "description": "# [TinyModel](https://github.com/noanabeshima/tiny_model)\nTinyModel is a 4 layer, 44M parameter model trained on [TinyStories V2](https://arxiv.org/abs/2305.07759) for mechanistic interpretability. It uses ReLU activations and no layernorms. It comes with trained SAEs and transcoders.\n\nIt can be installed with `pip install tinystoriesmodel`\n\n\n```\nfrom tiny_model import TinyModel, tokenizer\n\nlm = TinyModel()\n\n# for inference\ntok_ids, attn_mask = tokenizer(['Once upon a time', 'In the forest'])\nlogprobs = lm(tok_ids)\n\n# Get SAE/transcoder acts\n# See 'SAEs/Transcoders' section for more information.\nfeature_acts = lm['M1N123'](tok_ids)\nall_feat_acts = lm['M2'](tok_ids)\n\n# Generation\nlm.generate('Once upon a time, Ada was happily walking through a magical forest with')\n\n# To decode tok_ids you can use\ntokenizer.decode(tok_ids)\n```\n\nIt was trained for 3 epochs on a [preprocessed version of TinyStoriesV2](https://huggingface.co/datasets/noanabeshima/TinyStoriesV2). Pre-tokenized dataset [here](https://huggingface.co/datasets/noanabeshima/TinyModelTokIds). I recommend using this dataset for getting SAE/transcoder activations.\n\n\n\n# SAEs/transcoders\nSome sparse SAEs/transcoders are provided along with the model.\n\nFor example, `acts = lm['M2N100'](tok_ids)`\n\nTo get sparse acts, choose which part of the transformer block you want to look at (currently [sparse MLP](https://www.lesswrong.com/posts/MXabwqMwo3rkGqEW8/sparse-mlp-distillation)/[transcoder](https://www.alignmentforum.org/posts/YmkjnWtZGLbHRbzrP/transcoders-enable-fine-grained-interpretable-circuit) and SAEs on attention out are available, under the tags `'M'` and `'A'` respectively). Residual stream and MLP out SAEs exist, they just haven't been added yet, bug me on e.g. Twitter if you want this to happen fast.\n\nThen, add the layer. A sparse MLP at layer 2 would be `'M2'`.\nFinally, optionally add a particular neuron. For example `'M0N10000'`.\n\n# Tokenization\nTokenization is done as follows:\n- the top-10K most frequent tokens using the GPT-NeoX tokenizer are selected and sorted by frequency.\n- To tokenize a document, first tokenize with the GPT-NeoX tokenizer. Then replace tokens not in the top 10K tokens with a special \\[UNK\\] token id. All token ids are then mapped to be between 1 and 10K, roughly sorted from most frequent to least.\n- Finally, prepend the document with a [BEGIN] token id.\n\n",
    "bugtrack_url": null,
    "license": null,
    "summary": "A small TinyStories LM with SAEs and transcoders",
    "version": "0.1.4.post9",
    "project_urls": null,
    "split_keywords": [],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "454878a03810d383039bd4e9c9986d183be5a2c03df7e58725177b0a1d4a95de",
                "md5": "c387004c04add92a953ce5986527094d",
                "sha256": "f18523b02d5ef939366011dd7bf7e27694d1f3a040728616abea70791b5cb66f"
            },
            "downloads": -1,
            "filename": "tinystoriesmodel-0.1.4.post9-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "c387004c04add92a953ce5986527094d",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": "<4.0,>=3.11",
            "size": 75982,
            "upload_time": "2024-06-25T04:55:45",
            "upload_time_iso_8601": "2024-06-25T04:55:45.047965Z",
            "url": "https://files.pythonhosted.org/packages/45/48/78a03810d383039bd4e9c9986d183be5a2c03df7e58725177b0a1d4a95de/tinystoriesmodel-0.1.4.post9-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "03640a56d057b0f42a439f0d0632f0ed43f09f73a38e80859e6304f28ac4e7af",
                "md5": "ea815ba39bf04c6497ba75b89f8d39b1",
                "sha256": "aee81be38353c670d2b8b54980e4e03c29d0822d851ae504c2cdbe8e1bcec569"
            },
            "downloads": -1,
            "filename": "tinystoriesmodel-0.1.4.post9.tar.gz",
            "has_sig": false,
            "md5_digest": "ea815ba39bf04c6497ba75b89f8d39b1",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": "<4.0,>=3.11",
            "size": 77023,
            "upload_time": "2024-06-25T04:55:46",
            "upload_time_iso_8601": "2024-06-25T04:55:46.912905Z",
            "url": "https://files.pythonhosted.org/packages/03/64/0a56d057b0f42a439f0d0632f0ed43f09f73a38e80859e6304f28ac4e7af/tinystoriesmodel-0.1.4.post9.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2024-06-25 04:55:46",
    "github": false,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "lcname": "tinystoriesmodel"
}
        