<img src="./pkm.png" width="400px"></img>
## Product Key Memory
[![PyPI version](https://badge.fury.io/py/product-key-memory.svg)](https://badge.fury.io/py/product-key-memory)
Standalone <a href="https://arxiv.org/abs/1907.05242">Product Key Memory</a> module for augmenting Transformer models
## Install
```bash
$ pip install product-key-memory
```
## Usage
Replace the feedforward layers in a Transformer with the following:
```python
import torch
from product_key_memory import PKM
pkm = PKM(
    dim = 512,
    heads = 4,
    dim_head = 128,   # keep at 128 for best results
    num_keys = 256,   # number of subkeys; the number of values will be num_keys ^ 2
    topk = 32         # number of top subkeys to select
)
x = torch.randn(1, 1024, 512)
mask = torch.ones((1, 1024)).bool()
values = pkm(x, input_mask = mask) # (1, 1024, 512)
```
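To see why `num_keys = 256` subkeys address `num_keys ^ 2 = 65536` values, here is a hypothetical minimal sketch of the product-key lookup for a single query (not the library's internal implementation): the query is split in half, each half is scored against its own subkey table, and the cartesian sum of the two top-k score sets selects from the full `num_keys ** 2` value table while only ever scoring `2 * num_keys` subkeys plus `topk ** 2` candidates.

```python
import torch

num_keys, topk, dim = 256, 32, 128
query = torch.randn(dim)
subkeys_a = torch.randn(num_keys, dim // 2)  # first subkey table
subkeys_b = torch.randn(num_keys, dim // 2)  # second subkey table

# split the query and score each half against its subkey table
q_a, q_b = query[: dim // 2], query[dim // 2 :]
scores_a, idx_a = (subkeys_a @ q_a).topk(topk)
scores_b, idx_b = (subkeys_b @ q_b).topk(topk)

# cartesian sum of the two top-k score sets -> topk ** 2 candidates
cand_scores = (scores_a[:, None] + scores_b[None, :]).reshape(-1)
cand_idx = (idx_a[:, None] * num_keys + idx_b[None, :]).reshape(-1)

# final top-k over the candidates, indexing into num_keys ** 2 values
final_scores, pos = cand_scores.topk(topk)
final_idx = cand_idx[pos]
```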
## Learning Rates
To give different learning rates to the value parameters of the product-key-memory network, use the following helper function.
```python
from torch.optim import Adam
from product_key_memory import fetch_pkm_value_parameters
# given your root model, this helper finds all PKM modules and separates
# out their value (embedding bag) parameters from the rest
pkm_parameters, other_parameters = fetch_pkm_value_parameters(model)
optim = Adam([
    {'params': other_parameters},
    {'params': pkm_parameters, 'lr': 1e-2}
], lr = 1e-3)
```
Or, if the product-key-memory value parameters are the only ones that need a different learning rate:
```python
from torch.optim import Adam
from product_key_memory import fetch_optimizer_parameters
# automatically builds the parameter groups, with the learning rate for PKM values set to 1e-2
parameters = fetch_optimizer_parameters(model)
optim = Adam(parameters, lr=1e-3)
```
## Appreciation
Special thanks go to <a href="https://github.com/AranKomat">Aran</a> for encouraging me to look into this, and to <a href="https://github.com/madisonmay">Madison May</a> for his <a href="https://www.pragmatic.ml/large-memory-layers-with-product-keys/">educational blog post</a>, which helped me understand this better.
## Todo
- [x] offer stochasticity with annealed gumbel noise, which has shown dramatic effects in the vector-quantization setting
- [x] offer a way for smaller value dimensions + concat and linear combination of heads (like multi-head attention)
- [ ] get caught up on latest literature on product key memories, if any
- [ ] instead of additive scores, try multiplicative using coordinate descent routing
## Citations
```bibtex
@misc{lample2019large,
    title         = {Large Memory Layers with Product Keys},
    author        = {Guillaume Lample and Alexandre Sablayrolles and Marc'Aurelio Ranzato and Ludovic Denoyer and Hervé Jégou},
    year          = {2019},
    eprint        = {1907.05242},
    archivePrefix = {arXiv}
}
```
```bibtex
@misc{liu2020evolving,
    title         = {Evolving Normalization-Activation Layers},
    author        = {Hanxiao Liu and Andrew Brock and Karen Simonyan and Quoc V. Le},
    year          = {2020},
    eprint        = {2004.02967},
    archivePrefix = {arXiv}
}
```
```bibtex
@article{Shen2023ASO,
    title   = {A Study on ReLU and Softmax in Transformer},
    author  = {Kai Shen and Junliang Guo and Xuejiao Tan and Siliang Tang and Rui Wang and Jiang Bian},
    journal = {ArXiv},
    year    = {2023},
    volume  = {abs/2302.06461},
    url     = {https://api.semanticscholar.org/CorpusID:256827573}
}
```
```bibtex
@article{Csordas2023ApproximatingTF,
    title   = {Approximating Two-Layer Feedforward Networks for Efficient Transformers},
    author  = {R{\'o}bert Csord{\'a}s and Kazuki Irie and J{\"u}rgen Schmidhuber},
    journal = {ArXiv},
    year    = {2023},
    volume  = {abs/2310.10837},
    url     = {https://api.semanticscholar.org/CorpusID:264172384}
}
```
```bibtex
@inproceedings{anonymous2025continual,
    title     = {Continual Learning via Sparse Memory Finetuning},
    author    = {Anonymous},
    booktitle = {Submitted to The Fourteenth International Conference on Learning Representations},
    year      = {2025},
    url       = {https://openreview.net/forum?id=LGo7U1m24L},
    note      = {under review}
}
```