RPQ-pytorch


| Field | Value |
| --- | --- |
| Name | RPQ-pytorch |
| Version | 0.0.34 |
| Home page | https://github.com/a-kore/RPQ-pytorch |
| Summary | Reverse Product Quantization (RPQ) of weights to reduce static memory usage. |
| Upload time | 2023-09-28 20:46:15 |
| Author | Ali Kore |
| License | MIT |
| Keywords | artificial intelligence, ai, machine learning, deep learning, pytorch, quantization, product quantization, reverse product quantization, memory reduction |
| Requirements | No requirements were recorded. |

# RPQ-pytorch
Reverse Product Quantization (RPQ) of weights to reduce static memory usage.

![](assets/rpq_diagram.gif)

<!-- Go into how the method works. -->

## Table of Contents

- [Introduction](#introduction)
- [Installation](#installation)
- [Usage](#usage)
- [Benchmarks](#benchmarks)

## Introduction

[Product quantization](https://www.pinecone.io/learn/product-quantization/) is a method for reducing the memory requirements of vector similarity search.  It reduces the memory footprint by chunking each vector into sub-vectors and replacing every sub-vector with the index of its closest entry in a codebook of 256 codes, one codebook per chunk.  Each vector can then be stored as a handful of uint8 indices instead of its full floating-point representation.
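As a minimal sketch of the encoding side (not code from this package), the snippet below splits each vector into chunks and stores, per chunk, the uint8 index of the nearest codebook entry.  The codebooks are random stand-ins here; in practice they would be learned, for example with k-means.  The name `pq_encode` and all shapes are illustrative assumptions.

```python
import torch

def pq_encode(vectors, codebooks):
    """Encode (N, C*D) vectors as (C, N) uint8 codes using (C, K, D) codebooks."""
    num_codebooks, _, sub_dim = codebooks.shape
    # Split every vector into C sub-vectors: (N, C, D) -> (C, N, D)
    chunks = vectors.view(vectors.shape[0], num_codebooks, sub_dim)
    chunks = chunks.transpose(0, 1).contiguous()
    # Distance from each sub-vector to every code of its own codebook: (C, N, K)
    dists = torch.cdist(chunks, codebooks)
    # Keep only the index of the nearest code
    return dists.argmin(dim=-1).to(torch.uint8)

torch.manual_seed(0)
codebooks = torch.randn(4, 256, 8)   # 4 codebooks, 256 codes each, sub-vectors of size 8
vectors = torch.randn(10, 32)        # 10 vectors of dimension 4 * 8 = 32
codes = pq_encode(vectors, codebooks)
print(codes.shape, codes.dtype)      # torch.Size([4, 10]) torch.uint8
```

Each 32-dimensional float vector is reduced to four one-byte codes; the codebooks are the only floating-point storage left.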

If we reverse this process, we can dynamically spawn a larger set of vectors from a much smaller set of codebooks containing sub-vectors and a set of randomized uint8 indices, rather than having to persistently hold a much larger set of vectors.  This can be used during the forward pass to expand/compile the weight just-in-time in order to perform the operations on the input.  
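The expansion step can be sketched in a few lines of PyTorch.  This is an illustration of the idea under assumed shapes (codebooks of shape `(C, 256, D)`, indices of shape `(C, N)`), not the package's implementation; `expand_weight` is a hypothetical helper name.

```python
import torch

def expand_weight(codebooks, indices):
    """Build an (N, C*D) weight from (C, K, D) codebooks and (C, N) uint8 indices."""
    num_codebooks, _, sub_dim = codebooks.shape
    num_vectors = indices.shape[1]
    # For codebook c, pick entry indices[c, n] for every output vector n: (C, N, D)
    idx = indices.long().unsqueeze(-1).expand(-1, -1, sub_dim)
    sub_vectors = torch.gather(codebooks, 1, idx)
    # Concatenate the C sub-vectors belonging to each output vector: (N, C*D)
    return sub_vectors.permute(1, 0, 2).reshape(num_vectors, num_codebooks * sub_dim)

torch.manual_seed(0)
codebooks = torch.randn(4, 256, 8)                         # the only float storage
indices = torch.randint(0, 256, (4, 16)).to(torch.uint8)   # fixed random codes
weight = expand_weight(codebooks, indices)
print(weight.shape)  # torch.Size([16, 32])
```

Only `codebooks` and `indices` need to persist; `weight` can be rebuilt whenever it is needed.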

This creates a state for a model where the weights are "dormant" and expanded to their active state just before use.  This plays very well with methods like **gradient checkpointing** (and inference, similarly) where we can unpack the weights again rather than storing them.  In other words, the weights are part of the dynamic computational graph and can be forgotten/unpacked whenever they are needed.
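A rough sketch of how this could combine with `torch.utils.checkpoint` is shown below.  The function `rpq_linear` and the shapes are assumptions made for illustration; the package's own layers may be organised differently.

```python
import torch
import torch.nn.functional as F
from torch.utils.checkpoint import checkpoint

def rpq_linear(x, codebooks, indices):
    # Expand the dormant weight, use it once, and let it go out of scope
    sub_dim = codebooks.shape[-1]
    idx = indices.long().unsqueeze(-1).expand(-1, -1, sub_dim)
    weight = torch.gather(codebooks, 1, idx).permute(1, 0, 2).reshape(indices.shape[1], -1)
    return F.linear(x, weight)

torch.manual_seed(0)
codebooks = torch.randn(4, 256, 8, requires_grad=True)     # trainable codebooks
indices = torch.randint(0, 256, (4, 32)).to(torch.uint8)   # fixed random codes
x = torch.randn(2, 32)

# Under checkpointing the expanded weight is not kept for the backward pass;
# it is rebuilt from the codebooks when gradients are computed.
y = checkpoint(rpq_linear, x, codebooks, indices, use_reentrant=False)
y.sum().backward()
print(y.shape, codebooks.grad.shape)  # torch.Size([2, 32]) torch.Size([4, 256, 8])
```

Gradients flow back into the codebooks only, since the indices are integer codes and stay fixed.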

However, this doesn't come for free: the indices draw from a set of shared codebooks, so the larger the weights, the more likely it is that the generated vectors will share sub-vectors.  This can be mitigated by increasing the number of codebooks, but more testing is needed to find the minimum number of codebooks each implementation requires.
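To put rough numbers on the sharing, here is a back-of-the-envelope sketch that assumes the indices are drawn uniformly at random and reuses the sizes from the Standalone Weights example in the [Usage](#usage) section.

```python
num_codes = 256        # codes per codebook (one uint8 index each)
num_vectors = 9216     # rows of the expanded weight
num_codebooks = 72

# Each codebook slot can take only 256 values, so once num_vectors > 256
# some rows must reuse the same sub-vector in that slot (pigeonhole).
avg_rows_per_code = num_vectors / num_codes
print(avg_rows_per_code)  # 36.0

# The number of distinct full rows that could be generated grows
# exponentially with the number of codebooks.
distinct_rows = num_codes ** num_codebooks
print(distinct_rows > num_vectors)  # True
```

So on average 36 rows share any given sub-vector, while the space of possible full rows (256 to the power of 72) remains far larger than the number of rows actually generated.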

For instance, in the [Usage](#usage) section we define an RPQOPT model (an OPT variant with RPQ weights) where the number of codebooks is set to the number of heads.  This choice is arbitrary, but it works well since `hidden_dim` must be divisible by `num_codebooks`, and it is already divisible by the number of heads.

The effect of having a set of entangled vectors is unknown and would require rigorous testing with standard benchmarks for comparison.  Intuitively, this would have a different outcome depending on the way the final weight structure is used.  For a vector quantization module, it could be advantageous to have codes be entangled to avoid the issue of "dead codes" and increase codebook utilization.


## Installation

```bash
pip install rpq-pytorch
```

## Usage

#### Standalone Weights

A standalone module, `RPQWeight`, is available as an `nn.Module` wrapper that initializes a set of dynamic PQ weights and returns the expanded set of weight vectors when called.

```python
from rpq.nn import RPQWeight

w = RPQWeight(num_codebooks=72, codebook_dim=128, num_vectors=9216) 

print(w.codebooks.shape, w.indices.shape) # torch.Size([72, 256, 128]) torch.Size([72, 9216])

print(w().shape) # torch.Size([9216, 9216])
```
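The saving for this example can be estimated with simple arithmetic.  The sketch below assumes float32 codebooks and one byte per uint8 index; it counts raw tensor storage only.

```python
num_codebooks, num_codes, codebook_dim, num_vectors = 72, 256, 128, 9216
bytes_per_float = 4  # float32 (assumed)

# Static weight: num_vectors rows of num_codebooks * codebook_dim floats
static_bytes = num_vectors * (num_codebooks * codebook_dim) * bytes_per_float
# RPQ storage: float codebooks plus one uint8 per index
codebook_bytes = num_codebooks * num_codes * codebook_dim * bytes_per_float
index_bytes = num_codebooks * num_vectors
rpq_bytes = codebook_bytes + index_bytes

print(static_bytes, rpq_bytes)             # 339738624 10100736
print(round(static_bytes / rpq_bytes, 1))  # 33.6
```

Under these assumptions a 9216 x 9216 weight shrinks from about 340 MB to about 10 MB of persistent storage.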

#### Layers

A set of common layers is re-implemented with quantized weights.  They follow the same usage as their `torch.nn` counterparts, with an extra `num_codebooks` argument for each layer.  For each layer, `out_features`/`num_embeddings` must be divisible by `num_codebooks`.

```python
import torch
from rpq.nn import RPQLinear

layer = RPQLinear(in_features=1024, out_features=1024, num_codebooks=16)

x = torch.randn(1, 1, 1024) # (b, n, d)
y = layer(x) # (1, 1, 1024), since out_features=1024
```
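For intuition, a layer of this kind could be put together roughly as follows.  This is a hypothetical stand-in written for this README, not the source of `RPQLinear`: the class name, the weight layout (one generated row per input feature, which is one way to arrive at the `out_features` divisibility rule) and the initialization scale are all assumptions.

```python
import torch
import torch.nn as nn

class SketchRPQLinear(nn.Module):
    """Illustrative stand-in, not the package's RPQLinear."""

    def __init__(self, in_features, out_features, num_codebooks, num_codes=256):
        super().__init__()
        if out_features % num_codebooks != 0:
            raise ValueError("out_features must be divisible by num_codebooks")
        sub_dim = out_features // num_codebooks
        # Trainable float storage: (C, K, D)
        self.codebooks = nn.Parameter(torch.randn(num_codebooks, num_codes, sub_dim) * 0.02)
        # Fixed random codes, saved with the module but never trained: (C, in_features)
        codes = torch.randint(0, num_codes, (num_codebooks, in_features)).to(torch.uint8)
        self.register_buffer("indices", codes)
        self.bias = nn.Parameter(torch.zeros(out_features))

    def expand(self):
        sub_dim = self.codebooks.shape[-1]
        idx = self.indices.long().unsqueeze(-1).expand(-1, -1, sub_dim)
        rows = torch.gather(self.codebooks, 1, idx)                       # (C, in, D)
        return rows.permute(1, 0, 2).reshape(self.indices.shape[1], -1)   # (in, out)

    def forward(self, x):
        # The full weight exists only for the duration of this call
        return x @ self.expand() + self.bias

layer = SketchRPQLinear(in_features=1024, out_features=1024, num_codebooks=16)
x = torch.randn(1, 1, 1024)
print(layer(x).shape)  # torch.Size([1, 1, 1024])
```

Registering the indices as a buffer keeps them out of the optimizer while still saving them in the `state_dict`.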

Layers implemented:

- [x] `RPQLinear`
- [x] `RPQEmbedding`*
- [ ] `RPQConv1d`
- [ ] `RPQConv2d`
- [ ] `RPQConvTranspose2d`
- [ ] `RPQConvTranspose1d`
- [ ] `RPQConv3d`
- [ ] `RPQConvTranspose3d`
- [ ] `RPQBilinear`

*Note: an `Embedding` layer is a lookup table and therefore very fast.  Expanding the weights for `RPQEmbedding` adds substantial overhead by comparison, especially for a small number of tokens (from tens of $\mu s$ to tens of ms).

#### Models

Using the layer implementations, we can implement models via drop-in replacement of their static weight counterparts.

##### RPQViT (ViT Giant)

```python
from vit_pytorch import ViT
from rpq.models.rpqvit import RPQViT
from rpq.utils import model_size

# vit_giant_patch14_336
model = ViT(
    image_size=336,
    patch_size=14,
    num_classes=1000,
    dim=1280,
    depth=32,
    heads=16,
    mlp_dim=5120,
    dropout=0.1,
    emb_dropout=0.1
)

# rpqvit_giant_patch14_336
rpq_model = RPQViT(
    image_size=336,
    patch_size=14,
    num_classes=1000,
    dim=1280,
    depth=32,
    heads=16,
    mlp_dim=5120,
    dropout=0.1,
    emb_dropout=0.1
)

model_size(model)
model_size(rpq_model)
```
```
model size: 2252.157MB
model size: 361.429MB  
```
Approximately a 6x reduction in model size.
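How `rpq.utils.model_size` arrives at its figure is not shown in this README.  A comparable measurement can be sketched by summing the storage of all parameters and buffers; the helper name `approx_model_size_mb` and the choice of 1 MB = 1024**2 bytes are assumptions.

```python
import torch
import torch.nn as nn

def approx_model_size_mb(model):
    """Sum parameter and buffer storage in MB (1 MB = 1024**2 bytes here)."""
    param_bytes = sum(p.numel() * p.element_size() for p in model.parameters())
    buffer_bytes = sum(b.numel() * b.element_size() for b in model.buffers())
    return (param_bytes + buffer_bytes) / 1024**2

dense = nn.Linear(1024, 1024, bias=False)
print(f"model size: {approx_model_size_mb(dense):.3f}MB")  # model size: 4.000MB
```

Including buffers matters for a fair comparison if integer indices are stored as buffers rather than parameters.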

##### RPQOPT (opt-66b)

```python

import torch
from transformers.models.opt.modeling_opt import OPTConfig
from transformers import GPT2Tokenizer
from rpq.models.rpqopt import RPQOPTModel
from rpq.utils import model_size


tokenizer = GPT2Tokenizer.from_pretrained("facebook/opt-66b")
conf = OPTConfig.from_pretrained("facebook/opt-66b")
rpq_model = RPQOPTModel(conf) # randomly initialized model

inputs = tokenizer("Hello, my dog is cute.", return_tensors="pt")

with torch.no_grad():
    outputs = rpq_model(**inputs)

model_size(rpq_model)
```
```
model size: 5885.707MB 
```
This is an RPQOPT-66b initialized at float32 precision.  A static-weight version (standard OPT-66b) would be **264 GB** in size, so this amounts to an approximately 44x reduction in size.
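The figures above can be checked with a quick calculation, assuming a nominal 66 billion parameters at 4 bytes each and treating 1 GB as 1000 MB.

```python
params = 66e9            # nominal parameter count of OPT-66b (assumed)
bytes_per_float = 4      # float32
static_gb = params * bytes_per_float / 1e9
print(static_gb)  # 264.0

rpq_gb = 5885.707 / 1000  # reported RPQ size, with 1 GB taken as 1000 MB
print(round(static_gb / rpq_gb, 1))  # 44.9
```

The exact ratio shifts slightly depending on how MB and GB are defined, but it stays close to the figure quoted above.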


## Benchmarks

Because the weight matrix is entangled as a result of inheriting from a shared set of codebooks, it is important to test the RPQ model variants against the original models to characterize issues/tradeoffs with training stability, especially at scale.  Results will be reported in the table below:
<!-- 93.6 93.4 43.0 57.2 -->
| Model | Config | Model Size | Dataset | Validation Accuracy | Epochs |
| --- | --- | --- | --- | --- | --- |
| ViT | vit_base_patch16_224 | 330MB | MNIST | TBD | 90 |
| RPQViT | vit_base_patch16_224 | 88MB | MNIST | TBD | 90 |
| ViT | vit_base_patch16_224 | 330MB | CIFAR10 | TBD | 90 |
| RPQViT | vit_base_patch16_224 | 88MB | CIFAR10 | TBD | 90 |
| ViT | vit_base_patch16_224 | 330MB | ImageNet | TBD | 90 |
| RPQViT | vit_base_patch16_224 | 88MB | ImageNet | TBD | 90 |


## TODO

- [ ] Implement `RPQConv1d` layer
- [ ] Implement `RPQConv2d` layer
- [ ] Implement `RPQConv3d` layer
- [ ] Implement `RPQConvTranspose1d` layer
- [ ] Implement `RPQConvTranspose2d` layer
- [ ] Implement `RPQConvTranspose3d` layer
- [ ] Implement `RPQBilinear` layer
- [ ] Perform benchmarks with ViTs (ViT vs RPQViT)
- [ ] Perform benchmarks with LLMs (BERT, OPT, etc.)
- [ ] Explore methods of conversion from pre-trained static weights to dynamic RPQ weights





            

Raw data

            {
    "_id": null,
    "home_page": "https://github.com/a-kore/RPQ-pytorch",
    "name": "RPQ-pytorch",
    "maintainer": "",
    "docs_url": null,
    "requires_python": "",
    "maintainer_email": "",
    "keywords": "artificial intelligence,AI,machine learning,deep learning,pytorch,quantization,product quantization,reverse product quantization,memory reduction",
    "author": "Ali Kore",
    "author_email": "akore654@gmail.com",
    "download_url": "https://files.pythonhosted.org/packages/33/b0/0c09dafac1b3053bcb2a63fe06b296ff360e9e357851e3ceecab1e03d620/RPQ-pytorch-0.0.34.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "MIT",
    "summary": "Reverse Product Quantization (RPQ) of weights to reduce static memory usage.",
    "version": "0.0.34",
    "project_urls": {
        "Homepage": "https://github.com/a-kore/RPQ-pytorch"
    },
    "split_keywords": [
        "artificial intelligence",
        "ai",
        "machine learning",
        "deep learning",
        "pytorch",
        "quantization",
        "product quantization",
        "reverse product quantization",
        "memory reduction"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "6e85b6df5ff1817534aef52fded55c11b7dd66256d2bae29c425404e7aaedbb6",
                "md5": "d23f4a1af7546b36f5aaf902ff9c3e74",
                "sha256": "eed345f0f8720ad5ffc6d1172cc0f5a9fb14cd8dfa8f800addc867300442c4be"
            },
            "downloads": -1,
            "filename": "RPQ_pytorch-0.0.34-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "d23f4a1af7546b36f5aaf902ff9c3e74",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": null,
            "size": 34802,
            "upload_time": "2023-09-28T20:46:13",
            "upload_time_iso_8601": "2023-09-28T20:46:13.629567Z",
            "url": "https://files.pythonhosted.org/packages/6e/85/b6df5ff1817534aef52fded55c11b7dd66256d2bae29c425404e7aaedbb6/RPQ_pytorch-0.0.34-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "33b00c09dafac1b3053bcb2a63fe06b296ff360e9e357851e3ceecab1e03d620",
                "md5": "4ce22a15271fa73c4f568df76b70b5fb",
                "sha256": "260defa0703cfed7d1ca2d8156133cbd28ac2f0e166920b56a365781b5248acd"
            },
            "downloads": -1,
            "filename": "RPQ-pytorch-0.0.34.tar.gz",
            "has_sig": false,
            "md5_digest": "4ce22a15271fa73c4f568df76b70b5fb",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": null,
            "size": 31442,
            "upload_time": "2023-09-28T20:46:15",
            "upload_time_iso_8601": "2023-09-28T20:46:15.070164Z",
            "url": "https://files.pythonhosted.org/packages/33/b0/0c09dafac1b3053bcb2a63fe06b296ff360e9e357851e3ceecab1e03d620/RPQ-pytorch-0.0.34.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2023-09-28 20:46:15",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "a-kore",
    "github_project": "RPQ-pytorch",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": true,
    "lcname": "rpq-pytorch"
}
        