# soft-moe 0.0.1

- __Summary__: PyTorch implementation of "From Sparse to Soft Mixtures of Experts"
- __Author__: Ben Conrad
- __Homepage__: https://github.com/bwconrad/soft-moe
- __License__: Apache-2.0
- __Requires Python__: >=3.10
- __Keywords__: transformers, artificial intelligence, computer vision, deep learning
- __Uploaded__: 2023-08-22 19:40:31
            # Soft Mixture of Experts

PyTorch implementation of Soft Mixture of Experts (Soft-MoE) from ["From Sparse to Soft Mixtures of Experts"](https://arxiv.org/abs/2308.00951v1).
This implementation extends the [`timm`](https://github.com/huggingface/pytorch-image-models) library's `VisionTransformer` class to support Soft-MoE MLP layers.


<p align="center">
<img src="https://raw.githubusercontent.com/bwconrad/soft-moe/main/assets/fig.png" width="100%" style="text-align: center;"/>
</p>


## Installation

```
pip install soft-moe
```

Or clone the repository and install its requirements with:

```
git clone https://github.com/bwconrad/soft-moe
cd soft-moe/
pip install -r requirements.txt
```

## Usage

### Initializing a Soft Mixture of Experts Vision Transformer

```python
import torch
from soft_moe import SoftMoEVisionTransformer

net = SoftMoEVisionTransformer(
    num_experts=128,
    slots_per_expert=1,
    moe_layer_index=6, 
    img_size=224,
    patch_size=32,
    num_classes=1000,
    embed_dim=768,
    depth=12,
    num_heads=12,
    mlp_ratio=4,
)

img = torch.randn(1, 3, 224, 224)
preds = net(img)
```

Functions are also available to initialize default network configurations:

```python
from soft_moe import (soft_moe_vit_base, soft_moe_vit_huge,
                      soft_moe_vit_large, soft_moe_vit_small,
                      soft_moe_vit_tiny)

net = soft_moe_vit_tiny()
net = soft_moe_vit_small()
net = soft_moe_vit_base()
net = soft_moe_vit_large()
net = soft_moe_vit_huge()

net = soft_moe_vit_tiny(num_experts=64, slots_per_expert=2, img_size=128)
```

#### Setting the Mixture of Expert Layers

The `moe_layer_index` argument sets which layer indices use MoE MLP layers instead of regular MLP layers.
When an `int` is given, every layer from that depth index onward is an MoE layer.

```python
net = SoftMoEVisionTransformer(
    moe_layer_index=6, # Blocks 6-11
    depth=12,
)
```

When a `list` is given, all specified layers will be MoE layers.

```python
net = SoftMoEVisionTransformer(
    moe_layer_index=[0, 2, 4], # Blocks 0, 2 and 4
    depth=12,
)
```

- __Note__: `moe_layer_index` uses a __0-indexed__ convention; the sketch below relates the two forms.
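
A minimal sketch of the relationship between the two forms, assuming (per the description above) that the `int` form simply expands to every block index from that value up to `depth - 1`:

```python
from soft_moe import SoftMoEVisionTransformer

depth = 12

# int form: MoE MLPs in blocks 6-11 (0-indexed), i.e. indices 6..depth-1
net_a = SoftMoEVisionTransformer(moe_layer_index=6, depth=depth)

# list form spelling the same indices out explicitly
net_b = SoftMoEVisionTransformer(moe_layer_index=list(range(6, depth)), depth=depth)
```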

### Creating a Soft Mixture of Experts Layer 

The `SoftMoELayerWrapper` class can be used to turn any network layer that takes a tensor of shape `[batch, length, dim]` into a Soft Mixture of Experts layer.

```python 
import torch
import torch.nn as nn

from soft_moe import SoftMoELayerWrapper

x = torch.rand(1, 16, 128)

layer = SoftMoELayerWrapper(
    dim=128,
    slots_per_expert=2,
    num_experts=32,
    layer=nn.Linear,
    # nn.Linear arguments
    in_features=128,
    out_features=32,
)
y = layer(x)

layer = SoftMoELayerWrapper(
    dim=128,
    slots_per_expert=1,
    num_experts=16,
    layer=nn.TransformerEncoderLayer,
    # nn.TransformerEncoderLayer arguments
    d_model=128,
    nhead=8,
)
y = layer(x)
```

- __Note__: If the name of a layer argument overlaps with one of the wrapper's own arguments (e.g. `dim`), you can pass a partial function to `layer` instead, as sketched below.
    - e.g. `layer=partial(MyCustomLayer, dim=128)`
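
A minimal sketch of that workaround, using a hypothetical `MyCustomLayer` whose constructor argument `dim` clashes with the wrapper's `dim` (the wrapper's arguments are assumed to be exactly those shown above):

```python
from functools import partial

import torch
import torch.nn as nn

from soft_moe import SoftMoELayerWrapper


class MyCustomLayer(nn.Module):
    """Hypothetical expert layer whose `dim` argument would clash with the wrapper's `dim`."""

    def __init__(self, dim: int):
        super().__init__()
        self.proj = nn.Linear(dim, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.proj(x)


x = torch.rand(1, 16, 128)

layer = SoftMoELayerWrapper(
    dim=128,
    slots_per_expert=1,
    num_experts=16,
    # Bind the conflicting `dim` argument ahead of time with functools.partial,
    # so no extra layer kwargs need to be forwarded by the wrapper.
    layer=partial(MyCustomLayer, dim=128),
)
y = layer(x)
```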

## Citation
```bibtex
@article{puigcerver2023sparse,
  title={From Sparse to Soft Mixtures of Experts},
  author={Puigcerver, Joan and Riquelme, Carlos and Mustafa, Basil and Houlsby, Neil},
  journal={arXiv preprint arXiv:2308.00951},
  year={2023}
}
```
