[![Multi-Modality](agorabanner.png)](https://discord.gg/qUtxnK2NMf)
# Multi-Modal Pathway Transformer
![Diagram](diagram.png)
Implementation of M2PT in PyTorch from the paper "Multimodal Pathway: Improve Transformers with Irrelevant Data from Other Modalities" ([paper link](https://arxiv.org/abs/2401.14405)). The striking result is that simply merging the linear projections of models trained on other modalities into your base model can improve its performance. It is a small but effective technique that can be plugged into almost any Transformer with minimal changes.
## Install
`pip3 install -U m2pt`
## Usage
### `M2PT`
A ready-to-train implementation of the M2PT model into which you can plug the linear layers of any multi-modal model. It takes tokenized text (integer token IDs), embeds it, passes it through the stack of transformer blocks, and finally applies an output projection followed by a softmax.
```python
import torch
from torch import nn
from m2pt.main import M2PT
# Create an instance of the M2PT model class with the specified parameters
model = M2PT(
    dim=512,  # Dimension of the input and output tensors
    num_tokens=10000,  # Size of the token vocabulary
    depth=6,  # Number of transformer blocks
    dim_head=64,  # Dimension of each attention head
    heads=8,  # Number of attention heads
    dropout=0.1,  # Dropout rate
    ff_mult=4,  # Multiplier for the dimension of the feed-forward network
    original_linear=nn.Linear(512, 512),  # Linear layer for the original modality
    auxiliar_linear=nn.Linear(512, 512),  # Linear layer for the auxiliary modality
    ffn_original_linear=nn.Linear,  # Linear class for the original modality in the feed-forward network
    ffn_auxiliar_linear=nn.Linear,  # Linear class for the auxiliary modality in the feed-forward network
    ffn_original_last_linear=nn.Linear,  # Last linear class for the original modality in the feed-forward network
    ffn_aux_last_linear=nn.Linear,  # Last linear class for the auxiliary modality in the feed-forward network
)

# Create a 2D tensor of token IDs with shape B x S
x = torch.randint(0, 10000, (1, 512))
# Pass the input tensor through the model
out = model(x)
# Print the shape of the output tensor
print(out.shape)
```
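
Since the model is meant to be trained directly, here is a minimal, hypothetical next-token training step. It assumes the model's output is the post-softmax distribution over `num_tokens` described above; the optimizer, loss, and placeholder data are illustrative and not part of the library:

```python
import torch
from torch import nn
from m2pt.main import M2PT

model = M2PT(
    dim=512,
    num_tokens=10000,
    depth=6,
    dim_head=64,
    heads=8,
    dropout=0.1,
    ff_mult=4,
    original_linear=nn.Linear(512, 512),
    auxiliar_linear=nn.Linear(512, 512),
    ffn_original_linear=nn.Linear,
    ffn_auxiliar_linear=nn.Linear,
    ffn_original_last_linear=nn.Linear,
    ffn_aux_last_linear=nn.Linear,
)

optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4)
criterion = nn.NLLLoss()  # NLL on log-probabilities, since the model already applies a softmax

tokens = torch.randint(0, 10000, (2, 512))   # random token IDs (placeholder data)
targets = torch.randint(0, 10000, (2, 512))  # next-token targets (placeholder data)

probs = model(tokens)  # assumed shape: (batch, seq_len, num_tokens)
# NLLLoss expects (batch, classes, seq_len) log-probabilities
loss = criterion(probs.clamp_min(1e-9).log().transpose(1, 2), targets)
loss.backward()
optimizer.step()
optimizer.zero_grad()
```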
### `MPTransformerBlock`
- Implementation of the Multimodal Pathway Transformer block from Figure 2, with a cross-modal FFN; plug in and play your own FFN linears
- Reusable and modular
- Combines linear projections from multiple models
```python
import torch
from torch import nn
from m2pt import MPTransformerBlock
# Create an instance of the MPTransformerBlock class with the specified parameters
model = MPTransformerBlock(
    dim=512,  # Dimension of the input and output tensors
    dim_head=64,  # Dimension of each attention head
    heads=8,  # Number of attention heads
    dropout=0.1,  # Dropout rate
    ff_mult=4,  # Multiplier for the dimension of the feed-forward network
    original_linear=nn.Linear(512, 512),  # Linear layer for the original modality
    auxiliar_linear=nn.Linear(512, 512),  # Linear layer for the auxiliary modality
    ffn_original_linear=nn.Linear,  # Linear class for the original modality in the feed-forward network
    ffn_auxiliar_linear=nn.Linear,  # Linear class for the auxiliary modality in the feed-forward network
    ffn_original_last_linear=nn.Linear,  # Last linear class for the original modality in the feed-forward network
    ffn_aux_last_linear=nn.Linear,  # Last linear class for the auxiliary modality in the feed-forward network
)
# Create a 3D tensor with shape B x S x D
x = torch.randn(1, 512, 512)
# Pass the input tensor through the model
out = model(x)
# Print the shape of the output tensor
print(out.shape)
```
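
Because `original_linear` and `auxiliar_linear` are plain `nn.Linear` instances, a natural use is to hand the block projection layers lifted from a pretrained model. A hedged sketch follows; the Hugging Face module path and the assumption that any shape-matching `nn.Linear` is accepted are ours, not guarantees from the library:

```python
import torch
from torch import nn
from transformers import ViTModel
from m2pt import MPTransformerBlock

# Borrow a 768x768 attention projection from a pretrained ViT as the
# auxiliary linear; the original linear is freshly initialized.
vit = ViTModel.from_pretrained("google/vit-base-patch16-224")
aux_proj = vit.encoder.layer[0].attention.attention.query  # nn.Linear(768, 768)

block = MPTransformerBlock(
    dim=768,  # match the ViT hidden size
    dim_head=64,
    heads=12,
    dropout=0.1,
    ff_mult=4,
    original_linear=nn.Linear(768, 768),
    auxiliar_linear=aux_proj,
    ffn_original_linear=nn.Linear,
    ffn_auxiliar_linear=nn.Linear,
    ffn_original_last_linear=nn.Linear,
    ffn_aux_last_linear=nn.Linear,
)

x = torch.randn(1, 196, 768)  # B x S x D
print(block(x).shape)
```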
### `CrossModalReparameterization`
- Implementation of Cross-Modal Re-parameterization from Figure 2 and Section 3.2 of the paper
- It merges the linear layers of models from different modalities through addition, with the auxiliary weight scaled by a scalar λ called the Cross-Modal Scale (a minimal sketch follows this list)
- Modular and reusable: simply plug in linear layers from any models!
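
Conceptually, the re-parameterized layer computes `y = x(W + λW')`, where `W` is the target model's weight and `W'` the auxiliary model's; after training, `λW'` is folded into `W` so inference runs at the cost of a single ordinary linear layer. Below is a minimal, hypothetical sketch of that idea (class and parameter names are ours, and λ is fixed as a plain scalar here; this is not the library's exact implementation):

```python
import torch
from torch import nn
import torch.nn.functional as F

class CrossModalLinearSketch(nn.Module):
    """Sketch of cross-modal re-parameterization: y = x (W + lambda * W_aux)^T."""

    def __init__(self, original: nn.Linear, auxiliary: nn.Linear, cross_modal_scale: float = 0.5):
        super().__init__()
        assert original.weight.shape == auxiliary.weight.shape
        self.original = original
        self.auxiliary = auxiliary
        self.cross_modal_scale = cross_modal_scale  # the lambda from the paper

    def forward(self, x):
        # The effective weight is the original plus the scaled auxiliary weight
        weight = self.original.weight + self.cross_modal_scale * self.auxiliary.weight
        bias = self.original.bias
        if bias is not None and self.auxiliary.bias is not None:
            bias = bias + self.cross_modal_scale * self.auxiliary.bias
        return F.linear(x, weight, bias)

    @torch.no_grad()
    def merge_parameters(self):
        # Fold the auxiliary weights into the original layer so inference
        # uses a single plain linear layer at zero extra cost
        self.original.weight += self.cross_modal_scale * self.auxiliary.weight
        if self.original.bias is not None and self.auxiliary.bias is not None:
            self.original.bias += self.cross_modal_scale * self.auxiliary.bias
```

The library's `CrossModalReparameterization` is used the same way end to end, as in the example below.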
```python
import torch
import torch.nn as nn
from transformers import BertModel, ViTModel
from m2pt import CrossModalReparameterization
# Define a simple Transformer model for text
class TextTransformerModel(nn.Module):
    def __init__(self, bert_model_name='bert-base-uncased'):
        super().__init__()
        self.bert = BertModel.from_pretrained(bert_model_name)

        # Assume we're reparameterizing the first linear layer of the classifier
        self.classifier = nn.Linear(self.bert.config.hidden_size, 2)

    def forward(self, input_ids, attention_mask):
        outputs = self.bert(input_ids=input_ids, attention_mask=attention_mask)
        pooled_output = outputs.pooler_output
        logits = self.classifier(pooled_output)
        return logits

# Define a simple Transformer model for images (using ViT as an example)
class ImageTransformerModel(nn.Module):
    def __init__(self, vit_model_name='google/vit-base-patch16-224'):
        super().__init__()
        self.vit = ViTModel.from_pretrained(vit_model_name)

        # Assume we're using the first linear layer of the classifier as the auxiliary layer
        self.classifier = nn.Linear(self.vit.config.hidden_size, 2)

    def forward(self, pixel_values):
        outputs = self.vit(pixel_values=pixel_values)
        pooled_output = outputs.pooler_output
        logits = self.classifier(pooled_output)
        return logits
# Example usage
# Initialize both models
text_model = TextTransformerModel()
image_model = ImageTransformerModel()
# Assume we want to reparameterize the classifier layer of the text model
# using the classifier layer of the image model
cross_modal_layer = CrossModalReparameterization(text_model.classifier, image_model.classifier)
# Replace the classifier in the text model with the cross-modal layer
text_model.classifier = cross_modal_layer
# Example input (batch_size, sequence_length)
input_ids = torch.randint(0, 1000, (8, 512))
attention_mask = torch.ones(8, 512)
# Forward pass through the reparameterized model
logits = text_model(input_ids, attention_mask)
print(logits)
# Train the text model as usual...
# After training, merge the parameters for inference
text_model.classifier.merge_parameters()
```
## Citation
```bibtex
@misc{zhang2024multimodal,
    title={Multimodal Pathway: Improve Transformers with Irrelevant Data from Other Modalities},
    author={Yiyuan Zhang and Xiaohan Ding and Kaixiong Gong and Yixiao Ge and Ying Shan and Xiangyu Yue},
    year={2024},
    eprint={2401.14405},
    archivePrefix={arXiv},
    primaryClass={cs.CV}
}
```
## License
MIT