linmult

Name: linmult
Version: 1.2.0
Summary: General-purpose Multimodal Transformer with Linear Complexity Attention Mechanism.
Upload time: 2023-10-10 16:51:51
Requires Python: >=3.10
Keywords: linear-complexity attention, multimodal, transformer
Requirements: none recorded
# LinMulT
[![License](https://img.shields.io/badge/license-MIT-yellow.svg)](LICENSE)
[![python](https://img.shields.io/badge/Python-3.10-3776AB.svg?style=flat&logo=python&logoColor=white)](https://www.python.org)
[![pytorch](https://img.shields.io/badge/PyTorch-2.0.1-EE4C2C.svg?style=flat&logo=pytorch)](https://pytorch.org)

General-purpose Multimodal Transformer with Linear Complexity Attention Mechanism.

# Setup
### Install package from PyPI
```
pip install linmult
```
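After installation, a quick sanity check is to import the main classes (these are the class names used in the Quick start examples below):
```
# quick check that the package and its main classes are importable
from linmult import LinT, LinMulT
print("linmult imported successfully")
```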

### Install package from repository root
```
git clone https://github.com/fodorad/LinMulT
cd LinMulT
pip install -e .
pip install -U -r requirements.txt
python -m unittest
```

# Quick start
### Example 1:
A simple transformer encoder with linear attention.
The forward pass takes a single input sequence.
```
import torch
from linmult import LinT

# input shape: (batch_size, time_dimension, feature_dimension)
x = torch.rand((32, 15, 1024), device='cuda')
model = LinT(input_modality_channels=1024, output_dim=5).cuda()
y_pred_seq = model(x)

# output shape: (batch_size, time_dimension, output_dimension)
assert y_pred_seq.size() == torch.Size([32, 15, 5])
```

### Example 2:
A multimodal transformer with linear attention.
The forward pass takes two input sequences that share the same time dimension.
```
import torch
from linmult import LinMulT

# input shape: (batch_size, time_dimension, feature_dimension)
x_1 = torch.rand((32, 15, 1024), device='cuda')
x_2 = torch.rand((32, 15, 160), device='cuda')
model = LinMulT(input_modality_channels=[1024, 160], output_dim=5).cuda()
y_pred_cls, y_pred_seq = model([x_1, x_2])

# 1. output shape: (batch_size, output_dimension)
assert y_pred_cls.size() == torch.Size([32, 5])

# 2. output shape: (batch_size, time_dimension, output_dimension)
assert y_pred_seq.size() == torch.Size([32, 15, 5])
```

### Example 3:
A multimodal transformer with linear attention. The forward pass takes three input sequences with different time dimensions.
```
import torch
from linmult import LinMulT

# input shape: (batch_size, time_dimension, feature_dimension)
x_1 = torch.rand((16, 1500, 25), device='cuda')
x_2 = torch.rand((16, 450, 35), device='cuda')
x_3 = torch.rand((16, 120, 768), device='cuda')
model = LinMulT(input_modality_channels=[25, 35, 768],
                output_dim=5,
                add_time_collapse=True,
                add_self_attention_fusion=False).cuda()
y_pred_cls = model([x_1, x_2, x_3])

# output shape: (batch_size, output_dimension)
assert y_pred_cls.size() == torch.Size([16, 5])
```
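The examples above assume a CUDA device. On a CPU-only machine, the same forward pass should work by simply omitting the `device='cuda'` and `.cuda()` calls; a minimal sketch based on Example 1:
```
import torch
from linmult import LinT

# same as Example 1, but on CPU: no device='cuda' and no .cuda()
x = torch.rand((32, 15, 1024))
model = LinT(input_modality_channels=1024, output_dim=5)
y_pred_seq = model(x)

# output shape: (batch_size, time_dimension, output_dimension)
assert y_pred_seq.size() == torch.Size([32, 15, 5])
```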

# Similar projects using LinMulT

### (2023) BlinkLinMulT
LinMulT is trained for blink presence detection and eye state recognition.
Our results demonstrate performance comparable or superior to state-of-the-art models on both tasks across 7 public benchmark databases.
* paper: BlinkLinMulT: Transformer-based Eye Blink Detection (accepted, available soon)
* code: https://github.com/fodorad/BlinkLinMulT

### (2022) PersonalityLinMulT
LinMulT is trained for Big Five personality trait estimation using the First Impressions V2 dataset and sentiment estimation using the MOSI and MOSEI datasets.
* paper: Multimodal Sentiment and Personality Perception Under Speech: A Comparison of Transformer-based Architectures ([pdf](https://proceedings.mlr.press/v173/fodor22a/fodor22a.pdf), [website](https://proceedings.mlr.press/v173/fodor22a.html))
* code: https://github.com/fodorad/PersonalityLinMulT


# Citation - BibTex
If you find our research helpful or influential, please consider citing:

### (2023) LinMulT for blink presence detection and eye state recognition:
```
@article{blinklinmult-fodor23,
  title = {BlinkLinMulT: Transformer-based Eye Blink Detection},
  author = {Fodor, {\'A}d{\'a}m and Fenech, Kristian and L{\H{o}}rincz, Andr{\'a}s},
  journal = {...},
  pages = {1--19},
  year = {2023}
}
```

### (2022) LinMulT for personality trait and sentiment estimation:
```
@InProceedings{pmlr-v173-fodor22a,
  title = {Multimodal Sentiment and Personality Perception Under Speech: A Comparison of Transformer-based Architectures},
  author = {Fodor, {\'A}d{\'a}m and Saboundji, Rachid R. and Jacques Junior, Julio C. S. and Escalera, Sergio and Gallardo-Pujol, David and L{\H{o}}rincz, Andr{\'a}s},
  booktitle = {Understanding Social Behavior in Dyadic and Small Group Interactions},
  pages = {218--241},
  year = {2022},
  editor = {Palmero, Cristina and Jacques Junior, Julio C. S. and Clapés, Albert and Guyon, Isabelle and Tu, Wei-Wei and Moeslund, Thomas B. and Escalera, Sergio},
  volume = {173},
  series = {Proceedings of Machine Learning Research},
  month = {16 Oct},
  publisher = {PMLR},
  pdf = {https://proceedings.mlr.press/v173/fodor22a/fodor22a.pdf},
  url = {https://proceedings.mlr.press/v173/fodor22a.html}
}
```

# Acknowledgement
The code is inspired by the following two works:

### Multimodal Transformer:
* paper: Multimodal Transformer for Unaligned Multimodal Language Sequences ([1906.00295](https://arxiv.org/pdf/1906.00295.pdf))
* code: https://github.com/yaohungt/Multimodal-Transformer

### Linear Attention:
* paper: Transformers are RNNs: Fast Autoregressive Transformers with Linear Attention ([2006.16236](https://arxiv.org/pdf/2006.16236.pdf))
* code: https://github.com/idiap/fast-transformers
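
For reference, the linear attention of Katharopoulos et al. replaces the softmax with a kernel feature map φ (they use φ(x) = elu(x) + 1), so the key–value terms can be summed once and reused for every query, reducing the cost from quadratic to linear in sequence length. A sketch of the formulation from the cited paper:
```
% linear attention (Katharopoulos et al., 2020), feature map \phi(x) = \mathrm{elu}(x) + 1
V'_i = \frac{\phi(Q_i)^{T} \sum_{j} \phi(K_j) V_j^{T}}{\phi(Q_i)^{T} \sum_{j} \phi(K_j)}
```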

# Contact
* Ádám Fodor (foauaai@inf.elte.hu)
            
