# LinMulT
[![License](https://img.shields.io/badge/license-MIT-yellow.svg)](LICENSE)
[![python](https://img.shields.io/badge/Python-3.10-3776AB.svg?style=flat&logo=python&logoColor=white)](https://www.python.org)
[![pytorch](https://img.shields.io/badge/PyTorch-2.0.1-EE4C2C.svg?style=flat&logo=pytorch)](https://pytorch.org)
General-purpose Multimodal Transformer with Linear Complexity Attention Mechanism.
# Setup
### Install package from PyPi
```bash
pip install linmult
```
### Install package from repository root
```bash
git clone https://github.com/fodorad/LinMulT
cd LinMulT
pip install -e .
pip install -U -r requirements.txt
python -m unittest
```
# Quick start
### Example 1:
A simple transformer encoder with linear attention. The forward pass takes a single input sequence.
```python
import torch
from linmult import LinT
# input shape: (batch_size, time_dimension, feature_dimension)
x = torch.rand((32, 15, 1024), device='cuda')
model = LinT(input_modality_channels=1024, output_dim=5).cuda()
y_pred_seq = model(x)
# output shape: (batch_size, time_dimension, output_dimension)
assert y_pred_seq.size() == torch.Size([32, 15, 5])
```
### Example 2:
Multimodal Transformer with Linear Attention.
The forward pass takes two input sequences that share the same time dimension.
```python
import torch
from linmult import LinMulT
# input shape: (batch_size, time_dimension, feature_dimension)
x_1 = torch.rand((32, 15, 1024), device='cuda')
x_2 = torch.rand((32, 15, 160), device='cuda')
model = LinMulT(input_modality_channels=[1024, 160], output_dim=5).cuda()
y_pred_cls, y_pred_seq = model([x_1, x_2])
# 1. output shape: (batch_size, output_dimension)
assert y_pred_cls.size() == torch.Size([32, 5])
# 2. output shape: (batch_size, time_dimension, output_dimension)
assert y_pred_seq.size() == torch.Size([32, 15, 5])
```
### Example 3:
Multimodal Transformer with Linear Attention. The forward pass takes three input sequences with different time dimensions.
```python
import torch
from linmult import LinMulT
# input shape: (batch_size, time_dimension, feature_dimension)
x_1 = torch.rand((16, 1500, 25), device='cuda')
x_2 = torch.rand((16, 450, 35), device='cuda')
x_3 = torch.rand((16, 120, 768), device='cuda')
model = LinMulT(input_modality_channels=[25, 35, 768],
                output_dim=5,
                add_time_collapse=True,
                add_self_attention_fusion=False).cuda()
y_pred_cls = model([x_1, x_2, x_3])
# output shape: (batch_size, output_dimension)
assert y_pred_cls.size() == torch.Size([16, 5])
```
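With `add_time_collapse=True`, the model returns a single vector per sample instead of a sequence, which is why the inputs above may have mismatched time dimensions. The sketch below illustrates the general idea of such a temporal collapse step using simple mean pooling (an assumption for illustration only; the library's actual collapse operation may differ):

```python
import numpy as np

def collapse_time(x: np.ndarray) -> np.ndarray:
    """Pool a (batch, time, features) sequence into (batch, features)
    by averaging over the time dimension (illustrative mean pooling)."""
    return x.mean(axis=1)

# Sequences with different time dimensions collapse to the same rank-2 layout,
# so the per-modality vectors can then be fused into one prediction.
a = collapse_time(np.random.rand(16, 1500, 25))   # shape (16, 25)
b = collapse_time(np.random.rand(16, 450, 35))    # shape (16, 35)
```

After collapsing, the time dimension is gone, so only the aggregated prediction `y_pred_cls` is returned in this configuration.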
# Similar projects using LinMulT
### (2023) BlinkLinMulT
LinMulT is trained for blink presence detection and eye state recognition tasks.
Our results demonstrate performance comparable or superior to state-of-the-art models on 2 tasks, using 7 public benchmark databases.
* paper: BlinkLinMulT: Transformer-based Eye Blink Detection (accepted, available soon)
* code: https://github.com/fodorad/BlinkLinMulT
### (2022) PersonalityLinMulT
LinMulT is trained for Big Five personality trait estimation using the First Impressions V2 dataset and sentiment estimation using the MOSI and MOSEI datasets.
* paper: Multimodal Sentiment and Personality Perception Under Speech: A Comparison of Transformer-based Architectures ([pdf](https://proceedings.mlr.press/v173/fodor22a/fodor22a.pdf), [website](https://proceedings.mlr.press/v173/fodor22a.html))
* code: https://github.com/fodorad/PersonalityLinMulT
# Citation - BibTex
If you found our research helpful or influential, please consider citing:
### (2023) LinMulT for blink presence detection and eye state recognition:
```bibtex
@article{blinklinmult-fodor23,
  title   = {BlinkLinMulT: Transformer-based Eye Blink Detection},
  author  = {Fodor, {\'A}d{\'a}m and Fenech, Kristian and L{\H{o}}rincz, Andr{\'a}s},
  journal = {...},
  pages   = {1--19},
  year    = {2023}
}
```
### (2022) LinMulT for personality trait and sentiment estimation:
```bibtex
@InProceedings{pmlr-v173-fodor22a,
  title     = {Multimodal Sentiment and Personality Perception Under Speech: A Comparison of Transformer-based Architectures},
  author    = {Fodor, {\'A}d{\'a}m and Saboundji, Rachid R. and Jacques Junior, Julio C. S. and Escalera, Sergio and Gallardo-Pujol, David and L{\H{o}}rincz, Andr{\'a}s},
  booktitle = {Understanding Social Behavior in Dyadic and Small Group Interactions},
  pages     = {218--241},
  year      = {2022},
  editor    = {Palmero, Cristina and Jacques Junior, Julio C. S. and Clapés, Albert and Guyon, Isabelle and Tu, Wei-Wei and Moeslund, Thomas B. and Escalera, Sergio},
  volume    = {173},
  series    = {Proceedings of Machine Learning Research},
  month     = {16 Oct},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v173/fodor22a/fodor22a.pdf},
  url       = {https://proceedings.mlr.press/v173/fodor22a.html}
}
```
# Acknowledgement
The code is inspired by the following two works:
### Multimodal Transformer:
* paper: Multimodal Transformer for Unaligned Multimodal Language Sequences ([1906.00295](https://arxiv.org/pdf/1906.00295.pdf))
* code: https://github.com/yaohungt/Multimodal-Transformer
### Linear Attention:
* paper: Transformers are RNNs: Fast Autoregressive Transformers with Linear Attention ([2006.16236](https://arxiv.org/pdf/2006.16236.pdf))
* code: https://github.com/idiap/fast-transformers
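
The core trick from the linear attention paper above can be sketched in a few lines: replacing the softmax with a kernel feature map φ (here `elu(x) + 1`, as in the paper) lets the key–value product be computed once, reducing the cost from O(N²) to O(N) in sequence length. This is an illustrative NumPy sketch, not the library's implementation:

```python
import numpy as np

def elu_feature_map(x):
    """phi(x) = elu(x) + 1, the positive feature map from the paper."""
    return np.where(x > 0, x + 1.0, np.exp(x))

def linear_attention(Q, K, V, eps=1e-6):
    """Linear-complexity attention: phi(Q) @ (phi(K)^T V), normalized.
    Q, K: (N, d); V: (N, d_v). Cost is O(N * d * d_v) instead of O(N^2)."""
    Qp, Kp = elu_feature_map(Q), elu_feature_map(K)
    kv = Kp.T @ V                      # (d, d_v), computed once for all queries
    z = Qp @ Kp.sum(axis=0) + eps      # (N,), per-query normalizer
    return (Qp @ kv) / z[:, None]      # (N, d_v)

N, d = 128, 16
out = linear_attention(np.random.randn(N, d),
                       np.random.randn(N, d),
                       np.random.randn(N, d))
assert out.shape == (N, d)
```

Because φ is strictly positive, each output row is a convex combination of the value rows, mirroring what softmax attention computes while never materializing the N×N attention matrix.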
# Contact
* Ádám Fodor (foauaai@inf.elte.hu)