# BlinkLinMulT
[![License](https://img.shields.io/badge/license-MIT-yellow.svg)](LICENSE)
[![python](https://img.shields.io/badge/Python-3.11-3776AB.svg?style=flat&logo=python&logoColor=white)](https://www.python.org)
[![pytorch](https://img.shields.io/badge/PyTorch-2.0.1-EE4C2C.svg?style=flat&logo=pytorch)](https://pytorch.org)
BlinkLinMulT is trained for blink presence detection and eye state recognition tasks.
Our results demonstrate performance comparable or superior to state-of-the-art models on both tasks across 7 public benchmark databases.
* paper: **BlinkLinMulT: Transformer-based Eye Blink Detection** ([website](https://www.mdpi.com/2313-433X/9/10/196))
* code: https://github.com/fodorad/BlinkLinMulT
# Setup
### Install package from PyPI for inference
```
pip install blinklinmult
```
### Install package for training
```
git clone https://github.com/fodorad/BlinkLinMulT
cd BlinkLinMulT
pip install -e .[all]
pip install -U -r requirements.txt
python -m unittest discover -s test
```
#### Supported extras:
| extras tag | description |
| --- | --- |
| train | dependencies for training the model from scratch |
| all | extends the `train` dependencies with development extras, e.g. CLIP models |
# Quick start
### Load models from the paper with pre-trained weights
The pre-trained weights are loaded by default.
```
from blinklinmult.models import DenseNet121, BlinkLinT, BlinkLinMulT
model1 = DenseNet121()
model2 = BlinkLinT()
model3 = BlinkLinMulT()
```
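Since these models are intended for inference, it is worth switching them to evaluation mode and disabling gradient tracking. A minimal sketch using standard PyTorch calls (CPU here for simplicity):
```
import torch
from blinklinmult.models import DenseNet121

# eval mode disables dropout and uses the running batch-norm statistics
model = DenseNet121().eval()

with torch.inference_mode():  # no gradient tracking: faster and lower memory
    x = torch.rand((1, 3, 64, 64))
    y = model(x)
```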
The following sections give more detailed examples with dummy data: a forward pass is performed, and the expected input and output shapes are noted.
### Inference with dummy data
Pre-trained DenseNet121 for eye state recognition.
The forward pass is performed on a batch of input images.
```
import torch
from blinklinmult.models import DenseNet121
# input shape: (batch_size, channels, height, width)
x = torch.rand((32, 3, 64, 64), device='cuda')
model = DenseNet121(output_dim=1, weights='densenet121-union').cuda()
y_pred = model(x)
# output shape: (batch_size, output_dimension)
assert y_pred.size() == torch.Size([32, 1])
```
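The single-unit head suggests a binary eye-state score. Assuming the model returns raw logits rather than probabilities (an assumption; check the model definition before relying on it), a probability and a hard label can be derived with a sigmoid and a threshold:
```
import torch

# assumption: y_pred from the example above holds raw logits
probs = torch.sigmoid(y_pred)   # map logits to [0, 1] probabilities
labels = (probs > 0.5).long()   # threshold at 0.5 for hard 0/1 eye-state labels
```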
Pre-trained BlinkLinT for blink presence detection and eye state recognition.
The forward pass is performed on a sequence of images.
```
import torch
from blinklinmult.models import BlinkLinT
# input shape: (batch_size, time_dimension, channels, height, width)
x = torch.rand((8, 15, 3, 64, 64), device='cuda')
model = BlinkLinT(output_dim=1, weights='blinklint-union').cuda()
y_seq = model(x)
# output shape: (batch_size, time_dimension, output_dimension)
assert y_seq.size() == torch.Size([8, 15, 1])
```
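Longer recordings may not fit the 15-frame window used above. A hedged sketch for scoring a full video by splitting it into fixed-length chunks (the window length of 15 mirrors the example and is an assumption, not a documented requirement):
```
import torch
from blinklinmult.models import BlinkLinT

model = BlinkLinT(output_dim=1, weights='blinklint-union').cuda().eval()

# 150 frames of a single video, split into ten non-overlapping 15-frame windows
video = torch.rand((1, 150, 3, 64, 64), device='cuda')

with torch.inference_mode():
    y_seq = torch.cat([model(w) for w in video.split(15, dim=1)], dim=1)

# one score per frame
assert y_seq.size() == torch.Size([1, 150, 1])
```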
Pre-trained BlinkLinMulT for blink presence detection and eye state recognition.
The forward pass is performed on a sequence of images and a sequence of high-level features.
```
import torch
from blinklinmult.models import BlinkLinMulT
# inputs with shapes: [(batch_size, time_dimension, channels, height, width), (batch_size, time_dimension, feature_dimension)]
x1 = torch.rand((8, 15, 3, 64, 64), device='cuda')
x2 = torch.rand((8, 15, 160), device='cuda')
model = BlinkLinMulT(input_dim=160, output_dim=1, weights='blinklinmult-union').cuda()
y_cls, y_seq = model([x1, x2])
# output shapes: y_seq is (batch_size, time_dimension, output_dimension), y_cls is (batch_size, output_dimension)
assert y_seq.size() == torch.Size([8, 15, 1])
assert y_cls.size() == torch.Size([8, 1])
```
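The snippets above assume a CUDA device. A device-agnostic variant that falls back to CPU when no GPU is available (same dummy shapes as above):
```
import torch
from blinklinmult.models import BlinkLinMulT

# pick CUDA when available, otherwise run on CPU
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

model = BlinkLinMulT(input_dim=160, output_dim=1, weights='blinklinmult-union').to(device).eval()
x1 = torch.rand((8, 15, 3, 64, 64), device=device)  # frame sequence
x2 = torch.rand((8, 15, 160), device=device)        # high-level feature sequence

with torch.inference_mode():
    y_cls, y_seq = model([x1, x2])
```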
# Related projects
### exordium
Collection of preprocessing functions and deep learning methods. This repository contains revised code for fine landmark detection (including face, eye region, iris and pupil landmarks), head pose estimation, and eye feature calculation.
* code: https://github.com/fodorad/exordium
### (2022) LinMulT
General-purpose Multimodal Transformer with Linear Complexity Attention Mechanism. This base model is further modified and trained for various tasks and datasets.
* code: https://github.com/fodorad/LinMulT
### (2022) PersonalityLinMulT for personality trait and sentiment estimation
LinMulT is trained for Big Five personality trait estimation using the First Impressions V2 dataset and sentiment estimation using the MOSI and MOSEI datasets.
* paper: Multimodal Sentiment and Personality Perception Under Speech: A Comparison of Transformer-based Architectures ([pdf](https://proceedings.mlr.press/v173/fodor22a/fodor22a.pdf), [website](https://proceedings.mlr.press/v173/fodor22a.html))
* code: https://github.com/fodorad/PersonalityLinMulT (soon)
# Citation - BibTex
If you found our research helpful or influential, please consider citing:
### (2023) BlinkLinMulT for blink presence detection and eye state recognition
```
@Article{fodor2023blinklinmult,
title = {BlinkLinMulT: Transformer-Based Eye Blink Detection},
author = {Fodor, Ádám and Fenech, Kristian and Lőrincz, András},
journal = {Journal of Imaging},
volume = {9},
year = {2023},
number = {10},
article-number = {196},
url = {https://www.mdpi.com/2313-433X/9/10/196},
PubMedID = {37888303},
ISSN = {2313-433X},
DOI = {10.3390/jimaging9100196}
}
```
### (2022) LinMulT for personality trait and sentiment estimation
```
@InProceedings{pmlr-v173-fodor22a,
title = {Multimodal Sentiment and Personality Perception Under Speech: A Comparison of Transformer-based Architectures},
author = {Fodor, {\'A}d{\'a}m and Saboundji, Rachid R. and Jacques Junior, Julio C. S. and Escalera, Sergio and Gallardo-Pujol, David and L{\H{o}}rincz, Andr{\'a}s},
booktitle = {Understanding Social Behavior in Dyadic and Small Group Interactions},
pages = {218--241},
year = {2022},
editor = {Palmero, Cristina and Jacques Junior, Julio C. S. and Clapés, Albert and Guyon, Isabelle and Tu, Wei-Wei and Moeslund, Thomas B. and Escalera, Sergio},
volume = {173},
series = {Proceedings of Machine Learning Research},
month = {16 Oct},
publisher = {PMLR},
pdf = {https://proceedings.mlr.press/v173/fodor22a/fodor22a.pdf},
url = {https://proceedings.mlr.press/v173/fodor22a.html}
}
```
# What's next
* Host preprocessed data for easier reproduction and benchmarking
* Add training and test scripts for various databases
# Updates
* 1.0.0: Release version. Inference only with pre-trained models.
# Contact
* Ádám Fodor (foauaai@inf.elte.hu)