[![Multi-Modality](agorabanner.png)](https://discord.gg/qUtxnK2NMf)
# Screen AI
Implementation of the ScreenAI model from the paper ["ScreenAI: A Vision-Language Model for UI and Infographics Understanding"](https://arxiv.org/abs/2402.04615). The flow is:
`img + text -> patch sizes -> vit -> embed + concat -> attn + ffn -> cross attn + ffn + self attn -> to out`
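
The first step of the flow, splitting the image into patches, determines how many tokens the ViT receives. The sketch below works through that arithmetic for the values used in the usage example (`image_size=224`, `patch_size=16`). It assumes standard ViT-style non-overlapping square patches on a 3-channel image; `patch_grid` is a hypothetical helper written for illustration and is not part of the `screenai` package, which may patchify differently.

```python
from typing import Tuple


def patch_grid(image_size: int, patch_size: int, channels: int = 3) -> Tuple[int, int]:
    """Return (number of patches, flattened size of one patch).

    Assumes a square image cut into non-overlapping square patches.
    """
    if image_size % patch_size != 0:
        raise ValueError("image_size must be divisible by patch_size")
    per_side = image_size // patch_size  # patches along one edge
    num_patches = per_side * per_side  # total patches in the grid
    patch_dim = channels * patch_size * patch_size  # values in one flattened patch
    return num_patches, patch_dim


num_patches, patch_dim = patch_grid(224, 16)
print(num_patches, patch_dim)  # 196 768
```

Under these assumptions a 224x224 image becomes a sequence of 196 patches, each of which would then be projected to the model dimension (`dim`) before the attention layers.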
## Install
`pip3 install screenai`
## Usage
```python
import torch
from screenai.main import ScreenAI

# Create a tensor for the image: (batch, channels, height, width)
image = torch.rand(1, 3, 224, 224)

# Create a tensor for the text
text = torch.randn(1, 1, 512)

# Create an instance of the ScreenAI model with specified parameters
model = ScreenAI(
    patch_size=16,
    image_size=224,
    dim=512,
    depth=6,
    heads=8,
    vit_depth=4,
    multi_modal_encoder_depth=4,
    llm_decoder_depth=4,
    mm_encoder_ff_mult=4,
)

# Perform forward pass of the model with the given text and image tensors
out = model(text, image)

# Print the shape of the output tensor
print(out.shape)
```
## License
MIT
## Citation
```bibtex
@misc{baechler2024screenai,
title={ScreenAI: A Vision-Language Model for UI and Infographics Understanding},
author={Gilles Baechler and Srinivas Sunkara and Maria Wang and Fedir Zubach and Hassan Mansoor and Vincent Etter and Victor Cărbune and Jason Lin and Jindong Chen and Abhanshu Sharma},
year={2024},
eprint={2402.04615},
archivePrefix={arXiv},
primaryClass={cs.CV}
}
```
## Todo
- [ ] Implement the `nn.ModuleList([])` in the encoder and decoder
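
For context on the item above, here is a minimal sketch of how a stack of layers is commonly held in an `nn.ModuleList` so that every layer's parameters are registered with the parent module. `Block` and `Stack` are hypothetical names invented for this illustration, and the pre-norm attention plus feedforward layout is an assumption, not the package's actual encoder or decoder.

```python
import torch
from torch import nn


class Block(nn.Module):
    """Hypothetical pre-norm self-attention + feedforward block."""

    def __init__(self, dim: int, heads: int, ff_mult: int = 4):
        super().__init__()
        self.norm1 = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm2 = nn.LayerNorm(dim)
        self.ff = nn.Sequential(
            nn.Linear(dim, dim * ff_mult),
            nn.GELU(),
            nn.Linear(dim * ff_mult, dim),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h = self.norm1(x)
        attn_out, _ = self.attn(h, h, h, need_weights=False)
        x = x + attn_out  # residual around attention
        return x + self.ff(self.norm2(x))  # residual around feedforward


class Stack(nn.Module):
    """Holds `depth` blocks in an nn.ModuleList and applies them in order."""

    def __init__(self, dim: int, heads: int, depth: int):
        super().__init__()
        self.layers = nn.ModuleList([Block(dim, heads) for _ in range(depth)])

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        for layer in self.layers:
            x = layer(x)
        return x


stack = Stack(dim=64, heads=8, depth=4)
out = stack(torch.randn(1, 16, 64))  # (batch, sequence length, dim)
print(out.shape)
```

A plain Python list would run the same forward pass, but its layers would not appear in `stack.parameters()` and so would be skipped by optimizers and `state_dict()`; `nn.ModuleList` avoids that.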
Raw data
{
"_id": null,
"home_page": "https://github.com/kyegomez/ScreenAI",
"name": "screenai",
"maintainer": "",
"docs_url": null,
"requires_python": ">=3.6,<4.0",
"maintainer_email": "",
"keywords": "artificial intelligence,deep learning,optimizers,Prompt Engineering",
"author": "Kye Gomez",
"author_email": "kye@apac.ai",
"download_url": "https://files.pythonhosted.org/packages/d8/c0/a9e62577833c187bf32933aa338d288ddada1e8420b273857cc3f6b3e075/screenai-0.0.8.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": "MIT",
"summary": "Screen AI - Pytorch",
"version": "0.0.8",
"project_urls": {
"Documentation": "https://github.com/kyegomez/ScreenAI",
"Homepage": "https://github.com/kyegomez/ScreenAI",
"Repository": "https://github.com/kyegomez/ScreenAI"
},
"split_keywords": [
"artificial intelligence",
"deep learning",
"optimizers",
"prompt engineering"
],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "842a14fc880153ea6fcd779d30188c46061260c5624f6b6284493ea1b0a4fc3d",
"md5": "7dcf6222fc66f413f4faf92f86f302df",
"sha256": "83728637490180adb0c2d8eabc359d82a4f70cea79f53e960a7daed2e0056ef6"
},
"downloads": -1,
"filename": "screenai-0.0.8-py3-none-any.whl",
"has_sig": false,
"md5_digest": "7dcf6222fc66f413f4faf92f86f302df",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.6,<4.0",
"size": 6726,
"upload_time": "2024-02-08T23:48:52",
"upload_time_iso_8601": "2024-02-08T23:48:52.493698Z",
"url": "https://files.pythonhosted.org/packages/84/2a/14fc880153ea6fcd779d30188c46061260c5624f6b6284493ea1b0a4fc3d/screenai-0.0.8-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "d8c0a9e62577833c187bf32933aa338d288ddada1e8420b273857cc3f6b3e075",
"md5": "449d9518ea058b0b1f33050a0f407021",
"sha256": "88bfa0d00baa0c01cb8bca8010f679d1034ac13c0b4918763bb0e3121151169d"
},
"downloads": -1,
"filename": "screenai-0.0.8.tar.gz",
"has_sig": false,
"md5_digest": "449d9518ea058b0b1f33050a0f407021",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.6,<4.0",
"size": 6904,
"upload_time": "2024-02-08T23:48:54",
"upload_time_iso_8601": "2024-02-08T23:48:54.276175Z",
"url": "https://files.pythonhosted.org/packages/d8/c0/a9e62577833c187bf32933aa338d288ddada1e8420b273857cc3f6b3e075/screenai-0.0.8.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2024-02-08 23:48:54",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "kyegomez",
"github_project": "ScreenAI",
"travis_ci": false,
"coveralls": false,
"github_actions": true,
"requirements": [],
"lcname": "screenai"
}