screenai

Name	screenai
Version	0.0.8
home_page	https://github.com/kyegomez/ScreenAI
Summary	Screen AI - Pytorch
upload_time	2024-02-08 23:48:54
author	Kye Gomez
requires_python	>=3.6,<4.0
license	MIT
keywords	artificial intelligence deep learning optimizers prompt engineering
requirements	No requirements were recorded.

[![Multi-Modality](agorabanner.png)](https://discord.gg/qUtxnK2NMf)

# Screen AI
Implementation of the ScreenAI model from the paper ["ScreenAI: A Vision-Language Model for UI and Infographics Understanding"](https://arxiv.org/abs/2402.04615). The model's data flow is:

image + text -> split image into patches -> ViT -> embed + concat -> attention + FFN -> cross-attention + FFN + self-attention -> output
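To make the flow above concrete, here is a minimal, self-contained PyTorch sketch of it. It is not the `screenai` package's implementation: the class `ScreenAISketch` and its arguments are hypothetical, the ViT is approximated by a convolutional patch embedding plus `nn.TransformerEncoder`, the text input is assumed to be pre-embedded with shape `(batch, text_len, dim)`, and the decoder stage swaps in PyTorch's stock `nn.TransformerDecoderLayer`, which applies self-attention, cross-attention and feed-forward in that order rather than the order listed above.

```python
import torch
from torch import nn


class ScreenAISketch(nn.Module):
    """Hypothetical simplified model following the README's data flow.

    Not the screenai package's code. Assumptions: the ViT is a conv patch
    embedding plus a stock nn.TransformerEncoder, `text` is already embedded
    as (batch, text_len, dim), and the decoder uses the text tokens as
    queries over the multimodal encoder output.
    """

    def __init__(self, image_size=224, patch_size=16, channels=3, dim=512,
                 heads=8, vit_depth=4, encoder_depth=4, decoder_depth=4,
                 ff_mult=4):
        super().__init__()
        assert image_size % patch_size == 0, "image_size must be divisible by patch_size"
        num_patches = (image_size // patch_size) ** 2
        # Strided conv = non-overlapping patches, each projected to `dim`
        self.patch_embed = nn.Conv2d(channels, dim, kernel_size=patch_size, stride=patch_size)
        # Learned position embedding, one vector per patch
        self.pos_embed = nn.Parameter(torch.zeros(1, num_patches, dim))

        def encoder_layer():
            # One self-attention + feed-forward block
            return nn.TransformerEncoderLayer(dim, heads, dim * ff_mult, batch_first=True)

        self.vit = nn.TransformerEncoder(encoder_layer(), vit_depth)
        self.mm_encoder = nn.TransformerEncoder(encoder_layer(), encoder_depth)
        # Stock decoder layer: self-attention, cross-attention, feed-forward
        decoder_layer = nn.TransformerDecoderLayer(dim, heads, dim * ff_mult, batch_first=True)
        self.decoder = nn.TransformerDecoder(decoder_layer, decoder_depth)
        self.to_out = nn.Linear(dim, dim)

    def forward(self, text, img):
        # img -> patches: (b, c, H, W) -> (b, dim, H/p, W/p) -> (b, num_patches, dim)
        patches = self.patch_embed(img).flatten(2).transpose(1, 2)
        # vit
        img_tokens = self.vit(patches + self.pos_embed)
        # embed + concat: image tokens followed by text tokens
        x = torch.cat((img_tokens, text), dim=1)
        # attn + ffn over the joint sequence
        memory = self.mm_encoder(x)
        # decoder: text queries attend to the joint sequence
        y = self.decoder(text, memory)
        # to out
        return self.to_out(y)


if __name__ == "__main__":
    model = ScreenAISketch(dim=64, heads=4, vit_depth=1, encoder_depth=1, decoder_depth=1).eval()
    with torch.no_grad():
        out = model(torch.randn(1, 1, 64), torch.rand(1, 3, 224, 224))
    print(out.shape)  # torch.Size([1, 1, 64])
```

Using the text tokens as decoder queries keeps the output length equal to the text length; that choice is an assumption of this sketch, not something the README specifies.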

## Install
`pip3 install screenai`

## Usage
```python
import torch
from screenai.main import ScreenAI

# Dummy image batch: (batch, channels, height, width)
image = torch.rand(1, 3, 224, 224)

# Dummy text input: pre-embedded tokens of shape (batch, seq_len, dim);
# the last dimension matches dim=512 below
text = torch.randn(1, 1, 512)

# Create an instance of the ScreenAI model with the specified hyperparameters
model = ScreenAI(
    patch_size=16,
    image_size=224,
    dim=512,
    depth=6,
    heads=8,
    vit_depth=4,
    multi_modal_encoder_depth=4,
    llm_decoder_depth=4,
    mm_encoder_ff_mult=4,
)

# Forward pass: the text tensor comes first, then the image tensor
out = model(text, image)

# Print the shape of the output tensor
print(out.shape)
```
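A quick way to reason about the shapes in the usage example above: assuming the image is cut into non-overlapping square patches, as in a standard ViT, `image_size=224` with `patch_size=16` gives (224 / 16)^2 = 14 x 14 = 196 image tokens, and concatenating them with the single text token along the sequence axis yields 197 tokens. The helper names below are hypothetical and only do this arithmetic; they are not part of `screenai`.

```python
def num_image_tokens(image_size: int, patch_size: int) -> int:
    """Patches produced from a square image_size x image_size input (hypothetical helper)."""
    if image_size % patch_size != 0:
        raise ValueError("image_size must be divisible by patch_size")
    per_side = image_size // patch_size
    return per_side * per_side


def joint_sequence_length(image_size: int, patch_size: int, text_len: int) -> int:
    """Length of the image + text sequence after the concat step (hypothetical helper)."""
    return num_image_tokens(image_size, patch_size) + text_len


# Values from the usage example: image_size=224, patch_size=16, text of shape (1, 1, 512)
print(num_image_tokens(224, 16))          # 196
print(joint_sequence_length(224, 16, 1))  # 197
```

The divisibility check mirrors the usual ViT requirement that the patch grid tile the image exactly.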

## License
MIT


## Citation
```bibtex

@misc{baechler2024screenai,
    title={ScreenAI: A Vision-Language Model for UI and Infographics Understanding}, 
    author={Gilles Baechler and Srinivas Sunkara and Maria Wang and Fedir Zubach and Hassan Mansoor and Vincent Etter and Victor Cărbune and Jason Lin and Jindong Chen and Abhanshu Sharma},
    year={2024},
    eprint={2402.04615},
    archivePrefix={arXiv},
    primaryClass={cs.CV}
}
```

## Todo
- [ ] Use `nn.ModuleList([])` to hold the layers of the encoder and decoder
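The open Todo item concerns holding the encoder and decoder layers in `nn.ModuleList([])`. One possible shape for the encoder side is sketched below, in a style common to PyTorch transformer code. The class `StackedEncoder`, the pre-norm layout and the GELU feed-forward are assumptions for illustration, not the package's design; a decoder along the same lines would add a cross-attention module to each inner list.

```python
import torch
from torch import nn


class StackedEncoder(nn.Module):
    """Hypothetical encoder: `depth` pre-norm attention + FFN blocks held in an nn.ModuleList."""

    def __init__(self, dim=512, depth=4, heads=8, ff_mult=4):
        super().__init__()
        self.layers = nn.ModuleList([])
        for _ in range(depth):
            # Each entry is itself a ModuleList: [attention norm, attention, feed-forward]
            self.layers.append(nn.ModuleList([
                nn.LayerNorm(dim),
                nn.MultiheadAttention(dim, heads, batch_first=True),
                nn.Sequential(
                    nn.LayerNorm(dim),
                    nn.Linear(dim, dim * ff_mult),
                    nn.GELU(),
                    nn.Linear(dim * ff_mult, dim),
                ),
            ]))

    def forward(self, x):
        # x: (batch, seq_len, dim)
        for norm, attn, ff in self.layers:
            h = norm(x)
            # Self-attention with a residual connection
            x = x + attn(h, h, h, need_weights=False)[0]
            # Feed-forward with a residual connection
            x = x + ff(x)
        return x


if __name__ == "__main__":
    enc = StackedEncoder(dim=64, depth=2, heads=4)
    print(enc(torch.randn(1, 197, 64)).shape)  # torch.Size([1, 197, 64])
```

Registering the blocks in `nn.ModuleList` (rather than a plain Python list) is what makes their parameters visible to `model.parameters()`, `.to(device)` and `state_dict()`.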
