legrad-torch

Name	legrad-torch
Version	1.1
home_page	https://github.com/WalBouss/LeGrad
Summary	LeGrad
upload_time	2024-10-14 17:22:51
maintainer	None
docs_url	None
author	Walid Bousselham, Angie Boggust, Sofian Chaybouti, Hendrik Strobelt, Hilde Kuehne
requires_python	>=3.7
license	None
keywords	explainability vit vision-language models gradcam clip pretrained
VCS
bugtrack_url
requirements	No requirements were recorded.
Travis-CI	No Travis.
coveralls test coverage	No coveralls.

# LeGrad

<div align="center">
<img src="./assets/logo_LeGrad.png" width="20%"/>
</div>

### [An Explainability Method for Vision Transformers via Feature Formation Sensitivity](https://arxiv.org/abs/2404.03214)
_[Walid Bousselham](http://walidbousselham.com/)<sup>1</sup>, [Angie Boggust](http://angieboggust.com/)<sup>2</sup>, [Sofian Chaybouti](https://scholar.google.com/citations?user=8tewdk4AAAAJ&hl)<sup>1</sup>, [Hendrik Strobelt](http://hendrik.strobelt.com/)<sup>3,4</sup> and [Hilde Kuehne](https://hildekuehne.github.io/)<sup>1,3</sup>_

<sup>1</sup> University of Bonn & Goethe University Frankfurt,
<sup>2</sup> MIT CSAIL,
<sup>3</sup> MIT-IBM Watson AI Lab,
<sup>4</sup> IBM Research.

[![Hugging Face Spaces](https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Spaces-blue)](https://huggingface.co/spaces/WalidBouss/LeGrad) 
<a href="https://arxiv.org/abs/2404.03214"><img src="https://img.shields.io/badge/arXiv-Paper-<color>"></a>
<a href="https://walidbousselham.com/LeGrad"><img src="https://img.shields.io/badge/Project-Website-red"></a>

Vision-Language foundation models have shown remarkable performance in various zero-shot settings such as image retrieval, classification, or captioning.
We propose LeGrad, an explainability method specifically designed for ViTs.
With LeGrad, we explore the decision-making process of such models by leveraging their feature formation process.
A by-product of understanding the decision-making of VL models is the ability to produce a localised heatmap for any text prompt.

The following is the code for a wrapper around the [OpenCLIP](https://github.com/mlfoundations/open_clip) library to equip VL models with LeGrad.

<div align="center">
<img src="./assets/teaser_figure.png" width="100%"/>
</div>

## :hammer: Installation
The `legrad` library can be installed via pip:
```bash
$ pip install legrad_torch
```
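
Note that the package is installed under the name `legrad_torch` but imported as `legrad`. A minimal sketch for checking that the import name resolves after installation, using only the standard library; the helper name `is_installed` is hypothetical and not part of `legrad`:

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if a top-level module can be found on the current Python path."""
    # find_spec returns None when a top-level module cannot be located.
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    # The import name is `legrad`, even though the pip name is `legrad_torch`.
    print("legrad importable:", is_installed("legrad"))
```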

## Demo
- Try out our web demo on [HuggingFace Spaces](https://huggingface.co/spaces) [![Hugging Face Spaces](https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Spaces-blue)](https://huggingface.co/spaces/WalidBouss/LeGrad)
- Run the demo on Google Colab: [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1ooB4AB9NRRe6Z-VilZizFOlFpTiKQHAc?usp=sharing)
- Run [`playground.py`](./playground.py) for a usage example.

To run the gradio app locally, first install gradio and then run [`app.py`](./app.py):
```bash
$ pip install gradio
$ python app.py
```
## Usage
To see which pretrained models are available, use the following code snippet:
```python
import legrad
legrad.list_pretrained()
```
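
The list can be long, so it may help to narrow it down by architecture. The sketch below assumes the entries are `(model_name, pretrained_tag)` pairs, as returned by `open_clip.list_pretrained()`; whether `legrad.list_pretrained()` returns exactly this shape is an assumption to check against your installed version. The helper `filter_pretrained` and the sample data are hypothetical and only illustrate the filtering step.

```python
from typing import Iterable, List, Tuple

Pair = Tuple[str, str]


def filter_pretrained(pairs: Iterable[Pair], arch_substring: str) -> List[Pair]:
    """Keep only the (model_name, pretrained_tag) pairs whose model name
    contains `arch_substring` (case-insensitive)."""
    needle = arch_substring.lower()
    return [(name, tag) for name, tag in pairs if needle in name.lower()]


# Hypothetical sample standing in for the real list (assumed pair format).
sample = [
    ("ViT-B-16", "laion2b_s34b_b88k"),
    ("ViT-L-14", "openai"),
    ("RN50", "openai"),
]

if __name__ == "__main__":
    for name, tag in filter_pretrained(sample, "vit-b"):
        print(name, tag)
```

With the real library, `sample` would be replaced by the result of `legrad.list_pretrained()`, provided it has the assumed format.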

### Single Image
To process an image and a text prompt use the following code snippet:

**Note**: the wrapper does not affect the original model, hence all the functionalities of OpenCLIP models can be used seamlessly.
```python
import requests
from PIL import Image
import open_clip
import torch

from legrad import LeWrapper, LePreprocess
from legrad.utils import visualize

# ------- model's parameters -------
model_name = 'ViT-B-16'
pretrained = 'laion2b_s34b_b88k'
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
# ------- init model -------
model, _, preprocess = open_clip.create_model_and_transforms(
    model_name=model_name, pretrained=pretrained, device=device)
tokenizer = open_clip.get_tokenizer(model_name=model_name)
model.eval()
# ------- Equip the model with LeGrad -------
model = LeWrapper(model)
# ___ (Optional): Wrapper for Higher-Res input image ___
preprocess = LePreprocess(preprocess=preprocess, image_size=448)

# ------- init inputs: image + text -------
url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = preprocess(Image.open(requests.get(url, stream=True).raw)).unsqueeze(0).to(device)
text = tokenizer(['a photo of a cat']).to(device)

# ------- compute the text embedding and the LeGrad heatmap -------
text_embedding = model.encode_text(text, normalize=True)
print(image.shape)
explainability_map = model.compute_legrad_clip(image=image, text_embedding=text_embedding)

# ___ (Optional): Visualize overlay of the image + heatmap ___
visualize(heatmaps=explainability_map, image=image)
```
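
For readers who want a custom overlay instead of the provided `visualize` helper, the general idea can be sketched with NumPy alone: min-max normalize the heatmap, then blend a color layer into the image with a weight proportional to the heat. This is not the implementation of `legrad.utils.visualize`; the function name `overlay_heatmap`, the `alpha` parameter, and the red color choice are assumptions made for illustration. It expects an `(H, W, 3)` image with values in `[0, 1]` and a 2-D heatmap of the same spatial size.

```python
import numpy as np


def overlay_heatmap(image: np.ndarray, heatmap: np.ndarray, alpha: float = 0.5) -> np.ndarray:
    """Blend a red heat layer into an RGB image.

    image:   (H, W, 3) float array with values in [0, 1]
    heatmap: (H, W) array of arbitrary real values
    alpha:   maximum blending weight, reached where the heat is highest
    """
    image = np.asarray(image, dtype=float)
    heatmap = np.asarray(heatmap, dtype=float)

    # Min-max normalize the heatmap to [0, 1]; epsilon avoids division by zero.
    lo, hi = heatmap.min(), heatmap.max()
    norm = (heatmap - lo) / (hi - lo + 1e-8)

    # Red layer whose intensity follows the normalized heat.
    red = np.zeros_like(image)
    red[..., 0] = norm

    # Per-pixel weight: cold pixels keep the original image unchanged.
    weight = alpha * norm[..., None]
    return (1.0 - weight) * image + weight * red


if __name__ == "__main__":
    img = np.full((2, 2, 3), 0.5)
    heat = np.array([[0.0, 1.0], [2.0, 4.0]])
    print(overlay_heatmap(img, heat).shape)
```

Scaling the blending weight by the heat itself keeps low-relevance regions visually untouched, which tends to make the highlighted area easier to read than a uniform blend.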
 


# :star: Acknowledgement
This code is built as a wrapper around the [OpenCLIP](https://github.com/mlfoundations/open_clip) library from [LAION](https://laion.ai/); visit their repo for more vision-language models.
This project also takes inspiration from [Transformer-MM-Explainability](https://github.com/hila-chefer/Transformer-MM-Explainability) and the [timm library](https://github.com/huggingface/pytorch-image-models); please visit their repositories.

# :books: Citation
If you find this repository useful, please consider citing our work :pencil: and giving a star :star2: :
```
@article{bousselham2024legrad,
  author    = {Bousselham, Walid and Boggust, Angie and Chaybouti, Sofian and Strobelt, Hendrik and Kuehne, Hilde},
  title     = {LeGrad: An Explainability Method for Vision Transformers via Feature Formation Sensitivity},
  journal   = {arXiv preprint arXiv:2404.03214},
  year      = {2024},
}
```

Raw data

            {
    "_id": null,
    "home_page": "https://github.com/WalBouss/LeGrad",
    "name": "legrad-torch",
    "maintainer": null,
    "docs_url": null,
    "requires_python": ">=3.7",
    "maintainer_email": null,
    "keywords": "Explainability, ViT, Vision-Language Models, GradCAM, CLIP pretrained",
    "author": "Walid Bousselham, Angie Boggust, Sofian Chaybouti, Hendrik Strobelt, Hilde Kuehne",
    "author_email": null,
    "download_url": "https://files.pythonhosted.org/packages/e3/2b/5e1f060e34500476a6f82b7c7fd57a12847a60216aeacf2752f4272a0026/legrad_torch-1.1.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": null,
    "summary": "LeGrad",
    "version": "1.1",
    "project_urls": {
        "Homepage": "https://github.com/WalBouss/LeGrad"
    },
    "split_keywords": [
        "explainability",
        " vit",
        " vision-language models",
        " gradcam",
        " clip pretrained"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "99f9d552783ff4f11e54abb2e901c2cc3132549b82a7493855b79bc10a8f94ee",
                "md5": "636f64464e035fcbd390ed3507fc7e81",
                "sha256": "cad57870ce1347c2cad69c1bcda2475c46ba8f0313eec52b85c4359f61ca0267"
            },
            "downloads": -1,
            "filename": "legrad_torch-1.1-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "636f64464e035fcbd390ed3507fc7e81",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.7",
            "size": 14035,
            "upload_time": "2024-10-14T17:22:49",
            "upload_time_iso_8601": "2024-10-14T17:22:49.272846Z",
            "url": "https://files.pythonhosted.org/packages/99/f9/d552783ff4f11e54abb2e901c2cc3132549b82a7493855b79bc10a8f94ee/legrad_torch-1.1-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "e32b5e1f060e34500476a6f82b7c7fd57a12847a60216aeacf2752f4272a0026",
                "md5": "7a4f8e73662d28f4da9b9f423ebaee81",
                "sha256": "a43c4c026f742b8d5e9305dbecd20d2481e116bab3c7514dd0f12a3547fddac5"
            },
            "downloads": -1,
            "filename": "legrad_torch-1.1.tar.gz",
            "has_sig": false,
            "md5_digest": "7a4f8e73662d28f4da9b9f423ebaee81",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.7",
            "size": 16128,
            "upload_time": "2024-10-14T17:22:51",
            "upload_time_iso_8601": "2024-10-14T17:22:51.216416Z",
            "url": "https://files.pythonhosted.org/packages/e3/2b/5e1f060e34500476a6f82b7c7fd57a12847a60216aeacf2752f4272a0026/legrad_torch-1.1.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2024-10-14 17:22:51",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "WalBouss",
    "github_project": "LeGrad",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": false,
    "requirements": [],
    "lcname": "legrad-torch"
}
