| Field | Value |
|---|---|
| Name | vit-keras |
| Version | 0.2.0 |
| home_page | None |
| Summary | Keras implementation of ViT (Vision Transformer) |
| upload_time | 2025-08-04 03:17:50 |
| maintainer | None |
| docs_url | None |
| author | None |
| requires_python | <3.12,>=3.10 |
| license | None |
| keywords | |
| VCS | |
| bugtrack_url | |
| requirements | No requirements were recorded. |
| Travis-CI | No Travis. |
| coveralls test coverage | No coveralls. |
# vit-keras
This is a Keras implementation of the models described in [An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale](https://arxiv.org/pdf/2010.11929.pdf). It is based on an earlier implementation from [tuvovan](https://github.com/tuvovan/Vision_Transformer_Keras), modified to match the Flax implementation in the [official repository](https://github.com/google-research/vision_transformer).
The weights here are ported over from the weights provided in the official repository. See `utils.load_weights_numpy` to see how this is done (it's not pretty, but it does the job).
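Once the package is installed (see Usage below), you can sanity-check the ported weights using standard Keras introspection rather than anything vit-keras-specific; a minimal sketch, mirroring the constructor call from the Usage section:

```python
from vit_keras import vit

# Build the pretrained ViT-B/16 and list the names and shapes of its
# weights. This uses only standard Keras APIs (layers, weights, count_params).
model = vit.vit_b16(
    image_size=384,
    activation='sigmoid',
    pretrained=True,
    include_top=True,
    pretrained_top=True
)
for layer in model.layers:
    for weight in layer.weights:
        print(layer.name, weight.name, weight.shape)
print('Total parameters:', model.count_params())
```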
## Usage
Install this package using `pip install vit-keras`.
You can use the model out of the box with ImageNet 2012 classes using
something like the following. The weights will be downloaded automatically.
```python
from vit_keras import vit, utils
image_size = 384
classes = utils.get_imagenet_classes()
model = vit.vit_b16(
    image_size=image_size,
    activation='sigmoid',
    pretrained=True,
    include_top=True,
    pretrained_top=True
)
url = 'https://upload.wikimedia.org/wikipedia/commons/d/d7/Granny_smith_and_cross_section.jpg'
image = utils.read(url, image_size)
X = vit.preprocess_inputs(image).reshape(1, image_size, image_size, 3)
y = model.predict(X)
print(classes[y[0].argmax()]) # Granny smith
```
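If you want more than the single best label, the prediction array from the snippet above can be ranked with NumPy; a short sketch reusing `model`, `classes`, and `X` from above (note the `sigmoid` head, so scores are per-class activations rather than a softmax distribution):

```python
import numpy as np

# Rank all ImageNet classes by predicted score, highest first.
scores = model.predict(X)[0]
top5 = np.argsort(scores)[::-1][:5]
for rank, idx in enumerate(top5, start=1):
    print(f'{rank}. {classes[idx]} ({scores[idx]:.3f})')
```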
You can fine-tune using a model loaded as follows. Setting `pretrained_top=False` discards the pretrained classification head so that a new one, sized by `classes`, can be trained on your data.
```python
image_size = 224
model = vit.vit_l32(
    image_size=image_size,
    activation='sigmoid',
    pretrained=True,
    include_top=True,
    pretrained_top=False,
    classes=200
)
# Train this model on your data as desired.
```
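Training then follows the usual Keras `compile`/`fit` pattern; a minimal sketch, where `train_ds` is a placeholder for your own pipeline of `(image, label)` batches (the dummy tensors below only demonstrate the expected shapes):

```python
import tensorflow as tf

# Placeholder pipeline: replace with your real data. Images are
# (224, 224, 3) floats; labels are integers in [0, 200).
train_ds = tf.data.Dataset.from_tensor_slices((
    tf.zeros((8, image_size, image_size, 3)),
    tf.zeros((8,), dtype=tf.int32),
)).batch(4)

# Sparse categorical cross-entropy is shown as a common default; with the
# sigmoid head above you may prefer a different loss for your labels.
model.compile(
    optimizer=tf.keras.optimizers.Adam(1e-4),
    loss='sparse_categorical_crossentropy',
    metrics=['accuracy'],
)
model.fit(train_ds, epochs=1)
```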
## Visualizing Attention Maps
There's some functionality for plotting attention maps for a given image and model; see the example below. I'm not sure I'm doing this correctly (the official repository didn't have example code), so feedback/corrections are welcome!
```python
import numpy as np
import matplotlib.pyplot as plt
from vit_keras import vit, utils, visualize
# Load a model
image_size = 384
classes = utils.get_imagenet_classes()
model = vit.vit_b16(
    image_size=image_size,
    activation='sigmoid',
    pretrained=True,
    include_top=True,
    pretrained_top=True
)
# Get an image and compute the attention map
url = 'https://upload.wikimedia.org/wikipedia/commons/b/bc/Free%21_%283987584939%29.jpg'
image = utils.read(url, image_size)
attention_map = visualize.attention_map(model=model, image=image)
prediction = model.predict(vit.preprocess_inputs(image)[np.newaxis])[0].argmax()
print('Prediction:', classes[prediction])  # Prediction: Eskimo dog, husky
# Plot results
fig, (ax1, ax2) = plt.subplots(ncols=2)
ax1.axis('off')
ax2.axis('off')
ax1.set_title('Original')
ax2.set_title('Attention Map')
_ = ax1.imshow(image)
_ = ax2.imshow(attention_map)
```
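To keep the comparison around, the figure can be written to disk with standard Matplotlib calls (the file name is arbitrary):

```python
# Save the side-by-side figure; adjust the path and DPI to taste.
fig.tight_layout()
fig.savefig('attention_map.png', dpi=150, bbox_inches='tight')
plt.show()
```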

## Raw data

```json
{
"_id": null,
"home_page": null,
"name": "vit-keras",
"maintainer": null,
"docs_url": null,
"requires_python": "<3.12,>=3.10",
"maintainer_email": null,
"keywords": null,
"author": null,
"author_email": "Fausto Morales <fausto@robinbay.com>",
"download_url": "https://files.pythonhosted.org/packages/d2/9c/2cc182b43a0924aa23f03a5b4d22052d5353dfb06ee2bbc9ba6ee4dc2026/vit_keras-0.2.0.tar.gz",
"platform": null,
"description": "# vit-keras\nThis is a Keras implementation of the models described in [An Image is Worth 16x16 Words:\nTransformes For Image Recognition at Scale](https://arxiv.org/pdf/2010.11929.pdf). It is based on an earlier implementation from [tuvovan](https://github.com/tuvovan/Vision_Transformer_Keras), modified to match the Flax implementation in the [official repository](https://github.com/google-research/vision_transformer).\n\nThe weights here are ported over from the weights provided in the official repository. See `utils.load_weights_numpy` to see how this is done (it's not pretty, but it does the job).\n\n## Usage\nInstall this package using `pip install vit-keras`\n\nYou can use the model out-of-the-box with ImageNet 2012 classes using\nsomething like the following. The weights will be downloaded automatically.\n\n```python\nfrom vit_keras import vit, utils\n\nimage_size = 384\nclasses = utils.get_imagenet_classes()\nmodel = vit.vit_b16(\n image_size=image_size,\n activation='sigmoid',\n pretrained=True,\n include_top=True,\n pretrained_top=True\n)\nurl = 'https://upload.wikimedia.org/wikipedia/commons/d/d7/Granny_smith_and_cross_section.jpg'\nimage = utils.read(url, image_size)\nX = vit.preprocess_inputs(image).reshape(1, image_size, image_size, 3)\ny = model.predict(X)\nprint(classes[y[0].argmax()]) # Granny smith\n```\n\nYou can fine-tune using a model loaded as follows.\n\n```python\nimage_size = 224\nmodel = vit.vit_l32(\n image_size=image_size,\n activation='sigmoid',\n pretrained=True,\n include_top=True,\n pretrained_top=False,\n classes=200\n)\n# Train this model on your data as desired.\n```\n\n## Visualizing Attention Maps\nThere's some functionality for plotting attention maps for a given image and model. See example below. I'm not sure I'm doing this correctly (the official repository didn't have example code). Feedback /corrections welcome!\n\n```python\nimport numpy as np\nimport matplotlib.pyplot as plt\nfrom vit_keras import vit, utils, visualize\n\n# Load a model\nimage_size = 384\nclasses = utils.get_imagenet_classes()\nmodel = vit.vit_b16(\n image_size=image_size,\n activation='sigmoid',\n pretrained=True,\n include_top=True,\n pretrained_top=True\n)\nclasses = utils.get_imagenet_classes()\n\n# Get an image and compute the attention map\nurl = 'https://upload.wikimedia.org/wikipedia/commons/b/bc/Free%21_%283987584939%29.jpg'\nimage = utils.read(url, image_size)\nattention_map = visualize.attention_map(model=model, image=image)\nprint('Prediction:', classes[\n model.predict(vit.preprocess_inputs(image)[np.newaxis])[0].argmax()]\n) # Prediction: Eskimo dog, husky\n\n# Plot results\nfig, (ax1, ax2) = plt.subplots(ncols=2)\nax1.axis('off')\nax2.axis('off')\nax1.set_title('Original')\nax2.set_title('Attention Map')\n_ = ax1.imshow(image)\n_ = ax2.imshow(attention_map)\n```\n\n\n",
"bugtrack_url": null,
"license": null,
"summary": "Keras implementation of ViT (Vision Transformer)",
"version": "0.2.0",
"project_urls": null,
"split_keywords": [],
"urls": [
{
"comment_text": null,
"digests": {
"blake2b_256": "660db2ea088fd10306b9914229e6eb7e82cb02ac0daaa3e81aff7412b75c278b",
"md5": "004985716d0cccea2c10eeacb2482809",
"sha256": "a1a48b8cbe01895d420cb2b4e96512e7839022347c8733a8fb223fb8fec7510d"
},
"downloads": -1,
"filename": "vit_keras-0.2.0-py3-none-any.whl",
"has_sig": false,
"md5_digest": "004985716d0cccea2c10eeacb2482809",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": "<3.12,>=3.10",
"size": 24571,
"upload_time": "2025-08-04T03:17:48",
"upload_time_iso_8601": "2025-08-04T03:17:48.177207Z",
"url": "https://files.pythonhosted.org/packages/66/0d/b2ea088fd10306b9914229e6eb7e82cb02ac0daaa3e81aff7412b75c278b/vit_keras-0.2.0-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": null,
"digests": {
"blake2b_256": "d29c2cc182b43a0924aa23f03a5b4d22052d5353dfb06ee2bbc9ba6ee4dc2026",
"md5": "c886149f10ce57d9759ba9e926f5e254",
"sha256": "fcff0397f94187823cbf8f5a453b7836b1cd365a7c9e6c422c2e959f8babb1dc"
},
"downloads": -1,
"filename": "vit_keras-0.2.0.tar.gz",
"has_sig": false,
"md5_digest": "c886149f10ce57d9759ba9e926f5e254",
"packagetype": "sdist",
"python_version": "source",
"requires_python": "<3.12,>=3.10",
"size": 140900,
"upload_time": "2025-08-04T03:17:50",
"upload_time_iso_8601": "2025-08-04T03:17:50.092049Z",
"url": "https://files.pythonhosted.org/packages/d2/9c/2cc182b43a0924aa23f03a5b4d22052d5353dfb06ee2bbc9ba6ee4dc2026/vit_keras-0.2.0.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2025-08-04 03:17:50",
"github": false,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"lcname": "vit-keras"
}