## PyTorch Implementation of ViT
Original paper link: <a href="https://arxiv.org/abs/2010.11929">An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale</a> (Alexey Dosovitskiy et al.)
## Install
```bash
$ pip install vit-pytorch-implementation
```
## Usage
```python
import torch
from vit_pytorch import lilViT
v = lilViT(
    img_size=224,               # input resolution (224x224)
    in_channels=3,              # RGB input
    patch_size=16,              # (224 // 16) ** 2 = 196 patches
    num_transformer_layers=12,  # depth (number of transformer blocks)
    embedding_dim=768,          # token embedding dimension
    mlp_size=3072,              # hidden size of each MLP block
    num_heads=12,               # attention heads per layer
    attn_dropout=0,             # dropout on attention projections
    mlp_dropout=0.1,            # dropout in the MLP blocks
    embedding_dropout=0.1,      # dropout on patch + position embeddings
    num_classes=1000            # classification head size
)

img = torch.randn(1, 3, 224, 224)

preds = v(img)  # (1, 1000)
print(preds.shape)  # torch.Size([1, 1000])
```
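Assuming the head returns raw class logits (as the `(1, 1000)` output above suggests), a softmax/argmax turns them into probabilities and a predicted label. A minimal sketch continuing from the snippet above:

```python
import torch

# Turn the raw logits into class probabilities and a predicted label.
probs = torch.softmax(preds, dim=-1)  # (1, 1000), rows sum to 1
pred_class = probs.argmax(dim=-1)     # (1,) index of the most likely class
print(pred_class.item())
```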
## Parameters
- `img_size`: int.
Image resolution. Default `224` (224x224).
- `in_channels`: int.
Number of image channels. Default `3`.
- `patch_size`: int.
Size of patches. `img_size` must be divisible by `patch_size`.
The number of patches is `n = (img_size // patch_size) ** 2`, and `n` **must be greater than 16**. Default `16`. See the sketch after this list.
- `num_transformer_layers`: int.
Depth (number of transformer blocks). Default `12`.
- `embedding_dim`: int.
Embedding dimension. Default `768`.
- `mlp_size`: int.
MLP size. Default `3072`.
- `num_heads`: int.
Number of heads in the multi-head attention layer. Default `12`.
- `attn_dropout`: float.
Dropout for attention projection. Default `0`.
- `mlp_dropout`: float.
Dropout for dense/MLP layers. Default `0.1`.
- `embedding_dropout`: float.
Dropout for patch and position embeddings. Default `0.1`.
- `num_classes`: int.
Number of classes to classify. Default `1000`.
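
As an illustration of the `patch_size` constraint above, here is a sketch that checks the patch count directly and configures a smaller model for 32x32 inputs. The `lilViT` arguments are the ones documented above; the concrete values (CIFAR-style resolution, shallower depth) are illustrative choices, not package defaults:

```python
import torch
from vit_pytorch import lilViT

img_size, patch_size = 32, 4        # CIFAR-style resolution (illustrative)
n = (img_size // patch_size) ** 2   # number of patches: (32 // 4) ** 2 = 64
assert img_size % patch_size == 0   # img_size must be divisible by patch_size
assert n > 16                       # patch-count requirement noted above

v = lilViT(
    img_size=img_size,
    in_channels=3,
    patch_size=patch_size,
    num_transformer_layers=6,       # shallower depth for small images (illustrative)
    embedding_dim=256,
    mlp_size=1024,
    num_heads=8,
    attn_dropout=0,
    mlp_dropout=0.1,
    embedding_dropout=0.1,
    num_classes=10                  # e.g. CIFAR-10
)

preds = v(torch.randn(1, 3, 32, 32))  # (1, 10)
```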