## PyTorch Implementation of ViT
Original paper link: <a href="https://arxiv.org/abs/2010.11929">An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale</a> (Alexey Dosovitskiy et al.)
## Install
```bash
$ pip install vit-pytorch-implementation
```
## Usage
```python
import torch
from vit_pytorch import lilViT
v = lilViT(
    img_size=224,               # input resolution (224x224)
    in_channels=3,              # RGB input
    patch_size=16,              # (224 // 16) ** 2 = 196 patches
    num_transformer_layers=12,  # depth (number of transformer blocks)
    embedding_dim=768,          # token embedding dimension
    mlp_size=3072,              # hidden size of each MLP block
    num_heads=12,               # attention heads per layer
    attn_dropout=0,             # dropout on attention projections
    mlp_dropout=0.1,            # dropout in the MLP blocks
    embedding_dropout=0.1,      # dropout on patch + position embeddings
    num_classes=1000            # classification head size
)

img = torch.randn(1, 3, 224, 224)

preds = v(img)  # (1, 1000)
print(preds.shape)  # torch.Size([1, 1000])
```
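Assuming the head returns raw class logits (as the `(1, 1000)` output above suggests), a softmax/argmax turns them into probabilities and a predicted label. A minimal sketch continuing from the snippet above:

```python
import torch

# Turn the raw logits into class probabilities and a predicted label.
probs = torch.softmax(preds, dim=-1)  # (1, 1000), rows sum to 1
pred_class = probs.argmax(dim=-1)     # (1,) index of the most likely class
print(pred_class.item())
```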
## Parameters
- `img_size`: int.
Image resolution. Default `224` (224x224).
- `in_channels`: int.
Number of image channels. Default `3`.
- `patch_size`: int.
Size of patches. `img_size` must be divisible by `patch_size`.
The number of patches is `n = (img_size // patch_size) ** 2`, and `n` **must be greater than 16**. Default `16`. See the sketch after this list.
- `num_transformer_layers`: int.
Depth (number of transformer blocks). Default `12`.
- `embedding_dim`: int.
Embedding dimension. Default `768`.
- `mlp_size`: int.
MLP size. Default `3072`.
- `num_heads`: int.
Number of heads in the multi-head attention layer. Default `12`.
- `attn_dropout`: float.
Dropout for attention projection. Default `0`.
- `mlp_dropout`: float.
Dropout for dense/MLP layers. Default `0.1`.
- `embedding_dropout`: float.
Dropout for patch and position embeddings. Default `0.1`.
- `num_classes`: int.
Number of classes to classify. Default `1000`.
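
As an illustration of the `patch_size` constraint above, here is a sketch that checks the patch count directly and configures a smaller model for 32x32 inputs. The `lilViT` arguments are the ones documented above; the concrete values (CIFAR-style resolution, shallower depth) are illustrative choices, not package defaults:

```python
import torch
from vit_pytorch import lilViT

img_size, patch_size = 32, 4        # CIFAR-style resolution (illustrative)
n = (img_size // patch_size) ** 2   # number of patches: (32 // 4) ** 2 = 64
assert img_size % patch_size == 0   # img_size must be divisible by patch_size
assert n > 16                       # patch-count requirement noted above

v = lilViT(
    img_size=img_size,
    in_channels=3,
    patch_size=patch_size,
    num_transformer_layers=6,       # shallower depth for small images (illustrative)
    embedding_dim=256,
    mlp_size=1024,
    num_heads=8,
    attn_dropout=0,
    mlp_dropout=0.1,
    embedding_dropout=0.1,
    num_classes=10                  # e.g. CIFAR-10
)

preds = v(torch.randn(1, 3, 32, 32))  # (1, 10)
```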