# vision_transformers
***A repository for everything Vision Transformers.***

## Currently Supported Models
- Image Classification
  - ViT Base Patch 16 | 224x224: Torchvision pretrained weights
  - ViT Base Patch 32 | 224x224: Torchvision pretrained weights
  - ViT Tiny Patch 16 | 224x224: Timm pretrained weights
  - ViT Tiny Patch 16 | 384x384: Timm pretrained weights
  - Swin Transformer Tiny Patch 4 Window 7 | 224x224: Official Microsoft weights
  - Swin Transformer Small Patch 4 Window 7 | 224x224: Official Microsoft weights
  - Swin Transformer Base Patch 4 Window 7 | 224x224: Official Microsoft weights
  - Swin Transformer Large Patch 4 Window 7 | 224x224: No pretrained weights
## Quick Setup
### Stable PyPI Package
```
pip install vision-transformers
```
### OR
### Latest Git Updates
```
git clone https://github.com/sovit-123/vision_transformers.git
cd vision_transformers
```
Install in the environment of your choice:
```
pip install .
```
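A quick way to verify either install is to import the model modules used in the examples below:
```
python -c "from vision_transformers.models import vit, swin_transformer"
```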
## Importing Models and Usage
### If you have your own training pipeline and just want the model
**Replace `num_classes=1000` with your own number of classes.**
```python
from vision_transformers.models import vit
model = vit.vit_b_p16_224(num_classes=1000, pretrained=True)
# model = vit.vit_b_p32_224(num_classes=1000, pretrained=True)
# model = vit.vit_ti_p16_224(num_classes=1000, pretrained=True)
```
```python
from vision_transformers.models import swin_transformer
model = swin_transformer.swin_t_p4_w7_224(num_classes=1000, pretrained=True)
# model = swin_transformer.swin_s_p4_w7_224(num_classes=1000, pretrained=True)
# model = swin_transformer.swin_b_p4_w7_224(num_classes=1000, pretrained=True)
# model = swin_transformer.swin_l_p4_w7_224(num_classes=1000)
```
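As a quick sanity check, here is a minimal sketch (assuming the builders return a standard `torch.nn.Module` that takes an NCHW image batch and outputs class logits); `num_classes=10` is just a placeholder for your own dataset:
```python
import torch

from vision_transformers.models import vit

# Placeholder fine-tuning setup: 10 classes instead of the default 1000.
model = vit.vit_b_p16_224(num_classes=10, pretrained=True)
model.eval()

# Dummy 224x224 RGB batch to confirm the output shape.
dummy = torch.randn(2, 3, 224, 224)
with torch.no_grad():
    logits = model(dummy)

print(logits.shape)  # expected: torch.Size([2, 10])
```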
### If you want to use the training pipeline
* Clone the repository:
```
git clone https://github.com/sovit-123/vision_transformers.git
cd vision_transformers
```
* Install
```
pip install .
```
From the `vision_transformers` directory:
* If you have no validation split
```
python tools/train_classifier.py --data data/diabetic_retinopathy/colored_images/ 0.15 --epochs 5 --model vit_ti_p16_224
```
* In the above command:
  * `data/diabetic_retinopathy/colored_images/` is the data folder; the images live inside their respective class folders (see the example layout below)
  * `0.15` is the validation split to set aside, since this dataset does not ship with a separate validation folder
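For reference, the data folder is expected to contain one sub-folder per class (the class names below are only placeholders):
```
data/diabetic_retinopathy/colored_images/
├── class_1/
│   ├── image_001.png
│   └── ...
├── class_2/
│   └── ...
└── class_3/
    └── ...
```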
* If you have a validation split
```
python tools/train_classifier.py --train-dir data/plant_disease_recognition/train/ --valid-dir data/plant_disease_recognition/valid/ --epochs 5 --model vit_ti_p16_224
```
* In the above command:
  * `--train-dir` should be the path to the training directory, where the images are inside their respective class folders.
  * `--valid-dir` should be the path to the validation directory, where the images are inside their respective class folders (see the example layout below).
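Here the expected layout repeats the same class sub-folders under both directories (again, class names are placeholders):
```
data/plant_disease_recognition/
├── train/
│   ├── class_1/
│   └── class_2/
└── valid/
    ├── class_1/
    └── class_2/
```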
### All Available Model Flags for `--model`
```
vit_b_p32_224
vit_ti_p16_224
vit_ti_p16_384
vit_b_p16_224
swin_b_p4_w7_224
swin_t_p4_w7_224
swin_s_p4_w7_224
swin_l_p4_w7_224
```
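These flag names mirror the builder functions shown earlier, so if you want to pick a model by name in your own code, a small helper like the one below should work. This helper is not part of the library; it assumes every flag matches a same-named function in `vit` or `swin_transformer`, and note that `swin_l_p4_w7_224` has no pretrained weights.
```python
from vision_transformers.models import swin_transformer, vit


def build_model(flag: str, num_classes: int, pretrained: bool = True):
    """Map a --model style flag to its builder, e.g. 'vit_ti_p16_224' -> vit.vit_ti_p16_224."""
    module = swin_transformer if flag.startswith("swin") else vit
    builder = getattr(module, flag)
    return builder(num_classes=num_classes, pretrained=pretrained)


model = build_model("vit_ti_p16_224", num_classes=10)
```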
### DETR Training
* The dataset annotations should be in XML format. The dataset used in the following command (passed via the `--data` flag) can be found here: https://www.kaggle.com/datasets/sovitrath/aquarium-data
```
python tools/train_detector.py --model detr_resnet50 --epochs 2 --data data/aquarium.yaml
```
## [Examples](https://github.com/sovit-123/vision_transformers/tree/main/examples)
- [ViT Base 16 | 224x224 pretrained fine-tuning on CIFAR10](https://github.com/sovit-123/vision_transformers/blob/main/examples/cifar10_vit_pretrained.ipynb)
- [ViT Tiny 16 | 224x224 pretrained fine-tuning on CIFAR10](https://github.com/sovit-123/vision_transformers/blob/main/examples/cifar10_vit_tiny_p16_224.ipynb)
- [DETR image inference notebook](https://github.com/sovit-123/vision_transformers/blob/main/examples/detr_image_inference.ipynb)
- [DETR video inference script](https://github.com/sovit-123/vision_transformers/blob/main/examples/detr_video_inference.py) (**Fine Tuning Coming Soon**). [Check the commands here](#detr-video-inference-commands)
## DETR Video Inference Commands
***All commands are to be executed from the project root directory (`vision_transformers`).***
* Using default video:
```
python examples/detr_video_inference.py
```
* Using CPU only:
```
python examples/detr_video_inference.py --device cpu
```
* Using another video file:
```
python examples/detr_video_inference.py --input /path/to/video/file
```