vision-transformers

Name: vision-transformers
Version: 0.1.1.0
Home page: https://github.com/sovit-123/vision_transformers
Summary: Vision Transformers (ViT)
Upload time: 2023-03-25 12:41:54
Author: Sovit Rath
License: MIT
Keywords: vision, neural attention, deep learning
Requirements: albumentations>=1.1.0, torchinfo, tqdm, pycocotools>=2.0.2

# vision_transformers

***A repository for everything Vision Transformers.***

![](readme_images/detr_infer.gif)

## Currently Supported Models

- Image Classification

  - ViT Base Patch 16 | 224x224: Torchvision pretrained weights
  - ViT Base Patch 32 | 224x224: Torchvision pretrained weights
  - ViT Tiny Patch 16 | 224x224: Timm pretrained weights
  - ViT Tiny Patch 16 | 384x384: Timm pretrained weights
  - Swin Transformer Tiny Patch 4 Window 7 | 224x224: Official Microsoft weights
  - Swin Transformer Small Patch 4 Window 7 | 224x224: Official Microsoft weights
  - Swin Transformer Base Patch 4 Window 7 | 224x224: Official Microsoft weights
  - Swin Transformer Large Patch 4 Window 7 | 224x224: No pretrained weights

## Quick Setup

### Stable PyPI Package

```
pip install vision-transformers
```

### OR

### Latest Git Updates

```
git clone https://github.com/sovit-123/vision_transformers.git
cd vision_transformers
```

Installation in the environment of your choice:

```
pip install .
```
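
A quick, optional check that the install worked (the import name uses an underscore, matching the usage examples below):

```
python -c "import vision_transformers; print('ok')"
```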

## Importing Models and Usage

### If you have your own training pipeline and just want the model

**Replace `num_classes=1000` with your own number of classes.**

```python
from vision_transformers.models import vit

model = vit.vit_b_p16_224(num_classes=1000, pretrained=True)
# model = vit.vit_b_p32_224(num_classes=1000, pretrained=True)
# model = vit.vit_ti_p16_224(num_classes=1000, pretrained=True)
```
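
Since `torchinfo` is one of the package requirements, a quick way to inspect whichever model you create is to print a layer summary. This is just a minimal sketch; the 10-class setup is hypothetical, and the 224x224 input size follows from the model name:

```python
from torchinfo import summary

from vision_transformers.models import vit

# Hypothetical 10-class setup; the model name implies 3x224x224 inputs.
model = vit.vit_b_p16_224(num_classes=10, pretrained=True)
summary(model, input_size=(1, 3, 224, 224))
```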

```python
from vision_transformers.models import swin_transformer

model = swin_transformer.swin_t_p4_w7_224(num_classes=1000, pretrained=True)
# model = swin_transformer.swin_s_p4_w7_224(num_classes=1000, pretrained=True)
# model = swin_transformer.swin_b_p4_w7_224(num_classes=1000, pretrained=True)
# model = swin_transformer.swin_l_p4_w7_224(num_classes=1000)
```
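
As a quick sanity check after swapping in your own class count, you can run a dummy batch through the model. Assuming the heads follow the usual classification convention, the output should have shape `(batch_size, num_classes)`:

```python
import torch

from vision_transformers.models import swin_transformer

# Hypothetical 10-class setup.
model = swin_transformer.swin_t_p4_w7_224(num_classes=10, pretrained=True)
model.eval()

# Dummy batch of two 224x224 RGB images.
x = torch.randn(2, 3, 224, 224)
with torch.no_grad():
    out = model(x)

print(out.shape)  # expected: torch.Size([2, 10])
```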

### If you want to use the training pipeline

* Clone the repository:

```
git clone https://github.com/sovit-123/vision_transformers.git
cd vision_transformers
```

* Install

```
pip install .
```

From the `vision_transformers` directory:

* If you do not have a validation split

```
python tools/train_classifier.py --data data/diabetic_retinopathy/colored_images/ 0.15 --epochs 5 --model vit_ti_p16_224
```

* In the above command:

  * `data/diabetic_retinopathy/colored_images/` is the data directory, with the images stored inside their respective class folders

  * `0.15` is the validation split fraction, used because the dataset does not provide a separate validation folder

* If you have a validation split

```
python tools/train_classifier.py --train-dir data/plant_disease_recognition/train/ --valid-dir data/plant_disease_recognition/valid/ --epochs 5 --model vit_ti_p16_224
```

* In the above command:
  * `--train-dir` should be the path to the training directory, where the images are stored inside their respective class folders.
  * `--valid-dir` should be the path to the validation directory, organized the same way (see the layout sketch below).
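
Both training modes expect an ImageFolder-style layout: one sub-directory per class, with that class's images inside it. The class and file names below are purely illustrative:

```
data/plant_disease_recognition/
├── train/
│   ├── class_a/
│   │   ├── image_001.jpg
│   │   └── ...
│   └── class_b/
│       └── ...
└── valid/
    ├── class_a/
    └── class_b/
```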

### All Available Model Flags for `--model`

```
vit_b_p32_224
vit_ti_p16_224
vit_ti_p16_384
vit_b_p16_224
swin_b_p4_w7_224
swin_t_p4_w7_224
swin_s_p4_w7_224
swin_l_p4_w7_224
```

### DETR Training

* The dataset annotations should be in XML format. The dataset referenced by the `--data` flag in the following command can be found here: https://www.kaggle.com/datasets/sovitrath/aquarium-data

```
python tools/train_detector.py --model detr_resnet50 --epochs 2 --data data/aquarium.yaml
```

## [Examples](https://github.com/sovit-123/vision_transformers/tree/main/examples)

- [ViT Base 16 | 224x224 pretrained fine-tuning on CIFAR10](https://github.com/sovit-123/vision_transformers/blob/main/examples/cifar10_vit_pretrained.ipynb)
- [ViT Tiny 16 | 224x224 pretrained fine-tuning on CIFAR10](https://github.com/sovit-123/vision_transformers/blob/main/examples/cifar10_vit_tiny_p16_224.ipynb)
- [DETR image inference notebook](https://github.com/sovit-123/vision_transformers/blob/main/examples/detr_image_inference.ipynb)
- [DETR video inference script](https://github.com/sovit-123/vision_transformers/blob/main/examples/detr_video_inference.py) (**Fine Tuning Coming Soon**) --- [Check the commands here](#detr-video-inference-commands)

## DETR Video Inference Commands

***All commands are to be executed from the root project directory (`vision_transformers`).***

* Using default video:

```
python examples/detr_video_inference.py
```

* Using CPU only:

```
python examples/detr_video_inference.py --device cpu
```

* Using another video file:

```
python examples/detr_video_inference.py --input /path/to/video/file
```


            
