maskedautoencoder

Name: maskedautoencoder
Version: 0.0.1
Summary: MAE - Masked Autoencoder (An Updated PyTorch Implementation for Single GPU with 4GB Memory)
Home page: https://github.com/henrywoo/mae
Author: Fuheng Wu
Upload time: 2024-06-27 22:12:21
Requires Python: not specified
License: not specified
Keywords: maskedautoencoder, mae, autoendcoder
Requirements: none recorded
## Masked Autoencoders: An Updated PyTorch Implementation for Single GPU with 4GB Memory

<p align="center">
  <img src="https://user-images.githubusercontent.com/11435359/146857310-f258c86c-fde6-48e8-9cee-badd2b21bd2c.png" width="480">
</p>

This is an `updated` PyTorch/GPU re-implementation of the paper [Masked Autoencoders Are Scalable Vision Learners](https://arxiv.org/abs/2111.06377), aimed at consumer-GPU users and intended for learning purposes.

* Updated to the latest PyTorch and timm
* Uses Imagenette as the default dataset, so you can run training on a consumer GPU and start debugging the code immediately, without downloading the full ImageNet

[Github Repo: 🔗](https://github.com/henrywoo/mae/)
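
For readers new to the method, the key idea of MAE is to mask a large random subset of image patches (75% in the paper) and feed only the visible patches to the encoder. The sketch below is a minimal, self-contained illustration of that per-sample random masking step in plain PyTorch; the function and tensor names are illustrative and not taken from this package.

```python
import torch

def random_masking(patches: torch.Tensor, mask_ratio: float = 0.75):
    """Randomly drop a fraction of patches per sample (MAE-style masking).

    patches: [batch, num_patches, dim] sequence of embedded patches.
    Returns the visible patches, a binary mask (1 = masked/removed), and the
    indices needed to restore the original patch order for the decoder.
    """
    B, N, D = patches.shape
    len_keep = int(N * (1 - mask_ratio))

    noise = torch.rand(B, N, device=patches.device)   # one random score per patch
    ids_shuffle = torch.argsort(noise, dim=1)         # random permutation per sample
    ids_restore = torch.argsort(ids_shuffle, dim=1)   # inverse permutation

    ids_keep = ids_shuffle[:, :len_keep]              # indices of visible patches
    visible = torch.gather(patches, 1, ids_keep.unsqueeze(-1).expand(-1, -1, D))

    mask = torch.ones(B, N, device=patches.device)    # 1 marks a masked patch
    mask[:, :len_keep] = 0
    mask = torch.gather(mask, 1, ids_restore)         # back to original patch order
    return visible, mask, ids_restore

# Example: 196 patches (a 14x14 grid) of dim 768; masking 75% leaves 49 visible.
x = torch.randn(2, 196, 768)
visible, mask, ids_restore = random_masking(x)
print(visible.shape, mask.sum(dim=1))  # torch.Size([2, 49, 768]), 147 masked each
```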

### Commands to train the model

```bash
# Install the package from PyPI
pip install maskedautoencoder

# Get the training code and its dependencies
git clone https://github.com/henrywoo/mae/
cd mae
pip install -r requirements.txt

# Launch training
bash run.sh
```

Screenshot of training on a laptop with a 4 GB GPU:

![](mae_training.png)

A one-line change replaces Imagenette with ImageNet-1K:

Replace

```python
dataset_train = get_cv_dataset(path=DS_PATH_IMAGENETTE, transform=transform_train, name="full_size")
```

with

```python
dataset_train = get_cv_dataset(path=DS_PATH_IMAGENET1K, transform=transform_train)
```
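
If you switch between the two datasets often, a small guard around that call avoids editing the source each time. The snippet below is only a suggested convenience, not part of the package: `get_cv_dataset`, `DS_PATH_IMAGENETTE`, and `DS_PATH_IMAGENET1K` are the names used above, while the `MAE_DATASET` environment variable is hypothetical.

```python
import os

# Hypothetical switch: `export MAE_DATASET=imagenet1k` selects ImageNet-1K;
# anything else (or unset) falls back to the lightweight Imagenette default.
if os.environ.get("MAE_DATASET", "imagenette").lower() == "imagenet1k":
    dataset_train = get_cv_dataset(path=DS_PATH_IMAGENET1K, transform=transform_train)
else:
    dataset_train = get_cv_dataset(path=DS_PATH_IMAGENETTE, transform=transform_train,
                                   name="full_size")
```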

### Catalog

- [x] Visualization demo
- [x] Pre-trained checkpoints + fine-tuning code
- [x] Pre-training code

### Visualization demo

Run our interactive visualization demo using [Colab notebook](https://colab.research.google.com/github/facebookresearch/mae/blob/main/demo/mae_visualize.ipynb) (no GPU needed):
<p align="center">
  <img src="https://user-images.githubusercontent.com/11435359/147859292-77341c70-2ed8-4703-b153-f505dcb6f2f8.png" width="600">
</p>

### Fine-tuning with pre-trained checkpoints

The following table provides the pre-trained checkpoints used in the paper, converted from TF/TPU to PT/GPU:
<table><tbody>
<!-- START TABLE -->
<!-- TABLE HEADER -->
<th valign="bottom"></th>
<th valign="bottom">ViT-Base</th>
<th valign="bottom">ViT-Large</th>
<th valign="bottom">ViT-Huge</th>
<!-- TABLE BODY -->
<tr><td align="left">pre-trained checkpoint</td>
<td align="center"><a href="https://dl.fbaipublicfiles.com/mae/pretrain/mae_pretrain_vit_base.pth">download</a></td>
<td align="center"><a href="https://dl.fbaipublicfiles.com/mae/pretrain/mae_pretrain_vit_large.pth">download</a></td>
<td align="center"><a href="https://dl.fbaipublicfiles.com/mae/pretrain/mae_pretrain_vit_huge.pth">download</a></td>
</tr>
<tr><td align="left">md5</td>
<td align="center"><tt>8cad7c</tt></td>
<td align="center"><tt>b8b06e</tt></td>
<td align="center"><tt>9bdbb0</tt></td>
</tr>
</tbody></table>

Fine-tuning instructions are in [FINETUNE.md](FINETUNE.md).
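
As a quick sanity check before fine-tuning, you can download a checkpoint, compare its MD5 against the prefix listed in the table, and load the encoder weights into a standard timm ViT. This is a hedged sketch rather than the exact recipe in FINETUNE.md: it assumes the checkpoint stores its weights under a `"model"` key (as the original MAE release does) and uses `strict=False` so that keys which do not line up with the timm model (e.g. the classification head) are only reported, not loaded.

```python
import hashlib
import timm
import torch
from torch.hub import download_url_to_file

url = "https://dl.fbaipublicfiles.com/mae/pretrain/mae_pretrain_vit_base.pth"
ckpt_path = "mae_pretrain_vit_base.pth"
download_url_to_file(url, ckpt_path)

# Compare against the MD5 prefix listed above for ViT-Base (8cad7c).
with open(ckpt_path, "rb") as f:
    digest = hashlib.md5(f.read()).hexdigest()
assert digest.startswith("8cad7c"), f"unexpected checksum: {digest}"

# Load the pre-trained encoder weights into a timm ViT-B/16.
checkpoint = torch.load(ckpt_path, map_location="cpu")
state_dict = checkpoint.get("model", checkpoint)
model = timm.create_model("vit_base_patch16_224", num_classes=1000)
missing, unexpected = model.load_state_dict(state_dict, strict=False)
print(f"missing keys: {len(missing)}, unexpected keys: {len(unexpected)}")
```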

By fine-tuning these pre-trained models, we rank #1 in these classification tasks (detailed in the paper):
<table><tbody>
<!-- START TABLE -->
<!-- TABLE HEADER -->
<th valign="bottom"></th>
<th valign="bottom">ViT-B</th>
<th valign="bottom">ViT-L</th>
<th valign="bottom">ViT-H</th>
<th valign="bottom">ViT-H<sub>448</sub></th>
<td valign="bottom" style="color:#C0C0C0">prev best</td>
<!-- TABLE BODY -->
<tr><td align="left">ImageNet-1K (no external data)</td>
<td align="center">83.6</td>
<td align="center">85.9</td>
<td align="center">86.9</td>
<td align="center"><b>87.8</b></td>
<td align="center" style="color:#C0C0C0">87.1</td>
</tr>
<tr><td colspan="5"><font size="1"><em>the following rows evaluate the same model weights (fine-tuned on the original ImageNet-1K):</em></font></td></tr>
<tr><td align="left">ImageNet-Corruption (error rate) </td>
<td align="center">51.7</td>
<td align="center">41.8</td>
<td align="center"><b>33.8</b></td>
<td align="center">36.8</td>
<td align="center" style="color:#C0C0C0">42.5</td>
</tr>
<tr><td align="left">ImageNet-Adversarial</td>
<td align="center">35.9</td>
<td align="center">57.1</td>
<td align="center">68.2</td>
<td align="center"><b>76.7</b></td>
<td align="center" style="color:#C0C0C0">35.8</td>
</tr>
<tr><td align="left">ImageNet-Rendition</td>
<td align="center">48.3</td>
<td align="center">59.9</td>
<td align="center">64.4</td>
<td align="center"><b>66.5</b></td>
<td align="center" style="color:#C0C0C0">48.7</td>
</tr>
<tr><td align="left">ImageNet-Sketch</td>
<td align="center">34.5</td>
<td align="center">45.3</td>
<td align="center">49.6</td>
<td align="center"><b>50.9</b></td>
<td align="center" style="color:#C0C0C0">36.0</td>
</tr>
<tr><td colspan="5"><font size="1"><em>the following rows are transfer learning results from fine-tuning the pre-trained MAE on each target dataset:</em></font></td></tr>
<tr><td align="left">iNaturalist 2017</td>
<td align="center">70.5</td>
<td align="center">75.7</td>
<td align="center">79.3</td>
<td align="center"><b>83.4</b></td>
<td align="center" style="color:#C0C0C0">75.4</td>
</tr>
<tr><td align="left">iNaturalist 2018</td>
<td align="center">75.4</td>
<td align="center">80.1</td>
<td align="center">83.0</td>
<td align="center"><b>86.8</b></td>
<td align="center" style="color:#C0C0C0">81.2</td>
</tr>
<tr><td align="left">iNaturalist 2019</td>
<td align="center">80.5</td>
<td align="center">83.4</td>
<td align="center">85.7</td>
<td align="center"><b>88.3</b></td>
<td align="center" style="color:#C0C0C0">84.1</td>
</tr>
<tr><td align="left">Places205</td>
<td align="center">63.9</td>
<td align="center">65.8</td>
<td align="center">65.9</td>
<td align="center"><b>66.8</b></td>
<td align="center" style="color:#C0C0C0">66.0</td>
</tr>
<tr><td align="left">Places365</td>
<td align="center">57.9</td>
<td align="center">59.4</td>
<td align="center">59.8</td>
<td align="center"><b>60.3</b></td>
<td align="center" style="color:#C0C0C0">58.0</td>
</tr>
</tbody></table>

### Pre-training

Pre-training instructions are in [PRETRAIN.md](PRETRAIN.md).
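
For orientation before diving into PRETRAIN.md: MAE pre-training optimizes a mean-squared reconstruction error computed only on the masked patches, and the paper's default additionally normalizes each target patch by its own mean and variance. The sketch below illustrates that loss under those assumptions; the names are illustrative and not taken from this code base.

```python
import torch

def mae_reconstruction_loss(pred, target, mask, norm_pix_loss=True, eps=1e-6):
    """MSE on masked patches only, as described in the MAE paper.

    pred, target: [batch, num_patches, patch_dim] reconstructed / original patches
    mask:         [batch, num_patches], 1 for masked (to-be-reconstructed) patches
    """
    if norm_pix_loss:
        # Normalize each target patch by its own statistics (the paper's default).
        mean = target.mean(dim=-1, keepdim=True)
        var = target.var(dim=-1, keepdim=True)
        target = (target - mean) / (var + eps) ** 0.5

    loss = (pred - target) ** 2
    loss = loss.mean(dim=-1)                 # per-patch MSE
    return (loss * mask).sum() / mask.sum()  # average over masked patches only
```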

### License

This project is under the CC-BY-NC 4.0 license. See [LICENSE](LICENSE) for details.

### Other Versions

- The original version: [PyTorch Version](https://github.com/facebookresearch/mae)
- Other versions: [TF](https://github.com/ariG23498/mae-scalable-vision-learners), [MAE-pytorch 1](https://github.com/pengzhiliang/MAE-pytorch), [MAE-pytorch 2](https://github.com/FlyEgle/MAE-pytorch)

            
