vitvqgan

Name	vitvqgan
Version	0.0.1.dev1
home_page	https://github.com/henrywoo/vim
Summary	VITVQGAN - VECTOR-QUANTIZED IMAGE MODELING WITH IMPROVED VQGAN
upload_time	2024-06-29 03:00:51
maintainer	None
docs_url	None
author	Fuheng Wu
requires_python	None
license	None
keywords	vitvqgan
requirements	No requirements were recorded.

            <div id="top"></div>

# ViT-VQGAN

This is an unofficial PyTorch implementation of both [ViT-VQGAN](https://arxiv.org/abs/2110.04627) and [RQ-VAE](https://arxiv.org/abs/2203.01941). ViT-VQGAN is a simple ViT-based vector-quantized autoencoder, while RQ-VAE introduces a new residual quantization scheme. Further details can be found in the papers.

![](https://raw.githubusercontent.com/henrywoo/vim/main/vitvqgan.png)
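
To make the two quantization ideas concrete, here is a minimal, library-free sketch. It is not code from this package: the function names `quantize` and `residual_quantize`, the toy codebook, and the use of plain Python lists instead of PyTorch tensors are illustrative assumptions. Vector quantization replaces a vector with its nearest codebook entry. Residual quantization repeats that step on whatever is left over, so the sum of the selected entries approximates the input more closely at each depth. Reusing one codebook at every depth is also an assumption of this sketch.

```python
from typing import List, Sequence, Tuple

Vector = List[float]


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def quantize(z: Sequence[float], codebook: Sequence[Sequence[float]]) -> Tuple[int, Vector]:
    """Plain vector quantization: return (index, entry) of the nearest codebook entry."""
    index = min(range(len(codebook)), key=lambda i: squared_distance(z, codebook[i]))
    return index, list(codebook[index])


def residual_quantize(
    z: Sequence[float], codebook: Sequence[Sequence[float]], depth: int
) -> Tuple[List[int], Vector]:
    """Residual quantization: quantize, subtract, and quantize the remainder again.

    Returns the list of selected indices (one per depth) and the running
    sum of the selected entries, which approximates z.
    """
    residual = list(z)
    approximation = [0.0] * len(residual)
    indices: List[int] = []
    for _ in range(depth):
        index, entry = quantize(residual, codebook)
        indices.append(index)
        approximation = [a + e for a, e in zip(approximation, entry)]
        residual = [r - e for r, e in zip(residual, entry)]
    return indices, approximation


# Toy 2-D codebook with four entries (illustrative values only).
codebook = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
z = [1.4, 0.6]

index, entry = quantize(z, codebook)
codes, approx = residual_quantize(z, codebook, depth=2)

print("single code:", index, entry)
print("residual codes:", codes, approx)
print("error with 1 code :", squared_distance(z, entry))
print("error with 2 codes:", squared_distance(z, approx))
```

In this toy case the second code shrinks the reconstruction error noticeably, which is the intuition behind stacking several codes per position.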


## Installation

```
pip install vitvqgan
```
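
If you want to confirm that the install succeeded, one option is to ask the standard library which version is present. This is only a sketch: the helper name `installed_version` is made up for illustration, and it queries the distribution name `vitvqgan` used in the `pip` command above.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(distribution: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(distribution)
    except PackageNotFoundError:
        return None


found = installed_version("vitvqgan")
if found is None:
    print("vitvqgan is not installed in this environment")
else:
    print(f"vitvqgan {found} is installed")
```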


## Training

**Train the model:**
```
python -m vitvqgan.train_vim
```

You can add more options too:

```
python -m vitvqgan.train_vim -c imagenet_vitvq_small -lr 0.00001 -e 10
```
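
If you launch several runs from a script, it can help to assemble the command programmatically. The sketch below only builds and prints the command line. The flags `-c`, `-lr`, and `-e` are taken from the example above; reading them as config name, learning rate, and number of epochs is an assumption based on the example values, and `build_train_command` is a hypothetical helper, not part of the package.

```python
import shlex
import sys
from typing import List, Optional


def build_train_command(
    config: Optional[str] = None,
    learning_rate: Optional[str] = None,
    epochs: Optional[int] = None,
) -> List[str]:
    """Build the argument list for `python -m vitvqgan.train_vim`.

    Options left as None are omitted, so the trainer's own defaults apply.
    The learning rate is kept as a string so it is passed through verbatim.
    """
    command = [sys.executable, "-m", "vitvqgan.train_vim"]
    if config is not None:
        command += ["-c", config]          # assumed: name of the training config
    if learning_rate is not None:
        command += ["-lr", learning_rate]  # assumed: learning rate
    if epochs is not None:
        command += ["-e", str(epochs)]     # assumed: number of epochs
    return command


command = build_train_command("imagenet_vitvq_small", "0.00001", 10)
print(shlex.join(command))
```

To actually start training, pass the list to `subprocess.run(command, check=True)`.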

It uses `Imagenette` as the training dataset for demo purposes. To train on a different dataset, modify the [dataloader init file](vitvqgan/dataloader/__init__.py).

**Inference:**
- Download a checkpoint from the Checkpoints section below into the `mbin` folder
- Run the following command:
```
python -m vitvqgan.demo_recon
```
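
Before running the demo, you may want to check that a checkpoint actually landed in `mbin`. The sketch below is an assumption-laden helper, not part of the package: it treats `mbin` as a folder relative to the current working directory and simply lists every file in it, because the expected checkpoint file names are not documented here.

```python
from pathlib import Path
from typing import List


def list_checkpoints(folder: str = "mbin") -> List[Path]:
    """Return the regular files in `folder`, sorted by name.

    Returns an empty list when the folder does not exist.
    """
    directory = Path(folder)
    if not directory.is_dir():
        return []
    return sorted(p for p in directory.iterdir() if p.is_file())


checkpoints = list_checkpoints("mbin")
if checkpoints:
    for path in checkpoints:
        size_mb = path.stat().st_size / 1e6
        print(f"found {path.name} ({size_mb:.1f} MB)")
else:
    print("no files in mbin; download a checkpoint before running vitvqgan.demo_recon")
```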

## Checkpoints

- [ViT-VQGAN Small](https://drive.google.com/file/d/1jbjD4q0iJpXrRMVSYJRIvM_94AxA1EqJ/view?usp=sharing) 
- [ViT-VQGAN Base](https://drive.google.com/file/d/1syv0t3nAJ-bETFgFpztw9cPXghanUaM6/view?usp=sharing)


## Acknowledgements

This repo is adapted from [thuanz123/enhancing-transformers](https://github.com/thuanz123/enhancing-transformers), with dependencies updated to their latest versions and changes that make it easy to run on a consumer-grade GPU for learning purposes.

Raw data

            {
    "_id": null,
    "home_page": "https://github.com/henrywoo/vim",
    "name": "vitvqgan",
    "maintainer": null,
    "docs_url": null,
    "requires_python": null,
    "maintainer_email": null,
    "keywords": "vitvqgan",
    "author": "Fuheng Wu",
    "author_email": null,
    "download_url": null,
    "platform": null,
    "description": "<div id=\"top\"></div>\n\n# VIT-VQGAN\n\nThis is an unofficial implementation of both [ViT-VQGAN](https://arxiv.org/abs/2110.04627) and [RQ-VAE](https://arxiv.org/abs/2110.04627) in Pytorch. ViT-VQGAN is a simple ViT-based Vector Quantized AutoEncoder while RQ-VAE introduces a new residual quantization scheme. Further details can be viewed in the papers\n\n![](https://raw.githubusercontent.com/henrywoo/vim/main/vitvqgan.png)\n\n\n## Installation\n\n```python\npip install vitvqgan \n```\n\n\n## Training\n\n**Train the model:**\n```\npython -m vitvqgan.train_vim\n```\n\nYou can add more options too:\n\n```python\npython -m vitvqgan.train_vim -c imagenet_vitvq_small -lr 0.00001 -e 10\n```\n\nIt uses `Imagenette` as the training dataset for demo purpose, to change it, modify [dataloader init file](vitvqgan/dataloader/__init__.py).\n\n**Inference:**\n- download checkpoints from above in mbin folder\n- Run the following command:\n```\npython -m vitvqgan.demo_recon\n```\n\n## Checkpoints\n\n- [ViT-VQGAN Small](https://drive.google.com/file/d/1jbjD4q0iJpXrRMVSYJRIvM_94AxA1EqJ/view?usp=sharing) \n- [ViT-VQGAN Base](https://drive.google.com/file/d/1syv0t3nAJ-bETFgFpztw9cPXghanUaM6/view?usp=sharing)\n\n\n## Acknowledgements\n\nThe repo is modified from [here](https://github.com/thuanz123/enhancing-transformers) with updates to latest dependencies and to be easily run in consumer-grade GPU for learning purpose.\n\n",
    "bugtrack_url": null,
    "license": null,
    "summary": "VITVQGAN - VECTOR-QUANTIZED IMAGE MODELING WITH IMPROVED VQGAN",
    "version": "0.0.1.dev1",
    "project_urls": {
        "Homepage": "https://github.com/henrywoo/vim"
    },
    "split_keywords": [
        "vitvqgan"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "f6564c5b693749692093acc7a391cc9ae8799ecb7e363aac069effc37989a835",
                "md5": "3fe5b3f3fd26a72eaa1c1d0a64254931",
                "sha256": "86ec8a289cbbc90cefbd6b6c8cc6eab7b8d41831df82f2950ae48dd77341edfc"
            },
            "downloads": -1,
            "filename": "vitvqgan-0.0.1.dev1-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "3fe5b3f3fd26a72eaa1c1d0a64254931",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": null,
            "size": 1933350,
            "upload_time": "2024-06-29T03:00:51",
            "upload_time_iso_8601": "2024-06-29T03:00:51.664152Z",
            "url": "https://files.pythonhosted.org/packages/f6/56/4c5b693749692093acc7a391cc9ae8799ecb7e363aac069effc37989a835/vitvqgan-0.0.1.dev1-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2024-06-29 03:00:51",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "henrywoo",
    "github_project": "vim",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": false,
    "requirements": [],
    "lcname": "vitvqgan"
}
