[![Multi-Modality](agorabanner.png)](https://discord.gg/qUtxnK2NMf)
# Vision Mamba
Implementation of Vision Mamba from the paper "Vision Mamba: Efficient Visual Representation Learning with Bidirectional State Space Model". The paper reports that Vim is 2.8x faster than DeiT and saves 86.8% GPU memory when performing batch inference to extract features on high-resolution images.
[PAPER LINK](https://arxiv.org/abs/2401.09417)
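
The model's core idea is to process the patch-token sequence with state-space scans in both directions, so every token sees both left and right context. The toy sketch below illustrates only the bidirectional-scan structure; it uses a simple exponential moving average as a stand-in for the paper's learned selective SSM recurrence:

```python
# Toy illustration of bidirectional scanning over a token sequence.
# A real Vim block uses a learned selective SSM; this EMA is a stand-in scan.
def ema_scan(xs, alpha=0.5):
    h, out = 0.0, []
    for x in xs:
        h = alpha * h + (1 - alpha) * x  # simple recurrent state update
        out.append(h)
    return out

def bidirectional(xs, alpha=0.5):
    fwd = ema_scan(xs, alpha)              # left-to-right pass
    bwd = ema_scan(xs[::-1], alpha)[::-1]  # right-to-left pass, re-aligned
    # Fuse both directions so each position carries context from both sides
    return [f + b for f, b in zip(fwd, bwd)]

print(bidirectional([1.0, 0.0, 0.0, 1.0]))  # [1.0625, 0.375, 0.375, 1.0625]
```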
## Installation
```bash
pip install vision-mamba
```
## Usage
```python
import torch
from vision_mamba import Vim
# Input tensor with shape (batch_size, channels, height, width)
x = torch.randn(1, 3, 224, 224)

# Model
model = Vim(
    dim=256,           # Embedding dimension
    heads=8,           # Number of attention heads
    dt_rank=32,        # Rank of the SSM's delta (Δ) projection
    dim_inner=256,     # Inner (expanded) dimension of each block
    d_state=256,       # Dimension of the SSM state vector
    num_classes=1000,  # Number of output classes
    image_size=224,    # Height/width of the (square) input image
    patch_size=16,     # Size of each square image patch
    channels=3,        # Number of input channels
    dropout=0.1,       # Dropout rate
    depth=12,          # Number of Vim blocks
)

# Forward pass
out = model(x)
print(out.shape)  # Expected: (batch_size, num_classes)
print(out)
```
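
With the settings above, each 224x224 image is split into non-overlapping 16x16 patches, and the resulting token count determines the sequence length the scans run over. This can be checked with a small helper (a generic ViT-style calculation, not part of the `vision_mamba` API):

```python
# Hypothetical helper, not part of the vision_mamba API: number of patch
# tokens a square image yields under non-overlapping square patching.
def num_patches(image_size: int, patch_size: int) -> int:
    assert image_size % patch_size == 0, "image_size must be divisible by patch_size"
    return (image_size // patch_size) ** 2

print(num_patches(224, 16))  # 196 tokens per image
```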
## Citation
```bibtex
@misc{zhu2024vision,
title={Vision Mamba: Efficient Visual Representation Learning with Bidirectional State Space Model},
author={Lianghui Zhu and Bencheng Liao and Qian Zhang and Xinlong Wang and Wenyu Liu and Xinggang Wang},
year={2024},
eprint={2401.09417},
archivePrefix={arXiv},
primaryClass={cs.CV}
}
```
## License
MIT
## Todo
- [ ] Create a training script for ImageNet
- [ ] Create a Vision Mamba variant for facial recognition