vision-mamba

Name: vision-mamba
Version: 0.1.0
Home page: https://github.com/kyegomez/VisionMamba
Summary: Vision Mamba - Pytorch
Upload time: 2024-05-16 18:16:57
Author: Kye Gomez (kye@apac.ai)
Requires Python: <4.0,>=3.10
License: MIT
Keywords: artificial intelligence, deep learning, optimizers, prompt engineering
Requirements: none recorded
            [![Multi-Modality](agorabanner.png)](https://discord.gg/qUtxnK2NMf)

# Vision Mamba
Implementation of Vision Mamba from the paper "Vision Mamba: Efficient Visual Representation Learning with Bidirectional State Space Model". The paper reports that Vim is 2.8x faster than DeiT and saves 86.8% GPU memory when performing batch inference to extract features on high-resolution images.

[PAPER LINK](https://arxiv.org/abs/2401.09417)

## Installation

```bash
pip install vision-mamba
```

## Usage
```python
import torch
from vision_mamba import Vim

# Input tensor with shape (batch_size, channels, height, width)
x = torch.randn(1, 3, 224, 224)

# Model
model = Vim(
    dim=256,  # Model (embedding) dimension
    heads=8,  # Number of attention heads
    dt_rank=32,  # Rank of the delta (timestep) projection in the SSM
    dim_inner=256,  # Inner (expanded) dimension of each block
    d_state=256,  # Dimension of the SSM state vector
    num_classes=1000,  # Number of output classes
    image_size=224,  # Size of the (square) input image
    patch_size=16,  # Size of each image patch
    channels=3,  # Number of input channels
    dropout=0.1,  # Dropout rate
    depth=12,  # Number of Vim blocks
)

# Forward pass
out = model(x)  # Class logits with shape (batch_size, num_classes)
print(out.shape)  # Print the shape of the output tensor
print(out)  # Print the output tensor
```
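As a sanity check on the config above, the sequence length seen by the model follows from the image and patch sizes. A minimal sketch of that arithmetic, assuming a square image, non-overlapping patches, and one prepended class token as described in the Vim paper (whether the class token is added depends on the implementation):

```python
# Patch-grid arithmetic for image_size=224, patch_size=16.
image_size, patch_size = 224, 16
patches_per_side = image_size // patch_size  # patches along each side
num_patches = patches_per_side ** 2          # total patch tokens
seq_len = num_patches + 1                    # plus one class token
print(patches_per_side, num_patches, seq_len)  # → 14 196 197
```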



## Citation
```bibtex
@misc{zhu2024vision,
    title={Vision Mamba: Efficient Visual Representation Learning with Bidirectional State Space Model}, 
    author={Lianghui Zhu and Bencheng Liao and Qian Zhang and Xinlong Wang and Wenyu Liu and Xinggang Wang},
    year={2024},
    eprint={2401.09417},
    archivePrefix={arXiv},
    primaryClass={cs.CV}
}
```

## License
MIT

## Todo
- [ ] Create a training script for ImageNet
- [ ] Create a Vision Mamba variant for facial recognition
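The ImageNet training script is still listed as a todo. As a rough sketch of a single training step under standard supervised-classification assumptions (hypothetical: `model` here is a tiny linear classifier standing in for a `Vim` instance so the snippet is self-contained; swap in the real model and an ImageNet `DataLoader` for actual training):

```python
import torch
from torch import nn

# Stand-in classifier; replace with Vim(...) from vision_mamba for real use.
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 224 * 224, 1000))
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3, weight_decay=0.05)
criterion = nn.CrossEntropyLoss()

# One training step on a dummy batch (replace with real images and labels).
images = torch.randn(2, 3, 224, 224)
labels = torch.randint(0, 1000, (2,))

optimizer.zero_grad()
logits = model(images)            # (batch_size, num_classes)
loss = criterion(logits, labels)  # cross-entropy on class logits
loss.backward()                   # backpropagate
optimizer.step()                  # update parameters
print(logits.shape)
```

The hyperparameters (AdamW, lr, weight decay) are illustrative defaults, not values from the paper.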
            
