[![Multi-Modality](agorabanner.png)](https://discord.gg/qUtxnK2NMf)
# VisionLLaMA
Implementation of VisionLLaMA from the paper: "VisionLLaMA: A Unified LLaMA Interface for Vision Tasks" in PyTorch and Zeta. [PAPER LINK](https://arxiv.org/abs/2403.00522)
## Install
`$ pip install vision-llama`
## Usage
```python
import torch
from vision_llama.main import VisionLlama
# Input image tensor: (batch, channels, height, width)
x = torch.randn(1, 3, 224, 224)
# Create an instance of the VisionLlama model with the specified parameters
model = VisionLlama(
    dim=768, depth=12, channels=3, heads=12, num_classes=1000
)
# Pass the input tensor through the model and print the output
print(model(x))
```
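The snippet above prints the raw model output. Assuming the forward pass returns class logits of shape `(batch, num_classes)` (an assumption for illustration, not documented here), a minimal inference sketch looks like this:

```python
import torch
from vision_llama.main import VisionLlama

# Assumption: VisionLlama returns class logits of shape (batch, num_classes)
model = VisionLlama(
    dim=768, depth=12, channels=3, heads=12, num_classes=1000
)
model.eval()

x = torch.randn(4, 3, 224, 224)   # batch of 4 random "images"
with torch.no_grad():
    logits = model(x)             # expected shape: (4, 1000)

probs = logits.softmax(dim=-1)    # class probabilities per image
top1 = probs.argmax(dim=-1)       # predicted class index per image
print(logits.shape, top1.shape)
```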
## License
MIT
## Citation
```bibtex
@misc{chu2024visionllama,
    title={VisionLLaMA: A Unified LLaMA Interface for Vision Tasks},
    author={Xiangxiang Chu and Jianlin Su and Bo Zhang and Chunhua Shen},
    year={2024},
    eprint={2403.00522},
    archivePrefix={arXiv},
    primaryClass={cs.CV}
}
```
## Todo
- [ ] Implement AS2DRoPE; the current implementation is rough, so axial rotary embeddings may be used instead (see the sketch after this list)
- [x] Implement GSA attention (implemented, but the current version needs improvement)
- [ ] Add an ImageNet training script with distributed training
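For reference, here is a rough sketch of the axial rotary embedding idea mentioned above. This is not this repo's code; the function name, shapes, and frequency schedule are assumptions for illustration. Half of each head dimension is rotated by the patch row index and the other half by the column index.

```python
import torch


def rotate_half(x: torch.Tensor) -> torch.Tensor:
    # Standard RoPE helper: pairs (x1, x2) -> (-x2, x1)
    x1, x2 = x.chunk(2, dim=-1)
    return torch.cat((-x2, x1), dim=-1)


def axial_rope_2d(q: torch.Tensor, h: int, w: int, theta: float = 10000.0) -> torch.Tensor:
    """Apply 2D axial RoPE to q of shape (batch, heads, h * w, head_dim)."""
    b, heads, n, dim = q.shape
    assert n == h * w and dim % 4 == 0
    quarter = dim // 4
    freqs = 1.0 / (theta ** (torch.arange(quarter, dtype=torch.float32) / quarter))

    # Row and column indices for every patch position
    ys, xs = torch.meshgrid(
        torch.arange(h, dtype=torch.float32),
        torch.arange(w, dtype=torch.float32),
        indexing="ij",
    )
    ang_y = ys.flatten()[:, None] * freqs       # (n, dim/4) angles from row index
    ang_x = xs.flatten()[:, None] * freqs       # (n, dim/4) angles from column index
    angles = torch.cat((ang_y, ang_x), dim=-1)  # (n, dim/2)
    cos = angles.cos().repeat(1, 2)             # (n, dim)
    sin = angles.sin().repeat(1, 2)             # (n, dim)
    return q * cos + rotate_half(q) * sin


# Example: 14 x 14 patches (224 / 16) with 64-dim heads
q = torch.randn(1, 12, 14 * 14, 64)
print(axial_rope_2d(q, 14, 14).shape)  # torch.Size([1, 12, 196, 64])
```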