omegavit

Name	omegavit
Version	0.0.1
home_page	https://github.com/Agora-Lab-AI/OmegaViT
Summary	OmegaViT: A State-of-the-Art Vision Transformer with Multi-Query Attention, State Space Modeling, and Mixture of Experts
upload_time	2024-12-19 06:39:03
author	Kye Gomez
requires_python	<4.0,>=3.10
license	MIT
keywords	artificial intelligence, deep learning, optimizers, prompt engineering
requirements	torch, loguru, einops

# OmegaViT: A State-of-the-Art Vision Transformer with Multi-Query Attention, State Space Modeling, and Mixture of Experts

[![Join our Discord](https://img.shields.io/badge/Discord-Join%20our%20server-5865F2?style=for-the-badge&logo=discord&logoColor=white)](https://discord.gg/agora-999382051935506503) [![Subscribe on YouTube](https://img.shields.io/badge/YouTube-Subscribe-red?style=for-the-badge&logo=youtube&logoColor=white)](https://www.youtube.com/@kyegomez3242) [![Connect on LinkedIn](https://img.shields.io/badge/LinkedIn-Connect-blue?style=for-the-badge&logo=linkedin&logoColor=white)](https://www.linkedin.com/in/kye-g-38759a207/) [![Follow on X.com](https://img.shields.io/badge/X.com-Follow-1DA1F2?style=for-the-badge&logo=x&logoColor=white)](https://x.com/kyegomezb)




[![PyPI version](https://badge.fury.io/py/omegavit.svg)](https://badge.fury.io/py/omegavit)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
[![Build Status](https://github.com/Agora-Lab-AI/OmegaViT/workflows/build/badge.svg)](https://github.com/Agora-Lab-AI/OmegaViT/actions)
[![Documentation Status](https://readthedocs.org/projects/omegavit/badge/?version=latest)](https://omegavit.readthedocs.io/en/latest/?badge=latest)

OmegaViT (ΩViT) is a vision transformer architecture that combines multi-query attention, rotary embeddings, state space modeling, and mixture of experts, and is intended for a range of computer vision tasks. The model is designed to process images of any resolution while remaining computationally efficient. Benchmarks have not been published yet.

## Key Features

- **Flexible Resolution Processing**: Handles arbitrary input image sizes through adaptive patch embedding
- **Multi-Query Attention (MQA)**: Shares key and value projections across query heads, reducing memory and compute while maintaining model expressiveness
- **Rotary Embeddings**: Enables better modeling of relative positions and spatial relationships
- **State Space Models (SSM)**: Uses a state space layer in place of attention in every third block for efficient sequence modeling
- **Mixture of Experts (MoE)**: Implements conditional computation for enhanced model capacity
- **Comprehensive Logging**: Built-in loguru integration for detailed execution tracking
- **Shape-Aware Design**: Continuous tensor shape tracking for reliable processing
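
As a rough illustration of the adaptive patch embedding idea, the sketch below assumes a strided convolution followed by LayerNorm and zero-pads inputs whose sides are not multiples of the patch size. The class name `FlexiblePatchEmbedSketch` and the padding strategy are assumptions made for illustration; the package itself may resize, crop, or pool instead.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class FlexiblePatchEmbedSketch(nn.Module):
    """Hypothetical patch embedding: strided convolution, then LayerNorm."""

    def __init__(self, in_channels: int = 3, hidden_size: int = 768, patch_size: int = 16):
        super().__init__()
        self.patch_size = patch_size
        # kernel_size == stride == patch_size embeds each non-overlapping
        # patch into one hidden_size-dimensional vector.
        self.proj = nn.Conv2d(in_channels, hidden_size, kernel_size=patch_size, stride=patch_size)
        self.norm = nn.LayerNorm(hidden_size)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        _, _, height, width = x.shape
        # Assumption: zero-pad the bottom and right edges so that both
        # sides become multiples of patch_size.
        pad_h = (-height) % self.patch_size
        pad_w = (-width) % self.patch_size
        x = F.pad(x, (0, pad_w, 0, pad_h))
        x = self.proj(x)                   # (batch, hidden, H / patch, W / patch)
        x = x.flatten(2).transpose(1, 2)   # (batch, num_patches, hidden)
        return self.norm(x)


embed = FlexiblePatchEmbedSketch()
tokens_a = embed(torch.randn(2, 3, 224, 224))
tokens_b = embed(torch.randn(2, 3, 250, 300))
print(tokens_a.shape, tokens_b.shape)
```

With the defaults above, a 224x224 image yields 14 x 14 = 196 tokens, while a 250x300 image is padded to 256x304 and yields 16 x 19 = 304 tokens. The sequence length changes with resolution, but the token width stays at `hidden_size`, which is what lets the later blocks run unchanged.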

## Architecture

```mermaid
flowchart TB
    subgraph Input
        img[Input Image]
    end
    
    subgraph PatchEmbed[Flexible Patch Embedding]
        conv[Convolution]
        norm1[LayerNorm]
        conv --> norm1
    end
    
    subgraph TransformerBlocks[Transformer Blocks x12]
        subgraph Block1[Block n]
            direction TB
            mqa[Multi-Query Attention]
            ln1[LayerNorm]
            moe1[Mixture of Experts]
            ln2[LayerNorm]
            ln1 --> mqa --> ln2 --> moe1
        end
        
        subgraph Block2[Block n+1]
            direction TB
            mqa2[Multi-Query Attention]
            ln3[LayerNorm]
            moe2[Mixture of Experts]
            ln4[LayerNorm]
            ln3 --> mqa2 --> ln4 --> moe2
        end
        
        subgraph Block3[Block n+2 SSM]
            direction TB
            ssm[State Space Model]
            ln5[LayerNorm]
            moe3[Mixture of Experts]
            ln6[LayerNorm]
            ln5 --> ssm --> ln6 --> moe3
        end
    end
    
    subgraph Output
        gap[Global Average Pooling]
        classifier[Classification Head]
    end
    
    img --> PatchEmbed --> TransformerBlocks --> gap --> classifier
```
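
The repeating pattern in the diagram (two attention blocks followed by one SSM block) can be written as a simple schedule. The helper below is a sketch under the assumption that the third, sixth, ninth, and twelfth blocks are the SSM ones; `block_types` and `ssm_every` are hypothetical names, not part of the package API.

```python
def block_types(num_layers: int = 12, ssm_every: int = 3) -> list[str]:
    """Return the token mixer assumed for each block, in order."""
    return [
        # Counting from 1, every `ssm_every`-th block uses the SSM mixer.
        "ssm" if (index + 1) % ssm_every == 0 else "mqa"
        for index in range(num_layers)
    ]


schedule = block_types()
print(schedule)
print(schedule.count("mqa"), "attention blocks,", schedule.count("ssm"), "SSM blocks")
```

Under this assumption the default 12-block model has 8 attention blocks and 4 SSM blocks, each followed by LayerNorm and a mixture-of-experts layer as in the diagram.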

## Multi-Query Attention Detail

```mermaid
flowchart LR
    input[Input Features]
    
    subgraph MQA[Multi-Query Attention]
        direction TB
        q[Q Linear]
        k[K Linear]
        v[V Linear]
        rotary[Rotary Embeddings]
        attn[Attention Weights]
        
        q & k --> rotary
        rotary --> attn
        attn & v --> out[Weighted Sum of Values]
    end
    
    input --> q & k & v
    MQA --> output[Output Features]
```
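
To make the data flow in this diagram concrete, here is a minimal PyTorch sketch of multi-query attention with rotary embeddings. It is not the package's implementation: the names `apply_rotary` and `MultiQueryAttentionSketch` are hypothetical, the rotary variant shown is the common "rotate-half" form applied over a flat 1D token sequence, and a single shared key/value head is assumed.

```python
import torch
import torch.nn as nn


def apply_rotary(x: torch.Tensor, base: float = 10000.0) -> torch.Tensor:
    """Rotate channel pairs of x (..., seq, dim) by position-dependent angles."""
    seq_len, dim = x.shape[-2], x.shape[-1]
    half = dim // 2
    inv_freq = base ** (-torch.arange(half, dtype=x.dtype, device=x.device) / half)
    positions = torch.arange(seq_len, dtype=x.dtype, device=x.device)
    angles = positions[:, None] * inv_freq[None, :]          # (seq, half)
    cos, sin = angles.cos(), angles.sin()
    x1, x2 = x[..., :half], x[..., half:]
    return torch.cat([x1 * cos - x2 * sin, x1 * sin + x2 * cos], dim=-1)


class MultiQueryAttentionSketch(nn.Module):
    """Many query heads attend over one shared key head and one shared value head."""

    def __init__(self, hidden_size: int = 768, num_heads: int = 12):
        super().__init__()
        self.num_heads = num_heads
        self.head_dim = hidden_size // num_heads
        self.q_proj = nn.Linear(hidden_size, hidden_size)
        # Keys and values are projected to a single head_dim-wide head.
        self.k_proj = nn.Linear(hidden_size, self.head_dim)
        self.v_proj = nn.Linear(hidden_size, self.head_dim)
        self.out_proj = nn.Linear(hidden_size, hidden_size)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        batch, seq_len, _ = x.shape
        q = self.q_proj(x).view(batch, seq_len, self.num_heads, self.head_dim)
        q = q.transpose(1, 2)                      # (batch, heads, seq, head_dim)
        k = self.k_proj(x).unsqueeze(1)            # (batch, 1, seq, head_dim)
        v = self.v_proj(x).unsqueeze(1)            # (batch, 1, seq, head_dim)
        q, k = apply_rotary(q), apply_rotary(k)
        # The shared key/value head broadcasts across all query heads.
        scores = q @ k.transpose(-2, -1) / self.head_dim ** 0.5
        weights = scores.softmax(dim=-1)           # attention weights
        out = weights @ v                          # (batch, heads, seq, head_dim)
        out = out.transpose(1, 2).reshape(batch, seq_len, -1)
        return self.out_proj(out)


attn = MultiQueryAttentionSketch(hidden_size=64, num_heads=4)
x = torch.randn(2, 10, 64)
y = attn(x)
print(y.shape)
```

Because keys and values use a single head, their projections are `num_heads` times narrower than in standard multi-head attention, which is where the savings come from. A 2D rotary scheme over patch rows and columns would be an equally plausible choice for images.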

## Installation

```bash
pip install omegavit
```

## Quick Start

```python
import torch
from omegavit import create_advanced_vit

# Create model
model = create_advanced_vit(num_classes=1000)

# Example forward pass
batch_size = 8
x = torch.randn(batch_size, 3, 224, 224)
output = model(x)
print(f"Output shape: {output.shape}")  # [8, 1000]
```

## Model Configurations

| Parameter | Default | Description |
|-----------|---------|-------------|
| hidden_size | 768 | Embedding dimension of the transformer layers |
| num_attention_heads | 12 | Number of attention heads |
| num_experts | 8 | Number of expert networks in each MoE layer |
| expert_capacity | 32 | Maximum number of tokens per expert in MoE |
| num_layers | 12 | Number of transformer blocks |
| patch_size | 16 | Side length of each image patch, in pixels |
| ssm_state_size | 16 | Hidden state size in SSM |

## Performance

*Note: Benchmarks coming soon*

## Citation

If you use OmegaViT in your research, please cite:

```bibtex
@article{omegavit2024,
  title={OmegaViT: A State-of-the-Art Vision Transformer with Multi-Query Attention, State Space Modeling, and Mixture of Experts},
  author={Agora Lab},
  journal={arXiv preprint arXiv:XXXX.XXXXX},
  year={2024}
}
```

## Contributing

We welcome contributions! Please see our [contributing guidelines](CONTRIBUTING.md) for details.

## License

This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.

## Acknowledgments

Special thanks to the Agora Lab AI team and the open-source community for their valuable contributions and feedback.

Raw data

{
    "_id": null,
    "home_page": "https://github.com/Agora-Lab-AI/OmegaViT",
    "name": "omegavit",
    "maintainer": null,
    "docs_url": null,
    "requires_python": "<4.0,>=3.10",
    "maintainer_email": null,
    "keywords": "artificial intelligence, deep learning, optimizers, Prompt Engineering",
    "author": "Kye Gomez",
    "author_email": "kye@apac.ai",
    "download_url": "https://files.pythonhosted.org/packages/29/4f/81ae6aa58a819193274f3c04044e51fce2fd324e5d7815498f98d7db3409/omegavit-0.0.1.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "MIT",
    "summary": "OmegaViT: A State-of-the-Art Vision Transformer with Multi-Query Attention, State Space Modeling, and Mixture of Experts",
    "version": "0.0.1",
    "project_urls": {
        "Documentation": "https://github.com/Agora-Lab-AI/OmegaViT",
        "Homepage": "https://github.com/Agora-Lab-AI/OmegaViT",
        "Repository": "https://github.com/Agora-Lab-AI/OmegaViT"
    },
    "split_keywords": [
        "artificial intelligence",
        " deep learning",
        " optimizers",
        " prompt engineering"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "86dfa86a12ef426a99e0881bc36ffd91faa09307213bcde358188ed6a23c9071",
                "md5": "59245bb3295bba1b956f2d296a823122",
                "sha256": "74383eae3494a5e64df26cb664657b9905efd7df33bd85f21e7c62c681d75191"
            },
            "downloads": -1,
            "filename": "omegavit-0.0.1-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "59245bb3295bba1b956f2d296a823122",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": "<4.0,>=3.10",
            "size": 8153,
            "upload_time": "2024-12-19T06:39:00",
            "upload_time_iso_8601": "2024-12-19T06:39:00.831529Z",
            "url": "https://files.pythonhosted.org/packages/86/df/a86a12ef426a99e0881bc36ffd91faa09307213bcde358188ed6a23c9071/omegavit-0.0.1-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "294f81ae6aa58a819193274f3c04044e51fce2fd324e5d7815498f98d7db3409",
                "md5": "918266b67748cb10b04961a4d91aa8a3",
                "sha256": "5811951b2a6aecce212ca25d38fd81915b6d02a016558e5e158a2b04192d4200"
            },
            "downloads": -1,
            "filename": "omegavit-0.0.1.tar.gz",
            "has_sig": false,
            "md5_digest": "918266b67748cb10b04961a4d91aa8a3",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": "<4.0,>=3.10",
            "size": 7778,
            "upload_time": "2024-12-19T06:39:03",
            "upload_time_iso_8601": "2024-12-19T06:39:03.152631Z",
            "url": "https://files.pythonhosted.org/packages/29/4f/81ae6aa58a819193274f3c04044e51fce2fd324e5d7815498f98d7db3409/omegavit-0.0.1.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2024-12-19 06:39:03",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "Agora-Lab-AI",
    "github_project": "OmegaViT",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": true,
    "requirements": [
        {
            "name": "torch",
            "specs": []
        },
        {
            "name": "loguru",
            "specs": []
        },
        {
            "name": "einops",
            "specs": []
        }
    ],
    "lcname": "omegavit"
}
