mobilevlm


Name: mobilevlm
Version: 0.0.3
Home page: https://github.com/kyegomez/MobileVLM
Summary: MobileVLM - Pytorch
Upload time: 2024-01-04 02:11:06
Docs URL: None
Author: Kye Gomez
Requires Python: >=3.6,<4.0
License: MIT
Keywords: artificial intelligence, deep learning, optimizers, prompt engineering
Requirements: No requirements were recorded.
            [![Multi-Modality](agorabanner.png)](https://discord.gg/qUtxnK2NMf)

# MobileVLM
Implementation of the LDP (Lightweight Downsample Projection) module in PyTorch and Zeta, from the paper: "MobileVLM: A Fast, Strong and Open Vision Language Assistant for Mobile Devices"


# Install
`pip3 install mobilevlm`


## Usage
```python
# Import the necessary libraries
import torch
from mobilevlm import LDP

# Create an instance of the LDP model
ldp = LDP(in_channels=128, out_channels=128)

# Create an example input tensor
input_tensor = torch.randn(1, 128, 64, 64)

# Pass the input tensor through the LDP model to get the output
output = ldp(input_tensor)

# Print the shape of the output tensor; the stride-2 depthwise convolution
# halves the spatial dimensions, so this should print torch.Size([1, 128, 32, 32])
print(output.shape)

```


## Lightweight Downsample Projection (LDP) Layer

The Lightweight Downsample Projection (LDP) Layer is a component designed for efficient feature extraction and dimensionality reduction in convolutional neural networks. The LDP layer is particularly suited for mobile and edge devices where computational resources are limited. 

The LDP layer combines depthwise separable convolutions with pointwise convolutions and skip connections, allowing for a reduced number of parameters while maintaining a rich feature representation. The incorporation of Layer Normalization stabilizes the training process and allows for faster convergence.

### Architecture

The LDP layer is structured as follows (a minimal code sketch follows the list):

1. **Initial Pointwise Convolution**: This is a 1x1 convolution that transforms the input feature map to the desired number of channels. It is computationally efficient and serves as a channel-wise feature transformation.

2. **GELU Activation**: After the initial pointwise convolution, we apply a Gaussian Error Linear Unit (GELU) activation function. GELU provides non-linearity to the model, allowing it to learn more complex patterns.

3. **First Depthwise Convolution**: A depthwise convolution with a stride of 1 follows, which applies a single filter per input channel. It is used for spatial feature extraction without altering the dimensionality of the feature map.

4. **First Skip Connection**: The output of the first depthwise convolution is added back to the output of the initial pointwise convolution. This skip connection allows gradients to flow directly through the network, mitigating the vanishing gradient problem and enabling deeper architectures.

5. **Second Pointwise Convolution**: Another 1x1 convolution is applied to further mix the channel-wise features.

6. **Layer Normalization**: Normalization is applied over the channel dimension to stabilize the mean and variance of activations, leading to improved training dynamics.

7. **Second GELU Activation**: A second GELU activation function is applied for additional non-linearity.

8. **Second Depthwise Convolution**: This depthwise convolution has a stride of 2, halving the spatial dimensions of the feature map and effectively downsampling the input.

9. **Second Skip Connection**: A pixel-wise addition combines the downsampled input to the block with the output of the second depthwise convolution. This connection helps to preserve information that would otherwise be lost to downsampling.

10. **Third Pointwise Convolution**: A final 1x1 convolution adjusts the channel dimensions if necessary and refines the features before passing them to subsequent layers.

11. **Layer Normalization**: Another layer normalization is applied to the output of the final pointwise convolution.
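
Below is a minimal PyTorch sketch that follows the eleven steps above. It is an illustrative reading of the description rather than the packaged implementation: the 3x3 kernel sizes, the `GroupNorm(1, ...)` stand-in for channel-wise LayerNorm, and the average-pooled source of the second skip connection are assumptions.

```python
import torch
from torch import nn


class LDPSketch(nn.Module):
    """Illustrative sketch of the LDP block described above (not the packaged class)."""

    def __init__(self, in_channels: int, out_channels: int):
        super().__init__()
        # 1. Initial pointwise (1x1) convolution: channel-wise feature transformation
        self.pw1 = nn.Conv2d(in_channels, out_channels, kernel_size=1)
        # 2. / 7. GELU non-linearities
        self.act = nn.GELU()
        # 3. First depthwise convolution, stride 1 (assumed 3x3; spatial dims preserved)
        self.dw1 = nn.Conv2d(out_channels, out_channels, 3, stride=1, padding=1,
                             groups=out_channels)
        # 5. Second pointwise convolution
        self.pw2 = nn.Conv2d(out_channels, out_channels, kernel_size=1)
        # 6. / 11. Normalization over the channel dimension (GroupNorm(1, C) as a stand-in)
        self.norm1 = nn.GroupNorm(1, out_channels)
        self.norm2 = nn.GroupNorm(1, out_channels)
        # 8. Second depthwise convolution, stride 2 (halves the spatial dimensions)
        self.dw2 = nn.Conv2d(out_channels, out_channels, 3, stride=2, padding=1,
                             groups=out_channels)
        # 9. Downsampling path for the second skip connection (average pooling assumed)
        self.pool = nn.AvgPool2d(kernel_size=2, stride=2)
        # 10. Third pointwise convolution
        self.pw3 = nn.Conv2d(out_channels, out_channels, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.act(self.pw1(x))               # steps 1-2
        x = x + self.dw1(x)                     # steps 3-4: first skip connection
        x = self.act(self.norm1(self.pw2(x)))   # steps 5-7
        y = self.dw2(x)                         # step 8: stride-2 downsampling
        y = y + self.pool(x)                    # step 9: skip over the downsample
        return self.norm2(self.pw3(y))          # steps 10-11


if __name__ == "__main__":
    block = LDPSketch(128, 128)
    out = block(torch.randn(1, 128, 64, 64))
    print(out.shape)  # torch.Size([1, 128, 32, 32])
```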

## Why It Works

The LDP layer is designed to reduce the spatial resolution of the input while retaining its salient features, and to do so in a computationally efficient manner. The use of depthwise separable convolutions significantly decreases the number of parameters compared to standard convolutions, reducing both the computational cost and the risk of overfitting.
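
To make the parameter savings concrete, here is a quick check using the 128-channel setting from the usage example above (the 3x3 kernel size is an assumption; the packaged LDP's layers may differ):

```python
from torch import nn

c, k = 128, 3  # channel count from the usage example above; 3x3 kernel assumed


def n_params(module: nn.Module) -> int:
    return sum(p.numel() for p in module.parameters())


# Standard 3x3 convolution: every output channel mixes every input channel.
standard = nn.Conv2d(c, c, kernel_size=k, padding=1)

# Depthwise-separable factorization: one 3x3 filter per channel, then a 1x1 channel mix.
depthwise = nn.Conv2d(c, c, kernel_size=k, padding=1, groups=c)
pointwise = nn.Conv2d(c, c, kernel_size=1)

print(n_params(standard))                         # 147,584
print(n_params(depthwise) + n_params(pointwise))  # 17,792 -> roughly 8x fewer parameters
```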

Skip connections not only help to preserve information throughout the layer but also improve gradient flow during backpropagation, allowing for deeper network architectures. Layer Normalization is known to accelerate training and make the model less sensitive to initialization and learning rate choices.

This combination of efficiency and robustness makes the LDP layer a versatile component in designing neural networks for resource-constrained environments.



# Citation
```bibtex
@misc{chu2023mobilevlm,
    title={MobileVLM : A Fast, Reproducible and Strong Vision Language Assistant for Mobile Devices}, 
    author={Xiangxiang Chu and Limeng Qiao and Xinyang Lin and Shuang Xu and Yang Yang and Yiming Hu and Fei Wei and Xinyu Zhang and Bo Zhang and Xiaolin Wei and Chunhua Shen},
    year={2023},
    eprint={2312.16886},
    archivePrefix={arXiv},
    primaryClass={cs.CV}
}
```


# License
MIT


            
