| Field | Value |
|---|---|
| Name | mlx-vlm |
| Version | 0.1.0 |
| home_page | https://github.com/Blaizzy/mlx-vlm |
| Summary | Vision LLMs on Apple silicon with MLX and the Hugging Face Hub |
| upload_time | 2024-10-18 00:15:34 |
| maintainer | None |
| docs_url | None |
| author | Prince Canuma |
| requires_python | >=3.8 |
| license | MIT |
| requirements | No requirements were recorded. |
# MLX-VLM
MLX-VLM is a package for inference and fine-tuning of Vision Language Models (VLMs) on your Mac using MLX.
## Table of Contents
- [Installation](#installation)
- [Usage](#usage)
- [Command Line Interface (CLI)](#command-line-interface-cli)
- [Chat UI with Gradio](#chat-ui-with-gradio)
- [Python Script](#python-script)
- [Multi-Image Chat Support](#multi-image-chat-support)
- [Supported Models](#supported-models)
- [Usage Examples](#usage-examples)
- [Fine-tuning](#fine-tuning)
## Installation
The easiest way to get started is to install the `mlx-vlm` package using pip:
```sh
pip install mlx-vlm
```
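If you want the latest changes ahead of a release, installing directly from the GitHub repository with pip should also work (standard pip behaviour, using the project home page above):
```sh
pip install git+https://github.com/Blaizzy/mlx-vlm.git
```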
## Usage
### Command Line Interface (CLI)
Generate output from a model using the CLI:
```sh
python -m mlx_vlm.generate --model mlx-community/Qwen2-VL-2B-Instruct-4bit --max-tokens 100 --temp 0.0 --image http://images.cocodataset.org/val2017/000000039769.jpg
```
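The same CLI accepts a custom prompt via `--prompt` (the flag also used in the multi-image example later in this README); a minimal sketch combining it with a single image:
```sh
python -m mlx_vlm.generate --model mlx-community/Qwen2-VL-2B-Instruct-4bit --max-tokens 100 --temp 0.0 --prompt "Describe this image." --image http://images.cocodataset.org/val2017/000000039769.jpg
```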
### Chat UI with Gradio
Launch a chat interface using Gradio:
```sh
python -m mlx_vlm.chat_ui --model mlx-community/Qwen2-VL-2B-Instruct-4bit
```
### Python Script
Here's an example of how to use MLX-VLM in a Python script:
```python
import mlx.core as mx
from mlx_vlm import load, generate
from mlx_vlm.prompt_utils import apply_chat_template
from mlx_vlm.utils import load_config
# Load the model
model_path = "mlx-community/Qwen2-VL-2B-Instruct-4bit"
model, processor = load(model_path)
config = load_config(model_path)
# Prepare input
image = ["http://images.cocodataset.org/val2017/000000039769.jpg"]
prompt = "Describe this image."
# Apply chat template
formatted_prompt = apply_chat_template(
processor, config, prompt, num_images=len(image)
)
# Generate output
output = generate(model, processor, image, formatted_prompt, verbose=False)
print(output)
```
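Once loaded, the model and processor can be reused across calls. The sketch below builds only on the functions shown above to caption several images in a loop; the file paths are placeholders:
```python
from mlx_vlm import load, generate
from mlx_vlm.prompt_utils import apply_chat_template
from mlx_vlm.utils import load_config

# Load the model once and reuse it for every image
model_path = "mlx-community/Qwen2-VL-2B-Instruct-4bit"
model, processor = load(model_path)
config = load_config(model_path)

prompt = "Describe this image."
formatted_prompt = apply_chat_template(processor, config, prompt, num_images=1)

# Placeholder paths; replace with your own local files or URLs
for path in ["photo1.jpg", "photo2.jpg"]:
    output = generate(model, processor, [path], formatted_prompt, verbose=False)
    print(f"{path}: {output}")
```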
## Multi-Image Chat Support
MLX-VLM supports analyzing multiple images simultaneously with select models. This feature enables more complex visual reasoning tasks and comprehensive analysis across multiple images in a single conversation.
### Supported Models
The following models support multi-image chat:
1. Idefics 2
2. LLaVA (Interleave)
3. Qwen2-VL
4. Phi3-Vision
5. Pixtral
### Usage Examples
#### Python Script
```python
from mlx_vlm import load, generate
from mlx_vlm.prompt_utils import apply_chat_template
from mlx_vlm.utils import load_config
model_path = "mlx-community/Qwen2-VL-2B-Instruct-4bit"
model, processor = load(model_path)
config = load_config(model_path)
images = ["path/to/image1.jpg", "path/to/image2.jpg"]
prompt = "Compare these two images."
formatted_prompt = apply_chat_template(
processor, config, prompt, num_images=len(images)
)
output = generate(model, processor, images, formatted_prompt, verbose=False)
print(output)
```
#### Command Line
```sh
python -m mlx_vlm.generate --model mlx-community/Qwen2-VL-2B-Instruct-4bit --max-tokens 100 --prompt "Compare these images" --image path/to/image1.jpg path/to/image2.jpg
```
These examples demonstrate how to use multiple images with MLX-VLM for more complex visual reasoning tasks.
## Fine-tuning
MLX-VLM supports fine-tuning models with LoRA and QLoRA.
### LoRA & QLoRA
To learn more about LoRA and QLoRA fine-tuning, please refer to the [LoRA.md](./mlx_vlm/LoRA.md) file.