mlx-vlm

- Name: mlx-vlm
- Version: 0.1.12
- Home page: https://github.com/Blaizzy/mlx-vlm
- Summary: Vision LLMs on Apple silicon with MLX and the Hugging Face Hub
- Upload time: 2025-01-29 21:32:54
- Author: Prince Canuma
- Requires Python: >=3.8
- License: MIT
- Requirements: mlx>=0.22.0, datasets>=2.19.1, tqdm>=4.66.2, numpy>=1.23.4, transformers>=4.47.1, scipy==1.13.1, gradio>=4.44.0, Pillow>=10.3.0, requests>=2.31.0
# MLX-VLM

MLX-VLM is a package for inference and fine-tuning of Vision Language Models (VLMs) on your Mac using MLX.

## Table of Contents
- [Installation](#installation)
- [Usage](#usage)
  - [Command Line Interface (CLI)](#command-line-interface-cli)
  - [Chat UI with Gradio](#chat-ui-with-gradio)
  - [Python Script](#python-script)
- [Multi-Image Chat Support](#multi-image-chat-support)
  - [Supported Models](#supported-models)
  - [Usage Examples](#usage-examples)
- [Fine-tuning](#fine-tuning)

## Installation

The easiest way to get started is to install the `mlx-vlm` package using pip:

```sh
pip install mlx-vlm
```
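
If you want the latest changes before they are published to PyPI, pip can also install straight from the project's GitHub repository (a generic pip pattern, using the repository URL listed above):

```sh
# Install the development version from source (assumes git is available)
pip install git+https://github.com/Blaizzy/mlx-vlm.git
```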

## Usage

### Command Line Interface (CLI)

Generate output from a model using the CLI:

```sh
python -m mlx_vlm.generate --model mlx-community/Qwen2-VL-2B-Instruct-4bit --max-tokens 100 --temp 0.0 --image http://images.cocodataset.org/val2017/000000039769.jpg
```
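
The generator is a regular command-line script, so you can list the full set of supported flags for your installed version (prompt, sampling options, and so on) with `--help`:

```sh
# Print all available options; the exact flag set depends on the installed version
python -m mlx_vlm.generate --help
```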

### Chat UI with Gradio

Launch a chat interface using Gradio:

```sh
python -m mlx_vlm.chat_ui --model mlx-community/Qwen2-VL-2B-Instruct-4bit
```

### Python Script

Here's an example of how to use MLX-VLM in a Python script:

```python
from mlx_vlm import load, generate
from mlx_vlm.prompt_utils import apply_chat_template
from mlx_vlm.utils import load_config

# Load the model
model_path = "mlx-community/Qwen2-VL-2B-Instruct-4bit"
model, processor = load(model_path)
config = load_config(model_path)

# Prepare input
image = ["http://images.cocodataset.org/val2017/000000039769.jpg"]
prompt = "Describe this image."

# Apply chat template
formatted_prompt = apply_chat_template(
    processor, config, prompt, num_images=len(image)
)

# Generate output
output = generate(model, processor, formatted_prompt, image, verbose=False)
print(output)
```
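
Generation settings such as the token limit and temperature can also be controlled from Python. Continuing from the example above, here is a minimal sketch assuming `generate` accepts keyword arguments mirroring the `--max-tokens` and `--temp` CLI flags (check `help(generate)` in your installed version for the exact names):

```python
# The keyword arguments below mirror the CLI flags and are assumptions;
# verify the exact parameter names with help(generate).
output = generate(
    model,
    processor,
    formatted_prompt,
    image,
    max_tokens=100,  # assumed cap on generated tokens
    temp=0.0,        # assumed sampling temperature
    verbose=False,
)
print(output)
```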

## Multi-Image Chat Support

MLX-VLM supports analyzing multiple images simultaneously with select models. This feature enables more complex visual reasoning tasks and comprehensive analysis across multiple images in a single conversation.

### Supported Models

The following models support multi-image chat:

1. Idefics 2
2. LLaVA (Interleave)
3. Qwen2-VL
4. Phi3-Vision
5. Pixtral

### Usage Examples

#### Python Script

```python
from mlx_vlm import load, generate
from mlx_vlm.prompt_utils import apply_chat_template
from mlx_vlm.utils import load_config

model_path = "mlx-community/Qwen2-VL-2B-Instruct-4bit"
model, processor = load(model_path)
config = load_config(model_path)

images = ["path/to/image1.jpg", "path/to/image2.jpg"]
prompt = "Compare these two images."

formatted_prompt = apply_chat_template(
    processor, config, prompt, num_images=len(images)
)

output = generate(model, processor, formatted_prompt, images, verbose=False)
print(output)
```

#### Command Line

```sh
python -m mlx_vlm.generate --model mlx-community/Qwen2-VL-2B-Instruct-4bit --max-tokens 100 --prompt "Compare these images" --image path/to/image1.jpg path/to/image2.jpg
```

These examples demonstrate how to use multiple images with MLX-VLM for more complex visual reasoning tasks.

## Fine-tuning

MLX-VLM supports fine-tuning models with LoRA and QLoRA.

### LoRA & QLoRA

To learn more about LoRA, please refer to the [LoRA.md](./mlx_vlm/LORA.MD) file.
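
As a rough starting point, training is launched through a dedicated entry point in the package. The module name below is an assumption based on the repository layout, so confirm it against LoRA.md and the script's own help output before relying on any flags:

```sh
# Hypothetical entry point; confirm the module name and flags via LoRA.md
python -m mlx_vlm.lora --help
```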

            
