mlx-vlm

- **Name:** mlx-vlm
- **Version:** 0.3.1
- **Home page:** https://github.com/Blaizzy/mlx-vlm
- **Summary:** Vision Language Models (VLMs) and Omni Models (Vision, Audio and Video support) on Apple silicon with MLX and the Hugging Face Hub
- **Upload time:** 2025-07-12 16:21:31
- **Author:** Prince Canuma
- **Requires Python:** >=3.8
- **License:** MIT
- **Requirements:** mlx, datasets, tqdm, transformers, gradio, Pillow, requests, opencv-python, mlx-lm, fastapi, soundfile, scipy, numpy, mlx-audio

[![Upload Python Package](https://github.com/Blaizzy/mlx-vlm/actions/workflows/python-publish.yml/badge.svg)](https://github.com/Blaizzy/mlx-vlm/actions/workflows/python-publish.yml)
# MLX-VLM

MLX-VLM is a package for inference and fine-tuning of Vision Language Models (VLMs) and Omni Models (VLMs with audio and video support) on your Mac using MLX.

## Table of Contents
- [Installation](#installation)
- [Usage](#usage)
  - [Command Line Interface (CLI)](#command-line-interface-cli)
  - [Chat UI with Gradio](#chat-ui-with-gradio)
  - [Python Script](#python-script)
  - [Server (FastAPI)](#server-fastapi)
- [Multi-Image Chat Support](#multi-image-chat-support)
- [Video Understanding](#video-understanding)
- [Fine-tuning](#fine-tuning)

## Installation

The easiest way to get started is to install the `mlx-vlm` package using pip:

```sh
pip install -U mlx-vlm
```

## Usage

### Command Line Interface (CLI)

Generate output from a model using the CLI:

```sh
# Text generation from an image
mlx_vlm.generate --model mlx-community/Qwen2-VL-2B-Instruct-4bit --max-tokens 100 --temperature 0.0 --image http://images.cocodataset.org/val2017/000000039769.jpg

# Text generation from audio (New)
mlx_vlm.generate --model mlx-community/gemma-3n-E2B-it-4bit --max-tokens 100 --prompt "Describe what you hear" --audio /path/to/audio.wav

# Multi-modal generation (Image + Audio)
mlx_vlm.generate --model mlx-community/gemma-3n-E2B-it-4bit --max-tokens 100 --prompt "Describe what you see and hear" --image /path/to/image.jpg --audio /path/to/audio.wav
```

### Chat UI with Gradio

Launch a chat interface using Gradio:

```sh
mlx_vlm.chat_ui --model mlx-community/Qwen2-VL-2B-Instruct-4bit
```

### Python Script

Here's an example of how to use MLX-VLM in a Python script:

```python
from mlx_vlm import load, generate
from mlx_vlm.prompt_utils import apply_chat_template
from mlx_vlm.utils import load_config

# Load the model
model_path = "mlx-community/Qwen2-VL-2B-Instruct-4bit"
model, processor = load(model_path)
config = load_config(model_path)

# Prepare input
image = ["http://images.cocodataset.org/val2017/000000039769.jpg"]
# PIL.Image.Image objects can also be used, e.g. image = [Image.open("...")] (see the sketch after this example)
prompt = "Describe this image."

# Apply chat template
formatted_prompt = apply_chat_template(
    processor, config, prompt, num_images=len(image)
)

# Generate output
output = generate(model, processor, formatted_prompt, image, verbose=False)
print(output)
```
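
As the comment in the script above notes, local files can be passed as `PIL.Image.Image` objects instead of URLs. Here is a minimal sketch that continues from the example above (the file path is a placeholder):

```python
from PIL import Image

# Open a local file with Pillow and pass it in place of a URL
image = [Image.open("path/to/local_image.jpg")]

formatted_prompt = apply_chat_template(
    processor, config, prompt, num_images=len(image)
)
output = generate(model, processor, formatted_prompt, image, verbose=False)
print(output)
```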

#### Audio Example

```python
from mlx_vlm import load, generate
from mlx_vlm.prompt_utils import apply_chat_template

# Load model with audio support
model_path = "mlx-community/gemma-3n-E2B-it-4bit"
model, processor = load(model_path)
config = model.config

# Prepare audio input
audio = ["/path/to/audio1.wav", "/path/to/audio2.mp3"]
prompt = "Describe what you hear in these audio files."

# Apply chat template with audio
formatted_prompt = apply_chat_template(
    processor, config, prompt, num_audios=len(audio)
)

# Generate output with audio
output = generate(model, processor, formatted_prompt, audio=audio, verbose=False)
print(output)
```

#### Multi-Modal Example (Image + Audio)

```python
from mlx_vlm import load, generate
from mlx_vlm.prompt_utils import apply_chat_template

# Load multi-modal model
model_path = "mlx-community/gemma-3n-E2B-it-4bit"
model, processor = load(model_path)
config = model.config

# Prepare inputs
image = ["/path/to/image.jpg"]
audio = ["/path/to/audio.wav"]
prompt = ""

# Apply chat template
formatted_prompt = apply_chat_template(
    processor, config, prompt,
    num_images=len(image),
    num_audios=len(audio)
)

# Generate output
output = generate(model, processor, formatted_prompt, image, audio=audio, verbose=False)
print(output)
```

### Server (FastAPI)

Start the server:
```sh
mlx_vlm.server
```

The server provides multiple endpoints for different use cases and supports dynamic model loading/unloading with caching (one model at a time).

#### Available Endpoints

- `/generate` - Main generation endpoint with support for images, audio, and text
- `/chat` - Chat-style interaction endpoint
- `/responses` - OpenAI-compatible endpoint
- `/health` - Check server status
- `/unload` - Unload current model from memory

#### Usage Examples

##### Basic Image Generation
```sh
curl -X POST "http://localhost:8000/generate" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "mlx-community/Qwen2.5-VL-32B-Instruct-8bit",
    "image": ["/path/to/repo/examples/images/renewables_california.png"],
    "prompt": "This is today'\''s chart for energy demand in California. Can you provide an analysis of the chart and comment on the implications for renewable energy in California?",
    "system": "You are a helpful assistant.",
    "stream": true,
    "max_tokens": 1000
  }'
```

##### Audio Support (New)
```sh
curl -X POST "http://localhost:8000/generate" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "mlx-community/gemma-3n-E2B-it-4bit",
    "audio": ["/path/to/audio1.wav", "https://example.com/audio2.mp3"],
    "prompt": "Describe what you hear in these audio files",
    "stream": true,
    "max_tokens": 500
  }'
```

##### Multi-Modal (Image + Audio)
```sh
curl -X POST "http://localhost:8000/generate" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "mlx-community/gemma-3n-E2B-it-4bit",
    "image": ["/path/to/image.jpg"],
    "audio": ["/path/to/audio.wav"],
    "prompt": "",
    "max_tokens": 1000
  }'
```

##### Chat Endpoint
```sh
curl -X POST "http://localhost:8000/chat" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "mlx-community/Qwen2-VL-2B-Instruct-4bit",
    "messages": [
      {
        "role": "user",
        "content": "What is in this image?",
        "images": ["/path/to/image.jpg"]
      }
    ],
    "max_tokens": 100
  }'
```

##### OpenAI-Compatible Endpoint
```sh
curl -X POST "http://localhost:8000/responses" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "mlx-community/Qwen2-VL-2B-Instruct-4bit",
    "messages": [
      {
        "role": "user",
        "content": [
          {"type": "input_text", "text": "What is in this image?"},
          {"type": "input_image", "image": "/path/to/image.jpg"}
        ]
      }
    ],
    "max_tokens": 100
  }'
```

#### Request Parameters

- `model`: Model identifier (required)
- `prompt`: Text prompt for generation
- `image`: List of image URLs or local paths (optional)
- `audio`: List of audio URLs or local paths (optional, new)
- `system`: System prompt (optional)
- `messages`: Chat messages for chat/OpenAI endpoints
- `max_tokens`: Maximum tokens to generate
- `temperature`: Sampling temperature
- `top_p`: Top-p sampling parameter
- `stream`: Enable streaming responses
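
The same parameters can be sent from any HTTP client rather than curl. Below is a minimal sketch using Python's `requests` library against the `/generate` endpoint; the host, port, model, and image path are placeholders, and the exact field names in the JSON response may differ between server versions:

```python
import requests

# Non-streaming request to a locally running mlx_vlm.server instance
payload = {
    "model": "mlx-community/Qwen2-VL-2B-Instruct-4bit",
    "image": ["/path/to/image.jpg"],
    "prompt": "What is in this image?",
    "stream": False,
    "max_tokens": 100,
    "temperature": 0.0,
}

response = requests.post("http://localhost:8000/generate", json=payload, timeout=300)
response.raise_for_status()

# Print the raw JSON body rather than assuming a specific schema
print(response.json())
```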


## Multi-Image Chat Support

MLX-VLM supports analyzing multiple images simultaneously with select models. This feature enables more complex visual reasoning tasks and comprehensive analysis across multiple images in a single conversation.


### Usage Examples

#### Python Script

```python
from mlx_vlm import load, generate
from mlx_vlm.prompt_utils import apply_chat_template

model_path = "mlx-community/Qwen2-VL-2B-Instruct-4bit"
model, processor = load(model_path)
config = model.config

images = ["path/to/image1.jpg", "path/to/image2.jpg"]
prompt = "Compare these two images."

formatted_prompt = apply_chat_template(
    processor, config, prompt, num_images=len(images)
)

output = generate(model, processor, formatted_prompt, images, verbose=False)
print(output)
```

#### Command Line

```sh
mlx_vlm.generate --model mlx-community/Qwen2-VL-2B-Instruct-4bit --max-tokens 100 --prompt "Compare these images" --image path/to/image1.jpg path/to/image2.jpg
```

## Video Understanding

MLX-VLM also supports video analysis such as captioning, summarization, and more, with select models.

### Supported Models

The following models support video chat:

1. Qwen2-VL
2. Qwen2.5-VL
3. Idefics3
4. LLaVA

More models are coming soon.

### Usage Examples

#### Command Line
```sh
mlx_vlm.video_generate --model mlx-community/Qwen2-VL-2B-Instruct-4bit --max-tokens 100 --prompt "Describe this video" --video path/to/video.mp4 --max-pixels 224 224 --fps 1.0
```
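
If you prefer to drive video generation from Python, one option is simply to wrap the documented CLI with `subprocess`; this is a convenience sketch only, with the model and video path as placeholders:

```python
import subprocess

# Wrap the mlx_vlm.video_generate CLI shown above
cmd = [
    "mlx_vlm.video_generate",
    "--model", "mlx-community/Qwen2-VL-2B-Instruct-4bit",
    "--max-tokens", "100",
    "--prompt", "Describe this video",
    "--video", "path/to/video.mp4",
    "--fps", "1.0",
]

result = subprocess.run(cmd, capture_output=True, text=True, check=True)
print(result.stdout)
```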


These examples demonstrate how to use multi-image and video inputs with MLX-VLM for more complex visual reasoning tasks.

# Fine-tuning

MLX-VLM supports fine-tuning models with LoRA and QLoRA.

## LoRA & QLoRA

To learn more about LoRA and QLoRA fine-tuning, please refer to [LORA.MD](./mlx_vlm/LORA.MD).

            
