Name | llm-mlx
Version | 0.3
home_page | None
Summary | Support for MLX models in LLM
upload_time | 2025-02-17 01:50:00
maintainer | None
docs_url | None
author | Simon Willison
requires_python | >=3.9
license | Apache-2.0
keywords |
VCS |
bugtrack_url |
requirements | No requirements were recorded.
Travis-CI | No Travis.
coveralls test coverage | No coveralls.
# llm-mlx
[PyPI](https://pypi.org/project/llm-mlx/)
[Changelog](https://github.com/simonw/llm-mlx/releases)
[Tests](https://github.com/simonw/llm-mlx/actions/workflows/test.yml)
[License](https://github.com/simonw/llm-mlx/blob/main/LICENSE)
Support for [MLX](https://github.com/ml-explore/mlx) models in [LLM](https://llm.datasette.io/).
Read my blog for [background on this project](https://simonwillison.net/2025/Feb/15/llm-mlx/).
## Installation
Install this plugin in the same environment as [LLM](https://llm.datasette.io/). This plugin likely only works on macOS.
```bash
llm install llm-mlx
```
This plugin depends on [sentencepiece](https://pypi.org/project/sentencepiece/) which does not yet publish a binary wheel for Python 3.13. You will find this plugin easier to run on Python 3.12 or lower. One way to install a version of LLM that uses Python 3.12 is like this, using [uv](https://github.com/astral-sh/uv):
```bash
uv tool install llm --python 3.12
```
See [issue #7](https://github.com/simonw/llm-mlx/issues/7) for more on this.
## Usage
To install an MLX model from Hugging Face, use the `llm mlx download-model` command. This example downloads 1.8GB of model weights from [mlx-community/Llama-3.2-3B-Instruct-4bit](https://huggingface.co/mlx-community/Llama-3.2-3B-Instruct-4bit):
```bash
llm mlx download-model mlx-community/Llama-3.2-3B-Instruct-4bit
```
Then run prompts like this:
```bash
llm -m mlx-community/Llama-3.2-3B-Instruct-4bit 'Capital of France?' -s 'you are a pelican'
```
The [mlx-community](https://huggingface.co/mlx-community) organization is a useful source for compatible models.
### Models to try
The following models all work well with this plugin:
- `mlx-community/Qwen2.5-0.5B-Instruct-4bit` - [278MB](https://huggingface.co/mlx-community/Qwen2.5-0.5B-Instruct-4bit)
- `mlx-community/Mistral-7B-Instruct-v0.3-4bit` - [4.08GB](https://huggingface.co/mlx-community/Mistral-7B-Instruct-v0.3-4bit)
- `mlx-community/Mistral-Small-24B-Instruct-2501-4bit` - [13.26GB](https://huggingface.co/mlx-community/Mistral-Small-24B-Instruct-2501-4bit)
- `mlx-community/DeepSeek-R1-Distill-Qwen-32B-4bit` - [18.5GB](https://huggingface.co/mlx-community/DeepSeek-R1-Distill-Qwen-32B-4bit)
- `mlx-community/Llama-3.3-70B-Instruct-4bit` - [40GB](https://huggingface.co/mlx-community/Llama-3.3-70B-Instruct-4bit)
### Model options
MLX models can use the following model options:
- `-o max_tokens INTEGER`: Maximum number of tokens to generate in the completion (defaults to 1024)
- `-o unlimited 1`: Generate an unlimited number of tokens in the completion
- `-o temperature FLOAT`: Sampling temperature (defaults to 0.8)
- `-o top_p FLOAT`: Sampling top-p (defaults to 0.9)
- `-o min_p FLOAT`: Sampling min-p (defaults to 0.1)
- `-o min_tokens_to_keep INT`: Minimum tokens to keep for min-p sampling (defaults to 1)
- `-o seed INT`: Random number seed to use
For example:
```bash
llm -m mlx-community/Llama-3.2-3B-Instruct-4bit 'Joke about pelicans' -o max_tokens 60 -o temperature 1.0
```
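The same options can also be set from Python. The sketch below assumes the plugin's options map to keyword arguments of `prompt()`, and that a system prompt can be passed with `system=`, as is usual for LLM's Python API:
```python
import llm

# Load a model previously registered with `llm mlx download-model`
model = llm.get_model("mlx-community/Llama-3.2-3B-Instruct-4bit")

# Option names mirror the CLI flags above and are assumed to pass through
# as keyword arguments, per LLM's standard Python API.
response = model.prompt(
    "Joke about pelicans",
    system="you are a pelican",
    max_tokens=60,
    temperature=1.0,
)
print(response.text())
```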
## Importing existing models
If you have used MLX models in the past, you may already have some installed in your `~/.cache/huggingface/hub` directory.
The `llm mlx import-models` command can detect these and provide you with the option to add them to the list of models registered with LLM.
```bash
llm mlx import-models
```
This will open an interface like this one:
```
Available models (↑/↓ to navigate, SPACE to select, ENTER to confirm, Ctrl+C to quit):
> ○ (llama) mlx-community/DeepSeek-R1-Distill-Llama-8B (already imported)
○ (llama) mlx-community/Llama-3.2-3B-Instruct-4bit (already imported)
○ (llama) mlx-community/Llama-3.3-70B-Instruct-4bit
○ (mistral) mlx-community/Mistral-7B-Instruct-v0.3-4bit (already imported)
○ (mistral) mlx-community/Mistral-Small-24B-Instruct-2501-4bit
```
Navigate with `<up>` and `<down>`, hit `<space>` to select the models to import, then hit `<enter>` to confirm.
## Using models from Python
If you have registered models with the `llm mlx download-model` command you can use them from Python like this:
```python
import llm
model = llm.get_model("mlx-community/Llama-3.2-3B-Instruct-4bit")
print(model.prompt("hi").text())
```
You can avoid that registration step entirely by accessing the models like this instead:
```python
from llm_mlx import MlxModel
model = MlxModel("mlx-community/Llama-3.2-3B-Instruct-4bit")
print(model.prompt("hi").text())
# Outputs: How can I assist you today?
```
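Responses can also be consumed incrementally. This is a sketch assuming LLM's standard streaming behaviour, where iterating over a response yields chunks of text as they are generated:
```python
from llm_mlx import MlxModel

model = MlxModel("mlx-community/Llama-3.2-3B-Instruct-4bit")

# Iterating over the response streams text chunks as they are generated
response = model.prompt("Write a haiku about pelicans")
for chunk in response:
    print(chunk, end="", flush=True)
print()
```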
The [LLM Python API documentation](https://llm.datasette.io/en/stable/python-api.html) has more details on how to use LLM models.
## Development
To set up this plugin locally, first check out the code. Then create a new virtual environment:
```bash
cd llm-mlx
python -m venv venv
source venv/bin/activate
```
Now install the dependencies and test dependencies:
```bash
llm install -e '.[test]'
```
To run the tests:
```bash
python -m pytest
```
Raw data
{
"_id": null,
"home_page": null,
"name": "llm-mlx",
"maintainer": null,
"docs_url": null,
"requires_python": ">=3.9",
"maintainer_email": null,
"keywords": null,
"author": "Simon Willison",
"author_email": null,
"download_url": "https://files.pythonhosted.org/packages/d1/16/47abf862f49fcfc21b9f1eed875acc8a489ccc6e318a0d5ff7014a207ab9/llm_mlx-0.3.tar.gz",
"platform": null,
"description": "# llm-mlx\n\n[](https://pypi.org/project/llm-mlx/)\n[](https://github.com/simonw/llm-mlx/releases)\n[](https://github.com/simonw/llm-mlx/actions/workflows/test.yml)\n[](https://github.com/simonw/llm-mlx/blob/main/LICENSE)\n\nSupport for [MLX](https://github.com/ml-explore/mlx) models in [LLM](https://llm.datasette.io/).\n\nRead my blog for [background on this project](https://simonwillison.net/2025/Feb/15/llm-mlx/).\n\n## Installation\n\nInstall this plugin in the same environment as [LLM](https://llm.datasette.io/). This plugin likely only works on macOS.\n```bash\nllm install llm-mlx\n```\nThis plugin depends on [sentencepiece](https://pypi.org/project/sentencepiece/) which does not yet publish a binary wheel for Python 3.13. You will find this plugin easier to run on Python 3.12 or lower. One way to install a version of LLM that uses Python 3.12 is like this, using [uv](https://github.com/astral-sh/uv):\n\n```bash\nuv tool install llm --python 3.12\n```\nSee [issue #7](https://github.com/simonw/llm-mlx/issues/7) for more on this.\n\n## Usage\n\nTo install an MLX model from Hugging Face, use the `llm mlx download-model` command. This example downloads 1.8GB of model weights from [mlx-community/Llama-3.2-3B-Instruct-4bit](https://huggingface.co/mlx-community/Llama-3.2-3B-Instruct-4bit):\n\n```bash\nllm mlx download-model mlx-community/Llama-3.2-3B-Instruct-4bit\n```\nThen run prompts like this:\n```bash\nllm -m mlx-community/Llama-3.2-3B-Instruct-4bit 'Capital of France?' -s 'you are a pelican'\n```\nThe [mlx-community](https://huggingface.co/mlx-community) organization is a useful source for compatible models.\n\n### Models to try\n\nThe following models all work well with this plugin:\n\n- `mlx-community/Qwen2.5-0.5B-Instruct-4bit` - [278MB](https://huggingface.co/mlx-community/Qwen2.5-0.5B-Instruct-4bit)\n- `mlx-community/Mistral-7B-Instruct-v0.3-4bit` - [4.08GB](https://huggingface.co/mlx-community/Mistral-7B-Instruct-v0.3-4bit)\n- `mlx-community/Mistral-Small-24B-Instruct-2501-4bit` \u2014 [13.26 GB](https://huggingface.co/mlx-community/Mistral-Small-24B-Instruct-2501-4bit)\n- `mlx-community/DeepSeek-R1-Distill-Qwen-32B-4bit` - [18.5GB](https://huggingface.co/mlx-community/DeepSeek-R1-Distill-Qwen-32B-4bit)\n- `mlx-community/Llama-3.3-70B-Instruct-4bit` - [40GB](https://huggingface.co/mlx-community/Llama-3.3-70B-Instruct-4bit)\n\n### Model options\n\nMLX models can use the following model options:\n\n- `-o max_tokens INTEGER`: Maximum number of tokens to generate in the completion (defaults to 1024)\n- `-o unlimited 1`: Generate an unlimited number of tokens in the completion\n- `-o temperature FLOAT`: Sampling temperature (defaults to 0.8)\n- `-o top_p FLOAT`: Sampling top-p (defaults to 0.9)\n- `-o min_p FLOAT`: Sampling min-p (defaults to 0.1)\n- `-o min_tokens_to_keep INT`: Minimum tokens to keep for min-p sampling (defaults to 1)\n- `-o seed INT`: Random number seed to use\n\nFor example:\n```bash\nllm -m mlx-community/Llama-3.2-3B-Instruct-4bit 'Joke about pelicans' -o max_tokens 60 -o temperature 1.0\n```\n\n## Importing existing models\n\nIf you have used MLX models in the past you may already have some installed in your `~/.cache/huggingface/hub` directory.\n\nThe `llm mlx import-models` command can detect these and provide you with the option to add them to the list of models registered with LLM.\n\n```bash\nllm mlx import-models\n```\nThis will open an interface like this one:\n```\nAvailable models (\u2191/\u2193 to navigate, SPACE to select, ENTER to 
confirm, Ctrl+C to quit):\n> \u25cb (llama) mlx-community/DeepSeek-R1-Distill-Llama-8B (already imported)\n \u25cb (llama) mlx-community/Llama-3.2-3B-Instruct-4bit (already imported)\n \u25cb (llama) mlx-community/Llama-3.3-70B-Instruct-4bit\n \u25cb (mistral) mlx-community/Mistral-7B-Instruct-v0.3-4bit (already imported)\n \u25cb (mistral) mlx-community/Mistral-Small-24B-Instruct-2501-4bit\n```\nNavigate <up> and <down>, hit `<space>` to select models to import and then hit `<enter>` to confirm.\n\n## Using models from Python\n\nIf you have registered models with the `llm download-model` command you can use in Python like this:\n```python\nimport llm\nmodel = llm.get_model(\"mlx-community/Llama-3.2-3B-Instruct-4bit\")\nprint(model.prompt(\"hi\").text())\n```\nYou can avoid that registration step entirely by accessing the models like this instead:\n```python\nfrom llm_mlx import MlxModel\nmodel = MlxModel(\"mlx-community/Llama-3.2-3B-Instruct-4bit\")\nprint(model.prompt(\"hi\").text())\n# Outputs: How can I assist you today?\n```\n\nThe [LLM Python API documentation](https://llm.datasette.io/en/stable/python-api.html) has more details on how to use LLM models.\n\n## Development\n\nTo set up this plugin locally, first checkout the code. Then create a new virtual environment:\n```bash\ncd llm-mlx\npython -m venv venv\nsource venv/bin/activate\n```\nNow install the dependencies and test dependencies:\n```bash\nllm install -e '.[test]'\n```\nTo run the tests:\n```bash\npython -m pytest\n```\n",
"bugtrack_url": null,
"license": "Apache-2.0",
"summary": "Support for MLX models in LLM",
"version": "0.3",
"project_urls": {
"CI": "https://github.com/simonw/llm-mlx/actions",
"Changelog": "https://github.com/simonw/llm-mlx/releases",
"Homepage": "https://github.com/simonw/llm-mlx",
"Issues": "https://github.com/simonw/llm-mlx/issues"
},
"split_keywords": [],
"urls": [
{
"comment_text": null,
"digests": {
"blake2b_256": "2b93b4583797749f1f13412cf7560e4c0f05fd4cf80032507f967de9d167e196",
"md5": "9d0a220e686cae77ff99d3ecb8bbc0ef",
"sha256": "bf4490ca1e8332bdd2b1d3e88da8a84158f176cdeb616e300b68a265c7116476"
},
"downloads": -1,
"filename": "llm_mlx-0.3-py3-none-any.whl",
"has_sig": false,
"md5_digest": "9d0a220e686cae77ff99d3ecb8bbc0ef",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.9",
"size": 10877,
"upload_time": "2025-02-17T01:49:58",
"upload_time_iso_8601": "2025-02-17T01:49:58.216229Z",
"url": "https://files.pythonhosted.org/packages/2b/93/b4583797749f1f13412cf7560e4c0f05fd4cf80032507f967de9d167e196/llm_mlx-0.3-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": null,
"digests": {
"blake2b_256": "d11647abf862f49fcfc21b9f1eed875acc8a489ccc6e318a0d5ff7014a207ab9",
"md5": "cf2cd2c9e3b88c9f2c6a9c31a39835b7",
"sha256": "9ca248b96a7099fc14d0bd43953a202de937c25a71c2aeb906a601bd2805ca72"
},
"downloads": -1,
"filename": "llm_mlx-0.3.tar.gz",
"has_sig": false,
"md5_digest": "cf2cd2c9e3b88c9f2c6a9c31a39835b7",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.9",
"size": 10827,
"upload_time": "2025-02-17T01:50:00",
"upload_time_iso_8601": "2025-02-17T01:50:00.165983Z",
"url": "https://files.pythonhosted.org/packages/d1/16/47abf862f49fcfc21b9f1eed875acc8a489ccc6e318a0d5ff7014a207ab9/llm_mlx-0.3.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2025-02-17 01:50:00",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "simonw",
"github_project": "llm-mlx",
"travis_ci": false,
"coveralls": false,
"github_actions": true,
"lcname": "llm-mlx"
}