Name | llm-mlx
Version | 0.3
home_page | None
Summary | Support for MLX models in LLM
upload_time | 2025-02-17 01:50:00
maintainer | None
docs_url | None
author | Simon Willison
requires_python | >=3.9
license | Apache-2.0
keywords |
VCS |
bugtrack_url |
requirements | No requirements were recorded.
Travis-CI | No Travis.
coveralls test coverage | No coveralls.
# llm-mlx
[PyPI](https://pypi.org/project/llm-mlx/)
[Changelog](https://github.com/simonw/llm-mlx/releases)
[Tests](https://github.com/simonw/llm-mlx/actions/workflows/test.yml)
[License](https://github.com/simonw/llm-mlx/blob/main/LICENSE)
Support for [MLX](https://github.com/ml-explore/mlx) models in [LLM](https://llm.datasette.io/).
Read my blog for [background on this project](https://simonwillison.net/2025/Feb/15/llm-mlx/).
## Installation
Install this plugin in the same environment as [LLM](https://llm.datasette.io/). This plugin likely only works on macOS.
```bash
llm install llm-mlx
```
This plugin depends on [sentencepiece](https://pypi.org/project/sentencepiece/) which does not yet publish a binary wheel for Python 3.13. You will find this plugin easier to run on Python 3.12 or lower. One way to install a version of LLM that uses Python 3.12 is like this, using [uv](https://github.com/astral-sh/uv):
```bash
uv tool install llm --python 3.12
```
See [issue #7](https://github.com/simonw/llm-mlx/issues/7) for more on this.
## Usage
To install an MLX model from Hugging Face, use the `llm mlx download-model` command. This example downloads 1.8GB of model weights from [mlx-community/Llama-3.2-3B-Instruct-4bit](https://huggingface.co/mlx-community/Llama-3.2-3B-Instruct-4bit):
```bash
llm mlx download-model mlx-community/Llama-3.2-3B-Instruct-4bit
```
Then run prompts like this:
```bash
llm -m mlx-community/Llama-3.2-3B-Instruct-4bit 'Capital of France?' -s 'you are a pelican'
```
The [mlx-community](https://huggingface.co/mlx-community) organization is a useful source for compatible models.
### Models to try
The following models all work well with this plugin:
- `mlx-community/Qwen2.5-0.5B-Instruct-4bit` - [278MB](https://huggingface.co/mlx-community/Qwen2.5-0.5B-Instruct-4bit)
- `mlx-community/Mistral-7B-Instruct-v0.3-4bit` - [4.08GB](https://huggingface.co/mlx-community/Mistral-7B-Instruct-v0.3-4bit)
- `mlx-community/Mistral-Small-24B-Instruct-2501-4bit` - [13.26GB](https://huggingface.co/mlx-community/Mistral-Small-24B-Instruct-2501-4bit)
- `mlx-community/DeepSeek-R1-Distill-Qwen-32B-4bit` - [18.5GB](https://huggingface.co/mlx-community/DeepSeek-R1-Distill-Qwen-32B-4bit)
- `mlx-community/Llama-3.3-70B-Instruct-4bit` - [40GB](https://huggingface.co/mlx-community/Llama-3.3-70B-Instruct-4bit)
### Model options
MLX models can use the following model options:
- `-o max_tokens INTEGER`: Maximum number of tokens to generate in the completion (defaults to 1024)
- `-o unlimited 1`: Generate an unlimited number of tokens in the completion
- `-o temperature FLOAT`: Sampling temperature (defaults to 0.8)
- `-o top_p FLOAT`: Sampling top-p (defaults to 0.9)
- `-o min_p FLOAT`: Sampling min-p (defaults to 0.1)
- `-o min_tokens_to_keep INT`: Minimum tokens to keep for min-p sampling (defaults to 1)
- `-o seed INT`: Random number seed to use
For example:
```bash
llm -m mlx-community/Llama-3.2-3B-Instruct-4bit 'Joke about pelicans' -o max_tokens 60 -o temperature 1.0
```
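The same options can also be set from Python. The sketch below assumes the plugin's options map to keyword arguments of `prompt()`, and that a system prompt can be passed with `system=`, as is usual for LLM's Python API:
```python
import llm

# Load a model previously registered with `llm mlx download-model`
model = llm.get_model("mlx-community/Llama-3.2-3B-Instruct-4bit")

# Option names mirror the CLI flags above and are assumed to pass through
# as keyword arguments, per LLM's standard Python API.
response = model.prompt(
    "Joke about pelicans",
    system="you are a pelican",
    max_tokens=60,
    temperature=1.0,
)
print(response.text())
```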
## Importing existing models
If you have used MLX models in the past, you may already have some installed in your `~/.cache/huggingface/hub` directory.
The `llm mlx import-models` command can detect these and provide you with the option to add them to the list of models registered with LLM.
```bash
llm mlx import-models
```
This will open an interface like this one:
```
Available models (↑/↓ to navigate, SPACE to select, ENTER to confirm, Ctrl+C to quit):
> ○ (llama) mlx-community/DeepSeek-R1-Distill-Llama-8B (already imported)
○ (llama) mlx-community/Llama-3.2-3B-Instruct-4bit (already imported)
○ (llama) mlx-community/Llama-3.3-70B-Instruct-4bit
○ (mistral) mlx-community/Mistral-7B-Instruct-v0.3-4bit (already imported)
○ (mistral) mlx-community/Mistral-Small-24B-Instruct-2501-4bit
```
Navigate with `<up>` and `<down>`, hit `<space>` to select the models to import, then hit `<enter>` to confirm.
## Using models from Python
If you have registered models with the `llm mlx download-model` command you can use them from Python like this:
```python
import llm
model = llm.get_model("mlx-community/Llama-3.2-3B-Instruct-4bit")
print(model.prompt("hi").text())
```
You can avoid that registration step entirely by accessing the models like this instead:
```python
from llm_mlx import MlxModel
model = MlxModel("mlx-community/Llama-3.2-3B-Instruct-4bit")
print(model.prompt("hi").text())
# Outputs: How can I assist you today?
```
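Responses can also be consumed incrementally. This is a sketch assuming LLM's standard streaming behaviour, where iterating over a response yields chunks of text as they are generated:
```python
from llm_mlx import MlxModel

model = MlxModel("mlx-community/Llama-3.2-3B-Instruct-4bit")

# Iterating over the response streams text chunks as they are generated
response = model.prompt("Write a haiku about pelicans")
for chunk in response:
    print(chunk, end="", flush=True)
print()
```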
The [LLM Python API documentation](https://llm.datasette.io/en/stable/python-api.html) has more details on how to use LLM models.
## Development
To set up this plugin locally, first check out the code. Then create a new virtual environment:
```bash
cd llm-mlx
python -m venv venv
source venv/bin/activate
```
Now install the dependencies and test dependencies:
```bash
llm install -e '.[test]'
```
To run the tests:
```bash
python -m pytest
```
Raw data
{
"_id": null,
"home_page": null,
"name": "llm-mlx",
"maintainer": null,
"docs_url": null,
"requires_python": ">=3.9",
"maintainer_email": null,
"keywords": null,
"author": "Simon Willison",
"author_email": null,
"download_url": "https://files.pythonhosted.org/packages/d1/16/47abf862f49fcfc21b9f1eed875acc8a489ccc6e318a0d5ff7014a207ab9/llm_mlx-0.3.tar.gz",
"platform": null,
"description": "# llm-mlx\n\n[](https://pypi.org/project/llm-mlx/)\n[](https://github.com/simonw/llm-mlx/releases)\n[](https://github.com/simonw/llm-mlx/actions/workflows/test.yml)\n[](https://github.com/simonw/llm-mlx/blob/main/LICENSE)\n\nSupport for [MLX](https://github.com/ml-explore/mlx) models in [LLM](https://llm.datasette.io/).\n\nRead my blog for [background on this project](https://simonwillison.net/2025/Feb/15/llm-mlx/).\n\n## Installation\n\nInstall this plugin in the same environment as [LLM](https://llm.datasette.io/). This plugin likely only works on macOS.\n```bash\nllm install llm-mlx\n```\nThis plugin depends on [sentencepiece](https://pypi.org/project/sentencepiece/) which does not yet publish a binary wheel for Python 3.13. You will find this plugin easier to run on Python 3.12 or lower. One way to install a version of LLM that uses Python 3.12 is like this, using [uv](https://github.com/astral-sh/uv):\n\n```bash\nuv tool install llm --python 3.12\n```\nSee [issue #7](https://github.com/simonw/llm-mlx/issues/7) for more on this.\n\n## Usage\n\nTo install an MLX model from Hugging Face, use the `llm mlx download-model` command. This example downloads 1.8GB of model weights from [mlx-community/Llama-3.2-3B-Instruct-4bit](https://huggingface.co/mlx-community/Llama-3.2-3B-Instruct-4bit):\n\n```bash\nllm mlx download-model mlx-community/Llama-3.2-3B-Instruct-4bit\n```\nThen run prompts like this:\n```bash\nllm -m mlx-community/Llama-3.2-3B-Instruct-4bit 'Capital of France?' -s 'you are a pelican'\n```\nThe [mlx-community](https://huggingface.co/mlx-community) organization is a useful source for compatible models.\n\n### Models to try\n\nThe following models all work well with this plugin:\n\n- `mlx-community/Qwen2.5-0.5B-Instruct-4bit` - [278MB](https://huggingface.co/mlx-community/Qwen2.5-0.5B-Instruct-4bit)\n- `mlx-community/Mistral-7B-Instruct-v0.3-4bit` - [4.08GB](https://huggingface.co/mlx-community/Mistral-7B-Instruct-v0.3-4bit)\n- `mlx-community/Mistral-Small-24B-Instruct-2501-4bit` \u2014 [13.26 GB](https://huggingface.co/mlx-community/Mistral-Small-24B-Instruct-2501-4bit)\n- `mlx-community/DeepSeek-R1-Distill-Qwen-32B-4bit` - [18.5GB](https://huggingface.co/mlx-community/DeepSeek-R1-Distill-Qwen-32B-4bit)\n- `mlx-community/Llama-3.3-70B-Instruct-4bit` - [40GB](https://huggingface.co/mlx-community/Llama-3.3-70B-Instruct-4bit)\n\n### Model options\n\nMLX models can use the following model options:\n\n- `-o max_tokens INTEGER`: Maximum number of tokens to generate in the completion (defaults to 1024)\n- `-o unlimited 1`: Generate an unlimited number of tokens in the completion\n- `-o temperature FLOAT`: Sampling temperature (defaults to 0.8)\n- `-o top_p FLOAT`: Sampling top-p (defaults to 0.9)\n- `-o min_p FLOAT`: Sampling min-p (defaults to 0.1)\n- `-o min_tokens_to_keep INT`: Minimum tokens to keep for min-p sampling (defaults to 1)\n- `-o seed INT`: Random number seed to use\n\nFor example:\n```bash\nllm -m mlx-community/Llama-3.2-3B-Instruct-4bit 'Joke about pelicans' -o max_tokens 60 -o temperature 1.0\n```\n\n## Importing existing models\n\nIf you have used MLX models in the past you may already have some installed in your `~/.cache/huggingface/hub` directory.\n\nThe `llm mlx import-models` command can detect these and provide you with the option to add them to the list of models registered with LLM.\n\n```bash\nllm mlx import-models\n```\nThis will open an interface like this one:\n```\nAvailable models (\u2191/\u2193 to navigate, SPACE to select, ENTER to 
confirm, Ctrl+C to quit):\n> \u25cb (llama) mlx-community/DeepSeek-R1-Distill-Llama-8B (already imported)\n \u25cb (llama) mlx-community/Llama-3.2-3B-Instruct-4bit (already imported)\n \u25cb (llama) mlx-community/Llama-3.3-70B-Instruct-4bit\n \u25cb (mistral) mlx-community/Mistral-7B-Instruct-v0.3-4bit (already imported)\n \u25cb (mistral) mlx-community/Mistral-Small-24B-Instruct-2501-4bit\n```\nNavigate <up> and <down>, hit `<space>` to select models to import and then hit `<enter>` to confirm.\n\n## Using models from Python\n\nIf you have registered models with the `llm download-model` command you can use in Python like this:\n```python\nimport llm\nmodel = llm.get_model(\"mlx-community/Llama-3.2-3B-Instruct-4bit\")\nprint(model.prompt(\"hi\").text())\n```\nYou can avoid that registration step entirely by accessing the models like this instead:\n```python\nfrom llm_mlx import MlxModel\nmodel = MlxModel(\"mlx-community/Llama-3.2-3B-Instruct-4bit\")\nprint(model.prompt(\"hi\").text())\n# Outputs: How can I assist you today?\n```\n\nThe [LLM Python API documentation](https://llm.datasette.io/en/stable/python-api.html) has more details on how to use LLM models.\n\n## Development\n\nTo set up this plugin locally, first checkout the code. Then create a new virtual environment:\n```bash\ncd llm-mlx\npython -m venv venv\nsource venv/bin/activate\n```\nNow install the dependencies and test dependencies:\n```bash\nllm install -e '.[test]'\n```\nTo run the tests:\n```bash\npython -m pytest\n```\n",
"bugtrack_url": null,
"license": "Apache-2.0",
"summary": "Support for MLX models in LLM",
"version": "0.3",
"project_urls": {
"CI": "https://github.com/simonw/llm-mlx/actions",
"Changelog": "https://github.com/simonw/llm-mlx/releases",
"Homepage": "https://github.com/simonw/llm-mlx",
"Issues": "https://github.com/simonw/llm-mlx/issues"
},
"split_keywords": [],
"urls": [
{
"comment_text": null,
"digests": {
"blake2b_256": "2b93b4583797749f1f13412cf7560e4c0f05fd4cf80032507f967de9d167e196",
"md5": "9d0a220e686cae77ff99d3ecb8bbc0ef",
"sha256": "bf4490ca1e8332bdd2b1d3e88da8a84158f176cdeb616e300b68a265c7116476"
},
"downloads": -1,
"filename": "llm_mlx-0.3-py3-none-any.whl",
"has_sig": false,
"md5_digest": "9d0a220e686cae77ff99d3ecb8bbc0ef",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.9",
"size": 10877,
"upload_time": "2025-02-17T01:49:58",
"upload_time_iso_8601": "2025-02-17T01:49:58.216229Z",
"url": "https://files.pythonhosted.org/packages/2b/93/b4583797749f1f13412cf7560e4c0f05fd4cf80032507f967de9d167e196/llm_mlx-0.3-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": null,
"digests": {
"blake2b_256": "d11647abf862f49fcfc21b9f1eed875acc8a489ccc6e318a0d5ff7014a207ab9",
"md5": "cf2cd2c9e3b88c9f2c6a9c31a39835b7",
"sha256": "9ca248b96a7099fc14d0bd43953a202de937c25a71c2aeb906a601bd2805ca72"
},
"downloads": -1,
"filename": "llm_mlx-0.3.tar.gz",
"has_sig": false,
"md5_digest": "cf2cd2c9e3b88c9f2c6a9c31a39835b7",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.9",
"size": 10827,
"upload_time": "2025-02-17T01:50:00",
"upload_time_iso_8601": "2025-02-17T01:50:00.165983Z",
"url": "https://files.pythonhosted.org/packages/d1/16/47abf862f49fcfc21b9f1eed875acc8a489ccc6e318a0d5ff7014a207ab9/llm_mlx-0.3.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2025-02-17 01:50:00",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "simonw",
"github_project": "llm-mlx",
"travis_ci": false,
"coveralls": false,
"github_actions": true,
"lcname": "llm-mlx"
}