# llama-cpp-cffi
<!--
[![Build][build-image]]()
[![Status][status-image]][pypi-project-url]
[![Stable Version][stable-ver-image]][pypi-project-url]
[![Coverage][coverage-image]]()
[![Python][python-ver-image]][pypi-project-url]
[![License][mit-image]][mit-url]
-->
[![PyPI](https://img.shields.io/pypi/v/llama-cpp-cffi)](https://pypi.org/project/llama-cpp-cffi/)
[![Supported Versions](https://img.shields.io/pypi/pyversions/llama-cpp-cffi)](https://pypi.org/project/llama-cpp-cffi)
[![PyPI Downloads](https://img.shields.io/pypi/dm/llama-cpp-cffi)](https://pypistats.org/packages/llama-cpp-cffi)
[![Github Downloads](https://img.shields.io/github/downloads/tangledgroup/llama-cpp-cffi/total.svg?label=Github%20Downloads)]()
[![License: MIT](https://img.shields.io/badge/license-MIT-blue.svg)](https://opensource.org/licenses/MIT)
**Python** 3.10+ binding for [llama.cpp](https://github.com/ggerganov/llama.cpp) using **cffi**. Supports **CPU**, **Vulkan 1.x** (AMD, Intel and Nvidia GPUs) and **CUDA 12.6** (Nvidia GPUs) runtimes on **x86_64** and **aarch64** platforms.

NOTE: **Linux** is currently the only supported operating system (`manylinux_2_28` and `musllinux_1_2` wheels), but we are working on both **Windows** and **macOS** versions.
## News
- **Jan 15 2025, v0.4.15**: Dynamically load/unload models while executing prompts in parallel.
- **Jan 14 2025, v0.4.14**: Modular llama.cpp build using the `cmake` build system; the `make` build system is deprecated.
- **Jan 1 2025, v0.3.1**: OpenAI-compatible API for **text** and **vision** models. Added support for **Qwen2-VL** models. Hot-swap of models on demand in server/API.
- **Dec 9 2024, v0.2.0**: Low-level and high-level APIs: llama, llava, clip and ggml.
- **Nov 27 2024, v0.1.22**: Support for multimodal models such as **llava** and **minicpmv**.
## Install
Basic library install:
```bash
pip install llama-cpp-cffi
```
If you want an API compatible with the [OpenAI © Chat Completions API](https://platform.openai.com/docs/overview):
```bash
pip install "llama-cpp-cffi[openai]"
```
**IMPORTANT:** To take advantage of **Nvidia** GPU acceleration, make sure you have **CUDA 12.x** installed. If you don't, follow the instructions at https://developer.nvidia.com/cuda-downloads.

Supported GPU compute capabilities are `compute_61`, `compute_70`, `compute_75`, `compute_80`, `compute_86` and `compute_89`, covering most GPUs from the **GeForce GTX 1050** to the **NVIDIA H100**. See [GPU Compute Capability](https://developer.nvidia.com/cuda-gpus).
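If you are unsure which compute capability your GPU has, `nvidia-smi` can report it. A minimal sketch (not part of the library; the `compute_cap` query field assumes a reasonably recent Nvidia driver):

```python
import subprocess

# Query the name and compute capability of each visible GPU via nvidia-smi.
out = subprocess.run(
    ['nvidia-smi', '--query-gpu=name,compute_cap', '--format=csv,noheader'],
    capture_output=True,
    text=True,
    check=True,
)

supported = {'6.1', '7.0', '7.5', '8.0', '8.6', '8.9'}  # compute_61 .. compute_89

for line in out.stdout.strip().splitlines():
    name, cap = [part.strip() for part in line.split(',')]
    status = 'supported' if cap in supported else 'not covered by the prebuilt CUDA wheels'
    print(f'{name}: compute capability {cap} ({status})')
```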
## LLM Example
```python
from llama import Model

#
# first define and load/init model
#
model = Model(
    creator_hf_repo='HuggingFaceTB/SmolLM2-1.7B-Instruct',
    hf_repo='bartowski/SmolLM2-1.7B-Instruct-GGUF',
    hf_file='SmolLM2-1.7B-Instruct-Q4_K_M.gguf',
)

model.init(n_ctx=8 * 1024, gpu_layers=99)

#
# messages
#
messages = [
    {'role': 'system', 'content': 'You are a helpful assistant.'},
    {'role': 'user', 'content': '1 + 1 = ?'},
    {'role': 'assistant', 'content': '2'},
    {'role': 'user', 'content': 'Evaluate 1 + 2 in Python.'},
]

completions = model.completions(
    messages=messages,
    predict=1 * 1024,
    temp=0.7,
    top_p=0.8,
    top_k=100,
)

for chunk in completions:
    print(chunk, flush=True, end='')

#
# prompt
#
prompt = 'Evaluate 1 + 2 in Python. Result in Python is'

completions = model.completions(
    prompt=prompt,
    predict=1 * 1024,
    temp=0.7,
    top_p=0.8,
    top_k=100,
)

for chunk in completions:
    print(chunk, flush=True, end='')
```
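`completions` is a plain iterator over text chunks, so if you don't need streaming output you can simply join it into a single string. A usage sketch based on the example above:

```python
# Collect the streamed chunks into one response string instead of printing them.
response = ''.join(model.completions(
    messages=messages,
    predict=1 * 1024,
    temp=0.7,
    top_p=0.8,
    top_k=100,
))

print(response)
```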
### References
- `examples/llm.py`
- `examples/demo_text.py`
## VLM Example
```python
from llama import Model

#
# first define and load/init model
#
model = Model(  # 1.87B
    creator_hf_repo='vikhyatk/moondream2',
    hf_repo='vikhyatk/moondream2',
    hf_file='moondream2-text-model-f16.gguf',
    mmproj_hf_file='moondream2-mmproj-f16.gguf',
)

model.init(n_ctx=8 * 1024, gpu_layers=99)

#
# prompt
#
prompt = 'Describe this image.'
image = 'examples/llama-1.png'

completions = model.completions(
    prompt=prompt,
    image=image,
    predict=1 * 1024,
)

for chunk in completions:
    print(chunk, flush=True, end='')
```
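Since `completions` takes the image as a plain argument, describing several images is just a loop over the same call. A small sketch reusing the model initialized above (the second image path is a hypothetical placeholder):

```python
# Run the same prompt over several images, one completions() call per image.
image_paths = [
    'examples/llama-1.png',
    'examples/llama-2.png',  # hypothetical second image
]

for path in image_paths:
    print(f'--- {path} ---')

    for chunk in model.completions(prompt='Describe this image.', image=path, predict=1 * 1024):
        print(chunk, flush=True, end='')

    print()
```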
### References
- `examples/vlm.py`
- `examples/demo_llava.py`
- `examples/demo_minicpmv.py`
- `examples/demo_qwen2vl.py`
## API
### Server - llama-cpp-cffi + OpenAI API
Run the server first:
```bash
python -m llama.server
# or
python -B -u -m gunicorn --bind '0.0.0.0:11434' --timeout 900 --workers 1 --worker-class aiohttp.GunicornWebWorker 'llama.server:build_app()'
```
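The gunicorn invocation above points at `llama.server:build_app()`, which (as the aiohttp worker class implies) returns a standard aiohttp application, so the server can also be started from Python directly. A minimal sketch under that assumption:

```python
from aiohttp import web

from llama.server import build_app

# Build the aiohttp application and serve it on the same port used in the examples below.
web.run_app(build_app(), host='0.0.0.0', port=11434)
```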
### Client - llama-cpp-cffi API / curl
```bash
#
# llm
#
curl -XPOST 'http://localhost:11434/api/1.0/completions' \
-H "Content-Type: application/json" \
-d '{
    "gpu_layers": 99,
    "prompt": "Evaluate 1 + 2 in Python."
}'

curl -XPOST 'http://localhost:11434/api/1.0/completions' \
-H "Content-Type: application/json" \
-d '{
    "creator_hf_repo": "HuggingFaceTB/SmolLM2-1.7B-Instruct",
    "hf_repo": "bartowski/SmolLM2-1.7B-Instruct-GGUF",
    "hf_file": "SmolLM2-1.7B-Instruct-Q4_K_M.gguf",
    "gpu_layers": 99,
    "prompt": "Evaluate 1 + 2 in Python."
}'

curl -XPOST 'http://localhost:11434/api/1.0/completions' \
-H "Content-Type: application/json" \
-d '{
    "creator_hf_repo": "Qwen/Qwen2.5-0.5B-Instruct",
    "hf_repo": "Qwen/Qwen2.5-0.5B-Instruct-GGUF",
    "hf_file": "qwen2.5-0.5b-instruct-q4_k_m.gguf",
    "gpu_layers": 99,
    "prompt": "Evaluate 1 + 2 in Python."
}'

curl -XPOST 'http://localhost:11434/api/1.0/completions' \
-H "Content-Type: application/json" \
-d '{
    "creator_hf_repo": "Qwen/Qwen2.5-7B-Instruct",
    "hf_repo": "bartowski/Qwen2.5-7B-Instruct-GGUF",
    "hf_file": "Qwen2.5-7B-Instruct-Q4_K_M.gguf",
    "gpu_layers": 99,
    "prompt": "Evaluate 1 + 2 in Python."
}'

#
# vlm - example 1
#
image_path="examples/llama-1.jpg"
mime_type=$(file -b --mime-type "$image_path")
base64_data=$(base64 -w 0 "$image_path")

cat << EOF > /tmp/temp.json
{
    "creator_hf_repo": "Qwen/Qwen2-VL-2B-Instruct",
    "hf_repo": "bartowski/Qwen2-VL-2B-Instruct-GGUF",
    "hf_file": "Qwen2-VL-2B-Instruct-Q4_K_M.gguf",
    "mmproj_hf_file": "mmproj-Qwen2-VL-2B-Instruct-f16.gguf",
    "gpu_layers": 99,
    "prompt": "Describe this image.",
    "image": "data:$mime_type;base64,$base64_data"
}
EOF

curl -XPOST 'http://localhost:11434/api/1.0/completions' \
-H "Content-Type: application/json" \
--data-binary "@/tmp/temp.json"

#
# vlm - example 2
#
image_path="examples/llama-1.jpg"
mime_type=$(file -b --mime-type "$image_path")
base64_data=$(base64 -w 0 "$image_path")

cat << EOF > /tmp/temp.json
{
    "creator_hf_repo": "Qwen/Qwen2-VL-2B-Instruct",
    "hf_repo": "bartowski/Qwen2-VL-2B-Instruct-GGUF",
    "hf_file": "Qwen2-VL-2B-Instruct-Q4_K_M.gguf",
    "mmproj_hf_file": "mmproj-Qwen2-VL-2B-Instruct-f16.gguf",
    "gpu_layers": 99,
    "messages": [
        {"role": "user", "content": [
            {"type": "text", "text": "Describe this image."},
            {
                "type": "image_url",
                "image_url": {"url": "data:$mime_type;base64,$base64_data"}
            }
        ]}
    ]
}
EOF

curl -XPOST 'http://localhost:11434/api/1.0/completions' \
-H "Content-Type: application/json" \
--data-binary "@/tmp/temp.json"
```
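The same endpoint can of course be called from Python. A sketch using `requests` that mirrors the VLM curl example above; the data-URL construction uses only the standard library, and handling the reply via `iter_content` is an assumption that works whether or not the server streams its response:

```python
import base64
import mimetypes

import requests

# Build a data URL for the image, equivalent to the `file`/`base64` shell commands above.
image_path = 'examples/llama-1.jpg'
mime_type, _ = mimetypes.guess_type(image_path)

with open(image_path, 'rb') as f:
    base64_data = base64.b64encode(f.read()).decode()

payload = {
    'creator_hf_repo': 'Qwen/Qwen2-VL-2B-Instruct',
    'hf_repo': 'bartowski/Qwen2-VL-2B-Instruct-GGUF',
    'hf_file': 'Qwen2-VL-2B-Instruct-Q4_K_M.gguf',
    'mmproj_hf_file': 'mmproj-Qwen2-VL-2B-Instruct-f16.gguf',
    'gpu_layers': 99,
    'prompt': 'Describe this image.',
    'image': f'data:{mime_type};base64,{base64_data}',
}

# Print the response as it arrives; works for both streamed and non-streamed replies.
with requests.post('http://localhost:11434/api/1.0/completions', json=payload, stream=True) as resp:
    resp.raise_for_status()

    for chunk in resp.iter_content(chunk_size=None, decode_unicode=True):
        print(chunk, flush=True, end='')
```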
### Client - OpenAI © compatible Chat Completions API
```bash
#
# text
#
curl -XPOST 'http://localhost:11434/v1/chat/completions' \
-H "Content-Type: application/json" \
-d '{
    "model": "HuggingFaceTB/SmolLM2-1.7B-Instruct:bartowski/SmolLM2-1.7B-Instruct-GGUF:SmolLM2-1.7B-Instruct-Q4_K_M.gguf",
    "messages": [
        {
            "role": "user",
            "content": "Evaluate 1 + 2 in Python."
        }
    ],
    "n_ctx": 8192,
    "gpu_layers": 99
}'

#
# image
#
image_path="examples/llama-1.jpg"
mime_type=$(file -b --mime-type "$image_path")
base64_data=$(base64 -w 0 "$image_path")

cat << EOF > /tmp/temp.json
{
    "model": "Qwen/Qwen2-VL-2B-Instruct:bartowski/Qwen2-VL-2B-Instruct-GGUF:Qwen2-VL-2B-Instruct-Q4_K_M.gguf:mmproj-Qwen2-VL-2B-Instruct-f16.gguf",
    "messages": [
        {"role": "user", "content": [
            {"type": "text", "text": "Describe this image."},
            {
                "type": "image_url",
                "image_url": {"url": "data:$mime_type;base64,$base64_data"}
            }
        ]}
    ],
    "n_ctx": 8192,
    "gpu_layers": 99
}
EOF

curl -XPOST 'http://localhost:11434/v1/chat/completions' \
-H "Content-Type: application/json" \
--data-binary "@/tmp/temp.json"

#
# Client Python API for OpenAI
#
python -B examples/demo_openai.py
```
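The `examples/demo_openai.py` script referenced below uses the official `openai` Python client. A hedged sketch of what such a call can look like against this server: the `model` string format and the `n_ctx`/`gpu_layers` fields come from the curl example above (passed through `extra_body`), and the API key value is a placeholder since the server is local:

```python
from openai import OpenAI

# Point the official OpenAI client at the local llama-cpp-cffi server.
client = OpenAI(base_url='http://localhost:11434/v1', api_key='not-needed')

response = client.chat.completions.create(
    model='HuggingFaceTB/SmolLM2-1.7B-Instruct:bartowski/SmolLM2-1.7B-Instruct-GGUF:SmolLM2-1.7B-Instruct-Q4_K_M.gguf',
    messages=[
        {'role': 'user', 'content': 'Evaluate 1 + 2 in Python.'},
    ],
    # Server-specific options are not part of the OpenAI schema, so pass them via extra_body.
    extra_body={'n_ctx': 8192, 'gpu_layers': 99},
)

print(response.choices[0].message.content)
```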
### References
- `examples/demo_openai.py`