llm-llama-cpp

Name: llm-llama-cpp
Version: 0.3
Summary: LLM plugin for running models using llama.cpp
Author: Simon Willison
License: Apache-2.0
Upload time: 2023-12-09 05:49:28
Requirements: none recorded
# llm-llama-cpp

[![PyPI](https://img.shields.io/pypi/v/llm-llama-cpp.svg)](https://pypi.org/project/llm-llama-cpp/)
[![Changelog](https://img.shields.io/github/v/release/simonw/llm-llama-cpp?include_prereleases&label=changelog)](https://github.com/simonw/llm-llama-cpp/releases)
[![Tests](https://github.com/simonw/llm-llama-cpp/workflows/Test/badge.svg)](https://github.com/simonw/llm-llama-cpp/actions?query=workflow%3ATest)
[![License](https://img.shields.io/badge/license-Apache%202.0-blue.svg)](https://github.com/simonw/llm-llama-cpp/blob/main/LICENSE)

[LLM](https://llm.datasette.io/) plugin for running models using [llama.cpp](https://github.com/ggerganov/llama.cpp)

## Installation

Install this plugin in the same environment as `llm`.
```bash
llm install llm-llama-cpp
```
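Once installed, `llm-llama-cpp` should show up when you list LLM's plugins:
```bash
llm plugins
```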
The plugin has an additional dependency on [llama-cpp-python](https://github.com/abetlen/llama-cpp-python), which needs to be installed separately.

If you have a C compiler available on your system, you can install it like so:
```bash
llm install llama-cpp-python
```
You could also try installing one of the wheels made available in their [latest release](https://github.com/abetlen/llama-cpp-python/releases/latest) on GitHub. Find the URL to the wheel for your platform, if one exists, and run:
```bash
llm install https://...
```
If you are on an Apple Silicon Mac, you can try this command, which should compile the package with Metal support for running on your GPU:

```bash
CMAKE_ARGS="-DLLAMA_METAL=on" FORCE_CMAKE=1 llm install llama-cpp-python
```
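Whichever route you take, one quick way to check that `llama-cpp-python` ended up in the right place is to import it with the same Python interpreter that `llm` uses (assuming `python` here points at that environment):
```bash
python -c "import llama_cpp; print(llama_cpp.__version__)"
```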
## Running a GGUF model directly

The quickest way to try this plugin out is to download a GGUF file and execute it using the `gguf` model with the `-o path PATH` option.

For example, download the `una-cybertron-7b-v2-bf16.Q8_0.gguf` file from [TheBloke/una-cybertron-7B-v2-GGUF](https://huggingface.co/TheBloke/una-cybertron-7B-v2-GGUF/tree/main) and execute it like this:

```bash
llm -m gguf \
  -o path una-cybertron-7b-v2-bf16.Q8_0.gguf \
  'Instruction: Five reasons to get a pet walrus
Response:'
```
The output starts like this:

>  1. Walruses are fascinating animals that possess unique qualities that can captivate and entertain you for hours on end. Getting the chance to be around one regularly would ensure that you'll never run out of interesting things to learn about them, whether from an educational or personal standpoint.
>
> 2. Pet walruses can help alleviate depression and anxiety, as they require constant care and attention. Nurturing a relationship with these intelligent creatures provides comfort and fulfillment, fostering a sense of purpose in your daily life. Moreover, their playful nature encourages laughter and joy, promoting overall happiness. [...]

## Adding models

You can also add or download models to execute them directly using the `-m` option.

This tool should work with any model that works with `llama.cpp`.

The plugin can download models for you. Try running this command:

```bash
llm llama-cpp download-model \
  https://huggingface.co/TheBloke/Llama-2-7b-Chat-GGUF/resolve/main/llama-2-7b-chat.Q6_K.gguf \
  --alias llama2-chat --alias l2c --llama2-chat
```
This will download the Llama 2 7B Chat GGUF model file (this one is 5.53GB), save it and register it with the plugin - with two aliases, `llama2-chat` and `l2c`.

The `--llama2-chat` option configures it to run using a special Llama 2 Chat prompt format. You should omit this for models that are not Llama 2 Chat models.
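Once the download completes, the new model and its aliases should appear alongside your other models in LLM's standard listing:
```bash
llm models
```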

If you have already downloaded a `llama.cpp` compatible model, you can tell the plugin to read it from its current location like this:

```bash
llm llama-cpp add-model path/to/llama-2-7b-chat.Q6_K.gguf \
  --alias l27c --llama2-chat
```
The model filename (minus the `.gguf` extension) will be registered as its ID for executing the model.

You can also set one or more aliases using the `--alias` option.

You can see a list of the models you have registered this way like so:
```bash
llm llama-cpp models
```
Models are registered in a `models.json` file. To find the path to that file, so you can edit it directly, run:
```bash
llm llama-cpp models-file
```
For example, to edit that file in Vim:
```bash
vim "$(llm llama-cpp models-file)"
```
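If you just want to inspect the current registrations without editing anything, you can print the file instead (the exact layout of the JSON is managed by the plugin):
```bash
cat "$(llm llama-cpp models-file)"
```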
To find the directory with downloaded models, run:
```bash
llm llama-cpp models-dir
```
Here's how to change to that directory:
```bash
cd "$(llm llama-cpp models-dir)"
```
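These GGUF files are several gigabytes each, so it can be worth keeping an eye on how much disk space they occupy:
```bash
du -sh "$(llm llama-cpp models-dir)"
```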

## Running a prompt through a model

Once you have downloaded and added a model, you can run a prompt like this:
```bash
llm -m llama-2-7b-chat.Q6_K 'five names for a cute pet skunk'
```
Or, if you registered an alias, you can use that instead:
```bash
llm -m llama2-chat 'five creative names for a pet hedgehog'
```
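For a longer back-and-forth with the same model you can also start an interactive session, using LLM's built-in `chat` command:
```bash
llm chat -m llama2-chat
```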

## More models to try

### Llama 2 7B

This model is Llama 2 7B GGUF without the chat training. You'll need to prompt it slightly differently:
```bash
llm llama-cpp download-model \
  https://huggingface.co/TheBloke/Llama-2-7B-GGUF/resolve/main/llama-2-7b.Q6_K.gguf \
  --alias llama2
```
Try prompts that expect to be completed by the model, for example:
```bash
llm -m llama2 'Three fancy names for a posh albatross are:'
```
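Base models like this will keep generating for as long as they are allowed to, so it can be useful to cap the output with the `max_tokens` option described in the Options section below, for example:
```bash
llm -m llama2 -o max_tokens 50 'Three fancy names for a posh albatross are:'
```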
### Llama 2 Chat 13B

This model is the Llama 2 13B Chat GGUF model - a 10.7GB download:
```bash
llm llama-cpp download-model \
  'https://huggingface.co/TheBloke/Llama-2-13B-chat-GGUF/resolve/main/llama-2-13b-chat.Q6_K.gguf' \
  -a llama2-chat-13b --llama2-chat
```

### Llama 2 Python 13B

This model is the CodeLlama 13B Python GGUF model - a 9.24GB download:
```bash
llm llama-cpp download-model \
  'https://huggingface.co/TheBloke/CodeLlama-13B-Python-GGUF/resolve/main/codellama-13b-python.Q5_K_M.gguf' \
  -a llama2-python-13b --llama2-chat
```
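Once downloaded you can prompt it for code in the usual way, for example:
```bash
llm -m llama2-python-13b 'Write a Python function that reverses a string'
```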

## Options

The following options are available:

- `-o verbose 1` - output more verbose logging
- `-o max_tokens 100` - max tokens to return. Defaults to 4000.
- `-o no_gpu 1` - remove the default `n_gpu_layers=1` argument, which should disable GPU usage
- `-o n_gpu_layers 10` - increase the `n_gpu_layers` argument to a higher value (the default is `1`)
- `-o n_ctx 1024` - set the `n_ctx` argument to `1024` (the default is `4000`)

For example:

```bash
llm chat -m llama2-chat-13b -o n_ctx 1024
```
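Options can also be combined in a single invocation - the values here are just illustrative:
```bash
llm -m llama2-chat \
  -o n_gpu_layers 16 \
  -o max_tokens 250 \
  'A short poem about a pet walrus'
```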

These are mainly provided to support experimenting with different ways of executing the underlying model.

## Development

To set up this plugin locally, first check out the code. Then create a new virtual environment:
```bash
cd llm-llama-cpp
python3 -m venv venv
source venv/bin/activate
```
Now install the dependencies and test dependencies:
```bash
pip install -e '.[test]'
```
To run the tests:
```bash
pytest
```
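pytest's usual flags work here as well, for example to stop on the first failure with more verbose output:
```bash
pytest -x -vv
```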

            

Raw data

            {
    "_id": null,
    "home_page": "",
    "name": "llm-llama-cpp",
    "maintainer": "",
    "docs_url": null,
    "requires_python": "",
    "maintainer_email": "",
    "keywords": "",
    "author": "Simon Willison",
    "author_email": "",
    "download_url": "https://files.pythonhosted.org/packages/5c/3a/0163b63b1cce4b52877fd29bd25ec1b1335f2757c785d3f022b0f5c44997/llm-llama-cpp-0.3.tar.gz",
    "platform": null,
    "description": "# llm-llama-cpp\n\n[![PyPI](https://img.shields.io/pypi/v/llm-llama-cpp.svg)](https://pypi.org/project/llm-llama-cpp/)\n[![Changelog](https://img.shields.io/github/v/release/simonw/llm-llama-cpp?include_prereleases&label=changelog)](https://github.com/simonw/llm-llama-cpp/releases)\n[![Tests](https://github.com/simonw/llm-llama-cpp/workflows/Test/badge.svg)](https://github.com/simonw/llm-llama-cpp/actions?query=workflow%3ATest)\n[![License](https://img.shields.io/badge/license-Apache%202.0-blue.svg)](https://github.com/simonw/llm-llama-cpp/blob/main/LICENSE)\n\n[LLM](https://llm.datasette.io/) plugin for running models using [llama.cpp](https://github.com/ggerganov/llama.cpp)\n\n## Installation\n\nInstall this plugin in the same environment as `llm`.\n```bash\nllm install llm-llama-cpp\n```\nThe plugin has an additional dependency on [llama-cpp-python](https://github.com/abetlen/llama-cpp-python) which needs to be installed separately.\n\nIf you have a C compiler available on your system you can install that like so:\n```bash\nllm install llama-cpp-python\n```\nYou could also try installing one of the wheels made available in their [latest release](https://github.com/abetlen/llama-cpp-python/releases/latest) on GitHub. Find the URL to the wheel for your platform, if one exists, and run:\n```bash\nllm install https://...\n```\nIf you are on an Apple Silicon Mac you can try this command, which should compile the package with METAL support for running on your GPU:\n\n```bash\nCMAKE_ARGS=\"-DLLAMA_METAL=on\" FORCE_CMAKE=1 llm install llama-cpp-python\n```\n## Running a GGUF model directly\n\nThe quickest way to try this plugin out is to download a GGUF file and execute that using the `gguf` model with the `-o path PATH` option:\n\nFor example, download the `una-cybertron-7b-v2-bf16.Q8_0.gguf` file from [TheBloke/una-cybertron-7B-v2-GGUF](https://huggingface.co/TheBloke/una-cybertron-7B-v2-GGUF/tree/main) and execute it like this:\n\n```bash\nllm -m gguf \\\n  -o path una-cybertron-7b-v2-bf16.Q8_0.gguf \\\n  'Instruction: Five reasons to get a pet walrus\nResponse:'\n```\nThe output starts like this:\n\n>  1. Walruses are fascinating animals that possess unique qualities that can captivate and entertain you for hours on end. Getting the chance to be around one regularly would ensure that you'll never run out of interesting things to learn about them, whether from an educational or personal standpoint.\n>\n> 2. Pet walruses can help alleviate depression and anxiety, as they require constant care and attention. Nurturing a relationship with these intelligent creatures provides comfort and fulfillment, fostering a sense of purpose in your daily life. Moreover, their playful nature encourages laughter and joy, promoting overall happiness. [...]\n\n## Adding models\n\nYou can also add or download models to execute them directly using the `-m` option.\n\nThis tool should work with any model that works with `llama.cpp`.\n\nThe plugin can download models for you. Try running this command:\n\n```bash\nllm llama-cpp download-model \\\n  https://huggingface.co/TheBloke/Llama-2-7b-Chat-GGUF/resolve/main/llama-2-7b-chat.Q6_K.gguf \\\n  --alias llama2-chat --alias l2c --llama2-chat\n```\nThis will download the Llama 2 7B Chat GGUF model file (this one is 5.53GB), save it and register it with the plugin - with two aliases, `llama2-chat` and `l2c`.\n\nThe `--llama2-chat` option configures it to run using a special Llama 2 Chat prompt format. 
You should omit this for models that are not Llama 2 Chat models.\n\nIf you have already downloaded a `llama.cpp` compatible model you can tell the plugin to read it from its current location like this:\n\n```bash\nllm llama-cpp add-model path/to/llama-2-7b-chat.Q6_K.gguf \\\n  --alias l27c --llama2-chat\n```\nThe model filename (minus the `.gguf` extension) will be registered as its ID for executing the model.\n\nYou can also set one or more aliases using the `--alias` option.\n\nYou can see a list of models you have registered in this way like this:\n```bash\nllm llama-cpp models\n```\nModels are registered in a `models.json` file. You can find the path to that file in order to edit it directly like so:\n```bash\nllm llama-cpp models-file\n```\nFor example, to edit that file in Vim:\n```bash\nvim \"$(llm llama-cpp models-file)\"\n```\nTo find the directory with downloaded models, run:\n```bash\nllm llama-cpp models-dir\n```\nHere's how to change to that directory:\n```bash\ncd \"$(llm llama-cpp models-dir)\"\n```\n\n## Running a prompt through a model\n\nOnce you have downloaded and added a model, you can run a prompt like this:\n```bash\nllm -m llama-2-7b-chat.Q6_K 'five names for a cute pet skunk'\n```\nOr if you registered an alias you can use that instead:\n```bash\nllm -m llama2-chat 'five creative names for a pet hedgehog'\n```\n\n## More models to try\n\n### Llama 2 7B\n\nThis model is Llama 2 7B GGML without the chat training. You'll need to prompt it slightly differently:\n```bash\nllm llama-cpp download-model \\\n  https://huggingface.co/TheBloke/Llama-2-7B-GGUF/resolve/main/llama-2-7b.Q6_K.gguf \\\n  --alias llama2\n```\nTry prompts that expect to be completed by the model, for example:\n```bash\nllm -m llama2 'Three fancy names for a posh albatross are:'\n```\n### Llama 2 Chat 13B\n\nThis model is the Llama 2 13B Chat GGML model - a 10.7GB download:\n```bash\nllm llama-cpp download-model \\\n  'https://huggingface.co/TheBloke/Llama-2-13B-chat-GGUF/resolve/main/llama-2-13b-chat.Q6_K.gguf'\\\n  -a llama2-chat-13b --llama2-chat\n```\n\n### Llama 2 Python 13B\n\nThis model is the Llama 2 13B Python GGML model - a 9.24GB download:\n```bash\nllm llama-cpp download-model \\\n  'https://huggingface.co/TheBloke/CodeLlama-13B-Python-GGUF/resolve/main/codellama-13b-python.Q5_K_M.gguf'\\\n  -a llama2-python-13b --llama2-chat\n```\n\n## Options\n\nThe following options are available:\n\n- `-o verbose 1` - output more verbose logging\n- `-o max_tokens 100` - max tokens to return. Defaults to 4000.\n- `-o no_gpu 1` - remove the default `n_gpu_layers=1`` argument, which should disable GPU usage\n- `-o n_gpu_layers 10` - increase the `n_gpu_layers` argument to a higher value (the default is `1`)\n- `-o n_ctx 1024` - set the `n_ctx` argument to `1024` (the default is `4000`)\n\nFor example:\n\n```bash\nllm chat -m llama2-chat-13b -o n_ctx 1024\n```\n\nThese are mainly provided to support experimenting with different ways of executing the underlying model.\n\n## Development\n\nTo set up this plugin locally, first checkout the code. Then create a new virtual environment:\n```bash\ncd llm-llama-cpp\npython3 -m venv venv\nsource venv/bin/activate\n```\nNow install the dependencies and test dependencies:\n```bash\npip install -e '.[test]'\n```\nTo run the tests:\n```bash\npytest\n```\n",
    "bugtrack_url": null,
    "license": "Apache-2.0",
    "summary": "LLM plugin for running models using llama.cpp",
    "version": "0.3",
    "project_urls": {
        "CI": "https://github.com/simonw/llm-llama-cpp/actions",
        "Changelog": "https://github.com/simonw/llm-llama-cpp/releases",
        "Homepage": "https://github.com/simonw/llm-llama-cpp",
        "Issues": "https://github.com/simonw/llm-llama-cpp/issues"
    },
    "split_keywords": [],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "129d696c621382c439ccff25535e3b09fad5c990cfae11ec4bdf65c2eaecd690",
                "md5": "1bc71704c36b1dd7d01f53c24c24a8e8",
                "sha256": "a67d30ca7724a202b96d6d1f6dd9cbf1a6e44bde03db03820a487282eb857588"
            },
            "downloads": -1,
            "filename": "llm_llama_cpp-0.3-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "1bc71704c36b1dd7d01f53c24c24a8e8",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": null,
            "size": 11627,
            "upload_time": "2023-12-09T05:49:26",
            "upload_time_iso_8601": "2023-12-09T05:49:26.702530Z",
            "url": "https://files.pythonhosted.org/packages/12/9d/696c621382c439ccff25535e3b09fad5c990cfae11ec4bdf65c2eaecd690/llm_llama_cpp-0.3-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "5c3a0163b63b1cce4b52877fd29bd25ec1b1335f2757c785d3f022b0f5c44997",
                "md5": "12a81286f02b77e3dbfa9ba29b10d08c",
                "sha256": "8de7ab99fc62510e96ff8d5ca084aebf7b8f6051eae114409a7c35cd6d26c557"
            },
            "downloads": -1,
            "filename": "llm-llama-cpp-0.3.tar.gz",
            "has_sig": false,
            "md5_digest": "12a81286f02b77e3dbfa9ba29b10d08c",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": null,
            "size": 11469,
            "upload_time": "2023-12-09T05:49:28",
            "upload_time_iso_8601": "2023-12-09T05:49:28.094016Z",
            "url": "https://files.pythonhosted.org/packages/5c/3a/0163b63b1cce4b52877fd29bd25ec1b1335f2757c785d3f022b0f5c44997/llm-llama-cpp-0.3.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2023-12-09 05:49:28",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "simonw",
    "github_project": "llm-llama-cpp",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": true,
    "lcname": "llm-llama-cpp"
}
        