# python-llama-cpp-http
<!--
[![Build][build-image]]()
[![Status][status-image]][pypi-project-url]
[![Stable Version][stable-ver-image]][pypi-project-url]
[![Coverage][coverage-image]]()
[![Python][python-ver-image]][pypi-project-url]
[![License][mit-image]][mit-url]
-->
[Downloads](https://pypistats.org/packages/llama_cpp_http)
[PyPI](https://pypi.org/project/llama_cpp_http)
[License: MIT](https://opensource.org/licenses/MIT)
Python HTTP Server and [LangChain](https://python.langchain.com) LLM Client for [llama.cpp](https://github.com/ggerganov/llama.cpp).
The server exposes three routes:
- **call**: given a prompt, return the whole text completion at once: `POST` `/api/1.0/text/completion`
- **stream**: given a prompt, receive text chunks over a WebSocket: `GET` `/api/1.0/text/completion`
- **embeddings**: given a prompt, return its text embeddings: `POST` `/api/1.0/text/embeddings`
The LangChain LLM client supports synchronous calls only and is built on the Python packages `requests` and `websockets`.
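As a minimal sketch of the **call** route using only the standard library, assuming the request body is JSON with a `prompt` field (the field names here are assumptions, not the documented schema; `misc/example_client_call.py` is the authoritative example):

```python
import json
import urllib.request


def build_completion_request(prompt: str, host: str = "http://localhost:5000") -> urllib.request.Request:
    """Build a POST request for the /api/1.0/text/completion route.

    The JSON payload shape ({"prompt": ...}) is an assumption; check
    misc/example_client_call.py for the actual fields the server expects.
    """
    body = json.dumps({"prompt": prompt}).encode("utf-8")
    return urllib.request.Request(
        f"{host}/api/1.0/text/completion",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Example usage against a running server:
# with urllib.request.urlopen(build_completion_request("Hello")) as resp:
#     print(json.loads(resp.read()))
```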
## Install
```bash
pip install llama_cpp_http
```
## Manual install
This assumes the **GPU driver** and **OpenCL** / **CUDA** libraries are already installed.
Make sure you follow the instructions in `LLAMA_CPP.md` for one of the following backends:
- CPU - including Apple; recommended for beginners
- OpenCL for AMDGPU/NVIDIA: **CLBlast**
- HIP/ROCm for AMDGPU: **hipBLAS**
- CUDA for NVIDIA: **cuBLAS**
It is easiest to start with the **CPU**-based build of [llama.cpp](https://github.com/ggerganov/llama.cpp) if you do not want to deal with GPU drivers and libraries.
### Install build packages
- Arch/Manjaro: `sudo pacman -Sy base-devel python git jq`
- Debian/Ubuntu: `sudo apt install build-essential python3-dev python3-venv python3-pip libffi-dev libssl-dev git jq`
### Clone repo
```bash
git clone https://github.com/mtasic85/python-llama-cpp-http.git
cd python-llama-cpp-http
```
Make sure you are inside the cloned repo directory `python-llama-cpp-http`.
### Setup python venv
```bash
python -m venv venv
source venv/bin/activate
python -m ensurepip --upgrade
pip install -U .
```
## Clone and compile llama.cpp
```bash
git clone https://github.com/ggerganov/llama.cpp llama.cpp
cd llama.cpp
make -j
```
## Download Meta's Llama 2 7B Model
Download a GGUF model from https://huggingface.co/TheBloke/Llama-2-7B-GGUF to the local directory `models`.
We recommend https://huggingface.co/TheBloke/Llama-2-7B-GGUF/blob/main/llama-2-7b.Q2_K.gguf, which has the lowest requirements and fits in both RAM and VRAM.
## Run Server
```bash
python -m llama_cpp_http.server --backend cpu --models-path ./models --llama-cpp-path ./llama.cpp
```
Experimental:
```bash
gunicorn 'llama_cpp_http.server:get_gunicorn_app(backend="clblast", models_path="~/models", llama_cpp_path="~/llama.cpp-clblast", platforms_devices="0:0")' --reload --bind '0.0.0.0:5000' --worker-class aiohttp.GunicornWebWorker
```
## Run Client Examples
1) Simple text completion call `/api/1.0/text/completion`:
```bash
python -B misc/example_client_call.py | jq .
```
2) WebSocket stream `/api/1.0/text/completion`:
```bash
python -B misc/example_client_stream.py | jq -R '. as $line | try (fromjson) catch $line'
```
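The `jq` filter above falls back to the raw line whenever a streamed chunk is not valid JSON. The same guard in Python, as a sketch (the actual chunk format is defined by the server; see `misc/example_client_stream.py`):

```python
import json
from typing import Any


def parse_chunk(line: str) -> Any:
    """Decode one streamed chunk: parsed JSON if possible, otherwise the
    raw line unchanged (mirrors jq's `try (fromjson) catch $line`)."""
    try:
        return json.loads(line)
    except json.JSONDecodeError:
        return line


# Consuming the WebSocket stream would look roughly like this (requires
# `pip install websockets` and a running server; the URL and the outgoing
# message format are assumptions):
#
# import asyncio, websockets
# async def stream(prompt: str):
#     async with websockets.connect("ws://localhost:5000/api/1.0/text/completion") as ws:
#         await ws.send(json.dumps({"prompt": prompt}))
#         async for msg in ws:
#             print(parse_chunk(msg))
```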
3) Simple text embeddings call `/api/1.0/text/embeddings`:
```bash
python -B misc/example_client_langchain_embedding.py
```
## Licensing
**python-llama-cpp-http** is licensed under the MIT license. See the LICENSE file for details.