# python-llama-cpp-http
<!--
[![Build][build-image]]()
[![Status][status-image]][pypi-project-url]
[![Stable Version][stable-ver-image]][pypi-project-url]
[![Coverage][coverage-image]]()
[![Python][python-ver-image]][pypi-project-url]
[![License][mit-image]][mit-url]
-->
[Downloads](https://pypistats.org/packages/llama_cpp_http)
[PyPI](https://pypi.org/project/llama_cpp_http)
[License: MIT](https://opensource.org/licenses/MIT)
Python HTTP Server and [LangChain](https://python.langchain.com) LLM Client for [llama.cpp](https://github.com/ggerganov/llama.cpp).
The server exposes three routes:
- **call**: given a prompt, return the whole text completion at once: `POST` `/api/1.0/text/completion`
- **stream**: given a prompt, receive text chunks over a WebSocket: `GET` `/api/1.0/text/completion`
- **embeddings**: given a prompt, return its text embeddings: `POST` `/api/1.0/text/embeddings`
The LangChain LLM client supports synchronous calls only and is built on the Python packages `requests` and `websockets`.
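As a minimal sketch of the **call** route using only the standard library, assuming the request body is JSON with a `prompt` field (the field names here are assumptions, not the documented schema; `misc/example_client_call.py` is the authoritative example):

```python
import json
import urllib.request


def build_completion_request(prompt: str, host: str = "http://localhost:5000") -> urllib.request.Request:
    """Build a POST request for the /api/1.0/text/completion route.

    The JSON payload shape ({"prompt": ...}) is an assumption; check
    misc/example_client_call.py for the actual fields the server expects.
    """
    body = json.dumps({"prompt": prompt}).encode("utf-8")
    return urllib.request.Request(
        f"{host}/api/1.0/text/completion",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Example usage against a running server:
# with urllib.request.urlopen(build_completion_request("Hello")) as resp:
#     print(json.loads(resp.read()))
```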
## Install
```bash
pip install llama_cpp_http
```
## Manual install
This assumes the **GPU driver** and **OpenCL** / **CUDA** libraries are already installed.
Make sure you follow the instructions in `LLAMA_CPP.md` for one of the following backends:
- CPU - including Apple; recommended for beginners
- OpenCL for AMDGPU/NVIDIA: **CLBlast**
- HIP/ROCm for AMDGPU: **hipBLAS**
- CUDA for NVIDIA: **cuBLAS**
It is easiest to start with the **CPU**-based build of [llama.cpp](https://github.com/ggerganov/llama.cpp) if you do not want to deal with GPU drivers and libraries.
### Install build packages
- Arch/Manjaro: `sudo pacman -Sy base-devel python git jq`
- Debian/Ubuntu: `sudo apt install build-essential python3-dev python3-venv python3-pip libffi-dev libssl-dev git jq`
### Clone repo
```bash
git clone https://github.com/mtasic85/python-llama-cpp-http.git
cd python-llama-cpp-http
```
Make sure you are inside the cloned repo directory `python-llama-cpp-http`.
### Setup python venv
```bash
python -m venv venv
source venv/bin/activate
python -m ensurepip --upgrade
pip install -U .
```
## Clone and compile llama.cpp
```bash
git clone https://github.com/ggerganov/llama.cpp llama.cpp
cd llama.cpp
make -j
```
## Download Meta's Llama 2 7B Model
Download a GGUF model from https://huggingface.co/TheBloke/Llama-2-7B-GGUF to the local directory `models`.
We recommend https://huggingface.co/TheBloke/Llama-2-7B-GGUF/blob/main/llama-2-7b.Q2_K.gguf, which has the lowest requirements and fits in both RAM and VRAM.
## Run Server
```bash
python -m llama_cpp_http.server --backend cpu --models-path ./models --llama-cpp-path ./llama.cpp
```
Experimental:
```bash
gunicorn 'llama_cpp_http.server:get_gunicorn_app(backend="clblast", models_path="~/models", llama_cpp_path="~/llama.cpp-clblast", platforms_devices="0:0")' --reload --bind '0.0.0.0:5000' --worker-class aiohttp.GunicornWebWorker
```
## Run Client Examples
1) Simple text completion call `/api/1.0/text/completion`:
```bash
python -B misc/example_client_call.py | jq .
```
2) WebSocket stream `/api/1.0/text/completion`:
```bash
python -B misc/example_client_stream.py | jq -R '. as $line | try (fromjson) catch $line'
```
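The `jq` filter above falls back to the raw line whenever a streamed chunk is not valid JSON. The same guard in Python, as a sketch (the actual chunk format is defined by the server; see `misc/example_client_stream.py`):

```python
import json
from typing import Any


def parse_chunk(line: str) -> Any:
    """Decode one streamed chunk: parsed JSON if possible, otherwise the
    raw line unchanged (mirrors jq's `try (fromjson) catch $line`)."""
    try:
        return json.loads(line)
    except json.JSONDecodeError:
        return line


# Consuming the WebSocket stream would look roughly like this (requires
# `pip install websockets` and a running server; the URL and the outgoing
# message format are assumptions):
#
# import asyncio, websockets
# async def stream(prompt: str):
#     async with websockets.connect("ws://localhost:5000/api/1.0/text/completion") as ws:
#         await ws.send(json.dumps({"prompt": prompt}))
#         async for msg in ws:
#             print(parse_chunk(msg))
```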
3) Simple text embeddings call `/api/1.0/text/embeddings`:
```bash
python -B misc/example_client_langchain_embedding.py
```
## Licensing
**python-llama-cpp-http** is licensed under the MIT license. See the LICENSE file for details.