llama-server

- Name: llama-server
- Version: 0.1.0
- Home page: https://github.com/nuance1979/llama-server
- Summary: LLaMA Server combines the power of LLaMA C++ with the beauty of Chatbot UI.
- Upload time: 2023-04-12 04:40:06
- Author: Yi Su
- Requires Python: >=3.8
- License: MIT
- Keywords: llama, llama.cpp, chatbot-ui, chatbot

# LLaMA Server

[![PyPI version](https://img.shields.io/pypi/v/llama-server)](https://pypi.org/project/llama-server/)[![Unit test](https://github.com/nuance1979/llama-server/actions/workflows/test.yml/badge.svg?branch=main&&event=push)](https://github.com/nuance1979/llama-server/actions)[![GitHub stars](https://img.shields.io/github/stars/nuance1979/llama-server)](https://star-history.com/#nuance1979/llama-server&Date)[![GitHub license](https://img.shields.io/github/license/nuance1979/llama-server)](https://github.com/nuance1979/llama-server/blob/master/LICENSE)

LLaMA Server combines the power of [LLaMA C++](https://github.com/ggerganov/llama.cpp) (via [PyLLaMACpp](https://github.com/nomic-ai/pyllamacpp)) with the beauty of [Chatbot UI](https://github.com/mckaywrigley/chatbot-ui).

🦙LLaMA C++ (via 🐍PyLLaMACpp) ➕ 🤖Chatbot UI ➕ 🔗LLaMA Server 🟰 😊

**UPDATE**: Now supports better streaming through [PyLLaMACpp](https://github.com/nomic-ai/pyllamacpp)!

**UPDATE**: Now supports streaming!

## Demo
- Streaming

https://user-images.githubusercontent.com/10931178/229980159-61546fa6-2985-4cdc-8230-5dcb6a69c559.mov

- Non-streaming

https://user-images.githubusercontent.com/10931178/229408428-5b6ef72d-28d0-427f-ae83-e23972e2dcff.mov


## Setup

- Get your favorite LLaMA models:
  - Download them from [🤗Hugging Face](https://huggingface.co/models?sort=downloads&search=ggml);
  - Or follow the instructions at [LLaMA C++](https://github.com/ggerganov/llama.cpp);
  - Make sure the models are converted and quantized.

- Create a `models.yml` file to provide your `model_home` directory and add your favorite [South American camelids](https://en.wikipedia.org/wiki/Lama_(genus)), e.g.:
```yaml
model_home: /path/to/your/models
models:
  llama-7b:
    name: LLAMA-7B
    path: 7B/ggml-model-q4_0.bin  # relative to `model_home` or an absolute path
```
See [models.yml](https://github.com/nuance1979/llama-server/blob/main/models.yml) for an example.
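
The `path` rule from the comment in the YAML example (relative to `model_home`, or an absolute path) can be sketched in a few lines of Python. This is only an illustration, not LLaMA Server's actual code: the helper name `resolve_model_path` and the use of a plain dict in place of the parsed YAML are assumptions made for the example.

```python
from pathlib import Path


def resolve_model_path(config, model_id):
    """Return the model file path for `model_id` (hypothetical helper)."""
    models = config["models"]
    if model_id not in models:
        known = ", ".join(sorted(models))
        raise KeyError(f"unknown model id {model_id!r}; known ids: {known}")
    # Joining with pathlib keeps an absolute `path` unchanged and
    # resolves a relative one against `model_home`.
    return Path(config["model_home"]) / models[model_id]["path"]


# The same structure as the YAML example, written as a dict.
config = {
    "model_home": "/path/to/your/models",
    "models": {
        "llama-7b": {"name": "LLAMA-7B", "path": "7B/ggml-model-q4_0.bin"},
    },
}
print(resolve_model_path(config, "llama-7b"))
```

Relying on `pathlib`'s join behaviour avoids a separate branch for absolute paths.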

- Set up a Python environment:
```bash
conda create -n llama python=3.9
conda activate llama
```

- Install LLaMA Server:
  - From PyPI:
  ```bash
  python -m pip install llama-server
  ```
  - Or from source:
  ```bash
  python -m pip install git+https://github.com/nuance1979/llama-server.git
  ```

- Install a patched version of PyLLaMACpp: (*Note:* this step will not be needed **after** PyLLaMACpp makes a new release.)
```bash
python -m pip install git+https://github.com/nuance1979/pyllamacpp.git@dev --upgrade
```

- Start LLaMA Server with your `models.yml` file:
```bash
llama-server --models-yml models.yml --model-id llama-7b
```

- Check out [my fork](https://github.com/nuance1979/chatbot-ui) of Chatbot UI and start the app:
```bash
git clone https://github.com/nuance1979/chatbot-ui
cd chatbot-ui
git checkout llama
npm i
npm run dev
```
- Open the link http://localhost:3000 in your browser;
  - Click "OpenAI API Key" at the bottom left corner and enter your [OpenAI API Key](https://platform.openai.com/account/api-keys);
  - Or follow instructions at [Chatbot UI](https://github.com/mckaywrigley/chatbot-ui) to put your key into a `.env.local` file and restart;
  ```bash
  cp .env.local.example .env.local
  # Edit .env.local to add your OPENAI_API_KEY
  ```
- Enjoy!

## More

- Try a larger model if you have it:
```bash
llama-server --models-yml models.yml --model-id llama-13b  # or any `model_id` defined in `models.yml`
```
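
To make the relationship between `--model-id` and the keys in `models.yml` concrete, here is a minimal Python sketch of how a command line with these two flags could be parsed and checked. Only the flag names come from the commands above. Whether the real CLI marks them as required, the helper `check_model_id`, and the stand-in `models` dict are all assumptions for illustration.

```python
import argparse


def parse_args(argv=None):
    """Parse the two flags used in the commands above (illustrative only)."""
    parser = argparse.ArgumentParser(prog="llama-server")
    parser.add_argument("--models-yml", required=True, help="path to models.yml")
    parser.add_argument("--model-id", required=True, help="a key under `models`")
    return parser.parse_args(argv)


def check_model_id(model_id, models):
    """Exit early, listing the valid ids, if `model_id` is not configured."""
    if model_id not in models:
        raise SystemExit(
            f"unknown model id {model_id!r}; choose from {sorted(models)}"
        )
    return model_id


args = parse_args(["--models-yml", "models.yml", "--model-id", "llama-13b"])
models = {"llama-7b": {}, "llama-13b": {}}  # stand-in for the parsed YAML
print(check_model_id(args.model_id, models))
```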

- Try non-streaming mode by restarting Chatbot UI:
```bash
export LLAMA_STREAM_MODE=0  # 0 disables streaming; 1 enables it
npm run dev
```

## Limitations

- "Regenerate response" is currently not working;
- IMHO, the prompt/reverse-prompt mechanism of LLaMA C++'s interactive mode needs an overhaul. I tried very hard to dance around it, but the whole thing is still a hack.

## Fun facts

I am not fluent in JavaScript at all but I was able to make the changes in Chatbot UI by chatting with [ChatGPT](https://chat.openai.com); no more StackOverflow.

            
