# Large Language Model (LLM) Inference API and Chatbot 🦙
![project banner](https://github.com/aniketmaurya/llm-inference/raw/main/assets/llm-inference-min.png)
An inference API for LLMs such as LLaMA and Falcon, powered by [Lit-GPT](https://github.com/Lightning-AI/lit-gpt) from [Lightning AI](https://lightning.ai).
## Installation
```bash
pip install llm-inference
```
### Install from the main branch
```bash
pip install git+https://github.com/aniketmaurya/llm-inference.git@main
```
> **Note**: You need to manually install [Lit-GPT](https://github.com/Lightning-AI/lit-gpt) and set up the model weights before using this project.
```bash
pip install lit_gpt@git+https://github.com/aniketmaurya/install-lit-gpt.git@install
```
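To confirm the installation, check that both packages import cleanly. This is a quick sanity check; the `lit_gpt` module name is assumed from the package installed above:
```bash
python -c "import llm_inference, lit_gpt; print('install OK')"
```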
## Inference
```python
from llm_inference import LLMInference, prepare_weights
from rich import print

# Download the checkpoint and return the local path to the converted weights.
path = prepare_weights("EleutherAI/pythia-70m")
model = LLMInference(checkpoint_dir=path)

print(model("New York is located in"))
```
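The loaded model is a plain callable, so batching several prompts is just a loop. A small sketch that relies only on the call shown above:
```python
from llm_inference import LLMInference, prepare_weights

path = prepare_weights("EleutherAI/pythia-70m")
model = LLMInference(checkpoint_dir=path)

prompts = [
    "New York is located in",
    "The Eiffel Tower is in",
]
for prompt in prompts:
    # Each call returns the model's completion for a single prompt.
    print(f"{prompt!r} -> {model(prompt)!r}")
```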
## How to use the Chatbot
```python
from llm_chain import LitGPTConversationChain, LitGPTLLM
from llm_inference import prepare_weights
from rich import print

# Download the weights and wrap the model for conversational use.
path = str(prepare_weights("lmsys/longchat-13b-16k"))
llm = LitGPTLLM(checkpoint_dir=path, quantize="bnb.nf4")  # ~8.4 GB GPU memory with 4-bit quantization
bot = LitGPTConversationChain.from_llm(llm=llm, verbose=True)

print(bot.send("hi, what is the capital of France?"))
```
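Because `LitGPTConversationChain` is a conversation chain, repeated `send` calls continue the same dialogue. A minimal interactive loop, reusing the `bot` object created above:
```python
# Continues from the snippet above; type "quit" to end the session.
while True:
    user_input = input("You: ")
    if user_input.strip().lower() in {"quit", "exit"}:
        break
    print("Bot:", bot.send(user_input))
```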
## Launch Chatbot App
<video width="320" height="240" controls>
  <source src="/assets/chatbot-demo.mov" type="video/quicktime">
</video>
**1. Download weights**
```python
from llm_inference import prepare_weights

# Downloads and converts the checkpoint; returns the local weights directory.
path = prepare_weights("lmsys/longchat-13b-16k")
```
**2. Launch Gradio App**
```bash
python examples/chatbot/gradio_demo.py
```
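For reference, `examples/chatbot/gradio_demo.py` wires the conversation chain into a web UI. A minimal sketch of such an app, assuming only the `LitGPTConversationChain` API shown above (the bundled demo may differ):
```python
# minimal_chat_app.py - hypothetical, pared-down version of the bundled demo
import gradio as gr

from llm_chain import LitGPTConversationChain, LitGPTLLM
from llm_inference import prepare_weights

path = str(prepare_weights("lmsys/longchat-13b-16k"))
llm = LitGPTLLM(checkpoint_dir=path, quantize="bnb.nf4")
bot = LitGPTConversationChain.from_llm(llm=llm)

def respond(message: str, history: list) -> str:
    # The chain tracks conversation state itself, so history is unused here.
    return bot.send(message)

gr.ChatInterface(fn=respond).launch()
```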
## Deploying as a REST API
Create a Python file `app.py` and initialize the `ServeLLaMA` app.
```python
# app.py
from llm_inference.serve import ServeLLaMA, Response, PromptRequest

import lightning as L

# Wrap the model in a serving component and expose it as a Lightning app.
component = ServeLLaMA(input_type=PromptRequest, output_type=Response)
app = L.LightningApp(component)
```
Then start the server locally:
```bash
lightning run app app.py
```
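Once the app is running, you can query it over HTTP. This client is an illustrative sketch only: the exact route and payload schema are defined by `ServeLLaMA` and `PromptRequest`, so check the URL and API docs the running app prints:
```python
# client.py - illustrative only; adjust the URL and JSON body to match
# what the running Lightning app actually exposes.
import requests

response = requests.post(
    "http://127.0.0.1:7501/predict",            # assumed host/port/route
    json={"prompt": "New York is located in"},  # assumed PromptRequest field
    timeout=120,
)
response.raise_for_status()
print(response.json())
```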