| Field | Value |
| --- | --- |
| Name | ooba-api-client |
| Version | 0.1.0a4 |
| Summary | API Client for Ooba Booga's Text Generation WebUI |
| Author | James Hutchison |
| Requires Python | >=3.10,<4.0 |
| Upload time | 2023-10-05 21:39:14 |
| Requirements | None recorded |
# Python API Client for Ooba-Booga's Text Generation Web UI
An API client for the text generation UI, with sane defaults.
Motivation: the official documentation isn't great, the existing examples are gnarly, and there didn't seem to be an existing client library.
Supported use cases:
- [x] generate / instruct
- [ ] chat
- [ ] streaming instruct
- [ ] streaming chat
- [x] model info
- [x] model loading
## Installation
`pip install ooba-api-client`
## What model should I use?
If you're new to LLMs, you may be unsure what model to use for your use case.
In general, models tend to come in three flavors:
- a foundational model
- a chat model
- an instruct model
A foundational model is typically used for raw text prediction (for example, autocomplete-style suggestions), if it's even good for that. You probably don't want this: foundational models often need additional behavior training to be useful.

A chat model is tuned on conversation histories. This is the preferred model if you're trying to create a chat bot that replies to the user.

Instruct models are tuned to follow instructions. If your interest is in creating autonomous agents, this is probably what you want. Note that you can always roll up a chat history into a single instruct prompt, as in the sketch below.
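Rolling a chat history into a single instruct prompt might look like the following sketch. It uses the `OobaApiClient`, `LlamaInstructPrompt`, and `Parameters` API shown in the example that follows; the way the history is flattened into text is just one possible convention, not something this library prescribes.

```python
from ooba_api import OobaApiClient, Parameters, LlamaInstructPrompt

client = OobaApiClient()  # defaults to http://localhost:5000

# A chat history kept by the caller; the (role, text) structure is an
# illustrative assumption, not part of ooba-api-client.
history = [
    ("user", "What does the pathlib module do?"),
    ("assistant", "It provides object-oriented filesystem paths."),
    ("user", "Show me how to read a text file with it."),
]

# Flatten the history into one instruct prompt.
rolled_up = "\n".join(f"{role}: {text}" for role, text in history)

response = client.instruct(
    LlamaInstructPrompt(
        system_prompt="Continue the conversation as the assistant.",
        prompt=rolled_up,
    ),
    parameters=Parameters(temperature=0.2, repetition_penalty=1.05),
)
print(response)
```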
## Example
```python
import logging

from ooba_api import OobaApiClient, Parameters, LlamaInstructPrompt

logger = logging.getLogger("ooba_api")
logger.setLevel(logging.DEBUG)
logging.basicConfig(level=logging.DEBUG)

client = OobaApiClient()  # defaults to http://localhost:5000

response = client.instruct(
    LlamaInstructPrompt(
        system_prompt=(
            "Generate only the requested code from the user. Do not generate anything else. "
            "Be succinct. Generate markdown of the code, and give the correct type. "
            "If the code is python use ```python for the markdown. Do not explain afterwards"
        ),
        prompt="Generate a Python function to reverse the contents of a file",
    ),
    parameters=Parameters(temperature=0.2, repetition_penalty=1.05),
)
print(response)
```
~~~
```python
def reverse_file(file_path):
    with open(file_path, "r") as f:
        content = f.read()
    return content[::-1]
```
~~~
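If you only want the code itself rather than the fenced markdown, you can strip the fence from the returned string. This is a minimal sketch assuming `response` is the plain string printed above; the helper below is illustrative and not part of ooba-api-client.

```python
def strip_markdown_fence(text: str) -> str:
    """Remove a leading/trailing ``` fence pair if present (illustrative helper)."""
    lines = text.strip().splitlines()
    if lines and lines[0].startswith("```"):
        lines = lines[1:]
    if lines and lines[-1].startswith("```"):
        lines = lines[:-1]
    return "\n".join(lines)

code_only = strip_markdown_fence(response)
print(code_only)
```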
## Model Information and Loading
To get the currently loaded model:
```python
from ooba_api import OobaApiClient, OobaModelInfo, OobaModelNotLoaded

# create a client (or reuse the one from the earlier example)
client = OobaApiClient()

model_info: OobaModelInfo = client.model_info()

print(model_info)

# model_info will be OobaModelNotLoaded if no model is currently loaded
assert not isinstance(model_info, OobaModelNotLoaded)
```
To load a model:
```python
load_model_response: OobaModelInfo = client.load_model(
    "codellama-7b-instruct.Q4_K_M.gguf",
    args_dict={
        "loader": "ctransformers",
        "n-gpu-layers": 100,
        "n_ctx": 2500,
        "threads": 0,
        "n_batch": 512,
        "model_type": "llama",
    },
)
```
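Combining the two calls, a sketch like the following could load the example model only when nothing is loaded yet. It only uses the calls shown above; the specific model name and args are simply reused from the example.

```python
from ooba_api import OobaApiClient, OobaModelInfo, OobaModelNotLoaded

client = OobaApiClient()

# Check what is loaded, and load the example model if nothing is.
model_info: OobaModelInfo = client.model_info()
if isinstance(model_info, OobaModelNotLoaded):
    model_info = client.load_model(
        "codellama-7b-instruct.Q4_K_M.gguf",
        args_dict={"loader": "ctransformers", "n-gpu-layers": 100, "n_ctx": 2500},
    )
print(model_info)
```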
## Appendix
### Specific Model Help
```python
# Code Llama Instruct
from ooba_api.prompts import LlamaInstructPrompt

response = client.instruct(
    LlamaInstructPrompt(
        system_prompt=(
            "Generate only the requested code from the user. Do not generate anything else. "
            "Be succinct. Generate markdown of the code, and give the correct type. "
            "If the code is python use ```python for the markdown. Do not explain afterwards"
        ),
        prompt="Generate a Python function to reverse the contents of a file",
    ),
    parameters=Parameters(temperature=0.2, repetition_penalty=1.05),
)
```
```python
# falcon instruct
from ooba_api.prompts import InstructPrompt

response = client.instruct(
    InstructPrompt(
        prompt=(
            "Generate only the requested code from the user. Do not generate anything else. "
            "Be succinct. Generate markdown of the code, and give the correct type. "
            "If the code is python use ```python for the markdown. Do not explain afterwards.\n"
            "Generate a Python function to reverse the contents of a file"
        )
    ),
    parameters=Parameters(temperature=0.2, repetition_penalty=1.05),
)
```
### Running Ooba-Booga
The Text Generation Web UI can be found here:
https://github.com/oobabooga/text-generation-webui/
This start-up config gives very good performance on an RTX 3060 Ti with 8 GB of VRAM in an i5 system with 32 GB of DDR4. In many cases it produced over 20 tokens/sec, with no delay before the first token.

`python server.py --model codellama-7b-instruct.Q4_K_M.gguf --api --listen --gpu-memory 8 --cpu-memory 1 --n-gpu-layers 1000000 --loader ctransformers --n_ctx 2500`

Increasing the context size beyond 2500 will slow it down.
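When the server is started with `--listen` on another machine, the client has to be pointed at that host instead of the default `http://localhost:5000`. The `url` keyword in this sketch is an assumption for illustration; check the `OobaApiClient` constructor for the actual parameter name.

```python
from ooba_api import OobaApiClient

# ASSUMPTION: "url" is used here only for illustration; the real constructor
# parameter name may differ. The host address is a placeholder.
client = OobaApiClient(url="http://192.168.1.50:5000")
print(client.model_info())
```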