| Name | Toolio |
| Version | 0.5.0 |
| download | |
| home_page | None |
| Summary | OpenAI-like HTTP server API implementation which supports structured LLM response generation (e.g. make it conform to a JSON schema) |
| upload_time | 2024-09-03 13:10:35 |
| maintainer | None |
| docs_url | None |
| author | None |
| requires_python | >=3.11 |
| license | None |
| keywords | |
| VCS | |
| bugtrack_url | |
| requirements | No requirements were recorded. |
| Travis-CI | No Travis. |
| coveralls test coverage | No coveralls. |

♪ Come along and ride on a fantastic voyage 🎵, with AI riding shotgun seat and a flatbed full of tools.
Toolio is an OpenAI-like HTTP server API implementation which supports structured LLM response generation (e.g. make it conform to a [JSON schema](https://json-schema.org/)). It's also really useful for more reliable tool calling. Toolio is based on the MLX framework for Apple Silicon (e.g. M1/M2/M3/M4 Macs), so **that's the only supported platform at present**.
Call it tool-calling or function-calling, or agentic workflows based on schema-driven output, or guided generation, or steered response. If you're non-technical, you can think of it as your "GPT Private Agent". It can handle tasks for you, without spilling your secrets.
Builds on: https://github.com/otriscon/llm-structured-output/
## Specific components and usage modes
* `toolio_server` (command line)—Host MLX-format LLMs for structured output query or function calling via HTTP requests
* `toolio_request` (command line)—Execute HTTP client requests against a server
* `toolio.model_manager` (Python API)—Encapsulate an MLX-format LLM for convenient, in-resident query with structured output or function calling
* `toolio.client.struct_mlx_chat_api` (Python API)—Make a toolio server request from code
<table><tr>
<td><a href="https://oori.dev/"><img src="https://www.oori.dev/assets/branding/oori_Logo_FullColor.png" width="64" /></a></td>
<td>Toolio is primarily developed by the crew at <a href="https://oori.dev/">Oori Data</a>. We offer data pipelines and software engineering services around AI/LLM applications.</td>
</tr></table>
We'd love your help, though! [Click to learn how to make contributions to the project](https://github.com/OoriData/Toolio/wiki/Notes-for-contributors).
The following video, "Toolio in 10 minutes", is an easy way to learn about the project.
[Toolio in 10 minutes (video)](https://youtu.be/9DpQYbteakc)
<!--
<iframe width="560" height="315" src="https://www.youtube.com/embed/9DpQYbteakc?si=Zy4Cj1v1q9ID07eg" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" referrerpolicy="strict-origin-when-cross-origin" allowfullscreen></iframe>
<img width="1268" alt="Toolio in 10 minutes still" src="https://github.com/user-attachments/assets/fc8dda94-326d-426e-a566-ac8ec60be31f">
-->
[Documentation](https://OoriData.github.io/Toolio/)
# Installation
As simple as:
```sh
pip install toolio
```
If you're not sure whether you're on an Apple Silicon Mac, you can check with:
```sh
python -c "import platform; assert 'arm64' in platform.platform()"
```
# Host a server
Use `toolio_server` to host MLX-format LLMs for structured output query or function-calling. For example you can host the MLX version of Nous Research's Hermes-2 Θ (Theta).
```sh
toolio_server --model=mlx-community/Hermes-2-Theta-Llama-3-8B-4bit
```
This will download the model from the HuggingFace path `mlx-community/Hermes-2-Theta-Llama-3-8B-4bit` to your local disk cache. The `4bit` at the end means you are downloading a version quantized to 4 bits, so that each parameter in the neural network, which would normally take up 16 bits, only takes up 4, in order to save memory and boost speed. There are 8 billion parameters, so this version will take up a little over 4GB on your disk, and running it will take up about the same amount of your unified RAM.
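If you're curious where that estimate comes from, here's the back-of-the-envelope arithmetic (a rough sketch; real model files carry extra overhead, which is why it lands a little over 4GB):
```py
# Back-of-the-envelope size estimate for an 8-billion-parameter model at 4 bits per parameter
params = 8_000_000_000
bits_per_param = 4
gigabytes = params * bits_per_param / 8 / 1_000_000_000  # bits -> bytes -> GB
print(f'~{gigabytes:.1f} GB')  # ~4.0 GB; actual weight files add some overhead
```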
To learn more about the MLX framework for ML workloads (including LLMs) on Apple Silicon, see the [MLX Notes](https://github.com/uogbuji/mlx-notes) article series. The "Day One" article provides all the context you need for using local LLMs with Toolio.
There are many hundreds of models you can select. One bit of advice is that Toolio, for now, tends to work better with base or base/chat models, rather than instruct-tuned models.
## cURLing the Toolio server
Try out a basic request, not using any of Toolio's special features, but rather using the LLM as is:
```sh
curl -X POST "http://localhost:8000/v1/chat/completions" \
  -H 'Content-Type: application/json' \
  -d '{
    "messages": [{"role": "user", "content": "I am thinking of a number between 1 and 10. Guess what it is."}],
    "temperature": 0.1
  }'
```
This is actually not constraining to any output structure, and is just using the LLM as is. The result will be in complex-looking JSON, but read on for more straightforward ways to query against a Toolio server.
## Specifying an output JSON schema
Here is a request that does constrain return structure:
```sh
curl -X POST "http://localhost:8000/v1/chat/completions" \
  -H 'Content-Type: application/json' \
  -d '{
    "messages": [{"role": "user", "content": "I am thinking of a number between 1 and 10. Guess what it is."}],
    "response_format": {
        "type": "json_object",
        "schema": "{\"type\": \"object\",\"properties\": {\"number\": {\"type\": \"number\"}}}"
    },
    "temperature": 0.1
  }'
```
The key here is specification of a JSON schema. The schema is escaped for the command line shell above, so here it is in its regular form:
```json
{"type": "object", "properties": {"number": {"type": "number"}}}
```
JSON schemas can look a bit intimidating at first if you're not familiar with [JSON schema](https://json-schema.org/), but they're reasonably easy to learn. [You can follow the primer](https://json-schema.org/learn/getting-started-step-by-step).
## Using the command line client instead
cURL is a pretty raw interface for this, though. For example, you have to parse the resulting response JSON. It's a lot easier to use the more specialized command line client tool `toolio_request`. Here is the equivalent to the first cURL example, above:
```sh
toolio_request --apibase="http://localhost:8000" --prompt="I am thinking of a number between 1 and 10. Guess what it is."
```
This time you'll just get the straightforward response text, e.g. "Sure, I'll guess 5. Is that your number?"
Here is an example using JSON schema constraint to extract structured data from an unstructured sentence.
```sh
export LMPROMPT='Which countries are mentioned in the sentence "Adamma went home to Nigeria for the hols"? Your answer should be only JSON, according to this schema: {json_schema}'
export LMSCHEMA='{"type": "array", "items": {"type": "object", "properties": {"name": {"type": "string"}, "continent": {"type": "string"}}}}'
toolio_request --apibase="http://localhost:8000" --prompt="$LMPROMPT" --schema="$LMSCHEMA"
```
(…and yes, in practice a smaller, specialized entity extraction model might be a better option for a case this simple)
With any decent LLM you should get the following **and no extraneous text cluttering things up!**
```json
[{"name": "Nigeria", "continent": "Africa"}]
```
Or if you have the prompt or schema written to files:
```sh
echo 'Which countries are mentioned in the sentence "Adamma went home to Nigeria for the hols"? Your answer should be only JSON, according to this schema: {json_schema}' > /tmp/llmprompt.txt
echo '{"type": "array", "items": {"type": "object", "properties": {"name": {"type": "string"}, "continent": {"type": "string"}}}}' > /tmp/countries.schema.json
toolio_request --apibase="http://localhost:8000" --prompt-file=/tmp/llmprompt.txt --schema-file=/tmp/countries.schema.json
```
## Tool calling
You can run tool usage (function-calling) prompts, a key technique in LLM agent frameworks. A schema will automatically be generated from the tool specs, which themselves are based on [JSON Schema](https://json-schema.org/), according to OpenAI conventions.
```sh
echo 'What'\''s the weather like in Boulder today?' > /tmp/llmprompt.txt
echo '{"tools": [{"type": "function","function": {"name": "get_current_weather","description": "Get the current weather in a given location","parameters": {"type": "object","properties": {"location": {"type": "string","description": "City and state, e.g. San Francisco, CA"},"unit": {"type": "string","enum": ["℃","℉"]}},"required": ["location"]}}}], "tool_choice": "auto"}' > /tmp/toolspec.json
toolio_request --apibase="http://localhost:8000" --prompt-file=/tmp/llmprompt.txt --tools-file=/tmp/toolspec.json --max-trips=1
```
You can expect a response such as
```json
[...] UserWarning: No implementation provided for function: get_current_weather
The model invoked the following tool calls to complete the response, but there are no permitted trips remaining.
[
  {
    "id": "call_6127176720_1719458192_0",
    "type": "function",
    "function": {
      "name": "get_current_weather",
      "arguments_obj": {
        "location": "Boulder, MA",
        "unit": "\u2109"
      }
    }
  }
]
```
You might have noticed the `--max-trips=1` in the original call. Normally the tool call response would go back to the LLM to further construct a response, but Toolio allows you to limit those trips. By setting the limit to 1, Toolio is unable to make a second trip to deliver the function call response for further processing, and the user is notified of that fact.
Incidentally `\u2109` is just Unicode for `℉` (degrees Fahrenheit).
## Actually running the functions
It's pretty well known at this point that LLMs are bad at maths, but we can give them help. Consider the following example:
```sh
echo 'What is the square root of 256?' > /tmp/llmprompt.txt
echo '{"tools": [{"type": "function","function": {"name": "square_root","description": "Get the square root of the given number","parameters": {"type": "object", "properties": {"square": {"type": "number", "description": "Number from which to find the square root"}},"required": ["square"]},"pyfunc": "math|sqrt"}}], "tool_choice": "auto"}' > /tmp/toolspec.json
toolio_request --apibase="http://localhost:8000" --prompt-file=/tmp/llmprompt.txt --tools-file=/tmp/toolspec.json
```
We give the LLM a Python function for getting a square root. The OpenAI-style tool spec is extended with `"pyfunc": "math|sqrt"`. This tells Toolio to import the Python built-in `math` module and call the `sqrt` function within it.
Notice there is no `--max-trips=` this time. The default value is `3`, so that's enough to have at least one round-trip to deliver the tool's response to the LLM for further processing. If all goes well with the LLM, you should get a result such as:
```
The square root of 256 is 16.
```
`math.sqrt` is a convenient, simple example. You can specify any function which can already be imported (Toolio won't install any libraries at run time), and you can use imports and attribute lookups with multiple levels, e.g. `path.to.module_to_import|path.to.function`.
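As an illustration of that multi-level form, here's a hypothetical tool spec (the `path_basename` tool and the `os.path|basename` target are made up for the example) written out from Python rather than with `echo`:
```py
# Hypothetical tool spec illustrating a multi-level "pyfunc" import path
import json

toolspec = {
    'tools': [{'type': 'function', 'function': {
        'name': 'path_basename',
        'description': 'Get the final component of a file path',
        'parameters': {
            'type': 'object',
            'properties': {'path': {'type': 'string', 'description': 'A file path'}},
            'required': ['path']},
        'pyfunc': 'os.path|basename'}}],
    'tool_choice': 'auto'}

with open('/tmp/toolspec.json', 'w') as fp:
    json.dump(toolspec, fp)
```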
## Libraries of tools (toolboxes, if you like)
The examples above might feel like a bit too much work to use a tool; in particular putting together and sending along the tool-calling spec. In most cases you'll either be reusing tools developed by someone else, or your own special ones. In either case the tool-calling spec for each tool can be bundled for easier use. Toolio comes with a few tools you can use right away. For example, `toolio.tool.math.calculator` is a really simple calculator tool the LLM can use, because once again LLMs are really bad at maths. But there's one step required first. Some of the built-in tools use third-party libraries which aren't baseline requirements of Toolio. Install them as follows:
```sh
pip install -Ur requirements-extra.txt
```
Now try a prompt intended to use the calculator tool. To make sure it does, we'll add the `loglevel` flag:
```sh
toolio_request --apibase="http://localhost:8000" --tool=toolio.tool.math.calculator --loglevel=DEBUG \
--prompt='Usain Bolt ran the 100m race in 9.58s. What was his average velocity?'
```
Here's what I got from `Hermes-2-Theta-Llama-3-8B-4bit`:
```
DEBUG:toolio.cli.request:🔧 Calling tool calculator with args {'expr': '(100/9.58)'}
DEBUG:toolio.cli.request:✅ Tool call result: 10.438413361169102
To calculate Usain Bolt's average velocity during the 100m race, we divide the total distance by the total time taken. Here's the calculation:
Distance (d) = 100 meters
Time (t) = 9.58 seconds
Average velocity (v) = Distance / Time
v = 100 meters / 9.58 seconds ≈ 10.44 meters per second
So, Usain Bolt's average velocity during the 100m race was approximately 10.44 meters per second.
```
You can see that the LLM got help by calling the tool to calculate `100/9.58`.
Note: Every tool relies on the agent LLM to correctly construct the tool call, e.g. setting up the right mathematical expression for the calculator tool. This is not something you can take for granted, so there's no shortcut around testing and selecting the right LLMs.
## Multiple tool calls
Here's an example of giving the LLM a tool to get today's date, and another with a database lookup from birthdays to employee names and interests.
```sh
toolio_request --apibase="http://localhost:8000" --loglevel=DEBUG \
--tool=toolio.tool.demo.birthday_lookup \
--tool=toolio.tool.demo.today_kfabe \
--sysprompt='You are a writer who reasons step by step and uses research tools in the correct order before writing' \
--prompt='Write a nice note for each employee who has a birthday today.'
```
These are actually contrived, fake tools for demo purposes. `demo.today_kfabe` always gives the date as 1 July 2024, and `demo.birthday_lookup` is a dummy database. Also note the added system prompt to encourage the LLM to use step-by-step reasoning in applying the tools. If your LLM is smart enough, it would first get the (supposed) date today and then convert that to a format suitable for the database lookup.
Unfortunately `mlx-community/Hermes-2-Theta-Llama-3-8B-4bit` fumbles this, ignoring the spoon-fed date from the first tool call, and instead grabs an example date mentioned in the tool definition. This results in no birthday lookup results, and the LLM generates no output.
```
⚙️Calling tool today with args {}
⚙️Tool call result: 07-01
⚙️Calling tool birthday_lookup with args {'date': '05-03'}
⚙️Tool call result: No one has a birthday today
Final response:
```
It's a good example of how tool-calling can pretty easily go wrong. As LLMs get more and more capable this should become more reliable. It may well be that top-end LLMs such as OpenAI's GPT and Anthropic's Claude would be able to handle this case, but of course you can't run these privately on MLX.
# Write your own tools
Study the examples in the `pylib/tools` and in the `demo` directories to see how easy it is.
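As a taste, here's a minimal sketch of a custom tool, using the `@tool` and `param` decorators that appear in the Python HTTP client example below (the `dice_roll` tool itself is just a made-up illustration):
```py
# Minimal sketch of a custom tool; `dice_roll` is a made-up example, following the
# @tool/param pattern shown in the Python HTTP client section below
import random

from toolio.tool import tool, param

@tool('dice_roll', params=[param('sides', int, 'Number of sides on the die', True)])
def dice_roll(sides=6):
    'Roll a single die with the given number of sides'
    return random.randint(1, sides)
```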
# LLM-specific flows
LLMs actually get trained for tool calling, and sometimes get trained to expect different patterns. Toolio supports some flags for adapting the tool flow based on the LLM you're using on the server.
For notes on more models see https://github.com/OoriData/Toolio/wiki/Notes-on-how-MLX-models-handle-tool%E2%80%90calling
# Python HTTP client
You can also query the server from Python code, using `toolio.client.struct_mlx_chat_api`. Here's an example, including a (dummied up) custom tool:
```py
import asyncio

from ogbujipt.llm_wrapper import prompt_to_chat

from toolio.client import struct_mlx_chat_api
from toolio.tool import tool, param

@tool('currency_exchange', params=[param('from', str, 'Currency to be converted from, e.g. USD, GBP, JPY', True, rename='from_'), param('to', str, 'Currency to be converted to, e.g. USD, GBP, JPY', True), param('amount', float, 'Amount to convert from one currency to another. Just a number, with no other symbols', True)])
def currency_exchange(from_=None, to=None, amount=None):
    'Tool to convert one currency to another'
    # Just a dummy implementation
    lookup = {('JPY', 'USD'): 1234.56}
    rate = lookup.get((from_, to))
    print(f'{from_=}, {to=}, {amount=}, {rate=}')
    # Look up the conversion online here
    return rate * amount

prompt = ('I need to import a car from Japan. It costs 5 million Yen. '
          'How much must I withdraw from my US bank account?')
llm = struct_mlx_chat_api(base_url='http://localhost:8000', tool_reg=[currency_exchange])
resp = asyncio.run(llm(prompt_to_chat(prompt), trip_timeout=60))
print(resp.first_choice_text)
```
Notice the use of the `rename` parameter metadata. In Python the param name we've asked the LLM to use, `from`, is a keyword, so to avoid confusion the actual function definition uses `from_`, and the `rename` instructs Toolio to make that change in the background.
You can also define asynchronous tools, e.g. `async def currency_exchange`, which I would actually recommend if, for example, the tool really does make web requests.
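Here's a minimal sketch of an async variant of the dummy tool above (the `asyncio.sleep` merely stands in for an awaited web request):
```py
import asyncio

from toolio.tool import tool, param

@tool('currency_exchange', params=[param('from', str, 'Currency to be converted from, e.g. USD, GBP, JPY', True, rename='from_'), param('to', str, 'Currency to be converted to, e.g. USD, GBP, JPY', True), param('amount', float, 'Amount to convert from one currency to another. Just a number, with no other symbols', True)])
async def currency_exchange(from_=None, to=None, amount=None):
    'Async variant of the dummy conversion tool above'
    await asyncio.sleep(0.1)  # stand-in for, e.g., an awaited HTTP rate lookup
    lookup = {('JPY', 'USD'): 1234.56}
    rate = lookup.get((from_, to))
    return rate * amount
```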
You might study the command-line client code in `pylib/cli/request.py` for further insight.
# Direct usage via Python
You can also, of course, just load the model and run inference on it without bothering with HTTP client/server. The `model_manager` class is a convenient interface for this.
```py
import asyncio
from toolio.llm_helper import model_manager, extract_content

toolio_mm = model_manager('mlx-community/Hermes-2-Theta-Llama-3-8B-4bit')

async def say_hello(tmm):
    msgs = [{"role": "user", "content": "Hello! How are you?"}]
    async for chunk in extract_content(tmm.complete(msgs)):
        print(chunk, end='')

asyncio.run(say_hello(toolio_mm))
```
You should just get a simple text response from the LLM printed to the screen.
You can also do this via a synchronous API, but I highly recommend leaning hard on the async habit.
The `chat_complete` method also takes a list of tools or a JSON schema, as well as some model parameters.
## LLM response metadata
Toolio uses OpenAI API conventions a lot under the hood. If you run the following:
```py
import asyncio
from toolio.llm_helper import model_manager, extract_content

toolio_mm = model_manager('mlx-community/Hermes-2-Theta-Llama-3-8B-4bit')

async def say_hello(tmm):
    msgs = [{"role": "user", "content": "Hello! How are you?"}]
    async for chunk_struct in tmm.complete(msgs):
        print(chunk_struct)
        break

asyncio.run(say_hello(toolio_mm))
```
You should see something like:
```py
{'choices': [{'index': 0, 'delta': {'role': 'assistant', 'content': 'Hi'}, 'finish_reason': None}], 'object': 'chat.completion.chunk', 'id': 'chatcmpl-17588006160_1721823730', 'created': 1721823730, 'model': 'mlx-community/Hermes-2-Theta-Llama-3-8B-4bit'}
```
The LLM response is delivered in such structures ("deltas") as they're generated. `chunk_struct['choices'][0]['delta']['content']` is a bit of the actual text we teased out in the previous snippet. `chunk_struct['choices'][0]['finish_reason']` is `None` because it's not yet finished, etc. This structure follows the OpenAI API.
`extract_content`, used in the previous snippet, is a very simple coroutine that extracts the actual text content from this series of response structures.
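For intuition, here's a rough sketch of what such a helper might look like (the actual `extract_content` in `toolio.llm_helper` may differ in its details):
```py
# Rough sketch (not the actual Toolio implementation) of pulling text out of the delta chunks
async def extract_content_sketch(chunk_stream):
    async for chunk in chunk_stream:
        content = chunk['choices'][0]['delta'].get('content')
        if content:  # skip None/empty content, e.g. in the final 'stop' chunk
            yield content
```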
The final chunk would look something like this:
```py
{'choices': [{'index': 0, 'delta': {'role': 'assistant', 'content': ''}, 'finish_reason': 'stop'}], 'usage': {'completion_tokens': 20, 'prompt_tokens': 12, 'total_tokens': 32}, 'object': 'chat.completion.chunk', 'id': 'chatcmpl-18503717840_1721824385', 'created': 1721824385, 'model': 'mlx-community/Hermes-2-Theta-Llama-3-8B-4bit'}
```
Notice there is more information, now that it's finished (`'finish_reason': 'stop'`). Say you want the metadata such as the number of tokens generated:
```py
import asyncio
from toolio.llm_helper import model_manager, extract_content

toolio_mm = model_manager('mlx-community/Hermes-2-Theta-Llama-3-8B-4bit')

async def say_hello(tmm):
    msgs = [{"role": "user", "content": "Hello! How are you?"}]
    async for chunk in tmm.complete(msgs):
        content = chunk['choices'][0]['delta']['content']
        if content is not None:
            print(content, end='')

    # Final chunk has the stats
    print('\n', '-'*80, '\n', 'Number of tokens generated:', chunk['usage']['total_tokens'])

asyncio.run(say_hello(toolio_mm))
```
You'll get something like:
```
*waves* Hi there! I'm doing well, thank you for asking. How about you?
--------------------------------------------------------------------------------
Number of tokens generated: 32
```
Tip: don't forget all the various, useful bits to be found in `itertools` and the like.
# Structured LLM responses via direct API
As mentioned, you can specify tools and schemata.
```py
import asyncio
from toolio.llm_helper import model_manager, extract_content

toolio_mm = model_manager('mlx-community/Hermes-2-Theta-Llama-3-8B-4bit')

async def say_hello(tmm):
    prompt = ('Which countries are mentioned in the sentence \'Adamma went home to Nigeria for the hols\'? '
              'Your answer should be only JSON, according to this schema: {json_schema}')
    schema = ('{"type": "array", "items":'
              '{"type": "object", "properties": {"name": {"type": "string"}, "continent": {"type": "string"}}}}')
    msgs = [{'role': 'user', 'content': prompt.format(json_schema=schema)}]
    async for chunk in extract_content(tmm.complete(msgs, json_schema=schema)):
        print(chunk, end='')

asyncio.run(say_hello(toolio_mm))
```
## Example of tool use
```py
import asyncio
from math import sqrt
from toolio.llm_helper import model_manager, extract_content

SQUARE_ROOT_METADATA = {'name': 'square_root', 'description': 'Get the square root of the given number',
                        'parameters': {'type': 'object', 'properties': {
                            'square': {'type': 'number',
                                       'description': 'Number from which to find the square root'}},
                            'required': ['square']}}
toolio_mm = model_manager('mlx-community/Hermes-2-Theta-Llama-3-8B-4bit',
                          tool_reg=[(sqrt, SQUARE_ROOT_METADATA)])

async def query_sq_root(tmm):
    msgs = [ {'role': 'user', 'content': 'What is the square root of 256?'} ]
    async for chunk in extract_content(tmm.complete_with_tools(msgs)):
        print(chunk, end='')

asyncio.run(query_sq_root(toolio_mm))
```
# Tweaking prompts
Part of the process of getting an LLM to stick to a schema, or to call tools, is to give it a system prompt to that effect. Toolio has built-in prompt language for this purpose. We believe strongly in the design principle of separating natural language (e.g. prompts) from code, so the latter is packaged into the `resource/language.toml` file, using [Word Loom](https://github.com/OoriData/OgbujiPT/wiki/Word-Loom:-A-format-for-managing-language-for-AI-LLMs-(including-prompts)) conventions.
You can of course override the built-in prompting.
## Overriding the tool-calling system prompt from the command line
```sh
echo 'What is the square root of 256?' > /tmp/llmprompt.txt
echo '{"tools": [{"type": "function","function": {"name": "square_root","description": "Get the square root of the given number","parameters": {"type": "object", "properties": {"square": {"type": "number", "description": "Number from which to find the square root"}},"required": ["square"]},"pyfunc": "math|sqrt"}}], "tool_choice": "auto"}' > /tmp/toolspec.json
toolio_request --apibase="http://localhost:8000" --prompt-file=/tmp/llmprompt.txt --tools-file=/tmp/toolspec.json --sysprompt="You are a helpful assistant with access to a tool that you may invoke if needed to answer the user's request. Please use the tool as applicable, even if you think you already know the answer. Give your final answer in Shakespearean English. The tool is:
Tool"
```
## Overriding the tool-calling system prompt from the Python API
In order to override the system prompt from code, just set it in the initial chat message as the `system` role.
```py
import asyncio
from math import sqrt
from toolio.llm_helper import model_manager, extract_content

SQUARE_ROOT_METADATA = {'name': 'square_root', 'description': 'Get the square root of the given number',
                        'parameters': {'type': 'object', 'properties': {
                            'square': {'type': 'number',
                                       'description': 'Number from which to find the square root'}},
                            'required': ['square']}}
toolio_mm = model_manager('mlx-community/Hermes-2-Theta-Llama-3-8B-4bit',
                          tool_reg=[(sqrt, SQUARE_ROOT_METADATA)])

# System prompt will be used to direct the LLM's tool-calling
SYSPROMPT = ('You are a tutor from Elizabethan England, with access to a tool that you may invoke if needed to answer '
             'the user\'s request. Please use the tool as applicable, even if you think you already know the answer. '
             'Remember to give your final answer in Elizabethan English. The tool is:\nTool')

async def query_sq_root(tmm):
    msgs = [
        {'role': 'system', 'content': SYSPROMPT},
        {'role': 'user', 'content': 'What is the square root of 256?'}
    ]
    async for chunk in extract_content(tmm.complete_with_tools(msgs)):
        print(chunk, end='')

asyncio.run(query_sq_root(toolio_mm))
```
In which case you can expect a response such as:
> By the tool's decree, the square root of 256, a number most fair,
> Is sixteen, a digit most true, and a figure most rare.
# Learn more
* [Documentation](https://OoriData.github.io/Toolio/)
* More examples in the `demo` directory
# Credits
* otriscon's [llm-structured-output](https://github.com/otriscon/llm-structured-output/) is the foundation of this package
* [OgbujiPT](https://github.com/OoriData/OgbujiPT) provides the client-side Open-AI-style LLM framework, and also the [Word Loom](https://github.com/OoriData/OgbujiPT/wiki/Word-Loom:-A-format-for-managing-language-for-AI-LLMs-(including-prompts)) convention for separating prompt text from code.
# License
Apache 2
# Nearby projects
* [Outlines](https://github.com/outlines-dev/outlines) - Structured Text Generation via Pydantic, JSON schema or EBNF. Similarly to Toolio, it does steered sampling, i.e. builds a finite-state machine to guide sampling based on schema
* [Instructor](https://github.com/jxnl/instructor) - LLM structured output via prompt engineering, validation & retries rather than steered sampling.
# Why this, anyway?
In our thinking, and that of many others working in the space for a while, agent/tool systems are where GenAI is most likely to deliver practical value. Watch out, though, because McKinsey has seen fit to apply their $1,000/hr opinions along the same lines: ["Why agents are the next frontier of generative AI"](https://www.mckinsey.com/capabilities/mckinsey-digital/our-insights/why-agents-are-the-next-frontier-of-generative-ai?cid=soc-web) (July 2024)
[Parrot/Gorilla cartoon here]
# Project name
Named after the legend himself. Best don't pretend you don't know Coolio, fool! Popular rapper (R.I.P.) from LA. You watched *Cookin' with Coolio*, now it's time to Tool up with Toolio! ♪*Slide slide, but that's the past; I got something brand new for that aß.*🎼
Raw data
{
"_id": null,
"home_page": null,
"name": "Toolio",
"maintainer": null,
"docs_url": null,
"requires_python": ">=3.11",
"maintainer_email": null,
"keywords": null,
"author": null,
"author_email": "Uche Ogbuji <uche@oori.dev>",
"download_url": "https://files.pythonhosted.org/packages/44/69/796c0bb0276686920b89d2a0d6a1b7bcacb4985cd6d952abc1561a5c26f2/toolio-0.5.0.tar.gz",
"platform": null,
"description": "\n\u266a Come along and ride on a fantastic voyage \ud83c\udfb5, with AI riding shotgun seat and a flatbed full of tools.\n\nToolio is an OpenAI-like HTTP server API implementation which supports structured LLM response generation (e.g. make it conform to a [JSON schema](https://json-schema.org/)). It's also really useful for more reliable tool calling. Toolio is based on the MLX framework for Apple Silicon (e.g. M1/M2/M3/M4 Macs), so **that's the only supported platform at present**.\n\nCall it tool-calling or function-calling, or agentic workflows based on schema-driven output, or guided generation, or steered response. If you're non-technical, you can think of it as your \"GPT Private Agent\". It can handle tasks for you, without spilling your secrets.\n\nBuilds on: https://github.com/otriscon/llm-structured-output/\n\n## Specific components and usage modes\n\n* `toolio_server` (command line)\u2014Host MLX-format LLMs for structured output query or function calling via HTTP requests\n* `toolio_request` (command line)\u2014Execute HTTP client requests against a server\n* `toolio.model_manager` (Python API)\u2014Encapsulate an MLX-format LLM for convenient, in-resident query with structured output or function calling\n* `toolio.client.struct_mlx_chat_api` (Python API)\u2014Make a toolio server request from code\n\n<table><tr>\n <td><a href=\"https://oori.dev/\"><img src=\"https://www.oori.dev/assets/branding/oori_Logo_FullColor.png\" width=\"64\" /></a></td>\n <td>Toolio is primarily developed by the crew at <a href=\"https://oori.dev/\">Oori Data</a>. We offer data pipelines and software engineering services around AI/LLM applications.</td>\n</tr></table>\n\nWe'd love your help, though! [Click to learn how to make contributions to the project](https://github.com/OoriData/Toolio/wiki/Notes-for-contributors).\n\nThe following video, \"Toolio in 10 minutes\", is an easy way to learn about the project.\n\n[](https://youtu.be/9DpQYbteakc)\n\n<!--\n<iframe width=\"560\" height=\"315\" src=\"https://www.youtube.com/embed/9DpQYbteakc?si=Zy4Cj1v1q9ID07eg\" title=\"YouTube video player\" frameborder=\"0\" allow=\"accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share\" referrerpolicy=\"strict-origin-when-cross-origin\" allowfullscreen></iframe>\n\n<img width=\"1268\" alt=\"Toolio in 10 minutes still\" src=\"https://github.com/user-attachments/assets/fc8dda94-326d-426e-a566-ac8ec60be31f\">\n-->\n\n[Documentation](https://OoriData.github.io/Toolio/)\n\n# Installation\n\nAs simple as:\n\n```sh\npip install toolio\n```\n\nIf you're not sure, you can check that you're on an Apple Silicon Mac.\n\n```sh\npython -c \"import platform; assert 'arm64' in platform.platform()\"\n```\n\n# Host a server\n\nUse `toolio_server` to host MLX-format LLMs for structured output query or function-calling. For example you can host the MLX version of Nous Research's Hermes-2 \u0398 (Theta).\n\n```sh\ntoolio_server --model=mlx-community/Hermes-2-Theta-Llama-3-8B-4bit\n```\n\nThis will download the model from the HuggingFace path `mlx-community/Hermes-2-Theta-Llama-3-8B-4bit` to your local disk cache. The `4bit` at the end means you are downloading a version quantized to 4 bits, so that each parameter in the neural network, which would normally take up 16 bits, only takes up 4, in order to save memory and boost speed. 
There are 8 billion parameters, so this version will take up a little over 4GB on your disk, and running it will take up about the sama amount of your unified RAM.\n\nTo learn more about the MLX framework for ML workloads (including LLMs) on Apple Silicon, see the [MLX Notes](https://github.com/uogbuji/mlx-notes) article series. The \"Day One\" article provides all the context you need for using local LLMs with Toolio.\n\nThere are many hundreds of models you can select. One bit of advice is that Toolio, for now, tends to work better with base or base/chat models, rather than instruct-tuned models.\n\n## cURLing the Toolio server\n\nTry out a basic request, not using any of Toolio's special features, but rather using the LLM as is:\n\n```sh\ncurl -X POST \"http://localhost:8000/v1/chat/completions\" \\\n -H 'Content-Type: application/json' \\\n -d '{\n \"messages\": [{\"role\": \"user\", \"content\": \"I am thinking of a number between 1 and 10. Guess what it is.\"}],\n \"temperature\": 0.1\n }'\n```\n\nThis is actually not constraining to any output structure, and is just using the LLM as is. The result will be in complx-looking JSON, but read on for more straightforward ways to query against a Toolio server.\n\n## Specifying an output JSON schema\n\nHere is a request that does constrain return structure:\n\n```sh\ncurl -X POST \"http://localhost:8000/v1/chat/completions\" \\\n -H 'Content-Type: application/json' \\\n -d '{\n \"messages\": [{\"role\": \"user\", \"content\": \"I am thinking of a number between 1 and 10. Guess what it is.\"}],\n \"response_format\": {\n \"type\": \"json_object\",\n \"schema\": \"{\\\"type\\\": \\\"object\\\",\\\"properties\\\": {\\\"number\\\": {\\\"type\\\": \\\"number\\\"}}}\"\n },\n \"temperature\": 0.1\n }'\n```\n\nThe key here is specification of a JSON schema. The schema is escaped for the command line shell above, so here it is in its regular form:\n\n```json\n{\"type\": \"object\", \"properties\": {\"number\": {\"type\": \"number\"}}}\n```\n\nIt looks a bit intimidating, at first, if you're not familiar with [JSON schema](https://json-schema.org/), but they're reasonably easy to learn. [You can follow the primer](https://json-schema.org/learn/getting-started-step-by-step).\n\n## Using the command line client instead\n\ncURL is a pretty raw interface for this, though. For example, you have to parse the resulting response JSON. It's a lot easier to use the more specialized command line client tool `toolio_request`. Here is the equivalent too the first cURL example, above:\n\n```sh\ntoolio_request --apibase=\"http://localhost:8000\" --prompt=\"I am thinking of a number between 1 and 10. Guess what it is.\"\n```\n\nThis time you'll just get the straightforward response text, e.g. \"Sure, I'll guess 5. Is that your number?\"\n\nHere is an example using JSON schema constraint to extract structured data from an unstructured sentence.\n\n```sh\nexport LMPROMPT='Which countries are mentioned in the sentence \"Adamma went home to Nigeria for the hols\"? 
Your answer should be only JSON, according to this schema: {json_schema}'\nexport LMSCHEMA='{\"type\": \"array\", \"items\": {\"type\": \"object\", \"properties\": {\"name\": {\"type\": \"string\"}, \"continent\": {\"type\": \"string\"}}}}'\ntoolio_request --apibase=\"http://localhost:8000\" --prompt=$LMPROMPT --schema=$LMSCHEMA\n```\n\n(\u2026and yes, in practice a smaller, specialized entity extraction model might be a better option for a case this simple)\n\nWith any decent LLM you should get the following **and no extraneous text cluttering things up!**\n\n```json\n[{\"name\": \"Nigeria\", \"continent\": \"Africa\"}]\n```\n\nOr if you have the prompt or schema written to files:\n\n```sh\necho 'Which countries are mentioned in the sentence \"Adamma went home to Nigeria for the hols\"? Your answer should be only JSON, according to this schema: {json_schema}' > /tmp/llmprompt.txt\necho '{\"type\": \"array\", \"items\": {\"type\": \"object\", \"properties\": {\"name\": {\"type\": \"string\"}, \"continent\": {\"type\": \"string\"}}}}' > /tmp/countries.schema.json\ntoolio_request --apibase=\"http://localhost:8000\" --prompt-file=/tmp/llmprompt.txt --schema-file=/tmp/countries.schema.json\n```\n\n## Tool calling\n\nYou can run tool usage (function-calling) prompts, a key technique in LLM agent frameworks. A schema will automatically be generated from the tool specs, which themselves are based on [JSON Schema](https://json-schema.org/), according to OpenAI conventions.\n\n```sh\necho 'What'\\''s the weather like in Boulder today?' > /tmp/llmprompt.txt\necho '{\"tools\": [{\"type\": \"function\",\"function\": {\"name\": \"get_current_weather\",\"description\": \"Get the current weather in a given location\",\"parameters\": {\"type\": \"object\",\"properties\": {\"location\": {\"type\": \"string\",\"description\": \"City and state, e.g. San Francisco, CA\"},\"unit\": {\"type\": \"string\",\"enum\": [\"\u2103\",\"\u2109\"]}},\"required\": [\"location\"]}}}], \"tool_choice\": \"auto\"}' > /tmp/toolspec.json\ntoolio_request --apibase=\"http://localhost:8000\" --prompt-file=/tmp/llmprompt.txt --tools-file=/tmp/toolspec.json --max-trips=1\n```\n\nYou can expect a response such as\n\n```json\n[...] UserWarning: No implementation provided for function: get_current_weather\nThe model invoked the following tool calls to complete the response, but there are no permitted trips remaining.\n[\n {\n \"id\": \"call_6127176720_1719458192_0\",\n \"type\": \"function\",\n \"function\": {\n \"name\": \"get_current_weather\",\n \"arguments_obj\": {\n \"location\": \"Boulder, MA\",\n \"unit\": \"\\u2109\"\n }\n }\n }\n]\n```\n\nYou might have noticed the `--max-trips=1` in the original call. Normally the tool call response would go back to the LLM to further construct a response, but Toolio allows you to limit those trips. By setting the limit to 1, it is unable to make a second trip to deliver the function call response for further processing, and the user is notified of the fact.\n\nIncidentally `\\u2109` is just Unicode for `\u2109` (degrees fahrenheit).\n\n## Actually running the functions\n\nIt's pretty well known at this point that LLMs are bad at maths, but we can give them help. Consider the following example:\n\n```sh\necho 'What is the square root of 256?' 
> /tmp/llmprompt.txt\necho '{\"tools\": [{\"type\": \"function\",\"function\": {\"name\": \"square_root\",\"description\": \"Get the square root of the given number\",\"parameters\": {\"type\": \"object\", \"properties\": {\"square\": {\"type\": \"number\", \"description\": \"Number from which to find the square root\"}},\"required\": [\"square\"]},\"pyfunc\": \"math|sqrt\"}}], \"tool_choice\": \"auto\"}' > /tmp/toolspec.json\ntoolio_request --apibase=\"http://localhost:8000\" --prompt-file=/tmp/llmprompt.txt --tools-file=/tmp/toolspec.json\n```\n\nWe give the LLM a Python function for getting a square root. The OpenAI-style tool spec is extended with `\"pyfunc\": \"math|sqrt\"`. This tells Toolio to import the Python built-in `math` model and call the `sqrt` function within it.\n\nNotice there is no `--max-trips=` this time. The default value is `3`, so that's enough to have at least one round-trip to deliver the tool's response to the LLM for further processing. If all goes well with the LLM, you should get a result such as:\n\n```\nThe square root of 256 is 16.\n```\n\n`math.sqrt` is a convenient, simple example. You can specify any function which can already be imported (Toolio won't install any libraries at run time), and you can use imports and attribute lookups with multiple levels, e.g. `path.to.module_to_import|path.to.function`.\n\n## Libraries of tools (toolboxes, if you like)\n\nThe examples above might feel like a bit too much work to use a tool; in particular putting together and sending along the tool-calling spec. In most cases you'll either be reusing tools developed by someone else, or your own special ones. In either case the tool-calling spec for each tool can be bundled for easier use. Toolio comes with a few tools you can use right away, for example. `toolio.tool.math.calculator` is a really simple calculator tool the LLM can use because once again LLMs are really bad at maths. But there's one step required first. Some of the built-in tools use third-party libraries which aren't baseline requirements of Toolio. Install them as follows:\n\n```sh\npip install -Ur requirements-extra.txt\n```\n\nNow try a prompt intended to use the calculator tool. To make sure it does, we'll add the `loglevel` flag:\n\n```sh\ntoolio_request --apibase=\"http://localhost:8000\" --tool=toolio.tool.math.calculator --loglevel=DEBUG \\\n--prompt='Usain Bolt ran the 100m race in 9.58s. What was his average velocity?'\n```\n\nHere's what I got from `Hermes-2-Theta-Llama-3-8B-4bit`:\n\n```\nDEBUG:toolio.cli.request:\ud83d\udd27 Calling tool calculator with args {'expr': '(100/9.58)'}\nDEBUG:toolio.cli.request:\u2705 Tool call result: 10.438413361169102\nTo calculate Usain Bolt's average velocity during the 100m race, we divide the total distance by the total time taken. Here's the calculation:\n\nDistance (d) = 100 meters\nTime (t) = 9.58 seconds\n\nAverage velocity (v) = Distance / Time\nv = 100 meters / 9.58 seconds \u2248 10.44 meters per second\n\nSo, Usain Bolt's average velocity during the 100m race was approximately 10.44 meters per second.\n```\n\nYou can see that the LLM got help by calling the tool to calculate `100/9.58`.\n\nNote: Every tool relies on the agent LLM to correctly construct the tool call call, e.g. settign up the right mathematial expression for the calculator tool. 
This is not something you can take for granted, so there's no shortcut from testing and selecting the right LLMs.\n\n## Multiple tool calls\n\nHere's an example of giving the LLM a tool to get today's date, and another with a database lookup from birthdays to employee names and interests.\n\n```sh\ntoolio_request --apibase=\"http://localhost:8000\" --loglevel=DEBUG \\\n--tool=toolio.tool.demo.birthday_lookup \\\n--tool=toolio.tool.demo.today_kfabe \\\n--sysprompt='You are a writer who reasons step by step and uses research tools in the correct order before writing' \\\n--prompt='Write a nice note for each employee who has a birthday today.'\n```\n\nThese are actually contrived, fake tools for demo purposes. `demo.today_kfabe` always gives the date as 1 July 2024, and `demo.birthday_lookup` is a dummy database. Also note the added system prompt to encourag the LLM to use step-by-step reasoning in applying the tools. If your LLM is smart enough enough it would first get the (supposed) date today and then convrt that to a format suitable for the database lookip.\n\nUnfortunately `mlx-community/Hermes-2-Theta-Llama-3-8B-4bit` fumbles this, ignoring the spoon-fed date from the first tool call, and instead grabs an example date mentioned in the tool definition. This results in no birthday lookup results, and the LLM generates no output.\n\n```\n\u2699\ufe0fCalling tool today with args {}\n\u2699\ufe0fTool call result: 07-01\n\u2699\ufe0fCalling tool birthday_lookup with args {'date': '05-03'}\n\u2699\ufe0fTool call result: No one has a birthday today\nFinal response:\n\n```\n\nIt's a good example of how tool-calling can pretty easily go wrong. As LLMs get more and more capable this should become more reliable. It may well be that top-end LLMs such as OpenAI's GPT and Anthropic's Claude would be able to handle this case, but of course you can't run these privately on MLX.\n\n# Write your own tools\n\nStudy the examples in the `pylib/tools` and in the `demo` directories to see how easy it is.\n\n# LLM-specific flows\n\nLLMs actually get trained for tool calling, and sometimes get trained to expect different patterns. Toolio supports some flags for adapting the tool flow based on the LLM you're using on the server.\n\nFor notes on more models see https://github.com/OoriData/Toolio/wiki/Notes-on-how-MLX-models-handle-tool%E2%80%90calling\n\n# Python HTTP client\n\nYou can also query the server from Python code, using `toolio.client.struct_mlx_chat_api`. Here's an example, including a (dummied up) custom tool:\n\n```py\nimport asyncio\n\nfrom ogbujipt.llm_wrapper import openai_chat_api, prompt_to_chat\n\nfrom toolio.client import struct_mlx_chat_api\nfrom toolio.tool import tool, param\n\n@tool('currency_exchange', params=[param('from', str, 'Currency to be converted from, e.g. USD, GBP, JPY', True, rename='from_'), param('to', str, 'Currency to be converted to, e.g. USD, GBP, JPY', True), param('amount', float, 'Amount to convert from one currency to another. Just a number, with no other symbols', True)])\ndef currency_exchange(from_=None, to=None, amount=None):\n 'Tool to convert one currency to another'\n # Just a dummy implementation\n lookup = {('JPY', 'USD'): 1234.56}\n rate = lookup.get((from_, to))\n print(f'{from_=}, {to=}, {amount=}, {rate=}')\n # Look up the conversion online here\n return rate * amount\n\nprompt = 'I need to import a car from Japan. 
It costs 5 million Yen.'\n'How much must I withdraw from my US bank account'\nllm = struct_mlx_chat_api(base_url='http://localhost:8000', tool_reg=[currency_exchange])\nresp = asyncio.run(llm(prompt_to_chat(prompt), trip_timeout=60))\nprint(resp.first_choice_text)\n```\n\nNotice the use of the `rename` parameter metadata. In Python the param name we've asked the LLM to use, `from`, is a keyword, so to avoid confusion the actual function definition uses `from_`, and the `rename` instructs Toolio to make that change in the background.\n\nYou can also define asynchronous tools, e.g. `async def currency_exchange`, which I would actually recommend if, e.g. you are truly web scraping.\n\nYou might study the command line `pylib/cli/request.py` for further insight.\n\n# Direct usage via Python\n\nYou can also, of course, just load the model and run inference on it without bothering with HTTP client/server. The `model_manager` class is a convenient interface for this.\n\n```py\nimport asyncio\nfrom toolio.llm_helper import model_manager, extract_content\n\ntoolio_mm = model_manager('mlx-community/Hermes-2-Theta-Llama-3-8B-4bit')\n\nasync def say_hello(tmm):\n msgs = [{\"role\": \"user\", \"content\": \"Hello! How are you?\"}]\n async for chunk in extract_content(tmm.complete(msgs)):\n print(chunk, end='')\n\nasyncio.run(say_hello(toolio_mm))\n```\n\nYou should just get a simple text response from the LLm printed to the screen.\n\nYou can also do this via synchronous API, but I highly recommend leaing hard on the async habit.\n\nThe `chat_complete` method also takes a list of tools or a JSON schema, as well as some model parameters.\n\n## LLM response metadata\n\nToolio uses OpenAI API conventions a lot under the hood. If you run the following:\n\n```py\nimport asyncio\nfrom toolio.llm_helper import model_manager, extract_content\n\ntoolio_mm = model_manager('mlx-community/Hermes-2-Theta-Llama-3-8B-4bit')\n\nasync def say_hello(tmm):\n msgs = [{\"role\": \"user\", \"content\": \"Hello! How are you?\"}]\n async for chunk_struct in tmm.complete(msgs):\n print(chunk_struct)\n break\n\nasyncio.run(say_hello(toolio_mm))\n```\n\nYou should see something like:\n\n```py\n{'choices': [{'index': 0, 'delta': {'role': 'assistant', 'content': 'Hi'}, 'finish_reason': None}], 'object': 'chat.completion.chunk', 'id': 'chatcmpl-17588006160_1721823730', 'created': 1721823730, 'model': 'mlx-community/Hermes-2-Theta-Llama-3-8B-4bit'}\n```\n\nThe LLM response is delivered in such structures (\"deltas\") as they're generated. `chunk_struct['choices'][0]['delta']['content']` is a bit of the actual text we teased out in the previous snippet. `chunk_struct['choices'][0]['finish_reason']` is `None` because it's not yet finished, etc. This is based on OpenAI API.\n\n`extract_content`, used in the previous snippet, is a very simple coroutine that extracts the actual text content from this series of response structures.\n\nThe final chunk would look something like this:\n\n```py\n{'choices': [{'index': 0, 'delta': {'role': 'assistant', 'content': ''}, 'finish_reason': 'stop'}], 'usage': {'completion_tokens': 20, 'prompt_tokens': 12, 'total_tokens': 32}, 'object': 'chat.completion.chunk', 'id': 'chatcmpl-18503717840_1721824385', 'created': 1721824385, 'model': 'mlx-community/Hermes-2-Theta-Llama-3-8B-4bit'}\n```\n\nNotice there is more information, now that it's finished (`'finish_reason': 'stop'`). 
Say you want the metadata such as the number of tokens generated:\n\n```py\nimport asyncio\nfrom toolio.llm_helper import model_manager, extract_content\n\ntoolio_mm = model_manager('mlx-community/Hermes-2-Theta-Llama-3-8B-4bit')\n\nasync def say_hello(tmm):\n msgs = [{\"role\": \"user\", \"content\": \"Hello! How are you?\"}]\n async for chunk in tmm.complete(msgs):\n content = chunk['choices'][0]['delta']['content']\n if content is not None:\n print(content, end='')\n\n # Final chunk has the stats\n print('\\n', '-'*80, '\\n', 'Number of tokens generated:', chunk['usage']['total_tokens'])\n\nasyncio.run(say_hello(toolio_mm))\n```\n\nYou'll get something like:\n\n```\n*waves* Hi there! I'm doing well, thank you for asking. How about you?\n --------------------------------------------------------------------------------\n Number of tokens generated: 32\n```\n\nTip: don't forget all the various, useful bits to be found in `itertools` and the like.\n\n# Structured LLM responses via direct API\n\nAs mentioned, you can specify tools and schemata.\n\n```py\nimport asyncio\nfrom toolio.llm_helper import model_manager, extract_content\n\ntoolio_mm = model_manager('mlx-community/Hermes-2-Theta-Llama-3-8B-4bit')\n\nasync def say_hello(tmm):\n prompt = ('Which countries are mentioned in the sentence \\'Adamma went home to Nigeria for the hols\\'?'\n 'Your answer should be only JSON, according to this schema: {json_schema}')\n schema = ('{\"type\": \"array\", \"items\":'\n '{\"type\": \"object\", \"properties\": {\"name\": {\"type\": \"string\"}, \"continent\": {\"type\": \"string\"}}}}')\n msgs = [{'role': 'user', 'content': prompt.format(json_schema=schema)}]\n async for chunk in extract_content(tmm.complete(msgs, json_schema=schema)):\n print(chunk, end='')\n\nasyncio.run(say_hello(toolio_mm))\n```\n\n## Example of tool use\n\n```py\nimport asyncio\nfrom math import sqrt\nfrom toolio.llm_helper import model_manager, extract_content\n\nSQUARE_ROOT_METADATA = {'name': 'square_root', 'description': 'Get the square root of the given number',\n 'parameters': {'type': 'object', 'properties': {\n 'square': {'type': 'number',\n 'description': 'Number from which to find the square root'}},\n 'required': ['square']}}\ntoolio_mm = model_manager('mlx-community/Hermes-2-Theta-Llama-3-8B-4bit',\n tool_reg=[(sqrt, SQUARE_ROOT_METADATA)])\n\n\nasync def query_sq_root(tmm):\n msgs = [ {'role': 'user', 'content': 'What is the square root of 256?'} ]\n async for chunk in extract_content(tmm.complete_with_tools(msgs)):\n print(chunk, end='')\n\nasyncio.run(query_sq_root(toolio_mm))\n```\n\n# Tweaking prompts\n\nPart of the process of getting an LLM to stick to a schema, or to call tools is to give it a system prompt to that effect. Toolio has built in prompt language for this purpose. We believe strongly in the design principle of separating natural language (e.g. prompts) from code, so the latyter is packaged into the `resource/language.toml` file, using [Word Loom](https://github.com/OoriData/OgbujiPT/wiki/Word-Loom:-A-format-for-managing-language-for-AI-LLMs-(including-prompts)) conventions.\n\nYou can of course override the built-in prompting.\n\n## Overriding the tool-calling system prompt from the command line\n\n```sh\necho 'What is the square root of 256?' 
> /tmp/llmprompt.txt\necho '{\"tools\": [{\"type\": \"function\",\"function\": {\"name\": \"square_root\",\"description\": \"Get the square root of the given number\",\"parameters\": {\"type\": \"object\", \"properties\": {\"square\": {\"type\": \"number\", \"description\": \"Number from which to find the square root\"}},\"required\": [\"square\"]},\"pyfunc\": \"math|sqrt\"}}], \"tool_choice\": \"auto\"}' > /tmp/toolspec.json\ntoolio_request --apibase=\"http://localhost:8000\" --prompt-file=/tmp/llmprompt.txt --tools-file=/tmp/toolspec.json --sysprompt=\"You are a helpful assistant with access to a tool that you may invoke if needed to answer the user's request. Please use the tool as applicable, even if you think you already know the answer. Give your final answer in Shakespearean English The tool is:\nTool\"\n```\n\n## Overriding the tool-calling system prompt from the Python API\n\nIn order to override the system prompt from code, just set it in the initial chat message as the `system` role.\n\n```py\nimport asyncio\nfrom math import sqrt\nfrom toolio.llm_helper import model_manager, extract_content\n\nSQUARE_ROOT_METADATA = {'name': 'square_root', 'description': 'Get the square root of the given number',\n 'parameters': {'type': 'object', 'properties': {\n 'square': {'type': 'number',\n 'description': 'Number from which to find the square root'}},\n 'required': ['square']}}\ntoolio_mm = model_manager('mlx-community/Hermes-2-Theta-Llama-3-8B-4bit',\n tool_reg=[(sqrt, SQUARE_ROOT_METADATA)])\n\n# System prompt will be used to direct the LLM's tool-calling\nSYSPROMPT = 'You are a tutor from Elizabethan England, with access to a tool that you may invoke if needed to answer'\n'the user\\'s request. Please use the tool as applicable, even if you think you already know the answer. '\n'Remember to give your final answer in Elizabethan English. The tool is:\\nTool'\n\nasync def query_sq_root(tmm):\n msgs = [\n {'role': 'system', 'content': SYSPROMPT},\n {'role': 'user', 'content': 'What is the square root of 256?'}\n ]\n async for chunk in extract_content(tmm.complete_with_tools(msgs)):\n print(chunk, end='')\n\nasyncio.run(query_sq_root(toolio_mm))\n```\n\nIn which case you can express a response such as:\n\n> By the tool's decree, the square root of 256, a number most fair,\n> Is sixteen, a digit most true, and a figure most rare.\n\n# Learn more\n\n* [Documentation](https://OoriData.github.io/Toolio/)\n* More examples in the `demo` directory\n\n# Credits\n\n* otriscon's [llm-structured-output](https://github.com/otriscon/llm-structured-output/) is the foundation of this package\n* [OgbujiPT](https://github.com/OoriData/OgbujiPT) provides the client-side Open-AI-style LLM framework, and also the [Word Loom](https://github.com/OoriData/OgbujiPT/wiki/Word-Loom:-A-format-for-managing-language-for-AI-LLMs-(including-prompts)) convention for separating prompt text from code.\n\n# License\n\nApache 2\n\n# Nearby projects\n\n* [Outlines](https://github.com/outlines-dev/outlines) - Structured Text Generation vis Pydantic, JSON schema or EBNF. Similarly to Toolio, it does steered sampling, i.e. builds a finite-state machine to guide sampling based on schema\n* [Instructor](https://github.com/jxnl/instructor) - LLM structured output via prompt engineering, validation & retries rather than steered sampling.\n\n\n# Why this, anyway?\n\nIn our thinking, and that of many others working in the space for a while, agent/tool systems are where GenAI are most likely to deliver practical value. 
Watch out, though, because McKinsey has seen fit to apply their $1,000/hr opinions along the same lines. [\"Why agents are the next frontier of generative AI\"](https://www.mckinsey.com/capabilities/mckinsey-digital/our-insights/why-agents-are-the-next-frontier-of-generative-ai?cid=soc-web) (July 2024)\n\n[Parrot/Gorilla cartoon here]\n\n# Project name\n\nNamed after the legend himself. Best don't pretend you don't know Coolio, fool! Popular rapper (R.I.P.) from LA. You watched *Cookin' with Coolio*, now it's time to Tool up with Toolio! \u266a*Slide slide, but that's the past; I got something brand new for that a\u00df.*\ud83c\udfbc\n",
"bugtrack_url": null,
"license": null,
"summary": "OpenAI-like HTTP server API implementation which supports structured LLM response generation (e.g. make it conform to a JSON schema)",
"version": "0.5.0",
"project_urls": {
"Documentation": "https://OoriData.github.io/Toolio/",
"Issues": "https://github.com/OoriData/Toolio/issues",
"Source": "https://github.com/OoriData/Toolio"
},
"split_keywords": [],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "b3d201cedd1c98e219de02db674238ceba0032854244f72006f724cdd928651e",
"md5": "f9eacab89a09522c0ac175472d480520",
"sha256": "320672987ec243f106a03a9fabe66a16e7af7eb81053ba5665bc67a3b7790769"
},
"downloads": -1,
"filename": "toolio-0.5.0-py3-none-any.whl",
"has_sig": false,
"md5_digest": "f9eacab89a09522c0ac175472d480520",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.11",
"size": 47698,
"upload_time": "2024-09-03T13:10:33",
"upload_time_iso_8601": "2024-09-03T13:10:33.872032Z",
"url": "https://files.pythonhosted.org/packages/b3/d2/01cedd1c98e219de02db674238ceba0032854244f72006f724cdd928651e/toolio-0.5.0-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "4469796c0bb0276686920b89d2a0d6a1b7bcacb4985cd6d952abc1561a5c26f2",
"md5": "66cc5b407031704b18606beb1496cfb4",
"sha256": "c661beab5f8e05235e63f983d74e039d844b7f7029a6931d81de4b2470ff1dcc"
},
"downloads": -1,
"filename": "toolio-0.5.0.tar.gz",
"has_sig": false,
"md5_digest": "66cc5b407031704b18606beb1496cfb4",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.11",
"size": 60512,
"upload_time": "2024-09-03T13:10:35",
"upload_time_iso_8601": "2024-09-03T13:10:35.839738Z",
"url": "https://files.pythonhosted.org/packages/44/69/796c0bb0276686920b89d2a0d6a1b7bcacb4985cd6d952abc1561a5c26f2/toolio-0.5.0.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2024-09-03 13:10:35",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "OoriData",
"github_project": "Toolio",
"travis_ci": false,
"coveralls": false,
"github_actions": false,
"lcname": "toolio"
}