<div align="center">
<img width="308" src="./assets/logo.png" />
</div>
# 🌳 Saplings
**A framework for building agents that use tree search to solve problems.**
Traditional agents don't work well because they're vulnerable to compounding errors. Even a small mistake early in the loop can snowball and ruin the final output. But with search, agents can explore different tool-use trajectories before committing to one, making it easier to avoid mistakes. This lookahead ability makes your agents smarter, especially on complex reasoning tasks (e.g. code generation, web navigation).
Saplings lets you build search-enabled agents with just a couple lines of code.
- Supports popular search algorithms: **Monte Carlo Tree Search (MCTS), A\*, and greedy best-first search**
- Uses OpenAI function calling under the hood
- Full control over the evaluation function, prompts, search parameters, etc.
![Demo](./assets/demo.png)
**Why add search?**
Chain-of-thought/ReAct-style agents don't work well because they're vulnerable to compounding errors. Even a small mistake early in the loop can snowball and ruin the final output. Adding tree search gives your agent lookahead and backtracking abilities, making it easier to recover from such mistakes. And as compute becomes cheaper, it will become table stakes for agents to use inference-time search.
---
- [Installation](#installation)
- [Quickstart](#quickstart)
  - [Creating a tool](#creating-a-tool)
  - [Configuring an agent](#configuring-an-agent)
- [Docs](#docs)
  - [Agents](#agents)
    - [Parameters](#parameters)
    - [`GreedyAgent`: Greedy best-first search](#greedyagent-greedy-best-first-search)
    - [`MonteCarloAgent`: Monte Carlo tree search](#montecarloagent-monte-carlo-tree-search)
    - [`AStarAgent`: A\* search](#astaragent-a-search)
    - [`COTAgent`: Chain-of-thought (no search)](#cotagent-chain-of-thought-no-search)
  - [The `Message` object](#the-message-object)
  - [Termination conditions](#termination-conditions)
  - [Advanced tool options](#advanced-tool-options)
    - [Accessing agent memory](#accessing-agent-memory)
    - [The `format_output()` method](#the-format_output-method)
  - [Custom evaluators](#custom-evaluators)
- [Roadmap](#roadmap)
## Installation
```bash
$ pip install saplings
```
## Quickstart
Below is a simple agent implementing Monte Carlo tree search (MCTS). It's equipped with a multiplication tool to solve tricky arithmetic problems.
```python
from saplings.examples import MultiplicationTool
from saplings.llms import OpenAI
from saplings import MonteCarloAgent, Evaluator
model = OpenAI(model="gpt-4o", api_key="YOUR_API_KEY")
evaluator = Evaluator(model)
tools = [MultiplicationTool()]
agent = MonteCarloAgent(tools, model, evaluator)
messages, _, _ = agent.run("Let x = 9418.343 * 8.11 and y = 2x. Calculate (xy)(x^2).")
```
This is the "bare minimum" for setting up a search agent with saplings –– just a few lines of code. There are a lot more parameters you can control, all covered in the [docs](#docs). But let's first walk through the basics of creating your own tools and configuring an agent.
### Creating a tool
Tools are what your agent will use to perform a task or answer a query. Each tool must extend the `Tool` base class and implement a few variables and methods. Here's an example of a simple tool that multiplies two numbers together:
```python
from saplings.abstract import Tool
class MultiplicationTool(Tool):
    def __init__(self, **kwargs):
        self.name = "multiply"
        self.description = "Multiplies two numbers and returns the result number."
        self.parameters = {
            "type": "object",
            "properties": {
                "a": {
                    "type": "number",
                    "description": "The number to multiply."
                },
                "b": {
                    "type": "number",
                    "description": "The number to multiply by."
                }
            },
            "required": ["a", "b"],
            "additionalProperties": False
        }
        self.is_terminal = False

    async def run(self, a, b, **kwargs):
        return a * b
```
**Variables:**
The instance variables in the class tell the agent when and how to call the tool. If you've used [OpenAI function calling](https://platform.openai.com/docs/guides/function-calling) before, most of this should be familiar to you.
- `name` (str): Name of the tool.
- `description` (str): Description of what the tool does and when to call it.
- `parameters` (dict): Arguments for the tool as a JSON schema.
- `is_terminal` (bool): If `True`, calling this tool will terminate a search trajectory –– meaning, no subsequent tools can be called after this one. This is typically used for tools that generate a final output for the user (e.g. an answer to a question). More on this [here.](#termination-conditions)
**`run()` method:**
This is what actually executes the tool when the agent calls it. Arguments should be the same as the input parameters in the tool schema.
**Advanced options:**
There are additional things you can do with tools, such as accessing the agent's memory during tool execution, or controlling how tool output is shown to the model (vs. how it's stored in memory). You can read about these options [here.](#advanced-tool-options)
### Configuring an agent
**Choosing a model:**
First, you need to choose a model for the agent to use. Saplings only supports OpenAI models right now, but Anthropic and Groq are on the [roadmap](#roadmap).
```python
from saplings.llms import OpenAI
model = OpenAI(model="gpt-4o", api_key="YOUR_API_KEY") # Defaults to os.getenv("OPENAI_API_KEY") if empty
```
> Note: if you pass in additional `**kwargs`, they will get passed to all the chat completion calls made with this model.
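For example, a model that always samples at a low temperature (`temperature` is a standard OpenAI chat completions argument):

```python
model = OpenAI(model="gpt-4o", temperature=0.2)  # temperature is forwarded to every completion call
```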
**Setting up the evaluator:**
This is what will guide the search process. The evaluator takes a search trajectory (i.e. a list of OpenAI messages) and returns a score between 0 and 1, indicating how promising the trajectory is. By default, a score of `1.0` means the agent has _solved_ the problem and can terminate the search. You can change the solution cutoff by setting the `threshold` parameter in the agent –– more on that [here](#parameters).
```python
from saplings import Evaluator
evaluator = Evaluator(model)
```
The default evaluator provided by saplings uses an LLM (i.e. the `model` you pass in above) to score trajectories. The `Evaluator` object has parameters that let you control things like the system prompt and the sampling rate. You can also define your own custom evaluator if necessary. Read more about evaluators [here](#custom-evaluators).
**Choosing an agent/search algorithm:**
Once your tools, model, and evaluator are ready, you can simply plug them into a saplings agent. There are multiple to choose from, each implementing its own tree search algorithm: `MonteCarloAgent`, `AStarAgent`, and `GreedyAgent`. There's also a regular chain-of-thought agent available, `COTAgent`, which does not implement any search. Each agent has its own advantages and disadvantages, which you can read about [here](#agents).
```python
from saplings import MonteCarloAgent
agent = MonteCarloAgent(tools, model, evaluator)
```
This will initialize your agent. To actually run it on an input, call the `run` method. To run it asynchronously, call the `run_async` method.
```python
messages, score, is_solution = agent.run("What's 2 * 2?") # await agent.run_async("What's 2 * 2?")
```
The output is a list of messages representing the best tool-use trajectory, the final score of the trajectory (as given by the evaluator), and whether or not the search terminated because the evaluator deemed the trajectory a _solution_ to the prompt. The messages are [`Message` objects,](#the-message-object) which are special objects native to saplings that wrap OpenAI messages.
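For example, a quick sketch of inspecting the result:

```python
messages, score, is_solution = agent.run("What's 2 * 2?")

print(f"Score: {score}, solved: {is_solution}")
for message in messages:
    print(message.to_openai_message())  # Plain OpenAI-style message dicts
```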
Notably, there are [many more parameters](#parameters) you can set for the agent, such as the system prompt that governs it.
## Docs
### Agents
#### Parameters
Every agent in saplings has the same parameters, listed below:
1. `tools` (List[Tool]): List of tools your agent can use.
2. `model` (Model): LLM provider that your agent will use to call tools.
3. `evaluator` (BaseEvaluator): Evaluation function that the agent will use to guide the search process.
4. `prompt` (str): System prompt for the agent.
5. `b_factor` (int): Branching factor, i.e. the number of potential next tool calls to evaluate at each step in a search trajectory. Note that this parameter does not do anything for `COTAgent`.
6. `max_depth` (int): Maximum depth of the search tree, indicating how many levels the agent can explore.
7. `threshold` (float): A cutoff value for the evaluation function. If a trajectory's evaluation score is above this threshold, the search will terminate and that trajectory will be accepted as the solution.<!--TODO: Note the exception here-->
8. `verbose` (bool): Whether to print logging statements when you run the agent.
9. `tool_choice` ("auto" | "required"): Same as the `tool_choice` parameter in the OpenAI chat completions function. Indicates whether the model must _always_ call a tool, or if it can decide to generate a normal response instead.<!--TODO: Note that this changes the termination behavior of the agent-->
10. `parallel_tool_calls` (bool): Same as the `parallel_tool_calls` parameter in the OpenAI chat completions function. Indicates whether the model can generate _multiple_ tool calls in a single completion request.
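For example, a sketch of an agent that overrides several of these defaults (assuming the parameters are accepted as keyword arguments; the values here are arbitrary):

```python
agent = MonteCarloAgent(
    tools,
    model,
    evaluator,
    prompt="You are an agent that solves arithmetic problems using the tools provided.",
    b_factor=3,          # Evaluate 3 candidate tool calls per step
    max_depth=5,         # Explore at most 5 levels deep
    threshold=0.8,       # Treat trajectories scoring >= 0.8 as solutions
    verbose=True,
    tool_choice="auto",  # The model may answer directly instead of calling a tool
)
```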
#### `GreedyAgent`: Greedy best-first search
This agent implements a greedy best-first search. It's the fastest and cheapest search agent in terms of LLM calls, but it's also incapable of backtracking, which makes it the least effective agent. `GreedyAgent` works by taking the input and generating a set of candidate tool calls. It executes each tool call and evaluates its output. Then it picks the best tool call based on its evaluation and generates a set of candidate _next_ tool calls. It repeats this process until a termination condition is met.
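A minimal sketch of this control flow (not the library's actual implementation; `generate_candidates` and `evaluate` are toy stand-ins for the model and evaluator):

```python
import random

def generate_candidates(trajectory, b_factor=3):
    # Stand-in for the model proposing b_factor candidate tool calls
    return [f"tool_call_{len(trajectory)}_{i}" for i in range(b_factor)]

def evaluate(trajectory):
    # Stand-in for the evaluator scoring a trajectory between 0 and 1
    return random.random()

def greedy_search(prompt, max_depth=5, threshold=1.0):
    trajectory = [prompt]
    for _ in range(max_depth):
        # Execute each candidate, score the resulting trajectory, keep only the best.
        # Discarded candidates are never revisited -- no backtracking.
        candidates = generate_candidates(trajectory)
        best = max(candidates, key=lambda call: evaluate(trajectory + [call]))
        trajectory.append(best)
        if evaluate(trajectory) >= threshold:  # Termination condition
            break
    return trajectory
```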
#### `MonteCarloAgent`: Monte Carlo tree search
![Demo](./assets/mcts.png)
This agent implements the Monte Carlo tree search (MCTS) algorithm, based on the paper [Language Agent Tree Search (Zhou et al.).](https://arxiv.org/pdf/2310.04406) It is the most effective agent you can build with saplings, but also the slowest and most expensive (in terms of LLM calls) in the worst case. Its primary advantage is the ability to balance exploration and exploitation, allowing it to efficiently find optimal trajectories by using past experiences and adjusting its strategy accordingly.
Note that, besides the parameters [listed above](#parameters), this agent has one additional parameter:
1. `max_rollouts` (int, default = 10): Controls the maximum number of simulations (rollouts) the agent can perform.
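For example (assuming it's passed as a keyword argument like the others):

```python
agent = MonteCarloAgent(tools, model, evaluator, max_rollouts=25)
```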
#### `AStarAgent`: A\* search
![Demo](./assets/astar.gif)
Implements a variation of the A\* pathfinding algorithm, based on the paper [Tree Search for Language Model Agents (Koh, et al.).](https://arxiv.org/abs/2407.01476) Unlike `GreedyAgent`, this agent makes more LLM calls in the worst case, but is capable of backtracking and recovering from mistakes. However, unlike `MonteCarloAgent`, it does not update its search strategy based on the trajectories it has already explored. Oftentimes, `AStarAgent` is the perfect middle-ground between `GreedyAgent` (dumb but fast) and `MonteCarloAgent` (smart but slow).
#### `COTAgent`: Chain-of-thought (no search)
This is a standard tool-calling agent and does not implement any search. It takes an input, calls a tool, then uses the tool output to inform the next tool call, and so on until a termination condition is met. Think of `COTAgent` as a baseline to compare your search agents to.
### The `Message` object
[Messages](https://github.com/shobrook/saplings/blob/master/src/dtos/Message.py) are a core data structure in saplings. They are essentially equivalent to OpenAI messages (e.g. user input, tool calls, tool responses, assistant responses), with a few extra properties and helper methods. A list of messages represents a search trajectory. When you run an agent, it will return a list of messages representing the best trajectory it found.
Saplings messages can be easily converted into OpenAI messages using the `to_openai_message()` method.
```python
messages, _, _ = agent.run("This is my prompt!")
messages = [message.to_openai_message() for message in messages]
print(messages)
# [{"role": "user", "content": "This is my prompt!"}, ..., {"role": "assistant", "content": "This is a response!"}]
```
Message objects have only one additional attribute that OpenAI messages don't have. If a message represents a tool response, it will have a `raw_output` property that contains the output of that tool. What's stored here [may be different than](#the-format_output-method) the tool response that gets shown to the model, which is stored in the `content` property.
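For example, a sketch of pulling the raw tool outputs out of a trajectory (using `getattr` since only tool responses carry the property):

```python
messages, _, _ = agent.run("Let x = 9418.343 * 8.11. Calculate x^2.")

for message in messages:
    raw = getattr(message, "raw_output", None)  # Only tool responses have this
    if raw is not None:
        print(raw)
```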
### Termination conditions
Every tool has an `is_terminal` property. This is a boolean flag that tells the agent if calling the tool should terminate a search trajectory. If it's `True`, no subsequent tool calls can be made after the tool is invoked, and the agent will terminate that search trajectory. Terminal tools are typically used to generate some sort of final output for the user (e.g. an answer to a question).
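For example, a hypothetical terminal tool for a Q&A agent might look like this (the tool itself is illustrative, not part of saplings):

```python
from saplings.abstract import Tool

class AnswerTool(Tool):
    def __init__(self, **kwargs):
        self.name = "answer"
        self.description = "Generates a final answer to the user's question."
        self.parameters = {
            "type": "object",
            "properties": {
                "answer": {
                    "type": "string",
                    "description": "The final answer to the user's question."
                }
            },
            "required": ["answer"],
            "additionalProperties": False
        }
        self.is_terminal = True  # Calling this tool terminates the search trajectory

    async def run(self, answer, **kwargs):
        return answer
```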
We say that an agent can _self-terminate_ if it has at least one terminal tool, OR if the `tool_choice` parameter is set to "auto." In the latter case, this means that calling a tool is _optional_ for the agent, and instead of a tool call, it can generate a regular assistant response to the input prompt. We consider such a response to also terminate a search trajectory.
If an agent cannot self-terminate, then a search trajectory will only ever terminate if either a maximum depth is reached (set by the `max_depth` parameter), or the evaluator marks a trajectory as _solved_ (i.e. the score is >= the agent's `threshold` parameter) –– in which case the entire search itself terminates.
An important point of confusion here: even if the evaluator marks a trajectory as solved, the search may not terminate if the agent can self-terminate. This happens when a trajectory ends with a _non-terminal_ tool call (or, in the case when tool use is optional, with something other than a regular assistant response) but is still scored above the solution threshold. In this case, the search will continue until a terminal state is reached that is also marked as solved. If no terminal state is ever reached, the trajectory with the best score is returned. And if no solution is ever found, a trajectory ending in a terminal state is preferred over a higher-scoring trajectory ending in a non-terminal state.
<!--### Streaming
Let's say you're building a Q&A agent and its equipped with a _terminal_ tool that generates a final answer when called. During the search process, this tool may be called _multiple_ times before the search terminates. In these cases, if your tool streams the output to the user, the user will get multiple streams until a final one is...-->
### Advanced tool options
#### Accessing agent memory
In some cases, running your tool may depend on the output of the previous tools your agent has used, or the user input itself. If this is the case, you can access the agent's current search trajectory in the `run` method when you implement your tool. Simply use `kwargs.get("trajectory")`. This will return a list of [Message](#the-message-object) objects, which are wrappers around OpenAI messages.
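For example, a hypothetical tool that uses the original user input during execution (this assumes the trajectory's first message is the user's prompt and exposes a `content` attribute):

```python
from saplings.abstract import Tool

class ContextAwareTool(Tool):
    ...  # name, description, parameters, is_terminal as usual

    async def run(self, query, **kwargs):
        trajectory = kwargs.get("trajectory")  # List of Message objects
        user_input = trajectory[0].content     # The original user prompt
        return f"Ran '{query}' in the context of: {user_input}"
```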
#### The `format_output()` method
In some cases, it makes sense for the raw output of a tool to be separated from the output that's shown to the model. By default, the output of `run()` is what's shown to the model. But you can add the _optional_ `format_output` method to your tool class to change how the output is presented to the agent. For example, in our quickstart example, instead of seeing the multiplication result N, you might want the model to see "A \* B = N" so the agent can more easily keep track of what numbers have been multiplied. Here's how you'd modify the tool to do that:
```python
class MultiplicationTool(Tool):
    ...

    async def run(self, a, b, **kwargs):
        return {"a": a, "b": b, "result": a * b}

    def format_output(self, output):
        a, b = output["a"], output["b"]
        result = output["result"]
        return f"{a} * {b} = {result}"
```
The unformatted output of the tool is still stored in the agent's memory. It can be accessed via the `raw_output` property of the [Message](#the-message-object) object that represents the tool response.
### Custom evaluators
Every agent implements a _heuristic search algorithm,_ meaning that it uses some heuristic or value function to guide the search. By default, saplings offers the `Evaluator` object, which evaluates a search trajectory using an LLM. It takes a trajectory (i.e. a list of OpenAI messages) as input and returns a score between 0 and 1 that tells the agent whether it's on the right track, along with some written justification for the score.
The `Evaluator` object has the following parameters:
1. `model` (Model): The LLM used to generate the score.
2. `n_samples` (int): The number of scores to generate for a given trajectory. Equivalent to the `n` parameter in an OpenAI chat completion. If it's greater than 1, multiple candidate scores will be generated for a given trajectory and then averaged to return the final score. Making this greater than 1 is equivalent to enabling [self-consistency](https://arxiv.org/pdf/2203.11171) in the evaluation process.
3. `prompt` (str): The system prompt that tells the model how it should evaluate a trajectory and generate a score.
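For example, a sketch that tweaks these defaults (assuming they're accepted as keyword arguments):

```python
evaluator = Evaluator(
    model,
    n_samples=3,  # Sample 3 scores per trajectory and average them (self-consistency)
    prompt="You are evaluating an arithmetic agent. Score how close it is to a correct final answer."
)
```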
In most cases, simply customizing this object will be sufficient, but in some situations it makes sense to build your own evaluator. For example, if you're building a coding agent, you may want to evaluate a search trajectory using some external feedback, such as whether the code compiles or whether a set of unit tests are passing. To build a custom evaluator, you must extend the `Evaluator` base class and implement a `run` method. This method must take in a list of [Message](#the-message-object) objects as input, representing a search trajectory, and return an [`EvaluationDTO` object](https://github.com/shobrook/saplings/blob/master/src/dtos/Evaluation.py) as output. This object has two properties: `score` (a value between 0 and 1) and `reasoning` (an _optional_ string with written justification for the score).
```python
from typing import List

from saplings.abstract import Evaluator
from saplings.dtos import EvaluationDTO, Message

class CustomEvaluator(Evaluator):
    def __init__(self):
        pass

    async def run(self, trajectory: List[Message]) -> EvaluationDTO:
        # Implement this
        return EvaluationDTO(score=1.0, reasoning="Justification goes here.")
```
Note that the trajectory will always contain the original input message, every tool call, and every tool response. For the tool responses, you can access the raw output of the tool using the `Message.raw_output` property, discussed in more detail [here.](#the-format_output-method)
Each agent has a `threshold` parameter, which determines the minimum score at which to terminate the search and deem a trajectory as a _solution._ By default, it is 1.0, so you should keep this in mind when designing your evaluator.
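As an illustration of external feedback, here's a sketch of an evaluator for a coding agent that scores a trajectory by whether the generated code compiles (the way code is pulled out of `raw_output` is an assumption, not part of saplings):

```python
from typing import List

from saplings.abstract import Evaluator
from saplings.dtos import EvaluationDTO, Message

class CompileCheckEvaluator(Evaluator):
    def __init__(self):
        pass

    async def run(self, trajectory: List[Message]) -> EvaluationDTO:
        # Assumption: the last tool response's raw output holds the generated code
        code = getattr(trajectory[-1], "raw_output", None)
        if not isinstance(code, str):
            return EvaluationDTO(score=0.0, reasoning="No code generated yet.")
        try:
            compile(code, "<agent_output>", "exec")  # Syntax check only; doesn't execute
            return EvaluationDTO(score=1.0, reasoning="Code compiles.")
        except SyntaxError as err:
            return EvaluationDTO(score=0.0, reasoning=f"Syntax error: {err}")
```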
## Roadmap
1. Support for chat history
2. Support for Anthropic and Groq models
3. Allow dynamic system prompts and tool schemas (i.e. prompts that change as the agent progresses)
4. Support for vision agents
5. Add an `llm_call_budget` parameter to every agent
**Mission:** More inference-time compute makes agents smarter. And as models get cheaper and faster, search will become more viable to use in production. Let's build the easiest and most powerful framework for building search-enabled agents.
## Note from the author
One of my other open-source packages used to be called saplings. It has since been renamed to [syntaxis](https://github.com/shobrook/syntaxis) and is now associated with the package of the same name on PyPI.