<div align="center">
# EVOLvE: Evaluating and Optimizing LLMs For Exploration In-Context
<p align="center">
<img src="https://github.com/allenanie/EVOLvE/blob/main/assets/logo.png?raw=true" alt="EVOLvE Logo" width="200" height="200"/>
</p>
[![Github](https://img.shields.io/badge/EVOLvE-000000?style=for-the-badge&logo=github&logoColor=000&logoColor=white)](https://github.com/allenanie/EVOLvE) [![ArXiv](https://img.shields.io/badge/EVOLvE-CF4545?style=for-the-badge&logo=arxiv&logoColor=000&logoColor=white)](https://arxiv.org/pdf/2410.06238)
[![PyPI version](https://badge.fury.io/py/banditbench.svg)](https://badge.fury.io/py/banditbench)
[![Python](https://img.shields.io/badge/python-3.9+-blue.svg)](https://www.python.org/downloads/)
[![License](https://img.shields.io/badge/License-MIT-green.svg)](https://opensource.org/licenses/MIT)
[![Build Status](https://github.com/allenanie/evolve/actions/workflows/python-app.yml/badge.svg)](https://github.com/allenanie/evolve/actions)
<div align="center" style="font-family: Arial, sans-serif;">
<p>
<a href="#-news" style="text-decoration: none; font-weight: bold;">🎉 News</a> •
<a href="#️-installation" style="text-decoration: none; font-weight: bold;">✨ Getting Started</a> •
<a href="#-features" style="text-decoration: none; font-weight: bold;">📖 Introduction</a>
</p>
<p>
<a href="#-bandit-scenario-example" style="text-decoration: none; font-weight: bold;">🔧 Usage</a> •
<a href="#-citation" style="text-decoration: none; font-weight: bold;">🎈 Citation</a> •
<a href="#-acknowledgement" style="text-decoration: none; font-weight: bold;">🌻 Acknowledgement</a>
</p>
</div>
</div>
EVOLvE is a framework for evaluating Large Language Models (LLMs) for In-Context Reinforcement Learning (ICRL).
We provide a flexible framework for experimenting with different LLM Agent Context Layers and analyzing how they affect a model's ability to interact with RL environments (bandits). This repository contains the code to reproduce results from our EVOLvE paper.
## 📰 News
- [Jan 2025] 🎉 EVOLvE codebase is released and available on [GitHub](https://github.com/allenanie/EVOLvE)
- [Jan 2025] 📦 First version of `banditbench` package is published on PyPI
- [Oct 2024] 📄 Our paper ["EVOLvE: Evaluating and Optimizing LLMs For Exploration"](https://arxiv.org/abs/2410.06238) is now available on arXiv
## 🚀 Features
- Flexible framework for evaluating LLMs for In-Context Reinforcement Learning (ICRL)
- Support for both multi-armed and contextual bandit scenarios
- Mixin-based design for LLM agents with customizable **Context Layers**
- Built-in support for few-shot learning and demonstrations
- Includes popular benchmark environments (e.g., MovieLens)
## 🛠️ Installation
### Option 1: Install from PyPI (Recommended for Users)
```bash
pip install banditbench
```
### Option 2: Install from Source (Recommended for Developers)
```bash
git clone https://github.com/allenanie/EVOLvE.git
cd EVOLvE
pip install -e . # Install in editable mode for development
```
## 🎯 Bandit Scenario
We provide two types of bandit scenarios:
**Multi-Armed Bandit Scenario**
- Classic exploration-exploitation problem with stochastic rewards sampled from fixed distributions
- Agent learns to select the best arm without any contextual information
- Example: Choosing between 5 different TikTok videos to show, without knowing which one is more popular at first
**Contextual Bandit Scenario**
- Reward distributions depend on a context (e.g., user features)
- Agent learns to map contexts to optimal actions
- Example: Recommending movies to users based on their age and location (e.g., suggesting "The Dark Knight" to a 25-year-old who enjoys action movies and lives in an urban area)
<p align="center">
<img src="https://github.com/allenanie/EVOLvE/blob/main/assets/bandit_scenario.png?raw=true" alt="Bandit Scenario Example"/>
</p>
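Both scenarios are available as environments in `banditbench`. Here is a minimal sketch constructing one of each; the constructor calls mirror the Quick Start and MovieLens examples later in this README:
```python
from banditbench.tasks.mab import BernoulliBandit, VerbalMultiArmedBandit
from banditbench.tasks.contextual import MovieLens, MovieLensVerbal

# Multi-armed: 5 arms with fixed reward probabilities, no context
mab = BernoulliBandit(5, horizon=100, arm_params=[0.2, 0.2, 0.2, 0.2, 0.5])
verbal_mab = VerbalMultiArmedBandit(mab, "VideoWatching")

# Contextual: rewards depend on user features from the MovieLens dataset
cb = MovieLens('100k-ratings', num_arms=10, horizon=200)
verbal_cb = MovieLensVerbal(cb)
```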
## 🎮 Quick Start
### Evaluate LLMs for their In-Context Reinforcement Learning Performance
In this example, we compare the performance of two agents (an LLM agent and a classic UCB agent) on a multi-armed bandit task.
```python
from banditbench.tasks.mab import BernoulliBandit, VerbalMultiArmedBandit
from banditbench.agents.llm import LLMAgent
from banditbench.agents.classics import UCBAgent
# a 5-armed Bernoulli bandit with reward probabilities [0.2, 0.2, 0.2, 0.2, 0.5]
core_bandit = BernoulliBandit(5, horizon=100, arm_params=[0.2, 0.2, 0.2, 0.2, 0.5])
# the scenario is "ClothesShopping": the agent sees actions as clothing items
verbal_bandit = VerbalMultiArmedBandit(core_bandit, "ClothesShopping")
# create an LLM agent that uses summary statistics (mean reward, number of pulls, etc.)
agent = LLMAgent.build_with_env(verbal_bandit, summary=True, model="gpt-3.5-turbo")
llm_result = agent.in_context_learn(verbal_bandit, n_trajs=5)
# we create a UCB agent, which is a classic agent that uses
# Upper Confidence Bound to make decisions
classic_agent = UCBAgent(core_bandit)
# run the classic agent on the core bandit for 5 trajectories (in-context learning)
classic_result = classic_agent.in_context_learn(core_bandit, n_trajs=5)
classic_result.plot_performance(llm_result, labels=['UCB', 'GPT-3.5 Turbo'])
```
Running this will produce a plot like the one below:
<p align="left">
<img src="https://github.com/allenanie/EVOLvE/blob/main/assets/UCBvsLLM.png?raw=true" alt="UCB vs LLM" style="width: 60%;"/>
</p>
## 💰 Evaluation Cost
Each benchmark provides a tool to estimate inference cost. The costs listed below are in US dollars and cover
all trials and repetitions.
```python
from banditbench import HardCoreBench, HardCorePlusBench, FullBench, CoreBench, MovieBench
bench = HardCoreBench()
cost = bench.calculate_eval_cost([
'gemini-1.5-pro',
'gemini-1.5-flash',
'gpt-4o-2024-11-20',
"gpt-4o-mini-2024-07-18",
"o1-2024-12-17",
"o1-mini-2024-09-12",
"claude-3-5-sonnet-20241022",
"claude-3-5-haiku-20241022"
])
```
You can evaluate an agent by doing:
```python
from banditbench.agents.llm import LLMAgent
from banditbench.agents.guide import UCBGuide
env_to_agent_results = bench.evaluate([
LLMAgent.build(), # Raw History Context Layer
LLMAgent.build(summary=True), # Summary Context Layer
LLMAgent.build(summary=True, guide=UCBGuide(env))  # Summary + UCB Guide Context Layer; `env` is a bandit environment constructed earlier
])
```
Cost estimation is performed for a **single** agent with raw history (the longest context). If you evaluate multiple agents,
multiply this cost by the number of agents (e.g., the three agents above on HardCore with gemini-1.5-flash would cost roughly 3 × $14.91 ≈ $45).
| Model | Core | HardCore | HardCore+ | Full | MovieBench |
|---------------------------|----------|--------------|---------------|-----------|----------------|
| **gemini-1.5-flash** | **$31.05** | **$14.91** | **$39.18** | **$83.44** | **$31.05** |
| gpt-4o-mini-2024-07-18 | $62.10 | $29.83 | $78.36 | $166.88 | $62.10 |
| claude-3-5-haiku-20241022 | $414.33 | $198.97 | $522.64 | $1113.18 | $414.33 |
| **gemini-1.5-pro** | **$517.54** | **$248.55** | **$652.98** | **$1390.69** | **$517.54** |
| gpt-4o-2024-11-20 | $1035.07 | $497.11 | $1305.96 | $2781.38 | $1035.07 |
| o1-mini-2024-09-12 | $1242.09 | $596.53 | $1567.16 | $3337.66 | $1242.09 |
| claude-3-5-sonnet-20241022| $1243.00 | $596.91 | $1567.91 | $3339.53 | $1243.00 |
| o1-2024-12-17 | $6210.45 | $2982.64 | $7835.79 | $16688.31 | $6210.45 |
## 🌍 Environments & 🤖 Agents
Here is a list of agents supported by EVOLvE.
For the multi-armed bandit scenario:
| Agent Name | Code | Interaction History | Algorithm Guide |
|------------|------|---------------------|-----------------|
| UCB | `UCBAgent(env)` | `False` | `NA` |
| Greedy | `GreedyAgent(env)` | `False` | `NA` |
| Thompson Sampling | `ThompsonSamplingAgent(env)` | `False` | `NA` |
| LLM with Raw History | `LLMAgent.build(env)` | `False` | `False` |
| LLM with Summary | `LLMAgent.build(env, summary=True)` | `True` | `False` |
| LLM with UCB Guide | `LLMAgent.build(env, summary=True, guide=UCBGuide(env))` | `True` | `True` |
For the contextual bandit scenario:
| Agent Name | Code | Interaction History | Algorithm Guide |
|------------|------|---------------------|-----------------|
| LinUCB | `LinUCBAgent(env)` | `False` | `NA` |
| LLM with Raw History | `LLMAgent.build(env)` | `False` | `False` |
| LLM with UCB Guide | `LLMAgent.build(env, guide=LinUCBGuide(env))` | `True` | `True` |
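As a rough sketch of wiring these contextual agents together (assuming `LinUCBAgent` is importable from `banditbench.agents.classics` alongside `UCBAgent`, and `LinUCBGuide` from `banditbench.agents.guide` alongside `UCBGuide` — both module paths are assumptions):
```python
from banditbench.tasks.contextual import MovieLens, MovieLensVerbal
from banditbench.agents.classics import LinUCBAgent  # assumed module path
from banditbench.agents.guide import LinUCBGuide     # assumed module path
from banditbench.agents.llm import LLMAgent

env = MovieLens('100k-ratings', num_arms=10, horizon=200)
verbal_env = MovieLensVerbal(env)

linucb_agent = LinUCBAgent(env)  # classic contextual baseline
# LLM agent with raw history plus LinUCB guidance, as in the table above;
# passing the verbal wrapper follows the Quick Start convention
llm_agent = LLMAgent.build(verbal_env, guide=LinUCBGuide(env))
```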
Here is a list of environments supported by EVOLvE:
**Multi-Armed Bandit Scenario**
| Environment Name | Code | Description |
|------------|------|-----------------|
| Bernoulli Bandit | `BernoulliBandit(n_arms, horizon, arm_params)` | Arm parameter is Bernoulli p|
| Gaussian Bandit | `GaussianBandit(n_arms, horizon, arm_params)` | Arm parameter is a tuple of (mean, variance)|
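For instance, a two-armed Gaussian bandit could be constructed as follows (the `(mean, variance)` values are illustrative, not taken from the paper):
```python
from banditbench.tasks.mab import GaussianBandit

# each arm is parameterized by a (mean, variance) tuple
gaussian_bandit = GaussianBandit(2, horizon=100, arm_params=[(0.5, 0.1), (0.2, 0.1)])
```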
For LLM agents, we provide a `VerbalMultiArmedBandit` environment that converts a core bandit into a verbal bandit under a chosen scenario.
| Scenario Name | Code | Action Names |
|------------|------|-----------------|
| Button Pushing | `ButtonPushing` | Action names are colored buttons like "Red", "Blue", "Green", etc.|
| Online Ads | `OnlineAds` | Action names are online ads like "Ad A", "Ad B", "Ad C", etc.|
| Video Watching | `VideoWatching` | Action names are videos like "Video A", "Video B", "Video C", etc.|
| Clothes Shopping | `ClothesShopping` | Action names are clothing items like "Velvet Vogue Jacket", "Silk Serenity Dress", etc.|
They can be combined as follows:
```python
from banditbench.tasks.mab import BernoulliBandit, VerbalMultiArmedBandit
core_bandit = BernoulliBandit(2, 10, [0.5, 0.2], 123)
verbal_bandit = VerbalMultiArmedBandit(core_bandit, "VideoWatching")
```
**Contextual Bandit Scenario**
| Environment Name | Code | Description |
|------------|------|-----------------|
| MovieLens | `MovieLens(task_name, num_arms, horizon)` | `task_name` loads a specific MovieLens dataset |
| MovieLensVerbal | `MovieLensVerbal(env)` | Verbal wrapper analogous to `VerbalMultiArmedBandit`; the scenario is fixed to "MovieLens" |
```python
from banditbench.tasks.contextual import MovieLens, MovieLensVerbal
env = MovieLens('100k-ratings', num_arms=10, horizon=200, rank_k=5, mode='train',
save_data_dir='./tensorflow_datasets/')
verbal_env = MovieLensVerbal(env)
```
To construct the environments used in the paper:
```python
from banditbench.tasks.mab import create_small_gap_bernoulli_bandit, create_large_gap_bernoulli_bandit
from banditbench.tasks.mab import create_high_var_gaussian_bandit, create_low_var_gaussian_bandit
easy_bern_bandit = create_small_gap_bernoulli_bandit(num_arms=5, horizon=1000)
```
## 🧩 Architecture
### Decision-Making Context
The framework represents the decision-making context in the following segments:
```text
{Task Description + Instruction} (provided by the environment)
{Few-shot demonstrations from historical interactions}
{Current history of interaction} (decided by the agent)
{Query prompt for the next decision} (provided by the environment)
```
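As a purely illustrative sketch of how these segments compose into a single prompt (the variable names and strings here are hypothetical, not part of the `banditbench` API):
```python
# Hypothetical illustration only; banditbench assembles this context internally.
task_description = "You are choosing among 5 videos to maximize watch-through rate. ..."
few_shot_demos = "Example trajectory 1: ...\nExample trajectory 2: ..."
interaction_history = "Step 1: chose Video A, reward 1\nStep 2: chose Video C, reward 0"
query_prompt = "Which video do you choose next? Answer with the video name only."

prompt = "\n\n".join([task_description, few_shot_demos, interaction_history, query_prompt])
```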
### LLM Agents
We use a Mixin-based design pattern to provide maximum flexibility and customization options for agent implementation. This allows you to:
- Combine different agent behaviors
- Customize prompt engineering strategies
- Implement new decision-making algorithms
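As a schematic illustration of this Mixin composition (the class names and methods here are hypothetical, not banditbench's actual hierarchy):
```python
# Hypothetical sketch of the Mixin pattern; not the actual banditbench classes.
class RawHistoryMixin:
    def build_context(self, history):
        # list every (action, reward) step verbatim
        return "\n".join(f"Step {i}: action={a}, reward={r}"
                         for i, (a, r) in enumerate(history, 1))

class SummaryMixin:
    def build_context(self, history):
        # summarize pull counts and mean reward per action instead
        stats = {}
        for a, r in history:
            n, total = stats.get(a, (0, 0.0))
            stats[a] = (n + 1, total + r)
        return "\n".join(f"{a}: pulled {n} times, mean reward {total / n:.2f}"
                         for a, (n, total) in stats.items())

class BaseAgent:
    def act(self, history):
        prompt = self.build_context(history)  # supplied by whichever mixin is composed in
        return prompt  # a real agent would send this prompt to an LLM and parse the chosen action

class SummaryAgent(SummaryMixin, BaseAgent):
    pass
```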
## 🔧 Customization
### Adding Custom Multi-Armed Bandit Scenarios
To create a custom bandit scenario:
1. Inherit from the base scenario class
2. Implement required methods
(Coming soon)
### Creating Custom Agents
(Coming soon)
## ⚠️ Known Issues
1. **TFDS Issues**: There is a known issue with TensorFlow Datasets when using multiple Jupyter notebooks sharing the same kernel. The kernel may crash when loading datasets, even with different save locations.
2. **TensorFlow Dependency**: The project currently requires TensorFlow due to TFDS usage. We plan to remove this dependency in future releases.
## 🎈 Citation
If you find EVOLvE useful in your research, please consider citing our paper:
```bibtex
@article{nie2024evolve,
title={EVOLvE: Evaluating and Optimizing LLMs For Exploration},
author={Nie, Allen and Su, Yi and Chang, Bo and Lee, Jonathan N and Chi, Ed H and Le, Quoc V and Chen, Minmin},
journal={arXiv preprint arXiv:2410.06238},
year={2024}
}
```
## 📄 License
This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.
## 🌻 Acknowledgement
The design of EVOLvE is inspired by the following projects:
- [DSPy](https://github.com/stanfordnlp/dspy)
- [Trace](https://github.com/microsoft/Trace)
- [Textgrad](https://github.com/zou-group/textgrad)
- [d3rlpy](https://d3rlpy.readthedocs.io/en/v2.6.0/)
- [Scala Mixin Trait](https://docs.scala-lang.org/tour/mixin-class-composition.html)
- [In-Context Reinforcement Learning Paper List](https://github.com/dunnolab/awesome-in-context-rl)
## 🤝 Contributing
We welcome contributions! Please start by opening an issue or a feature request.
<p align="center">
<img src="https://github.com/allenanie/EVOLvE/blob/main/assets/main.jpeg?raw=true" alt="EVOLvE Framework Overview"/>
</p>