<div align="center">
# EVOLvE: Evaluating and Optimizing LLMs For Exploration In-Context
<p align="center">
<img src="https://github.com/allenanie/EVOLvE/blob/main/assets/logo.png?raw=true" alt="EVOLvE Logo" width="200" height="200"/>
</p>
[![Github](https://img.shields.io/badge/EVOLvE-000000?style=for-the-badge&logo=github&logoColor=000&logoColor=white)](https://github.com/allenanie/EVOLvE) [![ArXiv](https://img.shields.io/badge/EVOLvE-CF4545?style=for-the-badge&logo=arxiv&logoColor=000&logoColor=white)](https://arxiv.org/pdf/2410.06238)
[![PyPI version](https://badge.fury.io/py/banditbench.svg)](https://badge.fury.io/py/banditbench)
[![Python](https://img.shields.io/badge/python-3.9+-blue.svg)](https://www.python.org/downloads/)
[![License](https://img.shields.io/badge/License-MIT-green.svg)](https://opensource.org/licenses/MIT)
[![Build Status](https://github.com/allenanie/evolve/actions/workflows/python-app.yml/badge.svg)](https://github.com/allenanie/evolve/actions)
<div align="center" style="font-family: Arial, sans-serif;">
<p>
<a href="#-news" style="text-decoration: none; font-weight: bold;">🎉 News</a> •
<a href="#️-installation" style="text-decoration: none; font-weight: bold;">✨ Getting Started</a> •
<a href="#-features" style="text-decoration: none; font-weight: bold;">📖 Introduction</a>
</p>
<p>
<a href="#-bandit-scenario-example" style="text-decoration: none; font-weight: bold;">🔧 Usage</a> •
<a href="#-citation" style="text-decoration: none; font-weight: bold;">🎈 Citation</a> •
<a href="#-acknowledgement" style="text-decoration: none; font-weight: bold;">🌻 Acknowledgement</a>
</p>
</div>
</div>
EVOLvE is a framework for evaluating Large Language Models (LLMs) for In-Context Reinforcement Learning (ICRL).
We provide a flexible framework for experimenting with different LLM Agent Context Layers and analyzing how they affect a model's ability to interact with RL environments (bandits). This repository contains the code to reproduce results from our EVOLvE paper.
## 📰 News
- [Jan 2025] 🎉 EVOLvE codebase is released and available on [GitHub](https://github.com/allenanie/EVOLvE)
- [Jan 2025] 📦 First version of `banditbench` package is published on PyPI
- [Oct 2024] 📄 Our paper ["EVOLvE: Evaluating and Optimizing LLMs For Exploration"](https://arxiv.org/abs/2410.06238) is now available on arXiv
## 🚀 Features
- Flexible framework for evaluating LLMs for In-Context Reinforcement Learning (ICRL)
- Support for both multi-armed and contextual bandit scenarios
- Mixin-based design for LLM agents with customizable **Context Layers**
- Built-in support for few-shot learning and demonstrations
- Includes popular benchmark environments (e.g., MovieLens)
## 🛠️ Installation
### Option 1: Install from PyPI (Recommended for Users)
```bash
pip install banditbench
```
### Option 2: Install from Source (Recommended for Developers)
```bash
git clone https://github.com/allenanie/EVOLvE.git
cd EVOLvE
pip install -e . # Install in editable mode for development
```
## 🎯 Bandit Scenario
We provide two types of bandit scenarios:
**Multi-Armed Bandit Scenario**
- Classic exploration-exploitation problem with stochastic rewards sampled from fixed distributions
- Agent learns to select the best arm without any contextual information
- Example: Choosing between 5 different TikTok videos to show, without knowing which one is more popular at first
**Contextual Bandit Scenario**
- Reward distributions depend on a context (e.g., user features)
- Agent learns to map contexts to optimal actions
- Example: Recommending movies to users based on their age and location (e.g., suggesting "The Dark Knight" to a 25-year-old who enjoys action movies and lives in an urban area)
<p align="center">
<img src="https://github.com/allenanie/EVOLvE/blob/main/assets/bandit_scenario.png?raw=true" alt="Bandit Scenario Example"/>
</p>
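Both scenarios are available as environments in `banditbench`. Here is a minimal sketch constructing one of each; the constructor calls mirror the Quick Start and MovieLens examples later in this README:
```python
from banditbench.tasks.mab import BernoulliBandit, VerbalMultiArmedBandit
from banditbench.tasks.contextual import MovieLens, MovieLensVerbal

# Multi-armed: 5 arms with fixed reward probabilities, no context
mab = BernoulliBandit(5, horizon=100, arm_params=[0.2, 0.2, 0.2, 0.2, 0.5])
verbal_mab = VerbalMultiArmedBandit(mab, "VideoWatching")

# Contextual: rewards depend on user features from the MovieLens dataset
cb = MovieLens('100k-ratings', num_arms=10, horizon=200)
verbal_cb = MovieLensVerbal(cb)
```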
## 🎮 Quick Start
### Evaluate LLMs for their In-Context Reinforcement Learning Performance
In this example, we compare the performance of two agents (an LLM agent and a classic UCB agent) on a multi-armed bandit task.
```python
from banditbench.tasks.mab import BernoulliBandit, VerbalMultiArmedBandit
from banditbench.agents.llm import LLMAgent
from banditbench.agents.classics import UCBAgent
# a 5-armed Bernoulli bandit with reward probabilities [0.2, 0.2, 0.2, 0.2, 0.5]
core_bandit = BernoulliBandit(5, horizon=100, arm_params=[0.2, 0.2, 0.2, 0.2, 0.5])
# the scenario is "ClothesShopping": the agent sees actions as clothing items
verbal_bandit = VerbalMultiArmedBandit(core_bandit, "ClothesShopping")
# create an LLM agent that uses summary statistics (mean reward, number of pulls, etc.)
agent = LLMAgent.build_with_env(verbal_bandit, summary=True, model="gpt-3.5-turbo")
llm_result = agent.in_context_learn(verbal_bandit, n_trajs=5)
# we create a UCB agent, which is a classic agent that uses
# Upper Confidence Bound to make decisions
classic_agent = UCBAgent(core_bandit)
# run the classic agent on the core bandit for 5 trajectories (in-context learning)
classic_result = classic_agent.in_context_learn(core_bandit, n_trajs=5)
classic_result.plot_performance(llm_result, labels=['UCB', 'GPT-3.5 Turbo'])
```
Running this will produce a plot like the one below:
<p align="left">
<img src="https://github.com/allenanie/EVOLvE/blob/main/assets/UCBvsLLM.png?raw=true" alt="UCB vs LLM" style="width: 60%;"/>
</p>
## 💰 Evaluation Cost
Each benchmark provides a tool to estimate inference cost. The costs listed below are in US dollars and cover
all trials and repetitions.
```python
from banditbench import HardCoreBench, HardCorePlusBench, FullBench, CoreBench, MovieBench
bench = HardCoreBench()
cost = bench.calculate_eval_cost([
'gemini-1.5-pro',
'gemini-1.5-flash',
'gpt-4o-2024-11-20',
"gpt-4o-mini-2024-07-18",
"o1-2024-12-17",
"o1-mini-2024-09-12",
"claude-3-5-sonnet-20241022",
"claude-3-5-haiku-20241022"
])
```
You can evaluate an agent by doing:
```python
from banditbench.agents.llm import LLMAgent
from banditbench.agents.guide import UCBGuide
env_to_agent_results = bench.evaluate([
LLMAgent.build(), # Raw History Context Layer
LLMAgent.build(summary=True), # Summary Context Layer
LLMAgent.build(summary=True, guide=UCBGuide(env))  # Summary + UCB Guide Context Layer; `env` is a bandit environment constructed earlier
])
```
Cost estimation is performed for a **single** agent with raw history (the longest context). If you evaluate multiple agents,
multiply this cost by the number of agents (e.g., the three agents above on HardCore with gemini-1.5-flash would cost roughly 3 × $14.91 ≈ $45).
| Model | Core | HardCore | HardCore+ | Full | MovieBench |
|---------------------------|----------|--------------|---------------|-----------|----------------|
| **gemini-1.5-flash** | **$31.05** | **$14.91** | **$39.18** | **$83.44** | **$31.05** |
| gpt-4o-mini-2024-07-18 | $62.10 | $29.83 | $78.36 | $166.88 | $62.10 |
| claude-3-5-haiku-20241022 | $414.33 | $198.97 | $522.64 | $1113.18 | $414.33 |
| **gemini-1.5-pro** | **$517.54** | **$248.55** | **$652.98** | **$1390.69** | **$517.54** |
| gpt-4o-2024-11-20 | $1035.07 | $497.11 | $1305.96 | $2781.38 | $1035.07 |
| o1-mini-2024-09-12 | $1242.09 | $596.53 | $1567.16 | $3337.66 | $1242.09 |
| claude-3-5-sonnet-20241022| $1243.00 | $596.91 | $1567.91 | $3339.53 | $1243.00 |
| o1-2024-12-17 | $6210.45 | $2982.64 | $7835.79 | $16688.31 | $6210.45 |
## 🌍 Environments & 🤖 Agents
Here is a list of agents supported by EVOLvE.
For the multi-armed bandit scenario:
| Agent Name | Code | Interaction History | Algorithm Guide |
|------------|------|---------------------|-----------------|
| UCB | `UCBAgent(env)` | `False` | `NA` |
| Greedy | `GreedyAgent(env)` | `False` | `NA` |
| Thompson Sampling | `ThompsonSamplingAgent(env)` | `False` | `NA` |
| LLM with Raw History | `LLMAgent.build(env)` | `False` | `False` |
| LLM with Summary | `LLMAgent.build(env, summary=True)` | `True` | `False` |
| LLM with UCB Guide | `LLMAgent.build(env, summary=True, guide=UCBGuide(env))` | `True` | `True` |
For the contextual bandit scenario:
| Agent Name | Code | Interaction History | Algorithm Guide |
|------------|------|---------------------|-----------------|
| LinUCB | `LinUCBAgent(env)` | `False` | `NA` |
| LLM with Raw History | `LLMAgent.build(env)` | `False` | `False` |
| LLM with UCB Guide | `LLMAgent.build(env, guide=LinUCBGuide(env))` | `True` | `True` |
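As a rough sketch of wiring these contextual agents together (assuming `LinUCBAgent` is importable from `banditbench.agents.classics` alongside `UCBAgent`, and `LinUCBGuide` from `banditbench.agents.guide` alongside `UCBGuide` — both module paths are assumptions):
```python
from banditbench.tasks.contextual import MovieLens, MovieLensVerbal
from banditbench.agents.classics import LinUCBAgent  # assumed module path
from banditbench.agents.guide import LinUCBGuide     # assumed module path
from banditbench.agents.llm import LLMAgent

env = MovieLens('100k-ratings', num_arms=10, horizon=200)
verbal_env = MovieLensVerbal(env)

linucb_agent = LinUCBAgent(env)  # classic contextual baseline
# LLM agent with raw history plus LinUCB guidance, as in the table above;
# passing the verbal wrapper follows the Quick Start convention
llm_agent = LLMAgent.build(verbal_env, guide=LinUCBGuide(env))
```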
Here is a list of environments supported by EVOLvE:
**Multi-Armed Bandit Scenario**
| Environment Name | Code | Description |
|------------|------|-----------------|
| Bernoulli Bandit | `BernoulliBandit(n_arms, horizon, arm_params)` | Arm parameter is Bernoulli p|
| Gaussian Bandit | `GaussianBandit(n_arms, horizon, arm_params)` | Arm parameter is a tuple of (mean, variance)|
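For instance, a two-armed Gaussian bandit could be constructed as follows (the `(mean, variance)` values are illustrative, not taken from the paper):
```python
from banditbench.tasks.mab import GaussianBandit

# each arm is parameterized by a (mean, variance) tuple
gaussian_bandit = GaussianBandit(2, horizon=100, arm_params=[(0.5, 0.1), (0.2, 0.1)])
```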
For LLM agents, we provide a `VerbalMultiArmedBandit` environment that converts a core bandit into a verbal bandit under a chosen scenario.
| Scenario Name | Code | Action Names |
|------------|------|-----------------|
| Button Pushing | `ButtonPushing` | Action names are colored buttons like "Red", "Blue", "Green", etc.|
| Online Ads | `OnlineAds` | Action names are online ads like "Ad A", "Ad B", "Ad C", etc.|
| Video Watching | `VideoWatching` | Action names are videos like "Video A", "Video B", "Video C", etc.|
| Clothes Shopping | `ClothesShopping` | Action names are clothing items like "Velvet Vogue Jacket", "Silk Serenity Dress", etc.|
They can be combined as follows:
```python
from banditbench.tasks.mab import BernoulliBandit, VerbalMultiArmedBandit
core_bandit = BernoulliBandit(2, 10, [0.5, 0.2], 123)
verbal_bandit = VerbalMultiArmedBandit(core_bandit, "VideoWatching")
```
**Contextual Bandit Scenario**
| Environment Name | Code | Description |
|------------|------|-----------------|
| MovieLens | `MovieLens(task_name, num_arms, horizon)` | `task_name` loads a specific MovieLens dataset |
| MovieLensVerbal | `MovieLensVerbal(env)` | Verbal wrapper analogous to `VerbalMultiArmedBandit`; the scenario is fixed to "MovieLens" |
```python
from banditbench.tasks.contextual import MovieLens, MovieLensVerbal
env = MovieLens('100k-ratings', num_arms=10, horizon=200, rank_k=5, mode='train',
save_data_dir='./tensorflow_datasets/')
verbal_env = MovieLensVerbal(env)
```
To construct the environments used in the paper:
```python
from banditbench.tasks.mab import create_small_gap_bernoulli_bandit, create_large_gap_bernoulli_bandit
from banditbench.tasks.mab import create_high_var_gaussian_bandit, create_low_var_gaussian_bandit
easy_bern_bandit = create_small_gap_bernoulli_bandit(num_arms=5, horizon=1000)
```
## 🧩 Architecture
### Decision-Making Context
The framework represents the decision-making context in the following segments:
```text
{Task Description + Instruction} (provided by the environment)
{Few-shot demonstrations from historical interactions}
{Current history of interaction} (decided by the agent)
{Query prompt for the next decision} (provided by the environment)
```
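As a purely illustrative sketch of how these segments compose into a single prompt (the variable names and strings here are hypothetical, not part of the `banditbench` API):
```python
# Hypothetical illustration only; banditbench assembles this context internally.
task_description = "You are choosing among 5 videos to maximize watch-through rate. ..."
few_shot_demos = "Example trajectory 1: ...\nExample trajectory 2: ..."
interaction_history = "Step 1: chose Video A, reward 1\nStep 2: chose Video C, reward 0"
query_prompt = "Which video do you choose next? Answer with the video name only."

prompt = "\n\n".join([task_description, few_shot_demos, interaction_history, query_prompt])
```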
### LLM Agents
We use a Mixin-based design pattern to provide maximum flexibility and customization options for agent implementation. This allows you to:
- Combine different agent behaviors
- Customize prompt engineering strategies
- Implement new decision-making algorithms
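As a schematic illustration of this Mixin composition (the class names and methods here are hypothetical, not banditbench's actual hierarchy):
```python
# Hypothetical sketch of the Mixin pattern; not the actual banditbench classes.
class RawHistoryMixin:
    def build_context(self, history):
        # list every (action, reward) step verbatim
        return "\n".join(f"Step {i}: action={a}, reward={r}"
                         for i, (a, r) in enumerate(history, 1))

class SummaryMixin:
    def build_context(self, history):
        # summarize pull counts and mean reward per action instead
        stats = {}
        for a, r in history:
            n, total = stats.get(a, (0, 0.0))
            stats[a] = (n + 1, total + r)
        return "\n".join(f"{a}: pulled {n} times, mean reward {total / n:.2f}"
                         for a, (n, total) in stats.items())

class BaseAgent:
    def act(self, history):
        prompt = self.build_context(history)  # supplied by whichever mixin is composed in
        return prompt  # a real agent would send this prompt to an LLM and parse the chosen action

class SummaryAgent(SummaryMixin, BaseAgent):
    pass
```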
## 🔧 Customization
### Adding Custom Multi-Armed Bandit Scenarios
To create a custom bandit scenario:
1. Inherit from the base scenario class
2. Implement required methods
(Coming soon)
### Creating Custom Agents
(Coming soon)
## ⚠️ Known Issues
1. **TFDS Issues**: There is a known issue with TensorFlow Datasets when using multiple Jupyter notebooks sharing the same kernel. The kernel may crash when loading datasets, even with different save locations.
2. **TensorFlow Dependency**: The project currently requires TensorFlow due to TFDS usage. We plan to remove this dependency in future releases.
## 🎈 Citation
If you find EVOLvE useful in your research, please consider citing our paper:
```bibtex
@article{nie2024evolve,
title={EVOLvE: Evaluating and Optimizing LLMs For Exploration},
author={Nie, Allen and Su, Yi and Chang, Bo and Lee, Jonathan N and Chi, Ed H and Le, Quoc V and Chen, Minmin},
journal={arXiv preprint arXiv:2410.06238},
year={2024}
}
```
## 📄 License
This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.
## 🌻 Acknowledgement
The design of EVOLvE is inspired by the following projects:
- [DSPy](https://github.com/stanfordnlp/dspy)
- [Trace](https://github.com/microsoft/Trace)
- [Textgrad](https://github.com/zou-group/textgrad)
- [d3rlpy](https://d3rlpy.readthedocs.io/en/v2.6.0/)
- [Scala Mixin Trait](https://docs.scala-lang.org/tour/mixin-class-composition.html)
- [In-Context Reinforcement Learning Paper List](https://github.com/dunnolab/awesome-in-context-rl)
## 🤝 Contributing
We welcome contributions! Please start by opening an issue or a feature request.
<p align="center">
<img src="https://github.com/allenanie/EVOLvE/blob/main/assets/main.jpeg?raw=true" alt="EVOLvE Framework Overview"/>
</p>