<div align="center">
# AgentRewardBench
| [**💾Code**](https://github.com/McGill-NLP/agent-reward-bench) | [**📄Paper**](https://arxiv.org/abs/2504.08942) | [**🌐Website**](https://agent-reward-bench.github.io) |
| :--: | :--: | :--: |
| [**🤗Dataset**](https://huggingface.co/datasets/McGill-NLP/agent-reward-bench) | [**💻Demo**](https://huggingface.co/spaces/McGill-NLP/agent-reward-bench-demo) | [**🏆Leaderboard**](https://huggingface.co/spaces/McGill-NLP/agent-reward-bench-leaderboard) |
<br>
**[AgentRewardBench: Evaluating Automatic Evaluations of Web Agent Trajectories](https://arxiv.org/abs/2504.08942)**
*[Xing Han Lù](https://xinghanlu.com/), [Amirhossein Kazemnejad*](https://kazemnejad.com/), <br>[Nicholas Meade](https://ncmeade.github.io/), [Arkil Patel](https://arkilpatel.github.io/), [Dongchan Shin](https://scholar.google.com/citations?user=QzZOkfIAAAAJ&hl=en), [Alejandra Zambrano](https://www.linkedin.com/in/alejandra-zambrano-a71092196/), <br>[Karolina Stańczak](https://kstanczak.github.io/), [Peter Shaw](https://www.ptshaw.com/), [Christopher J. Pal](https://sites.google.com/view/christopher-pal), [Siva Reddy](https://sivareddy.in/)*
*\*Core Contributor*
</div>

## Using the `agent-reward-bench` library
This library provides tools for evaluating the performance of web agents in various environments. It includes environments, agents, and evaluation metrics (judges).
## Installation
To install the library:
```bash
pip install agent-reward-bench
```
You can now import the library in your Python code:
```python
# Using agents and environments:
import agent_reward_bench.modeling as arbm
import agent_reward_bench.benchmarks as arbb
# Using the judge for evaluating agents:
import agent_reward_bench.judge as arbj
from agent_reward_bench.judge.existing import aer, nnetnav
from agent_reward_bench.judge.args import default_judge_args, judge_args
```
See `scripts/run_agent.py` and `scripts/run_judge.py` for examples of how to use the library to run an agent in an environment and to judge its trajectories.
## Loading dataset
You can use the `huggingface_hub` library to load the dataset. The dataset is available on Huggingface Hub at `McGill-NLP/agent-reward-bench`.
```python
from huggingface_hub import snapshot_download
# Download the dataset to ./trajectories/
snapshot_download(
    repo_id="McGill-NLP/agent-reward-bench",
    repo_type="dataset",
    local_dir="./trajectories/"
)
```
<details>
<summary>Click to see the folder structure</summary>

```
trajectories/
├── cleaned/
│ ├── assistantbench/
│ │ ├── GenericAgent-<LLM>/
│ │ │ ├── GenericAgent-<LLM>_on_<benchmark>.<split>/
│ │ │ │ ├── <benchmark>.<split>.0.json
│ │ │ │ ├── ...
│ │ │ ├── ...
│ │ ├── ...
│ ├── visualwebarena/
│ │ ├── ...
│ ├── webarena/
│ │ ├── ...
│ ├── workarena/
│ │ ├── ...
├── judgments/
│ ├── <benchmark>/
│ │ ├── GenericAgent-<LLM>/
│ │ │ ├── <judge>/
│ │ │ │ ├── <benchmark>.<split>.0.json
│ │ │ │ ├── ...
│ ├── ...
├── screenshots/
│ ├── <benchmark>/
│ │ ├── GenericAgent-<LLM>/
│ │ │ ├── <benchmark>.<split>.0/
│ │ │ │ ├── screenshot_step_0.png
│ │ │ │ ├── ...
│ │ │ ├── ...
│ │ ├── ...
│ ├── visualwebarena/
│ │ ├── ...
│ ├── ...
```
</details>
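To take a quick look at one of the downloaded trajectory files, you can pretty-print it with Python's built-in JSON tool. This is a minimal sketch; the exact path depends on the benchmark and agent you pick.
```bash
# List a few trajectory files, then pretty-print the first one
find trajectories/cleaned -name "*.json" | head -n 5
python -m json.tool "$(find trajectories/cleaned -name '*.json' | head -n 1)" | head -n 40
```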
## Running Judge
First, make sure that the cleaned trajectories are in `trajectories/cleaned`. You can do this by downloading the official ones from Huggingface Hub and placing them in the `trajectories/` folder, or see the instructions below on how to generate them.
To run the judge, use the following command:
```bash
python scripts/run_judge.py
```
This will generate the judge's outputs and save them to `trajectories/judgments` by default; this location can be changed with the `--base_save_dir` argument.
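For example, to write the judgments somewhere other than the default (the path below is only an illustration):
```bash
# Save judge outputs to a custom directory instead of the default trajectories/judgments
python scripts/run_judge.py --base_save_dir "my_runs/judgments/"
```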
## Evaluation
To evaluate a judge, run the following command:
```bash
python scripts/score_judgments.py --split test --judgments_base_dir "trajectories/judgments/" --results_save_dir "artifacts/"
```
If you need help, run:
```bash
python scripts/score_judgments.py --help
```
This will generate the evaluation results and save them to `artifacts/` by default, which can be changed with the `--results_save_dir` argument.
You can also use the `--split` argument to specify which split to evaluate (`dev` or `test`); the default is `test`.
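For example, to score judgments on the dev split (same flags as the command above, only the split changes):
```bash
# Score judgments on the dev split; results are still saved to artifacts/
python scripts/score_judgments.py --split dev --judgments_base_dir "trajectories/judgments/" --results_save_dir "artifacts/"
```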
### Submitting to leaderboard
[Open an issue to submit your results to the leaderboard](https://github.com/McGill-NLP/agent-reward-bench/issues/new?template=add-results-to-leaderboard.yml). We will review your results and add them to the leaderboard.
## Generating trajectories
If you are using the trajectories from Huggingface Hub, you can skip this step. However, if you want to generate your own trajectories, you can follow the instructions below. Note that you will also need to create your own annotations and save them in the same format as `agent_reward_bench/data/annotations.csv`.
### Setup
First, clone this repo and create a virtual environment:
```bash
git clone https://github.com/mcgill-nlp/agent-reward-bench.git
cd agent-reward-bench
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
playwright install
```
### Web Environments
To set up the environments, please see [gasse/webarena-setup](https://github.com/gasse/webarena-setup/) for WA and VWA, and [ServiceNow/WorkArena](https://github.com/ServiceNow/WorkArena/) for WorkArena and WorkArena++.
### Environment variables
You need to set the following environment variables to use the web environments.
```bash
# for workarena:
export SNOW_INSTANCE_URL="https://dev275972.service-now.com"
export SNOW_INSTANCE_UNAME="admin"
export SNOW_INSTANCE_PWD="<password>"
# for webarena:
export WA_HOMEPAGE="https://wa-homepage-${SUFFIX}.${WEBHOST}"
export WA_SHOPPING="https://wa-shopping-${SUFFIX}.${WEBHOST}/"
export WA_SHOPPING_ADMIN="https://wa-shopping-admin-${SUFFIX}.${WEBHOST}/admin"
export WA_REDDIT="https://wa-forum-${SUFFIX}.${WEBHOST}"
export WA_GITLAB="https://wa-gitlab-${SUFFIX}.${WEBHOST}"
export WA_WIKIPEDIA="https://wa-wikipedia-${SUFFIX}.${WEBHOST}/wikipedia_en_all_maxi_2022-05/A/User:The_other_Kiwix_guy/Landing"
export WA_MAP="https://wa-openstreetmap-${SUFFIX}.${WEBHOST}"
export WA_FULL_RESET="https://wa-reset-${SUFFIX}.${WEBHOST}"
# for visualwebarena:
export VWA_HOMEPAGE="https://vwa-homepage-${SUFFIX}.${WEBHOST}"
# ...
export VWA_FULL_RESET="https://vwa-reset-${SUFFIX}.${WEBHOST}"
export VWA_CLASSIFIEDS="https://vwa-classifieds-${SUFFIX}.${WEBHOST}"
export VWA_CLASSIFIEDS_RESET_TOKEN="4b61655535e7ed388f0d40a93600254c"
```
See `vars/set_envs.sh` for an example of how to set up the environment variables automatically.
You might want to set up various API keys for the different services. You can do this by adding the following to your `.bashrc` or `.bash_profile`:
```bash
export OPENAI_ORG_ID="your-openai-org-id"
# API keys
export OPENAI_API_KEY="your-openai-api-key"
export TOGETHER_API_KEY="your-together-api-key"
export VLLM_API_KEY="your-vllm-api-key"
export OPENROUTER_API_KEY="your-openrouter-api-key"
export VLLM_BASE_URL="https://vllm.your.domain.com/v1"
export TOGETHER_BASE_URL="https://api.together.xyz/v1"
export OPENROUTER_BASE_URL="https://openrouter.ai/api/v1"
```
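Before launching a run, you may want to check that the keys you rely on are actually set. This is a minimal sketch (not part of the repository's scripts); adjust the variable list to the providers you use.
```bash
# Warn about any API keys that are still unset (uses bash indirect expansion)
for var in OPENAI_API_KEY TOGETHER_API_KEY OPENROUTER_API_KEY VLLM_API_KEY; do
  if [ -z "${!var}" ]; then
    echo "Warning: $var is not set"
  fi
done
```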
### Running the agent
```bash
# For WA:
export SUFFIX="-v1" # change this to your setup
export WEBHOST="your.domain.com" # change this to your web host
source vars/set_envs.sh # set up the environment variables
# starting a new run
python run_agent.py --model "<name>" --benchmark "<benchmark>"
# e.g., for a gpt-4o agent on WA:
python run_agent.py --model "gpt-4o" --benchmark "webarena_100"
```
The accepted benchmarks and models can be listed with the following command:
```bash
python run_agent.py --help
```
### Processing trajectories
To process the trajectories, you can run:
```bash
python scripts/convert_trajectories_to_json.py
```
This will save the trajectories to `trajectories/processed` (make sure to set the `--base_save_dir` argument to the correct path). Then, you can further clean them (optional) by running:
```bash
python scripts/clean_processed_trajectories.py
```
This will save the cleaned trajectories to `trajectories/cleaned` (make sure to set the `--base_save_dir` argument to the correct path).
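Putting both steps together, a typical invocation might look like the following. This is a sketch that assumes `--base_save_dir` points at the output directories named above; double-check the paths against each script's `--help`.
```bash
# Convert raw trajectories to JSON, then clean them (paths follow the defaults described above)
python scripts/convert_trajectories_to_json.py --base_save_dir "trajectories/processed"
python scripts/clean_processed_trajectories.py --base_save_dir "trajectories/cleaned"
```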
## Contributing
If you are publishing a new version of this library, run:
```bash
rm -r dist
python3 setup.py sdist bdist_wheel
twine upload dist/*
```
Request the API token from the repo owner.
## Acknowledgements
The `webarena.csv` and `visualwebarena.csv` files stored in this repository are retrieved from the [codebase of the browsergym/agentlab ecosystem paper](https://github.com/ServiceNow/BrowserGym/tree/main/browsergym/experiments/src/browsergym/experiments/benchmark/metadata).
## Citation
If you use AgentRewardBench in your research, please cite the following paper:
```bibtex
@misc{lù2025agentrewardbenchevaluatingautomaticevaluations,
  title={AgentRewardBench: Evaluating Automatic Evaluations of Web Agent Trajectories},
  author={Xing Han Lù and Amirhossein Kazemnejad and Nicholas Meade and Arkil Patel and Dongchan Shin and Alejandra Zambrano and Karolina Stańczak and Peter Shaw and Christopher J. Pal and Siva Reddy},
  year={2025},
  eprint={2504.08942},
  archivePrefix={arXiv},
  primaryClass={cs.LG},
  url={https://arxiv.org/abs/2504.08942},
}
```