| Field | Value |
|---|---|
| Name | gui-agents |
| Version | 0.1.3 |
| Summary | A library for creating general purpose GUI agents using multimodal LLMs. |
| Author | Simular AI |
| Requires Python | >=3.9 |
| Keywords | ai, llm, gui, agent, multimodal |
| Upload time | 2025-02-24 01:27:03 |
<h1 align="center">
<img src="images/agent_s.png" alt="Logo" style="vertical-align:middle" width="60"> Agent S:
<small>Using Computers Like a Human</small>
</h1>
<p align="center">
🌐 <a href="https://www.simular.ai/agent-s">[Website]</a>
📄 <a href="https://arxiv.org/abs/2410.08164">[Paper]</a>
🎥 <a href="https://www.youtube.com/watch?v=OBDE3Knte0g">[Video]</a>
🗨️ <a href="https://discord.gg/E2XfsK9fPV">[Discord]</a>
</p>
## 🥳 Updates
- [x] **2025/01/22**: The [Agent S paper](https://arxiv.org/abs/2410.08164) is accepted to ICLR 2025!
- [x] **2025/01/21**: Released v0.1.2 of [gui-agents](https://github.com/simular-ai/Agent-S) library, with support for Linux and Windows!
- [x] **2024/12/05**: Released v0.1.0 of [gui-agents](https://github.com/simular-ai/Agent-S) library, allowing you to use Agent-S for Mac, OSWorld, and WindowsAgentArena with ease!
- [x] **2024/10/10**: Released [Agent S paper](https://arxiv.org/abs/2410.08164) and codebase!
## Table of Contents
1. [💡 Introduction](#-introduction)
2. [🎯 Current Results](#-current-results)
3. [🛠️ Installation](#%EF%B8%8F-installation)
4. [🚀 Usage](#-usage)
5. [🙌 Contributors](#-contributors)
6. [💬 Citation](#-citation)
## 💡 Introduction
<p align="center">
<img src="./images/teaser.png" width="800">
</p>
Welcome to **Agent S**, an open-source framework that enables autonomous interaction with computers through an Agent-Computer Interface. Our mission is to build intelligent GUI agents that can learn from past experience and perform complex tasks autonomously on your computer.
Whether you're interested in AI, automation, or contributing to cutting-edge agent-based systems, we're excited to have you here!
## 🎯 Current Results
<p align="center">
<img src="./images/results.png" width="600">
<br>
Success rate (%) on the full OSWorld test set of 369 examples, using Image + Accessibility Tree input.
</p>
## 🛠️ Installation & Setup
> ❗**Warning**❗: If you are on a Linux machine, creating a `conda` environment will interfere with `pyatspi`. As of now, there's no clean solution for this issue. Proceed through the installation without using `conda` or any virtual environment.
Clone the repository:
```bash
git clone https://github.com/simular-ai/Agent-S.git
```
Install the gui-agents package:
```bash
pip install gui-agents
```
Set your LLM API keys and other environment variables. You can do this by adding the following line to your `.bashrc` (Linux) or `.zshrc` (macOS) file.
```bash
export OPENAI_API_KEY=<YOUR_API_KEY>
```
Alternatively, you can set the environment variable in your Python script:
```python
import os
os.environ["OPENAI_API_KEY"] = "<YOUR_API_KEY>"
```
We also support Azure OpenAI, Anthropic, and vLLM inference. For more information refer to [models.md](models.md).
### Setup Retrieval from Web using Perplexica
Agent S works best with web-knowledge retrieval. To enable this feature, you need to set up Perplexica:
1. Ensure Docker Desktop is installed and running on your system.
2. Navigate to the directory containing the project files.
```bash
cd Perplexica
git submodule update --init
```
3. Rename `sample.config.toml` to `config.toml`. For Docker setups, you only need to fill in the following fields:
- `OPENAI`: Your OpenAI API key. **You only need to fill this if you wish to use OpenAI's models**.
- `OLLAMA`: Your Ollama API URL. You should enter it as `http://host.docker.internal:PORT_NUMBER`. If you installed Ollama on port 11434, use `http://host.docker.internal:11434`. For other ports, adjust accordingly. **You need to fill this if you wish to use Ollama's models instead of OpenAI's**.
- `GROQ`: Your Groq API key. **You only need to fill this if you wish to use Groq's hosted models**.
- `ANTHROPIC`: Your Anthropic API key. **You only need to fill this if you wish to use Anthropic models**.
**Note**: You can change these after starting Perplexica from the settings dialog.
- `SIMILARITY_MEASURE`: The similarity measure to use (This is filled by default; you can leave it as is if you are unsure about it.)
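The fields above might be filled in roughly as follows. This is a hypothetical fragment: the section layout and the `"cosine"` default are assumptions, so consult `sample.config.toml` for the real structure.

```toml
# Hypothetical config.toml fragment -- layout is an assumption;
# check sample.config.toml for the actual sections.
OPENAI = "<YOUR_OPENAI_API_KEY>"
OLLAMA = "http://host.docker.internal:11434"
GROQ = ""        # leave empty if unused
ANTHROPIC = ""   # leave empty if unused
SIMILARITY_MEASURE = "cosine"  # pre-filled default; leave as is if unsure
```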
4. Ensure you are in the directory containing the `docker-compose.yaml` file and execute:
```bash
docker compose up -d
```
5. Agent S incorporates the Perplexica API to provide search-engine capability, which allows for a more convenient and responsive user experience. To tailor the API to your settings and specific requirements, modify the URL and the request message parameters in `agent_s/query_perplexica.py`. For a comprehensive guide on configuring the Perplexica API, please refer to the [Perplexica Search API Documentation](https://github.com/ItzCrazyKns/Perplexica/blob/master/docs/API/SEARCH.md).
For a more detailed setup and usage guide, please refer to the [Perplexica Repository](https://github.com/ItzCrazyKns/Perplexica.git).
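As an illustration, querying a locally running Perplexica instance looks roughly like the sketch below. The endpoint URL, port, and payload field names follow the Perplexica Search API docs but are assumptions here; adjust them to match your deployment and `agent_s/query_perplexica.py`.

```python
# Hypothetical Perplexica search client -- URL and field names are
# assumptions; see the Perplexica Search API docs for the real schema.
import json
import urllib.request

PERPLEXICA_URL = "http://localhost:3000/api/search"  # assumed default port


def build_search_payload(query: str) -> dict:
    """Build the JSON request body for a Perplexica web search."""
    return {
        "focusMode": "webSearch",
        "query": query,
    }


def query_perplexica(query: str) -> dict:
    """POST the query to Perplexica and return the parsed JSON response."""
    req = urllib.request.Request(
        PERPLEXICA_URL,
        data=json.dumps(build_search_payload(query)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```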
### Setup Paddle-OCR Server
Switch to a new terminal, where you will run Agent S, and set the `OCR_SERVER_ADDRESS` environment variable as shown below. For convenience, add the line directly to your `.bashrc` (Linux) or `.zshrc` (macOS) file.
```bash
export OCR_SERVER_ADDRESS=http://localhost:8000/ocr/
```
Run `ocr_server.py` to serve OCR-based bounding boxes:
```bash
cd Agent-S
python gui_agents/utils/ocr_server.py
```
You can change the server address by editing [agent_s/utils/ocr_server.py](agent_s/utils/ocr_server.py).
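A client request to the OCR server might be shaped as below. This is a sketch under assumptions: the `img_bytes` field name and base64 encoding are hypothetical, so check `ocr_server.py` for the actual request schema.

```python
# Hypothetical client for the local Paddle-OCR server -- the request
# field name "img_bytes" is an assumption; see ocr_server.py.
import base64
import json
import os
import urllib.request


def build_ocr_request(screenshot_png: bytes) -> dict:
    """Base64-encode a PNG screenshot for transport as JSON."""
    return {"img_bytes": base64.b64encode(screenshot_png).decode("ascii")}


def ocr_boxes(screenshot_png: bytes) -> dict:
    """POST the screenshot to the OCR server and return its JSON reply."""
    url = os.environ.get("OCR_SERVER_ADDRESS", "http://localhost:8000/ocr/")
    req = urllib.request.Request(
        url,
        data=json.dumps(build_ocr_request(screenshot_png)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```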
> ❗**Warning**❗: The agent will directly run Python code to control your computer. Please use with care.
## 🚀 Usage
### CLI
Run agent_s on your computer using:
```bash
agent_s --model gpt-4o
```
This will show a user query prompt where you can enter your query and interact with Agent S. You can use any model from the list of supported models in [models.md](models.md).
### `gui_agents` SDK
To deploy Agent S on macOS, Windows, or Linux:
```python
import io
import platform

import pyautogui

from gui_agents.core.AgentS import GraphSearchAgent

if platform.system() == "Darwin":
    from gui_agents.aci.MacOSACI import MacOSACI, UIElement
    grounding_agent = MacOSACI()
elif platform.system() == "Windows":
    from gui_agents.aci.WindowsOSACI import WindowsACI, UIElement
    grounding_agent = WindowsACI()
elif platform.system() == "Linux":
    from gui_agents.aci.LinuxOSACI import LinuxACI, UIElement
    grounding_agent = LinuxACI()
else:
    raise ValueError("Unsupported platform")

engine_params = {
    "engine_type": "openai",
    "model": "gpt-4o",
}

agent = GraphSearchAgent(
    engine_params,
    grounding_agent,
    platform="ubuntu",  # or "macos", "windows"
    action_space="pyautogui",
    observation_type="mixed",
    search_engine="Perplexica",
)

# Capture a screenshot as PNG bytes.
screenshot = pyautogui.screenshot()
buffered = io.BytesIO()
screenshot.save(buffered, format="PNG")
screenshot_bytes = buffered.getvalue()

# Capture the accessibility tree.
acc_tree = UIElement.systemWideElement()

obs = {
    "screenshot": screenshot_bytes,
    "accessibility_tree": acc_tree,
}

instruction = "Close VS Code"
info, action = agent.predict(instruction=instruction, observation=obs)

exec(action[0])
```
Refer to `cli_app.py` for more details on how the inference loop works.
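The single-step example above generalizes to an observe-predict-execute loop. The sketch below illustrates the idea; the `"DONE"` stop token, the `get_observation` callback, and `max_steps` are assumptions for illustration, so see `cli_app.py` for the real loop.

```python
# Hypothetical inference loop -- the "DONE" stop signal and callback
# shapes are assumptions; cli_app.py implements the actual logic.
def run_agent_loop(agent, get_observation, instruction, max_steps=15):
    """Run the agent until it signals completion or max_steps is reached."""
    trajectory = []
    for _ in range(max_steps):
        obs = get_observation()
        info, actions = agent.predict(instruction=instruction, observation=obs)
        trajectory.append((info, actions))
        if actions and actions[0].strip() == "DONE":  # assumed stop signal
            break
        for code in actions:
            exec(code)  # the agent emits pyautogui code as strings
    return trajectory
```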
### OSWorld
To deploy Agent S in OSWorld, follow the [OSWorld Deployment instructions](OSWorld.md).
### WindowsAgentArena
To deploy Agent S in WindowsAgentArena, follow the [WindowsAgentArena Deployment instructions](WindowsAgentArena.md).
## 🙌 Contributors
We're grateful to all the [amazing people](https://github.com/simular-ai/Agent-S/graphs/contributors) who have contributed to this project. Thank you! 🙏
## 💬 Citation
```
@misc{agashe2024agentsopenagentic,
  title={Agent S: An Open Agentic Framework that Uses Computers Like a Human},
  author={Saaket Agashe and Jiuzhou Han and Shuyu Gan and Jiachen Yang and Ang Li and Xin Eric Wang},
  year={2024},
  eprint={2410.08164},
  archivePrefix={arXiv},
  primaryClass={cs.AI},
  url={https://arxiv.org/abs/2410.08164},
}
```
Raw data
{
"_id": null,
"home_page": null,
"name": "gui-agents",
"maintainer": null,
"docs_url": null,
"requires_python": ">=3.9",
"maintainer_email": null,
"keywords": "ai, llm, gui, agent, multimodal",
"author": "Simular AI",
"author_email": "eric@simular.ai",
"download_url": "https://files.pythonhosted.org/packages/2e/d0/19c1543a48da39a4385e41caab0fefce111d5168ad3f785fcecd40150810/gui_agents-0.1.3.tar.gz",
"platform": null,
"description": "<h1 align=\"center\">\n <img src=\"images/agent_s.png\" alt=\"Logo\" style=\"vertical-align:middle\" width=\"60\"> Agent S:\n <small>Using Computers Like a Human</small>\n</h1>\n\n<p align=\"center\">\n \ud83c\udf10 <a href=\"https://www.simular.ai/agent-s\">[Website]</a>\n \ud83d\udcc4 <a href=\"https://arxiv.org/abs/2410.08164\">[Paper]</a>\n \ud83c\udfa5 <a href=\"https://www.youtube.com/watch?v=OBDE3Knte0g\">[Video]</a>\n \ud83d\udde8\ufe0f <a href=\"https://discord.gg/E2XfsK9fPV\">[Discord]</a>\n</p>\n\n## \ud83e\udd73 Updates\n- [x] **2025/01/22**: The [Agent S paper](https://arxiv.org/abs/2410.08164) is accepted to ICLR 2025!\n- [x] **2025/01/21**: Released v0.1.2 of [gui-agents](https://github.com/simular-ai/Agent-S) library, with support for Linux and Windows!\n- [x] **2024/12/05**: Released v0.1.0 of [gui-agents](https://github.com/simular-ai/Agent-S) library, allowing you to use Agent-S for Mac, OSWorld, and WindowsAgentArena with ease!\n- [x] **2024/10/10**: Released [Agent S paper](https://arxiv.org/abs/2410.08164) and codebase!\n\n## Table of Contents\n\n1. [\ud83d\udca1 Introduction](#-introduction)\n2. [\ud83c\udfaf Current Results](#-current-results)\n3. [\ud83d\udee0\ufe0f Installation](#%EF%B8%8F-installation) \n4. [\ud83d\ude80 Usage](#-usage)\n5. [\ud83d\ude4c Contributors](#-contributors)\n6. [\ud83d\udcac Citation](#-citation)\n\n## \ud83d\udca1 Introduction\n\n<p align=\"center\">\n <img src=\"./images/teaser.png\" width=\"800\">\n</p>\n\nWelcome to **Agent S**, an open-source framework designed to enable autonomous interaction with computers through Agent-Computer Interface. Our mission is to build intelligent GUI agents that can learn from past experiences and perform complex tasks autonomously on your computer. 
\n\nWhether you're interested in AI, automation, or contributing to cutting-edge agent-based systems, we're excited to have you here!\n\n## \ud83c\udfaf Current Results\n\n<p align=\"center\">\n <img src=\"./images/results.png\" width=\"600\">\n <br>\n Results of Successful Rate (%) on the OSWorld full test set of all 369 test examples using Image + Accessibility Tree input.\n</p>\n\n\n## \ud83d\udee0\ufe0f Installation & Setup\n\n> \u2757**Warning**\u2757: If you are on a Linux machine, creating a `conda` environment will interfere with `pyatspi`. As of now, there's no clean solution for this issue. Proceed through the installation without using `conda` or any virtual environment.\n\nClone the repository:\n```\ngit clone https://github.com/simular-ai/Agent-S.git\n```\n\nInstall the gui-agents package:\n```\npip install gui-agents\n```\n\nSet your LLM API Keys and other environment variables. You can do this by adding the following line to your .bashrc (Linux), or .zshrc (MacOS) file. \n\n```\nexport OPENAI_API_KEY=<YOUR_API_KEY>\n```\n\nAlternatively, you can set the environment variable in your Python script:\n\n```\nimport os\nos.environ[\"OPENAI_API_KEY\"] = \"<YOUR_API_KEY>\"\n```\n\nWe also support Azure OpenAI, Anthropic, and vLLM inference. For more information refer to [models.md](models.md).\n\n### Setup Retrieval from Web using Perplexica\nAgent S works best with web-knowledge retrieval. To enable this feature, you need to setup Perplexica: \n\n1. Ensure Docker Desktop is installed and running on your system.\n\n2. Navigate to the directory containing the project files.\n\n ```bash\n cd Perplexica\n git submodule update --init\n ```\n\n3. Rename the `sample.config.toml` file to `config.toml`. For Docker setups, you need only fill in the following fields:\n\n - `OPENAI`: Your OpenAI API key. **You only need to fill this if you wish to use OpenAI's models**.\n - `OLLAMA`: Your Ollama API URL. 
You should enter it as `http://host.docker.internal:PORT_NUMBER`. If you installed Ollama on port 11434, use `http://host.docker.internal:11434`. For other ports, adjust accordingly. **You need to fill this if you wish to use Ollama's models instead of OpenAI's**.\n - `GROQ`: Your Groq API key. **You only need to fill this if you wish to use Groq's hosted models**.\n - `ANTHROPIC`: Your Anthropic API key. **You only need to fill this if you wish to use Anthropic models**.\n\n **Note**: You can change these after starting Perplexica from the settings dialog.\n\n - `SIMILARITY_MEASURE`: The similarity measure to use (This is filled by default; you can leave it as is if you are unsure about it.)\n\n4. Ensure you are in the directory containing the `docker-compose.yaml` file and execute:\n\n ```bash\n docker compose up -d\n ```\n\n5. Our implementation of Agent S incorporates the Perplexica API to integrate a search engine capability, which allows for a more convenient and responsive user experience. If you want to tailor the API to your settings and specific requirements, you may modify the URL and the message of request parameters in `agent_s/query_perplexica.py`. For a comprehensive guide on configuring the Perplexica API, please refer to [Perplexica Search API Documentation](https://github.com/ItzCrazyKns/Perplexica/blob/master/docs/API/SEARCH.md)\n\nFor a more detailed setup and usage guide, please refer to the [Perplexica Repository](https://github.com/ItzCrazyKns/Perplexica.git).\n\n### Setup Paddle-OCR Server\n\nSwitch to a new terminal where you will run Agent S. Set the OCR_SERVER_ADDRESS environment variable as shown below. 
For a better experience, add the following line directly to your .bashrc (Linux), or .zshrc (MacOS) file.\n\n```\nexport OCR_SERVER_ADDRESS=http://localhost:8000/ocr/\n```\n\nRun the ocr_server.py file code to use OCR-based bounding boxes.\n\n```\ncd Agent-S\npython gui_agents/utils/ocr_server.py\n```\n\nYou can change the server address by editing the address in [agent_s/utils/ocr_server.py](agent_s/utils/ocr_server.py) file.\n\n\n> \u2757**Warning**\u2757: The agent will directly run python code to control your computer. Please use with care.\n\n## \ud83d\ude80 Usage\n\n### CLI\n\nRun agent_s on your computer using: \n```\nagent_s --model gpt-4o\n```\nThis will show a user query prompt where you can enter your query and interact with Agent S. You can use any model from the list of supported models in [models.md](models.md).\n\n### `gui_agents` SDK\n\nTo deploy Agent S on MacOS or Windows:\n\n```\nimport pyautogui\nimport io\nfrom gui_agents.core.AgentS import GraphSearchAgent\nimport platform\n\nif platform.system() == \"Darwin\":\n from gui_agents.aci.MacOSACI import MacOSACI, UIElement\n grounding_agent = MacOSACI()\nelif platform.system() == \"Windows\":\n from gui_agents.aci.WindowsOSACI import WindowsACI, UIElement\n grounding_agent = WindowsACI()\nelif platform.system() == \"Linux\":\n from gui_agents.aci.LinuxOSACI import LinuxACI, UIElement\n grounding_agent = LinuxACI()\nelse:\n raise ValueError(\"Unsupported platform\")\n\nengine_params = {\n \"engine_type\": \"openai\",\n \"model\": \"gpt-4o\",\n}\n\nagent = GraphSearchAgent(\n engine_params,\n grounding_agent,\n platform=\"ubuntu\", # \"macos\", \"windows\"\n action_space=\"pyautogui\",\n observation_type=\"mixed\",\n search_engine=\"Perplexica\"\n)\n\n# Get screenshot.\nscreenshot = pyautogui.screenshot()\nbuffered = io.BytesIO() \nscreenshot.save(buffered, format=\"PNG\")\nscreenshot_bytes = buffered.getvalue()\n\n# Get accessibility tree.\nacc_tree = UIElement.systemWideElement()\n\nobs = {\n 
\"screenshot\": screenshot_bytes,\n \"accessibility_tree\": acc_tree,\n}\n\ninstruction = \"Close VS Code\"\ninfo, action = agent.predict(instruction=instruction, observation=obs)\n\nexec(action[0])\n```\n\nRefer to `cli_app.py` for more details on how the inference loop works.\n\n### OSWorld\n\nTo deploy Agent S in OSWorld, follow the [OSWorld Deployment instructions](OSWorld.md).\n\n### WindowsAgentArena\n\nTo deploy Agent S in WindowsAgentArena, follow the [WindowsAgentArena Deployment instructions](WindowsAgentArena.md).\n\n## \ud83d\ude4c Contributors\n\nWe\u2019re grateful to all the [amazing people](https://github.com/simular-ai/Agent-S/graphs/contributors) who have contributed to this project. Thank you! \ud83d\ude4f \n\n## \ud83d\udcac Citation\n```\n@misc{agashe2024agentsopenagentic,\n title={Agent S: An Open Agentic Framework that Uses Computers Like a Human}, \n author={Saaket Agashe and Jiuzhou Han and Shuyu Gan and Jiachen Yang and Ang Li and Xin Eric Wang},\n year={2024},\n eprint={2410.08164},\n archivePrefix={arXiv},\n primaryClass={cs.AI},\n url={https://arxiv.org/abs/2410.08164}, \n}\n```\n\n",
"bugtrack_url": null,
"license": null,
"summary": "A library for creating general purpose GUI agents using multimodal LLMs.",
"version": "0.1.3",
"project_urls": {
"Bug Reports": "https://github.com/simular-ai/Agent-S/issues",
"Source": "https://github.com/simular-ai/Agent-S"
},
"split_keywords": [
"ai",
" llm",
" gui",
" agent",
" multimodal"
],
"urls": [
{
"comment_text": null,
"digests": {
"blake2b_256": "3bc7dd24cc880d80873939d37ecf8b7c259435c0c892df98f46483a968850610",
"md5": "e7a15636032a9b25e79efdb127b08867",
"sha256": "21cc565e83a222acbdb6d6438a8175daca9525b51e92ea1512378e59885fef41"
},
"downloads": -1,
"filename": "gui_agents-0.1.3-py3-none-any.whl",
"has_sig": false,
"md5_digest": "e7a15636032a9b25e79efdb127b08867",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.9",
"size": 838370,
"upload_time": "2025-02-24T01:27:01",
"upload_time_iso_8601": "2025-02-24T01:27:01.696941Z",
"url": "https://files.pythonhosted.org/packages/3b/c7/dd24cc880d80873939d37ecf8b7c259435c0c892df98f46483a968850610/gui_agents-0.1.3-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": null,
"digests": {
"blake2b_256": "2ed019c1543a48da39a4385e41caab0fefce111d5168ad3f785fcecd40150810",
"md5": "2ba198b38220319cc5fe3b3bf72f053d",
"sha256": "785594e61215dadbfa77bf8e5b27033b335e4966bd5f5e564871ba4071d06b1f"
},
"downloads": -1,
"filename": "gui_agents-0.1.3.tar.gz",
"has_sig": false,
"md5_digest": "2ba198b38220319cc5fe3b3bf72f053d",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.9",
"size": 819267,
"upload_time": "2025-02-24T01:27:03",
"upload_time_iso_8601": "2025-02-24T01:27:03.503866Z",
"url": "https://files.pythonhosted.org/packages/2e/d0/19c1543a48da39a4385e41caab0fefce111d5168ad3f785fcecd40150810/gui_agents-0.1.3.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2025-02-24 01:27:03",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "simular-ai",
"github_project": "Agent-S",
"travis_ci": false,
"coveralls": false,
"github_actions": true,
"lcname": "gui-agents"
}