| Field | Value |
|---|---|
| Name | gui-agents |
| Version | 0.1.3 |
| Summary | A library for creating general purpose GUI agents using multimodal LLMs. |
| Author | Simular AI |
| Requires Python | >=3.9 |
| Keywords | ai, llm, gui, agent, multimodal |
| Upload time | 2025-02-24 01:27:03 |
<h1 align="center">
<img src="images/agent_s.png" alt="Logo" style="vertical-align:middle" width="60"> Agent S:
<small>Using Computers Like a Human</small>
</h1>
<p align="center">
🌐 <a href="https://www.simular.ai/agent-s">[Website]</a>
📄 <a href="https://arxiv.org/abs/2410.08164">[Paper]</a>
🎥 <a href="https://www.youtube.com/watch?v=OBDE3Knte0g">[Video]</a>
🗨️ <a href="https://discord.gg/E2XfsK9fPV">[Discord]</a>
</p>
## 🥳 Updates
- [x] **2025/01/22**: The [Agent S paper](https://arxiv.org/abs/2410.08164) is accepted to ICLR 2025!
- [x] **2025/01/21**: Released v0.1.2 of [gui-agents](https://github.com/simular-ai/Agent-S) library, with support for Linux and Windows!
- [x] **2024/12/05**: Released v0.1.0 of [gui-agents](https://github.com/simular-ai/Agent-S) library, allowing you to use Agent-S for Mac, OSWorld, and WindowsAgentArena with ease!
- [x] **2024/10/10**: Released [Agent S paper](https://arxiv.org/abs/2410.08164) and codebase!
## Table of Contents
1. [💡 Introduction](#-introduction)
2. [🎯 Current Results](#-current-results)
3. [🛠️ Installation](#%EF%B8%8F-installation)
4. [🚀 Usage](#-usage)
5. [🙌 Contributors](#-contributors)
6. [💬 Citation](#-citation)
## 💡 Introduction
<p align="center">
<img src="./images/teaser.png" width="800">
</p>
Welcome to **Agent S**, an open-source framework that enables autonomous interaction with computers through an Agent-Computer Interface. Our mission is to build intelligent GUI agents that can learn from past experience and perform complex tasks autonomously on your computer.
Whether you're interested in AI, automation, or contributing to cutting-edge agent-based systems, we're excited to have you here!
## 🎯 Current Results
<p align="center">
<img src="./images/results.png" width="600">
<br>
Success rate (%) on the full OSWorld test set of 369 examples, using Image + Accessibility Tree input.
</p>
## 🛠️ Installation & Setup
> ❗**Warning**❗: If you are on a Linux machine, creating a `conda` environment will interfere with `pyatspi`. As of now, there's no clean solution for this issue. Proceed through the installation without using `conda` or any virtual environment.
Clone the repository:
```bash
git clone https://github.com/simular-ai/Agent-S.git
```
Install the gui-agents package:
```bash
pip install gui-agents
```
Set your LLM API keys and other environment variables. You can do this by adding the following line to your `.bashrc` (Linux) or `.zshrc` (macOS) file.
```bash
export OPENAI_API_KEY=<YOUR_API_KEY>
```
Alternatively, you can set the environment variable in your Python script:
```python
import os
os.environ["OPENAI_API_KEY"] = "<YOUR_API_KEY>"
```
We also support Azure OpenAI, Anthropic, and vLLM inference. For more information refer to [models.md](models.md).
### Setup Retrieval from Web using Perplexica
Agent S works best with web-knowledge retrieval. To enable this feature, you need to set up Perplexica:
1. Ensure Docker Desktop is installed and running on your system.
2. Navigate to the directory containing the project files.
```bash
cd Perplexica
git submodule update --init
```
3. Rename `sample.config.toml` to `config.toml`. For Docker setups, you only need to fill in the following fields:
- `OPENAI`: Your OpenAI API key. **You only need to fill this if you wish to use OpenAI's models**.
- `OLLAMA`: Your Ollama API URL. You should enter it as `http://host.docker.internal:PORT_NUMBER`. If you installed Ollama on port 11434, use `http://host.docker.internal:11434`. For other ports, adjust accordingly. **You need to fill this if you wish to use Ollama's models instead of OpenAI's**.
- `GROQ`: Your Groq API key. **You only need to fill this if you wish to use Groq's hosted models**.
- `ANTHROPIC`: Your Anthropic API key. **You only need to fill this if you wish to use Anthropic models**.
**Note**: You can change these after starting Perplexica from the settings dialog.
- `SIMILARITY_MEASURE`: The similarity measure to use (This is filled by default; you can leave it as is if you are unsure about it.)
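The fields above might be filled in roughly as follows. This is a hypothetical fragment: the section layout and the `"cosine"` default are assumptions, so consult `sample.config.toml` for the real structure.

```toml
# Hypothetical config.toml fragment -- layout is an assumption;
# check sample.config.toml for the actual sections.
OPENAI = "<YOUR_OPENAI_API_KEY>"
OLLAMA = "http://host.docker.internal:11434"
GROQ = ""        # leave empty if unused
ANTHROPIC = ""   # leave empty if unused
SIMILARITY_MEASURE = "cosine"  # pre-filled default; leave as is if unsure
```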
4. Ensure you are in the directory containing the `docker-compose.yaml` file and execute:
```bash
docker compose up -d
```
5. Agent S incorporates the Perplexica API to provide search-engine capability, which allows for a more convenient and responsive user experience. To tailor the API to your settings and specific requirements, modify the URL and the request message parameters in `agent_s/query_perplexica.py`. For a comprehensive guide on configuring the Perplexica API, please refer to the [Perplexica Search API Documentation](https://github.com/ItzCrazyKns/Perplexica/blob/master/docs/API/SEARCH.md).
For a more detailed setup and usage guide, please refer to the [Perplexica Repository](https://github.com/ItzCrazyKns/Perplexica.git).
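As an illustration, querying a locally running Perplexica instance looks roughly like the sketch below. The endpoint URL, port, and payload field names follow the Perplexica Search API docs but are assumptions here; adjust them to match your deployment and `agent_s/query_perplexica.py`.

```python
# Hypothetical Perplexica search client -- URL and field names are
# assumptions; see the Perplexica Search API docs for the real schema.
import json
import urllib.request

PERPLEXICA_URL = "http://localhost:3000/api/search"  # assumed default port


def build_search_payload(query: str) -> dict:
    """Build the JSON request body for a Perplexica web search."""
    return {
        "focusMode": "webSearch",
        "query": query,
    }


def query_perplexica(query: str) -> dict:
    """POST the query to Perplexica and return the parsed JSON response."""
    req = urllib.request.Request(
        PERPLEXICA_URL,
        data=json.dumps(build_search_payload(query)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```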
### Setup Paddle-OCR Server
Switch to a new terminal, where you will run Agent S, and set the `OCR_SERVER_ADDRESS` environment variable as shown below. For convenience, add the line directly to your `.bashrc` (Linux) or `.zshrc` (macOS) file.
```bash
export OCR_SERVER_ADDRESS=http://localhost:8000/ocr/
```
Run `ocr_server.py` to serve OCR-based bounding boxes:
```bash
cd Agent-S
python gui_agents/utils/ocr_server.py
```
You can change the server address by editing [agent_s/utils/ocr_server.py](agent_s/utils/ocr_server.py).
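A client request to the OCR server might be shaped as below. This is a sketch under assumptions: the `img_bytes` field name and base64 encoding are hypothetical, so check `ocr_server.py` for the actual request schema.

```python
# Hypothetical client for the local Paddle-OCR server -- the request
# field name "img_bytes" is an assumption; see ocr_server.py.
import base64
import json
import os
import urllib.request


def build_ocr_request(screenshot_png: bytes) -> dict:
    """Base64-encode a PNG screenshot for transport as JSON."""
    return {"img_bytes": base64.b64encode(screenshot_png).decode("ascii")}


def ocr_boxes(screenshot_png: bytes) -> dict:
    """POST the screenshot to the OCR server and return its JSON reply."""
    url = os.environ.get("OCR_SERVER_ADDRESS", "http://localhost:8000/ocr/")
    req = urllib.request.Request(
        url,
        data=json.dumps(build_ocr_request(screenshot_png)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```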
> ❗**Warning**❗: The agent will directly run Python code to control your computer. Please use with care.
## 🚀 Usage
### CLI
Run agent_s on your computer using:
```bash
agent_s --model gpt-4o
```
This will show a user query prompt where you can enter your query and interact with Agent S. You can use any model from the list of supported models in [models.md](models.md).
### `gui_agents` SDK
To deploy Agent S on macOS, Windows, or Linux:
```python
import io
import platform

import pyautogui

from gui_agents.core.AgentS import GraphSearchAgent

if platform.system() == "Darwin":
    from gui_agents.aci.MacOSACI import MacOSACI, UIElement
    grounding_agent = MacOSACI()
elif platform.system() == "Windows":
    from gui_agents.aci.WindowsOSACI import WindowsACI, UIElement
    grounding_agent = WindowsACI()
elif platform.system() == "Linux":
    from gui_agents.aci.LinuxOSACI import LinuxACI, UIElement
    grounding_agent = LinuxACI()
else:
    raise ValueError("Unsupported platform")

engine_params = {
    "engine_type": "openai",
    "model": "gpt-4o",
}

agent = GraphSearchAgent(
    engine_params,
    grounding_agent,
    platform="ubuntu",  # or "macos", "windows"
    action_space="pyautogui",
    observation_type="mixed",
    search_engine="Perplexica",
)

# Capture a screenshot as PNG bytes.
screenshot = pyautogui.screenshot()
buffered = io.BytesIO()
screenshot.save(buffered, format="PNG")
screenshot_bytes = buffered.getvalue()

# Capture the accessibility tree.
acc_tree = UIElement.systemWideElement()

obs = {
    "screenshot": screenshot_bytes,
    "accessibility_tree": acc_tree,
}

instruction = "Close VS Code"
info, action = agent.predict(instruction=instruction, observation=obs)

exec(action[0])
```
Refer to `cli_app.py` for more details on how the inference loop works.
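The single-step example above generalizes to an observe-predict-execute loop. The sketch below illustrates the idea; the `"DONE"` stop token, the `get_observation` callback, and `max_steps` are assumptions for illustration, so see `cli_app.py` for the real loop.

```python
# Hypothetical inference loop -- the "DONE" stop signal and callback
# shapes are assumptions; cli_app.py implements the actual logic.
def run_agent_loop(agent, get_observation, instruction, max_steps=15):
    """Run the agent until it signals completion or max_steps is reached."""
    trajectory = []
    for _ in range(max_steps):
        obs = get_observation()
        info, actions = agent.predict(instruction=instruction, observation=obs)
        trajectory.append((info, actions))
        if actions and actions[0].strip() == "DONE":  # assumed stop signal
            break
        for code in actions:
            exec(code)  # the agent emits pyautogui code as strings
    return trajectory
```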
### OSWorld
To deploy Agent S in OSWorld, follow the [OSWorld Deployment instructions](OSWorld.md).
### WindowsAgentArena
To deploy Agent S in WindowsAgentArena, follow the [WindowsAgentArena Deployment instructions](WindowsAgentArena.md).
## 🙌 Contributors
We're grateful to all the [amazing people](https://github.com/simular-ai/Agent-S/graphs/contributors) who have contributed to this project. Thank you! 🙏
## 💬 Citation
```
@misc{agashe2024agentsopenagentic,
  title={Agent S: An Open Agentic Framework that Uses Computers Like a Human},
  author={Saaket Agashe and Jiuzhou Han and Shuyu Gan and Jiachen Yang and Ang Li and Xin Eric Wang},
  year={2024},
  eprint={2410.08164},
  archivePrefix={arXiv},
  primaryClass={cs.AI},
  url={https://arxiv.org/abs/2410.08164},
}
```
Raw data
{
"_id": null,
"home_page": null,
"name": "gui-agents",
"maintainer": null,
"docs_url": null,
"requires_python": ">=3.9",
"maintainer_email": null,
"keywords": "ai, llm, gui, agent, multimodal",
"author": "Simular AI",
"author_email": "eric@simular.ai",
"download_url": "https://files.pythonhosted.org/packages/2e/d0/19c1543a48da39a4385e41caab0fefce111d5168ad3f785fcecd40150810/gui_agents-0.1.3.tar.gz",
"platform": null,
"description": "<h1 align=\"center\">\n <img src=\"images/agent_s.png\" alt=\"Logo\" style=\"vertical-align:middle\" width=\"60\"> Agent S:\n <small>Using Computers Like a Human</small>\n</h1>\n\n<p align=\"center\">\n \ud83c\udf10 <a href=\"https://www.simular.ai/agent-s\">[Website]</a>\n \ud83d\udcc4 <a href=\"https://arxiv.org/abs/2410.08164\">[Paper]</a>\n \ud83c\udfa5 <a href=\"https://www.youtube.com/watch?v=OBDE3Knte0g\">[Video]</a>\n \ud83d\udde8\ufe0f <a href=\"https://discord.gg/E2XfsK9fPV\">[Discord]</a>\n</p>\n\n## \ud83e\udd73 Updates\n- [x] **2025/01/22**: The [Agent S paper](https://arxiv.org/abs/2410.08164) is accepted to ICLR 2025!\n- [x] **2025/01/21**: Released v0.1.2 of [gui-agents](https://github.com/simular-ai/Agent-S) library, with support for Linux and Windows!\n- [x] **2024/12/05**: Released v0.1.0 of [gui-agents](https://github.com/simular-ai/Agent-S) library, allowing you to use Agent-S for Mac, OSWorld, and WindowsAgentArena with ease!\n- [x] **2024/10/10**: Released [Agent S paper](https://arxiv.org/abs/2410.08164) and codebase!\n\n## Table of Contents\n\n1. [\ud83d\udca1 Introduction](#-introduction)\n2. [\ud83c\udfaf Current Results](#-current-results)\n3. [\ud83d\udee0\ufe0f Installation](#%EF%B8%8F-installation) \n4. [\ud83d\ude80 Usage](#-usage)\n5. [\ud83d\ude4c Contributors](#-contributors)\n6. [\ud83d\udcac Citation](#-citation)\n\n## \ud83d\udca1 Introduction\n\n<p align=\"center\">\n <img src=\"./images/teaser.png\" width=\"800\">\n</p>\n\nWelcome to **Agent S**, an open-source framework designed to enable autonomous interaction with computers through Agent-Computer Interface. Our mission is to build intelligent GUI agents that can learn from past experiences and perform complex tasks autonomously on your computer. 
\n\nWhether you're interested in AI, automation, or contributing to cutting-edge agent-based systems, we're excited to have you here!\n\n## \ud83c\udfaf Current Results\n\n<p align=\"center\">\n <img src=\"./images/results.png\" width=\"600\">\n <br>\n Results of Successful Rate (%) on the OSWorld full test set of all 369 test examples using Image + Accessibility Tree input.\n</p>\n\n\n## \ud83d\udee0\ufe0f Installation & Setup\n\n> \u2757**Warning**\u2757: If you are on a Linux machine, creating a `conda` environment will interfere with `pyatspi`. As of now, there's no clean solution for this issue. Proceed through the installation without using `conda` or any virtual environment.\n\nClone the repository:\n```\ngit clone https://github.com/simular-ai/Agent-S.git\n```\n\nInstall the gui-agents package:\n```\npip install gui-agents\n```\n\nSet your LLM API Keys and other environment variables. You can do this by adding the following line to your .bashrc (Linux), or .zshrc (MacOS) file. \n\n```\nexport OPENAI_API_KEY=<YOUR_API_KEY>\n```\n\nAlternatively, you can set the environment variable in your Python script:\n\n```\nimport os\nos.environ[\"OPENAI_API_KEY\"] = \"<YOUR_API_KEY>\"\n```\n\nWe also support Azure OpenAI, Anthropic, and vLLM inference. For more information refer to [models.md](models.md).\n\n### Setup Retrieval from Web using Perplexica\nAgent S works best with web-knowledge retrieval. To enable this feature, you need to setup Perplexica: \n\n1. Ensure Docker Desktop is installed and running on your system.\n\n2. Navigate to the directory containing the project files.\n\n ```bash\n cd Perplexica\n git submodule update --init\n ```\n\n3. Rename the `sample.config.toml` file to `config.toml`. For Docker setups, you need only fill in the following fields:\n\n - `OPENAI`: Your OpenAI API key. **You only need to fill this if you wish to use OpenAI's models**.\n - `OLLAMA`: Your Ollama API URL. 
You should enter it as `http://host.docker.internal:PORT_NUMBER`. If you installed Ollama on port 11434, use `http://host.docker.internal:11434`. For other ports, adjust accordingly. **You need to fill this if you wish to use Ollama's models instead of OpenAI's**.\n - `GROQ`: Your Groq API key. **You only need to fill this if you wish to use Groq's hosted models**.\n - `ANTHROPIC`: Your Anthropic API key. **You only need to fill this if you wish to use Anthropic models**.\n\n **Note**: You can change these after starting Perplexica from the settings dialog.\n\n - `SIMILARITY_MEASURE`: The similarity measure to use (This is filled by default; you can leave it as is if you are unsure about it.)\n\n4. Ensure you are in the directory containing the `docker-compose.yaml` file and execute:\n\n ```bash\n docker compose up -d\n ```\n\n5. Our implementation of Agent S incorporates the Perplexica API to integrate a search engine capability, which allows for a more convenient and responsive user experience. If you want to tailor the API to your settings and specific requirements, you may modify the URL and the message of request parameters in `agent_s/query_perplexica.py`. For a comprehensive guide on configuring the Perplexica API, please refer to [Perplexica Search API Documentation](https://github.com/ItzCrazyKns/Perplexica/blob/master/docs/API/SEARCH.md)\n\nFor a more detailed setup and usage guide, please refer to the [Perplexica Repository](https://github.com/ItzCrazyKns/Perplexica.git).\n\n### Setup Paddle-OCR Server\n\nSwitch to a new terminal where you will run Agent S. Set the OCR_SERVER_ADDRESS environment variable as shown below. 
For a better experience, add the following line directly to your .bashrc (Linux), or .zshrc (MacOS) file.\n\n```\nexport OCR_SERVER_ADDRESS=http://localhost:8000/ocr/\n```\n\nRun the ocr_server.py file code to use OCR-based bounding boxes.\n\n```\ncd Agent-S\npython gui_agents/utils/ocr_server.py\n```\n\nYou can change the server address by editing the address in [agent_s/utils/ocr_server.py](agent_s/utils/ocr_server.py) file.\n\n\n> \u2757**Warning**\u2757: The agent will directly run python code to control your computer. Please use with care.\n\n## \ud83d\ude80 Usage\n\n### CLI\n\nRun agent_s on your computer using: \n```\nagent_s --model gpt-4o\n```\nThis will show a user query prompt where you can enter your query and interact with Agent S. You can use any model from the list of supported models in [models.md](models.md).\n\n### `gui_agents` SDK\n\nTo deploy Agent S on MacOS or Windows:\n\n```\nimport pyautogui\nimport io\nfrom gui_agents.core.AgentS import GraphSearchAgent\nimport platform\n\nif platform.system() == \"Darwin\":\n from gui_agents.aci.MacOSACI import MacOSACI, UIElement\n grounding_agent = MacOSACI()\nelif platform.system() == \"Windows\":\n from gui_agents.aci.WindowsOSACI import WindowsACI, UIElement\n grounding_agent = WindowsACI()\nelif platform.system() == \"Linux\":\n from gui_agents.aci.LinuxOSACI import LinuxACI, UIElement\n grounding_agent = LinuxACI()\nelse:\n raise ValueError(\"Unsupported platform\")\n\nengine_params = {\n \"engine_type\": \"openai\",\n \"model\": \"gpt-4o\",\n}\n\nagent = GraphSearchAgent(\n engine_params,\n grounding_agent,\n platform=\"ubuntu\", # \"macos\", \"windows\"\n action_space=\"pyautogui\",\n observation_type=\"mixed\",\n search_engine=\"Perplexica\"\n)\n\n# Get screenshot.\nscreenshot = pyautogui.screenshot()\nbuffered = io.BytesIO() \nscreenshot.save(buffered, format=\"PNG\")\nscreenshot_bytes = buffered.getvalue()\n\n# Get accessibility tree.\nacc_tree = UIElement.systemWideElement()\n\nobs = {\n 
\"screenshot\": screenshot_bytes,\n \"accessibility_tree\": acc_tree,\n}\n\ninstruction = \"Close VS Code\"\ninfo, action = agent.predict(instruction=instruction, observation=obs)\n\nexec(action[0])\n```\n\nRefer to `cli_app.py` for more details on how the inference loop works.\n\n### OSWorld\n\nTo deploy Agent S in OSWorld, follow the [OSWorld Deployment instructions](OSWorld.md).\n\n### WindowsAgentArena\n\nTo deploy Agent S in WindowsAgentArena, follow the [WindowsAgentArena Deployment instructions](WindowsAgentArena.md).\n\n## \ud83d\ude4c Contributors\n\nWe\u2019re grateful to all the [amazing people](https://github.com/simular-ai/Agent-S/graphs/contributors) who have contributed to this project. Thank you! \ud83d\ude4f \n\n## \ud83d\udcac Citation\n```\n@misc{agashe2024agentsopenagentic,\n title={Agent S: An Open Agentic Framework that Uses Computers Like a Human}, \n author={Saaket Agashe and Jiuzhou Han and Shuyu Gan and Jiachen Yang and Ang Li and Xin Eric Wang},\n year={2024},\n eprint={2410.08164},\n archivePrefix={arXiv},\n primaryClass={cs.AI},\n url={https://arxiv.org/abs/2410.08164}, \n}\n```\n\n",
"bugtrack_url": null,
"license": null,
"summary": "A library for creating general purpose GUI agents using multimodal LLMs.",
"version": "0.1.3",
"project_urls": {
"Bug Reports": "https://github.com/simular-ai/Agent-S/issues",
"Source": "https://github.com/simular-ai/Agent-S"
},
"split_keywords": [
"ai",
" llm",
" gui",
" agent",
" multimodal"
],
"urls": [
{
"comment_text": null,
"digests": {
"blake2b_256": "3bc7dd24cc880d80873939d37ecf8b7c259435c0c892df98f46483a968850610",
"md5": "e7a15636032a9b25e79efdb127b08867",
"sha256": "21cc565e83a222acbdb6d6438a8175daca9525b51e92ea1512378e59885fef41"
},
"downloads": -1,
"filename": "gui_agents-0.1.3-py3-none-any.whl",
"has_sig": false,
"md5_digest": "e7a15636032a9b25e79efdb127b08867",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.9",
"size": 838370,
"upload_time": "2025-02-24T01:27:01",
"upload_time_iso_8601": "2025-02-24T01:27:01.696941Z",
"url": "https://files.pythonhosted.org/packages/3b/c7/dd24cc880d80873939d37ecf8b7c259435c0c892df98f46483a968850610/gui_agents-0.1.3-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": null,
"digests": {
"blake2b_256": "2ed019c1543a48da39a4385e41caab0fefce111d5168ad3f785fcecd40150810",
"md5": "2ba198b38220319cc5fe3b3bf72f053d",
"sha256": "785594e61215dadbfa77bf8e5b27033b335e4966bd5f5e564871ba4071d06b1f"
},
"downloads": -1,
"filename": "gui_agents-0.1.3.tar.gz",
"has_sig": false,
"md5_digest": "2ba198b38220319cc5fe3b3bf72f053d",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.9",
"size": 819267,
"upload_time": "2025-02-24T01:27:03",
"upload_time_iso_8601": "2025-02-24T01:27:03.503866Z",
"url": "https://files.pythonhosted.org/packages/2e/d0/19c1543a48da39a4385e41caab0fefce111d5168ad3f785fcecd40150810/gui_agents-0.1.3.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2025-02-24 01:27:03",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "simular-ai",
"github_project": "Agent-S",
"travis_ci": false,
"coveralls": false,
"github_actions": true,
"lcname": "gui-agents"
}