# webtask
LLM-powered web automation library with autonomous agents and natural language selectors.
---
## What it does
Three ways to use it:
- **High-level** - Give it a task, let it figure out the steps
- **Step-by-step** - Execute tasks one step at a time for debugging/control
- **Low-level** - Tell it exactly what to do with natural language selectors
Uses LLMs to understand pages, plan actions, and select elements. Built on Playwright for browser control.
---
## Quick look
**Setup:**
```python
from webtask import Webtask
from webtask.integrations.llm import GeminiLLM
# The snippets below use `await`, so run them inside an async function (e.g. via asyncio.run)
# Create Webtask manager (browser launches lazily)
wt = Webtask()
# Choose your LLM (Gemini or OpenAI)
llm = GeminiLLM.create(model="gemini-2.5-flash")
# Create agent
agent = await wt.create_agent(llm=llm)
```
**High-level autonomous:**
```python
# Agent figures out the steps
result = await agent.execute("search for cats and click the first result")
print(f"Completed: {result.completed}")
```
**Step-by-step execution:**
```python
# Execute task one step at a time
agent.set_task("add 2 items to cart")
for i in range(10):
    step = await agent.run_step()
    print(f"Step {i+1}: {len(step.proposals)} actions")
    print(f"Verification: {step.verification.message}")
    if step.verification.complete:
        break
# Useful for debugging, progress tracking, or custom control flow
```
**Low-level imperative:**
```python
# You control the steps, agent handles the selectors
await agent.navigate("https://google.com")
search_box = await agent.select("search box")
await search_box.fill("cats")
button = await agent.select("search button")
await button.click()
# Wait for page to stabilize
await agent.wait_for_idle()
# Take screenshot
await agent.screenshot("result.png")
```
No CSS selectors. No XPath. Just describe what you want.
---
## How it works
**High-level mode** - The agent loop:
1. Proposer looks at the page and task, decides next action
2. Executer runs it (click, type, navigate, etc.)
3. Verifier checks whether the task is complete
4. Repeat until done
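The loop above can be sketched in plain Python. This is only an illustration of the propose → execute → verify cycle, not webtask's actual implementation: `run_agent_loop`, `LoopResult`, and the three callables are hypothetical names, and the "LLM" here is a hard-coded plan.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class LoopResult:
    """Outcome of the toy loop (hypothetical type, not a webtask class)."""
    completed: bool
    actions: List[str] = field(default_factory=list)


def run_agent_loop(
    propose: Callable[[List[str]], str],
    execute: Callable[[str], None],
    verify: Callable[[List[str]], bool],
    max_steps: int = 10,
) -> LoopResult:
    """Propose an action, execute it, verify, and repeat until done or out of steps."""
    history: List[str] = []
    for _ in range(max_steps):
        action = propose(history)   # 1. decide the next action from what happened so far
        execute(action)             # 2. run it
        history.append(action)
        if verify(history):         # 3. check whether the task is complete
            return LoopResult(True, history)
    return LoopResult(False, history)  # step budget exhausted


# Toy stand-ins: a fixed plan plays the role of the LLM-backed proposer.
plan = ["navigate to search page", "type 'cats'", "click first result"]
performed: List[str] = []

result = run_agent_loop(
    propose=lambda history: plan[len(history)],
    execute=performed.append,
    verify=lambda history: len(history) == len(plan),
)
print(result.completed, result.actions)
```

The `max_steps` cap is a design choice worth keeping in any real loop: a verifier that never says "done" should not keep the browser running forever.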
**Step-by-step mode** - Same as high-level but you control the loop:
- `agent.set_task(description)` - Set the task
- `agent.run_step()` - Execute one step (propose → execute → verify)
- `agent.clear_history()` - Reset for new task
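If you drive the loop yourself, it can help to wrap it in a small helper with a step budget. The sketch below assumes the `run_step()` name and the `step.verification.complete` / `step.verification.message` fields used in the Quick look example. `run_task` and `FakeAgent` are hypothetical: the fake stands in for a real agent so the snippet runs without a browser or an LLM.

```python
import asyncio
from types import SimpleNamespace


async def run_task(agent, task, max_steps=10):
    """Run a task one step at a time; return (completed, verification messages)."""
    agent.set_task(task)
    messages = []
    for _ in range(max_steps):
        step = await agent.run_step()
        messages.append(step.verification.message)
        if step.verification.complete:
            return True, messages
    return False, messages  # gave up after max_steps


class FakeAgent:
    """Demo stand-in (not part of webtask): reports completion after N steps."""

    def __init__(self, steps_needed):
        self.steps_needed = steps_needed
        self.steps_run = 0
        self.task = None

    def set_task(self, description):
        self.task = description
        self.steps_run = 0

    async def run_step(self):
        self.steps_run += 1
        done = self.steps_run >= self.steps_needed
        verification = SimpleNamespace(
            complete=done,
            message=f"step {self.steps_run}: {'done' if done else 'in progress'}",
        )
        return SimpleNamespace(proposals=[], verification=verification)


completed, messages = asyncio.run(
    run_task(FakeAgent(steps_needed=3), "add 2 items to cart")
)
print(completed, messages)
```

With a real agent you would pass the object returned by `wt.create_agent(...)` instead of `FakeAgent`.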
**Low-level mode** - You call methods directly:
- `agent.navigate(url)` - Go to a page
- `agent.select(description)` - Find element by natural language
- `element.click()`, `element.fill(text)`, `element.type(text)` - Interact with elements
- `agent.wait(seconds)` - Wait for a specific duration
- `agent.wait_for_idle()` - Wait for network/DOM to stabilize
- `agent.screenshot(path)` - Capture page screenshot
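The low-level calls compose into ordinary async helpers. Here is a rough sketch of a form-filling helper built only from `navigate`, `select`, `fill`, `click`, and `wait_for_idle` as listed above. `fill_form`, `RecordingAgent`, and `RecordingElement` are hypothetical names: the recording classes just log calls so the example runs on its own, and the URL and field descriptions are made up.

```python
import asyncio


async def fill_form(agent, url, fields, submit):
    """Open a page, fill each described field, click the described submit control."""
    await agent.navigate(url)
    for description, text in fields.items():
        element = await agent.select(description)  # natural language selector
        await element.fill(text)
    button = await agent.select(submit)
    await button.click()
    await agent.wait_for_idle()


class RecordingElement:
    """Demo stand-in for a selected element; logs instead of touching a page."""

    def __init__(self, description, log):
        self.description = description
        self.log = log

    async def fill(self, text):
        self.log.append(("fill", self.description, text))

    async def click(self):
        self.log.append(("click", self.description))


class RecordingAgent:
    """Demo stand-in for an agent (not part of webtask)."""

    def __init__(self):
        self.log = []

    async def navigate(self, url):
        self.log.append(("navigate", url))

    async def select(self, description):
        return RecordingElement(description, self.log)

    async def wait_for_idle(self):
        self.log.append(("wait_for_idle",))


agent = RecordingAgent()
asyncio.run(
    fill_form(
        agent,
        "https://example.com/login",
        {"email field": "me@example.com", "password field": "not-a-real-password"},
        submit="sign in button",
    )
)
for entry in agent.log:
    print(entry)
```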
All modes share the same core: the LLM sees a cleaned DOM with element IDs like `button-0` instead of raw HTML. Clean input, clean output.
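To make the "cleaned DOM" idea concrete, here is a minimal sketch using only the standard library. It is an assumption about the general technique, not webtask's real cleaning code: `ElementIndexer`, `summarize`, and the choice of which tags count as interactive are all hypothetical. It walks the HTML, keeps interactive elements, and gives each a `tag-N` ID the LLM could refer to.

```python
from collections import Counter
from html.parser import HTMLParser

# Assumed set of tags worth showing to the LLM; a real cleaner may differ.
INTERACTIVE_TAGS = {"a", "button", "input", "select", "textarea"}


class ElementIndexer(HTMLParser):
    """Collect interactive elements and assign IDs like button-0, button-1."""

    def __init__(self):
        super().__init__()
        self.counts = Counter()
        self.elements = []
        self._open = None  # element currently collecting its text content

    def handle_starttag(self, tag, attrs):
        if tag not in INTERACTIVE_TAGS:
            return
        element_id = f"{tag}-{self.counts[tag]}"
        self.counts[tag] += 1
        attributes = dict(attrs)
        label = (
            attributes.get("aria-label")
            or attributes.get("placeholder")
            or attributes.get("value")
            or ""
        )
        entry = {"id": element_id, "label": label}
        self.elements.append(entry)
        # <input> is a void element, so it never has text content to collect.
        self._open = None if tag == "input" else entry

    def handle_data(self, data):
        text = data.strip()
        if self._open is not None and text:
            self._open["label"] = f"{self._open['label']} {text}".strip()

    def handle_endtag(self, tag):
        if tag in INTERACTIVE_TAGS:
            self._open = None


def summarize(html):
    """Return one line per interactive element: '[id] label'."""
    indexer = ElementIndexer()
    indexer.feed(html)
    indexer.close()
    return "\n".join(f"[{e['id']}] {e['label']}" for e in indexer.elements)


page = """
<div class="hdr"><script>var x = 1;</script>
  <input type="text" placeholder="Search">
  <button class="btn btn-primary">Go</button>
  <button>Cancel</button>
  <a href="/about">About us</a>
</div>
"""
summary = summarize(page)
print(summary)
```

The model then only has to answer with an ID such as `button-0`, which is much easier to validate than a generated CSS selector.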
---
## Status
🚧 Work in progress
Core implementation complete. See [TODO](docs/todo.md) for testing plan and future work.
---
## Benchmarks
Evaluate webtask on standard web agent benchmarks:
**[webtask-benchmarks](https://github.com/steve-z-wang/webtask-benchmarks)** - Evaluation framework for Mind2Web and other benchmarks
---
## Install
```bash
pip install pywebtask
playwright install chromium
```
---
## License
MIT