cerebellum

- Name: cerebellum
- Version: 0.1
- Summary: Cerebellum is an AI-driven browser automation system.
- Upload time: 2024-10-22 23:23:58
- Requires Python: >=3.8
- Keywords: agent, ai, automation, browser, llm

# Cerebellum Browser Automation

## Quickstart

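```python
import os

from playwright.sync_api import sync_playwright

from cerebellum.browser.planner import GeminiBrowserPlanner
from cerebellum.browser.session import BrowserSession

with sync_playwright() as p:
    # Launch a visible Chromium window so you can watch the agent work.
    browser = p.chromium.launch(headless=False)
    context = browser.new_context()
    # Start recording a Playwright trace, including screenshots.
    context.tracing.start(screenshots=True)
    page = context.new_page()
    page.goto("https://www.amazon.com/")

    # The goal is stated in plain natural language.
    goal = "Add a USB C to USB C cable to cart"

    # The Gemini API key is read from the GEMINI_API_KEY environment variable.
    planner = GeminiBrowserPlanner(api_key=os.environ["GEMINI_API_KEY"])
    session = BrowserSession(goal, page, planner=planner)

    # Hand control of the page to Cerebellum.
    session.start()
```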

## Overview

Cerebellum is an AI-driven browser automation system that uses large language models and planning to navigate web pages and achieve specific goals, providing a flexible, intelligent alternative to traditional rule-based automation.

## Features

- **AI-driven planning for navigating web pages**
- **Playwright-based browser control**
- **Vision-capable mode for image-based actions**
- **Local LLM integration for privacy**
- **Extensible for adding custom capabilities**
- **Apache 2.0 License**

## Why Cerebellum?

Cerebellum aims to make web interactions for AI as intuitive as human actions, serving as a bridge between AI decision-making and browser interactions.

Key benefits:

1. **Chainable**: Cerebellum lets you chain multiple small or medium-sized goals in series to accomplish a larger task. A human or a frontier model can plan the execution route, breaking a complex workflow into manageable steps. For example, Cerebellum can first log in to an e-commerce site, then search for a specific product, add it to the cart, and finally proceed to checkout, with each step defined as a separate goal.
2. **Interoperability**: Cerebellum integrates AI-driven automation, human intervention, and traditional rule-based automation. The Playwright page object can be handed off between these control mechanisms, so a workflow can move from AI-driven actions to manual human input or a conventional script and back. Steps that benefit from direct human oversight or a fixed script can use them, which makes the overall workflow more reliable and adaptable.
3. **Human Knowledge Transcription**: Cerebellum includes tools for supervising and creating "golden" sessions to improve AI performance. These tools record browser sessions and convert them into fine-tuning examples, making it easier to improve the LLM with real-world browsing data.
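
The chaining idea in the first benefit can be sketched as a small helper that runs one session per goal on the same page. This is a minimal sketch, not part of Cerebellum: `run_goals_in_sequence` and its `session_factory` parameter are hypothetical names, and it assumes that `session.start()` returns only once the goal has been handled, which the README does not state.

```python
from typing import Any, Callable, Iterable, List


def run_goals_in_sequence(
    goals: Iterable[str],
    page: Any,
    planner: Any,
    session_factory: Callable[..., Any],
) -> List[str]:
    """Run one session per goal, in order, reusing the same page.

    `session_factory` is called as session_factory(goal, page, planner=planner),
    mirroring the BrowserSession call in the Quickstart. Assumption: start()
    blocks until the goal has been handled.
    """
    completed: List[str] = []
    for goal in goals:
        session = session_factory(goal, page, planner=planner)
        session.start()
        # Between goals, the same page could instead be handed to a human
        # or to an ordinary Playwright script.
        completed.append(goal)
    return completed
```

In real use you would pass `BrowserSession` as `session_factory`, together with the `page` and `planner` objects from the Quickstart and a list of goals such as logging in, searching for a product, and adding it to the cart.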

## Installation

Install from local source by navigating to the root directory and running:

```sh
conda develop ./cerebellum
# or
pip install -e ./cerebellum
```

For local LLM planners, also install [guidance](https://github.com/guidance-ai/guidance) and [llama-cpp-python](https://github.com/abetlen/llama-cpp-python).

## How it works

Cerebellum models web interactions as pathfinding on a directed graph. Each node of a potentially infinite graph captures one state of the webpage, including both client-side and server-side information. Actions such as clicking a button, filling out a form, or navigating to a new page are edges that move the state from one node to another.

The search starts at the node representing the initial webpage state and looks for an optimal path to a terminal node that signifies goal completion. Each action taken moves the webpage state along one edge of the graph.

Neighboring nodes are discovered at runtime by analyzing the DOM structure to find interactable elements, like buttons, links, and input fields. These elements are mapped as possible actions that transition the state to new nodes, allowing the system to determine the next steps toward the goal.
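
To make the discovery step concrete, here is a minimal sketch that scans static HTML for interactable elements and turns each one into a candidate action. It is an illustration under stated assumptions, not Cerebellum's implementation: it uses Python's standard-library `html.parser` rather than a live Playwright page, and the tag list, the `InteractableFinder` class, and the `candidate_actions` function are all hypothetical.

```python
from html.parser import HTMLParser
from typing import Dict, List, Optional, Tuple

# Assumption: which tags count as interactable, and the verb used for each.
VERB_FOR_TAG = {
    "a": "click",
    "button": "click",
    "input": "fill",
    "textarea": "fill",
    "select": "select",
}


class InteractableFinder(HTMLParser):
    """Collects (tag, attributes) for every interactable start tag."""

    def __init__(self) -> None:
        super().__init__()
        self.elements: List[Tuple[str, Dict[str, Optional[str]]]] = []

    def handle_starttag(self, tag, attrs):
        if tag in VERB_FOR_TAG:
            self.elements.append((tag, dict(attrs)))


def candidate_actions(html: str) -> List[str]:
    """Return one human-readable candidate action per interactable element."""
    finder = InteractableFinder()
    finder.feed(html)
    actions = []
    for tag, attrs in finder.elements:
        # Prefer a stable identifier; fall back to the tag name itself.
        target = attrs.get("id") or attrs.get("name") or attrs.get("href") or tag
        actions.append(f"{VERB_FOR_TAG[tag]} {target}")
    return actions


page_html = (
    '<form><input name="q"><button id="search">Go</button></form>'
    '<a href="/cart">Cart</a>'
)
print(candidate_actions(page_html))
```

Each string in the result corresponds to one outgoing edge from the current node. A real page would also need visibility and enabled-state checks, which this sketch omits.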

The Large Language Model (LLM) planner acts as a heuristic, analyzing the current state, evaluating actions, and selecting the most promising next step. As Cerebellum interacts with the webpage, it updates its understanding of the graph structure, adapting to changes or new information.

This adaptive navigation allows Cerebellum to respond dynamically to changes in the webpage, making real-time decisions. Successful navigation sessions are used to fine-tune the LLM, improving its ability to handle similar tasks. By modeling web interactions in this way, Cerebellum offers a flexible approach to automating complex tasks in dynamic environments.
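
The loop described in this section can be sketched on a toy graph. This is a simplified illustration, not Cerebellum's code: the `SITE` dictionary stands in for a real browser, `navigate` is a hypothetical function, and `keyword_planner` is a deliberately naive word-overlap heuristic used in place of an LLM planner.

```python
from typing import Callable, Dict, List, Tuple

# Toy site: each state maps an action to the state it leads to.
SITE: Dict[str, Dict[str, str]] = {
    "home": {"click search box": "search", "click deals": "deals"},
    "search": {"type usb c cable": "results", "click home": "home"},
    "deals": {"click home": "home"},
    "results": {"click add to cart": "cart", "click home": "home"},
    "cart": {},
}


def keyword_planner(goal_text: str, actions: List[str]) -> str:
    """Stand-in heuristic: pick the action sharing the most words with the goal."""
    goal_words = set(goal_text.lower().split())
    return max(actions, key=lambda a: len(goal_words & set(a.lower().split())))


def navigate(
    start: str,
    goal_state: str,
    goal_text: str,
    planner: Callable[[str, List[str]], str],
    max_steps: int = 10,
) -> Tuple[str, List[Tuple[str, str]]]:
    """Follow planner-chosen edges until the goal state or the step limit."""
    state = start
    path: List[Tuple[str, str]] = []
    for _ in range(max_steps):
        if state == goal_state:
            break
        actions = list(SITE[state])  # neighbors are discovered at runtime
        if not actions:
            break  # dead end: nothing left to interact with
        action = planner(goal_text, actions)
        path.append((state, action))
        state = SITE[state][action]
    return state, path


final_state, path = navigate(
    "home", "cart", "search for a usb c cable and add it to the cart", keyword_planner
)
print(final_state, path)
```

The planner sees only the current candidate actions, never the whole graph, which mirrors how neighbors are discovered one page at a time. The step limit guards against the planner cycling between states.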

## Components

- `LocalLLMBrowserPlanner`: Generates browser actions.
- `ExtendedLlama3ChatTemplate`: Custom LLM interaction template.
- `BrowserState`, `BrowserAction`, `BrowserActionResult`: Core data structures.

## Contributing

We welcome contributions! You can help by:

1. **Code Contributions**: Fork the repo, create a branch, and submit a pull request.
2. **Bug Reports**: Report issues on GitHub.
3. **Feature Requests**: Share your ideas for improvements.
4. **Documentation**: Help refine the docs.
5. **Golden Session Files**: Submit `.cere` files for goals where Cerebellum struggles. This will help improve the AI's performance and contribute to fine-tuning efforts.

Refer to our CONTRIBUTING.md for more detailed guidelines.

## License

Apache 2.0

            

        