cesail

Name	cesail
Version	0.2.1
home_page	https://github.com/AkilaJay/cesail
Summary	A comprehensive web automation and DOM parsing platform with AI-powered agents
upload_time	2025-08-23 04:47:28
maintainer	None
docs_url	None
author	Rachita Pradeep
requires_python	>=3.9
license	MIT
keywords	web-automation dom-parser ai playwright mcp web-scraping

# CeSail

A comprehensive web automation and DOM parsing platform with AI-powered agents.

## Project Overview

CeSail enables AI agents to interact with the web through a comprehensive web automation and DOM parsing platform. It transforms complex web pages into structured, agent-friendly data and provides complete web interaction capabilities. The platform offers APIs that retrieve parsed data from web pages, transform it into a format that's easy for AI agents to understand, and execute actions like clicking, typing, navigating, and scrolling - enabling full end-to-end web automation workflows.

### What CeSail Does

CeSail bridges the gap between raw web content and AI agents by:

1. **🌐 Web Page Analysis**: Extracts and analyzes DOM elements, forms, buttons, links, and interactive components
2. **🧠 Agent-Friendly Transformation**: Converts complex HTML structures into structured data with clear semantics
3. **🎯 Actionable Intelligence**: Identifies clickable elements, form fields, and navigation options with context
4. **📊 Structured Output**: Provides clean, JSON-like data structures that agents can easily parse and understand
5. **🔍 Context Preservation**: Maintains relationships between elements and their functional purposes
6. **📸 Visual Overlays**: Generates screenshots with overlays highlighting parsed action items and interactive elements

> **⚠️ Version Compatibility Note**: CeSail automatically manages compatible versions of `fastmcp` and `mcp` packages. If you encounter import errors related to `McpError`, ensure you're using the latest version of CeSail.

## Quick Start

### Install from PyPI (Recommended)

The easiest way to get started with CeSail is to install it from PyPI:

```bash
# Install CeSail
pip install cesail

# Playwright browsers are installed automatically during package installation
# If you encounter any issues, you can manually install them:
# playwright install
```

### Simple Example

Here's a quick example that demonstrates CeSail's core functionality:

```python
import asyncio
from cesail import DOMParser, Action, ActionType

async def quick_demo():
    """Quick demonstration of CeSail's web automation capabilities."""
    async with DOMParser(headless=False) as parser:
        # Navigate to a website
        action = Action(
            type=ActionType.NAVIGATE,
            metadata={"url": "https://www.example.com"}
        )
        await parser._action_executor.execute_action(action)
        
        # Analyze the page and get structured data
        parsed_page = await parser.analyze_page()
        print(f"Found {len(parsed_page.important_elements.elements)} interactive elements")
        
        # Take a screenshot with overlays
        await parser.take_screenshot("demo_screenshot.png")
        
        # Show available actions
        print("Available actions:")
        for element in parsed_page.important_elements.elements[:3]:
            print(f"  - {element.type}: {element.text}")

# Run the demo
asyncio.run(quick_demo())
```

## MCP (Model Context Protocol) Integration

CeSail provides a FastMCP server that enables AI assistants like Cursor to directly interact with web pages through standardized APIs. This allows you to give natural language commands to your AI assistant and have it execute web automation tasks.

### Setting up MCP with Cursor

1. **Install CeSail MCP Server**:
   ```bash
   pip install cesail fastmcp
   ```

2. **Configure MCP Settings**:
   - Open Cursor
   - Go to Settings → Extensions → MCP
   - **Note**: The `command` below assumes `python3` is on the PATH that Cursor uses. If it is not, replace it with the full path to your Python executable, which you can find by running `which python` or `which python3` in your terminal.
   - Add a new server configuration:
   ```json
   {
     "mcpServers": {
       "cesail": {
         "command": "python3",
         "args": ["-m", "cesail.cesail_mcp.fastmcp_server"],
         "env": {
           "PYTHONUNBUFFERED": "1"
         },
         "description": "CeSail MCP Server for comprehensive web automation and DOM parsing",
         "capabilities": {
           "tools": {
             "listChanged": true
           }
         }
       }
     }
   }
   ```
   
   **Note**: This configuration has been tested with Cursor. For best performance, users should disable the `get_screenshot` capability as Cursor screenshots can take a while to process. To disable it, go to Cursor Settings → Tools & Integrations → MCP and disable the `get_screenshot` capability for the CeSail server. This should also work with other MCP-compatible agents, though it hasn't been tested with them.

   For more help setting up Cursor MCP, see: https://docs.cursor.com/en/context/mcp

3. **Test the FastMCP Server**:
   ```bash
   python3 -m cesail.cesail_mcp.fastmcp_server
   ```
   
   Run this command to ensure the server launches properly. You should see output indicating the server is starting up.

4. **Use in Cursor**:
   Now you can ask Cursor to perform web automation tasks:
   ```
   "Using cesail, navigate to example.com and do a certain task"
   "Using cesail, ..."
   ```
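
The note in step 2 about using the path to your Python executable can be handled with a small script. The following is a minimal sketch, not part of CeSail: the helper name `build_cesail_mcp_config` is hypothetical, and it simply reproduces the configuration shown above with `command` set to the absolute path of the interpreter that runs the script.

```python
import json
import sys


def build_cesail_mcp_config(python_path: str = sys.executable) -> dict:
    """Return the MCP server entry from step 2 with an explicit interpreter path.

    Hypothetical helper for illustration; the keys mirror the JSON shown above.
    """
    return {
        "mcpServers": {
            "cesail": {
                "command": python_path,
                "args": ["-m", "cesail.cesail_mcp.fastmcp_server"],
                "env": {"PYTHONUNBUFFERED": "1"},
                "description": "CeSail MCP Server for comprehensive web automation and DOM parsing",
                "capabilities": {"tools": {"listChanged": True}},
            }
        }
    }


if __name__ == "__main__":
    # Print JSON that can be pasted into the Cursor MCP settings.
    print(json.dumps(build_cesail_mcp_config(), indent=2))
```

Running the script inside the virtual environment where CeSail is installed should yield a `command` that points at that environment's interpreter.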

### Why Agents Need This

Traditional web scraping provides raw HTML, which is difficult for AI agents to interpret. CeSail solves this by:

- **Semantic Understanding**: Identifies what each element does (button, form, link, etc.)
- **Action Mapping**: Maps elements to executable actions (click, type, navigate)
- **Context Enrichment**: Adds metadata about element purpose and relationships
- **Structured Data**: Outputs clean, predictable data structures
- **Visual Context**: Combines DOM analysis with visual information via screenshots and overlays highlighting actionable elements
- **Highly Configurable**: Customizable settings for different use cases and requirements

This transformation makes it possible for AI agents to:
- Understand page structure at a glance
- Identify actionable elements quickly
- Execute precise interactions
- Adapt to different page layouts
- Make intelligent decisions about next actions

## Architecture

```
┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐
│     Cursor      │    │   MCP Server    │    │  DOM Parser     │
│   (AI Agent)    │◄──►│   (Python)      │◄──►│  (Python)       │
│                 │    │                 │    │                 │
│ • Natural Lang. │    │ • FastMCP APIs  │    │ • Page Analyzer │
│ • Task Planning │    │ • Web Automation│    │ • Action Exec.  │
│ • Execution     │    │ • Screenshots   │    │ • Idle Watcher  │
└─────────────────┘    └─────────────────┘    └─────────────────┘
                                                        │
                                                        │
                                        ┌─────────────────┐
                                        │   Web Browser   │
                                        │  (Playwright)   │
                                        │                 │
                                        │ • Page Control  │
                                        │ • DOM Access    │
                                        │ • Screenshots   │
                                        │ • Actions       │
                                        └─────────────────┘
                                                        │
                                                        │
                                        ┌─────────────────┐
                                        │  JavaScript     │
                                        │  Layer          │
                                        │                 │
                                        │ • Element Ext.  │
                                        │ • Selector Gen. │
                                        │ • Text Analysis │
                                        │ • Action Ext.   │
                                        └─────────────────┘
```

**Key Architecture Points:**
- **MCP Server**: Connects to DOM Parser for web automation APIs
- **DOM Parser**: Orchestrates page analysis, action execution, and idle watching
- **Web Browser**: Connected to DOM Parser for page control and actions
- **JavaScript Layer**: Injected into web browser for DOM parsing and element extraction
- **Actions**: Executed by Playwright; page parsing is handled separately by CeSail's own injected JavaScript

## Components

### 1. DOM Parser JavaScript Layer (`cesail/dom_parser/src/js/`)
Core DOM parsing engine that transforms raw HTML into structured, agent-friendly data.

**Language**: JavaScript  
**Features**: 
- **Element Extraction**: Identifies and categorizes interactive elements (buttons, forms, links)
- **Semantic Analysis**: Understands element purpose and context
- **Action Mapping**: Maps elements to executable actions (click, type, navigate)
- **Text Scoring**: Prioritizes important text content for agents
- **Selector Generation**: Creates reliable CSS selectors for element targeting
- **Performance Optimization**: Caching and monitoring for speed
- **ARIA Support**: Accessibility attribute analysis
- **Visual Context**: Combines DOM data with visual information
- **Processing Pipeline**: Multi-stage element processing and filtering

**Key Components**:
- `index.js`: Main entry point and public API
- `action-extraction.js`: Extracts actionable elements and metadata
- `filter-elements.js`: Filters and groups elements by importance
- `scoring.js`: Scores elements based on visibility and interactivity
- `selector-extraction.js`: Generates reliable CSS selectors
- `visualizer.js`: Visual debugging and element highlighting
- `cache-manager.js`: Performance optimization and caching

**Data Transformation Example**:
```javascript
// Raw HTML input:
//   <button class="btn-primary" onclick="submit()">Submit Form</button>
//   <input type="text" placeholder="Enter email" id="email" />

// CeSail transforms the button into agent-friendly JSON
// (the variable name is illustrative only)
const transformedButton = {
  "type": "BUTTON",
  "selector": "button.btn-primary",
  "text": "Submit Form",
  "action": "CLICK",
  "importance": 0.9,
  "context": "form submission",
  "metadata": {
    "aria-label": null,
    "disabled": false,
    "visible": true
  }
};
```

**Documentation**: See [cesail/dom_parser/src/js/README.md](cesail/dom_parser/src/js/README.md)
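
To make the idea behind the transformation concrete, here is a rough Python sketch that turns the same HTML snippet into similar records using only the standard library. It is an illustration of the concept, not CeSail's implementation (which runs as JavaScript in the browser); the `TAG_ACTIONS` table, the selector rule, and all function names are assumptions, and it does not compute `importance` or `context`.

```python
from html.parser import HTMLParser
from typing import Optional

# Hypothetical mapping from tag name to (element type, suggested action).
TAG_ACTIONS = {
    "button": ("BUTTON", "CLICK"),
    "a": ("LINK", "CLICK"),
    "input": ("INPUT", "TYPE"),
}


def build_selector(tag: str, attrs: dict) -> str:
    """Prefer an id selector, then the first class, then the bare tag name."""
    if attrs.get("id"):
        return f"{tag}#{attrs['id']}"
    if attrs.get("class"):
        return f"{tag}.{attrs['class'].split()[0]}"
    return tag


class ActionableElementCollector(HTMLParser):
    """Collects one record per actionable tag encountered."""

    def __init__(self) -> None:
        super().__init__()
        self.elements: list = []
        self._open: Optional[dict] = None  # record currently collecting text

    def handle_starttag(self, tag, attrs):
        if tag not in TAG_ACTIONS:
            return
        element_type, action = TAG_ACTIONS[tag]
        attr_map = dict(attrs)
        record = {
            "type": element_type,
            "selector": build_selector(tag, attr_map),
            "text": attr_map.get("placeholder") or "",
            "action": action,
            "metadata": {
                "aria-label": attr_map.get("aria-label"),
                "disabled": "disabled" in attr_map,
            },
        }
        self.elements.append(record)
        # <input> is a void element, so there is no inner text to collect.
        self._open = None if tag == "input" else record

    def handle_data(self, data):
        if self._open is not None:
            self._open["text"] += data.strip()

    def handle_endtag(self, tag):
        if tag in TAG_ACTIONS:
            self._open = None


def extract_actionable_elements(html: str) -> list:
    """Parse an HTML string and return the collected element records."""
    collector = ActionableElementCollector()
    collector.feed(html)
    collector.close()
    return collector.elements
```

Feeding it the two tags from the example yields one `BUTTON` record with selector `button.btn-primary` and one `INPUT` record with selector `input#email`.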

### 2. DOM Parser Python Layer (`cesail/dom_parser/src/py/`)
Orchestration layer that manages browser interactions and provides high-level APIs.

**Language**: Python  
**Features**:
- **Page Analysis**: Comprehensive page structure analysis and element extraction
- **Action Execution**: Executes clicks, typing, navigation, and other web actions
- **Idle Watching**: Monitors page state changes and waits for stability
- **Screenshot Integration**: Captures and analyzes visual page content
- **Configuration Management**: Flexible configuration for different use cases
- **Session Management**: Maintains browser state across interactions
- **Error Handling**: Robust error recovery and retry logic

**Key Components**:
- `dom_parser.py`: Main interface for DOM parsing and interaction
- `page_analyzer.py`: Analyzes page structure and extracts actionable elements
- `action_executor.py`: Executes web actions through Playwright
- `idle_watcher.py`: Monitors page state and waits for stability
- `screenshot.py`: Captures and processes page screenshots
- `types.py`: Data structures and type definitions
- `config.py`: Configuration management and validation
- `actions_plugins/`: Modular action implementations (navigation, interaction, input, system)

**Integration Example**:
```python
async with DOMParser() as parser:
    # Navigate to page
    await parser.navigate("https://example.com")
    
    # Analyze page structure
    parsed_page = await parser.analyze_page()
    
    # Execute actions
    await parser.click("button.btn-primary")
    await parser.type("input#email", "user@example.com")
```

**Documentation**: See [cesail/dom_parser/src/py/README.md](cesail/dom_parser/src/py/README.md)

### 3. MCP Server (`cesail/cesail_mcp/`)
FastMCP server that provides standardized APIs for agents to interact with transformed web data.

**Language**: Python  
**Features**:
- **Structured APIs**: Clean, predictable endpoints for web automation
- **Action Execution**: Execute clicks, typing, navigation based on transformed data
- **Page Analysis**: Get structured page information in agent-friendly format
- **Screenshot Integration**: Visual context combined with structured data
- **Session Management**: Maintain state across interactions
- **Error Handling**: Robust retry logic and error recovery

**Agent-Friendly API Example**:
```python
# Agent receives structured data from CeSail
parsed_page = await parser.analyze_page()

# Get the actions data (this is what agents typically work with)
actions = parsed_page.get_actions()

# Example actions data structure
actions_data = [
  {
    "type": "LINK",
    "selector": "2",
    "importantText": "Vintage vibesCreate your weekend moodboard | Vinta | /today/best/create-your-weekend-moodboard/128099/"
  },
  {
    "type": "LINK", 
    "selector": "3",
    "importantText": "Summer hobbiesTry bead embroidery | Summer hobbies | /today/best/try-bead-embroidery/128240/"
  },
  {
    "type": "SELECT",
    "selector": "5", 
    "importantText": "search-box-input | combobox | Search | Search"
  },
  {
    "type": "BUTTON",
    "selector": "8",
    "importantText": "vertical-nav-more-options-button | More options | More options"
  },
  {
    "type": "BUTTON",
    "selector": "10",
    "importantText": "Sign up"
  }
]
```

**Documentation**: See [cesail/dom_parser/src/py/README.md](cesail/dom_parser/src/py/README.md) for more details about the parsed page data structure.

**Usage**: `python3 -m cesail.cesail_mcp.fastmcp_server`
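
An agent usually needs to narrow the actions list before choosing a selector. The following sketch works on plain dictionaries shaped like the example data above; the helper names `group_selectors_by_type` and `find_by_text` are hypothetical and not part of the CeSail API, and splitting `importantText` on `" | "` is an assumption based on the sample records.

```python
from collections import defaultdict

# Sample records copied from the example actions data above.
actions_data = [
    {"type": "LINK", "selector": "2",
     "importantText": "Vintage vibesCreate your weekend moodboard | Vinta | /today/best/create-your-weekend-moodboard/128099/"},
    {"type": "LINK", "selector": "3",
     "importantText": "Summer hobbiesTry bead embroidery | Summer hobbies | /today/best/try-bead-embroidery/128240/"},
    {"type": "SELECT", "selector": "5",
     "importantText": "search-box-input | combobox | Search | Search"},
    {"type": "BUTTON", "selector": "8",
     "importantText": "vertical-nav-more-options-button | More options | More options"},
    {"type": "BUTTON", "selector": "10", "importantText": "Sign up"},
]


def group_selectors_by_type(actions: list) -> dict:
    """Map each element type to the selectors of the elements with that type."""
    grouped = defaultdict(list)
    for action in actions:
        grouped[action["type"]].append(action["selector"])
    return dict(grouped)


def find_by_text(actions: list, needle: str) -> list:
    """Return actions whose importantText contains `needle` (case-insensitive)."""
    needle = needle.lower()
    return [
        action
        for action in actions
        if any(needle in part.lower() for part in action["importantText"].split(" | "))
    ]


if __name__ == "__main__":
    print(group_selectors_by_type(actions_data))
    print([a["selector"] for a in find_by_text(actions_data, "sign up")])
```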

### 4. Simple Agent (`cesail/simple_agent/`) - WIP
AI-powered web automation agent using LLM for task breakdown and execution.

**Language**: Python  
**Features**:
- Natural language task processing
- Automated task breakdown and planning
- LLM-powered decision making
- Visual analysis with screenshots
- Interactive execution monitoring

**Documentation**: See [cesail/simple_agent/README.md](cesail/simple_agent/README.md) for more details.

**Usage**: `python3 -m cesail.simple_agent.simple_agent`

## Testing

CeSail includes comprehensive test suites to validate functionality and demonstrate capabilities.

### Test Categories

- **Playground Tests** - Integration tests with real websites (Google, Amazon, YouTube, Pinterest, etc.)
- **Unit Tests** - Individual component testing
- **Replay Tests** - Regression testing with golden values

### Quick Start

```bash
# Activate virtual environment
source venv/bin/activate

# Set PYTHONPATH to the repository root (run this from the cloned cesail/ directory)
export PYTHONPATH="$(pwd):$PYTHONPATH"

# Run playground tests (great way to see CeSail in action!)
pytest cesail/dom_parser/tests/playground/test_page_analyzer_integration_pinterest.py -v -s

# Run all tests
pytest cesail/dom_parser/tests/ -v
```

### Playground Tests

The playground tests are an excellent way to see CeSail navigate through real websites:

- **Google Search**: Navigate and search functionality
- **Amazon**: Product browsing and search
- **YouTube**: Video navigation and interaction
- **Pinterest**: Image browsing and pinning
- **Airbnb**: Property search and filtering
- **Google Flights**: Flight search and booking flow

These tests demonstrate CeSail's ability to:
- Extract interactive elements from complex websites
- Navigate through multi-step workflows
- Handle dynamic content and AJAX loading
- Generate screenshots with bounding boxes
- Process structured data for AI agents

**Documentation**: See [cesail/dom_parser/tests/README.md](cesail/dom_parser/tests/README.md) for complete testing guide and examples.

## Development Installation

For development or advanced usage:

**Prerequisites**:
- **Python**: 3.9 or higher
- **Node.js**: 14 or higher (for DOM Parser development)
- **OpenAI API Key**: Required for Simple Agent
- **Git**: For cloning the repository

**Installation**:

1. **Clone the repository**:
   ```bash
   git clone https://github.com/AkilaJay/cesail.git
   cd cesail
   ```

2. **Set up Python environment**:
   ```bash
   python3 -m venv venv
   source venv/bin/activate  # On Windows: venv\Scripts\activate
   pip install -e .
   ```

3. **Set up DOM Parser** (optional):
   ```bash
   cd cesail/dom_parser
   npm install
   npm run build
   cd ../..
   ```

4. **Configure environment** (for Simple Agent):
   ```bash
   # Create .env file in cesail/simple_agent/ directory
   echo "OPENAI_API_KEY=your_openai_api_key_here" > cesail/simple_agent/.env
   ```

5. **Playwright browsers are installed automatically** during package installation.
   If you encounter any issues, you can manually install them:
   ```bash
   playwright install
   ```

## Troubleshooting

### Common Issues

#### 1. Import Errors
**Problem**: `ModuleNotFoundError: No module named 'dom_parser'`
**Solution**: Ensure you're in the correct directory and virtual environment is activated

#### 2. Playwright Browser Issues
**Problem**: Browser not found or crashes
**Solution**: Reinstall Playwright browsers:
```bash
playwright install
```

#### 3. OpenAI API Errors
**Problem**: API key invalid or rate limited
**Solution**: Check your API key and usage limits in the OpenAI dashboard

#### 4. Screenshot Failures
**Problem**: Screenshots fail with "Target page closed" error
**Solution**: Add proper error handling and retry logic
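
One way to add retry logic is a small async wrapper with exponential backoff. This is a minimal sketch, not something CeSail ships: `retry_async` and its parameters are hypothetical names, and a retry only helps with transient failures (if the page was actually closed, it has to be reopened first).

```python
import asyncio
from typing import Awaitable, Callable, Tuple, Type, TypeVar

T = TypeVar("T")


async def retry_async(
    operation: Callable[[], Awaitable[T]],
    attempts: int = 3,
    base_delay: float = 0.5,
    retry_on: Tuple[Type[BaseException], ...] = (Exception,),
) -> T:
    """Await `operation()` up to `attempts` times, doubling the delay after each failure."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return await operation()
        except retry_on:
            if attempt == attempts:
                raise  # out of attempts: surface the last error to the caller
            await asyncio.sleep(base_delay * 2 ** (attempt - 1))
    raise AssertionError("unreachable")
```

Inside an `async with DOMParser() as parser:` block, a screenshot call could then be wrapped as `await retry_async(lambda: parser.take_screenshot("demo_screenshot.png"))`.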

## API Reference

For detailed API documentation, see the component-specific README files:

### DOM Parser APIs
- **Python Layer**: [cesail/dom_parser/src/py/README.md](cesail/dom_parser/src/py/README.md) - Complete Python API reference including DOMParser, PageAnalyzer, ActionExecutor, and more
- **JavaScript Layer**: [cesail/dom_parser/src/js/README.md](cesail/dom_parser/src/js/README.md) - JavaScript DOM parsing APIs and element extraction functions

### MCP Server API
- **FastMCP Integration**: See the MCP server documentation for standardized web automation APIs
- **Documentation**: See [cesail/cesail_mcp/README.md](cesail/cesail_mcp/README.md) for complete API reference and usage examples

### Simple Agent API (WIP)
- **Natural Language Processing**: Process user input and execute web automation tasks
- **LLM Integration**: AI-powered task breakdown and execution
- **Documentation**: See [cesail/simple_agent/README.md](cesail/simple_agent/README.md) for current API details

## Contributing

We welcome contributions! Here's how to get started:

### Development Setup

1. **Fork the repository**
2. **Create a feature branch**:
   ```bash
   git checkout -b feature/your-feature-name
   ```
3. **Make your changes**
4. **Add tests** for new functionality
5. **Run tests** to ensure everything works
6. **Submit a pull request**

### Code Style

- **Python**: Follow PEP 8, use Black for formatting
- **JavaScript**: Follow ESLint rules, use Prettier for formatting
- **Documentation**: Update README files for new features

### Testing

- Write unit tests for new functions
- Add integration tests for new features
- Ensure all existing tests pass

## Project Structure

```
cesail/
├── cesail/                      # Python package
│   ├── dom_parser/              # JavaScript DOM parser
│   │   ├── src/                 # Source code
│   │   ├── dist/                # Built files
│   │   ├── tests/               # JavaScript tests
│   │   └── README.md            # Component documentation
│   ├── cesail_mcp/              # FastMCP server
│   │   ├── fastmcp_server.py    # Main server file
│   │   ├── server.py            # Alternative server
│   │   └── tests/               # MCP tests
│   └── simple_agent/            # AI web automation agent
│       ├── simple_agent.py      # Main agent file
│       ├── llm_interface.py     # LLM integration
│       └── .env                 # Environment variables
├── venv/                        # Python virtual environment
├── setup.py                     # Python package configuration
├── pyproject.toml               # Project configuration
└── README.md                    # This file
```

## Support

- **Issues**: Report bugs and feature requests on GitHub
- **Discussions**: Join discussions for questions and ideas
- **Documentation**: Check component-specific README files for detailed docs

## Roadmap

- [ ] Enhanced simple agent
- [ ] Plugin framework for actions
- [ ] More native actions / Parser enhancements
- [ ] Replay framework

## Help needed / Bugs

- [ ] Idle watcher doesn't always wait for the page to finish loading; this needs to be fixed.
- [ ] Simple agent enhancements
- [ ] Parser enhancements
- [ ] Testing

## Contact

For questions, issues, or contributions:

- **Email**: ajjayawardane@gmail.com
- **GitHub**: [@AkilaJay](https://github.com/AkilaJay)
- **Issues**: [GitHub Issues](https://github.com/AkilaJay/cesail/issues)

## License

This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.

Raw data

            {
    "_id": null,
    "home_page": "https://github.com/AkilaJay/cesail",
    "name": "cesail",
    "maintainer": null,
    "docs_url": null,
    "requires_python": ">=3.9",
    "maintainer_email": "Akila Jayawardane <ajjayawardane@gmail.com>",
    "keywords": "web-automation, dom-parser, ai, playwright, mcp, web-scraping",
    "author": "Rachita Pradeep",
    "author_email": "Akila Jayawardane <ajjayawardane@gmail.com>",
    "download_url": null,
    "platform": null,
    "description": "# CeSail\n\nA comprehensive web automation and DOM parsing platform with AI-powered agents.\n\n## Project Overview\n\nCeSail enables AI agents to interact with the web through a comprehensive web automation and DOM parsing platform. It transforms complex web pages into structured, agent-friendly data and provides complete web interaction capabilities. The platform offers APIs that retrieve parsed data from web pages, transform it into a format that's easy for AI agents to understand, and execute actions like clicking, typing, navigating, and scrolling - enabling full end-to-end web automation workflows.\n\n### What CeSail Does\n\nCeSail bridges the gap between raw web content and AI agents by:\n\n> **\u26a0\ufe0f Version Compatibility Note**: CeSail automatically manages compatible versions of `fastmcp` and `mcp` packages. If you encounter import errors related to `McpError`, ensure you're using the latest version of CeSail.\n\n1. **\ud83c\udf10 Web Page Analysis**: Extracts and analyzes DOM elements, forms, buttons, links, and interactive components\n2. **\ud83e\udde0 Agent-Friendly Transformation**: Converts complex HTML structures into structured data with clear semantics\n3. **\ud83c\udfaf Actionable Intelligence**: Identifies clickable elements, form fields, and navigation options with context\n4. **\ud83d\udcca Structured Output**: Provides clean, JSON-like data structures that agents can easily parse and understand\n5. **\ud83d\udd0d Context Preservation**: Maintains relationships between elements and their functional purposes\n6. 
**\ud83d\udcf8 Visual Overlays**: Generates screenshots with overlays highlighting parsed action items and interactive elements\n\n## Quick Start\n\n### Install from PyPI (Recommended)\n\nThe easiest way to get started with CeSail is to install it from PyPI:\n\n```bash\n# Install CeSail\npip install cesail\n\n# Playwright browsers are installed automatically during package installation\n# If you encounter any issues, you can manually install them:\n# playwright install\n```\n\n### Simple Example\n\nHere's a quick example that demonstrates CeSail's core functionality:\n\n```python\nimport asyncio\nfrom cesail import DOMParser, Action, ActionType\n\nasync def quick_demo():\n    \"\"\"Quick demonstration of CeSail's web automation capabilities.\"\"\"\n    async with DOMParser(headless=False) as parser:\n        # Navigate to a website\n        action = Action(\n            type=ActionType.NAVIGATE,\n            metadata={\"url\": \"https://www.example.com\"}\n        )\n        await parser._action_executor.execute_action(action)\n        \n        # Analyze the page and get structured data\n        parsed_page = await parser.analyze_page()\n        print(f\"Found {len(parsed_page.important_elements.elements)} interactive elements\")\n        \n        # Take a screenshot with overlays\n        await parser.take_screenshot(\"demo_screenshot.png\")\n        \n        # Show available actions\n        print(\"Available actions:\")\n        for element in parsed_page.important_elements.elements[:3]:\n            print(f\"  - {element.type}: {element.text}\")\n\n# Run the demo\nasyncio.run(quick_demo())\n```\n\n## MCP (Model Context Protocol) Integration\n\nCeSail provides a FastMCP server that enables AI assistants like Cursor to directly interact with web pages through standardized APIs. This allows you to give natural language commands to your AI assistant and have it execute web automation tasks.\n\n### Setting up MCP with Cursor\n\n1. 
**Install CeSail MCP Server**:\n   ```bash\n   pip install cesail fastmcp\n   ```\n\n2. **Configure MCP Settings**:\n   - Open Cursor\n   - Go to Settings \u2192 Extensions \u2192 MCP\n   - Add a new server configuration:\n   - **Note**: Make sure to use the path to your Python executable. You can find it by running `which python` or `which python3` in your terminal.\n   ```json\n   {\n     \"mcpServers\": {\n       \"cesail\": {\n         \"command\": \"python3\",\n         \"args\": [\"-m\", \"cesail.cesail_mcp.fastmcp_server\"],\n         \"env\": {\n           \"PYTHONUNBUFFERED\": \"1\"\n         },\n         \"description\": \"CeSail MCP Server for comprehensive web automation and DOM parsing\",\n         \"capabilities\": {\n           \"tools\": {\n             \"listChanged\": true\n           }\n         }\n       }\n     }\n   }\n   ```\n   \n   **Note**: This configuration has been tested with Cursor. For best performance, users should disable the `get_screenshot` capability as Cursor screenshots can take a while to process. To disable it, go to Cursor Settings \u2192 Tools & Integrations \u2192 MCP and disable the `get_screenshot` capability for the CeSail server. This should also work with other MCP-compatible agents, though it hasn't been tested with them.\n\n   For more help setting up Cursor MCP, see: https://docs.cursor.com/en/context/mcp\n\n3. **Test the FastMCP Server**:\n   ```bash\n   python3 -m cesail.cesail_mcp.fastmcp_server\n   ```\n   \n   Run this command to ensure the server launches properly. You should see output indicating the server is starting up.\n\n4. **Use in Cursor**:\n   Now you can ask Cursor to perform web automation tasks:\n   ```\n   \"Using cesail, Navigate to example.com and do a certain task\"\n   \"Using cesail, ...\"\n   ```\n\n### Why Agents Need This\n\nTraditional web scraping provides raw HTML, which is difficult for AI agents to interpret. 
CeSail solves this by:\n\n- **Semantic Understanding**: Identifies what each element does (button, form, link, etc.)\n- **Action Mapping**: Maps elements to executable actions (click, type, navigate)\n- **Context Enrichment**: Adds metadata about element purpose and relationships\n- **Structured Data**: Outputs clean, predictable data structures\n- **Visual Context**: Combines DOM analysis with visual information via screenshots and overlays highlighting actionable elements\n- **Highly Configurable**: Customizable settings for different use cases and requirements\n\nThis transformation makes it possible for AI agents to:\n- Understand page structure at a glance\n- Identify actionable elements quickly\n- Execute precise interactions\n- Adapt to different page layouts\n- Make intelligent decisions about next actions\n\n## Architecture\n\n```\n\u250c\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2510    \u250c\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2510    \u250c\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2510\n\u2502     Cursor      \u2502    \u2502   MCP Server    \u2502    \u2502  DOM Parser     \u2502\n\u2502   (AI Agent)    \u2502\u25c4\u2500\u2500\u25ba\u2502   (Python)      \u2502\u25c4\u2500\u2500\u25ba\u2502  (Python)       \u2502\n\u2502                 \u2502    \u2502                 \u2502    \u2502                 \u2502\n\u2502 \u2022 Natural Lang. \u2502    \u2502 \u2022 FastMCP APIs  \u2502    \u2502 \u2022 Page Analyzer \u2502\n\u2502 \u2022 Task Planning \u2502    \u2502 \u2022 Web Automation\u2502    \u2502 \u2022 Action Exec.  
\u2502\n\u2502 \u2022 Execution     \u2502    \u2502 \u2022 Screenshots   \u2502    \u2502 \u2022 Idle Watcher  \u2502\n\u2514\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2518    \u2514\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2518    \u2514\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2518\n                                                        \u2502\n                                                        \u2502\n                                        \u250c\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2510\n                                        \u2502   Web Browser   \u2502\n                                        \u2502  (Playwright)   \u2502\n                                        \u2502                 \u2502\n                                        \u2502 \u2022 Page Control  \u2502\n                                        \u2502 \u2022 DOM Access    \u2502\n                                        \u2502 \u2022 Screenshots   \u2502\n                                        \u2502 \u2022 Actions       \u2502\n                                        \u2514\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2518\n                                                        \u2502\n                                                        \u2502\n                                        \u250c\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2510\n                                        \u2502  JavaScript     \u2502\n                                        \u2502  Layer          \u2502\n                                        \u2502                 \u2502\n                                        \u2502 \u2022 Element Ext.  
\u2502\n                                        \u2502 \u2022 Selector Gen. \u2502\n                                        \u2502 \u2022 Text Analysis \u2502\n                                        \u2502 \u2022 Action Ext.   \u2502\n                                        \u2514\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2518\n```\n\n**Key Architecture Points:**\n- **MCP Server**: Connects to DOM Parser for web automation APIs\n- **DOM Parser**: Orchestrates page analysis, action execution, and idle watching\n- **Web Browser**: Connected to DOM Parser for page control and actions\n- **JavaScript Layer**: Injected into web browser for DOM parsing and element extraction\n- **Actions**: Executed by Playwright, parsing done manually through JavaScript\n\n## Components\n\n### 1. DOM Parser JavaScript Layer (`cesail/dom_parser/src/js/`)\nCore DOM parsing engine that transforms raw HTML into structured, agent-friendly data.\n\n**Language**: JavaScript  \n**Features**: \n- **Element Extraction**: Identifies and categorizes interactive elements (buttons, forms, links)\n- **Semantic Analysis**: Understands element purpose and context\n- **Action Mapping**: Maps elements to executable actions (click, type, navigate)\n- **Text Scoring**: Prioritizes important text content for agents\n- **Selector Generation**: Creates reliable CSS selectors for element targeting\n- **Performance Optimization**: Caching and monitoring for speed\n- **ARIA Support**: Accessibility attribute analysis\n- **Visual Context**: Combines DOM data with visual information\n- **Processing Pipeline**: Multi-stage element processing and filtering\n\n**Key Components**:\n- `index.js`: Main entry point and public API\n- `action-extraction.js`: Extracts actionable elements and metadata\n- `filter-elements.js`: Filters and groups elements by importance\n- `scoring.js`: Scores elements based on visibility and interactivity\n- `selector-extraction.js`: 
Generates reliable CSS selectors\n- `visualizer.js`: Visual debugging and element highlighting\n- `cache-manager.js`: Performance optimization and caching\n\n**Data Transformation Example**:\n```javascript\n// Raw HTML input\n<button class=\"btn-primary\" onclick=\"submit()\">Submit Form</button>\n<input type=\"text\" placeholder=\"Enter email\" id=\"email\" />\n\n// CeSail transforms to agent-friendly JSON\n{\n  \"type\": \"BUTTON\",\n  \"selector\": \"button.btn-primary\",\n  \"text\": \"Submit Form\",\n  \"action\": \"CLICK\",\n  \"importance\": 0.9,\n  \"context\": \"form submission\",\n  \"metadata\": {\n    \"aria-label\": null,\n    \"disabled\": false,\n    \"visible\": true\n  }\n}\n```\n\n**Documentation**: See [cesail/dom_parser/src/js/README.md](cesail/dom_parser/src/js/README.md)\n\n### 2. DOM Parser Python Layer (`cesail/dom_parser/src/py/`)\nOrchestration layer that manages browser interactions and provides high-level APIs.\n\n**Language**: Python  \n**Features**:\n- **Page Analysis**: Comprehensive page structure analysis and element extraction\n- **Action Execution**: Executes clicks, typing, navigation, and other web actions\n- **Idle Watching**: Monitors page state changes and waits for stability\n- **Screenshot Integration**: Captures and analyzes visual page content\n- **Configuration Management**: Flexible configuration for different use cases\n- **Session Management**: Maintains browser state across interactions\n- **Error Handling**: Robust error recovery and retry logic\n\n**Key Components**:\n- `dom_parser.py`: Main interface for DOM parsing and interaction\n- `page_analyzer.py`: Analyzes page structure and extracts actionable elements\n- `action_executor.py`: Executes web actions through Playwright\n- `idle_watcher.py`: Monitors page state and waits for stability\n- `screenshot.py`: Captures and processes page screenshots\n- `types.py`: Data structures and type definitions\n- `config.py`: Configuration management and validation\n- 
`actions_plugins/`: Modular action implementations (navigation, interaction, input, system)\n\n**Integration Example**:\n```python\nasync with DOMParser() as parser:\n    # Navigate to page\n    await parser.navigate(\"https://example.com\")\n\n    # Analyze page structure\n    parsed_page = await parser.analyze_page()\n\n    # Execute actions\n    await parser.click(\"button.btn-primary\")\n    await parser.type(\"input#email\", \"user@example.com\")\n```\n\n**Documentation**: See [cesail/dom_parser/src/py/README.md](cesail/dom_parser/src/py/README.md)\n\n### 3. MCP Server (`cesail/cesail_mcp/`)\nFastMCP server that provides standardized APIs for agents to interact with transformed web data.\n\n**Language**: Python  \n**Features**:\n- **Structured APIs**: Clean, predictable endpoints for web automation\n- **Action Execution**: Execute clicks, typing, navigation based on transformed data\n- **Page Analysis**: Get structured page information in agent-friendly format\n- **Screenshot Integration**: Visual context combined with structured data\n- **Session Management**: Maintain state across interactions\n- **Error Handling**: Robust retry logic and error recovery\n\n**Agent-Friendly API Example**:\n```python\n# Agent receives structured data from CeSail\nparsed_page = await parser.analyze_page()\n\n# Get the actions data (this is what agents typically work with)\nactions = parsed_page.get_actions()\n\n# Example actions data structure\nactions_data = [\n  {\n    \"type\": \"LINK\",\n    \"selector\": \"2\",\n    \"importantText\": \"Vintage vibesCreate your weekend moodboard | Vinta | /today/best/create-your-weekend-moodboard/128099/\"\n  },\n  {\n    \"type\": \"LINK\",\n    \"selector\": \"3\",\n    \"importantText\": \"Summer hobbiesTry bead embroidery | Summer hobbies | /today/best/try-bead-embroidery/128240/\"\n  },\n  {\n    \"type\": \"SELECT\",\n    \"selector\": \"5\",\n    \"importantText\": \"search-box-input | combobox | Search | Search\"\n  },\n  {\n    \"type\": \"BUTTON\",\n    \"selector\": \"8\",\n    \"importantText\": \"vertical-nav-more-options-button | More options | More options\"\n  },\n  {\n    \"type\": \"BUTTON\",\n    \"selector\": \"10\",\n    \"importantText\": \"Sign up\"\n  }\n]\n```\n\n**Documentation**: See [cesail/dom_parser/src/py/README.md](cesail/dom_parser/src/py/README.md) for more details about the parsed page data structure.\n\n**Usage**: `python3 -m cesail.cesail_mcp.fastmcp_server`\n\n
### 4. Simple Agent (`cesail/simple_agent/`) - WIP\nAI-powered web automation agent that uses an LLM for task breakdown and execution.\n\n**Language**: Python  \n**Features**:\n- Natural language task processing\n- Automated task breakdown and planning\n- LLM-powered decision making\n- Visual analysis with screenshots\n- Interactive execution monitoring\n\n**Documentation**: See [cesail/simple_agent/README.md](cesail/simple_agent/README.md) for more details.\n\n**Usage**: `python3 -m cesail.simple_agent.simple_agent`\n\n## Testing\n\nCeSail includes comprehensive test suites to validate functionality and demonstrate capabilities.\n\n### Test Categories\n\n- **Playground Tests** - Integration tests with real websites (Google, Amazon, YouTube, Pinterest, etc.)\n- **Unit Tests** - Individual component testing\n- **Replay Tests** - Regression testing with golden values\n\n### Quick Start\n\n```bash\n# Activate virtual environment\nsource venv/bin/activate\n\n# Add the repository root to PYTHONPATH (run this from the repository root)\nexport PYTHONPATH=\"$(pwd):$PYTHONPATH\"\n\n# Run playground tests (great way to see CeSail in action!)\npytest cesail/dom_parser/tests/playground/test_page_analyzer_integration_pinterest.py -v -s\n\n# Run all tests\npytest cesail/dom_parser/tests/ -v\n```\n\n### Playground Tests\n\nThe playground tests are an excellent way to see CeSail navigate through real websites:\n\n- **Google Search**: Navigate and search functionality\n- **Amazon**: Product browsing and search\n- **YouTube**: Video navigation and 
interaction\n- **Pinterest**: Image browsing and pinning\n- **Airbnb**: Property search and filtering\n- **Google Flights**: Flight search and booking flow\n\nThese tests demonstrate CeSail's ability to:\n- Extract interactive elements from complex websites\n- Navigate through multi-step workflows\n- Handle dynamic content and AJAX loading\n- Generate screenshots with bounding boxes\n- Process structured data for AI agents\n\n**Documentation**: See [cesail/dom_parser/tests/README.md](cesail/dom_parser/tests/README.md) for the complete testing guide and examples.\n\n## Development Installation\n\nFor development or advanced usage:\n\n**Prerequisites**:\n- **Python**: 3.9 or higher\n- **Node.js**: 14 or higher (for DOM Parser development)\n- **OpenAI API Key**: Required for Simple Agent\n- **Git**: For cloning the repository\n\n**Installation**:\n\n1. **Clone the repository**:\n   ```bash\n   git clone https://github.com/AkilaJay/cesail.git\n   cd cesail\n   ```\n\n2. **Set up Python environment**:\n   ```bash\n   python3 -m venv venv\n   source venv/bin/activate  # On Windows: venv\\Scripts\\activate\n   pip install -e .\n   ```\n\n3. **Set up DOM Parser** (optional):\n   ```bash\n   cd cesail/dom_parser\n   npm install\n   npm run build\n   cd ../..  # Return to the repository root\n   ```\n\n4. **Configure environment** (for Simple Agent):\n   ```bash\n   # Create .env file in cesail/simple_agent/ directory (run from the repository root)\n   echo \"OPENAI_API_KEY=your_openai_api_key_here\" > cesail/simple_agent/.env\n   ```\n\n5. **Playwright browsers are installed automatically** during package installation.\n   If you encounter any issues, you can manually install them:\n   ```bash\n   playwright install\n   ```\n\n## Troubleshooting\n\n### Common Issues\n\n#### 1. Import Errors\n**Problem**: `ModuleNotFoundError: No module named 'dom_parser'`\n**Solution**: Ensure you're in the correct directory and that the virtual environment is activated\n\n#### 2. 
Playwright Browser Issues\n**Problem**: Browser not found or crashes\n**Solution**: Reinstall Playwright browsers:\n```bash\nplaywright install\n```\n\n#### 3. OpenAI API Errors\n**Problem**: API key invalid or rate limited\n**Solution**: Check your API key and usage limits in the OpenAI dashboard\n\n#### 4. Screenshot Failures\n**Problem**: Screenshots fail with \"Target page closed\" error\n**Solution**: Add proper error handling and retry logic\n\n## API Reference\n\nFor detailed API documentation, see the component-specific README files:\n\n### DOM Parser APIs\n- **Python Layer**: [cesail/dom_parser/src/py/README.md](cesail/dom_parser/src/py/README.md) - Complete Python API reference including DOMParser, PageAnalyzer, ActionExecutor, and more\n- **JavaScript Layer**: [cesail/dom_parser/src/js/README.md](cesail/dom_parser/src/js/README.md) - JavaScript DOM parsing APIs and element extraction functions\n\n### MCP Server API\n- **FastMCP Integration**: See the MCP server documentation for standardized web automation APIs\n- **Documentation**: See [cesail_mcp/README.md](cesail_mcp/README.md) for complete API reference and usage examples\n\n### Simple Agent API (WIP)\n- **Natural Language Processing**: Process user input and execute web automation tasks\n- **LLM Integration**: AI-powered task breakdown and execution\n- **Documentation**: See [cesail/simple_agent/README.md](cesail/simple_agent/README.md) for current API details\n\n## Contributing\n\nWe welcome contributions! Here's how to get started:\n\n### Development Setup\n\n1. **Fork the repository**\n2. **Create a feature branch**:\n   ```bash\n   git checkout -b feature/your-feature-name\n   ```\n3. **Make your changes**\n4. **Add tests** for new functionality\n5. **Run tests** to ensure everything works\n6. 
**Submit a pull request**\n\n### Code Style\n\n- **Python**: Follow PEP 8, use Black for formatting\n- **JavaScript**: Follow ESLint rules, use Prettier for formatting\n- **Documentation**: Update README files for new features\n\n### Testing\n\n- Write unit tests for new functions\n- Add integration tests for new features\n- Ensure all existing tests pass\n\n## Project Structure\n\n```\ncesail/\n\u251c\u2500\u2500 cesail/                     # Python package\n\u2502   \u251c\u2500\u2500 dom_parser/             # JavaScript DOM parser\n\u2502   \u2502   \u251c\u2500\u2500 src/                # Source code\n\u2502   \u2502   \u251c\u2500\u2500 dist/               # Built files\n\u2502   \u2502   \u251c\u2500\u2500 tests/              # JavaScript tests\n\u2502   \u2502   \u2514\u2500\u2500 README.md           # Component documentation\n\u2502   \u251c\u2500\u2500 cesail_mcp/             # FastMCP server\n\u2502   \u2502   \u251c\u2500\u2500 fastmcp_server.py   # Main server file\n\u2502   \u2502   \u251c\u2500\u2500 server.py           # Alternative server\n\u2502   \u2502   \u2514\u2500\u2500 tests/              # MCP tests\n\u2502   \u2514\u2500\u2500 simple_agent/           # AI web automation agent\n\u2502       \u251c\u2500\u2500 simple_agent.py     # Main agent file\n\u2502       \u251c\u2500\u2500 llm_interface.py    # LLM integration\n\u2502       \u2514\u2500\u2500 .env                # Environment variables\n\u251c\u2500\u2500 venv/                       # Python virtual environment\n\u251c\u2500\u2500 setup.py                    # Python package configuration\n\u251c\u2500\u2500 pyproject.toml              # Project configuration\n\u2514\u2500\u2500 README.md                   # This file\n```\n\n## Support\n\n- **Issues**: Report bugs and feature requests on GitHub\n- **Discussions**: Join discussions for questions and ideas\n- **Documentation**: Check component-specific README files for detailed docs\n\n## Roadmap\n\n- [ ] Enhanced simple agent\n- [ ] Plugin framework for actions\n- [ ] More native actions / Parser enhancements\n- [ ] Replay framework\n\n## Help 
needed / Bugs\n\n- [ ] Idle watcher doesn't always wait for the page to load.\n      Need to fix.\n- [ ] Simple agent enhancements\n- [ ] Parser enhancements\n- [ ] Testing\n\n## Contact\n\nFor questions, issues, or contributions:\n\n- **Email**: ajjayawardane@gmail.com\n- **GitHub**: [@AkilaJay](https://github.com/AkilaJay)\n- **Issues**: [GitHub Issues](https://github.com/AkilaJay/cesail/issues)\n\n## License\n\nThis project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.\n",
    "bugtrack_url": null,
    "license": "MIT",
    "summary": "A comprehensive web automation and DOM parsing platform with AI-powered agents",
    "version": "0.2.1",
    "project_urls": {
        "Changelog": "https://github.com/AkilaJay/CeSail/blob/main/CHANGELOG.md",
        "Documentation": "https://github.com/AkilaJay/CeSail#readme",
        "Homepage": "https://github.com/AkilaJay/CeSail",
        "Issues": "https://github.com/AkilaJay/CeSail/issues",
        "Repository": "https://github.com/AkilaJay/CeSail"
    },
    "split_keywords": [
        "web-automation",
        " dom-parser",
        " ai",
        " playwright",
        " mcp",
        " web-scraping"
    ],
    "urls": [
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "96e2a9c880ccad5a095f7fc6eb26437a1f52ec2efe28dac1c6b6eab31a1235c6",
                "md5": "f404992f9f360061adc5752b3b0c5af3",
                "sha256": "5270bc4fdda2feb900d606f6301cedcb0729ac761a18dab14943eb3533a44db2"
            },
            "downloads": -1,
            "filename": "cesail-0.2.1-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "f404992f9f360061adc5752b3b0c5af3",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.9",
            "size": 1190495,
            "upload_time": "2025-08-23T04:47:28",
            "upload_time_iso_8601": "2025-08-23T04:47:28.280382Z",
            "url": "https://files.pythonhosted.org/packages/96/e2/a9c880ccad5a095f7fc6eb26437a1f52ec2efe28dac1c6b6eab31a1235c6/cesail-0.2.1-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2025-08-23 04:47:28",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "AkilaJay",
    "github_project": "cesail",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": false,
    "lcname": "cesail"
}

Rachita Pradeep