# Computer MCP

- **Package**: `computer-mcp`
- **Version**: 0.0.5 (uploaded 2025-11-03 02:26:58)
- **Summary**: Cross-platform MCP server for computer automation and control
- **Requires Python**: >=3.12
- **License**: MIT
- **Keywords**: mcp, automation, mouse, keyboard, screenshot, accessibility, control

A cross-platform computer automation and control library supporting multiple interfaces:
- **MCP Server** (stdio + HTTP/SSE modes)
- **HTTP REST API** (with OpenAPI spec)
- **CLI** (command execution and server management)
- **Programmatic Module** (stateless Python functions)

Provides tools for mouse/keyboard automation, screenshot capture, window management, virtual desktops, and comprehensive state tracking including accessibility tree support.

## Features

- **Mouse Control**: Click, double-click, triple-click, button down/up, drag operations
- **Keyboard Control**: Type text, key down/up/press
- **Screenshot Capture**: Fast cross-platform screenshot using `mss`, returns images as base64 or PNG
- **Window Management**: List, switch, move, resize, minimize, maximize, snap windows
- **Virtual Desktops**: List, switch, and move windows between virtual desktops
- **State Tracking**: Configurable tracking of mouse position/buttons, keyboard keys, focused app, and accessibility tree
- **Accessibility Tree**: Full platform-specific implementation for Windows, macOS, and Linux/Ubuntu

## Installation

```bash
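# Install the published release from PyPI
pip install computer-mcp

# Or, from a local clone, install the core dependencies in editable mode
pip install -e .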

# Optional: Install API/HTTP dependencies
pip install -e ".[api]"    # For HTTP REST API server
pip install -e ".[http]"   # For MCP HTTP/SSE mode
pip install -e ".[dev]"    # All optional dependencies

# Platform-specific optional dependencies (for enhanced features)
pip install -e ".[windows]"   # Windows: pywin32 for window management, focused app, and accessibility tree
pip install -e ".[macos]"      # macOS: pyobjc for native accessibility (AppleScript fallback available)
pip install -e ".[linux]"      # Linux: PyGObject for AT-SPI (requires: sudo apt install python3-gi gir1.2-atspi-2.0)
```

## Usage

### 1. As MCP Server (stdio mode)

The default mode for MCP clients like Cursor or Claude Desktop.

**Configuration** (e.g., `~/.cursor/mcp.json`):

```json
{
  "mcpServers": {
    "computer-mcp": {
      "command": "uv",
      "args": [
        "--directory",
        "C:\\Users\\Jacob\\Code\\computer-mcp",
        "run",
        "computer-mcp"
      ]
    }
  }
}
```

Replace the `--directory` value with the path to your own local clone. Alternatively, use `uvx`:

```json
{
  "mcpServers": {
    "computer-mcp": {
      "command": "uvx",
      "args": ["computer-mcp"]
    }
  }
}
```

**Note**: `uvx` automatically installs and runs the package if not already installed. Make sure you have [uv](https://github.com/astral-sh/uv) installed.
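
If you would rather generate that configuration than edit it by hand, the following is a minimal sketch. It merges the `uvx` entry shown above into an existing `mcpServers` mapping. The helper names `add_computer_mcp` and `write_config` are illustrative assumptions, not part of the package.

```python
import json
from pathlib import Path


def add_computer_mcp(config: dict) -> dict:
    """Return a copy of an MCP client config with the uvx entry added."""
    merged = dict(config)
    servers = dict(merged.get("mcpServers", {}))
    # Same entry as the uvx example above.
    servers["computer-mcp"] = {"command": "uvx", "args": ["computer-mcp"]}
    merged["mcpServers"] = servers
    return merged


def write_config(path: Path) -> dict:
    """Merge the entry into the JSON file at `path`, creating it if missing."""
    existing = json.loads(path.read_text()) if path.exists() else {}
    updated = add_computer_mcp(existing)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(updated, indent=2))
    return updated
```

Calling `write_config(Path.home() / ".cursor" / "mcp.json")` would target the Cursor path mentioned above; back up the file first, since the sketch rewrites it.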

### 2. As MCP Server (HTTP/SSE mode)

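For access over HTTP/SSE instead of stdio (the example binds to the loopback address, so the server is reachable only from the local machine):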

```bash
python -m computer_mcp serve mcp --http --host 127.0.0.1 --port 8000
```

This starts the MCP server with:
- SSE endpoint: `http://127.0.0.1:8000/sse`
- Tool call endpoint: `http://127.0.0.1:8000/mcp`

### 3. As HTTP REST API Server

Start the FastAPI server with automatic OpenAPI documentation:

```bash
python -m computer_mcp serve api --host 127.0.0.1 --port 8000
```

Or using the installed `computer-mcp` entry point:

```bash
computer-mcp serve api --port 8000
```

Then access:
- **API Docs**: http://127.0.0.1:8000/docs
- **ReDoc**: http://127.0.0.1:8000/redoc
- **OpenAPI JSON**: http://127.0.0.1:8000/openapi.json

**Example API calls**:

```bash
# Click mouse
curl -X POST http://localhost:8000/mouse/click -H "Content-Type: application/json" -d '{"button": "left"}'

# Type text
curl -X POST http://localhost:8000/keyboard/type -H "Content-Type: application/json" -d '{"text": "Hello World"}'

# Get screenshot as PNG
curl http://localhost:8000/screenshot/image -o screenshot.png

# List windows
curl http://localhost:8000/windows

# Switch to window
curl -X POST http://localhost:8000/windows/switch -H "Content-Type: application/json" -d '{"hwnd": 123456}'
```
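
The same calls can be made from Python with only the standard library. This is a sketch under the assumption that the server from the example above is listening on `localhost:8000`; `build_post` and `send` are hypothetical helper names, and nothing is sent until `send` is called.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # matches the curl examples above


def build_post(path: str, payload: dict) -> urllib.request.Request:
    """Build (but do not send) a JSON POST request for the REST API."""
    return urllib.request.Request(
        BASE_URL + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request: urllib.request.Request, timeout: float = 10.0) -> dict:
    """Send a request and decode the JSON body; requires a running server."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Equivalent to the first two curl commands above.
click_request = build_post("/mouse/click", {"button": "left"})
type_request = build_post("/keyboard/type", {"text": "Hello World"})
```

With the API server running, `send(click_request)` performs the click and returns the decoded JSON response.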

### 4. As CLI Tool

Execute commands directly from the command line:

```bash
# Mouse commands
computer-mcp mouse click --button right
computer-mcp mouse double-click
computer-mcp mouse move --x 500 --y 300

# Keyboard commands
computer-mcp keyboard type "Hello World"
computer-mcp keyboard key-press ctrl

# Window commands
computer-mcp window list
computer-mcp window switch --hwnd 123456
computer-mcp window snap-left --hwnd 123456
computer-mcp window close --hwnd 123456

# Screenshot
computer-mcp screenshot --save screenshot.png

# Start servers
computer-mcp serve api --port 8000
computer-mcp serve mcp --http --port 8001

# JSON output
computer-mcp mouse click --json
```
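
To drive the CLI from a script, one option is a thin `subprocess` wrapper. The sketch below assumes that `computer-mcp` is on `PATH` and that `--json` prints a single JSON document to stdout; `cli_command` and `run_cli` are hypothetical helper names.

```python
import json
import subprocess


def cli_command(*parts: str, as_json: bool = False) -> list[str]:
    """Build an argv list for the computer-mcp CLI."""
    argv = ["computer-mcp", *parts]
    if as_json:
        argv.append("--json")
    return argv


def run_cli(*parts: str) -> dict:
    """Run a command with --json and parse stdout; needs the CLI on PATH."""
    completed = subprocess.run(
        cli_command(*parts, as_json=True),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit code
    )
    return json.loads(completed.stdout)
```

For example, `run_cli("mouse", "click", "--button", "right")` would run the first mouse command above with JSON output.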

### 5. As Python Module

Import and use stateless functions directly in your code:

```python
from computer_mcp import (
    click, double_click, move_mouse, drag,
    type_text, key_press, key_down, key_up,
    get_screenshot,
    list_windows, switch_to_window, close_window,
    snap_window_left, snap_window_right,
)

# Mouse operations
click("left")
double_click("right")
move_mouse(500, 300)
drag({"x": 100, "y": 200}, {"x": 300, "y": 400})

# Keyboard operations
type_text("Hello World")
key_press("ctrl")
key_down("shift")
key_up("shift")

# Screenshot
screenshot_data = get_screenshot()
print(f"Screenshot: {screenshot_data['width']}x{screenshot_data['height']}")

# Window management
windows = list_windows()
for window in windows.get("windows", []):
    print(f"{window['title']} (hwnd: {window['hwnd']})")

# Switch to a window by title
switch_to_window(title="Notepad")

# Snap window to left half
snap_window_left(hwnd=123456)
```
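
Hard-coded handles such as `123456` are placeholders; in practice the handle usually comes from `list_windows()`. The sketch below looks one up by title fragment, assuming the result shape used in the example above (a `"windows"` list of dictionaries with `"title"` and `"hwnd"` keys). `find_hwnds` is a hypothetical helper, and the sample data stands in for a real result.

```python
def find_hwnds(windows_result: dict, title_part: str) -> list[int]:
    """Return handles of windows whose title contains `title_part` (case-insensitive)."""
    needle = title_part.lower()
    return [
        window["hwnd"]
        for window in windows_result.get("windows", [])
        if needle in window.get("title", "").lower()
    ]


# Sample data shaped like the list_windows() result used above.
sample = {
    "windows": [
        {"title": "Untitled - Notepad", "hwnd": 123456},
        {"title": "main.py - computer-mcp", "hwnd": 654321},
    ]
}
matches = find_hwnds(sample, "notepad")
```

A handle from `matches` can then be passed to calls such as `snap_window_left(hwnd=...)`.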

## Available Tools/Endpoints

### Mouse Operations

- `click(button='left'|'middle'|'right')` - Click at current cursor position
- `double_click(button='left'|'middle'|'right')` - Double-click at current cursor position
- `triple_click(button='left'|'middle'|'right')` - Triple-click at current cursor position
- `button_down(button='left'|'middle'|'right')` - Press and hold a mouse button
- `button_up(button='left'|'middle'|'right')` - Release a mouse button
- `drag(start={x, y}, end={x, y}, button='left')` - Drag from start to end position
- `mouse_move(x, y)` - Move cursor to specified coordinates (the Python module example above imports this as `move_mouse`)

**REST API**: `POST /mouse/click`, `POST /mouse/drag`, `POST /mouse/move`, etc.
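
Because the click tools act at the current cursor position, clicking at a coordinate is a move followed by a click. The sketch below composes the two; it takes the move and click functions as parameters so it can be exercised without touching the real mouse. `click_at` is a hypothetical helper, not part of the package.

```python
from typing import Callable


def click_at(
    x: int,
    y: int,
    move: Callable[[int, int], object],
    click: Callable[[str], object],
    button: str = "left",
) -> object:
    """Move the cursor to (x, y), then click there."""
    if button not in ("left", "middle", "right"):
        raise ValueError(f"unsupported button: {button!r}")
    move(x, y)
    return click(button)


# Stand-ins that record calls instead of driving the real mouse.
calls: list[tuple] = []
result = click_at(
    500,
    300,
    move=lambda x, y: calls.append(("move", x, y)),
    click=lambda button: calls.append(("click", button)) or {"success": True},
)
```

With the package installed, `click_at(500, 300, move=move_mouse, click=click)` would use the functions from the Python module example, assuming they accept the positional arguments shown there.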

### Keyboard Operations

- `type(text)` - Type a text string (the Python module example above imports this as `type_text`)
- `key_down(key)` - Press and hold a key
- `key_up(key)` - Release a key
- `key_press(key)` - Press and release a key (convenience)

**REST API**: `POST /keyboard/type`, `POST /keyboard/key-press`, etc.
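
Key combinations such as Ctrl+Shift+A can be built from these primitives: hold the modifiers, press the final key, then release the modifiers in reverse order. The sketch below only computes that sequence. `chord_events` is a hypothetical helper, and the event names simply mirror the operations listed above.

```python
def chord_events(*keys: str) -> list[tuple[str, str]]:
    """Return (operation, key) pairs for a chord such as ctrl+shift+a."""
    if not keys:
        raise ValueError("at least one key is required")
    *modifiers, final = keys
    events = [("key_down", key) for key in modifiers]
    events.append(("key_press", final))
    # Release modifiers in reverse order of pressing.
    events.extend(("key_up", key) for key in reversed(modifiers))
    return events


events = chord_events("ctrl", "shift", "a")
```

Each pair can then be dispatched to the matching function, for example `key_down("ctrl")`. Releasing in a loop that also runs on errors avoids leaving a modifier stuck down.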

### Screenshot

- `screenshot()` / `get_screenshot()` - Capture screenshot (included by default in MCP responses)

**REST API**: 
- `GET /screenshot` - Returns JSON with base64 data
- `GET /screenshot/image` - Returns PNG image
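
To turn the base64 form back into image bytes, decode it and check the PNG signature before writing to disk. This is a sketch only: the README does not name the JSON field that holds the base64 string, so the `field` default of `"data"` is an assumption to verify against a real response, and the payload below is a stand-in rather than a real screenshot.

```python
import base64

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"  # first eight bytes of every PNG file


def decode_png(payload: dict, field: str = "data") -> bytes:
    """Decode a base64 PNG from a JSON payload and check the PNG signature.

    `field` is an assumed key name; inspect the real response to confirm it.
    """
    raw = base64.b64decode(payload[field])
    if not raw.startswith(PNG_SIGNATURE):
        raise ValueError("decoded bytes do not look like a PNG")
    return raw


# Stand-in payload: only a PNG signature plus filler, not a full image.
fake_payload = {"data": base64.b64encode(PNG_SIGNATURE + b"stub").decode("ascii")}
png_bytes = decode_png(fake_payload)
```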

### Window Management

- `list_windows()` - List all visible windows
- `switch_to_window(hwnd=<int>|title=<str>)` - Switch focus to a window
- `move_window(hwnd, x, y, width?, height?)` - Move and/or resize window
- `resize_window(hwnd, width, height)` - Resize window
- `minimize_window(hwnd)` - Minimize window
- `maximize_window(hwnd)` - Maximize window
- `restore_window(hwnd)` - Restore window
- `set_window_topmost(hwnd, topmost=true)` - Set window always-on-top
- `get_window_info(hwnd)` - Get detailed window information
- `close_window(hwnd)` - Close window
- `snap_window_left(hwnd)` - Snap to left half
- `snap_window_right(hwnd)` - Snap to right half
- `snap_window_top(hwnd)` - Snap to top half
- `snap_window_bottom(hwnd)` - Snap to bottom half
- `screenshot_window(hwnd)` - Capture screenshot of specific window

**REST API**: 
- `GET /windows` - List windows
- `POST /windows/switch` - Switch by handle
- `POST /windows/switch-by-title` - Switch by title
- `GET /windows/{hwnd}` - Get window info
- `DELETE /windows/{hwnd}` - Close window
- `POST /windows/{hwnd}/snap-left` - Snap left, etc.

### Virtual Desktops

- `list_virtual_desktops()` - List all virtual desktops
- `switch_virtual_desktop(desktop_id=<int>|name=<str>)` - Switch to virtual desktop
- `move_window_to_virtual_desktop(hwnd, desktop_id)` - Move window to desktop

**REST API**: 
- `GET /virtual-desktops` - List desktops
- `POST /virtual-desktops/switch` - Switch desktop
- `POST /windows/{hwnd}/move-to-desktop` - Move window

### Configuration

- `set_config(...)` - Configure observation options:
  - `observe_screen` (bool, default: `true`): Include screenshots in all responses
  - `observe_mouse_position` (bool, default: `false`): Track and include mouse position
  - `observe_mouse_button_states` (bool, default: `false`): Track and include mouse button states
  - `observe_keyboard_key_states` (bool, default: `false`): Track and include keyboard key states
  - `observe_focused_app` (bool, default: `false`): Include focused application information
  - `observe_accessibility_tree` (bool, default: `false`): Include accessibility tree

**REST API**: `POST /config` - Update configuration
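
A typo in an option name is easy to miss, so it can help to validate a configuration before sending it. The sketch below builds a full option mapping from the defaults listed above. `CONFIG_DEFAULTS` and `build_config` are hypothetical names, and whether `POST /config` expects the full mapping or only the changed keys is an assumption to check against the OpenAPI docs.

```python
# Option names and defaults as listed above.
CONFIG_DEFAULTS = {
    "observe_screen": True,
    "observe_mouse_position": False,
    "observe_mouse_button_states": False,
    "observe_keyboard_key_states": False,
    "observe_focused_app": False,
    "observe_accessibility_tree": False,
}


def build_config(**overrides: bool) -> dict[str, bool]:
    """Return the defaults with overrides applied, rejecting unknown options."""
    unknown = set(overrides) - set(CONFIG_DEFAULTS)
    if unknown:
        raise KeyError(f"unknown option(s): {sorted(unknown)}")
    for name, value in overrides.items():
        if not isinstance(value, bool):
            raise TypeError(f"{name} must be a bool")
    return {**CONFIG_DEFAULTS, **overrides}


payload = build_config(observe_screen=False, observe_mouse_position=True)
```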

## Key Names

Special keys can be specified as strings:
- `"ctrl"`, `"alt"`, `"shift"`, `"cmd"` (or `"win"` on Windows)
- `"space"`, `"enter"`, `"tab"`, `"esc"`, `"backspace"`
- Arrow keys: `"up"`, `"down"`, `"left"`, `"right"`
- Function keys: `"f1"` through `"f12"`
- Regular characters: `"a"`, `"b"`, etc.

## Platform Support

### Windows
- **Mouse/Keyboard**: Full support; all operations work
- **Window Management**: Full support via `pywin32` (included in `[windows]` extras)
- **Virtual Desktops**: Full support via `VirtualDesktopAccessor.dll`
- **Focused App**: Requires `pywin32` (install with `pip install -e ".[windows]"`)
- **Accessibility Tree**: Uses Windows UI Automation API (requires `pywin32`)

### macOS
- **Mouse/Keyboard**: Full support; all operations work
- **Window Management**: Limited support via AppleScript (some operations not yet implemented)
- **Virtual Desktops**: Limited support (Spaces enumeration/switching via Mission Control API)
- **Focused App**: Uses AppleScript (no dependencies)
- **Accessibility Tree**: 
  - Native: Uses AXUIElement via `pyobjc` (install with `pip install -e ".[macos]"`)
  - Fallback: Uses AppleScript (works without dependencies, limited tree depth)

### Linux/Ubuntu
- **Mouse/Keyboard**: Full support; all operations work
- **Window Management**: Full support via `xdotool` (install: `sudo apt install xdotool`)
- **Virtual Desktops**: Full support via `wmctrl` or `xdotool` (install: `sudo apt install wmctrl`)
- **Focused App**: Uses `xdotool` (install: `sudo apt install xdotool`)
- **Accessibility Tree**: 
  - Native: Uses AT-SPI via PyGObject (install: `sudo apt install python3-gi gir1.2-atspi-2.0`, then `pip install -e ".[linux]"`)
  - Fallback: Basic window info via `xdotool`
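
Since several features depend on external tools, a quick preflight check can report what is missing before automation starts. The sketch below uses only the standard library and the tool names from the Linux section above; `platform_family` and `missing_linux_tools` are hypothetical helpers.

```python
import shutil
import sys

# System tools named in the Linux section above.
LINUX_TOOLS = ("xdotool", "wmctrl")


def platform_family(platform: str = sys.platform) -> str:
    """Map a sys.platform value to the section names used above."""
    if platform.startswith("win"):
        return "windows"
    if platform == "darwin":
        return "macos"
    if platform.startswith("linux"):
        return "linux"
    return "other"


def missing_linux_tools() -> list[str]:
    """Return the listed tools that are not found on PATH."""
    return [tool for tool in LINUX_TOOLS if shutil.which(tool) is None]
```

On Linux, a non-empty result from `missing_linux_tools()` indicates which `apt install` commands from the list above still need to be run.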

## Architecture

The codebase is organized into clear layers:

```
computer_mcp/
├── __init__.py          # Module API (stateless functions)
├── __main__.py          # CLI entry point
├── cli.py               # CLI implementation
├── mcp.py               # MCP server (stdio + HTTP/SSE)
├── api.py               # HTTP REST API server
├── actions/             # Business logic (pure functions)
│   ├── mouse.py
│   ├── keyboard.py
│   ├── window.py
│   ├── screenshot.py
│   ├── config.py
│   ├── focused_app.py
│   └── accessibility_tree.py
├── core/                # Core utilities
│   ├── state.py
│   ├── platform.py
│   ├── screenshot.py
│   ├── response.py
│   └── utils.py
└── resources/           # Platform-specific resources
```

**Key Design Principles:**
- **Actions layer**: Pure business logic functions, no interface dependencies
- **Interface adapters**: MCP, API, CLI wrap the actions layer
- **Stateless module API**: Clean functions for direct Python usage
- **State management**: Optional, configurable per interface
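
The layering can be illustrated with a toy example: one pure action function returning a plain dictionary, and thin adapters that render it for each interface. This is a sketch of the pattern only, not the package's actual code; the `"error"` key, the status codes, and all function names are assumptions made for illustration.

```python
import json


def click_action(button: str = "left") -> dict:
    """Actions layer: pure logic returning a plain dict (no real click here)."""
    if button not in ("left", "middle", "right"):
        return {"success": False, "action": "click", "error": f"bad button: {button}"}
    return {"success": True, "action": "click", "button": button}


def cli_adapter(result: dict, as_json: bool = False) -> str:
    """CLI adapter: render the same result as text or JSON."""
    if as_json:
        return json.dumps(result)
    status = "ok" if result["success"] else "error"
    return f"{result['action']}: {status}"


def api_adapter(result: dict) -> tuple[int, dict]:
    """HTTP adapter: pair the result with a status code."""
    return (200 if result["success"] else 400), result
```

Because the adapters only format a dictionary, adding another interface does not require touching the action itself.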

## Response Format

### MCP Server Response

By default (with `observe_screen: true`), every tool response includes a screenshot as MCP `ImageContent`.

**Response Structure:**
- `ImageContent` (type: "image"): Contains the screenshot as base64-encoded PNG with mimeType "image/png"
- `TextContent` (type: "text"): Contains JSON with action results and screenshot metadata:

```json
{
  "success": true,
  "action": "click",
  "button": "left",
  "screenshot": {
    "format": "base64_png",
    "width": 1920,
    "height": 1080
  }
}
```

With every `observe_*` option enabled, the `TextContent` JSON also carries the tracked state:

```json
{
  "success": true,
  "action": "click",
  "button": "left",
  "screenshot": {
    "format": "base64_png",
    "width": 1920,
    "height": 1080
  },
  "mouse_position": {"x": 500, "y": 300},
  "mouse_button_states": ["Button.left"],
  "keyboard_key_states": ["ctrl"],
  "focused_app": {
    "name": "Code",
    "pid": 12345,
    "title": "main.py - computer-mcp"
  },
  "accessibility_tree": {
    "tree": {
      "name": "Application",
      "control_type": "...",
      "bounds": {"x": 0, "y": 0, "width": 1920, "height": 1080},
      "children": [...]
    }
  }
}
```

### HTTP REST API Response

Returns JSON directly:

```json
{
  "success": true,
  "action": "click",
  "button": "left"
}
```

`GET /screenshot` returns the screenshot as a base64-encoded string in JSON; use `GET /screenshot/image` for the raw PNG.

### CLI Output

- Default: Human-readable success/error messages
- With `--json`: JSON output matching the API format

### Module API Response

Returns plain Python dictionaries:

```python
result = click("left")
# result = {"success": True, "action": "click", "button": "left"}
```

## Notes

- Screenshots are **included by default** in MCP tool responses (when `observe_screen: true`)
- Mouse tools operate at the **current cursor position** unless you explicitly move the mouse first
- State tracking listeners are automatically started/stopped based on configuration
- Accessibility tree implementations may vary in depth and detail across platforms
- Some platform-specific features require optional dependencies or system packages
- Window management features vary by platform (per the Platform Support section, Windows and Linux have full support; macOS support is limited)

## License

MIT

            

## Project Links

- **Author**: CommandAGI <hello@commandagi.com>
- **Homepage**: https://commandagi.com
- **Repository**: https://github.com/commandAGI/computer-mcp
- **Documentation**: https://github.com/commandAGI/computer-mcp#readme
- **Issues**: https://github.com/commandAGI/computer-mcp/issues