# Computer MCP
A cross-platform computer automation and control library supporting multiple interfaces:
- **MCP Server** (stdio + HTTP/SSE modes)
- **HTTP REST API** (with OpenAPI spec)
- **CLI** (command execution and server management)
- **Programmatic Module** (stateless Python functions)
It provides tools for mouse/keyboard automation, screenshot capture, window management, virtual desktops, and configurable state tracking, including accessibility tree support.
## Features
- **Mouse Control**: Click, double-click, triple-click, button down/up, drag operations
- **Keyboard Control**: Type text, key down/up/press
- **Screenshot Capture**: Fast cross-platform screenshot using `mss`, returns images as base64 or PNG
- **Window Management**: List, switch, move, resize, minimize, maximize, snap windows
- **Virtual Desktops**: List, switch, and move windows between virtual desktops
- **State Tracking**: Configurable tracking of mouse position/buttons, keyboard keys, focused app, and accessibility tree
- **Accessibility Tree**: Full platform-specific implementation for Windows, macOS, and Linux/Ubuntu
## Installation
```bash
# Install core dependencies
pip install -e .
# Optional: Install API/HTTP dependencies
pip install -e ".[api]" # For HTTP REST API server
pip install -e ".[http]" # For MCP HTTP/SSE mode
pip install -e ".[dev]" # All optional dependencies
# Platform-specific optional dependencies (for enhanced features)
pip install -e ".[windows]" # Windows: pywin32 for window management, focused app, and accessibility tree
pip install -e ".[macos]" # macOS: pyobjc for native accessibility (AppleScript fallback available)
pip install -e ".[linux]" # Linux: PyGObject for AT-SPI (requires: sudo apt install python3-gi gir1.2-atspi-2.0)
```
## Usage
### 1. As MCP Server (stdio mode)
The default mode for MCP clients like Cursor or Claude Desktop.
**Configuration** (e.g., `~/.cursor/mcp.json`):
```json
{
  "mcpServers": {
    "computer-mcp": {
      "command": "uv",
      "args": [
        "--directory",
        "C:\\Users\\Jacob\\Code\\computer-mcp",
        "run",
        "computer-mcp"
      ]
    }
  }
}
```
Or using `uvx`:
```json
{
  "mcpServers": {
    "computer-mcp": {
      "command": "uvx",
      "args": ["computer-mcp"]
    }
  }
}
```
**Note**: `uvx` automatically installs and runs the package if not already installed. Make sure you have [uv](https://github.com/astral-sh/uv) installed.
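If you would rather generate this file than edit it by hand, the same structure can be produced from Python. This is a minimal sketch using only the standard library; `build_mcp_config` is a hypothetical helper written for this example and is not part of `computer-mcp`.

```python
import json


def build_mcp_config(command, args, server_name="computer-mcp"):
    """Return an MCP client config dict with a single server entry.

    Hypothetical helper: it only mirrors the JSON layout shown above.
    """
    return {"mcpServers": {server_name: {"command": command, "args": list(args)}}}


if __name__ == "__main__":
    # Same content as the `uvx` configuration above.
    print(json.dumps(build_mcp_config("uvx", ["computer-mcp"]), indent=2))
```

Write the printed JSON to the path your MCP client expects (for example `~/.cursor/mcp.json`).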
### 2. As MCP Server (HTTP/SSE mode)
For access over HTTP/SSE (the example binds to `127.0.0.1`, so it is reachable only from the local machine):
```bash
python -m computer_mcp serve mcp --http --host 127.0.0.1 --port 8000
```
This starts the MCP server with:
- SSE endpoint: `http://127.0.0.1:8000/sse`
- Tool call endpoint: `http://127.0.0.1:8000/mcp`
### 3. As HTTP REST API Server
Start the FastAPI server with automatic OpenAPI documentation:
```bash
python -m computer_mcp serve api --host 127.0.0.1 --port 8000
```
Or using the CLI:
```bash
computer-mcp serve api --port 8000
```
Then access:
- **API Docs**: http://127.0.0.1:8000/docs
- **ReDoc**: http://127.0.0.1:8000/redoc
- **OpenAPI JSON**: http://127.0.0.1:8000/openapi.json
**Example API calls**:
```bash
# Click mouse
curl -X POST http://localhost:8000/mouse/click -H "Content-Type: application/json" -d '{"button": "left"}'
# Type text
curl -X POST http://localhost:8000/keyboard/type -H "Content-Type: application/json" -d '{"text": "Hello World"}'
# Get screenshot as PNG
curl http://localhost:8000/screenshot/image -o screenshot.png
# List windows
curl http://localhost:8000/windows
# Switch to window
curl -X POST http://localhost:8000/windows/switch -H "Content-Type: application/json" -d '{"hwnd": 123456}'
```
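The same calls can be made from Python without extra dependencies. The sketch below builds the requests with `urllib` and keeps sending separate, so nothing is clicked or typed until you call `send`. The helper names `build_post` and `send` are assumptions made for this example; the paths and payloads are the ones from the curl commands above.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # same host and port as the curl examples


def build_post(path, payload, base_url=BASE_URL):
    """Build (but do not send) a JSON POST request for the REST API."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        base_url + path,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request, timeout=5.0):
    """Send a prepared request and decode the JSON reply."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


click_request = build_post("/mouse/click", {"button": "left"})
type_request = build_post("/keyboard/type", {"text": "Hello World"})
# send(click_request) would perform the click; it needs a running API server.
```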
### 4. As CLI Tool
Execute commands directly from the command line:
```bash
# Mouse commands
computer-mcp mouse click --button right
computer-mcp mouse double-click
computer-mcp mouse move --x 500 --y 300
# Keyboard commands
computer-mcp keyboard type "Hello World"
computer-mcp keyboard key-press ctrl
# Window commands
computer-mcp window list
computer-mcp window switch --hwnd 123456
computer-mcp window snap-left --hwnd 123456
computer-mcp window close --hwnd 123456
# Screenshot
computer-mcp screenshot --save screenshot.png
# Start servers
computer-mcp serve api --port 8000
computer-mcp serve mcp --http --port 8001
# JSON output
computer-mcp mouse click --json
```
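When driving the CLI from a script, it can help to build the argument list in one place. This is a rough sketch, assuming the flag spellings shown above (`--button`, `--hwnd`, `--json`); `build_cli_args` and `run_cli` are hypothetical helpers, not part of the package.

```python
import shutil
import subprocess


def build_cli_args(*command, **options):
    """Turn ("mouse", "click"), button="right" into a computer-mcp argv list."""
    args = ["computer-mcp", *command]
    for name, value in options.items():
        flag = "--" + name.replace("_", "-")
        if value is True:
            args.append(flag)  # boolean switch such as --json
        else:
            args.extend([flag, str(value)])
    return args


def run_cli(*command, **options):
    """Run the CLI if it is installed and return the completed process."""
    if shutil.which("computer-mcp") is None:
        raise FileNotFoundError("computer-mcp is not on PATH")
    return subprocess.run(
        build_cli_args(*command, **options),
        capture_output=True,
        text=True,
        check=False,
    )
```

For example, `build_cli_args("mouse", "click", button="right", json=True)` yields the argument list for `computer-mcp mouse click --button right --json`.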
### 5. As Python Module
Import and use stateless functions directly in your code:
```python
from computer_mcp import (
    click, double_click, move_mouse, drag,
    type_text, key_press, key_down, key_up,
    get_screenshot,
    list_windows, switch_to_window, close_window,
    snap_window_left, snap_window_right,
)
# Mouse operations
click("left")
double_click("right")
move_mouse(500, 300)
drag({"x": 100, "y": 200}, {"x": 300, "y": 400})
# Keyboard operations
type_text("Hello World")
key_press("ctrl")
key_down("shift")
key_up("shift")
# Screenshot
screenshot_data = get_screenshot()
print(f"Screenshot: {screenshot_data['width']}x{screenshot_data['height']}")
# Window management
windows = list_windows()
for window in windows.get("windows", []):
    print(f"{window['title']} (hwnd: {window['hwnd']})")
# Switch to a window by title
switch_to_window(title="Notepad")
# Snap window to left half
snap_window_left(hwnd=123456)
```
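Hard-coded handles such as `123456` are placeholders; in practice you usually look the handle up first. The sketch below filters a `list_windows()`-style result by title. It assumes only the result shape used in the loop above (a `"windows"` list of dicts with `title` and `hwnd`); `find_windows` is a hypothetical helper, and the sample data stands in for a real call.

```python
def find_windows(result, title_contains):
    """Return window dicts whose title contains the text (case-insensitive)."""
    needle = title_contains.casefold()
    return [
        window
        for window in result.get("windows", [])
        if needle in str(window.get("title", "")).casefold()
    ]


# Sample data in the shape printed by the loop above; not real output.
sample = {
    "windows": [
        {"title": "Untitled - Notepad", "hwnd": 123456},
        {"title": "main.py - computer-mcp", "hwnd": 654321},
    ]
}
matches = find_windows(sample, "notepad")
```

Each match carries an `hwnd` that can then be passed to functions such as `snap_window_left(hwnd=...)`.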
## Available Tools/Endpoints
### Mouse Operations
- `click(button='left'|'middle'|'right')` - Click at current cursor position
- `double_click(button='left'|'middle'|'right')` - Double-click at current cursor position
- `triple_click(button='left'|'middle'|'right')` - Triple-click at current cursor position
- `button_down(button='left'|'middle'|'right')` - Press and hold a mouse button
- `button_up(button='left'|'middle'|'right')` - Release a mouse button
- `drag(start={x, y}, end={x, y}, button='left')` - Drag from start to end position
- `mouse_move(x, y)` - Move cursor to specified coordinates
**REST API**: `POST /mouse/click`, `POST /mouse/drag`, `POST /mouse/move`, etc.
### Keyboard Operations
- `type(text)` - Type text string
- `key_down(key)` - Press and hold a key
- `key_up(key)` - Release a key
- `key_press(key)` - Press and release a key (convenience)
**REST API**: `POST /keyboard/type`, `POST /keyboard/key-press`, etc.
### Screenshot
- `screenshot()` / `get_screenshot()` - Capture screenshot (included by default in MCP responses)
**REST API**:
- `GET /screenshot` - Returns JSON with base64 data
- `GET /screenshot/image` - Returns PNG image
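A base64 screenshot can be decoded and sanity-checked with the standard library alone. The sketch below reads the width and height from the PNG header. The JSON field name `"data"` is an assumption made for this example (check `/docs` for the actual response schema); the PNG layout itself (8-byte signature followed by the `IHDR` chunk) is fixed by the PNG format.

```python
import base64
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def decode_screenshot(payload, field="data"):
    """Decode the base64 image from a JSON payload.

    The field name is hypothetical; pass the real one via `field`.
    """
    return base64.b64decode(payload[field])


def png_size(png_bytes):
    """Return (width, height) read from the IHDR chunk of a PNG."""
    if (
        len(png_bytes) < 24
        or png_bytes[:8] != PNG_SIGNATURE
        or png_bytes[12:16] != b"IHDR"
    ):
        raise ValueError("not a PNG image")
    # Width and height are big-endian unsigned 32-bit integers.
    return struct.unpack(">II", png_bytes[16:24])
```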
### Window Management
- `list_windows()` - List all visible windows
- `switch_to_window(hwnd=<int>|title=<str>)` - Switch focus to a window
- `move_window(hwnd, x, y, width?, height?)` - Move and/or resize window
- `resize_window(hwnd, width, height)` - Resize window
- `minimize_window(hwnd)` - Minimize window
- `maximize_window(hwnd)` - Maximize window
- `restore_window(hwnd)` - Restore window
- `set_window_topmost(hwnd, topmost=true)` - Set window always-on-top
- `get_window_info(hwnd)` - Get detailed window information
- `close_window(hwnd)` - Close window
- `snap_window_left(hwnd)` - Snap to left half
- `snap_window_right(hwnd)` - Snap to right half
- `snap_window_top(hwnd)` - Snap to top half
- `snap_window_bottom(hwnd)` - Snap to bottom half
- `screenshot_window(hwnd)` - Capture screenshot of specific window
**REST API**:
- `GET /windows` - List windows
- `POST /windows/switch` - Switch by handle
- `POST /windows/switch-by-title` - Switch by title
- `GET /windows/{hwnd}` - Get window info
- `DELETE /windows/{hwnd}` - Close window
- `POST /windows/{hwnd}/snap-left` - Snap left, etc.
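To make "snap to half" concrete, the sketch below computes the target rectangle for each side on a given screen size. This is an illustration of the geometry only, under the assumption that a snap fills exactly half of the full screen; the library's own snapping may differ (for example by respecting taskbars or docks). `snap_rect` is a hypothetical helper.

```python
def snap_rect(side, screen_width, screen_height):
    """Return the rectangle covering one half of the screen."""
    half_w, half_h = screen_width // 2, screen_height // 2
    rects = {
        "left": (0, 0, half_w, screen_height),
        "right": (half_w, 0, screen_width - half_w, screen_height),
        "top": (0, 0, screen_width, half_h),
        "bottom": (0, half_h, screen_width, screen_height - half_h),
    }
    if side not in rects:
        raise ValueError(f"unknown side: {side!r}")
    x, y, width, height = rects[side]
    return {"x": x, "y": y, "width": width, "height": height}
```

The resulting values line up with the `move_window(hwnd, x, y, width?, height?)` parameters listed above, which is one way to implement a custom snap layout.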
### Virtual Desktops
- `list_virtual_desktops()` - List all virtual desktops
- `switch_virtual_desktop(desktop_id=<int>|name=<str>)` - Switch to virtual desktop
- `move_window_to_virtual_desktop(hwnd, desktop_id)` - Move window to desktop
**REST API**:
- `GET /virtual-desktops` - List desktops
- `POST /virtual-desktops/switch` - Switch desktop
- `POST /windows/{hwnd}/move-to-desktop` - Move window
### Configuration
- `set_config(...)` - Configure observation options:
  - `observe_screen` (bool, default: `true`): Include screenshots in all responses
  - `observe_mouse_position` (bool, default: `false`): Track and include mouse position
  - `observe_mouse_button_states` (bool, default: `false`): Track and include mouse button states
  - `observe_keyboard_key_states` (bool, default: `false`): Track and include keyboard key states
  - `observe_focused_app` (bool, default: `false`): Include focused application information
  - `observe_accessibility_tree` (bool, default: `false`): Include accessibility tree
**REST API**: `POST /config` - Update configuration
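A small helper can catch misspelled option names before a configuration is sent. The sketch below uses the option names and defaults listed above; `make_config` is a hypothetical helper, and how the resulting dict is passed on (as `set_config` arguments or as the `POST /config` body) is left to you.

```python
DEFAULT_CONFIG = {
    "observe_screen": True,
    "observe_mouse_position": False,
    "observe_mouse_button_states": False,
    "observe_keyboard_key_states": False,
    "observe_focused_app": False,
    "observe_accessibility_tree": False,
}


def make_config(**overrides):
    """Return the documented defaults with the given boolean overrides applied."""
    unknown = sorted(set(overrides) - set(DEFAULT_CONFIG))
    if unknown:
        raise KeyError(f"unknown option(s): {', '.join(unknown)}")
    for name, value in overrides.items():
        if not isinstance(value, bool):
            raise TypeError(f"{name} must be a bool")
    return {**DEFAULT_CONFIG, **overrides}
```

For example, `make_config(observe_screen=False, observe_focused_app=True)` turns off screenshots and turns on focused-app reporting while leaving the other options at their defaults.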
## Key Names
Special keys can be specified as strings:
- `"ctrl"`, `"alt"`, `"shift"`, `"cmd"` (or `"win"` on Windows)
- `"space"`, `"enter"`, `"tab"`, `"esc"`, `"backspace"`
- Arrow keys: `"up"`, `"down"`, `"left"`, `"right"`
- Function keys: `"f1"` through `"f12"`
- Regular characters: `"a"`, `"b"`, etc.
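If key names come from user input, a quick check against the names listed above can catch typos early. This sketch covers only the names documented in this section; the library may accept additional keys, so treat a `False` result as "not documented here" rather than "invalid". `is_documented_key` is a hypothetical helper.

```python
SPECIAL_KEYS = {
    "ctrl", "alt", "shift", "cmd", "win",
    "space", "enter", "tab", "esc", "backspace",
    "up", "down", "left", "right",
} | {f"f{n}" for n in range(1, 13)}  # "f1" through "f12"


def is_documented_key(name):
    """Return True for a listed special key or a single printable character."""
    return name in SPECIAL_KEYS or (len(name) == 1 and name.isprintable())
```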
## Platform Support
### Windows
- **Full Support**: All mouse/keyboard operations work
- **Window Management**: Full support via `pywin32` (included in `[windows]` extras)
- **Virtual Desktops**: Full support via `VirtualDesktopAccessor.dll`
- **Focused App**: Requires `pywin32` (install with `pip install -e ".[windows]"`)
- **Accessibility Tree**: Uses Windows UI Automation API (requires `pywin32`)
### macOS
- **Full Support**: All mouse/keyboard operations work
- **Window Management**: Limited support via AppleScript (some operations not yet implemented)
- **Virtual Desktops**: Limited support (Spaces enumeration/switching via Mission Control API)
- **Focused App**: Uses AppleScript (no dependencies)
- **Accessibility Tree**:
  - Native: Uses AXUIElement via `pyobjc` (install with `pip install -e ".[macos]"`)
  - Fallback: Uses AppleScript (works without dependencies, limited tree depth)
### Linux/Ubuntu
- **Full Support**: All mouse/keyboard operations work
- **Window Management**: Full support via `xdotool` (install: `sudo apt install xdotool`)
- **Virtual Desktops**: Full support via `wmctrl` or `xdotool` (install: `sudo apt install wmctrl`)
- **Focused App**: Uses `xdotool` (install: `sudo apt install xdotool`)
- **Accessibility Tree**:
  - Native: Uses AT-SPI via PyGObject (install: `sudo apt install python3-gi gir1.2-atspi-2.0`, then `pip install -e ".[linux]"`)
  - Fallback: Basic window info via `xdotool`
## Architecture
The codebase is organized into clear layers:
```
computer_mcp/
├── __init__.py # Module API (stateless functions)
├── __main__.py # CLI entry point
├── cli.py # CLI implementation
├── mcp.py # MCP server (stdio + HTTP/SSE)
├── api.py # HTTP REST API server
├── actions/ # Business logic (pure functions)
│ ├── mouse.py
│ ├── keyboard.py
│ ├── window.py
│ ├── screenshot.py
│ ├── config.py
│ ├── focused_app.py
│ └── accessibility_tree.py
├── core/ # Core utilities
│ ├── state.py
│ ├── platform.py
│ ├── screenshot.py
│ ├── response.py
│ └── utils.py
└── resources/ # Platform-specific resources
```
**Key Design Principles:**
- **Actions layer**: Pure business logic functions, no interface dependencies
- **Interface adapters**: MCP, API, CLI wrap the actions layer
- **Stateless module API**: Clean functions for direct Python usage
- **State management**: Optional, configurable per interface
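The layering can be illustrated with a toy example: one pure action function and two thin adapters that only translate input and output. This is a sketch of the pattern, not the project's code. All names here are hypothetical, the action does not touch the real mouse, and the `"error"` key is an assumption made for the example.

```python
import json


def click_action(button="left"):
    """Actions layer stand-in: plain dict in, plain dict out, no I/O."""
    if button not in ("left", "middle", "right"):
        return {"success": False, "action": "click", "error": f"unknown button: {button}"}
    return {"success": True, "action": "click", "button": button}


def cli_adapter(argv):
    """CLI-style adapter: parse arguments, call the action, format text."""
    button = argv[0] if argv else "left"
    result = click_action(button)
    if result["success"]:
        return f"ok: click {button}"
    return f"error: {result['error']}"


def api_adapter(body):
    """API-style adapter: parse a JSON body, call the action, serialize JSON."""
    payload = json.loads(body) if body else {}
    return json.dumps(click_action(payload.get("button", "left")))
```

Because both adapters call the same function, a behaviour change in the action shows up in every interface at once.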
## Response Format
### MCP Server Response
By default (with `observe_screen: true`), all tool responses include a screenshot as MCP `ImageContent`:
**Response Structure:**
- `ImageContent` (type: "image"): Contains the screenshot as base64-encoded PNG with mimeType "image/png"
- `TextContent` (type: "text"): Contains JSON with action results and screenshot metadata:
```json
{
  "success": true,
  "action": "click",
  "button": "left",
  "screenshot": {
    "format": "base64_png",
    "width": 1920,
    "height": 1080
  }
}
```
With full observation enabled, the TextContent includes additional state:
```json
{
  "success": true,
  "action": "click",
  "button": "left",
  "screenshot": {
    "format": "base64_png",
    "width": 1920,
    "height": 1080
  },
  "mouse_position": {"x": 500, "y": 300},
  "mouse_button_states": ["Button.left"],
  "keyboard_key_states": ["ctrl"],
  "focused_app": {
    "name": "Code",
    "pid": 12345,
    "title": "main.py - computer-mcp"
  },
  "accessibility_tree": {
    "tree": {
      "name": "Application",
      "control_type": "...",
      "bounds": {"x": 0, "y": 0, "width": 1920, "height": 1080},
      "children": [...]
    }
  }
}
```
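On the client side, the JSON carried in `TextContent` can be reduced to the fields you care about. The sketch below uses only key names that appear in the examples above and treats every observation field as optional, since each depends on configuration. `summarize_state` is a hypothetical helper.

```python
import json


def summarize_state(text):
    """Extract a compact summary from the TextContent JSON string."""
    data = json.loads(text)
    summary = {"ok": bool(data.get("success")), "action": data.get("action")}
    screenshot = data.get("screenshot")
    if screenshot:
        summary["screen"] = (screenshot["width"], screenshot["height"])
    position = data.get("mouse_position")
    if position:
        summary["mouse"] = (position["x"], position["y"])
    app = data.get("focused_app")
    if app:
        summary["app"] = app.get("name")
    return summary
```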
### HTTP REST API Response
Returns JSON directly:
```json
{
  "success": true,
  "action": "click",
  "button": "left"
}
```
Screenshots are returned as base64-encoded strings in JSON, or use the `/screenshot/image` endpoint for raw PNG.
### CLI Output
- Default: Human-readable success/error messages
- With `--json`: JSON output matching API format
### Module API Response
Returns plain Python dictionaries:
```python
result = click("left")
# result = {"success": True, "action": "click", "button": "left"}
```
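Since results are plain dictionaries, failures do not raise on their own in this sketch's model; a small wrapper can turn an unsuccessful result into an exception. This assumes a failed call returns `"success": False`, and the `"error"` key is a guess made for the example (inspect a real failure to confirm the shape). `ActionError` and `require_success` are hypothetical names.

```python
class ActionError(RuntimeError):
    """Raised when a result dictionary reports failure."""


def require_success(result):
    """Return the result unchanged, or raise ActionError if it failed."""
    if not result.get("success"):
        action = result.get("action", "action")
        raise ActionError(result.get("error", f"{action} failed"))
    return result
```

Usage would look like `require_success(click("left"))`, which passes the dictionary through when the call succeeded.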
## Notes
- Screenshots are **included by default** in MCP tool responses (when `observe_screen: true`)
- Mouse tools operate at the **current cursor position** unless you explicitly move the mouse first
- State tracking listeners are automatically started/stopped based on configuration
- Accessibility tree implementations may vary in depth and detail across platforms
- Some platform-specific features require optional dependencies or system packages
- Window management features vary by platform (Windows and Linux have full support, macOS has limited support; see Platform Support)
## License
MIT
Raw data
{
"_id": null,
"home_page": null,
"name": "computer-mcp",
"maintainer": null,
"docs_url": null,
"requires_python": ">=3.12",
"maintainer_email": null,
"keywords": "mcp, automation, mouse, keyboard, screenshot, accessibility, control",
"author": null,
"author_email": "CommandAGI <hello@commandagi.com>",
"download_url": "https://files.pythonhosted.org/packages/7c/30/f1a0bd3733666a06faf76b2a6ddbc52d06a4ff8dc3d74d58cadf07c8373b/computer_mcp-0.0.5.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": "MIT",
"summary": "Cross-platform MCP server for computer automation and control",
"version": "0.0.5",
"project_urls": {
"Documentation": "https://github.com/commandAGI/computer-mcp#readme",
"Homepage": "https://commandagi.com",
"Issues": "https://github.com/commandAGI/computer-mcp/issues",
"Repository": "https://github.com/commandAGI/computer-mcp"
},
"split_keywords": [
"mcp",
" automation",
" mouse",
" keyboard",
" screenshot",
" accessibility",
" control"
],
"urls": [
{
"comment_text": null,
"digests": {
"blake2b_256": "ba137536556024985f8bb0ad3c0dcf4ccbac0884aa3bb69e10b3a9d07dda3775",
"md5": "5e9f2eea69644d7416aec2ac94f46427",
"sha256": "21bfca59687ccd67c2f11e3e3ad33418b6af3e7a01c8f6520db01a768428015e"
},
"downloads": -1,
"filename": "computer_mcp-0.0.5-py3-none-any.whl",
"has_sig": false,
"md5_digest": "5e9f2eea69644d7416aec2ac94f46427",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.12",
"size": 215336,
"upload_time": "2025-11-03T02:26:56",
"upload_time_iso_8601": "2025-11-03T02:26:56.817853Z",
"url": "https://files.pythonhosted.org/packages/ba/13/7536556024985f8bb0ad3c0dcf4ccbac0884aa3bb69e10b3a9d07dda3775/computer_mcp-0.0.5-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": null,
"digests": {
"blake2b_256": "7c30f1a0bd3733666a06faf76b2a6ddbc52d06a4ff8dc3d74d58cadf07c8373b",
"md5": "e21aecc9b41e785e32237dff4a5a1461",
"sha256": "477dbf08744b9d7f0f1f67d8547c8f45fee5e63404f269334e7bb7848254b7fe"
},
"downloads": -1,
"filename": "computer_mcp-0.0.5.tar.gz",
"has_sig": false,
"md5_digest": "e21aecc9b41e785e32237dff4a5a1461",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.12",
"size": 360648,
"upload_time": "2025-11-03T02:26:58",
"upload_time_iso_8601": "2025-11-03T02:26:58.299730Z",
"url": "https://files.pythonhosted.org/packages/7c/30/f1a0bd3733666a06faf76b2a6ddbc52d06a4ff8dc3d74d58cadf07c8373b/computer_mcp-0.0.5.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2025-11-03 02:26:58",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "commandAGI",
"github_project": "computer-mcp#readme",
"travis_ci": false,
"coveralls": false,
"github_actions": true,
"lcname": "computer-mcp"
}