# MCP as a Judge ⚖️
mcp-name: io.github.OtherVibes/mcp-as-a-judge
<div align="left">
<img src="assets/mcp-as-a-judge.png" alt="MCP as a Judge Logo" width="200">
</div>
> MCP as a Judge acts as a validation layer between AI coding assistants and LLMs, helping ensure safer and higher-quality code.
[MIT License](https://opensource.org/license/mit/)
[Python](https://www.python.org/downloads/)
[Model Context Protocol](https://modelcontextprotocol.io/)
[CI](https://github.com/OtherVibes/mcp-as-a-judge/actions/workflows/ci.yml)
[Release](https://github.com/OtherVibes/mcp-as-a-judge/actions/workflows/release.yml)
[PyPI](https://pypi.org/project/mcp-as-a-judge/)
**MCP as a Judge** is a **behavioral MCP** that strengthens AI coding assistants by requiring explicit LLM evaluations for:
- Research, system design, and planning
- Code changes, testing, and task-completion verification
It enforces evidence-based research, reuse over reinvention, and human-in-the-loop decisions.
> If your IDE has rules/agents (Copilot, Cursor, Claude Code), keep using them—this Judge adds enforceable approval gates on plan, code diffs, and tests.
## Key problems with AI coding assistants and LLMs
- Treat LLM output as ground truth; skip research and use outdated information
- Reinvent the wheel instead of reusing libraries and existing code
- Cut corners, producing code below engineering standards and weak tests
- Make unilateral decisions when requirements are ambiguous or plans change
- Security blind spots: missing input validation, injection risks/attack vectors, least‑privilege violations, and weak defensive programming
## **Vibe coding doesn’t have to be frustrating**
### What it enforces
- Evidence‑based research and reuse (best practices, libraries, existing code)
- Plan‑first delivery aligned to user requirements
- Human‑in‑the‑loop decisions for ambiguity and blockers
- Quality gates on code and tests (security, performance, maintainability)
### Key capabilities
- Intelligent code evaluation via MCP [sampling](https://modelcontextprotocol.io/docs/learn/client-concepts#sampling); enforces software‑engineering standards and flags security/performance/maintainability risks
- Comprehensive plan/design review: validates architecture, research depth, requirements fit, and implementation approach
- User‑driven decisions via MCP [elicitation](https://modelcontextprotocol.io/docs/learn/client-concepts#elicitation): clarifies requirements, resolves obstacles, and keeps choices transparent
- Security validation in system design and code changes
### Tools and how they help
| Tool | What it solves |
|------|-----------------|
| `set_coding_task` | Creates/updates task metadata; classifies `task_size`; returns next-step workflow guidance |
| `get_current_coding_task` | Recovers the latest `task_id` and metadata to resume work safely |
| `judge_coding_plan` | Validates plan/design; requires library selection and internal reuse maps; flags risks |
| `judge_code_change` | Reviews unified Git diffs for correctness, reuse, security, and code quality |
| `judge_testing_implementation` | Validates tests using real runner output and optional coverage |
| `judge_coding_task_completion` | Final gate ensuring plan, code, and test approvals before completion |
| `raise_missing_requirements` | Elicits missing details and decisions to unblock progress |
| `raise_obstacle` | Engages the user on trade‑offs, constraints, and enforced changes |
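In practice an MCP client calls these tools in order: register the task, get the plan approved, get each diff approved, validate tests, then close the task. Below is a minimal sketch of driving the first two gates directly with the official [`mcp` Python SDK](https://github.com/modelcontextprotocol/python-sdk). The tool argument names are illustrative assumptions, not the server's exact schema; a real client should read each tool's input schema via `list_tools`.

```python
import asyncio

from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client

# Launch the server over stdio, the same way an IDE client would.
server = StdioServerParameters(command="uv", args=["tool", "run", "mcp-as-a-judge"])

async def main() -> None:
    async with stdio_client(server) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()

            # Gate 0: register the task and receive next-step workflow guidance.
            task = await session.call_tool(
                "set_coding_task",
                arguments={"description": "Add retry logic to the HTTP client"},  # field name assumed
            )

            # Gate 1: submit the plan/design for judgment before any code is written.
            verdict = await session.call_tool(
                "judge_coding_plan",
                arguments={"plan": "Reuse the existing backoff helper; ..."},  # field name assumed
            )
            print(task, verdict, sep="\n\n")

asyncio.run(main())
```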
## 🚀 **Quick Start**
### **Requirements & Recommendations**
#### **MCP Client Prerequisites**
MCP as a Judge depends heavily on **MCP Sampling** and **MCP Elicitation** for its core functionality:
- **[MCP Sampling](https://modelcontextprotocol.io/docs/learn/client-concepts#sampling)** - Required for AI-powered code evaluation and judgment
- **[MCP Elicitation](https://modelcontextprotocol.io/docs/learn/client-concepts#elicitation)** - Required for interactive user decision prompts
#### **System Prerequisites**
- **Docker Desktop** or **Python 3.13+**: required to run the MCP server (one or the other, depending on the install method you choose below)
#### **Supported AI Assistants**
| AI Assistant | Platform | MCP Support | Status | Notes |
|---------------|----------|-------------|---------|-------|
| **GitHub Copilot** | Visual Studio Code | ✅ Full | **Recommended** | Complete MCP integration with sampling and elicitation |
| **Claude Code** | - | ⚠️ Partial | Requires LLM API key | [Sampling Support feature request](https://github.com/anthropics/claude-code/issues/1785)<br>[Elicitation Support feature request](https://github.com/anthropics/claude-code/issues/2799) |
| **Cursor** | - | ⚠️ Partial | Requires LLM API key | MCP support available, but sampling/elicitation limited |
| **Augment** | - | ⚠️ Partial | Requires LLM API key | MCP support available, but sampling/elicitation limited |
| **Qodo** | - | ⚠️ Partial | Requires LLM API key | MCP support available, but sampling/elicitation limited |
**✅ Recommended setup:** GitHub Copilot + VS Code — full MCP sampling; no API key needed.
**⚠️ Critical:** For assistants without full MCP sampling (Cursor, Claude Code, Augment, Qodo), you MUST set `LLM_API_KEY`. Without it, the server cannot evaluate plans or code. See [LLM API Configuration](#-llm-api-configuration-optional).
**💡 Tip:** Prefer large context models (≥ 1M tokens) for better analysis and judgments.
### If the MCP server isn’t auto‑used
For troubleshooting, see the [FAQ section](#-faq).
## 🔧 **MCP Configuration**
Configure **MCP as a Judge** in your MCP-enabled client:
### **Method 1: Using Docker (Recommended)**
#### One‑click install for VS Code (MCP)
[Install in VS Code](https://insiders.vscode.dev/redirect/mcp/install?name=mcp-as-a-judge&inputs=%5B%5D&config=%7B%22command%22%3A%22docker%22%2C%22args%22%3A%5B%22run%22%2C%22-i%22%2C%22--rm%22%2C%22--pull%3Dalways%22%2C%22ghcr.io%2Fothervibes%2Fmcp-as-a-judge%3Alatest%22%5D%7D)
Notes:
- VS Code controls the sampling model; select it via “MCP: List Servers → mcp-as-a-judge → Configure Model Access”.
1. **Configure MCP Settings:**
Add this to your MCP client configuration file:
```json
{
"command": "docker",
"args": ["run", "--rm", "-i", "--pull=always", "ghcr.io/othervibes/mcp-as-a-judge:latest"],
"env": {
"LLM_API_KEY": "your-openai-api-key-here",
"LLM_MODEL_NAME": "gpt-4o-mini"
}
}
```
**📝 Configuration Options (All Optional):**
- **LLM_API_KEY**: Optional for GitHub Copilot + VS Code (has built-in MCP sampling)
- **LLM_MODEL_NAME**: Optional custom model (see [Supported LLM Providers](#supported-llm-providers) for defaults)
- The `--pull=always` flag ensures you always get the latest version automatically
To update the image manually at any time:
```bash
# Pull the latest version
docker pull ghcr.io/othervibes/mcp-as-a-judge:latest
```
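Note that the JSON above is the server entry itself; where it nests depends on your client. In VS Code, for example, server entries live under a top-level `servers` key in `.vscode/mcp.json`. A sketch is shown below; check your client's documentation for its exact wrapper format.

```json
{
  "servers": {
    "mcp-as-a-judge": {
      "command": "docker",
      "args": ["run", "--rm", "-i", "--pull=always", "ghcr.io/othervibes/mcp-as-a-judge:latest"]
    }
  }
}
```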
### **Method 2: Using uv**
1. **Install the package:**
```bash
uv tool install mcp-as-a-judge
```
2. **Configure MCP Settings:**
The MCP server may be automatically detected by your MCP‑enabled client.
**📝 Notes:**
- **No additional configuration needed for GitHub Copilot + VS Code** (has built-in MCP sampling)
- LLM_API_KEY is optional and can be set via environment variable if needed
3. **To update to the latest version:**
```bash
# Update MCP as a Judge to the latest version
uv tool upgrade mcp-as-a-judge
```
### Select a sampling model in VS Code
- Open Command Palette (Cmd/Ctrl+Shift+P) → “MCP: List Servers”
- Select the configured server “mcp-as-a-judge”
- Choose “Configure Model Access”
- Check your preferred model(s) to enable sampling
## 🔑 **LLM API Configuration (Optional)**
For [AI assistants without full MCP sampling support](#supported-ai-assistants), you can configure an LLM API key as a fallback. This ensures MCP as a Judge works even when the client doesn't support MCP sampling.
- Set `LLM_API_KEY` (unified key). The vendor is auto-detected from the key's format (see the detection sketch after the table below); optionally set `LLM_MODEL_NAME` to override the vendor default.
### **Supported LLM Providers**
| Rank | Provider | API Key Format | Default Model | Notes |
|------|----------|----------------|---------------|-------|
| **1** | **OpenAI** | `sk-...` | `gpt-4.1` | Fast and reliable model optimized for speed |
| **2** | **Anthropic** | `sk-ant-...` | `claude-sonnet-4-20250514` | High-performance with exceptional reasoning |
| **3** | **Google** | `AIza...` | `gemini-2.5-pro` | Most advanced model with built-in thinking |
| **4** | **Azure OpenAI** | `[a-f0-9]{32}` | `gpt-4.1` | Same as OpenAI but via Azure |
| **5** | **AWS Bedrock** | AWS credentials | `anthropic.claude-sonnet-4-20250514-v1:0` | Aligned with Anthropic |
| **6** | **Vertex AI** | Service Account JSON | `gemini-2.5-pro` | Enterprise Gemini via Google Cloud |
| **7** | **Groq** | `gsk_...` | `deepseek-r1` | Best reasoning model with speed advantage |
| **8** | **OpenRouter** | `sk-or-...` | `deepseek/deepseek-r1` | Best reasoning model available |
| **9** | **xAI** | `xai-...` | `grok-code-fast-1` | Latest coding-focused model (Aug 2025) |
| **10** | **Mistral** | `[a-f0-9]{64}` | `pixtral-large` | Most advanced model (124B params) |
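Detection keys off the formats in the table above, checking the more specific prefixes first. The sketch below shows how such prefix-based matching might look; it is a hypothetical illustration, and the package's actual detection logic may differ.

```python
import re

# Hypothetical sketch of vendor detection from API key patterns, mirroring the
# "API Key Format" column above. More specific prefixes are listed first so
# that "sk-ant-..." and "sk-or-..." match before the generic OpenAI "sk-".
KEY_PATTERNS: list[tuple[str, str]] = [
    ("anthropic", r"^sk-ant-"),
    ("openrouter", r"^sk-or-"),
    ("openai", r"^sk-"),
    ("google", r"^AIza"),
    ("groq", r"^gsk_"),
    ("xai", r"^xai-"),
    ("azure_openai", r"^[a-f0-9]{32}$"),
    ("mistral", r"^[a-f0-9]{64}$"),
]

def detect_vendor(api_key: str) -> str | None:
    """Return the first vendor whose pattern matches the key, else None."""
    for vendor, pattern in KEY_PATTERNS:
        if re.match(pattern, api_key):
            return vendor
    return None

assert detect_vendor("sk-ant-example") == "anthropic"
assert detect_vendor("gsk_example") == "groq"
```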
### **Client-Specific Setup**
#### **Cursor**
1. **Open Cursor Settings:**
- Go to `File` → `Preferences` → `Cursor Settings`
- Navigate to the `MCP` tab
- Click `+ Add` to add a new MCP server
2. **Add MCP Server Configuration:**
```json
{
"command": "uv",
"args": ["tool", "run", "mcp-as-a-judge"],
"env": {
"LLM_API_KEY": "your-openai-api-key-here",
"LLM_MODEL_NAME": "gpt-4.1"
}
}
```
**📝 Configuration Options:**
- **LLM_API_KEY**: Required for Cursor (limited MCP sampling)
- **LLM_MODEL_NAME**: Optional custom model (see [Supported LLM Providers](#supported-llm-providers) for defaults)
#### **Claude Code**
1. **Add MCP Server via CLI:**
```bash
# Set environment variables first (optional model override)
export LLM_API_KEY="your_api_key_here"
export LLM_MODEL_NAME="claude-3-5-haiku" # Optional: faster/cheaper model
# Add MCP server
claude mcp add mcp-as-a-judge -- uv tool run mcp-as-a-judge
```
2. **Alternative: Manual Configuration:**
- Create or edit `~/.config/claude-code/mcp_servers.json`
```json
{
"command": "uv",
"args": ["tool", "run", "mcp-as-a-judge"],
"env": {
"LLM_API_KEY": "your-anthropic-api-key-here",
"LLM_MODEL_NAME": "claude-3-5-haiku"
}
}
```
**📝 Configuration Options:**
- **LLM_API_KEY**: Required for Claude Code (limited MCP sampling)
- **LLM_MODEL_NAME**: Optional custom model (see [Supported LLM Providers](#supported-llm-providers) for defaults)
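As an alternative to exporting variables before `claude mcp add` (step 1 above), the environment can usually be passed inline. A sketch, assuming your installed Claude Code version supports the `-e` flag:

```bash
# Inline env variant (flag support may vary by Claude Code version)
claude mcp add mcp-as-a-judge \
  -e LLM_API_KEY="your_api_key_here" \
  -e LLM_MODEL_NAME="claude-3-5-haiku" \
  -- uv tool run mcp-as-a-judge
```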
#### **Other MCP Clients**
For other MCP-compatible clients, use the standard MCP server configuration:
```json
{
"command": "uv",
"args": ["tool", "run", "mcp-as-a-judge"],
"env": {
"LLM_API_KEY": "your-openai-api-key-here",
"LLM_MODEL_NAME": "gpt-5"
}
}
```
**📝 Configuration Options:**
- **LLM_API_KEY**: Required for most MCP clients (except GitHub Copilot + VS Code)
- **LLM_MODEL_NAME**: Optional custom model (see [Supported LLM Providers](#supported-llm-providers) for defaults)
## 🔒 **Privacy & Flexible AI Integration**
### **🔑 MCP Sampling (Preferred) + LLM API Key Fallback**
**Primary Mode: MCP Sampling**
- All judgments are performed using **MCP Sampling** capability
- No need to configure or pay for external LLM API services
- Works directly with your MCP-compatible client's existing AI model
- **Currently supported by:** GitHub Copilot + VS Code
**Fallback Mode: LLM API Key**
- When MCP sampling is not available, the server can use LLM API keys
- Supports multiple providers via LiteLLM: OpenAI, Anthropic, Google (Gemini and Vertex AI), Azure OpenAI, AWS Bedrock, Groq, OpenRouter, xAI, and Mistral
- Automatic vendor detection from API key patterns
- Default model selection per vendor when no model is specified
### **🛡️ Your Privacy Matters**
- The server runs **locally** on your machine
- **No data collection** - your code and conversations stay private
- **No external API calls when using MCP Sampling**. If you set `LLM_API_KEY` for fallback, the server will call your chosen LLM provider only to perform judgments (plan/code/test) with the evaluation content you provide.
- Complete control over your development workflow and sensitive information
## 🤝 **Contributing**
We welcome contributions! Please see [CONTRIBUTING.md](CONTRIBUTING.md) for guidelines.
### **Development Setup**
```bash
# Clone the repository
git clone https://github.com/OtherVibes/mcp-as-a-judge.git
cd mcp-as-a-judge
# Install dependencies with uv
uv sync --all-extras --dev
# Install pre-commit hooks
uv run pre-commit install
# Run tests
uv run pytest
# Run all checks
uv run pytest && uv run ruff check && uv run ruff format --check && uv run mypy src
```
## © Concepts and Methodology
© 2025 OtherVibes and Zvi Fried. The "MCP as a Judge" concept, the "behavioral MCP" approach, the staged workflow (plan → code → test → completion), tool taxonomy/descriptions, and prompt templates are original work developed in this repository.
## Prior Art and Attribution
While “LLM‑as‑a‑judge” is a broadly known idea, this repository defines the original “MCP as a Judge” behavioral MCP pattern by OtherVibes and Zvi Fried. It combines task‑centric workflow enforcement (plan → code → test → completion), explicit LLM‑based validations, and human‑in‑the‑loop elicitation, along with the prompt templates and tool taxonomy provided here. Please attribute as: “OtherVibes – MCP as a Judge (Zvi Fried)”.
## ❓ FAQ
### How is “MCP as a Judge” different from rules/subagents in IDE assistants (GitHub Copilot, Cursor, Claude Code)?
| Feature | IDE Rules | Subagents | MCP as a Judge |
|---------|-----------|-----------|----------------|
| Static behavior guidance | ✓ | ✓ | ✗ |
| Custom system prompts | ✓ | ✓ | ✓ |
| Project context integration | ✓ | ✓ | ✓ |
| Specialized task handling | ✗ | ✓ | ✓ |
| Active quality gates | ✗ | ✗ | ✓ |
| Evidence-based validation | ✗ | ✗ | ✓ |
| Approve/reject with feedback | ✗ | ✗ | ✓ |
| Workflow enforcement | ✗ | ✗ | ✓ |
| Cross-assistant compatibility | ✗ | ✗ | ✓ |
- References: [GitHub Copilot Custom Instructions](https://docs.github.com/en/copilot/how-tos/configure-custom-instructions/add-repository-instructions), [Cursor Rules](https://docs.cursor.com/en/context/@-symbols/@-cursor-rules), [Claude Code Subagents](https://docs.anthropic.com/en/docs/claude-code/sub-agents)
### How does the Judge workflow relate to the tasklist? Why do we need both?
- Tasklist = planning/organization: tracks tasks, priorities, and status. It doesn’t guarantee engineering quality or readiness.
- Judge workflow = quality gates: enforces approvals for plan/design, code diffs, tests, and final completion. It demands real evidence (e.g., unified Git diffs and raw test output) and returns structured approvals and required improvements.
- Together: use the tasklist to organize work; use the Judge to decide when each stage is actually ready to proceed. The server also emits `next_tool` guidance to keep progress moving through the gates.
### If the Judge isn’t used automatically, how do I force it?
- In your prompt: "use mcp-as-a-judge" or "Evaluate plan/code/test using the MCP server mcp-as-a-judge".
- VS Code: Command Palette → "MCP: List Servers" → ensure "mcp-as-a-judge" is listed and enabled.
- Ensure the MCP server is running and, in your client, the judge tools are enabled/approved.
### How do I select models for sampling in VS Code?
- Open Command Palette (Cmd/Ctrl+Shift+P) → "MCP: List Servers"
- Select "mcp-as-a-judge" → "Configure Model Access"
- Check your preferred model(s) to enable sampling
## 📄 **License**
This project is licensed under the MIT License (see [LICENSE](LICENSE)).
## 🙏 **Acknowledgments**
- [Model Context Protocol](https://modelcontextprotocol.io/) by Anthropic
- [LiteLLM](https://github.com/BerriAI/litellm) for unified LLM API integration
---