# Incept Eval

A **local** CLI tool for evaluating educational questions with comprehensive AI-powered assessment. Evaluation runs on your machine using the compliance_math_evaluator, answer_verification, and reading_question_qc modules, plus EduBench tasks.

[![PyPI version](https://badge.fury.io/py/incept-eval.svg)](https://badge.fury.io/py/incept-eval)
[![Python Version](https://img.shields.io/pypi/pyversions/incept-eval.svg)](https://pypi.org/project/incept-eval/)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)

## Features

🎯 **Comprehensive Evaluation**
- **Internal Evaluator** - Scaffolding quality and DI compliance scoring
- **Answer Verification** - GPT-4o powered correctness checking
- **Reading Question QC** - MCQ distractor and question quality checks
- **EduBench Tasks** - Educational benchmarks (QA, EC, IP, AG, QG, TMG)

📊 **Flexible Output**
- Pretty mode for quick score viewing
- Full detailed results with all metrics
- Append mode for collecting multiple evaluations
- JSON output for easy integration

🚀 **Easy to Use**
- Simple CLI interface
- **Runs locally** - no Incept backend required; evaluation calls go directly to OpenAI and Anthropic
- Requires OpenAI and Anthropic API keys in a `.env` file
- Batch processing support

## Installation

```bash
pip install incept-eval
```
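If you prefer to keep CLI tools isolated from your project's environment, `pipx` also works (optional, and not specific to incept-eval):

```bash
# optional: install into an isolated environment with pipx
pipx install incept-eval
```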

## Quick Start

### 1. Install

```bash
pip install incept-eval
```

### 2. Set up API Keys

Create a `.env` file in your project directory with:

```bash
OPENAI_API_KEY=your_openai_key
ANTHROPIC_API_KEY=your_anthropic_key
HUGGINGFACE_TOKEN=your_hf_token  # Optional for EduBench
```
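A quick sanity check that both required keys are actually present in the file (plain shell, nothing incept-eval-specific; variable names taken from this README):

```bash
# warn about any required key missing from .env
for key in OPENAI_API_KEY ANTHROPIC_API_KEY; do
  grep -q "^${key}=" .env || echo "missing ${key} in .env"
done
```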

### 3. Generate Sample File

```bash
incept-eval example
```

This creates `qs.json` with a complete example question.
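To peek at the generated file's structure before evaluating (assuming `jq` is installed; `python -m json.tool qs.json` works as well):

```bash
jq 'keys' qs.json                 # expect ["questions", "request"]
jq '.questions | length' qs.json  # number of sample questions
```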

### 4. Evaluate

```bash
incept-eval evaluate qs.json --verbose
```

## Usage

### Commands

#### `evaluate` - Evaluate questions from JSON file

```bash
# Basic evaluation (pretty mode by default)
incept-eval evaluate questions.json

# Verbose output with progress messages
incept-eval evaluate questions.json --verbose

# Save results to file (overwrite)
incept-eval evaluate questions.json -o results.json

# Append results to file (creates if not exists)
incept-eval evaluate questions.json -a all_evaluations.json --verbose

# Use local API server
incept-eval evaluate questions.json --api-url http://localhost:8000

# Full results without pretty formatting
incept-eval evaluate questions.json --no-pretty
```

#### `example` - Generate sample input file

```bash
# Generate qs.json (default)
incept-eval example

# Save to custom filename
incept-eval example -o sample.json
```

#### `configure` - Save API key

```bash
incept-eval configure YOUR_API_KEY
```

#### `help` - Show detailed help

```bash
incept-eval help
```

## Input Format

The input JSON file must contain:
- `request`: Question generation request metadata (grade, subject, instructions, etc.)
- `questions`: Array of 1-5 questions to evaluate

**Example:**

```json
{
  "request": {
    "grade": 3,
    "count": 2,
    "subject": "mathematics",
    "instructions": "Generate multiplication word problems that involve equal groups.",
    "language": "arabic"
  },
  "questions": [
    {
      "type": "mcq",
      "question": "إذا كان لديك 4 علب من القلم وكل علبة تحتوي على 7 أقلام، كم عدد الأقلام لديك إجمالاً؟",
      "answer": "28",
      "difficulty": "medium",
      "explanation": "استخدام ضرب لحساب مجموع الأقلام في جميع العلب.",
      "options": {
        "A": "21",
        "B": "32",
        "C": "35",
        "D": "28"
      },
      "answer_choice": "D",
      "detailed_explanation": { ... },
      "voiceover_script": { ... },
      "skill": null,
      "image_url": null,
      "di_formats_used": [ ... ]
    }
  ]
}
```

The Arabic question above reads: "If you have 4 boxes of pens and each box contains 7 pens, how many pens do you have in total?" Use `incept-eval example` to see a complete example with all fields.
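A pre-flight check for the structure described above, using `jq` (assumed installed); it only verifies that `request` exists and that `questions` holds 1-5 entries:

```bash
# validate top-level input structure before running an evaluation
jq -e '(.request != null) and ((.questions | length) >= 1 and (.questions | length) <= 5)' \
  questions.json >/dev/null && echo "input looks valid" || echo "invalid input structure"
```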

## Authentication

**Required API Keys:**

The tool requires API keys from OpenAI and Anthropic for running evaluations. Create a `.env` file in your working directory:

```bash
OPENAI_API_KEY=your_openai_api_key
ANTHROPIC_API_KEY=your_anthropic_api_key
HUGGINGFACE_TOKEN=your_hf_token  # Optional, for EduBench tasks
```

The tool will automatically load these from the `.env` file when you run evaluations.
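If you would rather not keep keys in a file, exporting them in the shell is a common alternative for `.env`-based tools. Whether incept-eval also reads pre-set environment variables is an assumption here, since only `.env` loading is documented:

```bash
# assumption: keys exported in the environment are picked up like .env entries
export OPENAI_API_KEY=your_openai_api_key
export ANTHROPIC_API_KEY=your_anthropic_api_key
incept-eval evaluate questions.json
```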

## Output Format

### Pretty Mode (default)

Shows only the scores:

```json
{
  "overall_scores": {
    "total_questions": 1.0,
    "v3_average": 0.9555555555555555,
    "answer_correctness_rate": 1.0,
    "total_edubench_tasks": 3.0
  },
  "v3_scores": [
    {
      "correctness": 1.0,
      "grade_alignment": 1.0,
      "difficulty_alignment": 1.0,
      "language_quality": 0.9,
      "pedagogical_value": 0.9,
      "explanation_quality": 0.8,
      "instruction_adherence": 1.0,
      "format_compliance": 1.0,
      "query_relevance": 1.0,
      "di_compliance": 0.9,
      "overall": 0.9555555555555555,
      "recommendation": "accept"
    }
  ],
  "answer_verification": [
    {
      "is_correct": true,
      "confidence": 10
    }
  ]
}
```
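Because pretty mode emits plain JSON, it is easy to gate on. A minimal sketch, assuming `jq` is available and using the `overall_scores.v3_average` field shown above:

```bash
# fail a CI step when the average v3 score drops below a chosen threshold (0.9 here)
incept-eval evaluate questions.json -o results.json
jq -e '.overall_scores.v3_average >= 0.9' results.json || exit 1
```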

### Full Mode (`--no-pretty`)

Includes all evaluation details:
- `overall_scores`: Aggregate metrics
- `v3_scores`: Per-question scaffolding scores
- `answer_verification`: Answer correctness checks
- `edubench_results`: Full task evaluation responses
- `summary`: Evaluation metadata and timing

## Command Reference

| Command | Description |
|---------|-------------|
| `evaluate` | Evaluate questions from JSON file |
| `example` | Generate sample input file |
| `configure` | Save API key to config file |
| `help` | Show detailed help and usage examples |

### Evaluate Options

| Option | Short | Description |
|--------|-------|-------------|
| `--output PATH` | `-o` | Save results to file (overwrites) |
| `--append PATH` | `-a` | Append results to file (creates if not exists) |
| `--api-key KEY` | `-k` | API key (or use INCEPT_API_KEY env var) |
| `--api-url URL` | | API endpoint (default: production) |
| `--pretty` | | Show only scores (default: true) |
| `--no-pretty` | | Show full results including EduBench details |
| `--verbose` | `-v` | Show progress messages |

## Examples

### Basic Evaluation

```bash
# Evaluate with default settings (pretty mode)
incept-eval evaluate questions.json --verbose
```

### Collecting Multiple Evaluations

```bash
# Append multiple evaluations to one file
incept-eval evaluate test1.json -a all_results.json
incept-eval evaluate test2.json -a all_results.json
incept-eval evaluate test3.json -a all_results.json

# Result: all_results.json contains an array of all 3 evaluations
```
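Since the appended file is a JSON array of evaluation objects, aggregate statistics fall out of a one-line filter (assuming `jq`):

```bash
# mean v3_average across every evaluation collected in the file
jq '[.[].overall_scores.v3_average] | add / length' all_results.json
```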

### Batch Processing

```bash
# Evaluate all files and append to one results file
for file in questions/*.json; do
  incept-eval evaluate "$file" -a batch_results.json --verbose
done
```
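A fail-tolerant variant of the same loop, assuming the CLI exits nonzero on failure (not confirmed in this README):

```bash
# keep going past bad inputs and record them for later inspection
for file in questions/*.json; do
  if ! incept-eval evaluate "$file" -a batch_results.json --verbose; then
    echo "$file" >> failed_inputs.txt
  fi
done
```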

### Local Development

```bash
# Test against local API server
incept-eval evaluate test.json --api-url http://localhost:8000 --verbose
```

### Full Results

```bash
# Get complete evaluation with EduBench details
incept-eval evaluate questions.json --no-pretty -o full_results.json
```
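The full-mode fields listed under Output Format can then be pulled out individually (assuming `jq`):

```bash
jq '.edubench_results' full_results.json  # full EduBench task responses
jq '.summary' full_results.json           # evaluation metadata and timing
```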

## Evaluation Modules

The tool evaluates questions using three main modules:

### V3 Evaluation
- Scaffolding quality assessment (detailed_explanation steps)
- Direct Instruction (DI) compliance checking
- Pedagogical structure validation
- Language quality scoring
- Grade and difficulty alignment

### Answer Verification
- GPT-4o powered correctness checking
- Mathematical accuracy validation
- Confidence scoring (0-10)
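Since each `answer_verification` entry carries `is_correct` and a 0-10 `confidence` (see the pretty-mode output above), flagging weak results from a saved results file is a short filter; the threshold of 7 below is an arbitrary illustration (assumes `jq`):

```bash
# list verification entries that are wrong or low-confidence
jq '.answer_verification | map(select((.is_correct | not) or .confidence < 7))' results.json
```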

### EduBench Tasks
- **QA**: Question Answering - Can the model answer the question?
- **EC**: Error Correction - Can the model identify and correct errors?
- **IP**: Instructional Planning - Can the model provide step-by-step solutions?

All modules run by default. Future versions will support configurable module selection.

## Requirements

- Python >= 3.11, < 3.14
- OpenAI and Anthropic API keys (`HUGGINGFACE_TOKEN` optional, for EduBench tasks)

## Support

- **Issues**: [GitHub Issues](https://github.com/incept-ai/incept-eval/issues)
- **Help**: Run `incept-eval help` for detailed documentation

## License

MIT License - see [LICENSE](LICENSE) file for details.

---

**Made by the Incept Team**
