# Incept Eval
**Local** CLI tool for evaluating educational questions with comprehensive AI-powered assessment. Evaluation runs on your machine using the compliance_math_evaluator, answer_verification, and reading_question_qc modules, plus EduBench tasks.
[![PyPI version](https://badge.fury.io/py/incept-eval.svg)](https://badge.fury.io/py/incept-eval)
[![Python versions](https://img.shields.io/pypi/pyversions/incept-eval.svg)](https://pypi.org/project/incept-eval/)
[![License: MIT](https://img.shields.io/badge/License-MIT-blue.svg)](https://opensource.org/licenses/MIT)
## Features
🎯 **Comprehensive Evaluation**
- **Internal Evaluator** - Scaffolding quality and DI compliance scoring
- **Answer Verification** - GPT-4o powered correctness checking
- **Reading Question QC** - MCQ distractor and question quality checks
- **EduBench Tasks** - Educational benchmarks (QA, EC, IP, AG, QG, TMG)
📊 **Flexible Output**
- Pretty mode for quick score viewing
- Full detailed results with all metrics
- Append mode for collecting multiple evaluations
- JSON output for easy integration
🚀 **Easy to Use**
- Simple CLI interface
- **Runs locally** - no Incept-hosted evaluation service required; the only external calls are to the OpenAI/Anthropic model providers
- Requires OpenAI and Anthropic API keys in a `.env` file
- Batch processing support
## Installation
```bash
pip install incept-eval
```
## Quick Start
### 1. Install
```bash
pip install incept-eval
```
### 2. Set up API Keys
Create a `.env` file in your project directory with:
```bash
OPENAI_API_KEY=your_openai_key
ANTHROPIC_API_KEY=your_anthropic_key
# Optional, for EduBench tasks
HUGGINGFACE_TOKEN=your_hf_token
```
### 3. Generate Sample File
```bash
incept-eval example
```
This creates `qs.json` with a complete example question.
### 4. Evaluate
```bash
incept-eval evaluate qs.json --verbose
```
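The pretty-mode output is JSON, so it pipes cleanly into standard tooling. A minimal sketch using `jq`, assuming the results are written to stdout (omit `--verbose` when piping, since its progress messages would interleave with the JSON):

```bash
# Show just the aggregate scores
incept-eval evaluate qs.json | jq '.overall_scores'
```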
## Usage
### Commands
#### `evaluate` - Evaluate questions from JSON file
```bash
# Basic evaluation (pretty mode by default)
incept-eval evaluate questions.json
# Verbose output with progress messages
incept-eval evaluate questions.json --verbose
# Save results to file (overwrite)
incept-eval evaluate questions.json -o results.json
# Append results to file (creates if not exists)
incept-eval evaluate questions.json -a all_evaluations.json --verbose
# Use local API server
incept-eval evaluate questions.json --api-url http://localhost:8000
# Full results without pretty formatting
incept-eval evaluate questions.json --no-pretty
```
#### `example` - Generate sample input file
```bash
# Generate qs.json (default)
incept-eval example
# Save to custom filename
incept-eval example -o sample.json
```
#### `configure` - Save API key
```bash
incept-eval configure YOUR_API_KEY
```
#### `help` - Show detailed help
```bash
incept-eval help
```
## Input Format
The input JSON file must contain:
- `request`: Question generation request metadata (grade, subject, instructions, etc.)
- `questions`: Array of 1-5 questions to evaluate
**Example:**
```json
{
  "request": {
    "grade": 3,
    "count": 2,
    "subject": "mathematics",
    "instructions": "Generate multiplication word problems that involve equal groups.",
    "language": "arabic"
  },
  "questions": [
    {
      "type": "mcq",
      "question": "إذا كان لديك 4 علب من القلم وكل علبة تحتوي على 7 أقلام، كم عدد الأقلام لديك إجمالاً؟",
      "answer": "28",
      "difficulty": "medium",
      "explanation": "استخدام ضرب لحساب مجموع الأقلام في جميع العلب.",
      "options": {
        "A": "21",
        "B": "32",
        "C": "35",
        "D": "28"
      },
      "answer_choice": "D",
      "detailed_explanation": { ... },
      "voiceover_script": { ... },
      "skill": null,
      "image_url": null,
      "di_formats_used": [ ... ]
    }
  ]
}
```
Use `incept-eval example` to see a complete example with all fields. (For reference, the Arabic sample question above reads: "If you have 4 boxes of pens and each box contains 7 pens, how many pens do you have in total?")
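Before spending model calls on a run, it can help to confirm the file parses as JSON at all. A quick check using only the Python standard library:

```bash
# Fail fast on malformed JSON, then evaluate
python -m json.tool qs.json > /dev/null && incept-eval evaluate qs.json
```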
## Authentication
**Required API Keys:**
The tool calls OpenAI and Anthropic models to run evaluations, so keys for both are required. Create a `.env` file in your working directory:
```bash
OPENAI_API_KEY=your_openai_api_key
ANTHROPIC_API_KEY=your_anthropic_api_key
# Optional, for EduBench tasks
HUGGINGFACE_TOKEN=your_hf_token
```
The tool will automatically load these from the `.env` file when you run evaluations.
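One way to create the file from a shell, shown here as a sketch (substitute your real keys; the quoted heredoc delimiter keeps any special characters in the keys literal):

```bash
# Write the .env file and restrict its permissions
cat > .env <<'EOF'
OPENAI_API_KEY=your_openai_api_key
ANTHROPIC_API_KEY=your_anthropic_api_key
# Optional, for EduBench tasks
HUGGINGFACE_TOKEN=your_hf_token
EOF
chmod 600 .env  # API keys should not be world-readable
```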
## Output Format
### Pretty Mode (default)
Shows only the scores:
```json
{
  "overall_scores": {
    "total_questions": 1.0,
    "v3_average": 0.9555555555555555,
    "answer_correctness_rate": 1.0,
    "total_edubench_tasks": 3.0
  },
  "v3_scores": [
    {
      "correctness": 1.0,
      "grade_alignment": 1.0,
      "difficulty_alignment": 1.0,
      "language_quality": 0.9,
      "pedagogical_value": 0.9,
      "explanation_quality": 0.8,
      "instruction_adherence": 1.0,
      "format_compliance": 1.0,
      "query_relevance": 1.0,
      "di_compliance": 0.9,
      "overall": 0.9555555555555555,
      "recommendation": "accept"
    }
  ],
  "answer_verification": [
    {
      "is_correct": true,
      "confidence": 10
    }
  ]
}
```
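Because this output is plain JSON, saved results are easy to query. For example, with results written via `-o results.json`:

```bash
# Per-question recommendations ("accept", etc.)
jq '.v3_scores[].recommendation' results.json

# Aggregate v3 average for the run
jq '.overall_scores.v3_average' results.json
```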
### Full Mode (`--no-pretty`)
Includes all evaluation details:
- `overall_scores`: Aggregate metrics
- `v3_scores`: Per-question scaffolding scores
- `answer_verification`: Answer correctness checks
- `edubench_results`: Full task evaluation responses
- `summary`: Evaluation metadata and timing
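To see which of these sections a given run actually produced, inspect the top-level keys of a saved full result (e.g. the `full_results.json` written in the Full Results example below):

```bash
# Top-level sections present in a full-mode result
jq 'keys' full_results.json
```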
## Command Reference
| Command | Description |
|---------|-------------|
| `evaluate` | Evaluate questions from JSON file |
| `example` | Generate sample input file |
| `configure` | Save API key to config file |
| `help` | Show detailed help and usage examples |
### Evaluate Options
| Option | Short | Description |
|--------|-------|-------------|
| `--output PATH` | `-o` | Save results to file (overwrites) |
| `--append PATH` | `-a` | Append results to file (creates if not exists) |
| `--api-key KEY` | `-k` | API key (or use INCEPT_API_KEY env var) |
| `--api-url URL` | | API endpoint (default: production) |
| `--pretty` | | Show only scores (default: true) |
| `--no-pretty` | | Show full results including EduBench details |
| `--verbose` | `-v` | Show progress messages |
## Examples
### Basic Evaluation
```bash
# Evaluate with default settings (pretty mode)
incept-eval evaluate questions.json --verbose
```
### Collecting Multiple Evaluations
```bash
# Append multiple evaluations to one file
incept-eval evaluate test1.json -a all_results.json
incept-eval evaluate test2.json -a all_results.json
incept-eval evaluate test3.json -a all_results.json
# Result: all_results.json contains an array of all 3 evaluations
```
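Because append mode writes a JSON array, aggregating across runs is a one-liner. A sketch with `jq`, assuming each element carries the `overall_scores` block shown under Output Format:

```bash
# Mean v3_average across all appended evaluations
jq '[.[].overall_scores.v3_average] | add / length' all_results.json
```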
### Batch Processing
```bash
# Evaluate all files and append to one results file
for file in questions/*.json; do
incept-eval evaluate "$file" -a batch_results.json --verbose
done
```
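The loop above keeps going after a failure but doesn't record which file failed. A small variation (plain bash, no extra CLI flags assumed) that logs failures:

```bash
# Log files that fail to evaluate while continuing with the rest
for file in questions/*.json; do
  incept-eval evaluate "$file" -a batch_results.json --verbose \
    || echo "failed: $file" >> failed.txt
done
```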
### Local Development
```bash
# Test against local API server
incept-eval evaluate test.json --api-url http://localhost:8000 --verbose
```
### Full Results
```bash
# Get complete evaluation with EduBench details
incept-eval evaluate questions.json --no-pretty -o full_results.json
```
## Evaluation Modules
The tool evaluates questions using three main modules:
### V3 Evaluation
- Scaffolding quality assessment (detailed_explanation steps)
- Direct Instruction (DI) compliance checking
- Pedagogical structure validation
- Language quality scoring
- Grade and difficulty alignment
### Answer Verification
- GPT-4o powered correctness checking
- Mathematical accuracy validation
- Confidence scoring (0-10)
### EduBench Tasks
- **QA**: Question Answering - Can the model answer the question?
- **EC**: Error Correction - Can the model identify and correct errors?
- **IP**: Instructional Planning - Can the model provide step-by-step solutions?
All modules run by default. Future versions will support configurable module selection.
## Requirements
- Python >= 3.11, < 3.14
- OpenAI and Anthropic API keys in a `.env` file (see [Authentication](#authentication))
- Incept API key (set with `incept-eval configure` or the `INCEPT_API_KEY` environment variable)
## Support
- **Issues**: [GitHub Issues](https://github.com/incept-ai/incept-eval/issues)
- **Help**: Run `incept-eval help` for detailed documentation
## License
MIT License - see [LICENSE](LICENSE) file for details.
---
**Made by the Incept Team**