rubric

Name: rubric
Version: 1.2.0
Summary: rubric
Upload time: 2025-10-27 19:51:09
Author: Gavin, Joey
Requires Python: >=3.11
Keywords: rubrics, grading, eval, rubric
<p align="center">
  <a href="https://pypi.org/project/rubric/">
    <img src="https://img.shields.io/pypi/v/rubric" alt="PyPI version" />
  </a>
  <a href="https://pypi.org/project/rubric/">
    <img src="https://img.shields.io/pypi/pyversions/rubric" alt="Python versions" />
  </a>
  <a href="https://github.com/The-LLM-Data-Company/rubric/blob/main/LICENSE">
    <img src="https://img.shields.io/badge/license-MIT-blue.svg" alt="License" />
  </a>
</p>

# Rubric

A Python library for LLM-based evaluation using weighted rubrics.

## Installation

```bash
uv add rubric
```
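
Or, since the package is published on PyPI, install it with pip:

```bash
pip install rubric
```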

## Usage

1. **Set up environment variables:**

```bash
export OPENAI_API_KEY=your_api_key_here
```

2. **Run the example below:**

```python
import asyncio
import os
from openai import AsyncOpenAI
from rubric import Rubric
from rubric.autograders import PerCriterionGrader

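# The grader calls this function with (system_prompt, user_prompt)
# and expects the model's response text back as a plain string.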
async def generate_with_async_openai(system_prompt: str, user_prompt: str) -> str:
    client = AsyncOpenAI(api_key=os.getenv("OPENAI_API_KEY"))
    response = await client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_prompt},
        ],
        max_tokens=400,
        temperature=0.0,
    )
    return response.choices[0].message.content or ""

async def main():
    rubric = Rubric.from_dict([
        {"weight": 10.0, "requirement": "States Q4 2023 base margin as 17.2%"},
        {"weight": 8.0, "requirement": "Explicitly uses Shapley attribution for decomposition"},
        {"weight": -15.0, "requirement": "Uses total deliveries instead of cash-only deliveries"}
    ])

    grader = PerCriterionGrader(
        generate_fn=generate_with_async_openai,
        # system_prompt="...",  # optional: replaces the specialized default grading prompt
    )

    result = await rubric.grade(
        to_grade="Your text to evaluate...",
        autograder=grader
    )

    print(f"Score: {result.score}/100")
    for criterion in result.report:
        print(f"  {criterion.verdict}: {criterion.requirement}")

asyncio.run(main())
```

## Autograder Strategies

- `PerCriterionGrader` - Evaluates each criterion with its own LLM call, issued in parallel
- `PerCriterionOneShotGrader` - Evaluates all criteria in a single LLM call
- `RubricAsJudgeGrader` - Holistic evaluation; the LLM returns the final score directly
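
All three graders live in `rubric.autograders`. Assuming they share the `generate_fn` constructor parameter shown in the usage example above (documented for `PerCriterionGrader`, assumed here for the other two), switching strategies is a one-line change:

```python
from rubric.autograders import (
    PerCriterionGrader,
    PerCriterionOneShotGrader,
    RubricAsJudgeGrader,
)

# One LLM call per criterion, run in parallel
parallel_grader = PerCriterionGrader(generate_fn=generate_with_async_openai)

# All criteria judged in a single LLM call
one_shot_grader = PerCriterionOneShotGrader(generate_fn=generate_with_async_openai)

# The LLM weighs the rubric as a whole and returns one 0-100 score
judge_grader = RubricAsJudgeGrader(generate_fn=generate_with_async_openai)

# Any of these can be passed as `autograder=` to `rubric.grade(...)`.
```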

### Default System Prompts

Each autograder uses a specialized system prompt optimized for its evaluation approach:

**PerCriterionGrader** - Detailed criterion-by-criterion evaluation with strict JSON formatting requirements. The prompt instructs the LLM to evaluate each criterion independently, handling both positive and negative criteria with specific response formats.

**PerCriterionOneShotGrader** - Streamlined prompt for evaluating all criteria in a single response. Focuses on providing verdicts (MET/UNMET) and explanations for each criterion in a structured JSON format.

**RubricAsJudgeGrader** - Holistic evaluation prompt that asks the LLM to consider the output as a whole and provide a single overall score from 0-100, taking into account the weights of all criteria.

You can view the complete default prompts in the source files:

- [`per_criterion_grader.py`](src/rubric/autograders/per_criterion_grader.py#L10-L55)
- [`per_criterion_one_shot_grader.py`](src/rubric/autograders/per_criterion_one_shot_grader.py#L11-L31)
- [`rubric_as_judge_grader.py`](src/rubric/autograders/rubric_as_judge_grader.py#L11-L22)

**Customizing System Prompts:** You can override the default system prompt by passing a `system_prompt` parameter to any autograder:

```python
grader = PerCriterionGrader(
    generate_fn=your_function,
    system_prompt="Your custom system prompt here"
)
```

## Customization

You can customize grading at multiple levels:

**1. Custom `generate_fn` (most common)**
Pass any function that takes `(system_prompt, user_prompt)` and returns a string. Use any LLM provider (OpenAI, Anthropic, local models, etc.):

```python
grader = PerCriterionGrader(generate_fn=your_custom_function)
```
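
For instance, here is a sketch of a `generate_fn` backed by Anthropic's `anthropic` SDK; the model name and token limit are illustrative placeholders, not recommendations:

```python
import os
from anthropic import AsyncAnthropic

async def generate_with_anthropic(system_prompt: str, user_prompt: str) -> str:
    # Same (system_prompt, user_prompt) -> str contract as the OpenAI example above
    client = AsyncAnthropic(api_key=os.getenv("ANTHROPIC_API_KEY"))
    response = await client.messages.create(
        model="claude-3-5-sonnet-latest",
        max_tokens=400,
        system=system_prompt,
        messages=[{"role": "user", "content": user_prompt}],
    )
    return response.content[0].text

grader = PerCriterionGrader(generate_fn=generate_with_anthropic)
```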

**2. Override specific methods**
Subclass any autograder and override one or more of the following (a sketch follows the list):

- `judge()` - Orchestrates LLM calls to evaluate criteria and parse responses into structured results
- `generate()` - Wraps your `generate_fn` to customize how prompts are sent to the LLM
- `aggregation()` - Transforms individual criterion results into a final score and optional report
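
A minimal subclassing sketch, assuming `aggregation()` receives the per-criterion results produced by `judge()` and returns the graded result object; the signature and attributes below are illustrative assumptions, not taken from the library:

```python
from rubric.autograders import PerCriterionGrader

class ClampedGrader(PerCriterionGrader):
    """Reuses the default judging, but post-processes the aggregated score."""

    def aggregation(self, criterion_results):  # signature assumed for illustration
        result = super().aggregation(criterion_results)
        # Hypothetical tweak: never report a score below zero, even when
        # negatively weighted criteria dominate.
        result.score = max(result.score, 0.0)
        return result
```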

**3. Full control**
Override the entire `grade()` method for complete end-to-end control over the grading process.

## Loading Rubrics

```python
# Direct construction
# (Rubric is imported as in the usage example; Criterion is assumed to be
#  importable from the same top-level package)
from rubric import Rubric, Criterion

rubric = Rubric([
    Criterion(weight=10.0, requirement="States Q4 2023 base margin as 17.2%"),
    Criterion(weight=8.0, requirement="Explicitly uses Shapley attribution for decomposition"),
    Criterion(weight=-15.0, requirement="Uses total deliveries instead of cash-only deliveries")
])

# From list of dictionaries
rubric = Rubric.from_dict([
    {"weight": 10.0, "requirement": "States Q4 2023 base margin as 17.2%"},
    {"weight": 8.0, "requirement": "Explicitly uses Shapley attribution for decomposition"},
    {"weight": -15.0, "requirement": "Uses total deliveries instead of cash-only deliveries"}
])

# From JSON string
rubric = Rubric.from_json('[{"weight": 10.0, "requirement": "Example requirement"}]')

# From YAML string
yaml_data = '''
- weight: 10.0
  requirement: "Example requirement"
'''
rubric = Rubric.from_yaml(yaml_data)

# From files
rubric = Rubric.from_file('rubric.json')
rubric = Rubric.from_file('rubric.yaml')
```

### JSON Format

```json
[
  {
    "weight": 10.0,
    "requirement": "States Q4 2023 base margin as 17.2%"
  },
  {
    "weight": 8.0,
    "requirement": "Explicitly uses Shapley attribution for decomposition"
  },
  {
    "weight": -15.0,
    "requirement": "Uses total deliveries instead of cash-only deliveries"
  }
]
```

### YAML Format

```yaml
- weight: 10.0
  requirement: "States Q4 2023 base margin as 17.2%"
- weight: 8.0
  requirement: "Explicitly uses Shapley attribution for decomposition"
- weight: -15.0
  requirement: "Uses total deliveries instead of cash-only deliveries"
```

## Requirements

- Python 3.11+
- Access to an LLM API (e.g., OpenAI, Anthropic, OpenRouter), with the appropriate API key set as an environment variable

## License

MIT License - see LICENSE file for details.

            
