# LLM Regression Tester
[PyPI](https://pypi.org/project/llm-regression-tester/)
[Python 3.8+](https://www.python.org/downloads/)
[License: MIT](https://opensource.org/licenses/MIT)
A Python library for testing LLM responses against predefined rubrics using OpenAI's API for automated scoring. Features easy assert methods for clean, readable tests. Simple, focused, and powerful.
## 🤔 Why This Library?
Typical LLM-as-a-judge approaches lack a sophisticated scoring mechanism: they can't weight different criteria by importance or apply negative marking, which makes accurate grading difficult. This library addresses that by implementing a **college-style rubric system with negative marking**, enabling precise and nuanced evaluation of LLM responses.
**Key Innovation:**
- **Weighted Scoring**: Different rubric criteria can have different point values based on importance
- **Negative Marking**: Incorrect or missing elements can deduct points, not just give zero
- **Flexible Rubrics**: Define custom evaluation criteria that reflect real-world grading standards
- **Accurate Assessment**: More precise scoring that mirrors human evaluation processes
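For example, a guideline can carry a negative `incorrect_score` so that violating a criterion subtracts points instead of merely adding nothing. A minimal sketch in the rubric format described below (the rubric name, criteria, and point values are illustrative, not a shipped rubric):
```json
{
  "name": "refund_policy_answer",
  "min_score_to_pass": 5,
  "guidelines": [
    {
      "id": "correct_policy",
      "description": "States the refund window correctly",
      "correct_score": 4,
      "incorrect_score": -2
    },
    {
      "id": "no_fabrication",
      "description": "Does not invent policy details",
      "correct_score": 3,
      "incorrect_score": -3
    }
  ]
}
```
Here a fabricated policy detail costs 3 points, so a response can fail the rubric even if it gets the other criterion right.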
## 🚀 Features
- **OpenAI Integration**: Seamless integration with OpenAI's API
- **Flexible Rubric System**: Define custom evaluation criteria and scoring rules
- **Automated Scoring**: AI-powered evaluation of responses against guidelines
- **Easy Assert Methods**: Simple `assert_pass()`, `assert_fail()`, and `assert_score()` methods for testing
- **Environment Variables**: Support for .env files and environment variables
- **Simple API**: Easy to use with minimal configuration
- **Type Hints**: Full type annotation support for better IDE experience
- **Comprehensive Testing**: Built-in test examples and utilities
## 📦 Installation
### Basic Installation
```bash
pip install llm-regression-tester
```
### Environment Variables
The library can load your API key from a `.env` file or from environment variables:
#### .env File Setup (Recommended)
```bash
# Create a .env file in your project root
echo "OPENAI_API_KEY=your-openai-api-key-here" > .env
# Edit the .env file with your actual API key
# OPENAI_API_KEY=sk-your-actual-api-key-here
```
**Important:** The library automatically loads `.env` files. Make sure `.env` is in your `.gitignore` to keep your keys secure.
## 🔧 Quick Start
### 1. Create a Rubric File
Create a JSON file defining your evaluation criteria:
```json
[
  {
    "name": "customer_support_response",
    "min_score_to_pass": 7,
    "guidelines": [
      {
        "id": "polite",
        "description": "Response is polite and professional",
        "correct_score": 3,
        "incorrect_score": 0
      },
      {
        "id": "accurate",
        "description": "Response provides accurate information",
        "correct_score": 2,
        "incorrect_score": 0
      },
      {
        "id": "helpful",
        "description": "Response offers specific help or next steps",
        "correct_score": 2,
        "incorrect_score": 0
      }
    ]
  }
]
```
### 2. Basic Usage
```python
from llm_regression_tester import LLMRegressionTester
# Option 1: Initialize with API key parameter
tester = LLMRegressionTester(
    rubric_file_path="rubrics.json",
    openai_api_key="your-openai-api-key"
)

# Option 2: Initialize with .env file (recommended for security)
# Create a .env file with: OPENAI_API_KEY=your-actual-api-key
tester = LLMRegressionTester(
    rubric_file_path="rubrics.json"
    # API key will be automatically loaded from .env file
)
# Test a response
result = tester.test_response("customer_support_response", "Thank you for your question...")
print(f"Score: {result['total_score']}/{result['min_score_to_pass']}")
print(f"Pass: {result['pass_status']}")
# Or use easy assert methods for testing
tester.assert_pass("customer_support_response", "Thank you for your question...")
tester.assert_fail("customer_support_response", "This is a terrible response.")
tester.assert_score("customer_support_response", "Good response", 7)
```
### 3. Easy Assert Methods for Testing
The library provides simple assert methods that make testing LLM responses intuitive and readable:
```python
from llm_regression_tester import LLMRegressionTester
tester = LLMRegressionTester("rubrics.json")
# Assert that a response passes the rubric
tester.assert_pass("customer_support", good_response)
# Assert that a response fails the rubric
tester.assert_fail("customer_support", bad_response)
# Assert a specific score
tester.assert_score("customer_support", response, 8)
# Custom error messages
tester.assert_pass("rubric", response, "Professional response should pass quality check")
```
**Benefits:**
- **Clean Syntax**: One-line assertions instead of multiple lines of result checking
- **Clear Errors**: Helpful error messages showing exactly what failed and why
- **Flexible**: Optional custom messages for better test documentation
- **Powerful**: Supports pass/fail/score assertions for comprehensive testing
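Because each assert method raises `AssertionError` on failure, they drop straight into pytest. A sketch, assuming a `rubrics.json` containing the `customer_support_response` rubric from step 1 (the fixture and response strings are illustrative):
```python
import pytest
from llm_regression_tester import LLMRegressionTester


@pytest.fixture(scope="module")
def tester():
    # Reuse one tester (and its loaded rubrics) across the module's tests.
    return LLMRegressionTester("rubrics.json")


def test_polite_answer_passes(tester):
    response = (
        "Thanks for reaching out! Your order shipped yesterday and should "
        "arrive within 3-5 business days. Is there anything else I can help with?"
    )
    tester.assert_pass("customer_support_response", response)


def test_dismissive_answer_fails(tester):
    tester.assert_fail(
        "customer_support_response",
        "Not my problem, check the website.",
        "A dismissive reply should not meet the support rubric",
    )
```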
### 4. Using .env Files
```python
from llm_regression_tester import LLMRegressionTester
# Create a .env file with your API key:
# OPENAI_API_KEY=your-actual-api-key
# Initialize without API key parameter
tester = LLMRegressionTester("rubrics.json")
# API key will be automatically loaded from .env file
result = tester.test_response("customer_support_response", "Hello, how can I help?")
print(f"Score: {result['total_score']}/{result['min_score_to_pass']}")
```
### 5. Test Examples
The library includes comprehensive test examples showing how to use the assert methods in practice:
```bash
# Run the test examples
python test_examples.py
```
This demonstrates:
- Customer service response quality testing
- Code review evaluation
- Content moderation
- Practical usage patterns with the assert methods
### Running Tests
```bash
# Run all tests
pytest
# Run specific test examples
pytest tests/test_basic.py::test_assert_pass_method -v
# Run the example demonstration
python test_examples.py
```
## 📋 API Reference
### LLMRegressionTester
#### Constructor
```python
LLMRegressionTester(
    rubric_file_path: str,
    openai_api_key: Optional[str] = None,
    openai_model: str = "gpt-4o-mini"
)
```
**Parameters:**
- `rubric_file_path`: Path to JSON file containing rubrics
- `openai_api_key`: OpenAI API key. If `None`, the key is read from the `OPENAI_API_KEY` environment variable (a `.env` file is loaded automatically)
- `openai_model`: OpenAI model to use (default: `gpt-4o-mini`)
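For example, pointing the judge at a different model is just a constructor argument (the model name below is only an example value):
```python
from llm_regression_tester import LLMRegressionTester

# Use a larger judge model for higher-stakes rubrics; any OpenAI chat model
# available to your account should work here (example value).
tester = LLMRegressionTester(
    rubric_file_path="rubrics.json",
    openai_model="gpt-4o",
)
```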
#### Methods
##### `test_response(name: str, response: str) -> Dict[str, Any]`
Test a response against a specific rubric.
**Returns:**
```python
{
    "total_score": int,
    "pass_status": bool,
    "min_score_to_pass": int,
    "details": [
        {
            "id": str,
            "description": str,
            "meets": bool,
            "score": int
        }
    ]
}
```
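The `details` list makes it easy to report which criteria a response missed. A small sketch using the documented return shape (the response text is illustrative):
```python
from llm_regression_tester import LLMRegressionTester

tester = LLMRegressionTester("rubrics.json")
result = tester.test_response(
    "customer_support_response",
    "Thanks for writing in! Your refund was issued today and should post in 3-5 days.",
)

# Report every criterion that was not met, with its score contribution.
for item in result["details"]:
    if not item["meets"]:
        print(f"MISSED {item['id']}: {item['description']} (score {item['score']})")

print(f"Total: {result['total_score']} (need {result['min_score_to_pass']} to pass)")
```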
##### `get_available_rubrics() -> List[str]`
Get list of available rubric names.
##### `get_rubric_details(name: str) -> Optional[Dict[str, Any]]`
Get details of a specific rubric.
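These are handy for sanity-checking what a rubric file actually loaded. A sketch, continuing with the tester above and assuming the returned dict mirrors the rubric JSON shown in Quick Start:
```python
# List every rubric the file defined, then summarize each one.
for name in tester.get_available_rubrics():
    rubric = tester.get_rubric_details(name)
    if rubric is not None:
        print(f"{name}: {len(rubric['guidelines'])} guidelines, "
              f"{rubric['min_score_to_pass']} points to pass")
```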
##### `assert_pass(rubric_name: str, response: str, message: str = None) -> None`
Assert that a response passes the specified rubric test. Raises AssertionError if the test fails.
##### `assert_fail(rubric_name: str, response: str, message: str = None) -> None`
Assert that a response fails the specified rubric test. Raises AssertionError if the test passes when it should fail.
##### `assert_score(rubric_name: str, response: str, expected_score: int, message: str = None) -> None`
Assert that a response achieves a specific score. Raises AssertionError if the score doesn't match.
## 🏗️ Architecture
The library has a simple, focused architecture:
```
LLMRegressionTester
├── Rubric Management (JSON file loading and validation)
├── OpenAI Integration (direct API calls)
├── Response Evaluation (automated scoring)
└── Assert Methods (easy testing with assert_pass/fail/score)
```
**Key Components:**
- **Rubric System**: JSON-based evaluation criteria
- **OpenAI Client**: Direct integration with OpenAI's API
- **Assert Methods**: Simple `assert_pass()`, `assert_fail()`, and `assert_score()` methods
- **Environment Support**: Automatic loading from .env files and environment variables
- **Error Handling**: Comprehensive validation and error reporting
## 📝 Rubric Format
Rubrics are defined in JSON format:
```json
{
  "name": "rubric_name",
  "min_score_to_pass": 7,
  "guidelines": [
    {
      "id": "unique_id",
      "description": "Description of the criterion",
      "correct_score": 2,
      "incorrect_score": 0
    }
  ]
}
```
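The arithmetic this format implies: each met guideline contributes its `correct_score`, each unmet one contributes its `incorrect_score` (which may be negative), and the total is compared against `min_score_to_pass`. A minimal sketch of that rule for illustration only (not the library's internal code):
```python
def score_rubric(rubric, met_ids):
    """Sum guideline scores the way the rubric format implies."""
    total = sum(
        g["correct_score"] if g["id"] in met_ids else g["incorrect_score"]
        for g in rubric["guidelines"]
    )
    return total, total >= rubric["min_score_to_pass"]

# e.g. with the customer_support_response rubric above:
# met_ids = {"polite", "helpful"} -> 3 + 2 = 5, below the threshold of 7, so it fails.
```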
## 🔐 Environment Variables
The library supports the following environment variables for API keys:
- `OPENAI_API_KEY`: OpenAI API key for GPT models
**Security Best Practices:**
- Never commit API keys to version control
- Use environment variables or secure credential management
- Rotate keys regularly
- Use different keys for development and production
## 🤝 Contributing
We welcome contributions! Please see our [Contributing Guide](CONTRIBUTING.md) for details.
### Development Guidelines
1. Follow the existing code style and patterns
2. Add comprehensive tests for new features
3. Update documentation for any changes
4. Ensure backward compatibility when possible
5. Test with different rubric configurations
## 📄 License
This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.
---
**Happy Testing!** 🎉