LLMEvaluationFramework

Name: LLMEvaluationFramework
Version: 0.0.21 (PyPI)
Home page: https://github.com/isathish/LLMEvaluationFramework
Summary: Enterprise-Grade Python Framework for Large Language Model Evaluation & Testing
Upload time: 2025-10-12 08:37:49
Author: Sathishkumar Nagarajan
Requires Python: >=3.8
License: MIT
Keywords: llm, evaluation, testing, benchmarking, ai, ml, machine-learning, natural-language-processing, nlp, language-models, openai, gpt, anthropic, claude
Requirements: typing-extensions>=4.0.0, pytest>=7.0.0, pytest-cov>=4.0.0, black>=22.0.0, flake8>=5.0.0, mypy>=1.0.0, mkdocs>=1.4.0, mkdocs-material>=8.0.0
CI: GitHub Actions (no Travis CI, no Coveralls)
# 🚀 LLM Evaluation Framework

<div align="center">

[![License](https://img.shields.io/github/license/isathish/LLMEvaluationFramework?style=for-the-badge&color=blue)](LICENSE)
[![Tests](https://img.shields.io/badge/Tests-212%20Passed-success?style=for-the-badge&logo=pytest)](https://github.com/isathish/LLMEvaluationFramework)
[![Coverage](https://img.shields.io/badge/Coverage-89%25-success?style=for-the-badge&logo=codecov)](https://github.com/isathish/LLMEvaluationFramework)
[![Python](https://img.shields.io/badge/Python-3.8%2B-blue?style=for-the-badge&logo=python)](https://python.org)
[![Documentation](https://img.shields.io/badge/Docs-MkDocs-blue?style=for-the-badge&logo=gitbook)](https://isathish.github.io/LLMEvaluationFramework/)

**🌟 Enterprise-Grade Python Framework for Large Language Model Evaluation & Testing 🌟**

*Built with production-ready standards • Type-safe • Comprehensive testing • Full CLI support*

[📚 **Documentation**](https://isathish.github.io/LLMEvaluationFramework/) • [**Quick Start**](#-quick-start) • [💡 **Examples**](examples/) • [🐛 **Report Issues**](https://github.com/isathish/LLMEvaluationFramework/issues)

</div>

---

## 🌟 What Makes This Special?

<table>
<tr>
<td width="50%">

### 🎯 **Production Ready**
- **212 comprehensive tests** with **89% coverage**
- Complete type hints throughout codebase
- Robust error handling with custom exceptions
- Enterprise-grade logging and monitoring

### ⚡ **High Performance**
- Async inference engine for concurrent evaluations
- Batch processing capabilities
- Cost optimization and tracking
- Memory-efficient data handling

</td>
<td width="50%">

### **Developer Friendly**
- Intuitive CLI interface for all operations
- Comprehensive documentation with examples
- Modular architecture for easy extension
- Multiple storage backends (JSON, SQLite)

### 📊 **Rich Analytics**
- Multiple scoring strategies (Accuracy, F1, Custom)
- Detailed performance metrics
- Cost analysis and optimization
- Exportable evaluation reports

</td>
</tr>
</table>

---

## Quick Installation

```bash
# Install from PyPI (Recommended)
pip install LLMEvaluationFramework

# Or install from source for latest features
git clone https://github.com/isathish/LLMEvaluationFramework.git
cd LLMEvaluationFramework
pip install -e .
```

**Requirements**: Python 3.8+ • No external dependencies for core functionality

---

## ⚡ Quick Start

### 🐍 Python API (Recommended)

```python
from llm_evaluation_framework import (
    ModelRegistry, 
    ModelInferenceEngine, 
    TestDatasetGenerator
)

# 1️⃣ Set up the registry and register your model
registry = ModelRegistry()
registry.register_model("gpt-3.5-turbo", {
    "provider": "openai",
    "api_cost_input": 0.0015,
    "api_cost_output": 0.002,
    "capabilities": ["reasoning", "creativity", "coding"]
})

# 2️⃣ Generate test cases
generator = TestDatasetGenerator()
test_cases = generator.generate_test_cases(
    use_case={"domain": "general", "required_capabilities": ["reasoning"]},
    count=10
)

# 3️⃣ Run evaluation
engine = ModelInferenceEngine(registry)
results = engine.evaluate_model("gpt-3.5-turbo", test_cases)

# 4️⃣ Analyze results
print(f"✅ Accuracy: {results['aggregate_metrics']['accuracy']:.1%}")
print(f"💰 Total Cost: ${results['aggregate_metrics']['total_cost']:.4f}")
print(f"⏱️ Total Time: {results['aggregate_metrics']['total_time']:.2f}s")
```

### 🖥️ Command Line Interface

```bash
# Evaluate a model with specific capabilities
llm-eval evaluate --model gpt-3.5-turbo --test-cases 10 --capability reasoning

# Generate a custom test dataset
llm-eval generate --capability coding --count 20 --output my_dataset.json

# Score predictions against references
llm-eval score --predictions "Hello world" "Good morning" \
               --references "Hello world" "Good evening" \
               --metric accuracy

# List available capabilities and models
llm-eval list
```

---

## Core Architecture

<div align="center">

```mermaid
graph TB
    CLI[🖥️ CLI Interface<br/>llm-eval] --> Engine[⚙️ Inference Engine<br/>ModelInferenceEngine]

    Engine --> Registry[🗄️ Model Registry<br/>ModelRegistry]
    Engine --> Generator[🧪 Dataset Generator<br/>TestDatasetGenerator]
    Engine --> Scoring[📊 Scoring Strategies<br/>AccuracyScoringStrategy]

    Registry --> Models[(🤖 Models<br/>gpt-3.5-turbo, gpt-4, etc.)]

    Engine --> Storage[💾 Persistence Layer]
    Storage --> JSON[📄 JSON Store]
    Storage --> SQLite[🗃️ SQLite Store]

    Engine --> Utils[🛠️ Utilities]
    Utils --> Logger[📝 Advanced Logging]
    Utils --> ErrorHandler[🛡️ Error Handling]
    Utils --> AutoSuggest[💡 Auto Suggestions]
```

</div>

### 🎯 Core Components

| Component | Description | Key Features |
|-----------|-------------|--------------|
| **🔥 Inference Engine** | Execute and evaluate LLM inferences | Async processing, cost tracking, batch operations |
| **🗄️ Model Registry** | Centralized model management | Multi-provider support, configuration management |
| **🧪 Dataset Generator** | Create synthetic test cases | Capability-based generation, domain-specific tests |
| **📊 Scoring Strategies** | Multiple evaluation metrics | Accuracy, F1-score, custom metrics |
| **💾 Persistence Layer** | Dual storage backends | JSON files, SQLite database with querying (see the sketch below) |
| **🛡️ Error Handling** | Robust error management | Custom exceptions, retry mechanisms |
| **📝 Logging System** | Advanced logging capabilities | File rotation, structured logging |

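The persistence row above covers two backends: the `JSONStore` used in the batch example further down, and an SQLite store. A minimal sketch of saving a result, assuming the SQLite backend mirrors the `JSONStore` interface (the `SQLiteStore` name and the `query_results` call are illustrative assumptions, not confirmed API):

```python
from llm_evaluation_framework.persistence import JSONStore  # used in the batch example below
# from llm_evaluation_framework.persistence import SQLiteStore  # assumed name, not confirmed

# Placeholder dict standing in for the output of evaluate_model().
result = {"model": "gpt-3.5-turbo", "aggregate_metrics": {"accuracy": 0.90}}

store = JSONStore("results_gpt-3.5-turbo.json")
store.save_evaluation_result(result)

# Hypothetical SQLite equivalent with the same call shape:
# store = SQLiteStore("results.db")
# store.save_evaluation_result(result)
# rows = store.query_results(model="gpt-3.5-turbo")
```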
---

## 🎯 Feature Highlights

### 🚀 **What You Can Do**

<table>
<tr>
<td width="33%">

#### 🔬 **Research & Benchmarking**
- Compare multiple LLM providers
- Standardized evaluation metrics  
- Reproducible experiments
- Performance benchmarking

</td>
<td width="33%">

#### 🏢 **Enterprise Integration**
- CI/CD pipeline integration
- Automated regression testing (see the sketch after this table)
- Cost optimization analysis
- Quality assurance workflows

</td>
<td width="33%">

#### 💰 **Cost Management**
- Real-time cost tracking
- Provider cost comparison
- Budget optimization
- ROI analysis

</td>
</tr>
</table>

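For the CI/CD and regression-testing workflows above, one common pattern is to gate the pipeline on an evaluation score. A minimal pytest sketch using only the `ModelRegistry` / `TestDatasetGenerator` / `ModelInferenceEngine` calls shown in the Quick Start; the model config and the 0.80 threshold are illustrative placeholders:

```python
# test_regression.py — hedged sketch of gating CI on evaluation accuracy.
from llm_evaluation_framework import (
    ModelRegistry,
    ModelInferenceEngine,
    TestDatasetGenerator,
)

ACCURACY_THRESHOLD = 0.80  # placeholder quality gate


def test_reasoning_accuracy_does_not_regress():
    registry = ModelRegistry()
    registry.register_model("gpt-3.5-turbo", {
        "provider": "openai",
        "api_cost_input": 0.0015,
        "api_cost_output": 0.002,
        "capabilities": ["reasoning"],
    })

    test_cases = TestDatasetGenerator().generate_test_cases(
        use_case={"domain": "general", "required_capabilities": ["reasoning"]},
        count=10,
    )

    results = ModelInferenceEngine(registry).evaluate_model("gpt-3.5-turbo", test_cases)
    accuracy = results["aggregate_metrics"]["accuracy"]

    # Fail the CI job if quality drops below the agreed floor.
    assert accuracy >= ACCURACY_THRESHOLD, f"accuracy regressed to {accuracy:.1%}"
```
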
### 📊 **Supported Capabilities**

```python
# Available evaluation capabilities
CAPABILITIES = [
    "reasoning",      # Logical reasoning and problem-solving
    "creativity",     # Creative writing and ideation
    "factual",        # Factual accuracy and knowledge
    "instruction",    # Instruction following
    "coding"          # Code generation and debugging
]
```
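
Each capability string plugs into the `required_capabilities` field of a use case, exactly as in the Quick Start. A minimal sketch (the domain value and count are illustrative):

```python
from llm_evaluation_framework import TestDatasetGenerator

# Build a small coding-focused dataset from one of the capability strings above.
generator = TestDatasetGenerator()
coding_cases = generator.generate_test_cases(
    use_case={"domain": "general", "required_capabilities": ["coding"]},
    count=20,
)
print(f"Generated {len(coding_cases)} coding test cases")
```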

### 🎮 **Interactive Examples**

<details>
<summary>🔍 <strong>Click to see Advanced Usage Examples</strong></summary>

#### 📈 **Batch Evaluation with Multiple Models**

```python
from llm_evaluation_framework import ModelRegistry, ModelInferenceEngine
from llm_evaluation_framework.persistence import JSONStore

# Setup multiple models
registry = ModelRegistry()
models = {
    "gpt-3.5-turbo": {"provider": "openai", "cost_input": 0.0015},
    "gpt-4": {"provider": "openai", "cost_input": 0.03},
    "claude-3": {"provider": "anthropic", "cost_input": 0.015}
}

for name, config in models.items():
    registry.register_model(name, config)

# Run comparative evaluation
engine = ModelInferenceEngine(registry)
results = {}

for model_name in models.keys():
    print(f"🚀 Evaluating {model_name}...")
    result = engine.evaluate_model(model_name, test_cases)
    results[model_name] = result
    
    # Save results
    store = JSONStore(f"results_{model_name}.json")
    store.save_evaluation_result(result)

# Compare results
for model, result in results.items():
    accuracy = result['aggregate_metrics']['accuracy']
    cost = result['aggregate_metrics']['total_cost']
    print(f"📊 {model}: {accuracy:.1%} accuracy, ${cost:.4f} cost")
```

#### 🎯 **Custom Scoring Strategy**

```python
from llm_evaluation_framework.evaluation.scoring_strategies import ScoringContext

class CustomCosineSimilarityStrategy:
    """Custom scoring using cosine similarity."""
    
    def calculate_score(self, predictions, references):
        # Your custom scoring logic here
        from sklearn.metrics.pairwise import cosine_similarity
        from sklearn.feature_extraction.text import TfidfVectorizer
        
        vectorizer = TfidfVectorizer()
        vectors = vectorizer.fit_transform(predictions + references)
        
        pred_vectors = vectors[:len(predictions)]
        ref_vectors = vectors[len(predictions):]
        
        similarities = cosine_similarity(pred_vectors, ref_vectors)
        return similarities.diagonal().mean()

# Use custom strategy
custom_strategy = CustomCosineSimilarityStrategy()
context = ScoringContext(custom_strategy)
score = context.evaluate(predictions, references)
print(f"🎯 Custom similarity score: {score:.3f}")
```
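
#### 📊 **Built-in Scoring Strategies (sketch)**

The same `ScoringContext` drives the built-in strategies. A minimal sketch, assuming `AccuracyScoringStrategy` (named in the architecture diagram) is importable from the same `scoring_strategies` module; that import location is an assumption, not confirmed API:

```python
from llm_evaluation_framework.evaluation.scoring_strategies import (
    ScoringContext,
    AccuracyScoringStrategy,  # assumed import location; class named in the architecture diagram
)

predictions = ["Hello world", "Good morning"]
references = ["Hello world", "Good evening"]

# Exact-match accuracy via the built-in strategy, mirroring the CLI's --metric accuracy.
context = ScoringContext(AccuracyScoringStrategy())
print(f"✅ Accuracy: {context.evaluate(predictions, references):.1%}")
```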

#### 🔄 **Async Evaluation Pipeline**

```python
import asyncio
from llm_evaluation_framework.engines.async_inference_engine import AsyncInferenceEngine

async def run_async_evaluation():
    """Run multiple evaluations concurrently."""
    
    async_engine = AsyncInferenceEngine(registry)
    
    # Define multiple evaluation tasks
    tasks = []
    for capability in ["reasoning", "creativity", "coding"]:
        task = async_engine.evaluate_async(
            model_name="gpt-3.5-turbo",
            test_cases=test_cases,
            capability=capability
        )
        tasks.append(task)
    
    # Run all evaluations concurrently
    results = await asyncio.gather(*tasks)
    
    # Process results
    for i, result in enumerate(results):
        capability = ["reasoning", "creativity", "coding"][i]
        accuracy = result['aggregate_metrics']['accuracy']
        print(f"✅ {capability}: {accuracy:.1%}")

# Run async evaluation
asyncio.run(run_async_evaluation())
```
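
#### 🛡️ **Resilient Evaluation with Retries (sketch)**

The framework advertises custom exceptions and retry mechanisms, but their names are not shown in this README, so this hedged sketch wraps `evaluate_model` in a generic retry loop with exponential backoff; swap in the framework's own exception types where indicated:

```python
import time

def evaluate_with_retries(engine, model_name, test_cases, attempts=3, base_delay=1.0):
    """Retry a flaky evaluation with exponential backoff (illustrative helper)."""
    for attempt in range(1, attempts + 1):
        try:
            return engine.evaluate_model(model_name, test_cases)
        except Exception as exc:  # replace with the framework's specific exception types
            if attempt == attempts:
                raise
            delay = base_delay * 2 ** (attempt - 1)
            print(f"⚠️ Attempt {attempt} failed ({exc!r}); retrying in {delay:.0f}s")
            time.sleep(delay)

# results = evaluate_with_retries(engine, "gpt-3.5-turbo", test_cases)
```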

</details>

---

## 📚 Documentation & Resources

<div align="center">

### 📖 **Comprehensive Documentation Available**

[![Documentation](https://img.shields.io/badge/Read%20the%20Docs-blue?style=for-the-badge&logo=gitbook)](https://isathish.github.io/LLMEvaluationFramework/)

</div>

| Section | Description | Link |
|---------|-------------|------|
| 🚀 **Getting Started** | Installation, quick start, and basic concepts | [View Guide](https://isathish.github.io/LLMEvaluationFramework/categories/getting-started/) |
| 🧠 **Core Concepts** | Understanding the framework architecture | [Learn More](https://isathish.github.io/LLMEvaluationFramework/categories/core-concepts/) |
| 🖥️ **CLI Usage** | Complete command-line interface documentation | [CLI Guide](https://isathish.github.io/LLMEvaluationFramework/categories/cli-usage/) |
| 📊 **API Reference** | Detailed API documentation with examples | [API Docs](https://isathish.github.io/LLMEvaluationFramework/categories/api-reference/) |
| 💡 **Examples** | Practical examples and tutorials | [View Examples](https://isathish.github.io/LLMEvaluationFramework/categories/examples/) |
| 🛠️ **Developer Guide** | Contributing guidelines and development setup | [Dev Guide](https://isathish.github.io/LLMEvaluationFramework/developer-guide/) |

---

## 🧪 Testing & Quality

<div align="center">

### 🏆 **High-Quality Codebase with Comprehensive Testing**

</div>

<table>
<tr>
<td width="25%" align="center">

**📈 Test Coverage**
<br>
<strong style="font-size: 2em; color: #28a745;">89%</strong>
<br>
<em>Comprehensive test coverage</em>

</td>
<td width="25%" align="center">

**✅ Total Tests**
<br>
<strong style="font-size: 2em; color: #007bff;">212</strong>
<br>
<em>All tests passing</em>

</td>
<td width="25%" align="center">

**🔧 Test Files**
<br>
<strong style="font-size: 2em; color: #6f42c1;">10+</strong>
<br>
<em>Modular test structure</em>

</td>
<td width="25%" align="center">

**⚡ Test Types**
<br>
<strong style="font-size: 2em; color: #fd7e14;">4+</strong>
<br>
<em>Unit, Integration, Edge Case, Performance</em>

</td>
</tr>
</table>

### 🚀 **Run Tests Locally**

```bash
# Run all tests
pytest

# Run with detailed coverage report
pytest --cov=llm_evaluation_framework --cov-report=html

# Run specific test categories
pytest tests/test_model_inference_engine_comprehensive.py  # Core engine tests
pytest tests/test_cli_comprehensive.py                     # CLI tests
pytest tests/test_persistence_comprehensive.py            # Storage tests

# View coverage report
open htmlcov/index.html
```

### 📊 **Test Categories**

| Test Type | Count | Description |
|-----------|-------|-------------|
| **🔧 Unit Tests** | 150+ | Individual component testing |
| **🔗 Integration Tests** | 40+ | Component interaction testing |
| **🎯 Edge Case Tests** | 20+ | Error conditions and boundaries |
| **⚡ Performance Tests** | 10+ | Speed and memory optimization |

---

## 🤝 Contributing

<div align="center">

### 🌟 **We Welcome Contributors!**

[![Contributors](https://img.shields.io/github/contributors/isathish/LLMEvaluationFramework?style=for-the-badge)](https://github.com/isathish/LLMEvaluationFramework/graphs/contributors)
[![Issues](https://img.shields.io/github/issues/isathish/LLMEvaluationFramework?style=for-the-badge)](https://github.com/isathish/LLMEvaluationFramework/issues)
[![Pull Requests](https://img.shields.io/github/issues-pr/isathish/LLMEvaluationFramework?style=for-the-badge)](https://github.com/isathish/LLMEvaluationFramework/pulls)

</div>

### 🛠️ **Development Setup**

```bash
# 1️⃣ Fork and clone the repository
git clone https://github.com/YOUR_USERNAME/LLMEvaluationFramework.git
cd LLMEvaluationFramework

# 2️⃣ Create a virtual environment (recommended)
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# 3️⃣ Install in development mode
pip install -e ".[dev]"

# 4️⃣ Run tests to ensure everything works
pytest

# 5️⃣ Install pre-commit hooks (optional but recommended)
pre-commit install
```

### 📝 **Contribution Guidelines**

1. **🍴 Fork** the repository
2. **🌿 Create** a feature branch (`git checkout -b feature/amazing-feature`)
3. **✅ Write** tests for your changes
4. **🧪 Run** the test suite (`pytest`)
5. **📝 Commit** your changes (`git commit -m 'Add amazing feature'`)
6. **🚀 Push** to the branch (`git push origin feature/amazing-feature`)
7. **🔀 Open** a Pull Request

### 🎯 **What We're Looking For**

- 🐛 Bug fixes and improvements
- 📚 Documentation enhancements
- ✨ New features and capabilities
- 🧪 Additional test cases
- 🎨 UI/UX improvements for CLI
- 🔧 Performance optimizations

---

## 📋 Requirements & Compatibility

### 🐍 **Python Version Support**

| Python Version | Status | Notes |
|----------------|--------|-------|
| **Python 3.8** | ✅ Supported | Minimum required version |
| **Python 3.9** | ✅ Supported | Fully tested |
| **Python 3.10** | ✅ Supported | Recommended |
| **Python 3.11** | ✅ Supported | Latest features |
| **Python 3.12+** | ✅ Supported | Future-ready |

### 📦 **Dependencies**

```python
# Core dependencies (automatically installed)
REQUIRED = [
    # No external dependencies for core functionality!
    # Framework uses only Python standard library
]

# Optional development dependencies
DEVELOPMENT = [
    "pytest>=7.0.0",           # Testing framework
    "pytest-cov>=4.0.0",      # Coverage reporting
    "black>=22.0.0",           # Code formatting
    "flake8>=5.0.0",           # Code linting
    "mypy>=1.0.0",             # Type checking
    "pre-commit>=2.20.0",      # Git hooks
]
```

### 🌐 **Platform Support**

- ✅ **Linux** (Ubuntu, CentOS, RHEL)
- ✅ **macOS** (Intel & Apple Silicon)
- ✅ **Windows** (10, 11)
- ✅ **Docker** containers
- ✅ **CI/CD** environments (GitHub Actions, Jenkins, etc.)

---

## 📄 License

<div align="center">

[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg?style=for-the-badge)](https://opensource.org/licenses/MIT)

**This project is licensed under the MIT License**

*You are free to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the software.*

[📜 **Read the full license**](LICENSE)

</div>

---

## 🙏 Acknowledgments & Credits

<div align="center">

### 🌟 **Built with Love and Open Source**

</div>

- **🚀 Inspiration**: Born from the need for standardized, reliable LLM evaluation tools
- **🏗️ Architecture**: Built with modern Python best practices and enterprise standards
- **🧪 Testing**: Comprehensive test coverage ensuring production reliability
- **👥 Community**: Driven by developers, researchers, and AI practitioners
- **📚 Documentation**: Extensive documentation for developers at all levels

### 🔧 **Technology Stack**

| Technology | Purpose | Why We Chose It |
|------------|---------|----------------|
| **🐍 Python 3.8+** | Core Language | Wide adoption, excellent ecosystem |
| **📋 Type Hints** | Code Safety | Better IDE support, fewer runtime errors |
| **🧪 Pytest** | Testing Framework | Industry standard, excellent plugin ecosystem |
| **📊 SQLite** | Database Storage | Lightweight, serverless, reliable |
| **📝 MkDocs** | Documentation | Beautiful docs, Markdown-based |
| **🎨 Rich CLI** | User Interface | Modern, intuitive command-line experience |

---

## 📞 Support & Community

<div align="center">

### 💬 **Get Help & Connect**

[![GitHub Issues](https://img.shields.io/badge/Issues-Get%20Help-red?style=for-the-badge&logo=github)](https://github.com/isathish/LLMEvaluationFramework/issues)
[![GitHub Discussions](https://img.shields.io/badge/Discussions-Join%20Community-blue?style=for-the-badge&logo=github)](https://github.com/isathish/LLMEvaluationFramework/discussions)
[![Documentation](https://img.shields.io/badge/Docs-Read%20Here-green?style=for-the-badge&logo=gitbook)](https://isathish.github.io/LLMEvaluationFramework/)

</div>

### 🆘 **Getting Support**

| Type | Where to Go | Response Time |
|------|-------------|---------------|
| **🐛 Bug Reports** | [GitHub Issues](https://github.com/isathish/LLMEvaluationFramework/issues) | 24-48 hours |
| **❓ Questions** | [GitHub Discussions](https://github.com/isathish/LLMEvaluationFramework/discussions) | Community-driven |
| **📚 Documentation** | [Online Docs](https://isathish.github.io/LLMEvaluationFramework/) | Always available |
| **💡 Feature Requests** | [GitHub Issues](https://github.com/isathish/LLMEvaluationFramework/issues) | Weekly review |

### 📈 **Project Statistics**

<div align="center">

![GitHub stars](https://img.shields.io/github/stars/isathish/LLMEvaluationFramework?style=social)
![GitHub forks](https://img.shields.io/github/forks/isathish/LLMEvaluationFramework?style=social)
![GitHub watchers](https://img.shields.io/github/watchers/isathish/LLMEvaluationFramework?style=social)

</div>

---

## 🔗 Important Links

<div align="center">

### 🌐 **Quick Access**

| Resource | Link | Description |
|----------|------|-------------|
| **📦 PyPI Package** | [pypi.org/project/llm-evaluation-framework](https://pypi.org/project/llm-evaluation-framework/) | Install via pip |
| **📚 Documentation** | [isathish.github.io/LLMEvaluationFramework](https://isathish.github.io/LLMEvaluationFramework/) | Complete documentation |
| **💻 Source Code** | [github.com/isathish/LLMEvaluationFramework](https://github.com/isathish/LLMEvaluationFramework) | View source & contribute |
| **🐛 Issue Tracker** | [github.com/.../issues](https://github.com/isathish/LLMEvaluationFramework/issues) | Report bugs & request features |
| **💬 Discussions** | [github.com/.../discussions](https://github.com/isathish/LLMEvaluationFramework/discussions) | Community discussion |

</div>

---

<div align="center">

## 🎉 **Thank You for Using LLM Evaluation Framework!**

<br>

**Made with ❤️ by [Sathish Kumar N](https://github.com/isathish)**

*If you find this project useful, please consider giving it a ⭐️*

<br>

[![Star this repo](https://img.shields.io/github/stars/isathish/LLMEvaluationFramework?style=social)](https://github.com/isathish/LLMEvaluationFramework/stargazers)

<br>

---

### 🚀 **Ready to Get Started?**

```bash
pip install LLMEvaluationFramework
```

**[📚 Read the Documentation](https://isathish.github.io/LLMEvaluationFramework/) • [🚀 View Examples](examples/) • [💬 Join Discussions](https://github.com/isathish/LLMEvaluationFramework/discussions)**

---

*Built for developers, researchers, and AI practitioners who demand reliable, production-ready LLM evaluation tools.*

</div>

            
