novaeval

Name: novaeval
Version: 0.3.2
Summary: A comprehensive, open-source LLM evaluation framework for testing and benchmarking AI models
Upload time: 2025-07-12 20:19:59
Home page: None
Author: None
Maintainer: None
Docs URL: None
License: None
Requires Python: >=3.9
Keywords: llm, evaluation, ai, machine-learning, benchmarking, testing, rag, agents, conversational-ai, g-eval
Requirements: pydantic, pyyaml, requests, numpy, pandas, tqdm, click, rich, jinja2, plotly, scikit-learn, datasets, transformers, openai, anthropic, boto3, sentence-transformers

# NovaEval by Noveum.ai

[![CI](https://github.com/Noveum/NovaEval/actions/workflows/ci.yml/badge.svg)](https://github.com/Noveum/NovaEval/actions/workflows/ci.yml)
[![Release](https://github.com/Noveum/NovaEval/actions/workflows/release.yml/badge.svg)](https://github.com/Noveum/NovaEval/actions/workflows/release.yml)
[![codecov](https://codecov.io/gh/Noveum/NovaEval/branch/main/graph/badge.svg)](https://codecov.io/gh/Noveum/NovaEval)
[![PyPI version](https://badge.fury.io/py/novaeval.svg)](https://badge.fury.io/py/novaeval)
[![Python 3.9+](https://img.shields.io/badge/python-3.9+-blue.svg)](https://www.python.org/downloads/)
[![License: Apache 2.0](https://img.shields.io/badge/License-Apache%202.0-blue.svg)](https://opensource.org/licenses/Apache-2.0)

A comprehensive, extensible AI model evaluation framework designed for production use. NovaEval provides a unified interface for evaluating language models across various datasets, metrics, and deployment scenarios.

## 🚀 Features

- **Multi-Model Support**: Evaluate models from OpenAI, Anthropic, AWS Bedrock, and custom providers
- **Extensible Scoring**: Built-in scorers for accuracy, semantic similarity, code evaluation, and custom metrics
- **Dataset Integration**: Support for MMLU, HuggingFace datasets, custom datasets, and more
- **Production Ready**: Docker support, Kubernetes deployment, and cloud integrations
- **Comprehensive Reporting**: Detailed evaluation reports, artifacts, and visualizations
- **Secure**: Built-in credential management and secret store integration
- **Scalable**: Designed for both local testing and large-scale production evaluations
- **Cross-Platform**: Tested on macOS, Linux, and Windows with comprehensive CI/CD

## 📦 Installation

### From PyPI (Recommended)

```bash
pip install novaeval
```

### From Source

```bash
git clone https://github.com/Noveum/NovaEval.git
cd NovaEval
pip install -e .
```

### Docker

```bash
docker pull noveum/novaeval:latest
```

## 🏃‍♂️ Quick Start

### Basic Evaluation

```python
from novaeval import Evaluator
from novaeval.datasets import MMLUDataset
from novaeval.models import OpenAIModel
from novaeval.scorers import AccuracyScorer

# Configure for cost-conscious evaluation
MAX_TOKENS = 100  # Adjust based on budget: 5-10 for answers, 100+ for reasoning

# Initialize components
dataset = MMLUDataset(
    subset="elementary_mathematics",  # Easier subset for demo
    num_samples=10,
    split="test"
)

model = OpenAIModel(
    model_name="gpt-4o-mini",  # Cost-effective model
    temperature=0.0,
    max_tokens=MAX_TOKENS
)

scorer = AccuracyScorer(extract_answer=True)

# Create and run evaluation
evaluator = Evaluator(
    dataset=dataset,
    models=[model],
    scorers=[scorer],
    output_dir="./results"
)

results = evaluator.run()

# Display detailed results
for model_name, model_results in results["model_results"].items():
    for scorer_name, score_info in model_results["scores"].items():
        if isinstance(score_info, dict):
            mean_score = score_info.get("mean", 0)
            count = score_info.get("count", 0)
            print(f"{scorer_name}: {mean_score:.4f} ({count} samples)")
```

### Configuration-Based Evaluation

```python
from novaeval import Evaluator

# Load configuration from YAML/JSON
evaluator = Evaluator.from_config("evaluation_config.yaml")
results = evaluator.run()
```

### Example Configuration

```yaml
# evaluation_config.yaml
dataset:
  type: "mmlu"
  subset: "abstract_algebra"
  num_samples: 500

models:
  - type: "openai"
    model_name: "gpt-4"
    temperature: 0.0
  - type: "anthropic"
    model_name: "claude-3-opus"
    temperature: 0.0

scorers:
  - type: "accuracy"
  - type: "semantic_similarity"
    threshold: 0.8

output:
  directory: "./results"
  formats: ["json", "csv", "html"]
  upload_to_s3: true
  s3_bucket: "my-eval-results"
```

## 🏗️ Architecture

NovaEval is built with extensibility and modularity in mind:

```
src/novaeval/
├── datasets/          # Dataset loaders and processors
├── evaluators/        # Core evaluation logic
├── integrations/      # External service integrations
├── models/           # Model interfaces and adapters
├── reporting/        # Report generation and visualization
├── scorers/          # Scoring mechanisms and metrics
└── utils/            # Utility functions and helpers
```

### Core Components

- **Datasets**: Standardized interface for loading evaluation datasets
- **Models**: Unified API for different AI model providers
- **Scorers**: Pluggable scoring mechanisms for various evaluation metrics
- **Evaluators**: Orchestrates the evaluation process
- **Reporting**: Generates comprehensive reports and artifacts
- **Integrations**: Handles external services (S3, credential stores, etc.)

## 📊 Supported Datasets

- **MMLU**: Massive Multitask Language Understanding
- **HuggingFace**: Any dataset from the HuggingFace Hub (see the sketch after this list)
- **Custom**: JSON, CSV, or programmatic dataset definitions
- **Code Evaluation**: Programming benchmarks and code generation tasks
- **Agent Traces**: Multi-turn conversation and agent evaluation
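
Loading a dataset from the Hub follows the same pattern as `MMLUDataset` in the Quick Start. A minimal sketch, assuming a `HuggingFaceDataset` class with similar constructor arguments (check the API documentation for the exact names):

```python
from novaeval.datasets import HuggingFaceDataset  # class name assumed for illustration

# Pull a small slice of a Hub dataset for a quick, low-cost run
dataset = HuggingFaceDataset(
    dataset_name="squad",  # any dataset on the HuggingFace Hub
    split="validation",
    num_samples=50,
)
```

The resulting object is passed to the `Evaluator` via `dataset=...`, just like `MMLUDataset`.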

## 🤖 Supported Models

- **OpenAI**: GPT-3.5, GPT-4, and newer models
- **Anthropic**: Claude family models (see the sketch after this list)
- **AWS Bedrock**: Amazon's managed AI services
- **Noveum AI Gateway**: Integration with Noveum's model gateway
- **Custom**: Extensible interface for any API-based model
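
`OpenAIModel` is shown in the Quick Start; other providers follow the same constructor pattern. A sketch comparing two providers in one run, where the `AnthropicModel` class name is assumed by analogy with the `anthropic` model type in the YAML example above:

```python
from novaeval.models import OpenAIModel, AnthropicModel  # AnthropicModel name assumed

# Evaluate two providers side by side
models = [
    OpenAIModel(model_name="gpt-4o-mini", temperature=0.0, max_tokens=100),
    AnthropicModel(model_name="claude-3-opus", temperature=0.0, max_tokens=100),
]
```

Pass the list as `models=models` when constructing the `Evaluator`, exactly as in the Quick Start.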

## 📏 Built-in Scorers

### Accuracy-Based
- **ExactMatch**: Exact string matching
- **Accuracy**: Classification accuracy
- **F1Score**: F1 score for classification tasks

### Semantic-Based
- **SemanticSimilarity**: Embedding-based similarity scoring (see the sketch after this list)
- **BERTScore**: BERT-based semantic evaluation
- **RougeScore**: ROUGE metrics for text generation

### Code-Specific
- **CodeExecution**: Execute and validate code outputs
- **SyntaxChecker**: Validate code syntax
- **TestCoverage**: Code coverage analysis

### Custom
- **LLMJudge**: Use another LLM as a judge
- **HumanEval**: Integration with human evaluation workflows
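
Scorers can be combined so every prediction is scored by several metrics in a single pass. A sketch using the accuracy scorer from the Quick Start plus a semantic-similarity scorer; the `SemanticSimilarityScorer` class name and `threshold` argument are assumptions mirroring the `semantic_similarity` entry in the YAML example above:

```python
from novaeval.scorers import AccuracyScorer, SemanticSimilarityScorer  # second class name assumed

# Each sample receives one score per scorer in the list
scorers = [
    AccuracyScorer(extract_answer=True),
    SemanticSimilarityScorer(threshold=0.8),
]
```

The scores then appear per scorer in the results, as in the Quick Start's results loop.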

## 🚀 Deployment

### Local Development

```bash
# Install dependencies
pip install -e ".[dev]"

# Run tests
pytest

# Run example evaluation
python examples/basic_evaluation.py
```

### Docker

```bash
# Build image
docker build -t nova-eval .

# Run evaluation
docker run -v $(pwd)/config:/config -v $(pwd)/results:/results nova-eval --config /config/eval.yaml
```

### Kubernetes

```bash
# Deploy to Kubernetes
kubectl apply -f kubernetes/

# Check status
kubectl get pods -l app=nova-eval
```

## 🔧 Configuration

NovaEval supports configuration through:

- **YAML/JSON files**: Declarative configuration
- **Environment variables**: Runtime configuration
- **Python code**: Programmatic configuration
- **CLI arguments**: Command-line overrides

### Environment Variables

```bash
export NOVA_EVAL_OUTPUT_DIR="./results"
export NOVA_EVAL_LOG_LEVEL="INFO"
export OPENAI_API_KEY="your-api-key"
export AWS_ACCESS_KEY_ID="your-aws-key"
```

### CI/CD Integration

NovaEval includes optimized GitHub Actions workflows:
- **Unit tests** run on all PRs and pushes for quick feedback
- **Integration tests** run on main branch only to minimize API costs
- **Cross-platform testing** on macOS, Linux, and Windows

## 📈 Reporting and Artifacts

NovaEval generates comprehensive evaluation reports:

- **Summary Reports**: High-level metrics and insights
- **Detailed Results**: Per-sample predictions and scores
- **Visualizations**: Charts and graphs for result analysis
- **Artifacts**: Model outputs, intermediate results, and debug information
- **Export Formats**: JSON, CSV, HTML, PDF

### Example Report Structure

```
results/
├── summary.json              # High-level metrics
├── detailed_results.csv      # Per-sample results
├── artifacts/
│   ├── model_outputs/        # Raw model responses
│   ├── intermediate/         # Processing artifacts
│   └── debug/               # Debug information
├── visualizations/
│   ├── accuracy_by_category.png
│   ├── score_distribution.png
│   └── confusion_matrix.png
└── report.html              # Interactive HTML report
```
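
Because the key artifacts are plain JSON and CSV, they are easy to post-process with standard tools. A minimal sketch, assuming the layout above (the columns in `detailed_results.csv` depend on which scorers you ran):

```python
import json

import pandas as pd  # already a NovaEval dependency

# High-level metrics
with open("results/summary.json") as f:
    summary = json.load(f)
print(summary)

# Per-sample results, e.g. to inspect low-scoring samples
details = pd.read_csv("results/detailed_results.csv")
print(details.head())
```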

## 🔌 Extending NovaEval

### Custom Datasets

```python
from novaeval.datasets import BaseDataset

class MyCustomDataset(BaseDataset):
    def load_data(self):
        # Implement data loading logic (read JSON/CSV, query an API, ...).
        # Return a list of samples; the field names depend on your scorers.
        samples = [{"input": "What is 2 + 2?", "expected": "4"}]
        return samples

    def get_sample(self, index):
        # Return an individual sample by index
        return self.load_data()[index]
```

### Custom Scorers

```python
from novaeval.scorers import BaseScorer

class MyCustomScorer(BaseScorer):
    def score(self, prediction, ground_truth, context=None):
        # Implement scoring logic; return a numeric score,
        # e.g. 1.0 for an exact match and 0.0 otherwise
        return 1.0 if prediction.strip() == ground_truth.strip() else 0.0
```

### Custom Models

```python
from novaeval.models import BaseModel

class MyCustomModel(BaseModel):
    def generate(self, prompt, **kwargs):
        # Implement model inference (e.g. call your own HTTP endpoint)
        # and return the generated text as a string
        response = f"(stub response for: {prompt})"
        return response
```
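
Custom components are drop-in replacements for the built-ins. A sketch wiring the three classes above into the same `Evaluator` call used in the Quick Start (constructor arguments for your subclasses depend on how you define them):

```python
from novaeval import Evaluator

evaluator = Evaluator(
    dataset=MyCustomDataset(),
    models=[MyCustomModel()],
    scorers=[MyCustomScorer()],
    output_dir="./results",
)
results = evaluator.run()
```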

## 🤝 Contributing

We welcome contributions! Please see our [Contributing Guide](CONTRIBUTING.md) for details.

### Development Setup

```bash
# Clone repository
git clone https://github.com/Noveum/NovaEval.git
cd NovaEval

# Create virtual environment
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install development dependencies
pip install -e ".[dev]"

# Install pre-commit hooks
pre-commit install

# Run tests
pytest

# Run with coverage (23% overall, 90%+ for core modules)
pytest --cov=src/novaeval --cov-report=html
```

## 📄 License

This project is licensed under the Apache License 2.0 - see the [LICENSE](LICENSE) file for details.

## 🙏 Acknowledgments

- Inspired by evaluation frameworks like DeepEval, Confident AI, and Braintrust
- Built with modern Python best practices and industry standards
- Designed for the AI evaluation community

## 📞 Support

- **Documentation**: [https://noveum.github.io/NovaEval](https://noveum.github.io/NovaEval)
- **Issues**: [GitHub Issues](https://github.com/Noveum/NovaEval/issues)
- **Discussions**: [GitHub Discussions](https://github.com/Noveum/NovaEval/discussions)
- **Email**: support@noveum.ai

---

Made with ❤️ by the Noveum.ai team

            

Raw data

            {
    "_id": null,
    "home_page": null,
    "name": "novaeval",
    "maintainer": null,
    "docs_url": null,
    "requires_python": ">=3.9",
    "maintainer_email": "Noveum AI <team@noveum.ai>",
    "keywords": "llm, evaluation, ai, machine-learning, benchmarking, testing, rag, agents, conversational-ai, g-eval",
    "author": null,
    "author_email": "Noveum AI <team@noveum.ai>",
    "download_url": "https://files.pythonhosted.org/packages/ad/b6/1d0b1f2fcd8186a9b5bac9ba80287c3b382ba7fb12ba754f0a9f058012a1/novaeval-0.3.2.tar.gz",
    "platform": null,
    "description": "# NovaEval by Noveum.ai\n\n[![CI](https://github.com/Noveum/NovaEval/actions/workflows/ci.yml/badge.svg)](https://github.com/Noveum/NovaEval/actions/workflows/ci.yml)\n[![Release](https://github.com/Noveum/NovaEval/actions/workflows/release.yml/badge.svg)](https://github.com/Noveum/NovaEval/actions/workflows/release.yml)\n[![codecov](https://codecov.io/gh/Noveum/NovaEval/branch/main/graph/badge.svg)](https://codecov.io/gh/Noveum/NovaEval)\n[![PyPI version](https://badge.fury.io/py/novaeval.svg)](https://badge.fury.io/py/novaeval)\n[![Python 3.8+](https://img.shields.io/badge/python-3.8+-blue.svg)](https://www.python.org/downloads/)\n[![License: Apache 2.0](https://img.shields.io/badge/License-Apache%202.0-blue.svg)](https://opensource.org/licenses/Apache-2.0)\n\nA comprehensive, extensible AI model evaluation framework designed for production use. NovaEval provides a unified interface for evaluating language models across various datasets, metrics, and deployment scenarios.\n\n## \ud83d\ude80 Features\n\n- **Multi-Model Support**: Evaluate models from OpenAI, Anthropic, AWS Bedrock, and custom providers\n- **Extensible Scoring**: Built-in scorers for accuracy, semantic similarity, code evaluation, and custom metrics\n- **Dataset Integration**: Support for MMLU, HuggingFace datasets, custom datasets, and more\n- **Production Ready**: Docker support, Kubernetes deployment, and cloud integrations\n- **Comprehensive Reporting**: Detailed evaluation reports, artifacts, and visualizations\n- **Secure**: Built-in credential management and secret store integration\n- **Scalable**: Designed for both local testing and large-scale production evaluations\n- **Cross-Platform**: Tested on macOS, Linux, and Windows with comprehensive CI/CD\n\n## \ud83d\udce6 Installation\n\n### From PyPI (Recommended)\n\n```bash\npip install novaeval\n```\n\n### From Source\n\n```bash\ngit clone https://github.com/Noveum/NovaEval.git\ncd NovaEval\npip install -e .\n```\n\n### Docker\n\n```bash\ndocker pull noveum/novaeval:latest\n```\n\n## \ud83c\udfc3\u200d\u2642\ufe0f Quick Start\n\n### Basic Evaluation\n\n```python\nfrom novaeval import Evaluator\nfrom novaeval.datasets import MMLUDataset\nfrom novaeval.models import OpenAIModel\nfrom novaeval.scorers import AccuracyScorer\n\n# Configure for cost-conscious evaluation\nMAX_TOKENS = 100  # Adjust based on budget: 5-10 for answers, 100+ for reasoning\n\n# Initialize components\ndataset = MMLUDataset(\n    subset=\"elementary_mathematics\",  # Easier subset for demo\n    num_samples=10,\n    split=\"test\"\n)\n\nmodel = OpenAIModel(\n    model_name=\"gpt-4o-mini\",  # Cost-effective model\n    temperature=0.0,\n    max_tokens=MAX_TOKENS\n)\n\nscorer = AccuracyScorer(extract_answer=True)\n\n# Create and run evaluation\nevaluator = Evaluator(\n    dataset=dataset,\n    models=[model],\n    scorers=[scorer],\n    output_dir=\"./results\"\n)\n\nresults = evaluator.run()\n\n# Display detailed results\nfor model_name, model_results in results[\"model_results\"].items():\n    for scorer_name, score_info in model_results[\"scores\"].items():\n        if isinstance(score_info, dict):\n            mean_score = score_info.get(\"mean\", 0)\n            count = score_info.get(\"count\", 0)\n            print(f\"{scorer_name}: {mean_score:.4f} ({count} samples)\")\n```\n\n### Configuration-Based Evaluation\n\n```python\nfrom novaeval import Evaluator\n\n# Load configuration from YAML/JSON\nevaluator = Evaluator.from_config(\"evaluation_config.yaml\")\nresults = 
evaluator.run()\n```\n\n### Example Configuration\n\n```yaml\n# evaluation_config.yaml\ndataset:\n  type: \"mmlu\"\n  subset: \"abstract_algebra\"\n  num_samples: 500\n\nmodels:\n  - type: \"openai\"\n    model_name: \"gpt-4\"\n    temperature: 0.0\n  - type: \"anthropic\"\n    model_name: \"claude-3-opus\"\n    temperature: 0.0\n\nscorers:\n  - type: \"accuracy\"\n  - type: \"semantic_similarity\"\n    threshold: 0.8\n\noutput:\n  directory: \"./results\"\n  formats: [\"json\", \"csv\", \"html\"]\n  upload_to_s3: true\n  s3_bucket: \"my-eval-results\"\n```\n\n## \ud83c\udfd7\ufe0f Architecture\n\nNovaEval is built with extensibility and modularity in mind:\n\n```\nsrc/novaeval/\n\u251c\u2500\u2500 datasets/          # Dataset loaders and processors\n\u251c\u2500\u2500 evaluators/        # Core evaluation logic\n\u251c\u2500\u2500 integrations/      # External service integrations\n\u251c\u2500\u2500 models/           # Model interfaces and adapters\n\u251c\u2500\u2500 reporting/        # Report generation and visualization\n\u251c\u2500\u2500 scorers/          # Scoring mechanisms and metrics\n\u2514\u2500\u2500 utils/            # Utility functions and helpers\n```\n\n### Core Components\n\n- **Datasets**: Standardized interface for loading evaluation datasets\n- **Models**: Unified API for different AI model providers\n- **Scorers**: Pluggable scoring mechanisms for various evaluation metrics\n- **Evaluators**: Orchestrates the evaluation process\n- **Reporting**: Generates comprehensive reports and artifacts\n- **Integrations**: Handles external services (S3, credential stores, etc.)\n\n## \ud83d\udcca Supported Datasets\n\n- **MMLU**: Massive Multitask Language Understanding\n- **HuggingFace**: Any dataset from the HuggingFace Hub\n- **Custom**: JSON, CSV, or programmatic dataset definitions\n- **Code Evaluation**: Programming benchmarks and code generation tasks\n- **Agent Traces**: Multi-turn conversation and agent evaluation\n\n## \ud83e\udd16 Supported Models\n\n- **OpenAI**: GPT-3.5, GPT-4, and newer models\n- **Anthropic**: Claude family models\n- **AWS Bedrock**: Amazon's managed AI services\n- **Noveum AI Gateway**: Integration with Noveum's model gateway\n- **Custom**: Extensible interface for any API-based model\n\n## \ud83d\udccf Built-in Scorers\n\n### Accuracy-Based\n- **ExactMatch**: Exact string matching\n- **Accuracy**: Classification accuracy\n- **F1Score**: F1 score for classification tasks\n\n### Semantic-Based\n- **SemanticSimilarity**: Embedding-based similarity scoring\n- **BERTScore**: BERT-based semantic evaluation\n- **RougeScore**: ROUGE metrics for text generation\n\n### Code-Specific\n- **CodeExecution**: Execute and validate code outputs\n- **SyntaxChecker**: Validate code syntax\n- **TestCoverage**: Code coverage analysis\n\n### Custom\n- **LLMJudge**: Use another LLM as a judge\n- **HumanEval**: Integration with human evaluation workflows\n\n## \ud83d\ude80 Deployment\n\n### Local Development\n\n```bash\n# Install dependencies\npip install -e \".[dev]\"\n\n# Run tests\npytest\n\n# Run example evaluation\npython examples/basic_evaluation.py\n```\n\n### Docker\n\n```bash\n# Build image\ndocker build -t nova-eval .\n\n# Run evaluation\ndocker run -v $(pwd)/config:/config -v $(pwd)/results:/results nova-eval --config /config/eval.yaml\n```\n\n### Kubernetes\n\n```bash\n# Deploy to Kubernetes\nkubectl apply -f kubernetes/\n\n# Check status\nkubectl get pods -l app=nova-eval\n```\n\n## \ud83d\udd27 Configuration\n\nNovaEval supports configuration through:\n\n- 
**YAML/JSON files**: Declarative configuration\n- **Environment variables**: Runtime configuration\n- **Python code**: Programmatic configuration\n- **CLI arguments**: Command-line overrides\n\n### Environment Variables\n\n```bash\nexport NOVA_EVAL_OUTPUT_DIR=\"./results\"\nexport NOVA_EVAL_LOG_LEVEL=\"INFO\"\nexport OPENAI_API_KEY=\"your-api-key\"\nexport AWS_ACCESS_KEY_ID=\"your-aws-key\"\n```\n\n### CI/CD Integration\n\nNovaEval includes optimized GitHub Actions workflows:\n- **Unit tests** run on all PRs and pushes for quick feedback\n- **Integration tests** run on main branch only to minimize API costs\n- **Cross-platform testing** on macOS, Linux, and Windows\n\n## \ud83d\udcc8 Reporting and Artifacts\n\nNovaEval generates comprehensive evaluation reports:\n\n- **Summary Reports**: High-level metrics and insights\n- **Detailed Results**: Per-sample predictions and scores\n- **Visualizations**: Charts and graphs for result analysis\n- **Artifacts**: Model outputs, intermediate results, and debug information\n- **Export Formats**: JSON, CSV, HTML, PDF\n\n### Example Report Structure\n\n```\nresults/\n\u251c\u2500\u2500 summary.json              # High-level metrics\n\u251c\u2500\u2500 detailed_results.csv      # Per-sample results\n\u251c\u2500\u2500 artifacts/\n\u2502   \u251c\u2500\u2500 model_outputs/        # Raw model responses\n\u2502   \u251c\u2500\u2500 intermediate/         # Processing artifacts\n\u2502   \u2514\u2500\u2500 debug/               # Debug information\n\u251c\u2500\u2500 visualizations/\n\u2502   \u251c\u2500\u2500 accuracy_by_category.png\n\u2502   \u251c\u2500\u2500 score_distribution.png\n\u2502   \u2514\u2500\u2500 confusion_matrix.png\n\u2514\u2500\u2500 report.html              # Interactive HTML report\n```\n\n## \ud83d\udd0c Extending NovaEval\n\n### Custom Datasets\n\n```python\nfrom novaeval.datasets import BaseDataset\n\nclass MyCustomDataset(BaseDataset):\n    def load_data(self):\n        # Implement data loading logic\n        return samples\n\n    def get_sample(self, index):\n        # Return individual sample\n        return sample\n```\n\n### Custom Scorers\n\n```python\nfrom novaeval.scorers import BaseScorer\n\nclass MyCustomScorer(BaseScorer):\n    def score(self, prediction, ground_truth, context=None):\n        # Implement scoring logic\n        return score\n```\n\n### Custom Models\n\n```python\nfrom novaeval.models import BaseModel\n\nclass MyCustomModel(BaseModel):\n    def generate(self, prompt, **kwargs):\n        # Implement model inference\n        return response\n```\n\n## \ud83e\udd1d Contributing\n\nWe welcome contributions! 
Please see our [Contributing Guide](CONTRIBUTING.md) for details.\n\n### Development Setup\n\n```bash\n# Clone repository\ngit clone https://github.com/Noveum/NovaEval.git\ncd NovaEval\n\n# Create virtual environment\npython -m venv venv\nsource venv/bin/activate  # On Windows: venv\\Scripts\\activate\n\n# Install development dependencies\npip install -e \".[dev]\"\n\n# Install pre-commit hooks\npre-commit install\n\n# Run tests\npytest\n\n# Run with coverage (23% overall, 90%+ for core modules)\npytest --cov=src/novaeval --cov-report=html\n```\n\n## \ud83d\udcc4 License\n\nThis project is licensed under the Apache License 2.0 - see the [LICENSE](LICENSE) file for details.\n\n## \ud83d\ude4f Acknowledgments\n\n- Inspired by evaluation frameworks like DeepEval, Confident AI, and Braintrust\n- Built with modern Python best practices and industry standards\n- Designed for the AI evaluation community\n\n## \ud83d\udcde Support\n\n- **Documentation**: [https://noveum.github.io/NovaEval](https://noveum.github.io/NovaEval)\n- **Issues**: [GitHub Issues](https://github.com/Noveum/NovaEval/issues)\n- **Discussions**: [GitHub Discussions](https://github.com/Noveum/NovaEval/discussions)\n- **Email**: support@noveum.ai\n\n---\n\nMade with \u2764\ufe0f by the Noveum.ai team\n",
    "bugtrack_url": null,
    "license": null,
    "summary": "A comprehensive, open-source LLM evaluation framework for testing and benchmarking AI models",
    "version": "0.3.2",
    "project_urls": {
        "Bug Tracker": "https://github.com/Noveum/NovaEval/issues",
        "Changelog": "https://github.com/Noveum/NovaEval/blob/main/CHANGELOG.md",
        "Documentation": "https://novaeval.readthedocs.io",
        "Homepage": "https://github.com/Noveum/NovaEval",
        "Repository": "https://github.com/Noveum/NovaEval"
    },
    "split_keywords": [
        "llm",
        " evaluation",
        " ai",
        " machine-learning",
        " benchmarking",
        " testing",
        " rag",
        " agents",
        " conversational-ai",
        " g-eval"
    ],
    "urls": [
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "a16b4760d39e49c170eb340fada85e03a06d3f94f8acf4397095f485acee5633",
                "md5": "16b0ba3bd64f07cdec5fe3c019bf2b85",
                "sha256": "ebf601ab85ff23dad3cf6cf68b3f42bdc83d211a50f923098f257b895bab5600"
            },
            "downloads": -1,
            "filename": "novaeval-0.3.2-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "16b0ba3bd64f07cdec5fe3c019bf2b85",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.9",
            "size": 81265,
            "upload_time": "2025-07-12T20:19:57",
            "upload_time_iso_8601": "2025-07-12T20:19:57.840679Z",
            "url": "https://files.pythonhosted.org/packages/a1/6b/4760d39e49c170eb340fada85e03a06d3f94f8acf4397095f485acee5633/novaeval-0.3.2-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "adb61d0b1f2fcd8186a9b5bac9ba80287c3b382ba7fb12ba754f0a9f058012a1",
                "md5": "b82804e3a65abde9716baf0eefba86ee",
                "sha256": "84fd5aa8f91faee5fd42c8b447d14ce4e3054a4a3592f72714dcbcfe0a7589c7"
            },
            "downloads": -1,
            "filename": "novaeval-0.3.2.tar.gz",
            "has_sig": false,
            "md5_digest": "b82804e3a65abde9716baf0eefba86ee",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.9",
            "size": 136260,
            "upload_time": "2025-07-12T20:19:59",
            "upload_time_iso_8601": "2025-07-12T20:19:59.106295Z",
            "url": "https://files.pythonhosted.org/packages/ad/b6/1d0b1f2fcd8186a9b5bac9ba80287c3b382ba7fb12ba754f0a9f058012a1/novaeval-0.3.2.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2025-07-12 20:19:59",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "Noveum",
    "github_project": "NovaEval",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": true,
    "requirements": [
        {
            "name": "pydantic",
            "specs": [
                [
                    ">=",
                    "2.0.0"
                ]
            ]
        },
        {
            "name": "pyyaml",
            "specs": [
                [
                    ">=",
                    "6.0"
                ]
            ]
        },
        {
            "name": "requests",
            "specs": [
                [
                    ">=",
                    "2.28.0"
                ]
            ]
        },
        {
            "name": "numpy",
            "specs": [
                [
                    ">=",
                    "1.21.0"
                ]
            ]
        },
        {
            "name": "pandas",
            "specs": [
                [
                    ">=",
                    "1.3.0"
                ]
            ]
        },
        {
            "name": "tqdm",
            "specs": [
                [
                    ">=",
                    "4.64.0"
                ]
            ]
        },
        {
            "name": "click",
            "specs": [
                [
                    ">=",
                    "8.0.0"
                ]
            ]
        },
        {
            "name": "rich",
            "specs": [
                [
                    ">=",
                    "12.0.0"
                ]
            ]
        },
        {
            "name": "jinja2",
            "specs": [
                [
                    ">=",
                    "3.0.0"
                ]
            ]
        },
        {
            "name": "plotly",
            "specs": [
                [
                    ">=",
                    "5.0.0"
                ]
            ]
        },
        {
            "name": "scikit-learn",
            "specs": [
                [
                    ">=",
                    "1.0.0"
                ]
            ]
        },
        {
            "name": "datasets",
            "specs": [
                [
                    ">=",
                    "2.0.0"
                ]
            ]
        },
        {
            "name": "transformers",
            "specs": [
                [
                    ">=",
                    "4.20.0"
                ]
            ]
        },
        {
            "name": "openai",
            "specs": [
                [
                    ">=",
                    "1.0.0"
                ]
            ]
        },
        {
            "name": "anthropic",
            "specs": [
                [
                    ">=",
                    "0.3.0"
                ]
            ]
        },
        {
            "name": "boto3",
            "specs": [
                [
                    ">=",
                    "1.26.0"
                ]
            ]
        },
        {
            "name": "sentence-transformers",
            "specs": [
                [
                    ">=",
                    "2.2.0"
                ]
            ]
        }
    ],
    "lcname": "novaeval"
}
        