| Name | Version | Summary | Date |
|------|---------|---------|------|
| novaeval | 0.7.0 | A comprehensive, open-source LLM evaluation framework for testing and benchmarking AI models | 2025-11-03 09:38:25 |
| omnibar | 0.2.0 | Comprehensive AI Agent Benchmarking Framework (OmniBAR) | 2025-11-03 05:20:38 |
| ariadne-router | 0.4.4 | Intelligent quantum simulator router with automatic backend selection | 2025-11-02 22:09:03 |
| tool-scorer | 1.3.3 | Catch LLM agent regressions before deployment. Test tool-calling accuracy for OpenAI, Anthropic, Gemini with pytest integration and CI/CD workflows. | 2025-10-28 21:46:55 |
| generic-llm-api-client | 0.1.2 | A unified, provider-agnostic Python client for multiple LLM APIs | 2025-10-28 13:32:06 |
| bbob-jax | 0.5.0 | BBOB benchmark functions implemented in JAX | 2025-10-27 11:15:39 |
| lmur | 2.1.1 | Neural Network Dataset | 2025-10-27 09:53:42 |
| nn-dataset | 2.1.1 | Neural Network Dataset | 2025-10-27 09:45:16 |
| LevDoom | 1.0.3 | LevDoom: A Generalization Benchmark for Deep Reinforcement Learning | 2025-10-25 17:16:51 |
| raven-pyu | 1.0.2 | Utilities for Python | 2025-10-20 11:23:33 |
| insdc-benchmarking-schema | 1.2.0 | JSON schema and validation for INSDC benchmarking results | 2025-10-16 11:37:57 |
| lrdbenchmark | 2.2.0 | Comprehensive Long-Range Dependence Benchmarking Framework with Classical, ML, and Neural Network Estimators + 5 Demonstration Notebooks | 2025-10-14 09:26:25 |
| LLMEvaluationFramework | 0.0.21 | Enterprise-Grade Python Framework for Large Language Model Evaluation & Testing | 2025-10-12 08:37:49 |
| guidellm | 0.3.1 | Guidance platform for deploying and managing large language models. | 2025-10-10 13:40:23 |
| mcpuniverse | 1.0.3 | A framework for developing and benchmarking AI agents using Model Context Protocol (MCP) | 2025-10-07 08:23:22 |
| mlbench-lite | 2.0.3 | A simple machine learning benchmarking library | 2025-09-18 20:34:55 |
| causallm | 4.2.0 | Production-ready causal inference with comprehensive monitoring, testing, and LLM integration | 2025-09-09 17:14:52 |
| omnibench | 0.1.2 | Comprehensive AI Agent Benchmarking Framework | 2025-09-08 22:17:51 |
| clyrdia-cli | 2.0.1 | State-of-the-Art AI Benchmarking for CI/CD | 2025-09-08 16:25:53 |
| kode-kronical | 0.7.1 | A lightweight Python performance tracking library with automatic data collection and visualization | 2025-09-02 06:11:35 |