Name | Version | Summary | Date |
modelradar | 0.2.1 | Aspect-based Forecasting Accuracy | 2025-07-15 20:52:23 |
superoptix | 0.1.0b4 | Full Stack Agentic AI Framework | 2025-07-15 20:15:11 |
langsmith | 0.4.6 | Client library to connect to the LangSmith LLM Tracing and Evaluation Platform. | 2025-07-15 19:43:18 |
quotientai | 0.4.3 | Python library for tracing, logging, and detecting problems with AI Agents | 2025-07-14 17:10:41 |
novaeval | 0.3.3 | A comprehensive, open-source LLM evaluation framework for testing and benchmarking AI models | 2025-07-14 03:28:13 |
RadEval | 0.0.1rc6 | All-in-one metrics for evaluating AI-generated radiology text | 2025-07-13 23:05:21 |
SurvivalEVAL | 0.4.4 | The most comprehensive Python package for evaluating survival analysis models. | 2025-07-12 22:46:15 |
rag-evaluation | 0.2.1 | A robust Python package for evaluating Retrieval-Augmented Generation (RAG) systems. | 2025-07-12 22:38:32 |
pypitest-radeval | 0.0.3 | All-in-one metrics for evaluating AI-generated radiology text | 2025-07-11 14:54:29 |
AgentDS-Bench | 1.2.2 | Python client for AgentDS-Bench: A streamlined benchmarking platform for evaluating AI agent capabilities in data science tasks | 2025-07-09 21:21:17 |
agenta | 0.49.3 | The SDK for agenta is an open-source LLMOps platform. | 2025-07-09 13:29:26 |
open-rag-eval | 0.2.0 | A Python package for RAG Evaluation | 2025-07-08 17:20:26 |
benchwise | 0.1.0a1 | The GitHub of LLM Evaluation - Python SDK | 2025-07-08 10:16:01 |
guidellm | 0.2.1 | Guidance platform for deploying and managing large language models. | 2025-04-29 17:49:39 |
evo | 1.31.1 | Python package for the evaluation of odometry and SLAM | 2025-03-20 15:37:42 |
ragmetrics-client | 0.1.9 | Monitor your LLM calls. Test your LLM app. | 2025-03-14 23:05:52 |
math-verify | 0.7.0 | HuggingFace library for verifying mathematical answers | 2025-02-27 16:21:04 |
trajectopy | 2.4.2 | Trajectory Evaluation in Python | 2025-02-26 08:34:59 |
python-lilypad | 0.0.23 | An open-source prompt engineering framework. | 2025-02-25 03:25:39 |
providentia | 2.4.0 | Providentia is designed to allow on-the-fly, offline and interactive analysis of experiment outputs, with respect to processed observational data. | 2025-02-12 13:36:50 |