Name | Version | Summary | Date |
zeroeval | 0.6.118 | ZeroEval SDK | 2025-08-31 06:14:42 |
satquest | 0.1.2 | A Verifier for Logical Reasoning Evaluation and Reinforcement Fine-Tuning of LLMs | 2025-08-30 19:51:50 |
randomstatsmodels | 1.1.2 | Tools for benchmarking, metrics, and models. | 2025-08-29 18:37:57 |
dyff | 0.35.1 | Meta-package to install the local SDK for the Dyff AI auditing platform. | 2025-08-29 18:35:04 |
agenta | 0.51.3 | The SDK for agenta is an open-source LLMOps platform. | 2025-08-29 10:19:12 |
novaeval | 0.5.2 | A comprehensive, open-source LLM evaluation framework for testing and benchmarking AI models | 2025-08-29 06:18:10 |
dyff-schema | 0.31.2 | Data models for the Dyff AI auditing platform. | 2025-08-28 16:35:34 |
evalassist | 0.1.26 | EvalAssist is an open-source project that simplifies using large language models as evaluators (LLM-as-a-Judge) of the output of other large language models by supporting users in iteratively refining evaluation criteria in a web-based user experience. | 2025-08-28 01:10:48 |
clyrdia-cli | 1.2.1 | Zero-Knowledge AI Benchmarking Platform | 2025-08-28 01:02:31 |
redlite | 0.3.19 | LLM testing on steroids | 2025-08-27 13:35:23 |
AutoRAG | 0.3.17 | Automatically Evaluate RAG pipelines with your own data. Find optimal structure for new RAG product. | 2025-08-26 10:47:09 |
evalscope | 1.0.0 | EvalScope: Lightweight LLMs Evaluation Framework | 2025-08-25 06:53:41 |
deeprails | 0.3.2 | Python SDK for interacting with the DeepRails API | 2025-08-22 11:54:17 |
mandoline | 0.8.0 | Official Python client for the Mandoline API | 2025-08-21 17:51:18 |
cli-arena | 1.1.6 | The definitive AI coding agent evaluation platform | 2025-08-20 15:31:01 |
open-rag-eval | 0.2.1 | A Python package for RAG Evaluation | 2025-08-19 17:34:06 |
pytrec-eval-terrier | 0.5.8 | Provides Python bindings for popular Information Retrieval measures implemented within trec_eval. | 2025-08-19 16:32:33 |
ranx-k | 0.0.17 | Korean-optimized RAG evaluation toolkit based on ranx with Kiwi tokenizer and Korean language support | 2025-08-19 15:17:36 |
oss-redteam | 0.1.1 | GPT-OSS red-teaming pipeline and harness (OpenAI-compatible) | 2025-08-18 03:07:44 |
superoptix | 0.1.0b17 | Full Stack Agentic AI Framework | 2025-08-18 00:14:00 |