| Name | Version | Summary | Date |
|------|---------|---------|------|
| langsmith | 0.4.20 | Client library to connect to the LangSmith LLM Tracing and Evaluation Platform. | 2025-08-28 00:23:43 |
| eyantra-autoeval | 0.1.55 | A Python module to aid auto-evaluation. | 2025-08-27 10:39:18 |
| evaluation-service-base | 0.1.1 | A comprehensive framework for building evaluation services with progress tracking, task management, and result handling. | 2025-08-27 10:33:28 |
| ntqr | 0.7 | Tools for the logic of evaluation using unlabeled data. | 2025-08-25 15:41:12 |
| paperazzi | 0.0.7 | LLM-based paper query system with an evaluation framework. | 2025-08-20 16:10:29 |
| GAICo | 0.3.0 | GenAI Results Comparator (GAICo): a Python library to compare, analyze, and visualize outputs from large language models (LLMs), often against a reference text, using a range of extensible metrics from the literature. | 2025-08-18 15:38:45 |
| chainforge | 0.3.6.2 | A visual programming environment for prompt engineering. | 2025-08-16 00:48:06 |
| ragbits-guardrails | 1.2.2 | Guardrails module for Ragbits components. | 2025-08-09 18:12:34 |
| ragbits-evaluate | 1.2.2 | Evaluation module for Ragbits components. | 2025-08-09 18:12:33 |
| llama-index-packs-llama-dataset-metadata | 0.4.0 | llama-index packs llama_dataset_metadata integration. | 2025-07-30 20:51:35 |
| daindex | 0.8.2 | Deterioration Allocation Index framework. | 2025-07-25 23:23:24 |
| multimedeval | 1.0.0 | A Python tool to evaluate the performance of VLMs in the medical domain. | 2025-07-23 14:44:40 |
| evaluate | 0.4.5 | HuggingFace community-driven open-source library of evaluation. | 2025-07-10 13:26:46 |
| enoslib | 10.2.0 | A library to build (distributed) systems experiments. | 2025-07-08 22:21:18 |
| subset2evaluate | 1.0.5 | Find informative examples to efficiently (human-)evaluate NLG models. | 2025-02-19 16:13:55 |
| uval | 0.2.1 | A Python package providing a high-level interface for evaluating object detection and segmentation algorithms that operate on 3D volumetric data. | 2025-01-21 18:48:28 |
| llm-evaluation-in-reasoning | 1.4.2 | A project for evaluating reasoning capabilities in large language models (LLMs). | 2025-01-17 07:13:34 |
| lighteval | 0.7.0 | A lightweight and configurable evaluation package. | 2025-01-03 15:44:54 |
| indoxJudge | 0.1.0 | Indox Judge. | 2024-12-19 14:09:13 |
| costra | 1.1 | | 2024-12-13 12:08:58 |