Name | Version | Summary | Date |
agentdojo | 0.1.26 | A Dynamic Environment to Evaluate Attacks and Defenses for LLM Agents | 2025-02-12 08:29:46 |
robobench | 0.0.2 | A benchmarking tool for AI models and Hardware. | 2025-02-09 07:59:11 |
localbench | 0.0.2 | A benchmarking tool for Local LLMs. | 2025-02-09 01:39:35 |
rl4co | 0.5.2 | RL4CO: an Extensive Reinforcement Learning for Combinatorial Optimization Benchmark | 2025-01-26 07:48:28 |
gtrbench | 0.0.1 | A benchmark to evaluate implicit reasoning in LLMs using guess-the-rule games | 2025-01-19 01:58:11 |
cedarverse-bda | 0.0.1 | Interactive dashboard tool for analyzing differences between Aider benchmark runs | 2024-12-18 19:52:50 |
hulu-evaluate | 0.0.2 | Client library to train and evaluate models on the HuLu benchmark. | 2024-12-04 13:14:56 |
benchmark-4dn | 0.5.25 | Benchmark functions that returns total space, mem, cpu given input size and parameters for the CWL workflows | 2024-11-20 19:34:31 |
causalbench-asu | 0.1rc9 | Spatio Temporal Causal Benchmarking Platform | 2024-10-21 23:25:24 |
polybench | 0.3.0 | Multivariate polynomial arithmetic benchmark tests. | 2024-09-28 08:59:41 |
vector-db-benchmark | 1.0.0 | Benchmarking tool for vector databases | 2024-09-26 01:11:44 |
kerncraft | 0.8.16 | Loop Kernel Analysis and Performance Modeling Toolkit | 2024-09-04 10:51:06 |
llm-bench | 0.4.32 | LLM Benchmarking tool for OLLAMA | 2024-07-23 16:39:44 |
egenix-micro-benchmark | 0.1.0 | Micro benchmark tooling for Python | 2024-05-21 12:10:17 |
FitBenchmarking | 1.1.0 | FitBenchmarking: A tool for comparing fitting software | 2024-05-15 12:27:40 |
jailbreakbench | 0.1.3 | An Open Robustness Benchmark for Jailbreaking Language Models | 2024-04-06 15:41:26 |
llm_benchmark | 0.3.1 | LLM Benchmark for Throughputs via Ollama | 2024-03-30 21:47:35 |
input-tool | 2.0.2 | Tool which simplifies creating and testing inputs for programming contests. | 2024-03-12 19:25:49 |
new-ai-benchmark | 2.7.0 | AI Benchmark is an open source python library for evaluating AI performance of various hardware platforms, including CPUs, GPUs and TPUs. | 2024-03-10 18:53:48 |
attribench | 0.1.9 | A benchmark for feature attribution techniques | 2024-03-06 09:36:36 |