| Name | Version | Summary | Date |
|------|---------|---------|------|
| stichotrope | 0.1.0 | Python profiling library with block-level profiling and multi-track organization | 2025-11-04 09:20:28 |
| knows | 2.0.3 | Powerful and user-friendly property graph benchmark that creates graphs with specified node and edge numbers, supporting multiple output formats and visualization | 2025-11-03 18:29:44 |
| novaeval | 0.7.0 | A comprehensive, open-source LLM evaluation framework for testing and benchmarking AI models | 2025-11-03 09:38:25 |
| omnibar | 0.2.0 | Comprehensive AI Agent Benchmarking Framework (OmniBAR) | 2025-11-03 05:20:38 |
| ariadne-router | 0.4.4 | Intelligent quantum simulator router with automatic backend selection | 2025-11-02 22:09:03 |
| tool-scorer | 1.3.3 | Catch LLM agent regressions before deployment. Test tool-calling accuracy for OpenAI, Anthropic, Gemini with pytest integration and CI/CD workflows. | 2025-10-28 21:46:55 |
| generic-llm-api-client | 0.1.2 | A unified, provider-agnostic Python client for multiple LLM APIs | 2025-10-28 13:32:06 |
| bbob-jax | 0.5.0 | BBOB benchmark functions implemented in JAX | 2025-10-27 11:15:39 |
| lmur | 2.1.1 | Neural Network Dataset | 2025-10-27 09:53:42 |
| nn-dataset | 2.1.1 | Neural Network Dataset | 2025-10-27 09:45:16 |
| LevDoom | 1.0.3 | LevDoom: A Generalization Benchmark for Deep Reinforcement Learning | 2025-10-25 17:16:51 |
| raven-pyu | 1.0.2 | Utilities for Python | 2025-10-20 11:23:33 |
| insdc-benchmarking-schema | 1.2.0 | JSON schema and validation for INSDC benchmarking results | 2025-10-16 11:37:57 |
| lrdbenchmark | 2.2.0 | Comprehensive Long-Range Dependence Benchmarking Framework with Classical, ML, and Neural Network Estimators + 5 Demonstration Notebooks | 2025-10-14 09:26:25 |
| LLMEvaluationFramework | 0.0.21 | Enterprise-Grade Python Framework for Large Language Model Evaluation & Testing | 2025-10-12 08:37:49 |
| guidellm | 0.3.1 | Guidance platform for deploying and managing large language models | 2025-10-10 13:40:23 |
| mcpuniverse | 1.0.3 | A framework for developing and benchmarking AI agents using Model Context Protocol (MCP) | 2025-10-07 08:23:22 |
| mlbench-lite | 2.0.3 | A simple machine learning benchmarking library | 2025-09-18 20:34:55 |
| causallm | 4.2.0 | Production-ready causal inference with comprehensive monitoring, testing, and LLM integration | 2025-09-09 17:14:52 |
| omnibench | 0.1.2 | Comprehensive AI Agent Benchmarking Framework | 2025-09-08 22:17:51 |