# NVIDIA HELM Benchmark Framework
This directory contains the HELM (Holistic Evaluation of Language Models) framework for evaluating large language models on a range of medical and healthcare tasks.
## Overview
The HELM framework provides a comprehensive evaluation system for medical AI models, supporting multiple benchmark datasets and evaluation scenarios. It's designed to work with the EvalFactory infrastructure for standardized model evaluation.
## Available Benchmarks
The framework supports the following medical evaluation benchmarks:
| Benchmark | Description | Type |
|-----------|-------------|------|
| **medcalc_bench** | Medical calculation benchmark with patient notes and ground truth answers | Medical QA |
| **medec** | Medical error detection and correction pairs | Error Detection |
| **head_qa** | Biomedical multiple-choice questions for medical knowledge testing | Medical QA |
| **medbullets** | USMLE-style medical questions with explanations | Medical QA |
| **pubmed_qa** | PubMed abstracts with yes/no/maybe questions | Medical QA |
| **ehr_sql** | Natural language to SQL query generation for clinical research | SQL Generation |
| **race_based_med** | Detection of race-based biases in medical LLM outputs | Bias Detection |
| **medhallu** | Classification of factual vs hallucinated medical answers | Hallucination Detection |
## Quick Start
### 1. Environment Setup
First, ensure you have the required environment variables set:
```bash
# Set your API keys
export OPENAI_API_KEY="your-api-key-here"
# Set Python path if necessary
export PYTHONPATH=$PYTHONPATH:$(pwd)
```
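As an optional sanity check before running anything, you can confirm that the `eval-factory` CLI is on your `PATH` and that the API key is actually exported. This snippet is illustrative and not part of the framework itself:

```bash
# Optional sanity checks (illustrative)
command -v eval-factory >/dev/null || echo "eval-factory not found on PATH"
[ -n "${OPENAI_API_KEY:-}" ] || echo "OPENAI_API_KEY is not set"
```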
### 2. Running Your First Benchmark
#### Method 1: Using `eval-factory` (Recommended)
`eval-factory` is a wrapper that simplifies the HELM benchmark process by handling configuration generation, benchmark execution, and result formatting automatically.
**What `eval-factory` does internally:**
1. **Configuration Processing**: Loads your YAML config and merges it with framework defaults
2. **Dynamic Config Generation**: Creates the necessary HELM model configurations dynamically
3. **Benchmark Execution**: Runs the HELM benchmark with proper parameters
4. **Result Processing**: Formats and saves results in standardized YAML format
Create a configuration file (e.g., `my_test.yml`):
```yaml
config:
  type: medcalc_bench  # Choose from available benchmarks
  output_dir: results/my_test
target:
  api_endpoint:
    url: https://api.openai.com/v1
    model_id: gpt-4
    type: chat
    api_key: OPENAI_API_KEY
```
Run the evaluation:
```bash
eval-factory run_eval \
--output_dir results/my_test \
--run_config my_test.yml
```
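If you first want to see what `eval-factory` would execute, add the `--dry_run` flag (covered in more detail under Dry Run Mode below) before committing to a full run:

```bash
eval-factory run_eval \
  --output_dir results/my_test \
  --run_config my_test.yml \
  --dry_run
```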
**Internal Process Breakdown:**
1. **Config Loading & Validation**:
- Loads your YAML configuration
- Validates against framework schema
- Merges with default parameters from `framework.yml`
2. **Dynamic Model Config Generation**:
- Calls `scripts/generate_dynamic_model_configs.py`
- Creates model-specific configuration files
- Handles provider-specific API endpoints and authentication
3. **HELM Benchmark Execution**:
- Executes `helm-run` with generated configurations
- Downloads and prepares benchmark datasets
- Runs evaluations with specified parameters
- Caches responses for efficiency
4. **Result Processing**:
- Collects raw benchmark results
- Formats into standardized YAML output
- Saves results in your specified output directory
#### Method 2: Using `helm-run` directly
```bash
helm-run \
--run-entries medcalc_bench:model=openai/gpt-4 \
--suite my-suite \
--max-eval-instances 10 \
--num-train-trials 1 \
-o results/my_test
```
**Comparison: `eval-factory` vs `helm-run`**
| Feature | `eval-factory` | `helm-run` |
|---------|-------------------|------------|
| **Configuration** | Simple YAML config | Complex command-line arguments |
| **Model Setup** | Automatic config generation | Manual model registration required |
| **Provider Support** | Built-in adapter handling | Requires custom model configs |
| **Results Format** | Standardized YAML output | Native HELM format only |
| **Ease of Use** | Beginner-friendly | Advanced users only |
| **Integration** | EvalFactory compatible | HELM-specific |
**Recommendation**: Use `eval-factory` for most use cases, especially when working with EvalFactory. Use `helm-run` only when you need fine-grained control over HELM's native features.
### 3. Understanding the Output
After running a benchmark, you'll find results in your specified output directory:
```
results/my_test/
├── responses/ # Raw model responses
├── cache.db # Cached responses for efficiency
├── instances.jsonl # Evaluation instances
├── results.jsonl # Final evaluation results
├── model_configs/ # Generated HELM model configurations
└── evaluation_config.yaml # Standardized evaluation results
```
**Generated Files Explanation:**
- **`responses/`**: Contains raw API responses from the model for each evaluation instance
- **`cache.db`**: SQLite database caching responses to avoid re-running identical queries
- **`instances.jsonl`**: The evaluation instances (questions, prompts, etc.) used in the benchmark
- **`results.jsonl`**: HELM's native results format with detailed metrics
- **`model_configs/`**: Dynamically generated configuration files for the specific model and provider
- **`evaluation_config.yaml`**: Standardized results in YAML format compatible with EvalFactory
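For a quick look at what was produced, standard command-line tools are enough. The snippet below is illustrative and assumes the `sqlite3` CLI is installed for inspecting the cache:

```bash
# Count evaluation instances and final results
wc -l results/my_test/instances.jsonl results/my_test/results.jsonl

# List tables in the response cache (assumes the sqlite3 CLI is installed)
sqlite3 results/my_test/cache.db ".tables"
```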
**Key Advantage**: `eval-factory` automatically handles the complexity of HELM configuration generation, making it much easier to run benchmarks compared to using `helm-run` directly.
## Step-by-Step Guide
### Step 1: Choose Your Benchmark
Select from the available benchmarks based on your evaluation needs:
- **For general medical QA**: `medcalc_bench`, `head_qa`, `medbullets`
- **For error detection**: `medec`
- **For research applications**: `pubmed_qa`, `ehr_sql`
- **For safety evaluation**: `race_based_med`, `medhallu`
### Step 2: Configure Your Model
Create a YAML configuration file with your model details. Here are examples for different providers:
#### OpenAI Configuration
```yaml
config:
  type: medcalc_bench
  output_dir: results/openai_test
target:
  api_endpoint:
    url: https://api.openai.com/v1
    model_id: gpt-4
    type: chat
    api_key: OPENAI_API_KEY
```
#### NVIDIA AI Foundation Models (build.nvidia.com)
```yaml
config:
  type: pubmed_qa
  output_dir: results/nim_test
target:
  api_endpoint:
    url: https://integrate.api.nvidia.com/v1
    model_id: nvdev/meta/llama-3.3-70b-instruct
    type: chat
    api_key: OPENAI_API_KEY
```
#### NVIDIA Cloud Function (nvcf)
```yaml
config:
  type: ehr_sql
  output_dir: results/nvcf_test
target:
  api_endpoint:
    url: https://api.nvcf.nvidia.com/v2/nvcf/pexec/functions/13e4f873-9d52-4ba9-8194-61baf8dc2bc9/
    model_id: meta-llama/Llama-3.3-70B-Instruct
    type: chat
    api_key: OPENAI_API_KEY
  adapter_config:
    use_nvcf: true
```
### Model Naming Conventions
Different providers use different model ID formats:
- **OpenAI**: `gpt-4`, `gpt-3.5-turbo`, `text-davinci-003`
- **NVIDIA**: `meta-llama/Llama-3.3-70B-Instruct`, `mistral-7b-instruct`
**Note**: NVCF requires a specific function ID in the URL and the `use_nvcf: true` adapter configuration.
### Step 3: Set Up API Credentials
Ensure your API credentials are properly configured:
```bash
# For OpenAI models
export OPENAI_API_KEY="<very-long-sequence>"
# For NVIDIA AI Foundation Models (build.nvidia.com)
export OPENAI_API_KEY="nvapi-..." # Uses same env var as OpenAI
# For NVIDIA Cloud Function (nvcf)
export OPENAI_API_KEY="nvapi-..." # Uses same env var as OpenAI
# Note: NVIDIA services typically use the same OPENAI_API_KEY environment variable
# but with NVIDIA-specific API keys (nvapi-... format)
```
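Because both providers share the same environment variable, it is easy to point a run at the wrong key. A small, illustrative check of the key prefix can catch this early (`sk-...` for OpenAI, `nvapi-...` for NVIDIA):

```bash
case "${OPENAI_API_KEY:-}" in
  nvapi-*) echo "Using an NVIDIA API key" ;;
  sk-*)    echo "Using an OpenAI API key" ;;
  "")      echo "OPENAI_API_KEY is not set" >&2 ;;
  *)       echo "Warning: unrecognized API key format" >&2 ;;
esac
```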
### Step 4: Run the Evaluation
Execute the benchmark using one of the methods above. The framework will:
1. **Load the configuration** and validate parameters
2. **Generate model configs** dynamically for the specified model
3. **Download and prepare** the benchmark dataset
4. **Run evaluations** on the specified number of instances
5. **Cache responses** for efficiency and reproducibility
6. **Generate results** in standardized format
### Step 5: Analyze Results
Review the generated results:
```bash
# View raw results
cat results/my_test/results.jsonl
# Use HELM tools for analysis
helm-summarize --suite my-suite
helm-server # Start web interface to view results
```
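If you prefer to stay on the command line, individual records in `results.jsonl` can be inspected with standard JSON tooling; for example, assuming `jq` is installed:

```bash
# Pretty-print the first result record (field names depend on the benchmark)
head -n 1 results/my_test/results.jsonl | jq .
```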
## Advanced Configuration
### Customizing Evaluation Parameters
You can customize various parameters in your configuration:
```yaml
config:
  type: medcalc_bench
  output_dir: results/advanced_test
  params:
    limit_samples: 100      # Limit number of evaluation instances
    parallelism: 4          # Number of parallel threads
    extra:
      num_train_trials: 3   # Number of training trials
      max_length: 2048      # Maximum token length
target:
  api_endpoint:
    url: https://api.openai.com/v1
    model_id: gpt-4
    type: chat
    api_key: OPENAI_API_KEY
```
### Advanced Configuration Parameters
The `config.params.extra` section provides additional parameters for fine-grained control over evaluation runs:
#### `data_path`
- **Purpose**: Custom data path for scenarios that support it
- **Supported Scenarios**: `ehrshot`, `clear`, `medalign`, `n2c2_ct_matching`
- **Example**: `"/path/to/custom/data"`
- **Description**: Overrides the default data location for the scenario
#### `num_output_tokens`
- **Purpose**: Maximum number of tokens the model is allowed to generate in its response
- **Scope**: Controls only the output length, not the total sequence length
- **Example**: `1000` limits model responses to 1000 tokens
- **Use Case**: Useful for controlling response length in generation tasks
#### `max_length`
- **Purpose**: Maximum total length for the entire input-output sequence (input + output combined)
- **Scope**: Controls the combined length of both prompt and response
- **Example**: `2048` limits total conversation to 2048 tokens
- **Difference from num_output_tokens**: This controls total sequence length, while num_output_tokens only controls response length
#### `subject`
- **Purpose**: Specific task or subset to evaluate within a scenario
- **Examples by Scenario**:
- **ehrshot**: `"guo_readmission"`, `"new_hypertension"`, `"lab_anemia"`
- **n2c2_ct_matching**: `"ABDOMINAL"`, `"ADVANCED-CAD"`, `"CREATININE"`
- **clear**: `"major_depression"`, `"bipolar_disorder"`, `"substance_use_disorder"`
- **Description**: Filters the evaluation to a specific prediction task or medical condition
#### `condition`
- **Purpose**: Specific condition or scenario variant to evaluate
- **Supported Scenarios**: `clear`
- **Examples**: `"alcohol_dependence"`, `"chronic_pain"`, `"homelessness"`
- **Description**: Used by scenarios like 'clear' to specify medical conditions for evaluation
#### `num_train_trials`
- **Purpose**: Number of training trials for few-shot evaluation
- **Behavior**: Each trial samples a different set of in-context examples
- **Example**: `3` runs the evaluation 3 times with different examples
- **Use Case**: Useful for robust evaluation with multiple few-shot configurations
### Example Configuration with All Parameters
```yaml
config:
  type: ehrshot
  output_dir: results/ehrshot_evaluation
  params:
    limit_samples: 500
    parallelism: 2
    extra:
      data_path: "/custom/path/to/ehrshot/data"
      num_output_tokens: 1000
      max_length: 4096
      subject: "guo_readmission"
      num_train_trials: 3
target:
  api_endpoint:
    url: https://api.openai.com/v1
    model_id: gpt-4
    type: chat
    api_key: OPENAI_API_KEY
```
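Assuming the configuration above is saved as `ehrshot_config.yml` (the filename is illustrative), it is launched the same way as any other benchmark:

```bash
eval-factory run_eval \
  --output_dir results/ehrshot_evaluation \
  --run_config ehrshot_config.yml
```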
### Running Multiple Benchmarks
To run multiple benchmarks on the same model:
```bash
# Create separate config files for each benchmark
eval-factory run_eval --output_dir results/medcalc_test --run_config medcalc_config.yml
eval-factory run_eval --output_dir results/medec_test --run_config medec_config.yml
eval-factory run_eval --output_dir results/head_qa_test --run_config head_qa_config.yml
```
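If you evaluate the same model on many benchmarks, a small shell loop keeps the invocations consistent. This sketch assumes config files that follow the `<benchmark>_config.yml` naming pattern used above:

```bash
for bench in medcalc medec head_qa; do
  eval-factory run_eval \
    --output_dir "results/${bench}_test" \
    --run_config "${bench}_config.yml"
done
```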
### Dry Run Mode
Test your configuration without running the full evaluation:
```bash
eval-factory run_eval \
--output_dir results/test \
--run_config my_config.yml \
--dry_run
```
This will show you the rendered configuration and command without executing the benchmark.
## Troubleshooting
### Common Issues
1. **API Key Errors**: Ensure your API keys are properly set and valid
2. **Model Not Found**: Verify the model ID and endpoint URL are correct
3. **Memory Issues**: Reduce `parallelism` or `limit_samples` for large models
4. **Timeout Errors**: Increase timeout settings or reduce batch sizes
### Debug Mode
Enable debug logging for detailed information:
```bash
eval-factory --debug run_eval \
--output_dir results/debug_test \
--run_config debug_config.yml
```
### Checking Available Tasks
List all available evaluation types:
```bash
eval-factory ls
```
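To narrow the listing down, the output can be filtered with ordinary shell tools, for example:

```bash
# Show only the medical benchmarks described above
eval-factory ls | grep -i med
```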
## Examples from commands.sh
Here are some practical examples from the project:
### Basic Medical Calculation Benchmark
```bash
eval-factory run_eval \
--output_dir test_cases/test_case_nim_llama_3_1_8b_medcalc_bench \
--run_config test_cases/test_case_nim_llama_3_1_8b_medcalc_bench.yml
```
### Medical Error Detection
```bash
eval-factory run_eval \
--output_dir test_cases/test_case_nim_llama_3_1_8b_medec \
--run_config test_cases/test_case_nim_llama_3_1_8b_medec.yml
```
### Biomedical QA
```bash
eval-factory run_eval \
--output_dir test_cases/test_case_nim_llama_3_1_8b_head_qa \
--run_config test_cases/test_case_nim_llama_3_1_8b_head_qa.yml
```
## Integration with EvalFactory
This framework is designed to work seamlessly with the EvalFactory infrastructure:
- **Standardized Output**: Results are generated in a format compatible with EvalFactory
- **Configuration Management**: Uses YAML-based configuration for easy integration
- **Caching**: Built-in caching for efficient re-runs and reproducibility
- **Extensibility**: Easy to add new benchmarks and evaluation metrics
## Contributing
To add new benchmarks or modify existing ones:
1. Update `framework.yml` with new benchmark definitions
2. Implement the benchmark logic in the appropriate adapter
3. Add test cases and documentation
4. Update this README with new benchmark information
## References
- [HELM Framework](https://github.com/stanford-crfm/helm)
- [EvalFactory Documentation](https://github.com/nvidia/eval-factory)
- [Medical AI Evaluation Papers](https://arxiv.org/abs/2401.00000)
For more detailed information about specific benchmarks and their implementations, refer to the individual benchmark documentation and the main HELM repository.
# Holistic Evaluation of Language Models (HELM)
<a href="https://github.com/stanford-crfm/helm">
<img alt="GitHub Repo stars" src="https://img.shields.io/github/stars/stanford-crfm/helm">
</a>
<a href="https://github.com/stanford-crfm/helm/graphs/contributors">
<img alt="GitHub contributors" src="https://img.shields.io/github/contributors/stanford-crfm/helm">
</a>
<a href="https://github.com/stanford-crfm/helm/actions/workflows/test.yml?query=branch%3Amain">
<img alt="GitHub Actions Workflow Status" src="https://img.shields.io/github/actions/workflow/status/stanford-crfm/helm/test.yml">
</a>
<a href="https://crfm-helm.readthedocs.io/en/latest/">
<img alt="Documentation Status" src="https://readthedocs.org/projects/helm/badge/?version=latest">
</a>
<a href="https://github.com/stanford-crfm/helm/blob/main/LICENSE">
<img alt="License" src="https://img.shields.io/github/license/stanford-crfm/helm?color=blue" />
</a>
<a href="https://pypi.org/project/crfm-helm/">
<img alt="PyPI" src="https://img.shields.io/pypi/v/crfm-helm?color=blue" />
</a>
[comment]: <> (When using the img tag, which allows us to specify size, src has to be a URL.)
<img src="https://github.com/stanford-crfm/helm/raw/v0.5.4/helm-frontend/src/assets/helm-logo.png" alt="HELM logo" width="480"/>
**Holistic Evaluation of Language Models (HELM)** is an open source Python framework created by the [Center for Research on Foundation Models (CRFM) at Stanford](https://crfm.stanford.edu/) for holistic, reproducible and transparent evaluation of foundation models, including large language models (LLMs) and multimodal models. This framework includes the following features:
- Datasets and benchmarks in a standardized format (e.g. MMLU-Pro, GPQA, IFEval, WildBench)
- Models from various providers accessible through a unified interface (e.g. OpenAI models, Anthropic Claude, Google Gemini)
- Metrics for measuring various aspects beyond accuracy (e.g. efficiency, bias, toxicity)
- Web UI for inspecting individual prompts and responses
- Web leaderboard for comparing results across models and benchmarks
## Documentation
Please refer to [the documentation on Read the Docs](https://crfm-helm.readthedocs.io/) for instructions on how to install and run HELM.
## Quick Start
<!--quick-start-begin-->
Install the package from PyPI:
```sh
pip install crfm-helm
```
Run the following in your shell:
```sh
# Run benchmark
helm-run --run-entries mmlu:subject=philosophy,model=openai/gpt2 --suite my-suite --max-eval-instances 10
# Summarize benchmark results
helm-summarize --suite my-suite
# Start a web server to display benchmark results
helm-server --suite my-suite
```
Then go to http://localhost:8000/ in your browser.
<!--quick-start-end-->
## Leaderboards
We maintain official leaderboards with results from evaluating recent models on notable benchmarks using this framework. Our current flagship leaderboards are:
- [HELM Capabilities](https://crfm.stanford.edu/helm/capabilities/latest/)
- [HELM Safety](https://crfm.stanford.edu/helm/safety/latest/)
- [Holistic Evaluation of Vision-Language Models (VHELM)](https://crfm.stanford.edu/helm/vhelm/latest/)
We also maintain leaderboards for a diverse range of domains (e.g. medicine, finance) and aspects (e.g. multi-linguality, world knowledge, regulation compliance). Refer to the [HELM website](https://crfm.stanford.edu/helm/) for a full list of leaderboards.
## Papers
The HELM framework was used in the following papers for evaluating models.
- **Holistic Evaluation of Language Models** - [paper](https://openreview.net/forum?id=iO4LZibEqW), [leaderboard](https://crfm.stanford.edu/helm/classic/latest/)
- **Holistic Evaluation of Vision-Language Models (VHELM)** - [paper](https://arxiv.org/abs/2410.07112), [leaderboard](https://crfm.stanford.edu/helm/vhelm/latest/), [documentation](https://crfm-helm.readthedocs.io/en/latest/vhelm/)
- **Holistic Evaluation of Text-To-Image Models (HEIM)** - [paper](https://arxiv.org/abs/2311.04287), [leaderboard](https://crfm.stanford.edu/helm/heim/latest/), [documentation](https://crfm-helm.readthedocs.io/en/latest/heim/)
- **Image2Struct: Benchmarking Structure Extraction for Vision-Language Models** - [paper](https://arxiv.org/abs/2410.22456)
- **Enterprise Benchmarks for Large Language Model Evaluation** - [paper](https://arxiv.org/abs/2410.12857), [documentation](https://crfm-helm.readthedocs.io/en/latest/enterprise_benchmark/)
- **The Mighty ToRR: A Benchmark for Table Reasoning and Robustness** - [paper](https://arxiv.org/abs/2502.19412), [leaderboard](https://crfm.stanford.edu/helm/torr/latest/)
- **Reliable and Efficient Amortized Model-based Evaluation** - [paper](https://arxiv.org/abs/2503.13335), [documentation](https://crfm-helm.readthedocs.io/en/latest/reeval/)
- **MedHELM** - paper in progress, [leaderboard](https://crfm.stanford.edu/helm/medhelm/latest/), [documentation](https://crfm-helm.readthedocs.io/en/latest/reeval/)
The HELM framework can be used to reproduce the published model evaluation results from these papers. To get started, refer to the documentation links above for the corresponding paper, or the [main Reproducing Leaderboards documentation](https://crfm-helm.readthedocs.io/en/latest/reproducing_leaderboards/).
## Citation
If you use this software in your research, please cite the [Holistic Evaluation of Language Models paper](https://openreview.net/forum?id=iO4LZibEqW) as below.
```bibtex
@article{
liang2023holistic,
title={Holistic Evaluation of Language Models},
author={Percy Liang and Rishi Bommasani and Tony Lee and Dimitris Tsipras and Dilara Soylu and Michihiro Yasunaga and Yian Zhang and Deepak Narayanan and Yuhuai Wu and Ananya Kumar and Benjamin Newman and Binhang Yuan and Bobby Yan and Ce Zhang and Christian Alexander Cosgrove and Christopher D Manning and Christopher Re and Diana Acosta-Navas and Drew Arad Hudson and Eric Zelikman and Esin Durmus and Faisal Ladhak and Frieda Rong and Hongyu Ren and Huaxiu Yao and Jue WANG and Keshav Santhanam and Laurel Orr and Lucia Zheng and Mert Yuksekgonul and Mirac Suzgun and Nathan Kim and Neel Guha and Niladri S. Chatterji and Omar Khattab and Peter Henderson and Qian Huang and Ryan Andrew Chi and Sang Michael Xie and Shibani Santurkar and Surya Ganguli and Tatsunori Hashimoto and Thomas Icard and Tianyi Zhang and Vishrav Chaudhary and William Wang and Xuechen Li and Yifan Mai and Yuhui Zhang and Yuta Koreeda},
journal={Transactions on Machine Learning Research},
issn={2835-8856},
year={2023},
url={https://openreview.net/forum?id=iO4LZibEqW},
note={Featured Certification, Expert Certification}
}
```
"specs": [
[
"==",
"2.0.10"
]
]
},
{
"name": "pycodestyle",
"specs": [
[
"==",
"2.9.1"
]
]
},
{
"name": "pycparser",
"specs": [
[
"==",
"2.22"
]
]
},
{
"name": "pydantic",
"specs": [
[
"==",
"2.11.7"
]
]
},
{
"name": "pydantic-core",
"specs": [
[
"==",
"2.33.2"
]
]
},
{
"name": "pydload",
"specs": [
[
"==",
"1.0.9"
]
]
},
{
"name": "pyflakes",
"specs": [
[
"==",
"2.5.0"
]
]
},
{
"name": "pygments",
"specs": [
[
"==",
"2.19.2"
]
]
},
{
"name": "pyhocon",
"specs": [
[
"==",
"0.3.61"
]
]
},
{
"name": "pymongo",
"specs": [
[
"==",
"4.13.2"
]
]
},
{
"name": "pyparsing",
"specs": [
[
"==",
"3.2.3"
]
]
},
{
"name": "pypinyin",
"specs": [
[
"==",
"0.49.0"
]
]
},
{
"name": "pyreadline3",
"specs": [
[
"==",
"3.5.4"
]
]
},
{
"name": "pysocks",
"specs": [
[
"==",
"1.7.1"
]
]
},
{
"name": "pytest",
"specs": [
[
"==",
"7.2.2"
]
]
},
{
"name": "python-dateutil",
"specs": [
[
"==",
"2.8.2"
]
]
},
{
"name": "python-utils",
"specs": [
[
"==",
"3.9.1"
]
]
},
{
"name": "pytorch-fid",
"specs": [
[
"==",
"0.3.0"
]
]
},
{
"name": "pytorch-lightning",
"specs": [
[
"==",
"2.5.2"
]
]
},
{
"name": "pytz",
"specs": [
[
"==",
"2025.2"
]
]
},
{
"name": "pywavelets",
"specs": [
[
"==",
"1.6.0"
]
]
},
{
"name": "pywavelets",
"specs": [
[
"==",
"1.8.0"
]
]
},
{
"name": "pywin32",
"specs": [
[
"==",
"311"
]
]
},
{
"name": "pyyaml",
"specs": [
[
"==",
"6.0.2"
]
]
},
{
"name": "qwen-vl-utils",
"specs": [
[
"==",
"0.0.11"
]
]
},
{
"name": "rapidfuzz",
"specs": [
[
"==",
"3.13.0"
]
]
},
{
"name": "regex",
"specs": [
[
"==",
"2024.11.6"
]
]
},
{
"name": "reka-api",
"specs": [
[
"==",
"2.0.0"
]
]
},
{
"name": "requests",
"specs": [
[
"==",
"2.32.4"
]
]
},
{
"name": "requests-oauthlib",
"specs": [
[
"==",
"2.0.0"
]
]
},
{
"name": "requirements-parser",
"specs": [
[
"==",
"0.13.0"
]
]
},
{
"name": "retrying",
"specs": [
[
"==",
"1.4.1"
]
]
},
{
"name": "rich",
"specs": [
[
"==",
"13.9.4"
]
]
},
{
"name": "rouge-score",
"specs": [
[
"==",
"0.1.2"
]
]
},
{
"name": "rsa",
"specs": [
[
"==",
"4.7.2"
]
]
},
{
"name": "s3transfer",
"specs": [
[
"==",
"0.13.1"
]
]
},
{
"name": "sacrebleu",
"specs": [
[
"==",
"2.5.1"
]
]
},
{
"name": "safetensors",
"specs": [
[
"==",
"0.5.3"
]
]
},
{
"name": "scaleapi",
"specs": [
[
"==",
"2.17.0"
]
]
},
{
"name": "scikit-image",
"specs": [
[
"==",
"0.24.0"
]
]
},
{
"name": "scikit-image",
"specs": [
[
"==",
"0.25.2"
]
]
},
{
"name": "scikit-learn",
"specs": [
[
"==",
"1.6.1"
]
]
},
{
"name": "scikit-learn",
"specs": [
[
"==",
"1.7.1"
]
]
},
{
"name": "scipy",
"specs": [
[
"==",
"1.13.1"
]
]
},
{
"name": "scipy",
"specs": [
[
"==",
"1.15.3"
]
]
},
{
"name": "scipy",
"specs": [
[
"==",
"1.16.0"
]
]
},
{
"name": "seaborn",
"specs": [
[
"==",
"0.13.2"
]
]
},
{
"name": "selenium",
"specs": [
[
"==",
"4.32.0"
]
]
},
{
"name": "selenium",
"specs": [
[
"==",
"4.34.2"
]
]
},
{
"name": "sentence-transformers",
"specs": [
[
"==",
"4.1.0"
]
]
},
{
"name": "sentencepiece",
"specs": [
[
"==",
"0.2.0"
]
]
},
{
"name": "sentry-sdk",
"specs": [
[
"==",
"2.33.1"
]
]
},
{
"name": "setuptools",
"specs": [
[
"==",
"80.9.0"
]
]
},
{
"name": "shapely",
"specs": [
[
"==",
"2.0.7"
]
]
},
{
"name": "shapely",
"specs": [
[
"==",
"2.1.1"
]
]
},
{
"name": "shellingham",
"specs": [
[
"==",
"1.5.4"
]
]
},
{
"name": "shutilwhich",
"specs": [
[
"==",
"1.1.0"
]
]
},
{
"name": "simple-slurm",
"specs": [
[
"==",
"0.2.7"
]
]
},
{
"name": "simplejson",
"specs": [
[
"==",
"3.20.1"
]
]
},
{
"name": "six",
"specs": [
[
"==",
"1.17.0"
]
]
},
{
"name": "smart-open",
"specs": [
[
"==",
"7.3.0.post1"
]
]
},
{
"name": "smashed",
"specs": [
[
"==",
"0.21.5"
]
]
},
{
"name": "smmap",
"specs": [
[
"==",
"5.0.2"
]
]
},
{
"name": "sniffio",
"specs": [
[
"==",
"1.3.1"
]
]
},
{
"name": "sortedcontainers",
"specs": [
[
"==",
"2.4.0"
]
]
},
{
"name": "soupsieve",
"specs": [
[
"==",
"2.7"
]
]
},
{
"name": "spacy",
"specs": [
[
"==",
"3.8.7"
]
]
},
{
"name": "spacy-legacy",
"specs": [
[
"==",
"3.0.12"
]
]
},
{
"name": "spacy-loggers",
"specs": [
[
"==",
"1.0.5"
]
]
},
{
"name": "sqlitedict",
"specs": [
[
"==",
"2.1.0"
]
]
},
{
"name": "srsly",
"specs": [
[
"==",
"2.5.1"
]
]
},
{
"name": "surge-api",
"specs": [
[
"==",
"1.5.10"
]
]
},
{
"name": "sympy",
"specs": [
[
"==",
"1.13.1"
]
]
},
{
"name": "tabulate",
"specs": [
[
"==",
"0.9.0"
]
]
},
{
"name": "tempdir",
"specs": [
[
"==",
"0.7.1"
]
]
},
{
"name": "tensorboard",
"specs": [
[
"==",
"2.11.2"
]
]
},
{
"name": "tensorboard",
"specs": [
[
"==",
"2.18.0"
]
]
},
{
"name": "tensorboard-data-server",
"specs": [
[
"==",
"0.6.1"
]
]
},
{
"name": "tensorboard-data-server",
"specs": [
[
"==",
"0.7.2"
]
]
},
{
"name": "tensorboard-plugin-wit",
"specs": [
[
"==",
"1.8.1"
]
]
},
{
"name": "tensorflow",
"specs": [
[
"==",
"2.11.1"
]
]
},
{
"name": "tensorflow",
"specs": [
[
"==",
"2.18.1"
]
]
},
{
"name": "tensorflow-estimator",
"specs": [
[
"==",
"2.11.0"
]
]
},
{
"name": "tensorflow-hub",
"specs": [
[
"==",
"0.16.1"
]
]
},
{
"name": "tensorflow-io-gcs-filesystem",
"specs": [
[
"==",
"0.37.1"
]
]
},
{
"name": "tensorflow-text",
"specs": [
[
"==",
"2.11.0"
]
]
},
{
"name": "tensorflow-text",
"specs": [
[
"==",
"2.18.1"
]
]
},
{
"name": "tensorstore",
"specs": [
[
"==",
"0.1.69"
]
]
},
{
"name": "tensorstore",
"specs": [
[
"==",
"0.1.74"
]
]
},
{
"name": "termcolor",
"specs": [
[
"==",
"3.1.0"
]
]
},
{
"name": "tf-keras",
"specs": [
[
"==",
"2.15.0"
]
]
},
{
"name": "thinc",
"specs": [
[
"==",
"8.3.4"
]
]
},
{
"name": "threadpoolctl",
"specs": [
[
"==",
"3.6.0"
]
]
},
{
"name": "tifffile",
"specs": [
[
"==",
"2024.8.30"
]
]
},
{
"name": "tifffile",
"specs": [
[
"==",
"2025.5.10"
]
]
},
{
"name": "tifffile",
"specs": [
[
"==",
"2025.6.11"
]
]
},
{
"name": "tiktoken",
"specs": [
[
"==",
"0.9.0"
]
]
},
{
"name": "timm",
"specs": [
[
"==",
"0.6.13"
]
]
},
{
"name": "together",
"specs": [
[
"==",
"1.3.14"
]
]
},
{
"name": "tokenizers",
"specs": [
[
"==",
"0.21.2"
]
]
},
{
"name": "toml",
"specs": [
[
"==",
"0.10.2"
]
]
},
{
"name": "tomli",
"specs": [
[
"==",
"2.2.1"
]
]
},
{
"name": "toolz",
"specs": [
[
"==",
"1.0.0"
]
]
},
{
"name": "torch",
"specs": [
[
"==",
"2.5.1"
]
]
},
{
"name": "torch-fidelity",
"specs": [
[
"==",
"0.3.0"
]
]
},
{
"name": "torchmetrics",
"specs": [
[
"==",
"0.11.4"
]
]
},
{
"name": "torchvision",
"specs": [
[
"==",
"0.20.1"
]
]
},
{
"name": "tqdm",
"specs": [
[
"==",
"4.67.1"
]
]
},
{
"name": "transformers",
"specs": [
[
"==",
"4.52.4"
]
]
},
{
"name": "transformers-stream-generator",
"specs": [
[
"==",
"0.0.5"
]
]
},
{
"name": "treescope",
"specs": [
[
"==",
"0.1.9"
]
]
},
{
"name": "trio",
"specs": [
[
"==",
"0.30.0"
]
]
},
{
"name": "trio-websocket",
"specs": [
[
"==",
"0.12.2"
]
]
},
{
"name": "triton",
"specs": [
[
"==",
"3.1.0"
]
]
},
{
"name": "trouting",
"specs": [
[
"==",
"0.3.3"
]
]
},
{
"name": "typer",
"specs": [
[
"==",
"0.15.3"
]
]
},
{
"name": "types-requests",
"specs": [
[
"==",
"2.31.0.6"
]
]
},
{
"name": "types-requests",
"specs": [
[
"==",
"2.32.4.20250611"
]
]
},
{
"name": "types-urllib3",
"specs": [
[
"==",
"1.26.25.14"
]
]
},
{
"name": "typing-extensions",
"specs": [
[
"==",
"4.14.1"
]
]
},
{
"name": "typing-inspect",
"specs": [
[
"==",
"0.9.0"
]
]
},
{
"name": "typing-inspection",
"specs": [
[
"==",
"0.4.1"
]
]
},
{
"name": "tzdata",
"specs": [
[
"==",
"2025.2"
]
]
},
{
"name": "uncertainty-calibration",
"specs": [
[
"==",
"0.1.4"
]
]
},
{
"name": "unidecode",
"specs": [
[
"==",
"1.4.0"
]
]
},
{
"name": "uritemplate",
"specs": [
[
"==",
"4.2.0"
]
]
},
{
"name": "urllib3",
"specs": [
[
"==",
"1.26.20"
]
]
},
{
"name": "urllib3",
"specs": [
[
"==",
"2.5.0"
]
]
},
{
"name": "virtualenv",
"specs": [
[
"==",
"20.32.0"
]
]
},
{
"name": "wandb",
"specs": [
[
"==",
"0.21.0"
]
]
},
{
"name": "wasabi",
"specs": [
[
"==",
"1.1.3"
]
]
},
{
"name": "wcwidth",
"specs": [
[
"==",
"0.2.13"
]
]
},
{
"name": "weasel",
"specs": [
[
"==",
"0.4.1"
]
]
},
{
"name": "websocket-client",
"specs": [
[
"==",
"1.8.0"
]
]
},
{
"name": "websockets",
"specs": [
[
"==",
"12.0"
]
]
},
{
"name": "websockets",
"specs": [
[
"==",
"14.2"
]
]
},
{
"name": "werkzeug",
"specs": [
[
"==",
"3.1.3"
]
]
},
{
"name": "wheel",
"specs": [
[
"==",
"0.45.1"
]
]
},
{
"name": "wrapt",
"specs": [
[
"==",
"1.17.2"
]
]
},
{
"name": "writer-sdk",
"specs": [
[
"==",
"2.2.1"
]
]
},
{
"name": "writerai",
"specs": [
[
"==",
"4.0.1"
]
]
},
{
"name": "wsproto",
"specs": [
[
"==",
"1.2.0"
]
]
},
{
"name": "xdoctest",
"specs": [
[
"==",
"1.2.0"
]
]
},
{
"name": "xlrd",
"specs": [
[
"==",
"2.0.2"
]
]
},
{
"name": "xxhash",
"specs": [
[
"==",
"3.5.0"
]
]
},
{
"name": "yarl",
"specs": [
[
"==",
"1.20.1"
]
]
},
{
"name": "zipp",
"specs": [
[
"==",
"3.23.0"
]
]
},
{
"name": "zstandard",
"specs": [
[
"==",
"0.18.0"
]
]
}
],
"lcname": "nvidia-crfm-helm"
}
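
The JSON block above is machine-readable dependency metadata: each entry pairs a package `name` with a `specs` list of `[operator, version]` constraints. If you need these pins in pip-friendly form (for example, to reproduce the evaluation environment), they can be flattened into `requirements.txt`-style lines. The sketch below is a minimal, hedged example: the file name `package_metadata.json` and the top-level key `"packages"` are assumptions for illustration, since only the per-entry layout is shown above.

```python
# Minimal sketch: flatten the dependency metadata above into pip-style lines.
# Assumptions (not confirmed by the document): the JSON is stored in
# "package_metadata.json" and the entry list lives under a "packages" key.
import json

with open("package_metadata.json") as fh:  # assumed file name
    meta = json.load(fh)

for entry in meta.get("packages", []):  # assumed top-level key
    name = entry["name"]
    # Each spec is an [operator, version] pair, e.g. ["==", "2.1.0"].
    constraint = ",".join(f"{op}{version}" for op, version in entry["specs"])
    print(f"{name}{constraint}")  # e.g. "iniconfig==2.1.0"
```

Note that several packages appear more than once with different pins (e.g. `jax`, `numpy`, `scipy`); these typically correspond to different Python-version or platform environments, so pick the set that matches your environment rather than installing both.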