# HF VRAM Calculator

[![Python 3.8+](https://img.shields.io/badge/python-3.8+-blue.svg)](https://www.python.org/downloads/)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
[![uv](https://img.shields.io/endpoint?url=https://raw.githubusercontent.com/astral-sh/uv/main/assets/badge/v0.json)](https://github.com/astral-sh/uv)

A professional Python CLI tool that estimates GPU memory requirements for Hugging Face models across different data types and parallelization strategies.

> **โšก Latest Features**: Smart dtype detection, 12 quantization formats, 20+ GPU models, professional Rich UI

## Quick Demo

```bash
# Install and run
pip install hf-vram-calc
hf-vram-calc microsoft/DialoGPT-medium

# Output: Beautiful tables showing 0.9GB inference, GPU compatibility, parallelization strategies
```

## Features

- ๐Ÿ” **Automatic Model Analysis**: Fetch configurations from Hugging Face Hub automatically
- ๐Ÿง  **Smart Data Type Detection**: Intelligent dtype recommendation from model names, config, or defaults
- ๐Ÿ“Š **Comprehensive Data Type Support**: fp32, fp16, bf16, fp8, int8, int4, mxfp4, nvfp4, awq_int4, gptq_int4, nf4, fp4
- ๐ŸŽฏ **Multi-Scenario Memory Estimation**:
  - **Inference**: Model weights + KV cache overhead (ร—1.2 factor)
  - **Training**: Full Adam optimizer states (ร—4ร—1.3 factors)
  - **LoRA Fine-tuning**: Low-rank adaptation with trainable parameter overhead
- โšก **Advanced Parallelization Analysis**:
  - Tensor Parallelism (TP): 1, 2, 4, 8
  - Pipeline Parallelism (PP): 1, 2, 4, 8  
  - Expert Parallelism (EP) for MoE models
  - Data Parallelism (DP): 2, 4, 8
  - Combined strategies (TP + PP combinations)
- ๐ŸŽฎ **GPU Compatibility Matrix**:
  - 20+ GPU models (RTX 4090, A100, H100, L40S, etc.)
  - Automatic compatibility checking for inference/training/LoRA
  - Minimum GPU memory requirement calculations
- ๐Ÿ“ˆ **Professional Rich UI**:
  - ๐ŸŽจ Beautiful color-coded tables and panels
  - ๐Ÿ“Š Real-time progress indicators
  - ๐Ÿš€ Modern CLI interface with emoji icons
  - ๐Ÿ’ก Smart recommendations and warnings
- ๐Ÿ”ง **Flexible Configuration**:
  - Customizable LoRA rank, batch size, sequence length
  - External JSON configuration files
  - User-defined GPU models and data types
- ๐Ÿ“‹ **Parameter Display**: Raw count + human-readable format (e.g., "405,016,576 (405.0M)")

## Installation

### Quick Install (from PyPI)

```bash
pip install hf-vram-calc
```

### Build from Source

```bash
# Clone the repository
git clone <repository-url>
cd hf-vram-calc

# Build with uv (recommended)
uv build
uv pip install dist/hf_vram_calc-*.whl

# Or install directly
uv pip install .
```

> **Dependencies**: `requests` (HTTP), `rich` (beautiful CLI), Python โ‰ฅ3.8

For detailed build instructions, see: [BUILD.md](BUILD.md)

## Usage

### Basic Usage - Smart Dtype Detection

```bash
# Automatic dtype recommendation from model config/name
hf-vram-calc microsoft/DialoGPT-medium

# Model name contains dtype - automatically detects fp4
hf-vram-calc nvidia/DeepSeek-R1-0528-FP4
```

### Specify Data Type Override

```bash
# Override with specific data type
hf-vram-calc meta-llama/Llama-2-7b-hf --dtype bf16
hf-vram-calc mistralai/Mistral-7B-v0.1 --dtype nvfp4
```

### Advanced Configuration

```bash
# Custom batch size and sequence length
hf-vram-calc mistralai/Mistral-7B-v0.1 --batch-size 4 --sequence-length 4096

# Custom LoRA rank for fine-tuning estimation  
hf-vram-calc microsoft/DialoGPT-medium --lora-rank 128

# Detailed analysis (enabled by default)
hf-vram-calc meta-llama/Llama-2-7b-hf --show-detailed
```

### System Information

```bash
# List all available data types and GPU models
hf-vram-calc --list-types

# Use custom configuration directory
hf-vram-calc --config-dir ./my_config microsoft/DialoGPT-medium

# Show help
hf-vram-calc --help
```

## Command Line Arguments

### Required
- `model_name`: Hugging Face model name (e.g., `microsoft/DialoGPT-medium`)

### Data Type Control  
- `--dtype {fp32,fp16,bf16,fp8,int8,int4,mxfp4,nvfp4,awq_int4,fp4,nf4,gptq_int4}`: Override automatic dtype detection
- `--list-types`: List all available data types and GPU models

### Memory Estimation Parameters
- `--batch-size BATCH_SIZE`: Batch size for activation estimation (default: 1)
- `--sequence-length SEQUENCE_LENGTH`: Sequence length for memory calculation (default: 2048)  
- `--lora-rank LORA_RANK`: LoRA rank for fine-tuning estimation (default: 64)

### Display & Configuration
- `--show-detailed`: Show detailed parallelization and GPU compatibility (default: enabled)
- `--config-dir CONFIG_DIR`: Custom configuration directory path
- `--help`: Show complete help message with examples

### Smart Behavior
- **No `--dtype`**: Uses intelligent priority (model name โ†’ config โ†’ fp16 default)
- **With `--dtype`**: Overrides automatic detection with specified type
- **Invalid model**: Graceful error handling with helpful suggestions

## Quick Start Examples

```bash
# Estimate memory for different models
hf-vram-calc microsoft/DialoGPT-medium              # โ†’ 0.9GB inference (FP16)
hf-vram-calc meta-llama/Llama-2-7b-hf              # โ†’ ~13GB inference  
hf-vram-calc nvidia/DeepSeek-R1-0528-FP4           # โ†’ Auto-detects FP4 from name

# Compare different quantization methods
hf-vram-calc meta-llama/Llama-2-7b-hf --dtype fp16     # โ†’ ~13GB
hf-vram-calc meta-llama/Llama-2-7b-hf --dtype int4     # โ†’ ~3.5GB  
hf-vram-calc meta-llama/Llama-2-7b-hf --dtype awq_int4 # โ†’ ~3.5GB

# Find optimal parallelization strategy
hf-vram-calc mistralai/Mistral-7B-v0.1 --show-detailed  # โ†’ TP/PP recommendations

# Check what's available
hf-vram-calc --list-types                               # โ†’ All types & GPUs
```

## Data Type Priority & Detection

### Automatic Data Type Recommendation

The tool uses intelligent priority-based dtype selection:

1. **Model Name Detection** (Highest Priority)
   - `model-fp16`, `model-bf16` โ†’ Extracts from model name  
   - `model-4bit`, `model-gptq`, `model-awq` โ†’ Detects quantization
   
2. **Config torch_dtype** (Medium Priority)
   - Reads `torch_dtype` from model's `config.json`
   - Maps `torch.float16` โ†’ `fp16`, `torch.bfloat16` โ†’ `bf16`, etc.

3. **Default Fallback** (Lowest Priority)
   - Defaults to `fp16` when no dtype detected
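
A condensed sketch of this priority chain (the function name and pattern set are illustrative; the real tool recognizes more patterns, listed in the table below):

```python
import re

# Illustrative subset of name patterns; order matters
# (e.g., "bfloat16" would also match the "float16" pattern, so bf16 goes first).
NAME_PATTERNS = [
    (r"bf16|bfloat16", "bf16"),
    (r"fp16|float16|half", "fp16"),
    (r"awq", "awq_int4"),
    (r"gptq", "gptq_int4"),
    (r"nvfp4", "nvfp4"),
    (r"fp4", "fp4"),
    (r"int4|4bit", "int4"),
]
TORCH_DTYPE_MAP = {"float32": "fp32", "float16": "fp16", "bfloat16": "bf16"}

def recommend_dtype(model_name: str, config: dict) -> str:
    # 1. Highest priority: dtype hints embedded in the model name
    lowered = model_name.lower()
    for pattern, dtype in NAME_PATTERNS:
        if re.search(pattern, lowered):
            return dtype
    # 2. Medium priority: torch_dtype from config.json ("torch.float16" or "float16")
    torch_dtype = str(config.get("torch_dtype", "")).split(".")[-1]
    if torch_dtype in TORCH_DTYPE_MAP:
        return TORCH_DTYPE_MAP[torch_dtype]
    # 3. Lowest priority: default fallback
    return "fp16"

print(recommend_dtype("nvidia/DeepSeek-R1-0528-FP4", {}))  # -> fp4 (from the name)
print(recommend_dtype("meta-llama/Llama-2-7b-hf", {"torch_dtype": "torch.float16"}))  # -> fp16 (from config)
```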

### Supported Data Types

| Data Type | Bytes/Param | Description | Detection Patterns |
|-----------|-------------|-------------|--------------------|
| **fp32**  | 4.0 | 32-bit floating point | `fp32`, `float32` |
| **fp16**  | 2.0 | 16-bit floating point | `fp16`, `float16`, `half` |
| **bf16**  | 2.0 | Brain Float 16 | `bf16`, `bfloat16` |
| **fp8**   | 1.0 | 8-bit floating point | `fp8`, `float8` |
| **int8**  | 1.0 | 8-bit integer | `int8`, `8bit` |
| **int4**  | 0.5 | 4-bit integer | `int4`, `4bit` |
| **mxfp4** | 0.5 | Microscaling (MX) FP4 | `mxfp4` |
| **nvfp4** | 0.5 | NVIDIA FP4 | `nvfp4` |
| **awq_int4** | 0.5 | AWQ 4-bit quantization | `awq`, `awq-int4` |
| **gptq_int4** | 0.5 | GPTQ 4-bit quantization | `gptq`, `gptq-int4` |
| **nf4**   | 0.5 | 4-bit NormalFloat | `nf4`, `bnb-4bit` |
| **fp4**   | 0.5 | 4-bit floating point | `fp4` |
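
The bytes-per-parameter values above determine the raw weight footprint. A quick sanity check against the demo numbers (a sketch; 1 GB = 1024³ bytes):

```python
BYTES_PER_PARAM = {
    "fp32": 4.0, "fp16": 2.0, "bf16": 2.0, "fp8": 1.0, "int8": 1.0,
    "int4": 0.5, "mxfp4": 0.5, "nvfp4": 0.5, "awq_int4": 0.5,
    "gptq_int4": 0.5, "nf4": 0.5, "fp4": 0.5,
}

def weight_size_gb(num_params: int, dtype: str) -> float:
    """Raw model weight size in GB, before any inference/training overhead."""
    return num_params * BYTES_PER_PARAM[dtype] / 1024**3

# DialoGPT-medium: 405,016,576 parameters at fp16
print(round(weight_size_gb(405_016_576, "fp16"), 2))  # -> 0.75, matching the example output
```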

## Parallelization Strategies

### Tensor Parallelism (TP)
Splits model weights by tensor dimensions across multiple GPUs.

### Pipeline Parallelism (PP)
Distributes different model layers to different GPUs.

### Expert Parallelism (EP)
For MoE (Mixture of Experts) models, distributes expert networks to different GPUs.

### Data Parallelism (DP)
Each GPU holds a complete copy of the model; only the data is split across GPUs.
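
Under the simple sharding model behind the tables below, TP, PP, and EP divide the weights evenly while DP replicates them; a sketch, ignoring communication buffers and uneven layer splits:

```python
def memory_per_gpu_gb(single_gpu_gb: float, tp: int = 1, pp: int = 1, ep: int = 1) -> float:
    """Idealized per-GPU memory: TP, PP, and EP shard the weights;
    DP replicates the full model, so it never lowers the per-GPU requirement."""
    return single_gpu_gb / (tp * pp * ep)

# Reproduces the FP16 inference rows in the example output (0.91 GB on a single GPU)
print(round(memory_per_gpu_gb(0.91, tp=2), 2))        # -> 0.45 (Tensor Parallel)
print(round(memory_per_gpu_gb(0.91, tp=4, pp=4), 2))  # -> 0.06 (TP + PP)
```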

## Example Output

### Smart Dtype Detection Example

```bash
$ hf-vram-calc microsoft/DialoGPT-medium
```

```
Using recommended data type: FP16
Use --dtype to specify different type, or see --list-types for all options
  ๐Ÿ” Fetching configuration for microsoft/DialoGPT-medium...
  ๐Ÿ“‹ Parsing model configuration...                         
  ๐Ÿงฎ Calculating model parameters...                        
  ๐Ÿ’พ Computing memory requirements...                       

                          โ•ญโ”€โ”€โ”€โ”€โ”€โ”€โ”€ ๐Ÿค– Model Information โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ•ฎ
                          โ”‚                                    โ”‚
                          โ”‚  Model: microsoft/DialoGPT-medium  โ”‚
                          โ”‚  Architecture: gpt2                โ”‚
                          โ”‚  Parameters: 405,016,576 (405.0M)  โ”‚
                          โ”‚  Recommended dtype: FP16           โ”‚
                          โ”‚                                    โ”‚
                          โ•ฐโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ•ฏ

        ๐Ÿ’พ Memory Requirements by Data Type and Scenario                
โ•ญโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ•ฎ
โ”‚              โ”‚   Total Size โ”‚    Inference โ”‚        Training โ”‚         LoRA โ”‚
โ”‚  Data Type   โ”‚         (GB) โ”‚         (GB) โ”‚     (Adam) (GB) โ”‚         (GB) โ”‚
โ”œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ผโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ผโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ผโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ผโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ค
โ”‚     FP16     โ”‚         0.75 โ”‚         0.91 โ”‚            3.92 โ”‚         0.94 โ”‚
โ•ฐโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ดโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ดโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ดโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ดโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ•ฏ

          โšก Parallelization Strategies (FP16 Inference)                 
โ•”โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•คโ•โ•โ•โ•โ•โ•โ•คโ•โ•โ•โ•โ•โ•โ•คโ•โ•โ•โ•โ•โ•โ•คโ•โ•โ•โ•โ•โ•โ•คโ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•คโ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•—
โ•‘                    โ”‚      โ”‚      โ”‚      โ”‚      โ”‚   Memory/GPU โ”‚   Min GPU    โ•‘
โ•‘ Strategy           โ”‚  TP  โ”‚  PP  โ”‚  EP  โ”‚  DP  โ”‚         (GB) โ”‚   Required   โ•‘
โ•Ÿโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ผโ”€โ”€โ”€โ”€โ”€โ”€โ”ผโ”€โ”€โ”€โ”€โ”€โ”€โ”ผโ”€โ”€โ”€โ”€โ”€โ”€โ”ผโ”€โ”€โ”€โ”€โ”€โ”€โ”ผโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ผโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ•ข
โ•‘ Single GPU         โ”‚  1   โ”‚  1   โ”‚  1   โ”‚  1   โ”‚         0.91 โ”‚     4GB+     โ•‘
โ•‘ Tensor Parallel    โ”‚  2   โ”‚  1   โ”‚  1   โ”‚  1   โ”‚         0.45 โ”‚     4GB+     โ•‘
โ•‘ TP + PP            โ”‚  4   โ”‚  4   โ”‚  1   โ”‚  1   โ”‚         0.06 โ”‚     4GB+     โ•‘
โ•šโ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•งโ•โ•โ•โ•โ•โ•โ•งโ•โ•โ•โ•โ•โ•โ•งโ•โ•โ•โ•โ•โ•โ•งโ•โ•โ•โ•โ•โ•โ•งโ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•งโ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•

                  ๐ŸŽฎ GPU Compatibility Matrix                         
โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”ฏโ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”ฏโ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”ฏโ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”ฏโ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”“
โ”ƒ GPU Type        โ”‚   Memory   โ”‚  Inference   โ”‚   Training   โ”‚     LoRA     โ”ƒ
โ” โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ผโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ผโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ผโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ผโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”จ
โ”ƒ RTX 4090        โ”‚    24GB    โ”‚      โœ“       โ”‚      โœ“       โ”‚      โœ“       โ”ƒ
โ”ƒ A100 80GB       โ”‚    80GB    โ”‚      โœ“       โ”‚      โœ“       โ”‚      โœ“       โ”ƒ
โ”ƒ H100 80GB       โ”‚    80GB    โ”‚      โœ“       โ”‚      โœ“       โ”‚      โœ“       โ”ƒ
โ”—โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”ทโ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”ทโ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”ทโ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”ทโ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”›

โ•ญโ”€โ”€โ”€ ๐Ÿ“‹ Minimum GPU Requirements โ”€โ”€โ”€โ”€โ•ฎ
โ”‚                                   โ”‚
โ”‚  Single GPU Inference: 0.9GB      โ”‚
โ”‚  Single GPU Training: 3.9GB       โ”‚  
โ”‚  Single GPU LoRA: 0.9GB           โ”‚
โ”‚                                   โ”‚
โ•ฐโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ•ฏ
```

### Large Model with User Override

```bash
$ hf-vram-calc nvidia/DeepSeek-R1-0528-FP4 --dtype nvfp4
```

```
                          โ•ญโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€ ๐Ÿค– Model Information โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ•ฎ
                          โ”‚                                      โ”‚
                          โ”‚  Model: nvidia/DeepSeek-R1-0528-FP4  โ”‚
                          โ”‚  Architecture: deepseek_v3           โ”‚
                          โ”‚  Parameters: 30,510,606,336 (30.5B)  โ”‚
                          โ”‚  Original torch_dtype: bfloat16      โ”‚
                          โ”‚  User specified dtype: NVFP4         โ”‚
                          โ”‚                                      โ”‚
                          โ•ฐโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ•ฏ

        ๐Ÿ’พ Memory Requirements by Data Type and Scenario                
โ•ญโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ•ฎ
โ”‚              โ”‚   Total Size โ”‚    Inference โ”‚        Training โ”‚         LoRA โ”‚
โ”‚  Data Type   โ”‚         (GB) โ”‚         (GB) โ”‚     (Adam) (GB) โ”‚         (GB) โ”‚
โ”œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ผโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ผโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ผโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ผโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ค
โ”‚    NVFP4     โ”‚        14.21 โ”‚        17.05 โ”‚           73.88 โ”‚        19.34 โ”‚
โ•ฐโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ดโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ดโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ดโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ดโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ•ฏ
```

### List Available Types

```bash
$ hf-vram-calc --list-types
```

```
Available Data Types:
โ•ญโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ•ฎ
โ”‚ Data Type โ”‚ Bytes/Param โ”‚ Description            โ”‚
โ”œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ผโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ผโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ค
โ”‚ FP32      โ”‚           4 โ”‚ 32-bit floating point  โ”‚
โ”‚ FP16      โ”‚           2 โ”‚ 16-bit floating point  โ”‚
โ”‚ BF16      โ”‚           2 โ”‚ Brain Float 16         โ”‚
โ”‚ NVFP4     โ”‚         0.5 โ”‚ NVIDIA FP4             โ”‚
โ”‚ AWQ_INT4  โ”‚         0.5 โ”‚ AWQ 4-bit quantization โ”‚
โ”‚ GPTQ_INT4 โ”‚         0.5 โ”‚ GPTQ 4-bit quantizationโ”‚
โ•ฐโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ดโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ดโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ•ฏ

Available GPU Types:
โ•ญโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ•ฎ
โ”‚ GPU Name          โ”‚ Memory (GB) โ”‚ Category   โ”‚ Architecture โ”‚
โ”œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ผโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ผโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ผโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ค
โ”‚ RTX 4090          โ”‚          24 โ”‚ consumer   โ”‚ Ada Lovelace โ”‚
โ”‚ A100 80GB         โ”‚          80 โ”‚ datacenter โ”‚ Ampere       โ”‚
โ”‚ H100 80GB         โ”‚          80 โ”‚ datacenter โ”‚ Hopper       โ”‚
โ•ฐโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ดโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ดโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ดโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ•ฏ
```

## Calculation Formulas

### Inference Memory
```
Inference Memory = Model Weights ร— 1.2
```
Includes model weights and KV cache overhead.

### Training Memory (with Adam)
```
Training Memory = Model Weights ร— 4 ร— 1.3
```
- 4x factor: Model weights (1x) + Gradients (1x) + Adam optimizer states (2x)
- 1.3x factor: 30% additional overhead (activation caching, etc.)

### LoRA Fine-tuning Memory
```
LoRA Memory = (Model Weights + LoRA Parameter Overhead) ร— 1.2
```
The LoRA parameter overhead is computed from the LoRA rank and the target-module ratio.
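
Putting the three formulas together (the LoRA overhead is passed in as a single number here; the tool derives it from the rank and target-module ratio):

```python
def inference_gb(weights_gb: float) -> float:
    return weights_gb * 1.2                      # weights + KV cache overhead

def training_gb(weights_gb: float) -> float:
    return weights_gb * 4 * 1.3                  # weights + gradients + Adam states, +30% overhead

def lora_gb(weights_gb: float, lora_overhead_gb: float) -> float:
    return (weights_gb + lora_overhead_gb) * 1.2

w = 0.75  # DialoGPT-medium at FP16 (see the example output above)
print(round(inference_gb(w), 2), round(training_gb(w), 2))  # -> 0.9 3.9
```

The small gap from the 0.91 / 3.92 GB in the example output comes from starting here with the rounded 0.75 GB weight size.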

## Advanced Features

### Configuration System

External JSON configuration files for maximum flexibility:

- **`data_types.json`** - Add custom quantization formats
- **`gpu_types.json`** - Define new GPU models and specifications  
- **`display_settings.json`** - Customize UI appearance and limits

```bash
# Use custom config directory
hf-vram-calc --config-dir ./custom_config model_name
```

Example custom data type entry in `data_types.json`:

```json
{
  "my_custom_int2": {
    "bytes_per_param": 0.25,
    "description": "Custom 2-bit quantization"
  }
}
```
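
A hypothetical sketch of how such a directory could be consumed (`load_data_types` and the merge behavior are assumptions, not the tool's documented internals):

```python
import json
from pathlib import Path

def load_data_types(config_dir: str, defaults: dict) -> dict:
    """Overlay a user's data_types.json on top of built-in defaults."""
    path = Path(config_dir) / "data_types.json"
    merged = dict(defaults)
    if path.exists():
        merged.update(json.loads(path.read_text()))
    return merged
```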

### Memory Calculation Details

| Scenario | Formula | Explanation |
|----------|---------|-------------|
| **Inference** | Model ร— 1.2 | Includes KV cache and activation overhead |
| **Training** | Model ร— 4 ร— 1.3 | Weights(1x) + Gradients(1x) + Adam(2x) + 30% overhead |
| **LoRA** | (Model + LoRA_paramsร—4) ร— 1.2 | Base model + trainable parameters with optimizer |

### Parallelization Efficiency

- **TP (Tensor Parallel)**: Near-linear scaling, slight communication overhead
- **PP (Pipeline Parallel)**: Good efficiency, pipeline bubble ~10-15%  
- **EP (Expert Parallel)**: MoE-specific, depends on expert routing efficiency
- **DP (Data Parallel)**: No memory reduction per GPU, full model replica

## Supported Architectures

### Fully Supported โœ…
- **GPT Family**: GPT-2, GPT-3, GPT-4, GPT-NeoX, etc.
- **LLaMA Family**: LLaMA, LLaMA-2, Code Llama, Vicuna, etc.
- **Mistral Family**: Mistral 7B, Mixtral 8x7B (MoE), etc.
- **Other Transformers**: BERT, RoBERTa, T5, FLAN-T5, etc.
- **New Architectures**: DeepSeek, Qwen, ChatGLM, Baichuan, etc.

### Architecture Detection
- **Automatic field mapping** for different config.json formats
- **Fallback support** for uncommon architectures
- **MoE handling** for Mixture-of-Experts models

## Accuracy & Limitations

### โœ… Highly Accurate For:
- **Parameter counting** (exact calculation)
- **Memory estimation** (within 5-10% of actual)
- **Parallelization ratios** (theoretical maximum)

### โš ๏ธ Considerations:
- **Activation memory** varies with sequence length and optimization
- **Real-world efficiency** may differ due to framework overhead  
- **Quantization accuracy** depends on specific implementation
- **MoE models** require expert routing consideration

## Build & Development

Built with modern Python tooling:
- **uv**: Fast Python package management and building
- **Rich**: Professional terminal interface
- **Requests**: HTTP client for model config fetching
- **JSON configuration**: Flexible external configuration system

For development setup, see: [BUILD.md](BUILD.md)

## Contributing

We welcome contributions! Areas for improvement:

- ๐Ÿ”ง **New quantization formats** (add to `data_types.json`)
- ๐ŸŽฎ **GPU models** (update `gpu_types.json`)  
- ๐Ÿ“Š **Architecture support** (enhance config parsing)
- ๐Ÿš€ **Performance optimizations**
- ๐Ÿ“š **Documentation improvements**
- ๐Ÿงช **Test coverage expansion**

## See Also

- ๐Ÿ“š **[BUILD.md](BUILD.md)** - Complete build and installation guide
- โš™๏ธ **[CONFIG_GUIDE.md](CONFIG_GUIDE.md)** - Configuration customization details
- ๐Ÿ“ **Examples in help**: `hf-vram-calc --help` for usage examples

## Version History

- **v1.0.0**: Complete rewrite with uv build, smart dtype detection, professional UI
- **v0.x**: Legacy single-file version (deprecated)

## License

MIT License - see LICENSE file for details.

---

**Made with โค๏ธ for the ML community** | Built with [uv](https://github.com/astral-sh/uv) and [Rich](https://github.com/Textualize/rich)
            
