# GPU Estimator
A Python package for estimating GPU memory requirements and the number of GPUs needed for training machine learning models.
## Features
- **Latest Model Support**: Built-in configs for LLaMA 4, Gemma 3, Qwen 2.5/3, and more
- Estimate GPU memory requirements based on model parameters
- Calculate optimal number of GPUs for training
- Support for different precision types (FP32, FP16, BF16, INT8)
- Account for optimizer states and gradient storage (a rough memory breakdown is sketched after this list)
- Integration with Hugging Face Hub for latest models
- Discover and search trending models
- Support for popular architectures (GPT, LLaMA, BERT, T5, Mistral, Gemma, Qwen, etc.)
- CLI interface for quick estimates
- Detailed memory breakdown and recommendations
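
For a sense of what such an estimate involves, here is a minimal back-of-the-envelope sketch of training memory for model weights, gradients, and Adam optimizer states. The constants (fp32 Adam states, a 20% overhead factor) are illustrative assumptions rather than the package's exact formula, and activation memory is ignored here.

```python
# Back-of-the-envelope training memory (illustrative only; the package's
# internal formula may differ and also accounts for activations).
def rough_training_memory_gb(model_params: float, precision_bytes: int = 2) -> float:
    weights = model_params * precision_bytes            # model weights
    gradients = model_params * precision_bytes          # one gradient per weight
    optimizer = model_params * 8                        # Adam: fp32 momentum + variance
    overhead = 0.2 * (weights + gradients + optimizer)  # buffers, fragmentation
    return (weights + gradients + optimizer + overhead) / 1024**3

# A 7B-parameter model trained in fp16 with Adam:
print(f"~{rough_training_memory_gb(7e9):.0f} GB before activations")
```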
## Installation
```bash
pip install gpu-estimator
```
## Quick Start
### Basic Usage
```python
from gpu_estimator import GPUEstimator
estimator = GPUEstimator()
# Estimate for latest models using predefined configs
from gpu_estimator.utils import get_model_config
result = estimator.estimate_from_architecture(
    **get_model_config("qwen2.5-7b"),
    batch_size=8,
    sequence_length=2048,
    precision="fp16"
)
print(f"Memory needed per GPU: {result.memory_per_gpu_gb:.2f} GB")
print(f"Recommended GPUs: {result.num_gpus}")
# Or estimate by parameters for any model size
result = estimator.estimate(
    model_params=7e9,
    batch_size=32,
    sequence_length=2048,
    precision="fp16"
)
```
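The `**get_model_config(...)` call above splats a dict of architecture parameters (layer count, hidden size, and so on) into `estimate_from_architecture`. The exact keys are not documented here, so the snippet below is simply a hedged way to inspect what your installed version returns:

```python
from gpu_estimator.utils import get_model_config

# Inspect the predefined config before splatting it; the key names you see
# (e.g. number of layers, hidden size) depend on the installed version.
config = get_model_config("qwen2.5-7b")
print(config)
```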
### Hugging Face Integration
```python
from gpu_estimator import GPUEstimator
estimator = GPUEstimator()
# Estimate directly from Hugging Face model ID
result = estimator.estimate_from_huggingface(
    model_id="meta-llama/Llama-3.2-3B",
    batch_size=4,
    sequence_length=2048,
    precision="fp16",
    gradient_checkpointing=True
)
print(f"Total memory required: {result.total_memory_gb:.2f} GB")
print(f"GPUs needed: {result.num_gpus}")
# Discover trending models
trending = estimator.list_trending_models(limit=10, task="text-generation")
for model in trending:
    print(f"{model.model_id} - {model.downloads:,} downloads")
# Search for specific models
models = estimator.search_models("qwen", limit=5)
for model in models:
    print(f"{model.model_id} - {model.architecture}")
```
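Note that `meta-llama` repositories are gated on the Hugging Face Hub, so fetching their configurations usually requires an authenticated session. The sketch below uses the standard `huggingface_hub` login; whether `gpu-estimator` picks up the cached token automatically is an assumption worth verifying in your setup.

```python
from huggingface_hub import login
from gpu_estimator import GPUEstimator

# Authenticate once so gated configs (e.g. meta-llama) can be fetched.
# Alternatively, run `huggingface-cli login` in your shell.
login(token="hf_...")  # replace with your own token

estimator = GPUEstimator()
result = estimator.estimate_from_huggingface(
    model_id="meta-llama/Llama-3.2-3B",
    batch_size=4,
    sequence_length=2048,
    precision="fp16",
)
```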
## CLI Usage
### Basic Estimation
```bash
# Estimate for any model by parameters
gpu-estimate estimate --model-params 7e9 --batch-size 4 --precision fp16
# Estimate for predefined models (classic)
gpu-estimate estimate --model-name llama-7b --batch-size 8
# Estimate for latest predefined models
gpu-estimate estimate --model-name qwen2.5-7b --batch-size 4 --precision fp16
gpu-estimate estimate --model-name llama3.2-3b --batch-size 16 --gpu-type A100
gpu-estimate estimate --model-name gemma2-9b --batch-size 8 --precision bf16
# Estimate for Hugging Face models
gpu-estimate estimate --huggingface-model meta-llama/Llama-3.2-3B --batch-size 4
gpu-estimate estimate --huggingface-model Qwen/Qwen2.5-7B --batch-size 8
```
### Model Discovery
```bash
# List trending models
gpu-estimate trending --limit 20 --task text-generation
# Search for models
gpu-estimate search "mistral" --limit 10
# Get popular models by architecture
gpu-estimate popular llama --limit 5
# Get model information
gpu-estimate info qwen2.5-7b
```
### Advanced Options
```bash
# With gradient checkpointing and specific GPU
gpu-estimate estimate \
    --huggingface-model meta-llama/Llama-4-Scout-17B \
    --batch-size 8 \
    --seq-length 1024 \
    --precision fp16 \
    --gpu-type A100 \
    --gradient-checkpointing \
    --verbose
```
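Gradient checkpointing trades activation memory for recomputation during the backward pass, which is why toggling it can change an estimate substantially for long sequences or large batches. The comparison below is a hedged rule-of-thumb sketch: the per-layer constant and the sqrt-of-layers checkpointing schedule are common approximations, not the package's internals.

```python
import math

def activation_gb(layers, batch, seq_len, hidden, bytes_per=2, factor=12):
    # Rough full-activation footprint: every layer's activations are kept.
    return layers * batch * seq_len * hidden * bytes_per * factor / 1024**3

def activation_gb_checkpointed(layers, batch, seq_len, hidden, bytes_per=2, factor=12):
    # With checkpointing, roughly sqrt(layers) checkpoints plus one live layer remain.
    per_layer = batch * seq_len * hidden * bytes_per * factor / 1024**3
    return (math.sqrt(layers) + 1) * per_layer

# Llama-3.2-3B-like shape (28 layers, hidden size 3072), batch 4, seq 2048:
print(f"full:         {activation_gb(28, 4, 2048, 3072):.1f} GB")
print(f"checkpointed: {activation_gb_checkpointed(28, 4, 2048, 3072):.1f} GB")
```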
### Interactive Mode
Launch an interactive session for guided GPU estimation:
```bash
gpu-estimate interactive
```
Features:
- Guided workflows for all estimation tasks
- Model discovery with direct estimation
- Flexible model specification (parameters, names, or HF IDs)
- Step-by-step configuration of training parameters
- Quick estimates from trending model lists
## Supported Models & Architectures
### Hugging Face Models
The package automatically supports any model on the Hugging Face Hub by detecting its configuration. Popular architectures include:
| Architecture | Examples | Use Cases |
|-------------|----------|-----------|
| LLaMA/LLaMA2/3/4 | `meta-llama/Llama-2-7b-hf`, `meta-llama/Llama-3.2-3B`, `meta-llama/Llama-4-Scout-17B` | General language modeling, chat |
| GPT | `gpt2`, `microsoft/DialoGPT-large` | Text generation, conversation |
| Mistral | `mistralai/Mistral-7B-v0.1` | Efficient language modeling |
| CodeLlama | `codellama/CodeLlama-7b-Python-hf` | Code generation |
| BERT | `google-bert/bert-base-uncased` | Text classification, NLU |
| T5 | `google-t5/t5-base`, `google/flan-t5-large` | Text-to-text tasks |
| Phi | `microsoft/phi-2` | Small efficient models |
| Gemma/Gemma2/3 | `google/gemma-7b`, `google/gemma-2-9b`, `google/gemma-3-270m` | Google's language models |
| Qwen/Qwen2.5/3 | `Qwen/Qwen-7B`, `Qwen/Qwen2.5-7B`, `Qwen/Qwen3-4B` | Multilingual models |
### Predefined Models
Classic and latest models with known configurations:
**GPT Family:**
- `gpt2`, `gpt2-medium`, `gpt2-large`, `gpt2-xl`, `gpt3`
**LLaMA Family:**
- Original: `llama-7b`, `llama-13b`, `llama-30b`, `llama-65b`
- LLaMA 2: `llama2-7b`, `llama2-13b`, `llama2-70b`
- LLaMA 3.2: `llama3.2-1b`, `llama3.2-3b`
- LLaMA 3.3: `llama3.3-70b`
- LLaMA 4: `llama4-scout-17b`, `llama4-maverick-17b`
- Code LLaMA: `codellama-7b`, `codellama-13b`, `codellama-34b`
**Mistral Family:**
- `mistral-7b`
**Phi Family:**
- `phi-1.5b`, `phi-2.7b`
**Gemma Family:**
- Original: `gemma-2b`, `gemma-7b`
- Gemma 2: `gemma2-2b`, `gemma2-9b`, `gemma2-27b`
- Gemma 3: `gemma3-270m`
**Qwen Family:**
- Qwen 2.5: `qwen2.5-7b`, `qwen2.5-14b`, `qwen2.5-32b`, `qwen2.5-72b`
- Qwen 3: `qwen3-4b`, `qwen3-30b`, `qwen3-235b`
**Flexible Naming**: Model names support flexible matching. Use `custom-llama-7b`, `my-mistral-7b`, or any name containing a known model identifier.
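
As a hedged illustration of flexible matching, both calls below should resolve to the same predefined configuration; verify against your installed version:

```python
from gpu_estimator.utils import get_model_config

# Both names contain the known identifier "mistral-7b", so flexible matching
# should resolve them to the same predefined config (assumption: check locally).
print(get_model_config("mistral-7b"))
print(get_model_config("my-custom-mistral-7b"))
```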
## GPU Types Supported
| GPU | Memory | Use Case |
|-----|--------|----------|
| H100 | 80 GB | Latest high-performance training |
| A100 | 80 GB | Large model training and inference |
| A40 | 48 GB | Professional workstation training |
| A6000 | 48 GB | Creative and AI workstation |
| L40 | 48 GB | Data center inference |
| L4 | 24 GB | Efficient inference |
| RTX 4090 | 24 GB | Consumer high-end |
| RTX 3090 | 24 GB | Consumer enthusiast |
| V100 | 32 GB | Previous generation training |
| T4 | 16 GB | Cloud inference |
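
To see how a memory estimate maps onto these cards, here is a minimal sketch of the GPU-count arithmetic, assuming the footprint can be sharded evenly across devices (e.g. with ZeRO/FSDP); plain data parallelism would instead require the full footprint to fit on each GPU. The 10% headroom factor is an assumption, not the package's setting.

```python
import math

# Memory per card for the GPUs listed above.
GPU_MEMORY_GB = {
    "H100": 80, "A100": 80, "A40": 48, "A6000": 48, "L40": 48,
    "L4": 24, "RTX 4090": 24, "RTX 3090": 24, "V100": 32, "T4": 16,
}

def gpus_needed(total_memory_gb: float, gpu_type: str, headroom: float = 0.9) -> int:
    # Leave ~10% of each card free for the CUDA context and fragmentation.
    usable = GPU_MEMORY_GB[gpu_type] * headroom
    return max(1, math.ceil(total_memory_gb / usable))

print(gpus_needed(94.0, "A100"))  # e.g. a ~94 GB footprint -> 2 A100s
```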