A comprehensive multi-vendor GPU health monitoring and optimization tool that helps users assess GPU performance and select optimal hardware for their workloads.
## 🚀 Features

- 🔥 **Comprehensive GPU Health Monitoring**: Temperature, power, utilization, and throttling detection
- ⚡ **Advanced Stress Testing**: Compute, memory bandwidth, VRAM, and mixed-precision tests
- 📊 **Detailed Health Scoring**: 100-point scoring system with actionable recommendations
- 🖥️ **Multi-GPU Support**: Test and compare multiple GPUs simultaneously
- 🧪 **Mock Mode**: Test on any computer without GPUs (perfect for development)
- 🔌 **Multi-Vendor Support**: NVIDIA, AMD, Intel, and a mock mode
- ☁️ **Cloud-Ready**: Designed to help select optimal GPUs for cloud deployment (coming soon!)
## Installation
### Basic Installation (Mock Mode Only)

```bash
# For systems without GPUs, or for testing
pip install gpu-benchmark-tool
```

### Installation with GPU Support

```bash
# For NVIDIA GPUs (most common)
pip install gpu-benchmark-tool[nvidia]

# For AMD GPUs
pip install gpu-benchmark-tool[amd]

# For Intel GPUs
pip install gpu-benchmark-tool[intel]

# For all GPU vendors
pip install gpu-benchmark-tool[all]
```
## 🎯 Quick Start

### 1. Check Available GPUs

```bash
gpu-benchmark list
```

### 2. Run a Benchmark

```bash
# Benchmark all GPUs
gpu-benchmark benchmark

# Benchmark a specific GPU (recommended)
gpu-benchmark benchmark --gpu-id 0

# Quick 30-second test
gpu-benchmark benchmark --gpu-id 0 --duration 30

# Export results to JSON
gpu-benchmark benchmark --gpu-id 0 --export results.json
```

### 3. Mock Mode (No GPU Required)

```bash
# Perfect for development or systems without GPUs
gpu-benchmark benchmark --mock --duration 30
```
## 📊 Google Colab Quick Start

```bash
# Run in a Colab notebook (Runtime > Change runtime type > GPU)
!pip install gpu-benchmark-tool[nvidia]
!gpu-benchmark benchmark --gpu-id 0 --duration 30
```
## Understanding Results

### Health Score (0-100 points)

- **85-100**: 🟢 Healthy - safe for all workloads, including AI training
- **70-84**: 🟢 Good - suitable for most workloads
- **55-69**: 🟡 Degraded - limit to inference or light compute
- **40-54**: 🟡 Warning - monitor closely, avoid heavy workloads
- **0-39**: 🔴 Critical - do not use for production
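The bands above are easy to encode when post-processing exported results. A minimal sketch (the function name and exact boundary handling are illustrative, not part of the package's API):

```python
def health_status(score: int) -> str:
    """Map a 0-100 health score to the status bands listed above."""
    if score >= 85:
        return "Healthy"
    if score >= 70:
        return "Good"
    if score >= 55:
        return "Degraded"
    if score >= 40:
        return "Warning"
    return "Critical"
```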
### Score Components
Each component contributes to the total 100-point score:
**Temperature (20 points)**
- Peak temperature during the stress test
- Under 80°C: Full points
- 80-85°C: 15 points
- 85-90°C: 10 points
- Over 90°C: 5 points

**Baseline Temperature (10 points)**
- GPU temperature at idle
- Under 50°C: Full points
- 50-60°C: 5 points
- Over 60°C: 0 points
**Power Efficiency (10 points)**
- Power consumption optimization
- Within optimal range: Full points
- Slightly outside range: 5 points
- Far from optimal: 0 points
**GPU Utilization (10 points)**
- How well the GPU is utilized during tests
- 99%+: Full points
- 90-98%: 5 points
- Under 90%: 0 points
**Throttling (20 points)**
- Thermal or power throttling detection
- No throttling: Full points
- Occasional throttling: 10-15 points
- Frequent throttling: 0-5 points
**Errors (20 points)**
- Stability during stress tests
- No errors: Full points
- Few errors: 10-15 points
- Many errors: 0-5 points
**Temperature Stability (10 points)**
- Temperature consistency during tests
- Very stable: Full points
- Some fluctuation: 5-7 points
- Unstable: 0-5 points
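As an illustration of how a single component maps to points, here is the peak-temperature rubric above expressed as code. This is a hypothetical helper; how the tool treats exact boundary values such as 85°C is an assumption here:

```python
def temperature_points(peak_temp_c: float) -> int:
    """Score the peak stress-test temperature per the 20-point rubric above."""
    if peak_temp_c < 80:   # Under 80°C: full points
        return 20
    if peak_temp_c < 85:   # 80-85°C
        return 15
    if peak_temp_c < 90:   # 85-90°C
        return 10
    return 5               # Over 90°C
```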
## Performance Metrics

- **Matrix Multiplication**: Raw compute performance (TFLOPS)
- **Memory Bandwidth**: Memory throughput (GB/s)
- **VRAM Stress**: Memory allocation stability
- **Mixed Precision**: FP16/BF16 support for AI workloads
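For context on the TFLOPS number: an N×N matrix multiplication performs roughly 2·N³ floating-point operations, so compute throughput falls out of a timed run. A sketch of that arithmetic (the tool's exact FLOP accounting may differ):

```python
def estimate_tflops(n: int, iterations: int, elapsed_s: float) -> float:
    """TFLOPS for `iterations` timed N x N matrix multiplications.

    Each N x N matmul costs ~2 * N**3 FLOPs (N multiplies plus
    N-1 adds per output element, across N**2 output elements).
    """
    total_flops = 2 * n**3 * iterations
    return total_flops / elapsed_s / 1e12
```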
## Command Line Usage

### Benchmark Command

```text
gpu-benchmark benchmark [OPTIONS]

Options:
  --gpu-id INTEGER    Specific GPU to test (default: all GPUs)
  --duration INTEGER  Test duration in seconds (default: 60)
  --basic             Run basic tests only (faster)
  --export TEXT       Export results to a JSON file
  --verbose           Show detailed output
  --mock              Use a mock GPU (no hardware required)
```
### Examples

```bash
# Full test on GPU 0 with export
gpu-benchmark benchmark --gpu-id 0 --duration 120 --export full_test.json

# Quick health check
gpu-benchmark benchmark --gpu-id 0 --duration 30 --basic

# Development testing
gpu-benchmark benchmark --mock --export mock_results.json
```
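Files produced with `--export` are plain JSON, so they can be post-processed with standard tooling. A minimal sketch, assuming the single-GPU layout used in the Python API section below (a `health_score` object with `score` and `status` keys); `health_summary` is a hypothetical helper, not part of the package:

```python
import json

def health_summary(results: dict) -> tuple:
    """Pull (score, status) out of an exported results dict."""
    hs = results["health_score"]
    return hs["score"], hs["status"]

# Usage with a file produced by --export:
# with open("results.json") as f:
#     score, status = health_summary(json.load(f))
```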
## Real-time Monitoring
```bash
# Monitor GPU metrics in real time (NVIDIA only)
gpu-benchmark monitor --gpu-id 0
```
## Python API Usage

### Basic Usage

```python
import pynvml

from gpu_benchmark import run_full_benchmark

# Initialize NVML
pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)

# Run the benchmark
results = run_full_benchmark(
    handle=handle,
    duration=60,
    enhanced=True,
    device_id=0,
)

# Access the results
print(f"Health Score: {results['health_score']['score']}/100")
print(f"Status: {results['health_score']['status']}")
```
### Analyzing Results

```python
# Check whether the GPU is healthy enough for production
if results['health_score']['score'] >= 70:
    print("✅ GPU is suitable for production workloads")
else:
    print("⚠️ GPU needs attention")

# Access performance metrics
if 'performance_tests' in results:
    tflops = results['performance_tests']['matrix_multiply']['tflops']
    print(f"Compute Performance: {tflops:.2f} TFLOPS")
```
## 🔧 Troubleshooting

### Common Issues

**"No GPUs found"**

- Use the `--mock` flag to test without GPUs
- Ensure the NVIDIA/AMD/Intel drivers are installed

**"NVML Error" on Colab**

- This warning can be ignored; the tool still works correctly
- Use `--gpu-id 0` for cleaner output

### Low Health Scores

- Check the system cooling
- Ensure the GPU isn't thermal throttling
- Close other GPU applications

### Multi-GPU JSON Format

- Use `--gpu-id 0` to test a single GPU (simpler output)
- Without `--gpu-id`, results are nested under the `results` key
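A sketch of handling both layouts when parsing exported JSON. The per-GPU nesting shown here is an assumption based on the description above; verify the key names against your own export:

```python
def per_gpu_scores(report: dict) -> dict:
    """Return {gpu_id: health score} for single- or multi-GPU exports."""
    if "results" in report:  # multi-GPU run (no --gpu-id)
        return {gpu_id: entry["health_score"]["score"]
                for gpu_id, entry in report["results"].items()}
    # Single-GPU run (--gpu-id N); the id itself isn't assumed to be
    # present in the file, so "0" is used as a placeholder key.
    return {"0": report["health_score"]["score"]}
```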
## Supported GPUs

### NVIDIA GPUs (Full Support)

- Consumer: RTX 4090, 4080, 4070, 3090, 3080, 3070, 3060
- Data Center: A100, V100, T4, P100, K80
- Workstation: RTX A6000, A5000, A4000

### AMD GPUs (ROCm Required)

- MI250X, MI210, MI100
- Radeon RX 7900 XTX, RX 6900 XT

### Intel GPUs (Limited Support)

- Arc A770, A750
- Intel Xe integrated graphics
## Requirements

- Python 3.8 or higher
- For NVIDIA: CUDA drivers
- For AMD: ROCm drivers
- For Intel: Intel GPU drivers

## 📄 License

MIT License - see the LICENSE file for details.

## 🙏 Acknowledgments

Built to solve real-world GPU selection challenges and reduce cloud computing costs through better hardware decisions.

## 📧 Contact

- PyPI: https://pypi.org/project/gpu-benchmark-tool/
- Email: ywrajput@gmail.com