# InferenceUtils - Intelligent Hardware Detection for LLM Inference
A comprehensive Python library that automatically detects your hardware capabilities and provides optimal recommendations for LLM inference engines and build configurations.
## 🚀 Quick Start
```python
from inferenceutils import systeminfo, optimal_inference_engine, llama_cpp_build_args
# Get comprehensive hardware information
hw = systeminfo()
print(f"CPU: {hw.cpu.brand_raw}")
print(f"GPU: {hw.gpu.detected_vendor}")
print(f"RAM: {hw.ram.total_gb} GB")
# Get optimal inference engine recommendation
engine = optimal_inference_engine()
print(f"Recommended: {engine.name}")
print(f"Install: pip install {' '.join(engine.dependencies)}")
# Get optimal build arguments for llama-cpp-python
args = llama_cpp_build_args()
print(f"CMAKE_ARGS: {' '.join(args)}")
```
## ✨ Key Features
### **🔍 Intelligent Hardware Detection**
- **Cross-platform**: macOS, Linux, Windows
- **Comprehensive**: CPU, GPU, RAM, storage, instruction sets
- **Type-safe**: All data validated with Pydantic schemas
- **Pure Python**: No external command execution required
### **🎯 Optimal Engine Recommendations**
- **Hardware-aware**: Automatically selects the best engine for your system
- **Dependencies included**: Provides exact pip install commands
- **Detailed reasoning**: Explains why each engine was chosen
- **Performance-focused**: Prioritizes the fastest available hardware
### **⚡ Build Optimization**
- **llama-cpp-python**: Optimal CMake arguments for your hardware
- **GPU acceleration**: CUDA, Metal, ROCm, Vulkan, SYCL
- **CPU optimization**: AVX-512, AVX2, OpenMP, KleidiAI
- **Platform-specific**: Different optimizations per OS
## 📦 Installation
```bash
# Install from source
git clone https://github.com/lukerbs/InferenceUtils.git
cd InferenceUtils
pip install -e .
# Or install dependencies manually
pip install py-cpuinfo psutil nvidia-ml-py amdsmi openvino mlx pyobjc vulkan pydantic
```
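A quick way to confirm the install worked (a minimal sketch that only uses the top-level functions shown in the Quick Start above):
```python
# Sanity check: import the library and print a one-line hardware/engine summary.
from inferenceutils import systeminfo, optimal_inference_engine
hw = systeminfo()
engine = optimal_inference_engine()
print(f"{hw.cpu.brand_raw} / {hw.gpu.detected_vendor} -> recommended engine: {engine.name}")
```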
## 🛠️ API Reference
### Core Functions
#### `systeminfo() -> HardwareProfile`
Get comprehensive hardware information as a validated Pydantic BaseModel.
```python
from inferenceutils import systeminfo
hw = systeminfo()
# Access typed data
print(f"OS: {hw.os.platform}")
print(f"CPU: {hw.cpu.brand_raw}")
print(f"RAM: {hw.ram.total_gb} GB")
# GPU information
if hw.gpu.detected_vendor == "NVIDIA":
    for gpu in hw.gpu.nvidia:
        print(f"NVIDIA: {gpu.model} ({gpu.vram_gb} GB)")
elif hw.gpu.detected_vendor == "Apple":
    print(f"Apple: {hw.gpu.apple.model}")
# Convert to JSON
json_data = hw.model_dump_json(indent=2)
```
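Because the profile is a Pydantic model, it is easy to persist for bug reports or cross-machine comparisons. A minimal sketch (the `hardware_profile.json` filename is just an example):
```python
from pathlib import Path
from inferenceutils import systeminfo
# Dump the validated profile to disk so it can be attached to an issue
# or diffed against another machine. The filename is arbitrary.
hw = systeminfo()
Path("hardware_profile.json").write_text(hw.model_dump_json(indent=2))
```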
#### `optimal_inference_engine() -> OptimalInferenceEngine`
Get the optimal inference engine recommendation with dependencies.
```python
from inferenceutils import optimal_inference_engine
engine = optimal_inference_engine()
print(f"Engine: {engine.name}")
print(f"Dependencies: {engine.dependencies}")
print(f"Reason: {engine.reason}")
# Install the recommended engine
install_cmd = f"pip install {' '.join(engine.dependencies)}"
print(f"Run: {install_cmd}")
```
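To install the recommendation from a setup script instead of copy-pasting the command, one hedged sketch is to call pip through the current interpreter:
```python
import subprocess
import sys
from inferenceutils import optimal_inference_engine
engine = optimal_inference_engine()
# Install the recommended engine's dependencies with the same Python
# interpreter that is running this script.
subprocess.check_call([sys.executable, "-m", "pip", "install", *engine.dependencies])
```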
#### `llama_cpp_build_args() -> List[str]`
Get optimal CMake build arguments for llama-cpp-python.
```python
from inferenceutils import llama_cpp_build_args, get_llama_cpp_install_command
# Get build arguments
args = llama_cpp_build_args()
print(f"CMAKE arguments: {' '.join(args)}")
# Get complete install command
install_cmd = get_llama_cpp_install_command()
print(f"Install command: {install_cmd}")
```
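The arguments are intended for the `CMAKE_ARGS` environment variable (as printed in the Quick Start). A sketch that drives the source build from Python; the `--force-reinstall`/`--no-cache-dir` flags are optional and only ensure pip actually rebuilds the wheel:
```python
import os
import subprocess
import sys
from inferenceutils import llama_cpp_build_args
# Build llama-cpp-python from source with the detected CMake flags.
env = {**os.environ, "CMAKE_ARGS": " ".join(llama_cpp_build_args())}
subprocess.check_call(
    [sys.executable, "-m", "pip", "install", "--force-reinstall", "--no-cache-dir", "llama-cpp-python"],
    env=env,
)
```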
### Pydantic Schemas
#### `HardwareProfile`
Complete hardware profile with all detected components.
```python
from pydantic import ValidationError
from inferenceutils import HardwareProfile, systeminfo
# Validate hardware data (here, a plain dict round-tripped from systeminfo())
hardware_data = systeminfo().model_dump()
try:
    profile = HardwareProfile(**hardware_data)
    print("✅ Data is valid")
except ValidationError as e:
    print(f"❌ Validation failed: {e}")
```
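A profile saved earlier (for example the `hardware_profile.json` written above) can be re-validated on the way back in. A sketch assuming Pydantic v2, which provides `model_validate_json`:
```python
from pathlib import Path
from inferenceutils import HardwareProfile
# Re-validate a profile previously dumped with model_dump_json().
profile = HardwareProfile.model_validate_json(Path("hardware_profile.json").read_text())
print(profile.cpu.brand_raw, profile.ram.total_gb)
```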
#### `OptimalInferenceEngine`
Inference engine recommendation with dependencies.
```python
from inferenceutils import OptimalInferenceEngine
# Create recommendation
recommendation = OptimalInferenceEngine(
    name="MLX",
    dependencies=["mlx-lm"],
    reason="Optimized for Apple Silicon"
)
```
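Like the hardware profile, a recommendation serializes cleanly (again assuming Pydantic v2), which is handy for logging or embedding in generated setup docs:
```python
from inferenceutils import optimal_inference_engine
# Log the full recommendation (name, dependencies, reason) as JSON.
engine = optimal_inference_engine()
print(engine.model_dump_json(indent=2))
```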
## 🎯 Supported Inference Engines
| Engine | Best For | Dependencies |
|--------|----------|--------------|
| **TensorRT-LLM** | High-end NVIDIA GPUs (Ampere+) | `tensorrt-llm` |
| **vLLM** | NVIDIA GPUs (Turing/Volta+) | `vllm` |
| **MLX** | Apple Silicon | `mlx-lm` |
| **OpenVINO** | Intel GPUs/NPUs | `openvino` |
| **llama.cpp** | AMD GPUs, high-performance CPUs | `llama-cpp-python` |
## 🔧 Hardware Acceleration Support
### GPU Backends
- **NVIDIA CUDA**: Automatic compute capability detection
- **Apple Metal**: Native Apple Silicon optimization
- **AMD ROCm**: HIP acceleration for AMD GPUs
- **Intel SYCL**: oneAPI for Intel accelerators
- **Vulkan**: Cross-platform GPU acceleration
### CPU Optimizations
- **Intel oneMKL**: AVX-512 optimization
- **OpenBLAS**: AVX2 acceleration
- **OpenMP**: Multi-core parallelism
- **KleidiAI**: ARM CPU optimization
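Which CPU path applies depends on the instruction sets `systeminfo()` reports. A hedged sketch for inspecting them yourself; the exact flag strings (e.g. `"avx512"` vs `"avx512f"`) depend on what py-cpuinfo reports for your machine, and the backend mapping below only summarizes the list above:
```python
from inferenceutils import systeminfo
# Peek at the instruction sets the CPU optimizations are based on.
isa = systeminfo().cpu.instruction_sets
for flag, backend in [("avx512", "Intel oneMKL (AVX-512)"), ("avx2", "OpenBLAS (AVX2)"), ("neon", "KleidiAI / NEON")]:
    if any(flag in name for name in isa):
        print(f"Detected {flag}: likely CPU backend is {backend}")
```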
## 📋 Example Output
### Hardware Detection
```json
{
  "os": {
    "platform": "Darwin",
    "version": "23.0.0",
    "architecture": "arm64"
  },
  "cpu": {
    "brand_raw": "Apple M2 Pro",
    "physical_cores": 10,
    "logical_cores": 10,
    "instruction_sets": ["neon"]
  },
  "ram": {
    "total_gb": 32.0,
    "available_gb": 24.5
  },
  "gpu": {
    "detected_vendor": "Apple",
    "apple": {
      "model": "Apple Silicon GPU",
      "vram_gb": 32.0,
      "metal_supported": true
    }
  }
}
```
### Engine Recommendation
```json
{
  "name": "MLX",
  "dependencies": ["mlx-lm"],
  "reason": "Natively designed for Apple Silicon. The system's unified memory architecture is best exploited by Apple's own MLX framework, which leverages the CPU, GPU, and Neural Engine."
}
```
### Build Arguments
```bash
# Apple Silicon
-DGGML_METAL=ON -DGGML_SVE=OFF -DGGML_BLAS=ON -DGGML_BLAS_VENDOR=Accelerate
# NVIDIA GPU
-DGGML_CUDA=ON -DCMAKE_CUDA_ARCHITECTURES=89 -DGGML_BLAS=ON -DGGML_BLAS_VENDOR=OpenBLAS
```
## 🚀 Use Cases
### Development Setup
```python
from inferenceutils import systeminfo, optimal_inference_engine
# Quick hardware overview
hw = systeminfo()
print(f"Setting up development environment for {hw.cpu.brand_raw}")
# Get recommended engine
engine = optimal_inference_engine()
print(f"Installing {engine.name}...")
```
### CI/CD Pipelines
```python
from inferenceutils import llama_cpp_build_args
# Generate build args for different runners
args = llama_cpp_build_args()
print(f"Building with: {' '.join(args)}")
```
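On a GitHub Actions runner, one way to hand the flags to a later `pip install llama-cpp-python` step is the `GITHUB_ENV` file (a sketch; adapt to your CI system):
```python
import os
from inferenceutils import llama_cpp_build_args
# Export CMAKE_ARGS so a later install step picks it up; fall back to printing locally.
cmake_args = " ".join(llama_cpp_build_args())
github_env = os.environ.get("GITHUB_ENV")
if github_env:
    with open(github_env, "a") as fh:
        fh.write(f"CMAKE_ARGS={cmake_args}\n")
else:
    print(f"CMAKE_ARGS={cmake_args}")
```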
### User Documentation
```python
from inferenceutils import optimal_inference_engine, get_llama_cpp_install_command
# Generate user-specific instructions
engine = optimal_inference_engine()
if engine.name == "llama.cpp":
    install_cmd = get_llama_cpp_install_command()
    print(f"Run: {install_cmd}")
else:
    print(f"Run: pip install {' '.join(engine.dependencies)}")
```
## 🔍 Hardware Detection Capabilities
### CPU Detection
- Model and architecture
- Core count (physical/logical)
- Instruction sets (AVX-512, AVX2, NEON, AMX)
- Performance characteristics
### GPU Detection
- **NVIDIA**: Model, VRAM, compute capability, driver version
- **AMD**: Model, VRAM, ROCm compatibility, compute units
- **Intel**: Model, type (dGPU/iGPU/NPU), execution units
- **Apple**: Model, unified memory, Metal support, GPU cores
### Memory & Storage
- Total and available RAM
- Primary storage type (SSD/HDD)
- Memory bandwidth considerations
### NPU Detection
- **Apple Neural Engine**: Core count, availability
- **Intel AI Boost**: NPU detection and capabilities
- **AMD Ryzen AI**: CPU-based detection
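A common use of these fields is sizing a model to the memory that is actually available. A hedged sketch using only fields shown in the example output above; the thresholds are illustrative, not recommendations from the library:
```python
from inferenceutils import systeminfo
hw = systeminfo()
# Rough memory budget: GPU VRAM where we have it, otherwise system RAM.
if hw.gpu.detected_vendor == "NVIDIA" and hw.gpu.nvidia:
    budget_gb = max(gpu.vram_gb for gpu in hw.gpu.nvidia)
elif hw.gpu.detected_vendor == "Apple":
    budget_gb = hw.gpu.apple.vram_gb  # unified memory
else:
    budget_gb = hw.ram.total_gb
# Illustrative thresholds only.
size = "70B-class (quantized)" if budget_gb >= 48 else "7-13B-class" if budget_gb >= 12 else "small (<=3B)"
print(f"~{budget_gb:.0f} GB budget -> consider a {size} model")
```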
## 🛠️ Dependencies
### Core Dependencies
- **py-cpuinfo**: CPU information
- **psutil**: System and process utilities
- **nvidia-ml-py**: NVIDIA GPU monitoring
- **openvino**: Intel accelerator support
- **mlx**: Apple Silicon support
- **pyobjc**: macOS system integration
- **vulkan**: Vulkan API support
- **pydantic**: Data validation and serialization
- **amdsmi**: AMD GPU monitoring (install with `pip install inferenceutils[amd]`)
## 📄 License
MIT License - see LICENSE file for details.
## 🤝 Contributing
Contributions are welcome! Please feel free to submit a Pull Request.