inferenceutils

Name: inferenceutils
Version: 0.1.0
Summary: Intelligent hardware detection and optimal LLM inference engine recommendations with Pydantic schemas
Upload time: 2025-07-20 20:14:46
Requires Python: >=3.9
License: MIT
Keywords: hardware, inspection, llm, inference, pydantic, gpu, cpu, utils
Project URL: https://github.com/lukerbs/InferenceUtils

# InferenceUtils - Intelligent Hardware Detection for LLM Inference

A comprehensive Python library that automatically detects your hardware capabilities and provides optimal recommendations for LLM inference engines and build configurations.

## 🚀 Quick Start

```python
from inferenceutils import systeminfo, optimal_inference_engine, llama_cpp_build_args

# Get comprehensive hardware information
hw = systeminfo()
print(f"CPU: {hw.cpu.brand_raw}")
print(f"GPU: {hw.gpu.detected_vendor}")
print(f"RAM: {hw.ram.total_gb} GB")

# Get optimal inference engine recommendation
engine = optimal_inference_engine()
print(f"Recommended: {engine.name}")
print(f"Install: pip install {' '.join(engine.dependencies)}")

# Get optimal build arguments for llama-cpp-python
args = llama_cpp_build_args()
print(f"CMAKE_ARGS: {' '.join(args)}")
```

## ✨ Key Features

### **🔍 Intelligent Hardware Detection**
- **Cross-platform**: macOS, Linux, Windows
- **Comprehensive**: CPU, GPU, RAM, storage, instruction sets
- **Type-safe**: All data validated with Pydantic schemas
- **Pure Python**: No external command execution required

### **🎯 Optimal Engine Recommendations**
- **Hardware-aware**: Automatically selects the best engine for your system
- **Dependencies included**: Provides exact pip install commands
- **Detailed reasoning**: Explains why each engine was chosen
- **Performance-focused**: Prioritizes the fastest available hardware

### **⚡ Build Optimization**
- **llama-cpp-python**: Optimal CMAKE arguments for your hardware
- **GPU acceleration**: CUDA, Metal, ROCm, Vulkan, SYCL
- **CPU optimization**: AVX-512, AVX2, OpenMP, KleidiAI
- **Platform-specific**: Different optimizations per OS

## 📦 Installation

```bash
# Install from source
git clone <repository-url>
cd InferenceUtils
pip install -e .

# Or install dependencies manually
pip install py-cpuinfo psutil nvidia-ml-py amdsmi openvino mlx pyobjc vulkan pydantic
```

## 🛠️ API Reference

### Core Functions

#### `systeminfo() -> HardwareProfile`
Get comprehensive hardware information as a validated Pydantic BaseModel.

```python
from inferenceutils import systeminfo

hw = systeminfo()

# Access typed data
print(f"OS: {hw.os.platform}")
print(f"CPU: {hw.cpu.brand_raw}")
print(f"RAM: {hw.ram.total_gb} GB")

# GPU information
if hw.gpu.detected_vendor == "NVIDIA":
    for gpu in hw.gpu.nvidia:
        print(f"NVIDIA: {gpu.model} ({gpu.vram_gb} GB)")
elif hw.gpu.detected_vendor == "Apple":
    print(f"Apple: {hw.gpu.apple.model}")

# Convert to JSON
json_data = hw.model_dump_json(indent=2)
```

#### `optimal_inference_engine() -> OptimalInferenceEngine`
Get the optimal inference engine recommendation with dependencies.

```python
from inferenceutils import optimal_inference_engine

engine = optimal_inference_engine()

print(f"Engine: {engine.name}")
print(f"Dependencies: {engine.dependencies}")
print(f"Reason: {engine.reason}")

# Install the recommended engine
install_cmd = f"pip install {' '.join(engine.dependencies)}"
print(f"Run: {install_cmd}")
```

#### `llama_cpp_build_args() -> List[str]`
Get optimal CMAKE build arguments for llama-cpp-python.

```python
from inferenceutils import llama_cpp_build_args, get_llama_cpp_install_command

# Get build arguments
args = llama_cpp_build_args()
print(f"CMAKE arguments: {' '.join(args)}")

# Get complete install command
install_cmd = get_llama_cpp_install_command()
print(f"Install command: {install_cmd}")
```

### Pydantic Schemas

#### `HardwareProfile`
Complete hardware profile with all detected components.

```python
from pydantic import ValidationError

from inferenceutils import HardwareProfile

# Validate hardware data (hardware_data: a dict of fields, e.g. parsed from JSON)
try:
    profile = HardwareProfile(**hardware_data)
    print("✅ Data is valid")
except ValidationError as e:
    print(f"❌ Validation failed: {e}")
```
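
The same schema also round-trips saved profiles. A minimal sketch, assuming the Pydantic v2 API used above (`model_dump_json` / `model_validate_json`) and a hypothetical `hardware.json` path:

```python
from pathlib import Path

from inferenceutils import HardwareProfile, systeminfo

# Capture the current machine's profile and persist it as JSON.
report = Path("hardware.json")
report.write_text(systeminfo().model_dump_json(indent=2))

# Later (or on another machine), reload and re-validate the snapshot.
profile = HardwareProfile.model_validate_json(report.read_text())
print(f"Loaded profile for: {profile.cpu.brand_raw}")
```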

#### `OptimalInferenceEngine`
Inference engine recommendation with dependencies.

```python
from inferenceutils import OptimalInferenceEngine

# Create recommendation
recommendation = OptimalInferenceEngine(
    name="MLX",
    dependencies=["mlx-lm"],
    reason="Optimized for Apple Silicon"
)
```

## 🎯 Supported Inference Engines

| Engine | Best For | Dependencies |
|--------|----------|--------------|
| **TensorRT-LLM** | High-end NVIDIA GPUs (Ampere+) | `tensorrt-llm` |
| **vLLM** | NVIDIA GPUs (Turing/Volta+) | `vllm` |
| **MLX** | Apple Silicon | `mlx-lm` |
| **OpenVINO** | Intel GPUs/NPUs | `openvino` |
| **llama.cpp** | AMD GPUs, high-performance CPUs | `llama-cpp-python` |
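
As a rough illustration of how this table maps hardware to packages, the sketch below branches on `hw.gpu.detected_vendor` (the `"NVIDIA"` and `"Apple"` strings appear in the earlier examples; `"Intel"` is assumed by analogy). The library's own `optimal_inference_engine()` applies finer-grained rules such as compute capability and VRAM before choosing:

```python
from inferenceutils import systeminfo

hw = systeminfo()
vendor = hw.gpu.detected_vendor

# Coarse vendor-to-package mapping mirroring the table; illustrative only.
if vendor == "NVIDIA":
    candidates = ["tensorrt-llm", "vllm"]
elif vendor == "Apple":
    candidates = ["mlx-lm"]
elif vendor == "Intel":
    candidates = ["openvino"]
else:
    # AMD GPUs and CPU-only systems fall back to llama.cpp.
    candidates = ["llama-cpp-python"]

print(f"Candidate packages: {candidates}")
```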

## 🔧 Hardware Acceleration Support

### GPU Backends
- **NVIDIA CUDA**: Automatic compute capability detection
- **Apple Metal**: Native Apple Silicon optimization
- **AMD ROCm**: HIP acceleration for AMD GPUs
- **Intel SYCL**: oneAPI for Intel accelerators
- **Vulkan**: Cross-platform GPU acceleration

### CPU Optimizations
- **Intel oneMKL**: AVX-512 optimization
- **OpenBLAS**: AVX2 acceleration
- **OpenMP**: Multi-core parallelism
- **KleidiAI**: ARM CPU optimization

## 📋 Example Output

### Hardware Detection
```json
{
  "os": {
    "platform": "Darwin",
    "version": "23.0.0",
    "architecture": "arm64"
  },
  "cpu": {
    "brand_raw": "Apple M2 Pro",
    "physical_cores": 10,
    "logical_cores": 10,
    "instruction_sets": ["neon"]
  },
  "ram": {
    "total_gb": 32.0,
    "available_gb": 24.5
  },
  "gpu": {
    "detected_vendor": "Apple",
    "apple": {
      "model": "Apple Silicon GPU",
      "vram_gb": 32.0,
      "metal_supported": true
    }
  }
}
```

### Engine Recommendation
```json
{
  "name": "MLX",
  "dependencies": ["mlx-lm"],
  "reason": "Natively designed for Apple Silicon. The system's unified memory architecture is best exploited by Apple's own MLX framework, which leverages the CPU, GPU, and Neural Engine."
}
```

### Build Arguments
```bash
# Apple Silicon
-DGGML_METAL=ON -DGGML_SVE=OFF -DGGML_BLAS=ON -DGGML_BLAS_VENDOR=Accelerate

# NVIDIA GPU
-DGGML_CUDA=ON -DCMAKE_CUDA_ARCHITECTURES=89 -DGGML_BLAS=ON -DGGML_BLAS_VENDOR=OpenBLAS
```
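
These flags are meant to be exported through the `CMAKE_ARGS` environment variable that llama-cpp-python's source build reads; `get_llama_cpp_install_command()` produces an equivalent shell one-liner. A minimal sketch of driving the install from Python:

```python
import os
import subprocess
import sys

from inferenceutils import llama_cpp_build_args

# Export the detected flags so pip's source build of llama-cpp-python picks them up.
env = dict(os.environ, CMAKE_ARGS=" ".join(llama_cpp_build_args()))

subprocess.run(
    [sys.executable, "-m", "pip", "install", "--upgrade",
     "--force-reinstall", "--no-cache-dir", "llama-cpp-python"],
    env=env,
    check=True,
)
```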

## 🚀 Use Cases

### Development Setup
```python
from inferenceutils import systeminfo, optimal_inference_engine

# Quick hardware overview
hw = systeminfo()
print(f"Setting up development environment for {hw.cpu.brand_raw}")

# Get recommended engine
engine = optimal_inference_engine()
print(f"Installing {engine.name}...")
```

### CI/CD Pipelines
```python
from inferenceutils import llama_cpp_build_args

# Generate build args for different runners
args = llama_cpp_build_args()
print(f"Building with: {' '.join(args)}")
```
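
On GitHub Actions, for example, the flags can be exported once and reused by later workflow steps. A minimal sketch, assuming the standard `GITHUB_ENV` mechanism (a hypothetical setup step, not part of this library):

```python
import os

from inferenceutils import llama_cpp_build_args

# Append CMAKE_ARGS to $GITHUB_ENV so subsequent workflow steps inherit it.
github_env = os.environ.get("GITHUB_ENV")
if github_env:
    with open(github_env, "a") as fh:
        fh.write(f"CMAKE_ARGS={' '.join(llama_cpp_build_args())}\n")
```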

### User Documentation
```python
from inferenceutils import optimal_inference_engine, get_llama_cpp_install_command

# Generate user-specific instructions
engine = optimal_inference_engine()
if engine.name == "llama.cpp":
    install_cmd = get_llama_cpp_install_command()
    print(f"Run: {install_cmd}")
else:
    print(f"Run: pip install {' '.join(engine.dependencies)}")
```

## 🔍 Hardware Detection Capabilities

### CPU Detection
- Model and architecture
- Core count (physical/logical)
- Instruction sets (AVX-512, AVX2, NEON, AMX); see the sketch after this list
- Performance characteristics
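
The detected instruction sets can drive back-end choices directly. A minimal sketch, assuming the flag strings follow py-cpuinfo's lowercase names (the JSON example above shows `"neon"`; `"avx512f"` and `"avx2"` are assumptions here):

```python
from inferenceutils import systeminfo

flags = set(systeminfo().cpu.instruction_sets)

# Pick a CPU BLAS/back-end tier from the detected flags (flag names assumed).
if "avx512f" in flags:
    tier = "Intel oneMKL (AVX-512)"
elif "avx2" in flags:
    tier = "OpenBLAS (AVX2)"
elif "neon" in flags:
    tier = "Accelerate / KleidiAI (ARM NEON)"
else:
    tier = "generic BLAS"

print(f"Suggested CPU back end: {tier}")
```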

### GPU Detection
- **NVIDIA**: Model, VRAM, compute capability, driver version
- **AMD**: Model, VRAM, ROCm compatibility, compute units
- **Intel**: Model, type (dGPU/iGPU/NPU), execution units
- **Apple**: Model, unified memory, Metal support, GPU cores

### Memory & Storage
- Total and available RAM
- Primary storage type (SSD/HDD)
- Memory bandwidth considerations

### NPU Detection
- **Apple Neural Engine**: Core count, availability
- **Intel AI Boost**: NPU detection and capabilities
- **AMD Ryzen AI**: CPU-based detection

## 🛠️ Dependencies

### Core Dependencies
- **py-cpuinfo**: CPU information
- **psutil**: System and process utilities
- **nvidia-ml-py**: NVIDIA GPU monitoring
- **openvino**: Intel accelerator support
- **mlx**: Apple Silicon support
- **pyobjc**: macOS system integration
- **vulkan**: Vulkan API support
- **pydantic**: Data validation and serialization
- **amdsmi**: AMD GPU monitoring (install with `pip install inferenceutils[amd]`)

## 📄 License

MIT License - see LICENSE file for details.

## 🤝 Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

            
