modelaudit


- **Name:** modelaudit
- **Version:** 0.2.14
- **Summary:** Static scanning library for detecting malicious code, backdoors, and other security risks in ML model files
- **Upload time:** 2025-10-24 01:32:43
- **Requires Python:** >=3.10
- **License:** MIT
- **Keywords:** ai, ml, model-scanning, pickle, pytorch, security, tensorflow

# ModelAudit

**Secure your AI models before deployment.** Detects malicious code, backdoors, and security vulnerabilities in ML model files.

[![PyPI version](https://badge.fury.io/py/modelaudit.svg)](https://pypi.org/project/modelaudit/)
[![Python versions](https://img.shields.io/pypi/pyversions/modelaudit.svg)](https://pypi.org/project/modelaudit/)
[![Code Style: ruff](https://img.shields.io/badge/code%20style-ruff-005cd7.svg)](https://github.com/astral-sh/ruff)
[![License](https://img.shields.io/github/license/promptfoo/promptfoo)](https://github.com/promptfoo/promptfoo/blob/main/LICENSE)

<img width="989" alt="image" src="https://www.promptfoo.dev/img/docs/modelaudit/modelaudit-result.png" />

📖 **[Full Documentation](https://www.promptfoo.dev/docs/model-audit/)** | 🎯 **[Usage Examples](https://www.promptfoo.dev/docs/model-audit/usage/)** | 🔍 **[Supported Formats](https://www.promptfoo.dev/docs/model-audit/scanners/)**

## 🚀 Quick Start

**Install and scan in 30 seconds:**

```bash
# Install ModelAudit with all ML framework support
pip install modelaudit[all]

# Scan a model file
modelaudit model.pkl

# Scan a directory
modelaudit ./models/

# Export results for CI/CD
modelaudit model.pkl --format json --output results.json
```

**Example output:**

```bash
$ modelaudit suspicious_model.pkl

✓ Scanning suspicious_model.pkl
Files scanned: 1 | Issues found: 1 critical, 1 warning

1. suspicious_model.pkl (pos 28): [CRITICAL] Malicious code execution attempt
   Why: Contains os.system() call that could run arbitrary commands

2. suspicious_model.pkl (pos 52): [WARNING] Dangerous pickle deserialization
   Why: Could execute code when the model loads

✗ Security issues found - DO NOT deploy this model
```

## 📁 Project Structure

ModelAudit is organized by conceptual purpose for clarity and maintainability:

```
modelaudit/
├── scanners/         # 29 specialized file format scanners
│   ├── pickle_scanner.py, pytorch_*.py, onnx_scanner.py, etc.
│   └── base.py - BaseScanner class with shared functionality
│
├── detectors/        # Security threat detection modules
│   ├── cve_patterns.py - Known CVE patterns (CVE-2025-32434, etc.)
│   ├── secrets.py - API keys, tokens, credentials
│   ├── jit_script.py - JIT/TorchScript malicious code
│   ├── network_comm.py - URLs, IPs, sockets
│   └── suspicious_symbols.py - Dangerous function calls
│
├── integrations/     # External system integrations
│   ├── jfrog.py - JFrog Artifactory support
│   ├── mlflow.py - MLflow registry support
│   ├── sbom_generator.py - CycloneDX SBOM generation
│   ├── sarif_formatter.py - SARIF output format
│   └── license_checker.py - License compliance
│
├── analysis/         # Advanced analysis algorithms
│   ├── anomaly_detector.py, entropy_analyzer.py
│   └── ml_context_analyzer.py - Context-aware analysis
│
├── utils/
│   ├── file/         # File handling (detection, filtering, streaming)
│   ├── sources/      # Model sources (HuggingFace, cloud, JFrog, DVC)
│   └── helpers/      # Generic utilities (retry, caching, etc.)
│
├── cache/            # Caching system for scan results
├── auth/             # Authentication for remote sources
├── progress/         # Progress tracking and UI
│
├── core.py           # Main scanning orchestration
└── cli.py            # Command-line interface
```

**Navigation guide**:

- **"What formats can we scan?"** → `scanners/`
- **"What threats do we detect?"** → `detectors/`
- **"What systems do we integrate with?"** → `integrations/`
- **"Where can models come from?"** → `utils/sources/`

[View detailed refactoring plan →](docs/REFACTORING_PLAN.md)

## 🛡️ What Problems It Solves

### **Prevents Code Execution Attacks**

Stops malicious models that run arbitrary commands when loaded (common in PyTorch .pt files)

### **Detects Model Backdoors**

Identifies trojaned models with hidden functionality or suspicious weight patterns

### **Ensures Supply Chain Security**

Validates model integrity and prevents tampering in your ML pipeline

### **Enforces License Compliance**

Checks for license violations that could expose your company to legal risk

### **Finds Embedded Secrets**

Detects API keys, tokens, and other credentials hidden in model weights or metadata

### **Flags Network Communication**

Identifies URLs, IPs, and socket usage that could enable data exfiltration or C2 channels

### **Detects Hidden JIT/Script Execution**

Scans TorchScript, ONNX, and other JIT-compiled code for dangerous operations

### **Smart Whitelist System (Reduces False Positives)**

Automatically downgrades findings for 7,440+ trusted models from popular downloads and verified organizations (Meta, Google, Microsoft, NVIDIA, etc.) - [Learn more](#-whitelist-system)

## 📊 Supported Model Formats

ModelAudit supports **29 specialized file format scanners** with comprehensive security analysis:

### 🔴 High Risk Formats (Pickle-based serialization)

| Format             | Extensions                        | Security Focus                    |
| ------------------ | --------------------------------- | --------------------------------- |
| **Pickle**         | `.pkl`, `.pickle`, `.dill`        | Dangerous opcodes, code execution |
| **PyTorch**        | `.pt`, `.pth`, `.ckpt`, `.bin`    | Pickle payloads, embedded malware |
| **Joblib**         | `.joblib`                         | Pickled scikit-learn objects      |
| **NumPy**          | `.npy`, `.npz`                    | Array metadata, pickle objects    |
| **JAX Checkpoint** | `.ckpt`, `.checkpoint`, `.pickle` | Serialized transforms             |

### 🟠 Medium Risk Formats (Complex with custom operations)

| Format              | Extensions               | Security Focus                |
| ------------------- | ------------------------ | ----------------------------- |
| **TensorFlow**      | `.pb`, SavedModel dirs   | PyFunc operations, custom ops |
| **Keras H5**        | `.h5`, `.hdf5`           | Unsafe Lambda layers          |
| **Keras ZIP**       | `.keras`                 | ZIP-based Keras archives      |
| **ONNX**            | `.onnx`                  | Custom operators, metadata    |
| **TensorFlow Lite** | `.tflite`                | Mobile model validation       |
| **PaddlePaddle**    | `.pdmodel`, `.pdiparams` | Custom operations             |
| **XGBoost**         | `.bst`, `.model`, `.ubj` | Serialized boosting models    |
| **Core ML**         | `.mlmodel`               | Apple custom layers           |

### 🟡 Lower Risk Formats (Safer serialization)

| Format               | Extensions                            | Security Focus                  |
| -------------------- | ------------------------------------- | ------------------------------- |
| **SafeTensors**      | `.safetensors`                        | Header validation (recommended) |
| **GGUF/GGML**        | `.gguf`, `.ggml`                      | LLM standard format             |
| **JAX/Flax Msgpack** | `.msgpack`, `.flax`, `.orbax`, `.jax` | Msgpack serialization           |
| **ExecuTorch**       | `.ptl`, `.pte`                        | PyTorch mobile archives         |
| **TensorRT**         | `.engine`, `.plan`                    | NVIDIA inference engines        |
| **OpenVINO**         | `.xml`                                | Intel IR format                 |
| **PMML**             | `.pmml`                               | XML predictive models           |
| **OCI Layers**       | `.manifest`                           | Container layer analysis        |
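
To make "header validation" concrete: a SafeTensors file begins with an 8-byte little-endian length followed by a JSON header of that length, so the header can be checked without touching tensor data. The following is a minimal sketch of such a check, not ModelAudit's scanner; the function name and the `max_header_bytes` limit are assumptions chosen for illustration.

```python
import json
import struct


def read_safetensors_header(data: bytes, max_header_bytes: int = 100_000_000) -> dict:
    """Parse and sanity-check the JSON header of a SafeTensors blob.

    max_header_bytes is an arbitrary illustrative limit, not a documented value.
    """
    if len(data) < 8:
        raise ValueError("file too short to contain a header length")
    # The first 8 bytes hold the header size as an unsigned little-endian integer.
    (header_len,) = struct.unpack("<Q", data[:8])
    if header_len > max_header_bytes or 8 + header_len > len(data):
        raise ValueError("declared header length is implausible")
    header = json.loads(data[8 : 8 + header_len].decode("utf-8"))
    if not isinstance(header, dict):
        raise ValueError("header must be a JSON object")
    return header


# Build a tiny in-memory example: one float32 tensor with a single element.
example_header = {"w": {"dtype": "F32", "shape": [1], "data_offsets": [0, 4]}}
payload = json.dumps(example_header).encode("utf-8")
example_file = struct.pack("<Q", len(payload)) + payload + b"\x00" * 4

print(read_safetensors_header(example_file))
```

A fuller check would also compare each tensor's `data_offsets` against the size of the data section.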

### 📦 Archive & Container Formats

| Format    | Extensions                                                        | Security Focus                  |
| --------- | ----------------------------------------------------------------- | ------------------------------- |
| **ZIP**   | `.zip`                                                            | Path traversal, malicious files |
| **TAR**   | `.tar`, `.tar.gz`, `.tgz`, `.tar.bz2`, `.tbz2`, `.tar.xz`, `.txz` | Archive exploits                |
| **7-Zip** | `.7z`                                                             | Archive security                |

### 📄 Configuration & Metadata Formats

| Format               | Extensions                                        | Security Focus            |
| -------------------- | ------------------------------------------------- | ------------------------- |
| **Metadata**         | `.json`, `.md`, `.yml`, `.yaml`, `.rst`           | Embedded secrets, URLs    |
| **Manifest**         | `.json`, `.yaml`, `.xml`, `.toml`, `.ini`, `.cfg` | Config vulnerabilities    |
| **Text**             | `.txt`, `.md`, `.markdown`, `.rst`                | ML-related text analysis  |
| **Jinja2 Templates** | `.jinja`, `.j2`, `.template`                      | Template injection (SSTI) |

[View complete format documentation →](https://www.promptfoo.dev/docs/model-audit/scanners/)

## 🎯 Common Use Cases

### **Pre-Deployment Security Checks**

```bash
modelaudit production_model.safetensors --format json --output security_report.json
```

### **CI/CD Pipeline Integration**

ModelAudit automatically detects CI environments and adjusts output accordingly:

```bash
# Recommended: Use JSON format for machine-readable output
modelaudit models/ --format json --output results.json

# Text output automatically adapts to CI (no spinners, plain text)
modelaudit models/ --timeout 300

# Disable colors explicitly with NO_COLOR environment variable
NO_COLOR=1 modelaudit models/
```

**CI-Friendly Features:**

- 🚫 Spinners automatically disabled when output is piped or in CI
- 🎨 Colors disabled when `NO_COLOR` environment variable is set
- 📊 JSON output recommended for parsing in CI pipelines
- 🔍 Exit codes: 0 (clean), 1 (issues found), 2 (errors)
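
A CI job can branch on these exit codes from a small wrapper script. The sketch below assumes `modelaudit` is installed and on `PATH`; the helper names (`build_scan_command`, `describe_exit_code`, `run_scan`) are hypothetical and not part of ModelAudit. It uses only the `--format` and `--output` flags documented in this README.

```python
import subprocess

# Exit-code meanings as listed in the CI-friendly features above.
EXIT_MEANINGS = {0: "clean", 1: "issues found", 2: "errors"}


def build_scan_command(target: str, report_path: str) -> list[str]:
    """Build the CLI invocation that writes a JSON report to report_path."""
    return ["modelaudit", target, "--format", "json", "--output", report_path]


def describe_exit_code(code: int) -> str:
    """Map an exit code to a short label; undocumented codes become 'unknown'."""
    return EXIT_MEANINGS.get(code, "unknown")


def run_scan(target: str, report_path: str = "results.json") -> int:
    """Run the scan and return its exit code (requires modelaudit to be installed)."""
    completed = subprocess.run(build_scan_command(target, report_path), check=False)
    print(f"modelaudit: {describe_exit_code(completed.returncode)}")
    return completed.returncode
```

Calling `run_scan("models/")` from a pipeline step and passing the result to `sys.exit` would fail the job on findings (1) or scan errors (2).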

### **Third-Party Model Validation**

```bash
# Scan models from HuggingFace, PyTorch Hub, MLflow, JFrog, or cloud storage
modelaudit https://huggingface.co/gpt2
modelaudit https://pytorch.org/hub/pytorch_vision_resnet/
modelaudit models:/MyModel/Production
modelaudit model.dvc
modelaudit s3://my-bucket/downloaded-model.pt

# JFrog Artifactory - now supports both files AND folders
# Auth: export JFROG_API_TOKEN=... (or JFROG_ACCESS_TOKEN)
modelaudit https://company.jfrog.io/artifactory/repo/model.pt
# Or with explicit flag:
modelaudit https://company.jfrog.io/artifactory/repo/model.pt --api-token "$JFROG_API_TOKEN"
modelaudit https://company.jfrog.io/artifactory/repo/models/  # Scan entire folder!
```

### **Compliance & Audit Reporting**

```bash
modelaudit model_package.zip --sbom compliance_report.json --strict --verbose
```

### 🧠 Smart Detection Examples

ModelAudit automatically adapts to your input - **no configuration needed for most cases:**

```bash
# Local file - fast scan, no progress bars
modelaudit model.pkl

# Cloud directory - auto enables caching + progress bars
modelaudit s3://my-bucket/models/

# HuggingFace model - selective download + caching
modelaudit hf://microsoft/DialoGPT-medium

# Large local file - enables progress + optimizations
modelaudit 15GB-model.bin

# CI environment - auto detects and uses JSON output
CI=true modelaudit model.pkl
```

**Override smart detection when needed:**

```bash
# Force strict mode for security-critical scans
modelaudit model.pkl --strict --format json --output report.json

# Override size limits for huge models
modelaudit huge-model.pt --max-size 50GB --timeout 7200

# Preview mode without downloading
modelaudit s3://bucket/model.pt --dry-run
```

[View advanced usage examples →](https://www.promptfoo.dev/docs/model-audit/usage/)

### ⚙️ Smart Detection & CLI Options

ModelAudit uses **smart detection** to automatically configure optimal settings based on your input:

**✨ Smart Detection Features:**

- **Input type** (local/cloud/registry) → optimal download & caching strategies
- **File size** (>1GB) → large model optimizations + progress bars
- **Terminal type** (TTY/CI) → appropriate UI (progress vs quiet mode)
- **Cloud operations** → automatic caching, size limits, timeouts
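
As an illustration of input-type detection, the sketch below classifies a scan target by its scheme or host, using only the target forms that appear in this README. It is an assumption about how such routing could look, not ModelAudit's actual logic; `classify_source` and its return labels are hypothetical.

```python
from urllib.parse import urlparse


def classify_source(target: str) -> str:
    """Return a coarse source label for a scan target (illustrative only)."""
    if target.startswith("hf://"):
        return "huggingface"
    if target.startswith("models:/"):
        return "mlflow"
    if target.startswith(("s3://", "gs://")):
        return "cloud"
    if target.startswith(("http://", "https://")):
        host = urlparse(target).hostname or ""
        if host == "huggingface.co":
            return "huggingface"
        if host.endswith(".jfrog.io"):
            return "jfrog"
        if host.endswith(".blob.core.windows.net"):
            return "cloud"
        return "url"
    if target.endswith(".dvc"):
        return "dvc"
    return "local"


for example in ("model.pkl", "s3://my-bucket/models/", "hf://microsoft/DialoGPT-medium"):
    print(example, "->", classify_source(example))
```

A label like this could then select caching, progress, and timeout defaults for the scan.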

**🎛️ Override Controls (13 focused flags):**

- `--strict` – scan all file types, strict license validation, fail on warnings
- `--max-size SIZE` – unified size limit (e.g., `10GB`, `500MB`)
- `--timeout SECONDS` – override auto-detected timeout
- `--dry-run` – preview what would be scanned/downloaded
- `--progress` – force enable progress reporting
- `--no-cache` – disable caching (overrides smart detection)
- `--format json` / `--output file.json` – structured output for CI/CD
- `--sbom file.json` – generate CycloneDX v1.6 SBOM with enhanced ML-BOM support
- `--verbose` / `--quiet` – control output detail level
- `--blacklist PATTERN` – additional security patterns

**🔐 Authentication (via environment variables):**

- Set `JFROG_API_TOKEN` or `JFROG_ACCESS_TOKEN` for JFrog Artifactory
- Set `MLFLOW_TRACKING_URI` for MLflow registry access

### 🚀 Large Model Support (Up to 1 TB)

ModelAudit automatically optimizes scanning strategies for different model sizes:

- **< 100 GB**: Full in-memory analysis for comprehensive scanning
- **100 GB - 1 TB**: Chunked processing with 50 GB chunks for memory efficiency
- **1 TB - 5 TB**: Streaming analysis with intelligent sampling
- **> 5 TB**: Advanced distributed scanning techniques

Large models are supported with automatic timeout increases and memory-optimized processing.
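
The size tiers above amount to a simple threshold lookup. The sketch below shows one way to express it; the function name, the strategy labels, the use of binary units, and the handling of exact boundary values are all assumptions, since the README does not specify them.

```python
GB = 1024 ** 3  # assumption: binary gigabytes; the README does not say which unit is meant
TB = 1024 * GB


def scanning_strategy(size_bytes: int) -> str:
    """Pick a strategy label for a model of the given size (illustrative only)."""
    if size_bytes < 100 * GB:
        return "in-memory"
    if size_bytes < 1 * TB:
        return "chunked"  # the README mentions 50 GB chunks for this tier
    if size_bytes <= 5 * TB:
        return "streaming"
    return "distributed"


for size in (15 * GB, 300 * GB, 2 * TB, 6 * TB):
    print(f"{size / GB:.0f} GB -> {scanning_strategy(size)}")
```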

### Static Scanning vs. Promptfoo Redteaming

ModelAudit performs **static** analysis only. It examines model files for risky patterns
without ever loading or executing them. Promptfoo's redteaming module is
**dynamic**: it loads the model (locally or via API) and sends crafted prompts to
probe runtime behavior. Use ModelAudit first to verify the model file itself,
then run redteaming if you need to test how the model responds when invoked.

## ⚙️ Installation Options

**Requirements:**

- Python 3.10 or higher
- Compatible with Python 3.10, 3.11, 3.12, and 3.13

### Quick Install Decision Guide

**🚀 Just want everything to work?**

```bash
pip install modelaudit[all]
```

**Basic installation:**

```bash
# Core functionality only (pickle, numpy, archives)
pip install modelaudit
```

**Specific frameworks:**

```bash
pip install modelaudit[tensorflow]  # TensorFlow (.pb)
pip install modelaudit[pytorch]     # PyTorch (.pt, .pth)
pip install modelaudit[h5]          # Keras (.h5, .keras)
pip install modelaudit[onnx]        # ONNX (.onnx)
pip install modelaudit[safetensors] # SafeTensors (.safetensors)

# Multiple frameworks
pip install modelaudit[tensorflow,pytorch,h5]
```

**Additional features:**

```bash
pip install modelaudit[cloud]       # S3, GCS, Azure storage
pip install modelaudit[coreml]      # Apple Core ML
pip install modelaudit[flax]        # JAX/Flax models
pip install modelaudit[mlflow]      # MLflow registry
pip install modelaudit[huggingface] # Hugging Face integration
```

**Compatibility:**

```bash
# NumPy 1.x compatibility (some frameworks require NumPy < 2.0)
pip install modelaudit[numpy1]

# For CI/CD environments (omits dependencies like TensorRT that may not be available)
pip install modelaudit[all-ci]
```

**Docker:**

```bash
docker pull ghcr.io/promptfoo/modelaudit:latest
# Linux/macOS
docker run --rm -v "$(pwd)":/app ghcr.io/promptfoo/modelaudit:latest model.pkl
# Windows
docker run --rm -v "%cd%":/app ghcr.io/promptfoo/modelaudit:latest model.pkl
```

## Security Checks

### Code Execution Detection

- Dangerous Python modules: `os`, `sys`, `subprocess`, `eval`, `exec`
- Pickle opcodes: `REDUCE`, `GLOBAL`, `INST`, `OBJ`, `NEWOBJ`, `STACK_GLOBAL`, `BUILD`, `NEWOBJ_EX`
- Embedded executable file detection
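
The opcode check can be illustrated with Python's standard `pickletools` module, which walks a pickle stream without executing it. This is a minimal sketch of the idea rather than ModelAudit's scanner; `find_flagged_opcodes` and `CallsPrintOnLoad` are hypothetical names, and the payload is deliberately harmless.

```python
import pickle
import pickletools

# Opcode names taken from the list above.
FLAGGED_OPCODES = {
    "REDUCE", "GLOBAL", "INST", "OBJ",
    "NEWOBJ", "STACK_GLOBAL", "BUILD", "NEWOBJ_EX",
}


def find_flagged_opcodes(data: bytes) -> list[tuple[int, str]]:
    """Return (byte position, opcode name) pairs without unpickling the data."""
    return [
        (pos, opcode.name)
        for opcode, _arg, pos in pickletools.genops(data)
        if opcode.name in FLAGGED_OPCODES
    ]


class CallsPrintOnLoad:
    """Harmless stand-in for a payload: unpickling would call print()."""

    def __reduce__(self):
        return (print, ("executed during unpickling",))


benign = pickle.dumps([1, 2, 3])
suspicious = pickle.dumps(CallsPrintOnLoad())

print(find_flagged_opcodes(benign))
print(find_flagged_opcodes(suspicious))
```

Flagged opcodes also occur in legitimate pickles (for example, any pickled class instance), which is why a practical scanner additionally inspects which modules and names are being imported.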

### Embedded Data Extraction

- API keys, tokens, and credentials in model weights/metadata
- URLs, IP addresses, and network endpoints
- Suspicious configuration properties
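
A rough idea of how embedded strings can be surfaced from binary data is shown below. The patterns are deliberately simple and are assumptions for illustration (the AWS access key ID shape is the commonly documented `AKIA` prefix plus 16 characters); they do not reflect ModelAudit's actual detection rules.

```python
import re

# Deliberately simple patterns for illustration; real detectors need many more rules.
URL_PATTERN = re.compile(rb"https?://[^\s\"'<>\x00]+")
IPV4_PATTERN = re.compile(rb"\b(?:\d{1,3}\.){3}\d{1,3}\b")
AWS_KEY_ID_PATTERN = re.compile(rb"\bAKIA[0-9A-Z]{16}\b")


def find_embedded_strings(blob: bytes) -> dict[str, list[str]]:
    """Scan raw bytes and return matches grouped by pattern label."""
    patterns = {
        "urls": URL_PATTERN,
        "ipv4": IPV4_PATTERN,
        "aws_key_ids": AWS_KEY_ID_PATTERN,
    }
    return {
        label: [match.decode("ascii", errors="replace") for match in pattern.findall(blob)]
        for label, pattern in patterns.items()
    }


# Synthetic blob: documentation-range IP and AWS's published example key ID.
sample = b"\x00\x01weights\x00http://198.51.100.7/payload.sh\x00AKIAIOSFODNN7EXAMPLE\x00"
print(find_embedded_strings(sample))
```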

### Archive Security

- Path traversal attacks in ZIP/TAR archives
- Executable files within model packages
- Malicious filenames and directory structures
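
Path traversal in an archive can be detected from member names alone, without extracting anything. The following sketch uses the standard `zipfile` module; the helper names are hypothetical and the check is intentionally conservative (any `..` component is flagged), so it is not a description of ModelAudit's implementation.

```python
import io
import zipfile
from pathlib import PurePosixPath


def is_unsafe_member(name: str) -> bool:
    """Flag absolute paths, drive-letter paths, and any '..' component."""
    normalized = name.replace("\\", "/")
    if normalized.startswith("/"):
        return True
    if len(normalized) >= 2 and normalized[1] == ":":  # Windows drive letter
        return True
    return ".." in PurePosixPath(normalized).parts


def unsafe_zip_members(data: bytes) -> list[str]:
    """List member names that could escape the extraction directory."""
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        return [name for name in archive.namelist() if is_unsafe_member(name)]


# Build a small archive in memory to exercise the check.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("weights/model.bin", b"\x00")
    archive.writestr("../../etc/cron.d/job", b"payload")

print(unsafe_zip_members(buffer.getvalue()))
```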

### ML Framework Analysis

- TensorFlow operations: `PyFunc`, `PyFuncStateless`
- Keras unsafe layers and custom objects
- Template injection in model configurations

### Context-Aware Analysis

- Intelligently distinguishes between legitimate ML framework patterns and genuine threats to reduce false positives in complex model files

## Supported Formats

ModelAudit includes **29 specialized file format scanners** ([see complete list](https://www.promptfoo.dev/docs/model-audit/scanners/)):

### Model Formats

| Format              | Extensions                            | Risk Level | Security Focus                    |
| ------------------- | ------------------------------------- | ---------- | --------------------------------- |
| **Pickle**          | `.pkl`, `.pickle`, `.dill`            | 🔴 HIGH    | Code execution, dangerous opcodes |
| **PyTorch**         | `.pt`, `.pth`, `.ckpt`, `.bin`        | 🔴 HIGH    | Pickle payloads, embedded malware |
| **Joblib**          | `.joblib`                             | 🔴 HIGH    | Pickled scikit-learn objects      |
| **NumPy**           | `.npy`, `.npz`                        | 🔴 HIGH    | Array metadata, pickle objects    |
| **TensorFlow**      | `.pb`, SavedModel directories         | 🟠 MEDIUM  | PyFunc operations, custom ops     |
| **Keras**           | `.h5`, `.hdf5`, `.keras`              | 🟠 MEDIUM  | Unsafe layers, custom objects     |
| **ONNX**            | `.onnx`                               | 🟠 MEDIUM  | Custom operators, metadata        |
| **XGBoost**         | `.bst`, `.model`, `.ubj`              | 🟠 MEDIUM  | Serialized boosting models        |
| **SafeTensors**     | `.safetensors`                        | 🟢 SAFE    | Header validation (recommended)   |
| **GGUF/GGML**       | `.gguf`, `.ggml`                      | 🟢 SAFE    | LLM standard format               |
| **JAX/Flax**        | `.msgpack`, `.flax`, `.orbax`, `.jax` | 🟡 LOW     | Msgpack serialization             |
| **JAX Checkpoint**  | `.ckpt`, `.checkpoint`, `.pickle`     | 🟡 LOW     | JAX checkpoint formats            |
| **TensorFlow Lite** | `.tflite`                             | 🟡 LOW     | Mobile model validation           |
| **ExecuTorch**      | `.ptl`, `.pte`                        | 🟡 LOW     | PyTorch mobile archives           |
| **Core ML**         | `.mlmodel`                            | 🟡 LOW     | Apple custom layers               |
| **TensorRT**        | `.engine`, `.plan`                    | 🟡 LOW     | NVIDIA inference engines          |
| **PaddlePaddle**    | `.pdmodel`, `.pdiparams`              | 🟡 LOW     | Custom operations                 |
| **OpenVINO**        | `.xml`                                | 🟡 LOW     | Intel IR format                   |
| **PMML**            | `.pmml`                               | 🟡 LOW     | XML predictive models             |

### Archive & Configuration Formats

| Format               | Extensions                                  | Security Focus                  |
| -------------------- | ------------------------------------------- | ------------------------------- |
| **ZIP**              | `.zip`                                      | Path traversal, malicious files |
| **TAR**              | `.tar`, `.tar.gz`, `.tgz`, `.tar.bz2`, etc. | Archive exploits                |
| **7-Zip**            | `.7z`                                       | Archive security                |
| **OCI Layers**       | `.manifest`                                 | Container layer analysis        |
| **Metadata**         | `.json`, `.md`, `.yml`, `.yaml`, `.rst`     | Embedded secrets, URLs          |
| **Manifest**         | `.json`, `.yaml`, `.xml`, `.toml`, `.ini`   | Configuration vulnerabilities   |
| **Text**             | `.txt`, `.md`, `.markdown`, `.rst`          | ML-related text analysis        |
| **Jinja2 Templates** | `.jinja`, `.j2`, `.template`                | Template injection (SSTI)       |

[Complete format documentation →](https://www.promptfoo.dev/docs/model-audit/scanners/)

## Usage Examples

### Basic Scanning

```bash
# Scan single file
modelaudit model.pkl

# Scan directory
modelaudit ./models/

# Strict mode (fail on warnings)
modelaudit model.pkl --strict
```

### CI/CD Integration

```bash
# JSON output for automation
modelaudit models/ --format json --output results.json

# Generate SBOM report
modelaudit model.pkl --sbom compliance_report.json

# Disable colors in CI
NO_COLOR=1 modelaudit models/
```

### Remote Sources

```bash
# Hugging Face models (via direct URL or hf:// scheme)
modelaudit https://huggingface.co/gpt2
modelaudit hf://microsoft/DialoGPT-medium

# Cloud storage
modelaudit s3://bucket/model.pt
modelaudit gs://bucket/models/
modelaudit https://account.blob.core.windows.net/container/model.pt

# MLflow registry
modelaudit models:/MyModel/Production

# JFrog Artifactory (files and folders)
modelaudit https://company.jfrog.io/artifactory/repo/model.pt      # Single file
modelaudit https://company.jfrog.io/artifactory/repo/models/       # Entire folder
```

### Command Options

- **`--format`** - Output format: text, json, sarif
- **`--output`** - Write results to file
- **`--verbose`** - Detailed output
- **`--quiet`** - Minimal output
- **`--strict`** - Fail on warnings, scan all files
- **`--timeout`** - Override scan timeout
- **`--max-size`** - Set size limits (e.g., 10 GB)
- **`--dry-run`** - Preview without scanning
- **`--progress`** - Force progress display
- **`--sbom`** - Generate CycloneDX SBOM
- **`--blacklist`** - Additional patterns to flag
- **`--no-cache`** - Disable result caching

[Advanced usage examples →](https://www.promptfoo.dev/docs/model-audit/usage/)

## 🛡️ Whitelist System

ModelAudit includes a smart whitelist system that **reduces false positives** for trusted models while maintaining security:

### What's Whitelisted

- **7,440+ models** from two trusted sources:
  1. **Popular models** (540 models) - Top downloaded models from HuggingFace
  2. **Trusted organizations** (6,900 models) - Models from 18 verified organizations:
     - Meta/Facebook, Google, Microsoft, NVIDIA
     - OpenAI, Hugging Face, Stability AI
     - EleutherAI, BigScience, BigCode
     - Mistral AI, Sentence Transformers
     - And more...

### How It Works

- **Automatic detection**: Model IDs are extracted from URLs, cache paths, and metadata
- **Smart downgrading**: Security findings are downgraded from WARNING/CRITICAL → INFO
- **Enabled by default**: Works transparently with no configuration needed
- **User control**: Disable via config if needed: `{"use_hf_whitelist": False}`
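
The downgrading behaviour described above can be pictured as a small transformation over findings. This is a sketch under stated assumptions: the `Finding` type, `apply_whitelist`, and the one-entry `TRUSTED_MODEL_IDS` set are hypothetical stand-ins, and the real whitelist data and matching logic live inside ModelAudit.

```python
from dataclasses import dataclass, replace

# Hypothetical whitelist entry, borrowed from the example model used in this README.
TRUSTED_MODEL_IDS = {"facebook/bart-large-cnn"}


@dataclass(frozen=True)
class Finding:
    message: str
    severity: str  # "critical", "warning", or "info"


def apply_whitelist(model_id: str, findings: list[Finding], enabled: bool = True) -> list[Finding]:
    """Downgrade warning/critical findings to info for whitelisted model IDs."""
    if not enabled or model_id not in TRUSTED_MODEL_IDS:
        return findings
    return [
        replace(finding, severity="info") if finding.severity in {"warning", "critical"} else finding
        for finding in findings
    ]


findings = [Finding("Contains pickle import", "warning")]
print(apply_whitelist("facebook/bart-large-cnn", findings))
print(apply_whitelist("unknown/model", findings))
```

Passing `enabled=False` mirrors the effect of turning the whitelist off in configuration: findings keep their original severity.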

### Example

```bash
# Scanning a whitelisted model
$ modelaudit facebook/bart-large-cnn

✓ Scanning facebook/bart-large-cnn
Files scanned: 3 | Issues found: 0 critical, 0 warning, 2 info

# Issues are downgraded to INFO for trusted models
1. model.safetensors: [INFO] Contains pickle import (whitelisted model)
   Original severity: WARNING
```

### Updating the Whitelist

**For maintainers**: Update periodically to include new popular models and releases:

```bash
# Update popular models (top downloads)
python scripts/fetch_hf_top_models.py --count 2000

# Update organization models (trusted orgs)
python scripts/fetch_hf_org_models.py

# Commit the updated files in modelaudit/whitelists/
```

**Recommended update frequency**: Monthly or before major releases

## Output Formats

### Text (default)

```text
$ modelaudit model.pkl

✓ Scanning model.pkl
Files scanned: 1 | Issues found: 1 critical

1. model.pkl (pos 28): [CRITICAL] Malicious code execution attempt
   Why: Contains os.system() call that could run arbitrary commands
```

### JSON (for automation)

```bash
modelaudit model.pkl --format json
```

```json
{
  "files_scanned": 1,
  "issues": [
    {
      "message": "Malicious code execution attempt",
      "severity": "critical",
      "location": "model.pkl (pos 28)"
    }
  ]
}
```
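
A report in this shape is straightforward to post-process. The sketch below tallies issues by severity using the sample report shown above; real reports may contain additional fields, and `count_by_severity` is a hypothetical helper rather than part of ModelAudit.

```python
import json
from collections import Counter

# Sample report copied from the JSON example above; real reports may carry more fields.
SAMPLE_REPORT = """
{
  "files_scanned": 1,
  "issues": [
    {
      "message": "Malicious code execution attempt",
      "severity": "critical",
      "location": "model.pkl (pos 28)"
    }
  ]
}
"""


def count_by_severity(report_text: str) -> Counter:
    """Count issues per severity, tolerating missing keys."""
    report = json.loads(report_text)
    return Counter(issue.get("severity", "unknown") for issue in report.get("issues", []))


counts = count_by_severity(SAMPLE_REPORT)
print(dict(counts))
```

In a pipeline, the same function could read the file written by `--output results.json` and gate a deployment on `counts["critical"]`.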

### SARIF (for security tools)

```bash
modelaudit model.pkl --format sarif --output results.sarif
```

## Troubleshooting

### Check scanner availability

```bash
modelaudit doctor --show-failed
```

### NumPy compatibility issues

```bash
# Use NumPy 1.x compatibility mode
pip install modelaudit[numpy1]
```

### Missing dependencies

```bash
# ModelAudit shows exactly what to install
modelaudit your-model.onnx
# Output: "Install with 'pip install modelaudit[onnx]'"
```

### Exit Codes

- `0` - No security issues found
- `1` - Security issues detected
- `2` - Scan errors occurred

### Authentication

ModelAudit uses environment variables for authenticating to remote services:

```bash
# JFrog Artifactory
export JFROG_API_TOKEN=your_token

# MLflow
export MLFLOW_TRACKING_URI=http://localhost:5000

# AWS, Google Cloud, and Azure
# Authentication is handled automatically by the respective client libraries
# (e.g., via IAM roles, `aws configure`, `gcloud auth login`, or environment variables).
# For specific env var setup, refer to the library's documentation.
export AWS_ACCESS_KEY_ID=your_access_key
export AWS_SECRET_ACCESS_KEY=your_secret_key
export GOOGLE_APPLICATION_CREDENTIALS=/path/to/service-account.json

# Hugging Face
export HF_TOKEN=your_token
```

## Documentation

- **Documentation**: [promptfoo.dev/docs/model-audit/](https://www.promptfoo.dev/docs/model-audit/)
- **Usage Examples**: [promptfoo.dev/docs/model-audit/usage/](https://www.promptfoo.dev/docs/model-audit/usage/)
- **Report Issues**: Contact support at [promptfoo.dev](https://www.promptfoo.dev/)

## 📝 License

This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.

            

Raw data

            {
    "_id": null,
    "home_page": null,
    "name": "modelaudit",
    "maintainer": null,
    "docs_url": null,
    "requires_python": ">=3.10",
    "maintainer_email": null,
    "keywords": "ai, ml, model-scanning, pickle, pytorch, security, tensorflow",
    "author": null,
    "author_email": "Ian Webster <ian@promptfoo.dev>, Michael D'Angelo <michael@promptfoo.dev>",
    "download_url": "https://files.pythonhosted.org/packages/60/48/7bf8c9c368cf79fa288c2174d190190ff1fb7c205f136325950571c2e841/modelaudit-0.2.14.tar.gz",
    "platform": null,
    "description": "# ModelAudit\n\n**Secure your AI models before deployment.** Detects malicious code, backdoors, and security vulnerabilities in ML model files.\n\n[![PyPI version](https://badge.fury.io/py/modelaudit.svg)](https://pypi.org/project/modelaudit/)\n[![Python versions](https://img.shields.io/pypi/pyversions/modelaudit.svg)](https://pypi.org/project/modelaudit/)\n[![Code Style: ruff](https://img.shields.io/badge/code%20style-ruff-005cd7.svg)](https://github.com/astral-sh/ruff)\n[![License](https://img.shields.io/github/license/promptfoo/promptfoo)](https://github.com/promptfoo/promptfoo/blob/main/LICENSE)\n\n<img width=\"989\" alt=\"image\" src=\"https://www.promptfoo.dev/img/docs/modelaudit/modelaudit-result.png\" />\n\n\ud83d\udcd6 **[Full Documentation](https://www.promptfoo.dev/docs/model-audit/)** | \ud83c\udfaf **[Usage Examples](https://www.promptfoo.dev/docs/model-audit/usage/)** | \ud83d\udd0d **[Supported Formats](https://www.promptfoo.dev/docs/model-audit/scanners/)**\n\n## \ud83d\ude80 Quick Start\n\n**Install and scan in 30 seconds:**\n\n```bash\n# Install ModelAudit with all ML framework support\npip install modelaudit[all]\n\n# Scan a model file\nmodelaudit model.pkl\n\n# Scan a directory\nmodelaudit ./models/\n\n# Export results for CI/CD\nmodelaudit model.pkl --format json --output results.json\n```\n\n**Example output:**\n\n```bash\n$ modelaudit suspicious_model.pkl\n\n\u2713 Scanning suspicious_model.pkl\nFiles scanned: 1 | Issues found: 2 critical, 1 warning\n\n1. suspicious_model.pkl (pos 28): [CRITICAL] Malicious code execution attempt\n   Why: Contains os.system() call that could run arbitrary commands\n\n2. 
suspicious_model.pkl (pos 52): [WARNING] Dangerous pickle deserialization\n   Why: Could execute code when the model loads\n\n\u2717 Security issues found - DO NOT deploy this model\n```\n\n## \ud83d\udcc1 Project Structure\n\nModelAudit is organized by conceptual purpose for clarity and maintainability:\n\n```\nmodelaudit/\n\u251c\u2500\u2500 scanners/         # 29 specialized file format scanners\n\u2502   \u251c\u2500\u2500 pickle_scanner.py, pytorch_*.py, onnx_scanner.py, etc.\n\u2502   \u2514\u2500\u2500 base.py - BaseScanner class with shared functionality\n\u2502\n\u251c\u2500\u2500 detectors/        # Security threat detection modules\n\u2502   \u251c\u2500\u2500 cve_patterns.py - Known CVE patterns (CVE-2025-32434, etc.)\n\u2502   \u251c\u2500\u2500 secrets.py - API keys, tokens, credentials\n\u2502   \u251c\u2500\u2500 jit_script.py - JIT/TorchScript malicious code\n\u2502   \u251c\u2500\u2500 network_comm.py - URLs, IPs, sockets\n\u2502   \u2514\u2500\u2500 suspicious_symbols.py - Dangerous function calls\n\u2502\n\u251c\u2500\u2500 integrations/     # External system integrations\n\u2502   \u251c\u2500\u2500 jfrog.py - JFrog Artifactory support\n\u2502   \u251c\u2500\u2500 mlflow.py - MLflow registry support\n\u2502   \u251c\u2500\u2500 sbom_generator.py - CycloneDX SBOM generation\n\u2502   \u251c\u2500\u2500 sarif_formatter.py - SARIF output format\n\u2502   \u2514\u2500\u2500 license_checker.py - License compliance\n\u2502\n\u251c\u2500\u2500 analysis/         # Advanced analysis algorithms\n\u2502   \u251c\u2500\u2500 anomaly_detector.py, entropy_analyzer.py\n\u2502   \u2514\u2500\u2500 ml_context_analyzer.py - Context-aware analysis\n\u2502\n\u251c\u2500\u2500 utils/\n\u2502   \u251c\u2500\u2500 file/         # File handling (detection, filtering, streaming)\n\u2502   \u251c\u2500\u2500 sources/      # Model sources (HuggingFace, cloud, JFrog, DVC)\n\u2502   \u2514\u2500\u2500 helpers/      # Generic utilities (retry, caching, 
etc.)\n\u2502\n\u251c\u2500\u2500 cache/            # Caching system for scan results\n\u251c\u2500\u2500 auth/             # Authentication for remote sources\n\u251c\u2500\u2500 progress/         # Progress tracking and UI\n\u2502\n\u251c\u2500\u2500 core.py           # Main scanning orchestration\n\u2514\u2500\u2500 cli.py            # Command-line interface\n```\n\n**Navigation guide**:\n\n- **\"What formats can we scan?\"** \u2192 `scanners/`\n- **\"What threats do we detect?\"** \u2192 `detectors/`\n- **\"What systems do we integrate with?\"** \u2192 `integrations/`\n- **\"Where can models come from?\"** \u2192 `utils/sources/`\n\n[View detailed refactoring plan \u2192](docs/REFACTORING_PLAN.md)\n\n## \ud83d\udee1\ufe0f What Problems It Solves\n\n### **Prevents Code Execution Attacks**\n\nStops malicious models that run arbitrary commands when loaded (common in PyTorch .pt files)\n\n### **Detects Model Backdoors**\n\nIdentifies trojaned models with hidden functionality or suspicious weight patterns\n\n### **Ensures Supply Chain Security**\n\nValidates model integrity and prevents tampering in your ML pipeline\n\n### **Enforces License Compliance**\n\nChecks for license violations that could expose your company to legal risk\n\n### **Finds Embedded Secrets**\n\nDetects API keys, tokens, and other credentials hidden in model weights or metadata\n\n### **Flags Network Communication**\n\nIdentifies URLs, IPs, and socket usage that could enable data exfiltration or C2 channels\n\n### **Detects Hidden JIT/Script Execution**\n\nScans TorchScript, ONNX, and other JIT-compiled code for dangerous operations\n\n### **Smart Whitelist System (Reduces False Positives)**\n\nAutomatically downgrades findings for 7,440+ trusted models from popular downloads and verified organizations (Meta, Google, Microsoft, NVIDIA, etc.) 
- [Learn more](#-whitelist-system)\n\n## \ud83d\udcca Supported Model Formats\n\nModelAudit supports **29 specialized file format scanners** with comprehensive security analysis:\n\n### \ud83d\udd34 High Risk Formats (Pickle-based serialization)\n\n| Format             | Extensions                        | Security Focus                    |\n| ------------------ | --------------------------------- | --------------------------------- |\n| **Pickle**         | `.pkl`, `.pickle`, `.dill`        | Dangerous opcodes, code execution |\n| **PyTorch**        | `.pt`, `.pth`, `.ckpt`, `.bin`    | Pickle payloads, embedded malware |\n| **Joblib**         | `.joblib`                         | Pickled scikit-learn objects      |\n| **NumPy**          | `.npy`, `.npz`                    | Array metadata, pickle objects    |\n| **JAX Checkpoint** | `.ckpt`, `.checkpoint`, `.pickle` | Serialized transforms             |\n\n### \ud83d\udfe0 Medium Risk Formats (Complex with custom operations)\n\n| Format              | Extensions               | Security Focus                |\n| ------------------- | ------------------------ | ----------------------------- |\n| **TensorFlow**      | `.pb`, SavedModel dirs   | PyFunc operations, custom ops |\n| **Keras H5**        | `.h5`, `.hdf5`           | Unsafe Lambda layers          |\n| **Keras ZIP**       | `.keras`                 | ZIP-based Keras archives      |\n| **ONNX**            | `.onnx`                  | Custom operators, metadata    |\n| **TensorFlow Lite** | `.tflite`                | Mobile model validation       |\n| **PaddlePaddle**    | `.pdmodel`, `.pdiparams` | Custom operations             |\n| **XGBoost**         | `.bst`, `.model`, `.ubj` | Serialized boosting models    |\n| **Core ML**         | `.mlmodel`               | Apple custom layers           |\n\n### \ud83d\udfe1 Lower Risk Formats (Safer serialization)\n\n| Format               | Extensions                            | Security Focus                  
|\n| -------------------- | ------------------------------------- | ------------------------------- |\n| **SafeTensors**      | `.safetensors`                        | Header validation (recommended) |\n| **GGUF/GGML**        | `.gguf`, `.ggml`                      | LLM standard format             |\n| **JAX/Flax Msgpack** | `.msgpack`, `.flax`, `.orbax`, `.jax` | Msgpack serialization           |\n| **ExecuTorch**       | `.ptl`, `.pte`                        | PyTorch mobile archives         |\n| **TensorRT**         | `.engine`, `.plan`                    | NVIDIA inference engines        |\n| **OpenVINO**         | `.xml`                                | Intel IR format                 |\n| **PMML**             | `.pmml`                               | XML predictive models           |\n| **OCI Layers**       | `.manifest`                           | Container layer analysis        |\n\n### \ud83d\udce6 Archive & Container Formats\n\n| Format    | Extensions                                                        | Security Focus                  |\n| --------- | ----------------------------------------------------------------- | ------------------------------- |\n| **ZIP**   | `.zip`                                                            | Path traversal, malicious files |\n| **TAR**   | `.tar`, `.tar.gz`, `.tgz`, `.tar.bz2`, `.tbz2`, `.tar.xz`, `.txz` | Archive exploits                |\n| **7-Zip** | `.7z`                                                             | Archive security                |\n\n### \ud83d\udcc4 Configuration & Metadata Formats\n\n| Format               | Extensions                                        | Security Focus            |\n| -------------------- | ------------------------------------------------- | ------------------------- |\n| **Metadata**         | `.json`, `.md`, `.yml`, `.yaml`, `.rst`           | Embedded secrets, URLs    |\n| **Manifest**         | `.json`, `.yaml`, `.xml`, `.toml`, `.ini`, `.cfg` | Config 
vulnerabilities    |\n| **Text**             | `.txt`, `.md`, `.markdown`, `.rst`                | ML-related text analysis  |\n| **Jinja2 Templates** | `.jinja`, `.j2`, `.template`                      | Template injection (SSTI) |\n\n[View complete format documentation \u2192](https://www.promptfoo.dev/docs/model-audit/scanners/)\n\n## \ud83c\udfaf Common Use Cases\n\n### **Pre-Deployment Security Checks**\n\n```bash\nmodelaudit production_model.safetensors --format json --output security_report.json\n```\n\n### **CI/CD Pipeline Integration**\n\nModelAudit automatically detects CI environments and adjusts output accordingly:\n\n```bash\n# Recommended: Use JSON format for machine-readable output\nmodelaudit models/ --format json --output results.json\n\n# Text output automatically adapts to CI (no spinners, plain text)\nmodelaudit models/ --timeout 300\n\n# Disable colors explicitly with NO_COLOR environment variable\nNO_COLOR=1 modelaudit models/\n```\n\n**CI-Friendly Features:**\n\n- \ud83d\udeab Spinners automatically disabled when output is piped or in CI\n- \ud83c\udfa8 Colors disabled when `NO_COLOR` environment variable is set\n- \ud83d\udcca JSON output recommended for parsing in CI pipelines\n- \ud83d\udd0d Exit codes: 0 (clean), 1 (issues found), 2 (errors)\n\n### **Third-Party Model Validation**\n\n```bash\n# Scan models from HuggingFace, PyTorch Hub, MLflow, JFrog, or cloud storage\nmodelaudit https://huggingface.co/gpt2\nmodelaudit https://pytorch.org/hub/pytorch_vision_resnet/\nmodelaudit models:/MyModel/Production\nmodelaudit model.dvc\nmodelaudit s3://my-bucket/downloaded-model.pt\n\n# JFrog Artifactory - now supports both files AND folders\n# Auth: export JFROG_API_TOKEN=... 
(or JFROG_ACCESS_TOKEN)\nmodelaudit https://company.jfrog.io/artifactory/repo/model.pt\n# Or with explicit flag:\nmodelaudit https://company.jfrog.io/artifactory/repo/model.pt --api-token \"$JFROG_API_TOKEN\"\nmodelaudit https://company.jfrog.io/artifactory/repo/models/  # Scan entire folder!\n```\n\n### **Compliance & Audit Reporting**\n\n```bash\nmodelaudit model_package.zip --sbom compliance_report.json --strict --verbose\n```\n\n### \ud83e\udde0 Smart Detection Examples\n\nModelAudit automatically adapts to your input - **no configuration needed for most cases:**\n\n```bash\n# Local file - fast scan, no progress bars\nmodelaudit model.pkl\n\n# Cloud directory - auto enables caching + progress bars\nmodelaudit s3://my-bucket/models/\n\n# HuggingFace model - selective download + caching\nmodelaudit hf://microsoft/DialoGPT-medium\n\n# Large local file - enables progress + optimizations\nmodelaudit 15GB-model.bin\n\n# CI environment - auto detects and uses JSON output\nCI=true modelaudit model.pkl\n```\n\n**Override smart detection when needed:**\n\n```bash\n# Force strict mode for security-critical scans\nmodelaudit model.pkl --strict --format json --output report.json\n\n# Override size limits for huge models\nmodelaudit huge-model.pt --max-size 50GB --timeout 7200\n\n# Preview mode without downloading\nmodelaudit s3://bucket/model.pt --dry-run\n```\n\n[View advanced usage examples \u2192](https://www.promptfoo.dev/docs/model-audit/usage/)\n\n### \u2699\ufe0f Smart Detection & CLI Options\n\nModelAudit uses **smart detection** to automatically configure optimal settings based on your input:\n\n**\u2728 Smart Detection Features:**\n\n- **Input type** (local/cloud/registry) \u2192 optimal download & caching strategies\n- **File size** (>1GB) \u2192 large model optimizations + progress bars\n- **Terminal type** (TTY/CI) \u2192 appropriate UI (progress vs quiet mode)\n- **Cloud operations** \u2192 automatic caching, size limits, timeouts\n\n**\ud83c\udf9b\ufe0f 
Override Controls (13 focused flags):**\n\n- `--strict` \u2013 scan all file types, strict license validation, fail on warnings\n- `--max-size SIZE` \u2013 unified size limit (e.g., `10GB`, `500MB`)\n- `--timeout SECONDS` \u2013 override auto-detected timeout\n- `--dry-run` \u2013 preview what would be scanned/downloaded\n- `--progress` \u2013 force enable progress reporting\n- `--no-cache` \u2013 disable caching (overrides smart detection)\n- `--format json` / `--output file.json` \u2013 structured output for CI/CD\n- `--sbom file.json` \u2013 generate CycloneDX v1.6 SBOM with enhanced ML-BOM support\n- `--verbose` / `--quiet` \u2013 control output detail level\n- `--blacklist PATTERN` \u2013 additional security patterns\n\n**\ud83d\udd10 Authentication (via environment variables):**\n\n- Set `JFROG_API_TOKEN` or `JFROG_ACCESS_TOKEN` for JFrog Artifactory\n- Set `MLFLOW_TRACKING_URI` for MLflow registry access\n\n### \ud83d\ude80 Large Model Support (Up to 1 TB)\n\nModelAudit automatically optimizes scanning strategies for different model sizes:\n\n- **< 100 GB**: Full in-memory analysis for comprehensive scanning\n- **100 GB - 1 TB**: Chunked processing with 50 GB chunks for memory efficiency\n- **1 TB - 5 TB**: Streaming analysis with intelligent sampling\n- **> 5 TB**: Advanced distributed scanning techniques\n\nLarge models are supported with automatic timeout increases and memory-optimized processing.\n\n### Static Scanning vs. Promptfoo Redteaming\n\nModelAudit performs **static** analysis only. It examines model files for risky patterns\nwithout ever loading or executing them. Promptfoo's redteaming module is\n**dynamic**\u2014it loads the model (locally or via API) and sends crafted prompts to\nprobe runtime behavior. 
Use ModelAudit first to verify the model file itself,\nthen run redteaming if you need to test how the model responds when invoked.\n\n## \u2699\ufe0f Installation Options\n\n**Requirements:**\n\n- Python 3.10 or higher\n- Compatible with Python 3.10, 3.11, 3.12, and 3.13\n\n### Quick Install Decision Guide\n\n**\ud83d\ude80 Just want everything to work?**\n\n```bash\npip install modelaudit[all]\n```\n\n**Basic installation:**\n\n```bash\n# Core functionality only (pickle, numpy, archives)\npip install modelaudit\n```\n\n**Specific frameworks:**\n\n```bash\npip install modelaudit[tensorflow]  # TensorFlow (.pb)\npip install modelaudit[pytorch]     # PyTorch (.pt, .pth)\npip install modelaudit[h5]          # Keras (.h5, .keras)\npip install modelaudit[onnx]        # ONNX (.onnx)\npip install modelaudit[safetensors] # SafeTensors (.safetensors)\n\n# Multiple frameworks\npip install modelaudit[tensorflow,pytorch,h5]\n```\n\n**Additional features:**\n\n```bash\npip install modelaudit[cloud]       # S3, GCS, Azure storage\npip install modelaudit[coreml]      # Apple Core ML\npip install modelaudit[flax]        # JAX/Flax models\npip install modelaudit[mlflow]      # MLflow registry\npip install modelaudit[huggingface] # Hugging Face integration\n```\n\n**Compatibility:**\n\n```bash\n# NumPy 1.x compatibility (some frameworks require NumPy < 2.0)\npip install modelaudit[numpy1]\n\n# For CI/CD environments (omits dependencies like TensorRT that may not be available)\npip install modelaudit[all-ci]\n```\n\n**Docker:**\n\n```bash\ndocker pull ghcr.io/promptfoo/modelaudit:latest\n# Linux/macOS\ndocker run --rm -v \"$(pwd)\":/app ghcr.io/promptfoo/modelaudit:latest model.pkl\n# Windows\ndocker run --rm -v \"%cd%\":/app ghcr.io/promptfoo/modelaudit:latest model.pkl\n```\n\n## Security Checks\n\n### Code Execution Detection\n\n- Dangerous Python modules and builtins: `os`, `sys`, `subprocess`, `eval`, `exec`\n- Pickle opcodes: 
`REDUCE`, `GLOBAL`, `INST`, `OBJ`, `NEWOBJ`, `STACK_GLOBAL`, `BUILD`, `NEWOBJ_EX`\n- Embedded executable file detection\n\n### Embedded Data Extraction\n\n- API keys, tokens, and credentials in model weights/metadata\n- URLs, IP addresses, and network endpoints\n- Suspicious configuration properties\n\n### Archive Security\n\n- Path traversal attacks in ZIP/TAR archives\n- Executable files within model packages\n- Malicious filenames and directory structures\n\n### ML Framework Analysis\n\n- TensorFlow operations: `PyFunc`, `PyFuncStateless`\n- Keras unsafe layers and custom objects\n- Template injection in model configurations\n\n### Context-Aware Analysis\n\n- Intelligently distinguishes between legitimate ML framework patterns and genuine threats to reduce false positives in complex model files\n\n## Supported Formats\n\nModelAudit includes **29 specialized file format scanners** ([see complete list](https://www.promptfoo.dev/docs/model-audit/scanners/)):\n\n### Model Formats\n\n| Format              | Extensions                            | Risk Level | Security Focus                    |\n| ------------------- | ------------------------------------- | ---------- | --------------------------------- |\n| **Pickle**          | `.pkl`, `.pickle`, `.dill`            | \ud83d\udd34 HIGH    | Code execution, dangerous opcodes |\n| **PyTorch**         | `.pt`, `.pth`, `.ckpt`, `.bin`        | \ud83d\udd34 HIGH    | Pickle payloads, embedded malware |\n| **Joblib**          | `.joblib`                             | \ud83d\udd34 HIGH    | Pickled scikit-learn objects      |\n| **NumPy**           | `.npy`, `.npz`                        | \ud83d\udd34 HIGH    | Array metadata, pickle objects    |\n| **TensorFlow**      | `.pb`, SavedModel directories         | \ud83d\udfe0 MEDIUM  | PyFunc operations, custom ops     |\n| **Keras**           | `.h5`, `.hdf5`, `.keras`              | \ud83d\udfe0 MEDIUM  | Unsafe layers, custom objects     |\n| **ONNX**            | 
`.onnx`                               | \ud83d\udfe0 MEDIUM  | Custom operators, metadata        |\n| **XGBoost**         | `.bst`, `.model`, `.ubj`              | \ud83d\udfe0 MEDIUM  | Serialized boosting models        |\n| **SafeTensors**     | `.safetensors`                        | \ud83d\udfe2 SAFE    | Header validation (recommended)   |\n| **GGUF/GGML**       | `.gguf`, `.ggml`                      | \ud83d\udfe2 SAFE    | LLM standard format               |\n| **JAX/Flax**        | `.msgpack`, `.flax`, `.orbax`, `.jax` | \ud83d\udfe1 LOW     | Msgpack serialization             |\n| **JAX Checkpoint**  | `.ckpt`, `.checkpoint`, `.pickle`     | \ud83d\udfe1 LOW     | JAX checkpoint formats            |\n| **TensorFlow Lite** | `.tflite`                             | \ud83d\udfe1 LOW     | Mobile model validation           |\n| **ExecuTorch**      | `.ptl`, `.pte`                        | \ud83d\udfe1 LOW     | PyTorch mobile archives           |\n| **Core ML**         | `.mlmodel`                            | \ud83d\udfe1 LOW     | Apple custom layers               |\n| **TensorRT**        | `.engine`, `.plan`                    | \ud83d\udfe1 LOW     | NVIDIA inference engines          |\n| **PaddlePaddle**    | `.pdmodel`, `.pdiparams`              | \ud83d\udfe1 LOW     | Custom operations                 |\n| **OpenVINO**        | `.xml`                                | \ud83d\udfe1 LOW     | Intel IR format                   |\n| **PMML**            | `.pmml`                               | \ud83d\udfe1 LOW     | XML predictive models             |\n\n### Archive & Configuration Formats\n\n| Format               | Extensions                                  | Security Focus                  |\n| -------------------- | ------------------------------------------- | ------------------------------- |\n| **ZIP**              | `.zip`                                      | Path traversal, malicious files |\n| **TAR**              | `.tar`, `.tar.gz`, `.tgz`, 
`.tar.bz2`, etc. | Archive exploits                |\n| **7-Zip**            | `.7z`                                       | Archive security                |\n| **OCI Layers**       | `.manifest`                                 | Container layer analysis        |\n| **Metadata**         | `.json`, `.md`, `.yml`, `.yaml`, `.rst`     | Embedded secrets, URLs          |\n| **Manifest**         | `.json`, `.yaml`, `.xml`, `.toml`, `.ini`   | Configuration vulnerabilities   |\n| **Text**             | `.txt`, `.md`, `.markdown`, `.rst`          | ML-related text analysis        |\n| **Jinja2 Templates** | `.jinja`, `.j2`, `.template`                | Template injection (SSTI)       |\n\n[Complete format documentation \u2192](https://www.promptfoo.dev/docs/model-audit/scanners/)\n\n## Usage Examples\n\n### Basic Scanning\n\n```bash\n# Scan single file\nmodelaudit model.pkl\n\n# Scan directory\nmodelaudit ./models/\n\n# Strict mode (fail on warnings)\nmodelaudit model.pkl --strict\n```\n\n### CI/CD Integration\n\n```bash\n# JSON output for automation\nmodelaudit models/ --format json --output results.json\n\n# Generate SBOM report\nmodelaudit model.pkl --sbom compliance_report.json\n\n# Disable colors in CI\nNO_COLOR=1 modelaudit models/\n```\n\n### Remote Sources\n\n```bash\n# Hugging Face models (via direct URL or hf:// scheme)\nmodelaudit https://huggingface.co/gpt2\nmodelaudit hf://microsoft/DialoGPT-medium\n\n# Cloud storage\nmodelaudit s3://bucket/model.pt\nmodelaudit gs://bucket/models/\nmodelaudit https://account.blob.core.windows.net/container/model.pt\n\n# MLflow registry\nmodelaudit models:/MyModel/Production\n\n# JFrog Artifactory (files and folders)\nmodelaudit https://company.jfrog.io/artifactory/repo/model.pt      # Single file\nmodelaudit https://company.jfrog.io/artifactory/repo/models/       # Entire folder\n```\n\n### Command Options\n\n- **`--format`** - Output format: text, json, sarif\n- **`--output`** - Write results to file\n- **`--verbose`** - 
Detailed output\n- **`--quiet`** - Minimal output\n- **`--strict`** - Fail on warnings, scan all files\n- **`--timeout`** - Override scan timeout\n- **`--max-size`** - Set size limits (e.g., 10 GB)\n- **`--dry-run`** - Preview without scanning\n- **`--progress`** - Force progress display\n- **`--sbom`** - Generate CycloneDX SBOM\n- **`--blacklist`** - Additional patterns to flag\n- **`--no-cache`** - Disable result caching\n\n[Advanced usage examples \u2192](https://www.promptfoo.dev/docs/model-audit/usage/)\n\n## \ud83d\udee1\ufe0f Whitelist System\n\nModelAudit includes a smart whitelist system that **reduces false positives** for trusted models while maintaining security:\n\n### What's Whitelisted\n\n- **7,440+ models** from two trusted sources:\n  1. **Popular models** (540 models) - Top downloaded models from HuggingFace\n  2. **Trusted organizations** (6,900 models) - Models from 18 verified organizations:\n     - Meta/Facebook, Google, Microsoft, NVIDIA\n     - OpenAI, Hugging Face, Stability AI\n     - EleutherAI, BigScience, BigCode\n     - Mistral AI, Sentence Transformers\n     - And more...\n\n### How It Works\n\n- **Automatic detection**: Model IDs are extracted from URLs, cache paths, and metadata\n- **Smart downgrading**: Security findings are downgraded from WARNING/CRITICAL \u2192 INFO\n- **Enabled by default**: Works transparently with no configuration needed\n- **User control**: Disable via config if needed: `{\"use_hf_whitelist\": False}`\n\n### Example\n\n```bash\n# Scanning a whitelisted model\n$ modelaudit facebook/bart-large-cnn\n\n\u2713 Scanning facebook/bart-large-cnn\nFiles scanned: 3 | Issues found: 0 critical, 0 warning, 2 info\n\n# Issues are downgraded to INFO for trusted models\n1. 
model.safetensors: [INFO] Contains pickle import (whitelisted model)\n   Original severity: WARNING\n```\n\n### Updating the Whitelist\n\n**For maintainers**: Update periodically to include new popular models and releases:\n\n```bash\n# Update popular models (top downloads)\npython scripts/fetch_hf_top_models.py --count 2000\n\n# Update organization models (trusted orgs)\npython scripts/fetch_hf_org_models.py\n\n# Commit the updated files in modelaudit/whitelists/\n```\n\n**Recommended update frequency**: Monthly or before major releases\n\n## Output Formats\n\n### Text (default)\n\n```text\n$ modelaudit model.pkl\n\n\u2713 Scanning model.pkl\nFiles scanned: 1 | Issues found: 1 critical\n\n1. model.pkl (pos 28): [CRITICAL] Malicious code execution attempt\n   Why: Contains os.system() call that could run arbitrary commands\n```\n\n### JSON (for automation)\n\n```bash\nmodelaudit model.pkl --format json\n```\n\n```json\n{\n  \"files_scanned\": 1,\n  \"issues\": [\n    {\n      \"message\": \"Malicious code execution attempt\",\n      \"severity\": \"critical\",\n      \"location\": \"model.pkl (pos 28)\"\n    }\n  ]\n}\n```\n\n### SARIF (for security tools)\n\n```bash\nmodelaudit model.pkl --format sarif --output results.sarif\n```\n\n## Troubleshooting\n\n### Check scanner availability\n\n```bash\nmodelaudit doctor --show-failed\n```\n\n### NumPy compatibility issues\n\n```bash\n# Use NumPy 1.x compatibility mode\npip install modelaudit[numpy1]\n```\n\n### Missing dependencies\n\n```bash\n# ModelAudit shows exactly what to install\nmodelaudit your-model.onnx\n# Output: \"Install with 'pip install modelaudit[onnx]'\"\n```\n\n### Exit Codes\n\n- `0` - No security issues found\n- `1` - Security issues detected\n- `2` - Scan errors occurred\n\n### Authentication\n\nModelAudit uses environment variables for authenticating to remote services:\n\n```bash\n# JFrog Artifactory\nexport JFROG_API_TOKEN=your_token\n\n# MLflow\nexport 
MLFLOW_TRACKING_URI=http://localhost:5000\n\n# AWS, Google Cloud, and Azure\n# Authentication is handled automatically by the respective client libraries\n# (e.g., via IAM roles, `aws configure`, `gcloud auth login`, or environment variables).\n# For specific env var setup, refer to the library's documentation.\nexport AWS_ACCESS_KEY_ID=your_access_key\nexport AWS_SECRET_ACCESS_KEY=your_secret_key\nexport GOOGLE_APPLICATION_CREDENTIALS=/path/to/service-account.json\n\n# Hugging Face\nexport HF_TOKEN=your_token\n```\n\n## Documentation\n\n- **Documentation**: [promptfoo.dev/docs/model-audit/](https://www.promptfoo.dev/docs/model-audit/)\n- **Usage Examples**: [promptfoo.dev/docs/model-audit/usage/](https://www.promptfoo.dev/docs/model-audit/usage/)\n- **Report Issues**: Contact support at [promptfoo.dev](https://www.promptfoo.dev/)\n\n## \ud83d\udcdd License\n\nThis project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.\n",
    "bugtrack_url": null,
    "license": "MIT",
    "summary": "Static scanning library for detecting malicious code, backdoors, and other security risks in ML model files",
    "version": "0.2.14",
    "project_urls": {
        "Homepage": "https://github.com/promptfoo/modelaudit",
        "Repository": "https://github.com/promptfoo/modelaudit"
    },
    "split_keywords": [
        "ai",
        " ml",
        " model-scanning",
        " pickle",
        " pytorch",
        " security",
        " tensorflow"
    ],
    "urls": [
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "c34c7a97b7f8550d5bf2cdbebfb9396c88886e1711b85cf99a61e2574fb15d34",
                "md5": "9f3558bafce0797ac42cb7e8817398b9",
                "sha256": "dd46feffa8a6cf8e7deb92e207ca99b90c4634bd34bdc8aa9b13a5583daed825"
            },
            "downloads": -1,
            "filename": "modelaudit-0.2.14-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "9f3558bafce0797ac42cb7e8817398b9",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.10",
            "size": 557611,
            "upload_time": "2025-10-24T01:32:40",
            "upload_time_iso_8601": "2025-10-24T01:32:40.906529Z",
            "url": "https://files.pythonhosted.org/packages/c3/4c/7a97b7f8550d5bf2cdbebfb9396c88886e1711b85cf99a61e2574fb15d34/modelaudit-0.2.14-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "60487bf8c9c368cf79fa288c2174d190190ff1fb7c205f136325950571c2e841",
                "md5": "0f42bad03609cd8dee0567f0a3cb293e",
                "sha256": "d6f174fe1cc91fc8be32c486f37d628c8bf849cbcddc3847a5cae000d88964f8"
            },
            "downloads": -1,
            "filename": "modelaudit-0.2.14.tar.gz",
            "has_sig": false,
            "md5_digest": "0f42bad03609cd8dee0567f0a3cb293e",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.10",
            "size": 9174538,
            "upload_time": "2025-10-24T01:32:43",
            "upload_time_iso_8601": "2025-10-24T01:32:43.259821Z",
            "url": "https://files.pythonhosted.org/packages/60/48/7bf8c9c368cf79fa288c2174d190190ff1fb7c205f136325950571c2e841/modelaudit-0.2.14.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2025-10-24 01:32:43",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "promptfoo",
    "github_project": "modelaudit",
    "github_not_found": true,
    "lcname": "modelaudit"
}
        