# diffai - AI/ML Specialized Diff Tool (Python Package)
An AI/ML-specialized diff tool for deep tensor comparison and analysis. This package provides a convenient, type-safe Python interface to diffai.
## 🚀 Quick Start
### Installation
```bash
# Install via pip
pip install diffai-python
# Development installation
pip install "diffai-python[dev]"
```
### Basic Usage
```python
import diffai
# Simple model comparison
result = diffai.diff("model_v1.safetensors", "model_v2.safetensors", stats=True)
print(result)
# Advanced ML analysis with type-safe configuration
options = diffai.DiffOptions(
    stats=True,
    architecture_comparison=True,
    memory_analysis=True,
    output_format=diffai.OutputFormat.JSON
)
result = diffai.diff("baseline.safetensors", "improved.safetensors", options)
if result.is_json:
    for change in result.changes:
        print(f"Changed: {change}")
```
### Command Line Usage
```bash
# The package also installs the diffai binary
diffai model1.safetensors model2.safetensors --stats
# Download binary manually if needed
diffai-download-binary
```
## 📦 Supported File Formats
### AI/ML Formats (Specialized Analysis)
- **Safetensors** (.safetensors) - PyTorch model format with ML analysis
- **PyTorch** (.pt, .pth) - Native PyTorch models with tensor statistics
- **NumPy** (.npy, .npz) - Scientific computing arrays with statistical analysis
- **MATLAB** (.mat) - Engineering/scientific data with numerical analysis
### Structured Data Formats (Universal)
- **JSON** (.json) - API configurations, model metadata
- **YAML** (.yaml, .yml) - Configuration files, CI/CD pipelines
- **TOML** (.toml) - Rust configs, Python pyproject.toml
- **XML** (.xml) - Legacy configurations, model definitions
- **CSV** (.csv) - Datasets, experiment results
- **INI** (.ini) - Legacy configuration files
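
The two format groups above can be sketched as a simple extension lookup. This is only an illustration built from the extensions listed in this README; `classify_format` and the category names are hypothetical and do not describe how diffai itself detects formats.

```python
from pathlib import Path

# Extensions copied from the lists above; the category labels are illustrative.
ML_EXTENSIONS = {".safetensors", ".pt", ".pth", ".npy", ".npz", ".mat"}
STRUCTURED_EXTENSIONS = {".json", ".yaml", ".yml", ".toml", ".xml", ".csv", ".ini"}


def classify_format(path: str) -> str:
    """Return "ml", "structured", or "unknown" based on the file extension."""
    suffix = Path(path).suffix.lower()  # case-insensitive match
    if suffix in ML_EXTENSIONS:
        return "ml"
    if suffix in STRUCTURED_EXTENSIONS:
        return "structured"
    return "unknown"


print(classify_format("model_v1.safetensors"))
print(classify_format("pipeline.YAML"))
```

A check like this can be useful in a pipeline to decide whether ML-specific options such as `stats=True` are worth passing for a given pair of files.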
## 🔬 35 ML Analysis Functions
### Core Analysis Functions
```python
# Statistical analysis
result = diffai.diff("model1.safetensors", "model2.safetensors", stats=True)
# Quantization analysis
result = diffai.diff("fp32.safetensors", "quantized.safetensors",
                     quantization_analysis=True)
# Change magnitude sorting
result = diffai.diff("model1.safetensors", "model2.safetensors",
                     sort_by_change_magnitude=True, stats=True)
```
### Phase 3 Advanced Analysis (v0.2.7+)
```python
# Architecture comparison
result = diffai.diff("model1.safetensors", "model2.safetensors",
                     architecture_comparison=True)
# Memory analysis for deployment
result = diffai.diff("model1.safetensors", "model2.safetensors",
                     memory_analysis=True)
# Anomaly detection for debugging
result = diffai.diff("stable.safetensors", "problematic.safetensors",
                     anomaly_detection=True)
# Comprehensive analysis
options = diffai.DiffOptions(
    stats=True,
    architecture_comparison=True,
    memory_analysis=True,
    anomaly_detection=True,
    convergence_analysis=True,
    gradient_analysis=True,
    similarity_matrix=True,
    change_summary=True
)
result = diffai.diff("baseline.safetensors", "improved.safetensors", options)
```
## 💡 Python API Examples
### Type-Safe Configuration
```python
import diffai
from diffai import DiffOptions, OutputFormat

# Create type-safe configuration
options = DiffOptions(
    stats=True,
    architecture_comparison=True,
    memory_analysis=True,
    output_format=OutputFormat.JSON
)
# Compare models
result = diffai.diff("model1.safetensors", "model2.safetensors", options)
# Access structured results
if result.is_json:
    print(f"Found {len(result.changes)} changes")
    for change in result.changes:
        print(f"  {change.get('path')}: {change.get('type')}")
```
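
To illustrate what "type-safe" can mean for a configuration object, here is a minimal sketch built from a frozen dataclass and an `Enum`. `SketchOptions` and `SketchFormat` are hypothetical names invented for this example; they are not the actual definitions of `diffai.DiffOptions` or `diffai.OutputFormat`, and the `"text"` format value is an assumption.

```python
from dataclasses import asdict, dataclass
from enum import Enum


class SketchFormat(Enum):
    """Hypothetical output formats; only JSON is mentioned in this README."""
    TEXT = "text"
    JSON = "json"


@dataclass(frozen=True)
class SketchOptions:
    """Hypothetical options object: named, typed fields instead of loose strings."""
    stats: bool = False
    memory_analysis: bool = False
    output_format: SketchFormat = SketchFormat.TEXT

    def enabled_flags(self):
        # Collect the names of boolean fields that are switched on.
        return [name for name, value in asdict(self).items() if value is True]


opts = SketchOptions(stats=True, output_format=SketchFormat.JSON)
print(opts.enabled_flags(), opts.output_format.value)
```

With this shape, a misspelled option name fails immediately with a `TypeError` at construction time, and a static type checker can flag a wrong value type before the code runs.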
### Scientific Data Analysis
```python
# NumPy array comparison
result = diffai.diff("experiment_v1.npy", "experiment_v2.npy", stats=True)
print(f"Statistical changes: {result}")
# MATLAB data comparison
result = diffai.diff("simulation_v1.mat", "simulation_v2.mat",
                     stats=True, sort_by_change_magnitude=True)
```
### JSON Output for Automation
```python
# Get JSON results for MLOps integration
result = diffai.diff("model1.safetensors", "model2.safetensors",
                     stats=True, output_format=diffai.OutputFormat.JSON)
if result.is_json:
    # Process structured data
    changes = result.changes
    summary = result.summary
    # Integration with MLflow, Weights & Biases, etc.
    # (log_model_comparison stands in for your own logging helper)
    log_model_comparison(changes, summary)
```
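
Once the change records are available as plain dictionaries, they can be condensed before being sent to a tracking system. The sketch below assumes each record has the `path`, `type`, and `magnitude` keys used elsewhere in this README; the sample records, the `"modified"`/`"added"` type values, and the `summarize_changes` helper are hypothetical.

```python
from collections import Counter

# Hypothetical sample shaped like the change records used in this README.
sample_changes = [
    {"path": "encoder.layer0.weight", "type": "modified", "magnitude": 0.25},
    {"path": "encoder.layer0.bias", "type": "modified", "magnitude": 0.01},
    {"path": "head.weight", "type": "added"},
]


def summarize_changes(changes, threshold=0.1):
    """Count changes by type and list paths whose magnitude exceeds the threshold."""
    by_type = Counter(c.get("type", "unknown") for c in changes)
    significant = [
        c.get("path")
        for c in changes
        if (c.get("magnitude") or 0) > threshold  # a missing magnitude counts as 0
    ]
    return {
        "total": len(changes),
        "by_type": dict(by_type),
        "significant_paths": significant,
    }


print(summarize_changes(sample_changes))
```

The 0.1 threshold mirrors the one used in the MLflow example in this README; an appropriate value depends on the model and the metric.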
### Error Handling
```python
try:
    result = diffai.diff("model1.safetensors", "model2.safetensors", stats=True)
    print(result)
except diffai.BinaryNotFoundError:
    print("diffai binary not found. Please install: pip install diffai-python")
except diffai.InvalidInputError as e:
    print(f"Invalid input: {e}")
except diffai.DiffaiError as e:
    print(f"diffai error: {e}")
```
### String Comparison (Temporary Files)
```python
# Compare JSON strings directly
json1 = '{"model": "gpt-2", "layers": 12}'
json2 = '{"model": "gpt-2", "layers": 24}'
result = diffai.diff_string(json1, json2, output_format=diffai.OutputFormat.JSON)
print(result)
```
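
As the heading suggests, comparing strings with a file-based tool means writing them to temporary files first. The following is a minimal sketch of that general pattern using only the standard library; `strings_as_files` is a hypothetical helper and is not taken from the diffai source.

```python
import os
import tempfile
from contextlib import contextmanager


@contextmanager
def strings_as_files(first: str, second: str, suffix: str = ".json"):
    """Write two strings to temporary files and yield their paths.

    The files live in a temporary directory that is removed on exit.
    """
    with tempfile.TemporaryDirectory() as tmpdir:
        paths = []
        for name, content in (("a", first), ("b", second)):
            path = os.path.join(tmpdir, name + suffix)
            with open(path, "w", encoding="utf-8") as handle:
                handle.write(content)
            paths.append(path)
        yield tuple(paths)


with strings_as_files('{"layers": 12}', '{"layers": 24}') as (path_a, path_b):
    # Inside the block the paths can be handed to any file-based comparison.
    print(os.path.basename(path_a), os.path.basename(path_b))
```

The suffix matters because file-based tools commonly infer the format from the extension.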
## 🔧 Advanced Usage
### Installation Verification
```python
# Check if diffai is properly installed
try:
    info = diffai.verify_installation()
    print(f"diffai version: {info['version']}")
    print(f"Binary path: {info['binary_path']}")
except diffai.BinaryNotFoundError as e:
    print(f"Installation issue: {e}")
```
### Manual Binary Management
```python
# Download binary programmatically
from diffai.installer import install_binary

success = install_binary(force=True)  # Force reinstall
if success:
    print("Binary installed successfully")
```
### Low-Level API Access
```python
# Direct command execution
result = diffai.run_diffai([
    "model1.safetensors",
    "model2.safetensors",
    "--stats",
    "--architecture-comparison",
    "--output", "json"
])
print(f"Exit code: {result.exit_code}")
print(f"Output: {result.raw_output}")
```
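
The argument list above follows a regular pattern: two paths, then one flag per enabled option. The sketch below shows one way such a list could be assembled from keyword options. `build_args` is a hypothetical helper, and the rule that a snake_case keyword maps to a kebab-case flag is an assumption; only `--stats`, `--architecture-comparison`, and `--output` are confirmed by this README.

```python
def build_args(old_path, new_path, output=None, **flags):
    """Translate keyword options into a CLI-style argument list."""
    args = [old_path, new_path]
    for name, enabled in flags.items():
        if enabled:
            # Assumed mapping: architecture_comparison -> --architecture-comparison
            args.append("--" + name.replace("_", "-"))
    if output is not None:
        args += ["--output", output]
    return args


print(build_args("model1.safetensors", "model2.safetensors",
                 output="json", stats=True, architecture_comparison=True))
# → ['model1.safetensors', 'model2.safetensors', '--stats', '--architecture-comparison', '--output', 'json']
```

Options set to `False` are skipped, so the same call site can toggle analyses without changing the list-building logic.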
## 🔗 Integration Examples
### MLflow Integration
```python
import mlflow
import diffai


def log_model_comparison(model1_path, model2_path, run_id=None):
    with mlflow.start_run(run_id=run_id):
        # Compare models with comprehensive analysis
        result = diffai.diff(
            model1_path, model2_path,
            stats=True,
            architecture_comparison=True,
            memory_analysis=True,
            output_format=diffai.OutputFormat.JSON
        )
        if result.is_json:
            # Log structured comparison data
            mlflow.log_dict(result.data, "model_comparison.json")
            # Log metrics
            if result.changes:
                mlflow.log_metric("total_changes", len(result.changes))
                mlflow.log_metric("significant_changes",
                                  sum(1 for c in result.changes
                                      if c.get('magnitude', 0) > 0.1))


# Usage
log_model_comparison("baseline.safetensors", "candidate.safetensors")
```
### Weights & Biases Integration
```python
import wandb
import diffai


def wandb_log_model_diff(model1, model2, **kwargs):
    result = diffai.diff(model1, model2,
                         stats=True,
                         output_format=diffai.OutputFormat.JSON,
                         **kwargs)
    if result.is_json and result.changes:
        # Log to wandb
        wandb.log({
            "model_comparison": wandb.Table(
                columns=["parameter", "change_type", "magnitude"],
                data=[[c.get("path"), c.get("type"), c.get("magnitude")]
                      for c in result.changes[:100]]  # Limit rows
            )
        })


# Initialize wandb run
wandb.init(project="model-comparison")
wandb_log_model_diff("model_v1.safetensors", "model_v2.safetensors")
```
### Flask API Endpoint
```python
import os
import tempfile

from flask import Flask, request, jsonify
import diffai

app = Flask(__name__)


@app.route('/compare', methods=['POST'])
def compare_models():
    try:
        files = request.files
        model1 = files['model1']
        model2 = files['model2']
        # Save uploads to a per-request temporary directory so that
        # concurrent requests do not overwrite each other's files
        with tempfile.TemporaryDirectory() as tmpdir:
            path1 = os.path.join(tmpdir, 'model1.safetensors')
            path2 = os.path.join(tmpdir, 'model2.safetensors')
            model1.save(path1)
            model2.save(path2)
            # Compare models
            result = diffai.diff(path1, path2,
                                 stats=True,
                                 architecture_comparison=True,
                                 output_format=diffai.OutputFormat.JSON)
            return jsonify({
                "status": "success",
                "comparison": result.data if result.is_json else result.raw_output
            })
    except diffai.DiffaiError as e:
        return jsonify({"status": "error", "message": str(e)}), 400


if __name__ == '__main__':
    app.run(debug=True)  # Debug mode is for local development only
```
## 🏗️ Platform Support
This package automatically downloads platform-specific binaries:
- **Linux** (x86_64, ARM64)
- **macOS** (Intel x86_64, Apple Silicon ARM64)
- **Windows** (x86_64)
The binary is downloaded during installation and cached. If the download fails, the package falls back to a `diffai` binary found on the system `PATH`.
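
The lookup order described above (cached binary first, then the system `PATH`) can be sketched with the standard library. `find_binary` and its `cached_path` parameter are hypothetical and only illustrate the idea; the package's actual resolution logic and cache location are not documented in this README.

```python
import os
import shutil


def find_binary(name="diffai", cached_path=None):
    """Prefer a cached binary if it exists; otherwise search the system PATH."""
    if cached_path and os.path.isfile(cached_path):
        return cached_path
    # shutil.which returns the full path of the executable, or None if absent.
    return shutil.which(name)


location = find_binary()
print(location or "diffai not found on PATH")
```

Running this is a quick way to check whether a system-wide `diffai` is available as a fallback before debugging a failed download.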
## 🔗 Related Projects
- **[diffx-python](https://pypi.org/project/diffx-python/)** - General-purpose structured data diff tool
- **[diffai (npm)](https://www.npmjs.com/package/diffai)** - Node.js package for diffai
- **[diffai (GitHub)](https://github.com/diffai-team/diffai)** - Main repository
## 📚 Documentation
- [CLI Reference](https://github.com/diffai-team/diffai/blob/main/docs/reference/cli-reference.md)
- [ML Analysis Guide](https://github.com/diffai-team/diffai/blob/main/docs/reference/ml-analysis.md)
- [User Guide](https://github.com/diffai-team/diffai/blob/main/docs/user-guide/)
- [API Documentation](https://github.com/diffai-team/diffai/blob/main/docs/reference/api-reference.md)
## 📄 License
MIT License - see [LICENSE](https://github.com/diffai-team/diffai/blob/main/LICENSE) file for details.
## 🤝 Contributing
Contributions welcome! Please see [CONTRIBUTING.md](https://github.com/diffai-team/diffai/blob/main/CONTRIBUTING.md) for guidelines.
---
**diffai** - Making AI/ML data differences visible, measurable, and actionable through Python. 🐍🚀