sm-dag-compiler


Namesm-dag-compiler JSON
Version 1.0.1 PyPI version JSON
download
home_pageNone
SummaryAutomatic SageMaker Pipeline Generation from DAG Specifications
upload_time2025-08-01 16:22:28
maintainerNone
docs_urlNone
authorNone
requires_python>=3.8
licenseNone
keywords sagemaker pipeline dag machine-learning aws automation mlops data-science workflow orchestration
VCS
bugtrack_url
requirements boto3 botocore sagemaker pydantic PyYAML networkx click requests packaging typing_extensions pandas numpy scikit-learn joblib xgboost matplotlib
Travis-CI No Travis.
coveralls test coverage No coveralls.
            # SM-DAG-Compiler: Automatic SageMaker Pipeline Generation

[![PyPI version](https://badge.fury.io/py/sm-dag-compiler.svg)](https://badge.fury.io/py/sm-dag-compiler)
[![Python 3.8+](https://img.shields.io/badge/python-3.8+-blue.svg)](https://www.python.org/downloads/)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)

**Transform pipeline graphs into production-ready SageMaker pipelines automatically.**

SM-DAG-Compiler is an intelligent pipeline generation system that automatically creates complete SageMaker pipelines from user-provided pipeline graphs. Simply define your ML workflow as a graph structure, and SM-DAG-Compiler handles all the complex SageMaker implementation details, dependency resolution, and configuration management automatically.

## 🚀 Quick Start

### Installation

```bash
# Core installation
pip install sm-dag-compiler

# With ML frameworks
pip install sm-dag-compiler[pytorch,xgboost]

# Full installation with all features
pip install sm-dag-compiler[all]
```

### 30-Second Example

```python
import sm_dag_compiler
from sm_dag_compiler.core.dag import PipelineDAG

# Create a simple DAG
dag = PipelineDAG(name="fraud-detection")
dag.add_node("data_loading", "CRADLE_DATA_LOADING")
dag.add_node("preprocessing", "TABULAR_PREPROCESSING") 
dag.add_node("training", "XGBOOST_TRAINING")
dag.add_edge("data_loading", "preprocessing")
dag.add_edge("preprocessing", "training")

# Compile to SageMaker pipeline automatically
pipeline = sm_dag_compiler.compile_dag(dag)
pipeline.start()  # Deploy and run!
```

### Command Line Interface

```bash
# Generate a new project
sm-dag-compiler init --template xgboost --name fraud-detection

# Validate your DAG
sm-dag-compiler validate my_dag.py

# Compile to SageMaker pipeline
sm-dag-compiler compile my_dag.py --name my-pipeline --output pipeline.json
```

## ✨ Key Features

### 🎯 **Graph-to-Pipeline Automation**
- **Input**: Simple pipeline graph with step types and connections
- **Output**: Complete SageMaker pipeline with all dependencies resolved
- **Magic**: Intelligent analysis of graph structure with automatic step builder selection

### ⚡ **10x Faster Development**
- **Before**: 2-4 weeks of manual SageMaker configuration
- **After**: 10-30 minutes from graph to working pipeline
- **Result**: 95% reduction in development time

### 🧠 **Intelligent Dependency Resolution**
- Automatic step connections and data flow
- Smart configuration matching and validation
- Type-safe specifications with compile-time checks
- Semantic compatibility analysis

### 🛡️ **Production Ready**
- Built-in quality gates and validation
- Enterprise governance and compliance
- Comprehensive error handling and debugging
- 98% complete with 1,650+ lines of complex code eliminated

## 📊 Proven Results

Based on production deployments across enterprise environments:

| Component | Code Reduction | Lines Eliminated | Key Benefit |
|-----------|----------------|------------------|-------------|
| **Processing Steps** | 60% | 400+ lines | Automatic input/output resolution |
| **Training Steps** | 60% | 300+ lines | Intelligent hyperparameter handling |
| **Model Steps** | 47% | 380+ lines | Streamlined model creation |
| **Registration Steps** | 66% | 330+ lines | Simplified deployment workflows |
| **Overall System** | **~55%** | **1,650+ lines** | **Intelligent automation** |

## 🏗️ Architecture

SM-DAG-Compiler follows a sophisticated layered architecture:

- **🎯 User Interface**: Fluent API and Pipeline DAG for intuitive construction
- **🧠 Intelligence Layer**: Smart proxies with automatic dependency resolution  
- **🏗️ Orchestration**: Pipeline assembler and compiler for DAG-to-template conversion
- **📚 Registry Management**: Multi-context coordination with lifecycle management
- **🔗 Dependency Resolution**: Intelligent matching with semantic compatibility
- **📋 Specification Layer**: Comprehensive step definitions with quality gates

## 📚 Usage Examples

### Basic Pipeline

```python
from sm_dag_compiler import PipelineDAGCompiler
from sm_dag_compiler.core.dag import PipelineDAG

# Create DAG
dag = PipelineDAG()
dag.add_node("load_data", "DATA_LOADING_SPEC")
dag.add_node("train_model", "XGBOOST_TRAINING_SPEC")
dag.add_edge("load_data", "train_model")

# Compile with configuration
compiler = PipelineDAGCompiler(config_path="config.yaml")
pipeline = compiler.compile(dag, pipeline_name="my-ml-pipeline")
```

### Advanced Configuration

```python
from sm_dag_compiler import create_pipeline_from_dag

# Create pipeline with custom settings
pipeline = create_pipeline_from_dag(
    dag=my_dag,
    pipeline_name="advanced-pipeline",
    config_path="advanced_config.yaml",
    quality_requirements={
        "min_auc": 0.88,
        "max_training_time": "4 hours"
    }
)
```

### Fluent API (Advanced)

```python
from sm_dag_compiler.utils.fluent import Pipeline

# Natural language-like construction
pipeline = (Pipeline("fraud-detection")
    .load_data("s3://fraud-data/")
    .preprocess_with_defaults()
    .train_xgboost(max_depth=6, eta=0.3)
    .evaluate_performance()
    .deploy_if_threshold_met(min_auc=0.85))
```

## 🔧 Installation Options

### Core Installation
```bash
pip install sm-dag-compiler
```
Includes basic DAG compilation and SageMaker integration.

### Framework-Specific
```bash
pip install sm-dag-compiler[pytorch]    # PyTorch Lightning models
pip install sm-dag-compiler[xgboost]    # XGBoost training pipelines  
pip install sm-dag-compiler[nlp]        # NLP models and processing
pip install sm-dag-compiler[processing] # Advanced data processing
```

### Development
```bash
pip install sm-dag-compiler[dev]        # Development tools
pip install sm-dag-compiler[docs]       # Documentation tools
pip install sm-dag-compiler[all]        # Everything included
```

## 🎯 Who Should Use SM-DAG-Compiler?

### **Data Scientists & ML Practitioners**
- Focus on model development, not infrastructure complexity
- Rapid experimentation with 10x faster iteration
- Business-focused interface eliminates SageMaker expertise requirements

### **Platform Engineers & ML Engineers**  
- 60% less code to maintain and debug
- Specification-driven architecture prevents common errors
- Universal patterns enable faster team onboarding

### **Organizations**
- Accelerated innovation with faster pipeline development
- Reduced technical debt through clean architecture
- Built-in governance and compliance frameworks

## 📖 Documentation

- **[Full Documentation](https://github.com/TianpeiLuke/sm-dag-compiler/blob/main/README.md)** - Complete guide and architecture
- **[API Reference](https://github.com/TianpeiLuke/sm-dag-compiler/tree/main/src)** - Detailed API documentation
- **[Examples](https://github.com/TianpeiLuke/sm-dag-compiler/tree/main/pipeline_examples)** - Ready-to-use pipeline blueprints
- **[Developer Guide](https://github.com/TianpeiLuke/sm-dag-compiler/tree/main/slipbox/developer_guide)** - Contributing and extending AutoPipe

## 🤝 Contributing

We welcome contributions! See our [Contributing Guide](https://github.com/TianpeiLuke/sm-dag-compiler/blob/main/slipbox/developer_guide/README.md) for details.

## 📄 License

This project is licensed under the MIT License - see the [LICENSE](https://github.com/TianpeiLuke/sm-dag-compiler/blob/main/LICENSE) file for details.

## 🔗 Links

- **GitHub**: https://github.com/TianpeiLuke/sm-dag-compiler
- **Issues**: https://github.com/TianpeiLuke/sm-dag-compiler/issues
- **PyPI**: https://pypi.org/project/sm-dag-compiler/

---

**SM-DAG-Compiler**: Making SageMaker pipeline development 10x faster through intelligent automation. 🚀

            

Raw data

            {
    "_id": null,
    "home_page": null,
    "name": "sm-dag-compiler",
    "maintainer": null,
    "docs_url": null,
    "requires_python": ">=3.8",
    "maintainer_email": "Tianpei Xie <unidoctor@gmail.com>",
    "keywords": "sagemaker, pipeline, dag, machine-learning, aws, automation, mlops, data-science, workflow, orchestration",
    "author": null,
    "author_email": "Tianpei Xie <unidoctor@gmail.com>",
    "download_url": "https://files.pythonhosted.org/packages/d3/6d/fb89d43d0a3852f8b461957e2f0e525b807cf30621afc8ed5c45823cc634/sm_dag_compiler-1.0.1.tar.gz",
    "platform": null,
    "description": "# SM-DAG-Compiler: Automatic SageMaker Pipeline Generation\n\n[![PyPI version](https://badge.fury.io/py/sm-dag-compiler.svg)](https://badge.fury.io/py/sm-dag-compiler)\n[![Python 3.8+](https://img.shields.io/badge/python-3.8+-blue.svg)](https://www.python.org/downloads/)\n[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)\n\n**Transform pipeline graphs into production-ready SageMaker pipelines automatically.**\n\nSM-DAG-Compiler is an intelligent pipeline generation system that automatically creates complete SageMaker pipelines from user-provided pipeline graphs. Simply define your ML workflow as a graph structure, and SM-DAG-Compiler handles all the complex SageMaker implementation details, dependency resolution, and configuration management automatically.\n\n## \ud83d\ude80 Quick Start\n\n### Installation\n\n```bash\n# Core installation\npip install sm-dag-compiler\n\n# With ML frameworks\npip install sm-dag-compiler[pytorch,xgboost]\n\n# Full installation with all features\npip install sm-dag-compiler[all]\n```\n\n### 30-Second Example\n\n```python\nimport sm_dag_compiler\nfrom sm_dag_compiler.core.dag import PipelineDAG\n\n# Create a simple DAG\ndag = PipelineDAG(name=\"fraud-detection\")\ndag.add_node(\"data_loading\", \"CRADLE_DATA_LOADING\")\ndag.add_node(\"preprocessing\", \"TABULAR_PREPROCESSING\") \ndag.add_node(\"training\", \"XGBOOST_TRAINING\")\ndag.add_edge(\"data_loading\", \"preprocessing\")\ndag.add_edge(\"preprocessing\", \"training\")\n\n# Compile to SageMaker pipeline automatically\npipeline = sm_dag_compiler.compile_dag(dag)\npipeline.start()  # Deploy and run!\n```\n\n### Command Line Interface\n\n```bash\n# Generate a new project\nsm-dag-compiler init --template xgboost --name fraud-detection\n\n# Validate your DAG\nsm-dag-compiler validate my_dag.py\n\n# Compile to SageMaker pipeline\nsm-dag-compiler compile my_dag.py --name my-pipeline --output pipeline.json\n```\n\n## \u2728 Key Features\n\n### \ud83c\udfaf **Graph-to-Pipeline Automation**\n- **Input**: Simple pipeline graph with step types and connections\n- **Output**: Complete SageMaker pipeline with all dependencies resolved\n- **Magic**: Intelligent analysis of graph structure with automatic step builder selection\n\n### \u26a1 **10x Faster Development**\n- **Before**: 2-4 weeks of manual SageMaker configuration\n- **After**: 10-30 minutes from graph to working pipeline\n- **Result**: 95% reduction in development time\n\n### \ud83e\udde0 **Intelligent Dependency Resolution**\n- Automatic step connections and data flow\n- Smart configuration matching and validation\n- Type-safe specifications with compile-time checks\n- Semantic compatibility analysis\n\n### \ud83d\udee1\ufe0f **Production Ready**\n- Built-in quality gates and validation\n- Enterprise governance and compliance\n- Comprehensive error handling and debugging\n- 98% complete with 1,650+ lines of complex code eliminated\n\n## \ud83d\udcca Proven Results\n\nBased on production deployments across enterprise environments:\n\n| Component | Code Reduction | Lines Eliminated | Key Benefit |\n|-----------|----------------|------------------|-------------|\n| **Processing Steps** | 60% | 400+ lines | Automatic input/output resolution |\n| **Training Steps** | 60% | 300+ lines | Intelligent hyperparameter handling |\n| **Model Steps** | 47% | 380+ lines | Streamlined model creation |\n| **Registration Steps** | 66% | 330+ lines | Simplified deployment workflows |\n| **Overall System** | **~55%** | **1,650+ lines** | **Intelligent automation** |\n\n## \ud83c\udfd7\ufe0f Architecture\n\nSM-DAG-Compiler follows a sophisticated layered architecture:\n\n- **\ud83c\udfaf User Interface**: Fluent API and Pipeline DAG for intuitive construction\n- **\ud83e\udde0 Intelligence Layer**: Smart proxies with automatic dependency resolution  \n- **\ud83c\udfd7\ufe0f Orchestration**: Pipeline assembler and compiler for DAG-to-template conversion\n- **\ud83d\udcda Registry Management**: Multi-context coordination with lifecycle management\n- **\ud83d\udd17 Dependency Resolution**: Intelligent matching with semantic compatibility\n- **\ud83d\udccb Specification Layer**: Comprehensive step definitions with quality gates\n\n## \ud83d\udcda Usage Examples\n\n### Basic Pipeline\n\n```python\nfrom sm_dag_compiler import PipelineDAGCompiler\nfrom sm_dag_compiler.core.dag import PipelineDAG\n\n# Create DAG\ndag = PipelineDAG()\ndag.add_node(\"load_data\", \"DATA_LOADING_SPEC\")\ndag.add_node(\"train_model\", \"XGBOOST_TRAINING_SPEC\")\ndag.add_edge(\"load_data\", \"train_model\")\n\n# Compile with configuration\ncompiler = PipelineDAGCompiler(config_path=\"config.yaml\")\npipeline = compiler.compile(dag, pipeline_name=\"my-ml-pipeline\")\n```\n\n### Advanced Configuration\n\n```python\nfrom sm_dag_compiler import create_pipeline_from_dag\n\n# Create pipeline with custom settings\npipeline = create_pipeline_from_dag(\n    dag=my_dag,\n    pipeline_name=\"advanced-pipeline\",\n    config_path=\"advanced_config.yaml\",\n    quality_requirements={\n        \"min_auc\": 0.88,\n        \"max_training_time\": \"4 hours\"\n    }\n)\n```\n\n### Fluent API (Advanced)\n\n```python\nfrom sm_dag_compiler.utils.fluent import Pipeline\n\n# Natural language-like construction\npipeline = (Pipeline(\"fraud-detection\")\n    .load_data(\"s3://fraud-data/\")\n    .preprocess_with_defaults()\n    .train_xgboost(max_depth=6, eta=0.3)\n    .evaluate_performance()\n    .deploy_if_threshold_met(min_auc=0.85))\n```\n\n## \ud83d\udd27 Installation Options\n\n### Core Installation\n```bash\npip install sm-dag-compiler\n```\nIncludes basic DAG compilation and SageMaker integration.\n\n### Framework-Specific\n```bash\npip install sm-dag-compiler[pytorch]    # PyTorch Lightning models\npip install sm-dag-compiler[xgboost]    # XGBoost training pipelines  \npip install sm-dag-compiler[nlp]        # NLP models and processing\npip install sm-dag-compiler[processing] # Advanced data processing\n```\n\n### Development\n```bash\npip install sm-dag-compiler[dev]        # Development tools\npip install sm-dag-compiler[docs]       # Documentation tools\npip install sm-dag-compiler[all]        # Everything included\n```\n\n## \ud83c\udfaf Who Should Use SM-DAG-Compiler?\n\n### **Data Scientists & ML Practitioners**\n- Focus on model development, not infrastructure complexity\n- Rapid experimentation with 10x faster iteration\n- Business-focused interface eliminates SageMaker expertise requirements\n\n### **Platform Engineers & ML Engineers**  \n- 60% less code to maintain and debug\n- Specification-driven architecture prevents common errors\n- Universal patterns enable faster team onboarding\n\n### **Organizations**\n- Accelerated innovation with faster pipeline development\n- Reduced technical debt through clean architecture\n- Built-in governance and compliance frameworks\n\n## \ud83d\udcd6 Documentation\n\n- **[Full Documentation](https://github.com/TianpeiLuke/sm-dag-compiler/blob/main/README.md)** - Complete guide and architecture\n- **[API Reference](https://github.com/TianpeiLuke/sm-dag-compiler/tree/main/src)** - Detailed API documentation\n- **[Examples](https://github.com/TianpeiLuke/sm-dag-compiler/tree/main/pipeline_examples)** - Ready-to-use pipeline blueprints\n- **[Developer Guide](https://github.com/TianpeiLuke/sm-dag-compiler/tree/main/slipbox/developer_guide)** - Contributing and extending AutoPipe\n\n## \ud83e\udd1d Contributing\n\nWe welcome contributions! See our [Contributing Guide](https://github.com/TianpeiLuke/sm-dag-compiler/blob/main/slipbox/developer_guide/README.md) for details.\n\n## \ud83d\udcc4 License\n\nThis project is licensed under the MIT License - see the [LICENSE](https://github.com/TianpeiLuke/sm-dag-compiler/blob/main/LICENSE) file for details.\n\n## \ud83d\udd17 Links\n\n- **GitHub**: https://github.com/TianpeiLuke/sm-dag-compiler\n- **Issues**: https://github.com/TianpeiLuke/sm-dag-compiler/issues\n- **PyPI**: https://pypi.org/project/sm-dag-compiler/\n\n---\n\n**SM-DAG-Compiler**: Making SageMaker pipeline development 10x faster through intelligent automation. \ud83d\ude80\n",
    "bugtrack_url": null,
    "license": null,
    "summary": "Automatic SageMaker Pipeline Generation from DAG Specifications",
    "version": "1.0.1",
    "project_urls": {
        "Changelog": "https://github.com/TianpeiLuke/sm-dag-compiler/blob/main/CHANGELOG.md",
        "Documentation": "https://github.com/TianpeiLuke/sm-dag-compiler/blob/main/README.md",
        "Homepage": "https://github.com/TianpeiLuke/sm-dag-compiler",
        "Issues": "https://github.com/TianpeiLuke/sm-dag-compiler/issues",
        "Repository": "https://github.com/TianpeiLuke/sm-dag-compiler"
    },
    "split_keywords": [
        "sagemaker",
        " pipeline",
        " dag",
        " machine-learning",
        " aws",
        " automation",
        " mlops",
        " data-science",
        " workflow",
        " orchestration"
    ],
    "urls": [
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "d54c6ec212e796f47cd788a31f20fd0e2d82696cb9ef1b1bc375481a09d2936c",
                "md5": "6a89371dfaf5e4d266be581414677506",
                "sha256": "39c1dd51552cd8269e8a0ee4b2e72bf4f67b2f193e0273b3c4e11b47f8763022"
            },
            "downloads": -1,
            "filename": "sm_dag_compiler-1.0.1-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "6a89371dfaf5e4d266be581414677506",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.8",
            "size": 364677,
            "upload_time": "2025-08-01T16:22:26",
            "upload_time_iso_8601": "2025-08-01T16:22:26.600786Z",
            "url": "https://files.pythonhosted.org/packages/d5/4c/6ec212e796f47cd788a31f20fd0e2d82696cb9ef1b1bc375481a09d2936c/sm_dag_compiler-1.0.1-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "d36dfb89d43d0a3852f8b461957e2f0e525b807cf30621afc8ed5c45823cc634",
                "md5": "14356e450e70ea3be763049143c7c0bf",
                "sha256": "0d6bcbc515177b7050f908511a736df9cc4805e827833d0233968ae799dbf3b7"
            },
            "downloads": -1,
            "filename": "sm_dag_compiler-1.0.1.tar.gz",
            "has_sig": false,
            "md5_digest": "14356e450e70ea3be763049143c7c0bf",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.8",
            "size": 256225,
            "upload_time": "2025-08-01T16:22:28",
            "upload_time_iso_8601": "2025-08-01T16:22:28.513869Z",
            "url": "https://files.pythonhosted.org/packages/d3/6d/fb89d43d0a3852f8b461957e2f0e525b807cf30621afc8ed5c45823cc634/sm_dag_compiler-1.0.1.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2025-08-01 16:22:28",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "TianpeiLuke",
    "github_project": "sm-dag-compiler",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": false,
    "requirements": [
        {
            "name": "boto3",
            "specs": [
                [
                    ">=",
                    "1.39.0"
                ]
            ]
        },
        {
            "name": "botocore",
            "specs": [
                [
                    ">=",
                    "1.39.0"
                ]
            ]
        },
        {
            "name": "sagemaker",
            "specs": [
                [
                    ">=",
                    "2.248.0"
                ]
            ]
        },
        {
            "name": "pydantic",
            "specs": [
                [
                    ">=",
                    "2.11.0"
                ]
            ]
        },
        {
            "name": "PyYAML",
            "specs": [
                [
                    ">=",
                    "6.0.0"
                ]
            ]
        },
        {
            "name": "networkx",
            "specs": [
                [
                    ">=",
                    "3.5.0"
                ]
            ]
        },
        {
            "name": "click",
            "specs": [
                [
                    ">=",
                    "8.2.0"
                ]
            ]
        },
        {
            "name": "requests",
            "specs": [
                [
                    ">=",
                    "2.32.0"
                ]
            ]
        },
        {
            "name": "packaging",
            "specs": [
                [
                    ">=",
                    "24.2.0"
                ]
            ]
        },
        {
            "name": "typing_extensions",
            "specs": [
                [
                    ">=",
                    "4.14.0"
                ]
            ]
        },
        {
            "name": "pandas",
            "specs": [
                [
                    ">=",
                    "2.1.0"
                ]
            ]
        },
        {
            "name": "numpy",
            "specs": [
                [
                    ">=",
                    "1.26.0"
                ]
            ]
        },
        {
            "name": "scikit-learn",
            "specs": [
                [
                    ">=",
                    "1.3.0"
                ]
            ]
        },
        {
            "name": "joblib",
            "specs": [
                [
                    ">=",
                    "1.5.0"
                ]
            ]
        },
        {
            "name": "xgboost",
            "specs": [
                [
                    ">=",
                    "2.0.0"
                ]
            ]
        },
        {
            "name": "matplotlib",
            "specs": [
                [
                    ">=",
                    "3.8.0"
                ]
            ]
        }
    ],
    "lcname": "sm-dag-compiler"
}
        
Elapsed time: 0.83936s