# ModelHub SDK

ModelHub SDK is a tool for orchestrating and managing machine learning workflows, experiments, datasets, and deployments on Kubernetes. It integrates with MLflow and supports custom pipelines, dataset management, model logging, and serving through KServe.

![Python Version](https://img.shields.io/badge/Python-3.9+-blue?style=for-the-badge&logo=python)
![PyPI Version](https://img.shields.io/pypi/v/autonomize-model-sdk?style=for-the-badge&logo=pypi)
![Code Formatter](https://img.shields.io/badge/code%20style-black-000000.svg?style=for-the-badge)
![Code Linter](https://img.shields.io/badge/linting-pylint-green.svg?style=for-the-badge)
![Code Checker](https://img.shields.io/badge/mypy-checked-blue?style=for-the-badge)
![Code Coverage](https://img.shields.io/badge/coverage-96%25-a4a523?style=for-the-badge&logo=codecov)

## Table of Contents

1. [Installation](#installation)
2. [Environment Setup](#environment-setup)
3. [Quickstart](#quickstart)
4. [Experiments and Runs](#experiments-and-runs)
    - [Logging Parameters and Metrics](#logging-parameters-and-metrics)
    - [Artifact Management](#artifact-management)
5. [Pipeline Management](#pipeline-management)
    - [Pipeline Definition](#pipeline-definition)
    - [Running a Pipeline](#running-a-pipeline)
6. [Dataset Management](#dataset-management)
    - [Loading Datasets](#loading-datasets)
7. [Model Deployment through KServe](#model-deployment-through-kserve)
8. [Examples](#examples)

---

## Installation

To install the ModelHub SDK, simply run:

```bash
pip install autonomize-model-sdk
```

## Environment Setup
Ensure you have the following environment variables set in your system:

```bash
export MODELHUB_BASE_URL=https://api-modelhub.example.com
export MODELHUB_CLIENT_ID=your_client_id
export MODELHUB_CLIENT_SECRET=your_client_secret
export MLFLOW_EXPERIMENT_ID=your_experiment_id
```

Alternatively, create a `.env` file in your project directory and define the same variables there.
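
A minimal `.env` mirroring the variables above (loaded with whatever dotenv tooling your project uses, e.g. `python-dotenv`) might look like:

```bash
MODELHUB_BASE_URL=https://api-modelhub.example.com
MODELHUB_CLIENT_ID=your_client_id
MODELHUB_CLIENT_SECRET=your_client_secret
MLFLOW_EXPERIMENT_ID=your_experiment_id
```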

## Quickstart
The ModelHub SDK lets you log experiments, manage pipelines, and work with datasets.

Here’s a quick example of how to initialize the client and log a run:

```python
import os
from modelhub.clients import MLflowClient

# Initialize the ModelHub client
client = MLflowClient(base_url=os.getenv("MODELHUB_BASE_URL"))
experiment_id = os.getenv("MLFLOW_EXPERIMENT_ID")

client.set_experiment(experiment_id=experiment_id)

# Start an MLflow run
with client.start_run(run_name="my_experiment_run"):
    client.mlflow.log_param("param1", "value1")
    client.mlflow.log_metric("accuracy", 0.85)
    client.mlflow.log_artifact("model.pkl")
```

## Experiments and Runs
ModelHub SDK provides an easy way to interact with MLflow for managing experiments and runs.

### Logging Parameters and Metrics
To log parameters, metrics, and artifacts:

```python
with client.start_run(run_name="my_run"):
    # Log parameters
    client.mlflow.log_param("learning_rate", 0.01)

    # Log metrics
    client.mlflow.log_metric("accuracy", 0.92)
    client.mlflow.log_metric("precision", 0.88)

    # Log artifacts
    client.mlflow.log_artifact("/path/to/model.pkl")
```
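
For metrics that evolve over training, such as per-epoch loss, `log_metric` also accepts MLflow's `step` argument:

```python
# Log a metric curve across training epochs (loss values here are illustrative)
with client.start_run(run_name="training_curve"):
    for epoch, loss in enumerate([0.9, 0.7, 0.55, 0.48]):
        client.mlflow.log_metric("loss", loss, step=epoch)
```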

### Artifact Management
You can log or download artifacts with ease:

```python
# Log artifact
client.mlflow.log_artifact("/path/to/file.csv")

# Download artifact
client.mlflow.artifacts.download_artifacts(run_id="run_id_here", artifact_path="artifact.csv", dst_path="/tmp")
```

## Pipeline Management
ModelHub SDK lets you define, manage, and run multi-stage pipelines that automate your machine learning workflow. Pipelines are defined in YAML and submitted through the SDK.

### Pipeline Definition
Here’s a sample `pipeline.yaml` file:

```yaml
name: "ModelHub Pipeline Example"
description: "Pipeline with preprocess, training, and evaluation stages"
experiment_id: "9"
dataset_name: "dataset_name"
image_tag: "base-llm:1.0.1"
stages:
  - name: preprocess
    type: custom
    params:
      data_path: "data"
      output_path: "output"
    script: stages/preprocess.py
    requirements: requirements.txt
    resources:
      cpu: "1"
      memory: "1Gi"

  - name: train
    type: custom
    params:
      data_path: "output/train_preprocessed.csv"
      model_path: "output/model"
    script: stages/train.py
    requirements: requirements.txt
    resources:
      cpu: "1"
      memory: "1Gi"

  - name: evaluate
    type: custom
    params:
      model_path: "output/model"
      eval_output_path: "output/eval"
    script: stages/evaluate.py
    requirements: requirements.txt
    resources:
      cpu: "1"
      memory: "1Gi"
```
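
The exact mechanism the SDK uses to hand `params` to a stage script isn't shown here; as a minimal sketch, assuming parameters arrive as command-line flags, `stages/preprocess.py` could look like:

```python
# stages/preprocess.py -- hypothetical sketch; assumes the pipeline runner passes
# `params` to the script as command-line flags (check the SDK docs for the real contract)
import argparse
import os

import pandas as pd


def main():
    parser = argparse.ArgumentParser(description="Preprocess raw data for training")
    parser.add_argument("--data_path", required=True)
    parser.add_argument("--output_path", required=True)
    args = parser.parse_args()

    os.makedirs(args.output_path, exist_ok=True)

    # "train.csv" is an assumed input filename; the output name matches the
    # train stage's data_path in the pipeline YAML above
    df = pd.read_csv(os.path.join(args.data_path, "train.csv"))
    df = df.dropna()
    df.to_csv(os.path.join(args.output_path, "train_preprocessed.csv"), index=False)


if __name__ == "__main__":
    main()
```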

### Running a Pipeline
To submit and run a pipeline, use the `PipelineManager` from the SDK:

```python
import os

from modelhub.clients import PipelineManager

pipeline_manager = PipelineManager(base_url=os.getenv("MODELHUB_BASE_URL"))

# Start the pipeline
pipeline = pipeline_manager.start_pipeline("pipeline.yaml")
print("Pipeline started:", pipeline)
```

## Dataset Management
ModelHub SDK makes it easy to load and manage datasets, whether the data lives in external storage or in datasets managed through the ModelHub frontend.

### Loading Datasets
To load datasets using the SDK:

```python
from modelhub import load_dataset

# Load a dataset by name
dataset = load_dataset("my_dataset")

# Load a dataset from a specific directory
dataset = load_dataset("my_dataset", directory="data_folder/")
```


## Model Deployment through KServe
Deploy models via KServe after logging them with MLflow:

### Create a model wrapper
Use the MLflow `PythonModel` interface to define your model's prediction logic.

```python
import mlflow.pyfunc
import joblib

class PDFModelWrapper(mlflow.pyfunc.PythonModel):
    def load_context(self, context):
        # Load the serialized model from the artifacts packaged with the MLflow
        # model, so the wrapper also works inside the serving container
        self.model = joblib.load(context.artifacts["model"])

    def predict(self, context, model_input):
        # Perform inference
        return self.model.predict(model_input)

# Log the model, bundling the serialized model file as an artifact
mlflow.pyfunc.log_model(
    artifact_path="xgboost_model",
    python_model=PDFModelWrapper(),
    artifacts={"model": "/path/to/xgboost_model.pkl"},
)
```
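
Before deploying, you can sanity-check the logged model by loading it back through the pyfunc interface (the run ID and feature columns below are placeholders, and your MLflow tracking URI is assumed to be configured):

```python
import mlflow.pyfunc
import pandas as pd

# "<run_id>" is a placeholder for the MLflow run that logged the model
model = mlflow.pyfunc.load_model("runs:/<run_id>/xgboost_model")

# Column names are illustrative; use your model's actual feature schema
sample = pd.DataFrame({"feature_a": [0.1], "feature_b": [0.2]})
print(model.predict(sample))
```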

### Deploy with KServe
After logging the model, deploy it with KServe to expose a REST endpoint for inference:

```yaml
apiVersion: "serving.kserve.io/v1beta1"
kind: "InferenceService"
metadata:
  name: "re"
  namespace: "modelhub"
  labels:
    azure.workload.identity/use: "true"
spec:
  predictor:
    model:
      modelFormat:
        name: mlflow
      protocolVersion: v2
      storageUri: "https://autonomizestorageaccount.blob.core.windows.net/mlflow/27/e5edc75c09d9470dadc42bd301ee8a8f/artifacts/reinfer_model"
      resources:
        limits:
          cpu: "3"
          memory: "16Gi"
          # nvidia.com/gpu: "1"
        requests:
          cpu: "3"
          memory: "16Gi"
    serviceAccountName: "genesis-platform-sa"
    tolerations:
      - key: "sku"
        operator: "Equal"
        value: "gpu"
        effect: "NoSchedule"
```
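
Once the `InferenceService` is ready, KServe exposes a v2-protocol REST endpoint at `/v2/models/<name>/infer`. The host below is a placeholder, and the input name, shape, and datatype depend on your model's signature:

```python
import requests

# Placeholder host; KServe assigns each InferenceService its own URL
url = "http://re.modelhub.example.com/v2/models/re/infer"

# KServe v2 inference payload (adjust name/shape/datatype to your model)
payload = {
    "inputs": [
        {"name": "input-0", "shape": [1, 2], "datatype": "FP32", "data": [0.1, 0.2]}
    ]
}

response = requests.post(url, json=payload, timeout=30)
print(response.json())
```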

## Examples
Here are additional examples to help you get started.

### Logging Training and Evaluation Runs

```python
import joblib
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier

# Example data; replace with your own features and labels
X, y = make_classification(n_samples=500, n_features=8, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

model = XGBClassifier()

with client.start_run(run_name="Training Run"):
    model.fit(X_train, y_train)
    accuracy = model.score(X_test, y_test)

    # Log parameters and metrics
    client.mlflow.log_param("model", "XGBoost")
    client.mlflow.log_metric("accuracy", accuracy)

    # Save and log the model
    joblib.dump(model, "xgboost_model.pkl")
    client.mlflow.log_artifact("xgboost_model.pkl")
```

### Managing Datasets
```python
import pandas as pd

from modelhub import load_dataset

# Load dataset
dataset = load_dataset("my_custom_dataset", version=2)

# Convert to pandas
df = pd.DataFrame(dataset["train"])

# Perform operations on the dataset
print(df.head())
```

### Using Blob Storage for Datasets

```python
# Path to the blob storage configuration file (YAML)
blob_storage_config = "blob_storage_config.yaml"

# Load dataset from blob storage
dataset = load_dataset("my_dataset", blob_storage_config=blob_storage_config)
```

### Submitting a Pipeline
```python
import os

from modelhub.clients import PipelineManager

# Submit and start the pipeline
pipeline_manager = PipelineManager(base_url=os.getenv("MODELHUB_BASE_URL"))
pipeline = pipeline_manager.start_pipeline("pipeline.yaml")

print(f"Pipeline started with ID: {pipeline['id']}")
```

## Feedback & Contributions
Feel free to raise issues, submit PRs, or suggest features for the ModelHub SDK on our GitHub repository.

For feedback or support, please reach out to the ModelHub team directly.