collie-mlops


| Field | Value |
|---|---|
| Name | collie-mlops |
| Version | 0.1.0b0 |
| Home page | https://github.com/ChingHuanChiu/collie |
| Summary | A Lightweight MLOps Framework for Machine Learning Workflows |
| Upload time | 2025-11-02 10:09:46 |
| Maintainer | None |
| Docs URL | None |
| Author | ChingHuanChiu |
| Requires Python | >=3.10 |
| License | MIT |
| Keywords | mlops, machine-learning, mlflow, pipeline, orchestration, deep-learning, experiment-tracking |
| Requirements | torch, sentence-transformers, mlflow, transformers, hyperopt, xgboost, pynvml, pydantic, pyspark, pytorch-lightning, pytest, pytest-cov, pytest-mock |
| Travis CI | No |
| Coveralls | No |

# Collie 🐕

[![PyPI version](https://badge.fury.io/py/collie-mlops.svg)](https://badge.fury.io/py/collie-mlops)
[![Python 3.10+](https://img.shields.io/badge/python-3.10+-blue.svg)](https://www.python.org/downloads/)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
[![Documentation](https://img.shields.io/badge/docs-sphinx-blue.svg)](docs/_build/html/index.html)
[![codecov](https://codecov.io/gh/ChingHuanChiu/collie/branch/main/graph/badge.svg)](https://codecov.io/gh/ChingHuanChiu/collie)

A Lightweight MLOps Framework for Machine Learning Workflows


## Overview

Collie is an MLOps framework that organizes machine learning workflows into modular components integrated with MLflow. Each component handles one stage of the ML lifecycle, so data scientists and ML engineers can build, deploy, and manage pipelines one stage at a time.

## Features

- **Component-Based Architecture**: Modular design with specialized components for each ML workflow stage
- **MLflow Integration**: Built-in experiment tracking, model registration, and deployment capabilities
- **Pipeline Orchestration**: Workflow management built on an event-driven architecture
- **Model Management**: Automated model versioning, staging, and promotion
- **Framework Agnostic**: Supports multiple ML frameworks (PyTorch, scikit-learn, XGBoost, LightGBM, Transformers)

## Architecture

Collie follows an event-driven architecture with the following core components:

- **Transformer**: Data preprocessing and feature engineering
- **Tuner**: Hyperparameter optimization
- **Trainer**: Model training and validation
- **Evaluator**: Model evaluation and comparison
- **Pusher**: Model deployment and registration
- **Orchestrator**: Workflow coordination and execution
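
The event-driven flow described above can be sketched in plain Python. This is a minimal illustration and not Collie's actual API: the `Event` class, the `transformer` and `trainer` functions, and `run_pipeline` are hypothetical stand-ins defined in the snippet itself. It only shows the idea of each stage receiving an event and returning a new one, with an orchestrator passing events along.

```python
from dataclasses import dataclass, field
from typing import Any, Callable


@dataclass
class Event:
    """Hypothetical event: carries a payload dict from one stage to the next."""
    payload: dict[str, Any] = field(default_factory=dict)


# A component is modelled here as any callable mapping an Event to a new Event.
Component = Callable[[Event], Event]


def transformer(event: Event) -> Event:
    # Scale the raw values; a stand-in for real preprocessing.
    data = [x / 10 for x in event.payload["raw"]]
    return Event({**event.payload, "train_data": data})


def trainer(event: Event) -> Event:
    # "Train" a trivial model: the mean of the training data.
    data = event.payload["train_data"]
    return Event({**event.payload, "model": sum(data) / len(data)})


def run_pipeline(components: list[Component], initial: Event) -> Event:
    """Pass the event through each component in order (the orchestrator role)."""
    event = initial
    for component in components:
        event = component(event)
    return event


result = run_pipeline([transformer, trainer], Event({"raw": [10, 20, 30]}))
print(result.payload["model"])
```

Because every stage has the same event-in, event-out shape, stages can be reordered, skipped, or replaced without changing the orchestration loop.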

## Quick Start

### Installation

```bash
pip install collie-mlops
```

This installs Collie together with all supported ML frameworks, including:
- scikit-learn
- PyTorch
- XGBoost
- LightGBM
- Transformers (with Sentence Transformers)

### Prerequisites

- Python >= 3.10
- MLflow tracking server (can be local or remote)


## Components

### Transformer
Handles data preprocessing, feature engineering, and data validation.

```python
class CustomTransformer(Transformer):
    def handle(self, event) -> Event:
        # Process your data.
        # NOTE: `raw_data` and `self.preprocess` are placeholders for your own data source and logic.
        processed_data = self.preprocess(raw_data)
        return Event(payload=TransformerPayload(train_data=processed_data))
```
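
As a rough idea of what the preprocessing step might contain, the sketch below validates a column of numbers and standardizes it using only the standard library. The function names `standardize` and `preprocess` are hypothetical and not part of Collie; in a real pipeline, logic like this would live inside your own `Transformer` subclass.

```python
from statistics import mean, pstdev
from typing import Optional


def standardize(values: list[float]) -> list[float]:
    """Rescale values to zero mean and unit variance (population std)."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # All values identical: return zeros rather than divide by zero.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


def preprocess(rows: list[Optional[float]]) -> list[float]:
    """Drop missing values, fail loudly on empty input, then standardize."""
    clean = [r for r in rows if r is not None]
    if not clean:
        raise ValueError("no usable values after removing missing entries")
    return standardize(clean)


processed = preprocess([1.0, None, 3.0])
print(processed)
```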

### Tuner
Performs hyperparameter optimization using various strategies.

```python
class CustomTuner(Tuner):
    def handle(self, event) -> Event:
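        # Optimize hyperparameters.
        # NOTE: `search_space` and `self.optimize` are placeholders for your own search definition and strategy.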
        best_params = self.optimize(search_space)
        return Event(payload=TunerPayload(hyperparameters=best_params))
```
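
To make the optimization step concrete, here is a small sketch of one possible strategy. Collie's requirements list `hyperopt`, but this example deliberately swaps in plain random search from the standard library so that it runs without extra dependencies. The `random_search` and `objective` functions and the `learning_rate` parameter are hypothetical names chosen for illustration.

```python
import random


def random_search(objective, search_space, n_trials=20, seed=0):
    """Return the best (params, score) found by uniform random sampling.

    `search_space` maps a parameter name to a (low, high) range.
    Lower scores are treated as better.
    """
    rng = random.Random(seed)
    best_params, best_score = None, float("inf")
    for _ in range(n_trials):
        params = {
            name: rng.uniform(low, high)
            for name, (low, high) in search_space.items()
        }
        score = objective(params)
        if score < best_score:
            best_params, best_score = params, score
    return best_params, best_score


def objective(params):
    # Toy loss whose minimum is at learning_rate = 0.1.
    return (params["learning_rate"] - 0.1) ** 2


best_params, best_score = random_search(
    objective, {"learning_rate": (0.0, 1.0)}, n_trials=50
)
print(best_params, best_score)
```

The returned `best_params` dictionary is the kind of value the `hyperparameters` field in the snippet above would carry to the next stage.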

### Trainer
Trains machine learning models with automatic experiment tracking.

```python
class CustomTrainer(Trainer):
    def handle(self, event) -> Event:
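        # Train your model.
        # NOTE: `data`, `hyperparameters`, and `self.train` are placeholders for your own inputs and training logic.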
        model = self.train(data, hyperparameters)
        return Event(payload=TrainerPayload(model=model))
```
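
The sketch below shows the general shape of a training step that records a metric on every epoch. It is not Collie's trainer: `train` is a hypothetical function that fits a one-parameter linear model by gradient descent, and the `history` list stands in for whatever an experiment tracker such as MLflow would record.

```python
def train(xs, ys, learning_rate=0.01, epochs=100):
    """Fit y ≈ w * x by gradient descent; return (w, per-epoch loss history)."""
    w = 0.0
    history = []
    n = len(xs)
    for _ in range(epochs):
        # Mean squared error and its gradient with respect to w.
        loss = sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / n
        grad = sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / n
        history.append(loss)  # stand-in for logging a metric to a tracker
        w -= learning_rate * grad
    return w, history


w, history = train([1.0, 2.0, 3.0], [2.0, 4.0, 6.0], learning_rate=0.05, epochs=200)
print(w, history[-1])
```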

### Evaluator
Evaluates model performance and decides on deployment.

```python
class CustomEvaluator(Evaluator):
    def handle(self, event) -> Event:
        # Evaluate model performance.
        # NOTE: `model`, `test_data`, and the `self.*` helpers are placeholders for your own logic.
        metrics = self.evaluate(model, test_data)
        is_better = self.compare_with_production(metrics)
        return Event(payload=EvaluatorPayload(
            metrics=metrics,
            is_better_than_production=is_better
        ))
```
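
One way the deployment decision could work is a simple metric comparison against the current production model. The sketch below is an assumption about what such a check might look like, not Collie's implementation; the function name, the `min_delta` threshold, and the metric names are all hypothetical.

```python
def is_better_than_production(
    candidate,
    production,
    metric="accuracy",
    higher_is_better=True,
    min_delta=0.0,
):
    """Decide whether the candidate beats production on a single metric.

    `candidate` and `production` are dicts of metric name to value.
    `min_delta` is the minimum improvement required to count as better.
    """
    if production is None:
        # No production model yet: any candidate qualifies.
        return True
    diff = candidate[metric] - production[metric]
    if not higher_is_better:
        # For error-style metrics, a decrease is an improvement.
        diff = -diff
    return diff > min_delta


print(is_better_than_production({"accuracy": 0.92}, {"accuracy": 0.90}))
```

Requiring a minimum improvement through `min_delta` helps avoid promoting a model whose gain is within evaluation noise.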

### Pusher
Handles model deployment and registration.

```python
class CustomPusher(Pusher):
    def handle(self, event) -> Event:
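        # Deploy model to production.
        # NOTE: `model` and `self.deploy` are placeholders for your own model object and deployment logic.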
        model_uri = self.deploy(model)
        return Event(payload=PusherPayload(model_uri=model_uri))
```
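
The `model_uri` in the snippet above is presumably an MLflow model URI. MLflow addresses a model logged in a run as `runs:/<run_id>/<artifact_path>` and a registered model version as `models:/<name>/<version>`. The helpers below just build those strings; the function names, the run ID, and the model name are hypothetical, and how Collie itself constructs URIs is not shown in this README.

```python
def run_model_uri(run_id: str, artifact_path: str = "model") -> str:
    """URI of a model artifact logged under a specific MLflow run."""
    return f"runs:/{run_id}/{artifact_path}"


def registered_model_uri(name: str, version: int) -> str:
    """URI of a specific version of a registered model."""
    return f"models:/{name}/{version}"


print(run_model_uri("abc123"))            # placeholder run ID
print(registered_model_uri("churn", 3))   # placeholder model name and version
```

With MLflow installed, a URI of the first form can be passed to `mlflow.register_model(model_uri, name)` to create a new registered version.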

## Configuration

### MLflow Setup

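Start a local MLflow tracking server. Binding to `0.0.0.0` exposes the server on all network interfaces; use `127.0.0.1` if it should only be reachable from the local machine: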

```bash
mlflow server \
    --backend-store-uri sqlite:///mlflow.db \
    --default-artifact-root ./mlruns \
    --host 0.0.0.0 \
    --port 5000
```
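
Once the server is running, client code needs to know where it is. The sketch below sets the `MLFLOW_TRACKING_URI` environment variable, which MLflow reads by default, and includes an optional reachability check. The helper names are hypothetical, the host and port assume the command above was run locally, and the `/health` path is assumed to be served by the MLflow server.

```python
import os
import urllib.error
import urllib.request


def tracking_uri(host: str = "127.0.0.1", port: int = 5000) -> str:
    """Build the HTTP URI of a tracking server."""
    return f"http://{host}:{port}"


def server_is_up(uri: str, timeout: float = 2.0) -> bool:
    """Return True if the server answers on /health (path is an assumption)."""
    try:
        with urllib.request.urlopen(f"{uri}/health", timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, DNS failure, or HTTP error.
        return False


# Make the URI visible to MLflow clients started from this process.
os.environ["MLFLOW_TRACKING_URI"] = tracking_uri()
print(os.environ["MLFLOW_TRACKING_URI"])
```

The same effect can be had in code with `mlflow.set_tracking_uri(...)` when MLflow is imported.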

## Supported Frameworks

Collie currently supports the following ML frameworks through its model flavor system:

- **PyTorch**
- **scikit-learn**
- **XGBoost**
- **LightGBM**
- **Transformers**


## Documentation

See the [Getting Started guide](https://collie-mlops.readthedocs.io/en/latest/getting_started.html).

## Roadmap

- [ ] TensorFlow/Keras support
- [ ] Model monitoring and drift detection
- [ ] Integration with Airflow/Kubeflow
- [ ] Integrate an LLM training/fine-tuning framework
- [ ] Reduce heavy import times and the installation footprint

## License

This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.

## Citation

If you use Collie in your research, please cite:

```bibtex
@software{collie2025,
  author = {ChingHuanChiu},
  title = {Collie: A Lightweight MLOps Framework},
  year = {2025},
  url = {https://github.com/ChingHuanChiu/collie}
}
```

---


            

Raw data

            {
    "_id": null,
    "home_page": "https://github.com/ChingHuanChiu/collie",
    "name": "collie-mlops",
    "maintainer": null,
    "docs_url": null,
    "requires_python": ">=3.10",
    "maintainer_email": "ChingHuanChiu <stevenchiou8@gmail.com>",
    "keywords": "mlops, machine-learning, mlflow, pipeline, orchestration, deep-learning, experiment-tracking",
    "author": "ChingHuanChiu",
    "author_email": "ChingHuanChiu <stevenchiou8@gmail.com>",
    "download_url": "https://files.pythonhosted.org/packages/10/39/a20591ab2b68bd35bd15523b8c8bff3e8d4c038a489e8adee8f93bbddda3/collie_mlops-0.1.0b0.tar.gz",
    "platform": null,
    "description": "# Collie \ud83d\udc15\n\n[![PyPI version](https://badge.fury.io/py/collie-mlops.svg)](https://badge.fury.io/py/collie-mlops)\n[![Python 3.10+](https://img.shields.io/badge/python-3.10+-blue.svg)](https://www.python.org/downloads/)\n[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)\n[![Documentation](https://img.shields.io/badge/docs-sphinx-blue.svg)](docs/_build/html/index.html)\n[![codecov](https://codecov.io/gh/ChingHuanChiu/collie/branch/main/graph/badge.svg)](https://codecov.io/gh/ChingHuanChiu/collie)\n\nA Lightweight MLOps Framework for Machine Learning Workflows\n\n\n## Overview\n\nCollie is a modern MLOps framework designed to streamline machine learning workflows by providing a component-based architecture integrated with MLflow. It enables data scientists and ML engineers to build, deploy, and manage ML pipelines with ease through modular components that handle different stages of the ML lifecycle.\n\n## Features\n\n- **Component-Based Architecture**: Modular design with specialized components for each ML workflow stage\n- **MLflow Integration**: Built-in experiment tracking, model registration, and deployment capabilities\n- **Pipeline Orchestration**: Seamless workflow management with event-driven architecture\n- **Model Management**: Automated model versioning, staging, and promotion\n- **Framework Agnostic**: Supports multiple ML frameworks (PyTorch, scikit-learn, XGBoost, LightGBM, Transformers)\n\n## Architecture\n\nCollie follows an event-driven architecture with the following core components:\n\n- **Transformer**: Data preprocessing and feature engineering\n- **Tuner**: Hyperparameter optimization\n- **Trainer**: Model training and validation\n- **Evaluator**: Model evaluation and comparison\n- **Pusher**: Model deployment and registration\n- **Orchestrator**: Workflow coordination and execution\n\n## Quick Start\n\n### Installation\n\n```bash\npip install 
collie-mlops\n```\n\nThis will install Collie with all supported ML frameworks including:\n- scikit-learn\n- PyTorch\n- XGBoost\n- LightGBM\n- Transformers (with Sentence Transformers)\n\n### Prerequisites\n\n- Python >= 3.10\n- MLflow tracking server (can be local or remote)\n\n\n## Components\n\n### Transformer\nHandles data preprocessing, feature engineering, and data validation.\n\n```python\nclass CustomTransformer(Transformer):\n    def handle(self, event) -> Event:\n        # Process your data\n        processed_data = self.preprocess(raw_data)\n        return Event(payload=TransformerPayload(train_data=processed_data))\n```\n\n### Tuner\nPerforms hyperparameter optimization using various strategies.\n\n```python\nclass CustomTuner(Tuner):\n    def handle(self, event) -> Event:\n        # Optimize hyperparameters\n        best_params = self.optimize(search_space)\n        return Event(payload=TunerPayload(hyperparameters=best_params))\n```\n\n### Trainer\nTrains machine learning models with automatic experiment tracking.\n\n```python\nclass CustomTrainer(Trainer):\n    def handle(self, event) -> Event:\n        # Train your model\n        model = self.train(data, hyperparameters)\n        return Event(payload=TrainerPayload(model=model))\n```\n\n### Evaluator\nEvaluates model performance and decides on deployment.\n\n```python\nclass CustomEvaluator(Evaluator):\n    def handle(self, event) -> Event:\n        # Evaluate model performance\n        metrics = self.evaluate(model, test_data)\n        is_better = self.compare_with_production(metrics)\n        return Event(payload=EvaluatorPayload(\n            metrics=metrics, \n            is_better_than_production=is_better\n        ))\n```\n\n### Pusher\nHandles model deployment and registration.\n\n```python\nclass CustomPusher(Pusher):\n    def handle(self, event) -> Event:\n        # Deploy model to production\n        model_uri = self.deploy(model)\n        return 
Event(payload=PusherPayload(model_uri=model_uri))\n```\n\n## Configuration\n\n### MLflow Setup\n\nStart MLflow tracking server:\n\n```bash\nmlflow server \\\n    --backend-store-uri sqlite:///mlflow.db \\\n    --default-artifact-root ./mlruns \\\n    --host 0.0.0.0 \\\n    --port 5000\n```\n\n## Supported Frameworks\n\nCollie supports multiple ML frameworks through its model flavor system currently:\n\n-  **PyTorch** \n-  **scikit-learn**\n-  **XGBoost** \n-  **LightGBM**\n-  **Transformers**\n\n\n## Documentation\n\n[Here you are]( https://collie-mlops.readthedocs.io/en/latest/getting_started.html )\n\n## Roadmap\n\n- [ ] TensorFlow/Keras support\n- [ ] Model monitoring and drift detection\n- [ ] Integration with Airflow/Kubeflow\n- [ ] Integrate an LLM training/fine-tuning framework\n- [ ] Solve the issue about heavy import and installation.\n\n## License\n\nThis project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.\n\n## Citation\n\nIf you use Collie in your research, please cite:\n\n```bibtex\n@software{collie2025,\n  author = {ChingHuanChiu},\n  title = {Collie: A Lightweight MLOps Framework},\n  year = {2025},\n  url = {https://github.com/ChingHuanChiu/collie}\n}\n```\n\n---\n\n",
    "bugtrack_url": null,
    "license": "MIT",
    "summary": "A Lightweight MLOps Framework for Machine Learning Workflows",
    "version": "0.1.0b0",
    "project_urls": {
        "Bug Tracker": "https://github.com/ChingHuanChiu/collie/issues",
        "Changelog": "https://github.com/ChingHuanChiu/collie/blob/main/CHANGELOG.md",
        "Documentation": "https://github.com/ChingHuanChiu/collie/blob/main/README.md",
        "Homepage": "https://github.com/ChingHuanChiu/collie",
        "Repository": "https://github.com/ChingHuanChiu/collie"
    },
    "split_keywords": [
        "mlops",
        " machine-learning",
        " mlflow",
        " pipeline",
        " orchestration",
        " deep-learning",
        " experiment-tracking"
    ],
    "urls": [
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "27c19d830674283ee7e76f2ea5c17279345cb9a26a7a25bf15a7601f5d185f2e",
                "md5": "84f72d0858319a615feab71330c48060",
                "sha256": "c8b72228063afcdec0c319be10895d15f1a6d7dd9bada2ed342ad67bc6c30da7"
            },
            "downloads": -1,
            "filename": "collie_mlops-0.1.0b0-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "84f72d0858319a615feab71330c48060",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.10",
            "size": 34057,
            "upload_time": "2025-11-02T10:09:44",
            "upload_time_iso_8601": "2025-11-02T10:09:44.354676Z",
            "url": "https://files.pythonhosted.org/packages/27/c1/9d830674283ee7e76f2ea5c17279345cb9a26a7a25bf15a7601f5d185f2e/collie_mlops-0.1.0b0-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "1039a20591ab2b68bd35bd15523b8c8bff3e8d4c038a489e8adee8f93bbddda3",
                "md5": "4effd0adcd7ad33f63718d7706516875",
                "sha256": "9b4f5192faf628aacc94d9e0d50ba4a93cdb071fd2b3306d431344b166b4dfed"
            },
            "downloads": -1,
            "filename": "collie_mlops-0.1.0b0.tar.gz",
            "has_sig": false,
            "md5_digest": "4effd0adcd7ad33f63718d7706516875",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.10",
            "size": 28378,
            "upload_time": "2025-11-02T10:09:46",
            "upload_time_iso_8601": "2025-11-02T10:09:46.341060Z",
            "url": "https://files.pythonhosted.org/packages/10/39/a20591ab2b68bd35bd15523b8c8bff3e8d4c038a489e8adee8f93bbddda3/collie_mlops-0.1.0b0.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2025-11-02 10:09:46",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "ChingHuanChiu",
    "github_project": "collie",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": true,
    "requirements": [
        {
            "name": "torch",
            "specs": []
        },
        {
            "name": "sentence-transformers",
            "specs": []
        },
        {
            "name": "mlflow",
            "specs": []
        },
        {
            "name": "transformers",
            "specs": []
        },
        {
            "name": "hyperopt",
            "specs": []
        },
        {
            "name": "xgboost",
            "specs": []
        },
        {
            "name": "pynvml",
            "specs": []
        },
        {
            "name": "pydantic",
            "specs": []
        },
        {
            "name": "pyspark",
            "specs": []
        },
        {
            "name": "pytorch-lightning",
            "specs": []
        },
        {
            "name": "pytest",
            "specs": []
        },
        {
            "name": "pytest-cov",
            "specs": []
        },
        {
            "name": "pytest-mock",
            "specs": []
        }
    ],
    "lcname": "collie-mlops"
}
        