# Collie 🐕
A Lightweight MLOps Framework for Machine Learning Workflows
## Overview
Collie is a modern MLOps framework designed to streamline machine learning workflows by providing a component-based architecture integrated with MLflow. It enables data scientists and ML engineers to build, deploy, and manage ML pipelines with ease through modular components that handle different stages of the ML lifecycle.
## Features
- **Component-Based Architecture**: Modular design with specialized components for each ML workflow stage
- **MLflow Integration**: Built-in experiment tracking, model registration, and deployment capabilities
- **Pipeline Orchestration**: Seamless workflow management with event-driven architecture
- **Model Management**: Automated model versioning, staging, and promotion
- **Framework Agnostic**: Supports multiple ML frameworks (PyTorch, scikit-learn, XGBoost, LightGBM, Transformers)
## Architecture
Collie follows an event-driven architecture with the following core components:
- **Transformer**: Data preprocessing and feature engineering
- **Tuner**: Hyperparameter optimization
- **Trainer**: Model training and validation
- **Evaluator**: Model evaluation and comparison
- **Pusher**: Model deployment and registration
- **Orchestrator**: Workflow coordination and execution
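The event flow between these components can be pictured with a small, self-contained sketch. Everything in it (`ToyEvent`, `ToyComponent`, `ToyTransformer`, `ToyTrainer`, `ToyOrchestrator`) is a hypothetical stand-in written for illustration only; it does not use or reproduce Collie's actual classes, and it assumes only that each stage receives an event and returns a new one.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class ToyEvent:
    """Hypothetical stand-in for a pipeline event: a payload passed between stages."""
    payload: dict[str, Any] = field(default_factory=dict)


class ToyComponent:
    """Hypothetical base class: each stage consumes one event and emits the next."""

    def handle(self, event: ToyEvent) -> ToyEvent:
        raise NotImplementedError


class ToyTransformer(ToyComponent):
    def handle(self, event: ToyEvent) -> ToyEvent:
        # Toy preprocessing step: divide every value by the maximum.
        raw = event.payload["raw_data"]
        top = max(raw)
        return ToyEvent(payload={**event.payload, "train_data": [x / top for x in raw]})


class ToyTrainer(ToyComponent):
    def handle(self, event: ToyEvent) -> ToyEvent:
        # The "model" here is simply the mean of the training data.
        data = event.payload["train_data"]
        return ToyEvent(payload={**event.payload, "model": sum(data) / len(data)})


class ToyOrchestrator:
    """Runs components in order, feeding each one the event produced by the previous stage."""

    def __init__(self, components: list[ToyComponent]) -> None:
        self.components = components

    def run(self, initial: ToyEvent) -> ToyEvent:
        event = initial
        for component in self.components:
            event = component.handle(event)
        return event


pipeline = ToyOrchestrator([ToyTransformer(), ToyTrainer()])
result = pipeline.run(ToyEvent(payload={"raw_data": [2, 4, 6, 8]}))
print(result.payload["model"])  # 0.625
```

Because each stage only depends on the event it receives, stages can be swapped or reordered without touching the others, which is the main appeal of this style of design.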
## Quick Start
### Installation
```bash
pip install collie-mlops
```
This installs Collie together with all supported ML frameworks, including:
- scikit-learn
- PyTorch
- XGBoost
- LightGBM
- Transformers (with Sentence Transformers)
### Prerequisites
- Python >= 3.10
- MLflow tracking server (can be local or remote)
## Components
### Transformer
Handles data preprocessing, feature engineering, and data validation.
```python
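class CustomTransformer(Transformer):
    def handle(self, event) -> Event:
        # Process your data; `raw_data` and `self.preprocess` are placeholders
        # for your own input data and preprocessing logic.
        processed_data = self.preprocess(raw_data)
        return Event(payload=TransformerPayload(train_data=processed_data))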
```
### Tuner
Performs hyperparameter optimization using various strategies.
```python
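class CustomTuner(Tuner):
    def handle(self, event) -> Event:
        # Optimize hyperparameters; `search_space` and `self.optimize` are
        # placeholders for your own search space and optimization routine.
        best_params = self.optimize(search_space)
        return Event(payload=TunerPayload(hyperparameters=best_params))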
```
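As one possible strategy, the `self.optimize` placeholder above could be backed by a plain random search. The sketch below is a minimal, standard-library-only illustration; `random_search`, `toy_objective`, and the shape of `search_space` (a mapping from name to `(low, high)` bounds) are assumptions made for this example and are not part of Collie's API.

```python
import random
from typing import Callable

Params = dict[str, float]


def random_search(
    objective: Callable[[Params], float],
    search_space: dict[str, tuple[float, float]],
    n_trials: int = 50,
    seed: int = 0,
) -> tuple[Params, float]:
    """Sample each hyperparameter uniformly from its range and keep the lowest loss."""
    rng = random.Random(seed)  # seeded so that repeated runs give the same result
    best_params: Params = {}
    best_loss = float("inf")
    for _ in range(n_trials):
        params = {name: rng.uniform(low, high) for name, (low, high) in search_space.items()}
        loss = objective(params)
        if loss < best_loss:
            best_params, best_loss = params, loss
    return best_params, best_loss


def toy_objective(params: Params) -> float:
    # Hypothetical loss whose minimum sits at learning_rate = 0.1.
    return (params["learning_rate"] - 0.1) ** 2


best_params, best_loss = random_search(toy_objective, {"learning_rate": (0.0, 1.0)})
print(best_params, best_loss)
```

The returned `best_params` dictionary is the kind of value that would then be passed on as `hyperparameters` in the tuner's payload.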
### Trainer
Trains machine learning models with automatic experiment tracking.
```python
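class CustomTrainer(Trainer):
    def handle(self, event) -> Event:
        # Train your model; `data`, `hyperparameters`, and `self.train` are
        # placeholders for your own inputs and training routine.
        model = self.train(data, hyperparameters)
        return Event(payload=TrainerPayload(model=model))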
```
### Evaluator
Evaluates model performance and decides on deployment.
```python
class CustomEvaluator(Evaluator):
    def handle(self, event) -> Event:
        # Evaluate model performance; `model`, `test_data`, `self.evaluate`, and
        # `self.compare_with_production` are placeholders for your own logic.
        metrics = self.evaluate(model, test_data)
        is_better = self.compare_with_production(metrics)
        return Event(payload=EvaluatorPayload(
            metrics=metrics,
            is_better_than_production=is_better,
        ))
```
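The comparison step is where the deployment decision is made, so it helps to be explicit about the rule. The following is a minimal sketch of one such rule, assuming metrics are plain dictionaries of floats; the function name, its parameters, and the `min_delta` threshold are hypothetical and not taken from Collie.

```python
from typing import Optional


def is_better_than_production(
    candidate: dict[str, float],
    production: Optional[dict[str, float]],
    metric: str = "accuracy",
    higher_is_better: bool = True,
    min_delta: float = 0.0,
) -> bool:
    """Return True if the candidate beats production on `metric` by more than `min_delta`."""
    if production is None or metric not in production:
        # No production baseline to compare against, so accept the candidate.
        return True
    if higher_is_better:
        gain = candidate[metric] - production[metric]
    else:
        gain = production[metric] - candidate[metric]
    return gain > min_delta


print(is_better_than_production({"accuracy": 0.92}, {"accuracy": 0.90}))  # True
print(is_better_than_production({"accuracy": 0.92}, {"accuracy": 0.90}, min_delta=0.05))  # False
```

A non-zero `min_delta` guards against promoting a model whose improvement is within noise.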
### Pusher
Handles model deployment and registration.
```python
class CustomPusher(Pusher):
    def handle(self, event) -> Event:
        # Deploy the model to production; `model` and `self.deploy` are
        # placeholders for your own model object and deployment logic.
        model_uri = self.deploy(model)
        return Event(payload=PusherPayload(model_uri=model_uri))
```
## Configuration
### MLflow Setup
Start an MLflow tracking server:
```bash
mlflow server \
  --backend-store-uri sqlite:///mlflow.db \
  --default-artifact-root ./mlruns \
  --host 0.0.0.0 \
  --port 5000
```
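Once the server is running, client code needs to know where to find it. MLflow's client reads the `MLFLOW_TRACKING_URI` environment variable, so one way to point a process at the server started above is sketched below. The helper name `configure_tracking` is hypothetical, and whether Collie also expects the URI to be passed explicitly is not covered here. Note that `0.0.0.0` is a bind address for the server; clients connect via `127.0.0.1` or the machine's reachable hostname.

```python
import os


def configure_tracking(host: str = "127.0.0.1", port: int = 5000) -> str:
    """Build the tracking URI and export it for MLflow clients in this process."""
    uri = f"http://{host}:{port}"
    # MLflow falls back to this variable when no tracking URI is set in code.
    os.environ["MLFLOW_TRACKING_URI"] = uri
    return uri


uri = configure_tracking()
print(uri)  # http://127.0.0.1:5000
```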
## Supported Frameworks
Collie currently supports the following ML frameworks through its model flavor system:
- **PyTorch**
- **scikit-learn**
- **XGBoost**
- **LightGBM**
- **Transformers**
## Documentation
See the [getting started guide](https://collie-mlops.readthedocs.io/en/latest/getting_started.html).
## Roadmap
- [ ] TensorFlow/Keras support
- [ ] Model monitoring and drift detection
- [ ] Integration with Airflow/Kubeflow
- [ ] Integrate an LLM training/fine-tuning framework
- [ ] Reduce the heavy import and installation overhead
## License
This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.
## Citation
If you use Collie in your research, please cite:
```bibtex
@software{collie2025,
author = {ChingHuanChiu},
title = {Collie: A Lightweight MLOps Framework},
year = {2025},
url = {https://github.com/ChingHuanChiu/collie}
}
```
---
Raw data
{
"_id": null,
"home_page": "https://github.com/ChingHuanChiu/collie",
"name": "collie-mlops",
"maintainer": null,
"docs_url": null,
"requires_python": ">=3.10",
"maintainer_email": "ChingHuanChiu <stevenchiou8@gmail.com>",
"keywords": "mlops, machine-learning, mlflow, pipeline, orchestration, deep-learning, experiment-tracking",
"author": "ChingHuanChiu",
"author_email": "ChingHuanChiu <stevenchiou8@gmail.com>",
"download_url": "https://files.pythonhosted.org/packages/10/39/a20591ab2b68bd35bd15523b8c8bff3e8d4c038a489e8adee8f93bbddda3/collie_mlops-0.1.0b0.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": "MIT",
"summary": "A Lightweight MLOps Framework for Machine Learning Workflows",
"version": "0.1.0b0",
"project_urls": {
"Bug Tracker": "https://github.com/ChingHuanChiu/collie/issues",
"Changelog": "https://github.com/ChingHuanChiu/collie/blob/main/CHANGELOG.md",
"Documentation": "https://github.com/ChingHuanChiu/collie/blob/main/README.md",
"Homepage": "https://github.com/ChingHuanChiu/collie",
"Repository": "https://github.com/ChingHuanChiu/collie"
},
"split_keywords": [
"mlops",
" machine-learning",
" mlflow",
" pipeline",
" orchestration",
" deep-learning",
" experiment-tracking"
],
"urls": [
{
"comment_text": null,
"digests": {
"blake2b_256": "27c19d830674283ee7e76f2ea5c17279345cb9a26a7a25bf15a7601f5d185f2e",
"md5": "84f72d0858319a615feab71330c48060",
"sha256": "c8b72228063afcdec0c319be10895d15f1a6d7dd9bada2ed342ad67bc6c30da7"
},
"downloads": -1,
"filename": "collie_mlops-0.1.0b0-py3-none-any.whl",
"has_sig": false,
"md5_digest": "84f72d0858319a615feab71330c48060",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.10",
"size": 34057,
"upload_time": "2025-11-02T10:09:44",
"upload_time_iso_8601": "2025-11-02T10:09:44.354676Z",
"url": "https://files.pythonhosted.org/packages/27/c1/9d830674283ee7e76f2ea5c17279345cb9a26a7a25bf15a7601f5d185f2e/collie_mlops-0.1.0b0-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": null,
"digests": {
"blake2b_256": "1039a20591ab2b68bd35bd15523b8c8bff3e8d4c038a489e8adee8f93bbddda3",
"md5": "4effd0adcd7ad33f63718d7706516875",
"sha256": "9b4f5192faf628aacc94d9e0d50ba4a93cdb071fd2b3306d431344b166b4dfed"
},
"downloads": -1,
"filename": "collie_mlops-0.1.0b0.tar.gz",
"has_sig": false,
"md5_digest": "4effd0adcd7ad33f63718d7706516875",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.10",
"size": 28378,
"upload_time": "2025-11-02T10:09:46",
"upload_time_iso_8601": "2025-11-02T10:09:46.341060Z",
"url": "https://files.pythonhosted.org/packages/10/39/a20591ab2b68bd35bd15523b8c8bff3e8d4c038a489e8adee8f93bbddda3/collie_mlops-0.1.0b0.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2025-11-02 10:09:46",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "ChingHuanChiu",
"github_project": "collie",
"travis_ci": false,
"coveralls": false,
"github_actions": true,
"requirements": [
{
"name": "torch",
"specs": []
},
{
"name": "sentence-transformers",
"specs": []
},
{
"name": "mlflow",
"specs": []
},
{
"name": "transformers",
"specs": []
},
{
"name": "hyperopt",
"specs": []
},
{
"name": "xgboost",
"specs": []
},
{
"name": "pynvml",
"specs": []
},
{
"name": "pydantic",
"specs": []
},
{
"name": "pyspark",
"specs": []
},
{
"name": "pytorch-lightning",
"specs": []
},
{
"name": "pytest",
"specs": []
},
{
"name": "pytest-cov",
"specs": []
},
{
"name": "pytest-mock",
"specs": []
}
],
"lcname": "collie-mlops"
}