<p align="center">
<img src="https://github.com/PatrickTourniaire/ror/blob/main/docs/source/_static/logo_blue.png?raw=true" height=50 />
</p>
<h1 align="center"> ROR </h1>
<div align="center">
<a href="">![Unittesting](https://github.com/patricktourniaire/pypipeline/actions/workflows/python-unittesting.yml/badge.svg)</a>
<a href="">[![Documentation](https://github.com/PatrickTourniaire/pypipeline/actions/workflows/documentation.yml/badge.svg)](https://github.com/PatrickTourniaire/pypipeline/actions/workflows/documentation.yml)</a>
<a href="">[![PyPI Deployment](https://github.com/PatrickTourniaire/pypipeline/actions/workflows/python-release-pypi.yml/badge.svg)](https://github.com/PatrickTourniaire/pypipeline/actions/workflows/python-release-pypi.yml)</a>
</div>
ROR is a pipelining framework for Python that makes it easier to define complex ML and
data-processing stages.
## Install it from PyPI
```bash
pip install ror
```
## Usage
To get started with creating your first pipeline, you can base it on this example, which
defines a simple GMM (Gaussian mixture model) pipeline. First, we import the relevant packages.
```py
import matplotlib.pyplot as plt
from sklearn import datasets
from sklearn.mixture import GaussianMixture
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

from dataclasses import dataclass
from typing import Tuple

from ror.schemas import BaseSchema
from ror.schemas.fields import field_perishable, field_persistance
from ror.stages import IInitStage, ITerminalStage, IForwardStage
from ror.controlers import BaseController
```
Then we can define the schemas that determine the structure of the data passed between the different stages.
```py
@dataclass
class InitStageInput(BaseSchema):
    data: object = field_perishable()

@dataclass
class InitStageOutput(BaseSchema):
    X_pca: object = field_persistance()
    X_std: object = field_perishable()
    model: object = field_persistance()

@dataclass
class InferenceStageOutput(BaseSchema):
    X_pca: object = field_perishable()
    model: object = field_perishable()
    labels: object = field_persistance()

@dataclass
class VisStageOutput(BaseSchema):
    labels: object = field_persistance()
```
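
The two field types control what flows forward: a persistent field stays in the carry handed to the next stage, while a perishable field is dropped once the stage that consumes it has run. (Note that `field_persistance` is the spelling the package uses.) As a rough illustration of the idea in plain Python — a hypothetical sketch, not ror's actual implementation:

```py
from dataclasses import dataclass, field, fields

def mock_perishable():
    # Hypothetical stand-in: mark a field as dropped after this stage
    return field(metadata={"carry": False})

def mock_persistent():
    # Hypothetical stand-in: mark a field as carried to the next stage
    return field(metadata={"carry": True})

@dataclass
class MockStageOutput:
    X_std: object = mock_perishable()  # consumed by the next stage only
    model: object = mock_persistent()  # survives into later stages

def mock_get_carry(schema) -> dict:
    # Keep only the fields marked as persistent
    return {
        f.name: getattr(schema, f.name)
        for f in fields(schema)
        if f.metadata.get("carry", False)
    }
```

Under this reading, `get_carry()` in the stage code below returns only the persistent fields, which is why each stage spreads `**self.input.get_carry()` into its output.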
We can then define the logical stages that use these schemas as their inputs and outputs.
```py
class VisStage(ITerminalStage[InferenceStageOutput, VisStageOutput]):
    def compute(self) -> None:
        # Visualize the clusters
        plt.figure(figsize=(8, 6))
        colors = ['r', 'g', 'b']

        for i in range(3):
            plt.scatter(
                self.input.X_pca[self.input.labels == i, 0],
                self.input.X_pca[self.input.labels == i, 1],
                color=colors[i],
                label=f'Cluster {i+1}'
            )

        plt.title('Gaussian Mixture Model Clustering')
        plt.xlabel('Principal Component 1')
        plt.ylabel('Principal Component 2')
        plt.legend()
        plt.show()

        self._output = self.input.get_carry()

    def get_output(self) -> VisStageOutput:
        return VisStageOutput(**self._output)


class InferenceStage(IForwardStage[InitStageOutput, InferenceStageOutput, VisStage]):
    def compute(self) -> None:
        # Fit the Gaussian mixture model to the dataset
        self.input.model.fit(self.input.X_std)

        # Predict the cluster labels
        labels = self.input.model.predict(self.input.X_std)

        self._output = {
            "labels": labels,
            **self.input.get_carry()
        }

    def get_output(self) -> Tuple[VisStage, InferenceStageOutput]:
        return VisStage(), InferenceStageOutput(**self._output)


class InitStage(IInitStage[InitStageInput, InitStageOutput, InferenceStage]):
    def compute(self) -> None:
        # Load the dataset
        X = self.input.data.data

        # Standardize the features
        scaler = StandardScaler()
        X_std = scaler.fit_transform(X)

        # Apply PCA to reduce dimensionality for visualization
        pca = PCA(n_components=2)
        X_pca = pca.fit_transform(X_std)

        # Create the Gaussian mixture model (fitted later, in the inference stage)
        gmm = GaussianMixture(n_components=3, random_state=42)

        self._output = {
            "X_pca": X_pca,
            "X_std": X_std,
            "model": gmm,
            **self.input.get_carry()
        }

    def get_output(self) -> Tuple[InferenceStage, InitStageOutput]:
        return InferenceStage(), InitStageOutput(**self._output)
```
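
Each forward stage's `get_output` hands back both the next stage instance and the schema it should receive, which is how the pipeline chains itself together. Conceptually, a run boils down to a loop like the following — a simplified, hypothetical sketch of the control flow, not ror's actual `BaseController`:

```py
def mock_run(init_stage, init_data):
    # Hypothetical driver: compute each stage, then follow the
    # (next_stage, output_schema) pair it returns until the terminal stage.
    stage = init_stage
    stage.input = init_data  # assumes stages expose their input this way

    while not isinstance(stage, ITerminalStage):
        stage.compute()
        next_stage, output = stage.get_output()
        next_stage.input = output
        stage = next_stage

    stage.compute()  # the terminal stage returns only its output schema
    return stage.get_output()
```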
Then we can define a simple controller, which is given the init stage and the input data to pass through the pipeline.
```py
iris = datasets.load_iris()

input_data = InitStageInput(data=iris)
controller = BaseController(init_data=input_data, init_stage=InitStage)
controller.discover()  # Shows a table of the connected stages

output, run_id = controller.start()
```
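
Since `VisStage` is the terminal stage here, `output` should correspond to its `VisStageOutput` schema. Assuming that mapping holds, the predicted cluster labels can be read back out of the result:

```py
# Assumes `output` is the terminal VisStageOutput instance
print(output.labels)
```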
And that's it! With this you can define logical processing stages for your ML inference
pipelines whilst keeping a high level of separation.