Name | atelierflow |
Version | 0.1.1 |
home_page | None |
Summary | A lightweight Python framework for creating clear, reproducible, and scalable machine learning pipelines. |
upload_time | 2025-08-03 20:49:34 |
maintainer | None |
docs_url | None |
author | None |
requires_python | >=3.9 |
license | MIT License
Copyright (c) 2024 Data & IoT Atelier
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
|
keywords |
mlops
pipeline
machine-learning
orchestration
reproducibility
|
VCS | https://github.com/IoTDataAtelier/pipeflow |
bugtrack_url | None |
requirements |
apache-beam
attrs
certifi
charset-normalizer
cloudpickle
crcmod
dill
dnspython
docopt
fastavro
fasteners
grpcio
hdfs
httplib2
idna
joblib
jsonpickle
jsonschema
jsonschema-specifications
numpy
objsize
orjson
packaging
proto-plus
protobuf
pyarrow
pyarrow-hotfix
pydot
pymongo
pyparsing
python-dateutil
pytz
redis
referencing
regex
requests
rpds-py
scikit-learn
scipy
six
threadpoolctl
typing_extensions
urllib3
zstandard
|
Travis-CI |
No Travis.
|
coveralls test coverage |
No coveralls.
|
## AtelierFlow 🎨
A lightweight, flexible Python framework for creating clear, reproducible, and scalable machine learning pipelines.
AtelierFlow helps you structure your ML code into a series of modular, reusable steps. It's designed to bring clarity and standardization to your experimentation process, from data loading to model evaluation, without imposing a heavy, restrictive structure.
## ✨ Features
- Modular by Design: Build pipelines by chaining together independent, reusable `Step` components.
- Centralized Configuration: Easily manage global settings like `device` (CPU/GPU), `logging_level`, and custom `tags` for your entire experiment from one place (a short configuration sketch follows this list).
- Extensible Library: Use pre-built, common steps for I/O, preprocessing, training, and evaluation, or easily create your own.
- Framework Agnostic: While providing helpers for libraries like scikit-learn, the core is lightweight and can orchestrate any Python-based ML workflow.
- Clear & Explicit: The framework prioritizes readability and explicit dependencies, making your pipelines easy to understand, share, and debug.
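To illustrate centralized configuration, here is a minimal sketch of constructing a fully configured `Experiment`. The `name`, `logging_level`, and `tags` arguments appear in the Quick Start below; `device` is an assumption based on the feature description above, so check the `Experiment` API before relying on it.
```python
from atelierflow.experiment import Experiment

# A minimal configuration sketch. Passing `device` as a keyword argument is an
# assumption drawn from the feature list above, not a confirmed API.
experiment = Experiment(
    name="Configured Experiment",
    device="cuda",            # or "cpu"
    logging_level="DEBUG",
    tags={"project": "demo", "stage": "prototyping"},
)
```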
## 🚀 Installation
You can install the core framework via pip. Optional dependencies can be installed to add support for specific ML libraries.
```bash
# Install the core framework
pip install atelierflow

# To include support for scikit-learn based steps
pip install atelierflow[scikit-learn]

# To include support for pytorch based steps
pip install atelierflow[torch]
```
## Quick Start
Here's how to build and run a simple pipeline in just a few lines of code. This example uses pre-built steps to generate data, train a model, evaluate it, and save the results.
```python
import logging

from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score

from atelierflow.experiment import Experiment
from atelierflow.steps.common.save_data.save_to_avro import SaveToAvroStep

# Note: these scikit-learn steps are not yet implemented in the released package.
from atelierflow.steps.sklearn.evaluation import ClassificationEvaluationStep
from atelierflow.steps.sklearn.training import TrainModelStep

# 1. Define your components and schemas
model_component = RandomForestClassifier(n_estimators=50)
scores_schema = {'name': 'Scores', 'type': 'record', 'fields': [{'name': 'AUC', 'type': 'double'}]}

# 2. Create an Experiment and configure it
experiment = Experiment(
    name="Quick Start Classification",
    logging_level="INFO",
    tags={"project": "onboarding-example"}
)

# 3. Add steps to the pipeline
# (the import for GenerateDataStep is omitted in the original example)
experiment.add_step(GenerateDataStep())
experiment.add_step(TrainModelStep(model=model_component))
experiment.add_step(ClassificationEvaluationStep(metrics={'AUC': roc_auc_score}))
experiment.add_step(
    SaveToAvroStep(
        output_path="./quick_start_results.avro",
        data_key='evaluation_scores',
        schema=scores_schema
    )
)

# 4. Run the experiment!
if __name__ == "__main__":
    final_results = experiment.run()
    logging.info("Pipeline complete. Results saved to ./quick_start_results.avro")
```
## Core Concepts
AtelierFlow is built around a few simple, powerful concepts.
- `Experiment`: The main orchestrator. You create an Experiment instance, give it a name, and provide global configurations. It is responsible for running the steps in the correct order.
- `Step`: A single, executable stage in your pipeline. A step can do anything: load data, train a model, or save a file. Every step receives the output of the previous step and the global experiment configuration.
- `StepResult`: A simple key-value store that acts as the data carrier between steps. A step adds its outputs (e.g., `result.add('trained_model', model)`) and the next step retrieves them (`input_data.get('trained_model')`); a minimal sketch follows this list.
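To make that contract concrete, here is a minimal, hypothetical sketch of two steps exchanging a value through a `StepResult`. The step names are illustrative, and the no-argument `StepResult()` constructor is an assumption; only `add` and `get` are described above.
```python
from typing import Any, Dict, Optional

from atelierflow.core.step import Step
from atelierflow.core.step_result import StepResult

class LoadNumbersStep(Step):
    """Hypothetical upstream step that publishes an output."""
    def run(self, input_data: Optional[StepResult], experiment_config: Dict[str, Any]) -> StepResult:
        result = StepResult()              # assumed no-argument constructor
        result.add('numbers', [1, 2, 3])   # publish an output under a key
        return result

class SumStep(Step):
    """Hypothetical downstream step that consumes that output."""
    def run(self, input_data: Optional[StepResult], experiment_config: Dict[str, Any]) -> StepResult:
        numbers = input_data.get('numbers')  # retrieve by the same key
        result = StepResult()
        result.add('total', sum(numbers))
        return result
```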
## 💡 Working with Pre-built Steps
When using pre-built steps like `TrainModelStep` or `ClassificationEvaluationStep`, it's crucial that the objects you pass to them adhere to the framework's core interfaces.
- Models must implement the `Model` interface. Any object passed to `TrainModelStep` must be a class that inherits from `atelierflow.core.model.Model`. This ensures the step can reliably call methods like `.fit()` and `.predict()`.
- Metrics must implement the `Metric` interface. Similarly, any custom metric passed to an evaluation step must inherit from `atelierflow.core.metric.Metric` and implement the `.compute()` method.
This "programming to an interface" design is what gives AtelierFlow its flexibility. It allows the pre-built steps to work with any model or metric, as long as it follows the expected contract.
## 🛠️ Creating a Custom Step
Creating your own step is the primary way to extend AtelierFlow. It's as simple as inheriting from the `Step` base class and implementing the `run` method.
- Inherit from `Step`: Create a new class that inherits from `atelierflow.core.step.Step`.
- Use `__init__` for Configuration: Pass any parameters your step needs to its constructor.
- Implement `run`: This is where your logic goes. Use the `input_data` to get results from previous steps and use `experiment_config` to access global settings.
- Return a `StepResult`: Your step must return a `StepResult` object, even if it's empty, to continue the pipeline.
### Example: A Custom Hello World Step
```python
import logging
from typing import Any, Dict, Optional

from atelierflow.core.step import Step
from atelierflow.core.step_result import StepResult

logger = logging.getLogger(__name__)

class HelloWorldStep(Step):
    """A simple example of a custom step."""

    def __init__(self, message: str):
        # 1. Use __init__ for step-specific parameters
        self.message = message

    def run(self, input_data: Optional[StepResult], experiment_config: Dict[str, Any]) -> StepResult:
        # 2. Implement your logic in the 'run' method

        # You can access global config
        exp_name = experiment_config.get('name', 'Default Experiment')

        logger.info(f"Hello from the '{exp_name}' experiment!")
        logger.info(f"Custom message for this step: {self.message}")

        # 3. Return a StepResult
        return input_data  # Pass through the data to the next step
```
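Once defined, the custom step plugs into a pipeline like any pre-built one. A minimal usage sketch (assuming `name` is the only required `Experiment` argument):
```python
from atelierflow.experiment import Experiment

experiment = Experiment(name="Hello Demo")
experiment.add_step(HelloWorldStep(message="Bonjour!"))
experiment.run()
```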
## 🤝 Contributing
Contributions are welcome! Whether it's adding new pre-built steps, improving documentation, or reporting bugs, please feel free to open an issue or submit a pull request.
Raw data
{
"_id": null,
"home_page": null,
"name": "atelierflow",
"maintainer": null,
"docs_url": null,
"requires_python": ">=3.9",
"maintainer_email": null,
"keywords": "mlops, pipeline, machine-learning, orchestration, reproducibility",
"author": null,
"author_email": "Marcelo Brito <mhbcoelho99@gmail.com>",
"download_url": "https://files.pythonhosted.org/packages/81/b3/68391f592e4508c15311bc235dd98a4f6cb36a7a6acacac053468bf7a9a7/atelierflow-0.1.1.tar.gz",
"platform": null,
"description": "## AtelierFlow \ud83c\udfa8\n\nA lightweight, flexible Python framework for creating clear, reproducible, and scalable machine learning pipelines.\n\nAtelierFlow helps you structure your ML code into a series of modular, reusable steps. It's designed to bring clarity and standardization to your experimentation process, from data loading to model evaluation, without imposing a heavy, restrictive structure.\n## \u2728 Features\n\n- Modular by Design: Build pipelines by chaining together independent, reusable `Step` components.\n\n- Centralized Configuration: Easily manage global settings like `device` (CPU/GPU), `logging_level`, and custom `tags` for your entire experiment from one place.\n\n- Extensible Library: Use pre-built, common steps for I/O, preprocessing, training, evaluation and others or easily create your own.\n\n- Framework Agnostic: While providing helpers for libraries like scikit-learn, the core is lightweight and can orchestrate any Python-based ML workflow.\n\n- Clear & Explicit: The framework prioritizes readability and explicit dependencies, making your pipelines easy to understand, share, and debug.\n\n## \ud83d\ude80 Installation\n\nYou can install the core framework via pip. Optional dependencies can be installed to add support for specific ML libraries.\n\n```bash\n# Install the core framework\npip install atelierflow\n\n# To include support for scikit-learn based steps\npip install atelierflow[scikit-learn]\n\n# To include support for pytorch based steps\npip install atelierflow[torch]\n```\n\n## Quick Start\n\nHere\u2019s how to build and run a simple pipeline in just a few lines of code. This example uses pre-built steps to generate data, train a model, evaluate it, and save the results.\n\n```python\nimport logging\nfrom sklearn.ensemble import RandomForestClassifier\nfrom atelierflow.experiment import Experiment\nfrom atelierflow.steps.common.save_data.save_to_avro import SaveToAvroStep\n\n# Steps not implemented\nfrom atelierflow.steps.sklearn.evaluation import ClassificationEvaluationStep\nfrom atelierflow.steps.sklearn.training import TrainModelStep\n\n# 1. Define your components and schemas\nmodel_component = RandomForestClassifier(n_estimators=50)\nscores_schema = {'name': 'Scores', 'type': 'record', 'fields': [{'name': 'AUC', 'type': 'double'}]}\n\n# 2. Create an Experiment and configure it\nexperiment = Experiment(\n name=\"Quick Start Classification\",\n logging_level=\"INFO\",\n tags={\"project\": \"onboarding-example\"}\n)\n\n# 3. Add steps to the pipeline\nexperiment.add_step(GenerateDataStep())\nexperiment.add_step(TrainModelStep(model=model_component))\nexperiment.add_step(ClassificationEvaluationStep(metrics={'AUC': roc_auc_score}))\nexperiment.add_step(\n SaveToAvroStep(\n output_path=\"./quick_start_results.avro\",\n data_key='evaluation_scores',\n schema=scores_schema\n )\n)\n\n# 4. Run the experiment!\nif __name__ == \"__main__\":\n final_results = experiment.run()\n logging.info(f\"Pipeline complete. Results saved to ./quick_start_results.avro\")\n```\n\n\nAtelierFlow is built around a few simple, powerful concepts.\n\n- `Experiment`: The main orchestrator. You create an Experiment instance, give it a name, and provide global configurations. It is responsible for running the steps in the correct order.\n\n- `Step`: A single, executable stage in your pipeline. A step can do anything: load data, train a model, or save a file. 
Every step receives the output of the previous step and the global experiment configuration.\n\n- `StepResult`: A simple key-value store that acts as the data carrier between steps. A step adds its outputs (e.g., `result.add('trained_model', model)`) and the next step retrieves them (`input_data.get('trained_model')`).\n\n## \ud83d\udca1 Working with Pre-built Steps\n\nWhen using pre-built steps like `TrainModelStep` or `ClassificationEvaluationStep`, it's crucial that the objects you pass to them adhere to the framework's core interfaces.\n\n- Models must implement the `Model` interface. Any object passed to `TrainModelStep` must be a class that inherits from `atelierflow.core.model.Model`. This ensures the step can reliably call methods like `.fit()` and `.predict()`.\n\n- Metrics must implement the `Metric` interface. Similarly, any custom metric passed to an evaluation step must inherit from `atelierflow.core.metric.Metric` and implement the `.compute()` method.\n\nThis \"programming to an interface\" design is what gives AtelierFlow its flexibility. It allows the pre-built steps to work with any model or metric, as long as it follows the expected contract.\n\n## \ud83d\udee0\ufe0f Creating a Custom Step\n\nCreating your own step is the primary way to extend AtelierFlow. It's as simple as inheriting from the `Step` base class and implementing the `run` method.\n\n- Inherit from `Step`: Create a new class that inherits from `atelierflow.core.step.Step`.\n\n- Use `__init__` for Configuration: Pass any parameters your step needs to its constructor.\n\n- Implement `run`: This is where your logic goes. Use the `input_data` to get results from previous steps and use `experiment_config` to access global settings.\n\n- Return a `StepResult`: Your step must return a `StepResult` object, even if it's empty, to continue the pipeline.\n\n### Example: A Custom Hello World Step\n\n```python\nimport logging\nfrom atelierflow.core.step import Step\nfrom atelierflow.core.step_result import StepResult\nfrom typing import Dict, Any, Optional\n\nlogger = logging.getLogger(__name__)\n\nclass HelloWorldStep(Step):\n \"\"\"A simple example of a custom step.\"\"\"\n def __init__(self, message: str):\n # 1. Use __init__ for step-specific parameters\n self.message = message\n\n def run(self, input_data: Optional[StepResult], experiment_config: Dict[str, Any]) -> StepResult:\n # 2. Implement your logic in the 'run' method\n \n # You can access global config\n exp_name = experiment_config.get('name', 'Default Experiment')\n \n logger.info(f\"Hello from the '{exp_name}' experiment!\")\n logger.info(f\"Custom message for this step: {self.message}\")\n \n # 3. Return a StepResult\n return input_data # Pass through the data to the next step\n```\n## \ud83e\udd1d Contributing\n\nContributions are welcome! Whether it's adding new pre-built steps, improving documentation, or reporting bugs, please feel free to open an issue or submit a pull request.\n",
"bugtrack_url": null,
"license": "MIT License\n \n Copyright (c) 2024 Data & IoT Atelier\n \n Permission is hereby granted, free of charge, to any person obtaining a copy\n of this software and associated documentation files (the \"Software\"), to deal\n in the Software without restriction, including without limitation the rights\n to use, copy, modify, merge, publish, distribute, sublicense, and/or sell\n copies of the Software, and to permit persons to whom the Software is\n furnished to do so, subject to the following conditions:\n \n The above copyright notice and this permission notice shall be included in all\n copies or substantial portions of the Software.\n \n THE SOFTWARE IS PROVIDED \"AS IS\", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR\n IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,\n FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE\n AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER\n LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,\n OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE\n SOFTWARE.\n ",
"summary": "A lightweight Python framework for creating clear, reproducible, and scalable machine learning pipelines.",
"version": "0.1.1",
"project_urls": {
"Homepage": "https://github.com/IoTDataAtelier/pipeflow"
},
"split_keywords": [
"mlops",
" pipeline",
" machine-learning",
" orchestration",
" reproducibility"
],
"urls": [
{
"comment_text": null,
"digests": {
"blake2b_256": "60a0c635b33f73f8d1ecd0906724590755204a90a25e880fc3c1a09fb173e746",
"md5": "7aa582ddf77d097bf71717a81467c684",
"sha256": "fc08402bbe4e3713727ee20b5004a3415c52ca54b35de6cc7ec6c8391c79a3c0"
},
"downloads": -1,
"filename": "atelierflow-0.1.1-py3-none-any.whl",
"has_sig": false,
"md5_digest": "7aa582ddf77d097bf71717a81467c684",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.9",
"size": 10925,
"upload_time": "2025-08-03T20:49:32",
"upload_time_iso_8601": "2025-08-03T20:49:32.405367Z",
"url": "https://files.pythonhosted.org/packages/60/a0/c635b33f73f8d1ecd0906724590755204a90a25e880fc3c1a09fb173e746/atelierflow-0.1.1-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": null,
"digests": {
"blake2b_256": "81b368391f592e4508c15311bc235dd98a4f6cb36a7a6acacac053468bf7a9a7",
"md5": "540f4795a6749a215d0e66ebce47a4b4",
"sha256": "17fcca3d70a6feb879501d3bbe1756cb10c70d9cef4fc23c0edbbc37d1ddea61"
},
"downloads": -1,
"filename": "atelierflow-0.1.1.tar.gz",
"has_sig": false,
"md5_digest": "540f4795a6749a215d0e66ebce47a4b4",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.9",
"size": 98692479,
"upload_time": "2025-08-03T20:49:34",
"upload_time_iso_8601": "2025-08-03T20:49:34.804569Z",
"url": "https://files.pythonhosted.org/packages/81/b3/68391f592e4508c15311bc235dd98a4f6cb36a7a6acacac053468bf7a9a7/atelierflow-0.1.1.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2025-08-03 20:49:34",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "IoTDataAtelier",
"github_project": "pipeflow",
"travis_ci": false,
"coveralls": false,
"github_actions": true,
"requirements": [
{
"name": "apache-beam",
"specs": [
[
"==",
"2.58.1"
]
]
},
{
"name": "attrs",
"specs": [
[
"==",
"24.2.0"
]
]
},
{
"name": "certifi",
"specs": [
[
"==",
"2024.8.30"
]
]
},
{
"name": "charset-normalizer",
"specs": [
[
"==",
"3.3.2"
]
]
},
{
"name": "cloudpickle",
"specs": [
[
"==",
"2.2.1"
]
]
},
{
"name": "crcmod",
"specs": [
[
"==",
"1.7"
]
]
},
{
"name": "dill",
"specs": [
[
">=",
"0.3.1.1"
],
[
"<",
"0.3.2"
]
]
},
{
"name": "dnspython",
"specs": [
[
"==",
"2.6.1"
]
]
},
{
"name": "docopt",
"specs": [
[
"==",
"0.6.2"
]
]
},
{
"name": "fastavro",
"specs": [
[
"==",
"1.9.5"
]
]
},
{
"name": "fasteners",
"specs": [
[
"==",
"0.19"
]
]
},
{
"name": "grpcio",
"specs": [
[
"==",
"1.66.1"
]
]
},
{
"name": "hdfs",
"specs": [
[
"==",
"2.7.3"
]
]
},
{
"name": "httplib2",
"specs": [
[
"==",
"0.22.0"
]
]
},
{
"name": "idna",
"specs": [
[
"==",
"3.8"
]
]
},
{
"name": "joblib",
"specs": [
[
"==",
"1.4.2"
]
]
},
{
"name": "jsonpickle",
"specs": [
[
"==",
"3.2.2"
]
]
},
{
"name": "jsonschema",
"specs": [
[
"==",
"4.23.0"
]
]
},
{
"name": "jsonschema-specifications",
"specs": [
[
"==",
"2023.12.1"
]
]
},
{
"name": "numpy",
"specs": [
[
"==",
"1.26.4"
]
]
},
{
"name": "objsize",
"specs": [
[
"==",
"0.7.0"
]
]
},
{
"name": "orjson",
"specs": [
[
"==",
"3.10.7"
]
]
},
{
"name": "packaging",
"specs": [
[
"==",
"24.1"
]
]
},
{
"name": "proto-plus",
"specs": [
[
"==",
"1.24.0"
]
]
},
{
"name": "protobuf",
"specs": [
[
"==",
"4.25.8"
]
]
},
{
"name": "pyarrow",
"specs": [
[
"==",
"16.1.0"
]
]
},
{
"name": "pyarrow-hotfix",
"specs": [
[
"==",
"0.6"
]
]
},
{
"name": "pydot",
"specs": [
[
"==",
"1.4.2"
]
]
},
{
"name": "pymongo",
"specs": [
[
"==",
"4.8.0"
]
]
},
{
"name": "pyparsing",
"specs": [
[
"==",
"3.1.4"
]
]
},
{
"name": "python-dateutil",
"specs": [
[
"==",
"2.9.0.post0"
]
]
},
{
"name": "pytz",
"specs": [
[
"==",
"2024.1"
]
]
},
{
"name": "redis",
"specs": [
[
"==",
"5.0.8"
]
]
},
{
"name": "referencing",
"specs": [
[
"==",
"0.35.1"
]
]
},
{
"name": "regex",
"specs": [
[
"==",
"2024.7.24"
]
]
},
{
"name": "requests",
"specs": [
[
"==",
"2.32.4"
]
]
},
{
"name": "rpds-py",
"specs": [
[
"==",
"0.20.0"
]
]
},
{
"name": "scikit-learn",
"specs": [
[
"==",
"1.5.1"
]
]
},
{
"name": "scipy",
"specs": [
[
"==",
"1.14.1"
]
]
},
{
"name": "six",
"specs": [
[
"==",
"1.16.0"
]
]
},
{
"name": "threadpoolctl",
"specs": [
[
"==",
"3.5.0"
]
]
},
{
"name": "typing_extensions",
"specs": [
[
"==",
"4.12.2"
]
]
},
{
"name": "urllib3",
"specs": [
[
"==",
"2.5.0"
]
]
},
{
"name": "zstandard",
"specs": [
[
"==",
"0.23.0"
]
]
}
],
"lcname": "atelierflow"
}