# atelierflow

- Name: atelierflow
- Version: 0.1.1
- Summary: A lightweight Python framework for creating clear, reproducible, and scalable machine learning pipelines.
- Author email: Marcelo Brito <mhbcoelho99@gmail.com>
- Homepage: https://github.com/IoTDataAtelier/pipeflow
- Upload time: 2025-08-03 20:49:34
- Requires Python: >=3.9
- License: MIT License. Copyright (c) 2024 Data & IoT Atelier. Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
- Keywords: mlops, pipeline, machine-learning, orchestration, reproducibility
- Requirements: apache-beam, attrs, certifi, charset-normalizer, cloudpickle, crcmod, dill, dnspython, docopt, fastavro, fasteners, grpcio, hdfs, httplib2, idna, joblib, jsonpickle, jsonschema, jsonschema-specifications, numpy, objsize, orjson, packaging, proto-plus, protobuf, pyarrow, pyarrow-hotfix, pydot, pymongo, pyparsing, python-dateutil, pytz, redis, referencing, regex, requests, rpds-py, scikit-learn, scipy, six, threadpoolctl, typing_extensions, urllib3, zstandard
## AtelierFlow 🎨

A lightweight, flexible Python framework for creating clear, reproducible, and scalable machine learning pipelines.

AtelierFlow helps you structure your ML code into a series of modular, reusable steps. It's designed to bring clarity and standardization to your experimentation process, from data loading to model evaluation, without imposing a heavy, restrictive structure.
## ✨ Features

- Modular by Design: Build pipelines by chaining together independent, reusable `Step` components.

- Centralized Configuration: Easily manage global settings like `device` (CPU/GPU), `logging_level`, and custom `tags` for your entire experiment from one place (see the sketch after this list).

- Extensible Library: Use pre-built steps for common tasks such as I/O, preprocessing, training, and evaluation, or easily create your own.

- Framework Agnostic: While providing helpers for libraries like scikit-learn, the core is lightweight and can orchestrate any Python-based ML workflow.

- Clear & Explicit: The framework prioritizes readability and explicit dependencies, making your pipelines easy to understand, share, and debug.
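
For example, here is a minimal sketch of what centralized configuration looks like in practice. The `name`, `logging_level`, and `tags` arguments mirror the Quick Start below; treating `device` as a constructor keyword is an assumption made for illustration.

```python
from atelierflow.experiment import Experiment

# Global settings are declared once on the Experiment and reach every step
# through `experiment_config` (see the custom step example further down).
# NOTE: `device` as a constructor keyword is an assumption; `name`,
# `logging_level`, and `tags` follow the Quick Start example.
experiment = Experiment(
  name="Centralized Config Demo",
  device="cuda",  # assumed keyword for CPU/GPU selection
  logging_level="DEBUG",
  tags={"team": "data-atelier", "stage": "prototype"},
)
```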

## 🚀 Installation

You can install the core framework via pip. Optional dependencies can be installed to add support for specific ML libraries.

```bash
# Install the core framework
pip install atelierflow

# To include support for scikit-learn based steps
pip install atelierflow[scikit-learn]

# To include support for pytorch based steps
pip install atelierflow[torch]
```

## Quick Start

Here’s how to build and run a simple pipeline in just a few lines of code. This example uses pre-built steps to generate data, train a model, evaluate it, and save the results.

```python
import logging
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from atelierflow.experiment import Experiment
from atelierflow.steps.common.save_data.save_to_avro import SaveToAvroStep

# Note: the sklearn steps below are illustrative and may not be implemented yet
from atelierflow.steps.sklearn.evaluation import ClassificationEvaluationStep
from atelierflow.steps.sklearn.training import TrainModelStep

# GenerateDataStep import path is assumed; adjust it to your installation
from atelierflow.steps.common.generate_data import GenerateDataStep

# 1. Define your components and schemas
model_component = RandomForestClassifier(n_estimators=50)
scores_schema = {'name': 'Scores', 'type': 'record', 'fields': [{'name': 'AUC', 'type': 'double'}]}

# 2. Create an Experiment and configure it
experiment = Experiment(
  name="Quick Start Classification",
  logging_level="INFO",
  tags={"project": "onboarding-example"}
)

# 3. Add steps to the pipeline
experiment.add_step(GenerateDataStep())
experiment.add_step(TrainModelStep(model=model_component))
experiment.add_step(ClassificationEvaluationStep(metrics={'AUC': roc_auc_score}))
experiment.add_step(
  SaveToAvroStep(
    output_path="./quick_start_results.avro",
    data_key='evaluation_scores',
    schema=scores_schema
  )
)

# 4. Run the experiment!
if __name__ == "__main__":
  final_results = experiment.run()
  logging.info("Pipeline complete. Results saved to ./quick_start_results.avro")
```
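
If you want to inspect the results afterwards, the Avro file can be read back with `fastavro` (one of the package's dependencies). A minimal sketch, assuming the pipeline wrote records that follow `scores_schema`:

```python
from fastavro import reader

# Read back the records written by SaveToAvroStep (field names follow scores_schema).
with open("./quick_start_results.avro", "rb") as handle:
    for record in reader(handle):
        print(record)  # e.g. {'AUC': 0.97}
```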


## Core Concepts

AtelierFlow is built around a few simple, powerful concepts.

- `Experiment`: The main orchestrator. You create an Experiment instance, give it a name, and provide global configurations. It is responsible for running the steps in the correct order.

- `Step`: A single, executable stage in your pipeline. A step can do anything: load data, train a model, or save a file. Every step receives the output of the previous step and the global experiment configuration.

- `StepResult`: A simple key-value store that acts as the data carrier between steps. A step adds its outputs (e.g., `result.add('trained_model', model)`) and the next step retrieves them (`input_data.get('trained_model')`), as sketched below.
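
The sketch below shows two steps passing data through a `StepResult`. The import paths and the `run` signature mirror the custom-step example later in this README; the no-argument `StepResult()` constructor is an assumption.

```python
from typing import Any, Dict, Optional

from atelierflow.core.step import Step
from atelierflow.core.step_result import StepResult


class LoadNumbersStep(Step):
  """Hypothetical step that produces data for downstream steps."""
  def run(self, input_data: Optional[StepResult], experiment_config: Dict[str, Any]) -> StepResult:
    result = StepResult()  # assumed no-argument constructor
    result.add('numbers', [1, 2, 3, 4])  # expose output under a key
    return result


class SumNumbersStep(Step):
  """Hypothetical step that consumes the previous step's output."""
  def run(self, input_data: Optional[StepResult], experiment_config: Dict[str, Any]) -> StepResult:
    numbers = input_data.get('numbers')  # retrieve by the same key
    result = StepResult()
    result.add('total', sum(numbers))
    return result
```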

## 💡 Working with Pre-built Steps

When using pre-built steps like `TrainModelStep` or `ClassificationEvaluationStep`, it's crucial that the objects you pass to them adhere to the framework's core interfaces.

- Models must implement the `Model` interface. Any object passed to `TrainModelStep` must be a class that inherits from `atelierflow.core.model.Model`. This ensures the step can reliably call methods like `.fit()` and `.predict()`.

- Metrics must implement the `Metric` interface. Similarly, any custom metric passed to an evaluation step must inherit from `atelierflow.core.metric.Metric` and implement the `.compute()` method.

This "programming to an interface" design is what gives AtelierFlow its flexibility. It allows the pre-built steps to work with any model or metric, as long as it follows the expected contract; a sketch of both interfaces follows below.
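
Here is a minimal, hypothetical sketch of both interfaces: a thin adapter that lets a scikit-learn estimator satisfy `Model`, and a custom AUC metric built on `Metric`. The base-class import paths and the `.fit()`/`.predict()`/`.compute()` method names come from the text above; their exact signatures are assumptions.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score

from atelierflow.core.metric import Metric
from atelierflow.core.model import Model


class SklearnModelWrapper(Model):
  """Hypothetical adapter exposing a scikit-learn estimator through the Model interface."""
  def __init__(self, estimator):
    self.estimator = estimator

  def fit(self, X, y):
    self.estimator.fit(X, y)
    return self

  def predict(self, X):
    return self.estimator.predict(X)


class AUCMetric(Metric):
  """Hypothetical metric implementing the compute() contract via roc_auc_score."""
  def compute(self, y_true, y_pred):
    return roc_auc_score(y_true, y_pred)


# With the pre-built steps, usage could then look like:
# TrainModelStep(model=SklearnModelWrapper(RandomForestClassifier(n_estimators=50)))
# ClassificationEvaluationStep(metrics={'AUC': AUCMetric()})
```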

## 🛠️ Creating a Custom Step

Creating your own step is the primary way to extend AtelierFlow. It's as simple as inheriting from the `Step` base class and implementing the `run` method.

- Inherit from `Step`: Create a new class that inherits from `atelierflow.core.step.Step`.

- Use `__init__` for Configuration: Pass any parameters your step needs to its constructor.

- Implement `run`: This is where your logic goes. Use the `input_data` to get results from previous steps and use `experiment_config` to access global settings.

- Return a `StepResult`: Your step must return a `StepResult` object, even if it's empty, to continue the pipeline.

### Example: A Custom Hello World Step

```python
import logging
from atelierflow.core.step import Step
from atelierflow.core.step_result import StepResult
from typing import Dict, Any, Optional

logger = logging.getLogger(__name__)

class HelloWorldStep(Step):
  """A simple example of a custom step."""
  def __init__(self, message: str):
    # 1. Use __init__ for step-specific parameters
    self.message = message

  def run(self, input_data: Optional[StepResult], experiment_config: Dict[str, Any]) -> StepResult:
    # 2. Implement your logic in the 'run' method
    
    # You can access global config
    exp_name = experiment_config.get('name', 'Default Experiment')
    
    logger.info(f"Hello from the '{exp_name}' experiment!")
    logger.info(f"Custom message for this step: {self.message}")
    
    # 3. Return a StepResult
    return input_data # Pass through the data to the next step
```
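
Once defined, a custom step plugs into a pipeline exactly like the pre-built ones. A minimal usage sketch, reusing the `Experiment` constructor arguments shown in the Quick Start:

```python
from atelierflow.experiment import Experiment

# Wire the custom step into an experiment and run it.
experiment = Experiment(name="Hello World Demo", logging_level="INFO")
experiment.add_step(HelloWorldStep(message="Nice to meet you!"))
experiment.run()
```
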
## 🤝 Contributing

Contributions are welcome! Whether it's adding new pre-built steps, improving documentation, or reporting bugs, please feel free to open an issue or submit a pull request.

            
