retrain-pipelines

- Name: retrain-pipelines
- Version: 0.1.1
- Summary: retrain-pipelines lowers the barrier to entry for the creation and management of professional machine learning retraining pipelines.
- Upload time: 2024-11-04 12:35:35
- Author / Maintainer: Aurelien-Morgan
- Requires Python: >=3.3
- License: Apache License 2.0
- Keywords: machine-learning, ml-pipelines, retrain-pipelines, model-retraining, automl, mlops, model-versioning, model-blessing, inference-pipeline, docker-deployment, data-preprocessing, hyperparameter-tuning, model-performance, pipeline-documentation, eda, exploratory-data-analysis, continuous-integration, continuous-deployment, ci-cd, model-monitoring, pipeline-customization, pipeline-templates, open-source
- Requirements: No requirements were recorded.

![PyPI - Downloads](https://img.shields.io/pypi/dm/retrain-pipelines)
![GitHub - License](https://img.shields.io/github/license/aurelienmorgan/retrain-pipelines?logo=github&style=flat&color=green)

![logo_large](https://github.com/user-attachments/assets/19725866-13f9-48c1-b958-35c2e014351a)

<b>retrain-pipelines</b> simplifies the creation and management of machine learning retraining pipelines. 
The package is designed to remove the complexity of building end-to-end ML retraining pipelines, allowing users to focus on their data and model architecture. 
With pre-built, highly adaptable pipeline examples that work out of the box, users can easily integrate their own data and begin retraining models with minimal-to-no setup. 

### Key features of retrain-pipelines include:
- **Model version blessing**: Automatically compare the performance of retrained models against previous best versions to ensure only superior models are deployed (a conceptual sketch follows this list).
- **Infrastructure validation**: Each retraining pipeline includes inference pipeline packaging, local Docker container deployment, and request/response validation to ensure that models are production-ready.
- **Comprehensive documentation**: Every retraining pipeline is fully documented with sections covering Exploratory Data Analysis (EDA), hyperparameter tuning, retraining steps, model performance metrics, and key commands for retrieving training artifacts. 
  Additionally, DAG information for the retraining process is readily available for pipeline transparency and debugging.
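
To make the model-blessing idea concrete, here is a minimal conceptual sketch; the function and argument names are hypothetical, not the package's actual API:

```python
# Conceptual sketch of "model version blessing" -- hypothetical names,
# not the package's actual API. The candidate model is only promoted
# when it beats the best previously-blessed version on the eval metric.
def bless_candidate(candidate_metric: float,
                    best_previous_metric: float,
                    higher_is_better: bool = True) -> bool:
    """Return True when the candidate may replace the current best model."""
    if higher_is_better:
        return candidate_metric > best_previous_metric
    return candidate_metric < best_previous_metric

# e.g. candidate RMSE 0.42 vs. previous best 0.45 (lower is better) -> blessed
assert bless_candidate(0.42, 0.45, higher_is_better=False)
```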

In essence, <b>retrain-pipelines</b> offers a seamless solution: "Come with your data and it works" with the added benefit of flexibility for more advanced users to adjust and extend pipelines as needed.

### Customizability & Adaptability
<b>retrain-pipelines</b> offers a high degree of flexibility, allowing users to tailor the pre-shipped pipelines to their specific needs:
- **Custom Preprocessing Functions**: Users can provide their own Python functions for custom data preprocessing. For example, some built-in pipelines for tabular data allow optional bucketization of numerical features by name, but you can easily modify or extend these preprocessing steps to suit your dataset and feature requirements (see the sketch after this list).
- **Custom Pipeline Card Generation**: You can specify custom Python functions to generate pipeline cards, such as including specific performance charts or metrics relevant to your use case.
- **Custom HTML Templates**: For further personalization, retrain-pipelines supports customizable HTML templates, enabling you to adjust formatting, insert specific charts, change page colors, or even add your company's logo to documentation pages. 
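
As an illustration of the preprocessing hook described above, here is a hedged sketch; the function signature and the quantile-based bucketization are assumptions for illustration, not the package's exact contract:

```python
# Illustrative custom-preprocessing hook -- the signature and the
# quantile-based bucketization are assumptions for illustration,
# not the exact contract expected by the package.
from typing import Dict, Optional

import pandas as pd

def preprocess(
    df: pd.DataFrame,
    buckets: Optional[Dict[str, int]] = None,
) -> pd.DataFrame:
    """Optionally bucketize named numerical features into quantile bins."""
    df = df.copy()
    for col, n_bins in (buckets or {}).items():
        df[col] = pd.qcut(df[col], q=n_bins, labels=False, duplicates="drop")
    return df

# e.g. bucketize the (hypothetical) "age" column into 4 quantile bins :
# df = preprocess(df, buckets={"age": 4})
```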

<b>retrain-pipelines</b> doesn't just streamline the retraining process, it empowers teams to innovate faster, iterate smarter, and deploy more robust models with confidence. Whether you're looking for an out-of-the-box solution or a highly customizable pipeline, <b>retrain-pipelines</b> is your ultimate companion for continuous model improvement.


## Getting Started

You can trigger a <b>retrain-pipelines</b> launch from many different places.

[local_launcher.webm](https://github.com/user-attachments/assets/4164abfd-4cd6-4e8a-a720-07267241b9f6)
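
For example, assuming a Jupyter environment, the cell magic shown later in this README launches a run locally&nbsp;:

```python
# from a notebook cell -- launch a local retraining run
%retrain_pipelines_local retraining_pipeline.py run
```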


## Sample pipelines

The <b>retrain-pipelines</b> package comes with off-the-shelf Machine Learning retraining pipelines. Find them at <code><a href="https://github.com/aurelienmorgan/retrain-pipelines/tree/master/sample_pipelines" target="_blank">/sample_pipelines</a></code>. For instance&nbsp;:

| framework | modality | task | model lib | Serving |  |
|----------|----------|----------|----------|----------|--|
| <a href="https://metaflow.org/" target="_blank">Metaflow</a> <img src="https://github.com/user-attachments/assets/30f4f382-3032-4bf7-b697-f6dbcab35fd7" height=20px /> | Tabular   | regression   | <a href="https://www.dask.org/" target="_blank">Dask</a> <img src="https://github.com/user-attachments/assets/a94807e7-cc67-4415-9a9e-da1ed4755cb1" width=20px /> / <a href="https://lightgbm.readthedocs.io/en/stable/" target="_blank">LightGBM</a> <img src="https://github.com/user-attachments/assets/92ac0b53-17f8-470d-9c73-619657db42bd" width=20px />   | <a href="https://www.seldon.io/solutions/seldon-mlserver" target="_blank">ML Server</a> <img src="https://github.com/user-attachments/assets/69c57bce-cd38-4f8c-8730-e5171e842d13" width=20px /> | <b><a href="https://github.com/aurelienmorgan/retrain-pipelines/tree/master/sample_pipelines/LightGBM_hp_cv_WandB" target="_blank">starter-kit</a></b> |
| <a href="https://metaflow.org/" target="_blank">Metaflow</a> <img src="https://github.com/user-attachments/assets/30f4f382-3032-4bf7-b697-f6dbcab35fd7" height=20px /> | Tabular   | classification | <a href="https://pytorch.org/" target="_blank">Pytorch</a> <img src="https://github.com/user-attachments/assets/bfa9b38e-e9b3-41ff-8370-e64a0a0a4a93" width=20px /> / <a href="https://github.com/dreamquark-ai/tabnet/tree/develop" target="_blank">TabNet</a> | <a href="https://pytorch.org/serve/" target="_blank">TorchServe</a> | <b><a href="https://github.com/aurelienmorgan/retrain-pipelines/tree/master/sample_pipelines/TabNet_hp_cv_WandB" target="_blank">starter-kit</a></b> |

You can simply give one of those your data and it just runs. The only manual change you need to make concerns the endpoint request &amp; serving signatures, since those are purposely hard-coded.<br />
<small>Indeed, the <code>infra_validator</code> step is here to ensure that <u>your inference pipeline</u> (the one you're building a continuous-retraining automation for) keeps adhering to the schema expected by consumers of the inference endpoint. So, if you break the format of the required input raw data, you need to create a new retraining pipeline and assign it a new unique name. This is to ensure that any interface disruption between the inference endpoint and its consumer(s) is intentional.</small>
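
For illustration only, a hard-coded request signature might look like the following; the feature names and structure are hypothetical, each sample pipeline defines its own:

```python
# Hypothetical (illustration-only) request payload for a tabular endpoint.
# Each sample pipeline hard-codes its own signature; consumers rely on it.
request_payload = {
    "inputs": [
        {"feature_a": 1.5, "feature_b": 0.3, "feature_c": "category_x"},
    ]
}
# Changing this schema means creating a new, differently-named
# retraining pipeline (see the note above).
```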

## Some important markers

One of the things that makes <b>retrain-pipelines</b> stand out is its focus on strong MLOps fundamentals.

<details>
  <summary>model blessing&nbsp;🔽</summary>
<b>retrain-pipelines</b> ensures that the newly retrained model version is evaluated against previous model versions from the same retraining pipeline. We thereby make sure that no lesser-performing model ever gets into production.<br />
Default sample pipelines each come with built-in evaluation criteria, but you can customize those per your own requirements. For instance, you can include an evaluation of model performance on a particular sub-population, to serve as a gateway against potential incoming biases, as sketched below.
<hr width=60% />
</details>

<details>
  <summary>infrastructure validation&nbsp;🔽</summary>
<b>retrain-pipelines</b> ensures the inference endpoint is tested prior to deployment. We package the preprocessing engine together with the newly retrained (and blessed) model version into the ML server of choice and deploy it locally. We then send an inference request to that temporary endpoint and check for a <code>200 HTTP OK</code> response with a valid payload format, as sketched below.
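
A minimal sketch of that smoke test, assuming a hypothetical local endpoint URL and payload schema:

```python
# Minimal smoke-test sketch -- the endpoint URL and payload schema are
# hypothetical; the actual ones depend on the ML server and the pipeline.
import requests

resp = requests.post(
    "http://localhost:8080/v2/models/my_model/infer",  # local temp deployment
    json={"inputs": [{"feature_a": 1.5, "feature_b": 0.3}]},
    timeout=10,
)
assert resp.status_code == 200, f"unexpected status: {resp.status_code}"
assert "predictions" in resp.json(), "response payload missing 'predictions'"
```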
<hr width=60% />
</details>

<details>
  <summary>pipeline cards&nbsp;🔽</summary>
<b>retrain-pipelines</b> is strongly opinionated about quick access to the information ML engineers care about when it comes to retraining and serving.<br />
That's why it offers a central place, with a minimal number of clicks, to navigate efficiently.
<table width=100%>
  <tr width=100%>
    <td>
      <a href="https://github.com/user-attachments/assets/fc4b94a5-8178-49b0-822a-a8088dbf1b6d" target="_blank"><img src="https://github.com/user-attachments/assets/fc4b94a5-8178-49b0-822a-a8088dbf1b6d" width=100 height=80 /></a><br />
      overview
    </td>
    <td>
      <a href="https://github.com/user-attachments/assets/b5ce6b19-df87-4486-ac71-3cf06776452e" target="_blank"><img src="https://github.com/user-attachments/assets/b5ce6b19-df87-4486-ac71-3cf06776452e" width=100 height=80 /></a><br />
      EDA
    </td>
    <td>
      <a href="https://github.com/user-attachments/assets/34d401b2-ad79-49e3-b07f-f6fb61418ea1" target="_balnk"><img src="https://github.com/user-attachments/assets/34d401b2-ad79-49e3-b07f-f6fb61418ea1" width=100 height=80 /></a><br />
      overall retraining
    </td>
  </tr>
  <tr>
    <td>
      <a href="https://github.com/user-attachments/assets/560aa7a6-c7ad-4dce-97e8-b1f8cee8edae" target="_blank"><img src="https://github.com/user-attachments/assets/a1aa13c1-2401-4527-b5eb-a8dc2ddca195" width=100 height=80 /></a><br />
      hyperparameter tuning
    </td>
    <td>
      <a href="https://github.com/user-attachments/assets/bf0e0e3f-a442-415d-bb79-104afba3f519" target="_blank"><img src="https://github.com/user-attachments/assets/bf0e0e3f-a442-415d-bb79-104afba3f519" width=100 height=80 /></a><br />
     key artifacts
    </td>
    <td>
      <a href="https://github.com/user-attachments/assets/d6d6c645-be5a-4b3b-9abf-339e0b034703" target="_blank"><img src="https://github.com/user-attachments/assets/35ddeb91-81c8-4caa-b17f-6704aae22410" width=100 height=80 /></a><br />
      pipeline DAG
    </td>
  </tr>
  <tr>
    <td colspan="3">
      <em><small>click thumbnails to enlarge</small></em>
    </td>
  </tr>
</table>
Browse a live example for yourself <a href="https://retrain-pipelines.w3spaces.com/html-custom-2d5ac4812402cf8726619e81d8cc6c8f0ba94c24.html" target="_blank">here on W3Schools Spaces</a>
(click "continue" on the W3Schools landing page)
<hr width=60% />
</details>

<details>
  <summary>Third-party integrations&nbsp;🔽</summary>
TensorBoard, PyTorch Profiler, Weights and Biases: <b>retrain-pipelines</b> aims at making the information ML engineers care about centrally available.

  <details>
  <summary>illustration with <code>WandB</code> in the <code>LightGBM_hp_cv_WandB</code> sample pipeline&nbsp;🔽</summary>
  In the <code>LightGBM_hp_cv_WandB</code> sample pipeline, for instance, you can find guidance on how to view details of the logging performed during the different <code>training_job</code> steps of a given run. Follow the guidance in the video below&nbsp;:<br />

  [wandb_integration.webm](https://github.com/user-attachments/assets/730bc695-0768-484b-8e6e-2dbf0db08d68)
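
  For reference, logging inside a <code>training_job</code> step relies on the standard <code>wandb</code> API; a minimal sketch (project and run names are illustrative)&nbsp;:

  ```python
  # Standard wandb usage inside a training job -- project/run names are
  # illustrative; metrics logged this way show up in the WandB run page.
  import wandb

  run = wandb.init(project="LightGBM_hp_cv_WandB", name="training_job-fold0")
  for epoch, val_loss in enumerate([0.9, 0.7, 0.6]):
      wandb.log({"epoch": epoch, "val_loss": val_loss})
  run.finish()
  ```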
  </details>
  <hr width=60% />
</details>

<details>
  <summary>customizability&nbsp;🔽</summary>
  As alluded to <a href="#customizability--adaptability">above</a>, ML engineers are given a lot of room to customize <b>retrain-pipelines</b> workflows.<br />
  For starters, the sample pipelines are themselves freely modifiable. But it goes far beyond that: one can go deep into customization, with the defaults for <code>preprocessing</code> and for <code>pipeline_card</code> being fully amendable as well.

  <details>
    <summary>illustration with the <code>LightGBM_hp_cv_WandB</code> sample pipeline&nbsp;🔽</summary>
    Start by getting the default(s) you'd like to customize (any combination of the 3 below)&nbsp;:
    <ul>
      <li><code>preprocessing.py</code> module</li>
      <li><code>pipeline_card.py</code> module</li>
      <li><code>template.html</code> html template</li>
    </ul>

  ```shell
  cd sample_pipelines/LightGBM_hp_cv_WandB/
  ```
  ```python
  from retraining_pipeline import LightGbmHpCvWandbFlow

  LightGbmHpCvWandbFlow.copy_default_preprocess_module(".", exists_ok=True)
  LightGbmHpCvWandbFlow.copy_default_pipeline_card_module(".", exists_ok=True)
  LightGbmHpCvWandbFlow.copy_default_pipeline_card_html_template(".", exists_ok=True)
  ```
  Once you've updated any of them, you can launch a <b>retrain-pipelines</b> run that uses them&nbsp;:
  ```python
  %retrain_pipelines_local retraining_pipeline.py run \
    --pipeline_card_artifacts_path "." \
    --preprocess_artifacts_path "."
  ```
  </details>
  <hr width=60% />
</details>


## retrain-pipelines inspectors

Inspectors are convenience methods that abstract away some of the logic required to access artifacts logged during <b>retrain-pipelines</b> runs.

For instance&nbsp;:
<ul>
  <li>
  <details>
    <summary><code>browse_local_pipeline_card</code>&nbsp;🔽</summary>
    With this convenience method, programmatically open a <code>pipeline_card</code> without having to browse and click through an ML-framework UI&nbsp;:<br />

  ```python
  from retrain_pipelines.inspectors import browse_local_pipeline_card
  
  browse_local_pipeline_card(mf_flow_name)
  ```
  This opens the <code>pipeline_card</code> in a web-browser tab, so you don't have to look for it.
  It's ideal for quick ideation during the drafting phase&nbsp;:
  developers can now <code>run/resume</code> &amp; <code>browse</code> in a chain of instructions, as sketched below.
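
  A sketch of that chained notebook workflow, reusing the names above&nbsp;:

  ```python
  # notebook cell -- retrain locally, then immediately open the fresh card
  %retrain_pipelines_local retraining_pipeline.py run
  from retrain_pipelines.inspectors import browse_local_pipeline_card
  browse_local_pipeline_card(mf_flow_name)  # mf_flow_name: your flow's name
  ```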
  <hr width=60% />
  </details>
  </li>
  <li>
  <details>
    <summary><code>get_execution_source_code</code>&nbsp;🔽</summary>
    With this convenience method, programmatically access the versioned source code that was used for a particular <b>retrain-pipelines</b> run. This comes together with the <b>WandB integration</b>&nbsp;:<br />

  ```python
  from retrain_pipelines.inspectors import get_execution_source_code
  
  for source_code_artifact in get_execution_source_code(mf_run_id=<your_flow_id>):
    print(f" - {source_code_artifact.name} {source_code_artifact.url}")
  ```
  You can even have those artifacts downloaded on the go&nbsp;:

  ```python
  from retrain_pipelines.inspectors import explore_source_code
  # download and open file explorer
  explore_source_code(mf_run_id=<your_flow_id>)
  ```
  <hr width=60% />
  </details>
  </li>
  <li>
  <details>
    <summary><code>plot_run_all_cv_tasks</code>&nbsp;🔽</summary>
  Specific to <b>retrain-pipelines</b> runs that involve data-parallelism,
  this <b>inspector</b> method plots each individual hyperparameter-tuning cross-validation training job, showing details for every data-parallel worker.<br />
  For example, for executions of the <code><a href="https://github.com/aurelienmorgan/retrain-pipelines/tree/master/sample_pipelines/LightGBM_hp_cv_WandB" target="_blank">LightGbmHpCvWandbFlow</a></code> sample pipeline (which employs <b>Dask</b> for data-parallel training), this gives&nbsp;:<br />

  ```python
  from retrain_pipelines.inspectors.hp_cv_inspector import plot_run_all_cv_tasks
  
  plot_run_all_cv_tasks(mf_run_id=<your_flow_id>)
  ```
  with results looking like the below for a run with 6 different sets of hyperparameter values, 2 cross-validation folds, and 4 Dask data-parallel workers&nbsp;:<br />
  <a href="https://github.com/user-attachments/assets/f3c03b06-a086-4be5-9815-73d1a887179d" target="_blank"><img src="https://github.com/user-attachments/assets/f3c03b06-a086-4be5-9815-73d1a887179d" width=400/></a>
  <hr width=60% />
  </details>
  </li>
  <li>
    and more.
  </li>
</ul>

# launch tests
    pytest -s tests

# build from source
    python -m build --verbose pkg_src
# install from source (dev mode)
    pip install -e pkg_src
# install from remote source
    pip install git+https://github.com/aurelienmorgan/retrain-pipelines.git@master#subdirectory=pkg_src

# PyPI
Find us @ https://pypi.org/project/retrain-pipelines/
<br />
<hr />

[![GitHub Stars](https://img.shields.io/github/stars/aurelienmorgan/retrain-pipelines.svg?style=social&label=-%C2%A0retrain-pipelines%C2%A0-&maxAge=2592000)](https://github.com/aurelienmorgan/retrain-pipelines/stargazers)<br />
Please consider dropping us a star! ⭐
<br />
<hr />

            
