# traintrack

- Name: traintrack
- Version: 0.1.6
- Home page: https://github.com/murnanedaniel/train-track
- Summary: A simple helper to run pipelines of PytorchLightning models
- Upload time: 2024-02-20 21:46:11
- Author: Daniel Murnane
- License: Apache License, Version 2.0
- Keywords: Machine Learning, MLOps, Pytorch, PytorchLightning, Lightning, Pipeline
- Requirements: no requirements were recorded
<div align="center">

<figure>
    <img src="https://raw.githubusercontent.com/murnanedaniel/train-track/master/docs/media/logo.png" width="250"/>
</figure>
    
# TrainTrack ML
### Quickly run stages of an ML pipeline from the command line

[Documentation](https://hsf-reco-and-software-triggers.github.io/Tracking-ML-Exa.TrkX/)

[![ci](https://github.com/murnanedaniel/train-track/actions/workflows/ci.yml/badge.svg)](https://github.com/murnanedaniel/train-track/actions/workflows/ci.yml) [![Code style: black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/psf/black)


</div>

Welcome to the repository and documentation for the TrainTrack library. Detailed documentation is coming very soon! In the meantime, see [here](https://hsf-reco-and-software-triggers.github.io/Tracking-ML-Exa.TrkX/) for documentation of the examples built with this library.

## Install

TrainTrack is most easily installed with pip:
```
pip install traintrack
```

## Objective

The aim of TrainTrack is simple: Given any set of self-contained [Pytorch Lightning](https://github.com/PyTorchLightning/pytorch-lightning) modules, run them in a serial and trackable way. 

At its heart, TrainTrack is nothing more than a loop over the stages defined in a `pipeline.yaml` configuration file. However, it can also handle data processing steps (i.e. non-trainable modules), automatically create grid scans over combinations of hyperparameters, log training with (currently) either TensorBoard or Weights & Biases, and run separate, dependent Slurm batch jobs. It also takes an opinionated approach to how data is passed from stage to stage, via Lightning callbacks. In this way, the only code that needs to be written is the Lightning modules themselves; all other boilerplate and tracking is handled by TrainTrack.
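
As a mental model, the core loop behaves roughly like the sketch below. This is a conceptual illustration only, not TrainTrack's actual implementation; in particular, the `model_registry` lookup and the way each stage config is located are assumptions made for the sake of the example.
```
# Conceptual sketch of the TrainTrack idea, not its real implementation.
# `model_registry` (name -> LightningModule class) and the config lookup
# are assumptions made purely for illustration.
import yaml
import pytorch_lightning as pl


def run_pipeline(pipeline_path, model_registry):
    with open(pipeline_path) as f:
        stages = yaml.safe_load(f)["stages"]

    for stage in stages:
        with open(stage["config"]) as f:
            hparams = yaml.safe_load(f)

        # Each stage is a self-contained LightningModule; callbacks attached
        # to the Trainer take care of writing outputs for the next stage.
        model = model_registry[stage["name"]](hparams)
        trainer = pl.Trainer(max_epochs=hparams.get("max_epochs", 1))
        trainer.fit(model)
```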

## Example

`traintrack` uses two ingredients to run and track your training pipeline: 
1. A project configuration file
2. A pipeline configuration file

It also makes a few assumptions about the structure of your project. A project called `MyFirstMNIST` should be structured as:
```
📦 MyFirstMNIST
┣ 📂 architectures
┣ 📂 notebooks
┣ 📂 configs
┃ ┣ 📜 project_config.yaml
┃ ┗ 📜 my_first_pipeline.yaml
┗ 📂 logs
```
**Note:** Only `configs/project_config.yaml` is required; everything else is configurable. An example `project_config.yaml`:
```
# project_config.yaml

# Location of libraries
libraries:
    model_library: architectures
    artifact_library: /my/checkpoint/directory
    

# The lines you would like/need in a batch script before the call to pipeline.py
custom_batch_setup:
    - conda activate my-favorite-environment
    
# If you need to set up some environment before a batch is submitted, define it here in order of commands to run
command_line_setup:
    - module load cuda
    
# If you need to run jobs serially, set to true
serial: False

# Which logger to use - options are Weights & Biases [wandb], TensorBoard [tb], or [None]
logger: wandb
```
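
For reference, the `logger` option corresponds to the usual Lightning loggers. The snippet below only illustrates that mapping; the helper function and its arguments are assumptions for illustration, not TrainTrack's internals.
```
# Rough illustration of how the `logger` option maps onto Lightning loggers.
# The helper below is an assumption for illustration, not TrainTrack's code.
from pytorch_lightning.loggers import TensorBoardLogger, WandbLogger


def make_logger(choice, project="MyFirstMNIST", save_dir="logs"):
    if choice == "wandb":
        return WandbLogger(project=project, save_dir=save_dir)
    if choice == "tb":
        return TensorBoardLogger(save_dir=save_dir, name=project)
    return None  # logger: None runs without an experiment logger
```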

We can launch a vanilla run of TrainTrack with 
```
traintrack configs/my_first_pipeline.yaml
```
This runs training and any inference callbacks directly in the terminal.


## A Pipeline

The pipeline config file defines a pipeline, for example:
```
# my_first_pipeline.yaml

stages:
    - {set: CNN, name: ResNet50, config: test_train.yaml}

```

which presumes a directory structure of:

```
📦 MyFirstMNIST
┣ 📂 architectures
┃ ┗ 📂 CNN
┃ ┃ ┣ 📜 cnn_base.py
┃ ┃ ┣ 📜 test_train.yaml
┃ ┃ ┗ 📂 Models
┃ ┃ ┃ ┗ 📜 resnet.py

```
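
To make the `set`/`name`/`config` keys concrete: the stage above points at a class called `ResNet50` under `architectures/CNN/Models/`, configured by `test_train.yaml`. A minimal sketch of what `resnet.py` might contain follows; the constructor signature and the hyperparameter names (`num_classes`, `lr`) are assumptions for illustration, not requirements imposed by TrainTrack.
```
# Hypothetical sketch of architectures/CNN/Models/resnet.py.
# The hyperparameter names and constructor signature are assumptions
# made for illustration only.
import pytorch_lightning as pl
import torch
import torch.nn.functional as F
import torchvision


class ResNet50(pl.LightningModule):
    def __init__(self, hparams):
        super().__init__()
        # Hyperparameters come from the stage config (test_train.yaml here)
        self.save_hyperparameters(hparams)
        self.model = torchvision.models.resnet50(num_classes=self.hparams["num_classes"])

    def forward(self, x):
        return self.model(x)

    def training_step(self, batch, batch_idx):
        x, y = batch
        loss = F.cross_entropy(self(x), y)
        self.log("train_loss", loss)
        return loss

    def configure_optimizers(self):
        return torch.optim.Adam(self.parameters(), lr=self.hparams["lr"])
```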

Again, see [this repository](https://hsf-reco-and-software-triggers.github.io/Tracking-ML-Exa.TrkX/tree/master/Pipelines/Common_Tracking_Example) for example pipelines in action.

<!-- ## Objectives

1. To abstract away the engineering required to run multiple stages of training and inference with combinations of hyperparameter configurations. [Pytorch Lightning](https://github.com/PyTorchLightning/pytorch-lightning) is used for this, and is a good start, but this library extends Lightning to multiple modules run in series in some dependent way.
2. To present a set of templates, best practices and results gathered from significant trial and error, to speed up the development of others in the domain of machine learning for high energy physics. We focus on applications specific to detector physics, but many tools can be applied to other areas, and these are collected in an application-agnostic way in the [Tools](https://hsf-reco-and-software-triggers.github.io/Tracking-ML-Exa.TrkX/tools/overview/) section.

### Disclaimer:

This repository has been functional, but ugly. It is moving to an "alpha" version which follows many conventions and should be considerably more stable and user-friendly. This transition is expected before May 2021. Please be a little patient if using before then, and if something is broken, pull first to make sure it's not already solved, then post an issue second.

## Intro

To start as quickly as possible, clone the repository, [Install](https://hsf-reco-and-software-triggers.github.io/Tracking-ML-Exa.TrkX/pipelines/quickstart) and follow the steps in [Quickstart](https://hsf-reco-and-software-triggers.github.io/Tracking-ML-Exa.TrkX/pipelines/quickstart). This will get you generating toy tracking data and running inference immediately. Many of the choices of structure will be made clear there. If you already have a particle physics problem in mind, you can apply the [Template](https://hsf-reco-and-software-triggers.github.io/Tracking-ML-Exa.TrkX/pipelines/choosingguide.md) that is most suitable to your use case.

Once up and running, you may want to consider more complex ML [Models](https://hsf-reco-and-software-triggers.github.io/Tracking-ML-Exa.TrkX/models/overview/). Many of these are built on other libraries (for example [Pytorch Geometric](https://github.com/rusty1s/pytorch_geometric)).

<div align="center">
<figure>
  <img src="https://raw.githubusercontent.com/HSF-reco-and-software-triggers/Tracking-ML-Exa.TrkX/master/docs/media/application_diagram_1.png" width="600"/>
</figure>
</div>

## Install

It's recommended to start a conda environment before installation:

```
conda create --name exatrkx-tracking python=3.8
conda activate exatrkx-tracking
pip install pip --upgrade
```

If you have a CUDA GPU available, load the toolkit or [install it](https://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html) now. You should check that this is done by running `nvcc --version`. Then, running:

```
python install.py
```

will **attempt** to negotiate a path through the packages required, using `nvcc --version` to automatically find the correct wheels. 

You should be ready for the [Quickstart](https://hsf-reco-and-software-triggers.github.io/Tracking-ML-Exa.TrkX/pipelines/quickstart)!

If this doesn't work, you can step through the process manually:

<table style="border: 1px solid gray; border-collapse: collapse">
<tr style="border-bottom: 1px solid gray">
<th style="border-bottom: 1px solid gray"> CPU </th>
<th style="border-left: 1px solid gray"> GPU </th>
</tr>
<tr>
<td style="border-bottom: 1px solid gray">

1. Run 
`export CUDA=cpu`
    
</td>
<td style="border-left: 1px solid gray">

1a. Find the GPU version cuda XX.X with `nvcc --version`
    
1b. Run `export CUDA=cuXXX`, with `XXX = 92, 101, 102, 110`

</td>
</tr>
<tr style="border-bottom: 1px solid gray">
<td colspan="2">

2. Install Pytorch and dependencies 

```
    pip install --user -r requirements.txt
```

</td>
</tr>
<tr style="border-bottom: 1px solid gray">
<td colspan="2">

3. Install local packages

```pip install -e .```
    
</td>
</tr>
<tr>
<td style="border-bottom: 1px solid gray">

4. Install CPU-optimized packages

```
pip install faiss-cpu
pip install "git+https://github.com/facebookresearch/pytorch3d.git@stable" 
``` 
    
    
</td>
<td style="border-left: 1px solid gray">

    
4. Install GPU-optimized packages

```pip install faiss-gpu cupy-cudaXXX```, with `XXX` as in step 1b

```
pip install pytorch3d -f https://dl.fbaipublicfiles.com/pytorch3d/packaging/wheels/py3{Y}_cu{XXX}_pyt{ZZZ}/download.html
```
    
where `{Y}` is the minor version of Python 3.{Y}, `{XXX}` is as above, and `{ZZZ}` is the PyTorch version {Z.Z.Z}.

e.g. `py36_cu101_pyt170` is Python 3.6, CUDA 10.1, PyTorch 1.7.0.
   
    
</td>
</tr>
</table>

### Vintage Errors

A common error is
```
OSError: libcudart.so.XX.X: cannot open shared object file: No such file or directory
```
This indicates a mismatch between CUDA versions. Identify the library that called the error, and ensure there are no versions of this library installed in parallel, e.g. from a previous `pip --user` install. -->

            
