cube-dl

Name	cube-dl JSON
Version	0.3.10 JSON
	download
home_page	None
Summary	"The last stop" for training your deep learning models. Manage tons of configurations and experiments with minimal changes to existing code.
upload_time	2024-05-24 09:39:28
maintainer	None
docs_url	None
author	Alive1024
requires_python	<4.0,>=3.10
license	GPL-3.0-or-later
keywords	python data-science machine-learning deep-learning python3 pytorch pytorch-lightning
VCS
bugtrack_url
requirements	No requirements were recorded.
Travis-CI	No Travis.
coveralls test coverage	No coveralls.

            # cube-dl

Languages: English | [简体中文](./docs/README_zh-CN.md)

**"The last stop" for training your deep learning models.**

**Manage tons of configurations and experiments with minimal changes to existing code.**


[![Packaging Wheel](https://github.com/Alive1024/cube-dl/actions/workflows/packaging_wheel_on_push.yml/badge.svg)](https://github.com/Alive1024/cube-dl/actions/workflows/packaging_wheel_on_push.yml)
[![Publishing to PyPI](https://github.com/Alive1024/cube-dl/actions/workflows/publishing_on_tag.yml/badge.svg)](https://github.com/Alive1024/cube-dl/actions/workflows/publishing_on_tag.yml)
[![pre-commit](https://img.shields.io/badge/pre--commit-enabled-brightgreen?logo=pre-commit)](https://github.com/pre-commit/pre-commit)

**Install from PyPI (stable, recommended)**：

```shell
pip install -U cube-dl
```

**Install from wheel file (latest)**:

Enter the [Actions](https://github.com/Alive1024/cube-dl/actions) page of this project, select the latest workflow run from the actions corresponding to "Packaging Wheel", download the compressed package of the wheel file from Artifacts, extract it, and install it using pip:

```shell
pip install xxx.whl
```

**Install from source code (latest)**：

```shell
git clone git@github.com:Alive1024/cube-dl.git
cd cube-dl
pip install .
```

**Table of Contents**：

- [cube-dl](#cube-dl)
- [1. Introduction](#1-introduction)
  - [1.1 Motivation](#11-motivation)
  - [1.2 Main Features](#12-main-features)
  - [1.3 Design Principles](#13-design-principles)
  - [1.4 Prerequisites](#14-prerequisites)
- [2. Project Description](#2-project-description)
  - [2.1 Key Concepts](#21-key-concepts)
    - [2.1.1 Four Core Components](#211-four-core-components)
    - [2.1.2 The Triple-Layer Structure for Organizing Experiments](#212-the-triple-layer-structure-for-organizing-experiments)
  - [2.2 Configuration System](#22-configuration-system)
    - [2.2.1 Configuration Files](#221-configuration-files)
    - [2.2.3 Automatic Archiving of Configuration Files](#223-automatic-archiving-of-configuration-files)
    - [2.2.4 Sharing Preset Values Between Configuration Files](#224-sharing-preset-values-between-configuration-files)
    - [2.2.5 Comparison with Other Configuration Methods](#225-comparison-with-other-configuration-methods)
  - [2.3 Starter](#23-starter)
  - [2.4 Directory Structure of The Starter](#24-directory-structure-of-the-starter)
  - [2.4 Basic Commands and Arguments](#24-basic-commands-and-arguments)
    - [`start`](#start)
    - [`new`](#new)
    - [`add-exp`](#add-exp)
    - [`ls`](#ls)
    - [Common Arguments for `fit`, `validate`, `test`, `predict`](#common-arguments-for-fit-validate-test-predict)
    - [Common Arguments for  `validate`, `test`, `predict`](#common-arguments-for--validate-test-predict)
    - [`fit`](#fit)
    - [`resume-fit`](#resume-fit)
    - [`validate`](#validate)
    - [`test`](#test)
    - [`predict`](#predict)
  - [2.5 Others](#25-others)
    - [2.5.1 Callback Functions](#251-callback-functions)
    - [2.5.2 Runtime Contexts](#252-runtime-contexts)

# 1. Introduction

**cube-dl** is a high-level Python library for managing and training deep learning models, designed to manage a large number of deep learning configuration items and experiments with ease, making it well-organized.

## 1.1 Motivation

As we can see, there are already quite a few libraries related to deep learning at different levels in the open source community. For example, [PyTorch](https://github.com/pytorch/pytorch) provides powerful deep learning modeling capabilities, while [PyTorch-Lightning](https://github.com/Lightning-AI/pytorch-lightning) abstracts and wraps PyTorch, saving the hassle of writing a large amount of boilerplate code. However, even with these, training deep learning models may still be chaotic due to a large number of configurable items, experiments, etc., forcing researchers/developers to spend a lot of energy and time organizing and comparing experimental results, rather than the methods themselves. In addition, in the process of conducting research, it is inevitable to use other people's open-source algorithms. Due to each person's different code habits, open-source algorithms have different organizational structures, and some repositories serve specific methods or datasets without good top-level design. Using these codes for custom experiments can be quite painful. Furthermore, when aggregating algorithms from different sources, a universal code structure is required.

**cube-dl** was born as a result, by imposing some rule constraints on configuration and experimental management to make deep learning projects easier to manage, and striking a good balance between abstraction and flexibility.

## 1.2 Main Features

- **Componentization**: The elements involved in the training process of deep learning models are clearly divided into four parts to achieve low coupling and high reusability;
- **A brand-new configuration system**: Deep learning projects often involve a large number of configurable parameters, and how to effortlessly configure these parameters is an important issue. Moreover, these parameters often have a crucial impact on the final result, so it is necessary to document these parameters in detail. Cube-dl has redesigned the entire configuration system based on the characteristics of deep learning projects, making it easy to use and traceable;
- **Triple organizational structure**: In order to organize a large number of experiments in a more organized manner, all experiments are forcibly divided into three levels: Project, Experiment, and Run. Each task execution will automatically save corresponding records for reference;
- **Simple and Fast CLI**: cube-dl provides a concise CLI that can be managed, trained, and tested with a few commands.


## 1.3 Design Principles

cube-dl follows the following principles as much as possible:

- **Universality**: Independent of specific research fields, there is no need to start from scratch when switching between different fields;
- **Flexibility and Scalability**: "Extend rather than modify". When implementing new models, datasets, optimization algorithms, loss functions, metrics, and other components, try not to change existing code as much as possible. Instead, add new code to achieve extension;
- **Good Organization and Recording**: The results of each operation should be well organized and recorded;
- **Maximum Compatibility**: facilitates the migration of existing other code to the current code repository at the lowest cost;
- **Lowest Learning Cost**: After reading README, you can master how to use it without having to learn a lot of APIs from dozens of pages of documentation.

## 1.4 Prerequisites

Users should have a basic understanding of Python and PyTorch.


# 2. Project Description

## 2.1 Key Concepts

### 2.1.1 Four Core Components

Generally speaking, the core components of deep learning include [<sup>1</sup>](https://d2l.ai/chapter_introduction/index.html#key-components)：

- **Data** that can be learned
- The **model** for converting data
- **Objective function** for quantifying model effectiveness
- **Optimization algorithm** for adjusting model parameters to optimize the objective function

Based on the above classification and componentization ideas, cube-dl reorganizes the relevant components of deep learning projects into four parts:

- **Model**: the model to be trained；
- **Task Module**: The definition of the process for a certain deep learning task, corresponding to a certain training paradigm, such as the most common fully supervised learning. The Task Module can be further subdivided into several components, such as loss functions, optimization algorithms, learning rate regulators, metrics used in validation and testing, etc. Meanwhile, the model to be trained is specified as an initialization parameter for the Task Module;
- **Data Module**: Data related, corresponding to the combination of Dataset and DataLoader for PyTorch, similar to the LightningDataModule for [PyTorch-Lightning](https://lightning.ai/docs/pytorch/stable/data/datamodule.html). However, the usage is slightly different. The Data Module here is not specific to any dataset, and the specific dataset class is passed in as an initialization parameter for the Data Module;
- **Runner**: The engineering level code for executing model training, validation, testing, reasoning, and other processes.

```text
                        ┌────────────────┐
                        │     Model      │
                        └────────────────┘     ┌────────────────┐
                                               │ Loss Function  │
                                               ├────────────────┤
                        ┌────────────────┐     │   Optimizer    │
                        │  Task Module   │─────▶────────────────┤
                        └────────────────┘     │  LR Scheduler  │
                                               ├────────────────┤
                                               │Val/Test Metrics│
                                               ├────────────────┤
                                               │     ......     │
                                               └────────────────┘
                                               ┌────────────────┐
                                               │    Datasets    │
                        ┌────────────────┐     ├────────────────┤
                        │  Data Module   │─────▶  Batch Sizes   │
                        └────────────────┘     ├────────────────┤
                                               │     ......     │
                                               └────────────────┘

                        ┌────────────────┐
                        │     Runner     │
                        └────────────────┘
```



### 2.1.2 The Triple-Layer Structure for Organizing Experiments

In order to organize all experiments in a more organized manner, cube-dl mandatorily requires users to use a "triple-layer structure":

- **Project** (hereinafter referred to as **proj**): contains multiple exps;
- **Experiment** (hereinafter referred to as **exp**): a set of runs with a common theme, each exp must be associated with a certain proj, such as "baseline", "abbreviation", "contrast", etc.;
- **Run**: The smallest atomic unit of operation, each run must belong to an exp in a proj, and each run has a job type indicating what the run is doing.

The above three entities all have corresponding random IDs composed of lowercase letters and numbers. ID of proj/exp is of 2 characters, and the ID of run is of 4 characters.


The structure of the output directory will take the form of:

```text
                 ┌───────────────────────┐
               ┌▶│   proj_6r_DummyProj   │
               │ └───────────────────────┘          ┌─────────────────────┐
               │             │                   ┌─▶│run_z2hi_fit_DummyRun│
               │             │ ┌───────────────┐ │  └─────────────────────┘
               │             ├▶│exp_1b_DummyExp│─┤
               │             │ └───────────────┘ │  ┌───────┐
┌────────────┐ │             │                   └─▶│  ...  │
│   Output   │ │             │ ┌───────┐            └───────┘
│ Directory  │─┤             └▶│  ...  │
└────────────┘ │               └───────┘
               │
               │ ┌───────┐
               └▶│  ...  │
                 └───────┘
```

In the root directory of proj, there will be a JSON file with the same name, which contains records of all exps and runs of the current proj, such as:

```json
{
  "ID": "6r",
  "Name": "DummyProj",
  "Desc": "This is a dummy proj for demonstration.",
  "CreatedTime": "2024-03-18 22:11:15",
  "Path": "./outputs/proj_6r_DummyProj",
  "Exps": {
    "1b": {
      "Name": "DummyExp",
      "Desc": "This is a dummy exp for demonstration.",
      "CreatedTime": "2024-03-18 22:11:15",
      "Path": "./outputs/proj_6r_DummyProj/exp_1b_DummyExp",
      "Runs": {
        "z2hi": {
          "Name": "DummyRun",
          "Desc": "A dummy run for demonstration.",
          "CreatedTime": "2024-03-18 22:12:49",
          "Path": "./outputs/proj_6r_DummyProj/exp_1b_DummyExp/run_z2hi_fit_DummyRun",
          "Type": "fit"
        }
      }
    }
  }
}
```

By default, these proj record files will be tracked by git to facilitate distributed collaboration among multiple people through git. This means that the proj, exp, and run created by user A can be seen by user B (but the output products of run will not be tracked by git).


## 2.2 Configuration System

As mentioned earlier, deep learning projects often involve a large number of configurable parameters, and it is crucial to pass in and record these parameters. Considering that the essence of configuration is to provide initialization parameters for instantiating classes, cube-dl has designed a brand-new configuration system. Writing configuration files is as natural as writing code for instantiating a class normally.

### 2.2.1 Configuration Files

In cube-dl, the configuration file is actually a `.py` source code file, mainly used to define how to instantiate the corresponding object. Writing a configuration file is a process of selecting (`import` what to be used) and defining how to instantiate it. For example, the following is a code snippet for configuring a runner:

```python
@cube_runner
def get_fit_runner():
    run = CUBE_CONTEXT["run"]
    return pl.Trainer(
        accelerator="auto",
        max_epochs=shared_config.get("max_epochs"),
        callbacks=[
            RichProgressBar(),
            ModelCheckpoint(
                dirpath=osp.join(run.run_dir, "checkpoints"),
                filename="{epoch}-{step}-{val_mean_acc:.4f}",
                save_top_k=1,
                monitor="val_mean_acc",
                mode="max",
            ),
        ],
        logger=get_csv_logger(run),
    )
```

As can be seen, in the configuration file, the instantiation process needs to be put into a "getter" function, and the instantiated object will be `return`. The reason why an object is not directly instantiated in the configuration file is to allow the cube-dl to control the timing of instantiation of configuration items.

Since the configuration file is essentially a Python source code file, it can contain any logic like a regular Python source code file, but it is generally not very complex.

Corresponding to the four core components described earlier, there are four main types of core configuration items, namely `cube_model`, `cube_task_module`, `cube_data_module` and `cube_runner`. These configuration items can be used as modular and reusable configuration components in the configuration system. In addition, during actual experiments, it is necessary to freely combine the four components to form a **RootConfig**, which is the root node of all configurations.

The relationship between the five configuration items is as follows:

```text
                                  ┌────────────────────────┐
                                  │       Components       │
                                  │     ┌────────────┐     │
                                  │ ┌──▶│  Model(s)  │──┐  │
                                  │ │   └────────────┘  │  │
                                  │ │                   │  │
                  ┌─────────────┐ │ │   ┌────────────┐  │  │
                  │ Root Config │─┼─┼──▶│Task Module │◀─┘  │
                  └─────────────┘ │ │   └────────────┘     │
                                  │ │   ┌────────────┐     │
                                  │ ├──▶│Data Module │     │
                                  │ │   └────────────┘     │
                                  │ │   ┌────────────┐     │
                                  │ └──▶│ Runner(s)  │     │
                                  │     └────────────┘     │
                                  └────────────────────────┘
```

For some rules regarding configuration files:

- For better readability, keyword parameters must be used when initializing `RootConfig` in the configuration file (it is recommended to force the use of keyword parameters when writing task/data modules, following this rule);
- The getter function name of Root config must be `get_root_config`, and there can only be one in each configuration file. Other types of configuration items do not have this restriction;
- The getter function of task module must have a parameter named `model`, which corresponds to the `model_getters` passed to the root config. This parameter is used to pass the model object, which is needed in optimizer or other configration item of task module. When a list (means multiple models) is passed as `model_getters`, the parameter `model` will be also a list.
- Decorators named  `cube_root_config`, `cube_model`, `cube_task_module`, `cube_data_module` and `cube_runner` can be imported into `cube_dl.config_sys`. It is strongly recommended to use the corresponding decorators when writing getter functions, on the one hand to allow the decorators to check, and on the other hand to expand in the future.

Additionally, it is recommended to use relative import statements when importing the required config components from other configuration files.


### 2.2.3 Automatic Archiving of Configuration Files

For the convenience of replicating a run, the configuration files used during each run will be automatically archived. By default, a configuration file named 'archived_config_<RUN_ID>. py' will be saved in the root directory of the corresponding run. This file combines several configuration files specified at runtime to form a separate file, which can be used directly when replicating this experiment.

### 2.2.4 Sharing Preset Values Between Configuration Files

In some scenarios, some configuration values need to be shared between different configuration files. For example, epochs may be required by both the LR scheduler in the task module and the runner. In order to facilitate one-time modification and prevent errors caused by omissions, when all configuration components are in the same configuration file, the preset values that need to be shared can be defined as global variables. However, this approach is not feasible when configuration components are scattered across multiple files. In this case, the `shared_config` provided by cube-dl can be used (which can be imported from `cube_dl.config_sys`). Perform `set` in the root config getter, and then perform `get` when needed for other purposes.

### 2.2.5 Comparison with Other Configuration Methods

The comparison with several mainstream configuration methods is as follows:

1. **Defining Command Line Arguments by argparse**: some projects directly defines configurable arguments by `argparse`. It is obvious that this configuration method is complex and prone to errors when the number of parameters continues to expand, and it is also very troublesome at runtime;
2. **Using XML/JSON/YAML or Other Configuration Files**：for example, some configurations of [detectron2](https://github.com/facebookresearch/detectron2) and [LightningCLI](https://lightning.ai/docs/pytorch/stable/cli/lightning_cli_advanced.html) provided by PyTorch-Lightning adopt YAML files. This method has an obvious flaw: the prompt function from the IDEs will be very limited, and it is almost identical to a plain text file during editing. When there are many configuration items, handwriting or copying and pasting hundreds of lines of text back and forth can be very painful. When configuring, you also need to spend time looking up optional values and can only achieve simple logic;
3. **Using OmegaConf or Other Configuration Library**： [OmegaConf](https://github.com/omry/omegaconf) is a YAML-based hierarchical configuration system, supporting configuration from merging multiple sources, with strong flexibility. But when writing deep learning projects involving numerous parameters, editing files like YAML also faces the hassle of writing a large number of text files;
4. **Implementing Specific Config Classes**：for example, [Mask_RCNN - config.py](https://github.com/matterport/Mask_RCNN/blob/master/mrcnn/config.py) implements `Config` base class. When using it, subclasses need to be derived and some attribute values need to be covered as needed. This approach is inflexible and tightly coupled with the current project, making it unsuitable for general scenarios;
5. **General Python Source Files**：most of open source libraries from [OpenMMLab](https://github.com/open-mmlab) adopt this method such as [mmdetection](https://github.com/open-mmlab/mmdetection. Their configration files are like [atss_r50_fpn_8xb8-amp-lsj-200e_coco.py](https://github.com/open-mmlab/mmdetection/blob/main/configs/atss/atss_r50_fpn_8xb8-amp-lsj-200e_coco.py). Although Python source code files are used for configuration, it is self-contained and has special rules (such as the need to use `_base_` for inheritance), which incurs learning costs. Essentially, it involves defining several `dict`s, defining the classes to be used and their parameters, which are passed in as key values and cannot fully utilize the code prompts of the IDE. This has similar drawbacks to text-based configuration methods. Moreover, the direct assignment of various configuration items as variables in the configuration file is quite loose and error-prone.

These configuration methods essentially pass parameters in various forms, and then the configuration system will use these parameters to instantiate some classes or pass them to a certain location. The configuration method in cube-dl is equivalent to flipping this process, directly defining how to instantiate the class during use, and the configuration system will automatically record and archive it. In this way, the process of writing configuration files is as natural as instantiating classes normally, with almost no need to learn how to configure them. It can also fully utilize the prompts of the IDE to improve writing efficiency and add any logic.

## 2.3 Starter

The so-called "starter" is a set of initial files compatible with cube-dl, used to initialize a deep learning project. Through this approach, cube-dl can be decoupled from specific frameworks such as PyTorch-Lightning. When creating a project, you can choose more flexible native PyTorch or more abstract PyTorch-Lightning based on actual needs.

The standard starter should contain a file named "[pyproejct.toml](https://packaging.python.org/en/latest/specifications/pyproject-toml/#). And it should contain a configuration item named `tool.cube_dl`.

## 2.4 Directory Structure of The Starter

The structure and meaning of the starter directory are as follows (using "pytorch-lighting" as an example):

```text
pytorch-lightning
├── callbacks 【specific CubeCallbacks for the current starter】
├── configs   【directory for storing configuration files】
│   ├── __init__.py
│   ├── components 【configuration components】
│   │   └── mnist_data_module.py
│   └── mnist_cnn_sl.py 【a RootConfig file】
├── data    【directory for storing（symbolic links）】
│   └── MNIST -> /Users/yihaozuo/Zyh-Coding-Projects/Datasets/MNIST
├── datasets 【data modules and dataset classes】
│   ├── __init__.py
│   └── basic_data_module.py
├── models   【model definition】
│   ├── __init__.py
│   ├── __pycache__
│   └── cnn_example.py
├── outputs  【the output director, string all output products】
├── pyproject.toml  【configuration file for Python project】
├── requirements.txt
└── tasks 【definitions of Task Modules】
│   ├── __init__.py
│   ├── base.py  【task base class】
│   └── supervised_learning.py【task definition of full supervised learning】
└── utils  【miscellaneous utilities】
```

## 2.4 Basic Commands and Arguments

### `start`

Download the specified starter.

You can first view the available starters through `cube start -l`, and then download the specified starter using the following arguments:

| Argument Name   | Type | Required | Meaning |
|-----------------|:----:|:---------:|---------|
| **-o**, --owner | str  |     ❌     |         |
| **-r**, --repo  | str  |     ❌     |         |
| **-p**, --path  | str  |     ✅     |         |
| **-d**, --dest  | str  |     ❌     |         |

For example：

```shell
cube start -o Alive1024 -r cube-dl -p pytorch-lightning
```

### `new`

Create a pair of new proj and exp.

| Argument Name                     | Type | Required | Meaning                     |
|-----------------------------------| :--: | :------: |-----------------------------|
| **-pn**, --proj-name, --proj_name | str  |    ✅     | name of the new proj        |
| **-pd**, --proj-desc, --proj_desc | str  |    ❌     | description of the new proj |
| **-en**, --exp-name, --exp_name   | str  |    ✅     | name of the new exp         |
| **-ed**, --exp-desc, --exp_desc   | str  |    ❌     | description of the new exp  |

For example：

```shell
cube new -pn "MyFirstProject" -pd "This is my first project." -en "Baseline" -ed "Baseline exps."
```

### `add-exp`

Add a new exp to a proj.

| Argument Name                | Type | Required | Meaning                                    |
| ---------------------------- |:----:| :------: |--------------------------------------------|
| **-p**, --proj-id, --proj_id | str  |    ✅     | ID of the proj that the new exp belongs to |
| **-n**, --name               | str  |    ✅     | name of the new exp                        |
| **-d**, --desc               | str  |    ❌     | description of the new exp                 |

For example：

```shell
cube add-exp -p 8q -n "Ablation" -d "Ablation exps."
```

### `ls`

Display information about proj, exp, and run in the form of a table in the terminal.

`cube ls` is equivalent to `cube ls -pe`, which will display all proj and exp.


The following other parameters are mutually exclusive:

| Argument Name                           |     Type     |                                    Meaning                                     |
|-----------------------------------------|:------------:|:------------------------------------------------------------------------------:|
| **-p**, --projs                         | "store_true" |                               display all projs                                |
| **-er**, --exps-runs-of, --exps_runs_of |     str      |             display all exps and runs of the proj specified by ID              |
| **-e**, --exps-of, --exps_of            |     str      |                  display all exps of the proj specified by ID                  |
| **-r**, --runs-of, --runs_of            |  str (2 个)   | display all runs of the exp of the proj specified by two IDs (proj_ID exp_ID). |

For example：

```shell
cube ls -r 8q zy
```

### Common Arguments for `fit`, `validate`, `test`, `predict`

`fit`, `validate`, `test`, `predict` all have the following arguments:

| Argument Name                               | Type | Required | Meaning                                        |
| ------------------------------------ |:----:|:--------:|------------------------------------------------|
| **-c**, --config-file, --config_file | str  |    ✅     | path to the config file                        |
| **-p**, --proj-id, --proj_id         | str  |    ✅     | ID of the proj ID that the new run belongs to. |
| **-e**, --exp-id, --exp_id           | str  |    ✅     | ID of the exp that the new run belongs to.     |
| **-n**, --name                       | str  |    ✅     | name of the new run                            |
| **-d**, --desc                       | str  |    ❌     | description of the new run                                   |

### Common Arguments for  `validate`, `test`, `predict`

In addition to the above parameters, subcommands `validate`, `test`, `predict` also have the following arguments:

| Argument Name                         | Type | Required | Meaning                                               |
| ------------------------------------- | :--: | :------: | -------------------------------------------------- |
| **-lc**, --loaded-ckpt, --loaded_ckpt | str  |    ✅     | File path of the model checkpoint to be loaded. Use an empty string "" to explicitly indicate you are going to conduct validate/test/predict using the initialized model without loading any weights). |


### `fit`

Training on the training set.

For example：

```shell
cube fit -c configs/mnist_cnn_sl.py -p 8q -e zy -n "ep25-lr1e-3" -d "Use a 3-layer simple CNN as baseline, max_epochs: 25, base lr: 1e-3"
```

### `resume-fit`

Resume fit from an interrupted one.

| Argument Name                        | Type | Required | Meaning                                                                                                                                                                |
| ------------------------------------ | :--: | :------: |------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| **-c**, --config-file, --config_file | str  |    ✅     | path to the config file                                                                                                                                                |
| **-r**, --resume-from, --resume_from | str  |    ✅     | file path to the checkpoint where resumes, the path should include the directory names where proj, exp, and run are located (inferring IDs requires these information) |

For example：

```shell
cube resume-fit -c configs/mnist_cnn_sl.py -r "outputs/proj_8q_MNIST/exp_zy_Baseline/run_rw4q_fit_ep25-lr1e-3/checkpoints/epoch\=3-step\=1532.ckpt"
```

### `validate`

Evaluate on the validation set.

For example：

```shell
cube validate -c configs/mnist_cnn_sl.py -p 8q -e zy -n "Val" -d "Validate the simple CNN." -lc "outputs/proj_8q_MNIST/exp_zy_Baseline/run_rw4q_fit_ep25-lr1e-3/checkpoints/epoch\=3-step\=1532.ckpt"
```

### `test`

Evaluate on the test set.

For example：

```shell
cube test -c configs/mnist_cnn_sl.py -p 8q -e zy -n "Test" -d "Test the simple CNN." -lc "outputs/proj_8q_MNIST/exp_zy_Baseline/run_rw4q_fit_ep25-lr1e-3/checkpoints/epoch\=3-step\=1532.ckpt"
```

### `predict`

Predict.

For example：

```shell
cube predict -c configs/mnist_cnn_sl.py -p 8q -e zy -n "Test" -d "Predict using the simple CNN." -lc "outputs/proj_8q_MNIST/exp_zy_Baseline/run_rw4q_fit_ep25-lr1e-3/checkpoints/epoch\=3-step\=1532.ckpt"
```

## 2.5 Others

### 2.5.1 Callback Functions

`RootConfig` supports adding callback functions through the `callbacks` parameter. All callback functions should be `cube_dl.callback.CubeCallback` type. When custom callback functions are needed, they should inherit the `CubeCallback` class and implement the required hooks. Currently, `CubeCallback` supports `on_run_start` and `on_run_end`.

### 2.5.2 Runtime Contexts

At runtime, the cube-dl will store some context in specific locations for access.

You can import `CUBE_CONTEXT` (actually a dict) from `cube_dl.core`, and then retrieve the current `Run` object through `run = CUBE_CONTEXT["run"]`. This is very useful when obtaining information related to `Run`. For example, when you want to save the predicted results to the corresponding run directory during validation, you can obtain it through `CUBE_CONTEXT["run"].run_dir`.

In addition, the ID of the current `Run` object can also be obtained by accessing the environment variable named `CUBE_RUN_ID`.

Raw data

            {
    "_id": null,
    "home_page": null,
    "name": "cube-dl",
    "maintainer": null,
    "docs_url": null,
    "requires_python": "<4.0,>=3.10",
    "maintainer_email": null,
    "keywords": "python, data-science, machine-learning, deep-learning, python3, pytorch, pytorch-lightning",
    "author": "Alive1024",
    "author_email": "2431945058@qq.com",
    "download_url": "https://files.pythonhosted.org/packages/29/0f/b752c443c4aa948d057245558f8e1dda8877909d65ed22f8697d51ab0cf0/cube_dl-0.3.10.tar.gz",
    "platform": null,
    "description": "# cube-dl\n\nLanguages: English | [\u7b80\u4f53\u4e2d\u6587](./docs/README_zh-CN.md)\n\n**\"The last stop\" for training your deep learning models.**\n\n**Manage tons of configurations and experiments with minimal changes to existing code.**\n\n\n[![Packaging Wheel](https://github.com/Alive1024/cube-dl/actions/workflows/packaging_wheel_on_push.yml/badge.svg)](https://github.com/Alive1024/cube-dl/actions/workflows/packaging_wheel_on_push.yml)\n[![Publishing to PyPI](https://github.com/Alive1024/cube-dl/actions/workflows/publishing_on_tag.yml/badge.svg)](https://github.com/Alive1024/cube-dl/actions/workflows/publishing_on_tag.yml)\n[![pre-commit](https://img.shields.io/badge/pre--commit-enabled-brightgreen?logo=pre-commit)](https://github.com/pre-commit/pre-commit)\n\n**Install from PyPI (stable, recommended)**\uff1a\n\n```shell\npip install -U cube-dl\n```\n\n**Install from wheel file (latest)**:\n\nEnter the [Actions](https://github.com/Alive1024/cube-dl/actions) page of this project, select the latest workflow run from the actions corresponding to \"Packaging Wheel\", download the compressed package of the wheel file from Artifacts, extract it, and install it using pip:\n\n```shell\npip install xxx.whl\n```\n\n**Install from source code (latest)**\uff1a\n\n```shell\ngit clone git@github.com:Alive1024/cube-dl.git\ncd cube-dl\npip install .\n```\n\n**Table of Contents**\uff1a\n\n- [cube-dl](#cube-dl)\n- [1. Introduction](#1-introduction)\n  - [1.1 Motivation](#11-motivation)\n  - [1.2 Main Features](#12-main-features)\n  - [1.3 Design Principles](#13-design-principles)\n  - [1.4 Prerequisites](#14-prerequisites)\n- [2. Project Description](#2-project-description)\n  - [2.1 Key Concepts](#21-key-concepts)\n    - [2.1.1 Four Core Components](#211-four-core-components)\n    - [2.1.2 The Triple-Layer Structure for Organizing Experiments](#212-the-triple-layer-structure-for-organizing-experiments)\n  - [2.2 Configuration System](#22-configuration-system)\n    - [2.2.1 Configuration Files](#221-configuration-files)\n    - [2.2.3 Automatic Archiving of Configuration Files](#223-automatic-archiving-of-configuration-files)\n    - [2.2.4 Sharing Preset Values Between Configuration Files](#224-sharing-preset-values-between-configuration-files)\n    - [2.2.5 Comparison with Other Configuration Methods](#225-comparison-with-other-configuration-methods)\n  - [2.3 Starter](#23-starter)\n  - [2.4 Directory Structure of The Starter](#24-directory-structure-of-the-starter)\n  - [2.4 Basic Commands and Arguments](#24-basic-commands-and-arguments)\n    - [`start`](#start)\n    - [`new`](#new)\n    - [`add-exp`](#add-exp)\n    - [`ls`](#ls)\n    - [Common Arguments for `fit`, `validate`, `test`, `predict`](#common-arguments-for-fit-validate-test-predict)\n    - [Common Arguments for  `validate`, `test`, `predict`](#common-arguments-for--validate-test-predict)\n    - [`fit`](#fit)\n    - [`resume-fit`](#resume-fit)\n    - [`validate`](#validate)\n    - [`test`](#test)\n    - [`predict`](#predict)\n  - [2.5 Others](#25-others)\n    - [2.5.1 Callback Functions](#251-callback-functions)\n    - [2.5.2 Runtime Contexts](#252-runtime-contexts)\n\n# 1. Introduction\n\n**cube-dl** is a high-level Python library for managing and training deep learning models, designed to manage a large number of deep learning configuration items and experiments with ease, making it well-organized.\n\n## 1.1 Motivation\n\nAs we can see, there are already quite a few libraries related to deep learning at different levels in the open source community. For example, [PyTorch](https://github.com/pytorch/pytorch) provides powerful deep learning modeling capabilities, while [PyTorch-Lightning](https://github.com/Lightning-AI/pytorch-lightning) abstracts and wraps PyTorch, saving the hassle of writing a large amount of boilerplate code. However, even with these, training deep learning models may still be chaotic due to a large number of configurable items, experiments, etc., forcing researchers/developers to spend a lot of energy and time organizing and comparing experimental results, rather than the methods themselves. In addition, in the process of conducting research, it is inevitable to use other people's open-source algorithms. Due to each person's different code habits, open-source algorithms have different organizational structures, and some repositories serve specific methods or datasets without good top-level design. Using these codes for custom experiments can be quite painful. Furthermore, when aggregating algorithms from different sources, a universal code structure is required.\n\n**cube-dl** was born as a result, by imposing some rule constraints on configuration and experimental management to make deep learning projects easier to manage, and striking a good balance between abstraction and flexibility.\n\n## 1.2 Main Features\n\n- **Componentization**: The elements involved in the training process of deep learning models are clearly divided into four parts to achieve low coupling and high reusability;\n- **A brand-new configuration system**: Deep learning projects often involve a large number of configurable parameters, and how to effortlessly configure these parameters is an important issue. Moreover, these parameters often have a crucial impact on the final result, so it is necessary to document these parameters in detail. Cube-dl has redesigned the entire configuration system based on the characteristics of deep learning projects, making it easy to use and traceable;\n- **Triple organizational structure**: In order to organize a large number of experiments in a more organized manner, all experiments are forcibly divided into three levels: Project, Experiment, and Run. Each task execution will automatically save corresponding records for reference;\n- **Simple and Fast CLI**: cube-dl provides a concise CLI that can be managed, trained, and tested with a few commands.\n\n\n## 1.3 Design Principles\n\ncube-dl follows the following principles as much as possible:\n\n- **Universality**: Independent of specific research fields, there is no need to start from scratch when switching between different fields;\n- **Flexibility and Scalability**: \"Extend rather than modify\". When implementing new models, datasets, optimization algorithms, loss functions, metrics, and other components, try not to change existing code as much as possible. Instead, add new code to achieve extension;\n- **Good Organization and Recording**: The results of each operation should be well organized and recorded;\n- **Maximum Compatibility**: facilitates the migration of existing other code to the current code repository at the lowest cost;\n- **Lowest Learning Cost**: After reading README, you can master how to use it without having to learn a lot of APIs from dozens of pages of documentation.\n\n## 1.4 Prerequisites\n\nUsers should have a basic understanding of Python and PyTorch.\n\n\n# 2. Project Description\n\n## 2.1 Key Concepts\n\n### 2.1.1 Four Core Components\n\nGenerally speaking, the core components of deep learning include [<sup>1</sup>](https://d2l.ai/chapter_introduction/index.html#key-components)\uff1a\n\n- **Data** that can be learned\n- The **model** for converting data\n- **Objective function** for quantifying model effectiveness\n- **Optimization algorithm** for adjusting model parameters to optimize the objective function\n\nBased on the above classification and componentization ideas, cube-dl reorganizes the relevant components of deep learning projects into four parts:\n\n- **Model**: the model to be trained\uff1b\n- **Task Module**: The definition of the process for a certain deep learning task, corresponding to a certain training paradigm, such as the most common fully supervised learning. The Task Module can be further subdivided into several components, such as loss functions, optimization algorithms, learning rate regulators, metrics used in validation and testing, etc. Meanwhile, the model to be trained is specified as an initialization parameter for the Task Module;\n- **Data Module**: Data related, corresponding to the combination of Dataset and DataLoader for PyTorch, similar to the LightningDataModule for [PyTorch-Lightning](https://lightning.ai/docs/pytorch/stable/data/datamodule.html). However, the usage is slightly different. The Data Module here is not specific to any dataset, and the specific dataset class is passed in as an initialization parameter for the Data Module;\n- **Runner**: The engineering level code for executing model training, validation, testing, reasoning, and other processes.\n\n```text\n                        \u250c\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2510\n                        \u2502     Model      \u2502\n                        \u2514\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2518     \u250c\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2510\n                                               \u2502 Loss Function  \u2502\n                                               \u251c\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2524\n                        \u250c\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2510     \u2502   Optimizer    \u2502\n                        \u2502  Task Module   \u2502\u2500\u2500\u2500\u2500\u2500\u25b6\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2524\n                        \u2514\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2518     \u2502  LR Scheduler  \u2502\n                                               \u251c\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2524\n                                               \u2502Val/Test Metrics\u2502\n                                               \u251c\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2524\n                                               \u2502     ......     \u2502\n                                               \u2514\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2518\n                                               \u250c\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2510\n                                               \u2502    Datasets    \u2502\n                        \u250c\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2510     \u251c\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2524\n                        \u2502  Data Module   \u2502\u2500\u2500\u2500\u2500\u2500\u25b6  Batch Sizes   \u2502\n                        \u2514\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2518     \u251c\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2524\n                                               \u2502     ......     \u2502\n                                               \u2514\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2518\n\n                        \u250c\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2510\n                        \u2502     Runner     \u2502\n                        \u2514\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2518\n```\n\n\n\n### 2.1.2 The Triple-Layer Structure for Organizing Experiments\n\nIn order to organize all experiments in a more organized manner, cube-dl mandatorily requires users to use a \"triple-layer structure\":\n\n- **Project** (hereinafter referred to as **proj**): contains multiple exps;\n- **Experiment** (hereinafter referred to as **exp**): a set of runs with a common theme, each exp must be associated with a certain proj, such as \"baseline\", \"abbreviation\", \"contrast\", etc.;\n- **Run**: The smallest atomic unit of operation, each run must belong to an exp in a proj, and each run has a job type indicating what the run is doing.\n\nThe above three entities all have corresponding random IDs composed of lowercase letters and numbers. ID of proj/exp is of 2 characters, and the ID of run is of 4 characters.\n\n\nThe structure of the output directory will take the form of:\n\n```text\n                 \u250c\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2510\n               \u250c\u25b6\u2502   proj_6r_DummyProj   \u2502\n               \u2502 \u2514\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2518          \u250c\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2510\n               \u2502             \u2502                   \u250c\u2500\u25b6\u2502run_z2hi_fit_DummyRun\u2502\n               \u2502             \u2502 \u250c\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2510 \u2502  \u2514\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2518\n               \u2502             \u251c\u25b6\u2502exp_1b_DummyExp\u2502\u2500\u2524\n               \u2502             \u2502 \u2514\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2518 \u2502  \u250c\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2510\n\u250c\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2510 \u2502             \u2502                   \u2514\u2500\u25b6\u2502  ...  \u2502\n\u2502   Output   \u2502 \u2502             \u2502 \u250c\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2510            \u2514\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2518\n\u2502 Directory  \u2502\u2500\u2524             \u2514\u25b6\u2502  ...  \u2502\n\u2514\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2518 \u2502               \u2514\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2518\n               \u2502\n               \u2502 \u250c\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2510\n               \u2514\u25b6\u2502  ...  \u2502\n                 \u2514\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2518\n```\n\nIn the root directory of proj, there will be a JSON file with the same name, which contains records of all exps and runs of the current proj, such as:\n\n```json\n{\n  \"ID\": \"6r\",\n  \"Name\": \"DummyProj\",\n  \"Desc\": \"This is a dummy proj for demonstration.\",\n  \"CreatedTime\": \"2024-03-18 22:11:15\",\n  \"Path\": \"./outputs/proj_6r_DummyProj\",\n  \"Exps\": {\n    \"1b\": {\n      \"Name\": \"DummyExp\",\n      \"Desc\": \"This is a dummy exp for demonstration.\",\n      \"CreatedTime\": \"2024-03-18 22:11:15\",\n      \"Path\": \"./outputs/proj_6r_DummyProj/exp_1b_DummyExp\",\n      \"Runs\": {\n        \"z2hi\": {\n          \"Name\": \"DummyRun\",\n          \"Desc\": \"A dummy run for demonstration.\",\n          \"CreatedTime\": \"2024-03-18 22:12:49\",\n          \"Path\": \"./outputs/proj_6r_DummyProj/exp_1b_DummyExp/run_z2hi_fit_DummyRun\",\n          \"Type\": \"fit\"\n        }\n      }\n    }\n  }\n}\n```\n\nBy default, these proj record files will be tracked by git to facilitate distributed collaboration among multiple people through git. This means that the proj, exp, and run created by user A can be seen by user B (but the output products of run will not be tracked by git).\n\n\n## 2.2 Configuration System\n\nAs mentioned earlier, deep learning projects often involve a large number of configurable parameters, and it is crucial to pass in and record these parameters. Considering that the essence of configuration is to provide initialization parameters for instantiating classes, cube-dl has designed a brand-new configuration system. Writing configuration files is as natural as writing code for instantiating a class normally.\n\n### 2.2.1 Configuration Files\n\nIn cube-dl, the configuration file is actually a `.py` source code file, mainly used to define how to instantiate the corresponding object. Writing a configuration file is a process of selecting (`import` what to be used) and defining how to instantiate it. For example, the following is a code snippet for configuring a runner:\n\n```python\n@cube_runner\ndef get_fit_runner():\n    run = CUBE_CONTEXT[\"run\"]\n    return pl.Trainer(\n        accelerator=\"auto\",\n        max_epochs=shared_config.get(\"max_epochs\"),\n        callbacks=[\n            RichProgressBar(),\n            ModelCheckpoint(\n                dirpath=osp.join(run.run_dir, \"checkpoints\"),\n                filename=\"{epoch}-{step}-{val_mean_acc:.4f}\",\n                save_top_k=1,\n                monitor=\"val_mean_acc\",\n                mode=\"max\",\n            ),\n        ],\n        logger=get_csv_logger(run),\n    )\n```\n\nAs can be seen, in the configuration file, the instantiation process needs to be put into a \"getter\" function, and the instantiated object will be `return`. The reason why an object is not directly instantiated in the configuration file is to allow the cube-dl to control the timing of instantiation of configuration items.\n\nSince the configuration file is essentially a Python source code file, it can contain any logic like a regular Python source code file, but it is generally not very complex.\n\nCorresponding to the four core components described earlier, there are four main types of core configuration items, namely `cube_model`, `cube_task_module`, `cube_data_module` and `cube_runner`. These configuration items can be used as modular and reusable configuration components in the configuration system. In addition, during actual experiments, it is necessary to freely combine the four components to form a **RootConfig**, which is the root node of all configurations.\n\nThe relationship between the five configuration items is as follows:\n\n```text\n                                  \u250c\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2510\n                                  \u2502       Components       \u2502\n                                  \u2502     \u250c\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2510     \u2502\n                                  \u2502 \u250c\u2500\u2500\u25b6\u2502  Model(s)  \u2502\u2500\u2500\u2510  \u2502\n                                  \u2502 \u2502   \u2514\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2518  \u2502  \u2502\n                                  \u2502 \u2502                   \u2502  \u2502\n                  \u250c\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2510 \u2502 \u2502   \u250c\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2510  \u2502  \u2502\n                  \u2502 Root Config \u2502\u2500\u253c\u2500\u253c\u2500\u2500\u25b6\u2502Task Module \u2502\u25c0\u2500\u2518  \u2502\n                  \u2514\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2518 \u2502 \u2502   \u2514\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2518     \u2502\n                                  \u2502 \u2502   \u250c\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2510     \u2502\n                                  \u2502 \u251c\u2500\u2500\u25b6\u2502Data Module \u2502     \u2502\n                                  \u2502 \u2502   \u2514\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2518     \u2502\n                                  \u2502 \u2502   \u250c\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2510     \u2502\n                                  \u2502 \u2514\u2500\u2500\u25b6\u2502 Runner(s)  \u2502     \u2502\n                                  \u2502     \u2514\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2518     \u2502\n                                  \u2514\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2518\n```\n\nFor some rules regarding configuration files:\n\n- For better readability, keyword parameters must be used when initializing `RootConfig` in the configuration file (it is recommended to force the use of keyword parameters when writing task/data modules, following this rule);\n- The getter function name of Root config must be `get_root_config`, and there can only be one in each configuration file. Other types of configuration items do not have this restriction;\n- The getter function of task module must have a parameter named `model`, which corresponds to the `model_getters` passed to the root config. This parameter is used to pass the model object, which is needed in optimizer or other configration item of task module. When a list (means multiple models) is passed as `model_getters`, the parameter `model` will be also a list.\n- Decorators named  `cube_root_config`, `cube_model`, `cube_task_module`, `cube_data_module` and `cube_runner` can be imported into `cube_dl.config_sys`. It is strongly recommended to use the corresponding decorators when writing getter functions, on the one hand to allow the decorators to check, and on the other hand to expand in the future.\n\nAdditionally, it is recommended to use relative import statements when importing the required config components from other configuration files.\n\n\n### 2.2.3 Automatic Archiving of Configuration Files\n\nFor the convenience of replicating a run, the configuration files used during each run will be automatically archived. By default, a configuration file named 'archived_config_<RUN_ID>. py' will be saved in the root directory of the corresponding run. This file combines several configuration files specified at runtime to form a separate file, which can be used directly when replicating this experiment.\n\n### 2.2.4 Sharing Preset Values Between Configuration Files\n\nIn some scenarios, some configuration values need to be shared between different configuration files. For example, epochs may be required by both the LR scheduler in the task module and the runner. In order to facilitate one-time modification and prevent errors caused by omissions, when all configuration components are in the same configuration file, the preset values that need to be shared can be defined as global variables. However, this approach is not feasible when configuration components are scattered across multiple files. In this case, the `shared_config` provided by cube-dl can be used (which can be imported from `cube_dl.config_sys`). Perform `set` in the root config getter, and then perform `get` when needed for other purposes.\n\n### 2.2.5 Comparison with Other Configuration Methods\n\nThe comparison with several mainstream configuration methods is as follows:\n\n1. **Defining Command Line Arguments by argparse**: some projects directly defines configurable arguments by `argparse`. It is obvious that this configuration method is complex and prone to errors when the number of parameters continues to expand, and it is also very troublesome at runtime;\n2. **Using XML/JSON/YAML or Other Configuration Files**\uff1afor example, some configurations of [detectron2](https://github.com/facebookresearch/detectron2) and [LightningCLI](https://lightning.ai/docs/pytorch/stable/cli/lightning_cli_advanced.html) provided by PyTorch-Lightning adopt YAML files. This method has an obvious flaw: the prompt function from the IDEs will be very limited, and it is almost identical to a plain text file during editing. When there are many configuration items, handwriting or copying and pasting hundreds of lines of text back and forth can be very painful. When configuring, you also need to spend time looking up optional values and can only achieve simple logic;\n3. **Using OmegaConf or Other Configuration Library**\uff1a [OmegaConf](https://github.com/omry/omegaconf) is a YAML-based hierarchical configuration system, supporting configuration from merging multiple sources, with strong flexibility. But when writing deep learning projects involving numerous parameters, editing files like YAML also faces the hassle of writing a large number of text files;\n4. **Implementing Specific Config Classes**\uff1afor example, [Mask_RCNN - config.py](https://github.com/matterport/Mask_RCNN/blob/master/mrcnn/config.py) implements `Config` base class. When using it, subclasses need to be derived and some attribute values need to be covered as needed. This approach is inflexible and tightly coupled with the current project, making it unsuitable for general scenarios;\n5. **General Python Source Files**\uff1amost of open source libraries from [OpenMMLab](https://github.com/open-mmlab) adopt this method such as [mmdetection](https://github.com/open-mmlab/mmdetection. Their configration files are like [atss_r50_fpn_8xb8-amp-lsj-200e_coco.py](https://github.com/open-mmlab/mmdetection/blob/main/configs/atss/atss_r50_fpn_8xb8-amp-lsj-200e_coco.py). Although Python source code files are used for configuration, it is self-contained and has special rules (such as the need to use `_base_` for inheritance), which incurs learning costs. Essentially, it involves defining several `dict`s, defining the classes to be used and their parameters, which are passed in as key values and cannot fully utilize the code prompts of the IDE. This has similar drawbacks to text-based configuration methods. Moreover, the direct assignment of various configuration items as variables in the configuration file is quite loose and error-prone.\n\nThese configuration methods essentially pass parameters in various forms, and then the configuration system will use these parameters to instantiate some classes or pass them to a certain location. The configuration method in cube-dl is equivalent to flipping this process, directly defining how to instantiate the class during use, and the configuration system will automatically record and archive it. In this way, the process of writing configuration files is as natural as instantiating classes normally, with almost no need to learn how to configure them. It can also fully utilize the prompts of the IDE to improve writing efficiency and add any logic.\n\n## 2.3 Starter\n\nThe so-called \"starter\" is a set of initial files compatible with cube-dl, used to initialize a deep learning project. Through this approach, cube-dl can be decoupled from specific frameworks such as PyTorch-Lightning. When creating a project, you can choose more flexible native PyTorch or more abstract PyTorch-Lightning based on actual needs.\n\nThe standard starter should contain a file named \"[pyproejct.toml](https://packaging.python.org/en/latest/specifications/pyproject-toml/#). And it should contain a configuration item named `tool.cube_dl`.\n\n## 2.4 Directory Structure of The Starter\n\nThe structure and meaning of the starter directory are as follows (using \"pytorch-lighting\" as an example):\n\n```text\npytorch-lightning\n\u251c\u2500\u2500 callbacks \u3010specific CubeCallbacks for the current starter\u3011\n\u251c\u2500\u2500 configs   \u3010directory for storing configuration files\u3011\n\u2502\u00a0\u00a0 \u251c\u2500\u2500 __init__.py\n\u2502\u00a0\u00a0 \u251c\u2500\u2500 components \u3010configuration components\u3011\n\u2502\u00a0\u00a0 \u2502\u00a0\u00a0 \u2514\u2500\u2500 mnist_data_module.py\n\u2502\u00a0\u00a0 \u2514\u2500\u2500 mnist_cnn_sl.py \u3010a RootConfig file\u3011\n\u251c\u2500\u2500 data    \u3010directory for storing\uff08symbolic links\uff09\u3011\n\u2502\u00a0\u00a0 \u2514\u2500\u2500 MNIST -> /Users/yihaozuo/Zyh-Coding-Projects/Datasets/MNIST\n\u251c\u2500\u2500 datasets \u3010data modules and dataset classes\u3011\n\u2502\u00a0\u00a0 \u251c\u2500\u2500 __init__.py\n\u2502\u00a0\u00a0 \u2514\u2500\u2500 basic_data_module.py\n\u251c\u2500\u2500 models   \u3010model definition\u3011\n\u2502\u00a0\u00a0 \u251c\u2500\u2500 __init__.py\n\u2502\u00a0\u00a0 \u251c\u2500\u2500 __pycache__\n\u2502\u00a0\u00a0 \u2514\u2500\u2500 cnn_example.py\n\u251c\u2500\u2500 outputs  \u3010the output director, string all output products\u3011\n\u251c\u2500\u2500 pyproject.toml  \u3010configuration file for Python project\u3011\n\u251c\u2500\u2500 requirements.txt\n\u2514\u2500\u2500 tasks \u3010definitions of Task Modules\u3011\n\u2502\u00a0\u00a0 \u251c\u2500\u2500 __init__.py\n\u2502\u00a0\u00a0 \u251c\u2500\u2500 base.py  \u3010task base class\u3011\n\u2502\u00a0\u00a0 \u2514\u2500\u2500 supervised_learning.py\u3010task definition of full supervised learning\u3011\n\u2514\u2500\u2500 utils  \u3010miscellaneous utilities\u3011\n```\n\n## 2.4 Basic Commands and Arguments\n\n### `start`\n\nDownload the specified starter.\n\nYou can first view the available starters through `cube start -l`, and then download the specified starter using the following arguments:\n\n| Argument Name   | Type | Required | Meaning |\n|-----------------|:----:|:---------:|---------|\n| **-o**, --owner | str  |     \u274c     |         |\n| **-r**, --repo  | str  |     \u274c     |         |\n| **-p**, --path  | str  |     \u2705     |         |\n| **-d**, --dest  | str  |     \u274c     |         |\n\nFor example\uff1a\n\n```shell\ncube start -o Alive1024 -r cube-dl -p pytorch-lightning\n```\n\n### `new`\n\nCreate a pair of new proj and exp.\n\n| Argument Name                     | Type | Required | Meaning                     |\n|-----------------------------------| :--: | :------: |-----------------------------|\n| **-pn**, --proj-name, --proj_name | str  |    \u2705     | name of the new proj        |\n| **-pd**, --proj-desc, --proj_desc | str  |    \u274c     | description of the new proj |\n| **-en**, --exp-name, --exp_name   | str  |    \u2705     | name of the new exp         |\n| **-ed**, --exp-desc, --exp_desc   | str  |    \u274c     | description of the new exp  |\n\nFor example\uff1a\n\n```shell\ncube new -pn \"MyFirstProject\" -pd \"This is my first project.\" -en \"Baseline\" -ed \"Baseline exps.\"\n```\n\n### `add-exp`\n\nAdd a new exp to a proj.\n\n| Argument Name                | Type | Required | Meaning                                    |\n| ---------------------------- |:----:| :------: |--------------------------------------------|\n| **-p**, --proj-id, --proj_id | str  |    \u2705     | ID of the proj that the new exp belongs to |\n| **-n**, --name               | str  |    \u2705     | name of the new exp                        |\n| **-d**, --desc               | str  |    \u274c     | description of the new exp                 |\n\nFor example\uff1a\n\n```shell\ncube add-exp -p 8q -n \"Ablation\" -d \"Ablation exps.\"\n```\n\n### `ls`\n\nDisplay information about proj, exp, and run in the form of a table in the terminal.\n\n`cube ls` is equivalent to `cube ls -pe`, which will display all proj and exp.\n\n\nThe following other parameters are mutually exclusive:\n\n| Argument Name                           |     Type     |                                    Meaning                                     |\n|-----------------------------------------|:------------:|:------------------------------------------------------------------------------:|\n| **-p**, --projs                         | \"store_true\" |                               display all projs                                |\n| **-er**, --exps-runs-of, --exps_runs_of |     str      |             display all exps and runs of the proj specified by ID              |\n| **-e**, --exps-of, --exps_of            |     str      |                  display all exps of the proj specified by ID                  |\n| **-r**, --runs-of, --runs_of            |  str (2 \u4e2a)   | display all runs of the exp of the proj specified by two IDs (proj_ID exp_ID). |\n\nFor example\uff1a\n\n```shell\ncube ls -r 8q zy\n```\n\n### Common Arguments for `fit`, `validate`, `test`, `predict`\n\n`fit`, `validate`, `test`, `predict` all have the following arguments:\n\n| Argument Name                               | Type | Required | Meaning                                        |\n| ------------------------------------ |:----:|:--------:|------------------------------------------------|\n| **-c**, --config-file, --config_file | str  |    \u2705     | path to the config file                        |\n| **-p**, --proj-id, --proj_id         | str  |    \u2705     | ID of the proj ID that the new run belongs to. |\n| **-e**, --exp-id, --exp_id           | str  |    \u2705     | ID of the exp that the new run belongs to.     |\n| **-n**, --name                       | str  |    \u2705     | name of the new run                            |\n| **-d**, --desc                       | str  |    \u274c     | description of the new run                                   |\n\n### Common Arguments for  `validate`, `test`, `predict`\n\nIn addition to the above parameters, subcommands `validate`, `test`, `predict` also have the following arguments:\n\n| Argument Name                         | Type | Required | Meaning                                               |\n| ------------------------------------- | :--: | :------: | -------------------------------------------------- |\n| **-lc**, --loaded-ckpt, --loaded_ckpt | str  |    \u2705     | File path of the model checkpoint to be loaded. Use an empty string \"\" to explicitly indicate you are going to conduct validate/test/predict using the initialized model without loading any weights). |\n\n\n### `fit`\n\nTraining on the training set.\n\nFor example\uff1a\n\n```shell\ncube fit -c configs/mnist_cnn_sl.py -p 8q -e zy -n \"ep25-lr1e-3\" -d \"Use a 3-layer simple CNN as baseline, max_epochs: 25, base lr: 1e-3\"\n```\n\n### `resume-fit`\n\nResume fit from an interrupted one.\n\n| Argument Name                        | Type | Required | Meaning                                                                                                                                                                |\n| ------------------------------------ | :--: | :------: |------------------------------------------------------------------------------------------------------------------------------------------------------------------------|\n| **-c**, --config-file, --config_file | str  |    \u2705     | path to the config file                                                                                                                                                |\n| **-r**, --resume-from, --resume_from | str  |    \u2705     | file path to the checkpoint where resumes, the path should include the directory names where proj, exp, and run are located (inferring IDs requires these information) |\n\nFor example\uff1a\n\n```shell\ncube resume-fit -c configs/mnist_cnn_sl.py -r \"outputs/proj_8q_MNIST/exp_zy_Baseline/run_rw4q_fit_ep25-lr1e-3/checkpoints/epoch\\=3-step\\=1532.ckpt\"\n```\n\n### `validate`\n\nEvaluate on the validation set.\n\nFor example\uff1a\n\n```shell\ncube validate -c configs/mnist_cnn_sl.py -p 8q -e zy -n \"Val\" -d \"Validate the simple CNN.\" -lc \"outputs/proj_8q_MNIST/exp_zy_Baseline/run_rw4q_fit_ep25-lr1e-3/checkpoints/epoch\\=3-step\\=1532.ckpt\"\n```\n\n### `test`\n\nEvaluate on the test set.\n\nFor example\uff1a\n\n```shell\ncube test -c configs/mnist_cnn_sl.py -p 8q -e zy -n \"Test\" -d \"Test the simple CNN.\" -lc \"outputs/proj_8q_MNIST/exp_zy_Baseline/run_rw4q_fit_ep25-lr1e-3/checkpoints/epoch\\=3-step\\=1532.ckpt\"\n```\n\n### `predict`\n\nPredict.\n\nFor example\uff1a\n\n```shell\ncube predict -c configs/mnist_cnn_sl.py -p 8q -e zy -n \"Test\" -d \"Predict using the simple CNN.\" -lc \"outputs/proj_8q_MNIST/exp_zy_Baseline/run_rw4q_fit_ep25-lr1e-3/checkpoints/epoch\\=3-step\\=1532.ckpt\"\n```\n\n## 2.5 Others\n\n### 2.5.1 Callback Functions\n\n`RootConfig` supports adding callback functions through the `callbacks` parameter. All callback functions should be `cube_dl.callback.CubeCallback` type. When custom callback functions are needed, they should inherit the `CubeCallback` class and implement the required hooks. Currently, `CubeCallback` supports `on_run_start` and `on_run_end`.\n\n### 2.5.2 Runtime Contexts\n\nAt runtime, the cube-dl will store some context in specific locations for access.\n\nYou can import `CUBE_CONTEXT` (actually a dict) from `cube_dl.core`, and then retrieve the current `Run` object through `run = CUBE_CONTEXT[\"run\"]`. This is very useful when obtaining information related to `Run`. For example, when you want to save the predicted results to the corresponding run directory during validation, you can obtain it through `CUBE_CONTEXT[\"run\"].run_dir`.\n\nIn addition, the ID of the current `Run` object can also be obtained by accessing the environment variable named `CUBE_RUN_ID`.\n",
    "bugtrack_url": null,
    "license": "GPL-3.0-or-later",
    "summary": "\"The last stop\" for training your deep learning models. Manage tons of configurations and experiments with minimal changes to existing code.",
    "version": "0.3.10",
    "project_urls": null,
    "split_keywords": [
        "python",
        " data-science",
        " machine-learning",
        " deep-learning",
        " python3",
        " pytorch",
        " pytorch-lightning"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "6d0f51f60842106eeeb6a8eac8b9c4dbf9c4844db29c5286ea87b35aa8d77b61",
                "md5": "30a462e5bc2dec31a335ba9b022c4a84",
                "sha256": "11b846b0d5fb27399cc5bde739036e7c4f000c39be7283986dbfea7eda63f165"
            },
            "downloads": -1,
            "filename": "cube_dl-0.3.10-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "30a462e5bc2dec31a335ba9b022c4a84",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": "<4.0,>=3.10",
            "size": 47720,
            "upload_time": "2024-05-24T09:39:26",
            "upload_time_iso_8601": "2024-05-24T09:39:26.933674Z",
            "url": "https://files.pythonhosted.org/packages/6d/0f/51f60842106eeeb6a8eac8b9c4dbf9c4844db29c5286ea87b35aa8d77b61/cube_dl-0.3.10-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "290fb752c443c4aa948d057245558f8e1dda8877909d65ed22f8697d51ab0cf0",
                "md5": "9872e148525fb7e4eda779c9ff81127a",
                "sha256": "bb6d4359d8948e7067db3922c7bc5cb3c3d77bd3edb9fc1f2c2eead063525b6d"
            },
            "downloads": -1,
            "filename": "cube_dl-0.3.10.tar.gz",
            "has_sig": false,
            "md5_digest": "9872e148525fb7e4eda779c9ff81127a",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": "<4.0,>=3.10",
            "size": 49461,
            "upload_time": "2024-05-24T09:39:28",
            "upload_time_iso_8601": "2024-05-24T09:39:28.369349Z",
            "url": "https://files.pythonhosted.org/packages/29/0f/b752c443c4aa948d057245558f8e1dda8877909d65ed22f8697d51ab0cf0/cube_dl-0.3.10.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2024-05-24 09:39:28",
    "github": false,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "lcname": "cube-dl"
}

Alive1024