malet

- **Name**: malet
- **Version**: 0.2.1
- **Summary**: Malet: a tool for machine learning experiment
- **Upload time**: 2024-07-17 13:42:44
- **Requires Python**: >=3.8
- **License**: MIT License (Copyright (c) 2023 Dongyeop Lee)
- **Keywords**: machine learning, experiment, plot
- **Requirements**: absl-py, gitpython, matplotlib, ml-collections, numpy, pandas, rich, seaborn

# Malet: a tool for machine learning experiment

🔨 **Malet** (**Ma**chine **L**earning **E**xperiment **T**ool) is a tool for efficient machine learning experiment execution, logging, analysis, and plot making.

The following features are provided:

- Simple YAML-based hyperparameter configuration w/ grid search syntax
- Experiment logging and resuming system
- User-friendly command-line tool for flexible graphing and easy data extraction from experiment logs
- Efficient parallelization by splitting a sequence of experiments over GPU jobs

## Installation

You can install Malet using pip,

```bash
pip install malet
```

or from this repository.

```bash
pip install git+https://github.com/edong6768/Malet.git
```

## Dependencies

- absl-py 1.0.0
- gitpython 3.1.40
- matplotlib 3.7.0
- ml-collections 0.1.0
- numpy 1.22.0
- pandas 2.0.3
- rich 13.6.0
- seaborn 0.11.2

## Documentation **(🚨 Will be migrated to Sphinx-based Read-the-docs in the near future)**

### Contents

**[Quick start](#quick-start)**

1. [Prerequisite](#1-prerequisite)
2. [Running experiments](#2-running-experiments)
3. [Plot making](#3-plot-making)

**[Advanced topics](#advanced-topics)**

1. [Advanced gridding in yaml](#advanced-gridding-in-yaml)
2. [Advanced plot making](#advanced-plot-making)
3. [Parallel friendly grid splitting](#parallel-friendly-grid-splitting)
4. [Saving logs in intermediate epochs](#saving-logs-in-intermediate-epochs)
5. [Merging multiple log files](#merging-multiple-log-files)

## Quick start

### 1. Prerequisite

#### Experiment Folder

Using Malet starts with making a folder with a single yaml config file.
Various files resulting from an experiment are saved in this single folder.
We advise creating a folder for each experiment under the ```experiments``` folder.

```yaml
experiments/
└── {experiment folder}/
    ├── exp_config.yaml : experiment config yaml file            (User created)
    ├── log.tsv         : log file for saving experiment results (generated by malet.experiment)
    ├── (log_splits)    : folder for split logs                  (generated by malet.experiment)
    └── figure          : folder for figures                     (generated by malet.plot)
```

#### Pre-existing training pipeline

Say you have some training pipeline that takes in a configuration (any object with a dictionary-like interface).
We require the training function to return its results so they get logged.

```python
def train(config, ...):
    ...
    # training happens here
    ...
    metric_dict = {
        'train_accuracies': train_accuracies,
        'val_accuracies': val_accuracies,
        'train_losses': train_losses,
        'val_losses': val_losses,
    }
    return metric_dict
```

### 2. Running experiments

#### Experiment config yaml

You can write configurations in the yaml file as you normally would.
In addition, we provide the special keyword `grid`, used as follows:

```yaml
# static configs
model: LeNet5
dataset: mnist

num_epochs: 100
batch_size: 128
optimizer: adam

# gridded fields
grid:
    seed: [1, 2, 3]
    lr: [0.0001, 0.001, 0.01, 0.1]
    weight_decay: [0.0, 0.00005, 0.0001]
```

Specifying lists of config values under `grid` runs all possible combinations (*i.e.* the grid) of your configurations, where fields declared earlier in `grid` change less frequently.
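
For illustration, the `grid` above expands into 3 × 4 × 3 = 36 configs. A minimal sketch of this expansion (to illustrate the ordering only, not Malet's internal code):

```python
from itertools import product

# Sketch of how the grid above expands (illustration only, not Malet's internal code).
grid = {
    'seed': [1, 2, 3],
    'lr': [0.0001, 0.001, 0.01, 0.1],
    'weight_decay': [0.0, 0.00005, 0.0001],
}

# Fields declared earlier change least frequently; later fields cycle fastest.
configs = [dict(zip(grid, values)) for values in product(*grid.values())]

print(len(configs))  # 36
print(configs[0])    # {'seed': 1, 'lr': 0.0001, 'weight_decay': 0.0}
print(configs[1])    # {'seed': 1, 'lr': 0.0001, 'weight_decay': 5e-05}
```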

#### Running experiments

The following runs `train_fn` on the grid of configs defined in `{exp_folder_path}`.

```python
from functools import partial
from malet.experiment import Experiment

train_fn = partial(train, ...{other arguments besides config}..)
metric_fields =  ['train_accuracies', 'val_accuracies', 'train_losses', 'val_losses']
experiment = Experiment({exp_folder_path}, train_fn, metric_fields)
experiment.run()
```

Note that you need to partially apply your original function so that you pass in a function with only `config` as its argument.

#### Experiment logs

The experiment log will be automatically saved in `{exp_folder_path}` as `log.tsv`, where the static configs and the experiment log are saved in yaml-like and tsv-like structures, respectively.
You can retrieve these data in python using `ExperimentLog` in `malet.experiment` as follows:

```python
from malet.experiment import ExperimentLog

log = ExperimentLog.from_tsv({tsv_file})

static_configs = log.static_configs
df = log.df
```

Experiment logs also enable resuming from the most recently run config when a job is suddenly killed.
Note that this only lets you resume from the beginning of that config's training.
For resuming from intermediate log checkpoints, check out [Saving logs in intermediate epochs](#saving-logs-in-intermediate-epochs).

### 3. Plot making

Running `malet.plot` lets you make plots based on `log.tsv` in the experiment folder.

```bash
malet-plot \
-exp_folder ../experiments/{exp_folder} \
-mode curve-epoch-train_accuracy
```

The key intuition is to *leave only two fields in the dataframe, one for the x-axis and one for the y-axis*, by

1. **specifying a specific value** (*e.g.* model, dataset, optimizer, etc.),
2. **averaging over** a field (seed),
3. or **choosing the value with the best metric** (other hyperparameters),

which leaves only one value for each remaining field.
This can be done using the following arguments.

#### Data related arguments

1. **`-mode`**: The mode consists of the plot type (currently only 'curve' and 'bar'), the field for the x-axis, and the metric for the y-axis.

    ```bash
    -mode {plot_mode}-{x_field}-{metric}
    ```

    For any field other than `x_field` and `seed` (which is always averaged over), the value with the best metric is chosen automatically.
    To restrict a field to specific values, use the following `-filter` argument.
2. **`-filter`**: Use this to explicitly restrict certain fields to a subset of their values.

    ```bash
    -filter '{field1} {v1} {v2} / {field2} {v3} {v4} ...'
    ```

    Here, two special fields are automatically generated:

    - `step` - from `explode`ing list-type metrics, with special values 'best' and 'last' for selecting the best-performing step and the last step respectively, and with slicing syntax (e.g., 50:100),
    - `metric` - from `melt`ing the different metric column names into a new column.

3. **`-multi_line_fields`**: Specify the fields to plot multiple lines over.

    ```bash
    -multi_line_field '{field1} {field2} ...'
    ```

4. **`-multi_plot_fields`**: Specify the fields to create multiple subplots (columns/rows) over.

    ```bash
    -multi_plot_field '{column field}'
    -multi_plot_field '{column field} {row field}'
    ```

5. **`-animate_field`**: Specify the fields to animate over. Saves a gif instead of a pdf.

    ```bash
    -animate_field '{field}'
    ```

6. **`-best_at_max`** (Default: False): Specify whether the chosen metric is best when largest (e.g. accuracy).

    ```bash
    -best_at_max
    -nobest_at_max
    ```

#### Styling arguments

1. **`-colors`**: Name or list of names of [matplotlib colormaps](https://matplotlib.org/stable/users/explain/colors/colormaps.html).

    ```bash
    -colors 'default'
    ```

2. **`-annotate`**: Option to add annotations based on the fields specified in `annotate_fields`.

    ```bash
    -annotate
    ```

3. **`-annotate_fields`**: Fields to annotate.

    ```bash
    -annotate_fields '{field1} {field2} ...'
    ```

4. **`-fig_size`**: Figure size.
    
    - Square figure
      ```bash
      -fig_size 7
      ```
    - Rectangular figure (x, y)
      ```bash
      -fig_size 10 8
      ```

5. **`-style`**: Matplotlib style.

    ```bash
    -style 'ggplot'
    ```

6. **`-plot_config`**: The path to a yaml file to configure all aspects of the plot.

    ```bash
    -plot_config {plot_config_path}
    ```

    In this yaml, you can specify the `line_style` and `ax_style` under each mode as follows:

    ```yaml
    'curve-epoch-train_accuracy':
      annotate: false
      std_plot: fill
      line_style: 
        linewidth: 4
        marker: 'D'
        markersize: 10

      ax_style:
        frame_width: 2.5
        fig_size: 7
        legend: [{'fontsize': 20}]
        grid: [true, {'linestyle': '--'}]
        tick_params:
          - axis: both
            which: major
            labelsize: 25
            direction: in
            length: 5
    ```

    - `line_style`: Style of the plotted line (`linewidth`, `marker`, `markersize`, `markevery`)
    - `ax_style`: Style of the figure. [Most attributes of the `matplotlib.axes.Axes` object](https://matplotlib.org/stable/api/_as_gen/matplotlib.axes.Axes.html#matplotlib.axes.Axes) can be set as follows:

      ```yaml
      yscale: [parg1, parg2, {'kwarg1': v1, 'kwarg2': v2}]
      ```

      is equivalent to running

      ```python
      ax.set_yscale(parg1, parg2, kwarg1=v1, kwarg2=v2)
      ```
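
      As an illustration of this convention, here is a minimal sketch of how such an entry could be applied to an `Axes` (an assumed mechanism for illustration, not Malet's actual implementation):

      ```python
      import matplotlib.pyplot as plt

      # Hypothetical sketch of applying a [pargs..., {kwargs}] style entry (not Malet's code).
      fig, ax = plt.subplots()
      ax_style = {'yscale': ['log', {'base': 10}]}

      for attr, args in ax_style.items():
          *pargs, kwargs = args if isinstance(args[-1], dict) else [*args, {}]
          getattr(ax, f'set_{attr}')(*pargs, **kwargs)  # here: ax.set_yscale('log', base=10)
      ```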

For more details, go to [Advanced plot making](#advanced-plot-making) section.

## Advanced topics

### Advanced gridding in yaml

#### 1. List comprehension

This provides functionality similar to list comprehensions in Python, and is used as follows:

```yaml
lr: [10**{-i};1:1:5]
```

**Syntax:**

```yaml
[{expression};{start}:{step}:{end}]
```

where the expression can be any Python-interpretable expression using the symbols `i, +, -, *, /, [], ()` and numbers.
This is equivalent to the Python expression

```python
[{expression} for i in range({start}, {end}, {step})]
```
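
For the `lr` example above, this expands as follows (a worked illustration of the stated equivalence):

```python
# Worked example of the equivalence above for lr: [10**{-i};1:1:5]
lr = [10**(-i) for i in range(1, 5, 1)]
print(lr)  # [0.1, 0.01, 0.001, 0.0001]
```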

#### 2. Grid sequences

We can execute a sequence of grids by passing in a list of dictionaries instead of a single dictionary under the `grid` keyword, as follows:

```yaml
grid:
    - optimizer: sgd
      lr: [0.001, 0.01]
      seed: [1, 2, 3]

    - optimizer: adam
      lr: [0.005]
      seed: [1, 2, 3]
```

#### 3. Grouping

Grouping lets you tie two different fields together so they get treated as a single field in the grid.

```yaml
grid:
    group:
        optimizer: [[sgd], [adam]]
        lr: [[0.001, 0.01], [0.005]]
    seed: [1, 2, 3]
```

**Syntax:**

```yaml
grid:
    group:
        cfg1: [A1, B1]
        cfg2: [A2, B2]
    cfg3: [1, 2, 3]
```

is syntactically equivalent to

```yaml
grid:
    - cfg1: A1
      cfg2: A2
      cfg3: [1, 2, 3]

    - cfg1: B1
      cfg2: B2
      cfg3: [1, 2, 3]
```

Here the two config fields `cfg1` and `cfg2` have grouped values `(A1, A2)` and `(B1, B2)` that act like a single config field and aren't gridded separately (`A1`, `A2`, `B1`, `B2` are lists of values).

You can also create several groupings by passing a list of dictionaries under the `group` keyword, as follows.

```yaml
grid:
    group:
        - cfg1: [A1, B1]
          cfg2: [A2, B2]
        - cfg3: [C1, D1]
          cfg4: [C2, D2]
    cfg5: [1, 2, 3]
```
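
As a rough sketch of what grouping amounts to (assumed semantics inferred from the equivalence above, not Malet's internal code), the paired group values act as a single field that is then gridded against the remaining fields:

```python
from itertools import product

# Sketch of the optimizer/lr grouping example above (assumed semantics, not Malet's code).
group = {
    'optimizer': [['sgd'], ['adam']],
    'lr': [[0.001, 0.01], [0.005]],
}
seed = [1, 2, 3]

# Paired group values: (['sgd'], [0.001, 0.01]) and (['adam'], [0.005]).
pairs = list(zip(*group.values()))
configs = [
    {'optimizer': opt, 'lr': lr, 'seed': s}
    for opts, lrs in pairs
    for opt, lr, s in product(opts, lrs, seed)
]
print(len(configs))  # (1*2 + 1*1) * 3 = 9 configs
```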

### Advanced Plot making

#### 1. Other arguments for `malet.plot`

- `-best_ref_x_fields`: By default, each point in `x_field` gets its own optimal hyperparameter set, which is sometimes undesirable.
This argument lets you specify the value of `x_field` at which to choose the best hyperparameters.

    ```bash
    -best_ref_x_field {x_field_value}
    ```

- `-best_ref_ml_fields`: Likewise, we might want to use the same hyperparameters for all lines in `multi_line_field`, with the best hyperparameters chosen from a single value of `multi_line_field`.

    ```bash
    -best_ref_ml_field {ml_field_value}
    ```

- `-best_ref_metric_field`: To plot one metric with the hyperparameter set chosen based on another, pass the name of the reference metric as `{metric_field_value}`.

    ```bash
    -best_ref_metric_field {metric_field_value}
    ```

#### 2. Advanced yaml plot config

#### More details on ax_style keyword

Unlike other fields, `frame_width`, `fig_size`, `tick_params`, `legend`, and `grid` are not attributes of `Axes` but are provided for convenience.
Of these, `frame_width` and `fig_size` should be set to a number, while the others are used like the rest of the `Axes` attributes.

#### Default style

You can change the default plot style by adding the `default_style` keyword in the yaml file.

```yaml
'default_style':
  annotate: false
  std_plot: fill
  line_style: 
    linewidth: 4
    marker: 'D'
    markersize: 10

  ax_style:
    frame_width: 2.5
    fig_size: 7
    legend: [{'fontsize': 20}]
    grid: [true, {'linestyle': '--'}]
    tick_params:
      - axis: both
        which: major
        labelsize: 25
        direction: in
        length: 5
```

#### Mode aliases

You can specify a set of arguments for `malet.plot` in the yaml file and give it an alias that you can pass to the `mode` argument.

```yaml
'sam_rho':
  mode: curve-rho-val-accuracy
  multi_line_field: optimizer
  filter: 'optimizer sgd sam'
  annotate: True
  colors: ''
  
  std_plot: bar

  ax_style:
    title: ['SGD vs SAM', {'size': 27}]
    xlabel: ['$\rho$', {'size': 30}]
    ylabel: ['Val Accuracy (%)', {'size': 30}]
```

```bash
malet-plot \
-exp_folder ../experiments/{exp_folder} \
-plot_config {plot_config_path} \
-mode sam_rho
```

When using mode aliases, any conflicting arguments passed on the command line will be ignored.

#### Style hierarchy

If conflicting styles are passed in, the specification with the highest priority is used, with priority ordered as follows:

```python
default_style < {custom style} < {mode alias}
```
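
Conceptually (a sketch of the precedence only, not Malet's actual merging code), later sources override earlier ones as in a dictionary merge:

```python
# Conceptual sketch of the style precedence above (not Malet's actual merging code).
default_style = {'std_plot': 'fill', 'annotate': False}
custom_style  = {'std_plot': 'bar'}   # e.g. 'curve-epoch-train_accuracy'
mode_alias    = {'annotate': True}    # e.g. 'sam_rho'

resolved = {**default_style, **custom_style, **mode_alias}
print(resolved)  # {'std_plot': 'bar', 'annotate': True}
```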

#### 3. Custom dataframe processing

The `legend` and the ticks are automatically determined based on the processed dataframe within the `draw_metric` function.
You can pass a function to the `preprcs_df` keyword argument of `draw_metric` with the following arguments and return values:

```python
def preprcs_df(df, legend):
    ...
    # Process df and legend
    ...
    return processed_df, processed_legend
```

We advise assigning a new mode for each `preprcs_df`.

#### 4. Custom plotting using `avgbest_df` and `ax_draw` in `plot_utils.metric_drawer`

Much of what `malet.plot` does comes from `avgbest_df` and `ax_draw`.

#### avgbest_df(df, metric_field, avg_over=None,  best_over=tuple(),  best_of=dict(), best_at_max=True)

- Parameters:
  - **df** (`pandas.DataFrame`) : Base dataframe to operate over. All hyperparameters should be set as `MultiIndex`.
  - **metric_field** (`str`) : Column name of the metric. Used to evaluate best hyperparameter.
  - **avg_over** (`str`) : `MultiIndex` level name to average over.
  - **best_over** (`List[str]`) : List of `MultiIndex` level names over which to find the values yielding the best `metric_field`.
  - **best_of** (`Dict[str, Any]`) : Dictionary of pairs `{MultiIndex name}: {value in MultiIndex}` to find the best hyperparameters of. The other values in `{MultiIndex name}` will follow the best hyperparameters found for these values.
  - **best_at_max** (`bool`) : `True` when larger metric is better, and `False` otherwise.
- Returns: Processed DataFrame (`pandas.DataFrame`)

#### ax_draw(ax, df, label, annotate=True, std_plot='fill', unif_xticks=False, plot_config = {'linewidth':4, 'color':'orange', 'marker':'D', 'markersize':10, 'markevery':1})

- Parameters:
  - **ax** (`matplotlib.axes.Axes`) : Axes to plot in.
  - **df** (`pandas.DataFrame`) : Dataframe used for the plot.
    This dataframe should have one named index for the x-axis and one column for the y-axis.
  - **label** (`str`) : label for drawn line to be used in the legend.
  - **std_plot** (`Literal['none','fill','bar']`) : Style of the standard error drawn on the plot.
  - **unif_xticks** (`bool`) : When `True`, the xticks will be uniformly spaced regardless of their values.
  - **plot_config** (`Dict[str, Any]`) : Dictionary of configs to use when plotting the line (e.g. linewidth, color, marker, markersize, markevery).
- Returns: Axes (`matplotlib.axes.Axes`) with single line added based on `df`.
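
A hypothetical usage sketch based on the signatures documented above; the module path `malet.plot_utils.metric_drawer` and the toy data are assumptions for illustration:

```python
import pandas as pd
import matplotlib.pyplot as plt

# Module path assumed from the section title; adjust if it differs.
from malet.plot_utils.metric_drawer import avgbest_df, ax_draw

# Toy log: hyperparameters as MultiIndex levels, one metric column.
df = pd.DataFrame(
    {'val_accuracy': [0.91, 0.93, 0.88, 0.90]},
    index=pd.MultiIndex.from_product([[1, 2], [0.01, 0.1]], names=['seed', 'lr']),
)

# Average over seeds, then keep the lr yielding the best (largest) val_accuracy.
best = avgbest_df(df, 'val_accuracy', avg_over='seed', best_over=['lr'], best_at_max=True)
print(best)

# Draw a single line from a dataframe with one named index (x) and one column (y).
curve = pd.DataFrame({'val_accuracy': [0.7, 0.8, 0.9]},
                     index=pd.Index([1, 2, 3], name='epoch'))
fig, ax = plt.subplots()
ax = ax_draw(ax, curve, label='example', std_plot='none')
```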

### Parallel friendly grid splitting

When using GPU resource allocation programs such as Slurm, you might want to split multiple hyperparameter configurations over different GPU jobs in parallel.
We provide two methods of splitting the grid as arguments of `Experiment`.
We advise using flags to pass these as arguments to your `train.py` file.

```python
from absl import app, flags
from malet.experiment import Experiment

...

FLAGS = flags.FLAGS
def main(argv):
  ...
  experiment = Experiment({exp_folder_path}, train_fn, metric_fields,
                          total_splits=FLAGS.total_splits,
                          curr_splits=FLAGS.curr_splits,
                          auto_update_tsv=FLAGS.auto_update_tsv,
                          configs_save=FLAGS.configs_save)
  experiment.run()

if __name__=='__main__':
  flags.DEFINE_string('total_splits', '1', 'Number of uniform splits, or the field name to split over.')
  flags.DEFINE_string('curr_splits', '0', 'Split index (or field values) allocated to this job.')
  flags.DEFINE_bool('auto_update_tsv', False, 'Automatically read/write the log tsv while running.')
  flags.DEFINE_bool('configs_save', False, 'Save configs to the log before running them.')
  app.run(main)
```

#### 1. Partitioning

1. Uniform Partitioning (Pass in a number)

    This method splits the experiments uniformly given the following arguments:

    1. the number of total partitions (`total_splits`),
    2. the batch index allocated to this script (`curr_splits`).

    Each sbatch script needs to use a different `curr_splits` number (0 to total-1).

    ```bash
    splits=4
    echo "run sbatch slurm_train with $splits splits"
    for ((i=0;i<splits;i++))
    do
      python train.py ./experiments/{exp_folder} \
        --workdir=./logdir \
        --total_splits=$splits \
        --curr_splits=$i
    done
    ```

2. Field Partitioning (Pass in a field name)

    This method splits the experiments based on some field, given the following arguments:

    1. the name of the field to split over (`total_splits`),
    2. a string of field values separated by spaces to allocate to this split's script (`curr_splits`).

    Each sbatch script needs different field values (whitespace-separated strings for multiple values) in `curr_splits`.

    ```bash
    python experiment_util.py ./experiments/{exp_folder} \
        --total_splits 'optimizer' \
        --curr_splits 'sgd'

    python experiment_util.py ./experiments/{exp_folder} \
        --total_splits 'optimizer' \
        --curr_splits 'rmsprop adam'
    ```

    Both of these split methods result in multiple `.tsv` files, which are saved as `{exp_folder}/log_splits/split_{i}.tsv`.
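
To make the two partitioning schemes concrete, here is a minimal conceptual sketch (behavior assumed from the descriptions above, not Malet's internal code):

```python
# Conceptual sketch of the two partitioning schemes (not Malet's internal code).

def uniform_partition(configs, total_splits, curr_split):
    """Keep every total_splits-th config, offset by this job's split index."""
    return [cfg for i, cfg in enumerate(configs) if i % total_splits == curr_split]

def field_partition(configs, field, values):
    """Keep only the configs whose `field` is among this job's allocated values."""
    return [cfg for cfg in configs if cfg[field] in values]

configs = [{'optimizer': o, 'lr': lr} for o in ['sgd', 'adam', 'rmsprop'] for lr in [0.01, 0.1]]
print(len(uniform_partition(configs, total_splits=4, curr_split=0)))    # 2
print(len(field_partition(configs, 'optimizer', ['rmsprop', 'adam'])))  # 4
```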

**Comments on the `auto_update_tsv` argument.**

`auto_update_tsv` is used for the current-run checking described in the next section, and using it with 'Partitioning' doesn't cause problems.
However, we advise against adding it here, since the additional reading/writing adds unnecessary computation time, especially as the `log.tsv` file grows larger.

#### 2. Queueing

With this method, each job, once it finishes running its config, runs the next config in the queue of unrun configs.
More precisely, it skips any configs that have finished running or are currently running.
The key to doing this is `configs_save=True`, which saves the configs to the `{exp_folder}/log.tsv` file before a config is run, enabling other jobs to know which configs are currently running and skip them.

```bash
python experiment_util.py ./experiments/{exp_folder} --workdir=./logdir \
--auto_update_tsv \
--configs_save
```

This method requires the keyword `auto_update_tsv=True` in `Experiment` to automatically read/write tsv files after a job starts/finishes running a config.

One advantage of 'Queueing' over 'Partitioning' is that you can freely allocate/deallocate new GPUs while running an experiment.
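
Conceptually, the skip logic looks like the sketch below (assuming the `R`/`C`/`F` status values described in the log checkpoint section; not Malet's actual code):

```python
# Conceptual sketch of queueing: skip configs already completed ('C') or running ('R').
def next_unrun_config(configs, log_status):
    for cfg in configs:
        if log_status.get(cfg) not in ('R', 'C'):
            return cfg  # unrun (or previously failed) config
    return None

configs = ('cfg_a', 'cfg_b', 'cfg_c')
status = {'cfg_a': 'C', 'cfg_b': 'R'}
print(next_unrun_config(configs, status))  # cfg_c
```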

#### 3. Use Both (Partitioning + Queueing)

However, as `log.tsv` grows larger, read/write times increase, which can cause conflicts across different GPU jobs. One workaround is to use 'Partitioning' to save experiments in separate `log_splits/split_{i}.tsv` files to keep each `.tsv` small, while using 'Queueing' within each split to freely allocate GPU jobs, leveraging the advantages of both methods.

```bash
splits=4
echo "run sbatch slurm_train with $splits splits"
for ((i=0;i<splits;i++))
do
  python experiment_util.py ./experiments/{exp_folder} \
    --workdir=./logdir \
    --total_splits=$splits \
    --curr_splits=$i \
    --auto_update_tsv \
    --configs_save
done
```

### Saving logs in intermediate epochs

We checkpoint training state so that we can resume training in the event of an unexpected termination.
We can also checkpoint the experiment log so that we don't have to retrain a certain config to re-evaluate the metrics.

#### Training pipeline

For this, we need to add an `exp_log` argument to the `train` function for checkpointing the experiment log; you can then use it, as in the following code, to retrieve/save the intermediate metric dictionary from/to the `tsv` file.

```python
import os

def get_ckpt_dir(config):
    ...
    return ckpt_dir

def get_ckpt(ckpt_dir):
    ...
    return ckpt

def save_ckpt(new_ckpt, ckpt_dir):
    ...

def train(config, experiment, ...):

    ... # set up
    
    # retrieve model/trainstate checkpoint if there exists
    # these are just placeholders for the logic
    ckpt_epoch = 0
    ckpt_dir = get_ckpt_dir(config)
    if os.path.exists(ckpt_dir):
      ckpt = get_ckpt(ckpt_dir)
      ckpt_epoch = ckpt.epoch
    
    ############# retrieve log checkpoint if there exists #############
    metric_dict = {
        'train_accuracies': [],
        'val_accuracies': [],
        'train_losses': [],
        'val_losses': [],
    }
    if config in experiment.log:
      metric_dict = experiment.get_log_checkpoint(config)[0]
    ###################################################################
    ...
    # training happens here
    for epoch in range(ckpt_epoch, config.epochs):
      
      ... # train
      
      ... # update metric_dict

      if not (epoch+1) % config.ckpt_every:

        ... # train state, model checkpoint

        ####################### checkpoint log #######################
        save_ckpt(new_ckpt, ckpt_dir)
        experiment.update_log(config, **metric_dict) 
        ##############################################################
    ...

    return metric_dict
```

The `ExperimentLog.get_log_checkpoint` method retrieves the `metric_dict` based on the `status` field in the dataframe.

|status|Description|Behavior when resumed|
|:----:|-----------|--------|
| `R`  | Currently running | Get skipped |
| `C`  | Completed | Get skipped |
| `F`  | Failed while running | Rerun and `metric_dict` is retrieved |

Note that with some external halt (e.g. computer shutdown, slurm job cancellation), malet won't be able to log the status as `F` (failed).
In these cases, you need to **manually find the row in the `log.tsv` file corresponding to the halted job and change the `status` from `R` (running) to `F` (failed)**.

#### Running experiment

```python
from functools import partial
from malet.experiment import Experiment

train_fn = partial(train, ...{other arguments besides config & exp_log}..)
metric_fields =  ['train_accuracies', 'val_accuracies', 'train_losses', 'val_losses']
experiment = Experiment({exp_folder_path}, train_fn, metric_fields, 
                        checkpoint=True, auto_update_tsv=True) 
experiment.run()
```

You should add `checkpoint=True, auto_update_tsv=True` when instantiating `Experiment`.

### Merging multiple log files

There are two methods for merging multiple log files.

#### 1. Merge all logs in a folder

```python
from malet.experiment import ExperimentLog

ExperimentLog.merge_folder({log_folder_path})
```

#### 2. Merge a portion of the logs in a folder

```python
from malet.experiment import ExperimentLog

names = ["log1", "log2", ..., "logn"]
ExperimentLog.merge_tsv(names, {log_folder_path})
```

Both methods automatically merge the logs and save them as `log_merged.tsv` in the folder.
These methods are helpful after running split experiments, where merging is required to use the plot tools.

            
