expyrun

Name	expyrun
Version	0.2.0
home_page	https://github.com/raphaelreme/expyrun
Summary	Run reproducible experiments from yaml configuration file
upload_time	2024-10-24 14:25:36
maintainer	None
docs_url	None
author	Raphael Reme
requires_python	>=3.7
license	MIT
keywords	experiments reproducibility machine learning

# Expyrun

[![Lint and Test](https://github.com/raphaelreme/expyrun/actions/workflows/tests.yml/badge.svg)](https://github.com/raphaelreme/expyrun/actions/workflows/tests.yml)

Running reproducible experiments from a yaml configuration file.

Expyrun is a command-line script that will launch your code from
a yaml configuration file and register in the output directory everything
needed to reproduce the run.

The configuration file is a yaml file with a few specifics:
- Lists of objects are not supported yet.
- Environment variables are parsed and resolved (\$MY_VAR or ${MY_VAR}).
- The config can reference itself, for instance to make the name of the experiment
depend on the value of some keys. See the examples.
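
To make the last two points concrete, here is a minimal sketch of how such a resolution could work. It is not Expyrun's actual implementation: the `lookup` and `resolve` helpers are hypothetical names, and the sketch only does a single pass (a referenced value that itself contains a reference is not resolved again).

```python
import os
import re

# Matches {some.dotted.key} but not ${ENV_VAR}
_REFERENCE = re.compile(r"(?<!\$)\{([\w.]+)\}")


def lookup(config: dict, dotted_key: str):
    """Follow a dotted key such as "model.name" through nested dictionaries."""
    value = config
    for part in dotted_key.split("."):
        value = value[part]
    return value


def resolve(value, config: dict):
    """Expand environment variables and {key} references in every string value."""
    if isinstance(value, dict):
        return {key: resolve(sub_value, config) for key, sub_value in value.items()}
    if isinstance(value, str):
        expanded = os.path.expandvars(value)  # Handles $MY_VAR and ${MY_VAR}
        return _REFERENCE.sub(lambda match: str(lookup(config, match.group(1))), expanded)
    return value


os.environ["OUTPUT_DIR"] = "/tmp/experiments"  # Set here only for the demonstration
raw = {
    "output_dir": "${OUTPUT_DIR}",
    "name": "first_method/{model.name}-{training.lr}",
    "model": {"name": "MyModel"},
    "training": {"lr": 0.1},
}
resolved = resolve(raw, raw)
print(resolved["name"])  # first_method/MyModel-0.1
```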

## Install

### Pip

```bash
$ pip install expyrun
```

### Conda

Not available yet

## Getting started

Expyrun is a command-line tool. You can directly use it once installed:

```bash
$ expyrun -h
$ expyrun path/to/my/experiment/configuration.yml
```

For Expyrun to work, your code must expose an entry point: a function
that takes as inputs a string (name of the experiment) and a dictionary (configuration
of the experiment). This function will be imported and run by Expyrun.
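
As an illustration, an entry point could look like the sketch below. Only the `(str, dict)` signature comes from Expyrun; the `seed` key and the `results.json` file name are assumptions made for the example.

```python
import json
from pathlib import Path


def main(name: str, config: dict) -> None:
    """Entry point called by Expyrun with the experiment name and its configuration."""
    print(f"Running experiment {name}")
    seed = config.get("seed", 0)  # "seed" is a hypothetical key of your configuration

    # ... run the actual experiment here ...
    results = {"name": name, "seed": seed}

    # Relative paths are resolved against the current working directory
    Path("results.json").write_text(json.dumps(results))
```

With this function stored in `package/module.py`, the matching `__main__` value would be `package.module:main`.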

The minimal configuration file is:
```yml
__run__:
    __main__: package.module:entry_point  # How to find your entry point
    __output_dir__: /path/to/output_dir  # Where to store experiments
    __name__: my_experiment  # Name of the experiment

# Additional configuration will be given to your code
# For instance:
# seed: 666
#
# data: /path/to/data
#
# device: cuda
#
# model:
#   name: resnet
#   size: 18
```

It can be stored anywhere. When running Expyrun, the package of your entry point should
be in the current working directory. Alternatively, you can specify a \_\_code__ key in the \_\_run__
section to indicate where the code should be found.

Notes:
- Expyrun will create an experiment folder in which it will put the configuration (and raw configuration,
see the example), frozen requirements, and a copy of the source code: almost everything you need to run
your experiment again. It will also redirect your stdout and stderr to an outputs.log file.
- From your function's perspective, the current working directory is this experiment directory,
so results (model weights, data preprocessing, etc.) can be saved directly in it.
- Expyrun does not try to copy all your dependencies (for instance data read by your code), as this
would be too heavy. You are responsible for keeping the data the code reads at the same location, or
for overriding the location of the data when reproducing.
- You should probably look at dacite and dataclasses to create a nicely typed configuration in your code,
but this is outside the scope of Expyrun.
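
As a rough sketch of the last note, the dictionary received by the entry point can be converted into dataclasses. The example below uses only the standard library and builds the objects by hand; the class names are hypothetical and the fields follow the commented keys of the minimal configuration above. With dacite installed, `dacite.from_dict(data_class=ExperimentConfig, data=config)` could replace the manual `from_dict` method.

```python
from dataclasses import dataclass


@dataclass
class ModelConfig:
    name: str
    size: int


@dataclass
class ExperimentConfig:
    seed: int
    device: str
    model: ModelConfig

    @classmethod
    def from_dict(cls, config: dict) -> "ExperimentConfig":
        """Build the typed configuration from the raw dictionary given by Expyrun."""
        return cls(
            seed=config["seed"],
            device=config["device"],
            model=ModelConfig(**config["model"]),
        )


def main(name: str, config: dict) -> None:
    cfg = ExperimentConfig.from_dict(config)
    print(f"{name}: {cfg.model.name}-{cfg.model.size} on {cfg.device} (seed {cfg.seed})")
```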

## Configuration file format
There are three special sections reserved for Expyrun in the yaml files:

- \_\_default__: Inherit keys and values from one or several other configurations
    (can be a string or a list of strings). Each path can be absolute (/path/to/default.yml),
    relative to the current directory (path/to/default.yml), or relative to the current yaml
    config (./path/to/default.yml). If not set, it is considered empty.
    This allows you to build a common default configuration shared between your experiments.

- \_\_new_key_policy__: How to handle new keys in a configuration that inherits from others.
    Accepted values: "raise", "warn", "pass". Default: "warn".
    A new key is a key that is present in the current configuration but absent from all of
    its parents (which is probably unintended).

- \_\_run__: The most important section. It defines the metadata for running your experiment.
    It has 4 different keys:
    - \_\_main__: Main function to run (Mandatory). Expected signature: Callable[[str, dict], None].
        This function will be called with the experiment name and the experiment configuration.
        A valid main function string is given as package.subpackage.module:function.
        Expyrun will search the package inside the current working directory.
    - \_\_name__: Name of the experiment (Mandatory). It is used to compute the actual output directory
        and is given to the main function.
    - \_\_output_dir__: Base path for outputs to be stored (Mandatory). The outputs will be stored
        in {output_dir}/{name}/exp.{i}, or {output_dir}/DEBUG/{name}/exp.{i} in debug mode
        (for the ith experiment of the same name).
    - \_\_code__: Optional path to the code. Expyrun searches the code package in the current
        working directory by default. This allows you to change this behavior.
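
To illustrate how inheritance and the new key policy interact, here is a simplified sketch of a recursive merge. It is an assumption about the general idea, not Expyrun's code: the `merge` function is hypothetical and the real rules (for instance for the reserved sections) may differ.

```python
import warnings


def merge(parent: dict, child: dict, new_key_policy: str = "warn", prefix: str = "") -> dict:
    """Return the parent configuration overridden by the child one."""
    merged = dict(parent)
    for key, value in child.items():
        full_key = f"{prefix}{key}"
        if key not in parent:
            # The key exists in the child only: apply the policy, then keep the key
            if new_key_policy == "raise":
                raise KeyError(f"New key in configuration: {full_key}")
            if new_key_policy == "warn":
                warnings.warn(f"New key in configuration: {full_key}")
            merged[key] = value
        elif isinstance(value, dict) and isinstance(parent[key], dict):
            merged[key] = merge(parent[key], value, new_key_policy, f"{full_key}.")
        else:
            merged[key] = value  # The child value overrides the inherited one
    return merged


parent = {"seed": 666, "training": {"batch_size": 10}}
child = {"seed": 777, "training": {"size": [10, 10]}}
print(merge(parent, child, new_key_policy="pass"))
# {'seed': 777, 'training': {'batch_size': 10, 'size': [10, 10]}}
```

Here `training.size` is a new key: with "raise" the merge would fail, with "warn" it would emit a warning, and with "pass" it is accepted silently.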

## Concrete example
Let's assume the following architecture:

- my_project/
    - data/
    - my_code/
        - \_\_init__.py
        - utils.py
        - data.py
        - experiments/
            - \_\_init__.py
            - first_method.py
            - second_method.py
    - .git/
    - .gitignore
    - README.md

Different experiments can be launched from the `experiments` package (one file per experiment), and some code is shared between experiments,
for instance the code handling the data.

A simple way to organize the configuration files is to create a new configs directory that roughly mirrors the architecture of the code:
- my_project/
    - configs/
        - data.yml
        - experiments/
            - first_method.yml
            - second_method.yml

```yml
# data.yml

data:
    location: $DATA_FOLDER
    train_size: 0.7
```

```yml
# first_method.yml

__default__: ../data.yml

__run__:
    __main__: my_code.experiments.first_method:main
    __output_dir__: $OUTPUT_DIR
    __name__: first_method/{model.name}-{training.lr}

seed: 666

model:
    name: MyModel

training:
    seed: "{seed}"  # Quotes are required when a value starts with the { char
    lr: 0.0001
    batch_size: 10
```

```yml
# second_method.yml

__default__: ./first_method.yml

__run__:
    __main__: my_code.experiments.second_method:main
    __name__: second_method/{model.name}-{training.size}

seed: 777

model:
    name: MyModelBis

training:
    lr: 0.1
    size: [10, 10]
```

Then within a terminal in the `my_project` directory, you can launch experiments with

```bash
$ expyrun configs/experiments/first_method.yml [--debug]
# Change hyperparameters from arguments:
$ expyrun configs/experiments/second_method.yml --training.size 15,15
```
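
Conceptually, an argument such as `--training.size` addresses a nested key through its dotted path. The sketch below shows one way such an override could be applied to a configuration dictionary. The helpers `set_dotted` and `apply_overrides` are hypothetical, and the conversion of the value (for instance `15,15` into a list) is deliberately left out, since how Expyrun parses values is not described here.

```python
def set_dotted(config: dict, dotted_key: str, value) -> None:
    """Set config["a"]["b"]["c"] from the dotted key "a.b.c"."""
    *parents, last = dotted_key.split(".")
    node = config
    for part in parents:
        node = node.setdefault(part, {})
    node[last] = value


def apply_overrides(config: dict, args: list) -> None:
    """Apply ["--key.subkey", "value", ...] pairs to the configuration in place."""
    for flag, value in zip(args[::2], args[1::2]):
        if not flag.startswith("--"):
            raise ValueError(f"Expected an option starting with --, got {flag}")
        set_dotted(config, flag[2:], value)  # The value is kept as a raw string


config = {"training": {"lr": 0.1, "size": [10, 10]}}
apply_overrides(config, ["--training.size", "15,15"])
print(config["training"])  # {'lr': 0.1, 'size': '15,15'}
```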

Have a look at the `example` folder which implements another simple example.

After running these two experiments, $OUTPUT_DIR is filled this way:
- $OUTPUT_DIR/
    - first_method/
        - MyModel-0.0001/
            - exp.0/
                - config.yml
                - frozen_requirements.txt
                - my_code/
                - outputs.log
                - raw_config.yml
    - second_method/
        - MyModelBis-[10,10]/
            - exp.0/
                - config.yml
                - frozen_requirements.txt
                - my_code/
                - outputs.log
                - raw_config.yml

To reproduce them exactly, build a new environment
from frozen_requirements.txt, then run Expyrun with the config.yml file.

To start from an existing experiment and change some hyperparameters,
use the raw_config.yml file and override what you want with command-line
arguments. (raw_config.yml is the unparsed config, so if you change
some hyperparameters, the values that depend on them, for instance the name, are adapted too.)

```bash
# Reproduce and change existing experiments
$ expyrun $OUTPUT_DIR/first_method/MyModel-0.0001/exp.0/config.yml
$ expyrun $OUTPUT_DIR/first_method/MyModel-0.0001/exp.0/raw_config.yml --training.lr 0.001 # Name will be formatted with the new value of lr
```

After running these two lines, here is the output tree:
- $OUTPUT_DIR/
    - first_method/
        - MyModel-0.0001/
            - exp.0/
            - exp.1/
        - MyModel-0.001/
            - exp.0/
    - second_method/
        - MyModelBis-[10,10]/
            - exp.0/
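
The `exp.{i}` numbering seen in this tree can be pictured with the following sketch, which picks the first free index under {output_dir}/{name}. It is only an illustration of the naming scheme: `next_experiment_dir` is a hypothetical helper, not a function of Expyrun.

```python
import tempfile
from pathlib import Path


def next_experiment_dir(output_dir: str, name: str, debug: bool = False) -> Path:
    """Create and return {output_dir}/[DEBUG/]{name}/exp.{i} for the first free i."""
    base = Path(output_dir)
    if debug:
        base = base / "DEBUG"
    base = base / name

    i = 0
    while (base / f"exp.{i}").exists():
        i += 1

    experiment_dir = base / f"exp.{i}"
    experiment_dir.mkdir(parents=True)
    return experiment_dir


with tempfile.TemporaryDirectory() as output_dir:
    first = next_experiment_dir(output_dir, "first_method/MyModel-0.0001")
    second = next_experiment_dir(output_dir, "first_method/MyModel-0.0001")
    print(first.name, second.name)  # exp.0 exp.1
```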

## Environment variables used by Expyrun

- `EXPYRUN_CWD`: Working directory from which Expyrun was launched. Expyrun sets this variable before changing to the real working directory.
    It can be useful to know exactly where the run was started from.
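
For example, code that needs a path relative to the launch directory could read this variable as sketched below. The `from_launch_dir` helper is hypothetical, and falling back to the current directory when the variable is unset is a choice made for the example.

```python
import os
from pathlib import Path


def from_launch_dir(relative_path: str) -> Path:
    """Resolve a path against the directory Expyrun was launched from."""
    # Fall back to the current directory when the code is not run through Expyrun
    launch_dir = Path(os.environ.get("EXPYRUN_CWD", os.getcwd()))
    return launch_dir / relative_path


print(from_launch_dir("data/train.csv"))
```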

## Build and Deploy

```bash
$ python -m build
$ python -m twine upload dist/*
```

