experiment-runner

Name	experiment-runner
Version	0.1.0
home_page	None
Summary	A tool to orchestrate branch-based workflows and automate job submission for ACCESS experiments.
upload_time	2025-08-28 22:09:01
maintainer	None
docs_url	None
author	None
requires_python	None
license	Apache-2.0
keywords	experiment runner workflow access payu
requirements	No requirements were recorded.

# access-experiment-runner

[![CI](https://github.com/ACCESS-NRI/access-experiment-runner/actions/workflows/ci.yml/badge.svg)](https://github.com/ACCESS-NRI/access-experiment-runner/actions/workflows/ci.yml)
[![CD](https://github.com/ACCESS-NRI/access-experiment-runner/actions/workflows/cd.yml/badge.svg)](https://github.com/ACCESS-NRI/access-experiment-runner/actions/workflows/cd.yml)
[![Coverage Status](https://codecov.io/gh/ACCESS-NRI/access-experiment-runner/branch/main/graph/badge.svg)](https://codecov.io/gh/ACCESS-NRI/access-experiment-runner)
[![License](https://img.shields.io/badge/license-Apache%202.0-blue?style=flat-square)](https://opensource.org/license/apache-2-0)
[![Code style: black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/psf/black)

## About
The ACCESS experiment runner manages and monitors experiment job runs in a supercomputing environment (e.g., `Gadi`). It builds on `Payu` and handles the orchestration of multiple configuration branches, experiment setup, and the job lifecycle.

## Key features
- Leverages `Payu` to run multiple experiments from different configuration branches.

- Submits and tracks PBS jobs on `Gadi`, overseeing the job lifecycle from submission through completion.
  - When a job completes within the expected run time, the tool prints a confirmation and stops further submissions.
  - If a job fails, the tool detects the failure and pauses further actions, so the user can inspect the working directory to diagnose the root cause and decide whether to resubmit.
  - Detects jobs that are already running or queued and skips them with a user notification, avoiding redundant submissions.

## Installation
### User setup
`experiment-runner` is installed in the `payu-dev` conda environment, so loading the `payu/dev` module makes it available directly:
```
module use /g/data/vk83/prerelease/modules && module load payu/dev
```

Alternatively, create and activate a Python virtual environment, then install with pip:
```
python3 -m venv <path/to/venv> --system-site-packages
source <path/to/venv>/bin/activate

pip install experiment-runner
```

### Development setup
Contributors and developers can set up a development environment as follows:
```
git clone https://github.com/ACCESS-NRI/access-experiment-runner.git
cd access-experiment-runner

# under a virtual environment
pip install -e .
```

## Usage
```
experiment-runner --help

usage: experiment-runner [-h] [-i INPUT_YAML_FILE]

Manage ACCESS experiments using configurable YAML input.
If no YAML file is specified, the tool will look for 'Experiment_runner.yaml' in the current directory.
If that file is missing, you must specify one with -i / --input-yaml-file.

options:
  -h, --help            show this help message and exit
  -i INPUT_YAML_FILE, --input-yaml-file INPUT_YAML_FILE
                        Path to the YAML file specifying parameter values for experiment runs.
                        Defaults to 'Experiment_runner.yaml' if present in the current directory.
```
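
To illustrate how a command-line interface like the one above could be wired up, here is a minimal sketch using Python's standard `argparse`. It mirrors only the option and fallback behaviour described in the help text. The names `build_parser` and `resolve_input_file` are hypothetical and are not taken from the tool's source, so the real implementation may differ.

```python
import argparse
from pathlib import Path

DEFAULT_YAML = "Experiment_runner.yaml"  # default file name quoted in the help text


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser exposing the single -i / --input-yaml-file option."""
    parser = argparse.ArgumentParser(
        prog="experiment-runner",
        description="Manage ACCESS experiments using configurable YAML input.",
    )
    parser.add_argument(
        "-i",
        "--input-yaml-file",
        default=None,
        help="Path to the YAML file specifying parameter values for experiment runs.",
    )
    return parser


def resolve_input_file(argv, cwd="."):
    """Return the YAML path to use, falling back to the default file in `cwd`."""
    args = build_parser().parse_args(argv)
    if args.input_yaml_file is not None:
        return Path(args.input_yaml_file)
    candidate = Path(cwd) / DEFAULT_YAML
    if candidate.is_file():
        return candidate
    raise FileNotFoundError(
        f"{DEFAULT_YAML} not found in {cwd}; specify one with -i / --input-yaml-file."
    )
```

Calling `resolve_input_file(["-i", "my.yaml"])` returns the given path, while `resolve_input_file([])` looks for the default file in the current directory and raises an error if it is missing.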

An example YAML file is provided in `example/Experiment_runner_example.yaml`:

```yaml
test_path: /g/data/{PROJECT}/{USER}/prototype-0.1.0
repository_directory: 1deg_jra55_ryf
keep_uuid: True
running_branches: # List of experiment branches to run.
  - ctrl
  - perturb_1
  - perturb_2

nruns: # Number of runs for each branch; must match the order of running_branches.
  - 2
  - 0
  - 0

# Starting point for each branch. Options include:
#   cold: start from scratch (cold start).
#   control/restartXXX: start from a specific control run restart index.
#   perturb/restartXXX: start from a specific perturbation run restart index.
startfrom_restart:
  - cold
  - cold
  - cold
```
where:

`test_path`: Path that holds all control and perturbation experiment repositories.

`repository_directory`: Local directory name for the central repository, from which the `running_branches` are forked.

`running_branches`: A list of git branches representing the experiments to run.

`keep_uuid`: Whether to preserve unique identifiers (UUIDs) across runs.

`nruns`: A list giving the number of runs to perform for each branch, in the same order as `running_branches`.

`startfrom_restart`: A list giving the starting point for each branch (`cold`, `control/restartXXX`, or `perturb/restartXXX`).
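
Because `nruns` and `startfrom_restart` are matched to `running_branches` by position, a quick consistency check before submitting can catch misaligned lists. The sketch below is illustrative only: `parse_startfrom` and `check_config` are hypothetical helpers, not part of the tool, and it assumes that `XXX` in `restartXXX` is a numeric index and that `startfrom_restart` needs one entry per branch. It operates on a plain dictionary such as the one a YAML loader would return for the file above.

```python
import re

# Assumed format: "cold", or "<source>/restartXXX" where XXX is a numeric index.
_RESTART_RE = re.compile(r"^(?P<source>[^/]+)/restart(?P<index>\d+)$")


def parse_startfrom(spec: str):
    """Return (source, index); ("cold", None) denotes a cold start."""
    if spec == "cold":
        return ("cold", None)
    match = _RESTART_RE.match(spec)
    if match is None:
        raise ValueError(f"Unrecognised startfrom_restart value: {spec!r}")
    return (match["source"], int(match["index"]))


def check_config(config: dict) -> list:
    """Pair each branch with its run count and parsed starting point."""
    branches = config["running_branches"]
    for key in ("nruns", "startfrom_restart"):
        if len(config[key]) != len(branches):
            raise ValueError(
                f"{key} has {len(config[key])} entries, expected {len(branches)}"
            )
    return [
        (branch, int(n), parse_startfrom(start))
        for branch, n, start in zip(
            branches, config["nruns"], config["startfrom_restart"]
        )
    ]


if __name__ == "__main__":
    example = {
        "running_branches": ["ctrl", "perturb_1", "perturb_2"],
        "nruns": [2, 0, 0],
        "startfrom_restart": ["cold", "cold", "cold"],
    }
    for row in check_config(example):
        print(row)
```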

## Workflow example
1. Trigger the experiment
```
experiment-runner -i example/Experiment_runner_example.yaml
```
2. The tool then checks status:
- Completed:
```
... already completed " {doneruns}, hence no new runs.
```
- Failed:
```
Clean up a failed job {work_dir} and prepare it for resubmission.
```
- Running/Queued:
```
You have duplicated runs for in the same folder hence not submitting this job!
```
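
The status checks above amount to a small decision table. As a rough sketch of that logic (not the tool's actual code), the function below maps a job state to the action described in this README. The `JobStatus` enum, the action names, and the rule that no new runs are submitted once the completed count reaches the requested count are all assumptions made for illustration.

```python
from enum import Enum


class JobStatus(Enum):
    """Hypothetical job states; the real tool may track these differently."""

    IDLE = "idle"        # nothing running, queued, or failed
    ACTIVE = "active"    # a job is already running or queued
    FAILED = "failed"    # the previous job ended in failure


def next_action(status: JobStatus, done_runs: int, requested_runs: int) -> str:
    """Choose what to do for one experiment branch."""
    if status is JobStatus.ACTIVE:
        # Avoid submitting a duplicate of a running or queued job.
        return "skip_duplicate"
    if status is JobStatus.FAILED:
        # Clean the work directory and leave resubmission to the user.
        return "cleanup_for_resubmission"
    if done_runs >= requested_runs:
        # Assumed rule: all requested runs are finished, so submit nothing.
        return "no_new_runs"
    return "submit"
```

For example, `next_action(JobStatus.IDLE, 0, 2)` returns `"submit"`, while the same call with `JobStatus.ACTIVE` returns `"skip_duplicate"`.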

Raw data

{
    "_id": null,
    "home_page": null,
    "name": "experiment-runner",
    "maintainer": null,
    "docs_url": null,
    "requires_python": null,
    "maintainer_email": null,
    "keywords": "experiment runner, workflow, access, payu",
    "author": null,
    "author_email": "Minghang Li <Minghang.Li1@anu.edu.au>",
    "download_url": "https://files.pythonhosted.org/packages/c6/bb/de552b829f402f0b58a905091903961031c1d5bddba52fe31726c76688c6/experiment_runner-0.1.0.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "Apache-2.0",
    "summary": "A tool to orchestrate branch-based workflows and automate job submission for ACCESS experiments.",
    "version": "0.1.0",
    "project_urls": {
        "Homepage": "https://github.com/ACCESS-NRI/access-experiment-runner"
    },
    "split_keywords": [
        "experiment runner",
        " workflow",
        " access",
        " payu"
    ],
    "urls": [
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "0994f4b1e8150c1b5f63253c395534fa97bfd0d826c1c03d476e18707663194b",
                "md5": "371f2e948b81657e3feb1c62c1443cdb",
                "sha256": "e60828e08f0bb0392606739017e884a019bf4961e3262cc5084167b5250a0e19"
            },
            "downloads": -1,
            "filename": "experiment_runner-0.1.0-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "371f2e948b81657e3feb1c62c1443cdb",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": null,
            "size": 14359,
            "upload_time": "2025-08-28T22:09:00",
            "upload_time_iso_8601": "2025-08-28T22:09:00.603760Z",
            "url": "https://files.pythonhosted.org/packages/09/94/f4b1e8150c1b5f63253c395534fa97bfd0d826c1c03d476e18707663194b/experiment_runner-0.1.0-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "c6bbde552b829f402f0b58a905091903961031c1d5bddba52fe31726c76688c6",
                "md5": "24ebb6229ecd4ddcc915255a5aa9a363",
                "sha256": "8b7edcba0f871a50fe65ab2c0baeca37012ba57550c9dbb3dd7f0ba52c813dc0"
            },
            "downloads": -1,
            "filename": "experiment_runner-0.1.0.tar.gz",
            "has_sig": false,
            "md5_digest": "24ebb6229ecd4ddcc915255a5aa9a363",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": null,
            "size": 23848,
            "upload_time": "2025-08-28T22:09:01",
            "upload_time_iso_8601": "2025-08-28T22:09:01.719597Z",
            "url": "https://files.pythonhosted.org/packages/c6/bb/de552b829f402f0b58a905091903961031c1d5bddba52fe31726c76688c6/experiment_runner-0.1.0.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2025-08-28 22:09:01",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "ACCESS-NRI",
    "github_project": "access-experiment-runner",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": true,
    "lcname": "experiment-runner"
}