biomero

Name	biomero JSON
Version	1.14.0 JSON
	download
home_page	None
Summary	A python library for easy connecting between OMERO (jobs) and a Slurm cluster
upload_time	2024-07-24 08:50:23
maintainer	None
docs_url	None
author	Core Facility - Cellular Imaging
requires_python	>=3.7
license	Apache License Version 2.0, January 2004 http://www.apache.org/licenses/ TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION 1. Definitions. "License" shall mean the terms and conditions for use, reproduction, and distribution as defined by Sections 1 through 9 of this document. "Licensor" shall mean the copyright owner or entity authorized by the copyright owner that is granting the License. "Legal Entity" shall mean the union of the acting entity and all other entities that control, are controlled by, or are under common control with that entity. For the purposes of this definition, "control" means (i) the power, direct or indirect, to cause the direction or management of such entity, whether by contract or otherwise, or (ii) ownership of fifty percent (50%) or more of the outstanding shares, or (iii) beneficial ownership of such entity. "You" (or "Your") shall mean an individual or Legal Entity exercising permissions granted by this License. "Source" form shall mean the preferred form for making modifications, including but not limited to software source code, documentation source, and configuration files. "Object" form shall mean any form resulting from mechanical transformation or translation of a Source form, including but not limited to compiled object code, generated documentation, and conversions to other media types. "Work" shall mean the work of authorship, whether in Source or Object form, made available under the License, as indicated by a copyright notice that is included in or attached to the work (an example is provided in the Appendix below). "Derivative Works" shall mean any work, whether in Source or Object form, that is based on (or derived from) the Work and for which the editorial revisions, annotations, elaborations, or other modifications represent, as a whole, an original work of authorship. For the purposes of this License, Derivative Works shall not include works that remain separable from, or merely link (or bind by name) to the interfaces of, the Work and Derivative Works thereof. "Contribution" shall mean any work of authorship, including the original version of the Work and any modifications or additions to that Work or Derivative Works thereof, that is intentionally submitted to Licensor for inclusion in the Work by the copyright owner or by an individual or Legal Entity authorized to submit on behalf of the copyright owner. For the purposes of this definition, "submitted" means any form of electronic, verbal, or written communication sent to the Licensor or its representatives, including but not limited to communication on electronic mailing lists, source code control systems, and issue tracking systems that are managed by, or on behalf of, the Licensor for the purpose of discussing and improving the Work, but excluding communication that is conspicuously marked or otherwise designated in writing by the copyright owner as "Not a Contribution." "Contributor" shall mean Licensor and any individual or Legal Entity on behalf of whom a Contribution has been received by Licensor and subsequently incorporated within the Work. 2. Grant of Copyright License. Subject to the terms and conditions of this License, each Contributor hereby grants to You a perpetual, worldwide, non-exclusive, no-charge, royalty-free, irrevocable copyright license to reproduce, prepare Derivative Works of, publicly display, publicly perform, sublicense, and distribute the Work and such Derivative Works in Source or Object form. 3. Grant of Patent License. Subject to the terms and conditions of this License, each Contributor hereby grants to You a perpetual, worldwide, non-exclusive, no-charge, royalty-free, irrevocable (except as stated in this section) patent license to make, have made, use, offer to sell, sell, import, and otherwise transfer the Work, where such license applies only to those patent claims licensable by such Contributor that are necessarily infringed by their Contribution(s) alone or by combination of their Contribution(s) with the Work to which such Contribution(s) was submitted. If You institute patent litigation against any entity (including a cross-claim or counterclaim in a lawsuit) alleging that the Work or a Contribution incorporated within the Work constitutes direct or contributory patent infringement, then any patent licenses granted to You under this License for that Work shall terminate as of the date such litigation is filed. 4. Redistribution. You may reproduce and distribute copies of the Work or Derivative Works thereof in any medium, with or without modifications, and in Source or Object form, provided that You meet the following conditions: (a) You must give any other recipients of the Work or Derivative Works a copy of this License; and (b) You must cause any modified files to carry prominent notices stating that You changed the files; and (c) You must retain, in the Source form of any Derivative Works that You distribute, all copyright, patent, trademark, and attribution notices from the Source form of the Work, excluding those notices that do not pertain to any part of the Derivative Works; and (d) If the Work includes a "NOTICE" text file as part of its distribution, then any Derivative Works that You distribute must include a readable copy of the attribution notices contained within such NOTICE file, excluding those notices that do not pertain to any part of the Derivative Works, in at least one of the following places: within a NOTICE text file distributed as part of the Derivative Works; within the Source form or documentation, if provided along with the Derivative Works; or, within a display generated by the Derivative Works, if and wherever such third-party notices normally appear. The contents of the NOTICE file are for informational purposes only and do not modify the License. You may add Your own attribution notices within Derivative Works that You distribute, alongside or as an addendum to the NOTICE text from the Work, provided that such additional attribution notices cannot be construed as modifying the License. You may add Your own copyright statement to Your modifications and may provide additional or different license terms and conditions for use, reproduction, or distribution of Your modifications, or for any such Derivative Works as a whole, provided Your use, reproduction, and distribution of the Work otherwise complies with the conditions stated in this License. 5. Submission of Contributions. Unless You explicitly state otherwise, any Contribution intentionally submitted for inclusion in the Work by You to the Licensor shall be under the terms and conditions of this License, without any additional terms or conditions. Notwithstanding the above, nothing herein shall supersede or modify the terms of any separate license agreement you may have executed with Licensor regarding such Contributions. 6. Trademarks. This License does not grant permission to use the trade names, trademarks, service marks, or product names of the Licensor, except as required for reasonable and customary use in describing the origin of the Work and reproducing the content of the NOTICE file. 7. Disclaimer of Warranty. Unless required by applicable law or agreed to in writing, Licensor provides the Work (and each Contributor provides its Contributions) on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied, including, without limitation, any warranties or conditions of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A PARTICULAR PURPOSE. You are solely responsible for determining the appropriateness of using or redistributing the Work and assume any risks associated with Your exercise of permissions under this License. 8. Limitation of Liability. In no event and under no legal theory, whether in tort (including negligence), contract, or otherwise, unless required by applicable law (such as deliberate and grossly negligent acts) or agreed to in writing, shall any Contributor be liable to You for damages, including any direct, indirect, special, incidental, or consequential damages of any character arising as a result of this License or out of the use or inability to use the Work (including but not limited to damages for loss of goodwill, work stoppage, computer failure or malfunction, or any and all other commercial damages or losses), even if such Contributor has been advised of the possibility of such damages. 9. Accepting Warranty or Additional Liability. While redistributing the Work or Derivative Works thereof, You may choose to offer, and charge a fee for, acceptance of support, warranty, indemnity, or other liability obligations and/or rights consistent with this License. However, in accepting such obligations, You may act only on Your own behalf and on Your sole responsibility, not on behalf of any other Contributor, and only if You agree to indemnify, defend, and hold each Contributor harmless for any liability incurred by, or claims asserted against, such Contributor by reason of your accepting any such warranty or additional liability. END OF TERMS AND CONDITIONS APPENDIX: How to apply the Apache License to your work. To apply the Apache License to your work, attach the following boilerplate notice, with the fields enclosed by brackets "[]" replaced with your own identifying information. (Don't include the brackets!) The text should be enclosed in the appropriate comment syntax for the file format. We also recommend that a file or class name and description of purpose be included on the same "printed page" as the copyright notice for easier identification within third-party archives. Copyright [yyyy] [name of copyright owner] Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0 Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.
keywords	omero slurm high-performance-computing fair image-analysis bioimaging high-throughput-screening high-content-screening cytomine biomero biaflows
VCS
bugtrack_url
requirements	No requirements were recorded.
Travis-CI	No Travis.
coveralls test coverage	No coveralls.

            # BIOMERO - BioImage analysis in OMERO
[![License](https://img.shields.io/badge/License-Apache_2.0-blue.svg)](https://opensource.org/licenses/Apache-2.0) [![DOI](https://zenodo.org/badge/638954891.svg)](https://zenodo.org/badge/latestdoi/638954891) [![PyPI - Version](https://img.shields.io/pypi/v/biomero)](https://pypi.org/project/biomero/) [![PyPI - Python Versions](https://img.shields.io/pypi/pyversions/biomero)](https://pypi.org/project/biomero/) ![Slurm](https://img.shields.io/badge/Slurm-21.08.6-blue.svg) ![OMERO](https://img.shields.io/badge/OMERO-5.6.8-blue.svg) [![fair-software.eu](https://img.shields.io/badge/fair--software.eu-%E2%97%8F%20%20%E2%97%8F%20%20%E2%97%8F%20%20%E2%97%8F%20%20%E2%97%8F-green)](https://fair-software.eu) [![OpenSSF Best Practices](https://bestpractices.coreinfrastructure.org/projects/7530/badge)](https://bestpractices.coreinfrastructure.org/projects/7530) [![Sphinx build](https://github.com/NL-BioImaging/biomero/actions/workflows/sphinx.yml/badge.svg?branch=main)](https://github.com/NL-BioImaging/biomero/actions/workflows/sphinx.yml) [![pages-build-deployment](https://github.com/NL-BioImaging/biomero/actions/workflows/pages/pages-build-deployment/badge.svg)](https://github.com/NL-BioImaging/biomero/actions/workflows/pages/pages-build-deployment) [![python-package build](https://github.com/NL-BioImaging/biomero/actions/workflows/python-package.yml/badge.svg)](https://github.com/NL-BioImaging/biomero/actions/workflows/python-package.yml) [![python-publish build](https://github.com/NL-BioImaging/biomero/actions/workflows/python-publish.yml/badge.svg?branch=main)](https://github.com/NL-BioImaging/biomero/actions/workflows/python-publish.yml)

The **BIOMERO** framework, for **B**io**I**mage analysis in **OMERO**, allows you to run (FAIR) bioimage analysis workflows directly from OMERO on a high-performance compute (HPC) cluster, remotely through SSH.

The BIOMERO framework consists of this Python library `biomero`, together with the [BIOMERO scripts](https://github.com/NL-BioImaging/biomero-scripts) that can be run directly from the OMERO web interface.

The package includes the `SlurmClient` class, which provides **SSH-based connectivity** and interaction with a [Slurm](https://slurm.schedmd.com/quickstart.html) (high-performance compute) cluster. The package enables users to submit jobs, monitor job status, retrieve job output, and perform other Slurm-related tasks. Additionally, the package offers functionality for configuring and managing paths to Slurm data and Singularity images (think Docker containers...), as well as specific FAIR image analysis workflows and their associated repositories. 

Overall, the `biomero` package simplifies the integration of HPC functionality within the OMERO platform for admins and provides an efficient and end-user-friendly interface towards both the HPC and FAIR workflows.

_WARNING_: Please note that default settings are for short/medium jobs. If you run long workflows (>45min), you will run into 2 lethal issues:
- Your Slurm job will timeout after **45 minutes**! See [Time Limit on Slurm](#time-limit-on-slurm) on what configs to change.
- Your OMERO script (incl [biomero-scripts](https://github.com/NL-BioImaging/biomero-scripts)) will timeout after **60 minutes**! Change [omero script timeout](https://omero.readthedocs.io/en/stable/sysadmins/config.html#omero.scripts.timeout) settings if you expect longer workflows.

# Overview

In the figure below we show our **BIOMERO** framework, for **B**io**I**mage analysis in **OMERO**. 

BIOMERO consists of this Python library (`biomero`) and the integrations within OMERO, currently through our [BIOMERO scripts](https://github.com/NL-BioImaging/biomero-scripts).

![OMERO-Figure1_Overview_v5](https://github.com/NL-BioImaging/biomero/assets/68958516/ff437ed2-d4b7-48b4-a7e3-12f1dbf00981)



# Quickstart



For a quick overview of what this library can do for you, we can install an example setup locally with Docker:

1. Setup a local OMERO w/ this library: 
    - Follow Quickstart of https://github.com/Cellular-Imaging-Amsterdam-UMC/NL-BIOMERO
2. Setup a local Slurm w/ SSH access: 
    - Follow Quickstart of https://github.com/TorecLuik/slurm-docker-cluster
3. Upload some data with OMERO.insight to `localhost` server (... we are working on a web importer ... TBC)
4. Try out some scripts from https://github.com/NL-BioImaging/biomero-scripts (already installed in step 1!):
    1. Run script `slurm/init/SLURM Init environment...`
    2. Get a coffee or something. This will take at least 10 min to download all the workflow images. Maybe write a nice review on `image.sc` of this software, or here on the `Discussions` tab of Github.
    3. Select your image / dataset and run script `slurm/workflows/SLURM Run Workflow...`
        - Select at least one of the `Select how to import your results`, e.g. change `Import into NEW Dataset` text to `hello world`
        - Select a fun workflow, e.g. `cellpose`.
            - Change the `nuc channel` to the channel to segment (note that 0 is for grey, so 1,2,3 for RGB)
            - Uncheck the `use gpu` (step 2, our HPC cluster, doesn't come with GPU support built into the containers)
        - Refresh your OMERO `Explore` tab to see your `hello world` dataset with a mask image when the workflow is done!



# Prerequisites & Getting Started with BIOMERO

## Slurm Requirements
Note: This library has only been tested on Slurm versions 21.08.6 and 22.05.09 !

Your Slurm cluster/login node needs to have:
1. SSH access w/ public key (headless)
2. SCP access (generally comes with SSH)
3. 7zip installed
4. Singularity/Apptainer installed
5. (Optional) Git installed, if you want your own job scripts
6. Slurm accounting enabled

## OMERO Requirements

Your OMERO _processing_ node needs to have:
1. SSH client and access to the Slurm cluster (w/ private key / headless)
2. SCP access to the Slurm cluster
3. Python3.7+
4. This library installed 
    - Latest release on PyPI `python3 -m pip install biomero`
    - or latest Github version `python3 -m pip install 'git+https://github.com/NL-BioImaging/biomero'`
5. Configuration setup at `/etc/slurm-.ini`
6. Requirements for some scripts: `python3 -m pip install ezomero==1.1.1 tifffile==2020.9.3` and the [OMERO CLI Zarr plugin](https://github.com/ome/omero-cli-zarr).

Your OMERO _server_ node needs to have:
1. Some OMERO example scripts installed to interact with this library:
    - My examples on github: `https://github.com/NL-BioImaging/biomero-scripts`
    - Install those at `/opt/omero/server/OMERO.server/lib/scripts/slurm/`, e.g. `git clone https://github.com/NL-BioImaging/biomero-scripts.git <path>/slurm`

!!*NOTE*: Do not install [Example Minimal Slurm Script](https://github.com/NL-BioImaging/biomero-scripts/blob/master/Example_Minimal_Slurm_Script.py) if you do not trust your users with your Slurm cluster. It has literal Command Injection for the SSH user as a **FEATURE**. 




## Getting Started

To connect an OMERO processor to a Slurm cluster using the `biomero` library, users can follow these steps:

1. Setup passwordless public key authentication between your OMERO `processor` server and your HPC server. E.g. follow  a [SSH tutorial](https://www.ssh.com/academy/ssh/public-key-authentication) or [this one](https://linuxize.com/post/how-to-setup-passwordless-ssh-login/).
    - You could use 1 Slurm account for all `processor` servers, and share the same private key to all of them.
    - Or you could use unique accounts, but give them all the same alias in step 2.

2. Create a SSH config file named `config` in the `.ssh` directory of (all) the OMERO `processor` servers, within the `omero` user's home directory (`~/.ssh/config`). This file should specify the hostname, username, port, and private key path for the Slurm cluster, under some alias. This alias we will provide to the library. We provide an example in the [resources](./resources/config) directory.

    - This will allow a uniform SSH naming, and makes the connection headless; making it easy for the library.

    - Test the SSH connection manually! `ssh slurm` (as the omero user) should connect you to the Slurm server (given that you named it `slurm` in the `config`).

    - Congratulations! Now the servers are connected. Next, we make sure to setup the connection between OMERO and Slurm.

3. At this point, ensure that the `slurm-config.ini` file is correctly configured with the necessary SSH and Slurm settings, including the host, data path, images path, and model details. Customize the configuration according to the specific Slurm cluster setup. We provide an example in the [resources](./resources/slurm-config.ini) section. To read it automatically, place this `ini` file in one of the following locations (on the OMERO `processor` server):
    - `/etc/slurm-config.ini`
    - `~/slurm-config.ini`

    *Note*: Make sure to place the `slurm-config.ini` in the target folder at build time of your docker container instead of mounting it at runtime. This is because the library reads the config file at import time, and if it is not found, it will not work.

4. Install OMERO scripts from [OMERO Slurm Scripts](https://github.com/NL-BioImaging/biomero-scripts), e.g. 
    - `cd /opt/omero/server/OMERO.server/lib/scripts`
    - `git clone https://github.com/NL-BioImaging/biomero-scripts.git slurm`

!!*NOTE*: Do not install [Example Minimal Slurm Script](https://github.com/NL-BioImaging/biomero-scripts/blob/master/Example_Minimal_Slurm_Script.py) if you do not trust your users with your Slurm cluster. It has literal Command Injection for the SSH user as a **FEATURE**. 

5. Install [BIOMERO Scripts](https://github.com/NL-BioImaging/biomero-scripts/) requirements, e.g.
    - `python3 -m pip install ezomero==1.1.1 tifffile==2020.9.3` 
    - the [OMERO CLI Zarr plugin](https://github.com/ome/omero-cli-zarr), e.g. 
    `python3 -m pip install omero-cli-zarr==0.5.3` && `yum install -y blosc-devel`
    - the [bioformats2raw-0.7.0](https://github.com/glencoesoftware/bioformats2raw/releases/download/v0.7.0/bioformats2raw-0.7.0.zip), e.g. `unzip -d /opt bioformats2raw-0.7.0.zip && export PATH="$PATH:/opt/bioformats2raw-0.7.0/bin"`

6. To finish setting up your `SlurmClient` and Slurm server, run it once with `init_slurm=True`. This is provided in a OMERO script form at [init/Slurm Init environment](https://github.com/NL-BioImaging/biomero-scripts/blob/master/init/SLURM_Init_environment.py) , which you just installed in previous step.
    - Provide the configfile location explicitly if it is not a default one defined earlier, otherwise you can omit that field. 
    - Please note the requirements for your Slurm cluster. We do not install Singularity / 7zip on your cluster for you (at the time of writing).
    - This operation will make it create the directories you provided in the `slurm-config.ini`, pull any described Singularity images to the server (note: might take a while), and generate (or clone from Git) any job scripts for these workflows:

```python
with SlurmClient.from_config(configfile=configfile,
                            init_slurm=True) as slurmClient:
    slurmClient.validate(validate_slurm_setup=True)
```

With the configuration files in place, you can utilize the `SlurmClient` class from the `biomero` library to connect to the Slurm cluster over SSH, enabling the submission and management of Slurm jobs from an OMERO processor. 

# BIOMERO scripts

The easiest interaction from OMERO with this library currently is through our BIOMERO scripts, which are just a set of OMERO scripts using this library for all the steps one needs to run a image analysis workflow from OMERO on Slurm and retrieve the results back into OMERO.

!!*NOTE*: Do not install [Example Minimal Slurm Script](https://github.com/NL-BioImaging/biomero-scripts/blob/master/Example_Minimal_Slurm_Script.py) if you do not trust your users with your Slurm cluster. It has literal Command Injection for the SSH user as a **FEATURE**. 

We have provided the BIOMERO scripts at https://github.com/NL-BioImaging/biomero-scripts (hopefully installed in a previous step). 

For example, [workflows/Slurm Run Workflow](https://github.com/NL-BioImaging/biomero-scripts/blob/master/workflows/SLURM_Run_Workflow.py) should provide an easy way to send data to Slurm, run the configured and chosen workflow, poll Slurm until jobs are done (or errors) and retrieve the results when the job is done. This workflow script uses some of the other scripts, like

-  [`data/Slurm Image Transfer`](https://github.com/NL-BioImaging/biomero-scripts/blob/master/data/_SLURM_Image_Transfer.py): to export your selected images / dataset / screen as TIFF files to a Slurm dir.
- [`data/Slurm Get Results`](https://github.com/NL-BioImaging/biomero-scripts/blob/master/data/SLURM_Get_Results.py): to import your Slurm job results back into OMERO as a zip, dataset or attachment.

Other example OMERO scripts are:
- [`data/Slurm Get Update`](https://github.com/NL-BioImaging/biomero-scripts/blob/master/data/SLURM_Get_Update.py): to run while you are waiting on a job to finish on Slurm; it will try to get a `%` progress from your job's logfile. Depends on your job/workflow logging a `%` of course.

- [`workflows/Slurm Run Workflow Batched`](https://github.com/NL-BioImaging/biomero-scripts/blob/master/workflows/SLURM_Run_Workflow_Batched.py): This will allow you to run several `workflows/Slurm Run Workflow` in parallel, by batching your input images into smaller chunks (e.g. turn 64 images into 2 batches of 32 images each). It will then poll all these jobs.

- [`workflows/Slurm CellPose Segmentation`](https://github.com/NL-BioImaging/biomero-scripts/blob/master/workflows/SLURM_CellPose_Segmentation.py): This is a more primitive script that only runs the actual workflow `CellPose` (if correctly configured). You will need to manually transfer data first (with `Slurm Image Transfer`) and manually retrieve data afterward (with `Slurm Get Results`).

You are encouraged to create your own custom scripts. Do note the copy-left license enforced by OME.

# (Docker) containers
We host BIOMERO container dockerfiles at [NL-BIOMERO](https://github.com/Cellular-Imaging-Amsterdam-UMC/NL-BIOMERO), which publishes container images to our public dockerhub [cellularimagingcf](https://hub.docker.com/repositories/cellularimagingcf). Specifically the [cellularimagingcf/biomero](https://hub.docker.com/repository/docker/cellularimagingcf/biomero/general) image is an OMERO processor container with BIOMERO library installed. When we release a new version of BIOMERO, we will also release a new version of these containers (because we deploy these locally at our Core Facility - Cellular Imaging).

You can mount your specific configurations over those in the container, for example:

```
# Run the biomero container
echo "Starting BIOMERO..."
podman run -d --rm --name biomero \
  -e CONFIG_omero_master_host=omeroserver \
  -e OMERO_WORKER_NAME=biomero \
  -e CONFIG_omero_logging_level=10 \
  --network omero \
  --volume /mnt/datadisk/omero:/OMERO \
  --volume /mnt/data:/data \
  --volume /my/slurm-config.ini:/etc/slurm-config.ini \
  --secret ssh-config,target=/tmp/.ssh/config --secret ssh-key,target=/tmp/.ssh/id_rsa --secret ssh-pubkey,target=/tmp/.ssh/id_rsa.pub  --secret ssh-known_hosts,target=/tmp/.ssh/known_hosts \
  --userns=keep-id:uid=1000,gid=997 \
  cellularimagingcf/biomero:0.2.3
```

This will spin up the docker container (in Podman) with omero config (`-e CONFIG_omero_..`), mounting the required data drives (`--volume /mnt/...`) and adding a new slurm config (`--volume /my/slurm-config.ini:/etc/slurm-config.ini`) and the required SSH settings (`--secret ...,target=/tmp/.ssh/...`) to access the remote HPC.

Note: the [BIOMERO scripts](https://github.com/NL-BioImaging/biomero-scripts) are installed on the [main server](https://hub.docker.com/repository/docker/cellularimagingcf/omeroserver/general), not on the BIOMERO processor. 

Note2: We will also update these containers with our own desired changes, so they will likely not be 1:1 copy with basic omero containers. Especially when we start making a nicer UI for BIOMERO. We will keep up-to-date with the OMERO releases when possible.

# See the tutorials
I have also provided tutorials on connecting to a Local or Cloud Slurm, and tutorials on how to add your FAIR workflows to this setup. Those can give some more insights as well.

# SSH
Note: this library is built for **SSH-based connections**. If you could, it would be a lot easier to just have the OMERO `processor` server and the `slurm` client server be (on) the same machine: then you can just directly call `sbatch` and other `slurm` commands from OMERO scripts and Slurm would have better access to your data. 

This is mainly for those cases where you already have an external HPC cluster and want to connect your OMERO instance.

Theoretically, you could extend the `SlurmClient` class and change the `run` commands to not use SSH, but just a `subprocess`. We might implement this if we need it in the future.
But then you could also look at other Python libraries like [submitit](https://github.com/facebookincubator/submitit).

# SlurmClient class
The SlurmClient class is the main entrypoint in using this library.
It is a Python class that extends the Connection class from the Fabric library. It allows connecting to and interacting with a Slurm cluster over SSH. 

It includes attributes for specifying paths to directories for Slurm data and Singularity images, as well as specific paths, repositories, and Dockerhub information for different Singularity image models. 

The class provides methods for running commands on the remote Slurm host, submitting jobs, checking job status, retrieving job output, and tailing log files. 

It also offers a `from_config` class method to create a `SlurmClient` object by reading configuration parameters from a file. Overall, the class provides a convenient way to work with Slurm clusters and manage job execution and monitoring.


# slurm-config.ini
The `slurm-config.ini` file is a configuration file used by the `biomero` Python package to specify various settings related to SSH and Slurm. Here is a brief description of its contents:

[**SSH**]: This section contains SSH settings, including the alias for the SLURM SSH connection (host). Additional SSH configuration can be specified in the user's SSH config file or in `/etc/fabric.yml`.

[**SLURM**]: This section includes settings specific to Slurm. It defines the paths on the SLURM entrypoint for storing data files (slurm_data_path), container image files (slurm_images_path), and Slurm job scripts (slurm_script_path). It also specifies the repository (slurm_script_repo) from which to pull the Slurm scripts.

[**MODELS**]: This section is used to define different model settings. Each model has a unique key and requires corresponding values for `<key>_repo` (repository containing the descriptor.json file, which will describe parameters and where to find the image), and `<key>_job` (jobscript name and location in the `slurm_script_repo`). The example shows settings for several segmentation models, including Cellpose, Stardist, CellProfiler, DeepCell, and ImageJ.

Note also that you can override the default Slurm job values using this model configuration, like memory, GPU, time limit, etc.
All values for sbatch can be applied (see e.g. [here](https://slurm.schedmd.com/sbatch.html)) and will be forwarded to the job command.

For example
```
# Run CellPose Slurm with 10 GB GPU
cellpose_job_gres=gpu:1g.10gb:1
# Run CellPose Slurm with 15 GB CPU memory
cellpose_job_mem=15GB
```

The `slurm-config.ini` file allows users to configure paths, repositories, and other settings specific to their Slurm cluster and the `biomero` package, providing flexibility and customization options.

## Time Limit on Slurm
An important Slurm job config is the time limit: `SBATCH --time=00:45:00` is the default in BIOMERO (max 45 minutes per job).
The format is `d-hh:mm:ss`

WARNING: After this time, the job will timeout and this scenario is not handled by BIOMERO (yet)! You will lose your processing progress.

You can change this timeout value:

- For ALL workflows, in the [job_template.sh](./resources/job_template.sh) (e.g. `#SBATCH --time=08:00:00` for 8 hours)
- For ONE workflow, in the [slurm-config.ini](./resources/slurm-config.ini) (e.g. `cellpose_job_time=08:00:00` for 8 hours)
- Per specific run, provide it in the OMERO script UI like [SLURM_CellPose_Segmentation.py](https://github.com/NL-BioImaging/biomero-scripts/blob/master/workflows/SLURM_CellPose_Segmentation.py) 

Note that it might take longer for Slurm to schedule your job if you put the time very high, or possibly even make it wait indefinitely (see --time explanation in https://slurm.schedmd.com/sbatch.html). We will work on smart timing, but for now it is hardcoded and configurable.


# How to add an existing workflow

To add an existing (containerized) workflow, add it to the `slurm-config.ini` file like in our example:
```ini
# -------------------------------------
# CELLPOSE SEGMENTATION
# -------------------------------------
# The path to store the container on the slurm_images_path
cellpose=cellpose
# The (e.g. github) repository with the descriptor.json file
cellpose_repo=https://github.com/TorecLuik/W_NucleiSegmentation-Cellpose/tree/v1.2.7
# The jobscript in the 'slurm_script_repo'
cellpose_job=jobs/cellpose.sh
```
Here, 
1. the name referenced for this workflow is `cellpose`
2. the location of the container on slurm will be `<slurm_images_path>/cellpose`
3. the code repository is `https://github.com/TorecLuik/W_NucleiSegmentation-Cellpose` 
4. the specific version we want is `v1.2.7`
5. the container can be found on bitbucket
    - under the path given in the metadata file: [descriptor.json](https://github.com/TorecLuik/W_NucleiSegmentation-Cellpose/blob/v1.2.7/descriptor.json)
5. the location of the jobscript on slurm will be `<slurm_script_repo>/jobs/cellpose.sh`. 
    - This either references a git repo, where it matches this path, 
    - or it will be the location where the library will generate a jobscript (if no repo is given)

## Workflow metadata via descriptor.json
A lot of the automation in this library is based on metadata of the workflow, provided in the source code of the workflow, specifically the [descriptor.json](https://github.com/TorecLuik/W_NucleiSegmentation-Cellpose/blob/v1.2.7/descriptor.json).

For example, the OMERO script UI can be generated automatically, based on this descriptor. And also, the Slurm job script can be generated automatically, based on this descriptor.

This metadata scheme is (based on) Cytomine / BIAFLOWS, and you can find details of it and how to create one yourself on their website, e.g. this [Cytomine dev-guide](https://doc.uliege.cytomine.org/dev-guide/algorithms/write-app#create-the-json-descriptor) or this [BIAFLOWS dev-guide](https://neubias-wg5.github.io/developer_guide_add_new_workflow_to_biaflows_instance.html).

**NOTE!** We do not require the `cytomine_<...>` authentication parameters. They are not mandatory. In fact, we ignore them. But it might be beneficial to make your workflow compatible with Cytomine as well.

### Schema
At this point, we are using the `cytomine-0.1` [schema](https://doc.uliege.cytomine.org/dev-guide/algorithms/descriptor-reference), in the future we will also want to support other schemas, like [Boutiques](https://boutiques.github.io/), [commonwl](https://www.commonwl.org/) or [MLFlow](https://www.mlflow.org/docs/latest/projects.html). 

We will try to stay compatible with all such schemas (perhaps with less functionality because of missing metadata).

At this point, we do not strictly validate the schema, we just read expected fields from the `descriptor.json`.

## Multiple versions
Note that while it is possible to have multiple versions of the same workflow on Slurm (and select the desired one in OMERO), it is not possible to configure this yet. We assume for now you only want one version to start with. You can always update this config to download a new version to Slurm.

## I/O
Unless you change the `Slurm` job, the input is expected to be:
- The `infolder` parameter
    - pointing to a folder with multiple input files/images
- The `gtfolder` parameter (Optional)
    - pointing to a `ground-truth` input files, generally not needed for prediction / processing purposes.
- The `outfolder` parameter
    - where you write all your output files (to get copied back to OMERO)

### Wrapper.py
Note that you can also use the [wrapper.py](https://github.com/Neubias-WG5/W_Template/blob/master/wrapper.py) setup from BIAFLOWS to handle the I/O for you: 

```python
with BiaflowsJob.from_cli(argv) as bj:
        # Change following to the actual problem class of the workflow
        ...
        
        # 1. Prepare data for workflow
        in_imgs, gt_imgs, in_path, gt_path, out_path, tmp_path = prepare_data(problem_cls, bj, is_2d=True, **bj.flags)

        # 2. Run image analysis workflow
        bj.job.update(progress=25, statusComment="Launching workflow...")

        # Add here the code for running the analysis script

        # 3. Upload data to BIAFLOWS
        ...
        
        # 4. Compute and upload metrics
        ...

        # 5. Pipeline finished
        ...
```

This wrapper handles the input parameters for you, providing the input images as `in_imgs`, et cetera. Then you add your commandline call between point 2 and 3, and possibly some preprocessing between point 1 and 2:
```python
#add here the code for running the analysis script
```

For example, from [Cellpose](https://github.com/TorecLuik/W_NucleiSegmentation-Cellpose/blob/master/wrapper.py) container workflow:
```python
...

# 2. Run image analysis workflow
bj.job.update(progress=25, statusComment="Launching workflow...")

# Add here the code for running the analysis script
prob_thresh = bj.parameters.prob_threshold
diameter = bj.parameters.diameter
cp_model = bj.parameters.cp_model
use_gpu = bj.parameters.use_gpu
print(f"Chosen model: {cp_model} | Channel {nuc_channel} | Diameter {diameter} | Cell prob threshold {prob_thresh} | GPU {use_gpu}")
cmd = ["python", "-m", "cellpose", "--dir", tmp_path, "--pretrained_model", f"{cp_model}", "--save_tif", "--no_npy", "--chan", "{:d}".format(nuc_channel), "--diameter", "{:f}".format(diameter), "--cellprob_threshold", "{:f}".format(prob_thresh)]
if use_gpu:
    print("Using GPU!")
    cmd.append("--use_gpu")
status = subprocess.run(cmd)

if status.returncode != 0:
    print("Running Cellpose failed, terminate")
    sys.exit(1)

# Crop to original shape
for bimg in in_imgs:
    shape = resized.get(bimg.filename, None)
    if shape:
        img = imageio.imread(os.path.join(tmp_path,bimg.filename_no_extension+"_cp_masks.tif"))
        img = img[0:shape[0], 0:shape[1]]
        imageio.imwrite(os.path.join(out_path,bimg.filename), img)
    else:
        shutil.copy(os.path.join(tmp_path,bimg.filename_no_extension+"_cp_masks.tif"), os.path.join(out_path,bimg.filename))

# 3. Upload data to BIAFLOWS
```
We get the commandline parameters from `bj.parameters` (biaflows job) and provide that the `cmd` commandline string. Then we run it with `subprocess.run(cmd)` and check the `status`. 

We use a `tmp_path` to store both input and output, then move the output to the `out_path` after the processing is done.

Also note that some preprocessing is done in step 1: 
```python
# Make sure all images have at least 224x224 dimensions
# and that minshape / maxshape * minshape >= 224
# 0 = Grayscale (if input RGB, convert to grayscale)
# 1,2,3 = rgb channel
nuc_channel = bj.parameters.nuc_channel
resized = {}
for bfimg in in_imgs:
    ...
    imageio.imwrite(os.path.join(tmp_path, bfimg.filename), img)
```

Another example is this `imageJ` [wrapper](https://github.com/Neubias-WG5/W_NucleiSegmentation3D-ImageJ/blob/master/wrapper.py):
```python
...

# 3. Call the image analysis workflow using the run script
nj.job.update(progress=25, statusComment="Launching workflow...")

command = "/usr/bin/xvfb-run java -Xmx6000m -cp /fiji/jars/ij.jar ij.ImageJ --headless --console " \
            "-macro macro.ijm \"input={}, output={}, radius={}, min_threshold={}\"".format(in_path, out_path, nj.parameters.ij_radius, nj.parameters.ij_min_threshold)
return_code = call(command, shell=True, cwd="/fiji")  # waits for the subprocess to return

if return_code != 0:
    err_desc = "Failed to execute the ImageJ macro (return code: {})".format(return_code)
    nj.job.update(progress=50, statusComment=err_desc)
    raise ValueError(err_desc)
    
```
Once again, just a commandline `--headless` call to `ImageJ`, wrapped in this Python script and this container.


# How to add your new custom workflow
Building workflows like this will make them more [FAIR](https://www.go-fair.org/fair-principles/) (also for [software](https://fair-software.eu/about)) and uses best practices like code versioning and containerization!

Also take a look at our in-depth tutorial on adding a Cellprofiler pipeline as a workflow to BIOMERO.

Here is a shorter version:
Say you have a script in Python and you want to make it available on OMERO and Slurm.

These are the steps required:

1. Rewrite your script to be headless / to be executable on the commandline. This requires handling of commandline parameters as input.
    - Make sure the I/O matches the Slurm job, see [previous chapter](#io).
2. Describe these commandline parameters in a `descriptor.json` (see previous [chapter](#workflow-metadata-via-descriptorjson)). E.g. [like this](https://doc.uliege.cytomine.org/dev-guide/algorithms/write-app#create-the-json-descriptor).
3. Describe the requirements / environment of your script in a `requirements.txt`, [like this](https://learnpython.com/blog/python-requirements-file/). Make sure to pin your versions for future reproducability!
2. Package your script in a Docker container. E.g. [like this](https://www.docker.com/blog/how-to-dockerize-your-python-applications/).
    - Note: Please watch out for the pitfalls of reproducability with Dockerfiles: [Always version your packages!](https://pythonspeed.com/articles/dockerizing-python-is-hard/).
3. Publish your source code, Dockerfile and descriptor.json to a new Github repository (free for public repositories). You can generate a new repository [from template](https://docs.github.com/en/repositories/creating-and-managing-repositories/creating-a-repository-from-a-template), using [this template](https://github.com/Neubias-WG5/W_Template) provided by Neubias (BIAFLOWS). Then replace the input of the files with yours.
4. (Recommended) Publish a new version of your code (e.g. v1.0.0). E.g. [like this](https://docs.github.com/en/repositories/releasing-projects-on-github/managing-releases-in-a-repository).
5. Publish your container on Dockerhub (free for public repositories), using the same versioning as your source code. [Like this](https://docs.docker.com/get-started/publish-your-own-image/) from Windows Docker or [like this](https://www.geeksforgeeks.org/docker-publishing-images-to-docker-hub/) from a commandline.
    - (Recommended) Please use a tag that equals your repository version, instead of `latest`. This improves reproducability!
    - (Optional) this library grabs `latest` if the code repository is given no version, but the `master` branch.
6. Follow the steps from the previous [chapter](#how-to-add-an-existing-workflow):
    - Add details to `slurm-config.ini`
    - Run `SlurmClient.from_config(init_slurm=True)` (e.g. the init environment script.)

# Slurm jobs

## Generating jobs
By default, `biomero` will generate basic slurm jobs for each workflow, based on the metadata provided in `descriptor.json` and a [job template](./resources/job_template.sh).
It will replace `$PARAMS` with the (non-`cytomine_`) parameters given in `descriptor.json`. See also the [Parameters](#parameters) section below.

## How to add your own Slurm job
You could change the [job template](./resources/job_template.sh) and generate new jobs, by running `SlurmClient.from_config(init_slurm=True)` (or `slurmClient.update_slurm_scripts(generate_jobs=True)`) 

Or you could add your jobs to a [Github repository](https://github.com/TorecLuik/slurm-scripts) and reference this in `slurm-config.ini`, both in the field `slurm_script_repo` and every `<workflow>_job`:

```ini
# -------------------------------------
# REPOSITORIES
# -------------------------------------
# A (github) repository to pull the slurm scripts from.
#
# Note: 
# If you provide no repository, we will generate scripts instead!
# Based on the job_template and the descriptor.json
#
slurm_script_repo=https://github.com/TorecLuik/slurm-scripts

[MODELS]
# -------------------------------------
# Model settings
# -------------------------------------
# ...
# -------------------------------------
# CELLPOSE SEGMENTATION
# -------------------------------------
# The path to store the container on the slurm_images_path
cellpose=cellpose
# The (e.g. github) repository with the descriptor.json file
cellpose_repo=https://github.com/TorecLuik/W_NucleiSegmentation-Cellpose/tree/v1.2.7
# The jobscript in the 'slurm_script_repo'
cellpose_job=jobs/cellpose.sh
```

You can update the jobs by calling `slurmClient.update_slurm_scripts()`, which will pull the repository('s default branch).

This might be useful, for example if you have other hardware requirements for your workflow(s) than the default job asks for, or if you want to run more than just 1 singularity container.

### Parameters
The library will provide the parameters from your `descriptor.json` as environment variables to the call. E.g. `set DIAMETER=0; sbatch ...`.

Other environment variables provided are:
- `DATA_PATH` 
    - Made of `<slurm_data_path>/<input_folder>`. The base dir for data folders for this execution. We expect it to contain `/data/in`, `/data/in` and `/data/in` folders in our template and data transfer setup.
- `IMAGE_PATH`
    - Made of `<slurm_images_path>/<model_path>`, as described in `slurm-config.ini`
- `IMAGE_VERSION`
- `SINGULARITY_IMAGE`
    - Already uses the `IMAGE_VERSION` above, as `<container_name>_<IMAGE_VERSION>.sif`

We (potentially) override the following Slurm job settings programmatically:
- `--mail-user={email}` (optional)
- `--time={time}` (optional)
- `--output=omero-%4j.log` (mandatory)

We could add more overrides in the future, and perhaps make them available as global configuration variables in `slurm-config.ini`.
# Batching
We can simply use `Slurm` for running your workflow 1:1, so 1 job to 1 workflow. This could speed up your workflow already, as `Slurm` servers are likely equipped with strong CPU and GPU.

However, `Slurm` is also built for parallel processing on multiple (or the same) servers. We can accomplish this by running multiple jobs for 1 workflow. This is simple for [embarrassingly parallel](https://en.wikipedia.org/wiki/Embarrassingly_parallel#:~:text=In%20parallel%20computing%2C%20an%20embarrassingly,a%20number%20of%20parallel%20tasks.) tasks, like segmenting multiple images: just provide each job with a different set of input images. If you have 100 images, you could run 10 jobs on 10 images and (given enough resources available for you on Slurm) that could be 10x faster. In theory, you could run 1 job per image, but at some point you run into the overhead cost of Slurm (and OMERO) and it might actually slow down again (as you incur this cost a 100 times instead of 10 times).

# Using the GPU on Slurm

Note, the [default](./resources/job_template.sh) Slurm job script will not request any GPU resources.

This is because GPU resources are expensive and some programs do not work with GPU.

We can instead _enable_ the use of GPU by either providing our own Slurm job scripts, or setting an override value in `slurm-config.ini`:

```ini
# -------------------------------------
# CELLPOSE SEGMENTATION
# -------------------------------------
# The path to store the container on the slurm_images_path
cellpose=cellpose
# The (e.g. github) repository with the descriptor.json file
cellpose_repo=https://github.com/TorecLuik/W_NucleiSegmentation-Cellpose/tree/v1.2.7
# The jobscript in the 'slurm_script_repo'
cellpose_job=jobs/cellpose.sh
# Override the default job values for this workflow
# Or add a job value to this workflow
# If you don't want to override, comment out / delete the line.
# Run CellPose Slurm with 10 GB GPU
cellpose_job_gres=gpu:1g.10gb:1
```

In fact, any `..._job_...=...` configuration value will be forwarded to the Slurm commandline.

Slurm commandline parameters override those in the script, so the above one requests 1 10GB gpu for Cellpose.

E.g. you could also set the time limit higher:

```ini
# -------------------------------------
# CELLPOSE SEGMENTATION
# -------------------------------------
# The path to store the container on the slurm_images_path
cellpose=cellpose
# The (e.g. github) repository with the descriptor.json file
cellpose_repo=https://github.com/TorecLuik/W_NucleiSegmentation-Cellpose/tree/v1.2.7
# The jobscript in the 'slurm_script_repo'
cellpose_job=jobs/cellpose.sh
# Override the default job values for this workflow
# Or add a job value to this workflow
# If you don't want to override, comment out / delete the line.
# Run with longer time limit
cellpose_job_time=00:30:00
```

Now the CellPose job should run for maximum of 30 minutes, instead of the default.

# Transfering data

We have added methods to this library to help with transferring data to the `Slurm` cluster, using the same SSH connection (via SCP or SFTP).

- `slurmClient.transfer_data(...)`
    - Transfer data to the Slurm cluster
- `slurmClient.unpack_data(...)`
    - Unpack zip file on the Slurm cluster
- `slurmClient.zip_data_on_slurm_server(...)`
    - Zip data on the Slurm cluster
- `slurmClient.copy_zip_locally(...)`
    - Transfer (zip) data from the Slurm cluster
- `slurmClient.get_logfile_from_slurm(...)`
    - Transfer logfile from the Slurm cluster

And more; see the docstring of `SlurmClient` and example OMERO scripts.

# Testing the Python code
You can test the library by installing the extra test dependencies:

1. Create a venv to isolate the python install:
`python -m venv venvTest`

2. Install OSC with test dependencies:
`venvTest/Scripts/python -m pip install .[test]`

3. Run pytest from this venv:
`venvTest/Scripts/pytest`

# Logging
Debug logging can be enabled with the standard python logging module, for example with logging.basicConfig():

```
import logging

logging.basicConfig(level='DEBUG')
```

For example in (the `__init__` of) a script:

```Python
if __name__ == '__main__':
    logging.basicConfig(level=logging.INFO,
                        format='%(asctime)s - %(name)s - %(levelname)s - %(message)s',
                        stream=sys.stdout)
    runScript()
```

Raw data

            {
    "_id": null,
    "home_page": null,
    "name": "biomero",
    "maintainer": null,
    "docs_url": null,
    "requires_python": ">=3.7",
    "maintainer_email": null,
    "keywords": "omero, slurm, high-performance-computing, fair, image-analysis, bioimaging, high-throughput-screening, high-content-screening, cytomine, biomero, biaflows",
    "author": "Core Facility - Cellular Imaging",
    "author_email": "Torec Luik <t.t.luik@amsterdamumc.nl>, cellularimaging@amsterdamumc.nl",
    "download_url": "https://files.pythonhosted.org/packages/e2/d6/be63ab135714d728e6f21ca983ae9ed5e717e243a555b9ad228240db4c8d/biomero-1.14.0.tar.gz",
    "platform": null,
    "description": "# BIOMERO - BioImage analysis in OMERO\n[![License](https://img.shields.io/badge/License-Apache_2.0-blue.svg)](https://opensource.org/licenses/Apache-2.0) [![DOI](https://zenodo.org/badge/638954891.svg)](https://zenodo.org/badge/latestdoi/638954891) [![PyPI - Version](https://img.shields.io/pypi/v/biomero)](https://pypi.org/project/biomero/) [![PyPI - Python Versions](https://img.shields.io/pypi/pyversions/biomero)](https://pypi.org/project/biomero/) ![Slurm](https://img.shields.io/badge/Slurm-21.08.6-blue.svg) ![OMERO](https://img.shields.io/badge/OMERO-5.6.8-blue.svg) [![fair-software.eu](https://img.shields.io/badge/fair--software.eu-%E2%97%8F%20%20%E2%97%8F%20%20%E2%97%8F%20%20%E2%97%8F%20%20%E2%97%8F-green)](https://fair-software.eu) [![OpenSSF Best Practices](https://bestpractices.coreinfrastructure.org/projects/7530/badge)](https://bestpractices.coreinfrastructure.org/projects/7530) [![Sphinx build](https://github.com/NL-BioImaging/biomero/actions/workflows/sphinx.yml/badge.svg?branch=main)](https://github.com/NL-BioImaging/biomero/actions/workflows/sphinx.yml) [![pages-build-deployment](https://github.com/NL-BioImaging/biomero/actions/workflows/pages/pages-build-deployment/badge.svg)](https://github.com/NL-BioImaging/biomero/actions/workflows/pages/pages-build-deployment) [![python-package build](https://github.com/NL-BioImaging/biomero/actions/workflows/python-package.yml/badge.svg)](https://github.com/NL-BioImaging/biomero/actions/workflows/python-package.yml) [![python-publish build](https://github.com/NL-BioImaging/biomero/actions/workflows/python-publish.yml/badge.svg?branch=main)](https://github.com/NL-BioImaging/biomero/actions/workflows/python-publish.yml)\n\nThe **BIOMERO** framework, for **B**io**I**mage analysis in **OMERO**, allows you to run (FAIR) bioimage analysis workflows directly from OMERO on a high-performance compute (HPC) cluster, remotely through SSH.\n\nThe BIOMERO framework consists of this Python library `biomero`, together with the [BIOMERO scripts](https://github.com/NL-BioImaging/biomero-scripts) that can be run directly from the OMERO web interface.\n\nThe package includes the `SlurmClient` class, which provides **SSH-based connectivity** and interaction with a [Slurm](https://slurm.schedmd.com/quickstart.html) (high-performance compute) cluster. The package enables users to submit jobs, monitor job status, retrieve job output, and perform other Slurm-related tasks. Additionally, the package offers functionality for configuring and managing paths to Slurm data and Singularity images (think Docker containers...), as well as specific FAIR image analysis workflows and their associated repositories. \n\nOverall, the `biomero` package simplifies the integration of HPC functionality within the OMERO platform for admins and provides an efficient and end-user-friendly interface towards both the HPC and FAIR workflows.\n\n_WARNING_: Please note that default settings are for short/medium jobs. If you run long workflows (>45min), you will run into 2 lethal issues:\n- Your Slurm job will timeout after **45 minutes**! See [Time Limit on Slurm](#time-limit-on-slurm) on what configs to change.\n- Your OMERO script (incl [biomero-scripts](https://github.com/NL-BioImaging/biomero-scripts)) will timeout after **60 minutes**! Change [omero script timeout](https://omero.readthedocs.io/en/stable/sysadmins/config.html#omero.scripts.timeout) settings if you expect longer workflows.\n\n# Overview\n\nIn the figure below we show our **BIOMERO** framework, for **B**io**I**mage analysis in **OMERO**. \n\nBIOMERO consists of this Python library (`biomero`) and the integrations within OMERO, currently through our [BIOMERO scripts](https://github.com/NL-BioImaging/biomero-scripts).\n\n![OMERO-Figure1_Overview_v5](https://github.com/NL-BioImaging/biomero/assets/68958516/ff437ed2-d4b7-48b4-a7e3-12f1dbf00981)\n\n\n\n# Quickstart\n\n\n\nFor a quick overview of what this library can do for you, we can install an example setup locally with Docker:\n\n1. Setup a local OMERO w/ this library: \n    - Follow Quickstart of https://github.com/Cellular-Imaging-Amsterdam-UMC/NL-BIOMERO\n2. Setup a local Slurm w/ SSH access: \n    - Follow Quickstart of https://github.com/TorecLuik/slurm-docker-cluster\n3. Upload some data with OMERO.insight to `localhost` server (... we are working on a web importer ... TBC)\n4. Try out some scripts from https://github.com/NL-BioImaging/biomero-scripts (already installed in step 1!):\n    1. Run script `slurm/init/SLURM Init environment...`\n    2. Get a coffee or something. This will take at least 10 min to download all the workflow images. Maybe write a nice review on `image.sc` of this software, or here on the `Discussions` tab of Github.\n    3. Select your image / dataset and run script `slurm/workflows/SLURM Run Workflow...`\n        - Select at least one of the `Select how to import your results`, e.g. change `Import into NEW Dataset` text to `hello world`\n        - Select a fun workflow, e.g. `cellpose`.\n            - Change the `nuc channel` to the channel to segment (note that 0 is for grey, so 1,2,3 for RGB)\n            - Uncheck the `use gpu` (step 2, our HPC cluster, doesn't come with GPU support built into the containers)\n        - Refresh your OMERO `Explore` tab to see your `hello world` dataset with a mask image when the workflow is done!\n\n\n\n# Prerequisites & Getting Started with BIOMERO\n\n## Slurm Requirements\nNote: This library has only been tested on Slurm versions 21.08.6 and 22.05.09 !\n\nYour Slurm cluster/login node needs to have:\n1. SSH access w/ public key (headless)\n2. SCP access (generally comes with SSH)\n3. 7zip installed\n4. Singularity/Apptainer installed\n5. (Optional) Git installed, if you want your own job scripts\n6. Slurm accounting enabled\n\n## OMERO Requirements\n\nYour OMERO _processing_ node needs to have:\n1. SSH client and access to the Slurm cluster (w/ private key / headless)\n2. SCP access to the Slurm cluster\n3. Python3.7+\n4. This library installed \n    - Latest release on PyPI `python3 -m pip install biomero`\n    - or latest Github version `python3 -m pip install 'git+https://github.com/NL-BioImaging/biomero'`\n5. Configuration setup at `/etc/slurm-.ini`\n6. Requirements for some scripts: `python3 -m pip install ezomero==1.1.1 tifffile==2020.9.3` and the [OMERO CLI Zarr plugin](https://github.com/ome/omero-cli-zarr).\n\nYour OMERO _server_ node needs to have:\n1. Some OMERO example scripts installed to interact with this library:\n    - My examples on github: `https://github.com/NL-BioImaging/biomero-scripts`\n    - Install those at `/opt/omero/server/OMERO.server/lib/scripts/slurm/`, e.g. `git clone https://github.com/NL-BioImaging/biomero-scripts.git <path>/slurm`\n\n!!*NOTE*: Do not install [Example Minimal Slurm Script](https://github.com/NL-BioImaging/biomero-scripts/blob/master/Example_Minimal_Slurm_Script.py) if you do not trust your users with your Slurm cluster. It has literal Command Injection for the SSH user as a **FEATURE**. \n\n\n\n\n## Getting Started\n\nTo connect an OMERO processor to a Slurm cluster using the `biomero` library, users can follow these steps:\n\n1. Setup passwordless public key authentication between your OMERO `processor` server and your HPC server. E.g. follow  a [SSH tutorial](https://www.ssh.com/academy/ssh/public-key-authentication) or [this one](https://linuxize.com/post/how-to-setup-passwordless-ssh-login/).\n    - You could use 1 Slurm account for all `processor` servers, and share the same private key to all of them.\n    - Or you could use unique accounts, but give them all the same alias in step 2.\n\n2. Create a SSH config file named `config` in the `.ssh` directory of (all) the OMERO `processor` servers, within the `omero` user's home directory (`~/.ssh/config`). This file should specify the hostname, username, port, and private key path for the Slurm cluster, under some alias. This alias we will provide to the library. We provide an example in the [resources](./resources/config) directory.\n\n    - This will allow a uniform SSH naming, and makes the connection headless; making it easy for the library.\n\n    - Test the SSH connection manually! `ssh slurm` (as the omero user) should connect you to the Slurm server (given that you named it `slurm` in the `config`).\n\n    - Congratulations! Now the servers are connected. Next, we make sure to setup the connection between OMERO and Slurm.\n\n3. At this point, ensure that the `slurm-config.ini` file is correctly configured with the necessary SSH and Slurm settings, including the host, data path, images path, and model details. Customize the configuration according to the specific Slurm cluster setup. We provide an example in the [resources](./resources/slurm-config.ini) section. To read it automatically, place this `ini` file in one of the following locations (on the OMERO `processor` server):\n    - `/etc/slurm-config.ini`\n    - `~/slurm-config.ini`\n\n    *Note*: Make sure to place the `slurm-config.ini` in the target folder at build time of your docker container instead of mounting it at runtime. This is because the library reads the config file at import time, and if it is not found, it will not work.\n\n4. Install OMERO scripts from [OMERO Slurm Scripts](https://github.com/NL-BioImaging/biomero-scripts), e.g. \n    - `cd /opt/omero/server/OMERO.server/lib/scripts`\n    - `git clone https://github.com/NL-BioImaging/biomero-scripts.git slurm`\n\n!!*NOTE*: Do not install [Example Minimal Slurm Script](https://github.com/NL-BioImaging/biomero-scripts/blob/master/Example_Minimal_Slurm_Script.py) if you do not trust your users with your Slurm cluster. It has literal Command Injection for the SSH user as a **FEATURE**. \n\n5. Install [BIOMERO Scripts](https://github.com/NL-BioImaging/biomero-scripts/) requirements, e.g.\n    - `python3 -m pip install ezomero==1.1.1 tifffile==2020.9.3` \n    - the [OMERO CLI Zarr plugin](https://github.com/ome/omero-cli-zarr), e.g. \n    `python3 -m pip install omero-cli-zarr==0.5.3` && `yum install -y blosc-devel`\n    - the [bioformats2raw-0.7.0](https://github.com/glencoesoftware/bioformats2raw/releases/download/v0.7.0/bioformats2raw-0.7.0.zip), e.g. `unzip -d /opt bioformats2raw-0.7.0.zip && export PATH=\"$PATH:/opt/bioformats2raw-0.7.0/bin\"`\n\n6. To finish setting up your `SlurmClient` and Slurm server, run it once with `init_slurm=True`. This is provided in a OMERO script form at [init/Slurm Init environment](https://github.com/NL-BioImaging/biomero-scripts/blob/master/init/SLURM_Init_environment.py) , which you just installed in previous step.\n    - Provide the configfile location explicitly if it is not a default one defined earlier, otherwise you can omit that field. \n    - Please note the requirements for your Slurm cluster. We do not install Singularity / 7zip on your cluster for you (at the time of writing).\n    - This operation will make it create the directories you provided in the `slurm-config.ini`, pull any described Singularity images to the server (note: might take a while), and generate (or clone from Git) any job scripts for these workflows:\n\n```python\nwith SlurmClient.from_config(configfile=configfile,\n                            init_slurm=True) as slurmClient:\n    slurmClient.validate(validate_slurm_setup=True)\n```\n\nWith the configuration files in place, you can utilize the `SlurmClient` class from the `biomero` library to connect to the Slurm cluster over SSH, enabling the submission and management of Slurm jobs from an OMERO processor. \n\n# BIOMERO scripts\n\nThe easiest interaction from OMERO with this library currently is through our BIOMERO scripts, which are just a set of OMERO scripts using this library for all the steps one needs to run a image analysis workflow from OMERO on Slurm and retrieve the results back into OMERO.\n\n!!*NOTE*: Do not install [Example Minimal Slurm Script](https://github.com/NL-BioImaging/biomero-scripts/blob/master/Example_Minimal_Slurm_Script.py) if you do not trust your users with your Slurm cluster. It has literal Command Injection for the SSH user as a **FEATURE**. \n\nWe have provided the BIOMERO scripts at https://github.com/NL-BioImaging/biomero-scripts (hopefully installed in a previous step). \n\nFor example, [workflows/Slurm Run Workflow](https://github.com/NL-BioImaging/biomero-scripts/blob/master/workflows/SLURM_Run_Workflow.py) should provide an easy way to send data to Slurm, run the configured and chosen workflow, poll Slurm until jobs are done (or errors) and retrieve the results when the job is done. This workflow script uses some of the other scripts, like\n\n-  [`data/Slurm Image Transfer`](https://github.com/NL-BioImaging/biomero-scripts/blob/master/data/_SLURM_Image_Transfer.py): to export your selected images / dataset / screen as TIFF files to a Slurm dir.\n- [`data/Slurm Get Results`](https://github.com/NL-BioImaging/biomero-scripts/blob/master/data/SLURM_Get_Results.py): to import your Slurm job results back into OMERO as a zip, dataset or attachment.\n\nOther example OMERO scripts are:\n- [`data/Slurm Get Update`](https://github.com/NL-BioImaging/biomero-scripts/blob/master/data/SLURM_Get_Update.py): to run while you are waiting on a job to finish on Slurm; it will try to get a `%` progress from your job's logfile. Depends on your job/workflow logging a `%` of course.\n\n- [`workflows/Slurm Run Workflow Batched`](https://github.com/NL-BioImaging/biomero-scripts/blob/master/workflows/SLURM_Run_Workflow_Batched.py): This will allow you to run several `workflows/Slurm Run Workflow` in parallel, by batching your input images into smaller chunks (e.g. turn 64 images into 2 batches of 32 images each). It will then poll all these jobs.\n\n- [`workflows/Slurm CellPose Segmentation`](https://github.com/NL-BioImaging/biomero-scripts/blob/master/workflows/SLURM_CellPose_Segmentation.py): This is a more primitive script that only runs the actual workflow `CellPose` (if correctly configured). You will need to manually transfer data first (with `Slurm Image Transfer`) and manually retrieve data afterward (with `Slurm Get Results`).\n\nYou are encouraged to create your own custom scripts. Do note the copy-left license enforced by OME.\n\n# (Docker) containers\nWe host BIOMERO container dockerfiles at [NL-BIOMERO](https://github.com/Cellular-Imaging-Amsterdam-UMC/NL-BIOMERO), which publishes container images to our public dockerhub [cellularimagingcf](https://hub.docker.com/repositories/cellularimagingcf). Specifically the [cellularimagingcf/biomero](https://hub.docker.com/repository/docker/cellularimagingcf/biomero/general) image is an OMERO processor container with BIOMERO library installed. When we release a new version of BIOMERO, we will also release a new version of these containers (because we deploy these locally at our Core Facility - Cellular Imaging).\n\nYou can mount your specific configurations over those in the container, for example:\n\n```\n# Run the biomero container\necho \"Starting BIOMERO...\"\npodman run -d --rm --name biomero \\\n  -e CONFIG_omero_master_host=omeroserver \\\n  -e OMERO_WORKER_NAME=biomero \\\n  -e CONFIG_omero_logging_level=10 \\\n  --network omero \\\n  --volume /mnt/datadisk/omero:/OMERO \\\n  --volume /mnt/data:/data \\\n  --volume /my/slurm-config.ini:/etc/slurm-config.ini \\\n  --secret ssh-config,target=/tmp/.ssh/config --secret ssh-key,target=/tmp/.ssh/id_rsa --secret ssh-pubkey,target=/tmp/.ssh/id_rsa.pub  --secret ssh-known_hosts,target=/tmp/.ssh/known_hosts \\\n  --userns=keep-id:uid=1000,gid=997 \\\n  cellularimagingcf/biomero:0.2.3\n```\n\nThis will spin up the docker container (in Podman) with omero config (`-e CONFIG_omero_..`), mounting the required data drives (`--volume /mnt/...`) and adding a new slurm config (`--volume /my/slurm-config.ini:/etc/slurm-config.ini`) and the required SSH settings (`--secret ...,target=/tmp/.ssh/...`) to access the remote HPC.\n\nNote: the [BIOMERO scripts](https://github.com/NL-BioImaging/biomero-scripts) are installed on the [main server](https://hub.docker.com/repository/docker/cellularimagingcf/omeroserver/general), not on the BIOMERO processor. \n\nNote2: We will also update these containers with our own desired changes, so they will likely not be 1:1 copy with basic omero containers. Especially when we start making a nicer UI for BIOMERO. We will keep up-to-date with the OMERO releases when possible.\n\n# See the tutorials\nI have also provided tutorials on connecting to a Local or Cloud Slurm, and tutorials on how to add your FAIR workflows to this setup. Those can give some more insights as well.\n\n# SSH\nNote: this library is built for **SSH-based connections**. If you could, it would be a lot easier to just have the OMERO `processor` server and the `slurm` client server be (on) the same machine: then you can just directly call `sbatch` and other `slurm` commands from OMERO scripts and Slurm would have better access to your data. \n\nThis is mainly for those cases where you already have an external HPC cluster and want to connect your OMERO instance.\n\nTheoretically, you could extend the `SlurmClient` class and change the `run` commands to not use SSH, but just a `subprocess`. We might implement this if we need it in the future.\nBut then you could also look at other Python libraries like [submitit](https://github.com/facebookincubator/submitit).\n\n# SlurmClient class\nThe SlurmClient class is the main entrypoint in using this library.\nIt is a Python class that extends the Connection class from the Fabric library. It allows connecting to and interacting with a Slurm cluster over SSH. \n\nIt includes attributes for specifying paths to directories for Slurm data and Singularity images, as well as specific paths, repositories, and Dockerhub information for different Singularity image models. \n\nThe class provides methods for running commands on the remote Slurm host, submitting jobs, checking job status, retrieving job output, and tailing log files. \n\nIt also offers a `from_config` class method to create a `SlurmClient` object by reading configuration parameters from a file. Overall, the class provides a convenient way to work with Slurm clusters and manage job execution and monitoring.\n\n\n# slurm-config.ini\nThe `slurm-config.ini` file is a configuration file used by the `biomero` Python package to specify various settings related to SSH and Slurm. Here is a brief description of its contents:\n\n[**SSH**]: This section contains SSH settings, including the alias for the SLURM SSH connection (host). Additional SSH configuration can be specified in the user's SSH config file or in `/etc/fabric.yml`.\n\n[**SLURM**]: This section includes settings specific to Slurm. It defines the paths on the SLURM entrypoint for storing data files (slurm_data_path), container image files (slurm_images_path), and Slurm job scripts (slurm_script_path). It also specifies the repository (slurm_script_repo) from which to pull the Slurm scripts.\n\n[**MODELS**]: This section is used to define different model settings. Each model has a unique key and requires corresponding values for `<key>_repo` (repository containing the descriptor.json file, which will describe parameters and where to find the image), and `<key>_job` (jobscript name and location in the `slurm_script_repo`). The example shows settings for several segmentation models, including Cellpose, Stardist, CellProfiler, DeepCell, and ImageJ.\n\nNote also that you can override the default Slurm job values using this model configuration, like memory, GPU, time limit, etc.\nAll values for sbatch can be applied (see e.g. [here](https://slurm.schedmd.com/sbatch.html)) and will be forwarded to the job command.\n\nFor example\n```\n# Run CellPose Slurm with 10 GB GPU\ncellpose_job_gres=gpu:1g.10gb:1\n# Run CellPose Slurm with 15 GB CPU memory\ncellpose_job_mem=15GB\n```\n\nThe `slurm-config.ini` file allows users to configure paths, repositories, and other settings specific to their Slurm cluster and the `biomero` package, providing flexibility and customization options.\n\n## Time Limit on Slurm\nAn important Slurm job config is the time limit: `SBATCH --time=00:45:00` is the default in BIOMERO (max 45 minutes per job).\nThe format is `d-hh:mm:ss`\n\nWARNING: After this time, the job will timeout and this scenario is not handled by BIOMERO (yet)! You will lose your processing progress.\n\nYou can change this timeout value:\n\n- For ALL workflows, in the [job_template.sh](./resources/job_template.sh) (e.g. `#SBATCH --time=08:00:00` for 8 hours)\n- For ONE workflow, in the [slurm-config.ini](./resources/slurm-config.ini) (e.g. `cellpose_job_time=08:00:00` for 8 hours)\n- Per specific run, provide it in the OMERO script UI like [SLURM_CellPose_Segmentation.py](https://github.com/NL-BioImaging/biomero-scripts/blob/master/workflows/SLURM_CellPose_Segmentation.py) \n\nNote that it might take longer for Slurm to schedule your job if you put the time very high, or possibly even make it wait indefinitely (see --time explanation in https://slurm.schedmd.com/sbatch.html). We will work on smart timing, but for now it is hardcoded and configurable.\n\n\n# How to add an existing workflow\n\nTo add an existing (containerized) workflow, add it to the `slurm-config.ini` file like in our example:\n```ini\n# -------------------------------------\n# CELLPOSE SEGMENTATION\n# -------------------------------------\n# The path to store the container on the slurm_images_path\ncellpose=cellpose\n# The (e.g. github) repository with the descriptor.json file\ncellpose_repo=https://github.com/TorecLuik/W_NucleiSegmentation-Cellpose/tree/v1.2.7\n# The jobscript in the 'slurm_script_repo'\ncellpose_job=jobs/cellpose.sh\n```\nHere, \n1. the name referenced for this workflow is `cellpose`\n2. the location of the container on slurm will be `<slurm_images_path>/cellpose`\n3. the code repository is `https://github.com/TorecLuik/W_NucleiSegmentation-Cellpose` \n4. the specific version we want is `v1.2.7`\n5. the container can be found on bitbucket\n    - under the path given in the metadata file: [descriptor.json](https://github.com/TorecLuik/W_NucleiSegmentation-Cellpose/blob/v1.2.7/descriptor.json)\n5. the location of the jobscript on slurm will be `<slurm_script_repo>/jobs/cellpose.sh`. \n    - This either references a git repo, where it matches this path, \n    - or it will be the location where the library will generate a jobscript (if no repo is given)\n\n## Workflow metadata via descriptor.json\nA lot of the automation in this library is based on metadata of the workflow, provided in the source code of the workflow, specifically the [descriptor.json](https://github.com/TorecLuik/W_NucleiSegmentation-Cellpose/blob/v1.2.7/descriptor.json).\n\nFor example, the OMERO script UI can be generated automatically, based on this descriptor. And also, the Slurm job script can be generated automatically, based on this descriptor.\n\nThis metadata scheme is (based on) Cytomine / BIAFLOWS, and you can find details of it and how to create one yourself on their website, e.g. this [Cytomine dev-guide](https://doc.uliege.cytomine.org/dev-guide/algorithms/write-app#create-the-json-descriptor) or this [BIAFLOWS dev-guide](https://neubias-wg5.github.io/developer_guide_add_new_workflow_to_biaflows_instance.html).\n\n**NOTE!** We do not require the `cytomine_<...>` authentication parameters. They are not mandatory. In fact, we ignore them. But it might be beneficial to make your workflow compatible with Cytomine as well.\n\n### Schema\nAt this point, we are using the `cytomine-0.1` [schema](https://doc.uliege.cytomine.org/dev-guide/algorithms/descriptor-reference), in the future we will also want to support other schemas, like [Boutiques](https://boutiques.github.io/), [commonwl](https://www.commonwl.org/) or [MLFlow](https://www.mlflow.org/docs/latest/projects.html). \n\nWe will try to stay compatible with all such schemas (perhaps with less functionality because of missing metadata).\n\nAt this point, we do not strictly validate the schema, we just read expected fields from the `descriptor.json`.\n\n## Multiple versions\nNote that while it is possible to have multiple versions of the same workflow on Slurm (and select the desired one in OMERO), it is not possible to configure this yet. We assume for now you only want one version to start with. You can always update this config to download a new version to Slurm.\n\n## I/O\nUnless you change the `Slurm` job, the input is expected to be:\n- The `infolder` parameter\n    - pointing to a folder with multiple input files/images\n- The `gtfolder` parameter (Optional)\n    - pointing to a `ground-truth` input files, generally not needed for prediction / processing purposes.\n- The `outfolder` parameter\n    - where you write all your output files (to get copied back to OMERO)\n\n### Wrapper.py\nNote that you can also use the [wrapper.py](https://github.com/Neubias-WG5/W_Template/blob/master/wrapper.py) setup from BIAFLOWS to handle the I/O for you: \n\n```python\nwith BiaflowsJob.from_cli(argv) as bj:\n        # Change following to the actual problem class of the workflow\n        ...\n        \n        # 1. Prepare data for workflow\n        in_imgs, gt_imgs, in_path, gt_path, out_path, tmp_path = prepare_data(problem_cls, bj, is_2d=True, **bj.flags)\n\n        # 2. Run image analysis workflow\n        bj.job.update(progress=25, statusComment=\"Launching workflow...\")\n\n        # Add here the code for running the analysis script\n\n        # 3. Upload data to BIAFLOWS\n        ...\n        \n        # 4. Compute and upload metrics\n        ...\n\n        # 5. Pipeline finished\n        ...\n```\n\nThis wrapper handles the input parameters for you, providing the input images as `in_imgs`, et cetera. Then you add your commandline call between point 2 and 3, and possibly some preprocessing between point 1 and 2:\n```python\n#add here the code for running the analysis script\n```\n\nFor example, from [Cellpose](https://github.com/TorecLuik/W_NucleiSegmentation-Cellpose/blob/master/wrapper.py) container workflow:\n```python\n...\n\n# 2. Run image analysis workflow\nbj.job.update(progress=25, statusComment=\"Launching workflow...\")\n\n# Add here the code for running the analysis script\nprob_thresh = bj.parameters.prob_threshold\ndiameter = bj.parameters.diameter\ncp_model = bj.parameters.cp_model\nuse_gpu = bj.parameters.use_gpu\nprint(f\"Chosen model: {cp_model} | Channel {nuc_channel} | Diameter {diameter} | Cell prob threshold {prob_thresh} | GPU {use_gpu}\")\ncmd = [\"python\", \"-m\", \"cellpose\", \"--dir\", tmp_path, \"--pretrained_model\", f\"{cp_model}\", \"--save_tif\", \"--no_npy\", \"--chan\", \"{:d}\".format(nuc_channel), \"--diameter\", \"{:f}\".format(diameter), \"--cellprob_threshold\", \"{:f}\".format(prob_thresh)]\nif use_gpu:\n    print(\"Using GPU!\")\n    cmd.append(\"--use_gpu\")\nstatus = subprocess.run(cmd)\n\nif status.returncode != 0:\n    print(\"Running Cellpose failed, terminate\")\n    sys.exit(1)\n\n# Crop to original shape\nfor bimg in in_imgs:\n    shape = resized.get(bimg.filename, None)\n    if shape:\n        img = imageio.imread(os.path.join(tmp_path,bimg.filename_no_extension+\"_cp_masks.tif\"))\n        img = img[0:shape[0], 0:shape[1]]\n        imageio.imwrite(os.path.join(out_path,bimg.filename), img)\n    else:\n        shutil.copy(os.path.join(tmp_path,bimg.filename_no_extension+\"_cp_masks.tif\"), os.path.join(out_path,bimg.filename))\n\n# 3. Upload data to BIAFLOWS\n```\nWe get the commandline parameters from `bj.parameters` (biaflows job) and provide that the `cmd` commandline string. Then we run it with `subprocess.run(cmd)` and check the `status`. \n\nWe use a `tmp_path` to store both input and output, then move the output to the `out_path` after the processing is done.\n\nAlso note that some preprocessing is done in step 1: \n```python\n# Make sure all images have at least 224x224 dimensions\n# and that minshape / maxshape * minshape >= 224\n# 0 = Grayscale (if input RGB, convert to grayscale)\n# 1,2,3 = rgb channel\nnuc_channel = bj.parameters.nuc_channel\nresized = {}\nfor bfimg in in_imgs:\n    ...\n    imageio.imwrite(os.path.join(tmp_path, bfimg.filename), img)\n```\n\nAnother example is this `imageJ` [wrapper](https://github.com/Neubias-WG5/W_NucleiSegmentation3D-ImageJ/blob/master/wrapper.py):\n```python\n...\n\n# 3. Call the image analysis workflow using the run script\nnj.job.update(progress=25, statusComment=\"Launching workflow...\")\n\ncommand = \"/usr/bin/xvfb-run java -Xmx6000m -cp /fiji/jars/ij.jar ij.ImageJ --headless --console \" \\\n            \"-macro macro.ijm \\\"input={}, output={}, radius={}, min_threshold={}\\\"\".format(in_path, out_path, nj.parameters.ij_radius, nj.parameters.ij_min_threshold)\nreturn_code = call(command, shell=True, cwd=\"/fiji\")  # waits for the subprocess to return\n\nif return_code != 0:\n    err_desc = \"Failed to execute the ImageJ macro (return code: {})\".format(return_code)\n    nj.job.update(progress=50, statusComment=err_desc)\n    raise ValueError(err_desc)\n    \n```\nOnce again, just a commandline `--headless` call to `ImageJ`, wrapped in this Python script and this container.\n\n\n# How to add your new custom workflow\nBuilding workflows like this will make them more [FAIR](https://www.go-fair.org/fair-principles/) (also for [software](https://fair-software.eu/about)) and uses best practices like code versioning and containerization!\n\nAlso take a look at our in-depth tutorial on adding a Cellprofiler pipeline as a workflow to BIOMERO.\n\nHere is a shorter version:\nSay you have a script in Python and you want to make it available on OMERO and Slurm.\n\nThese are the steps required:\n\n1. Rewrite your script to be headless / to be executable on the commandline. This requires handling of commandline parameters as input.\n    - Make sure the I/O matches the Slurm job, see [previous chapter](#io).\n2. Describe these commandline parameters in a `descriptor.json` (see previous [chapter](#workflow-metadata-via-descriptorjson)). E.g. [like this](https://doc.uliege.cytomine.org/dev-guide/algorithms/write-app#create-the-json-descriptor).\n3. Describe the requirements / environment of your script in a `requirements.txt`, [like this](https://learnpython.com/blog/python-requirements-file/). Make sure to pin your versions for future reproducability!\n2. Package your script in a Docker container. E.g. [like this](https://www.docker.com/blog/how-to-dockerize-your-python-applications/).\n    - Note: Please watch out for the pitfalls of reproducability with Dockerfiles: [Always version your packages!](https://pythonspeed.com/articles/dockerizing-python-is-hard/).\n3. Publish your source code, Dockerfile and descriptor.json to a new Github repository (free for public repositories). You can generate a new repository [from template](https://docs.github.com/en/repositories/creating-and-managing-repositories/creating-a-repository-from-a-template), using [this template](https://github.com/Neubias-WG5/W_Template) provided by Neubias (BIAFLOWS). Then replace the input of the files with yours.\n4. (Recommended) Publish a new version of your code (e.g. v1.0.0). E.g. [like this](https://docs.github.com/en/repositories/releasing-projects-on-github/managing-releases-in-a-repository).\n5. Publish your container on Dockerhub (free for public repositories), using the same versioning as your source code. [Like this](https://docs.docker.com/get-started/publish-your-own-image/) from Windows Docker or [like this](https://www.geeksforgeeks.org/docker-publishing-images-to-docker-hub/) from a commandline.\n    - (Recommended) Please use a tag that equals your repository version, instead of `latest`. This improves reproducability!\n    - (Optional) this library grabs `latest` if the code repository is given no version, but the `master` branch.\n6. Follow the steps from the previous [chapter](#how-to-add-an-existing-workflow):\n    - Add details to `slurm-config.ini`\n    - Run `SlurmClient.from_config(init_slurm=True)` (e.g. the init environment script.)\n\n# Slurm jobs\n\n## Generating jobs\nBy default, `biomero` will generate basic slurm jobs for each workflow, based on the metadata provided in `descriptor.json` and a [job template](./resources/job_template.sh).\nIt will replace `$PARAMS` with the (non-`cytomine_`) parameters given in `descriptor.json`. See also the [Parameters](#parameters) section below.\n\n## How to add your own Slurm job\nYou could change the [job template](./resources/job_template.sh) and generate new jobs, by running `SlurmClient.from_config(init_slurm=True)` (or `slurmClient.update_slurm_scripts(generate_jobs=True)`) \n\nOr you could add your jobs to a [Github repository](https://github.com/TorecLuik/slurm-scripts) and reference this in `slurm-config.ini`, both in the field `slurm_script_repo` and every `<workflow>_job`:\n\n```ini\n# -------------------------------------\n# REPOSITORIES\n# -------------------------------------\n# A (github) repository to pull the slurm scripts from.\n#\n# Note: \n# If you provide no repository, we will generate scripts instead!\n# Based on the job_template and the descriptor.json\n#\nslurm_script_repo=https://github.com/TorecLuik/slurm-scripts\n\n[MODELS]\n# -------------------------------------\n# Model settings\n# -------------------------------------\n# ...\n# -------------------------------------\n# CELLPOSE SEGMENTATION\n# -------------------------------------\n# The path to store the container on the slurm_images_path\ncellpose=cellpose\n# The (e.g. github) repository with the descriptor.json file\ncellpose_repo=https://github.com/TorecLuik/W_NucleiSegmentation-Cellpose/tree/v1.2.7\n# The jobscript in the 'slurm_script_repo'\ncellpose_job=jobs/cellpose.sh\n```\n\nYou can update the jobs by calling `slurmClient.update_slurm_scripts()`, which will pull the repository('s default branch).\n\nThis might be useful, for example if you have other hardware requirements for your workflow(s) than the default job asks for, or if you want to run more than just 1 singularity container.\n\n### Parameters\nThe library will provide the parameters from your `descriptor.json` as environment variables to the call. E.g. `set DIAMETER=0; sbatch ...`.\n\nOther environment variables provided are:\n- `DATA_PATH` \n    - Made of `<slurm_data_path>/<input_folder>`. The base dir for data folders for this execution. We expect it to contain `/data/in`, `/data/in` and `/data/in` folders in our template and data transfer setup.\n- `IMAGE_PATH`\n    - Made of `<slurm_images_path>/<model_path>`, as described in `slurm-config.ini`\n- `IMAGE_VERSION`\n- `SINGULARITY_IMAGE`\n    - Already uses the `IMAGE_VERSION` above, as `<container_name>_<IMAGE_VERSION>.sif`\n\nWe (potentially) override the following Slurm job settings programmatically:\n- `--mail-user={email}` (optional)\n- `--time={time}` (optional)\n- `--output=omero-%4j.log` (mandatory)\n\nWe could add more overrides in the future, and perhaps make them available as global configuration variables in `slurm-config.ini`.\n# Batching\nWe can simply use `Slurm` for running your workflow 1:1, so 1 job to 1 workflow. This could speed up your workflow already, as `Slurm` servers are likely equipped with strong CPU and GPU.\n\nHowever, `Slurm` is also built for parallel processing on multiple (or the same) servers. We can accomplish this by running multiple jobs for 1 workflow. This is simple for [embarrassingly parallel](https://en.wikipedia.org/wiki/Embarrassingly_parallel#:~:text=In%20parallel%20computing%2C%20an%20embarrassingly,a%20number%20of%20parallel%20tasks.) tasks, like segmenting multiple images: just provide each job with a different set of input images. If you have 100 images, you could run 10 jobs on 10 images and (given enough resources available for you on Slurm) that could be 10x faster. In theory, you could run 1 job per image, but at some point you run into the overhead cost of Slurm (and OMERO) and it might actually slow down again (as you incur this cost a 100 times instead of 10 times).\n\n# Using the GPU on Slurm\n\nNote, the [default](./resources/job_template.sh) Slurm job script will not request any GPU resources.\n\nThis is because GPU resources are expensive and some programs do not work with GPU.\n\nWe can instead _enable_ the use of GPU by either providing our own Slurm job scripts, or setting an override value in `slurm-config.ini`:\n\n```ini\n# -------------------------------------\n# CELLPOSE SEGMENTATION\n# -------------------------------------\n# The path to store the container on the slurm_images_path\ncellpose=cellpose\n# The (e.g. github) repository with the descriptor.json file\ncellpose_repo=https://github.com/TorecLuik/W_NucleiSegmentation-Cellpose/tree/v1.2.7\n# The jobscript in the 'slurm_script_repo'\ncellpose_job=jobs/cellpose.sh\n# Override the default job values for this workflow\n# Or add a job value to this workflow\n# If you don't want to override, comment out / delete the line.\n# Run CellPose Slurm with 10 GB GPU\ncellpose_job_gres=gpu:1g.10gb:1\n```\n\nIn fact, any `..._job_...=...` configuration value will be forwarded to the Slurm commandline.\n\nSlurm commandline parameters override those in the script, so the above one requests 1 10GB gpu for Cellpose.\n\nE.g. you could also set the time limit higher:\n\n```ini\n# -------------------------------------\n# CELLPOSE SEGMENTATION\n# -------------------------------------\n# The path to store the container on the slurm_images_path\ncellpose=cellpose\n# The (e.g. github) repository with the descriptor.json file\ncellpose_repo=https://github.com/TorecLuik/W_NucleiSegmentation-Cellpose/tree/v1.2.7\n# The jobscript in the 'slurm_script_repo'\ncellpose_job=jobs/cellpose.sh\n# Override the default job values for this workflow\n# Or add a job value to this workflow\n# If you don't want to override, comment out / delete the line.\n# Run with longer time limit\ncellpose_job_time=00:30:00\n```\n\nNow the CellPose job should run for maximum of 30 minutes, instead of the default.\n\n# Transfering data\n\nWe have added methods to this library to help with transferring data to the `Slurm` cluster, using the same SSH connection (via SCP or SFTP).\n\n- `slurmClient.transfer_data(...)`\n    - Transfer data to the Slurm cluster\n- `slurmClient.unpack_data(...)`\n    - Unpack zip file on the Slurm cluster\n- `slurmClient.zip_data_on_slurm_server(...)`\n    - Zip data on the Slurm cluster\n- `slurmClient.copy_zip_locally(...)`\n    - Transfer (zip) data from the Slurm cluster\n- `slurmClient.get_logfile_from_slurm(...)`\n    - Transfer logfile from the Slurm cluster\n\nAnd more; see the docstring of `SlurmClient` and example OMERO scripts.\n\n# Testing the Python code\nYou can test the library by installing the extra test dependencies:\n\n1. Create a venv to isolate the python install:\n`python -m venv venvTest`\n\n2. Install OSC with test dependencies:\n`venvTest/Scripts/python -m pip install .[test]`\n\n3. Run pytest from this venv:\n`venvTest/Scripts/pytest`\n\n# Logging\nDebug logging can be enabled with the standard python logging module, for example with logging.basicConfig():\n\n```\nimport logging\n\nlogging.basicConfig(level='DEBUG')\n```\n\nFor example in (the `__init__` of) a script:\n\n```Python\nif __name__ == '__main__':\n    logging.basicConfig(level=logging.INFO,\n                        format='%(asctime)s - %(name)s - %(levelname)s - %(message)s',\n                        stream=sys.stdout)\n    runScript()\n```\n",
    "bugtrack_url": null,
    "license": "Apache License Version 2.0, January 2004 http://www.apache.org/licenses/  TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION  1. Definitions.  \"License\" shall mean the terms and conditions for use, reproduction, and distribution as defined by Sections 1 through 9 of this document.  \"Licensor\" shall mean the copyright owner or entity authorized by the copyright owner that is granting the License.  \"Legal Entity\" shall mean the union of the acting entity and all other entities that control, are controlled by, or are under common control with that entity. For the purposes of this definition, \"control\" means (i) the power, direct or indirect, to cause the direction or management of such entity, whether by contract or otherwise, or (ii) ownership of fifty percent (50%) or more of the outstanding shares, or (iii) beneficial ownership of such entity.  \"You\" (or \"Your\") shall mean an individual or Legal Entity exercising permissions granted by this License.  \"Source\" form shall mean the preferred form for making modifications, including but not limited to software source code, documentation source, and configuration files.  \"Object\" form shall mean any form resulting from mechanical transformation or translation of a Source form, including but not limited to compiled object code, generated documentation, and conversions to other media types.  \"Work\" shall mean the work of authorship, whether in Source or Object form, made available under the License, as indicated by a copyright notice that is included in or attached to the work (an example is provided in the Appendix below).  \"Derivative Works\" shall mean any work, whether in Source or Object form, that is based on (or derived from) the Work and for which the editorial revisions, annotations, elaborations, or other modifications represent, as a whole, an original work of authorship. For the purposes of this License, Derivative Works shall not include works that remain separable from, or merely link (or bind by name) to the interfaces of, the Work and Derivative Works thereof.  \"Contribution\" shall mean any work of authorship, including the original version of the Work and any modifications or additions to that Work or Derivative Works thereof, that is intentionally submitted to Licensor for inclusion in the Work by the copyright owner or by an individual or Legal Entity authorized to submit on behalf of the copyright owner. For the purposes of this definition, \"submitted\" means any form of electronic, verbal, or written communication sent to the Licensor or its representatives, including but not limited to communication on electronic mailing lists, source code control systems, and issue tracking systems that are managed by, or on behalf of, the Licensor for the purpose of discussing and improving the Work, but excluding communication that is conspicuously marked or otherwise designated in writing by the copyright owner as \"Not a Contribution.\"  \"Contributor\" shall mean Licensor and any individual or Legal Entity on behalf of whom a Contribution has been received by Licensor and subsequently incorporated within the Work.  2. Grant of Copyright License. Subject to the terms and conditions of this License, each Contributor hereby grants to You a perpetual, worldwide, non-exclusive, no-charge, royalty-free, irrevocable copyright license to reproduce, prepare Derivative Works of, publicly display, publicly perform, sublicense, and distribute the Work and such Derivative Works in Source or Object form.  3. Grant of Patent License. Subject to the terms and conditions of this License, each Contributor hereby grants to You a perpetual, worldwide, non-exclusive, no-charge, royalty-free, irrevocable (except as stated in this section) patent license to make, have made, use, offer to sell, sell, import, and otherwise transfer the Work, where such license applies only to those patent claims licensable by such Contributor that are necessarily infringed by their Contribution(s) alone or by combination of their Contribution(s) with the Work to which such Contribution(s) was submitted. If You institute patent litigation against any entity (including a cross-claim or counterclaim in a lawsuit) alleging that the Work or a Contribution incorporated within the Work constitutes direct or contributory patent infringement, then any patent licenses granted to You under this License for that Work shall terminate as of the date such litigation is filed.  4. Redistribution. You may reproduce and distribute copies of the Work or Derivative Works thereof in any medium, with or without modifications, and in Source or Object form, provided that You meet the following conditions:  (a) You must give any other recipients of the Work or Derivative Works a copy of this License; and  (b) You must cause any modified files to carry prominent notices stating that You changed the files; and  (c) You must retain, in the Source form of any Derivative Works that You distribute, all copyright, patent, trademark, and attribution notices from the Source form of the Work, excluding those notices that do not pertain to any part of the Derivative Works; and  (d) If the Work includes a \"NOTICE\" text file as part of its distribution, then any Derivative Works that You distribute must include a readable copy of the attribution notices contained within such NOTICE file, excluding those notices that do not pertain to any part of the Derivative Works, in at least one of the following places: within a NOTICE text file distributed as part of the Derivative Works; within the Source form or documentation, if provided along with the Derivative Works; or, within a display generated by the Derivative Works, if and wherever such third-party notices normally appear. The contents of the NOTICE file are for informational purposes only and do not modify the License. You may add Your own attribution notices within Derivative Works that You distribute, alongside or as an addendum to the NOTICE text from the Work, provided that such additional attribution notices cannot be construed as modifying the License.  You may add Your own copyright statement to Your modifications and may provide additional or different license terms and conditions for use, reproduction, or distribution of Your modifications, or for any such Derivative Works as a whole, provided Your use, reproduction, and distribution of the Work otherwise complies with the conditions stated in this License.  5. Submission of Contributions. Unless You explicitly state otherwise, any Contribution intentionally submitted for inclusion in the Work by You to the Licensor shall be under the terms and conditions of this License, without any additional terms or conditions. Notwithstanding the above, nothing herein shall supersede or modify the terms of any separate license agreement you may have executed with Licensor regarding such Contributions.  6. Trademarks. This License does not grant permission to use the trade names, trademarks, service marks, or product names of the Licensor, except as required for reasonable and customary use in describing the origin of the Work and reproducing the content of the NOTICE file.  7. Disclaimer of Warranty. Unless required by applicable law or agreed to in writing, Licensor provides the Work (and each Contributor provides its Contributions) on an \"AS IS\" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied, including, without limitation, any warranties or conditions of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A PARTICULAR PURPOSE. You are solely responsible for determining the appropriateness of using or redistributing the Work and assume any risks associated with Your exercise of permissions under this License.  8. Limitation of Liability. In no event and under no legal theory, whether in tort (including negligence), contract, or otherwise, unless required by applicable law (such as deliberate and grossly negligent acts) or agreed to in writing, shall any Contributor be liable to You for damages, including any direct, indirect, special, incidental, or consequential damages of any character arising as a result of this License or out of the use or inability to use the Work (including but not limited to damages for loss of goodwill, work stoppage, computer failure or malfunction, or any and all other commercial damages or losses), even if such Contributor has been advised of the possibility of such damages.  9. Accepting Warranty or Additional Liability. While redistributing the Work or Derivative Works thereof, You may choose to offer, and charge a fee for, acceptance of support, warranty, indemnity, or other liability obligations and/or rights consistent with this License. However, in accepting such obligations, You may act only on Your own behalf and on Your sole responsibility, not on behalf of any other Contributor, and only if You agree to indemnify, defend, and hold each Contributor harmless for any liability incurred by, or claims asserted against, such Contributor by reason of your accepting any such warranty or additional liability.  END OF TERMS AND CONDITIONS  APPENDIX: How to apply the Apache License to your work.  To apply the Apache License to your work, attach the following boilerplate notice, with the fields enclosed by brackets \"[]\" replaced with your own identifying information. (Don't include the brackets!)  The text should be enclosed in the appropriate comment syntax for the file format. We also recommend that a file or class name and description of purpose be included on the same \"printed page\" as the copyright notice for easier identification within third-party archives.  Copyright [yyyy] [name of copyright owner]  Licensed under the Apache License, Version 2.0 (the \"License\"); you may not use this file except in compliance with the License. You may obtain a copy of the License at  http://www.apache.org/licenses/LICENSE-2.0  Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an \"AS IS\" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License. ",
    "summary": "A python library for easy connecting between OMERO (jobs) and a Slurm cluster",
    "version": "1.14.0",
    "project_urls": {
        "Documentation": "https://nl-bioimaging.github.io/biomero/",
        "Homepage": "https://github.com/NL-BioImaging/biomero"
    },
    "split_keywords": [
        "omero",
        " slurm",
        " high-performance-computing",
        " fair",
        " image-analysis",
        " bioimaging",
        " high-throughput-screening",
        " high-content-screening",
        " cytomine",
        " biomero",
        " biaflows"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "9f8818bf28e36a83ac7390c98b3bab80a81d1c1539a7b71a9c1222ce1cc5e269",
                "md5": "b1e657495217cfd83c9fd577fbdf4f63",
                "sha256": "c9aa228d132cac612e98cb844c71950708a0f809cf3486f403dc160b6d3337bc"
            },
            "downloads": -1,
            "filename": "biomero-1.14.0-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "b1e657495217cfd83c9fd577fbdf4f63",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.7",
            "size": 1095952,
            "upload_time": "2024-07-24T08:50:17",
            "upload_time_iso_8601": "2024-07-24T08:50:17.898557Z",
            "url": "https://files.pythonhosted.org/packages/9f/88/18bf28e36a83ac7390c98b3bab80a81d1c1539a7b71a9c1222ce1cc5e269/biomero-1.14.0-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "e2d6be63ab135714d728e6f21ca983ae9ed5e717e243a555b9ad228240db4c8d",
                "md5": "f122b46944f190afbee87c875aff4fbe",
                "sha256": "d4245ab7871e9f852d0a77cc763fe563dcf2f0befd4d8e74eaa50d521d58aea1"
            },
            "downloads": -1,
            "filename": "biomero-1.14.0.tar.gz",
            "has_sig": false,
            "md5_digest": "f122b46944f190afbee87c875aff4fbe",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.7",
            "size": 1114659,
            "upload_time": "2024-07-24T08:50:23",
            "upload_time_iso_8601": "2024-07-24T08:50:23.270744Z",
            "url": "https://files.pythonhosted.org/packages/e2/d6/be63ab135714d728e6f21ca983ae9ed5e717e243a555b9ad228240db4c8d/biomero-1.14.0.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2024-07-24 08:50:23",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "NL-BioImaging",
    "github_project": "biomero",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": true,
    "lcname": "biomero"
}

Core Facility - Cellular Imaging