# Omnibenchmark

Generate and manage omnibenchmark modules for open and continuous benchmarking.
Each module represents a single building block of a benchmark, e.g., dataset, method, metric.
Omnibenchmark-py provides a structure to generate modules and automatically run them.

![](omni.svg)

## Installation

You can install **omnibenchmark** from [PyPI](https://pypi.org/project/omnibenchmark/):

```sh
pip install omnibenchmark
```

The module requires Python >= 3.8 and [renku >= 1.5.0](https://pypi.org/project/renku/).

## How to use omnibenchmark

For detailed documentation and tutorials, check the [omnibenchmark documentation](https://omnibenchmark.readthedocs.io).

### Quick start

Omnibenchmark uses the `renku` platform to run open and continuous benchmarks. To contribute an independent module to one of the existing benchmarks, please start by creating a new [renku project](#Create-a-new-renku-project). Each module (= renkulab project) consists of a Docker image, which defines its software environment; a dataset to store outputs and metadata; a workflow that describes how to generate the outputs; and, if used, input and parameter datasets with input files and parameter definitions. Thus, each module is an independent benchmark component that can be run, used and modified on its own. Modules are connected by importing (result) datasets from other modules as input datasets, and they are automatically updated when those upstream datasets change.

All relevant information on how to run a specific module is stored in an [`OmniObject`](#omnibenchmark-classes).
The most convenient way to generate an instance of an `OmniObject` is to build it from a `config.yaml` file:

``` python
## Imports
from omnibenchmark.utils.build_omni_object import get_omni_object_from_yaml

## Load object
omni_obj = get_omni_object_from_yaml('src/config.yaml')

```

The `config.yaml` defines all module-specific information, such as the inputs, the outputs, the script to run, the benchmark the module belongs to, and much more. A simple `config.yaml` file could look like this (see [The config.yaml file](#the-config.yaml-file) for more details):

```yaml
---
data:
    name: "module-name"
    title: "A new module"
    description: "A new module for omnibenchmark, e.g., a dataset, method, metric,..."
    keywords: ["module-type-key"]
script: "path/to/module_script"
outputs:
    template: "data/${name}/${name}_${out_name}.${out_end}"
    files:
        counts: 
            end: "mtx.gz"
        data_info:
            end: "json"
        meta:
            end: "json"
benchmark_name: "an-omnibenchmark"
```
Once you have an instance of an `OmniObject`, you can check that it looks as expected:

```python
## Check object
print(omni_obj.__dict__)
print(omni_obj.outputs.file_mapping)
print(omni_obj.command.command_line)
``` 

If all inputs, outputs and the command line call look as expected, you can run your module:

```python
## renku_save is provided by omnibenchmark (import path assumed; may vary between versions)
from omnibenchmark.renku_commands.general import renku_save

## create output dataset that stores all result/output files
omni_obj.create_dataset()

## Update inputs from other modules 
omni_obj.update_obj()

## Run your script with all defined inputs and outputs.
## This also generates a workflow description (plan) and is tracked as activity.
omni_obj.run_renku()

## Link output files to output dataset 
omni_obj.update_result_dataset()

## Save and commit to GitLab
renku_save()
```
Once these steps have run successfully and your outputs have been generated, the module is ready to be [submitted to become a part of omnibenchmark](#Submit-your-module).

### What is renku?
[Renku](https://renkulab.io) is a platform and set of tools for reproducible and collaborative data analysis from the [Swiss Data Science Centre](https://datascience.ch/). Among other functionalities, renku provides a framework to create and run data analysis projects, which come with their own Docker container, datasets and workflows. By storing the metadata of projects and datasets in a knowledge graph, renku facilitates provenance tracking and project interactions. To do so, renku combines a set of microservices:
- GitLab, for version control and project management
- GitLFS, for file storage
- Kubernetes/Docker, to manage containerized environments
- Jupyter server, to provide interactive sessions
- Apache Jena, to generate, store and manage triples and the triple store (knowledge graph)

Details on how to use renku can be found in their [Documentation](https://renku.readthedocs.io/en/latest/index.html). Omnibenchmark uses renku to build and run collaborative and continuous benchmarks. 

#### Create a new renku project
Omnibenchmark modules are built as separate renku projects. Contributions to one of the existing benchmarks start by creating a new project on the [renku platform](https://renkulab.io). You can register directly or sign in with a GitHub account, an ORCID or a Switch edu-ID.
A new project can be created in a few clicks as described [here](https://renku.readthedocs.io/en/latest/tutorials/first_steps/01_create_project.html). A template can be chosen to match the project's code; otherwise the Basic Python template is a good default. The project can then be populated and changed in an interactive renku session (see the Sessions tab of the project) or via the project's GitLab instance or a local clone (Overview tab --> View in GitLab).

### Project requirements
Project requirements can be defined by adapting the `Dockerfile`, specifying all required R packages (with their versions) in the `install.R` file and all required Python modules (with their versions) in the `requirements.txt` file. The latter needs to contain at least `omnibenchmark`. If you work in an interactive session, you need to save/commit your changes, either by running `renku save` or via `git add/commit/push`, and close and restart the session once the new Docker image has been built. The build of the Docker image is triggered automatically after committing changes, but can take a while depending on the requirements (and the runner used).
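For example, a minimal `requirements.txt` for a module with no further Python dependencies might look like this (the pinned version is illustrative):

```
omnibenchmark==0.0.46
```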

### The config.yaml file
All specific information about a benchmark component (= renkulab project) can be specified in a `config.yaml` file. Below we show an example with all standard fields and an explanation for each. Many fields are optional and do not apply to all modules; all unnecessary fields can be skipped. There are further optional fields for specific edge cases, which are described in an extra `config.yaml` file below. In general, the `config.yaml` file consists of a data, an input, an output and a parameter section, as well as a few extra fields to define the main benchmark script and benchmark type. Except for the data section, all sections are optional. Multiple values can be passed as lists. A short sketch of loading such a config follows the example.

```yaml
# Data section to describe the object and the associated (result) dataset
data:
    # Name of the dataset
    name: "out_dataset"
    # Title of the dataset (Optional)
    title: "Output of an example OmniObject"
    # Description of the dataset (Optional)
    description: "This dataset is supposed to store the output files from the example omniobject"
    # Dataset keyword(s) to make this dataset reachable by other projects/benchmark components
    keywords: ["example_dataset"]
# Script to be run by the workflow associated to the project
script: "path/to/method/dataset/metric/script.py"
# Interpreter to run the script (Optional, automatic detection)
interpreter: "python"
# Benchmark that the object is associated to.
benchmark_name: "omni_celltype"
# Orchestrator url of the benchmark (Optional, automatic detection)
orchestrator: "https://www.orchestrator_url.com"
# Input section to describe input file types. (Optional)
inputs:
    # Keyword(s) to find input datasets that shall be imported
    keywords: ["import_this", "import_that"]
    # Input file types
    files: ["count_file", "dim_red_file"]
    # Prefix (part of the filename is sufficient) to automatically detect file types by their names
    prefix:
        count_file: "counts"
        dim_red_file: ["features", "genes"]
# Output section to describe output file types. (Optional)
outputs:
    # Output filetypes and their endings
    files:
        corrected_counts: 
            end: ".mtx.gz"
        meta:
            end: ".json"
# Parameter section to describe the parameter dataset, values and filter. (Optional)
parameter:
    # Names of the parameter to use
    names: ["param1", "param2"]
    # Keyword(s) used to import the parameter dataset
    keywords: ["param_dataset"]
    # Filters that specify limits, values or combinations to exclude
    filter:
        param1:
            upper: 50
            lower: 3
            exclude: 12
        param2:
            "path/to/file/with/parameter/combinations/to/exclude.json"
```
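After loading such a config, the resulting parameter space can be inspected on the object; a minimal sketch, assuming the config defines a parameter section and using the `parameter` attributes described in the class sections below:

```python
from omnibenchmark.utils.build_omni_object import get_omni_object_from_yaml

omni_obj = get_omni_object_from_yaml('src/config.yaml')

## Parameter values and the filtered combinations (see OmniParameter below)
print(omni_obj.parameter.values)
print(omni_obj.parameter.combinations)
```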

Below are the specific fields that are only relevant for edge cases. These fields have their counterparts in the generated [OmniObject](#omnibenchmark-classes).
Changing the corresponding attributes of an OmniObject instance has the same effect, with the added flexibility of working from Python (see the sketch after the example below).

```yaml
# Command to generate the workflow with (Optional, automatic detection)
command_line: "python path/to/method/dataset/metric/script.py --count_file data/import_this_dataset/...mtx.gz"
inputs:
    # Datasets and manual file type specifications (automatic detection!)
    input_files:
        import_this_dataset:
            count_file: "data/import_this_dataset/import_this_dataset__counts.mtx.gz"
            dim_red_file: "data/import_this_dataset/import_this_dataset__dim_red_file.json"
    # (Dataset) name that default input files belong to (Optional, automatic detection)
    default: "import_this_dataset"
    # Input dataset names that should be ignored (even if they have one of the specified input keywords associated)
    filter_names: ["data1", "data2"]
outputs:
    # Template to automatically generate output filenames (Optional - recommended for advanced users only)
    template: "data/${name}/${name}_${unique_values}_${out_name}.${out_end}"
    # Variables used for automatic output filename generation (Optional - recommended for advanced users only)
    template_vars:
        vars1: "random"
        vars2: "variable"
    # Manual specification of mapping for output files and their corresponding input files and parameter values (automatic detection!)
    file_mapping:
        mapping1: 
            output_files:
                corrected_counts: "data/out_dataset/out_dataset_import_this__param1_10__param2_test_corrected_counts.mtx.gz"
                meta: "data/out_dataset/out_dataset_import_this__param1_10__param2_test_meta.json"
        input_files:
            count_file: "data/import_this_dataset/import_this_dataset__counts.mtx.gz"
            dim_red_file: "data/import_this_dataset/import_this_dataset__dim_red_file.json"
        parameter:
            param1: 10
            param2: "test"
    # Default output files (Optional, automatic detection)
    default:
        corrected_counts: "data/out_dataset/out_dataset_import_this__param1_10__param2_test_corrected_counts.mtx.gz"
        meta: "data/out_dataset/out_dataset_import_this__param1_10__param2_test_meta.json"
parameter:
    default:
        param1: 10
        param2: "test"
```
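The same edge-case settings can also be applied directly to an `OmniObject` instance from Python; a minimal sketch using the attribute and method names described in the class sections below:

```python
## Adjust inputs, outputs and parameter on an existing OmniObject
omni_obj.inputs.filter_names = ["data1", "data2"]
omni_obj.outputs.out_template = "data/${name}/${name}_${unique_values}_${out_name}.${out_end}"
omni_obj.parameter.default = {"param1": 10, "param2": "test"}

## Propagate the changes to the output definitions and the workflow command
omni_obj.outputs.update_outputs()
omni_obj.command.update_command()
```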

### Omnibenchmark classes
Classes to manage omnibenchmark modules and their interactions. The main class is the [OmniObject](#omniobject), which consolidates all relevant information and functions of a module. This object is composed of further classes that define its inputs, outputs, command and workflow.

---

#### OmniObject
Main class to manage an omnibenchmark module. 
It takes the following arguments:
* **`name (str)`**: Module name 
* **`keyword (Optional[List[str]], optional)`**: Keyword(s) associated with the module's output dataset.
* **`title (Optional[str], optional)`**: Title of the module's output dataset.
* **`description (Optional[str], optional)`**: Description of the module's output dataset.
* **`script (Optional[PathLike], optional)`**: Script to generate the module's workflow for.
* **`command (Optional[OmniCommand], optional)`**: Workflow command - will be automatically generated if missing.
* **`inputs (Optional[OmniInput], optional)`**: Definitions of the workflow inputs.
* **`parameter (Optional[OmniParameter], optional)`**: Definitions of the workflow parameters.
* **`outputs (Optional[OmniOutput], optional)`**: Definitions of the workflow outputs.
* **`omni_plan (Optional[OmniPlan], optional)`**: The workflow description.
* **`benchmark_name (Optional[str], optional)`**: Name of the associated benchmark.
* **`orchestrator (Optional[str], optional)`**: Orchestrator URL of the associated benchmark. Automatically detected.
* **`wflow_name (Optional[str], optional)`**: Workflow name. Will be set to the module name if none.
* **`dataset_name (Optional[str], optional)`**: Dataset name. Will be set to the module name if none.

The following methods can be run on an instance of an OmniObject (a construction sketch follows the list):
* **`create_dataset()`**: Creates a renku dataset with the specified attributes in the current renku project. 
* **`update_object()`**: Checks for new imports or updates in the input and parameter datasets. Will update object attributes accordingly.
* **`run_renku()`**: Generates and updates the workflow and all output files as specified in the object.
* **`update_result_dataset()`**: Updates and adds all output datasets to the dataset specified in the object.
* **`clean_revert_run()`**: Clean/Remove all outputs, workflows and activities associated with the object (including KG connections).
* **`check_updates()`**: Dry run option for update_object().
* **`check_run()`**: Dry run option for run_renku().
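As an alternative to the `config.yaml` route, an `OmniObject` can be constructed directly; a minimal sketch, assuming `OmniObject` is exported at the package top level:

```python
from omnibenchmark import OmniObject  # top-level export assumed

omni_obj = OmniObject(
    name="module-name",
    keyword=["module-type-key"],
    title="A new module",
    description="A new module for omnibenchmark",
    script="path/to/module_script.py",
    benchmark_name="an-omnibenchmark",
)

## Dry runs before modifying the project or knowledge graph
omni_obj.check_updates()
omni_obj.check_run()
```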

---

#### OmniInput
Class to manage inputs of an omnibenchmark module.
This class has the following attributes:
* **`names (List[str])`**: Names of the input filetypes.
* **`prefix (Optional[Mapping[str, List[str]]], optional)`**: Prefixes (or substrings) of the input filetypes.
* **`input_files (Optional[Mapping[str, Mapping[str, str]]], optional)`**: Input files ordered by file types.
* **`keyword (Optional[List[str]], optional)`**: Keyword to define which datasets are imported as input datasets.
* **`default (Optional[str], optional)`**: Default input name (e.g., dataset).
* **`filter_names (Optional[List[str]], optional)`**: Input dataset names to be ignored.
* **`multi_data_matching (Optional[bool])`**: Whether file matching across renku datasets is allowed (defaults to False).

The following class methods can be run on an instance of an OmniInput (see the sketch after this list):
* **`update_inputs()`**: Method to import new and update existing input datasets, and to update the object accordingly.
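A minimal sketch of constructing an `OmniInput` by hand, with values mirroring the `config.yaml` example above (the top-level export is an assumption):

```python
from omnibenchmark import OmniInput  # top-level export assumed

omni_input = OmniInput(
    names=["count_file", "dim_red_file"],
    prefix={"count_file": ["counts"], "dim_red_file": ["features", "genes"]},
    keyword=["import_this"],
)

## Import/update matching input datasets and list the detected files
omni_input.update_inputs()
print(omni_input.input_files)
```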

---

#### OmniOutput
Class to manage outputs of an omnibenchmark module. 
This class has the following attributes:
* **`name (str)`**: Name shared by all outputs; typically the module/OmniObject name.
* **`out_names (List[str])`**: Names of the output file types.
* **`output_end (Optional[Mapping[str, str]], optional)`**: Endings of the output filetypes.
* **`out_template (str, optional)`**: Template to generate output file names.
* **`file_mapping (Optional[List[OutMapping]], optional)`**: Mapping of input files, parameter values and output files.
* **`inputs (Optional[OmniInput], optional)`**: Object specifying all valid inputs.
* **`parameter (Optional[OmniParameter], optional)`**: Object specifying the parameter space.
* **`default (Optional[Mapping], optional)`**: Default output files.
* **`filter_json (Optional[str], optional)`**: Path to a JSON file with filter combinations.
* **`template_fun (Optional[Callable[..., Mapping]], optional)`**: Function to use to automatically generate output filenames.
* **`template_vars (Optional[Mapping], optional)`**: Variables that are used by template_fun.

The following class methods can be run on an instance of an OmniOutput (see the sketch after this list):
* **`update_outputs()`**: Method to update the output definitions according to the object's attributes.
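A minimal sketch of an `OmniOutput`, reusing the `omni_input` from the `OmniInput` sketch above (names and endings follow the `config.yaml` example; the top-level export is an assumption):

```python
from omnibenchmark import OmniOutput  # top-level export assumed

omni_output = OmniOutput(
    name="out_dataset",
    out_names=["corrected_counts", "meta"],
    output_end={"corrected_counts": "mtx.gz", "meta": "json"},
    inputs=omni_input,  # OmniInput instance from the sketch above
)

## Regenerate the output definitions and inspect the file mapping
omni_output.update_outputs()
print(omni_output.file_mapping)
```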

---

#### OmniParameter
Class to manage the parameters of an omnibenchmark module.
This class has the following attributes:
* **`names (List[str])`**: Names of all valid parameters.
* **`values (Optional[Mapping[str, List]], optional)`**: Parameter values - usually automatically detected.
* **`default (Optional[Mapping[str, str]], optional)`**: Default parameter values.
* **`keyword (Optional[List[str]], optional)`**: Keyword to import the parameter dataset with.
* **`filter (Optional[Mapping[str, str]], optional)`**: Filter to use for the parameter space.
* **`combinations (Optional[List[Mapping[str, str]]], optional)`**: All possible parameter combinations.

The following class methods can be run on an instance of an OmniParameter (see the sketch after this list):
* **`update_parameter()`**: Method to import and update parameter datasets, and to update the object/parameter space accordingly.
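A minimal sketch of an `OmniParameter` with a filter, mirroring the `config.yaml` example above (the top-level export and the nested filter structure are assumptions):

```python
from omnibenchmark import OmniParameter  # top-level export assumed

omni_parameter = OmniParameter(
    names=["param1", "param2"],
    keyword=["param_dataset"],
    filter={"param1": {"upper": 50, "lower": 3, "exclude": 12}},  # nested form as in the config example
)

## Import/update the parameter dataset and show the filtered combinations
omni_parameter.update_parameter()
print(omni_parameter.combinations)
```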

---

#### OmniCommand
Class to manage the main workflow command of an omnibenchmark module. 
This class has the following attributes:
* **`script (Union[PathLike, str])`**: Path to the script run by the command.
* **`interpreter (str, optional)`**: Interpreter to run the script with.
* **`command_line (str, optional)`**: Command line to be run.
* **`outputs (OmniOutput, optional)`**: Object specifying all outputs.
* **`input_val (Optional[Mapping], optional)`**: Input file types and paths to run the command on.
* **`parameter_val (Optional[Mapping], optional)`**: Parameter names and values to run the command with.

The following class methods can be run on an instance of an OmniCommand:
* **`update_command()`**: Method to update the command according to the outputs, inputs and parameters.

---

#### OmniPlan
Class to manage the workflow of an omnibenchmark module.
This class has the following attributes:

* **`plan (PlanViewModel)`**: A plan view model as defined in renku.
* **`param_mapping (Optional[Mapping[str, str]], optional)`**: A mapping between the component names of the plan and the OmniObject.

The following class methods can be run on an instance of an OmniPlan:
* **`predict_mapping_from_file_dict()`**: Method to predict the mapping from the (input-, output-, parameter) file mapping used to generate the command.

---

### Submit your module
Once a module is complete and working, it can be included in the omnibenchmark orchestrator of the associated benchmark. This means it will be run automatically, continuously updated, and its results will automatically be used as inputs by downstream modules. Please open an issue on the corresponding orchestrator GitLab project, linking to your project, to get it taken up.
You can find the corresponding GitLab project at the [omnibenchmark website](https://omnibenchmark.pages.uzh.ch/omni_dash/benchmarks/).


## Release History
* 0.0.46
    * Adapt to slug/name switch in renku dataset/project dataset APIs 
* 0.0.45
    * FIX:
    *  Automatic activity generation
* 0.0.44
    * Add omni_essentials for orchestrator url
    * Add local cache for orchestrator url
* 0.0.43
    * Adapt to renku version 2.4.21
    * Switch to cross entity search api
* 0.0.42
    * Adapt to renku version 2.3.2
    * Enable parallel execution of activities
* 0.0.41
    * New input argument "multi_data_matching" to explicitly allow matching files from multiple datasets.
    * New function to link files to a dataset by prefix and keyword.
    * FIX:
    * Name checking returns conflicting dataset names.
* 0.0.40
    * FIX:
    * file name matching for strings as file_type_dict values.
* 0.0.39
    * add sort_keys argument to omni_output (defaults to true -> alphabetically sorted parameter keys in out names)
    * FIX:
    * remove duplicated outputs independent of parameter order
    * clear project context in revert_run
    * use the next best match for equal matches in file name matching
* 0.0.37
    * Enable import of only one dataset by adding an all flag
    * FIX:
    * Match files by longest sequence instead of ratio
    * Workaround for bug in renku.api.Dataset.list()
* 0.0.35
    * Automatic input files matching prefixes for the same file type
    * Add all parameter to renku_run only for new activity and plan
    * Adapt to renku 1.10.0
* 0.0.34
    * FIX:
    * Enable project queries to ignore missing projects, instead of breaking 
* 0.0.33
    * FIX:
    * Add flag to actively push/save successfully finished activities after generation.
* 0.0.32
    * Add meaningful commit message to renku_save() after updating/generating activities
    * Adjust to renku 1.9.1
    * Adapt to mypy no_implicit_optional=True
* 0.0.31
    * Adapt to renku-python 1.8.1:
    * Replace context manager by renku api to get plan and activity gateways
    * Adjust project attributes 
    * FIX:
    * Use url path in renku api lineage call 
    * Enable multipage results in renku api queries 
* 0.0.30
    * FIX:
    * Get a stable name hash used to automatically generate output names by fixing the order using sorted input file names
    * Use files that generated the workflow to do the file mapping  
* 0.0.29
    * Add check_status and check_run to omni_obj class methods
    * FIX:
    * Export these most important functionalities 
* 0.0.28
    * Commit after generating/updating each activity
    * Add argument to disable orchestrator check
    * FIX:
    * Include latest pipelines with optional numbers into orchestrator check 
* 0.0.27
    * Change automatic output naming mechanism
    * Ensure stable output names independent of other inputs
* 0.0.26
    * Add revert_run function
    * Remove outputs and activities with non-existing inputs
    * Remove datasets with non-matching keywords
* 0.0.23
    * FIX:
    * Adapt changes to renku 1.7.1
* 0.0.22
    * FIX:
    * Ignore datasets with broken urls
* 0.0.17-19
    * FIX:
    * Update defaults after filtering
* 0.0.16
    * Add filter for input/parameter combinations
* 0.0.15
    * FIX:
    * Replace command error with warning to allow to build an object before inputs are imported
* 0.0.14
    * FIX:
    * Change benchmark url
* 0.0.13
    * Add function docstrings
    * Add name filter for importing input datasets
    * Add Documentation in Readme
    * FIX:
    * Adapt renku_update_activities to handle skip_update_metadata argument in renku 1.5.0
* 0.0.12
    * FIX:
    * accept float as parameter values and keep after filtering  
* 0.0.9
    * FIX:
    * add common sequence to auto file matching  
* 0.0.8
    * FIX:
    * add update command to omni_obj.update_object()  
* 0.0.4 - 0.0.7
    * FIX:
    * convert defaults to string to generate plan 
    * adapt output default
    * dependency between command line call and renku input definitions
    * ignore not existing defaults
* 0.0.3
    * FIX:
    * automatic input detection from prefixes for files from the same dataset 
* 0.0.2
    * FIX:
    * automatic command detection, file_mapping.input_files structure
* 0.0.1
    * First version of all main functionalities

## Meta

Almut Lütge – [@Almut30618742](https://twitter.com/Almut30618742)  
Anthony Sonrel – [@AnthonySonrel](https://twitter.com/AnthonySonrel)  
Mark Robinson – [@markrobinsonca](https://twitter.com/markrobinsonca)

Distributed under the Apache 2.0 license. See ``LICENSE`` for more information.

[https://github.com/almutlue/omnibenchmark-py](https://github.com/almutlue/omnibenchmark-py)

## Contributing

1. Fork it (<https://github.com/almutlue/omnibenchmark-py/fork>)
2. Create your feature branch (`git checkout -b feature/fooBar`)
3. Commit your changes (`git commit -am 'Add some fooBar'`)
4. Push to the branch (`git push origin feature/fooBar`)
5. Create a new Pull Request

            
