<p align="center">
<picture>
<source media="(prefers-color-scheme: dark)" srcset="https://github.com/mrapp-ke/MLRL-Boomer/raw/main/doc/_static/logo_testbed_dark.svg">
<source media="(prefers-color-scheme: light)" srcset="https://github.com/mrapp-ke/MLRL-Boomer/raw/main/doc/_static/logo_testbed_light.svg">
<img alt="MLRL-Testbed" src="https://github.com/mrapp-ke/MLRL-Boomer/raw/main/.assets/logo_testbed_light.svg">
</picture>
</p>
**🔗 Important links:** [Documentation](https://mlrl-boomer.readthedocs.io/en/latest/user_guide/testbed/index.html) | [Issue Tracker](https://github.com/mrapp-ke/MLRL-Boomer/issues) | [Changelog](https://mlrl-boomer.readthedocs.io/en/latest/misc/CHANGELOG.html) | [License](https://mlrl-boomer.readthedocs.io/en/latest/misc/LICENSE.html)
<!-- documentation-start -->
This software package provides **mlrl-testbed - a command line utility for running machine learning experiments**. It implements a straightforward, easily configurable, and extensible *workflow* for conducting experiments, including steps such as (but not restricted to) the following:
- loading a dataset
- splitting it into training and test sets
- training one or several models
- evaluating the models' performance
- saving experimental results to output files
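To make the workflow concrete, the following self-contained sketch mimics these steps with a tiny inline dataset and a trivial majority-class "model" (purely illustrative; mlrl-testbed orchestrates real algorithms and handles these steps for you):

```python
from collections import Counter

# Hypothetical toy dataset: feature vectors and binary labels.
dataset = [([0.1, 0.2], 0), ([0.9, 0.8], 1), ([0.2, 0.1], 0),
           ([0.3, 0.4], 0), ([0.8, 0.7], 1), ([0.4, 0.3], 0)]

# Split the dataset into training and test sets (a simple 2/3 vs. 1/3 split).
split = (2 * len(dataset)) // 3
training_set, test_set = dataset[:split], dataset[split:]

# "Train" a trivial model that always predicts the training set's majority class.
majority_class = Counter(label for _, label in training_set).most_common(1)[0][0]

# Evaluate the model's performance on the test set.
correct = sum(1 for _, label in test_set if label == majority_class)
accuracy = correct / len(test_set)

# Save the experimental results to an output file.
with open('results.txt', 'w', encoding='utf-8') as f:
    f.write(f'accuracy: {accuracy}\n')
```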
# MLRL-Testbed
On its own, this package is not very powerful. It is intended as a basis for other packages that build functionality upon it. In fact, it does not make any assumptions about the problem domain or type of machine learning algorithm that should be used in an experiment. Instead, implementations of domain- or algorithm-specific functionality are provided by the extensions discussed below.
## Tabular Machine Learning
The package [mlrl-testbed-sklearn](https://pypi.org/project/mlrl-testbed-sklearn/) adds support for tabular machine learning problems by making use of the [scikit-learn](https://scikit-learn.org) framework. It can easily be installed via the following command (and will pull [mlrl-testbed](https://pypi.org/project/mlrl-testbed/) as a dependency):
```
pip install mlrl-testbed-sklearn
```
Optionally, support for the [Slurm Workload Manager](https://wikipedia.org/wiki/Slurm_Workload_Manager) can be installed via the package [mlrl-testbed-slurm](https://pypi.org/project/mlrl-testbed-slurm/).
### 💡 Example
By writing just a small amount of code, any scikit-learn compatible [estimator](https://scikit-learn.org/stable/glossary.html#term-estimators) can be integrated with MLRL-Testbed and used in experiments. For example, the following code integrates scikit-learn's [RandomForestClassifier](https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestClassifier):
```python
from argparse import Namespace
from mlrl.testbed_sklearn.runnables import SkLearnRunnable
from mlrl.util.cli import Argument, IntArgument
from sklearn.ensemble import RandomForestClassifier
from sklearn.base import ClassifierMixin, RegressorMixin
from typing import Optional, Set


class Runnable(SkLearnRunnable):

    N_ESTIMATORS = IntArgument(
        '--n-estimators',
        description='The number of trees in the forest',
        default=100,
    )

    def get_algorithmic_arguments(self, known_args: Namespace) -> Set[Argument]:
        return { self.N_ESTIMATORS }

    def create_classifier(self, args: Namespace) -> Optional[ClassifierMixin]:
        return RandomForestClassifier()

    def create_regressor(self, args: Namespace) -> Optional[RegressorMixin]:
        return None  # Not needed in this case
```
The previously integrated algorithm can then be used in experiments controlled via a command line API. Assuming the source code shown above is saved to a file named `custom_runnable.py` in the working directory, we can now fit a [RandomForestClassifier](https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestClassifier) to a dataset using the command below.
```
mlrl-testbed custom_runnable.py \
    --data-dir path/to/datasets/ \
    --dataset dataset-name \
    --n-estimators 50
```
The above command not only trains a model, but also evaluates it according to common measures and prints the evaluation results. It also demonstrates how algorithmic parameters can be controlled via command line arguments.
It is also possible to run multiple experiments at once by defining the datasets and algorithmic parameters to be used in the different runs in a YAML file:
```
mlrl-testbed custom_runnable.py --mode batch --config path/to/config.yaml
```
An example YAML file is shown below. Each combination of the specified parameter values is applied to each dataset defined in the file.
```yaml
datasets:
  - directory: path/to/datasets/
    names:
      - first-dataset
      - second-dataset
parameters:
  - name: --n-estimators
    values:
      - 50
      - 100
```
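The expansion of such a config into individual runs can be pictured as a Cartesian product of datasets and parameter values (a sketch of the semantics only, not mlrl-testbed's actual implementation):

```python
from itertools import product

# A dictionary mirroring the YAML example above (hypothetical structure).
config = {
    'datasets': ['first-dataset', 'second-dataset'],
    'parameters': {'--n-estimators': [50, 100]},
}

# Each combination of parameter values is applied to each dataset.
names = list(config['parameters'])
runs = [(dataset, dict(zip(names, values)))
        for dataset in config['datasets']
        for values in product(*(config['parameters'][name] for name in names))]

for dataset, params in runs:
    print(f'dataset={dataset}, params={params}')
```

With two datasets and two values for `--n-estimators`, this yields four runs in total.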
### 🏁 Advantages
Using MLRL-Testbed not only relieves you of the burden of training and evaluating machine learning models, it can also make your own methods and algorithms more accessible to users. This is demonstrated by the rule learning algorithms [mlrl-boomer](https://pypi.org/project/mlrl-boomer/) and [mlrl-seco](https://pypi.org/project/mlrl-seco/), which can easily be run via the command line API described above and even extend it with rule-specific functionalities.
### 🔧 Functionalities
The package [mlrl-testbed-sklearn](https://pypi.org/project/mlrl-testbed-sklearn/) provides a command line API for configuring and running machine learning algorithms. It can apply machine learning algorithms to different datasets and evaluate their predictive performance in terms of commonly used measures. In detail, it supports the following functionalities:
- Single- and multi-output datasets in the [Mulan](https://github.com/tsoumakas/mulan) and [MEKA](https://waikato.github.io/meka/) formats are supported (with the help of the package [mlrl-testbed-arff](https://pypi.org/project/mlrl-testbed-arff/)).
- Datasets can automatically be split into training and test data, including the option to use cross-validation. Alternatively, predefined splits can be provided as separate files.
- One-hot encoding can be applied to nominal or binary features.
- Binary predictions, scores, or probability estimates can be obtained from machine learning algorithms, if supported. Evaluation measures that are suited for the respective type of predictions are picked automatically.
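One-hot encoding, for instance, expands each nominal feature into one binary indicator per category. The transformation can be sketched in a few lines of plain Python (for illustration only; this is not the package's own implementation):

```python
# Nominal values of a single feature column in a hypothetical dataset.
column = ['red', 'green', 'blue', 'green']

# Determine the distinct categories in a fixed order.
categories = sorted(set(column))  # ['blue', 'green', 'red']

# Encode each value as a vector of binary indicators, one per category.
encoded = [[1 if value == category else 0 for category in categories]
           for value in column]
```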
Furthermore, the command line API provides many options for controlling the experimental results to be gathered during an experiment. Depending on the configuration, the following experimental results can be saved to output files or printed on the console:
- Evaluation scores according to commonly used measures
- Characteristics, i.e., statistical properties, of datasets
- Predictions and their characteristics
- Unique label vectors contained in a classification dataset
The following can also be written to output files, allowing them to be loaded and reused in future experiments:
- The machine learning models that have been learned
- Algorithmic parameters used for training
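Model persistence of this kind can be pictured with Python's `pickle` module (an illustrative sketch; the file formats actually used by mlrl-testbed may differ, and the model shown is a hypothetical stand-in):

```python
import pickle
import tempfile
from pathlib import Path

# A hypothetical stand-in for a trained model and its parameters.
model = {'type': 'random-forest', 'n_estimators': 50}

# Save the model to an output file after training ...
path = Path(tempfile.gettempdir()) / 'model.pickle'
path.write_bytes(pickle.dumps(model))

# ... and load it again in a future experiment instead of retraining.
reloaded = pickle.loads(path.read_bytes())
```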
<!-- documentation-end -->
## 📚 Documentation
Our documentation provides an extensive [user guide](https://mlrl-boomer.readthedocs.io/en/latest/user_guide/testbed/index.html), as well as an [API reference](https://mlrl-boomer.readthedocs.io/en/latest/developer_guide/api/python/testbed/mlrl.testbed.html) for developers.
- Examples of how to save [experimental results](https://mlrl-boomer.readthedocs.io/en/latest/user_guide/testbed/outputs.html) to output files.
- Instructions for [using your own algorithms](https://mlrl-boomer.readthedocs.io/en/latest/user_guide/testbed/runnables.html) with the command line API.
- An overview of available [command line arguments](https://mlrl-boomer.readthedocs.io/en/latest/user_guide/testbed/arguments.html) for controlling experiments.
For an overview of changes and new features that have been included in past releases, please refer to the [changelog](https://mlrl-boomer.readthedocs.io/en/latest/misc/CHANGELOG.html).
## 📜 License
This project is open source software licensed under the terms of the [MIT license](https://mlrl-boomer.readthedocs.io/en/latest/misc/LICENSE.html). We welcome contributions to the project to enhance its functionality and make it more accessible to a broader audience. A frequently updated list of contributors is available [here](https://mlrl-boomer.readthedocs.io/en/latest/misc/CONTRIBUTORS.html).
All contributions to the project and discussions on the [issue tracker](https://github.com/mrapp-ke/MLRL-Boomer/issues) are expected to follow the [code of conduct](https://mlrl-boomer.readthedocs.io/en/latest/misc/CODE_OF_CONDUCT.html).