mlrl-boomer

- Name: mlrl-boomer
- Version: 0.10.0
- Home page: https://github.com/mrapp-ke/MLRL-Boomer
- Summary: A scikit-learn implementation of BOOMER - an algorithm for learning gradient boosted multi-label classification rules
- Upload time: 2024-05-05 17:02:39
- Author: Michael Rapp
- Requires Python: >=3.9
- License: MIT
- Keywords: machine learning, scikit-learn, multi-label classification, rule learning, gradient boosting

<p align="center">
  <picture>
    <source media="(prefers-color-scheme: dark)" srcset="assets/logo_dark.svg">
    <source media="(prefers-color-scheme: light)" srcset="assets/logo_light.svg">
    <img alt="BOOMER - Gradient Boosted Multi-Label Classification Rules" src="assets/logo_light.svg">
  </picture>
</p>

[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT) [![PyPI version](https://badge.fury.io/py/mlrl-boomer.svg)](https://badge.fury.io/py/mlrl-boomer) [![Documentation Status](https://readthedocs.org/projects/mlrl-boomer/badge/?version=latest)](https://mlrl-boomer.readthedocs.io/en/latest/?badge=latest) [![Build](https://github.com/mrapp-ke/MLRL-Boomer/actions/workflows/test_build.yml/badge.svg)](https://github.com/mrapp-ke/MLRL-Boomer/actions/workflows/test_build.yml) [![Code style](https://github.com/mrapp-ke/MLRL-Boomer/actions/workflows/test_format.yml/badge.svg)](https://github.com/mrapp-ke/MLRL-Boomer/actions/workflows/test_format.yml) [![X URL](https://img.shields.io/twitter/url?label=Follow&style=social&url=https%3A%2F%2Ftwitter.com%2FBOOMER_ML)](https://twitter.com/BOOMER_ML)

**Important links:** [Documentation](https://mlrl-boomer.readthedocs.io/en/latest/) | [Issue Tracker](https://github.com/mrapp-ke/MLRL-Boomer/issues) | [Changelog](https://mlrl-boomer.readthedocs.io/en/latest/misc/CHANGELOG.html) | [Contributors](https://mlrl-boomer.readthedocs.io/en/latest/misc/CONTRIBUTORS.html) | [Code of Conduct](https://mlrl-boomer.readthedocs.io/en/latest/misc/CODE_OF_CONDUCT.html) | [License](https://mlrl-boomer.readthedocs.io/en/latest/misc/LICENSE.html)

This software package provides the official implementation of **BOOMER - an algorithm for learning gradient boosted multi-label classification rules** that integrates with the popular [scikit-learn](https://scikit-learn.org) machine learning framework.

The goal of [multi-label classification](https://en.wikipedia.org/wiki/Multi-label_classification) is the automatic assignment of sets of labels to individual data points, for example, the annotation of text documents with topics. The BOOMER algorithm uses [gradient boosting](https://en.wikipedia.org/wiki/Gradient_boosting) to learn an ensemble of rules that is built with respect to a given multivariate loss function.
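
To make the idea of a rule ensemble for multi-label data more concrete, the following is a minimal, self-contained sketch in plain Python. It does not use this package: the `Rule` class, the `predict` function, the feature names, and the scores are all hypothetical, and thresholding the aggregated scores at zero is an assumption made for illustration only.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Rule:
    """A hypothetical rule: if `body` holds for an example, the scores in `head` are added."""
    body: Callable[[Dict[str, float]], bool]
    head: Dict[str, float]  # label name -> score, for one, some, or all labels


def predict(rules: List[Rule], labels: List[str], example: Dict[str, float]) -> Dict[str, bool]:
    # Sum up, per label, the scores of all rules that cover the example.
    scores = {label: 0.0 for label in labels}
    for rule in rules:
        if rule.body(example):
            for label, score in rule.head.items():
                scores[label] += score
    # Assumption: a label is predicted as relevant if its aggregated score is positive.
    return {label: score > 0 for label, score in scores.items()}


labels = ["sports", "politics", "economy"]
rules = [
    # A default rule that covers every example and predicts for all labels.
    Rule(body=lambda x: True, head={"sports": -0.5, "politics": -0.5, "economy": -0.5}),
    # A rule that predicts for a single label.
    Rule(body=lambda x: x["count_goal"] >= 2, head={"sports": 1.2}),
    # A rule that predicts for a subset of the labels.
    Rule(body=lambda x: x["count_tax"] >= 1, head={"politics": 0.9, "economy": 0.8}),
]

prediction = predict(rules, labels, {"count_goal": 3, "count_tax": 0})
print(prediction)  # {'sports': True, 'politics': False, 'economy': False}
```

In this toy example, only the first two rules cover the document, so "sports" is the only label whose aggregated score (0.7) is positive.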

To provide a versatile tool for different use cases, great emphasis is put on the *efficiency* of the implementation. Moreover, to ensure its *flexibility*, it is designed in a modular fashion and can therefore easily be adjusted to different requirements. This modular approach enables implementing different kinds of rule learning algorithms. For example, this project also provides a [Separate-and-Conquer (SeCo) algorithm](https://mlrl-boomer.readthedocs.io/en/latest/user_guide/seco/index.html) based on traditional rule learning techniques that are particularly well-suited for learning interpretable models.

## References

The algorithm was first published in the following [paper](https://doi.org/10.1007/978-3-030-67664-3_8). A preprint version is publicly available [here](https://arxiv.org/pdf/2006.13346.pdf).

*Michael Rapp, Eneldo Loza Mencía, Johannes Fürnkranz, Vu-Linh Nguyen and Eyke Hüllermeier. Learning Gradient Boosted Multi-label Classification Rules. In: Proceedings of the European Conference on Machine Learning and Knowledge Discovery in Databases (ECML-PKDD), 2020, Springer.*

If you use the algorithm in a scientific publication, we would appreciate citations to the mentioned paper. An overview of publications that are concerned with the BOOMER algorithm, together with information on how to cite them, can be found in the section [References](https://mlrl-boomer.readthedocs.io/en/latest/misc/references.html) of the documentation.

## Functionalities

The algorithm that is provided by this project currently supports the following core functionalities for learning ensembles of boosted classification rules:

- **Label-wise decomposable or non-decomposable loss functions** can be minimized in expectation.
- **L1 and L2 regularization** can be used.
- **Single-label, partial, or complete heads** can be used by rules, i.e., they can predict for an individual label, a subset of the available labels, or all labels. Predicting for multiple labels simultaneously enables rules to model local dependencies between labels.
- **Various strategies for predicting regression scores, labels or probabilities** are available.
- **Isotonic regression models can be used to calibrate marginal and joint probabilities** predicted by a model.
- **Rules can be constructed via a greedy search or a beam search.** The latter may help to improve the quality of individual rules.
- **Sampling techniques and stratification methods** can be used to learn new rules on a subset of the available training examples, features, or labels.
- **Shrinkage (a.k.a. the learning rate) can be adjusted** to control the impact of individual rules on the overall ensemble.
- **Fine-grained control over the specificity/generality of rules** is provided via hyper-parameters.
- **Incremental reduced error pruning** can be used to remove overly specific conditions from rules and prevent overfitting.
- **Post- and pre-pruning (a.k.a. early stopping)** can be used to determine the optimal number of rules to be included in an ensemble.
- **Sequential post-optimization** may help to improve the predictive performance of a model by reconstructing each rule in the context of the other rules.
- **Native support for numerical, ordinal, and nominal features** eliminates the need for pre-processing techniques such as one-hot encoding.
- **Handling of missing feature values**, i.e., occurrences of NaN in the feature matrix, is implemented by the algorithm.
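
As a rough illustration of how a loss function, L2 regularization, and shrinkage interact when a score is computed for a single label, the sketch below applies a regularized Newton step as commonly used in gradient boosting. It is written in plain Python and is not taken from this package: the function name `head_score`, the choice of the label-wise logistic loss, and the default values for `l2` and `shrinkage` are assumptions made for this example.

```python
import math
from typing import List


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def head_score(true_labels: List[int], current_scores: List[float],
               l2: float = 1.0, shrinkage: float = 0.3) -> float:
    """Score predicted for one label, given the examples covered by a rule.

    `true_labels` holds 0/1 ground truth values, `current_scores` holds the
    scores the ensemble currently predicts for these examples.
    """
    probabilities = [sigmoid(s) for s in current_scores]
    # First and second derivative of the logistic loss w.r.t. the current scores.
    gradients = [p - y for p, y in zip(probabilities, true_labels)]
    hessians = [p * (1.0 - p) for p in probabilities]
    # Newton step; the L2 weight in the denominator pulls the score towards zero.
    newton_step = -sum(gradients) / (sum(hessians) + l2)
    # Shrinkage scales down the impact of the rule on the overall ensemble.
    return shrinkage * newton_step


# Three covered examples with a current score of 0, two of them positive.
score = head_score([1, 1, 0], [0.0, 0.0, 0.0])
print(round(score, 4))  # 0.0857
```

Increasing `l2` or decreasing `shrinkage` in this sketch both lead to a smaller score, i.e., a more conservative rule.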

## Runtime and Memory Optimizations

In addition, the following features that may speed up training or reduce the memory footprint are currently implemented:

- **Unsupervised feature binning** can be used to speed up the evaluation of a rule's potential conditions when dealing with numerical features.
- **[Gradient-based label binning (GBLB)](https://arxiv.org/pdf/2106.11690.pdf)** can be used to assign the available labels to a limited number of bins. This may speed up training significantly when minimizing a non-decomposable loss function using rules with partial or complete heads.
- **Sparse feature matrices** can be used for training and prediction. This may speed up training significantly on some data sets.
- **Sparse label matrices** can be used for training. This may reduce the memory footprint in case of large data sets.
- **Sparse prediction matrices** can be used to store predicted labels. This may reduce the memory footprint in case of large data sets.
- **Sparse matrices for storing gradients and Hessians** can be used if supported by the loss function. This may speed up training significantly on data sets with many labels.
- **Multi-threading** can be used to parallelize the evaluation of a rule's potential refinements across several features, to update the gradients and Hessians of individual examples in parallel, or to obtain predictions for several examples in parallel.
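
The idea behind unsupervised feature binning can be sketched as follows. Equal-width binning is used here as one common unsupervised method; the function `equal_width_bins` is hypothetical and does not reflect how the package implements binning internally.

```python
from typing import List, Tuple


def equal_width_bins(values: List[float], num_bins: int) -> Tuple[List[int], List[float]]:
    """Assign each value to one of `num_bins` equally wide intervals."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values are identical, so there is nothing to split on.
        return [0] * len(values), []
    width = (hi - lo) / num_bins
    # The maximum value would fall just outside the last interval, so clamp it.
    indices = [min(int((v - lo) / width), num_bins - 1) for v in values]
    # Only the boundaries between bins remain as candidate thresholds.
    thresholds = [lo + i * width for i in range(1, num_bins)]
    return indices, thresholds


values = [0.1, 0.4, 0.35, 0.8, 2.0, 1.2, 1.9, 0.05]
indices, thresholds = equal_width_bins(values, num_bins=4)
print(indices)          # [0, 0, 0, 1, 3, 2, 3, 0]
print(len(thresholds))  # 3
```

Instead of one candidate threshold between each pair of distinct feature values (seven in this example), only three thresholds have to be evaluated, which is where the potential speed-up comes from.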

## Documentation

An extensive user guide, as well as API documentation for developers, is available at [https://mlrl-boomer.readthedocs.io](https://mlrl-boomer.readthedocs.io/en/latest/). If you are new to the project, you probably want to read about the following topics:

- Instructions for [installing the software package](https://mlrl-boomer.readthedocs.io/en/latest/quickstart/installation.html) or [building the project from source](https://mlrl-boomer.readthedocs.io/en/latest/developer_guide/compilation.html).
- Examples of how to [use the algorithm](https://mlrl-boomer.readthedocs.io/en/latest/quickstart/usage.html) in your own Python code or how to use the [command line API](https://mlrl-boomer.readthedocs.io/en/latest/quickstart/testbed.html).
- An overview of available [parameters](https://mlrl-boomer.readthedocs.io/en/latest/user_guide/boosting/parameters.html).

A collection of benchmark datasets that are compatible with the algorithm is provided in a separate [repository](https://github.com/mrapp-ke/Boomer-Datasets).

For an overview of changes and new features that have been included in past releases, please refer to the [changelog](https://mlrl-boomer.readthedocs.io/en/latest/misc/CHANGELOG.html).

## License

This project is open source software licensed under the terms of the [MIT license](https://mlrl-boomer.readthedocs.io/en/latest/misc/LICENSE.html). We welcome contributions to the project to enhance its functionality and make it more accessible to a broader audience. A frequently updated list of contributors is available [here](https://mlrl-boomer.readthedocs.io/en/latest/misc/CONTRIBUTORS.html).

All contributions to the project and discussions on the [issue tracker](https://github.com/mrapp-ke/MLRL-Boomer/issues) are expected to follow the [code of conduct](https://mlrl-boomer.readthedocs.io/en/latest/misc/CODE_OF_CONDUCT.html).

            

Raw data

            {
    "_id": null,
    "home_page": "https://github.com/mrapp-ke/MLRL-Boomer",
    "name": "mlrl-boomer",
    "maintainer": null,
    "docs_url": null,
    "requires_python": ">=3.9",
    "maintainer_email": null,
    "keywords": "machine learning, scikit-learn, multi-label classification, rule learning, gradient boosting",
    "author": "Michael Rapp",
    "author_email": "michael.rapp.ml@gmail.com",
    "download_url": "https://github.com/mrapp-ke/MLRL-Boomer/releases",
    "platform": "Linux",
    "bugtrack_url": null,
    "license": "MIT",
    "summary": "A scikit-learn implementation of BOOMER - an algorithm for learning gradient boosted multi-label classification rules",
    "version": "0.10.0",
    "project_urls": {
        "Documentation": "https://mlrl-boomer.readthedocs.io/en/latest",
        "Download": "https://github.com/mrapp-ke/MLRL-Boomer/releases",
        "Homepage": "https://github.com/mrapp-ke/MLRL-Boomer",
        "Issue Tracker": "https://github.com/mrapp-ke/MLRL-Boomer/issues"
    },
    "split_keywords": [
        "machine learning",
        " scikit-learn",
        " multi-label classification",
        " rule learning",
        " gradient boosting"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "d1939b66cd0d42483d417d0b2cb1c3ed8088ab1928bd4c2cc0be4d8b9ffbeadc",
                "md5": "d5d824888bda891efdf4fbc94434ca73",
                "sha256": "013a8382600f6824ddd71a59d381fa45792779b537ab94d3959018e0e91fe6e4"
            },
            "downloads": -1,
            "filename": "mlrl_boomer-0.10.0-cp310-cp310-macosx_11_0_arm64.whl",
            "has_sig": false,
            "md5_digest": "d5d824888bda891efdf4fbc94434ca73",
            "packagetype": "bdist_wheel",
            "python_version": "cp310",
            "requires_python": ">=3.9",
            "size": 1420102,
            "upload_time": "2024-05-05T17:02:39",
            "upload_time_iso_8601": "2024-05-05T17:02:39.990010Z",
            "url": "https://files.pythonhosted.org/packages/d1/93/9b66cd0d42483d417d0b2cb1c3ed8088ab1928bd4c2cc0be4d8b9ffbeadc/mlrl_boomer-0.10.0-cp310-cp310-macosx_11_0_arm64.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "b6bb07273cecd885d87178e5225bc8c37b78839bcfdc8ee3a9d8b5362e6d321e",
                "md5": "4bdeb0ff261511ce688e40a0cbd600d9",
                "sha256": "9c63962451123f35f50bded3817298922485d915020168956c8f32864366a15a"
            },
            "downloads": -1,
            "filename": "mlrl_boomer-0.10.0-cp311-cp311-macosx_11_0_arm64.whl",
            "has_sig": false,
            "md5_digest": "4bdeb0ff261511ce688e40a0cbd600d9",
            "packagetype": "bdist_wheel",
            "python_version": "cp311",
            "requires_python": ">=3.9",
            "size": 1411862,
            "upload_time": "2024-05-05T17:02:44",
            "upload_time_iso_8601": "2024-05-05T17:02:44.593667Z",
            "url": "https://files.pythonhosted.org/packages/b6/bb/07273cecd885d87178e5225bc8c37b78839bcfdc8ee3a9d8b5362e6d321e/mlrl_boomer-0.10.0-cp311-cp311-macosx_11_0_arm64.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "d9e45883619012cb37f6b5fe6b0aacda61b64ba1da8b9378c03c33ed0312c3ce",
                "md5": "2d8aa443c1891af69938a31c3b93dd0d",
                "sha256": "2c230dacf1b15bb1a957d19c4cbfad5c1377f30a5fed71eca41d838a867d399f"
            },
            "downloads": -1,
            "filename": "mlrl_boomer-0.10.0-cp312-cp312-macosx_11_0_arm64.whl",
            "has_sig": false,
            "md5_digest": "2d8aa443c1891af69938a31c3b93dd0d",
            "packagetype": "bdist_wheel",
            "python_version": "cp312",
            "requires_python": ">=3.9",
            "size": 1418187,
            "upload_time": "2024-05-05T17:02:46",
            "upload_time_iso_8601": "2024-05-05T17:02:46.699462Z",
            "url": "https://files.pythonhosted.org/packages/d9/e4/5883619012cb37f6b5fe6b0aacda61b64ba1da8b9378c03c33ed0312c3ce/mlrl_boomer-0.10.0-cp312-cp312-macosx_11_0_arm64.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "41ef127b9274f4655ac76fa0f9681410c663362b7a9d7412a2cb781d8ac422be",
                "md5": "6934461596b9f681eb936dd33747d334",
                "sha256": "2cb9b9a18c97ce1142a3184ab3601191ed56e71ff9a2e9c26332edbeff433a7f"
            },
            "downloads": -1,
            "filename": "mlrl_boomer-0.10.0-cp39-cp39-macosx_11_0_arm64.whl",
            "has_sig": false,
            "md5_digest": "6934461596b9f681eb936dd33747d334",
            "packagetype": "bdist_wheel",
            "python_version": "cp39",
            "requires_python": ">=3.9",
            "size": 1426709,
            "upload_time": "2024-05-05T17:02:49",
            "upload_time_iso_8601": "2024-05-05T17:02:49.222805Z",
            "url": "https://files.pythonhosted.org/packages/41/ef/127b9274f4655ac76fa0f9681410c663362b7a9d7412a2cb781d8ac422be/mlrl_boomer-0.10.0-cp39-cp39-macosx_11_0_arm64.whl",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2024-05-05 17:02:39",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "mrapp-ke",
    "github_project": "MLRL-Boomer",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": true,
    "lcname": "mlrl-boomer"
}
        