mlrl-common


Name: mlrl-common
Version: 0.10.0
Home page: https://github.com/mrapp-ke/MLRL-Boomer
Summary: Provides common modules to be used by different types of multi-label rule learning algorithms
Upload time: 2024-05-05 17:01:31
Author: Michael Rapp
Requires Python: >=3.9
License: MIT
Keywords: machine learning, scikit-learn, multi-label classification, rule learning
# "MLRL-Common": Building-Blocks for Multi-label Rule Learning Algorithms

[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
[![PyPI version](https://badge.fury.io/py/mlrl-common.svg)](https://badge.fury.io/py/mlrl-common)
[![Documentation Status](https://readthedocs.org/projects/mlrl-boomer/badge/?version=latest)](https://mlrl-boomer.readthedocs.io/en/latest/?badge=latest)
[![Build](https://github.com/mrapp-ke/MLRL-Boomer/actions/workflows/test_build.yml/badge.svg)](https://github.com/mrapp-ke/MLRL-Boomer/actions/workflows/test_build.yml)
[![Code style](https://github.com/mrapp-ke/MLRL-Boomer/actions/workflows/test_format.yml/badge.svg)](https://github.com/mrapp-ke/MLRL-Boomer/actions/workflows/test_format.yml)

**Important links:** [Documentation](https://mlrl-boomer.readthedocs.io/en/latest/) | [Issue Tracker](https://github.com/mrapp-ke/MLRL-Boomer/issues) | [Changelog](https://mlrl-boomer.readthedocs.io/en/latest/misc/CHANGELOG.html) | [Contributors](https://mlrl-boomer.readthedocs.io/en/latest/misc/CONTRIBUTORS.html) | [Code of Conduct](https://mlrl-boomer.readthedocs.io/en/latest/misc/CODE_OF_CONDUCT.html) | [License](https://mlrl-boomer.readthedocs.io/en/latest/misc/LICENSE.html)

This software package provides common modules to be used by different types of **multi-label rule learning (MLRL)** algorithms that integrate with the popular [scikit-learn](https://scikit-learn.org) machine learning framework.

The goal of [multi-label classification](https://en.wikipedia.org/wiki/Multi-label_classification) is the automatic assignment of sets of labels to individual data points, for example, the annotation of text documents with topics.
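
In scikit-learn, multi-label targets are commonly represented as a binary indicator matrix with one row per data point and one column per label. The following minimal sketch converts label sets into such a matrix with plain Python; the documents and topics are made up for illustration, and scikit-learn's `MultiLabelBinarizer` performs the same kind of conversion.

```python
def to_indicator_matrix(label_sets, labels):
    """Convert a list of label sets into a binary matrix (rows = examples, columns = labels)."""
    index = {label: j for j, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in label_sets]
    for i, label_set in enumerate(label_sets):
        for label in label_set:
            matrix[i][index[label]] = 1  # mark each relevant label of example i
    return matrix


# Hypothetical example: three text documents annotated with sets of topics.
topics = ["politics", "sports", "economy"]
documents = [{"politics", "economy"}, {"sports"}, set()]
Y = to_indicator_matrix(documents, topics)
print(Y)  # [[1, 0, 1], [0, 1, 0], [0, 0, 0]]
```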

The library serves as the basis for the implementation of the following rule learning algorithms:

* **BOOMER (Gradient Boosted Multi-label Classification Rules)**: A state-of-the-art algorithm that uses [gradient boosting](https://en.wikipedia.org/wiki/Gradient_boosting) to learn an ensemble of rules that is built with respect to a given multivariate loss function.

## Functionalities

This package follows a unified and modular framework for the implementation of different types of MLRL algorithms. In the following, we provide an overview of the individual modules that an instantiation of the framework must implement.

### Rule Induction

A rule induction module is responsible for the construction of individual rules. Currently, the following modules of this kind are implemented:

* A module for **greedy rule induction** that conducts a top-down search, where rules are constructed by adding one condition after the other and adjusting the rule's prediction accordingly.
* Rule induction based on a **beam search**, where a top-down search is conducted as described above. However, instead of keeping only the best solution at each step, the algorithm keeps track of a predefined number of promising solutions and picks the best one at the end.
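
To illustrate how such a top-down search proceeds, here is a minimal, generic sketch in Python; it is not the package's own implementation. The condition format `(feature_index, operator, threshold)`, the majority-vote prediction, and the quality heuristic (correct minus incorrect label assignments among covered examples) are all hypothetical choices. With `beam_width=1`, only the best refinement is expanded at each step, which corresponds to a greedy search.

```python
import operator

# Hypothetical condition format: (feature_index, operator_symbol, threshold).
OPERATORS = {"<=": operator.le, ">": operator.gt}


def covers(rule, x):
    """A rule covers an example if all of its conditions are satisfied."""
    return all(OPERATORS[op](x[f], t) for f, op, t in rule)


def evaluate(rule, X, Y):
    """Return (quality, prediction), or (-inf, None) if the rule covers no example."""
    covered = [i for i, x in enumerate(X) if covers(rule, x)]
    n = len(covered)
    if n == 0:
        return float("-inf"), None
    positives = [sum(Y[i][j] for i in covered) for j in range(len(Y[0]))]
    prediction = [int(2 * k > n) for k in positives]  # per-label majority vote
    quality = sum(abs(2 * k - n) for k in positives)  # correct minus incorrect assignments
    return quality, prediction


def top_down_search(X, Y, beam_width=1, max_conditions=3):
    """Refine rules condition by condition, keeping the beam_width best refinements per step."""
    candidates = [(f, op, v) for f in range(len(X[0]))
                  for v in sorted({x[f] for x in X}) for op in OPERATORS]
    quality, prediction = evaluate((), X, Y)
    best = (quality, (), prediction)  # the empty rule covers all examples
    beam = [best]
    for _ in range(max_conditions):
        refinements = []
        for _quality, rule, _prediction in beam:
            for condition in candidates:
                if condition in rule:
                    continue
                refined = rule + (condition,)
                q, p = evaluate(refined, X, Y)
                if p is not None:
                    refinements.append((q, refined, p))
        refinements.sort(key=lambda r: r[0], reverse=True)  # stable: ties keep generation order
        if not refinements or refinements[0][0] <= best[0]:
            break  # no refinement improves on the best rule found so far
        beam = refinements[:beam_width]
        best = beam[0]
    return best


X = [[1], [2], [3], [4]]
Y = [[1, 0], [1, 0], [0, 1], [0, 1]]
print(top_down_search(X, Y, beam_width=2))  # (4, ((0, '<=', 2),), [1, 0])
```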

All of the above modules support **numerical, ordinal, and nominal features** and can handle **missing feature values**. They can also be combined with methods for **unsupervised feature binning**, where training examples with similar feature values are assigned to bins in order to reduce the training complexity. Moreover, **multi-threading** can be used to speed up training.
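
Equal-width binning is one common unsupervised binning method. The hypothetical function below sketches it in Python for a single numerical feature; missing values (`None`) are simply left unassigned here, which is an assumption rather than a description of how the package treats them. After binning, candidate thresholds only need to be evaluated at the `num_bins - 1` bin boundaries instead of at every distinct feature value.

```python
def equal_width_bins(values, num_bins):
    """Assign each value to one of num_bins equally wide intervals; None (missing) stays None."""
    present = [v for v in values if v is not None]
    lo, hi = min(present), max(present)
    width = (hi - lo) / num_bins
    bins = []
    for v in values:
        if v is None:
            bins.append(None)  # missing values are not assigned to any bin
        elif width == 0:
            bins.append(0)  # all present values are identical
        else:
            # Clamp so that the maximum value falls into the last bin.
            bins.append(min(int((v - lo) / width), num_bins - 1))
    return bins


print(equal_width_bins([0.0, 1.0, 2.5, None, 9.9, 10.0], 4))  # [0, 0, 1, None, 3, 3]
```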

### Model Assemblage

A model assemblage module combines several rules into a rule model. Currently, the following strategy can be used for constructing a model:

* **Sequential assemblage of rule models**, where one rule is learned after the other.
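
The sequential strategy can be sketched as a simple loop in which each new rule is induced given the statistics left by its predecessors. The callbacks `induce_rule` and `apply_rule` are hypothetical placeholders, and the toy setting (single-output residuals, one example per "rule") is chosen only to keep the example self-contained.

```python
def assemble_sequentially(induce_rule, apply_rule, statistics, max_rules):
    """Learn rules one after the other, updating the statistics after each rule."""
    model = []
    for _ in range(max_rules):
        rule = induce_rule(statistics)
        if rule is None:  # no useful rule can be found any more
            break
        model.append(rule)
        statistics = apply_rule(statistics, rule)
    return model


# Hypothetical toy setting: statistics are residuals of a single output, and each "rule"
# covers the example with the largest absolute residual and predicts that residual.
def induce_rule(residuals):
    i = max(range(len(residuals)), key=lambda j: abs(residuals[j]))
    return (i, residuals[i]) if abs(residuals[i]) >= 1.0 else None


def apply_rule(residuals, rule):
    i, score = rule
    updated = list(residuals)  # do not modify the caller's list
    updated[i] -= score
    return updated


model = assemble_sequentially(induce_rule, apply_rule, [3.0, -0.5, 2.0], max_rules=10)
print(model)  # [(0, 3.0), (2, 2.0)]
```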

### Sampling Methods

A wide variety of sampling methods, including **sampling with and without replacement**, as well as **stratified sampling techniques**, is provided by this package. They can be used to learn new rules on a subset of the available training examples, features, or labels.
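
The following sketch contrasts the three families of sampling methods using Python's `random` module. The function names are hypothetical, and the stratification strategy shown (grouping examples by identical label vectors and sampling the same fraction from each group) is only one possible choice; the package may stratify differently.

```python
import random
from collections import defaultdict


def sample_with_replacement(num_examples, sample_size, rng):
    """Indices may repeat, as in bootstrap sampling."""
    return [rng.randrange(num_examples) for _ in range(sample_size)]


def sample_without_replacement(num_examples, sample_size, rng):
    """All drawn indices are distinct."""
    return rng.sample(range(num_examples), sample_size)


def stratified_sample(label_vectors, fraction, rng):
    """Sample the same fraction from each group of examples with identical label vectors."""
    strata = defaultdict(list)
    for i, labels in enumerate(label_vectors):
        strata[tuple(labels)].append(i)
    sample = []
    for indices in strata.values():
        k = max(1, round(fraction * len(indices)))  # keep every stratum represented
        sample.extend(rng.sample(indices, k))
    return sorted(sample)


rng = random.Random(42)
Y = [[1, 0], [1, 0], [1, 0], [1, 0], [0, 1], [0, 1]]
print(stratified_sample(Y, 0.5, rng))  # two indices from 0-3 and one index from 4-5
```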

### (Label Space) Statistics

So-called label space statistics serve as the basis for assessing the quality of potential rules and determining their predictions. What these statistics represent heavily depends on the rule learning algorithm at hand. For this reason, no particular implementation is currently included in this package.
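
As one illustration, a gradient boosting algorithm can use the first and second derivatives (gradients and Hessians) of a loss function as statistics. The sketch below assumes a label-wise logistic loss with labels in {0, 1} for a single label; the function names and the optional L2 regularization weight `l2` are hypothetical. The score of a rule is the Newton step `-G / (H + l2)` over the covered examples, and `G^2 / (2 (H + l2))` estimates the resulting loss reduction, which can serve as the rule's quality.

```python
import math


def logistic_statistics(scores, labels):
    """Gradient and Hessian of the logistic loss for each example (labels in {0, 1})."""
    stats = []
    for f, y in zip(scores, labels):
        p = 1.0 / (1.0 + math.exp(-f))  # predicted probability
        stats.append((p - y, p * (1.0 - p)))  # (gradient, Hessian)
    return stats


def evaluate_covered(stats, l2=0.0):
    """Optimal score and estimated loss reduction for the examples covered by a rule."""
    g = sum(gradient for gradient, _ in stats)
    h = sum(hessian for _, hessian in stats)
    score = -g / (h + l2)  # minimizes g * s + 0.5 * (h + l2) * s^2
    loss_reduction = g * g / (2.0 * (h + l2))
    return score, loss_reduction


stats = logistic_statistics([0.0, 0.0, 0.0], [1, 1, 0])
score, gain = evaluate_covered(stats)
print(round(score, 4), round(gain, 4))  # 0.6667 0.1667
```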

### Post-Processing

Post-processing methods can be used to alter the predictions of a rule after it has been learned. Whether this is desirable or not heavily depends on the rule learning algorithm at hand. For this reason, no post-processing methods are currently provided by this package.
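
A common example of post-processing in boosting is shrinkage, which scales the scores predicted by a rule by a constant learning rate so that each rule contributes less to the model. The function below is a hypothetical sketch of this idea, not part of this package.

```python
def shrink(prediction, shrinkage=0.3):
    """Scale the scores predicted by a rule by a constant factor in (0, 1]."""
    if not 0.0 < shrinkage <= 1.0:
        raise ValueError("shrinkage must be in (0, 1]")
    return [shrinkage * score for score in prediction]


print(shrink([2.0, -1.0], shrinkage=0.5))  # [1.0, -0.5]
```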

### Pruning Methods

Rule pruning techniques can optionally be applied to a rule after its construction to improve its generalization to unseen data and prevent overfitting. The following pruning techniques are currently supported by this package:

* **Incremental reduced error pruning (IREP)** removes overly specific conditions from a rule if doing so increases its predictive performance (measured on a holdout set of the training data).
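
Following the classic IREP idea of deleting trailing conditions, the sketch below keeps the prefix of a rule's conditions that performs best on holdout data and only prunes if performance strictly increases. The function `irep_prune`, the condition format `(feature, threshold)` meaning `x[feature] <= threshold`, and the accuracy-based quality are hypothetical.

```python
def irep_prune(conditions, holdout_quality):
    """Return the prefix of conditions (at least one) with the best holdout quality."""
    best_rule = list(conditions)
    best_quality = holdout_quality(best_rule)
    for k in range(len(conditions) - 1, 0, -1):  # drop trailing conditions one by one
        prefix = list(conditions[:k])
        quality = holdout_quality(prefix)
        if quality > best_quality:  # prune only on a strict improvement
            best_rule, best_quality = prefix, quality
    return best_rule


# Hypothetical holdout data for a single label; covered examples are predicted as positive.
X_holdout = [[1, 5], [2, 9], [3, 1], [8, 2]]
y_holdout = [1, 1, 1, 0]


def accuracy(rule):
    predictions = [int(all(x[f] <= t for f, t in rule)) for x in X_holdout]
    return sum(p == y for p, y in zip(predictions, y_holdout)) / len(y_holdout)


print(irep_prune([(0, 4), (1, 6)], accuracy))  # [(0, 4)]
```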

### Stopping Criteria

One or several stopping criteria can be used to decide whether additional rules should be added to a model or not. Currently, the following criteria are provided out-of-the-box:

* A **size-based stopping criterion** that ensures that a certain number of rules is not exceeded.
* A **time-based stopping criterion** that stops training as soon as a predefined runtime has been exceeded.
* **Pre-pruning (a.k.a. early stopping)** aims at terminating the training process as soon as the performance of a model stagnates or declines (measured on a holdout set of the training data).
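
The three kinds of criteria can be sketched as small classes sharing a common `should_stop` method; training would stop as soon as any of them is met. All class and method names below are hypothetical, and the early stopping variant uses a simple "patience" rule (stop after a given number of consecutive rules without improvement of the holdout loss).

```python
import math
import time


class SizeStoppingCriterion:
    def __init__(self, max_rules):
        self.max_rules = max_rules

    def should_stop(self, num_rules, holdout_loss):
        return num_rules >= self.max_rules


class TimeStoppingCriterion:
    def __init__(self, max_seconds):
        self.deadline = time.monotonic() + max_seconds  # measured from construction

    def should_stop(self, num_rules, holdout_loss):
        return time.monotonic() >= self.deadline


class EarlyStoppingCriterion:
    def __init__(self, patience):
        self.patience = patience
        self.best_loss = math.inf
        self.num_stagnant = 0

    def should_stop(self, num_rules, holdout_loss):
        if holdout_loss < self.best_loss:
            self.best_loss = holdout_loss
            self.num_stagnant = 0
        else:
            self.num_stagnant += 1  # performance stagnated or declined
        return self.num_stagnant >= self.patience


early = EarlyStoppingCriterion(patience=2)
print([early.should_stop(n, loss) for n, loss in enumerate([1.0, 0.8, 0.85, 0.9], start=1)])
# [False, False, False, True]
```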

### Post-Optimization

Post-optimization methods can be employed to further improve the predictive performance of a model after it has been assembled. Currently, the following post-optimization techniques can be used:

* **Sequential post-optimization** reconstructs each rule in a model in the context of the other rules.

* **Post-pruning** may remove trailing rules from a model if this increases the model's performance (as measured on a holdout set of the training data).
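
Post-pruning can be sketched as choosing the number of leading rules to keep based on a holdout loss. The function `post_prune` and the `holdout_loss` callback are hypothetical; the toy losses are made up and simply indexed by model size.

```python
def post_prune(model, holdout_loss):
    """Return the prefix of the model (at least one rule) with the lowest holdout loss."""
    best_size = len(model)
    best_loss = holdout_loss(model)
    for size in range(len(model) - 1, 0, -1):  # try removing more and more trailing rules
        loss = holdout_loss(model[:size])
        if loss < best_loss:
            best_size, best_loss = size, loss
    return model[:best_size]


# Hypothetical holdout losses, indexed by the number of rules kept.
losses = {1: 0.5, 2: 0.3, 3: 0.35, 4: 0.4}
print(post_prune(["r1", "r2", "r3", "r4"], lambda m: losses[len(m)]))  # ['r1', 'r2']
```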

### Prediction Algorithm

A prediction algorithm is needed to derive predictions from the rules in a previously assembled model. As prediction methods heavily depend on the rule learning algorithm at hand, no implementation is provided by this package out-of-the-box. However, the package defines interfaces for the prediction of **regression scores, binary predictions, or probability estimates**.
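
For an additive model such as one learned by gradient boosting, the three kinds of predictions could be derived as sketched below: the scores of all rules covering an example are summed, binary predictions threshold these scores, and probability estimates apply the logistic function. This aggregation is an assumption suited to boosting (other algorithms may, e.g., use decision lists), and all names are hypothetical rather than the package's interfaces.

```python
import math


def predict_scores(rules, x, num_labels):
    """Sum the label-wise scores of all rules that cover the example x."""
    scores = [0.0] * num_labels
    for covers, rule_scores in rules:
        if covers(x):
            for j, score in enumerate(rule_scores):
                scores[j] += score
    return scores


def predict_binary(scores, threshold=0.0):
    return [int(score > threshold) for score in scores]


def predict_probabilities(scores):
    return [1.0 / (1.0 + math.exp(-score)) for score in scores]  # logistic function


rules = [
    (lambda x: True, [-1.0, 0.5]),        # default rule covering all examples
    (lambda x: x[0] > 2.0, [2.0, -1.0]),  # hypothetical rule with a single condition
]
scores = predict_scores(rules, [3.0], num_labels=2)
print(scores, predict_binary(scores))  # [1.0, -0.5] [1, 0]
```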

## License

This project is open source software licensed under the terms of the [MIT license](https://mlrl-boomer.readthedocs.io/en/latest/misc/LICENSE.html). We welcome contributions to the project to enhance its functionality and make it more accessible to a broader audience. A frequently updated list of contributors is available [here](https://mlrl-boomer.readthedocs.io/en/latest/misc/CONTRIBUTORS.html). 

            

Raw data

            {
    "_id": null,
    "home_page": "https://github.com/mrapp-ke/MLRL-Boomer",
    "name": "mlrl-common",
    "maintainer": null,
    "docs_url": null,
    "requires_python": ">=3.9",
    "maintainer_email": null,
    "keywords": "machine learning, scikit-learn, multi-label classification, rule learning",
    "author": "Michael Rapp",
    "author_email": "michael.rapp.ml@gmail.com",
    "download_url": "https://github.com/mrapp-ke/MLRL-Boomer/releases",
    "platform": "Linux",
    "bugtrack_url": null,
    "license": "MIT",
    "summary": "Provides common modules to be used by different types of multi-label rule learning algorithms",
    "version": "0.10.0",
    "project_urls": {
        "Documentation": "https://mlrl-boomer.readthedocs.io/en/latest",
        "Download": "https://github.com/mrapp-ke/MLRL-Boomer/releases",
        "Homepage": "https://github.com/mrapp-ke/MLRL-Boomer",
        "Issue Tracker": "https://github.com/mrapp-ke/MLRL-Boomer/issues"
    },
    "split_keywords": [
        "machine learning",
        " scikit-learn",
        " multi-label classification",
        " rule learning"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "abd91a63ba874d034ac531fd5979257558953964cce9fd684aaa72d49a676dff",
                "md5": "385ccbf4e6155b833200396ab159a761",
                "sha256": "0106e1f089140cf9004a7b04ecf31cceb11b33ca00c33cff4e7f7b4ea720298e"
            },
            "downloads": -1,
            "filename": "mlrl_common-0.10.0-cp310-cp310-macosx_11_0_arm64.whl",
            "has_sig": false,
            "md5_digest": "385ccbf4e6155b833200396ab159a761",
            "packagetype": "bdist_wheel",
            "python_version": "cp310",
            "requires_python": ">=3.9",
            "size": 1717455,
            "upload_time": "2024-05-05T17:01:31",
            "upload_time_iso_8601": "2024-05-05T17:01:31.130458Z",
            "url": "https://files.pythonhosted.org/packages/ab/d9/1a63ba874d034ac531fd5979257558953964cce9fd684aaa72d49a676dff/mlrl_common-0.10.0-cp310-cp310-macosx_11_0_arm64.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "0afad3eb19c0aa913ee25bee470c606950bb57511c2eb5878d84ce3799d83c63",
                "md5": "1e4c24b6b76331264a9e5e314145df85",
                "sha256": "91a34063f86b86e8a2d561195afcae3dbc0aca89db8b0de1187d6d93c82a09ac"
            },
            "downloads": -1,
            "filename": "mlrl_common-0.10.0-cp311-cp311-macosx_11_0_arm64.whl",
            "has_sig": false,
            "md5_digest": "1e4c24b6b76331264a9e5e314145df85",
            "packagetype": "bdist_wheel",
            "python_version": "cp311",
            "requires_python": ">=3.9",
            "size": 1698475,
            "upload_time": "2024-05-05T17:01:38",
            "upload_time_iso_8601": "2024-05-05T17:01:38.431125Z",
            "url": "https://files.pythonhosted.org/packages/0a/fa/d3eb19c0aa913ee25bee470c606950bb57511c2eb5878d84ce3799d83c63/mlrl_common-0.10.0-cp311-cp311-macosx_11_0_arm64.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "2002ecde574039d22091c91c2fa49dabdf42b1a61967a98d84b8f1921de55518",
                "md5": "2ccccb768bcc6065354605f1d2ca1763",
                "sha256": "9189e4f31d4b177c2c5013139dd42f01b2c37a29a54e7242dbf7b35a0682199c"
            },
            "downloads": -1,
            "filename": "mlrl_common-0.10.0-cp312-cp312-macosx_11_0_arm64.whl",
            "has_sig": false,
            "md5_digest": "2ccccb768bcc6065354605f1d2ca1763",
            "packagetype": "bdist_wheel",
            "python_version": "cp312",
            "requires_python": ">=3.9",
            "size": 1722959,
            "upload_time": "2024-05-05T17:01:41",
            "upload_time_iso_8601": "2024-05-05T17:01:41.453904Z",
            "url": "https://files.pythonhosted.org/packages/20/02/ecde574039d22091c91c2fa49dabdf42b1a61967a98d84b8f1921de55518/mlrl_common-0.10.0-cp312-cp312-macosx_11_0_arm64.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "d800adf9c838f8b1ce2654a90a7939967567f0d993d140d2bd8bb9762f3af0f3",
                "md5": "1ed76063d66b4426b9ea18b167345d1a",
                "sha256": "1bb66d12c4881d455c2ad97156f53b0103cc297d870a1cf87403590130c737f7"
            },
            "downloads": -1,
            "filename": "mlrl_common-0.10.0-cp39-cp39-macosx_11_0_arm64.whl",
            "has_sig": false,
            "md5_digest": "1ed76063d66b4426b9ea18b167345d1a",
            "packagetype": "bdist_wheel",
            "python_version": "cp39",
            "requires_python": ">=3.9",
            "size": 1729975,
            "upload_time": "2024-05-05T17:01:43",
            "upload_time_iso_8601": "2024-05-05T17:01:43.977426Z",
            "url": "https://files.pythonhosted.org/packages/d8/00/adf9c838f8b1ce2654a90a7939967567f0d993d140d2bd8bb9762f3af0f3/mlrl_common-0.10.0-cp39-cp39-macosx_11_0_arm64.whl",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2024-05-05 17:01:31",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "mrapp-ke",
    "github_project": "MLRL-Boomer",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": true,
    "lcname": "mlrl-common"
}
        