# Example-wise F1 Maximizer
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
[![PyPI version](https://badge.fury.io/py/example-wise-f1-maximizer.svg)](https://badge.fury.io/py/example-wise-f1-maximizer)
**Important links:** [Issue Tracker](https://github.com/mrapp-ke/ExampleWiseF1Maximizer/issues) | [Changelog](CHANGELOG.md) | [Code of Conduct](CODE_OF_CONDUCT.md)
This software package provides an implementation of a meta-learning algorithm for multi-label classification that aims to maximize the example-wise F1-measure. It integrates with the popular [scikit-learn](https://scikit-learn.org) machine learning framework and can also be used with frameworks for multi-label classification like [scikit-multilearn](http://scikit.ml).
The goal of [multi-label classification](https://en.wikipedia.org/wiki/Multi-label_classification) is the automatic assignment of sets of labels to individual data points, for example, the annotation of text documents with topics. The example-wise [F1-measure](https://en.wikipedia.org/wiki/F-score) is a particularly relevant evaluation measure for this kind of prediction, as it requires a classifier to achieve a good balance between the labels predicted as relevant and irrelevant for an example, i.e., it must be neither too conservative nor too aggressive when it comes to predicting labels as relevant.
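For illustration, the example-wise F1-measure corresponds to scikit-learn's `f1_score` with `average='samples'`, which computes the F1-measure for each example individually and averages the results:

```python
import numpy as np
from sklearn.metrics import f1_score

y_true = np.array([[1, 0, 1],   # first example: labels 1 and 3 are relevant
                   [0, 1, 1]])  # second example: labels 2 and 3 are relevant
y_pred = np.array([[1, 0, 0],
                   [0, 1, 1]])

# The F1-measure is computed per example and then averaged:
# example 1: 2 * 1 / (1 + 2) = 2/3, example 2: 2 * 2 / (2 + 2) = 1
print(f1_score(y_true, y_pred, average='samples'))  # (2/3 + 1) / 2 ≈ 0.83
```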
## Methodology
The algorithm implemented by this project transforms the original multi-label problem with `n` labels into a series of `n * n + 1` binary classification problems. A probabilistic base estimator is then fit to each of these independent sub-problems, as described in the following [paper](http://proceedings.mlr.press/v119/zhang20w/zhang20w.pdf):
*Mingyuan Zhang, Harish G. Ramaswamy, and Shivani Agarwal. Convex Calibrated Surrogates for the Multi-Label F-Measure. In: Proceedings of the International Conference on Machine Learning (ICML), 2020.*
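The exact form of these sub-problems is described below. As a rough, hypothetical sketch (the helper `build_binary_targets` and the ordering of the sub-problems are illustrative, not part of this package's API), the binary targets of the individual sub-problems could be derived from a label matrix as follows:

```python
import numpy as np

def build_binary_targets(y):
    """Hypothetical illustration of the problem transformation.

    Given an (m, n) binary label matrix, returns an (m, n * n + 1) matrix with
    one binary target per pair (i, k), indicating that the i-th label is
    relevant and the example has exactly k relevant labels, plus one target
    for the null label vector without any relevant labels.
    """
    y = np.asarray(y)
    m, n = y.shape
    num_relevant = y.sum(axis=1)  # number of relevant labels per example
    targets = np.zeros((m, n * n + 1), dtype=int)

    for i in range(n):
        for k in range(1, n + 1):
            targets[:, i * n + (k - 1)] = (y[:, i] == 1) & (num_relevant == k)

    targets[:, -1] = num_relevant == 0  # target of the null-vector problem
    return targets
```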
The probabilities predicted by the individual base estimators for an unseen example constitute an `n x n` probability matrix `p`, as well as an additional probability `p_0`. Whereas `p_0` corresponds to the prior probability of the null vector, i.e., a label vector that does not contain any relevant labels, each probability `p_ik` at the `i`-th row and `k`-th column of `p` corresponds to the conditional probability of a label vector with `k` relevant labels, in which the `i`-th label is relevant. In order to identify the label vector that maximizes the F1-measure in expectation, these probabilities are used as inputs to the "General F-Measure Maximizer" (GFM), as proposed in the following [paper](https://proceedings.neurips.cc/paper/2011/file/71ad16ad2c4d81f348082ff6c4b20768-Paper.pdf):
*Krzysztof Dembczyński, Willem Waegeman, Weiwei Cheng, and Eyke Hüllermeier. An Exact Algorithm for F-Measure Maximization. In: Advances in Neural Information Processing Systems, 2011.*
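The following is a minimal sketch of the GFM decision step for the F1-measure, based on the inputs described above; it is not this package's internal implementation, and the function name `gfm_predict` is hypothetical:

```python
import numpy as np

def gfm_predict(p, p_0):
    """A minimal sketch of the General F-Measure Maximizer (GFM) for F1.

    p   : (n, n) matrix, where p[i, k - 1] estimates the probability that the
          i-th label is relevant and exactly k labels are relevant overall
    p_0 : estimated probability of the null label vector
    """
    n = p.shape[0]
    counts = np.arange(1, n + 1)
    # w[k - 1, s - 1] = 2 / (k + s) encodes the F1 denominator for label
    # vectors with k relevant labels and predictions with s relevant labels
    w = 2.0 / (counts[:, None] + counts[None, :])
    delta = p @ w  # delta[i, s - 1]: expected gain of predicting the i-th label

    best_f, best_prediction = p_0, np.zeros(n, dtype=int)  # empty prediction scores p_0

    for s in range(1, n + 1):
        top = np.argsort(delta[:, s - 1])[::-1][:s]  # s labels with the largest gain
        expected_f = delta[top, s - 1].sum()

        if expected_f > best_f:
            best_f = expected_f
            best_prediction = np.zeros(n, dtype=int)
            best_prediction[top] = 1

    return best_prediction
```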
**Please note that this implementation has not been written by any of the authors mentioned above.**
## Documentation
### Installation
The software package is available on [PyPI](https://pypi.org/project/example-wise-f1-maximizer/) and can easily be installed via pip using the following command:
```
pip install example-wise-f1-maximizer
```
### Usage
To use the classifier in your own Python code, you need to import the class `ExampleWiseF1Maximizer`. It can be instantiated and used as shown below:
```python
from example_wise_f1_maximizer import ExampleWiseF1Maximizer
from sklearn.linear_model import LogisticRegression
clf = ExampleWiseF1Maximizer(estimator=LogisticRegression())
x = [[ 1,  2,  3],  # Two training examples with three features
     [11, 12, 13]]
y = [[1, 0],  # Ground truth labels of each training example
     [0, 1]]
clf.fit(x, y)
pred = clf.predict(x)
```
The `fit` method accepts two inputs, `x` and `y`:
* A two-dimensional feature matrix `x`, where each row corresponds to a training example and each column corresponds to a particular feature.
* A two-dimensional binary label matrix `y`, where each row corresponds to a training example and each column corresponds to a label. A non-zero element in the matrix indicates that the respective label is relevant to an example, whereas elements that are equal to zero denote irrelevant labels.
Both `x` and `y` are expected to be [numpy arrays](https://numpy.org/doc/stable/reference/generated/numpy.array.html) or equivalent [array-like](https://scikit-learn.org/stable/glossary.html#term-array-like) data types. In particular, the use of [scipy sparse matrices](https://docs.scipy.org/doc/scipy/reference/sparse.html) is supported.
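For instance, sparse inputs could be passed as follows (a minimal sketch, reusing the toy data from the example above):

```python
from example_wise_f1_maximizer import ExampleWiseF1Maximizer
from scipy.sparse import csr_matrix
from sklearn.linear_model import LogisticRegression

# The dense toy data from above, converted into sparse CSR matrices
x_sparse = csr_matrix([[1, 2, 3], [11, 12, 13]])
y_sparse = csr_matrix([[1, 0], [0, 1]])

clf = ExampleWiseF1Maximizer(estimator=LogisticRegression())
clf.fit(x_sparse, y_sparse)
pred = clf.predict(x_sparse)
```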
In the previous example, logistic regression, as implemented by the class `LogisticRegression` from the scikit-learn framework, is used as the base estimator. Alternatively, you can use any probabilistic estimator for binary classification that is compatible with the scikit-learn framework and implements the `predict_proba` method.
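For example, a different probabilistic base estimator, such as scikit-learn's `RandomForestClassifier`, could be plugged in as follows:

```python
from example_wise_f1_maximizer import ExampleWiseF1Maximizer
from sklearn.ensemble import RandomForestClassifier

# Any scikit-learn-compatible binary classifier that implements predict_proba
# can serve as the base estimator.
clf = ExampleWiseF1Maximizer(estimator=RandomForestClassifier(n_estimators=100))
```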
## License
This project is open source software licensed under the terms of the [MIT license](LICENSE.md). We welcome contributions to the project to enhance its functionality and make it more accessible to a broader audience.
All contributions to the project and discussions on the [issue tracker](https://github.com/mrapp-ke/ExampleWiseF1Maximizer/issues) are expected to follow the [code of conduct](CODE_OF_CONDUCT.md).