# Features Maximization Metric
[![ci](https://github.com/cognitivefactory/features-maximization-metric/workflows/ci/badge.svg)](https://github.com/cognitivefactory/features-maximization-metric/actions?query=workflow%3Aci)
[![documentation](https://img.shields.io/badge/docs-mkdocs%20material-blue.svg?style=flat)](https://cognitivefactory.github.io/features-maximization-metric/)
[![pypi version](https://img.shields.io/pypi/v/cognitivefactory-features-maximization-metric.svg)](https://pypi.org/project/cognitivefactory-features-maximization-metric/)
[![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.7646382.svg)](https://doi.org/10.5281/zenodo.7646382)
Implementation of _Features Maximization Metric_, an unbiased metric aimed at estimating the quality of an unsupervised classification.
## <a name="Description"></a> Quick description
_Features Maximization_ (`FMC`) is a feature selection method described in `Lamirel, J.-C., Cuxac, P., & Hajlaoui, K. (2016). A Novel Approach to Feature Selection Based on Quality Estimation Metrics. In Advances in Knowledge Discovery and Management (pp. 121–140). Springer International Publishing. https://doi.org/10.1007/978-3-319-45763-5_7`.
This metric is computed by applying the following steps:
1. Compute the ***Features F-Measure*** metric (based on the ***Features Recall*** and ***Features Predominance*** metrics).
> (a) The ***Features Recall*** `FR[f][c]` for a given class `c` and a given feature `f` is the ratio between
> the sum of the vector weights of feature `f` for data in class `c`
> and the sum of the vector weights of feature `f` for all data.
> It answers the question: "_Can the feature `f` distinguish the class `c` from the other classes `c'`?_"
> (b) The ***Features Predominance*** `FP[f][c]` for a given class `c` and a given feature `f` is the ratio between
> the sum of the vector weights of feature `f` for data in class `c`
> and the sum of the vector weights of all features `f'` for data in class `c`.
> It answers the question: "_Can the feature `f` identify the class `c` better than the other features `f'`?_"
> (c) The ***Features F-Measure*** `FM[f][c]` for a given class `c` and a given feature `f` is
> the harmonic mean of the ***Features Recall*** (a) and the ***Features Predominance*** (b).
> It answers the question: "_How much information does the feature `f` contain about the class `c`?_"
2. Compute the ***Features Selection*** (based on a comparison with the ***F-Measure Overall Average***).
> (d) The ***F-Measure Overall Average*** is the average of the ***Features F-Measure*** (c) over all classes `c` and all features `f`.
> It answers the question: "_What is the mean information contained by the features across all classes?_"
> (e) A feature `f` is ***Selected*** if and only if there exists at least one class `c` for which the ***Features F-Measure*** (c) `FM[f][c]` is greater than the ***F-Measure Overall Average*** (d).
> It answers the question: "_Which features contain more information than the mean information in the dataset?_"
> (f) A feature `f` is ***Deleted*** if and only if the ***Features F-Measure*** (c) `FM[f][c]` is never greater than the ***F-Measure Overall Average*** (d), whatever the class `c`.
> It answers the question: "_Which features do not contain more information than the mean information in the dataset?_"
3. Compute the ***Features Contrast*** and ***Features Activation*** (based on a comparison with the ***F-Measure Marginal Averages***).
> (g) The ***F-Measure Marginal Average*** for a given feature `f` is the average of the ***Features F-Measure*** (c) over all classes `c` for that feature `f`.
> It answers the question: "_What is the mean information contained by the feature `f` across all classes?_"
> (h) The ***Features Contrast*** `FC[f][c]` for a given class `c` and a given selected feature `f` is the ratio between
> the ***Features F-Measure*** (c) `FM[f][c]`
> and the ***F-Measure Marginal Average*** (g) for the selected feature `f`,
> raised to the power of an ***Amplification Factor***.
> It answers the question: "_How relevant is the feature `f` to distinguish the class `c`?_"
> (i) A selected feature `f` is ***Active*** for a given class `c` if and only if the ***Features Contrast*** (h) `FC[f][c]` is greater than `1.0`.
> It answers the question: "_For which classes is a selected feature `f` relevant?_"
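The definitions (a) to (h) can be restated compactly as formulas. The notation is introduced here for illustration only and is an assumption, not the package's own: `W[d][f]` is the weight of feature `f` in data vector `d`, `D_c` is the set of data in class `c`, `F` and `C` are the sets of features and classes, and `k` is the amplification factor. The contrast is read here as the whole ratio raised to the power `k`.

```latex
\begin{aligned}
FR[f][c] &= \frac{\sum_{d \in D_c} W[d][f]}{\sum_{c' \in C} \sum_{d \in D_{c'}} W[d][f]} \\
FP[f][c] &= \frac{\sum_{d \in D_c} W[d][f]}{\sum_{f' \in F} \sum_{d \in D_c} W[d][f']} \\
FM[f][c] &= \frac{2 \, FR[f][c] \, FP[f][c]}{FR[f][c] + FP[f][c]} \\
\overline{FM} &= \frac{1}{|F| \, |C|} \sum_{f \in F} \sum_{c \in C} FM[f][c] \\
\overline{FM}[f] &= \frac{1}{|C|} \sum_{c \in C} FM[f][c] \\
FC[f][c] &= \left( \frac{FM[f][c]}{\overline{FM}[f]} \right)^{k}
\end{aligned}
```

A feature `f` is then selected when `FM[f][c]` exceeds the overall average for at least one class `c`, and a selected feature is active for `c` when `FC[f][c] > 1`.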
This metric is an **efficient method** to:
- **identify relevant features** of a dataset model;
- **describe association** between vector features and data classes;
- **increase contrast** between data classes.
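To make these uses concrete, here is a minimal pure-Python sketch of the three steps on a toy dataset. It does not use this package's API: the function name `features_maximization`, the feature names `f1` to `f3`, the class names and the data are all hypothetical, and zero denominators are simply mapped to `0.0`, which is an assumption rather than documented behaviour.

```python
from typing import Dict, List, Sequence


def features_maximization(
    vectors: Sequence[Sequence[float]],
    classes: Sequence[str],
    features: Sequence[str],
    amplification_factor: int = 1,
):
    """Illustrative sketch of steps 1 to 3 (not the package implementation)."""
    class_names = sorted(set(classes))

    # weight[c][j]: summed weight of feature j over the data of class c.
    weight = {c: [0.0] * len(features) for c in class_names}
    for row, c in zip(vectors, classes):
        for j, value in enumerate(row):
            weight[c][j] += value

    # Step 1: Features Recall (a), Predominance (b) and F-Measure (c).
    fmeasure: Dict[str, Dict[str, float]] = {}
    for j, f in enumerate(features):
        feature_total = sum(weight[c][j] for c in class_names)
        fmeasure[f] = {}
        for c in class_names:
            class_total = sum(weight[c])
            fr = weight[c][j] / feature_total if feature_total else 0.0
            fp = weight[c][j] / class_total if class_total else 0.0
            fmeasure[f][c] = 2 * fr * fp / (fr + fp) if fr + fp else 0.0

    # Step 2: overall average (d) and selection (e); other features are deleted (f).
    overall = sum(fmeasure[f][c] for f in features for c in class_names)
    overall /= len(features) * len(class_names)
    selected = [f for f in features if any(v > overall for v in fmeasure[f].values())]

    # Step 3: marginal average (g), contrast (h) and activation (i).
    contrast: Dict[str, Dict[str, float]] = {}
    active: Dict[str, List[str]] = {}
    for f in selected:
        # The marginal average is positive: a selected feature has a positive F-Measure.
        marginal = sum(fmeasure[f].values()) / len(class_names)
        contrast[f] = {
            c: (fmeasure[f][c] / marginal) ** amplification_factor for c in class_names
        }
        active[f] = [c for c in class_names if contrast[f][c] > 1.0]
    return fmeasure, selected, contrast, active


# Toy data: f1 only appears in class A, f2 only in class B, f3 weakly in both.
toy_features = ["f1", "f2", "f3"]
toy_vectors = [[2, 0, 1], [2, 0, 0], [0, 2, 1], [0, 2, 0]]
toy_classes = ["A", "A", "B", "B"]

fmeasure, selected, contrast, active = features_maximization(
    toy_vectors, toy_classes, toy_features
)
print("selected:", selected)
print("active:", active)
```

On this toy data, `f3` carries the same small weight in both classes, so its F-Measure stays below the overall average and it is deleted, while `f1` is selected and active for class `A` and `f2` is selected and active for class `B`.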
## <a name="Documentation"></a> Documentation
- [Main documentation](https://cognitivefactory.github.io/features-maximization-metric/)
## <a name="Installation"></a> Installation
Features Maximization Metric requires [`Python`](https://www.python.org/) 3.8 or above.
To install with [`pip`](https://github.com/pypa/pip):
```bash
# install package
python3 -m pip install cognitivefactory-features-maximization-metric
```
To install with [`pipx`](https://github.com/pypa/pipx):
```bash
# install pipx
python3 -m pip install --user pipx
# install package
pipx install --python python3 cognitivefactory-features-maximization-metric
```
## <a name="Development"></a> Development
To work on this project or contribute to it, please read:
- the [Copier PDM](https://pawamoy.github.io/copier-pdm/) template documentation;
- the [Contributing](https://cognitivefactory.github.io/features-maximization-metric/contributing/) page for environment setup and development help;
- the [Code of Conduct](https://cognitivefactory.github.io/features-maximization-metric/code_of_conduct/) page for contribution rules.
## <a name="References"></a> References
- **Features Maximization Metric**: `Lamirel, J.-C., Cuxac, P., & Hajlaoui, K. (2016). A Novel Approach to Feature Selection Based on Quality Estimation Metrics. In Advances in Knowledge Discovery and Management (pp. 121–140). Springer International Publishing. https://doi.org/10.1007/978-3-319-45763-5_7`
- **V-Measure**: `Rosenberg, Andrew & Hirschberg, Julia. (2007). V-Measure: A Conditional Entropy-Based External Cluster Evaluation Measure. 410-420.`
## <a name="How to cite"></a> How to cite
`Schild, E. (2023). cognitivefactory/features-maximization-metric. Zenodo. https://doi.org/10.5281/zenodo.7646382.`