cognitivefactory-features-maximization-metric


- **Name**: cognitivefactory-features-maximization-metric
- **Version**: 1.0.0
- **Summary**: Implementation of Features Maximization Metric, an unbiased metric aimed at estimating the quality of an unsupervised classification.
- **Upload time**: 2023-11-14 13:09:07
- **Requires Python**: >=3.8
- **License**: CECILL-C
- **Keywords**: python, metrics, feature selection, features maximization

# Features Maximization Metric

[![ci](https://github.com/cognitivefactory/features-maximization-metric/workflows/ci/badge.svg)](https://github.com/cognitivefactory/features-maximization-metric/actions?query=workflow%3Aci)
[![documentation](https://img.shields.io/badge/docs-mkdocs%20material-blue.svg?style=flat)](https://cognitivefactory.github.io/features-maximization-metric/)
[![pypi version](https://img.shields.io/pypi/v/cognitivefactory-features-maximization-metric.svg)](https://pypi.org/project/cognitivefactory-features-maximization-metric/)
[![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.7646382.svg)](https://doi.org/10.5281/zenodo.7646382)

Implementation of _Features Maximization Metric_, an unbiased metric aimed at estimating the quality of an unsupervised classification.


## <a name="Description"></a> Quick description

_Features Maximization_ (`FMC`) is a feature selection method described in `Lamirel, J.-C., Cuxac, P., & Hajlaoui, K. (2016). A Novel Approach to Feature Selection Based on Quality Estimation Metrics. In Advances in Knowledge Discovery and Management (pp. 121–140). Springer International Publishing. https://doi.org/10.1007/978-3-319-45763-5_7`.

This metric is computed by applying the following steps:

1. Compute the ***Features F-Measure*** metric (based on ***Features Recall*** and ***Features Predominance*** metrics).

    > (a) The ***Features Recall*** `FR[f][c]` for a given class `c` and a given feature `f` is the ratio between
    > the sum of the weights of feature `f` over the data in class `c`
    > and the sum of the weights of feature `f` over all data.
    > It answers the question: "_Can the feature `f` distinguish the class `c` from the other classes `c'`?_"

    > (b) The ***Features Predominance*** `FP[f][c]` for a given class `c` and a given feature `f` is the ratio between
    > the sum of the weights of feature `f` over the data in class `c`
    > and the sum of the weights of all features `f'` over the data in class `c`.
    > It answers the question: "_Can the feature `f` identify the class `c` better than the other features `f'`?_"

    > (c) The ***Features F-Measure*** `FM[f][c]` for a given class `c` and a given feature `f` is
    > the harmonic mean of the ***Features Recall*** (a) and the ***Features Predominance*** (b).
    > It answers the question: "_How much information does the feature `f` contain about the class `c`?_"

2. Compute the ***Features Selection*** (based on a comparison with the ***F-Measure Overall Average***).

    > (d) The ***F-Measure Overall Average*** is the average of the ***Features F-Measure*** (c) over all classes `c` and all features `f`.
    > It answers the question: "_What is the mean information contained by the features across all classes?_"

    > (e) A feature `f` is ***Selected*** if and only if there exists at least one class `c` for which the ***Features F-Measure*** (c) `FM[f][c]` is greater than the ***F-Measure Overall Average*** (d).
    > It answers the question: "_Which features contain more information than the mean information in the dataset?_"

    > (f) A feature `f` is ***Deleted*** if and only if, for every class `c`, the ***Features F-Measure*** (c) `FM[f][c]` does not exceed the ***F-Measure Overall Average*** (d).
    > It answers the question: "_Which features do not contain more information than the mean information in the dataset?_"

3. Compute the ***Features Contrast*** and ***Features Activation*** (based on a comparison with the ***F-Measure Marginal Averages***).

    > (g) The ***F-Measure Marginal Average*** for a given feature `f` is the average of the ***Features F-Measure*** (c) `FM[f][c]` over all classes `c`.
    > It answers the question: "_What is the mean information contained by the feature `f` across all classes?_"

    > (h) The ***Features Contrast*** `FC[f][c]` for a given class `c` and a given selected feature `f` is the ratio between
    > the ***Features F-Measure*** (c) `FM[f][c]`
    > and the ***F-Measure Marginal Average*** (g) of the selected feature `f`,
    > raised to the power of an ***Amplification Factor***.
    > It answers the question: "_How relevant is the feature `f` for distinguishing the class `c`?_"

    > (i) A selected feature `f` is ***Active*** for a given class `c` if and only if the ***Features Contrast*** (h) `FC[f][c]` is greater than `1.0`.
    > It answers the question: "_For which classes is a selected feature `f` relevant?_"

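Definitions (a) to (i) can be summarized as follows. The notation is introduced here for illustration and is not taken from the package: $W_d^f$ is the weight of feature $f$ in data vector $d$, $d \in c$ means that $d$ belongs to class $c$, $F$ and $C$ are the sets of features and classes, and $a$ is the Amplification Factor. Raising the whole ratio in (h) to the power $a$ is an assumed reading of the text; with $a > 0$ it keeps the activation threshold of (i) independent of $a$. Deletion (f) is written as the exact complement of selection (e).

```math
\begin{aligned}
\text{(a)}\quad \mathrm{FR}[f][c] &= \frac{\sum_{d \in c} W_d^{f}}{\sum_{d} W_d^{f}} \\
\text{(b)}\quad \mathrm{FP}[f][c] &= \frac{\sum_{d \in c} W_d^{f}}{\sum_{f' \in F} \sum_{d \in c} W_d^{f'}} \\
\text{(c)}\quad \mathrm{FM}[f][c] &= \frac{2\,\mathrm{FR}[f][c]\,\mathrm{FP}[f][c]}{\mathrm{FR}[f][c] + \mathrm{FP}[f][c]} \\
\text{(d)}\quad \overline{\mathrm{FM}} &= \frac{1}{|F|\,|C|} \sum_{f \in F} \sum_{c \in C} \mathrm{FM}[f][c] \\
\text{(e)}\quad f \text{ selected} &\iff \exists\, c \in C :\ \mathrm{FM}[f][c] > \overline{\mathrm{FM}} \\
\text{(f)}\quad f \text{ deleted} &\iff \forall\, c \in C :\ \mathrm{FM}[f][c] \le \overline{\mathrm{FM}} \\
\text{(g)}\quad \overline{\mathrm{FM}}[f] &= \frac{1}{|C|} \sum_{c \in C} \mathrm{FM}[f][c] \\
\text{(h)}\quad \mathrm{FC}[f][c] &= \left( \frac{\mathrm{FM}[f][c]}{\overline{\mathrm{FM}}[f]} \right)^{a} \\
\text{(i)}\quad f \text{ active for } c &\iff \mathrm{FC}[f][c] > 1
\end{aligned}
```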
This metric is an **efficient method** to:

- **identify relevant features** of a vectorized dataset;
- **describe associations** between vector features and data classes;
- **increase contrast** between data classes.

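Step 1 above (definitions (a) to (c)) can be sketched in a few lines of plain Python. This is a minimal illustration written for this README, not the package's API: the input format (one `dict` of non-negative feature weights per data item, plus a parallel list of class labels) and the function name `features_fmeasure` are hypothetical.

```python
from collections import defaultdict
from typing import DefaultDict, Dict, List, Tuple

# Hypothetical input format (an assumption, not the package's API):
# vectors[i] maps feature names to non-negative weights, labels[i] is the class of item i.
Table = Dict[str, Dict[str, float]]  # indexed as X[f][c]


def features_fmeasure(
    vectors: List[Dict[str, float]], labels: List[str]
) -> Tuple[Table, Table, Table]:
    """Return the (FR, FP, FM) tables of step 1, each indexed as X[f][c]."""
    # w[f][c]: sum of the weights of feature f over the data in class c.
    w: DefaultDict[str, DefaultDict[str, float]] = defaultdict(lambda: defaultdict(float))
    for vector, label in zip(vectors, labels):
        for feature, weight in vector.items():
            w[feature][label] += weight

    features = sorted(w)
    classes = sorted(set(labels))
    # (a) denominator: total weight of feature f over all data.
    total_f = {f: sum(w[f][c] for c in classes) for f in features}
    # (b) denominator: total weight of all features f' over the data in class c.
    total_c = {c: sum(w[f][c] for f in features) for c in classes}

    fr: Table = {f: {} for f in features}
    fp: Table = {f: {} for f in features}
    fm: Table = {f: {} for f in features}
    for f in features:
        for c in classes:
            fr[f][c] = w[f][c] / total_f[f] if total_f[f] else 0.0
            fp[f][c] = w[f][c] / total_c[c] if total_c[c] else 0.0
            # (c) harmonic mean of FR and FP, taken as 0.0 when both are 0.0.
            s = fr[f][c] + fp[f][c]
            fm[f][c] = 2 * fr[f][c] * fp[f][c] / s if s else 0.0
    return fr, fp, fm


# Toy example: two "sport" items and one "politics" item.
vectors = [{"ball": 2.0, "goal": 1.0}, {"ball": 1.0, "vote": 1.0}, {"vote": 2.0, "law": 1.0}]
labels = ["sport", "sport", "politics"]
fr, fp, fm = features_fmeasure(vectors, labels)
print(round(fm["ball"]["sport"], 2))  # prints 0.75
print(round(fm["vote"]["sport"], 2))  # prints 0.25
```

On this toy input, `ball` only occurs in `sport` data, so `FR["ball"]["sport"]` is 1.0, while `FP["ball"]["sport"]` is 0.6 because `ball` carries 3 of the 5 weight units of the `sport` class.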

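Steps 2 and 3 above (definitions (d) to (i)) can be sketched in the same spirit, starting from an `FM[f][c]` table that has already been computed. The helper name `select_and_activate` and the toy values are hypothetical, not the package's API; the sketch assumes non-negative F-Measure values and reads (h) as raising the whole ratio to the Amplification Factor.

```python
from typing import Dict, List, Tuple

Table = Dict[str, Dict[str, float]]  # indexed as X[f][c], values assumed non-negative


def select_and_activate(
    fm: Table, amplification_factor: float = 1.0
) -> Tuple[List[str], List[str], Table, Dict[str, List[str]]]:
    """Return (selected, deleted, contrast, active) computed from an FM[f][c] table."""
    # (d) overall average of FM[f][c] over all features f and all classes c.
    values = [v for per_class in fm.values() for v in per_class.values()]
    overall_average = sum(values) / len(values)

    # (e) a feature is selected if FM[f][c] exceeds the overall average for some class c;
    # (f) otherwise it is deleted.
    selected = [f for f in fm if any(v > overall_average for v in fm[f].values())]
    deleted = [f for f in fm if f not in selected]

    contrast: Table = {}
    active: Dict[str, List[str]] = {}
    for f in selected:
        # (g) marginal average of FM[f][c] over all classes c; it is strictly positive
        # here because f has at least one value above the (non-negative) overall average.
        marginal_average = sum(fm[f].values()) / len(fm[f])
        # (h) ratio raised to the Amplification Factor (assumed reading of the text).
        contrast[f] = {c: (v / marginal_average) ** amplification_factor for c, v in fm[f].items()}
        # (i) f is active for class c when its contrast is greater than 1.0.
        active[f] = [c for c, g in contrast[f].items() if g > 1.0]
    return selected, deleted, contrast, active


# Toy FM table (illustrative values only).
fm = {
    "ball": {"sport": 0.8, "politics": 0.0},
    "vote": {"sport": 0.1, "politics": 0.7},
    "today": {"sport": 0.2, "politics": 0.2},
}
selected, deleted, contrast, active = select_and_activate(fm)
print(selected, deleted)  # prints ['ball', 'vote'] ['today']
print(active)  # prints {'ball': ['sport'], 'vote': ['politics']}
```

Here `today` has the same F-Measure in both classes and never exceeds the overall average (about 0.33), so it is deleted. Any positive `amplification_factor` leaves the active classes unchanged but widens the gap between contrasts above and below 1.0.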
## <a name="Documentation"></a> Documentation

- [Main documentation](https://cognitivefactory.github.io/features-maximization-metric/)


## <a name="Installation"></a> Installation

Features Maximization Metric requires [`Python`](https://www.python.org/) 3.8 or above.

To install with [`pip`](https://github.com/pypa/pip):

```bash
# install package
python3 -m pip install cognitivefactory-features-maximization-metric
```

To install with [`pipx`](https://github.com/pypa/pipx):

```bash
# install pipx
python3 -m pip install --user pipx

# install package
pipx install --python python3 cognitivefactory-features-maximization-metric
```


## <a name="Development"></a> Development

To work on this project or contribute to it, please read:

- the [Copier PDM](https://pawamoy.github.io/copier-pdm/) template documentation;
- the [Contributing](https://cognitivefactory.github.io/features-maximization-metric/contributing/) page for environment setup and development help;
- the [Code of Conduct](https://cognitivefactory.github.io/features-maximization-metric/code_of_conduct/) page for contribution rules.


## <a name="References"></a> References

- **Features Maximization Metric**: `Lamirel, J.-C., Cuxac, P., & Hajlaoui, K. (2016). A Novel Approach to Feature Selection Based on Quality Estimation Metrics. In Advances in Knowledge Discovery and Management (pp. 121–140). Springer International Publishing. https://doi.org/10.1007/978-3-319-45763-5_7`
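- **V-Measure**: `Rosenberg, A., & Hirschberg, J. (2007). V-Measure: A Conditional Entropy-Based External Cluster Evaluation Measure. In Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL) (pp. 410–420). Association for Computational Linguistics.`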


## <a name="How to cite"></a> How to cite

`Schild, E. (2023). cognitivefactory/features-maximization-metric. Zenodo. https://doi.org/10.5281/zenodo.7646382.`
