confidence-ensembles


Name: confidence-ensembles
Version: 0.1
home_page: https://github.com/tommyippoz/confidence-ensembles
Summary: Confidence Ensembles to Improve Classification
upload_time: 2023-12-13 09:41:30
author: Tommaso Zoppi
keywords: machine learning, confidence, safety, ensamble
requirements: No requirements were recorded.
Travis-CI: No Travis.
coveralls test coverage: No coveralls.

# Confidence Ensembles

Python framework that implements confidence ensembles: ConfBag and ConfBoost.

## Aim/Concept of the Project

One of the biggest ICT challenges of the current era is solving classification problems by designing and training Machine Learning (ML) algorithms that assign a discrete label to input data with perfect accuracy, i.e., no misclassifications. These classifiers can solve a wide variety of binary or multi-class tasks, processing either structured tabular data or unstructured inputs (e.g., images). In real-world applications, the most common data type is tabular data: samples (rows) that share the same set of features (columns). Tabular data is used in many fields, especially when stakeholders monitor the behavior of ICT systems for early detection of anomalies, errors, or intrusions. Notably, monitoring activities generate a massive amount of tabular data that could be used for classification but, unfortunately, is rarely labeled. In this case, classification can only be unsupervised, which has well-known detrimental effects on overall accuracy and makes the classifier unusable in practice outside specific corner cases.

To deal with these problems, ML experts and researchers keep designing new algorithms and providing trained models that can outperform their baselines in specific scenarios, may or may not support unsupervised learning, and perform well under specific assumptions (e.g., the closed-world assumption of Independent and Identically Distributed (IID) train, validation, and test data). Constraining experiments to such assumptions may be acceptable for research purposes, but it does not fit the deployment of classifiers in critical ICT systems, which are complex systems whose failure could directly impact the health of people, infrastructures, or the environment. Adopting classifiers to operate critical functions requires a very low misclassification probability, as well as countermeasures that avoid or mitigate the potential cascading effects of misclassifications on the encompassing critical system.

We tackle these problems by designing, implementing, and benchmarking classifiers that are more accurate and robust than baselines, potentially becoming reliable enough to be deployed in critical real-world systems. Confidence ensembles, either Confidence Bagging (ConfBag) or Confidence Boosting (ConfBoost), can use any existing classifier as a base estimator and build a meta-classifier through a training process that requires no information beyond what is already used to train the base estimator. ConfBag reworks traditional bagging by using only the most confident base-learners to assign the label to input data, whereas ConfBoost creates base-learners that are increasingly specialized in classifying the train data items for which other base-learners cannot make a confident prediction. This is radically different from typical boosting strategies, which always require a labeled training set because base-learners are typically trained on the train data that other base-learners misclassify.

As a result, ConfBag and ConfBoost:
- can work with any existing classifier as base estimator, which is used to create base-learners;
- have a generic and fairly straightforward architecture that applies to all classification tasks and scenarios: (semi-)supervised or unsupervised, binary or multi-class, with tabular, image, or even textual input data;
- do not require information beyond what the base estimator already needs;
- require little to no parameter tuning, are easy to use even for non-experts, and are implemented in a public Python framework whose interfaces comply with de-facto standard libraries such as scikit-learn.
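
The idea of letting only the most confident base-learners decide can be illustrated with a small NumPy sketch. This is not the library's implementation: the function name `confident_vote`, the parameter `n_decisors`, and the use of the maximum class probability as the confidence measure are all assumptions made for illustration.

```python
import numpy as np


def confident_vote(probas, n_decisors=2):
    """Average the class probabilities of the most confident base-learners.

    probas: array-like of shape (n_learners, n_samples, n_classes), i.e. the
    stacked predict_proba outputs of already-trained base-learners.
    Confidence is assumed here to be the maximum class probability.
    """
    probas = np.asarray(probas, dtype=float)
    confidence = probas.max(axis=2)                     # (n_learners, n_samples)
    # For each sample, indices of the n_decisors most confident learners
    top = np.argsort(-confidence, axis=0)[:n_decisors]  # (n_decisors, n_samples)
    sample_idx = np.arange(probas.shape[1])
    selected = probas[top, sample_idx]                  # (n_decisors, n_samples, n_classes)
    averaged = selected.mean(axis=0)                    # (n_samples, n_classes)
    return averaged, averaged.argmax(axis=1)


# Hypothetical outputs of three base-learners on two samples, two classes
probas = [
    [[0.90, 0.10], [0.50, 0.50]],
    [[0.60, 0.40], [0.20, 0.80]],
    [[0.45, 0.55], [0.10, 0.90]],
]
avg, labels = confident_vote(probas, n_decisors=2)
print(labels)  # [0 1]
```

In this toy input, the third learner is ignored for the first sample and the first learner is ignored for the second, because they are the least confident ones for those samples.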

## Dependencies

confidence-ensembles needs the following libraries:
- <a href="https://numpy.org/">NumPy</a>
- <a href="https://scipy.org/">SciPy</a>
- <a href="https://pandas.pydata.org/">Pandas</a>
- <a href="https://scikit-learn.org/stable/">Scikit-Learn</a>
- <a href="https://github.com/yzhao062/pyod">PYOD</a>

## Usage

confidence-ensembles can wrap any classifier, provided that it implements scikit-learn-like interfaces, namely:
- `classifier.fit(train_features, train_labels)`: trains the classifier; `train_labels` is optional (it can be omitted for unsupervised learning)
- `classifier.predict_proba(test_set)`: takes a 2D ndarray and returns a 2D ndarray in which each row contains the class probabilities for one data point of `test_set`
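
As an illustration of this interface, here is a minimal sketch of a toy unsupervised detector that exposes `fit` and `predict_proba`. The class name `DistanceDetector`, its scoring rule, and the attributes `center_` and `scale_` are hypothetical and not part of confidence-ensembles; the sketch only shows the expected input and output shapes.

```python
import numpy as np


class DistanceDetector:
    """Toy detector: points far from the training mean are likely anomalies."""

    def fit(self, X, y=None):
        # Labels are accepted but ignored, as in unsupervised learning
        X = np.asarray(X, dtype=float)
        self.center_ = X.mean(axis=0)
        dist = np.linalg.norm(X - self.center_, axis=1)
        # Largest training distance, used to scale scores (guarded against 0)
        self.scale_ = max(dist.max(), 1e-12)
        self.classes_ = np.array([0, 1])  # 0 = normal, 1 = anomaly
        return self

    def predict_proba(self, X):
        X = np.asarray(X, dtype=float)
        dist = np.linalg.norm(X - self.center_, axis=1)
        # Map distance to a pseudo-probability of being an anomaly in [0, 1]
        p_anomaly = np.clip(dist / (2 * self.scale_), 0.0, 1.0)
        # One row per data point, one column per class
        return np.column_stack([1.0 - p_anomaly, p_anomaly])


train = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
test = np.array([[0.5, 0.5], [5.0, 5.0]])
proba = DistanceDetector().fit(train).predict_proba(test)
print(proba.shape)  # (2, 2)
```

Any object with these two methods and the same array shapes should satisfy the interface described above, whether it is a scikit-learn classifier, a PyOD detector, or a custom class.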

Assuming the classifier has such a structure, a confidence ensemble can be set up as shown in the `tests` folder.

Snippet for supervised learning

https://github.com/tommyippoz/confidence-ensembles/blob/dbdcf730c55cc82505380c60339179e0d1a129c9/tests/example_supervised.py#L50-L64

Snippet for unsupervised learning

https://github.com/tommyippoz/confidence-ensembles/blob/8bbca5446fa912174dd78fbb9bcc03c19d5a166c/tests/example_unsupervised.py#L54-L69

## Credits

Developed @ University of Trento, Povo, Italy, and University of Florence, Florence, Italy

Contributors
- Tommaso Zoppi

            

Raw data

            {
    "_id": null,
    "home_page": "https://github.com/tommyippoz/confidence-ensembles",
    "name": "confidence-ensembles",
    "maintainer": "",
    "docs_url": null,
    "requires_python": "",
    "maintainer_email": "",
    "keywords": "machine learning,confidence,safety,ensamble",
    "author": "Tommaso Zoppi",
    "author_email": "tommaso.zoppi@unitn.it",
    "download_url": "https://files.pythonhosted.org/packages/28/ae/01e6af80d098324220c615386c333bd52736e43cf654e548ee80ec9cb914/confidence-ensembles-0.1.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "",
    "summary": "Confidence Ensembles to Improve Classification",
    "version": "0.1",
    "project_urls": {
        "Homepage": "https://github.com/tommyippoz/confidence-ensembles"
    },
    "split_keywords": [
        "machine learning",
        "confidence",
        "safety",
        "ensamble"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "e08f594cdf52e96bcb8ac89e5bed9170591d92d071fbc1320ca7726f7c5eddf2",
                "md5": "12a1b8d680aaa86838e40ca54bb8ea1c",
                "sha256": "cc505afaa8586330456e8dade409eddc821288ae775eb5887b64a2d41176ab45"
            },
            "downloads": -1,
            "filename": "confidence_ensembles-0.1-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "12a1b8d680aaa86838e40ca54bb8ea1c",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": null,
            "size": 6135,
            "upload_time": "2023-12-13T09:41:27",
            "upload_time_iso_8601": "2023-12-13T09:41:27.961114Z",
            "url": "https://files.pythonhosted.org/packages/e0/8f/594cdf52e96bcb8ac89e5bed9170591d92d071fbc1320ca7726f7c5eddf2/confidence_ensembles-0.1-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "28ae01e6af80d098324220c615386c333bd52736e43cf654e548ee80ec9cb914",
                "md5": "64721a721e88604e1e301112c6399b5c",
                "sha256": "db71416e42205488ccd758b9a97d6564dfb0c8eb095fd8c05d7d14c188b65446"
            },
            "downloads": -1,
            "filename": "confidence-ensembles-0.1.tar.gz",
            "has_sig": false,
            "md5_digest": "64721a721e88604e1e301112c6399b5c",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": null,
            "size": 5843,
            "upload_time": "2023-12-13T09:41:30",
            "upload_time_iso_8601": "2023-12-13T09:41:30.035782Z",
            "url": "https://files.pythonhosted.org/packages/28/ae/01e6af80d098324220c615386c333bd52736e43cf654e548ee80ec9cb914/confidence-ensembles-0.1.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2023-12-13 09:41:30",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "tommyippoz",
    "github_project": "confidence-ensembles",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": false,
    "lcname": "confidence-ensembles"
}
        