scikit-activeml


:Name: scikit-activeml
:Version: 0.4.1
:Home page: https://github.com/scikit-activeml/scikit-activeml
:Summary: Our package scikit-activeml is a Python library for active learning on top of SciPy and scikit-learn.
:Upload time: 2023-04-27 09:12:25
:Author: Marek Herde
:Requires Python: >=3.8
:License: BSD 3-Clause License
:Keywords: active learning, machine learning, semi-supervised learning, data mining, pattern recognition, artificial intelligence
:Requirements: joblib (>=1.2), numpy (>=1.22), scipy (>=1.8), scikit-learn (>=1.2), matplotlib (>=3.5), iteration-utilities (>=0.11)

.. intro_start

|Doc|_ |Codecov|_ |PythonVersion|_ |PyPi|_ |Paper|_ |Black|_

.. |Doc| image:: https://img.shields.io/badge/docs-latest-green
.. _Doc: https://scikit-activeml.github.io/scikit-activeml-docs/

.. |Codecov| image:: https://codecov.io/gh/scikit-activeml/scikit-activeml/branch/master/graph/badge.svg
.. _Codecov: https://app.codecov.io/gh/scikit-activeml/scikit-activeml

.. |PythonVersion| image:: https://img.shields.io/badge/python-3.8%20%7C%203.9%20%7C%203.10-blue
.. _PythonVersion: https://img.shields.io/badge/python-3.8%20%7C%203.9%20%7C%203.10-blue

.. |PyPi| image:: https://badge.fury.io/py/scikit-activeml.svg
.. _PyPi: https://badge.fury.io/py/scikit-activeml

.. |Paper| image:: https://img.shields.io/badge/paper-10.20944/preprints202103.0194.v1-blue
.. _Paper: https://www.preprints.org/manuscript/202103.0194/v1

.. |Black| image:: https://img.shields.io/badge/code%20style-black-000000.svg
.. _Black: https://github.com/psf/black

|

.. image:: https://raw.githubusercontent.com/scikit-activeml/scikit-activeml/master/docs/logos/scikit-activeml-logo.png
   :width: 200

|

Machine learning applications often need large amounts of training data to
perform well. Whereas unlabeled data can be easily gathered, the labeling process
is difficult, time-consuming, or expensive in most applications. Active learning can help solve
this problem by querying labels for those data samples that are expected to improve the performance
the most. The goal is for the learning algorithm to perform sufficiently well with
fewer labels.

With this goal in mind, **scikit-activeml** has been developed as a Python module for active learning
on top of `scikit-learn <https://scikit-learn.org/stable/>`_. The project was initiated in 2020 by the
`Intelligent Embedded Systems Group <https://www.uni-kassel.de/eecs/en/sections/intelligent-embedded-systems/home>`_
at the University of Kassel and is distributed under the `3-Clause BSD licence
<https://github.com/scikit-activeml/scikit-activeml/blob/master/LICENSE.txt>`_.

.. intro_end

.. user_installation_start

User Installation
=================

The easiest way of installing scikit-activeml is using ``pip``:

::

    pip install -U scikit-activeml
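
To check which version ended up in the current environment, the distribution
metadata can be queried with the standard library. This is a minimal sketch;
the helper name ``installed_version`` is hypothetical and not part of
``skactiveml``.

.. code-block:: python

    from importlib.metadata import PackageNotFoundError, version


    def installed_version(distribution="scikit-activeml"):
        """Return the installed version string, or None if it is not installed."""
        try:
            return version(distribution)
        except PackageNotFoundError:
            return None


    # Prints the version string, or None if the package is not installed.
    print(installed_version())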

.. user_installation_end

.. examples_start

Examples
========
We provide a broad overview of different use cases in our `tutorial section <https://scikit-activeml.github.io/scikit-activeml-docs/tutorials.html>`_ offering

- `Pool-based Active Learning - Getting Started <https://scikit-activeml.github.io/scikit-activeml-docs/generated/tutorials/00_pool_getting_started.html>`_,
- `Deep Pool-based Active Learning - scikit-activeml with Skorch <https://scikit-activeml.github.io/scikit-activeml-docs/generated/tutorials/01_deep_pool_al_with_skorch.html>`_,
- `Pool-based Active Learning for Regression - Getting Started <https://scikit-activeml.github.io/scikit-activeml-docs/generated/tutorials/02_pool_regression_getting_started.html>`_,
- `Multi-annotator Pool-based Active Learning - Getting Started <https://scikit-activeml.github.io/scikit-activeml-docs/generated/tutorials/10_multiple_annotators_getting_started.html>`_,
- `Stream-based Active Learning - Getting Started <https://scikit-activeml.github.io/scikit-activeml-docs/generated/tutorials/20_stream_getting_started.html>`_,
- and `Batch Stream-based Active Learning with Pool Query Strategies <https://scikit-activeml.github.io/scikit-activeml-docs/generated/tutorials/21_stream_batch_with_pool_al.html>`_.

The following two examples show how little code is needed to implement an active learning cycle with our Python package ``skactiveml``.

Pool-based Active Learning
##########################

The following code snippet implements an active learning cycle with 20 iterations using a Gaussian process
classifier and uncertainty sampling. To use other classifiers, you can simply wrap classifiers from
``sklearn`` or use classifiers provided by ``skactiveml``. Note that the main difficulty of using
active learning with ``sklearn`` is handling unlabeled data, which we mark with a specific value
(``MISSING_LABEL``) in the label vector ``y``. More query strategies can be found in the documentation.

.. code-block:: python

    import numpy as np
    from sklearn.gaussian_process import GaussianProcessClassifier
    from sklearn.datasets import make_blobs
    from skactiveml.pool import UncertaintySampling
    from skactiveml.utils import MISSING_LABEL
    from skactiveml.classifier import SklearnClassifier

    # Generate data set.
    X, y_true = make_blobs(n_samples=200, centers=4, random_state=0)
    y = np.full(shape=y_true.shape, fill_value=MISSING_LABEL)

    # Use the first 10 instances as initial training data.
    y[:10] = y_true[:10]

    # Create classifier and query strategy.
    clf = SklearnClassifier(
        GaussianProcessClassifier(random_state=0),
        classes=np.unique(y_true),
        random_state=0
    )
    qs = UncertaintySampling(method='entropy')

    # Execute active learning cycle: in each iteration, query the index of
    # the next sample to label and reveal its true label.
    n_cycles = 20
    for _ in range(n_cycles):
        query_idx = qs.query(X=X, y=y, clf=clf)
        y[query_idx] = y_true[query_idx]

    # Fit final classifier.
    clf.fit(X, y)

As a result, we obtain an actively trained Gaussian process classifier.
A corresponding visualization of its decision boundary (black line) and the
sample utilities (greenish contours) is given below.

.. image:: https://raw.githubusercontent.com/scikit-activeml/scikit-activeml/master/docs/logos/pal-example-output.png
   :width: 400
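
To make the notion of a sample's utility more concrete, the following minimal
sketch computes the entropy of predicted class probabilities and selects the
most uncertain unlabeled sample. It uses only NumPy and is an illustration
under stated assumptions, not the implementation behind
``UncertaintySampling``: the helper ``entropy_utilities`` and the probability
values are hypothetical, and missing labels are assumed to be encoded as
``np.nan``.

.. code-block:: python

    import numpy as np


    def entropy_utilities(probas):
        """Return the Shannon entropy of each row of class probabilities."""
        probas = np.asarray(probas, dtype=float)
        # Clip to avoid log(0); zero-probability classes contribute zero entropy.
        clipped = np.clip(probas, 1e-12, 1.0)
        return -np.sum(probas * np.log(clipped), axis=1)


    # Hypothetical class probabilities for four samples and two classes.
    probas = np.array([[0.9, 0.1], [0.5, 0.5], [0.7, 0.3], [0.55, 0.45]])
    # Assumption: missing labels are encoded as NaN.
    y = np.array([0.0, np.nan, np.nan, 1.0])

    utilities = entropy_utilities(probas)
    # Only unlabeled samples are candidates for a query.
    utilities[~np.isnan(y)] = -np.inf
    query_idx = int(np.argmax(utilities))

Here the second sample is selected, because its predicted probabilities are
closest to uniform among the unlabeled samples.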

Stream-based Active Learning
############################

The following code snippet implements an active learning cycle with 200 data points and
the default budget of 10% using a Parzen window classifier (PWC) and split uncertainty sampling.
As in the pool-based example, you can wrap classifiers from ``sklearn``, use
``sklearn``-compatible classifiers, or, as in this example, use classifiers provided by ``skactiveml``.

.. code-block:: python

    import numpy as np
    from sklearn.datasets import make_blobs
    from skactiveml.classifier import ParzenWindowClassifier
    from skactiveml.stream import Split
    from skactiveml.utils import MISSING_LABEL

    # Generate data set.
    X, y_true = make_blobs(n_samples=200, centers=4, random_state=0)

    # Create classifier and query strategy.
    clf = ParzenWindowClassifier(random_state=0, classes=np.unique(y_true))
    qs = Split(random_state=0)

    # Initialize the training data as empty lists.
    X_train = []
    y_train = []

    # Initialize the list that stores the result of the classifier's prediction.
    correct_classifications = []

    # Execute active learning cycle.
    for x_t, y_t in zip(X, y_true):
        X_cand = x_t.reshape([1, -1])
        y_cand = y_t
        clf.fit(X_train, y_train)
        correct_classifications.append(clf.predict(X_cand)[0] == y_cand)
        sampled_indices = qs.query(candidates=X_cand, clf=clf)
        qs.update(candidates=X_cand, queried_indices=sampled_indices)
        X_train.append(x_t)
        y_train.append(y_cand if len(sampled_indices) > 0 else MISSING_LABEL)

As a result, we obtain an actively trained Parzen window classifier.
A corresponding visualization of its accuracy curve across the active learning
cycle is given below.

.. image:: https://raw.githubusercontent.com/scikit-activeml/scikit-activeml/master/docs/logos/stream-example-output.png
   :width: 400
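
An accuracy curve of this kind can be derived from the
``correct_classifications`` list, for example by averaging over a sliding
window. The sketch below is one possible way to do this and is not taken from
the package: the helper ``sliding_accuracy``, the window size, and the short
input list are all assumptions made for illustration.

.. code-block:: python

    from collections import deque


    def sliding_accuracy(correct, window=3):
        """Return the mean of the last ``window`` outcomes after each step."""
        recent = deque(maxlen=window)
        curve = []
        for outcome in correct:
            recent.append(bool(outcome))
            curve.append(sum(recent) / len(recent))
        return curve


    # Hypothetical outcomes; in the example above this list is filled
    # inside the active learning cycle.
    correct_classifications = [False, True, True, True, False]
    curve = sliding_accuracy(correct_classifications, window=3)

A larger window gives a smoother curve but reacts more slowly to changes in
the classifier's performance.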

Query Strategy Overview
#######################

For better orientation, we provide an `overview <https://scikit-activeml.github.io/scikit-activeml-docs/generated/strategy_overview.html>`_
(incl. paper references and `visualizations <https://scikit-activeml.github.io/scikit-activeml-docs/generated/sphinx_gallery_examples/index.html>`_)
of the query strategies implemented by ``skactiveml``.

|Overview| |Visualization|

.. |Overview| image:: https://raw.githubusercontent.com/scikit-activeml/scikit-activeml/master/docs/logos/strategy-overview.gif
   :width: 400
   
.. |Visualization| image:: https://raw.githubusercontent.com/scikit-activeml/scikit-activeml/master/docs/logos/example-overview.gif
   :width: 400

.. examples_end

.. citing_start

Citing
======
If you use ``skactiveml`` in one of your research projects and find it helpful,
please cite the following:

::

    @article{skactiveml2021,
        title={scikit-activeml: {A} {L}ibrary and {T}oolbox for {A}ctive {L}earning {A}lgorithms},
        author={Daniel Kottke and Marek Herde and Tuan Pham Minh and Alexander Benz and Pascal Mergard and Atal Roghman and Christoph Sandrock and Bernhard Sick},
        journal={Preprints},
        doi={10.20944/preprints202103.0194.v1},
        year={2021},
        url={https://github.com/scikit-activeml/scikit-activeml}
    }

.. citing_end



            
