pulearn ⏂
#########

|PyPI-Status| |PyPI-Versions| |Build-Status| |Codecov| |Codefactor| |LICENCE|

Positive-unlabeled learning with Python.

**Website:** `https://pulearn.github.io/pulearn/ <https://pulearn.github.io/pulearn/>`_

**Documentation:** `https://pulearn.github.io/pulearn/doc/pulearn/ <https://pulearn.github.io/pulearn/doc/pulearn/>`_

.. code-block:: python

    from pulearn import ElkanotoPuClassifier
    from sklearn.svm import SVC
    svc = SVC(C=10, kernel='rbf', gamma=0.4, probability=True)
    pu_estimator = ElkanotoPuClassifier(estimator=svc, hold_out_ratio=0.2)
    pu_estimator.fit(X, y)

.. contents::

.. section-numbering::

Documentation
=============
This is the repository for the ``pulearn`` package. This README is aimed at helping contributors to the project.

To learn more about how to use ``pulearn``, either `visit pulearn's homepage <https://pulearn.github.io/pulearn/>`_ or `read the documentation <https://pulearn.github.io/pulearn/doc/pulearn/>`_.

Installation
============
Install ``pulearn`` with:

.. code-block:: bash

    pip install pulearn

Implemented Classifiers
=======================
Elkanoto
--------
``pulearn`` provides scikit-learn wrappers for both methods described in the paper by Elkan and Noto, `"Learning classifiers from only positive and unlabeled data" <https://cseweb.ucsd.edu/~elkan/posonly.pdf>`_ (published in the Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, ACM, 2008).

These wrap the Python code from `a fork by AdityaAS <https://github.com/AdityaAS/pu-learning>`_ (which implements both methods) of the `original repository <https://github.com/aldro61/pu-learning>`_ by `Alexandre Drouin <https://github.com/aldro61>`_ (which implements only one of them).

Unlabeled examples are expected to be indicated by ``-1``, positives by ``1``.

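
If your data comes with ordinary 0/1 labels, for example when benchmarking on a fully labeled dataset, it first has to be converted to this convention. The following is a minimal NumPy sketch, assuming ``1`` marks a positive and ``0`` a negative in the original labels; ``make_pu_labels`` is a hypothetical helper, not part of ``pulearn``. It hides a random fraction of the positives and marks everything else as unlabeled:

.. code-block:: python

    import numpy as np


    # Hypothetical helper (not part of pulearn): build PU labels from 0/1 labels.
    def make_pu_labels(y, hidden_fraction, seed=0):
        """Return labels with 1 for labeled positives and -1 for unlabeled rows.

        A random ``hidden_fraction`` of the true positives is relabeled as
        unlabeled; every true negative becomes unlabeled as well.
        """
        y = np.asarray(y)
        rng = np.random.default_rng(seed)
        pu = np.full(y.shape, -1, dtype=int)        # start with everything unlabeled
        pos_idx = np.flatnonzero(y == 1)            # indices of the true positives
        n_hidden = int(round(hidden_fraction * len(pos_idx)))
        hidden = rng.choice(pos_idx, size=n_hidden, replace=False)
        pu[pos_idx] = 1                             # label all positives...
        pu[hidden] = -1                             # ...then hide the chosen ones
        return pu

The result can then be passed as the ``y`` argument of ``fit``, e.g. ``pu_estimator.fit(X, make_pu_labels(y, 0.5))``.
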
Classic Elkanoto
~~~~~~~~~~~~~~~~
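To use the classic (unweighted) method, use the ``ElkanotoPuClassifier`` class:

.. code-block:: python

    from pulearn import ElkanotoPuClassifier
    from sklearn.svm import SVC

    # X: feature matrix; y: labels, 1 for positive and -1 for unlabeled examples.
    # probability=True lets the SVC output the probability estimates the method relies on.
    svc = SVC(C=10, kernel='rbf', gamma=0.4, probability=True)
    pu_estimator = ElkanotoPuClassifier(estimator=svc, hold_out_ratio=0.2)
    pu_estimator.fit(X, y)
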
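
In the paper, the classic method works roughly as follows: a classifier ``g`` is trained to separate labeled from unlabeled examples, the label frequency ``c = P(s=1 | y=1)`` is estimated as the mean of ``g`` over held-out labeled positives, and ``g(x) / c`` is then used as ``P(y=1 | x)``. The NumPy sketch below shows only this arithmetic, assuming you already have the scores of ``g`` from some probabilistic classifier; ``estimate_c`` and ``elkanoto_proba`` are hypothetical names and not ``pulearn``'s internal code.

.. code-block:: python

    import numpy as np


    # Hypothetical helpers (not pulearn's API) illustrating the Elkan-Noto correction.
    def estimate_c(held_out_positive_scores):
        """Estimate c = P(s=1 | y=1) as the mean score on held-out labeled positives."""
        return float(np.mean(held_out_positive_scores))


    def elkanoto_proba(scores, c):
        """Turn g(x) = P(s=1 | x) into P(y=1 | x) = g(x) / c, clipped to [0, 1]."""
        return np.clip(np.asarray(scores, dtype=float) / c, 0.0, 1.0)

For example, held-out positive scores of ``[0.4, 0.5, 0.6]`` give ``c = 0.5``, so a raw score of ``0.25`` becomes a corrected probability of ``0.5``. Clipping is a practical safeguard, since ``g(x) / c`` can exceed 1 when ``c`` is underestimated.
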
Weighted Elkanoto
~~~~~~~~~~~~~~~~~
To use the weighted method, use the ``WeightedElkanotoPuClassifier`` class:

.. code-block:: python

    from pulearn import WeightedElkanotoPuClassifier
    from sklearn.svm import SVC

    # X: feature matrix; y: labels, 1 for positive and -1 for unlabeled examples
    svc = SVC(C=10, kernel='rbf', gamma=0.4, probability=True)
    pu_estimator = WeightedElkanotoPuClassifier(
        estimator=svc, labeled=10, unlabeled=20, hold_out_ratio=0.2)
    pu_estimator.fit(X, y)

See the original paper for details on how the ``labeled`` and ``unlabeled`` quantities are used to weight training examples and affect the learning process: `https://cseweb.ucsd.edu/~elkan/posonly.pdf <https://cseweb.ucsd.edu/~elkan/posonly.pdf>`_.

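
In the paper's weighted variant, each unlabeled example is used twice: once as a positive with weight ``w(x) = P(y=1 | x, s=0) = (1 - c) / c * g(x) / (1 - g(x))`` and once as a negative with weight ``1 - w(x)``, where ``g(x) = P(s=1 | x)`` and ``c`` is estimated as in the classic method. The sketch below computes only these weights with NumPy; ``unlabeled_weights`` is a hypothetical name, and how ``pulearn`` combines them with ``labeled`` and ``unlabeled`` is not shown here.

.. code-block:: python

    import numpy as np


    # Hypothetical helper (not pulearn's API): Elkan-Noto weights for unlabeled rows.
    def unlabeled_weights(scores, c, eps=1e-12):
        """Return (w_pos, w_neg) for each unlabeled example.

        scores: g(x) = P(s=1 | x) for each unlabeled example.
        c:      estimate of P(s=1 | y=1).
        """
        g = np.clip(np.asarray(scores, dtype=float), 0.0, 1.0 - eps)  # avoid dividing by zero
        w_pos = (1.0 - c) / c * g / (1.0 - g)   # P(y=1 | x, s=0)
        w_pos = np.clip(w_pos, 0.0, 1.0)        # keep it a valid probability
        return w_pos, 1.0 - w_pos               # weights of the positive and negative copies

Clipping ``w_pos`` to ``[0, 1]`` is an assumption added for numerical safety; in theory the formula stays within that range whenever ``g(x) <= c``.
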
Bagging-based PU-learning
-------------------------
This classifier is based on the paper `A bagging SVM to learn from positive and unlabeled examples (2013) <http://members.cbio.mines-paristech.fr/~jvert/svn/bibli/local/Mordelet2013bagging.pdf>`_ by Mordelet and Vert. The implementation is by `Roy Wright <https://roywrightme.wordpress.com/>`__ (`roywright <https://github.com/roywright/>`_ on GitHub), and can be found in `his repository <https://github.com/roywright/pu_learning>`_.

Unlabeled examples are expected to be indicated by a number smaller than ``1``, positives by ``1``.

.. code-block:: python

    from pulearn import BaggingPuClassifier
    from sklearn.svm import SVC

    # X: feature matrix; y: labels, 1 for positive and any value below 1 for unlabeled
    svc = SVC(C=10, kernel='rbf', gamma=0.4, probability=True)
    pu_estimator = BaggingPuClassifier(estimator=svc, n_estimators=15)
    pu_estimator.fit(X, y)

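
As a rough illustration of the Mordelet and Vert procedure (not ``pulearn``'s implementation), the sketch below repeatedly draws a bootstrap sample of unlabeled rows, treats it as negatives against all labeled positives, scores the unlabeled rows left out of that sample, and averages those out-of-bag scores. To stay dependency-light it swaps the SVC for a toy nearest-centroid scorer; both function names are hypothetical.

.. code-block:: python

    import numpy as np


    # Hypothetical helpers (not pulearn's API); the centroid scorer stands in for an SVC.
    def centroid_scores(X_pos, X_neg, X_eval):
        """Toy base classifier: a higher score means closer to the positive centroid."""
        mu_pos, mu_neg = X_pos.mean(axis=0), X_neg.mean(axis=0)
        d_pos = ((X_eval - mu_pos) ** 2).sum(axis=1)
        d_neg = ((X_eval - mu_neg) ** 2).sum(axis=1)
        return d_neg - d_pos


    def bagging_pu_scores(X, y, n_estimators=15, sample_size=None, seed=0):
        """Average out-of-bag scores of unlabeled rows; y is 1 for positives, < 1 otherwise."""
        rng = np.random.default_rng(seed)
        pos_idx = np.flatnonzero(y == 1)
        unl_idx = np.flatnonzero(y < 1)
        k = sample_size or len(pos_idx)              # bootstrap size, defaults to |P|
        score_sum = np.zeros(len(X))
        n_oob = np.zeros(len(X))
        for _ in range(n_estimators):
            sample = rng.choice(unl_idx, size=k, replace=True)  # negatives for this round
            oob = np.setdiff1d(unl_idx, sample)                  # unlabeled rows not drawn
            score_sum[oob] += centroid_scores(X[pos_idx], X[sample], X[oob])
            n_oob[oob] += 1
        scores = np.full(len(X), np.nan)             # positives and never-OOB rows stay NaN
        seen = n_oob > 0
        scores[seen] = score_sum[seen] / n_oob[seen]
        return scores

Averaging only out-of-bag scores means each unlabeled row is scored by classifiers that did not use it as a negative, which is the key idea of the bagging approach.
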
Examples
========
An example of the classic Elkan-Noto classifier applied to the `Wisconsin breast cancer dataset <https://archive.ics.uci.edu/ml/datasets/Breast+Cancer+Wisconsin+(Diagnostic)>`_, comparing it to a regular random forest classifier, can be found in the ``examples`` directory.

To run it, clone the repository and run the following command from its root, in a Python environment where ``pulearn`` is installed:

.. code-block:: bash

    python examples/BreastCancerElkanotoExample.py

You should see a plot like the one below, comparing the F1 score of the PU learner with that of a naive learner. It shows that PU learning becomes more effective, or worthwhile, the more positive examples are "hidden" from the training set.

.. image:: https://raw.githubusercontent.com/pulearn/pulearn/master/pulearn_breast_cancer_f1_scores.png

Contributing
============
The package author and current maintainer is Shay Palachy (shay.palachy@gmail.com); you are more than welcome to approach him for help. Contributions are very welcome, especially since this package is very much in its infancy and many other PU learning methods could be added.

Installing for development
--------------------------
Clone:

.. code-block:: bash

    git clone git@github.com:pulearn/pulearn.git

Install in development mode with test dependencies:

.. code-block:: bash

    cd pulearn
    pip install -e ".[test]"

Running the tests
-----------------
To run the tests, use:

.. code-block:: bash

    python -m pytest

Note that ``pytest`` runs are configured by the ``pytest.ini`` file; read it to understand the exact ``pytest`` arguments used.

Adding tests
------------
At the time of writing, ``pulearn`` is maintained with a test coverage of 100%. Although this is challenging, I hope to maintain this status. If you add code to the package, please make sure you test it thoroughly. Codecov automatically reports changes in coverage on each pull request, so pull requests that reduce test coverage will not be reviewed until that is fixed.

Tests reside under the ``tests`` directory in the root of the repository. Each model has a separate test folder, and each class (usually a pipeline stage) has a dedicated file (always starting with the string "test") containing several tests (each a global function starting with the string "test"). Please adhere to this structure, and try to separate test cases into different test functions; this allows us to quickly focus on problem areas and use cases. Thank you! :)

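
For illustration only, a test module following this layout might look like the sketch below. The path ``tests/elkanoto/test_elkanoto_example.py`` and all function names are hypothetical; the second test assumes only the ``ElkanotoPuClassifier`` constructor and ``fit`` call shown earlier, plus the standard scikit-learn ``predict`` method.

.. code-block:: python

    # Hypothetical file: tests/elkanoto/test_elkanoto_example.py
    import numpy as np


    def _toy_pu_data(seed=0):
        """Two well-separated blobs; half of the 40 positives are hidden as unlabeled (-1)."""
        rng = np.random.default_rng(seed)
        X = np.vstack([rng.normal(2.0, 0.5, size=(40, 2)),
                       rng.normal(-2.0, 0.5, size=(40, 2))])
        y = np.array([1] * 20 + [-1] * 20 + [-1] * 40)
        return X, y


    def test_toy_data_uses_pu_labels():
        _, y = _toy_pu_data()
        assert set(y.tolist()) == {-1, 1}


    def test_elkanoto_fit_predict_shape():
        # Imported here so collecting this module does not require pulearn or scikit-learn.
        from pulearn import ElkanotoPuClassifier
        from sklearn.svm import SVC

        X, y = _toy_pu_data()
        svc = SVC(C=10, kernel='rbf', gamma=0.4, probability=True)
        pu_estimator = ElkanotoPuClassifier(estimator=svc, hold_out_ratio=0.2)
        pu_estimator.fit(X, y)
        assert len(pu_estimator.predict(X)) == len(y)

Keeping each behaviour in its own ``test_*`` function means a failure points directly at the broken case.
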
Code style
----------
``pulearn`` code is written to adhere to the coding style dictated by `flake8 <http://flake8.pycqa.org/en/latest/>`_. In practice, this means that one of the jobs that runs on `the project's Travis <https://travis-ci.org/pulearn/pulearn>`_ for each commit and pull request checks for a successful run of the ``flake8`` CLI command in the repository's root. As a result, pull requests that add non-flake8-compliant code will be flagged red by the Travis bot.

To avoid this, please run ``flake8`` on your code (whether through your text editor/IDE or from the command line) and fix all resulting errors. Thank you! :)

Adding documentation
--------------------
This project is documented using the `numpy docstring conventions`_, which were chosen as perhaps the most widespread conventions that are both supported by common tools such as Sphinx and result in human-readable docstrings (in my personal opinion, of course). When documenting code you add to this project, please follow `these conventions`_.

.. _`numpy docstring conventions`: https://numpydoc.readthedocs.io/en/latest/format.html#docstring-standard
.. _`these conventions`: https://numpydoc.readthedocs.io/en/latest/format.html#docstring-standard

Additionally, if you update this ``README.rst`` file, use ``python setup.py checkdocs`` to validate that it compiles.

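
As an illustration of the numpy docstring format, here is a hypothetical helper (not part of ``pulearn``) documented with ``Parameters``, ``Returns`` and ``Examples`` sections:

.. code-block:: python

    import numpy as np


    # Hypothetical helper, shown only to illustrate the docstring layout.
    def labeled_fraction(y):
        """Compute the fraction of examples that carry a positive label.

        Parameters
        ----------
        y : array-like of shape (n_samples,)
            PU labels, with ``1`` for labeled positives and ``-1`` for
            unlabeled examples.

        Returns
        -------
        float
            The share of entries in ``y`` that are equal to ``1``.

        Examples
        --------
        >>> labeled_fraction([1, -1, -1, 1])
        0.5
        """
        return float(np.mean(np.asarray(y) == 1))
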
License
=======
This package is released as open-source software under the `BSD 3-clause license <https://opensource.org/licenses/BSD-3-Clause>`_. See ``LICENSE_NOTICE.md`` for the different copyright holders of different parts of the code.

Credits
=======
Implementation code by:

* Elkan & Noto - Alexandre Drouin and `AdityaAS <https://github.com/AdityaAS>`_.
* Bagging PU Classifier - `Roy Wright <https://github.com/roywright/>`_.

Packaging, testing and documentation by `Shay Palachy <http://www.shaypalachy.com/>`_.

Fixes and feature contributions by:

* `kjappelbaum <https://github.com/kjappelbaum>`_
* `mepland <https://github.com/mepland>`_
* `TEGELB <https://github.com/TEGELB>`_

.. alternative:
.. https://badge.fury.io/py/yellowbrick.svg

.. |PyPI-Status| image:: https://img.shields.io/pypi/v/pulearn.svg
   :target: https://pypi.org/project/pulearn

.. |PyPI-Versions| image:: https://img.shields.io/pypi/pyversions/pulearn.svg
   :target: https://pypi.org/project/pulearn

.. |Build-Status| image:: https://github.com/pulearn/pulearn/actions/workflows/ci-test.yml/badge.svg
   :target: https://github.com/pulearn/pulearn/actions/workflows/ci-test.yml

.. |LICENCE| image:: https://img.shields.io/badge/License-BSD%203--clause-ff69b4.svg
   :target: https://pypi.python.org/pypi/pulearn

.. .. |LICENCE| image:: https://github.com/pulearn/pulearn/blob/master/mit_license_badge.svg
   :target: https://pypi.python.org/pypi/pulearn

.. https://img.shields.io/pypi/l/pulearn.svg

.. |Codecov| image:: https://codecov.io/github/pulearn/pulearn/coverage.svg?branch=master
   :target: https://codecov.io/github/pulearn/pulearn?branch=master

.. |Codacy| image:: https://api.codacy.com/project/badge/Grade/7d605e063f114ecdb5569266bd0226cd
   :alt: Codacy Badge
   :target: https://app.codacy.com/app/pulearn/pulearn?utm_source=github.com&utm_medium=referral&utm_content=pulearn/pulearn&utm_campaign=Badge_Grade_Dashboard

.. |Requirements| image:: https://requires.io/github/pulearn/pulearn/requirements.svg?branch=master
   :target: https://requires.io/github/pulearn/pulearn/requirements/?branch=master
   :alt: Requirements Status

.. |Downloads| image:: https://pepy.tech/badge/pulearn
   :target: https://pepy.tech/project/pulearn
   :alt: PePy stats

.. |Codefactor| image:: https://www.codefactor.io/repository/github/pulearn/pulearn/badge?style=plastic
   :target: https://www.codefactor.io/repository/github/pulearn/pulearn
   :alt: Codefactor code quality