biquality-learn


- Name: biquality-learn
- Version: 0.0.2
- Home page: https://biquality-learn.readthedocs.io/en/stable/
- Source code: https://github.com/biquality-learn/biquality-learn
- Summary: biquality-learn is a library à la scikit-learn for Biquality Learning.
- Upload time: 2023-08-22 09:24:41
- Author: Pierre Nodet
- License: BSD 3-Clause
- Keywords: biquality-learn
- Requirements: numpy, scipy, scikit-learn, scs

# biquality-learn

[![main](https://github.com/biquality-learn/biquality-learn/actions/workflows/main.yml/badge.svg)](https://github.com/biquality-learn/biquality-learn/actions/workflows/main.yml)
[![codecov](https://codecov.io/gh/biquality-learn/biquality-learn/branch/main/graph/badge.svg)](https://codecov.io/gh/biquality-learn/biquality-learn)
[![versions](https://img.shields.io/badge/python-3.8%20|%203.9%20|%203.10-blue)](https://img.shields.io/badge/python-3.8%20|%203.9%20|%203.10-blue)
[![pypi](https://img.shields.io/pypi/v/biquality-learn?color=blue)](https://pypi.org/project/biquality-learn/)

**biquality-learn** (or bqlearn for short) is a library à la [scikit-learn](https://github.com/scikit-learn/scikit-learn) for Biquality Learning.

## Biquality Learning

Biquality Learning is a machine learning framework for training classifiers on Biquality Data, which is composed of a trusted dataset and an untrusted dataset:

* The ***trusted*** dataset contains trustworthy samples with clean labels and a proper feature distribution.
* The ***untrusted*** dataset contains samples that may be corrupted by label noise or by covariate shift (distribution shift).
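
As in the Quick Start, biquality data can be handled as one dataset in which a per-sample `sample_quality` flag marks trusted samples with 1 and untrusted samples with 0. The NumPy sketch below shows that layout; `stack_biquality` is a hypothetical helper written for illustration, not part of bqlearn.

```python
import numpy as np


def stack_biquality(X_trusted, y_trusted, X_untrusted, y_untrusted):
    """Merge a trusted and an untrusted dataset into one biquality dataset.

    Hypothetical helper (not a bqlearn function): returns the stacked
    features, the stacked labels and a sample_quality vector that is
    1 for trusted samples and 0 for untrusted ones.
    """
    X = np.vstack([X_trusted, X_untrusted])
    y = np.concatenate([y_trusted, y_untrusted])
    sample_quality = np.concatenate(
        [np.ones(len(y_trusted)), np.zeros(len(y_untrusted))]
    )
    return X, y, sample_quality


# Two trusted samples and three untrusted samples with 2 features each
X_t = np.array([[0.0, 1.0], [1.0, 0.0]])
y_t = np.array([0, 1])
X_u = np.array([[0.5, 0.5], [0.9, 0.1], [0.1, 0.9]])
y_u = np.array([1, 1, 0])

X, y, sample_quality = stack_biquality(X_t, y_t, X_u, y_u)
```

The resulting arrays have the shape expected by the Quick Start's `fit(X, y, sample_quality=sample_quality)` call.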

**biquality-learn** aims to make well-known, proven biquality learning algorithms *accessible* to everyone and to help researchers run *reproducible* experiments on biquality data.

## Install

biquality-learn requires the following dependencies:

- numpy>=1.17.3
- scipy>=1.5.0
- scikit-learn>=1.2.0
- scs>=3.2.2

The package is available on [PyPI](https://pypi.org). To install biquality-learn, run the following command:

```bash
pip install biquality-learn
```

A development version is available on [TestPyPI](https://test.pypi.org):

```bash
pip install --index-url https://test.pypi.org/simple/ --extra-index-url https://pypi.org/simple/ biquality-learn
```

## Quick Start

For a quick example, we train one of the available biquality classifiers, ``KKMM``, on the ``digits`` dataset with synthetic asymmetric label noise.

### Loading Data

First, we load the dataset with **scikit-learn** and split it into a ***trusted*** dataset (10% of the samples) and an ***untrusted*** dataset.

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import StratifiedShuffleSplit

X, y = load_digits(return_X_y=True)

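# Keep 10% of the samples (stratified by class) as the trusted dataset;
# the remaining 90% form the untrusted dataset.
splitter = StratifiedShuffleSplit(n_splits=1, train_size=0.1, random_state=0)
trusted, untrusted = next(splitter.split(X, y))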
```

### Simulating Label Noise

Next, we corrupt the labels of the untrusted samples with synthetic label noise.

```python
from bqlearn.corruption import make_label_noise

y[untrusted] = make_label_noise(y[untrusted], "flip", noise_ratio=0.8)
```
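
The ``"flip"`` noise above is described as asymmetric. Assuming it follows the common pair-flip scheme, where each label k is replaced by (k + 1) mod K with probability `noise_ratio`, the NumPy sketch below shows what such a corruption does. `pair_flip_matrix` and `pair_flip_noise` are hypothetical helpers, not bqlearn functions, and bqlearn's exact definition of ``"flip"`` may differ. Labels are assumed to be integers 0..K-1, as in ``digits``.

```python
import numpy as np


def pair_flip_matrix(n_classes, noise_ratio):
    """Noise transition matrix T with T[i, j] = P(noisy label j | clean label i)."""
    T = (1.0 - noise_ratio) * np.eye(n_classes)
    # Each class k sends noise_ratio of its mass to class (k + 1) mod K,
    # so T is not symmetric: the noise is asymmetric.
    T += noise_ratio * np.roll(np.eye(n_classes), 1, axis=1)
    return T


def pair_flip_noise(y, noise_ratio, n_classes=None, random_state=None):
    """Replace each label k by (k + 1) mod K with probability noise_ratio."""
    rng = np.random.default_rng(random_state)
    y = np.asarray(y)
    if n_classes is None:
        n_classes = int(y.max()) + 1
    flip = rng.random(y.shape[0]) < noise_ratio  # which samples get corrupted
    y_noisy = y.copy()
    y_noisy[flip] = (y[flip] + 1) % n_classes
    return y_noisy


y_clean = np.repeat(np.arange(10), 100)  # 10 classes, 100 samples each
y_noisy = pair_flip_noise(y_clean, noise_ratio=0.8, random_state=0)
T = pair_flip_matrix(10, 0.8)
```

Under this assumed scheme, `noise_ratio=0.8` leaves most untrusted samples of class k labelled k + 1, so the untrusted labels alone point to the wrong class; this is why the small trusted set matters in the example.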

### Training Biquality Classifier

Finally, we train ``KKMM`` on the biquality dataset, passing the ``sample_quality`` metadata that marks each sample as trusted (1) or untrusted (0).

```python
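import numpy as np
from sklearn.linear_model import LogisticRegression
from bqlearn.density_ratio import KKMM

# KKMM wraps a scikit-learn classifier, here a logistic regression
bqclf = KKMM(LogisticRegression(max_iter=1000), kernel="rbf")

# 1 marks a trusted sample, 0 an untrusted one
sample_quality = np.ones(X.shape[0])
sample_quality[untrusted] = 0

bqclf.fit(X, y, sample_quality=sample_quality)
bqclf.predict(X)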
```
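
To give an intuition for what a method like ``KKMM`` computes, the sketch below implements plain Kernel Mean Matching (KMM) in NumPy and runs it separately within each class. Our reading, which is an assumption and not taken from the bqlearn documentation, is that KKMM reweights the untrusted samples of each class k so that their kernel mean matches that of the trusted samples of class k; untrusted samples far from the trusted ones of their (possibly noisy) class then get small weights. This is not bqlearn's implementation: it solves the simplified KMM quadratic program `min 0.5 * beta^T K beta - kappa^T beta` subject to `0 <= beta <= B` by projected gradient descent instead of a convex solver such as scs, it omits the usual constraint on the sum of the weights, and `rbf_kernel`, `kmm_weights`, `per_class_kmm_weights`, `gamma`, `B` and `n_iter` are hypothetical, illustrative names.

```python
import numpy as np


def rbf_kernel(X1, X2, gamma=1.0):
    """Gaussian (RBF) kernel matrix with entries exp(-gamma * ||a - b||^2)."""
    sq_dists = (
        np.sum(X1**2, axis=1)[:, None]
        + np.sum(X2**2, axis=1)[None, :]
        - 2.0 * X1 @ X2.T
    )
    return np.exp(-gamma * np.maximum(sq_dists, 0.0))


def kmm_weights(X_untrusted, X_trusted, gamma=1.0, B=1000.0, n_iter=1000):
    """Simplified KMM weights for the untrusted samples (sketch, no sum constraint)."""
    n_u, n_t = len(X_untrusted), len(X_trusted)
    K = rbf_kernel(X_untrusted, X_untrusted, gamma)
    kappa = (n_u / n_t) * rbf_kernel(X_untrusted, X_trusted, gamma).sum(axis=1)
    step = 1.0 / np.linalg.eigvalsh(K)[-1]  # 1 / largest eigenvalue of K
    beta = np.ones(n_u)
    for _ in range(n_iter):
        # Gradient step on the quadratic objective, then projection onto [0, B]
        beta = np.clip(beta - step * (K @ beta - kappa), 0.0, B)
    return beta


def per_class_kmm_weights(X_u, y_u, X_t, y_t, **kwargs):
    """Run KMM separately within each class of the untrusted labels."""
    weights = np.zeros(len(y_u))
    for k in np.unique(y_u):
        u, t = (y_u == k), (y_t == k)
        if t.any():  # classes with no trusted sample keep weight 0 (arbitrary choice)
            weights[u] = kmm_weights(X_u[u], X_t[t], **kwargs)
    return weights


# Toy example: two untrusted points lie near the trusted ones, two lie far away
X_u = np.array([[0.0], [0.1], [5.0], [5.1]])
y_u = np.array([0, 0, 0, 0])
X_t = np.array([[0.0], [0.05]])
y_t = np.array([0, 0])
beta = per_class_kmm_weights(X_u, y_u, X_t, y_t)
```

In this toy run, the two untrusted points close to the trusted data receive clearly positive weights while the two distant points are driven to zero. Such weights could then be passed as `sample_weight` to a scikit-learn estimator, with weight 1 for the trusted samples; how bqlearn combines them internally is not described here.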

## Citation

If you use **biquality-learn** in your research, please consider citing us:

```
@misc{todo,
      title={biquality-learn: a Python library for Biquality Learning}, 
      author={Pierre Nodet and Vincent Lemaire and Alexis Bondu and Antoine Cornuéjols},
      year={2023},
      eprint={todo},
      archivePrefix={arXiv},
      primaryClass={cs.LG}
}
```

## Acknowledgment

This work has been funded by Orange Labs.

[<img src="https://c.woopic.com/logo-orange.png" alt="Orange Logo" width="100"/>](https://orange.com)



            
