# biquality-learn
[![main](https://github.com/biquality-learn/biquality-learn/actions/workflows/main.yml/badge.svg)](https://github.com/biquality-learn/biquality-learn/actions/workflows/main.yml)
[![codecov](https://codecov.io/gh/biquality-learn/biquality-learn/branch/main/graph/badge.svg)](https://codecov.io/gh/biquality-learn/biquality-learn)
[![versions](https://img.shields.io/badge/python-3.8%20|%203.9%20|%203.10-blue)](https://img.shields.io/badge/python-3.8%20|%203.9%20|%203.10-blue)
[![pypi](https://img.shields.io/pypi/v/biquality-learn?color=blue)](https://pypi.org/project/biquality-learn/)
**biquality-learn** (or bqlearn for short) is a library à la [scikit-learn](https://github.com/scikit-learn/scikit-learn) for Biquality Learning.
## Biquality Learning
Biquality Learning is a machine learning framework to train classifiers on Biquality Data, composed of an untrusted and a trusted dataset:
* The ***trusted*** dataset contains trustworthy samples with clean labels and proper feature distribution.
* The ***untrusted*** dataset contains samples potentially corrupted by label noise or covariate shift (distribution shift).
**biquality-learn** aims to make well-known and proven biquality learning algorithms *accessible* to everyone and to help researchers experiment in a *reproducible* way on biquality data.
## Install
biquality-learn requires the following dependencies:
- numpy>=1.17.3
- scipy>=1.5.0
- scikit-learn>=1.2.0
- scs>=3.2.2
The package is available on [PyPI](https://pypi.org). To install biquality-learn, run the following command:
```bash
pip install biquality-learn
```
A dev version is available on [TestPyPI](https://test.pypi.org):
```bash
pip install --index-url https://test.pypi.org/simple/ biquality-learn
```
## Quick Start
For a quick example, we are going to train one of the available biquality classifiers, ``KKMM``, on the ``digits`` dataset with synthetic asymmetric label noise.
### Loading Data
First, we load the dataset with **scikit-learn** and split it into a ***trusted*** and an ***untrusted*** dataset. Here, ``train_size=0.1`` assigns 10% of the samples to the trusted dataset.
```python
from sklearn.datasets import load_digits
from sklearn.model_selection import StratifiedShuffleSplit
X, y = load_digits(return_X_y=True)
trusted, untrusted = next(StratifiedShuffleSplit(train_size=0.1).split(X, y))
```
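To make the idea of a stratified split concrete, here is a minimal NumPy sketch that keeps the same fraction of each class in the trusted part. The helper ``stratified_split`` and the toy labels are hypothetical and written for illustration only; they are not part of scikit-learn or biquality-learn, and ``StratifiedShuffleSplit`` may round and shuffle differently.

```python
import numpy as np


def stratified_split(y, train_size, random_state=0):
    """Hypothetical helper: keep about `train_size` of each class as trusted."""
    rng = np.random.default_rng(random_state)
    trusted_parts = []
    for c in np.unique(y):
        # Shuffle the indices of the samples belonging to class c.
        members = rng.permutation(np.flatnonzero(y == c))
        # Keep at least one trusted sample per class.
        n_trusted = max(1, int(round(train_size * members.shape[0])))
        trusted_parts.append(members[:n_trusted])
    trusted = np.sort(np.concatenate(trusted_parts))
    # Every index that is not trusted is untrusted.
    untrusted = np.setdiff1d(np.arange(y.shape[0]), trusted)
    return trusted, untrusted


# Toy labels: 10 classes with 50 samples each (an assumption for this sketch).
y_toy = np.repeat(np.arange(10), 50)
trusted_toy, untrusted_toy = stratified_split(y_toy, train_size=0.1)

# Number of trusted samples per class.
print(np.bincount(y_toy[trusted_toy]))
```

Because the split is done class by class, even a very small trusted dataset contains every class, which matters when the trusted samples are the only reliable source of labels.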
### Simulating Label Noise
Then we generate label noise on the untrusted dataset.
```python
from bqlearn.corruption import make_label_noise
y[untrusted] = make_label_noise(y[untrusted], "flip", noise_ratio=0.8)
```
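As an illustration of what asymmetric label noise can look like, the sketch below flips each label to one fixed other class with probability ``noise_ratio`` and then measures the observed noise rate. The helper ``flip_labels`` is hypothetical: it is not the implementation of ``make_label_noise``, and whether the ``"flip"`` option pairs classes in the same way is an assumption.

```python
import numpy as np


def flip_labels(y, noise_ratio, random_state=0):
    """Hypothetical helper: flip a label to the next class with prob. noise_ratio."""
    rng = np.random.default_rng(random_state)
    classes = np.unique(y)
    # Position of each label in the sorted array of classes.
    idx = np.searchsorted(classes, y)
    # Decide independently for each sample whether its label is corrupted.
    flip = rng.random(y.shape[0]) < noise_ratio
    # A corrupted label moves to the next class, wrapping around at the end.
    noisy_idx = np.where(flip, (idx + 1) % classes.shape[0], idx)
    return classes[noisy_idx]


# Toy clean labels: 10 classes with 100 samples each (an assumption).
y_clean = np.repeat(np.arange(10), 100)
y_noisy = flip_labels(y_clean, noise_ratio=0.8)

# Fraction of labels that differ from the clean ones.
observed_ratio = np.mean(y_noisy != y_clean)
print(f"observed noise ratio: {observed_ratio:.2f}")
```

Keeping a copy of the clean labels, as ``y_clean`` does here, makes it possible to check the noise rate that was actually applied and to evaluate a classifier against uncorrupted labels later on.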
### Training Biquality Classifier
Finally, we train ``KKMM`` on the biquality dataset by providing the ``sample_quality`` metadata, indicating if a sample is trusted or untrusted.
```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from bqlearn.density_ratio import KKMM
bqclf = KKMM(LogisticRegression(), kernel="rbf")
# 1 marks a trusted sample, 0 an untrusted one
sample_quality = np.ones(X.shape[0])
sample_quality[untrusted] = 0
bqclf.fit(X, y, sample_quality=sample_quality)
bqclf.predict(X)
```
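The ``sample_quality`` convention used above (1 for trusted, 0 for untrusted) can be sketched on its own with NumPy. The index arrays below are made-up stand-ins for the output of the split, used only to show how the vector relates to the two subsets.

```python
import numpy as np

n_samples = 10
# Hypothetical index arrays, standing in for the output of the split step.
trusted_idx = np.array([0, 3, 7])
untrusted_idx = np.setdiff1d(np.arange(n_samples), trusted_idx)

# Same convention as in the example above: 1 is trusted, 0 is untrusted.
quality = np.ones(n_samples)
quality[untrusted_idx] = 0

# Boolean mask that recovers the trusted subset from the quality vector.
trusted_mask = quality == 1
print(trusted_mask.sum(), (~trusted_mask).sum())
```

Since the quality vector has one entry per sample, it can be passed alongside ``X`` and ``y`` without reordering or separating the two datasets.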
## Citation
If you use **biquality-learn** in your research, please consider citing us:
```
@misc{todo,
title={biquality-learn: a Python library for Biquality Learning},
author={Pierre Nodet and Vincent Lemaire and Alexis Bondu and Antoine Cornuéjols},
year={2023},
eprint={todo},
archivePrefix={arXiv},
primaryClass={cs.LG}
}
```
## Acknowledgment
This work has been funded by Orange Labs.
[<img src="https://c.woopic.com/logo-orange.png" alt="Orange Logo" width="100"/>](https://orange.com)
Raw data
{
"_id": null,
"home_page": "https://biquality-learn.readthedocs.io/en/stable/",
"name": "biquality-learn",
"maintainer": "",
"docs_url": null,
"requires_python": "",
"maintainer_email": "",
"keywords": "biquality-learn",
"author": "Pierre Nodet",
"author_email": "pierre.nodet@orange.com",
"download_url": "https://files.pythonhosted.org/packages/ae/3b/fc381aa6f8085a2f6eb1108a2e6e5729acb399359d935f0d4783677fd231/biquality-learn-0.0.2.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": "BSD 3-Clause",
"summary": "biquality-learn is a library \u00e0 la scikit-learn for Biquality Learning.",
"version": "0.0.2",
"project_urls": {
"Homepage": "https://biquality-learn.readthedocs.io/en/stable/",
"Source Code": "https://github.com/biquality-learn/biquality-learn"
},
"split_keywords": [
"biquality-learn"
],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "e65f20ff0038cab80572815d865aa718c411247b1de4c9d46e395f76b9b25cfa",
"md5": "93519131a5811b5377c22b78cdafb8f3",
"sha256": "0e94abd858ddfe1fb226441742ee271a23fae2def091d8f505a5063b048ed9e5"
},
"downloads": -1,
"filename": "biquality_learn-0.0.2-py3-none-any.whl",
"has_sig": false,
"md5_digest": "93519131a5811b5377c22b78cdafb8f3",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": null,
"size": 70361,
"upload_time": "2023-08-22T09:24:40",
"upload_time_iso_8601": "2023-08-22T09:24:40.461740Z",
"url": "https://files.pythonhosted.org/packages/e6/5f/20ff0038cab80572815d865aa718c411247b1de4c9d46e395f76b9b25cfa/biquality_learn-0.0.2-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "ae3bfc381aa6f8085a2f6eb1108a2e6e5729acb399359d935f0d4783677fd231",
"md5": "e8f92b76993fbb55b56200bfadee7555",
"sha256": "53a2ebaebd0aa0f1783660db53149f59fa0ab530956fb973c4cffcf35e1b7194"
},
"downloads": -1,
"filename": "biquality-learn-0.0.2.tar.gz",
"has_sig": false,
"md5_digest": "e8f92b76993fbb55b56200bfadee7555",
"packagetype": "sdist",
"python_version": "source",
"requires_python": null,
"size": 75634,
"upload_time": "2023-08-22T09:24:41",
"upload_time_iso_8601": "2023-08-22T09:24:41.750766Z",
"url": "https://files.pythonhosted.org/packages/ae/3b/fc381aa6f8085a2f6eb1108a2e6e5729acb399359d935f0d4783677fd231/biquality-learn-0.0.2.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2023-08-22 09:24:41",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "biquality-learn",
"github_project": "biquality-learn",
"travis_ci": false,
"coveralls": true,
"github_actions": true,
"requirements": [
{
"name": "numpy",
"specs": [
[
">=",
"1.17.3"
]
]
},
{
"name": "scipy",
"specs": [
[
">=",
"1.5.0"
]
]
},
{
"name": "scikit-learn",
"specs": [
[
">=",
"1.2.0"
]
]
},
{
"name": "scs",
"specs": [
[
">=",
"3.2.2"
]
]
}
],
"lcname": "biquality-learn"
}