# QuaPy

* Name: QuaPy
* Version: 0.1.5
* Summary: QuaPy: a framework for Quantification in Python
* Home page: https://github.com/HLT-ISTI/QuaPy
* Maintainer: Alejandro Moreo
* Upload time: 2021-06-01 14:19:21
* Requires Python: >=3.6, <4

QuaPy is an open source framework for Quantification (a.k.a. Supervised Prevalence Estimation)
written in Python.

QuaPy is built around the concept of a data sample, and provides implementations of
the most important concepts in the quantification literature, such as the main
quantification baselines, many advanced quantification methods,
quantification-oriented model selection, and many evaluation measures and protocols
used for evaluating quantification methods.
QuaPy also integrates commonly used datasets and offers visualization tools
that facilitate the analysis and interpretation of results.

### Installation

```commandline
pip install quapy
```

## A quick example:

The following script fetches a Twitter dataset, trains an
_Adjusted Classify & Count_ (ACC) model, and evaluates it in terms of the _Mean Absolute Error_ (MAE)
between the class prevalences estimated for the test set and the true class prevalences
of the test set.

```python
import quapy as qp
from sklearn.linear_model import LogisticRegression

# fetch the 'semeval16' Twitter sentiment dataset
dataset = qp.datasets.fetch_twitter('semeval16')

# create an "Adjusted Classify & Count" quantifier
model = qp.method.aggregative.ACC(LogisticRegression())
model.fit(dataset.training)

estim_prevalences = model.quantify(dataset.test.instances)
true_prevalences  = dataset.test.prevalence()

error = qp.error.mae(true_prevalences, estim_prevalences)

print(f'Mean Absolute Error (MAE)={error:.3f}')
```
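To make the idea behind ACC more concrete, here is a minimal, self-contained sketch of binary Classify & Count (CC) and its adjusted variant, written with NumPy only. It is not QuaPy's implementation: the function names are hypothetical, only the binary case is covered, and the true and false positive rates (`tpr`, `fpr`) are assumed to have been estimated beforehand on held-out labelled data.

```python
import numpy as np


def classify_and_count(predictions: np.ndarray) -> float:
    """Classify & Count (CC): fraction of instances predicted as positive (label 1)."""
    return float(np.mean(predictions == 1))


def adjusted_classify_and_count(predictions: np.ndarray, tpr: float, fpr: float) -> float:
    """Binary ACC (hypothetical helper): correct the CC estimate with the classifier's tpr/fpr.

    Since E[p_cc] = tpr * p + fpr * (1 - p), solving for p gives
    p = (p_cc - fpr) / (tpr - fpr), which is then clipped to [0, 1].
    """
    p_cc = classify_and_count(predictions)
    if tpr == fpr:  # the correction is undefined; fall back to the CC estimate
        return p_cc
    return float(np.clip((p_cc - fpr) / (tpr - fpr), 0.0, 1.0))


def mean_absolute_error(true_prev, estim_prev) -> float:
    """Absolute error between two prevalence vectors, averaged over the classes."""
    return float(np.mean(np.abs(np.asarray(true_prev) - np.asarray(estim_prev))))


# usage with made-up predictions and assumed tpr/fpr values
preds = np.array([1, 1, 1, 0, 0, 0, 0, 0, 0, 0])
p_acc = adjusted_classify_and_count(preds, tpr=0.8, fpr=0.1)
estim = np.array([1 - p_acc, p_acc])
true = np.array([0.75, 0.25])
print(f'ACC estimate={p_acc:.3f}, MAE={mean_absolute_error(true, estim):.3f}')
# ACC estimate=0.286, MAE=0.036
```

Clipping to [0, 1] is needed because, when `tpr` and `fpr` are estimated from a finite validation set, the corrected estimate can fall outside the valid range.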

Quantification is useful in scenarios characterized by prior probability shift. In other
words, we would have little interest in estimating the class prevalences of the test set if
the IID assumption held, since these prevalences would simply coincide with the
class prevalences of the training set. For this reason, any quantification model
should be tested across samples characterized by different class prevalences.
QuaPy implements sampling procedures and evaluation protocols that automate this endeavour.
See the [Wiki](https://github.com/HLT-ISTI/QuaPy/wiki) for detailed examples.
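As an illustration of what such a sampling procedure does, the sketch below draws binary samples at prescribed prevalences from a labelled pool, in the spirit of an artificial-prevalence protocol. It uses NumPy only and is not QuaPy's API: the function names, the evenly spaced prevalence grid, and the choice of sampling with replacement are assumptions made for this example.

```python
import numpy as np


def sample_at_prevalence(labels: np.ndarray, prevalence: float, size: int,
                         rng: np.random.Generator) -> np.ndarray:
    """Return indices of a binary sample of `size` items whose positive rate is `prevalence`.

    Positives (label 1) and negatives (label 0) are drawn with replacement, so the
    requested prevalence can be met even when one class is scarce in the pool.
    """
    n_pos = int(round(prevalence * size))
    pos_idx = np.flatnonzero(labels == 1)
    neg_idx = np.flatnonzero(labels == 0)
    chosen = np.concatenate([
        rng.choice(pos_idx, size=n_pos, replace=True),
        rng.choice(neg_idx, size=size - n_pos, replace=True),
    ])
    rng.shuffle(chosen)
    return chosen


def artificial_prevalence_protocol(labels, grid_points=11, repeats=5, size=100, seed=0):
    """Yield (prevalence, indices) pairs over an evenly spaced grid of prevalences in [0, 1]."""
    rng = np.random.default_rng(seed)
    for prev in np.linspace(0.0, 1.0, grid_points):
        for _ in range(repeats):
            yield prev, sample_at_prevalence(labels, prev, size, rng)


# usage: a hypothetical pool with 30% positives
labels = np.array([0] * 70 + [1] * 30)
for prev, idx in artificial_prevalence_protocol(labels, grid_points=3, repeats=1, size=20):
    print(f'target={prev:.1f} observed={labels[idx].mean():.2f}')
# target=0.0 observed=0.00
# target=0.5 observed=0.50
# target=1.0 observed=1.00
```

Sampling with replacement lets extreme prevalences (such as 0.0 or 1.0) be produced even when a class is rare in the pool; a quantifier would then be applied to each sample and its errors averaged across all of them.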

## Features

* Implementations of the most popular quantification methods (Classify-&-Count variants, Expectation-Maximization,
  SVM-based variants for quantification, HDy, QuaNet, and Ensembles).
* Versatile functionality for performing evaluation based on artificial sampling protocols.
* Implementations of the most commonly used evaluation metrics (e.g., MAE, MRAE, MSE, NKLD).
* Popular datasets for Quantification (textual and numeric) available, including:
    * 32 UCI Machine Learning datasets.
    * 11 Twitter Sentiment datasets.
    * 3 Reviews Sentiment datasets.
* Native support for binary and single-label scenarios of quantification.
* Model selection functionality targeting quantification-oriented losses.
* Visualization tools for analysing results.
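The Expectation-Maximization approach listed above is usually associated with the prior-adjustment procedure of Saerens, Latinne and Decaestecker (2002). A minimal NumPy sketch of that procedure follows. It assumes the classifier outputs reasonably calibrated posteriors and that every training prevalence is strictly positive; it is not QuaPy's own implementation, and the function name and stopping rule are illustrative.

```python
import numpy as np


def em_prior_adjustment(posteriors: np.ndarray, train_prevalence: np.ndarray,
                        tol: float = 1e-6, max_iter: int = 1000) -> np.ndarray:
    """EM re-estimation of test class priors (hypothetical helper).

    posteriors: (n_instances, n_classes) array of posteriors P(y|x) produced by a
        classifier trained under `train_prevalence` (all entries assumed > 0).
    Returns the estimated test prevalence vector.
    """
    train_prevalence = np.asarray(train_prevalence, dtype=float)
    prev = train_prevalence.copy()
    for _ in range(max_iter):
        # E-step: rescale posteriors by the ratio of current to training priors, renormalize rows
        adjusted = posteriors * (prev / train_prevalence)
        adjusted /= adjusted.sum(axis=1, keepdims=True)
        # M-step: the new prior estimate is the mean adjusted posterior
        new_prev = adjusted.mean(axis=0)
        if np.abs(new_prev - prev).max() < tol:
            return new_prev
        prev = new_prev
    return prev


# usage with made-up posteriors for four test instances
posteriors = np.array([[0.9, 0.1], [0.2, 0.8], [0.7, 0.3], [0.4, 0.6]])
print(em_prior_adjustment(posteriors, np.array([0.5, 0.5])))
```

Each iteration keeps the estimate a valid probability vector, since it is an average of row-normalized posteriors; the loop stops when successive estimates differ by less than `tol`.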

## Requirements

* scikit-learn, numpy, scipy
* pytorch (for QuaNet)
* svmperf patched for quantification (see below)
* joblib
* tqdm
* pandas, xlrd
* matplotlib

## SVM-perf with quantification-oriented losses
In order to run experiments involving SVM(Q), SVM(KLD), SVM(NKLD),
SVM(AE), or SVM(RAE), you first have to download the
[svmperf](http://www.cs.cornell.edu/people/tj/svm_light/svm_perf.html)
package, apply the patch
[svm-perf-quantification-ext.patch](./svm-perf-quantification-ext.patch), and compile the sources.
The script [prepare_svmperf.sh](prepare_svmperf.sh) takes care of all these steps. Simply run:

```
./prepare_svmperf.sh
```

The resulting directory [svm_perf_quantification](./svm_perf_quantification) contains the
patched version of _svmperf_ with quantification-oriented losses.

The [svm-perf-quantification-ext.patch](./svm-perf-quantification-ext.patch) is an extension of the patch made available by
[Esuli et al. 2015](https://dl.acm.org/doi/abs/10.1145/2700406?casa_token=8D2fHsGCVn0AAAAA:ZfThYOvrzWxMGfZYlQW_y8Cagg-o_l6X_PcF09mdETQ4Tu7jK98mxFbGSXp9ZSO14JkUIYuDGFG0)
that allows SVMperf to optimize for
the _Q_ measure as proposed by [Barranquero et al. 2015](https://www.sciencedirect.com/science/article/abs/pii/S003132031400291X)
and for the _KLD_ and _NKLD_ as proposed by [Esuli et al. 2015](https://dl.acm.org/doi/abs/10.1145/2700406?casa_token=8D2fHsGCVn0AAAAA:ZfThYOvrzWxMGfZYlQW_y8Cagg-o_l6X_PcF09mdETQ4Tu7jK98mxFbGSXp9ZSO14JkUIYuDGFG0)
for quantification.
This patch extends the former by also allowing SVMperf to optimize for
_AE_ and _RAE_.
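For reference, the measures mentioned above (AE, RAE, KLD, and NKLD) can be sketched in NumPy as follows. These follow definitions commonly used in the quantification literature, with additive smoothing controlled by `eps` so that RAE and KLD stay finite when a prevalence is zero. Choosing ε = 1/(2T), with T the sample size, is an assumption here, and QuaPy's own `qp.error` module may differ in naming and defaults.

```python
import numpy as np


def smooth(prev: np.ndarray, eps: float) -> np.ndarray:
    """Additive smoothing so that no prevalence is zero (avoids division by zero and log(0))."""
    return (prev + eps) / (eps * prev.shape[-1] + 1)


def absolute_error(p, p_hat) -> float:
    p, p_hat = np.asarray(p, dtype=float), np.asarray(p_hat, dtype=float)
    return float(np.mean(np.abs(p_hat - p)))


def relative_absolute_error(p, p_hat, eps: float) -> float:
    p = smooth(np.asarray(p, dtype=float), eps)
    p_hat = smooth(np.asarray(p_hat, dtype=float), eps)
    return float(np.mean(np.abs(p_hat - p) / p))


def kld(p, p_hat, eps: float) -> float:
    """Kullback-Leibler divergence of the estimated prevalences from the true ones."""
    p = smooth(np.asarray(p, dtype=float), eps)
    p_hat = smooth(np.asarray(p_hat, dtype=float), eps)
    return float(np.sum(p * np.log(p / p_hat)))


def nkld(p, p_hat, eps: float) -> float:
    """Normalized KLD: maps KLD from [0, inf) into [0, 1)."""
    e = np.exp(kld(p, p_hat, eps))
    return float(2 * e / (1 + e) - 1)


# usage: made-up prevalences, assuming samples of T = 100 items
true_prev, estim_prev = [0.8, 0.2], [0.6, 0.4]
eps = 1 / (2 * 100)
print(f'AE={absolute_error(true_prev, estim_prev):.3f} '
      f'RAE={relative_absolute_error(true_prev, estim_prev, eps):.3f} '
      f'NKLD={nkld(true_prev, estim_prev, eps):.3f}')
```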

## Wiki

Check out our [Wiki](https://github.com/HLT-ISTI/QuaPy/wiki) in which many examples
are provided:

* [Datasets](https://github.com/HLT-ISTI/QuaPy/wiki/Datasets)
* [Evaluation](https://github.com/HLT-ISTI/QuaPy/wiki/Evaluation)
* [Methods](https://github.com/HLT-ISTI/QuaPy/wiki/Methods)
* [Model Selection](https://github.com/HLT-ISTI/QuaPy/wiki/Model-Selection)
* [Plotting](https://github.com/HLT-ISTI/QuaPy/wiki/Plotting)


### Raw data

```json
{
"_id": null,
"home_page": "https://github.com/HLT-ISTI/QuaPy",
"name": "QuaPy",
"maintainer": "Alejandro Moreo",
"docs_url": null,
"requires_python": ">=3.6, <4",
"maintainer_email": "alejandro.moreo@isti.cnr.it",
"keywords": "machine learning,quantification,classification,prevalence estimation,priors estimate",
"author": "",
"author_email": "",
"platform": "",
"description": "# QuaPy\n\nQuaPy is an open source framework for Quantification (a.k.a. Supervised Prevalence Estimation)\nwritten in Python.\n\nQuaPy roots on the concept of data sample, and provides implementations of\nmost important concepts in quantification literature, such as the most important \nquantification baselines, many advanced quantification methods, \nquantification-oriented model selection, many evaluation measures and protocols\nused for evaluating quantification methods.\nQuaPy also integrates commonly used datasets and offers visualization tools \nfor facilitating the analysis and interpretation of results.\n\n### Installation\n\n```commandline\npip install quapy\n```\n\n## A quick example:\n\nThe following script fetchs a Twitter dataset, trains and evaluates an \n_Adjusted Classify & Count_ model in terms of the _Mean Absolute Error_ (MAE)\nbetween the class prevalences estimated for the test set and the true prevalences\nof the test set.\n\n```python\nimport quapy as qp\nfrom sklearn.linear_model import LogisticRegression\n\ndataset = qp.datasets.fetch_twitter('semeval16')\n\n# create an \"Adjusted Classify & Count\" quantifier\nmodel = qp.method.aggregative.ACC(LogisticRegression())\nmodel.fit(dataset.training)\n\nestim_prevalences = model.quantify(dataset.test.instances)\ntrue_prevalences  = dataset.test.prevalence()\n\nerror = qp.error.mae(true_prevalences, estim_prevalences)\n\nprint(f'Mean Absolute Error (MAE)={error:.3f}')\n```\n\nQuantification is useful in scenarios of prior probability shift. In other\nwords, we would not be interested in estimating the class prevalences of the test set if \nwe could assume the IID assumption to hold, as this prevalence would simply coincide with the \nclass prevalence of the training set. 
For this reason, any Quantification model \nshould be tested across samples characterized by different class prevalences.\nQuaPy implements sampling procedures and evaluation protocols that automates this endeavour.\nSee the [Wiki](https://github.com/HLT-ISTI/QuaPy/wiki) for detailed examples.\n\n## Features\n\n* Implementation of most popular quantification methods (Classify-&-Count variants, Expectation-Maximization,\nSVM-based variants for quantification, HDy, QuaNet, and Ensembles).\n* Versatile functionality for performing evaluation based on artificial sampling protocols.\n* Implementation of most commonly used evaluation metrics (e.g., MAE, MRAE, MSE, NKLD, etc.).\n* Popular datasets for Quantification (textual and numeric) available, including:\n    * 32 UCI Machine Learning datasets.\n    * 11 Twitter Sentiment datasets.\n    * 3 Reviews Sentiment datasets. \n* Native supports for binary and single-label scenarios of quantification.\n* Model selection functionality targeting quantification-oriented losses.\n* Visualization tools for analysing results.\n\n## Requirements\n\n* scikit-learn, numpy, scipy\n* pytorch (for QuaNet)\n* svmperf patched for quantification (see below)\n* joblib\n* tqdm\n* pandas, xlrd\n* matplotlib\n\n## SVM-perf with quantification-oriented losses\nIn order to run experiments involving SVM(Q), SVM(KLD), SVM(NKLD),\nSVM(AE), or SVM(RAE), you have to first download the \n[svmperf](http://www.cs.cornell.edu/people/tj/svm_light/svm_perf.html) \npackage, apply the patch \n[svm-perf-quantification-ext.patch](./svm-perf-quantification-ext.patch), and compile the sources.\nThe script [prepare_svmperf.sh](prepare_svmperf.sh) does all the job. Simply run:\n\n```\n./prepare_svmperf.sh\n```\n\nThe resulting directory [svm_perf_quantification](./svm_perf_quantification) contains the\npatched version of _svmperf_ with quantification-oriented losses. 
\n\nThe [svm-perf-quantification-ext.patch](./svm-perf-quantification-ext.patch) is an extension of the patch made available by\n[Esuli et al. 2015](https://dl.acm.org/doi/abs/10.1145/2700406?casa_token=8D2fHsGCVn0AAAAA:ZfThYOvrzWxMGfZYlQW_y8Cagg-o_l6X_PcF09mdETQ4Tu7jK98mxFbGSXp9ZSO14JkUIYuDGFG0) \nthat allows SVMperf to optimize for\nthe _Q_ measure as proposed by [Barranquero et al. 2015](https://www.sciencedirect.com/science/article/abs/pii/S003132031400291X) \nand for the _KLD_ and _NKLD_ as proposed by [Esuli et al. 2015](https://dl.acm.org/doi/abs/10.1145/2700406?casa_token=8D2fHsGCVn0AAAAA:ZfThYOvrzWxMGfZYlQW_y8Cagg-o_l6X_PcF09mdETQ4Tu7jK98mxFbGSXp9ZSO14JkUIYuDGFG0)\nfor quantification.\nThis patch extends the former by also allowing SVMperf to optimize for \n_AE_ and _RAE_.\n\n\n## Wiki\n\nCheck out our [Wiki](https://github.com/HLT-ISTI/QuaPy/wiki) in which many examples\nare provided:\n\n* [Datasets](https://github.com/HLT-ISTI/QuaPy/wiki/Datasets)\n* [Evaluation](https://github.com/HLT-ISTI/QuaPy/wiki/Evaluation)\n* [Methods](https://github.com/HLT-ISTI/QuaPy/wiki/Methods)\n* [Model Selection](https://github.com/HLT-ISTI/QuaPy/wiki/Model-Selection)\n* [Plotting](https://github.com/HLT-ISTI/QuaPy/wiki/Plotting)\n\n",
"bugtrack_url": null,
"summary": "QuaPy: a framework for Quantification in Python",
"version": "0.1.5",
"split_keywords": [
"machine learning",
"quantification",
"classification",
"prevalence estimation",
"priors estimate"
],
"urls": [
{
"comment_text": "",
"digests": {
"md5": "8d425fd84126a93de0f9baebfdd1d351",
"sha256": "1d6673c0283f549eef69eca8be56a4610b333d324fd72724d9ec32e121103f7e"
},
"filename": "QuaPy-0.1.5-py3-none-any.whl",
"has_sig": false,
"md5_digest": "8d425fd84126a93de0f9baebfdd1d351",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.6, <4",
"size": 55694,
"url": "https://files.pythonhosted.org/packages/48/96/a5a3f91f602dece17e0721440064ea987729b771e711f87b4c078c34230b/QuaPy-0.1.5-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"md5": "5285c6886eb331b9742348a36801af04",
"sha256": "5f0794c5abfeb73a336304f6dc65695129a5fa6018ab63366c2e775df42eb4d4"
},
"filename": "QuaPy-0.1.5.tar.gz",
"has_sig": false,
"md5_digest": "5285c6886eb331b9742348a36801af04",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.6, <4",
"size": 49084,
"url": "https://files.pythonhosted.org/packages/98/dc/c82c8bdc2ba5379907d05a83cd0e379758f9d5b77c1e956df60566f95de1/QuaPy-0.1.5.tar.gz",
"yanked": false,
"yanked_reason": null
}
],