QuaPy


* Name: QuaPy
* Version: 0.1.8
* Home page: https://github.com/HLT-ISTI/QuaPy
* Summary: QuaPy: a framework for Quantification in Python
* Upload time: 2024-02-14 13:50:37
* Maintainer: Alejandro Moreo
* Requires Python: >=3.8, <4
* Keywords: machine learning, quantification, classification, prevalence estimation, priors estimate

# QuaPy

QuaPy is an open source framework for quantification (a.k.a. supervised prevalence estimation, or learning to quantify)
written in Python.

QuaPy is built around the concept of a "data sample" and provides implementations of the
most important aspects of the quantification workflow: (baseline and advanced)
quantification methods, quantification-oriented model selection mechanisms,
evaluation measures, and the evaluation protocols used for assessing quantification methods.
QuaPy also makes commonly used datasets available and offers visualization tools
that facilitate the analysis and interpretation of experimental results.

### Last updates:

* Version 0.1.8 is released! Major changes can be consulted [here](CHANGE_LOG.txt).
* A detailed wiki is available [here](https://github.com/HLT-ISTI/QuaPy/wiki)
* The developer API documentation is available [here](https://hlt-isti.github.io/QuaPy/build/html/modules.html)

### Installation

```commandline
pip install quapy
```

### Cite QuaPy

If you find QuaPy useful (and we hope you will), please consider citing the original paper in your research:

```
@inproceedings{moreo2021quapy,
  title={QuaPy: a python-based framework for quantification},
  author={Moreo, Alejandro and Esuli, Andrea and Sebastiani, Fabrizio},
  booktitle={Proceedings of the 30th ACM International Conference on Information \& Knowledge Management},
  pages={4534--4543},
  year={2021}
}
```

## A quick example:

The following script fetches a dataset of tweets, then trains, applies, and evaluates a quantifier based on the
_Adjusted Classify & Count_ quantification method. The evaluation measure is the _Mean Absolute Error_ (MAE)
between the predicted and the true class prevalence values of the test set.

```python
import quapy as qp
from sklearn.linear_model import LogisticRegression

dataset = qp.datasets.fetch_twitter('semeval16')

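# create an "Adjusted Classify & Count" quantifier on top of a logistic regression classifier
model = qp.method.aggregative.ACC(LogisticRegression())
model.fit(dataset.training)

# estimate the class prevalence values of the test set and retrieve the true ones
estim_prevalence = model.quantify(dataset.test.instances)
true_prevalence  = dataset.test.prevalence()

# compare the estimated prevalence values against the true ones
error = qp.error.mae(true_prevalence, estim_prevalence)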

print(f'Mean Absolute Error (MAE)={error:.3f}')
```
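
For intuition, the correction that _Adjusted Classify & Count_ applies in the binary case can be sketched in a few lines of plain Python. This is a minimal illustration rather than QuaPy's implementation: the function name `acc_adjust` is hypothetical, and the true and false positive rates are assumed to have been estimated beforehand on held-out data.

```python
def acc_adjust(cc_estimate, tpr, fpr):
    """Binary ACC correction: p = (cc - fpr) / (tpr - fpr), clipped to [0, 1].

    cc_estimate is the fraction of items the classifier labelled as positive.
    """
    if tpr == fpr:
        # degenerate classifier: the correction is undefined, keep the raw estimate
        return cc_estimate
    adjusted = (cc_estimate - fpr) / (tpr - fpr)
    return min(1.0, max(0.0, adjusted))


# hypothetical numbers: the classifier labels 40% of the sample as positive,
# with tpr=0.8 and fpr=0.2 estimated on validation data
positive_prev = acc_adjust(0.40, tpr=0.8, fpr=0.2)
estim_prevalence = [1 - positive_prev, positive_prev]
```

With these made-up rates, the raw "classify and count" estimate of 0.40 is pulled down to about one third, since part of the predicted positives are expected to be false positives.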

Quantification is useful in scenarios characterized by prior probability shift. If the IID
assumption held, there would be little interest in estimating the class prevalence values of the
test set, since they would be roughly equal to the class prevalence values of the training set.
For this reason, any quantification model should be tested across many samples, including ones
whose class prevalence values differ, even markedly, from those found in the training set.
QuaPy implements sampling procedures and evaluation protocols that automate this workflow.
See the [Wiki](https://github.com/HLT-ISTI/QuaPy/wiki) for detailed examples.
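
The idea of testing across samples with controlled prevalence can be sketched without any library support. The snippet below is only an illustration under simplifying assumptions (binary labels, sampling with replacement, a hypothetical helper `sample_at_prevalence`), not the protocol code shipped with QuaPy. It shows why a "quantifier" that always returns the training prevalence looks accurate under IID conditions but fails under prior probability shift.

```python
import random


def sample_at_prevalence(labels, positive_prev, size, rng):
    """Return indices of a binary sample whose positive prevalence is about positive_prev."""
    pos_idx = [i for i, y in enumerate(labels) if y == 1]
    neg_idx = [i for i, y in enumerate(labels) if y == 0]
    n_pos = round(positive_prev * size)
    # draw with replacement so that every prevalence value is reachable
    chosen = rng.choices(pos_idx, k=n_pos) + rng.choices(neg_idx, k=size - n_pos)
    rng.shuffle(chosen)
    return chosen


rng = random.Random(0)
labels = [1] * 200 + [0] * 800        # toy pool with 20% positives
grid = [i / 10 for i in range(11)]    # target prevalence values 0.0, 0.1, ..., 1.0

errors = []
for prev in grid:
    idx = sample_at_prevalence(labels, prev, size=100, rng=rng)
    sample_labels = [labels[i] for i in idx]
    true_prev = sum(sample_labels) / len(sample_labels)
    estimate = 0.2                    # stand-in quantifier: always the training prevalence
    errors.append(abs(true_prev - estimate))

mean_error = sum(errors) / len(errors)
```

The error is zero only for the sample drawn at prevalence 0.2 and grows as the sample prevalence moves away from it, which is the behaviour that such evaluation protocols are meant to expose.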

## Features

* Implementation of many popular quantification methods (Classify-&-Count and its variants, Expectation Maximization,
quantification methods based on structured output learning, HDy, QuaNet, quantification ensembles, among others).
* Versatile functionality for performing evaluation based on sampling generation protocols (e.g., APP, NPP, etc.).
* Implementation of the most commonly used evaluation metrics (e.g., AE, RAE, NAE, NRAE, SE, KLD, NKLD, etc.).
* Datasets frequently used in quantification (textual and numeric), including:
    * 32 UCI Machine Learning binary datasets.
    * 5 UCI Machine Learning multiclass datasets (_new in v0.1.8!_).
    * 11 Twitter quantification-by-sentiment datasets.
    * 3 product reviews quantification-by-sentiment datasets. 
    * 4 tasks from the LeQua competition (_new in v0.1.7!_).
    * IFCB dataset of plankton water samples (_new in v0.1.8!_).
* Native support for binary and single-label multiclass quantification scenarios.
* Model selection functionality that minimizes quantification-oriented loss functions.
* Visualization tools for analysing the experimental results.
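
As an illustration of some of the evaluation metrics listed above, the following sketch computes AE, RAE, and KLD for a single pair of prevalence vectors. It assumes additive smoothing to avoid divisions by zero, with a smoothing factor `eps` chosen arbitrarily here; the function names are hypothetical and this is not the code of the `qp.error` module, which should be preferred in practice.

```python
import math


def smooth(prev, eps):
    """Additive smoothing of a prevalence vector; the result still sums to 1."""
    n_classes = len(prev)
    return [(eps + p) / (eps * n_classes + 1) for p in prev]


def ae(true_prev, estim_prev):
    """Absolute error: mean over classes of |estimated - true|."""
    return sum(abs(e - t) for t, e in zip(true_prev, estim_prev)) / len(true_prev)


def rae(true_prev, estim_prev, eps):
    """Relative absolute error, computed on smoothed prevalence values."""
    t_s, e_s = smooth(true_prev, eps), smooth(estim_prev, eps)
    return sum(abs(e - t) / t for t, e in zip(t_s, e_s)) / len(t_s)


def kld(true_prev, estim_prev, eps):
    """Kullback-Leibler divergence from the estimated to the true distribution."""
    t_s, e_s = smooth(true_prev, eps), smooth(estim_prev, eps)
    return sum(t * math.log(t / e) for t, e in zip(t_s, e_s))


true_prev = [0.2, 0.8]
estim_prev = [0.3, 0.7]
eps = 0.005  # assumption: an arbitrary small smoothing factor

scores = {
    'AE': ae(true_prev, estim_prev),
    'RAE': rae(true_prev, estim_prev, eps),
    'KLD': kld(true_prev, estim_prev, eps),
}
```

RAE penalizes the same absolute deviation more heavily when it affects a rare class, which is why it is often reported alongside AE.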

## Requirements

* scikit-learn, numpy, scipy
* pytorch (for QuaNet)
* svmperf patched for quantification (see below)
* joblib
* tqdm
* pandas, xlrd
* matplotlib
* ucimlrepo

## Contributing

If you want to contribute improvements to QuaPy, please open a pull request against the "devel" branch.
  
## Documentation

The developer API documentation is available [here](https://hlt-isti.github.io/QuaPy/build/html/index.html). 

Check out our [Wiki](https://github.com/HLT-ISTI/QuaPy/wiki), in which many examples
are provided:

* [Datasets](https://github.com/HLT-ISTI/QuaPy/wiki/Datasets)
* [Evaluation](https://github.com/HLT-ISTI/QuaPy/wiki/Evaluation)
* [Protocols](https://github.com/HLT-ISTI/QuaPy/wiki/Protocols)
* [Methods](https://github.com/HLT-ISTI/QuaPy/wiki/Methods)
* [SVMperf](https://github.com/HLT-ISTI/QuaPy/wiki/ExplicitLossMinimization)
* [Model Selection](https://github.com/HLT-ISTI/QuaPy/wiki/Model-Selection)
* [Plotting](https://github.com/HLT-ISTI/QuaPy/wiki/Plotting)

## Acknowledgments:



<img src="logo/SoBigData.png" alt="SoBigData++" width="250"/> 

<img src="logo/LogoQuaDaSh.png" alt="QuaDaSh" width="250"/>

<img src="logo/NextGenerationEU.jpg" alt="NextGenerationEU" width="250"/>

            

Raw data

            {
    "_id": null,
    "home_page": "https://github.com/HLT-ISTI/QuaPy",
    "name": "QuaPy",
    "maintainer": "Alejandro Moreo",
    "docs_url": null,
    "requires_python": ">=3.8, <4",
    "maintainer_email": "alejandro.moreo@isti.cnr.it",
    "keywords": "machine learning,quantification,classification,prevalence estimation,priors estimate",
    "author": "",
    "author_email": "",
    "download_url": "https://files.pythonhosted.org/packages/cd/ee/201e71908283403e950d61d3aca87e77e7f8d3c400b7d319a0226c7c3e8b/QuaPy-0.1.8.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "",
    "summary": "QuaPy: a framework for Quantification in Python",
    "version": "0.1.8",
    "project_urls": {
        "Bug Reports": "https://github.com/HLT-ISTI/QuaPy/issues",
        "Contributors": "https://github.com/HLT-ISTI/QuaPy/graphs/contributors",
        "Documentation": "https://hlt-isti.github.io/QuaPy/build/html/index.html",
        "Homepage": "https://github.com/HLT-ISTI/QuaPy",
        "Source": "https://github.com/HLT-ISTI/QuaPy/",
        "Wiki": "https://github.com/HLT-ISTI/QuaPy/wiki"
    },
    "split_keywords": [
        "machine learning",
        "quantification",
        "classification",
        "prevalence estimation",
        "priors estimate"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "6125af1ab1c0011e9fc5cbc484749ad291041254b823f96e36d4fa9cfd2068a1",
                "md5": "db55d7f457fff7a38bc6e2977b317f66",
                "sha256": "47c74e28a9c668a439f90c3ff80fce428632a4a778fafc322dd031063d80000d"
            },
            "downloads": -1,
            "filename": "QuaPy-0.1.8-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "db55d7f457fff7a38bc6e2977b317f66",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.8, <4",
            "size": 127547,
            "upload_time": "2024-02-14T13:50:34",
            "upload_time_iso_8601": "2024-02-14T13:50:34.935218Z",
            "url": "https://files.pythonhosted.org/packages/61/25/af1ab1c0011e9fc5cbc484749ad291041254b823f96e36d4fa9cfd2068a1/QuaPy-0.1.8-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "cdee201e71908283403e950d61d3aca87e77e7f8d3c400b7d319a0226c7c3e8b",
                "md5": "ab3cbd5442d47faa85dbdbc0b33fa266",
                "sha256": "d2144251f5b672d786e200730451eb6c28695fb1bd5f1461c1024811dc2cd6f6"
            },
            "downloads": -1,
            "filename": "QuaPy-0.1.8.tar.gz",
            "has_sig": false,
            "md5_digest": "ab3cbd5442d47faa85dbdbc0b33fa266",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.8, <4",
            "size": 107864,
            "upload_time": "2024-02-14T13:50:37",
            "upload_time_iso_8601": "2024-02-14T13:50:37.602958Z",
            "url": "https://files.pythonhosted.org/packages/cd/ee/201e71908283403e950d61d3aca87e77e7f8d3c400b7d319a0226c7c3e8b/QuaPy-0.1.8.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2024-02-14 13:50:37",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "HLT-ISTI",
    "github_project": "QuaPy",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": true,
    "lcname": "quapy"
}
        