[Testing workflow](https://github.com/dholzmueller/probmetrics/actions/workflows/testing.yml)
# Probmetrics: Classification metrics and post-hoc calibration
This PyTorch-based package currently contains
- classification metrics, in particular
  metrics for assessing the quality of probabilistic predictions, and
- post-hoc calibration methods, in particular
  a fast and accurate implementation of temperature scaling.
It accompanies our paper
[Rethinking Early Stopping: Refine, Then Calibrate](https://arxiv.org/abs/2501.19195).
Please cite our paper if you use this repository for research purposes.
The experiments from the paper can be found here:
[vision](https://github.com/eugeneberta/RefineThenCalibrate-Vision),
[tabular](https://github.com/dholzmueller/pytabkit),
[theory](https://github.com/eugeneberta/RefineThenCalibrate-Theory).
## Installation
Probmetrics is available via
```bash
pip install probmetrics
```
To obtain all functionality, install `probmetrics[extra,dev,dirichletcal]`.
- `extra` installs additional packages for smooth ECE,
  Venn-Abers calibration,
  centered isotonic regression,
  and the temperature scaling implementation in NetCal.
- `dev` installs additional packages for development (especially documentation).
- `dirichletcal` installs Dirichlet calibration,
  which only works on Python 3.12 and above.
## Using temperature scaling
We provide a highly efficient implementation of temperature scaling
that, unlike some other implementations,
does not suffer from optimization issues.
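Conceptually, temperature scaling fits a single scalar temperature `T > 0` on validation data and divides the logits by it before the softmax; a larger `T` makes predictions less confident. The sketch below is only an illustrative stand-alone version of this idea (it is not this package's implementation, which is more robust), minimizing the validation cross-entropy over `log T` with plain PyTorch:
```python
import torch

def fit_temperature(logits: torch.Tensor, labels: torch.Tensor) -> float:
    # optimize log(T) so that the temperature stays positive
    log_t = torch.zeros(1, requires_grad=True)
    opt = torch.optim.LBFGS([log_t], max_iter=100)

    def closure():
        opt.zero_grad()
        loss = torch.nn.functional.cross_entropy(logits / log_t.exp(), labels)
        loss.backward()
        return loss

    opt.step(closure)
    return log_t.exp().item()

# toy validation set: two samples, three classes
val_logits = torch.tensor([[2.0, 0.5, -1.0], [0.1, 1.5, 0.3]])
val_labels = torch.tensor([0, 1])
T = fit_temperature(val_logits, val_labels)
calibrated_probas = torch.softmax(val_logits / T, dim=-1)
```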
### NumPy interface
```python
from probmetrics.calibrators import get_calibrator
import numpy as np

probas = np.asarray([[0.1, 0.9]])
labels = np.asarray([1])

# with calibrate_with_mixture=True, this is the version with Laplace smoothing;
# use plain 'temp-scaling' (without the flag) for the version without it
calib = get_calibrator('temp-scaling', calibrate_with_mixture=True)
# other option: calib = MixtureCalibrator(TemperatureScalingCalibrator())
# there is also a fit_torch / predict_proba_torch interface (see below)
calib.fit(probas, labels)
calibrated_probas = calib.predict_proba(probas)
```
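In a typical workflow, the calibrator is fit on held-out validation predictions and then applied to test predictions. A slightly larger sketch using the same `fit` / `predict_proba` interface as above (the arrays are made up for illustration):
```python
import numpy as np
from probmetrics.calibrators import get_calibrator

# hypothetical validation and test predictions from some model
val_probas = np.array([[0.7, 0.3], [0.2, 0.8], [0.6, 0.4], [0.1, 0.9]])
val_labels = np.array([0, 1, 1, 1])
test_probas = np.array([[0.55, 0.45], [0.3, 0.7]])

calib = get_calibrator('temp-scaling', calibrate_with_mixture=True)
calib.fit(val_probas, val_labels)                    # fit on validation predictions
test_calibrated = calib.predict_proba(test_probas)   # apply to unseen predictions
```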
### PyTorch interface
The PyTorch version can be used directly with GPU tensors, but
for smaller validation sets (around 1K-10K samples) it can actually be slower than running on the CPU.
```python
from probmetrics.distributions import CategoricalProbs
from probmetrics.calibrators import get_calibrator
import torch

probas = torch.as_tensor([[0.1, 0.9]])
labels = torch.as_tensor([1])

calib = get_calibrator('ts-mix')

# if you have logits, you can use CategoricalLogits instead
calib.fit_torch(CategoricalProbs(probas), labels)
calibrated_probas = calib.predict_proba_torch(CategoricalProbs(probas))
```
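If your model produces logits, the comment in the snippet above suggests wrapping them in `CategoricalLogits` instead of `CategoricalProbs`; a minimal sketch, assuming `CategoricalLogits` lives in `probmetrics.distributions` as well:
```python
import torch
from probmetrics.distributions import CategoricalLogits  # assumed import path
from probmetrics.calibrators import get_calibrator

logits = torch.as_tensor([[-1.1, 1.1]])  # raw model outputs, not probabilities
labels = torch.as_tensor([1])

calib = get_calibrator('ts-mix')
calib.fit_torch(CategoricalLogits(logits), labels)
calibrated_probas = calib.predict_proba_torch(CategoricalLogits(logits))
```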
## Using our refinement and calibration metrics
We provide estimators for refinement error
(loss after post-hoc calibration)
and calibration error
(loss improvement through post-hoc calibration).
They can be used as follows:
```python
import torch
from probmetrics.metrics import Metrics

# compute multiple metrics at once;
# this is more efficient than computing them individually
metrics = Metrics.from_names(['logloss',
                              'refinement_logloss_ts-mix_all',
                              'calib-err_logloss_ts-mix_all'])
y_true = torch.tensor(...)
y_logits = torch.tensor(...)
results = metrics.compute_all_from_labels_logits(y_true, y_logits)
print(results['refinement_logloss_ts-mix_all'].item())
```
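The `torch.tensor(...)` placeholders above stand in for your own labels and logits. For a self-contained run, you could substitute synthetic data (assuming `results` behaves like a plain dict of tensors, as the indexing above suggests):
```python
import torch
from probmetrics.metrics import Metrics

torch.manual_seed(0)
n_samples, n_classes = 1000, 3
y_true = torch.randint(0, n_classes, (n_samples,))  # synthetic labels
y_logits = torch.randn(n_samples, n_classes)        # synthetic, uncalibrated logits

metrics = Metrics.from_names(['logloss',
                              'refinement_logloss_ts-mix_all',
                              'calib-err_logloss_ts-mix_all'])
results = metrics.compute_all_from_labels_logits(y_true, y_logits)
for name, value in results.items():
    print(name, value.item())
```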
## Using more metrics
While some metrics can be configured flexibly through their corresponding classes,
many metrics are available simply by name.
Here are some relevant classification metrics:
```python
from probmetrics.metrics import Metrics

metrics = Metrics.from_names([
    'logloss',
    'brier',  # for binary, this is 2x the Brier score from sklearn
    'accuracy', 'class-error',
    'auroc-ovr',  # one-vs-rest
    'auroc-ovo-sklearn',  # one-vs-one (can be slow!)
    # calibration metrics
    'ece-15', 'rmsce-15', 'mce-15', 'smece',
    'refinement_logloss_ts-mix_all',
    'calib-err_logloss_ts-mix_all',
    'refinement_brier_ts-mix_all',
    'calib-err_brier_ts-mix_all',
])
```
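If your model outputs probabilities rather than logits, one workaround is to pass log-probabilities: the softmax is shift-invariant and maps `log p` back to `p` for normalized `p`, so log-probabilities act as valid logits. A hedged sketch using only the `compute_all_from_labels_logits` method shown above:
```python
import torch
from probmetrics.metrics import Metrics

probas = torch.tensor([[0.7, 0.2, 0.1],
                       [0.1, 0.6, 0.3]])
labels = torch.tensor([0, 1])

# clamp to avoid log(0); softmax(log p) recovers p for normalized probabilities,
# so these log-probabilities can be used wherever logits are expected
pseudo_logits = torch.log(probas.clamp_min(1e-12))

metrics = Metrics.from_names(['logloss', 'brier', 'ece-15'])
results = metrics.compute_all_from_labels_logits(labels, pseudo_logits)
print({name: value.item() for name, value in results.items()})
```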
The following function returns a list of all metric names:
```python
from probmetrics.metrics import Metrics, MetricType
Metrics.get_available_names(metric_type=MetricType.CLASS)
```
Some classes for regression metrics exist, but they are not implemented yet.