# EMD model comparison library
The EMD (empirical model discrepancy) is a method for comparing models based both on their expected risk, and the uncertainty of that risk across experiments.
It is described in [this publication](); its main features are:
- **Symmetric**: All models are treated the same. (There is no preferred null model.)
A corollary is that the test works for any number of models.
- **Specific**: Models are compared for particular parameter sets. In particular, the different models may all be the same equations but with different parameters.
- **Dimension agnostic**: Models are compared based on their quantile function, which is always 1d. So the method scales well to high-dimensional problems.
- **Designed for real-world data**: A “true” model is not required: all models are assumed to be imperfect.
- **Designed for reproducibility**: Comparisons are based on a model’s ability to generalize.
- **Designed for big data**: Comparisons assume a standard machine learning pipeline with separate training and test datasets.
[DOI: 10.5281/zenodo.13287993](https://doi.org/10.5281/zenodo.13287993)
## Problem requirements
The main requirements for computing the EMD criterion are:
- Observation data.
- A loss function.
- At least two models to compare.
- The ability to use the models to generate synthetic samples.
The models can take any form; they can even be black-box deep neural networks.
## Installation
    pip install emdcmp
## Usage
The short form for comparing two models is
```python
from emdcmp import Bemd, make_empirical_risk, draw_R_samples
# Load data into `data`
# Define `modelA`, `modelB`
# Define `lossA`, `lossB` : functions
# Define `Lsynth` : int
# Define `c` : float
synth_ppfA = make_empirical_risk(lossA(modelA.generate(Lsynth)))
synth_ppfB = make_empirical_risk(lossB(modelB.generate(Lsynth)))
mixed_ppfA = make_empirical_risk(lossA(data))
mixed_ppfB = make_empirical_risk(lossB(data))
Bemd(mixed_ppfA, mixed_ppfB, synth_ppfA, synth_ppfB, c=c)
```
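To make the pipeline concrete, here is a self-contained toy version of the snippet above. The Gaussian models, their `generate` method and the quadratic losses are hypothetical stand-ins invented purely for illustration; only `Bemd` and `make_empirical_risk` come from the package.

```python
import numpy as np
from emdcmp import Bemd, make_empirical_risk

rng = np.random.default_rng(0)

# Hypothetical observations: noisy samples centred on 0.5
data = rng.normal(0.5, 1.0, size=400)

class GaussModel:
    """Toy model predicting a constant mean; `generate` draws synthetic samples."""
    def __init__(self, mu, sigma):
        self.mu, self.sigma = mu, sigma
    def generate(self, L):
        return rng.normal(self.mu, self.sigma, size=L)

modelA = GaussModel(mu=0.4, sigma=1.0)   # close to the data
modelB = GaussModel(mu=1.0, sigma=1.0)   # further from the data

# Quadratic loss of each sample w.r.t. a model's prediction
lossA = lambda x: (x - modelA.mu)**2
lossB = lambda x: (x - modelB.mu)**2

Lsynth = 1000   # size of the synthetic samples
c = 2**-2       # sensitivity parameter (see below)

synth_ppfA = make_empirical_risk(lossA(modelA.generate(Lsynth)))
synth_ppfB = make_empirical_risk(lossB(modelB.generate(Lsynth)))
mixed_ppfA = make_empirical_risk(lossA(data))
mixed_ppfB = make_empirical_risk(lossB(data))

print(Bemd(mixed_ppfA, mixed_ppfB, synth_ppfA, synth_ppfB, c=c))
```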
The function `Bemd` returns a float corresponding to the tail probability
$$B^{\mathrm{EMD}}_{AB} = P(R_A < R_B \mid c) \,.$$
This is the probability that the [expected risk](https://en.wikipedia.org/wiki/Statistical_learning_theory#Formal_description) (aka the expected loss) of model $A$ is less than that of model $B$, _across replications of the experiment_. The fundamental assumption underlying EMD is this:
> Discrepancies between model predictions and observed data are due to uncontrolled experimental factors. Those factors may change across replications.
Computations work by generating, for each model, a *risk distribution* representing our estimate of what the risk would be across replications of the experiment.
Comparisons are tied to the *overlap* between distributions: a model is rejected if it is always worse than another model. For example, if we compute $B^{\mathrm{EMD}}_{AB}$ = 95%, that would mean that in 95% of replications, model $A$ would have a lower (better) risk than model $B$. We could then decide to reject model $B$.
The scaling parameter $c$ controls the sensitivity of replications to prediction errors: larger $c$ means more variability, hence broader risk distributions, increased overlap, and reduced ability to differentiate between models. In our experiments we found that values between 2⁻² and 2⁻¹ worked well, but see the paper for a calibration method to identify an appropriate value for your experiment.
This package provides `emdcmp.tasks.Calibrate` to help run calibration experiments.
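As an informal sanity check (not a substitute for the paper's calibration procedure), one can simply re-evaluate `Bemd` over a grid of $c$ values and watch how the probability drifts toward 0.5 as the risk distributions broaden:

```python
# Reuses the ppf objects defined in the usage example above
for c in [2**-3, 2**-2, 2**-1, 2**0]:
    B = Bemd(mixed_ppfA, mixed_ppfB, synth_ppfA, synth_ppfB, c=c)
    print(f"c = {c}:  Bemd_AB = {B:.3f}")
```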
Although the single $B^{\mathrm{EMD}}$ number is convenient, the risk distributions themselves are much more informative.
These distributions are obtained by sampling with the function `draw_R_samples`:
```python
RA_samples = draw_R_samples(mixed_ppfA, synth_ppfA, c=c)
RB_samples = draw_R_samples(mixed_ppfB, synth_ppfB, c=c)
```
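The connection to $B^{\mathrm{EMD}}$ is direct: treating the two sample sets as independent draws from their respective risk distributions (and assuming they are returned as NumPy arrays), the fraction of sample pairs with $R_A < R_B$ is a Monte Carlo estimate of $P(R_A < R_B)$:

```python
import numpy as np

# Fraction of (R_A, R_B) pairs where model A has the lower risk
Bemd_AB = np.mean(RA_samples[:, None] < RB_samples[None, :])
```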
Samples can then be plotted as a distribution. The example below uses [HoloViews](https://holoviews.org/):
```python
import holoviews as hv
hv.Distribution(RA_samples, label="Model A") \
* hv.Distribution(RB_samples, label="Model B")
```
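If HoloViews is not available, plain Matplotlib histograms of the same samples give an equivalent, if less polished, picture:

```python
import matplotlib.pyplot as plt

plt.hist(RA_samples, bins=40, density=True, alpha=0.5, label="Model A")
plt.hist(RB_samples, bins=40, density=True, alpha=0.5, label="Model B")
plt.xlabel("expected risk $R$")
plt.legend()
plt.show()
```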
### Complete usage examples
The documentation contains a [simple example](https://alcrene.github.io/emdcmp/src/emdcmp/emd.html#test-sampling-of-expected-risk-r).
Moreover, all the [code for the paper’s figures] is available in the form of Jupyter notebooks.
These are heavily commented with additional usage hints; they are highly recommended reading.
## Debugging
If computations are taking inordinately long, set the logging level of the `emdcmp` logger to `DEBUG`:

    import logging
    logging.getLogger("emdcmp").setLevel("DEBUG")
This will print messages to your console reporting how much time each computation step is taking, which should help pin down the source of the issue.
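Note that these messages go through Python's standard `logging` module, so they are only printed if a handler is attached. If your application does not already configure logging, a minimal setup is:

    import logging
    logging.basicConfig()   # attach a default console handler
    logging.getLogger("emdcmp").setLevel("DEBUG")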
## Further work
The current implementation of the hierarchical beta process (used for sampling quantile paths) has seen quite a lot of testing for numerical stability, but little optimization effort. In particular it makes a lot of calls to functions in `scipy.optimize`, which makes the whole function quite slow: even with a relatively complicated data model like the [pyloric circuit](https://alcrene.github.io/pyloric-network-simulator/pyloric_simulator/prinz2004.html), drawing quantile paths can still take 10x longer than generating the data.
Substantial performance improvements to the sampling algorithm would almost certainly be possible with dedicated computer science effort. Since this sampling is the main bottleneck, any improvement on this front would also benefit the whole EMD procedure.
The hierarchical beta process is also not the only possible process for stochastically generating quantile paths: it was chosen in part because it made proving integrability particularly simple. Other processes may provide other advantages, either with respect to statistical robustness or computation speed.