emd-falsify

- Name: emd-falsify
- Version: 1.0.1
- Summary: Original implementation of the EMD (empirical model discrepancy) model comparison criterion
- Upload time: 2024-08-09 22:34:50
- Requires Python: >=3.7
- License: MPL 2.0
- Keywords: empirical model discrepancy, bayes factor, inference, model comparison, bayesian
- Documentation: https://alcrene.github.io/emd-falsify/
- Source: https://github.com/alcrene/emd-falsify
- Bug Tracker: https://github.com/alcrene/emd-falsify/issues
# EMD falsification library

The EMD (empirical model discrepancy) criterion is used to compare models based on how well each describes the data.
It is described in [this publication](); its main features are:
- **Symmetric**: All models are treated the same. (There is no preferred null model.)
  A corollary is that the test works for any number of parameters.
- **Specific**: Models are compared for particular parameter sets. In particular, the different models may all be the same equations but with different parameters.
- **Dimension agnostic**: Models are compared based on their quantile function, which is always 1d. So the method scales well to high-dimensional problems.
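To illustrate the "dimension agnostic" point: however high-dimensional the observations are, each sample reduces to a single scalar loss, and the empirical quantile function of those losses is always one-dimensional. The sketch below is illustrative only; `empirical_ppf` is a hypothetical stand-in for what the package's `make_empirical_risk` provides, not its actual implementation.

```python
import random

def empirical_ppf(losses):
    """Empirical quantile function (PPF) of a sample of scalar losses.
    Hypothetical stand-in, not the package's implementation."""
    xs = sorted(losses)
    n = len(xs)
    def ppf(q):
        # Clamp the index into the order statistics to [0, n-1].
        i = min(n - 1, max(0, int(q * n)))
        return xs[i]
    return ppf

random.seed(0)
# 1000 samples of 50-dimensional data...
samples = [[random.gauss(0, 1) for _ in range(50)] for _ in range(1000)]
# ...reduce to one scalar loss per sample (here: mean squared value),
losses = [sum(x_i**2 for x_i in x) / len(x) for x in samples]
# ...so the quantile function of the loss is 1d regardless of data dimension.
ppf = empirical_ppf(losses)
print(ppf(0.5))  # median loss
```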

[![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.13287993.svg)](https://doi.org/10.5281/zenodo.13287993)

## Problem requirements

The main requirements for computing the EMD criterion are:

- Observation data.
- At least two models to compare.
- The ability to use the models to generate synthetic samples.

The models can take any form; they can even be black-box deep neural networks.
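Concretely, the only interface a model needs (judging from the usage snippet below) is a way to generate synthetic samples, plus a loss function that maps samples to scalar losses. Here is a minimal sketch of such a model; `GaussianModel` and this particular negative-log-likelihood loss are hypothetical examples, not part of the package.

```python
import math
import random

class GaussianModel:
    """Hypothetical model: all that is required is a `generate` method
    producing synthetic samples. Internals can be an arbitrary black box."""
    def __init__(self, mu, sigma):
        self.mu, self.sigma = mu, sigma
    def generate(self, L):
        # Return L synthetic observations.
        return [random.gauss(self.mu, self.sigma) for _ in range(L)]

def loss(samples, mu=0.0, sigma=1.0):
    # Per-sample negative log likelihood under a candidate Gaussian.
    const = math.log(sigma * math.sqrt(2 * math.pi))
    return [0.5 * ((x - mu) / sigma)**2 + const for x in samples]

random.seed(1)
modelA = GaussianModel(0.0, 1.0)
synth = modelA.generate(1000)
print(len(loss(synth)))  # one scalar loss per synthetic sample
```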

## Installation

    pip install emd-falsify

## Usage

The short form for comparing two models is:

```python
from emd_falsify import Bemd, make_empirical_risk

# Load observed data into `data`
# Define `modelA`, `modelB` : models with a `generate(L)` method
# Define `lossA`, `lossB`   : functions mapping samples to per-sample losses
# Define `Lsynth` : int
# Define `c` : float

# Quantile functions of the loss on purely synthetic samples...
synth_ppfA = make_empirical_risk(lossA(modelA.generate(Lsynth)))
synth_ppfB = make_empirical_risk(lossB(modelB.generate(Lsynth)))
# ...and on the observed data.
mixed_ppfA = make_empirical_risk(lossA(data))
mixed_ppfB = make_empirical_risk(lossB(data))

Bemd(mixed_ppfA, mixed_ppfB, synth_ppfA, synth_ppfB, c=c)
```

We also expose lower-level functions such as `draw_R_samples`.
Using them is more verbose, but they can be more convenient,
especially when there are more than two models to compare.

Note that comparisons depend on choosing an appropriate value for `c`; a systematic way to do this is via a *calibration experiment*, as described in our publication.
This package provides `emd_falsify.tasks.Calibrate` to help run calibration experiments.

### Complete usage examples

The documentation contains a [simple example](https://alcrene.github.io/emd-falsify/src/emd_falsify/emd.html#test-sampling-of-expected-risk-r).
Moreover, all the [code for the paper’s figures] is available in the form of Jupyter notebooks.
These are heavily commented with additional usage hints; they are highly recommended reading.


## Debugging

If computations are taking inordinately long, set the package’s logging level to `DEBUG`:

    import logging
    logging.getLogger("emd_falsify").setLevel("DEBUG")

This will print messages to your console reporting how much time each computation step is taking, which should help pin down the source of the issue.

## Further work

The current implementation of the hierarchical beta process (used for sampling quantile paths) has seen quite a lot of testing for numerical stability, but little optimization effort. In particular it makes a lot of calls to functions in `scipy.optimize`, which makes the whole function quite slow: even with a relatively complicated data model like the [pyloric circuit](https://alcrene.github.io/pyloric-network-simulator/pyloric_simulator/prinz2004.html), drawing quantile paths can still take 10x longer than generating the data.

Substantial performance improvements to the sampling algorithm would almost certainly be possible with dedicated computer-science effort. Since this sampling is the main bottleneck, any improvement on this front would also benefit the whole EMD procedure.

The hierarchical beta process is also not the only possible process for stochastically generating quantile paths: it was chosen in part because it made proving integrability particularly simple. Other processes may provide other advantages, either with respect to statistical robustness or computation speed.
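To give a rough idea of what "stochastically generating quantile paths" means, the sketch below recursively refines a monotone path between two fixed endpoints, splitting each remaining increment with a Beta draw. This is only an illustrative caricature of the idea, not the package's hierarchical beta process; the function name, the fixed `alpha`, and the midpoint-splitting scheme are all assumptions made for the example.

```python
import random

def sample_quantile_path(a, b, depth, alpha=2.0):
    """Randomly refine a monotone path from value `a` to value `b`:
    each midpoint splits the remaining increment with a Beta(alpha, alpha)
    draw. Illustrative only -- not the package's actual process."""
    if depth == 0:
        return [a, b]
    mid = a + (b - a) * random.betavariate(alpha, alpha)
    left = sample_quantile_path(a, mid, depth - 1, alpha)
    right = sample_quantile_path(mid, b, depth - 1, alpha)
    return left[:-1] + right   # join halves, dropping the duplicated midpoint

random.seed(2)
path = sample_quantile_path(0.0, 1.0, depth=6)
print(len(path))  # 2**6 + 1 = 65 points
```

Because each draw stays inside the remaining increment, every sampled path is monotone, as a quantile function must be.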

            
