Name | hepstats |
Version | 0.9.0 |
home_page | None |
Summary | statistics tools and utilities |
upload_time | 2024-11-09 19:56:33 |
maintainer | None |
docs_url | None |
author | None |
requires_python | >=3.9 |
license | BSD 3-Clause License |
keywords | |
VCS | |
bugtrack_url | |
requirements | No requirements were recorded. |
Travis-CI | No Travis. |
coveralls test coverage | No coveralls. |
<img src="https://raw.githubusercontent.com/scikit-hep/hepstats/master/docs/images/logo.png" width="450">
# `hepstats` package: statistics tools and utilities
[![Scikit-HEP][sk-badge]](https://scikit-hep.org/)
hepstats is a library for statistical inference that aims to cover the needs of High Energy Physics.
It is part of the [Scikit-HEP project](https://scikit-hep.org/).
**Questions**: for usage questions, use [StackOverflow with the hepstats tag](https://stackoverflow.com/questions/ask?tags=hepstats)
**Bugs and odd behavior**: open [an issue with hepstats](https://github.com/scikit-hep/hepstats/issues/new)
## Installation
Install `hepstats` like any other Python package:
```
pip install hepstats
```
or similar (use e.g. `virtualenv` if you wish).
## Changelog
See the [changelog](https://github.com/scikit-hep/hepstats/blob/master/CHANGELOG.md) for a history of notable changes.
## Getting Started
The `hepstats` module includes the `modeling`, `hypotests` and `splot` submodules. This is a quick user guide to each submodule. The [binder](https://mybinder.org/v2/gh/scikit-hep/hepstats/master) examples are also a good way to get started.
### modeling
The modeling submodule includes the [Bayesian Blocks algorithm](https://arxiv.org/pdf/1207.5578.pdf), which can be used to improve the binning of histograms. The visual improvement can be dramatic and, more importantly, the algorithm produces histograms that accurately represent the underlying distribution while being robust to statistical fluctuations. Here is a small example of the algorithm applied to data sampled from a Laplace distribution, compared with a finely binned histogram of the same sample.
```python
>>> import numpy as np
>>> import matplotlib.pyplot as plt
>>> from hepstats.modeling import bayesian_blocks
>>> data = np.random.laplace(size=10000)
>>> blocks = bayesian_blocks(data)
>>> plt.hist(data, bins=1000, label='Fine Binning', density=True, alpha=0.6)
>>> plt.hist(data, bins=blocks, label='Bayesian Blocks', histtype='step', density=True, linewidth=2)
>>> plt.legend(loc=2)
```
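Both histograms above are drawn with `density=True` because Bayesian Blocks returns bins of unequal width, so raw counts per bin are not directly comparable. The following sketch, in plain Python without hepstats, shows the normalisation involved: each count is divided by the total number of entries times the bin width. The edges are hand-picked placeholders standing in for the output of `bayesian_blocks`, and `density_histogram` is a hypothetical helper, not part of hepstats.

```python
import bisect
import random


def density_histogram(values, edges):
    """Per-bin densities, count / (n_in_range * bin_width), for sorted edges."""
    counts = [0] * (len(edges) - 1)
    for v in values:
        if edges[0] <= v <= edges[-1]:
            # bisect_right returns bin index + 1; clamp so that
            # v == edges[-1] falls into the last bin.
            i = min(bisect.bisect_right(edges, v) - 1, len(counts) - 1)
            counts[i] += 1
    n_in_range = sum(counts)
    widths = [hi - lo for lo, hi in zip(edges[:-1], edges[1:])]
    return [c / (n_in_range * w) for c, w in zip(counts, widths)]


rng = random.Random(0)
# Laplace(0, 1) sample: the difference of two independent Exp(1) variates.
data = [rng.expovariate(1.0) - rng.expovariate(1.0) for _ in range(10_000)]

# Hypothetical variable-width edges (NOT computed by Bayesian Blocks).
edges = [min(data), -3.0, -1.5, -0.5, 0.0, 0.5, 1.5, 3.0, max(data)]

density = density_histogram(data, edges)
widths = [hi - lo for lo, hi in zip(edges[:-1], edges[1:])]
# The densities integrate to one over the covered range.
area = sum(d * w for d, w in zip(density, widths))
print(f"integral of the density histogram: {area:.6f}")
```

With this normalisation a wide tail bin and a narrow central bin are on the same scale, which is what makes the fine-binned and Bayesian Blocks histograms overlay correctly.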

### hypotests
This submodule provides tools for hypothesis tests, such as discovery tests and the computation of upper limits or confidence intervals. To perform these computations, hepstats needs a fitting backend such as [zfit](https://github.com/zfit/zfit). Any fitting library can be used if its API is compatible with hepstats (see the [API checks](https://github.com/scikit-hep/hepstats/blob/master/hepstats/hypotests/utils/fit/api_check.py)).
We give here a simple example of an upper limit calculation of the yield of a Gaussian signal with known mean and sigma over an exponential background. The fitting backend used is the [zfit](https://github.com/zfit/zfit) package. An example with a **counting experiment** analysis is also given in the [binder](https://mybinder.org/v2/gh/scikit-hep/hepstats/master) examples.
```python
>>> import numpy as np
>>> import zfit
>>> from zfit.loss import ExtendedUnbinnedNLL
>>> from zfit.minimize import Minuit
>>> bounds = (0.1, 3.0)
>>> obs = zfit.Space('x', limits=bounds)
>>> bkg = np.random.exponential(0.5, 300)
>>> peak = np.random.normal(1.2, 0.1, 10)
>>> data = np.concatenate((bkg, peak))
>>> data = data[(data > bounds[0]) & (data < bounds[1])]
>>> N = data.size
>>> data = zfit.Data.from_numpy(obs=obs, array=data)
>>> lambda_ = zfit.Parameter("lambda", -2.0, -4.0, -1.0)
>>> Nsig = zfit.Parameter("Nsig", 1., -20., N)
>>> Nbkg = zfit.Parameter("Nbkg", N, 0., N*1.1)
>>> signal = zfit.pdf.Gauss(obs=obs, mu=1.2, sigma=0.1).create_extended(Nsig)
>>> background = zfit.pdf.Exponential(obs=obs, lambda_=lambda_).create_extended(Nbkg)
>>> total = zfit.pdf.SumPDF([signal, background])
>>> loss = ExtendedUnbinnedNLL(model=total, data=data)
>>> from hepstats.hypotests.calculators import AsymptoticCalculator
>>> from hepstats.hypotests import UpperLimit
>>> from hepstats.hypotests.parameters import POI, POIarray
>>> calculator = AsymptoticCalculator(loss, Minuit(), asimov_bins=100)
>>> poinull = POIarray(Nsig, np.linspace(0.0, 25, 20))
>>> poialt = POI(Nsig, 0)
>>> ul = UpperLimit(calculator, poinull, poialt)
>>> ul.upperlimit(alpha=0.05, CLs=True)
Observed upper limit: Nsig = 15.725784747406346
Expected upper limit: Nsig = 11.927442041887158
Expected upper limit +1 sigma: Nsig = 16.596396280677116
Expected upper limit -1 sigma: Nsig = 8.592750403611896
Expected upper limit +2 sigma: Nsig = 22.24864429383046
Expected upper limit -2 sigma: Nsig = 6.400549971360598
```
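The CLs criterion selected by `CLs=True` can be illustrated on a single-bin counting experiment. The sketch below is not the asymptotic calculation that hepstats performs: it assumes exact Poisson probabilities with the observed count as test statistic, and all function names are hypothetical. It scans the signal yield for the point where CLs = CL(s+b) / CL(b) drops to `alpha`.

```python
import math


def poisson_cdf(n_obs, mean):
    """P(N <= n_obs) for N ~ Poisson(mean)."""
    term = math.exp(-mean)
    total = term
    for k in range(1, n_obs + 1):
        term *= mean / k
        total += term
    return total


def cls(n_obs, signal, background):
    """CLs = CL_{s+b} / CL_b for a single-bin counting experiment."""
    return poisson_cdf(n_obs, signal + background) / poisson_cdf(n_obs, background)


def cls_upper_limit(n_obs, background, alpha=0.05, s_max=100.0, tol=1e-9):
    """Bisect for the signal yield at which CLs falls to alpha.

    CLs equals 1 at signal = 0 and decreases as the signal grows,
    so the crossing point is bracketed by [0, s_max].
    """
    lo, hi = 0.0, s_max
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if cls(n_obs, mid, background) > alpha:
            lo = mid  # signal not yet excluded, move up
        else:
            hi = mid  # signal excluded, move down
    return 0.5 * (lo + hi)


limit_no_events = cls_upper_limit(n_obs=0, background=0.0)
limit_with_bkg = cls_upper_limit(n_obs=3, background=2.0)
print(f"n_obs=0, b=0: s < {limit_no_events:.3f} at 95% CL")
print(f"n_obs=3, b=2: s < {limit_with_bkg:.3f} at 95% CL")
```

When no events are observed, the ratio reduces to `exp(-s)`, so the 95% limit is `-ln(0.05)`, about 3.0, whatever the expected background. This insensitivity to downward background fluctuations is the usual motivation for CLs.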

### splot
A full example using the **sPlot** algorithm can be found [here](https://github.com/scikit-hep/hepstats/tree/master/notebooks/splots/splot_example.ipynb). **sWeights** for different components in a data sample, modeled with a sum of extended probability density functions, are derived using the `compute_sweights` function:
```python
>>> from hepstats.splot import compute_sweights
# using same model as above for illustration
>>> sweights = compute_sweights(zfit.pdf.SumPDF([signal, background]), data)
>>> bkg_sweights = sweights[Nbkg]
>>> sig_sweights = sweights[Nsig]
```
The model must be fitted to the data before the **sWeights** are computed; otherwise an error is raised.
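For intuition about what the **sWeights** are, here is a minimal plain-Python sketch of the standard sPlot formula for two species. It is not the `compute_sweights` implementation. The helper names are hypothetical, and the yields are set by hand to the generated counts instead of being taken from a fit, so this illustrates the formula only.

```python
import math
import random


def gauss_pdf(x, mu, sigma):
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))


def expo_pdf(x, rate, lo, hi):
    """Exponential density truncated to [lo, hi]."""
    norm = math.exp(-rate * lo) - math.exp(-rate * hi)
    return rate * math.exp(-rate * x) / norm


def sample_truncated_expo(rng, rate, lo, hi):
    """Rejection-sample an exponential variate inside (lo, hi)."""
    while True:
        x = rng.expovariate(rate)
        if lo < x < hi:
            return x


def two_species_sweights(f_sig, f_bkg, n_sig, n_bkg):
    """sWeights from per-event pdf values and the yields of two species."""
    denom = [n_sig * s + n_bkg * b for s, b in zip(f_sig, f_bkg)]
    # Inverse covariance matrix: Vinv_ij = sum_e f_i f_j / (sum_k N_k f_k)^2
    a = sum(s * s / d**2 for s, d in zip(f_sig, denom))
    b = sum(s * k / d**2 for s, k, d in zip(f_sig, f_bkg, denom))
    c = sum(k * k / d**2 for k, d in zip(f_bkg, denom))
    det = a * c - b * b
    # V is the inverse of the symmetric matrix [[a, b], [b, c]].
    v_ss, v_sb, v_bb = c / det, -b / det, a / det
    # w_n(x_e) = sum_j V_nj f_j(x_e) / sum_k N_k f_k(x_e)
    w_sig = [(v_ss * s + v_sb * k) / d for s, k, d in zip(f_sig, f_bkg, denom)]
    w_bkg = [(v_sb * s + v_bb * k) / d for s, k, d in zip(f_sig, f_bkg, denom)]
    return w_sig, w_bkg


rng = random.Random(1)
lo, hi = 0.1, 3.0
n_sig, n_bkg = 50.0, 300.0  # assumed yields (hand-set, not fitted)
data = [rng.gauss(1.2, 0.1) for _ in range(int(n_sig))]
data += [sample_truncated_expo(rng, 2.0, lo, hi) for _ in range(int(n_bkg))]

f_sig = [gauss_pdf(x, 1.2, 0.1) for x in data]
f_bkg = [expo_pdf(x, 2.0, lo, hi) for x in data]
w_sig, w_bkg = two_species_sweights(f_sig, f_bkg, n_sig, n_bkg)
print(f"sum of signal sWeights:     {sum(w_sig):.6f}")
print(f"sum of background sWeights: {sum(w_bkg):.6f}")
```

By construction, the weights of each species sum to the yield that entered the formula. This is why the yields should come from a converged fit: only then do the weights describe the data sample.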
[sk-badge]: https://img.shields.io/badge/Scikit--HEP-Project-blue?logo=
Raw data
{
"_id": null,
"home_page": null,
"name": "hepstats",
"maintainer": null,
"docs_url": null,
"requires_python": ">=3.9",
"maintainer_email": "Scikit-HEP <scikit-hep-admins@googlegroups.com>",
"keywords": null,
"author": null,
"author_email": "Matthieu Marinangeli <matthieu.marinangeli@gmail.com>",
"download_url": "https://files.pythonhosted.org/packages/85/b2/2cad7bb1aaa540bb824efb0e2f0f6de7267c47f9cb9e58e8c30a3d5c3541/hepstats-0.9.0.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": "BSD 3-Clause License",
"summary": "statistics tools and utilities",
"version": "0.9.0",
"project_urls": {
"Homepage": "https://github.com/scikit-hep/hepstats"
},
"split_keywords": [],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "1b8221c846414492257ffc90944b7a878d4426815a710de61b0f547a7e153535",
"md5": "f113d7495382a068f58c6f7fba48cc47",
"sha256": "ea9ba63a19363e1d7e18c62a241cca49efe42663aa57811a943bd9d955db437d"
},
"downloads": -1,
"filename": "hepstats-0.9.0-py3-none-any.whl",
"has_sig": false,
"md5_digest": "f113d7495382a068f58c6f7fba48cc47",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.9",
"size": 42931,
"upload_time": "2024-11-09T19:56:30",
"upload_time_iso_8601": "2024-11-09T19:56:30.631083Z",
"url": "https://files.pythonhosted.org/packages/1b/82/21c846414492257ffc90944b7a878d4426815a710de61b0f547a7e153535/hepstats-0.9.0-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "85b22cad7bb1aaa540bb824efb0e2f0f6de7267c47f9cb9e58e8c30a3d5c3541",
"md5": "97c8fa22a33511a2a06afef920d9bc15",
"sha256": "5c5af54a845332e9e2f27d5899c680f9f425f16ab9e16683d1bf80dd9ca8ec14"
},
"downloads": -1,
"filename": "hepstats-0.9.0.tar.gz",
"has_sig": false,
"md5_digest": "97c8fa22a33511a2a06afef920d9bc15",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.9",
"size": 16641856,
"upload_time": "2024-11-09T19:56:33",
"upload_time_iso_8601": "2024-11-09T19:56:33.227213Z",
"url": "https://files.pythonhosted.org/packages/85/b2/2cad7bb1aaa540bb824efb0e2f0f6de7267c47f9cb9e58e8c30a3d5c3541/hepstats-0.9.0.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2024-11-09 19:56:33",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "scikit-hep",
"github_project": "hepstats",
"travis_ci": false,
"coveralls": false,
"github_actions": true,
"lcname": "hepstats"
}