# Significance Analysis
[![PyPI version](https://img.shields.io/pypi/v/significance-analysis?color=informational)](https://pypi.org/project/significance-analysis/)
[![Python versions](https://img.shields.io/pypi/pyversions/significance-analysis)](https://pypi.org/project/significance-analysis/)
[![License](https://img.shields.io/pypi/l/significance-analysis?color=informational)](LICENSE)
This package analyses datasets of different HPO algorithms evaluated on multiple benchmarks.
## Note
As indicated by the `v0.x.x` version number, Significance Analysis is early-stage code and its APIs might change in the future.
## Documentation
Please have a look at our [example](significance_analysis_example/example_analysis.py).
The dataset should have the following format:
| system_id<br>(algorithm name) | input_id<br>(benchmark name) | metric<br>(mean/estimate) | optional: bin_id<br>(budget/traininground) |
| ----------------------------- | ---------------------------- | ------------------------- | ------------------------------------------ |
| Algorithm1 | Benchmark1 | x.xxx | 1 |
| Algorithm1 | Benchmark1 | x.xxx | 2 |
| Algorithm1 | Benchmark2 | x.xxx | 1 |
| ... | ... | ... | ... |
| Algorithm2 | Benchmark2 | x.xxx | 2 |
This dataset contains two algorithms, each trained on two benchmarks for two iterations. The variable names (system_id, input_id, ...) can be customized, but they must be used consistently throughout the dataset, e.g. the metric column cannot be called "mean" for one benchmark and "estimate" for another. The `conduct_analysis` function is then called with the dataset and the variable names as parameters.
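As an illustration, a dataset in this long format can be built with pandas. The column names and metric values below are placeholders; any consistent names work:

```python
import pandas as pd

# One row per (algorithm, benchmark, iteration) combination.
data = pd.DataFrame(
    {
        "system_id": ["Algorithm1"] * 4 + ["Algorithm2"] * 4,
        "input_id": ["Benchmark1", "Benchmark1", "Benchmark2", "Benchmark2"] * 2,
        "metric": [0.81, 0.79, 0.65, 0.66, 0.83, 0.84, 0.61, 0.60],
        "bin_id": [1, 2, 1, 2] * 2,  # optional binning variable
    }
)
print(data)
```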
Optionally, the dataset can be binned by a fourth variable (bin_id), in which case the analysis is conducted on each bin separately, as shown in the example linked above. To do this, provide the name of the bin_id variable and, if desired, the exact bins and bin labels; otherwise a bin is created for each unique value.
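Conceptually, binning splits the dataset on the bin variable and analyses each sub-dataset separately. A minimal sketch of that split using a pandas `groupby` (the actual bin handling inside `conduct_analysis` may differ):

```python
import pandas as pd

data = pd.DataFrame(
    {
        "system_id": ["Algorithm1", "Algorithm1", "Algorithm2", "Algorithm2"],
        "input_id": ["Benchmark1"] * 4,
        "metric": [0.81, 0.79, 0.83, 0.84],
        "bin_id": [1, 2, 1, 2],
    }
)

# One sub-dataset per unique bin value, each analysed on its own.
for bin_value, bin_data in data.groupby("bin_id"):
    print(bin_value, len(bin_data))
    # conduct_analysis(bin_data, "metric", "system_id", "input_id")
```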
## Installation
Using R (>= 4.0.0), install the packages Matrix, emmeans, lmerTest and lme4.
Using pip
```bash
pip install significance-analysis
```
## Usage
1. Generate data from HPO algorithms on benchmarks, saving it in the format described above.
1. Call `conduct_analysis` on the dataset, specifying the variable names.
In code, the usage pattern can look like this:
```python
import pandas as pd
from significance_analysis import conduct_analysis
# 1. Generate/import dataset
data = pd.read_csv("./significance_analysis_example/exampleDataset.csv")
# 2. Analyse dataset
conduct_analysis(data, "mean", "acquisition", "benchmark")
```
For more details and features, please have a look at our [example](significance_analysis_example/example_analysis.py).