significance-analysis

Name	significance-analysis
Version	0.2.1
home_page	None
Summary	Significance Analysis for HPO-algorithms performing on multiple benchmarks
upload_time	2024-09-11 07:05:30
maintainer	None
docs_url	None
author	Anton Merlin Geburek
requires_python	<3.11,>=3.8
license	MIT
keywords	hyperparameter optimization automl

# Significance Analysis

[![PyPI version](https://img.shields.io/pypi/v/significance-analysis?color=informational)](https://pypi.org/project/significance-analysis/)
[![Python versions](https://img.shields.io/pypi/pyversions/significance-analysis)](https://pypi.org/project/significance-analysis/)
[![License](https://img.shields.io/pypi/l/significance-analysis?color=informational)](LICENSE)

This package analyses datasets of HPO algorithms evaluated on multiple benchmarks, using an approach based on linear mixed-effects models (LMEMs).

## Note

As the `v0.x.x` version number indicates, Significance Analysis is early-stage code, and its APIs might change in the future.

## Documentation

Please have a look at our [example](significance_analysis_example/analysis_example.ipynb).
The dataset should be a pandas dataframe of the following format:

| algorithm  | benchmark  | metric | optional: budget/prior/... |
| ---------- | ---------- | ------ | -------------------------- |
| Algorithm1 | Benchmark1 | x.xxx  | 1.0                        |
| Algorithm1 | Benchmark1 | x.xxx  | 2.0                        |
| Algorithm1 | Benchmark2 | x.xxx  | 1.0                        |
| ...        | ...        | ...    | ...                        |
| Algorithm2 | Benchmark2 | x.xxx  | 2.0                        |

Our function `dataframe_validator` checks for this format.
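
For illustration, a toy dataset in this layout could be built as follows. This is only a sketch: the algorithm and benchmark names, the `budget` column and the random numbers are placeholders, and naming the metric column `value` is an assumption taken from the formulas in the usage sections. The checks at the end are plain pandas, not the package's own validator.

```python
import itertools
import random

import pandas as pd

# Hypothetical names, used only to illustrate the layout.
algorithms = ["Algorithm1", "Algorithm2"]
benchmarks = ["Benchmark1", "Benchmark2"]
budgets = [1.0, 2.0]

rng = random.Random(0)  # fixed seed so the toy data is reproducible

# One row per (algorithm, benchmark, budget) combination: long format.
records = [
    {
        "algorithm": algorithm,
        "benchmark": benchmark,
        "value": round(rng.random(), 3),  # placeholder metric, not a real result
        "budget": budget,
    }
    for algorithm, benchmark, budget in itertools.product(
        algorithms, benchmarks, budgets
    )
]
data = pd.DataFrame(records, columns=["algorithm", "benchmark", "value", "budget"])

# Plain-pandas layout check (assumed column names, see above).
required = {"algorithm", "benchmark", "value"}
missing = required - set(data.columns)
if missing:
    raise ValueError(f"missing columns: {sorted(missing)}")

print(data.head())
```

A real dataset would typically contain several observations per algorithm and benchmark pair, for example one per seed.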

## Installation

First, install R (>= 4.0.0) together with the R packages Matrix, emmeans, lmerTest and lme4.

Then install the package with pip:

```bash
pip install significance-analysis
```
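
Since the package depends on an R installation, it can help to confirm that the R packages listed above are available before running an analysis. The following is a minimal sketch, not part of the package: the helper `missing_r_packages` is hypothetical, and it assumes that `Rscript` is on the `PATH` and points to the same R installation the package will use.

```python
import shutil
import subprocess

R_PACKAGES = ["Matrix", "emmeans", "lmerTest", "lme4"]


def missing_r_packages(packages=R_PACKAGES):
    """Return the R packages that cannot be loaded, or None if Rscript is not found."""
    rscript = shutil.which("Rscript")
    if rscript is None:
        return None
    missing = []
    for package in packages:
        # requireNamespace() returns TRUE/FALSE without attaching the package;
        # the result is mapped to the exit status of the R process.
        expr = (
            f'quit(status = if (requireNamespace("{package}", quietly = TRUE)) '
            "0 else 1)"
        )
        result = subprocess.run([rscript, "-e", expr], capture_output=True)
        if result.returncode != 0:
            missing.append(package)
    return missing


result = missing_r_packages()
if result is None:
    print("Rscript was not found on PATH")
elif result:
    print("Missing R packages:", ", ".join(result))
else:
    print("All required R packages are available")
```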

## Usage for significance testing

1. Generate data from HPO algorithms on benchmarks and save it in the format described above.
1. Build a model with all factors of interest.
1. Run post-hoc tests.
1. Plot the results as a critical difference (CD) diagram.

In code, the usage pattern can look like this:

```python
import pandas as pd
from significance_analysis import dataframe_validator, model, cd_diagram


# 1. Generate/import dataset
data = dataframe_validator(pd.read_parquet("datasets/priorband_data.parquet"))

# 2. Build the model
mod = model("value ~ algorithm + (1|benchmark) + prior", data)

# 3. Conduct the post-hoc analysis
post_hoc_results = mod.post_hoc("algorithm")

# 4. Plot the results
cd_diagram(post_hoc_results)
```
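
To make the model formula more concrete: assuming it follows lme4-style syntax (which the R dependencies suggest, but which is not confirmed here), `value ~ algorithm + (1|benchmark) + prior` describes fixed effects for `algorithm` and `prior` plus a random intercept per benchmark. With `prior` treated as a numeric covariate for simplicity, the model can be written as:

```latex
y_{ijk} = \beta_0 + \alpha_i + \gamma \, p_{ijk} + b_j + \varepsilon_{ijk},
\qquad b_j \sim \mathcal{N}(0, \sigma_b^2),
\qquad \varepsilon_{ijk} \sim \mathcal{N}(0, \sigma^2)
```

Here $y_{ijk}$ is the $k$-th observed value of algorithm $i$ on benchmark $j$, $\beta_0$ is the intercept, $\alpha_i$ is the fixed effect of algorithm $i$ relative to a reference algorithm, $\gamma$ is the coefficient of the prior covariate $p_{ijk}$, $b_j$ is the random intercept of benchmark $j$, and $\varepsilon_{ijk}$ is the residual. If `prior` is categorical, the term $\gamma \, p_{ijk}$ becomes one effect per prior level instead.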

## Usage for hypothesis testing

Use the generalized likelihood ratio test (GLRT) implementation or our prepared sanity checks to conduct hypothesis tests based on linear mixed-effects models (LMEMs).

In code:

```python
import pandas as pd
from significance_analysis import (
    dataframe_validator,
    glrt,
    model,
    seed_dependency_check,
    benchmark_information_check,
    fidelity_check,
)

# 1. Generate/import dataset
data = dataframe_validator(pd.read_parquet("datasets/priorband_data.parquet"))

# 2. Run the preconfigured sanity checks
seed_dependency_check(data)
benchmark_information_check(data)
fidelity_check(data)

# 3. Run a custom hypothesis test, comparing model_1 and model_2
model_1 = model("value ~ algorithm", data)
model_2 = model("value ~ 1", data)
glrt(model_1, model_2)
```
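
As background, a likelihood ratio test compares the maximized log-likelihoods of two nested models. The sketch below shows the textbook computation using only the standard library; it is not the package's `glrt` implementation. The helper names and the log-likelihood numbers are hypothetical, and the p-value relies on the usual chi-square approximation.

```python
import math


def chi2_sf(x, df):
    """Survival function P(X > x) of a chi-square variable with integer df >= 1."""
    if df < 1:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    half = x / 2.0
    # Closed-form starting points for the regularized upper incomplete gamma Q(a, z).
    if df % 2 == 0:
        a, q = 1.0, math.exp(-half)  # Q(1, z) = exp(-z)
    else:
        a, q = 0.5, math.erfc(math.sqrt(half))  # Q(1/2, z) = erfc(sqrt(z))
    # Recurrence: Q(a + 1, z) = Q(a, z) + z**a * exp(-z) / Gamma(a + 1).
    while a < df / 2.0:
        q += half**a * math.exp(-half) / math.gamma(a + 1.0)
        a += 1.0
    return q


def likelihood_ratio_test(loglik_full, loglik_reduced, extra_params):
    """Return the test statistic and p-value for two nested models.

    extra_params is the number of parameters the full model has
    in addition to the reduced model.
    """
    statistic = 2.0 * (loglik_full - loglik_reduced)
    return statistic, chi2_sf(statistic, extra_params)


# Hypothetical log-likelihoods, chosen only to illustrate the arithmetic.
stat, p = likelihood_ratio_test(-120.0, -125.0, extra_params=2)
print(f"statistic = {stat:.1f}, p = {p:.4f}")  # statistic = 10.0, p = 0.0067
```

In the example above, `model_1` has one additional parameter per algorithm beyond the first compared with the intercept-only `model_2`, so with three algorithms `extra_params` would be 2.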

## Usage for metafeature impact analysis

Analyze the influence that a metafeature has on the performance of two algorithms.

In code:

```python
import pandas as pd
from significance_analysis import dataframe_validator, metafeature_analysis

# 1. Generate/import dataset
data = dataframe_validator(pd.read_parquet("datasets/priorband_data.parquet"))

# 2. Run the metafeature analysis
scores = metafeature_analysis(data, ("HB", "PB"), "prior")
```

For more details and features please have a look at our [example](significance_analysis_example/analysis_example.py).

Raw data

{
    "_id": null,
    "home_page": null,
    "name": "significance-analysis",
    "maintainer": null,
    "docs_url": null,
    "requires_python": "<3.11,>=3.8",
    "maintainer_email": null,
    "keywords": "Hyperparameter Optimization, AutoML",
    "author": "Anton Merlin Geburek",
    "author_email": "gebureka@cs.uni-freiburg.de",
    "download_url": "https://files.pythonhosted.org/packages/93/36/f6d0ea6ee62ddfe638cac828aae69c3dc17f0e952ff34d5f0bb031f03891/significance_analysis-0.2.1.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "MIT",
    "summary": "Significance Analysis for HPO-algorithms performing on multiple benchmarks",
    "version": "0.2.1",
    "project_urls": null,
    "split_keywords": [
        "hyperparameter optimization",
        " automl"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "5f2a5b971b55d9240b4ee97a66e49eacd419d83249103128fb8a297e90332025",
                "md5": "a0ef1f10d1cf3c0f8e0fb98906fa86fe",
                "sha256": "4be51cf86bcd8f8a1c5a4253969a08be9e2346b0e6fca981a791ca607667071d"
            },
            "downloads": -1,
            "filename": "significance_analysis-0.2.1-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "a0ef1f10d1cf3c0f8e0fb98906fa86fe",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": "<3.11,>=3.8",
            "size": 14388,
            "upload_time": "2024-09-11T07:05:29",
            "upload_time_iso_8601": "2024-09-11T07:05:29.759226Z",
            "url": "https://files.pythonhosted.org/packages/5f/2a/5b971b55d9240b4ee97a66e49eacd419d83249103128fb8a297e90332025/significance_analysis-0.2.1-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "9336f6d0ea6ee62ddfe638cac828aae69c3dc17f0e952ff34d5f0bb031f03891",
                "md5": "58543cacca7900d9ce8b8918814c6557",
                "sha256": "78285d2af842b1cc296abbffda01f312e00f7efe37cee8ca336cddd7cb01e05f"
            },
            "downloads": -1,
            "filename": "significance_analysis-0.2.1.tar.gz",
            "has_sig": false,
            "md5_digest": "58543cacca7900d9ce8b8918814c6557",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": "<3.11,>=3.8",
            "size": 16187,
            "upload_time": "2024-09-11T07:05:30",
            "upload_time_iso_8601": "2024-09-11T07:05:30.649543Z",
            "url": "https://files.pythonhosted.org/packages/93/36/f6d0ea6ee62ddfe638cac828aae69c3dc17f0e952ff34d5f0bb031f03891/significance_analysis-0.2.1.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2024-09-11 07:05:30",
    "github": false,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "lcname": "significance-analysis"
}
