significance-analysis


Name: significance-analysis
Version: 0.1.11
Summary: Significance Analysis for HPO-algorithms performing on multiple benchmarks
Author: Anton Merlin Geburek
Requires-Python: >=3.8,<3.11
License: MIT
Keywords: hyperparameter optimization, automl
Upload time: 2023-10-06 23:03:05
# Significance Analysis

[![PyPI version](https://img.shields.io/pypi/v/significance-analysis?color=informational)](https://pypi.org/project/significance-analysis/)
[![Python versions](https://img.shields.io/pypi/pyversions/significance-analysis)](https://pypi.org/project/significance-analysis/)
[![License](https://img.shields.io/pypi/l/significance-analysis?color=informational)](LICENSE)

This package is used to analyse datasets of different HPO algorithms evaluated on multiple benchmarks.

## Note

As indicated by the `v0.x.x` version number, Significance Analysis is early-stage code and its APIs may change in the future.

## Documentation

Please have a look at our [example](significance_analysis_example/example_analysis.py).
The dataset should have the following format:

| system_id<br>(algorithm name) | input_id<br>(benchmark name) | metric<br>(mean/estimate) | optional: bin_id<br>(budget/training round) |
| ----------------------------- | ---------------------------- | ------------------------- | ------------------------------------------ |
| Algorithm1                    | Benchmark1                   | x.xxx                     | 1                                          |
| Algorithm1                    | Benchmark1                   | x.xxx                     | 2                                          |
| Algorithm1                    | Benchmark2                   | x.xxx                     | 1                                          |
| ...                           | ...                          | ...                       | ...                                        |
| Algorithm2                    | Benchmark2                   | x.xxx                     | 2                                          |

In this dataset, two algorithms are each trained on two benchmarks for two iterations. The variable names (system_id, input_id, ...) can be customized, but they must be consistent throughout the dataset, i.e. not "mean" for one benchmark and "estimate" for another. The `conduct_analysis` function is then called with the dataset and the variable names as parameters.
Optionally, the dataset can be binned according to a fourth variable (bin_id), and the analysis is then conducted on each bin separately. To do this, provide the name of the bin_id variable and, if desired, the exact bins and bin labels; otherwise a bin is created for each unique value.
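As a sketch, a dataset in this format could be assembled with pandas. The column names used here (`system_id`, `input_id`, `metric`, `bin_id`) are the placeholders from the table above, not names the package requires; you pass your own column names to `conduct_analysis`:

```python
import pandas as pd

# Minimal example dataset in the expected long format:
# one row per (algorithm, benchmark, iteration) measurement.
data = pd.DataFrame(
    {
        "system_id": ["Algorithm1", "Algorithm1", "Algorithm1", "Algorithm2"],
        "input_id": ["Benchmark1", "Benchmark1", "Benchmark2", "Benchmark2"],
        "metric": [0.123, 0.117, 0.256, 0.241],
        "bin_id": [1, 2, 1, 2],  # optional: budget / training round
    }
)

# The metric column must use one consistent name across all benchmarks.
assert data["metric"].notna().all()
```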

## Installation

First install R (>= 4.0.0) together with the R packages Matrix, emmeans, lmerTest and lme4.
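One way to install these R dependencies from the command line (assuming `Rscript` is already on your PATH; any CRAN mirror works in place of the one shown):

```bash
# Install the R packages the analysis depends on (requires R >= 4.0.0).
Rscript -e 'install.packages(c("Matrix", "emmeans", "lmerTest", "lme4"), repos = "https://cloud.r-project.org")'
```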

Then install this package using pip:

```bash
pip install significance-analysis
```

## Usage

1. Generate data from your HPO algorithms on the benchmarks, saving it according to the format above.
1. Call the `conduct_analysis` function on the dataset, specifying the variable names.

In code, the usage pattern can look like this:

```python
import pandas as pd
from significance_analysis import conduct_analysis

# 1. Generate/import dataset
data = pd.read_csv("./significance_analysis_example/exampleDataset.csv")

# 2. Analyse dataset
conduct_analysis(data, "mean", "acquisition", "benchmark")
```

For more details and features please have a look at our [example](significance_analysis_example/example_analysis.py).

            

Raw data

            {
    "_id": null,
    "home_page": "",
    "name": "significance-analysis",
    "maintainer": "",
    "docs_url": null,
    "requires_python": ">=3.8,<3.11",
    "maintainer_email": "",
    "keywords": "Hyperparameter Optimization,AutoML",
    "author": "Anton Merlin Geburek",
    "author_email": "gebureka@cs.uni-freiburg.de",
    "download_url": "https://files.pythonhosted.org/packages/bc/a2/c035ec747ffc49ec005266599b57822540597806f1a7ef6f2312a4a81e05/significance_analysis-0.1.11.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "MIT",
    "summary": "Significance Analysis for HPO-algorithms performing on multiple benchmarks",
    "version": "0.1.11",
    "project_urls": null,
    "split_keywords": [
        "hyperparameter optimization",
        "automl"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "3b6ebef44c4e43806fc9d87fefa3e8164ebc854f328b50aea9d4f2261d2ed3f3",
                "md5": "781413bbef074a75babe56ac450a56b2",
                "sha256": "fd99c929b333755f0602d8dde95b0178f79295c9d76fc3b324855257bc225a56"
            },
            "downloads": -1,
            "filename": "significance_analysis-0.1.11-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "781413bbef074a75babe56ac450a56b2",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.8,<3.11",
            "size": 8483,
            "upload_time": "2023-10-06T23:03:03",
            "upload_time_iso_8601": "2023-10-06T23:03:03.011058Z",
            "url": "https://files.pythonhosted.org/packages/3b/6e/bef44c4e43806fc9d87fefa3e8164ebc854f328b50aea9d4f2261d2ed3f3/significance_analysis-0.1.11-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "bca2c035ec747ffc49ec005266599b57822540597806f1a7ef6f2312a4a81e05",
                "md5": "e27d317ca3787243f012ec35e8b8ad82",
                "sha256": "045edefad21b913e2d4a8e57b8bbf43b3a04236368cc42466a8e1119c7d63a40"
            },
            "downloads": -1,
            "filename": "significance_analysis-0.1.11.tar.gz",
            "has_sig": false,
            "md5_digest": "e27d317ca3787243f012ec35e8b8ad82",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.8,<3.11",
            "size": 9804,
            "upload_time": "2023-10-06T23:03:05",
            "upload_time_iso_8601": "2023-10-06T23:03:05.319895Z",
            "url": "https://files.pythonhosted.org/packages/bc/a2/c035ec747ffc49ec005266599b57822540597806f1a7ef6f2312a4a81e05/significance_analysis-0.1.11.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2023-10-06 23:03:05",
    "github": false,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "lcname": "significance-analysis"
}
        