[![License](https://img.shields.io/badge/License-MIT-brightgreen.svg)](https://opensource.org/licenses/MIT)
[![Coverage Status](https://coveralls.io/repos/github/ValentinMargraf/ActiveLearningPipelines/badge.svg)](https://coveralls.io/github/ValentinMargraf/ActiveLearningPipelines)
[![Tests](https://github.com/ValentinMargraf/ActiveLearningPipelines/actions/workflows/unit-tests.yml/badge.svg)](https://github.com/ValentinMargraf/ActiveLearningPipelines/actions/workflows/unit-tests.yml)
[![Read the Docs](https://readthedocs.org/projects/activelearningpipelines/badge/?version=latest)](https://activelearningpipelines.readthedocs.io/en/latest/?badge=latest)
[![Python Versions](https://img.shields.io/pypi/pyversions/alpbench.svg)](https://pypi.org/project/alpbench)
[![PyPI status](https://img.shields.io/pypi/status/alpbench.svg?color=blue)](https://pypi.org/project/alpbench)
[![Code Style](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/psf/black)
# ALPBench: A Benchmark for Active Learning Pipelines on Tabular Data
`ALPBench` is a Python package for the specification, execution, and performance monitoring of **active learning pipelines (ALP)** consisting of a **learning algorithm** and a **query strategy** for real-world tabular classification tasks. It has built-in measures that make evaluations reproducible by saving the exact dataset splits and hyperparameter settings of the algorithms used. In total, `ALPBench` comprises 86 real-world tabular classification datasets and 5 active learning settings, yielding 430 active learning problems. The benchmark is also easy to extend: you can implement your own learning algorithm and/or query strategy and benchmark it against existing approaches.
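The count of 430 problems is the number of (dataset, setting) pairs. The short sketch below only illustrates this counting. The dataset indices and setting names are placeholders, not the identifiers that `ALPBench` actually uses.

```python
from itertools import product

# Placeholders (assumptions): the real OpenML IDs and setting names ship with ALPBench.
dataset_ids = range(86)                        # stands in for the 86 datasets
settings = [f"setting_{i}" for i in range(5)]  # stands in for the 5 settings

# Every dataset is paired with every active learning setting.
problems = list(product(dataset_ids, settings))
print(len(problems))  # 430
```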
# 🛠️ Install
`ALPBench` is intended to work with **Python 3.10 and above**.
```bash
# The base package can be installed via pip:
pip install alpbench
# Alternatively, you can install the full package via pip
# (the quotes keep shells such as zsh from interpreting the brackets):
pip install "alpbench[full]"
# Or you can install the package from source:
git clone https://github.com/ValentinMargraf/ActiveLearningPipelines.git
cd ActiveLearningPipelines
conda create --name alpbench python=3.10
conda activate alpbench
# Install for usage (without TabNet and TabPFN)
pip install -r requirements.txt
# Install for usage (with TabNet and TabPFN)
pip install -r requirements_full.txt
```
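To check the installation, you can query the installed version with Python's standard library. This is a minimal sketch: `installed_version` and `python_is_supported` are hypothetical helper names introduced here for illustration and are not part of `ALPBench`.

```python
import sys
from importlib.metadata import PackageNotFoundError, version
from typing import Optional, Tuple


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is not installed."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


def python_is_supported(minimum: Tuple[int, int] = (3, 10)) -> bool:
    """Check the running interpreter against the minimum Python version."""
    return sys.version_info[:2] >= minimum


print("Python supported:", python_is_supported())
print("alpbench version:", installed_version("alpbench"))
```

If the second line prints `None`, the package is not installed in the active environment.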
The documentation is available at https://activelearningpipelines.readthedocs.io/en/latest/
# ⭐ Quickstart
You can use `ALPBench` in different ways. It already ships with a number of learners and query strategies,
each of which can be selected by name, as shown in the minimal example below. You can also implement your own
(new) query strategies in the `alpbench.pipeline` module.
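Conceptually, a query strategy ranks the unlabeled pool and returns the instances to label next. The sketch below shows this idea for an entropy-based strategy in plain Python. The function names and the signature are assumptions made for illustration; they are not the interface that `ALPBench` expects, so consult the documentation for the actual base class.

```python
import math
from typing import List, Sequence


def entropy(probabilities: Sequence[float]) -> float:
    """Shannon entropy of one predicted class distribution."""
    return -sum(p * math.log(p) for p in probabilities if p > 0.0)


def entropy_query(pool_probabilities: Sequence[Sequence[float]], batch_size: int) -> List[int]:
    """Return the indices of the `batch_size` pool instances with the highest entropy."""
    ranked = sorted(
        range(len(pool_probabilities)),
        key=lambda i: entropy(pool_probabilities[i]),
        reverse=True,
    )
    return ranked[:batch_size]


# Predicted class probabilities for four unlabeled instances (toy values).
pool = [[0.9, 0.1], [0.5, 0.5], [0.6, 0.4], [1.0, 0.0]]
print(entropy_query(pool, batch_size=2))  # [1, 2]
```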
## 📈 Fit an Active Learning Pipeline
Fit an ALP on the dataset with OpenML ID 31, using a random forest and margin sampling. You can find similar example code snippets in **examples/**.
```python
from sklearn.metrics import accuracy_score
from alpbench.benchmark.BenchmarkConnector import DataFileBenchmarkConnector
from alpbench.evaluation.experimenter.DefaultSetup import ensure_default_setup
from alpbench.pipeline.ALPEvaluator import ALPEvaluator
# create benchmark connector and establish database connection
benchmark_connector = DataFileBenchmarkConnector()
# load some default settings and algorithm choices
ensure_default_setup(benchmark_connector)
evaluator = ALPEvaluator(
    benchmark_connector=benchmark_connector,
    setting_name="small",
    openml_id=31,
    query_strategy_name="margin",
    learner_name="rf_gini",
)
# fit the active learning pipeline
alp = evaluator.fit()
# predict on the test data and evaluate the predictions
X_test, y_test = evaluator.get_test_data()
y_hat = alp.predict(X=X_test)
print("final test acc", accuracy_score(y_test, y_hat))
# >> final test acc 0.7181818181818181
```
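The `"margin"` query strategy used above refers to margin sampling, which prefers instances where the two most likely classes are almost equally probable. The following standalone sketch shows the selection rule in plain Python. It does not reproduce `ALPBench`'s implementation, and the function names are hypothetical.

```python
from typing import List, Sequence


def margin(probabilities: Sequence[float]) -> float:
    """Difference between the largest and second-largest class probability."""
    top_two = sorted(probabilities, reverse=True)[:2]
    return top_two[0] - top_two[1]


def margin_query(pool_probabilities: Sequence[Sequence[float]], batch_size: int) -> List[int]:
    """Return the indices of the `batch_size` pool instances with the smallest margin."""
    ranked = sorted(range(len(pool_probabilities)), key=lambda i: margin(pool_probabilities[i]))
    return ranked[:batch_size]


# Predicted class probabilities for four unlabeled instances (toy values).
pool = [[0.7, 0.2, 0.1], [0.4, 0.35, 0.25], [0.5, 0.3, 0.2], [0.9, 0.05, 0.05]]
print(margin_query(pool, batch_size=2))  # [1, 2]
```

A small margin means the learner is undecided between two classes, so labeling that instance is expected to be informative.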
## Changelog
### v0.1.0 (2024-06-13)
Initial release
- `pipeline` can be used to combine learning algorithms and query strategies into active learning pipelines
- `evaluation` provides tools to evaluate active learning pipelines
- `benchmark` monitors the performance of active learning pipelines over time and stores results in a database
### v0.1.1 (2024-06-14)
- extra code for `tabnet` no longer needs to be included from the repo
### v0.1.2 (2024-07-10)
- added plotting functionality to `evaluation` to generate budget curves
- notebook in `docs` explains how to extract results from previously run experiments and plot budget curves
Raw data
{
"_id": null,
"home_page": "https://github.com/ValentinMargraf/ActiveLearningPipelines",
"name": "alpbench",
"maintainer": null,
"docs_url": null,
"requires_python": ">=3.10.0",
"maintainer_email": null,
"keywords": "python, machine learning, active learning, benchmark, tabular data, classification",
"author": "Valentin Margraf et al.",
"author_email": "valentin.margraf@ifi.lmu.de",
"download_url": "https://files.pythonhosted.org/packages/47/2e/9f184459a6771fe09eb31ddd64e7d372ad34d96192b43245d78abb0320bd/alpbench-0.1.2.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": "MIT",
"summary": "Active Learning Pipelines Benchmark",
"version": "0.1.2",
"project_urls": {
"Documentation": "https://activelearningpipelines.readthedocs.io/",
"Homepage": "https://github.com/ValentinMargraf/ActiveLearningPipelines",
"Source": "https://github.com/ValentinMargraf/ActiveLearningPipelines",
"Tracker": "https://github.com/ValentinMargraf/ActiveLearningPipelines/issues"
},
"split_keywords": [
"python",
" machine learning",
" active learning",
" benchmark",
" tabular data",
" classification"
],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "66a30e4fd16d665957b47180d5539c58b17c4dfce268b1ea11a39f8482abfbda",
"md5": "1efe5e23691e8047e63696687b970b12",
"sha256": "e7afff22409a3b1fe9606c49edd637cef4968baac1929fb9dc5fa12e52141066"
},
"downloads": -1,
"filename": "alpbench-0.1.2-py3-none-any.whl",
"has_sig": false,
"md5_digest": "1efe5e23691e8047e63696687b970b12",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.10.0",
"size": 106242,
"upload_time": "2024-07-10T15:41:02",
"upload_time_iso_8601": "2024-07-10T15:41:02.252046Z",
"url": "https://files.pythonhosted.org/packages/66/a3/0e4fd16d665957b47180d5539c58b17c4dfce268b1ea11a39f8482abfbda/alpbench-0.1.2-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "472e9f184459a6771fe09eb31ddd64e7d372ad34d96192b43245d78abb0320bd",
"md5": "07c9351a36d1a2c1f0d35a6ea5bcf1fa",
"sha256": "e90884afc979c0e8e560bc20d176415e5155bcb5790f2c05c723a627eb543189"
},
"downloads": -1,
"filename": "alpbench-0.1.2.tar.gz",
"has_sig": false,
"md5_digest": "07c9351a36d1a2c1f0d35a6ea5bcf1fa",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.10.0",
"size": 90334,
"upload_time": "2024-07-10T15:41:04",
"upload_time_iso_8601": "2024-07-10T15:41:04.974640Z",
"url": "https://files.pythonhosted.org/packages/47/2e/9f184459a6771fe09eb31ddd64e7d372ad34d96192b43245d78abb0320bd/alpbench-0.1.2.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2024-07-10 15:41:04",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "ValentinMargraf",
"github_project": "ActiveLearningPipelines",
"travis_ci": false,
"coveralls": false,
"github_actions": true,
"requirements": [
{
"name": "py-experimenter",
"specs": [
[
"==",
"1.4.1"
]
]
},
{
"name": "mysql-connector-python",
"specs": [
[
"==",
"8.4.0"
]
]
},
{
"name": "openml",
"specs": [
[
"==",
"0.14.2"
]
]
},
{
"name": "scikit-learn",
"specs": [
[
"==",
"1.2.2"
]
]
},
{
"name": "scikit-activeml",
"specs": [
[
"==",
"0.4.1"
]
]
},
{
"name": "catboost",
"specs": [
[
"==",
"1.2.5"
]
]
},
{
"name": "xgboost",
"specs": [
[
"==",
"2.0.3"
]
]
}
],
"lcname": "alpbench"
}