powerlift (PyPI)

- Name: powerlift
- Version: 0.1.12
- Summary: Interactive Benchmarking for Machine Learning.
- Author: The InterpretML Contributors (interpret@microsoft.com)
- Home page: https://github.com/interpretml/interpret
- Bug tracker: https://github.com/interpretml/interpret/issues
- Requires Python: >=3.7
- Uploaded: 2024-09-28 19:01:52

# Powerlift

![License](https://img.shields.io/github/license/interpretml/interpret.svg?style=flat-square)
<br/>
> ### Advancing the state of machine learning?
> ### With 5-10 datasets? Wake me up when I'm dead.

Powerlift is all about testing machine learning techniques across many, many datasets. So many that we ran into design-of-experiment concerns. So many that we had to develop a package for it.

Yes, we run this for InterpretML on as many Docker containers as we can run in parallel. Why wait days for benchmark evaluations when you can wait minutes? Rhetorical question, please don't hurt me.
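
The example below defines the two pieces every benchmark needs: a `trial_filter` that maps each dataset task to the list of methods worth running on it, and a `trial_runner` that trains and scores one (task, method) pair and logs metrics back to the store.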

```python
def trial_filter(task):
    if task.problem == "binary" and task.n_samples <= 10000:
        return ["rf", "svm"]
    return []

def trial_runner(trial):
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.svm import LinearSVC
    from sklearn.calibration import CalibratedClassifierCV
    from sklearn.model_selection import train_test_split
    from sklearn.metrics import roc_auc_score
    from sklearn.pipeline import Pipeline
    from sklearn.preprocessing import OneHotEncoder, FunctionTransformer
    from sklearn.compose import ColumnTransformer
    from sklearn.impute import SimpleImputer

    if trial.task.problem == "binary":
        X, y = trial.task.data()

        # Holdout split
        X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3)

        # Build preprocessor: route categoricals to one-hot encoding and pass
        # numerics through. The categorical mask is inferred from dtypes here,
        # assuming `trial.task.data()` returns a pandas DataFrame.
        is_cat = [str(dtype) in ("object", "category") for dtype in X.dtypes]
        cat_cols = [idx for idx in range(X.shape[1]) if is_cat[idx]]
        num_cols = [idx for idx in range(X.shape[1]) if not is_cat[idx]]
        cat_ohe_step = ("ohe", OneHotEncoder(sparse_output=True, handle_unknown="ignore"))
        cat_pipe = Pipeline([cat_ohe_step])
        num_pipe = Pipeline([("identity", FunctionTransformer())])
        transformers = [("cat", cat_pipe, cat_cols), ("num", num_pipe, num_cols)]
        ct = Pipeline(
            [
                ("ct", ColumnTransformer(transformers=transformers)),
                (
                    "missing",
                    SimpleImputer(add_indicator=True, strategy="most_frequent"),
                ),
            ]
        )
        # Connect preprocessor with target learner
        if trial.method == "svm":
            clf = Pipeline([("ct", ct), ("est", CalibratedClassifierCV(LinearSVC()))])
        else:
            clf = Pipeline([("ct", ct), ("est", RandomForestClassifier())])

        # Train
        clf.fit(X_tr, y_tr)

        # Predict
        predictions = clf.predict_proba(X_te)[:, 1]

        # Score
        auc = roc_auc_score(y_te, predictions)
        trial.log("auc", auc)


import os
from powerlift.bench import Benchmark, Store
from powerlift.bench import populate_with_datasets

# Initialize database (if needed).
conn_str = f"sqlite:///{os.getcwd()}/powerlift.db"
store = Store(conn_str, force_recreate=False)

# Download datasets once and load them into the database.
populate_with_datasets(store, cache_dir="~/.powerlift", exist_ok=True)

# Run experiment
benchmark = Benchmark(store, name="SVM vs RF")
benchmark.run(trial_runner, trial_filter)
benchmark.wait_until_complete()
```
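
If you want to sanity-check the preprocessing stage on its own, the sketch below mirrors the one-hot-plus-imputer pipeline from the runner on a toy frame (the frame, its column names, and the `preprocessor` variable are made up for illustration):

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import FunctionTransformer, OneHotEncoder

# Toy frame: one categorical column and one numeric column.
X = pd.DataFrame(
    {"color": ["red", "blue", "blue", "red"], "size": [1.0, 2.0, 3.0, 4.0]}
)

preprocessor = Pipeline(
    [
        (
            "ct",
            ColumnTransformer(
                [
                    # One-hot encode categoricals; unseen categories encode to all zeros.
                    ("cat", OneHotEncoder(sparse_output=True, handle_unknown="ignore"), ["color"]),
                    # Pass numeric columns through unchanged.
                    ("num", FunctionTransformer(), ["size"]),
                ]
            ),
        ),
        # Impute anything still missing, adding indicator columns for imputed cells.
        ("missing", SimpleImputer(add_indicator=True, strategy="most_frequent")),
    ]
)

Xt = preprocessor.fit_transform(X)
print(Xt.shape)  # (4, 3): two one-hot columns for "color" plus "size"
```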

This can also run on Azure Container Instances where needed; the executor below reads its credentials and database URL from environment variables.
```python
# Run experiment (but in ACI).
from powerlift.executors import AzureContainerInstance
store = Store(os.getenv("AZURE_DB_URL"))
azure_tenant_id = os.getenv("AZURE_TENANT_ID")
subscription_id = os.getenv("AZURE_SUBSCRIPTION_ID")
azure_client_id = os.getenv("AZURE_CLIENT_ID")
azure_client_secret = os.getenv("AZURE_CLIENT_SECRET")
resource_group = os.getenv("AZURE_RESOURCE_GROUP")

executor = AzureContainerInstance(
    store,
    azure_tenant_id,
    subscription_id,
    azure_client_id,
    azure_client_secret=azure_client_secret,
    resource_group=resource_group,
    n_running_containers=5
)
benchmark = Benchmark(store, name="SVM vs RF")
benchmark.run(trial_runner, trial_filter, timeout=10, executor=executor)
benchmark.wait_until_complete()
```

## Install
`pip install powerlift[datasets]` (quote the extras as `"powerlift[datasets]"` if your shell, e.g. zsh, expands square brackets).
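
For a quick end-to-end check after installing, here is a minimal sketch built only from the calls shown above; `smoke_filter`, `smoke_runner`, the database filename, and the placeholder method name `"noop"` are illustrative, not part of powerlift:

```python
import os

from powerlift.bench import Benchmark, Store, populate_with_datasets

def smoke_filter(task):
    # Run one placeholder method on small binary tasks only.
    if task.problem == "binary" and task.n_samples <= 1000:
        return ["noop"]
    return []

def smoke_runner(trial):
    # Log a constant metric just to confirm the trial machinery round-trips.
    trial.log("ok", 1.0)

store = Store(f"sqlite:///{os.getcwd()}/powerlift_smoke.db", force_recreate=False)
populate_with_datasets(store, cache_dir="~/.powerlift", exist_ok=True)

benchmark = Benchmark(store, name="smoke")
benchmark.run(smoke_runner, smoke_filter)
benchmark.wait_until_complete()
```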

That's it, go get 'em boss.

            
