softauto

Name	softauto
Version	0.7.0
home_page	None
Summary	Advising-first AutoML: EDA, preprocessing, selection/extraction, training, metrics, plots.
upload_time	2025-09-12 12:36:15
maintainer	None
docs_url	None
author	Soft Tech Talks
requires_python	>=3.9
license	MIT
keywords	automl advice eda sklearn feature selection pca
requirements	No requirements were recorded.

# softauto 0.7.0: Advising‑first AutoML

**softauto** is a light, zero‑boilerplate AutoML toolkit that starts with **AI advice**: it inspects your dataframe, infers the task, suggests preprocessing & model shortlists, then trains, evaluates, and saves the best pipeline — with plots and a JSON summary.

- ✅ **Advisor**: suggests task, preprocessing, feature selection, PCA, and model shortlists (with reasons)
- ✅ **User‑choice or Auto**: pass a model name or let softauto pick
- ✅ **EDA**: target distribution, missingness, correlation matrix
- ✅ **Preprocessing**: impute, encode, scale, rare‑category binning, outlier clip
- ✅ **Selection/Extraction**: Mutual Info, RFECV, PCA
- ✅ **Training/Testing**: robust CV, leaderboard, hold‑out metrics, artifacts + saved model
- ✅ **Boosters optional**: XGBoost, LightGBM, CatBoost auto‑detected (no hard dependency)
- ✅ **Tiny API**: `autorun(df, target, task=...)` or `Runner(...).fit()`

---

## Installation

**Core (lightweight):**
```bash
pip install softauto
```

**With boosters & imbalance extras:**
```bash
pip install "softauto[all]"
# or pick subsets:
# pip install "softauto[boosters]"
# pip install "softauto[imbalance]"
```

> Boosters are optional; if not present they’re skipped silently.
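
To check which boosters are importable in the current environment before a run, a small standard-library probe is enough. This is a minimal sketch: the mapping from softauto's short model names to package import names is inferred from the model list in this README, and `available_boosters` is a hypothetical helper, not part of softauto.

```python
from importlib.util import find_spec

# Short model name -> import name of the backing package.
OPTIONAL_BOOSTERS = {"xgb": "xgboost", "lgbm": "lightgbm", "catboost": "catboost"}


def available_boosters(candidates=OPTIONAL_BOOSTERS):
    """Return the model names whose backing package can be imported."""
    return [
        name
        for name, module in candidates.items()
        if find_spec(module) is not None  # None means the package is not installed
    ]


if __name__ == "__main__":
    print("Boosters found:", available_boosters())
```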

---

## Quick Start (Classification)

```python
from sklearn.datasets import load_breast_cancer
from softauto import autorun

cancer = load_breast_cancer(as_frame=True)
df = cancer.frame.copy()  # the label column is already named "target"

res = autorun(df, target="target", task="classification", report_dir="report_cls", model="auto")
print(res["best_model"], res["test_metrics"])
# Artifacts in report_cls/: target_dist.png, missingness.png, corr_matrix.png, best_<model>.joblib, summary.json
```

## Quick Start (Regression)

```python
from sklearn.datasets import fetch_california_housing
from softauto import autorun

cal = fetch_california_housing(as_frame=True)
# Rename the label column so it matches the target name passed to autorun
df = cal.frame.rename(columns={"MedHouseVal": "target"})

res = autorun(df, target="target", task="regression", report_dir="report_reg", model="auto")
print(res["best_model"], res["test_metrics"])
```

---

## API

### `autorun(df, target, task=None, **kwargs) -> dict`
Single‑shot run. Returns a summary dict with:
- `task`, `best_model`, `cv_leaderboard`, `test_metrics`
- `artifacts` (plot paths), `model_path`, `advisor_notes`, `selected_features`

Common kwargs:
- `report_dir="softauto_artifacts"`, `random_state=42`, `test_size=0.2`
- `model="auto"` or list/str of model names (`"random_forest"`, `"logreg"`, `"xgb"`, `"lgbm"`, `"catboost"`, ...)
- Preprocess: `scale`, `one_hot`, `cat_min_freq`, `numeric_impute`, `categorical_impute`
- Selection: `feature_selection=("mutual_info"|"rfecv"|None)`, `top_k_features`, `pca_components`
- CV: `cv=5`
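
As an illustration of how these options combine, the sketch below collects them in a dict and forwards them to `autorun`. Only keyword names listed above are used. The value `top_k_features=10` and the wrapper name `run_with_options` are assumptions for this example, and softauto is imported inside the function so the snippet loads even where it is not installed.

```python
# Keyword names come from the list above; values are the documented defaults
# except where a comment says otherwise.
OPTIONS = {
    "report_dir": "softauto_artifacts",
    "random_state": 42,
    "test_size": 0.2,
    "model": ["logreg", "random_forest"],  # explicit shortlist instead of "auto"
    "feature_selection": "mutual_info",
    "top_k_features": 10,  # assumed to be an integer count of features
    "cv": 5,
}


def run_with_options(df, target, task="classification", options=OPTIONS):
    """Hypothetical wrapper: forward the options dict to softauto.autorun."""
    from softauto import autorun  # lazy import; requires softauto to be installed

    return autorun(df, target=target, task=task, **options)
```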

### `Runner`
```python
from softauto import Runner
r = Runner(df, target="target", task="classification", report_dir="report")
summary = r.fit()
y_pred = r.predict(r.X_test_)  # after fit
```

---

## Artifacts

- `target_dist.png` — class/target distribution
- `missingness.png` — top-30 missing rates
- `corr_matrix.png` — numeric correlation heatmap (skips if target not numeric)
- `best_<model>.joblib` — saved sklearn pipeline
- `summary.json` — complete run data
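
A finished report directory can be inspected with the standard library alone. The sketch below assumes only the file names listed above; `read_report` is a hypothetical helper, not part of softauto.

```python
import json
from pathlib import Path


def read_report(report_dir):
    """Return (summary dict, path to the saved model or None) for a report directory."""
    report = Path(report_dir)
    summary = json.loads((report / "summary.json").read_text(encoding="utf-8"))
    # The model file is named best_<model>.joblib, so match it with a glob.
    models = sorted(report.glob("best_*.joblib"))
    return summary, (models[0] if models else None)
```

The returned path can then be passed to `joblib.load` to restore the saved pipeline.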

---

## Model Names

Classification: `logreg`, `random_forest`, `gb`, `svm_rbf`, `knn`, `mlp`, (`xgb`, `lgbm`, `catboost` if installed)  
Regression: `linear`, `ridge`, `lasso`, `random_forest`, `gb`, `svr_rbf`, `knn`, `mlp`, (`xgb`, `lgbm`, `catboost` if installed)
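
Because some names are valid for only one task (for example `logreg` versus `linear`), it can help to check a shortlist before starting a run. This is a minimal sketch: the names are copied from the lists above, and `unknown_models` is a hypothetical helper. How softauto itself reacts to an unrecognised name is not documented here.

```python
# Names copied from the lists above; boosters are usable only if installed.
MODEL_NAMES = {
    "classification": {
        "logreg", "random_forest", "gb", "svm_rbf", "knn", "mlp",
        "xgb", "lgbm", "catboost",
    },
    "regression": {
        "linear", "ridge", "lasso", "random_forest", "gb", "svr_rbf",
        "knn", "mlp", "xgb", "lgbm", "catboost",
    },
}


def unknown_models(task, names):
    """Return the requested names that are not listed for the given task."""
    if isinstance(names, str):
        names = [names]
    # "auto" is the documented value for letting softauto choose.
    return [n for n in names if n != "auto" and n not in MODEL_NAMES[task]]
```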

---

## License
MIT © Soft Tech Talks

Raw data

{
    "_id": null,
    "home_page": null,
    "name": "softauto",
    "maintainer": null,
    "docs_url": null,
    "requires_python": ">=3.9",
    "maintainer_email": null,
    "keywords": "automl, advice, EDA, sklearn, feature selection, pca",
    "author": "Soft Tech Talks",
    "author_email": null,
    "download_url": "https://files.pythonhosted.org/packages/ab/4f/19ac33a0aea64c7e422c0fc8ecf9abd943dd8b36029a839fe53986f75622/softauto-0.7.0.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "MIT",
    "summary": "Advising-first AutoML: EDA, preprocessing, selection/extraction, training, metrics, plots.",
    "version": "0.7.0",
    "project_urls": null,
    "split_keywords": [
        "automl",
        " advice",
        " eda",
        " sklearn",
        " feature selection",
        " pca"
    ],
    "urls": [
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "52abeb40f22d6b1272b5ce210d456880a5fcecc902a72ccd341e09160efce711",
                "md5": "ce6d18b0d526a8112a74549eb70c089b",
                "sha256": "b0b0da2d35b2918edca75cddeae3314490552c43f0684187f5f48938a489380c"
            },
            "downloads": -1,
            "filename": "softauto-0.7.0-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "ce6d18b0d526a8112a74549eb70c089b",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.9",
            "size": 12462,
            "upload_time": "2025-09-12T12:36:11",
            "upload_time_iso_8601": "2025-09-12T12:36:11.892521Z",
            "url": "https://files.pythonhosted.org/packages/52/ab/eb40f22d6b1272b5ce210d456880a5fcecc902a72ccd341e09160efce711/softauto-0.7.0-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "ab4f19ac33a0aea64c7e422c0fc8ecf9abd943dd8b36029a839fe53986f75622",
                "md5": "14727b1c8c9645a637464feac3cfe3c5",
                "sha256": "867db35a9bc18d97d44ce01c08ecdaeb5b0ac0c1fe14ebe55f212e272105a7e3"
            },
            "downloads": -1,
            "filename": "softauto-0.7.0.tar.gz",
            "has_sig": false,
            "md5_digest": "14727b1c8c9645a637464feac3cfe3c5",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.9",
            "size": 12096,
            "upload_time": "2025-09-12T12:36:15",
            "upload_time_iso_8601": "2025-09-12T12:36:15.916135Z",
            "url": "https://files.pythonhosted.org/packages/ab/4f/19ac33a0aea64c7e422c0fc8ecf9abd943dd8b36029a839fe53986f75622/softauto-0.7.0.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2025-09-12 12:36:15",
    "github": false,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "lcname": "softauto"
}
