# softauto 0.7.0 — Advising‑first AutoML
**softauto** is a lightweight, zero‑boilerplate AutoML toolkit that starts with **AI advice**: it inspects your dataframe, infers the task, and suggests preprocessing steps and model shortlists. It then trains, evaluates, and saves the best pipeline, along with plots and a JSON summary.
- ✅ **Advisor**: suggests task, preprocessing, feature selection, PCA, and model shortlists (with reasons)
- ✅ **User‑choice or Auto**: pass a model name or let softauto pick
- ✅ **EDA**: target distribution, missingness, correlation matrix
- ✅ **Preprocessing**: impute, encode, scale, rare‑category binning, outlier clip
- ✅ **Selection/Extraction**: Mutual Info, RFECV, PCA
- ✅ **Training/Testing**: robust CV, leaderboard, hold‑out metrics, artifacts + saved model
- ✅ **Boosters optional**: XGBoost, LightGBM, CatBoost auto‑detected (no hard dependency)
- ✅ **Tiny API**: `autorun(df, target, task=...)` or `Runner(...).fit()`
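
To make "infers the task" concrete, here is a minimal sketch of one common heuristic for guessing the task from a target column. It is an illustration only, not softauto's actual logic: the function name `infer_task` and the `max_classes` threshold are assumptions, and it works on plain Python values rather than a dataframe.

```python
def infer_task(target_values, max_classes=20):
    """Guess 'classification' or 'regression' from target values (heuristic sketch)."""
    # Drop missing entries: None and NaN (NaN is the only value not equal to itself).
    values = [v for v in target_values if v is not None and v == v]
    if not values:
        raise ValueError("target column has no non-missing values")
    distinct = set(values)

    # Strings, booleans and other non-numeric labels can only be class labels.
    def is_number(v):
        return isinstance(v, (int, float)) and not isinstance(v, bool)

    if not all(is_number(v) for v in distinct):
        return "classification"
    # Any fractional (or infinite) value suggests a continuous target.
    if any(not float(v).is_integer() for v in distinct):
        return "regression"
    # Integer-valued targets: few distinct values look like class labels.
    return "classification" if len(distinct) <= max_classes else "regression"


print(infer_task([0, 1, 1, 0]))        # binary labels
print(infer_task([0.5, 1.2, 3.3]))     # continuous values
```

Passing `task=...` explicitly, as in the quick starts below, avoids relying on any such guess.
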
---
## Installation
**Core (lightweight):**
```bash
pip install softauto
```
**With boosters & imbalance extras:**
```bash
pip install "softauto[all]"
# or pick subsets:
# pip install "softauto[boosters]"
# pip install "softauto[imbalance]"
```
> Boosters are optional; if not present they’re skipped silently.
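
If you want to know up front which boosters would be picked up, the sketch below checks whether each optional package is importable. It uses only the standard library; the helper name `available_boosters` and the name mapping are illustrative assumptions, and softauto's own detection may work differently.

```python
from importlib.util import find_spec

# README model name -> import name of the optional package that backs it.
BOOSTER_MODULES = {"xgb": "xgboost", "lgbm": "lightgbm", "catboost": "catboost"}


def available_boosters(modules=BOOSTER_MODULES):
    """Return the model names whose backing package can be found (nothing is imported)."""
    return [name for name, module in modules.items() if find_spec(module) is not None]


print(available_boosters())
```
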
---
## Quick Start (Classification)
```python
import pandas as pd
from sklearn.datasets import load_breast_cancer
from softauto import autorun
cancer = load_breast_cancer(as_frame=True)
df = cancer.frame.copy()
# cancer.frame already names the label column "target", so no rename is needed.
res = autorun(df, target="target", task="classification", report_dir="report_cls", model="auto")
print(res["best_model"], res["test_metrics"])
# Artifacts in report_cls/: target_dist.png, missingness.png, corr_matrix.png, best_<model>.joblib, summary.json
```
## Quick Start (Regression)
```python
from sklearn.datasets import fetch_california_housing
from softauto import autorun
cal = fetch_california_housing(as_frame=True)
df = cal.frame.copy()
df = df.rename(columns={"MedHouseVal": "target"})  # use "target" as the label column name
res = autorun(df, target="target", task="regression", report_dir="report_reg", model="auto")
print(res["best_model"], res["test_metrics"])
```
---
## API
### `autorun(df, target, task=None, **kwargs) -> dict`
Single‑shot run. Returns a summary dict with:
- `task`, `best_model`, `cv_leaderboard`, `test_metrics`
- `artifacts` (plot paths), `model_path`, `advisor_notes`, `selected_features`

Common kwargs:
- `report_dir="softauto_artifacts"`, `random_state=42`, `test_size=0.2`
- `model="auto"` or list/str of model names (`"random_forest"`, `"logreg"`, `"xgb"`, `"lgbm"`, `"catboost"`, ...)
- Preprocess: `scale`, `one_hot`, `cat_min_freq`, `numeric_impute`, `categorical_impute`
- Selection: `feature_selection=("mutual_info"|"rfecv"|None)`, `top_k_features`, `pca_components`
- CV: `cv=5`
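
As an illustration of rare‑category binning, the idea behind a setting like `cat_min_freq`, the sketch below merges infrequent categories into one bucket. The README does not document the exact semantics of `cat_min_freq`, so treating it as a minimum relative frequency is an assumption, as are the function name and the `__rare__` label.

```python
from collections import Counter


def bin_rare_categories(values, min_freq=0.05, other_label="__rare__"):
    """Replace categories whose relative frequency is below min_freq with other_label."""
    counts = Counter(values)
    total = len(values)
    # Keep only categories that appear often enough.
    keep = {cat for cat, n in counts.items() if n / total >= min_freq}
    return [v if v in keep else other_label for v in values]


colors = ["red"] * 10 + ["blue"] * 9 + ["teal"]
binned = bin_rare_categories(colors, min_freq=0.1)
print(Counter(binned))  # "teal" (1 of 20 rows) falls below 10% and is merged
```

Binning like this keeps one‑hot encoding from creating a column for every rarely seen value.
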
### `Runner`
```python
from softauto import Runner
r = Runner(df, target="target", task="classification", report_dir="report")
summary = r.fit()
y_pred = r.predict(r.X_test_)  # X_test_ is only available after fit()
```
---
## Artifacts
- `target_dist.png` — class/target distribution
- `missingness.png` — missing‑value rates for the 30 columns with the most missing data
- `corr_matrix.png` — numeric correlation heatmap (skipped if the target is not numeric)
- `best_<model>.joblib` — saved sklearn pipeline
- `summary.json` — complete run data
---
## Model Names
Classification: `logreg`, `random_forest`, `gb`, `svm_rbf`, `knn`, `mlp`, plus `xgb`, `lgbm`, `catboost` if installed

Regression: `linear`, `ridge`, `lasso`, `random_forest`, `gb`, `svr_rbf`, `knn`, `mlp`, plus `xgb`, `lgbm`, `catboost` if installed

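
A small pre‑flight check can catch a model name that does not match the task before a run starts. The sketch below copies the names listed above into a lookup table; the `check_models` helper is hypothetical and not part of softauto, which may validate names in its own way.

```python
# Model names per task, copied from the lists above (boosters need their packages installed).
MODEL_NAMES = {
    "classification": ["logreg", "random_forest", "gb", "svm_rbf", "knn", "mlp",
                       "xgb", "lgbm", "catboost"],
    "regression": ["linear", "ridge", "lasso", "random_forest", "gb", "svr_rbf",
                   "knn", "mlp", "xgb", "lgbm", "catboost"],
}


def check_models(task, requested):
    """Return the requested names as a list, raising ValueError on unknown ones."""
    if task not in MODEL_NAMES:
        raise ValueError(f"unknown task: {task!r}")
    # Accept a single name or a list of names, mirroring the `model` kwarg.
    names = [requested] if isinstance(requested, str) else list(requested)
    unknown = [n for n in names if n != "auto" and n not in MODEL_NAMES[task]]
    if unknown:
        raise ValueError(f"not valid for {task}: {unknown}")
    return names


print(check_models("regression", ["ridge", "lgbm"]))
```
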
---
## License
MIT © Soft Tech Talks
Raw data
{
"_id": null,
"home_page": null,
"name": "softauto",
"maintainer": null,
"docs_url": null,
"requires_python": ">=3.9",
"maintainer_email": null,
"keywords": "automl, advice, EDA, sklearn, feature selection, pca",
"author": "Soft Tech Talks",
"author_email": null,
"download_url": "https://files.pythonhosted.org/packages/ab/4f/19ac33a0aea64c7e422c0fc8ecf9abd943dd8b36029a839fe53986f75622/softauto-0.7.0.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": "MIT",
"summary": "Advising-first AutoML: EDA, preprocessing, selection/extraction, training, metrics, plots.",
"version": "0.7.0",
"project_urls": null,
"split_keywords": [
"automl",
" advice",
" eda",
" sklearn",
" feature selection",
" pca"
],
"urls": [
{
"comment_text": null,
"digests": {
"blake2b_256": "52abeb40f22d6b1272b5ce210d456880a5fcecc902a72ccd341e09160efce711",
"md5": "ce6d18b0d526a8112a74549eb70c089b",
"sha256": "b0b0da2d35b2918edca75cddeae3314490552c43f0684187f5f48938a489380c"
},
"downloads": -1,
"filename": "softauto-0.7.0-py3-none-any.whl",
"has_sig": false,
"md5_digest": "ce6d18b0d526a8112a74549eb70c089b",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.9",
"size": 12462,
"upload_time": "2025-09-12T12:36:11",
"upload_time_iso_8601": "2025-09-12T12:36:11.892521Z",
"url": "https://files.pythonhosted.org/packages/52/ab/eb40f22d6b1272b5ce210d456880a5fcecc902a72ccd341e09160efce711/softauto-0.7.0-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": null,
"digests": {
"blake2b_256": "ab4f19ac33a0aea64c7e422c0fc8ecf9abd943dd8b36029a839fe53986f75622",
"md5": "14727b1c8c9645a637464feac3cfe3c5",
"sha256": "867db35a9bc18d97d44ce01c08ecdaeb5b0ac0c1fe14ebe55f212e272105a7e3"
},
"downloads": -1,
"filename": "softauto-0.7.0.tar.gz",
"has_sig": false,
"md5_digest": "14727b1c8c9645a637464feac3cfe3c5",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.9",
"size": 12096,
"upload_time": "2025-09-12T12:36:15",
"upload_time_iso_8601": "2025-09-12T12:36:15.916135Z",
"url": "https://files.pythonhosted.org/packages/ab/4f/19ac33a0aea64c7e422c0fc8ecf9abd943dd8b36029a839fe53986f75622/softauto-0.7.0.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2025-09-12 12:36:15",
"github": false,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"lcname": "softauto"
}