Name: poniard
Version: 0.5.0
Summary: Streamline scikit-learn model comparison
Home page: https://github.com/rxavier/poniard
Author: Rafael Xavier
License: MIT
Requires Python: >=3.7
Keywords: machine learning, scikit-learn
Upload time: 2022-12-21 13:57:34
Requirements: joblib, numpy, pandas, plotly, python-dateutil, pytz, scikit-learn, scipy, six, tenacity, threadpoolctl, tqdm, xgboost

Poniard
================

<!-- WARNING: THIS FILE WAS AUTOGENERATED! DO NOT EDIT! -->
<p align="center">
<img src="https://raw.githubusercontent.com/rxavier/poniard/main/logo.png" alt="Poniard logo" title="Poniard" width="50%"/>
</p>

## Introduction

> A poniard /ˈpɒnjərd/ or poignard (Fr.) is a long, lightweight
> thrusting knife ([Wikipedia](https://en.wikipedia.org/wiki/Poignard)).

Poniard is a scikit-learn companion library that streamlines the process
of fitting different machine learning models and comparing them.

It can be used to provide quick answers to questions like these:

* What is the reasonable range of scores for this task?
* Is a simple and explainable linear model enough, or should I work with forests and gradient boosters?
* Are the features good enough as is, or should I work on feature engineering?
* How much can hyperparameter tuning improve metrics?
* Do I need to work on a custom preprocessing strategy?

This is not meant to be an end-to-end solution, and you should definitely
keep working on your models after you are done with Poniard.

The core functionality has been tested to work on Python 3.7 through
3.10 on Linux systems, and from 3.8 to 3.10 on macOS.

## Installation

Stable version:

``` bash
pip install poniard
```

Dev version with most up to date changes:

``` bash
pip install git+https://github.com/rxavier/poniard.git@develop#egg=poniard
```

## Documentation

Check the full [Quarto docs](https://rxavier.github.io/poniard),
including guides and API reference.

## Usage/features

### Basics

The API was designed with tabular tasks in mind, but it should also work
with time series tasks provided an appropriate cross validation strategy
is used (don’t shuffle!).

The usual Poniard flow is:

1. Define some estimators.
2. Define some metrics.
3. Define a cross validation strategy.
4. Fit everything.
5. Print the results.

Poniard provides sane defaults for 1, 2 and 3, so in most cases you can
just do…

``` python
from poniard import PoniardRegressor
from sklearn.datasets import load_diabetes
```

``` python
X, y = load_diabetes(return_X_y=True, as_frame=True)
pnd = PoniardRegressor(random_state=0)
pnd.setup(X, y)
pnd.fit()
```

                         <h2>Setup info</h2>
                         <h3>Target</h3>
                             <p><b>Type:</b> continuous</p>
                             <p><b>Shape:</b> (442,)</p>
                             <p><b>Unique values:</b> 214</p>
                             <h3>Metrics</h3>
                             <b>Main metric:</b> neg_mean_squared_error

 <h3>Feature type inference</h3>
                                <p><b>Minimum unique values to consider a number-like feature numeric:</b> 44</p>
                                <p><b>Minimum unique values to consider a categorical feature high cardinality:</b> 20</p>
                                <p><b>Inferred feature types:</b></p>
                                <table border="1" class="dataframe">
  <thead>
    <tr style="text-align: right;">
      <th></th>
      <th>numeric</th>
      <th>categorical_high</th>
      <th>categorical_low</th>
      <th>datetime</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <th>0</th>
      <td>age</td>
      <td></td>
      <td>sex</td>
      <td></td>
    </tr>
    <tr>
      <th>1</th>
      <td>bmi</td>
      <td></td>
      <td></td>
      <td></td>
    </tr>
    <tr>
      <th>2</th>
      <td>bp</td>
      <td></td>
      <td></td>
      <td></td>
    </tr>
    <tr>
      <th>3</th>
      <td>s1</td>
      <td></td>
      <td></td>
      <td></td>
    </tr>
    <tr>
      <th>4</th>
      <td>s2</td>
      <td></td>
      <td></td>
      <td></td>
    </tr>
    <tr>
      <th>5</th>
      <td>s3</td>
      <td></td>
      <td></td>
      <td></td>
    </tr>
    <tr>
      <th>6</th>
      <td>s4</td>
      <td></td>
      <td></td>
      <td></td>
    </tr>
    <tr>
      <th>7</th>
      <td>s5</td>
      <td></td>
      <td></td>
      <td></td>
    </tr>
    <tr>
      <th>8</th>
      <td>s6</td>
      <td></td>
      <td></td>
      <td></td>
    </tr>
  </tbody>
</table>

      0%|          | 0/9 [00:00<?, ?it/s]

    PoniardRegressor(random_state=0)

… and get a nice table showing the average of each metric across all
folds for every model, including fit and score times (thanks to
scikit-learn’s `cross_validate` function!)

``` python
pnd.get_results()
```

<table border="1" class="dataframe">
  <thead>
    <tr style="text-align: right;">
      <th></th>
      <th>test_neg_mean_squared_error</th>
      <th>test_neg_mean_absolute_percentage_error</th>
      <th>test_neg_median_absolute_error</th>
      <th>test_r2</th>
      <th>fit_time</th>
      <th>score_time</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <th>LinearRegression</th>
      <td>-2977.598515</td>
      <td>-0.396566</td>
      <td>-39.009146</td>
      <td>0.489155</td>
      <td>0.005265</td>
      <td>0.001960</td>
    </tr>
    <tr>
      <th>ElasticNet</th>
      <td>-3159.017211</td>
      <td>-0.422912</td>
      <td>-42.619546</td>
      <td>0.460740</td>
      <td>0.003509</td>
      <td>0.001755</td>
    </tr>
    <tr>
      <th>RandomForestRegressor</th>
      <td>-3431.823331</td>
      <td>-0.419956</td>
      <td>-42.203000</td>
      <td>0.414595</td>
      <td>0.101435</td>
      <td>0.004821</td>
    </tr>
    <tr>
      <th>HistGradientBoostingRegressor</th>
      <td>-3544.069433</td>
      <td>-0.407417</td>
      <td>-40.396390</td>
      <td>0.391633</td>
      <td>0.334695</td>
      <td>0.009266</td>
    </tr>
    <tr>
      <th>KNeighborsRegressor</th>
      <td>-3615.195398</td>
      <td>-0.418674</td>
      <td>-38.980000</td>
      <td>0.379625</td>
      <td>0.003038</td>
      <td>0.002083</td>
    </tr>
    <tr>
      <th>XGBRegressor</th>
      <td>-3923.488860</td>
      <td>-0.426471</td>
      <td>-39.031309</td>
      <td>0.329961</td>
      <td>0.055696</td>
      <td>0.002855</td>
    </tr>
    <tr>
      <th>LinearSVR</th>
      <td>-4268.314411</td>
      <td>-0.374296</td>
      <td>-43.388592</td>
      <td>0.271443</td>
      <td>0.003470</td>
      <td>0.001721</td>
    </tr>
    <tr>
      <th>DummyRegressor</th>
      <td>-5934.577616</td>
      <td>-0.621540</td>
      <td>-61.775921</td>
      <td>-0.000797</td>
      <td>0.003010</td>
      <td>0.001627</td>
    </tr>
    <tr>
      <th>DecisionTreeRegressor</th>
      <td>-6728.423034</td>
      <td>-0.591906</td>
      <td>-59.700000</td>
      <td>-0.145460</td>
      <td>0.004179</td>
      <td>0.001667</td>
    </tr>
  </tbody>
</table>

Alternatively, you can also get a nice plot of your different metrics by
using the `PoniardBaseEstimator.plot.metrics` method.

### Type inference

Poniard uses some basic heuristics to infer the data types.

Float and integer columns are defined as numeric if the number of unique
values is greater than indicated by the `categorical_threshold`
parameter.

String/object/categorical columns are assumed to be categorical.

Datetime features are processed separately with a custom encoder.

For categorical features, high and low cardinality is defined by the
`cardinality_threshold` parameter. Only low cardinality categorical
features are one-hot encoded.
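The heuristic can be sketched roughly as follows. This is a simplified illustration, not Poniard’s actual implementation: the function name is made up, and the threshold defaults here are fixed for clarity, whereas Poniard may derive them from the data (the diabetes example above showed a computed numeric threshold of 44).

``` python
import pandas as pd

def infer_feature_types(df, categorical_threshold=10, cardinality_threshold=20):
    """Simplified sketch of Poniard-style feature type inference."""
    types = {"numeric": [], "categorical_high": [], "categorical_low": [], "datetime": []}
    for col in df.columns:
        s = df[col]
        n_unique = s.nunique()
        if pd.api.types.is_datetime64_any_dtype(s):
            # Datetimes get their own bucket (custom encoder downstream)
            types["datetime"].append(col)
        elif pd.api.types.is_numeric_dtype(s) and n_unique > categorical_threshold:
            # Number-like columns with many unique values are numeric
            types["numeric"].append(col)
        elif n_unique > cardinality_threshold:
            # Everything else is categorical, split by cardinality
            types["categorical_high"].append(col)
        else:
            types["categorical_low"].append(col)
    return types

df = pd.DataFrame({"age": range(100), "sex": [0, 1] * 50, "city": ["a", "b", "c", "d"] * 25})
print(infer_feature_types(df))
```

Note how `sex`, despite being numeric, ends up categorical because it only has two unique values.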

### Ensembles

Poniard makes it easy to combine various estimators in stacking or
voting ensembles. The base estimators can be selected according to their
performance (top-n) or chosen by name.

Poniard also reports how similar the predictions of the estimators are,
so ensembles with different base estimators can be built. A basic
correlation table of the cross-validated predictions is built for
regression tasks, while [Cramér’s
V](https://en.wikipedia.org/wiki/Cram%C3%A9r%27s_V) is used for
classification.

By default, it computes this similarity on prediction errors instead of
the actual predictions; this helps in building ensembles that combine
well-scoring estimators with uncorrelated errors, which in principle
should lead to a “wisdom of crowds” kind of situation.
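The underlying idea can be sketched in plain scikit-learn (this is an illustration of the concept, not Poniard’s API): compute cross-validated prediction errors per estimator and look at their pairwise correlation.

``` python
import pandas as pd
from sklearn.datasets import load_diabetes
from sklearn.dummy import DummyRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_predict
from sklearn.tree import DecisionTreeRegressor

X, y = load_diabetes(return_X_y=True)

estimators = {
    "LinearRegression": LinearRegression(),
    "DecisionTreeRegressor": DecisionTreeRegressor(random_state=0),
    "DummyRegressor": DummyRegressor(),
}

# Cross-validated prediction errors for each estimator
errors = pd.DataFrame(
    {name: y - cross_val_predict(est, X, y, cv=5) for name, est in estimators.items()}
)

# Low off-diagonal correlations suggest complementary ensemble members
print(errors.corr().round(2))
```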

### Hyperparameter optimization

The
[`PoniardBaseEstimator.tune_estimator`](https://rxavier.github.io/poniard/estimators.core.html#poniardbaseestimator.tune_estimator)
method can be used to optimize the hyperparameters of a given estimator,
either from a user-supplied parameter grid or from the built-in grids
available for the default estimators. The tuned estimator is added to
the list of estimators and will be scored the next time
[`PoniardBaseEstimator.fit`](https://rxavier.github.io/poniard/estimators.core.html#poniardbaseestimator.fit)
is called.
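Conceptually, this is standard scikit-learn hyperparameter search. The sketch below shows the equivalent operation in plain scikit-learn (not Poniard’s API; the grid is arbitrary):

``` python
from sklearn.datasets import load_diabetes
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV

X, y = load_diabetes(return_X_y=True)

# Cross-validated search over a small, illustrative parameter grid
search = GridSearchCV(
    RandomForestRegressor(n_estimators=50, random_state=0),
    param_grid={"max_depth": [2, 5]},
    scoring="neg_mean_squared_error",
    cv=5,
)
search.fit(X, y)
print(search.best_params_)
```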

### Plotting

The `plot` accessor provides several plotting methods based on the
attached Poniard estimator instance. These Plotly plots are based on a
default template, but can be modified by passing a different
[`PoniardPlotFactory`](https://rxavier.github.io/poniard/plot.plot_factory.html#poniardplotfactory)
to the Poniard `plot_options` argument.

### Plugin system

The `plugins` argument in Poniard estimators takes a plugin or list of
plugins that subclass
[`BasePlugin`](https://rxavier.github.io/poniard/plugins.core.html#baseplugin).
These plugins have access to the Poniard estimator instance and hook
into different stages of the process, for example on setup start, on
fit end, or when an estimator is removed.

This makes it easy for third parties to extend Poniard’s functionality.
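A plugin is essentially a class with hook methods that Poniard calls at the appropriate time. The skeleton below uses a stand-in base class and hypothetical hook names to show the shape of the pattern; the real base class and hook names live in `poniard.plugins.core.BasePlugin`.

``` python
class BasePlugin:
    """Stand-in for poniard.plugins.core.BasePlugin (hook names are assumptions)."""
    def on_setup_start(self): ...
    def on_fit_end(self): ...

class LoggingPlugin(BasePlugin):
    """Toy plugin that records which hooks fired."""
    def __init__(self):
        self.events = []

    def on_setup_start(self):
        self.events.append("setup_start")

    def on_fit_end(self):
        self.events.append("fit_end")

# Poniard would call these hooks itself; here we trigger them manually
plugin = LoggingPlugin()
plugin.on_setup_start()
plugin.on_fit_end()
print(plugin.events)
```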

Two plugins are baked into Poniard:

1. Weights and Biases: logs your data and plots, runs wandb’s
   scikit-learn analysis, saves model artifacts, etc.
2. Pandas Profiling: generates an HTML report of the features and
   target. If the Weights and Biases plugin is present, this report is
   also logged to the wandb run.

The requirements for these plugins are not included in the base Poniard
dependencies, so you can safely ignore them if you don’t intend to use
them.

## Design philosophy

### Not another dependency

We try very hard to avoid cluttering the environment with stuff you
won’t use outside of this library. Poniard’s dependencies are:

1.  scikit-learn (duh)
2.  pandas
3.  XGBoost
4.  Plotly
5.  tqdm
6.  That’s it!

Apart from `tqdm` and possibly `Plotly`, all dependencies most likely
were going to be installed anyway, so Poniard’s added footprint should
be small.

### We don’t do that here (AutoML)

Poniard tries not to take control away from the user. As such, it is not
designed to perform 2 hours of feature engineering and selection, try
every model under the sun together with endless ensembles and select the
top performing model according to some metric.

Instead, it strives to abstract away some of the boilerplate code needed
to fit and compare a number of models and allows the user to decide what
to do with the results.

Poniard can be your first stab at a prediction problem, but it
definitely shouldn’t be your last one.

### Opinionated with a few exceptions

While some parameters can be modified to control how variable type
inference and preprocessing are performed, the API is designed to
prevent parameter proliferation.

### Cross validate all the things

Everything in Poniard is run with cross validation by default, and in
fact no relevant functionality can be used without cross validation.

### Use baselines

A dummy estimator is always included in model comparisons so you can
gauge whether your model is better than a dumb strategy.

### Fast TTFM (time to first model)

Preprocessing tries to ensure that your models run successfully without
significant data munging. By default, Poniard imputes missing data and
one-hot encodes or target encodes (depending on cardinality) inferred
categorical variables, which in most cases is enough for scikit-learn
algorithms to fit without complaints. Additionally, it scales numeric
data and drops features with a single unique value.
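A rough scikit-learn equivalent of that default preprocessing is sketched below. The column names and imputation strategies are illustrative, not Poniard’s actual choices, and target encoding of high-cardinality features and single-value dropping are omitted.

``` python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Numeric columns: impute missing values, then scale
numeric = Pipeline([
    ("impute", SimpleImputer(strategy="mean")),
    ("scale", StandardScaler()),
])
# Low-cardinality categoricals: impute, then one-hot encode
categorical_low = Pipeline([
    ("impute", SimpleImputer(strategy="most_frequent")),
    ("onehot", OneHotEncoder(handle_unknown="ignore")),
])
preprocessor = ColumnTransformer([
    ("num", numeric, ["age", "bmi"]),
    ("cat", categorical_low, ["sex"]),
])

X = pd.DataFrame({
    "age": [20.0, np.nan, 40.0],
    "bmi": [21.0, 25.0, np.nan],
    "sex": ["m", "f", np.nan],
})
Xt = preprocessor.fit_transform(X)
print(Xt.shape)  # 2 scaled numeric columns + 2 one-hot columns
```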

## Similar projects

Poniard is not a groundbreaking idea, and a number of libraries follow a
similar approach.

**[ATOM](https://github.com/tvdboom/ATOM)** is perhaps the most similar
library to Poniard, albeit with a different approach to the API.

**[LazyPredict](https://github.com/shankarpandala/lazypredict)** is
similar in that it runs multiple estimators and reports results for
various metrics. Unlike Poniard, it tries most scikit-learn estimators
by default and does not use cross validation.

**[PyCaret](https://github.com/pycaret/pycaret)** is a whole other beast
that includes model explainability, deployment, plotting, NLP, anomaly
detection, etc., which leads to a list of dependencies several times
larger than Poniard’s, and a more complicated API.



            
