mljar-supervised

- Name: mljar-supervised
- Version: 1.1.14
- Home page: https://github.com/mljar/mljar-supervised
- Summary: Automated Machine Learning for Humans
- Upload time: 2024-11-12 09:10:30
- Author: MLJAR, Sp. z o.o.
- Requires Python: >=3.8
- License: MIT
- Keywords: automated machine learning, automl, machine learning, data science, data mining, mljar, random forest, decision tree, xgboost, lightgbm, catboost, neural network, extra trees, linear model, features selection, features engineering
- Requirements: numpy, pandas, scipy, scikit-learn, xgboost, lightgbm, catboost, joblib, tabulate, matplotlib, dtreeviz, shap, seaborn, wordcloud, category_encoders, optuna-integration, mljar-scikit-plot, markdown, typing-extensions, ipython


# MLJAR Automated Machine Learning for Humans

[![Tests status](https://github.com/mljar/mljar-supervised/actions/workflows/run-tests.yml/badge.svg)](https://github.com/mljar/mljar-supervised/actions/workflows/run-tests.yml)
[![PyPI version](https://badge.fury.io/py/mljar-supervised.svg)](https://badge.fury.io/py/mljar-supervised)
[![Anaconda-Server Badge](https://anaconda.org/conda-forge/mljar-supervised/badges/version.svg)](https://anaconda.org/conda-forge/mljar-supervised)
[![PyPI pyversions](https://img.shields.io/pypi/pyversions/mljar-supervised.svg)](https://pypi.python.org/pypi/mljar-supervised/)


[![Anaconda-Server Badge](https://anaconda.org/conda-forge/mljar-supervised/badges/platforms.svg)](https://anaconda.org/conda-forge/mljar-supervised)
[![Anaconda-Server Badge](https://anaconda.org/conda-forge/mljar-supervised/badges/license.svg)](https://anaconda.org/conda-forge/mljar-supervised)
[![Downloads](https://pepy.tech/badge/mljar-supervised)](https://pepy.tech/project/mljar-supervised)

<p align="center">
  <img 
    alt="mljar AutoML"
    src="https://raw.githubusercontent.com/mljar/mljar-examples/master/media/AutoML_white.png#gh-light-mode-only" width="50%" />  
</p>
<p align="center">
  <img 
    alt="mljar AutoML"
    src="https://raw.githubusercontent.com/mljar/mljar-examples/master/media/AutoML_black.png#gh-dark-mode-only" width="50%" />  
</p>

---

**Documentation**: <a href="https://supervised.mljar.com/" target="_blank">https://supervised.mljar.com/</a>

**Source Code**: <a href="https://github.com/mljar/mljar-supervised" target="_blank">https://github.com/mljar/mljar-supervised</a>

**Looking for commercial support**: Please contact us by [email](https://mljar.com/contact/) for details.


<p align="center">
  <img src="https://raw.githubusercontent.com/mljar/mljar-examples/master/media/pipeline_AutoML.png" width="100%" />
</p>

---

## Table of Contents

 - [Automated Machine Learning](https://github.com/mljar/mljar-supervised#automated-machine-learning)
 - [What's good in it?](https://github.com/mljar/mljar-supervised#whats-good-in-it)
 - [AutoML Web App with GUI](https://github.com/mljar/mljar-supervised#automl-web-app-with-user-interface)
 - [Automatic Documentation](https://github.com/mljar/mljar-supervised#automatic-documentation)
 - [Available Modes](https://github.com/mljar/mljar-supervised#available-modes)
 - [Fairness Aware Training](https://github.com/mljar/mljar-supervised#fairness-aware-training)
 - [Examples](https://github.com/mljar/mljar-supervised#examples)
 - [FAQ](https://github.com/mljar/mljar-supervised#faq)
 - [Documentation](https://github.com/mljar/mljar-supervised#documentation)
 - [Installation](https://github.com/mljar/mljar-supervised#installation)
 - [Demo](https://github.com/mljar/mljar-supervised#demo)
 - [Contributing](https://github.com/mljar/mljar-supervised#contributing)
 - [Cite](https://github.com/mljar/mljar-supervised#cite)
 - [License](https://github.com/mljar/mljar-supervised#license)
 - [Commercial support](https://github.com/mljar/mljar-supervised#commercial-support)
 - [MLJAR](https://github.com/mljar/mljar-supervised#mljar)
 




# Automated Machine Learning 

The `mljar-supervised` is an Automated Machine Learning Python package that works with tabular data. It is designed to save time for a data scientist. It abstracts the common way to preprocess the data, construct the machine learning models, and perform hyper-parameters tuning to find the best model :trophy:. It is not a black box: you can see exactly how the ML pipeline is constructed (with a detailed Markdown report for each ML model). 

The `mljar-supervised` will help you with:
 - explaining and understanding your data (Automatic Exploratory Data Analysis),
 - trying many different machine learning models (Algorithm Selection and Hyper-Parameters tuning),
 - creating Markdown reports from analysis with details about all models (Automatic-Documentation),
 - saving, re-running, and loading the analysis and ML models.

It has four built-in modes of work:
 - `Explain` mode, which is ideal for explaining and understanding the data, with many data explanations, like decision trees visualization, linear models coefficients display, permutation importance, and SHAP explanations of data,
 - `Perform` for building ML pipelines to use in production,
 - `Compete` mode that trains highly-tuned ML models with ensembling and stacking, intended for ML competitions,
 - `Optuna` mode, which searches for highly-tuned ML models; use it when performance matters most and computation time is not limited (available from version `0.10.0`).

Of course, you can further customize the details of each `mode` to meet your requirements.
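
For example, a minimal sketch of such customization (values are illustrative; every argument shown appears elsewhere in this README):

```python
from supervised.automl import AutoML

# Hedged sketch: start from a built-in mode and override selected details.
automl = AutoML(
    mode="Compete",                                  # built-in preset to start from
    algorithms=["Xgboost", "LightGBM", "CatBoost"],  # restrict the algorithm search
    total_time_limit=2 * 3600,                       # training budget in seconds
    eval_metric="auc",                               # optimize AUC instead of logloss
)
automl.fit(X_train, y_train)  # X_train, y_train: your tabular data
```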

## What's good in it? 

- It uses many algorithms: `Baseline`, `Linear`, `Random Forest`, `Extra Trees`, `LightGBM`, `Xgboost`, `CatBoost`, `Neural Networks`, and `Nearest Neighbors`.
- It can compute an Ensemble based on the greedy algorithm from the [Caruana paper](http://www.cs.cornell.edu/~alexn/papers/shotgun.icml04.revised.rev2.pdf).
- It can stack models to build a level 2 ensemble (available in `Compete` mode or after setting the `stack_models` parameter).
- It can perform feature preprocessing, like missing-value imputation and categorical encoding. What is more, it can also preprocess target values.
- It can perform advanced feature engineering, like [Golden Features](https://supervised.mljar.com/features/golden_features/), [Features Selection](https://supervised.mljar.com/features/features_selection/), and text and time transformations.
- It can tune hyper-parameters with a `not-so-random-search` algorithm (random-search over a defined set of values) and hill climbing to fine-tune final models.
- It can compute the `Baseline` for your data so that you will know if you need Machine Learning or not!
- It has extensive explanations. It trains simple `Decision Trees` with `max_depth <= 5`, so you can easily visualize them with the amazing [dtreeviz](https://github.com/parrt/dtreeviz) to better understand your data.
- The `mljar-supervised` uses simple linear regression and includes its coefficients in the summary report, so you can check which features are used the most in the linear model.
- It cares about the explainability of models: for every algorithm, the feature importance is computed based on permutation. Additionally, for every algorithm, the SHAP explanations are computed: feature importance, dependence plots, and decision plots (explanations can be switched off with the `explain_level` parameter; see the sketch after this list).
- There is automatic documentation for every ML experiment run with AutoML. The `mljar-supervised` creates Markdown reports from AutoML training, full of ML details, metrics, and charts. 
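
Explanations can be costly on wide datasets; a minimal sketch of turning them down with the `explain_level` parameter mentioned above (assuming, per the docs, that `0` disables explanations):

```python
from supervised.automl import AutoML

# Hedged sketch: skip explanation computation for faster training.
# Assumption: explain_level=0 disables explanations (see the docs).
automl = AutoML(mode="Perform", explain_level=0)
automl.fit(X_train, y_train)
```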

<p align="center">
  <img src="https://raw.githubusercontent.com/mljar/visual-identity/main/media/infograph.png" width="100%" />
</p>

# AutoML Web App with User Interface

We created a Web App with GUI, so you don't need to write any code 🐍. Just upload your data. Please check the Web App at [github.com/mljar/automl-app](https://github.com/mljar/automl-app). You can run this Web App locally on your computer, so your data is safe and secure :cat:

<kbd>
<img src="https://github.com/mljar/automl-app/blob/main/media/web-app.gif" alt="AutoML training in Web App" />
</kbd>

# Automatic Documentation

## The AutoML Report

The report from running AutoML contains a table with each model's score and the time needed to train it. Each model name is a link you can click to see the model's details. The performance of all ML models is presented as scatter and box plots, so you can visually inspect which algorithms perform best :trophy:.
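
If you prefer to work with the scores programmatically, they are also written to the results directory; a minimal sketch, assuming the CSV shown in the demo is saved as `leaderboard.csv` inside `results_path`:

```python
import pandas as pd

# Hedged sketch: read the AutoML scores back as a DataFrame.
# Assumptions: results_path was "AutoML_1" and the scores file
# is named leaderboard.csv (the CSV visible in the demo GIF).
leaderboard = pd.read_csv("AutoML_1/leaderboard.csv")
print(leaderboard.head())
```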

![AutoML leaderboard](https://github.com/mljar/mljar-examples/blob/master/media/automl_summary.gif)

## The `Decision Tree` Report

An example `Decision Tree` summary with tree visualization. For classification tasks, additional metrics are provided:
- confusion matrix
- threshold (optimized in the case of binary classification task)
- F1 score
- Accuracy
- Precision, Recall, MCC

![Decision Tree summary](https://github.com/mljar/mljar-examples/blob/master/media/decision_tree_summary.gif)

## The `LightGBM` Report

An example `LightGBM` summary:

![LightGBM summary](https://github.com/mljar/mljar-examples/blob/master/media/lightgbm_summary.gif)


## Available Modes

In the [docs](https://supervised.mljar.com/features/modes/) you can find details about the AutoML modes presented in the table below.

<p align="center">
  <img src="https://raw.githubusercontent.com/mljar/visual-identity/main/media/mljar_modes.png" width="100%" />
</p>

### Explain 

```py
automl = AutoML(mode="Explain")
```

It is aimed at users who want to explain and understand their data.
 - It uses a 75%/25% train/test split.
 - It uses the `Baseline`, `Linear`, `Decision Tree`, `Random Forest`, `Xgboost`, and `Neural Network` algorithms, and an ensemble.
 - It has full explanations: learning curves, importance plots, and SHAP plots.

### Perform

```py
automl = AutoML(mode="Perform")
```

It should be used when the user wants to train a model that will be used in real-life use cases.
 - It uses a 5-fold CV.
 - It uses: `Linear`, `Random Forest`, `LightGBM`, `Xgboost`, `CatBoost`, and `Neural Network`. It uses ensembling. 
 - It has learning curves and importance plots in reports.

### Compete

```py
automl = AutoML(mode="Compete")
```

It should be used for machine learning competitions.
 - It adapts the validation strategy depending on dataset size and `total_time_limit` (see the sketch after this list). It can be a train/test split (80/20), 5-fold CV, or 10-fold CV.
 - It uses: `Linear`, `Decision Tree`, `Random Forest`, `Extra Trees`, `LightGBM`, `Xgboost`, `CatBoost`, `Neural Network`, and `Nearest Neighbors`. It uses ensembling and **stacking**.
 - It has only learning curves in the reports.
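
Because the validation strategy adapts to the time budget, it is common to set `total_time_limit` explicitly in this mode. A minimal sketch (the 4-hour budget is illustrative):

```python
from supervised.automl import AutoML

# Hedged sketch: Compete mode with an explicit time budget.
# total_time_limit is in seconds; the 4-hour value is illustrative.
automl = AutoML(mode="Compete", total_time_limit=4 * 3600)
automl.fit(X_train, y_train)
```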

### Optuna

```py
automl = AutoML(mode="Optuna", optuna_time_budget=3600)
```

It should be used when the performance is the most important and time is not limited.
- It uses a 10-fold CV.
- It uses: `Random Forest`, `Extra Trees`, `LightGBM`, `Xgboost`, and `CatBoost`. These algorithms are tuned with the `Optuna` framework for `optuna_time_budget` seconds each. Algorithms are tuned on the original data, without advanced feature engineering.
- It then applies advanced feature engineering, stacking, and ensembling. The hyperparameters found for the original data are reused in those steps.
- It produces learning curves in the reports.



## How to save and load AutoML?

All models in the AutoML are saved and loaded automatically. No need to call `save()` or `load()`.

### Example:

#### Train AutoML

```python
automl = AutoML(results_path="AutoML_classifier")
automl.fit(X, y)
```

You will have all models saved in the `AutoML_classifier` directory. Each model will have a separate directory with the `README.md` file with all details from the training.

#### Compute predictions
```python
automl = AutoML(results_path="AutoML_classifier")
automl.predict(X)
```

The AutoML automatically loads models from the `results_path` directory. If you call `fit()` on an already trained AutoML, you will get a warning message that AutoML is already fitted.


### Why do you automatically save all models?

All models are automatically saved so that training can be restored after an interruption. For example, you are training AutoML for 48 hours, and after 47 hours there is some unexpected interruption. With MLJAR AutoML you just call the same training code after the interruption, and AutoML reloads the already trained models and finishes the training.
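
A minimal sketch of such a resumed run (the directory name and time budget are illustrative):

```python
from supervised.automl import AutoML

# Hedged sketch: after an interruption, re-run the exact same code with the
# same results_path; AutoML reloads the finished models and continues training.
automl = AutoML(results_path="AutoML_48h", mode="Compete", total_time_limit=48 * 3600)
automl.fit(X, y)  # skips models that were already trained before the interruption
```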

## Supported evaluation metrics (`eval_metric` argument in `AutoML()`)

- for binary classification: `logloss`, `auc`, `f1`, `average_precision`, `accuracy` - default is `logloss`
- for multiclass classification: `logloss`, `f1`, `accuracy` - default is `logloss`
- for regression: `rmse`, `mse`, `mae`, `r2`, `mape`, `spearman`, `pearson` - default is `rmse`

If you don't find the `eval_metric` that you need, please open a new issue. We will add it.
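
Setting the metric is a single constructor argument; a minimal sketch (the task and metric choice are illustrative):

```python
from supervised.automl import AutoML

# Hedged sketch: optimize MAE for a regression task instead of the default rmse.
automl = AutoML(ml_task="regression", eval_metric="mae")
automl.fit(X_train, y_train)
```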


## Fairness Aware Training

Starting from version `1.0.0`, AutoML can optimize the Machine Learning pipeline with sensitive features. The following fairness-related arguments are available in the AutoML constructor:
 - `fairness_metric` - metric used to decide whether a model is fair,
 - `fairness_threshold` - threshold used in the fairness decision,
 - `privileged_groups` - privileged groups used in fairness metrics computation,
 - `underprivileged_groups` - underprivileged groups used in fairness metrics computation.

The `fit()` method accepts `sensitive_features`. When sensitive features are passed to AutoML, the best model will be selected among fair models only. In the AutoML reports, additional information about fairness metrics will be added. The MLJAR AutoML supports two methods for bias mitigation:
 - Sample Weighting - assigns weights to samples so that they are treated equally,
 - Smart Grid Search - similar to Sample Weighting, but different weights are checked to optimize the fairness metric.

The fair ML building can be used with all algorithms, including `Ensemble` and `Stacked Ensemble`. We support three Machine Learning tasks:
 - binary classification,
 - multiclass classification,
 - regression.

Example code:


```python
from sklearn.model_selection import train_test_split
from sklearn.datasets import fetch_openml
from supervised.automl import AutoML

data = fetch_openml(data_id=1590, as_frame=True)
X = data.data
y = (data.target == ">50K") * 1
sensitive_features = X[["sex"]]

X_train, X_test, y_train, y_test, S_train, S_test = train_test_split(
    X, y, sensitive_features, stratify=y, test_size=0.75, random_state=42
)

automl = AutoML(
    algorithms=[
        "Xgboost"
    ],
    train_ensemble=False,
    fairness_metric="demographic_parity_ratio",  
    fairness_threshold=0.8,
    privileged_groups = [{"sex": "Male"}],
    underprivileged_groups = [{"sex": "Female"}],
)

automl.fit(X_train, y_train, sensitive_features=S_train)
```

You can read more about fairness-aware AutoML training in our article: https://mljar.com/blog/fairness-machine-learning/

![Fairness aware AutoML](https://raw.githubusercontent.com/mljar/visual-identity/main/automl/fairness-automl.gif)



# Examples

## :point_right: Binary Classification Example

There is a simple interface available with `fit` and `predict` methods.

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from supervised.automl import AutoML

df = pd.read_csv(
    "https://raw.githubusercontent.com/pplonski/datasets-for-start/master/adult/data.csv",
    skipinitialspace=True,
)
X_train, X_test, y_train, y_test = train_test_split(
    df[df.columns[:-1]], df["income"], test_size=0.25
)

automl = AutoML()
automl.fit(X_train, y_train)

predictions = automl.predict(X_test)
```

AutoML `fit` will print:
```py
Create directory AutoML_1
AutoML task to be solved: binary_classification
AutoML will use algorithms: ['Baseline', 'Linear', 'Decision Tree', 'Random Forest', 'Xgboost', 'Neural Network']
AutoML will optimize for metric: logloss
1_Baseline final logloss 0.5519845471086654 time 0.08 seconds
2_DecisionTree final logloss 0.3655910192804364 time 10.28 seconds
3_Linear final logloss 0.38139916864708445 time 3.19 seconds
4_Default_RandomForest final logloss 0.2975204390214936 time 79.19 seconds
5_Default_Xgboost final logloss 0.2731086827200411 time 5.17 seconds
6_Default_NeuralNetwork final logloss 0.319812276905242 time 21.19 seconds
Ensemble final logloss 0.2731086821194617 time 1.43 seconds
```

- the AutoML results in a [Markdown report](https://github.com/mljar/mljar-examples/tree/master/Income_classification/AutoML_1#automl-leaderboard)
- the Xgboost [Markdown report](https://github.com/mljar/mljar-examples/blob/master/Income_classification/AutoML_1/5_Default_Xgboost/README.md) - please take a look at the amazing dependence plots produced by the SHAP package :sparkling_heart:
- the Decision Tree [Markdown report](https://github.com/mljar/mljar-examples/blob/master/Income_classification/AutoML_1/2_DecisionTree/README.md) - please take a look at the beautiful tree visualization :sparkles:
- the Logistic Regression [Markdown report](https://github.com/mljar/mljar-examples/blob/master/Income_classification/AutoML_1/3_Linear/README.md) - please take a look at the coefficients table; you can compare the SHAP plots across Xgboost, Decision Tree, and Logistic Regression :coffee:


## :point_right: Multi-Class Classification Example

Example code for classification on the optical recognition of handwritten digits dataset. Running this code should take less than 30 minutes and result in a test accuracy of about 98%.

```python
import pandas as pd 
# scikit-learn utilities
from sklearn.datasets import load_digits
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
# mljar-supervised package
from supervised.automl import AutoML

# load the data
digits = load_digits()
X_train, X_test, y_train, y_test = train_test_split(
    pd.DataFrame(digits.data), digits.target, stratify=digits.target, test_size=0.25,
    random_state=123
)

# train models with AutoML
automl = AutoML(mode="Perform")
automl.fit(X_train, y_train)

# compute the accuracy on test data
predictions = automl.predict_all(X_test)
print(predictions.head())
print("Test accuracy:", accuracy_score(y_test, predictions["label"].astype(int)))
```

## :point_right: Regression Example

A regression example on the `California Housing` house prices data.

```python
import pandas as pd
from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error
from supervised.automl import AutoML # mljar-supervised

# Load the data
housing = fetch_california_housing()
X_train, X_test, y_train, y_test = train_test_split(
    pd.DataFrame(housing.data, columns=housing.feature_names),
    housing.target,
    test_size=0.25,
    random_state=123,
)

# train models with AutoML
automl = AutoML(mode="Explain")
automl.fit(X_train, y_train)

# compute the MSE on test data
predictions = automl.predict(X_test)
print("Test MSE:", mean_squared_error(y_test, predictions))
```

## :point_right: More Examples

- [**Income classification**](https://github.com/mljar/mljar-examples/tree/master/Income_classification) - a binary classification task on census data
- [**Iris classification**](https://github.com/mljar/mljar-examples/tree/master/Iris_classification) - a multiclass classification task on Iris flowers data
- [**House price regression**](https://github.com/mljar/mljar-examples/tree/master/House_price_regression) - a regression task on Boston houses data

# FAQ

<details><summary>What method is used for hyperparameters optimization?</summary>

- For the `Explain`, `Perform`, and `Compete` modes, a random search combined with hill climbing is used. In this approach, all checked models are saved and used for building the Ensemble.
- For the `Optuna` mode, the Optuna framework is used with the TPE sampler. Models checked during the Optuna hyperparameters search are not saved; only the best model (the final model from tuning) is saved. You can check the hyperparameters tried by Optuna in the study files in the `optuna` directory inside your AutoML `results_path`.

</details>

<details><summary>How to save and load AutoML?</summary>

Saving and loading of AutoML models is automatic. All models created during AutoML training are saved in the directory set in `results_path` (an argument of the `AutoML()` constructor). If `results_path` is not set, the directory is created based on the naming convention `AutoML_{number}`, where `number` is the first free value from 1 to 1000.

Example save and load:

```python
automl = AutoML(results_path='AutoML_1')
automl.fit(X, y)
```

All models from the AutoML run are saved in the `AutoML_1` directory.

To load models:

```python
automl = AutoML(results_path='AutoML_1')
automl.predict(X)
```

</details>

<details><summary>How to set ML task (select between classification or regression)?</summary>

The MLJAR AutoML can work with:
- binary classification
- multi-class classification
- regression

The ML task detection is automatic, based on target values. If you want to manually force AutoML to use a specific ML task, set the `ml_task` parameter. It can be set to `'binary_classification'`, `'multiclass_classification'`, or `'regression'`.

Example:
```python
automl = AutoML(ml_task='regression')
automl.fit(X, y)
```
In the above example, a regression model will be fitted.

</details>

<details><summary>How to reuse Optuna hyperparameters?</summary>
  
  You can reuse Optuna hyperparameters that were found in another AutoML training. Pass them in the `optuna_init_params` argument. All hyperparameters found during Optuna tuning are saved in the `optuna/optuna.json` file (inside the `results_path` directory).
  
 Example:
 
 ```python
 import json

 # load hyperparameters saved by a previous Optuna-mode AutoML training
 with open('previous_AutoML_training/optuna/optuna.json') as fin:
     optuna_init = json.load(fin)

 automl = AutoML(
     mode='Optuna',
     optuna_init_params=optuna_init
 )
 automl.fit(X, y)
 ```
  
 When reusing Optuna hyperparameters, Optuna tuning is simply skipped. The model will be trained with the hyperparameters set in `optuna_init_params`. Right now there is no option to continue Optuna tuning from seed parameters.
  
  
</details>


<details><summary>How to know the order of classes for binary or multiclass problem when using predict_proba?</summary>

To get predicted probabilities with class label information, please use the `predict_all()` method. It returns a pandas DataFrame with class names in the columns. The order of the predicted columns is the same in the `predict_proba()` and `predict_all()` methods. The `predict_all()` method additionally has a column with the predicted class label.
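
A minimal sketch of recovering the class order (assuming, as in the multi-class example above, that the predicted label lives in the `label` column):

```python
# Hedged sketch: match predict_proba() columns to class names.
proba = automl.predict_proba(X_test)    # array with one column per class
all_preds = automl.predict_all(X_test)  # DataFrame: class columns plus "label"
class_order = [c for c in all_preds.columns if c != "label"]
print(class_order)  # class names, in the same order as predict_proba() columns
```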

</details>

# Documentation  

For details please check [mljar-supervised docs](https://supervised.mljar.com).

# Installation  

From the PyPI repository:

```
pip install mljar-supervised
```

To install this package with conda run:
```
conda install -c conda-forge mljar-supervised
```

From source code:

```
git clone https://github.com/mljar/mljar-supervised.git
cd mljar-supervised
python setup.py install
```

Installation for development:
```
git clone https://github.com/mljar/mljar-supervised.git
cd mljar-supervised
virtualenv venv --python=python3.8
source venv/bin/activate
pip install -r requirements.txt
pip install -r requirements_dev.txt
```

Running in Docker:
```
FROM python:3.8-slim-buster
RUN apt-get update && apt-get -y upgrade
RUN apt-get install -y build-essential python3-pip python3-dev
RUN pip3 -q install pip --upgrade
RUN pip3 install mljar-supervised jupyter
CMD ["jupyter", "notebook", "--port=8888", "--no-browser", "--ip=0.0.0.0", "--allow-root"]
```

Install from GitHub with pip:
```
pip install -q -U git+https://github.com/mljar/mljar-supervised.git@master
```
# Demo

In the demo GIF below you will see:
- MLJAR AutoML trained in a Jupyter Notebook on the Titanic dataset
- an overview of the created files
- a showcase of selected plots created during AutoML training
- an algorithm comparison report along with its plots
- an example README file and CSV file with results

![](https://github.com/mljar/mljar-examples/raw/master/media/mljar_files.gif)

# Contributing

To get started, take a look at our [Contribution Guide](https://supervised.mljar.com/contributing/) for information about our process and where you can fit in!

### Contributors
<a href="https://github.com/mljar/mljar-supervised/graphs/contributors">
  <img src="https://contributors-img.web.app/image?repo=mljar/mljar-supervised" />
</a>

# Cite

Would you like to cite MLJAR? Great! :)

You can cite MLJAR as follows:

```
@misc{mljar,
  author    = {Aleksandra P\l{}o\'{n}ska and Piotr P\l{}o\'{n}ski},
  year      = {2021},
  publisher = {MLJAR},
  address   = {\L{}apy, Poland},
  title     = {MLJAR: State-of-the-art Automated Machine Learning Framework for Tabular Data.  Version 0.10.3},
  url       = {https://github.com/mljar/mljar-supervised}
}
```

We would love to hear how you have used MLJAR AutoML in your projects. 
Please feel free to let us know at 
![image](https://user-images.githubusercontent.com/6959032/118103228-f5ea9a00-b3d9-11eb-87ed-8cfb1f873f91.png)


# License  

The `mljar-supervised` is provided with [MIT license](https://github.com/mljar/mljar-supervised/blob/master/LICENSE).

# Commercial support

Looking for commercial support? Do you need new feature implementation? Please contact us by [email](https://mljar.com/contact/) for details.

# MLJAR 
<p align="center">
  <img src="https://github.com/mljar/mljar-examples/blob/master/media/large_logo.png" width="314" />
</p>

The `mljar-supervised` is an open-source project created by [MLJAR](https://mljar.com). We care about ease of use in Machine Learning. 
[mljar.com](https://mljar.com) provides a beautiful and simple user interface for building machine learning models.

            
