mloptimizer

- **Name**: mloptimizer
- **Version**: 0.8.5
- **Home page**: https://github.com/Caparrini/mloptimizer
- **Summary**: mloptimizer is a Python library for optimizing hyperparameters of machine learning algorithms using genetic algorithms.
- **Upload time**: 2024-04-06 16:14:14
- **Author**: Antonio Caparrini López, Javier Arroyo Gallardo
- **Requires Python**: <3.12,>=3.9
- **Keywords**: xgboost, genetic, deap
- **Requirements**: catboost, deap, joblib, pandas, python-dateutil, pytz, scikit-learn, scipy, seaborn, six, xgboost, plotly, kaleido, tqdm
            ![mloptimizer_banner](https://raw.githubusercontent.com/Caparrini/mloptimizer-static/main/logos/mloptimizer_banner_readme.png)

[![Documentation Status](https://readthedocs.org/projects/mloptimizer/badge/?version=master)](https://mloptimizer.readthedocs.io/en/master/?badge=master)
[![PyPI version](https://badge.fury.io/py/mloptimizer.svg)](https://badge.fury.io/py/mloptimizer)
[![PyPI pyversions](https://img.shields.io/pypi/pyversions/mloptimizer.svg)](https://pypi.python.org/pypi/mloptimizer/)
[![Tests](https://github.com/Caparrini/mloptimizer/actions/workflows/CI.yml/badge.svg)](https://github.com/Caparrini/mloptimizer/actions/workflows/CI.yml)
[![Coverage Status](http://codecov.io/github/Caparrini/mloptimizer/coverage.svg?branch=master)](https://app.codecov.io/gh/Caparrini/mloptimizer)


**mloptimizer** is a Python library for optimizing hyperparameters of machine learning algorithms using genetic algorithms. 
With mloptimizer, you can find the optimal set of hyperparameters for a given machine learning model and dataset, which can significantly improve the performance of the model. 
The library supports several popular machine learning algorithms, including decision trees, random forests, and gradient boosting classifiers. 
The genetic algorithm used in mloptimizer provides an efficient and flexible approach to searching large hyperparameter spaces.
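
To illustrate the idea (this is a toy sketch, not mloptimizer's actual implementation): a genetic algorithm treats each candidate hyperparameter set as an individual, scores it with a fitness function, and evolves the population through selection, crossover, and mutation. The search space, fitness function, and helper names below are all illustrative.

```python
import random

# Illustrative search space: two decision-tree-style hyperparameters.
SPACE = {
    "max_depth": list(range(1, 11)),
    "min_samples_split": list(range(2, 21)),
}

def fitness(ind):
    # Stand-in for a cross-validated score: peaks at max_depth=5,
    # min_samples_split=10. In practice this would train and score a model.
    return -abs(ind["max_depth"] - 5) - 0.1 * abs(ind["min_samples_split"] - 10)

def random_individual(rng):
    return {k: rng.choice(v) for k, v in SPACE.items()}

def crossover(a, b, rng):
    # Uniform crossover: each hyperparameter comes from either parent.
    return {k: rng.choice((a[k], b[k])) for k in SPACE}

def mutate(ind, rng, rate=0.2):
    # With probability `rate`, resample a hyperparameter from its range.
    return {k: (rng.choice(SPACE[k]) if rng.random() < rate else v)
            for k, v in ind.items()}

def evolve(population_size=10, generations=10, seed=0):
    rng = random.Random(seed)
    pop = [random_individual(rng) for _ in range(population_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        parents = pop[: population_size // 2]  # truncation selection (elitist)
        children = [
            mutate(crossover(rng.choice(parents), rng.choice(parents), rng), rng)
            for _ in range(population_size - len(parents))
        ]
        pop = parents + children
    return max(pop, key=fitness)

best = evolve()
```

Because the top half of each generation survives unchanged, the best fitness found never decreases across generations.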

## Features
- Easy to use
- DEAP-based genetic algorithm ready to use with several machine learning algorithms
- Adaptable to use with any machine learning algorithm that complies with the Scikit-Learn API
- Default hyperparameter ranges
- Default score functions for evaluating the performance of the model
- Reproducibility of results

## Advanced Features
- Extensible with more machine learning algorithms that comply with the Scikit-Learn API
- Customizable hyperparameter ranges
- Customizable score functions
- Optional mlflow compatibility for tracking the optimization process

## Installation

It is recommended to create a virtual environment using the `venv` module. 
To learn more about `venv`, see the official Python documentation at 
https://docs.python.org/3/library/venv.html.

```bash
# Create the virtual environment
python -m venv myenv
# Activate the virtual environment (on Windows: myenv\Scripts\activate)
source myenv/bin/activate
```

To install `mloptimizer`, run:

```bash
pip install mloptimizer
```

More information about the package is available at https://pypi.org/project/mloptimizer/.


### Quickstart

Here's a simple example of how to optimize hyperparameters in a decision tree classifier using the iris dataset:

```python
from mloptimizer.core import Optimizer
from mloptimizer.hyperparams import HyperparameterSpace
from sklearn.tree import DecisionTreeClassifier
from sklearn.datasets import load_iris

# 1) Load the dataset and get the features and target
X, y = load_iris(return_X_y=True)

# 2) Define the hyperparameter space (a default space is provided for some algorithms)
hyperparameter_space = HyperparameterSpace.get_default_hyperparameter_space(DecisionTreeClassifier)

# 3) Create the optimizer and optimize the classifier
opt = Optimizer(estimator_class=DecisionTreeClassifier, features=X, labels=y, hyperparam_space=hyperparameter_space)

# 4) Optimize the classifier, the optimization returns the best estimator found in the optimization process
# - 10 generations starting with a population of 10 individuals, other parameters are set to default
clf = opt.optimize_clf(population_size=10, generations=10)
```
Other algorithms, such as `RandomForestClassifier` or `XGBClassifier`, can be used as well; they also have a 
default hyperparameter space defined in the library.
For algorithms without a default hyperparameter space, you can define your own
by following the documentation.

The optimization will create a directory in the current folder with a name like `YYYYMMDD_nnnnnnnnnn_SklearnOptimizer`.
This folder contains the results of the optimization, 
including the best estimator found and the log file `opt.log`, 
which records each step of the process.

A structure like this will be created:

```
├── checkpoints
│   ├── cp_gen_0.pkl
│   └── cp_gen_1.pkl
├── graphics
│   ├── logbook.html
│   └── search_space.html
├── opt.log
├── progress
│   ├── Generation_0.csv
│   └── Generation_1.csv
└── results
    ├── logbook.csv
    └── populations.csv
```

Each item in the directory is described below:

- `checkpoints`: This directory contains the checkpoint files for each generation of the genetic optimization process. These files are used to save the state of the optimization process at each generation, allowing for the process to be resumed from a specific point if needed.
    - `cp_gen_0.pkl`, `cp_gen_1.pkl`: These are the individual checkpoint files for each generation. They are named according to the generation number and are saved in Python's pickle format.

- `graphics`: This directory contains HTML files for visualizing the optimization process.
    - `logbook.html`: This file provides a graphical representation of the logbook, which records the statistics of the optimization process over generations.
    - `search_space.html`: This file provides a graphical representation of the search space of the optimization process.

- `opt.log`: This is the log file for the optimization process. It contains detailed logs of the optimization process, including the performance of the algorithm at each generation.

- `progress`: This directory contains CSV files that record the progress of the optimization process for each generation.
    - `Generation_0.csv`, `Generation_1.csv`: These are the individual progress files for each generation. They contain detailed information about each individual in the population at each generation.

- `results`: This directory contains CSV files with the results of the optimization process.
    - `logbook.csv`: This file is a CSV representation of the logbook, which records the statistics of the optimization process over generations.
    - `populations.csv`: This file contains the final populations of the optimization process. It includes the hyperparameters and fitness values of each individual in the population.
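
Since the logbook and progress files are plain CSV and the checkpoints use Python's pickle format, they can be inspected with pandas and the standard library. The helpers below are illustrative (they are not part of mloptimizer's API), and the column layout of `logbook.csv` depends on the run:

```python
import pickle

import pandas as pd

def load_logbook(run_dir):
    """Load the per-generation statistics CSV written to results/logbook.csv."""
    return pd.read_csv(f"{run_dir}/results/logbook.csv")

def load_checkpoint(path):
    """Load a generation checkpoint saved in Python's pickle format."""
    with open(path, "rb") as f:
        return pickle.load(f)
```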

More details can be found in the [documentation](http://mloptimizer.readthedocs.io/).



## Examples

Examples can be found in [examples](https://mloptimizer.readthedocs.io/en/latest/auto_examples/index.html) on readthedocs.io.

## Dependencies

The following dependencies are used in `mloptimizer`:

* [DEAP](https://github.com/DEAP/deap) - Genetic algorithms
* [XGBoost](https://github.com/dmlc/xgboost) - Gradient boosting classifier
* [Scikit-Learn](https://github.com/scikit-learn/scikit-learn) - Machine learning algorithms and utilities

Optional:
* [Keras](https://keras.io) - Deep learning library
* [mlflow](https://mlflow.org) - Tracking the optimization process

## Documentation

The documentation for `mloptimizer`, with examples and a reference for its classes and methods, 
can be found on [Read the Docs](http://mloptimizer.readthedocs.io/).


## Authors

* **Antonio Caparrini** - *Author* - [caparrini](https://github.com/caparrini)
* **Javier Arroyo Gallardo** - *Author* - [javiag](https://github.com/javiag)

## License

This project is licensed under the [MIT License](LICENSE).

## FAQs
- TODO

            
