# <img src='https://raw.githubusercontent.com/VicentePerezSoloviev/EDAspy/master/Logo%20EDAspy.png' align="right" height="150"/>
[![PyPI](https://img.shields.io/pypi/v/edaspy)](https://pypi.python.org/pypi/EDAspy/)
[![PyPI license](https://img.shields.io/pypi/l/EDAspy.svg)](https://pypi.python.org/pypi/EDAspy/)
[![Downloads](https://static.pepy.tech/personalized-badge/edaspy?period=total&units=none&left_color=grey&right_color=blue&left_text=downloads)](https://pepy.tech/project/edaspy)
[![Documentation Status](https://readthedocs.org/projects/edaspy/badge/?version=latest)](https://edaspy.readthedocs.io/en/latest/?badge=latest)
# EDAspy
## Introduction
EDAspy provides implementations of Estimation of Distribution Algorithms (EDAs) [1]. EDAs are a type of
evolutionary algorithm. Depending on the probabilistic model embedded in the EDA and the type of
variables considered, a different EDA implementation is used.
The pseudocode of an EDA is the following:
1. Random initialization of the population.
2. Evaluate each individual of the population.
3. Select the best individuals according to the cost function evaluation.
4. Learn a probabilistic model from the selected individuals.
5. Sample a new population from the learned model.
6. If the stopping criterion is met, finish; else, go to 2.
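The loop above can be sketched in a few lines of plain Python. This is not EDAspy's internal code, just a minimal illustration of the pseudocode for continuous variables with a univariate Gaussian model (as in UMDAc), minimizing the sphere function; all names here are hypothetical:

```python
import numpy as np

def umda_c(cost, n_vars, pop_size=100, top_k=20, n_gens=50, seed=0):
    """Minimal univariate Gaussian EDA following the pseudocode above."""
    rng = np.random.default_rng(seed)
    mean, std = np.zeros(n_vars), np.full(n_vars, 10.0)       # initial broad model
    best_x, best_f = None, np.inf
    for _ in range(n_gens):
        pop = rng.normal(mean, std, size=(pop_size, n_vars))  # 1/5. sample a population
        costs = np.apply_along_axis(cost, 1, pop)             # 2. evaluate individuals
        elite = pop[np.argsort(costs)[:top_k]]                # 3. select the best ones
        mean = elite.mean(axis=0)                             # 4. learn the model
        std = elite.std(axis=0) + 1e-8                        #    (avoid zero variance)
        if costs.min() < best_f:
            best_f, best_x = costs.min(), pop[np.argmin(costs)]
    return best_x, best_f                                     # 6. stop after n_gens

sphere = lambda x: float(np.sum(x ** 2))
x, f = umda_c(sphere, n_vars=5)
```

After 50 generations the Gaussian model concentrates around the optimum, so `f` ends up close to 0.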
EDAspy allows you to create a custom EDA: the modular probabilistic models and initializers can be embedded into the EDA baseline and used for different purposes. If this fits your needs, take a look at the EDACustom example in the examples section.
EDAspy also incorporates a set of benchmark cost functions for comparing the algorithms.
The following implementations are available in EDAspy:
* UMDAd: Univariate Marginal Distribution Algorithm (binary) [2]. A simple EDA in which the variables are binary and no dependencies between variables are considered. Typical uses include feature selection.
* UMDAc: Univariate Marginal Distribution Algorithm (continuous) [3]. Each variable is assumed to follow a Gaussian distribution and no dependencies between variables are considered. Typical uses include hyperparameter optimization.
* UnivariateKEDA: Univariate Kernel Estimation of Distribution Algorithm [4]. Each variable's distribution is estimated using kernel density estimation (KDE).
* EGNA: Estimation of Gaussian Networks Algorithm [5][6]. A more complex implementation in which dependencies between variables are considered during the optimization. In each iteration, a Gaussian Bayesian network is learned and sampled. Both the variables and the dependencies between them are assumed to be Gaussian. This implementation is focused on continuous optimization.
* EMNA: Estimation of Multivariate Normal Algorithm [1]. Similar to EGNA, but instead of a Gaussian Bayesian network, a multivariate Gaussian distribution is iteratively learned and sampled. As in EGNA, dependencies between variables are considered and assumed to be linear Gaussian. This implementation is focused on continuous optimization.
* SPEDA: Semiparametric Estimation of Distribution Algorithm [7]. This multivariate EDA estimates the density of each variable using either KDE or a Gaussian, and allows dependencies between both types of variables. It is an archive-based approach in which the probabilistic model is updated from the best individuals of the l previous generations.
* MultivariateKEDA: A special case of the SPEDA approach in which all nodes are restricted to KDE estimates (Gaussian nodes are forbidden) [7]. It is also an archive-based approach.
* Categorical EDA. This implementation considers independent categorical variables. Typical uses include portfolio optimization.
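The kernel-based variants (UnivariateKEDA, SPEDA, MultivariateKEDA) rely on KDE to model each variable's density. The following is not EDAspy's internal implementation, only a generic sketch of the idea using `scipy.stats.gaussian_kde`: fit a kernel density on the selected (elite) values of one variable, then sample new candidate values from it.

```python
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(42)
# Pretend these are the elite values of one variable: a bimodal sample,
# which a single Gaussian would model poorly but KDE captures well
elite = np.concatenate([rng.normal(-2.0, 0.3, 50), rng.normal(3.0, 0.5, 50)])

kde = gaussian_kde(elite)                  # learn a kernel density model
new_values = kde.resample(200, seed=1)[0]  # sample new candidate values from it
```

The resampled values concentrate around the two modes of the elite set, which is exactly why KDE-based EDAs can track multimodal or non-Gaussian densities.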
Some tools are also available in EDAspy, such as Bayesian network structure plotting for visualizing the graph learned by some of the implementations.
Although some categorical EDAs are implemented, the package is focused on continuous optimization. Below, we show a CPU time analysis of the different approaches implemented for continuous optimization. Note that the CPU time can be reduced using parallelization (available as a parameter in the EDA initialization). Reference [7] compares the performance of the algorithms in terms of cost function minimization.
<img src='cpu_comparison_continuous_opt.jpeg' alt="CPU time comparison for continuous optimization" title="CPU time comparison for continuous optimization"/>
## Examples
Some examples are available at https://github.com/VicentePerezSoloviev/EDAspy/tree/master/notebooks
## Getting started
To install EDAspy from PyPI, run the following command using pip:
```bash
pip install EDAspy
```
## Build from Source
### Prerequisites
- Python >= 3.0
- pybnesian, numpy, pandas
### Building
Clone the repository:
```bash
git clone https://github.com/VicentePerezSoloviev/EDAspy.git
cd EDAspy
git checkout v1.0.0 # optionally check out a specific version
python setup.py install
```
## Testing
The library contains tests that can be executed using [pytest](https://docs.pytest.org/). Install it using
pip:
```bash
pip install pytest
```
Run the tests with:
```bash
pytest
```
## Bibliography
[1] Larrañaga, P., & Lozano, J. A. (Eds.). (2001). Estimation of distribution algorithms: A new tool for evolutionary computation (Vol. 2). Springer Science & Business Media.
[2] Mühlenbein, H., & Paass, G. (1996). From recombination of genes to the estimation of distributions I. Binary parameters. In Parallel Problem Solving from Nature—PPSN IV: International Conference on Evolutionary Computation—The 4th International Conference on Parallel Problem Solving from Nature Berlin, Germany, September 22–26, 1996 Proceedings 4 (pp. 178-187). Springer Berlin Heidelberg.
[3] Mühlenbein, H., Bendisch, J., & Voigt, H. M. (1996). From recombination of genes to the estimation of distributions II. Continuous parameters. In Parallel Problem Solving from Nature—PPSN IV: International Conference on Evolutionary Computation—The 4th International Conference on Parallel Problem Solving from Nature Berlin, Germany, September 22–26, 1996 Proceedings 4 (pp. 188-197). Springer Berlin Heidelberg.
[4] Luo, N., & Qian, F. (2009, August). Evolutionary algorithm using kernel density estimation model in continuous domain. In 2009 7th Asian Control Conference (pp. 1526-1531). IEEE.
[5] Larrañaga, P. (2000). Optimization in continuous domains by learning and simulation of Gaussian networks. In Proc. of the 2000 Genetic and Evolutionary Computation Conference Workshop Program.
[6] Soloviev, V. P., Larrañaga, P., & Bielza, C. (2022). Estimation of distribution algorithms using Gaussian Bayesian networks to solve industrial optimization problems constrained by environment variables. Journal of Combinatorial Optimization, 44(2), 1077-1098.
[7] Soloviev, V. P., Bielza, C., & Larrañaga, P. (2023). Semiparametric estimation of distribution algorithms for continuous optimization. IEEE Transactions on Evolutionary Computation.