online-deterministic-annealing 1.0.0

Summary: Online Deterministic Annealing Algorithm
Author/Maintainer: Christos Mavridis <mavridis@kth.se>
Homepage: https://mavridischristos.github.io/
Repository: https://github.com/MavridisChristos/online-deterministic-annealing
Requires Python: >=3.8
Keywords: annealing, classification, clustering, hybrid system identification, machine learning, optimization, regression, system identification
Uploaded: 2025-07-29 01:52:20

# Online Deterministic Annealing (ODA)

> A general-purpose learning model designed to meet the needs of applications in which computational resources are limited, and robustness and interpretability are prioritized.

> Constitutes an **online** prototype-based learning algorithm based on annealing optimization, formulated as a recursive **gradient-free** stochastic approximation algorithm.

> Can be viewed as an interpretable and progressively growing competitive-learning neural network model.

> Applications include online unsupervised and supervised learning [1], regression,
> reinforcement learning [2],
> adaptive graph partitioning [3], and swarm leader detection.

![](tests/tutorials/img/2Dapproximation/app0.png)
![](tests/tutorials/img/2Dapproximation/app1.png)
![](tests/tutorials/img/2Dapproximation/app2.png)
![](tests/tutorials/img/2Dapproximation/original.png)

## Contact 

Christos N. Mavridis, Ph.D. \
Division of Decision and Control Systems \
School of Electrical Engineering and Computer Science, \
KTH Royal Institute of Technology \
https://mavridischristos.github.io/ \
```mavridis (at) kth.se``` 

## Description of the Optimization Algorithm

The **observed data** are represented by a random variable 
$$X: \Omega \rightarrow S\subseteq \mathbb{R}^d$$
defined in a probability space $(\Omega, \mathcal{F}, \mathbb{P})$.

Given a **similarity measure**
$$d_\phi: S \times \mathrm{ri}(S) \rightarrow [0,\infty)$$
(which can be any Bregman divergence, e.g., squared Euclidean distance, Kullback-Leibler divergence, etc.),
the goal is to **find a set $\mu$ of $M$ codevectors**
in the input space **such that** the following average distortion measure is minimized:

$$ \min_\mu  J(\mu) := E[\min_i d(X,\mu_i)] $$
    
For supervised learning, e.g., classification and regression, each codevector $\mu_i$ is associated with a label $c_i$ as well.
This process is equivalent to finding the most suitable set of $M$
local constant models, and results in a 

> **Piecewise-constant approximation (partition) of the input space $S$**.

To construct a learning algorithm that progressively increases the number 
of codevectors $M$ as needed, 
we define a probability space over an infinite number of local models 
and constrain their distribution using the maximum-entropy principle 
at different levels.

First, we adopt a probabilistic approach and introduce a discrete random variable
$$Q:S \rightarrow \mu$$ 
taking values in the countably infinite set of codevectors $\mu$.

Then we constrain its distribution by formulating the multi-objective optimization problem:

$$\min_\mu F(\mu) := (1-T) D(\mu) - T H(\mu)$$
where 
$$D(\mu) := E[d(X,Q)] =\int p(x) \sum_i p(\mu_i|x) d_\phi(x,\mu_i) ~\textrm{d}x$$
and
$$H(\mu) := E[-\log P(X,Q)] =H(X) - \int p(x) \sum_i p(\mu_i|x) \log p(\mu_i|x) ~\textrm{d}x $$
is the joint Shannon entropy of $(X,Q)$.

This is now a problem of finding the locations $\{\mu_i\}$ and the 
corresponding probabilities
$\{p(\mu_i|x)\}:=\{p(Q=\mu_i|X=x)\}$.
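
For fixed codevector locations, minimizing $F$ over the association probabilities (subject to $\sum_i p(\mu_i|x)=1$) gives the Gibbs form familiar from deterministic annealing, a standard consequence of this maximum-entropy formulation:

$$p(\mu_i|x) = \frac{e^{-\frac{1-T}{T} d(x,\mu_i)}}{\sum_j e^{-\frac{1-T}{T} d(x,\mu_j)}}$$

so that large values of $T$ yield nearly uniform (maximum-entropy) associations, while $T\rightarrow 0$ recovers the hard nearest-codevector assignment.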

> The **Lagrange multiplier $T\in[0,1]$** is called the **temperature parameter** 

and controls the trade-off between $D$ and $H$.
As $T$ is varied, we essentially transition from one solution of the multi-objective optimization 
(a Pareto point when the objectives are convex) to another, and:

> **Reducing the values of $T$ results in a bifurcation phenomenon that increases $M$ and describes an annealing process** [1, 2].

The above **sequence of optimization problems** is solved for decreasing values of $T$ using a

> Recursive **gradient-free stochastic approximation** algorithm.

The annealing nature of the algorithm helps avoid poor local minima, 
offers robustness with respect to the initial conditions,
and provides a means 
to progressively increase the complexity of the learning model
through an intuitive bifurcation phenomenon.
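
For intuition only, the following NumPy sketch performs one *batch* annealing step (Gibbs associations followed by a centroid update) for the squared Euclidean divergence. It illustrates the underlying fixed-point structure, not the recursive online stochastic approximation algorithm implemented in this package, and all names in it are hypothetical:

    import numpy as np

    def batch_annealing_step(X, mu, T):
        # X: (n, d) samples, mu: (M, d) codevectors, T in (0, 1): temperature
        # Pairwise squared Euclidean distortions d(x, mu_i), shape (n, M)
        dist = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(axis=2)
        # Gibbs association probabilities p(mu_i | x) at temperature T
        logits = -(1.0 - T) / T * dist
        logits -= logits.max(axis=1, keepdims=True)   # numerical stability
        p = np.exp(logits)
        p /= p.sum(axis=1, keepdims=True)
        # Centroid update: each codevector moves to the p-weighted mean of the data
        weights = p / p.sum(axis=0, keepdims=True)
        return weights.T @ X                          # updated (M, d) codevectors

Lowering $T$ between such steps makes the associations progressively harder, which is where the bifurcations described above appear.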
	
## Usage (Outdated)

The ODA architecture is implemented by the ODA class in ```OnlineDeterministicAnnealing/oda.py```:
	
	from oda import ODA

Regarding the data format, the training data need to be a list of *n* lists of *m=1* *d*-dimensional vectors (np.array):

	train_data = [[np.array], [np.array], [np.array], ...]

The simplest way to train ODA on a dataset is:

    clf = ODA(train_data=train_data,train_labels=train_labels)
    clf.fit(test_data=test_data,test_labels=test_labels)
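
For example, a minimal end-to-end sketch on synthetic data, following the data format and calls shown above (the two-cluster toy data and the train/test split are purely illustrative, and the interface may have changed since this section was written):

    import numpy as np
    from oda import ODA

    rng = np.random.default_rng(0)

    # Two Gaussian blobs in 2D with labels 0 and 1 (toy data)
    X = np.vstack([rng.normal(0.0, 0.1, (100, 2)),
                   rng.normal(0.7, 0.1, (100, 2))])
    y = [0] * 100 + [1] * 100

    # Data format: list of n lists of (m=1) d-vectors
    data = [[x] for x in X]
    train_data, test_data = data[::2], data[1::2]
    train_labels, test_labels = y[::2], y[1::2]

    clf = ODA(train_data=train_data, train_labels=train_labels)
    clf.fit(test_data=test_data, test_labels=test_labels)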

Notice that a dataset is not required, and one can train ODA using observations one at a time as follows:

    tl = len(clf.timeline)
    # Stop in the next converged configuration
    while len(clf.timeline)==tl and not clf.trained:
        train_datum, train_label = system.observe()
        clf.train(train_datum,train_label,test_data=test_data,test_labels=test_labels)
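
Here ```system``` stands for any source of streaming observations. A minimal stand-in (hypothetical, for illustration only) can simply draw one stored sample at a time:

    import numpy as np

    class RandomSampler:
        # Hypothetical observation source: returns one (datum, label) pair per
        # call, mimicking a system observed online.
        def __init__(self, data, labels, seed=0):
            self.data, self.labels = data, labels
            self.rng = np.random.default_rng(seed)

        def observe(self):
            i = self.rng.integers(len(self.data))
            return self.data[i], self.labels[i]

    system = RandomSampler(train_data, train_labels)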
	
## Classification

For classification, the labels need to be a list of *n* labels, preferably integers (for numba.jit):

	train_labels = [ int, int , int, ...]
            
## Clustering

For clustering, replace the labels with:

    train_labels = [0 for i in range(len(train_labels))] 

## Regression

For regression (piecewise-constant function approximation), replace:

    train_labels = [ np.array, np.array , np.array, ...]
    clf = ODA(train_data=train_data,train_labels=train_labels,regression=True)
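
For instance, a sketch approximating a scalar function on $[0,1]$ (the target function and sample grid are arbitrary, and evaluation on the training set is only for illustration):

    import numpy as np
    from oda import ODA

    # Sample f(x) = sin(2*pi*x) on a grid
    xs = np.linspace(0.0, 1.0, 200)
    train_data = [[np.array([x])] for x in xs]          # n lists of (m=1) 1-D vectors
    train_labels = [np.array([np.sin(2 * np.pi * x)]) for x in xs]

    clf = ODA(train_data=train_data, train_labels=train_labels, regression=True)
    clf.fit(test_data=train_data, test_labels=train_labels)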

## Prediction

    prediction = clf.predict(test_datum)
    error = clf.score(test_data, test_labels)

## Useful Parameters (Outdated)

### Cost Function

> Bregman Divergence: 
    
    # Values in {'phi_Eucl', 'phi_KL'} (Squared Euclidean distance, KL divergence)
    Bregman_phi = ['phi_Eucl'] 

### Termination Criteria

> Minimum Temperature

    Tmin = [1e-4]

> Maximum number of children per node; the tree stops growing after this limit

    Kmax = [50]

> Desired training error

    error_threshold = [0.0]
    # Stop when the threshold has been reached 'error_threshold_count' times
    error_threshold_count = [2]
    # Requires keepscore > 2

> ODA vs Soft-Clustering vs LVQ

    # Values in {0,1,2,3}
    # 0: ODA update
    # 1: ODA until Kmax, then switch to 2 (soft clustering with no perturbation/merging)
    # 2: soft clustering with no perturbation/merging
    # 3: LVQ update (hard clustering) with no perturbation/merging
    lvq = [0]

> Verbose

    # Values in {0,1,2,3}    
    # 0: don't compute or show score
    # 1: compute and show score only on tree node splits 
    # 2: compute score after every SA convergence and use it as a stopping criterion
    # 3: compute and show score after every SA convergence and use it as a stopping criterion
    keepscore = 3

## Model Progression

The history of all the intermediate models trained is stored in:

    clf.myY, clf.myYlabels, clf.myK, clf.myTreeK, clf.myT, clf.myLoops, clf.myTime, clf.myTrainError, clf.myTestError
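
For example, assuming ```clf.myK``` records the number of codevectors and ```clf.myTrainError``` the training error at each stored configuration (the exact attribute semantics may differ in the current version), the model progression can be inspected with a few lines of matplotlib:

    import matplotlib.pyplot as plt

    # Training error as the model grows
    plt.plot(clf.myK, clf.myTrainError, marker='o')
    plt.xlabel('Number of codevectors K')
    plt.ylabel('Training error')
    plt.show()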

## Tree Structure and Multiple Resolutions

For multiple resolutions, every parameter becomes a list of *m* parameters.
For example, for *m=2*:

	Tmax = [0.9, 0.09]
	Tmin = [0.01, 0.0001]

The training data should look like this:

	train_data = [[np.array, np.array, ...], [np.array, np.array, ...], [np.array, np.array, ...], ...]
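
For example, a sketch building *m=2* resolutions of each sample (here, arbitrarily, a coarse 2-D projection followed by the full 4-D vector; the choice of resolutions is up to the user and purely illustrative):

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 4))

    # m = 2 representations per sample: coarse first, then full resolution
    train_data = [[x[:2], x] for x in X]

    Tmax = [0.9, 0.09]
    Tmin = [0.01, 0.0001]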

## Tutorials (Outdated)

> [Clustering](https://colab.research.google.com/github/MavridisChristos/OnlineDeterministicAnnealing/blob/main/tutorials/tutorial-clustering.ipynb)

![](tests/tutorials/img/clustering/app0.png)
![](tests/tutorials/img/clustering/app1.png)
![](tests/tutorials/img/clustering/app2.png)

> [Classification](https://colab.research.google.com/github/MavridisChristos/OnlineDeterministicAnnealing/blob/main/tutorials/tutorial-classification.ipynb)

![](tests/tutorials/img/classification/app0.png)
![](tests/tutorials/img/classification/app1.png)
![](tests/tutorials/img/classification/app2.png)

> [Regression](https://colab.research.google.com/github/MavridisChristos/OnlineDeterministicAnnealing/blob/main/tutorials/tutorial-regression.ipynb)

![](tests/tutorials/img/2Dapproximation/app0.png)
![](tests/tutorials/img/2Dapproximation/app1.png)
![](tests/tutorials/img/2Dapproximation/app2.png)
![](tests/tutorials/img/2Dapproximation/original.png)

## Citing
If you use this work in an academic context, please cite the following:

    @article{mavridis2023annealing,
        author = {Mavridis, Christos and Baras, John S.},
        journal = {IEEE Transactions on Automatic Control},
        title = {Annealing Optimization for Progressive Learning With Stochastic Approximation},
        year = {2023},
        volume = {68},
        number = {5},
        pages = {2862-2874},
        publisher = {IEEE},
    }

or

    @article{mavridis2023online,
        title = {Online deterministic annealing for classification and clustering},
        author = {Mavridis, Christos and Baras, John S},
        journal = {IEEE Transactions on Neural Networks and Learning Systems},
        year = {2023},
        volume = {34},
        number = {10},
        pages = {7125-7134},
        publisher = {IEEE},
    }
	  
            
