# Online Deterministic Annealing (ODA)
> A general-purpose learning model designed to meet the needs of applications in which computational resources are limited, and robustness and interpretability are prioritized.
> Constitutes an **online** prototype-based learning algorithm based on annealing optimization that is formulated as a recursive, **gradient-free** stochastic approximation algorithm.
> Can be viewed as an interpretable and progressively growing competitive-learning neural network model.
>Applications include online unsupervised and supervised learning [1], regression,
>reinforcement learning [2],
>adaptive graph partitioning [3], and swarm leader detection.




## Contact
Christos N. Mavridis, Ph.D. \
Division of Decision and Control Systems \
School of Electrical Engineering and Computer Science, \
KTH Royal Institute of Technology \
https://mavridischristos.github.io/ \
```mavridis (at) kth.se```
## Description of the Optimization Algorithm
The **observed data** are represented by a random variable
$$X: \Omega \rightarrow S\subseteq \mathbb{R}^d$$
defined in a probability space $(\Omega, \mathcal{F}, \mathbb{P})$.
Given a **similarity measure** (which can be any Bregman divergence, e.g., squared Euclidean distance, Kullback-Leibler divergence, etc.)
$$d: S \times \mathrm{ri}(S) \rightarrow [0,\infty)$$
the goal is to **find a set $\mu$ of $M$ codevectors**
in the input space **such that** the following average distortion measure is minimized:
$$ \min_\mu J(\mu) := E[\min_i d(X,\mu_i)] $$
For supervised learning, e.g., classification and regression, each codevector $\mu_i$ is associated with a label $c_i$ as well.
This process is equivalent to finding the most suitable set of $M$
local constant models, and results in a
> **Piecewise-constant approximation (partition) of the input space $S$**.
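
To make the objective concrete, the following is a minimal sketch (illustrative only; the function and variable names are not part of the package) that estimates $J(\mu)$ empirically for the squared-Euclidean case:

```python
import numpy as np

def empirical_distortion(samples, codevectors):
    """Monte-Carlo estimate of J(mu) = E[min_i d(X, mu_i)], with d the squared Euclidean distance."""
    total = 0.0
    for x in samples:
        total += min(np.sum((x - mu) ** 2) for mu in codevectors)
    return total / len(samples)

# Toy example: two codevectors for samples drawn around two cluster centers
samples = np.vstack([np.random.randn(50, 2), np.random.randn(50, 2) + 5.0])
codevectors = [np.zeros(2), np.full(2, 5.0)]
print(empirical_distortion(samples, codevectors))
```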
To construct a learning algorithm that progressively increases the number
of codevectors $M$ as needed,
we define a probability space over an infinite number of local models,
and constrain their distribution using the maximum-entropy principle
at different levels.
First, we adopt a probabilistic approach and introduce a discrete random variable
$$Q:S \rightarrow \mu$$
with countably infinite range $\mu$.
Then we constrain its distribution by formulating the multi-objective optimization problem:
$$\min_\mu F(\mu) := (1-T) D(\mu) - T H(\mu)$$
where
$$D(\mu) := E[d(X,Q)] =\int p(x) \sum_i p(\mu_i|x) d_\phi(x,\mu_i) ~\textrm{d}x$$
and
$$H(\mu) := E[-\log P(X,Q)] =H(X) - \int p(x) \sum_i p(\mu_i|x) \log p(\mu_i|x) ~\textrm{d}x $$
is the Shannon entropy.
This is now a problem of finding the locations $\{\mu_i\}$ and the
corresponding probabilities
$\{p(\mu_i|x)\}:=\{p(Q=\mu_i|X=x)\}$.
> The **Lagrange multiplier $T\in[0,1]$** is called the **temperature parameter** and controls the trade-off between $D$ and $H$.
As $T$ is varied, we essentially transition from one solution of the multi-objective optimization
(a Pareto point when the objectives are convex) to another, and:
> **Reducing the values of $T$ results in a bifurcation phenomenon that increases $M$ and describes an annealing process** [1, 2].
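
For a fixed temperature $T$ and fixed codevector locations, minimizing $F$ over the association probabilities yields the standard deterministic-annealing (Gibbs) solution

$$p^*(\mu_i|x) = \frac{e^{-\frac{1-T}{T} d(x,\mu_i)}}{\sum_j e^{-\frac{1-T}{T} d(x,\mu_j)}}$$

so that large values of $T$ produce nearly uniform (soft) associations, while $T\rightarrow 0$ recovers the hard, nearest-codevector assignment of the original distortion problem.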
The above **sequence of optimization problems** is solved for decreasing values of $T$ using a
> Recursive **gradient-free stochastic approximation** algorithm.
The annealing nature of the algorithm helps avoid poor local minima,
offers robustness with respect to the initial conditions,
and provides a means
to progressively increase the complexity of the learning model
through an intuitive bifurcation phenomenon.
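
As a conceptual illustration only (not the package implementation), the sketch below mimics the structure of such a gradient-free update at a fixed temperature $T$: it maintains running estimates $\sigma_i$ of the codevector probabilities and $\rho_i$ of the conditional means, and recovers the codevectors as $\mu_i=\rho_i/\sigma_i$. All names are hypothetical.

```python
import numpy as np

def oda_online_step(x, mu, sigma, rho, T, alpha):
    """Conceptual sketch of one online update (squared-Euclidean divergence)."""
    # Soft (Gibbs) association probabilities at temperature T
    d = np.array([np.sum((x - m) ** 2) for m in mu])
    w = np.exp(-(1.0 - T) / T * (d - d.min()))  # shift by d.min() for numerical stability
    p = w / w.sum()

    # Gradient-free stochastic-approximation running averages
    sigma = sigma + alpha * (p - sigma)
    rho = rho + alpha * (p[:, None] * x - rho)

    # Recover the codevector locations
    mu = rho / sigma[:, None]
    return mu, sigma, rho
```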
## Usage (Outdated)
The ODA architecture is implemented in the ODA class inside ```OnlineDeterministicAnnealing/oda.py```:

```python
from oda import ODA
```
Regarding the data format, the data need to be a list of *(n)* lists of *(m=1)* *d*-vectors (np.arrays):

```python
train_data = [[np.array], [np.array], [np.array], ...]
```
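
For example, a dataset stored as an *(n × d)* NumPy array (here a hypothetical ```X``` with labels ```y```; illustrative only) can be wrapped into this format as follows:

```python
import numpy as np

# Hypothetical raw dataset: n samples, d features
X = np.random.rand(100, 4)
y = np.random.randint(0, 3, size=100)

# ODA expects a list of n lists, each holding m=1 d-dimensional np.array
train_data = [[x] for x in X]
train_labels = [int(label) for label in y]
```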
The simplest way to train ODA on a dataset is:

```python
clf = ODA(train_data=train_data, train_labels=train_labels)
clf.fit(test_data=test_data, test_labels=test_labels)
```
Notice that a dataset is not required, and one can train ODA using observations one at a time as follows:

```python
tl = len(clf.timeline)
# Stop at the next converged configuration
while len(clf.timeline) == tl and not clf.trained:
    train_datum, train_label = system.observe()
    clf.train(train_datum, train_label, test_data=test_data, test_labels=test_labels)
```
## Classification

For classification, the labels need to be a list of *(n)* labels, preferably integer numbers (for numba.jit):

```python
train_labels = [int, int, int, ...]
```
## Clustering

For clustering, replace:

```python
train_labels = [0 for i in range(len(train_labels))]
```
## Regression

For regression (piecewise-constant function approximation), replace:

```python
train_labels = [np.array, np.array, np.array, ...]
clf = ODA(train_data=train_data, train_labels=train_labels, regression=True)
```
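
As a minimal illustration (hypothetical data; the variable names are not part of the package), a one-dimensional function can be approximated as follows:

```python
import numpy as np

# Hypothetical 1-D regression data: approximate f(x) = sin(x) on [0, 2*pi]
xs = np.linspace(0, 2 * np.pi, 200)
train_data = [[np.array([x])] for x in xs]          # inputs: lists of single d=1 vectors
train_labels = [np.array([np.sin(x)]) for x in xs]  # targets: np.array labels

clf = ODA(train_data=train_data, train_labels=train_labels, regression=True)
# clf can then be trained with clf.fit(...) as shown above
```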
## Prediction

```python
prediction = clf.predict(test_datum)
error = clf.score(test_data, test_labels)
```
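
For example, a simple accuracy computation over a test set might look as follows (a sketch, assuming integer labels and that each element of ```test_data``` has the single-datum format expected by ```predict```):

```python
# Sketch: accuracy over a test set (assumes integer labels)
predictions = [clf.predict(test_datum) for test_datum in test_data]
accuracy = sum(int(p == t) for p, t in zip(predictions, test_labels)) / len(test_labels)
print(f'Test accuracy: {accuracy:.3f}')
```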
## Useful Parameters (Outdated)
### Cost Function

> Bregman Divergence:

```python
# Values in {'phi_Eucl', 'phi_KL'} (squared Euclidean distance, KL divergence)
Bregman_phi = ['phi_Eucl']
```
### Termination Criteria

> Minimum Temperature

```python
Tmin = [1e-4]
```

> Limit on a node's children; after that, stop growing

```python
Kmax = [50]
```

> Desired training error

```python
error_threshold = [0.0]
# Stop when reached 'error_threshold_count' times
error_threshold_count = [2]
# Make sure keepscore > 2
```
> ODA vs. Soft Clustering vs. LVQ

```python
# Values in {0, 1, 2, 3}
# 0: ODA update
# 1: ODA until Kmax, then switch to 2: soft clustering with no perturbation/merging
# 2: soft clustering with no perturbation/merging
# 3: LVQ update (hard clustering) with no perturbation/merging
lvq = [0]
```
> Verbose

```python
# Values in {0, 1, 2, 3}
# 0: don't compute or show score
# 1: compute and show score only on tree node splits
# 2: compute score after every SA convergence and use it as a stopping criterion
# 3: compute and show score after every SA convergence and use it as a stopping criterion
keepscore = 3
```
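
A hypothetical way to set these is shown below, assuming the names above are accepted as keyword arguments of the ODA constructor (check ```oda.py``` for the exact signature):

```python
# Hypothetical: assumes the parameter names above are ODA keyword arguments
clf = ODA(
    train_data=train_data,
    train_labels=train_labels,
    Bregman_phi=['phi_Eucl'],
    Tmin=[1e-4],
    Kmax=[50],
    error_threshold=[0.0],
    error_threshold_count=[2],
    lvq=[0],
    keepscore=3,
)
```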
## Model Progression

The history of all the intermediate models trained is stored in:

```python
clf.myY, clf.myYlabels, clf.myK, clf.myTreeK, clf.myT, clf.myLoops, clf.myTime, clf.myTrainError, clf.myTestError
```
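
For instance, one might visualize how the training error evolves as the model grows (a sketch, assuming ```clf.myK``` and ```clf.myTrainError``` are lists of equal length recorded at each converged configuration):

```python
import matplotlib.pyplot as plt

# Assumes clf.myK (number of codevectors) and clf.myTrainError (training error)
# are recorded at each converged configuration
plt.plot(clf.myK, clf.myTrainError, marker='o')
plt.xlabel('Number of codevectors K')
plt.ylabel('Training error')
plt.title('Model progression during annealing')
plt.show()
```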
## Tree Structure and Multiple Resolutions

For multiple resolutions, every parameter becomes a list of *m* parameters.
Example for *m=2*:

```python
Tmax = [0.9, 0.09]
Tmin = [0.01, 0.0001]
```

The training data should look like this:

```python
train_data = [[np.array, np.array, ...], [np.array, np.array, ...], [np.array, np.array, ...], ...]
```
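
As an illustration (hypothetical data and keyword arguments; it assumes each of the *m* entries per sample is the representation of that sample at the corresponding resolution), a two-resolution dataset could be assembled as follows:

```python
import numpy as np

# Hypothetical: X is an (n, d) array; use a coarse 2-feature view and the full vector
X = np.random.rand(100, 4)
y = np.random.randint(0, 3, size=100)

train_data = [[x[:2], x] for x in X]   # m=2 resolutions per sample: coarse, then fine
train_labels = [int(label) for label in y]

clf = ODA(train_data=train_data, train_labels=train_labels,
          Tmax=[0.9, 0.09], Tmin=[0.01, 0.0001])  # assumes Tmax/Tmin are keyword arguments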
## Tutorials (Outdated)
> [Clustering](https://colab.research.google.com/github/MavridisChristos/OnlineDeterministicAnnealing/blob/main/tutorials/tutorial-clustering.ipynb)



> [Classification](https://colab.research.google.com/github/MavridisChristos/OnlineDeterministicAnnealing/blob/main/tutorials/tutorial-classification.ipynb)



> [Regression](https://colab.research.google.com/github/MavridisChristos/OnlineDeterministicAnnealing/blob/main/tutorials/tutorial-regression.ipynb)




## Citing
If you use this work in an academic context, please cite the following:
```bibtex
@article{mavridis2023annealing,
  author    = {Mavridis, Christos and Baras, John S.},
  journal   = {IEEE Transactions on Automatic Control},
  title     = {Annealing Optimization for Progressive Learning With Stochastic Approximation},
  year      = {2023},
  volume    = {68},
  number    = {5},
  pages     = {2862-2874},
  publisher = {IEEE},
}
```
or
```bibtex
@article{mavridis2023online,
  title     = {Online Deterministic Annealing for Classification and Clustering},
  author    = {Mavridis, Christos and Baras, John S.},
  journal   = {IEEE Transactions on Neural Networks and Learning Systems},
  year      = {2023},
  volume    = {34},
  number    = {10},
  pages     = {7125-7134},
  publisher = {IEEE},
}
```