# Online Deterministic Annealing (ODA)
> A general-purpose learning model designed to meet the needs of applications in which computational resources are limited, and robustness and interpretability are prioritized.
> Constitutes an **online** prototype-based learning algorithm based on annealing optimization that is formulated as a recursive, **gradient-free** stochastic approximation algorithm.
> Can be viewed as an interpretable and progressively growing competitive-learning neural network model.
>Applications include online unsupervised and supervised learning [1], regression,
>reinforcement learning [2],
>adaptive graph partitioning [3], and swarm leader detection.




## Contact
Christos N. Mavridis, Ph.D. \
Division of Decision and Control Systems \
School of Electrical Engineering and Computer Science, \
KTH Royal Institute of Technology \
https://mavridischristos.github.io/ \
```mavridis (at) kth.se```
## Description of the Optimization Algorithm
The **observed data** are represented by a random variable
$$X: \Omega \rightarrow S\subseteq \mathbb{R}^d$$
defined in a probability space $(\Omega, \mathcal{F}, \mathbb{P})$.
Given a **similarity measure** (which can be any Bregman divergence, e.g., squared Euclidean distance, Kullback-Leibler divergence, etc.)
$$d: S \times \mathrm{ri}(S) \rightarrow [0,\infty)$$
the goal is to **find a set $\mu$ of $M$ codevectors**
in the input space **such that** the following average distortion measure is minimized:
$$ \min_\mu J(\mu) := E[\min_i d(X,\mu_i)] $$
For supervised learning, e.g., classification and regression, each codevector $\mu_i$ is associated with a label $c_i$ as well.
This process is equivalent to finding the most suitable set of $M$
local constant models, and results in a
> **Piecewise-constant approximation (partition) of the input space $S$**.
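
To make the objective concrete, the following is a minimal sketch (illustrative only; the function and variable names are not part of the package) that estimates $J(\mu)$ empirically for the squared-Euclidean case:

```python
import numpy as np

def empirical_distortion(samples, codevectors):
    """Monte-Carlo estimate of J(mu) = E[min_i d(X, mu_i)], with d the squared Euclidean distance."""
    total = 0.0
    for x in samples:
        total += min(np.sum((x - mu) ** 2) for mu in codevectors)
    return total / len(samples)

# Toy example: two codevectors for samples drawn around two cluster centers
samples = np.vstack([np.random.randn(50, 2), np.random.randn(50, 2) + 5.0])
codevectors = [np.zeros(2), np.full(2, 5.0)]
print(empirical_distortion(samples, codevectors))
```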
To construct a learning algorithm that progressively increases the number
of codevectors $M$ as needed,
we define a probability space over an infinite number of local models,
and constrain their distribution using the maximum-entropy principle
at different levels.
First, we adopt a probabilistic approach and introduce a discrete random variable
$$Q:S \rightarrow \mu$$
with countably infinite range $\mu$.
Then we constrain its distribution by formulating the multi-objective optimization problem:
$$\min_\mu F(\mu) := (1-T) D(\mu) - T H(\mu)$$
where
$$D(\mu) := E[d(X,Q)] =\int p(x) \sum_i p(\mu_i|x) d_\phi(x,\mu_i) ~\textrm{d}x$$
and
$$H(\mu) := E[-\log P(X,Q)] =H(X) - \int p(x) \sum_i p(\mu_i|x) \log p(\mu_i|x) ~\textrm{d}x $$
is the Shannon entropy.
This is now a problem of finding the locations $\{\mu_i\}$ and the
corresponding probabilities
$\{p(\mu_i|x)\}:=\{p(Q=\mu_i|X=x)\}$.
> The **Lagrange multiplier $T\in[0,1]$** is called the **temperature parameter** and controls the trade-off between $D$ and $H$.
As $T$ is varied, we essentially transition from one solution of the multi-objective optimization
(a Pareto point when the objectives are convex) to another, and:
> **Reducing the values of $T$ results in a bifurcation phenomenon that increases $M$ and describes an annealing process** [1, 2].
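
For a fixed temperature $T$ and fixed codevector locations, minimizing $F$ over the association probabilities yields the standard deterministic-annealing (Gibbs) solution

$$p^*(\mu_i|x) = \frac{e^{-\frac{1-T}{T} d(x,\mu_i)}}{\sum_j e^{-\frac{1-T}{T} d(x,\mu_j)}}$$

so that large values of $T$ produce nearly uniform (soft) associations, while $T\rightarrow 0$ recovers the hard, nearest-codevector assignment of the original distortion problem.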
The above **sequence of optimization problems** is solved for decreasing values of $T$ using a
> Recursive **gradient-free stochastic approximation** algorithm.
The annealing nature of the algorithm helps avoid poor local minima,
offers robustness with respect to the initial conditions,
and provides a means
to progressively increase the complexity of the learning model
through an intuitive bifurcation phenomenon.
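
As a conceptual illustration only (not the package implementation), the sketch below mimics the structure of such a gradient-free update at a fixed temperature $T$: it maintains running estimates $\sigma_i$ of the codevector probabilities and $\rho_i$ of the conditional means, and recovers the codevectors as $\mu_i=\rho_i/\sigma_i$. All names are hypothetical.

```python
import numpy as np

def oda_online_step(x, mu, sigma, rho, T, alpha):
    """Conceptual sketch of one online update (squared-Euclidean divergence)."""
    # Soft (Gibbs) association probabilities at temperature T
    d = np.array([np.sum((x - m) ** 2) for m in mu])
    w = np.exp(-(1.0 - T) / T * (d - d.min()))  # shift by d.min() for numerical stability
    p = w / w.sum()

    # Gradient-free stochastic-approximation running averages
    sigma = sigma + alpha * (p - sigma)
    rho = rho + alpha * (p[:, None] * x - rho)

    # Recover the codevector locations
    mu = rho / sigma[:, None]
    return mu, sigma, rho
```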
## Usage (Outdated)
The ODA architecture is implemented in the ODA class inside ```OnlineDeterministicAnnealing/oda.py```:

```python
from oda import ODA
```
Regarding the data format, the data need to be a list of *(n)* lists of *(m=1)* *d*-vectors (np.arrays):

```python
train_data = [[np.array], [np.array], [np.array], ...]
```
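
For example, a dataset stored as an *(n × d)* NumPy array (here a hypothetical ```X``` with labels ```y```; illustrative only) can be wrapped into this format as follows:

```python
import numpy as np

# Hypothetical raw dataset: n samples, d features
X = np.random.rand(100, 4)
y = np.random.randint(0, 3, size=100)

# ODA expects a list of n lists, each holding m=1 d-dimensional np.array
train_data = [[x] for x in X]
train_labels = [int(label) for label in y]
```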
The simplest way to train ODA on a dataset is:

```python
clf = ODA(train_data=train_data, train_labels=train_labels)
clf.fit(test_data=test_data, test_labels=test_labels)
```
Notice that a dataset is not required, and one can train ODA using observations one at a time as follows:

```python
tl = len(clf.timeline)
# Stop at the next converged configuration
while len(clf.timeline) == tl and not clf.trained:
    train_datum, train_label = system.observe()
    clf.train(train_datum, train_label, test_data=test_data, test_labels=test_labels)
```
## Classification

For classification, the labels need to be a list of *(n)* labels, preferably integer numbers (for numba.jit):

```python
train_labels = [int, int, int, ...]
```
## Clustering

For clustering, replace:

```python
train_labels = [0 for i in range(len(train_labels))]
```
## Regression

For regression (piecewise-constant function approximation), replace:

```python
train_labels = [np.array, np.array, np.array, ...]
clf = ODA(train_data=train_data, train_labels=train_labels, regression=True)
```
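
As a minimal illustration (hypothetical data; the variable names are not part of the package), a one-dimensional function can be approximated as follows:

```python
import numpy as np

# Hypothetical 1-D regression data: approximate f(x) = sin(x) on [0, 2*pi]
xs = np.linspace(0, 2 * np.pi, 200)
train_data = [[np.array([x])] for x in xs]          # inputs: lists of single d=1 vectors
train_labels = [np.array([np.sin(x)]) for x in xs]  # targets: np.array labels

clf = ODA(train_data=train_data, train_labels=train_labels, regression=True)
# clf can then be trained with clf.fit(...) as shown above
```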
## Prediction

```python
prediction = clf.predict(test_datum)
error = clf.score(test_data, test_labels)
```
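
For example, a simple accuracy computation over a test set might look as follows (a sketch, assuming integer labels and that each element of ```test_data``` has the single-datum format expected by ```predict```):

```python
# Sketch: accuracy over a test set (assumes integer labels)
predictions = [clf.predict(test_datum) for test_datum in test_data]
accuracy = sum(int(p == t) for p, t in zip(predictions, test_labels)) / len(test_labels)
print(f'Test accuracy: {accuracy:.3f}')
```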
## Useful Parameters (Outdated)
### Cost Function

> Bregman Divergence:

```python
# Values in {'phi_Eucl', 'phi_KL'} (squared Euclidean distance, KL divergence)
Bregman_phi = ['phi_Eucl']
```
### Termination Criteria

> Minimum Temperature

```python
Tmin = [1e-4]
```

> Limit on a node's children; after that, stop growing

```python
Kmax = [50]
```

> Desired training error

```python
error_threshold = [0.0]
# Stop when reached 'error_threshold_count' times
error_threshold_count = [2]
# Make sure keepscore > 2
```
> ODA vs. Soft Clustering vs. LVQ

```python
# Values in {0, 1, 2, 3}
# 0: ODA update
# 1: ODA until Kmax, then switch to 2: soft clustering with no perturbation/merging
# 2: soft clustering with no perturbation/merging
# 3: LVQ update (hard clustering) with no perturbation/merging
lvq = [0]
```
> Verbose

```python
# Values in {0, 1, 2, 3}
# 0: don't compute or show score
# 1: compute and show score only on tree node splits
# 2: compute score after every SA convergence and use it as a stopping criterion
# 3: compute and show score after every SA convergence and use it as a stopping criterion
keepscore = 3
```
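
A hypothetical way to set these is shown below, assuming the names above are accepted as keyword arguments of the ODA constructor (check ```oda.py``` for the exact signature):

```python
# Hypothetical: assumes the parameter names above are ODA keyword arguments
clf = ODA(
    train_data=train_data,
    train_labels=train_labels,
    Bregman_phi=['phi_Eucl'],
    Tmin=[1e-4],
    Kmax=[50],
    error_threshold=[0.0],
    error_threshold_count=[2],
    lvq=[0],
    keepscore=3,
)
```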
## Model Progression

The history of all the intermediate models trained is stored in:

```python
clf.myY, clf.myYlabels, clf.myK, clf.myTreeK, clf.myT, clf.myLoops, clf.myTime, clf.myTrainError, clf.myTestError
```
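
For instance, one might visualize how the training error evolves as the model grows (a sketch, assuming ```clf.myK``` and ```clf.myTrainError``` are lists of equal length recorded at each converged configuration):

```python
import matplotlib.pyplot as plt

# Assumes clf.myK (number of codevectors) and clf.myTrainError (training error)
# are recorded at each converged configuration
plt.plot(clf.myK, clf.myTrainError, marker='o')
plt.xlabel('Number of codevectors K')
plt.ylabel('Training error')
plt.title('Model progression during annealing')
plt.show()
```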
## Tree Structure and Multiple Resolutions

For multiple resolutions, every parameter becomes a list of *m* parameters.
Example for *m=2*:

```python
Tmax = [0.9, 0.09]
Tmin = [0.01, 0.0001]
```

The training data should look like this:

```python
train_data = [[np.array, np.array, ...], [np.array, np.array, ...], [np.array, np.array, ...], ...]
```
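
As an illustration (hypothetical data and keyword arguments; it assumes each of the *m* entries per sample is the representation of that sample at the corresponding resolution), a two-resolution dataset could be assembled as follows:

```python
import numpy as np

# Hypothetical: X is an (n, d) array; use a coarse 2-feature view and the full vector
X = np.random.rand(100, 4)
y = np.random.randint(0, 3, size=100)

train_data = [[x[:2], x] for x in X]   # m=2 resolutions per sample: coarse, then fine
train_labels = [int(label) for label in y]

clf = ODA(train_data=train_data, train_labels=train_labels,
          Tmax=[0.9, 0.09], Tmin=[0.01, 0.0001])  # assumes Tmax/Tmin are keyword arguments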
## Tutorials (Outdated)
> [Clustering](https://colab.research.google.com/github/MavridisChristos/OnlineDeterministicAnnealing/blob/main/tutorials/tutorial-clustering.ipynb)



> [Classification](https://colab.research.google.com/github/MavridisChristos/OnlineDeterministicAnnealing/blob/main/tutorials/tutorial-classification.ipynb)



> [Regression](https://colab.research.google.com/github/MavridisChristos/OnlineDeterministicAnnealing/blob/main/tutorials/tutorial-regression.ipynb)




## Citing
If you use this work in an academic context, please cite the following:
```bibtex
@article{mavridis2023annealing,
  author    = {Mavridis, Christos and Baras, John S.},
  journal   = {IEEE Transactions on Automatic Control},
  title     = {Annealing Optimization for Progressive Learning With Stochastic Approximation},
  year      = {2023},
  volume    = {68},
  number    = {5},
  pages     = {2862-2874},
  publisher = {IEEE},
}
```
or
```bibtex
@article{mavridis2023online,
  title     = {Online Deterministic Annealing for Classification and Clustering},
  author    = {Mavridis, Christos and Baras, John S.},
  journal   = {IEEE Transactions on Neural Networks and Learning Systems},
  year      = {2023},
  volume    = {34},
  number    = {10},
  pages     = {7125-7134},
  publisher = {IEEE},
}
```