[](https://github.com/jonnevd/constrained-linkage/actions/workflows/test.yml)
[](https://app.codecov.io/github/jonnevd/constrained-linkage/branch/main)
[](https://pypi.org/project/constrained-linkage/)
# Constrained Hierarchical Agglomerative Clustering
This repository contains the implementation of the constrained linkage function for Constrained Hierarchical Agglomerative Clustering from the paper:
> **HEAT: Hierarchical-constrained Encoder-Assisted Time series clustering for fault detection in district heating substations**
> *Jonne van Dreven, Abbas Cheddad, Ahmad Nauman Ghazi, Sadi Alawadi, Jad Al Koussa, Dirk Vanhoudt*
> *Energy and AI, 21 (2025), 100548*
> DOI: [10.1016/j.egyai.2025.100548](https://doi.org/10.1016/j.egyai.2025.100548)
If you use this library in academic or scientific work, please cite:
```bibtex
@article{van_Dreven-HEAT,
title={HEAT: Hierarchical-constrained Encoder-Assisted Time series clustering for fault detection in district heating substations},
volume={21},
ISSN={2666-5468},
DOI={10.1016/j.egyai.2025.100548},
journal={Energy and AI},
author={van Dreven, Jonne and Cheddad, Abbas and Ghazi, Ahmad Nauman and Alawadi, Sadi and Al Koussa, Jad and Vanhoudt, Dirk},
year={2025},
month=sep,
pages={100548}
}
```
A **NumPy-only** hierarchical agglomerative clustering routine with **soft constraints**, returning a SciPy-compatible linkage matrix `Z`.
## ✨ Features
- Drop-in replacement for a constrained `linkage` routine supporting:
- `single`, `complete`, `average`, `weighted`, `centroid`, `median`, `ward`
- Accepts **either**:
- condensed 1-D distances (`len n*(n-1)/2`)
- `n×n` square distance matrix
- Adds **soft constraints**:
- **Must-link / Cannot-link** via a constraint matrix `M`
- `M[i,j] < 0` → encourages merging (must-link)
- `M[i,j] > 0` → discourages merging (cannot-link)
- **Min/max cluster size** penalties (linear in violation amount)
- Minimum adds a penalty when a merge would create a cluster smaller than the specified minimum.
- Maximum adds a penalty when a merge would create a cluster larger than the specified maximum.
- Penalty grows linearly with how far below/above the minimum/maximum the cluster size is.
- **Normalised** distances
- When `normalize_distances=True`, the penalties are relative to the [0, 1] normalized distance range, making them proportional regardless of the original distance scale.
- No SciPy dependency — output `Z` works with SciPy’s downstream tools.
---
## 🔌 Plug-and-play
`constrained_linkage` is a **drop-in replacement** for SciPy’s `linkage` function.
- **No constraints?** Works identically to `scipy.cluster.hierarchy.linkage`.
- **With constraints?** Adds powerful, flexible soft constraints with minimal code changes.
- Output is a **SciPy-compatible linkage matrix `Z`**, so you can keep using all SciPy tools (e.g., `fcluster`, `dendrogram`) unchanged.
---
## 🔧 Install
```bash
pip install constrained-linkage
# from source:
pip install "git+https://github.com/jonnevd/constrained-linkage"
```
---
## 🚀 Usage Examples
Below we illustrate **must-link** (negative penalties) and **cannot-link** (positive penalties) via the constraint matrix `M`.
All distances are optionally scaled to `[0,1]` when `normalize_distances=True`, so penalties are **scale-free**.
> **Semantics:**
> - `M[i, j] < 0` → **must-link** (encourage merging i↔j)
> - `M[i, j] > 0` → **cannot-link** (discourage merging i↔j)
---
### Example 1 — Must-link & Cannot-link constraints
<p align="center">
<img src="https://github.com/jonnevd/constrained-linkage/blob/main/docs/CM_cluster_effect.png" alt="Effect of constraint matrix on clusters" width="500">
</p>
```python
import numpy as np
from constrained_linkage import constrained_linkage
from scipy.cluster import hierarchy
import matplotlib.pyplot as plt
# Four points in 1D (two well-separated pairs)
X = np.array([[0.0], [0.1], [10.0], [10.1]])
D = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(-1))
# Constraint matrix: must-link 0↔1, cannot-link 2↔3
M = np.zeros_like(D)
M[0, 1] = M[1, 0] = -0.6 # must-link (negative)
M[2, 3] = M[3, 2] = 0.6 # cannot-link (positive)
Z = constrained_linkage(
D, method="average",
constraint_matrix=M,
normalize_distances=True
)
# Works seamlessly with SciPy tools
labels = hierarchy.fcluster(Z, 2, criterion="maxclust")
print("Partition with must-link(0,1) & cannot-link(2,3):", labels)
plt.figure(figsize=(6, 3))
hierarchy.dendrogram(Z, labels=[f"P{i}" for i in range(len(X))])
plt.title("Dendrogram — must-link(0,1), cannot-link(2,3)")
plt.tight_layout()
plt.show()
```
### Example 2 — Enforcing a maximum cluster size
Discourage clusters larger than a threshold by adding a positive penalty above the maximum.
```python
import numpy as np
from constrained_linkage import constrained_linkage
from scipy.cluster import hierarchy
# Six points in 1D (three tight pairs)
X = np.array([[0.0], [0.1], [5.0], [5.1], [10.0], [10.1]])
D = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(-1))
Z_max = constrained_linkage(
D, method="average",
max_cluster_size=2, # soft cap
max_penalty_weight=0.6, # stronger => avoids overgrown clusters
normalize_distances=True
)
labels_max = hierarchy.fcluster(Z_max, 3, criterion="maxclust")
print("Partition with max_cluster_size=2:", labels_max)
```
### Example 3 — Enforcing a minimum cluster size
When domain knowledge suggests small units should coalesce before analysis, use a minimum size prior to avoid singletons or small groups. Increasing the penalty weight strengthens this bias, as shown in the figure below.
<p align="center">
<img src="https://github.com/jonnevd/constrained-linkage/blob/main/docs/min_cluster_effect.png" alt="Effect of min_cluster_size penalty on small clusters" width="500">
</p>
```python
import numpy as np
from constrained_linkage import constrained_linkage
from scipy.cluster import hierarchy
# Six points in 1D (three tight pairs)
X = np.array([[0.0], [0.1], [5.0], [5.1], [10.0], [10.1]])
D = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(-1))
Z_min = constrained_linkage(
D, method="average",
min_cluster_size=3, # target minimum size
min_penalty_weight=0.5, # stronger => merge undersized clusters earlier
normalize_distances=True
)
labels_min = hierarchy.fcluster(Z_min, 2, criterion="maxclust")
print("Partition with min_cluster_size=3:", labels_min)
```
Raw data
{
"_id": null,
"home_page": null,
"name": "constrained-linkage",
"maintainer": null,
"docs_url": null,
"requires_python": ">=3.8",
"maintainer_email": null,
"keywords": "HAC, clustering, constraints, hierarchical, linkage",
"author": null,
"author_email": "Jonne van Dreven <jva@bth.se>",
"download_url": "https://files.pythonhosted.org/packages/0c/db/d298259ecadd50fbf7e950515499e93ffb468562c48a2802bf74bfaeabd5/constrained_linkage-1.0.1.tar.gz",
"platform": null,
"description": "[](https://github.com/jonnevd/constrained-linkage/actions/workflows/test.yml)\n[](https://app.codecov.io/github/jonnevd/constrained-linkage/branch/main)\n[](https://pypi.org/project/constrained-linkage/)\n\n# Constrained Hierarchical Agglomerative Clustering\nThis repository contains the implementation of the constrained linkage function for Constrained Hierarchical Agglomerative Clustering from the paper:\n\n> **HEAT: Hierarchical-constrained Encoder-Assisted Time series clustering for fault detection in district heating substations** \n> *Jonne van Dreven, Abbas Cheddad, Ahmad Nauman Ghazi, Sadi Alawadi, Jad Al Koussa, Dirk Vanhoudt* \n> *Energy and AI, 21 (2025), 100548* \n> DOI: [10.1016/j.egyai.2025.100548](https://doi.org/10.1016/j.egyai.2025.100548)\n\nIf you use this library in academic or scientific work, please cite:\n\n```bibtex\n@article{van_Dreven-HEAT,\n title={HEAT: Hierarchical-constrained Encoder-Assisted Time series clustering for fault detection in district heating substations},\n volume={21},\n ISSN={2666-5468},\n DOI={10.1016/j.egyai.2025.100548},\n journal={Energy and AI},\n author={van Dreven, Jonne and Cheddad, Abbas and Ghazi, Ahmad Nauman and Alawadi, Sadi and Al Koussa, Jad and Vanhoudt, Dirk},\n year={2025},\n month=sep,\n pages={100548}\n}\n```\n\nA **NumPy-only** hierarchical agglomerative clustering routine with **soft constraints**, returning a SciPy-compatible linkage matrix `Z`.\n\n## \u2728 Features\n\n- Drop-in replacement for a constrained `linkage` routine supporting:\n - `single`, `complete`, `average`, `weighted`, `centroid`, `median`, `ward`\n- Accepts **either**:\n - condensed 1-D distances (`len n*(n-1)/2`)\n - `n\u00d7n` square distance matrix\n- Adds **soft constraints**:\n - **Must-link / Cannot-link** via a constraint matrix `M`\n - `M[i,j] < 0` \u2192 encourages merging (must-link)\n - `M[i,j] > 0` \u2192 discourages merging (cannot-link)\n - **Min/max cluster size** penalties (linear in violation amount)\n - Minimum adds a penalty when a merge would create a cluster smaller than the specified minimum.\n - Maximum adds a penalty when a merge would create a cluster larger than the specified maximum.\n - Penalty grows linearly with how far below/above the minimum/maximum the cluster size is. \n - **Normalised** distances\n - When `normalize_distances=True`, the penalties are relative to the [0, 1] normalized distance range, making them proportional regardless of the original distance scale.\n- No SciPy dependency \u2014 output `Z` works with SciPy\u2019s downstream tools.\n\n---\n\n## \ud83d\udd0c Plug-and-play\n\n`constrained_linkage` is a **drop-in replacement** for SciPy\u2019s `linkage` function. \n\n- **No constraints?** Works identically to `scipy.cluster.hierarchy.linkage`. \n- **With constraints?** Adds powerful, flexible soft constraints with minimal code changes. \n- Output is a **SciPy-compatible linkage matrix `Z`**, so you can keep using all SciPy tools (e.g., `fcluster`, `dendrogram`) unchanged.\n\n---\n\n## \ud83d\udd27 Install\n\n```bash\npip install constrained-linkage\n# from source:\npip install \"git+https://github.com/jonnevd/constrained-linkage\"\n```\n\n---\n\n## \ud83d\ude80 Usage Examples\n\nBelow we illustrate **must-link** (negative penalties) and **cannot-link** (positive penalties) via the constraint matrix `M`. \nAll distances are optionally scaled to `[0,1]` when `normalize_distances=True`, so penalties are **scale-free**.\n\n> **Semantics:** \n> - `M[i, j] < 0` \u2192 **must-link** (encourage merging i\u2194j) \n> - `M[i, j] > 0` \u2192 **cannot-link** (discourage merging i\u2194j)\n\n---\n\n### Example 1 \u2014 Must-link & Cannot-link constraints\n\n<p align=\"center\">\n <img src=\"https://github.com/jonnevd/constrained-linkage/blob/main/docs/CM_cluster_effect.png\" alt=\"Effect of constraint matrix on clusters\" width=\"500\">\n</p>\n\n```python\nimport numpy as np\nfrom constrained_linkage import constrained_linkage\nfrom scipy.cluster import hierarchy\nimport matplotlib.pyplot as plt\n\n# Four points in 1D (two well-separated pairs)\nX = np.array([[0.0], [0.1], [10.0], [10.1]])\nD = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(-1))\n\n# Constraint matrix: must-link 0\u21941, cannot-link 2\u21943\nM = np.zeros_like(D)\nM[0, 1] = M[1, 0] = -0.6 # must-link (negative)\nM[2, 3] = M[3, 2] = 0.6 # cannot-link (positive)\n\nZ = constrained_linkage(\n D, method=\"average\",\n constraint_matrix=M,\n normalize_distances=True\n)\n\n# Works seamlessly with SciPy tools\nlabels = hierarchy.fcluster(Z, 2, criterion=\"maxclust\")\nprint(\"Partition with must-link(0,1) & cannot-link(2,3):\", labels)\n\nplt.figure(figsize=(6, 3))\nhierarchy.dendrogram(Z, labels=[f\"P{i}\" for i in range(len(X))])\nplt.title(\"Dendrogram \u2014 must-link(0,1), cannot-link(2,3)\")\nplt.tight_layout()\nplt.show()\n```\n\n### Example 2 \u2014 Enforcing a maximum cluster size\n\nDiscourage clusters larger than a threshold by adding a positive penalty above the maximum.\n\n```python\nimport numpy as np\nfrom constrained_linkage import constrained_linkage\nfrom scipy.cluster import hierarchy\n\n# Six points in 1D (three tight pairs)\nX = np.array([[0.0], [0.1], [5.0], [5.1], [10.0], [10.1]])\nD = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(-1))\n\nZ_max = constrained_linkage(\n D, method=\"average\",\n max_cluster_size=2, # soft cap\n max_penalty_weight=0.6, # stronger => avoids overgrown clusters\n normalize_distances=True\n)\n\nlabels_max = hierarchy.fcluster(Z_max, 3, criterion=\"maxclust\")\nprint(\"Partition with max_cluster_size=2:\", labels_max)\n```\n\n\n### Example 3 \u2014 Enforcing a minimum cluster size\n\nWhen domain knowledge suggests small units should coalesce before analysis, use a minimum size prior to avoid singletons or small groups. Increasing the penalty weight strengthens this bias, as shown in the figure below.\n\n<p align=\"center\">\n <img src=\"https://github.com/jonnevd/constrained-linkage/blob/main/docs/min_cluster_effect.png\" alt=\"Effect of min_cluster_size penalty on small clusters\" width=\"500\">\n</p>\n\n```python\nimport numpy as np\nfrom constrained_linkage import constrained_linkage\nfrom scipy.cluster import hierarchy\n\n# Six points in 1D (three tight pairs)\nX = np.array([[0.0], [0.1], [5.0], [5.1], [10.0], [10.1]])\nD = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(-1))\n\nZ_min = constrained_linkage(\n D, method=\"average\",\n min_cluster_size=3, # target minimum size\n min_penalty_weight=0.5, # stronger => merge undersized clusters earlier\n normalize_distances=True\n)\n\nlabels_min = hierarchy.fcluster(Z_min, 2, criterion=\"maxclust\")\nprint(\"Partition with min_cluster_size=3:\", labels_min)\n```",
"bugtrack_url": null,
"license": "MIT",
"summary": "Hierarchical agglomerative clustering with soft constraints (SciPy-compatible Z).",
"version": "1.0.1",
"project_urls": {
"Documentation": "https://github.com/jonnevd/constrained-linkage#readme",
"Homepage": "https://github.com/jonnevd/constrained-linkage",
"Issues": "https://github.com/jonnevd/constrained-linkage/issues",
"Source": "https://github.com/jonnevd/constrained-linkage"
},
"split_keywords": [
"hac",
" clustering",
" constraints",
" hierarchical",
" linkage"
],
"urls": [
{
"comment_text": null,
"digests": {
"blake2b_256": "4a2bd2f1f203be0324aa9a23f0b5a26ae14194d495358ceacaffa54b0479237c",
"md5": "96cc13259a561c4009f7405f74817123",
"sha256": "b2accd6f0e82af5ecec576e007dd894fd4d28e0bcb477b8c8161d596bf6131fa"
},
"downloads": -1,
"filename": "constrained_linkage-1.0.1-py3-none-any.whl",
"has_sig": false,
"md5_digest": "96cc13259a561c4009f7405f74817123",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.8",
"size": 10711,
"upload_time": "2025-08-20T10:24:56",
"upload_time_iso_8601": "2025-08-20T10:24:56.118350Z",
"url": "https://files.pythonhosted.org/packages/4a/2b/d2f1f203be0324aa9a23f0b5a26ae14194d495358ceacaffa54b0479237c/constrained_linkage-1.0.1-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": null,
"digests": {
"blake2b_256": "0cdbd298259ecadd50fbf7e950515499e93ffb468562c48a2802bf74bfaeabd5",
"md5": "4d7e789f3e790ce9e163af2a21ffdec4",
"sha256": "1480cfe2955e53c31c168c5a4246ebaefa1e8ade6a4d0047620de11a451bee84"
},
"downloads": -1,
"filename": "constrained_linkage-1.0.1.tar.gz",
"has_sig": false,
"md5_digest": "4d7e789f3e790ce9e163af2a21ffdec4",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.8",
"size": 95512,
"upload_time": "2025-08-20T10:24:58",
"upload_time_iso_8601": "2025-08-20T10:24:58.926099Z",
"url": "https://files.pythonhosted.org/packages/0c/db/d298259ecadd50fbf7e950515499e93ffb468562c48a2802bf74bfaeabd5/constrained_linkage-1.0.1.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2025-08-20 10:24:58",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "jonnevd",
"github_project": "constrained-linkage#readme",
"travis_ci": false,
"coveralls": false,
"github_actions": true,
"lcname": "constrained-linkage"
}