constrained-linkage


Nameconstrained-linkage JSON
Version 1.0.1 PyPI version JSON
download
home_pageNone
SummaryHierarchical agglomerative clustering with soft constraints (SciPy-compatible Z).
upload_time2025-08-20 10:24:58
maintainerNone
docs_urlNone
authorNone
requires_python>=3.8
licenseMIT
keywords hac clustering constraints hierarchical linkage
VCS
bugtrack_url
requirements No requirements were recorded.
Travis-CI No Travis.
coveralls test coverage No coveralls.
            [![Tests](https://github.com/jonnevd/constrained-linkage/actions/workflows/test.yml/badge.svg)](https://github.com/jonnevd/constrained-linkage/actions/workflows/test.yml)
[![Coverage Status](https://img.shields.io/codecov/c/github/jonnevd/constrained-linkage/main.svg)](https://app.codecov.io/github/jonnevd/constrained-linkage/branch/main)
[![PyPI version](https://img.shields.io/pypi/v/constrained-linkage.svg)](https://pypi.org/project/constrained-linkage/)

# Constrained Hierarchical Agglomerative Clustering
This repository contains the implementation of the constrained linkage function for Constrained Hierarchical Agglomerative Clustering from the paper:

> **HEAT: Hierarchical-constrained Encoder-Assisted Time series clustering for fault detection in district heating substations**  
> *Jonne van Dreven, Abbas Cheddad, Ahmad Nauman Ghazi, Sadi Alawadi, Jad Al Koussa, Dirk Vanhoudt*  
> *Energy and AI, 21 (2025), 100548*  
> DOI: [10.1016/j.egyai.2025.100548](https://doi.org/10.1016/j.egyai.2025.100548)

If you use this library in academic or scientific work, please cite:

```bibtex
@article{van_Dreven-HEAT,
  title={HEAT: Hierarchical-constrained Encoder-Assisted Time series clustering for fault detection in district heating substations},
  volume={21},
  ISSN={2666-5468},
  DOI={10.1016/j.egyai.2025.100548},
  journal={Energy and AI},
  author={van Dreven, Jonne and Cheddad, Abbas and Ghazi, Ahmad Nauman and Alawadi, Sadi and Al Koussa, Jad and Vanhoudt, Dirk},
  year={2025},
  month=sep,
  pages={100548}
}
```

A **NumPy-only** hierarchical agglomerative clustering routine with **soft constraints**, returning a SciPy-compatible linkage matrix `Z`.

## ✨ Features

- Drop-in replacement for a constrained `linkage` routine supporting:
  - `single`, `complete`, `average`, `weighted`, `centroid`, `median`, `ward`
- Accepts **either**:
  - condensed 1-D distances (`len n*(n-1)/2`)
  - `n×n` square distance matrix
- Adds **soft constraints**:
  - **Must-link / Cannot-link** via a constraint matrix `M`
    - `M[i,j] < 0` → encourages merging (must-link)
    - `M[i,j] > 0` → discourages merging (cannot-link)
  - **Min/max cluster size** penalties (linear in violation amount)
    - Minimum adds a penalty when a merge would create a cluster smaller than the specified minimum.
    - Maximum adds a penalty when a merge would create a cluster larger than the specified maximum.
    - Penalty grows linearly with how far below/above the minimum/maximum the cluster size is. 
  - **Normalised** distances
    - When `normalize_distances=True`, the penalties are relative to the [0, 1] normalized distance range, making them proportional regardless of the original distance scale.
- No SciPy dependency — output `Z` works with SciPy’s downstream tools.

---

## 🔌 Plug-and-play

`constrained_linkage` is a **drop-in replacement** for SciPy’s `linkage` function.  

- **No constraints?** Works identically to `scipy.cluster.hierarchy.linkage`.  
- **With constraints?** Adds powerful, flexible soft constraints with minimal code changes.  
- Output is a **SciPy-compatible linkage matrix `Z`**, so you can keep using all SciPy tools (e.g., `fcluster`, `dendrogram`) unchanged.

---

## 🔧 Install

```bash
pip install constrained-linkage
# from source:
pip install "git+https://github.com/jonnevd/constrained-linkage"
```

---

## 🚀 Usage Examples

Below we illustrate **must-link** (negative penalties) and **cannot-link** (positive penalties) via the constraint matrix `M`.  
All distances are optionally scaled to `[0,1]` when `normalize_distances=True`, so penalties are **scale-free**.

> **Semantics:**  
> - `M[i, j] < 0` → **must-link** (encourage merging i↔j)  
> - `M[i, j] > 0` → **cannot-link** (discourage merging i↔j)

---

### Example 1 — Must-link & Cannot-link constraints

<p align="center">
  <img src="https://github.com/jonnevd/constrained-linkage/blob/main/docs/CM_cluster_effect.png" alt="Effect of constraint matrix on clusters" width="500">
</p>

```python
import numpy as np
from constrained_linkage import constrained_linkage
from scipy.cluster import hierarchy
import matplotlib.pyplot as plt

# Four points in 1D (two well-separated pairs)
X = np.array([[0.0], [0.1], [10.0], [10.1]])
D = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(-1))

# Constraint matrix: must-link 0↔1, cannot-link 2↔3
M = np.zeros_like(D)
M[0, 1] = M[1, 0] = -0.6   # must-link (negative)
M[2, 3] = M[3, 2] =  0.6   # cannot-link (positive)

Z = constrained_linkage(
    D, method="average",
    constraint_matrix=M,
    normalize_distances=True
)

# Works seamlessly with SciPy tools
labels = hierarchy.fcluster(Z, 2, criterion="maxclust")
print("Partition with must-link(0,1) & cannot-link(2,3):", labels)

plt.figure(figsize=(6, 3))
hierarchy.dendrogram(Z, labels=[f"P{i}" for i in range(len(X))])
plt.title("Dendrogram — must-link(0,1), cannot-link(2,3)")
plt.tight_layout()
plt.show()
```

### Example 2 — Enforcing a maximum cluster size

Discourage clusters larger than a threshold by adding a positive penalty above the maximum.

```python
import numpy as np
from constrained_linkage import constrained_linkage
from scipy.cluster import hierarchy

# Six points in 1D (three tight pairs)
X = np.array([[0.0], [0.1], [5.0], [5.1], [10.0], [10.1]])
D = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(-1))

Z_max = constrained_linkage(
    D, method="average",
    max_cluster_size=2,     # soft cap
    max_penalty_weight=0.6, # stronger => avoids overgrown clusters
    normalize_distances=True
)

labels_max = hierarchy.fcluster(Z_max, 3, criterion="maxclust")
print("Partition with max_cluster_size=2:", labels_max)
```


### Example 3 — Enforcing a minimum cluster size

When domain knowledge suggests small units should coalesce before analysis, use a minimum size prior to avoid singletons or small groups. Increasing the penalty weight strengthens this bias, as shown in the figure below.

<p align="center">
  <img src="https://github.com/jonnevd/constrained-linkage/blob/main/docs/min_cluster_effect.png" alt="Effect of min_cluster_size penalty on small clusters" width="500">
</p>

```python
import numpy as np
from constrained_linkage import constrained_linkage
from scipy.cluster import hierarchy

# Six points in 1D (three tight pairs)
X = np.array([[0.0], [0.1], [5.0], [5.1], [10.0], [10.1]])
D = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(-1))

Z_min = constrained_linkage(
    D, method="average",
    min_cluster_size=3,     # target minimum size
    min_penalty_weight=0.5, # stronger => merge undersized clusters earlier
    normalize_distances=True
)

labels_min = hierarchy.fcluster(Z_min, 2, criterion="maxclust")
print("Partition with min_cluster_size=3:", labels_min)
```
            

Raw data

            {
    "_id": null,
    "home_page": null,
    "name": "constrained-linkage",
    "maintainer": null,
    "docs_url": null,
    "requires_python": ">=3.8",
    "maintainer_email": null,
    "keywords": "HAC, clustering, constraints, hierarchical, linkage",
    "author": null,
    "author_email": "Jonne van Dreven <jva@bth.se>",
    "download_url": "https://files.pythonhosted.org/packages/0c/db/d298259ecadd50fbf7e950515499e93ffb468562c48a2802bf74bfaeabd5/constrained_linkage-1.0.1.tar.gz",
    "platform": null,
    "description": "[![Tests](https://github.com/jonnevd/constrained-linkage/actions/workflows/test.yml/badge.svg)](https://github.com/jonnevd/constrained-linkage/actions/workflows/test.yml)\n[![Coverage Status](https://img.shields.io/codecov/c/github/jonnevd/constrained-linkage/main.svg)](https://app.codecov.io/github/jonnevd/constrained-linkage/branch/main)\n[![PyPI version](https://img.shields.io/pypi/v/constrained-linkage.svg)](https://pypi.org/project/constrained-linkage/)\n\n# Constrained Hierarchical Agglomerative Clustering\nThis repository contains the implementation of the constrained linkage function for Constrained Hierarchical Agglomerative Clustering from the paper:\n\n> **HEAT: Hierarchical-constrained Encoder-Assisted Time series clustering for fault detection in district heating substations**  \n> *Jonne van Dreven, Abbas Cheddad, Ahmad Nauman Ghazi, Sadi Alawadi, Jad Al Koussa, Dirk Vanhoudt*  \n> *Energy and AI, 21 (2025), 100548*  \n> DOI: [10.1016/j.egyai.2025.100548](https://doi.org/10.1016/j.egyai.2025.100548)\n\nIf you use this library in academic or scientific work, please cite:\n\n```bibtex\n@article{van_Dreven-HEAT,\n  title={HEAT: Hierarchical-constrained Encoder-Assisted Time series clustering for fault detection in district heating substations},\n  volume={21},\n  ISSN={2666-5468},\n  DOI={10.1016/j.egyai.2025.100548},\n  journal={Energy and AI},\n  author={van Dreven, Jonne and Cheddad, Abbas and Ghazi, Ahmad Nauman and Alawadi, Sadi and Al Koussa, Jad and Vanhoudt, Dirk},\n  year={2025},\n  month=sep,\n  pages={100548}\n}\n```\n\nA **NumPy-only** hierarchical agglomerative clustering routine with **soft constraints**, returning a SciPy-compatible linkage matrix `Z`.\n\n## \u2728 Features\n\n- Drop-in replacement for a constrained `linkage` routine supporting:\n  - `single`, `complete`, `average`, `weighted`, `centroid`, `median`, `ward`\n- Accepts **either**:\n  - condensed 1-D distances (`len n*(n-1)/2`)\n  - `n\u00d7n` square distance matrix\n- Adds **soft constraints**:\n  - **Must-link / Cannot-link** via a constraint matrix `M`\n    - `M[i,j] < 0` \u2192 encourages merging (must-link)\n    - `M[i,j] > 0` \u2192 discourages merging (cannot-link)\n  - **Min/max cluster size** penalties (linear in violation amount)\n    - Minimum adds a penalty when a merge would create a cluster smaller than the specified minimum.\n    - Maximum adds a penalty when a merge would create a cluster larger than the specified maximum.\n    - Penalty grows linearly with how far below/above the minimum/maximum the cluster size is. \n  - **Normalised** distances\n    - When `normalize_distances=True`, the penalties are relative to the [0, 1] normalized distance range, making them proportional regardless of the original distance scale.\n- No SciPy dependency \u2014 output `Z` works with SciPy\u2019s downstream tools.\n\n---\n\n## \ud83d\udd0c Plug-and-play\n\n`constrained_linkage` is a **drop-in replacement** for SciPy\u2019s `linkage` function.  \n\n- **No constraints?** Works identically to `scipy.cluster.hierarchy.linkage`.  \n- **With constraints?** Adds powerful, flexible soft constraints with minimal code changes.  \n- Output is a **SciPy-compatible linkage matrix `Z`**, so you can keep using all SciPy tools (e.g., `fcluster`, `dendrogram`) unchanged.\n\n---\n\n## \ud83d\udd27 Install\n\n```bash\npip install constrained-linkage\n# from source:\npip install \"git+https://github.com/jonnevd/constrained-linkage\"\n```\n\n---\n\n## \ud83d\ude80 Usage Examples\n\nBelow we illustrate **must-link** (negative penalties) and **cannot-link** (positive penalties) via the constraint matrix `M`.  \nAll distances are optionally scaled to `[0,1]` when `normalize_distances=True`, so penalties are **scale-free**.\n\n> **Semantics:**  \n> - `M[i, j] < 0` \u2192 **must-link** (encourage merging i\u2194j)  \n> - `M[i, j] > 0` \u2192 **cannot-link** (discourage merging i\u2194j)\n\n---\n\n### Example 1 \u2014 Must-link & Cannot-link constraints\n\n<p align=\"center\">\n  <img src=\"https://github.com/jonnevd/constrained-linkage/blob/main/docs/CM_cluster_effect.png\" alt=\"Effect of constraint matrix on clusters\" width=\"500\">\n</p>\n\n```python\nimport numpy as np\nfrom constrained_linkage import constrained_linkage\nfrom scipy.cluster import hierarchy\nimport matplotlib.pyplot as plt\n\n# Four points in 1D (two well-separated pairs)\nX = np.array([[0.0], [0.1], [10.0], [10.1]])\nD = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(-1))\n\n# Constraint matrix: must-link 0\u21941, cannot-link 2\u21943\nM = np.zeros_like(D)\nM[0, 1] = M[1, 0] = -0.6   # must-link (negative)\nM[2, 3] = M[3, 2] =  0.6   # cannot-link (positive)\n\nZ = constrained_linkage(\n    D, method=\"average\",\n    constraint_matrix=M,\n    normalize_distances=True\n)\n\n# Works seamlessly with SciPy tools\nlabels = hierarchy.fcluster(Z, 2, criterion=\"maxclust\")\nprint(\"Partition with must-link(0,1) & cannot-link(2,3):\", labels)\n\nplt.figure(figsize=(6, 3))\nhierarchy.dendrogram(Z, labels=[f\"P{i}\" for i in range(len(X))])\nplt.title(\"Dendrogram \u2014 must-link(0,1), cannot-link(2,3)\")\nplt.tight_layout()\nplt.show()\n```\n\n### Example 2 \u2014 Enforcing a maximum cluster size\n\nDiscourage clusters larger than a threshold by adding a positive penalty above the maximum.\n\n```python\nimport numpy as np\nfrom constrained_linkage import constrained_linkage\nfrom scipy.cluster import hierarchy\n\n# Six points in 1D (three tight pairs)\nX = np.array([[0.0], [0.1], [5.0], [5.1], [10.0], [10.1]])\nD = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(-1))\n\nZ_max = constrained_linkage(\n    D, method=\"average\",\n    max_cluster_size=2,     # soft cap\n    max_penalty_weight=0.6, # stronger => avoids overgrown clusters\n    normalize_distances=True\n)\n\nlabels_max = hierarchy.fcluster(Z_max, 3, criterion=\"maxclust\")\nprint(\"Partition with max_cluster_size=2:\", labels_max)\n```\n\n\n### Example 3 \u2014 Enforcing a minimum cluster size\n\nWhen domain knowledge suggests small units should coalesce before analysis, use a minimum size prior to avoid singletons or small groups. Increasing the penalty weight strengthens this bias, as shown in the figure below.\n\n<p align=\"center\">\n  <img src=\"https://github.com/jonnevd/constrained-linkage/blob/main/docs/min_cluster_effect.png\" alt=\"Effect of min_cluster_size penalty on small clusters\" width=\"500\">\n</p>\n\n```python\nimport numpy as np\nfrom constrained_linkage import constrained_linkage\nfrom scipy.cluster import hierarchy\n\n# Six points in 1D (three tight pairs)\nX = np.array([[0.0], [0.1], [5.0], [5.1], [10.0], [10.1]])\nD = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(-1))\n\nZ_min = constrained_linkage(\n    D, method=\"average\",\n    min_cluster_size=3,     # target minimum size\n    min_penalty_weight=0.5, # stronger => merge undersized clusters earlier\n    normalize_distances=True\n)\n\nlabels_min = hierarchy.fcluster(Z_min, 2, criterion=\"maxclust\")\nprint(\"Partition with min_cluster_size=3:\", labels_min)\n```",
    "bugtrack_url": null,
    "license": "MIT",
    "summary": "Hierarchical agglomerative clustering with soft constraints (SciPy-compatible Z).",
    "version": "1.0.1",
    "project_urls": {
        "Documentation": "https://github.com/jonnevd/constrained-linkage#readme",
        "Homepage": "https://github.com/jonnevd/constrained-linkage",
        "Issues": "https://github.com/jonnevd/constrained-linkage/issues",
        "Source": "https://github.com/jonnevd/constrained-linkage"
    },
    "split_keywords": [
        "hac",
        " clustering",
        " constraints",
        " hierarchical",
        " linkage"
    ],
    "urls": [
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "4a2bd2f1f203be0324aa9a23f0b5a26ae14194d495358ceacaffa54b0479237c",
                "md5": "96cc13259a561c4009f7405f74817123",
                "sha256": "b2accd6f0e82af5ecec576e007dd894fd4d28e0bcb477b8c8161d596bf6131fa"
            },
            "downloads": -1,
            "filename": "constrained_linkage-1.0.1-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "96cc13259a561c4009f7405f74817123",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.8",
            "size": 10711,
            "upload_time": "2025-08-20T10:24:56",
            "upload_time_iso_8601": "2025-08-20T10:24:56.118350Z",
            "url": "https://files.pythonhosted.org/packages/4a/2b/d2f1f203be0324aa9a23f0b5a26ae14194d495358ceacaffa54b0479237c/constrained_linkage-1.0.1-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "0cdbd298259ecadd50fbf7e950515499e93ffb468562c48a2802bf74bfaeabd5",
                "md5": "4d7e789f3e790ce9e163af2a21ffdec4",
                "sha256": "1480cfe2955e53c31c168c5a4246ebaefa1e8ade6a4d0047620de11a451bee84"
            },
            "downloads": -1,
            "filename": "constrained_linkage-1.0.1.tar.gz",
            "has_sig": false,
            "md5_digest": "4d7e789f3e790ce9e163af2a21ffdec4",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.8",
            "size": 95512,
            "upload_time": "2025-08-20T10:24:58",
            "upload_time_iso_8601": "2025-08-20T10:24:58.926099Z",
            "url": "https://files.pythonhosted.org/packages/0c/db/d298259ecadd50fbf7e950515499e93ffb468562c48a2802bf74bfaeabd5/constrained_linkage-1.0.1.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2025-08-20 10:24:58",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "jonnevd",
    "github_project": "constrained-linkage#readme",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": true,
    "lcname": "constrained-linkage"
}
        
Elapsed time: 0.66276s