kmeans-tjdwill


| Field | Value |
| --- | --- |
| Name | kmeans-tjdwill |
| Version | 1.0.4 |
| home_page | None |
| Summary | A function-based implementation of k-means clustering that maintains data association. |
| upload_time | 2024-07-06 16:55:13 |
| maintainer | None |
| docs_url | None |
| author | None |
| requires_python | >=3.9 |
| license | MIT License Copyright (c) 2024 tjdwill Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. |
| keywords | clustering, computer vision, data analysis, data processing, k-means, linear algebra, robotics |
| VCS | |
| bugtrack_url | |
| requirements | No requirements were recorded. |
| Travis-CI | No Travis. |
| coveralls test coverage | No coveralls. |

# K-Means Clustering

[![PyPI version](https://badge.fury.io/py/kmeans-tjdwill.svg)](https://badge.fury.io/py/kmeans-tjdwill)
[![Docs](https://github.com/tjdwill/kmeans/actions/workflows/sitebuild.yml/badge.svg)](https://tjdwill.github.io/kmeans) 


This repository documents a Python implementation of k-means clustering. Usage examples can be found in the `tests` directory.


Unlike most k-means implementations, this module lets the user specify how many leading dimensions of each data point are used for the clustering operation.

For example, given data in which each element has the form
```python
# Each element would actually be a NumPy array; lists are used here for readability.
[
  [1, 2, 3, 4, 5],
  [4, 6, 7, 8, 2],
  ...
]
```
specifying `ndim=3` means that only the first three elements of each data point are used for clustering; the remaining elements are carried along unchanged.
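
As a rough illustration of that split (plain NumPy slicing, not the package's internal code), the first `ndim` columns act as the clustering features while the remaining columns ride along as a payload. The names `features` and `payload` are made up for this sketch.

```python
import numpy as np

data = np.array([
    [1, 2, 3, 4, 5],
    [4, 6, 7, 8, 2],
], dtype=float)
ndim = 3

# Only the leading `ndim` columns would take part in distance computations...
features = data[:, :ndim]
# ...while the trailing columns stay attached to their rows, untouched.
payload = data[:, ndim:]

# Euclidean distance between the two points, measured on the features only.
distance = np.linalg.norm(features[0] - features[1])
print(features.shape, payload.shape, distance)
```

Because every row is kept whole, whatever cluster a point ends up in, its trailing columns go with it.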

This keeps each data point associated with its extra fields, an association that would otherwise be lost when the data is regrouped. An example of this is found in my implementation of image segmentation (`segmentation.py`) in this same project.
Another use is object detection. Given detections of the form
```python
[xmin, ymin, xmax, ymax, conf, label]  # [bounding box, conf, label]
```
we may want to cluster the data solely on the bounding-box coordinates (`ndim=4`) while keeping each detection's confidence score and label for further processing.
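
The detection case can be sketched with a single assignment step of k-means. This is a minimal stand-in written in plain NumPy, not the package's implementation; the helper `assign_to_centroids`, the sample detections, and the centroid values are all hypothetical.

```python
import numpy as np

def assign_to_centroids(data, centroids, ndim):
    """Group full rows by nearest centroid, measured on the first `ndim` columns."""
    # Pairwise differences: shape (n_points, n_centroids, ndim).
    diffs = data[:, None, :ndim] - centroids[None, :, :]
    distances = np.linalg.norm(diffs, axis=2)
    labels = distances.argmin(axis=1)
    # Each group holds complete rows, so conf and label stay with their box.
    return {i: data[labels == i] for i in range(len(centroids))}

# Rows are [xmin, ymin, xmax, ymax, conf, label].
detections = np.array([
    [10, 10, 50, 50, 0.90, 1],
    [12, 11, 52, 49, 0.75, 1],
    [200, 220, 260, 300, 0.60, 2],
    [198, 224, 258, 298, 0.85, 2],
], dtype=float)
centroids = np.array([[11, 10.5, 51, 49.5], [199, 222, 259, 299]])

groups = assign_to_centroids(detections, centroids, ndim=4)
print(groups[0][:, 4])  # confidence scores of the boxes in group 0
```

Only the four box coordinates influence the grouping, yet the confidence and label columns remain available in each group.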

---

## Installation

```bash
$ python -m pip install kmeans-tjdwill
```

## How it Works

Calling `cluster` with a value of `k` returns the clusters as a `dict[int, NDArray]`, where each `NDArray` contains the elements assigned to that cluster, along with an array of centroids. The keys of this dict range from `0` to `k-1`, so a key can also be used to index the corresponding cluster centroid in the centroid array.
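
The shared key-and-index convention makes per-cluster post-processing straightforward. The sketch below uses hand-written stand-ins for the two return values (the numbers are invented, and `summary` is a name chosen for illustration) to show a key pulling the matching centroid.

```python
import numpy as np

# Hypothetical results shaped like the ones described above:
# two clusters, 2-D centroids, one extra column carried along per row.
centroids = np.array([[0.0, 0.0], [10.0, 10.0]])
clusters = {
    0: np.array([[-1.0, 0.0, 7.0], [1.0, 0.0, 9.0]]),
    1: np.array([[10.0, 9.0, 3.0], [10.0, 11.0, 5.0]]),
}
ndim = centroids.shape[1]

summary = {}
for key, members in clusters.items():
    centroid = centroids[key]  # the dict key doubles as the row index
    # Mean distance of the members to their centroid (clustered columns only).
    spread = np.linalg.norm(members[:, :ndim] - centroid, axis=1).mean()
    # Mean of the columns that were carried along but not clustered on.
    extra_mean = members[:, ndim:].mean(axis=0)
    summary[key] = (spread, extra_mean)

print(summary)
```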

Here is an example using the `cluster` function:

```python
>>> from kmeans import cluster
>>> import numpy as np
>>> np.random.seed(27)   # For reproducible results
>>> data = np.random.random((15, 5)).round(3)
>>> data[0]
array([0.426, 0.815, 0.735, 0.868, 0.383])
>>> # Cluster using only first two dimensions
>>> clusters, centroids = cluster(data, k=3, ndim=2, tolerance=0.001)
>>> centroids
array([[0.9004  , 0.79    ],
      [0.361375, 0.580125],
      [0.801   , 0.143   ]])
>>> clusters  # visually compare centroids with first two elements of each data entry.
{0: array([[0.979, 0.893, 0.21 , 0.742, 0.663],
     [0.887, 0.858, 0.749, 0.87 , 0.187],
     [0.966, 0.583, 0.092, 0.014, 0.837],
     [0.915, 0.705, 0.387, 0.706, 0.923],
     [0.755, 0.911, 0.242, 0.976, 0.304]]),
1: array([[0.426, 0.815, 0.735, 0.868, 0.383],
     [0.326, 0.373, 0.794, 0.151, 0.17 ],
     [0.081, 0.305, 0.783, 0.163, 0.071],
     [0.221, 0.726, 0.849, 0.929, 0.736],
     [0.477, 0.493, 0.595, 0.076, 0.117],
     [0.288, 0.684, 0.52 , 0.877, 0.924],
     [0.489, 0.596, 0.264, 0.992, 0.21 ],
     [0.583, 0.649, 0.911, 0.122, 0.676]]),
2: array([[0.701, 0.181, 0.599, 0.415, 0.514],
     [0.901, 0.105, 0.673, 0.87 , 0.561]])}
```
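
The visual comparison suggested in the output above can also be done numerically. The following check copies clusters `0` and `2` from that output and confirms that, for this result, each centroid is the mean of the first `ndim` columns of its cluster. It does not call the package itself.

```python
import numpy as np

ndim = 2
centroids = np.array([
    [0.9004, 0.79],
    [0.361375, 0.580125],
    [0.801, 0.143],
])
# Clusters 0 and 2, copied from the printed result.
clusters = {
    0: np.array([
        [0.979, 0.893, 0.21, 0.742, 0.663],
        [0.887, 0.858, 0.749, 0.87, 0.187],
        [0.966, 0.583, 0.092, 0.014, 0.837],
        [0.915, 0.705, 0.387, 0.706, 0.923],
        [0.755, 0.911, 0.242, 0.976, 0.304],
    ]),
    2: np.array([
        [0.701, 0.181, 0.599, 0.415, 0.514],
        [0.901, 0.105, 0.673, 0.87, 0.561],
    ]),
}

# Mean of the clustered columns only; trailing columns are ignored.
computed = {key: members[:, :ndim].mean(axis=0) for key, members in clusters.items()}
for key, mean in computed.items():
    print(key, mean, np.allclose(mean, centroids[key]))
```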

---

## Features

- k-means clustering (no side-effects)
- k-means clustering w/ animation
  - (2-D & 3-D)
- image segmentation via `kmeans.segmentation.segment_img` function


### k-means Animation

Animations are generated with the `view_clustering` function.

#### 2-D Case (Smallest Tolerance Possible)

[kmeans2D_animate.webm](https://github.com/tjdwill/KMeans_Clustering/assets/118497355/0584a4d1-268d-4785-b05e-319d54a28de1)

#### 3-D Case (Tolerance = 0.001)

[kmeans3D_animate.webm](https://github.com/tjdwill/KMeans_Clustering/assets/118497355/a542b606-0844-427e-bfef-243e6f1ceffc)

### Image Segmentation

Segment an image into a user-specified number of color groups (`k`).

The segmented image can be colored in two ways:

#### Averaged Colors

k=4

![seg_groups04](https://github.com/tjdwill/KMeans_Clustering/assets/118497355/9b468213-6983-4c66-8f93-de6e58a736a1)

k=10

![seg_groups10](https://github.com/tjdwill/KMeans_Clustering/assets/118497355/91fc5e42-4c2e-49bf-a24f-9926565a1a6c)

#### Random Colors

k=4

![seg_rand_groups04_cpy](https://github.com/tjdwill/KMeans_Clustering/assets/118497355/33cee3ba-0a7d-4c12-9f34-7c140376f24b)
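
To show why data association matters for segmentation, here is a rough sketch of the averaged-colors idea. It is an assumption about the general approach, not the code in `segmentation.py`: each pixel becomes a row `[r, g, b, row, col]`, only the color columns would be clustered (`ndim=3`), and the carried-along coordinates are used to paint the result. The helpers `image_to_rows` and `paint_averaged` are hypothetical, and a simple red-versus-blue split stands in for the actual clustering.

```python
import numpy as np

def image_to_rows(img):
    """Flatten an (H, W, 3) image into rows of [r, g, b, row, col]."""
    h, w, _ = img.shape
    rows, cols = np.indices((h, w))
    return np.column_stack(
        [img.reshape(-1, 3), rows.ravel(), cols.ravel()]
    ).astype(float)

def paint_averaged(clusters, shape):
    """Paint every pixel with the mean color of the cluster it belongs to."""
    out = np.zeros(shape, dtype=float)
    for members in clusters.values():
        mean_color = members[:, :3].mean(axis=0)
        r = members[:, 3].astype(int)
        c = members[:, 4].astype(int)
        out[r, c] = mean_color  # coordinates came along with each pixel
    return out

img = np.array([
    [[250, 0, 0], [240, 10, 0]],
    [[0, 0, 250], [10, 0, 240]],
])
pixels = image_to_rows(img)

# Stand-in for the clustering result: split on whether red exceeds blue.
is_red = pixels[:, 0] > pixels[:, 2]
clusters = {0: pixels[is_red], 1: pixels[~is_red]}

segmented = paint_averaged(clusters, img.shape)
print(segmented[0, 0], segmented[1, 1])
```

Swapping `mean_color` for a randomly chosen color per cluster would give the random-colors variant.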

---

## Developed With
* Python (3.12.1)
* NumPy (1.26.2)
* Matplotlib (3.8.4)

However, no features specific to Python 3.12 were used.

            

Raw data

            {
    "_id": null,
    "home_page": null,
    "name": "kmeans-tjdwill",
    "maintainer": null,
    "docs_url": null,
    "requires_python": ">=3.9",
    "maintainer_email": null,
    "keywords": "clustering, computer vision, data analysis, data processing, k-means, linear algebra, robotics",
    "author": null,
    "author_email": "Terrance Williams <tjdwill.gh@gmail.com>",
    "download_url": "https://files.pythonhosted.org/packages/36/b0/0e72a1ceb5c9961e0687fa08f532ade8d8a1700eb79a4997cb93dc963357/kmeans_tjdwill-1.0.4.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "MIT License  Copyright (c) 2024 tjdwill  Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the \"Software\"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:  The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.  THE SOFTWARE IS PROVIDED \"AS IS\", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.",
    "summary": "A function-based implementation of k-means clustering that maintains data association.",
    "version": "1.0.4",
    "project_urls": {
        "Docs": "https://tjdwill.github.io/kmeans",
        "Homepage": "https://github.com/tjdwill/kmeans",
        "Issues": "https://github.com/tjdwill/kmeans/issues"
    },
    "split_keywords": [
        "clustering",
        " computer vision",
        " data analysis",
        " data processing",
        " k-means",
        " linear algebra",
        " robotics"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "22b7ce0c31eb244b26c245e6355a0c5a7a6d7377e7d0696956dcd6730d326272",
                "md5": "8b1456394d947cf2246fcdc830c4a6ab",
                "sha256": "81b914f99cb0ac6a68599aad9bbb4c4b6f3fe37390e6602568b9d446e9ffbfd9"
            },
            "downloads": -1,
            "filename": "kmeans_tjdwill-1.0.4-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "8b1456394d947cf2246fcdc830c4a6ab",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.9",
            "size": 12593,
            "upload_time": "2024-07-06T16:55:11",
            "upload_time_iso_8601": "2024-07-06T16:55:11.117518Z",
            "url": "https://files.pythonhosted.org/packages/22/b7/ce0c31eb244b26c245e6355a0c5a7a6d7377e7d0696956dcd6730d326272/kmeans_tjdwill-1.0.4-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "36b00e72a1ceb5c9961e0687fa08f532ade8d8a1700eb79a4997cb93dc963357",
                "md5": "51d40863c2c896b28302fa48809615a5",
                "sha256": "1bf207bf8da93887b9e8311e499d8a8dbe3685bc97f288a5a2ae7f463c159c15"
            },
            "downloads": -1,
            "filename": "kmeans_tjdwill-1.0.4.tar.gz",
            "has_sig": false,
            "md5_digest": "51d40863c2c896b28302fa48809615a5",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.9",
            "size": 82413,
            "upload_time": "2024-07-06T16:55:13",
            "upload_time_iso_8601": "2024-07-06T16:55:13.315191Z",
            "url": "https://files.pythonhosted.org/packages/36/b0/0e72a1ceb5c9961e0687fa08f532ade8d8a1700eb79a4997cb93dc963357/kmeans_tjdwill-1.0.4.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2024-07-06 16:55:13",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "tjdwill",
    "github_project": "kmeans",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": true,
    "requirements": [],
    "lcname": "kmeans-tjdwill"
}
        