kmeans-tjdwill

Name	kmeans-tjdwill
Version	1.0.4
home_page	None
Summary	A function-based implementation of k-means clustering that maintains data association.
upload_time	2024-07-06 16:55:13
maintainer	None
docs_url	None
author	None
requires_python	>=3.9
license	MIT License Copyright (c) 2024 tjdwill Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
keywords	clustering computer vision data analysis data processing k-means linear algebra robotics

# K-Means Clustering

[![PyPI version](https://badge.fury.io/py/kmeans-tjdwill.svg)](https://badge.fury.io/py/kmeans-tjdwill)
[![Docs](https://github.com/tjdwill/kmeans/actions/workflows/sitebuild.yml/badge.svg)](https://tjdwill.github.io/kmeans) 


A repository documenting an implementation of k-means clustering in Python. Usage examples can be found in the `tests` directory.


What sets this k-means clustering module apart from others is that the user can specify the number of dimensions to use for the clustering operation.

For example, given data in which each element has the form
```python
# Each element would actually be a NumPy array, but the following uses lists for readability.
[
  [1, 2, 3, 4, 5],
  [4, 6, 7, 8, 2],
  ...
]
```
specifying `ndim=3` means that only the first three elements of each data point are used in the clustering computations. The remaining elements stay attached to the data point and are returned with it.
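
As a rough illustration of the idea (plain Python lists, not the package's actual code), the sketch below measures distance on the first `ndim` values only and leaves the rest of each row untouched. The helper name `nearest_centroid` and the centroid values are made up for this example.

```python
import math

def nearest_centroid(point, centroids, ndim):
    """Hypothetical helper: index of the centroid closest to `point`,
    comparing only the first `ndim` values of the point."""
    head = point[:ndim]
    distances = [math.dist(head, c) for c in centroids]
    return distances.index(min(distances))

data = [
    [1, 2, 3, 4, 5],
    [4, 6, 7, 8, 2],
]
centroids = [[1, 2, 3], [4, 6, 7]]  # illustrative 3-D centroids, because ndim=3

# Each full 5-element row is kept; only its first three values decide the label.
labels = [nearest_centroid(row, centroids, ndim=3) for row in data]
```

The last two values of each row never enter the distance calculation, so they can hold any payload that should travel with the point.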

Limiting the clustering dimensions is useful for maintaining data association: the extra values stay attached to their data point instead of being separated from it or shuffled. An example is my implementation of image segmentation (`segmentation.py`) in this same project.
Another possible use is maintaining data association in object detection results. Given detections of the form
```python
[xmin, ymin, xmax, ymax, conf, label]  # [bounding box, conf, label]
```
we may want to cluster the data solely on bounding box information while keeping the confidence score and label of each detection for further processing.
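
A minimal sketch of such further processing, assuming a clustering result shaped like the one this package returns (cluster index mapped to full rows). The detection values are invented for illustration, and plain lists stand in for arrays.

```python
# Hypothetical clustering result: cluster index -> full detection rows
# [xmin, ymin, xmax, ymax, conf, label], grouped on the box values only.
clusters = {
    0: [[10, 10, 50, 50, 0.91, 0], [12, 11, 52, 49, 0.88, 0]],
    1: [[200, 220, 260, 300, 0.75, 1]],
}

# Because every row still carries its confidence, each cluster can be
# reduced to its most confident detection.
best = {idx: max(rows, key=lambda det: det[4]) for idx, rows in clusters.items()}
```

Here `best` holds one representative detection per cluster, with its label still attached.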

---

## Installation

```bash
$ python -m pip install kmeans-tjdwill
```

## How it Works

Clustering with a given `k` produces a `dict[int, NDArray]` in which each `NDArray` contains the elements of one cluster. The keys of this dict range from `0` to `k-1`, so a key can also be used to index the corresponding cluster centroid in the centroid array.

Here is an example using the `cluster` function:

```python
>>> from kmeans import cluster
>>> import numpy as np
>>> np.random.seed(27)   # For reproducible results
>>> data = np.random.random((15, 5)).round(3)
>>> data[0]
array([0.426, 0.815, 0.735, 0.868, 0.383])
>>> # Cluster using only first two dimensions
>>> clusters, centroids = cluster(data, k=3, ndim=2, tolerance=0.001)
>>> centroids
array([[0.9004  , 0.79    ],
      [0.361375, 0.580125],
      [0.801   , 0.143   ]])
>>> clusters  # visually compare centroids with first two elements of each data entry.
{0: array([[0.979, 0.893, 0.21 , 0.742, 0.663],
     [0.887, 0.858, 0.749, 0.87 , 0.187],
     [0.966, 0.583, 0.092, 0.014, 0.837],
     [0.915, 0.705, 0.387, 0.706, 0.923],
     [0.755, 0.911, 0.242, 0.976, 0.304]]),
1: array([[0.426, 0.815, 0.735, 0.868, 0.383],
     [0.326, 0.373, 0.794, 0.151, 0.17 ],
     [0.081, 0.305, 0.783, 0.163, 0.071],
     [0.221, 0.726, 0.849, 0.929, 0.736],
     [0.477, 0.493, 0.595, 0.076, 0.117],
     [0.288, 0.684, 0.52 , 0.877, 0.924],
     [0.489, 0.596, 0.264, 0.992, 0.21 ],
     [0.583, 0.649, 0.911, 0.122, 0.676]]),
2: array([[0.701, 0.181, 0.599, 0.415, 0.514],
     [0.901, 0.105, 0.673, 0.87 , 0.561]])}
```
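
The link between dict keys and centroid rows can be checked by hand. The sketch below copies cluster `2` and the centroid array from the output above into plain lists, then confirms that the centroid at index `2` is the mean of the first two columns of that cluster. The `1e-6` comparison threshold is an arbitrary choice for this example.

```python
# Values copied from the output above (cluster 2 and the centroid array).
centroids = [[0.9004, 0.79], [0.361375, 0.580125], [0.801, 0.143]]
clusters = {
    2: [[0.701, 0.181, 0.599, 0.415, 0.514],
        [0.901, 0.105, 0.673, 0.87, 0.561]],
}
ndim = 2

for key, rows in clusters.items():
    # Mean of the first `ndim` columns of the cluster's rows.
    mean = [sum(row[d] for row in rows) / len(rows) for d in range(ndim)]
    # The dict key doubles as the row index into the centroid array.
    centroid = centroids[key]
    matches = all(abs(m - c) < 1e-6 for m, c in zip(mean, centroid))
    print(key, centroid, matches)
```

The same check holds for clusters `0` and `1` in the output above.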

---

## Features

- k-means clustering (no side-effects)
- k-means clustering with animation
  - (2-D & 3-D)
- image segmentation via `kmeans.segmentation.segment_img` function


### k-means Animation

The animations below were produced with the `view_clustering` function.

#### 2-D Case (Smallest Tolerance Possible)

[kmeans2D_animate.webm](https://github.com/tjdwill/KMeans_Clustering/assets/118497355/0584a4d1-268d-4785-b05e-319d54a28de1)

#### 3-D Case (Tolerance = 0.001)

[kmeans3D_animate.webm](https://github.com/tjdwill/KMeans_Clustering/assets/118497355/a542b606-0844-427e-bfef-243e6f1ceffc)
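
The README does not define the tolerance. A common reading, assumed here, is a threshold on how far the centroids move between iterations. The sketch below is a generic Lloyd-style loop in plain Python under that assumption; the function name `kmeans_sketch`, the `max_iter` parameter, and the sample points are hypothetical and do not describe this package's internals.

```python
import math

def kmeans_sketch(points, centroids, ndim, tolerance, max_iter=100):
    """Generic k-means loop; stops once no centroid moves more than `tolerance`."""
    for _ in range(max_iter):
        # Assignment step: group full rows by nearest centroid (first ndim values).
        clusters = {i: [] for i in range(len(centroids))}
        for p in points:
            nearest = min(clusters, key=lambda i: math.dist(p[:ndim], centroids[i]))
            clusters[nearest].append(p)
        # Update step: each centroid becomes the mean of its members.
        new_centroids = []
        for i, members in clusters.items():
            if members:
                new_centroids.append(
                    [sum(m[d] for m in members) / len(members) for d in range(ndim)]
                )
            else:
                new_centroids.append(centroids[i])  # keep an empty cluster's centroid
        shift = max(math.dist(a, b) for a, b in zip(centroids, new_centroids))
        centroids = new_centroids
        if shift <= tolerance:
            break
    return clusters, centroids

# Third value of each row is a payload that takes no part in clustering.
points = [[0.0, 0.0, "a"], [0.0, 1.0, "b"], [10.0, 10.0, "c"], [10.0, 11.0, "d"]]
clusters, centroids = kmeans_sketch(
    points, [[0.0, 0.0], [10.0, 10.0]], ndim=2, tolerance=0.001
)
```

Under this reading, a smaller tolerance means more iterations before the loop stops, which is why the 2-D animation with the smallest possible tolerance runs until the centroids stop moving entirely.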

### Image Segmentation

Segment an image into color groups, with the number of groups (`k`) specified by the user.

Two coloring options are shown:

#### Averaged Colors

k=4

![seg_groups04](https://github.com/tjdwill/KMeans_Clustering/assets/118497355/9b468213-6983-4c66-8f93-de6e58a736a1)

k=10

![seg_groups10](https://github.com/tjdwill/KMeans_Clustering/assets/118497355/91fc5e42-4c2e-49bf-a24f-9926565a1a6c)

#### Random Colors

k=4

![seg_rand_groups04_cpy](https://github.com/tjdwill/KMeans_Clustering/assets/118497355/33cee3ba-0a7d-4c12-9f34-7c140376f24b)
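
The two coloring options can be pictured with a small sketch. It assumes each pixel is stored as `(r, g, b, row, col)` and clustered on the three color values only, so the coordinates stay attached. This layout, the `recolor` helper, and its `mode` and `seed` parameters are illustrative assumptions, not the signature or internals of `segment_img`.

```python
import random

# Hypothetical clustering result: cluster index -> pixels as (r, g, b, row, col).
clusters = {
    0: [(250, 10, 10, 0, 0), (240, 20, 0, 0, 1)],
    1: [(0, 0, 200, 1, 0), (10, 10, 220, 1, 1)],
}

def recolor(clusters, mode, seed=0):
    """Return {(row, col): (r, g, b)} with a single color per cluster."""
    rng = random.Random(seed)
    out = {}
    for pixels in clusters.values():
        if mode == "averaged":
            # Integer mean of each color channel over the cluster's pixels.
            n = len(pixels)
            color = tuple(sum(p[c] for p in pixels) // n for c in range(3))
        else:
            # "random": one arbitrary color shared by the whole cluster.
            color = tuple(rng.randrange(256) for _ in range(3))
        for p in pixels:
            out[(p[3], p[4])] = color  # retained coordinates place the pixel
    return out

averaged = recolor(clusters, "averaged")
randomized = recolor(clusters, "random")
```

Averaged colors keep the segmented image close to the original palette, while random colors make the group boundaries easier to see.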

---

## Developed With
* Python (3.12.1)
* NumPy (1.26.2)
* Matplotlib (3.8.4)

However, no features specific to Python 3.12 were used.

Raw data

{
    "_id": null,
    "home_page": null,
    "name": "kmeans-tjdwill",
    "maintainer": null,
    "docs_url": null,
    "requires_python": ">=3.9",
    "maintainer_email": null,
    "keywords": "clustering, computer vision, data analysis, data processing, k-means, linear algebra, robotics",
    "author": null,
    "author_email": "Terrance Williams <tjdwill.gh@gmail.com>",
    "download_url": "https://files.pythonhosted.org/packages/36/b0/0e72a1ceb5c9961e0687fa08f532ade8d8a1700eb79a4997cb93dc963357/kmeans_tjdwill-1.0.4.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "summary": "A function-based implementation of k-means clustering that maintains data association.",
    "version": "1.0.4",
    "project_urls": {
        "Docs": "https://tjdwill.github.io/kmeans",
        "Homepage": "https://github.com/tjdwill/kmeans",
        "Issues": "https://github.com/tjdwill/kmeans/issues"
    },
    "split_keywords": [
        "clustering",
        " computer vision",
        " data analysis",
        " data processing",
        " k-means",
        " linear algebra",
        " robotics"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "22b7ce0c31eb244b26c245e6355a0c5a7a6d7377e7d0696956dcd6730d326272",
                "md5": "8b1456394d947cf2246fcdc830c4a6ab",
                "sha256": "81b914f99cb0ac6a68599aad9bbb4c4b6f3fe37390e6602568b9d446e9ffbfd9"
            },
            "downloads": -1,
            "filename": "kmeans_tjdwill-1.0.4-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "8b1456394d947cf2246fcdc830c4a6ab",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.9",
            "size": 12593,
            "upload_time": "2024-07-06T16:55:11",
            "upload_time_iso_8601": "2024-07-06T16:55:11.117518Z",
            "url": "https://files.pythonhosted.org/packages/22/b7/ce0c31eb244b26c245e6355a0c5a7a6d7377e7d0696956dcd6730d326272/kmeans_tjdwill-1.0.4-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "36b00e72a1ceb5c9961e0687fa08f532ade8d8a1700eb79a4997cb93dc963357",
                "md5": "51d40863c2c896b28302fa48809615a5",
                "sha256": "1bf207bf8da93887b9e8311e499d8a8dbe3685bc97f288a5a2ae7f463c159c15"
            },
            "downloads": -1,
            "filename": "kmeans_tjdwill-1.0.4.tar.gz",
            "has_sig": false,
            "md5_digest": "51d40863c2c896b28302fa48809615a5",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.9",
            "size": 82413,
            "upload_time": "2024-07-06T16:55:13",
            "upload_time_iso_8601": "2024-07-06T16:55:13.315191Z",
            "url": "https://files.pythonhosted.org/packages/36/b0/0e72a1ceb5c9961e0687fa08f532ade8d8a1700eb79a4997cb93dc963357/kmeans_tjdwill-1.0.4.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2024-07-06 16:55:13",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "tjdwill",
    "github_project": "kmeans",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": true,
    "requirements": [],
    "lcname": "kmeans-tjdwill"
}
