[![Build Status](https://github.com/dstein64/kmeans1d/workflows/build/badge.svg)](https://github.com/dstein64/kmeans1d/actions)
kmeans1d
========
A Python library with an implementation of *k*-means clustering on 1D data, based on the algorithm
from Xiaolin (1991), as presented by Gronlund et al. (2017, Section 2.2).
Globally optimal *k*-means clustering is NP-hard for multi-dimensional data. Lloyd's algorithm is a
popular approach for finding a locally optimal solution. For 1-dimensional data, there are polynomial
time algorithms. The algorithm implemented here is an *O(kn + n log n)* dynamic programming algorithm
for finding the globally optimal *k* clusters for *n* 1D data points.
The code is written in C++, and wrapped with Python.
Requirements
------------
*kmeans1d* supports Python 3.x.
Installation
------------
[kmeans1d](https://pypi.python.org/pypi/kmeans1d) is available on PyPI, the Python Package Index.
```sh
$ pip3 install kmeans1d
```
Example Usage
-------------
```python
import kmeans1d
x = [4.0, 4.1, 4.2, -50, 200.2, 200.4, 200.9, 80, 100, 102]
k = 4
clusters, centroids = kmeans1d.cluster(x, k)
print(clusters) # [1, 1, 1, 0, 3, 3, 3, 2, 2, 2]
print(centroids) # [-50.0, 4.1, 94.0, 200.5]
```
Tests
-----
Tests are in [tests/](https://github.com/dstein64/kmeans1d/blob/master/tests).
```sh
# Run tests
$ python3 -m unittest discover tests -v
```
Development
-----------
The underlying C++ code can be built in-place, outside the context of `pip`. This requires Python
development tools for building Python modules (e.g., the `python3-dev` package on Ubuntu). `gcc`,
`clang`, and `MSVC` have been tested.
```
$ python3 setup.py build_ext --inplace
```
The [packages](https://github.com/dstein64/kmeans1d/blob/master/.github/workflows/packages.yml)
GitHub action can be manually triggered (`Actions` > `packages` > `Run workflow`) to build wheels
and a source distribution.
License
-------
The code in this repository has an [MIT License](https://en.wikipedia.org/wiki/MIT_License).
See [LICENSE](https://github.com/dstein64/kmeans1d/blob/master/LICENSE).
References
----------
[1] Wu, Xiaolin. "Optimal Quantization by Matrix Searching." Journal of Algorithms 12, no. 4
(December 1, 1991): 663
[2] Gronlund, Allan, Kasper Green Larsen, Alexander Mathiasen, Jesper Sindahl Nielsen, Stefan Schneider,
and Mingzhou Song. "Fast Exact K-Means, k-Medians and Bregman Divergence Clustering in 1D."
ArXiv:1701.07204 [Cs], January 25, 2017. http://arxiv.org/abs/1701.07204.
Raw data
{
"_id": null,
"home_page": "https://github.com/dstein64/kmeans1d",
"name": "kmeans1d",
"maintainer": null,
"docs_url": null,
"requires_python": ">=3.6",
"maintainer_email": null,
"keywords": "k-means, machine learning, optimization",
"author": "Daniel Steinberg",
"author_email": "ds@dannyadam.com",
"download_url": "https://files.pythonhosted.org/packages/18/5b/0ba1f71363e62fb2bd5370ec9b26360c482bd9ad06082b5dcfb03e148928/kmeans1d-0.4.0.tar.gz",
"platform": null,
"description": "[![Build Status](https://github.com/dstein64/kmeans1d/workflows/build/badge.svg)](https://github.com/dstein64/kmeans1d/actions)\n\nkmeans1d\n========\n\nA Python library with an implementation of *k*-means clustering on 1D data, based on the algorithm\nfrom Xiaolin (1991), as presented by Gronlund et al. (2017, Section 2.2).\n\nGlobally optimal *k*-means clustering is NP-hard for multi-dimensional data. Lloyd's algorithm is a\npopular approach for finding a locally optimal solution. For 1-dimensional data, there are polynomial\ntime algorithms. The algorithm implemented here is an *O(kn + n log n)* dynamic programming algorithm\nfor finding the globally optimal *k* clusters for *n* 1D data points.\n\nThe code is written in C++, and wrapped with Python.\n\nRequirements\n------------\n\n*kmeans1d* supports Python 3.x.\n\nInstallation\n------------\n\n[kmeans1d](https://pypi.python.org/pypi/kmeans1d) is available on PyPI, the Python Package Index.\n\n```sh\n$ pip3 install kmeans1d\n```\n\nExample Usage\n-------------\n\n```python\nimport kmeans1d\n\nx = [4.0, 4.1, 4.2, -50, 200.2, 200.4, 200.9, 80, 100, 102]\nk = 4\n\nclusters, centroids = kmeans1d.cluster(x, k)\n\nprint(clusters) # [1, 1, 1, 0, 3, 3, 3, 2, 2, 2]\nprint(centroids) # [-50.0, 4.1, 94.0, 200.5]\n```\n\nTests\n-----\n\nTests are in [tests/](https://github.com/dstein64/kmeans1d/blob/master/tests).\n\n```sh\n# Run tests\n$ python3 -m unittest discover tests -v\n```\n\nDevelopment\n-----------\n\nThe underlying C++ code can be built in-place, outside the context of `pip`. This requires Python\ndevelopment tools for building Python modules (e.g., the `python3-dev` package on Ubuntu). `gcc`,\n`clang`, and `MSVC` have been tested.\n\n```\n$ python3 setup.py build_ext --inplace\n```\n\nThe [packages](https://github.com/dstein64/kmeans1d/blob/master/.github/workflows/packages.yml)\nGitHub action can be manually triggered (`Actions` > `packages` > `Run workflow`) to build wheels\nand a source distribution.\n\nLicense\n-------\n\nThe code in this repository has an [MIT License](https://en.wikipedia.org/wiki/MIT_License).\n\nSee [LICENSE](https://github.com/dstein64/kmeans1d/blob/master/LICENSE).\n\nReferences\n----------\n\n[1] Wu, Xiaolin. \"Optimal Quantization by Matrix Searching.\" Journal of Algorithms 12, no. 4\n(December 1, 1991): 663\n\n[2] Gronlund, Allan, Kasper Green Larsen, Alexander Mathiasen, Jesper Sindahl Nielsen, Stefan Schneider,\nand Mingzhou Song. \"Fast Exact K-Means, k-Medians and Bregman Divergence Clustering in 1D.\"\nArXiv:1701.07204 [Cs], January 25, 2017. http://arxiv.org/abs/1701.07204.\n",
"bugtrack_url": null,
"license": "MIT",
"summary": "A Python package for optimal 1D k-means clustering",
"version": "0.4.0",
"project_urls": {
"Homepage": "https://github.com/dstein64/kmeans1d"
},
"split_keywords": [
"k-means",
" machine learning",
" optimization"
],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "191957056e4566611f5da2166c52a9f88185c177a224d0ecab1dd632cffa1c88",
"md5": "27cc446ada593465911564900b00f7cf",
"sha256": "9a052bca5bbacbd70a7f4846896439ebed29e96a2b04115c1fec480555cad70d"
},
"downloads": -1,
"filename": "kmeans1d-0.4.0-cp32-abi3-macosx_11_0_universal2.whl",
"has_sig": false,
"md5_digest": "27cc446ada593465911564900b00f7cf",
"packagetype": "bdist_wheel",
"python_version": "cp32",
"requires_python": ">=3.6",
"size": 28571,
"upload_time": "2024-07-03T17:45:15",
"upload_time_iso_8601": "2024-07-03T17:45:15.561866Z",
"url": "https://files.pythonhosted.org/packages/19/19/57056e4566611f5da2166c52a9f88185c177a224d0ecab1dd632cffa1c88/kmeans1d-0.4.0-cp32-abi3-macosx_11_0_universal2.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "ba0c63c31443d391f5452aa21e7d5e10106de06cd6ea42bf0a0689b01900139f",
"md5": "2ad01bfbbc32032aebc10b6c03f7501d",
"sha256": "ad8017fdb82746c718215695b4290e9bd6ded0305740e2fe2b1ede697a14e947"
},
"downloads": -1,
"filename": "kmeans1d-0.4.0-cp32-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl",
"has_sig": false,
"md5_digest": "2ad01bfbbc32032aebc10b6c03f7501d",
"packagetype": "bdist_wheel",
"python_version": "cp32",
"requires_python": ">=3.6",
"size": 122693,
"upload_time": "2024-07-03T17:45:17",
"upload_time_iso_8601": "2024-07-03T17:45:17.428420Z",
"url": "https://files.pythonhosted.org/packages/ba/0c/63c31443d391f5452aa21e7d5e10106de06cd6ea42bf0a0689b01900139f/kmeans1d-0.4.0-cp32-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "d55adb6e1f27d31db470e53a6b70009216891050a7396a70584c9600a5b98bd4",
"md5": "a327d8ece92b60bb366631897a8346d4",
"sha256": "5943d68f6514309023ace23d8ad3f43703871e87a021e8024cdf4db79ed2f2e1"
},
"downloads": -1,
"filename": "kmeans1d-0.4.0-cp32-abi3-win_amd64.whl",
"has_sig": false,
"md5_digest": "a327d8ece92b60bb366631897a8346d4",
"packagetype": "bdist_wheel",
"python_version": "cp32",
"requires_python": ">=3.6",
"size": 18133,
"upload_time": "2024-07-03T17:45:18",
"upload_time_iso_8601": "2024-07-03T17:45:18.989084Z",
"url": "https://files.pythonhosted.org/packages/d5/5a/db6e1f27d31db470e53a6b70009216891050a7396a70584c9600a5b98bd4/kmeans1d-0.4.0-cp32-abi3-win_amd64.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "185b0ba1f71363e62fb2bd5370ec9b26360c482bd9ad06082b5dcfb03e148928",
"md5": "2b7eb7a6a44a2717e4717a07bd2cb6c6",
"sha256": "80f66178d4e66f75e076c4877b52f9b4d0eb2d3b4294c056016a1384f75fe5de"
},
"downloads": -1,
"filename": "kmeans1d-0.4.0.tar.gz",
"has_sig": false,
"md5_digest": "2b7eb7a6a44a2717e4717a07bd2cb6c6",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.6",
"size": 7217,
"upload_time": "2024-07-03T17:45:19",
"upload_time_iso_8601": "2024-07-03T17:45:19.936062Z",
"url": "https://files.pythonhosted.org/packages/18/5b/0ba1f71363e62fb2bd5370ec9b26360c482bd9ad06082b5dcfb03e148928/kmeans1d-0.4.0.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2024-07-03 17:45:19",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "dstein64",
"github_project": "kmeans1d",
"travis_ci": false,
"coveralls": false,
"github_actions": true,
"lcname": "kmeans1d"
}