# Parallel Delayed Cluster DP-Means
[Paper](https://openreview.net/pdf?id=rnzVBD8jqlq) <br>
### Introduction
The PDC-DP-Means package presents a highly optimized version of the DP-Means algorithm, introducing a new parallel algorithm, Parallel Delayed Cluster DP-Means (PDC-DP-Means), and a MiniBatch implementation for enhanced speed. These features cater to scalable and efficient cluster analysis where the number of clusters is unknown.
In addition to offering major speed improvements, the PDC-DP-Means algorithm supports an optional online mode for real-time data processing. Its scikit-learn-like interface is designed for easy integration into existing data workflows. The paper reports that PDC-DP-Means outperforms other nonparametric clustering methods in both speed and scalability.
See the paper for more details.
### Installation
`pip install pdc-dp-means`
### Quick Start
    from sklearn.datasets import make_blobs
    from pdc_dp_means import DPMeans

    # Generate sample data
    X, y_true = make_blobs(n_samples=300, centers=4, cluster_std=0.60, random_state=0)

    # Apply DPMeans clustering
    dpmeans = DPMeans(n_clusters=1, n_init=10, delta=10)  # n_init and delta parameters
    dpmeans.fit(X)

    # Predict the cluster for each data point
    y_dpmeans = dpmeans.predict(X)

    # Plotting clusters and centroids
    import matplotlib.pyplot as plt

    plt.scatter(X[:, 0], X[:, 1], c=y_dpmeans, s=50, cmap='viridis')
    centers = dpmeans.cluster_centers_
    plt.scatter(centers[:, 0], centers[:, 1], c='black', s=200, alpha=0.5)
    plt.show()

Note that the paper's `\lambda` parameter is called `delta` in the code, because `lambda` is a reserved keyword in Python.
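
The role of this penalty is easiest to see in the original serial DP-Means procedure, where a point opens a new cluster whenever it is too far from every existing center. The sketch below is a plain NumPy illustration of that classic algorithm, not the package's parallel PDC implementation. The function name `dp_means` and the argument `lam` are hypothetical, and the threshold is applied here to the squared Euclidean distance, which is an assumption and may differ from how the package scales `delta`.

```python
import numpy as np


def dp_means(X, lam, max_iter=100):
    """Classic serial DP-Means sketch; `lam` plays the role of the paper's lambda."""
    X = np.asarray(X, dtype=float)
    centers = X.mean(axis=0, keepdims=True)  # start from a single global cluster
    labels = np.zeros(len(X), dtype=int)
    for _ in range(max_iter):
        changed = False
        for i, x in enumerate(X):
            d2 = ((centers - x) ** 2).sum(axis=1)  # squared distance to each center
            k = int(d2.argmin())
            if d2[k] > lam:
                # Too far from every center: open a new cluster at this point.
                centers = np.vstack([centers, x])
                k = len(centers) - 1
            if labels[i] != k:
                labels[i] = k
                changed = True
        # Move each non-empty cluster's center to the mean of its members.
        for k in range(len(centers)):
            members = X[labels == k]
            if len(members):
                centers[k] = members.mean(axis=0)
        if not changed:
            break
    return centers, labels


# Illustrative data (assumption): three well-separated 2-D blobs.
rng = np.random.default_rng(0)
blobs = np.vstack(
    [rng.normal(loc, 0.3, size=(50, 2)) for loc in ([0, 0], [5, 5], [0, 5])]
)
for lam in (1.0, 4.0, 1e9):
    _, labels = dp_means(blobs, lam)
    print(f"lam={lam:g}: {len(np.unique(labels))} non-empty clusters")
```

A very large penalty keeps every point in the initial cluster, while smaller values let more clusters open, which is the behavior to keep in mind when tuning `delta`.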
### Usage
Please refer to the documentation: https://pdc-dp-means.readthedocs.io/en/latest/
### Paper Code
Please refer to https://github.com/BGU-CS-VIL/pdc-dp-means/tree/main/paper_code for the code used in the paper.
### Citing this work
If you use this code for your work, please cite the following:
```
@inproceedings{dinari2022revisiting,
title={Revisiting {DP}-Means: Fast Scalable Algorithms via Parallelism and Delayed Cluster Creation},
author={Dinari, Or and Freifeld, Oren},
booktitle={The 38th Conference on Uncertainty in Artificial Intelligence},
year={2022}
}
```
### License
Our code is licensed under the BSD-3-Clause license.
Raw data
{
"_id": null,
"home_page": null,
"name": "pdc-dp-means",
"maintainer": null,
"docs_url": null,
"requires_python": null,
"maintainer_email": null,
"keywords": "dp-means clustering",
"author": "Or Dinari",
"author_email": "dinari.or@gmail.com",
"download_url": null,
"platform": null,
"bugtrack_url": null,
"license": "BSD3",
"summary": null,
"version": "0.0.8",
"project_urls": {
"Documentation": "https://pdc-dp-means.readthedocs.io/en/latest/",
"Source": "https://github.com/BGU-CS-VIL/pdc-dp-means",
"Tracker": "https://github.com/BGU-CS-VIL/pdc-dp-means"
},
"split_keywords": [
"dp-means",
"clustering"
],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "80df22fa8a25802fffb2b5ed9d57e6e2f55d9a60b95f29bf4ce49a8b916e3770",
"md5": "3d47df2a6890b0a8fe68a9e58e644494",
"sha256": "83b3069a0fa078a90d9db48272705880272092476de508b30a149fe85a07af5b"
},
"downloads": -1,
"filename": "pdc_dp_means-0.0.8-cp310-cp310-macosx_11_0_arm64.whl",
"has_sig": false,
"md5_digest": "3d47df2a6890b0a8fe68a9e58e644494",
"packagetype": "bdist_wheel",
"python_version": "cp310",
"requires_python": null,
"size": 2561225,
"upload_time": "2024-07-20T20:33:55",
"upload_time_iso_8601": "2024-07-20T20:33:55.686222Z",
"url": "https://files.pythonhosted.org/packages/80/df/22fa8a25802fffb2b5ed9d57e6e2f55d9a60b95f29bf4ce49a8b916e3770/pdc_dp_means-0.0.8-cp310-cp310-macosx_11_0_arm64.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "dc932e9a14ab03f298bc544633851e0d962b57d1d1e394db415426fad52f40fb",
"md5": "de03312cef756e041f78e72bf5635aea",
"sha256": "8baeeb74efe8abca3d70bd7e5f1c6e4d9633c3ceea8eaf125b1c67d2ac887bf2"
},
"downloads": -1,
"filename": "pdc_dp_means-0.0.8-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl",
"has_sig": false,
"md5_digest": "de03312cef756e041f78e72bf5635aea",
"packagetype": "bdist_wheel",
"python_version": "cp310",
"requires_python": null,
"size": 3079251,
"upload_time": "2024-07-20T20:33:57",
"upload_time_iso_8601": "2024-07-20T20:33:57.409929Z",
"url": "https://files.pythonhosted.org/packages/dc/93/2e9a14ab03f298bc544633851e0d962b57d1d1e394db415426fad52f40fb/pdc_dp_means-0.0.8-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "1bd8cb8e04a7730d9d39d7044907972d23614eb38e415deda321fb20c0afa6a6",
"md5": "1130d0ea2d32a02d132b84a75852365a",
"sha256": "24b60975074a301b5f1a82373007372a7bd4b0d47b409194693ce01d86f29e1e"
},
"downloads": -1,
"filename": "pdc_dp_means-0.0.8-cp310-cp310-win_amd64.whl",
"has_sig": false,
"md5_digest": "1130d0ea2d32a02d132b84a75852365a",
"packagetype": "bdist_wheel",
"python_version": "cp310",
"requires_python": null,
"size": 2561299,
"upload_time": "2024-07-20T20:33:59",
"upload_time_iso_8601": "2024-07-20T20:33:59.066295Z",
"url": "https://files.pythonhosted.org/packages/1b/d8/cb8e04a7730d9d39d7044907972d23614eb38e415deda321fb20c0afa6a6/pdc_dp_means-0.0.8-cp310-cp310-win_amd64.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "0a531836fbaa5e1fe8f4b635750c4ae6fe99adb5f150e074143d07938bb73477",
"md5": "b19d9510af655cacae1c4ad7d528b196",
"sha256": "39ff67a3b65bc66688cdeee2b67cb3158e890fe8fc8bb7a04cf03948c1bb0eb0"
},
"downloads": -1,
"filename": "pdc_dp_means-0.0.8-cp311-cp311-macosx_11_0_arm64.whl",
"has_sig": false,
"md5_digest": "b19d9510af655cacae1c4ad7d528b196",
"packagetype": "bdist_wheel",
"python_version": "cp311",
"requires_python": null,
"size": 2561215,
"upload_time": "2024-07-20T20:34:00",
"upload_time_iso_8601": "2024-07-20T20:34:00.689610Z",
"url": "https://files.pythonhosted.org/packages/0a/53/1836fbaa5e1fe8f4b635750c4ae6fe99adb5f150e074143d07938bb73477/pdc_dp_means-0.0.8-cp311-cp311-macosx_11_0_arm64.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "7c1d8b748ac8a7ed1606b09af54aea1d395643c2c524bfd84859385d2e8be311",
"md5": "2d029d04c00212bc4419b3b83f0fe24e",
"sha256": "a691bbb15f59100a010358a9bc55b766d95736ad1ca53888b87bf969b0cf854e"
},
"downloads": -1,
"filename": "pdc_dp_means-0.0.8-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl",
"has_sig": false,
"md5_digest": "2d029d04c00212bc4419b3b83f0fe24e",
"packagetype": "bdist_wheel",
"python_version": "cp311",
"requires_python": null,
"size": 3127374,
"upload_time": "2024-07-20T20:34:02",
"upload_time_iso_8601": "2024-07-20T20:34:02.315791Z",
"url": "https://files.pythonhosted.org/packages/7c/1d/8b748ac8a7ed1606b09af54aea1d395643c2c524bfd84859385d2e8be311/pdc_dp_means-0.0.8-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "c4d495057618524e451e682f022175676811929dd6db34ba255b128d91ecc984",
"md5": "e4f92fac50e2c9d164bbf37eb27d1914",
"sha256": "126b197dce8e96932ec7e55b21baf9a8196ff949aaa3d51b8ab7cf7247ffc629"
},
"downloads": -1,
"filename": "pdc_dp_means-0.0.8-cp311-cp311-win_amd64.whl",
"has_sig": false,
"md5_digest": "e4f92fac50e2c9d164bbf37eb27d1914",
"packagetype": "bdist_wheel",
"python_version": "cp311",
"requires_python": null,
"size": 2562032,
"upload_time": "2024-07-20T20:34:03",
"upload_time_iso_8601": "2024-07-20T20:34:03.990419Z",
"url": "https://files.pythonhosted.org/packages/c4/d4/95057618524e451e682f022175676811929dd6db34ba255b128d91ecc984/pdc_dp_means-0.0.8-cp311-cp311-win_amd64.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "014e5a3d1f5b14ea36186bcf4e7be4b4b7d8313bae9c10dba5bdaa7836858999",
"md5": "050d096e6a1430b783c23881b2843d7b",
"sha256": "ba99c143a4f1eb1b0b81fe69b3d04a7b6109d0daa61aee4dd0f8bb47ddb0bdbb"
},
"downloads": -1,
"filename": "pdc_dp_means-0.0.8-cp312-cp312-macosx_11_0_arm64.whl",
"has_sig": false,
"md5_digest": "050d096e6a1430b783c23881b2843d7b",
"packagetype": "bdist_wheel",
"python_version": "cp312",
"requires_python": null,
"size": 2562294,
"upload_time": "2024-07-20T20:34:05",
"upload_time_iso_8601": "2024-07-20T20:34:05.484892Z",
"url": "https://files.pythonhosted.org/packages/01/4e/5a3d1f5b14ea36186bcf4e7be4b4b7d8313bae9c10dba5bdaa7836858999/pdc_dp_means-0.0.8-cp312-cp312-macosx_11_0_arm64.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "ac11a9d68aaa0cb6f436b873959151631ec7aa78973d41c2afe2c5d1fb7d42c7",
"md5": "644066510f3611e2541e8f73e1a2df60",
"sha256": "9282b74e086c1cf966888e0f98462abd96e1be4d89265b669a1bfa962a7e04fd"
},
"downloads": -1,
"filename": "pdc_dp_means-0.0.8-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl",
"has_sig": false,
"md5_digest": "644066510f3611e2541e8f73e1a2df60",
"packagetype": "bdist_wheel",
"python_version": "cp312",
"requires_python": null,
"size": 3109233,
"upload_time": "2024-07-20T20:34:08",
"upload_time_iso_8601": "2024-07-20T20:34:08.873347Z",
"url": "https://files.pythonhosted.org/packages/ac/11/a9d68aaa0cb6f436b873959151631ec7aa78973d41c2afe2c5d1fb7d42c7/pdc_dp_means-0.0.8-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "1f6c5f3b87f647cd9934d8fa0b4138ff5d7c896b69c8d84450ab24dce472dd92",
"md5": "bd0d817be7b358c93c0d36ae77b2d085",
"sha256": "c86709f23ac22497b83884727f028930e3c9edcef7c4515f55dec41c4aa3fed6"
},
"downloads": -1,
"filename": "pdc_dp_means-0.0.8-cp312-cp312-win_amd64.whl",
"has_sig": false,
"md5_digest": "bd0d817be7b358c93c0d36ae77b2d085",
"packagetype": "bdist_wheel",
"python_version": "cp312",
"requires_python": null,
"size": 2563415,
"upload_time": "2024-07-20T20:34:10",
"upload_time_iso_8601": "2024-07-20T20:34:10.581445Z",
"url": "https://files.pythonhosted.org/packages/1f/6c/5f3b87f647cd9934d8fa0b4138ff5d7c896b69c8d84450ab24dce472dd92/pdc_dp_means-0.0.8-cp312-cp312-win_amd64.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "fdaf4d789b8a833fa48d2cc80c8c18c86316726b6600bc0ba9b52d2a32082372",
"md5": "5b0adb9670266b938b6d27c6863cc803",
"sha256": "c4c1d95d445b194ed22df0e3704a6c37e1c102f2292796fdebb1acfc6c30405e"
},
"downloads": -1,
"filename": "pdc_dp_means-0.0.8-cp39-cp39-macosx_11_0_arm64.whl",
"has_sig": false,
"md5_digest": "5b0adb9670266b938b6d27c6863cc803",
"packagetype": "bdist_wheel",
"python_version": "cp39",
"requires_python": null,
"size": 2561772,
"upload_time": "2024-07-20T20:34:12",
"upload_time_iso_8601": "2024-07-20T20:34:12.083295Z",
"url": "https://files.pythonhosted.org/packages/fd/af/4d789b8a833fa48d2cc80c8c18c86316726b6600bc0ba9b52d2a32082372/pdc_dp_means-0.0.8-cp39-cp39-macosx_11_0_arm64.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "5b42178bca8b3e850e4361b7515e0376b51eb46bcc99dae69fc8f555d77d3ec7",
"md5": "8070bd5ee62da933482a24f2c223f330",
"sha256": "baa7a6e0f87f665cb9178d77a22458ab246e95486878d00cfe8aaa2ec44b08a0"
},
"downloads": -1,
"filename": "pdc_dp_means-0.0.8-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl",
"has_sig": false,
"md5_digest": "8070bd5ee62da933482a24f2c223f330",
"packagetype": "bdist_wheel",
"python_version": "cp39",
"requires_python": null,
"size": 3081138,
"upload_time": "2024-07-20T20:34:13",
"upload_time_iso_8601": "2024-07-20T20:34:13.339029Z",
"url": "https://files.pythonhosted.org/packages/5b/42/178bca8b3e850e4361b7515e0376b51eb46bcc99dae69fc8f555d77d3ec7/pdc_dp_means-0.0.8-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "8428a58832ef63413a058d711dc3b09df08bb5b313a729b2b44b8ad14b2ec40e",
"md5": "21c5f0be0086959ee274c331680095a3",
"sha256": "255b112f408aa04281ad26b7aa503324960196d5d6e9b90ddd110a5e5a463dc2"
},
"downloads": -1,
"filename": "pdc_dp_means-0.0.8-cp39-cp39-win_amd64.whl",
"has_sig": false,
"md5_digest": "21c5f0be0086959ee274c331680095a3",
"packagetype": "bdist_wheel",
"python_version": "cp39",
"requires_python": null,
"size": 2561843,
"upload_time": "2024-07-20T20:34:14",
"upload_time_iso_8601": "2024-07-20T20:34:14.826110Z",
"url": "https://files.pythonhosted.org/packages/84/28/a58832ef63413a058d711dc3b09df08bb5b313a729b2b44b8ad14b2ec40e/pdc_dp_means-0.0.8-cp39-cp39-win_amd64.whl",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2024-07-20 20:33:55",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "BGU-CS-VIL",
"github_project": "pdc-dp-means",
"travis_ci": false,
"coveralls": false,
"github_actions": true,
"requirements": [],
"lcname": "pdc-dp-means"
}