# k-DBCV
k-DBCV is an efficient Python implementation of the density-based clustering validation (DBCV) score proposed by Moulavi et al. (2014).
## Getting Started
### Dependencies
- SciPy
- NumPy
### Installation
k-DBCV can be installed via pip:
```
pip install kDBCV
```
## Usage
The examples below also use the following libraries to simulate clustering scenarios:
- scikit-learn
- ClustSim
For visualization:
- matplotlib
### DBCV Score
#### Simple Scenario
The half-moons dataset simulated with scikit-learn is shown below:
<p align="center">
<img width="500" height="300" src="https://github.com/user-attachments/assets/22c7c5c3-dcf1-47d4-86fd-53f428e7f87b">
</p>
```
score = DBCV_score(X, labels)
```
Output: 0.5068928345037831
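The snippet above assumes that `X` and `labels` already exist. A minimal sketch of how they might be produced with scikit-learn follows. The sample size, noise level, and random seed are illustrative assumptions, not the settings behind the figure, so the resulting score will differ from the output shown. The import path `from kDBCV import DBCV_score` is also an assumption and is guarded so that the data generation still runs if it does not match the installed package.

```python
import numpy as np
from sklearn.datasets import make_moons

# Simulate two interleaving half moons. The true moon membership (0 or 1)
# is used as the cluster labels; these parameters are illustrative only.
X, labels = make_moons(n_samples=500, noise=0.05, random_state=0)

print(X.shape)            # (n_points, n_dimensions)
print(np.unique(labels))  # distinct cluster labels

try:
    # Assumed import path; adjust it to match the installed package layout.
    from kDBCV import DBCV_score
except ImportError:
    DBCV_score = None

if DBCV_score is not None:
    score = DBCV_score(X, labels)
    print(score)
```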
#### Scenario II
A larger dataset of clusters simulated with Clust_Sim-SMLM is shown below:
<p align="center">
<img width="300" height="300" src="https://github.com/user-attachments/assets/acd7adee-9416-4a61-bfa0-caebf540097b">
</p>
```
score = DBCV_score(X, labels)
```
Output: 0.6171526846848352
### Extracting Individual Cluster Scores
k-DBCV can also return a score for each individual cluster. Each cluster is scored without consideration for noise:
Individual Cluster Score = (separation - sparseness) / max(separation, sparseness)
To return these scores, set ind_clust_scores = True (default is False):
```
score, ind_clust_score_array = DBCV_score(X, labels, ind_clust_scores = True)
```
Individual cluster scores are displayed by color below:
<p align="center">
<img width="350" height="300" src="https://github.com/user-attachments/assets/56cd291a-9991-45d9-8dd7-cd132ec823fb">
</p>
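To make the individual cluster score formula concrete, here is a small worked sketch in plain Python. The function names and the input numbers are hypothetical and are not part of k-DBCV. The size-weighted average at the end follows the overall DBCV definition in Moulavi et al., where noise points count toward the total number of points; that k-DBCV combines the scores in exactly this way is an assumption.

```python
from typing import Sequence


def individual_cluster_score(separation: float, sparseness: float) -> float:
    """Return (separation - sparseness) / max(separation, sparseness)."""
    denominator = max(separation, sparseness)
    if denominator == 0:
        # Choice made for this sketch only: a degenerate cluster scores 0.
        return 0.0
    return (separation - sparseness) / denominator


def weighted_overall_score(
    scores: Sequence[float], sizes: Sequence[int], n_noise: int = 0
) -> float:
    """Average the cluster scores, weighting each by its share of all points."""
    n_total = sum(sizes) + n_noise  # noise points enlarge the denominator
    return sum(size / n_total * score for size, score in zip(sizes, scores))


# A well-separated, dense cluster and a poorly separated, sparse one.
scores = [
    individual_cluster_score(separation=2.0, sparseness=0.5),  # (2.0 - 0.5) / 2.0
    individual_cluster_score(separation=0.4, sparseness=1.0),  # (0.4 - 1.0) / 1.0
]
overall = weighted_overall_score(scores, sizes=[60, 30], n_noise=10)
print(scores)   # approximately [0.75, -0.6]
print(overall)  # approximately 0.27
```

Each individual score lies between -1 and 1. It is positive when a cluster is better separated from its neighbors than it is sparse internally.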
### Memory cutoff
A memory cutoff prevents attempts to score clusterings that would exceed available memory. Set the cutoff according to the machine being used; the default maximum is 25.0 GB. If scoring would exceed the cutoff, the function returns -1 and prints an error message. To suppress these error messages, set batch_mode = True (default is False).
```
score = DBCV_score(X, labels, memory_cutoff = 25.0)
```
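When many clusterings are scored in a loop, the -1 return value needs handling. The sketch below shows one possible pattern. The scoring function is passed in as an argument so that the example runs without k-DBCV installed; `score_scenarios` and `stand_in_scorer` are hypothetical names, and only the `memory_cutoff` and `batch_mode` keywords come from the description above. Because -1 is also the lower bound of the DBCV range, the sketch records such results as `None` instead of treating them as scores.

```python
from typing import Any, Callable, Iterable, List, Optional, Tuple


def score_scenarios(
    scenarios: Iterable[Tuple[Any, Any]],
    scorer: Callable[..., float],
    memory_cutoff: float = 25.0,
) -> List[Optional[float]]:
    """Score each (X, labels) pair, mapping a returned -1 to None."""
    results: List[Optional[float]] = []
    for X, labels in scenarios:
        # batch_mode=True is described above as suppressing error messages.
        score = scorer(X, labels, memory_cutoff=memory_cutoff, batch_mode=True)
        results.append(None if score == -1 else score)
    return results


def stand_in_scorer(X, labels, memory_cutoff=25.0, batch_mode=False):
    """Hypothetical stand-in for the real scorer: 'too large' means over 3 points."""
    return -1 if len(X) > 3 else 0.5


demo = [
    ([[0, 0], [1, 1]], [0, 1]),  # small enough to score
    ([[0, 0]] * 5, [0] * 5),     # treated as exceeding the cutoff
]
results = score_scenarios(demo, stand_in_scorer, memory_cutoff=8.0)
print(results)  # [0.5, None]
```

In real use, the stand-in would be replaced by the package's own scoring function.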
## Relevant Citations
#### Density Based Cluster Validation
Moulavi, D., Jaskowiak, P. A., Campello, R. J. G. B., Zimek, A. & Sander, J. Density-based clustering validation. SIAM Int. Conf. Data Min. 2014, SDM 2014 2, 839–847 (2014)
#### k-DBCV implementation
Hammer, J. L., Devanny, A. J. & Kaufman, L. J. Density-based optimization for unbiased, reproducible clustering applied to single molecule localization microscopy. Preprint at https://www.biorxiv.org/content/10.1101/2024.11.01.621498v1 (2024)
## License
k-DBCV is licensed under the MIT license. See the LICENSE file for more information.
## Referencing
#### In addition to citing Moulavi et al., if you use this repository, please cite with the following (currently in preprint):
Hammer, J. L., Devanny, A. J. & Kaufman, L. J. Density-based optimization for unbiased, reproducible clustering applied to single molecule localization microscopy. Preprint at https://www.biorxiv.org/content/10.1101/2024.11.01.621498v1 (2024)
## Contact
kaufmangroup.rubylab@gmail.com
Raw data
{
"_id": null,
"home_page": "https://github.com/Kaufman-Lab-Columbia/k-DBCV",
"name": "kDBCV",
"maintainer": null,
"docs_url": null,
"requires_python": null,
"maintainer_email": null,
"keywords": "cluster clusters clustering",
"author": "Joseph L. Hammer, Alexander J. Devanny",
"author_email": "jhammer3018@gmail.com",
"download_url": "https://files.pythonhosted.org/packages/e7/f7/fb4cd6b293b6f3cb61a2c800bfb89a8dc387dc79545e48a1de0ff6cf5313/kdbcv-1.0.0.tar.gz",
"platform": null,
"description": "# k-DBCV\r\n\r\nk-DBCV is an efficient python implementation of the density based cluster validation (DBCV) score proposed by Moulavi et al. (2014). \r\n\r\n## Getting Started\r\n### Dependencies\r\n- SciPy\r\n- NumPy\r\n### Installation\r\nk-DBCV can be installed via pip:\r\n```\r\npip install kDBCV\r\n```\r\n\r\n## Usage\r\nTo score clustering scenarios, the following libraries are used:\r\n- scikit-learn\r\n- ClustSim\r\n\r\nFor visualization:\r\n- matplotlib\r\n \r\n### DBCV Score\r\n#### Simple Scenario\r\nThe half moons dataset simulated from scikit-learn is shown:\r\n<p align=\"center\">\r\n <img width=\"500\" height=\"300\" src=https://github.com/user-attachments/assets/22c7c5c3-dcf1-47d4-86fd-53f428e7f87b\r\n</p>\r\n\r\n```\r\nDBCV_Score(X,labels)\r\n```\r\nOutput: 0.5068928345037831\r\n\r\n#### Scenario II\r\nA larger dataset of clusters simulated with Clust_Sim-SMLM is shown:\r\n\r\n<p align=\"center\">\r\n <img width=\"300\" height=\"300\" src=https://github.com/user-attachments/assets/acd7adee-9416-4a61-bfa0-caebf540097b\r\n</p>\r\n \r\n```\r\nscore = DBCV_score(X,labels)\r\n```\r\nOutput: 0.6171526846848352\r\n\r\n### Extracting Individual Cluster Scores\r\nk-DBCV enables individual cluster score extraction where each cluster is assigned a score without consideration for noise:\r\nIndividual Cluster Score = separation-sparseness/max(separation,sparseness)\r\n\r\nBy default, ind_clust_scores is set to False\r\n```\r\nscore, ind_clust_score_array = DBCV_Score(X,labels, ind_clust_scores = True)\r\n```\r\nIndividual cluster scores are displayed by color below:\r\n<p align=\"center\">\r\n <img width=\"350\" height=\"300\" src=https://github.com/user-attachments/assets/56cd291a-9991-45d9-8dd7-cd132ec823fb\r\n</p>\r\n\r\n### Memory cutoff\r\nA memory cutoff is necessary to prevent attempts to score clusters that would exceed available memory. This cutoff should be set dependent on the machine being used. 
The default is set to a maximum of 25.0 GB. The score will output a -1 if the cutoff would be exceeded, along with an error message. To remove these error messages set batch_mode = True (Default is False).\r\n```\r\nscore = DBCV_score(X,labels, memory_cutoff = 25.0)\r\n```\r\n\r\n## Relevant Citations\r\n#### Density Based Cluster Validation\r\n\r\nMoulavi, D., Jaskowiak, P. A., Campello, R. J. G. B., Zimek, A. & Sander, J. Density-based clustering validation. SIAM Int. Conf. Data Min. 2014, SDM 2014 2, 839\u00e2\u20ac\u201c847 (2014)\r\n\r\n#### k-DBCV implementation\r\n\r\nHammer, J. L., Devanny, A. J. & Kaufman, L. J. Density-based optimization for unbiased, reproducible clustering applied to single molecule localization microscopy. Preprint at https://www.biorxiv.org/content/10.1101/2024.11.01.621498v1 (2024)\r\n\r\n## License\r\nk-DBCV is licensed with an MIT license. See LICENSE file for more information.\r\n\r\n## Referencing\r\n#### In addition to citing Moulavi et al., if you use this repository, please cite with the following (currently in preprint):\r\n\r\nHammer, J. L., Devanny, A. J. & Kaufman, L. J. Density-based optimization for unbiased, reproducible clustering applied to single molecule localization microscopy. Preprint at https://www.biorxiv.org/content/10.1101/2024.11.01.621498v1 (2024)\r\n\r\n## Contact \r\nkaufmangroup.rubylab@gmail.com\r\n",
"bugtrack_url": null,
"license": "MIT",
"summary": "Efficient implementation of DBCV with a k-dimensional tree",
"version": "1.0.0",
"project_urls": {
"Homepage": "https://github.com/Kaufman-Lab-Columbia/k-DBCV"
},
"split_keywords": [
"cluster",
"clusters",
"clustering"
],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "47ccf312b478ca3839a1847213e33818ffac57ebd8f992dbd7f840f57e66e8d9",
"md5": "298a41210fbee5ef98d976c5d710c8a9",
"sha256": "3059216e871d9578e93ed19b8a97fd3cca263b9ae600b656030b27583122d2c0"
},
"downloads": -1,
"filename": "kDBCV-1.0.0-py3-none-any.whl",
"has_sig": false,
"md5_digest": "298a41210fbee5ef98d976c5d710c8a9",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": null,
"size": 9572,
"upload_time": "2024-11-04T19:54:59",
"upload_time_iso_8601": "2024-11-04T19:54:59.470947Z",
"url": "https://files.pythonhosted.org/packages/47/cc/f312b478ca3839a1847213e33818ffac57ebd8f992dbd7f840f57e66e8d9/kDBCV-1.0.0-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "e7f7fb4cd6b293b6f3cb61a2c800bfb89a8dc387dc79545e48a1de0ff6cf5313",
"md5": "4d0cde9ea2bb2c5919f4a0820ddc2107",
"sha256": "3b0d66e103f935eb11008e90bb6d52b5d7e7a0bbed1f69a3e44f050d5144a3e2"
},
"downloads": -1,
"filename": "kdbcv-1.0.0.tar.gz",
"has_sig": false,
"md5_digest": "4d0cde9ea2bb2c5919f4a0820ddc2107",
"packagetype": "sdist",
"python_version": "source",
"requires_python": null,
"size": 10975,
"upload_time": "2024-11-04T19:55:00",
"upload_time_iso_8601": "2024-11-04T19:55:00.575746Z",
"url": "https://files.pythonhosted.org/packages/e7/f7/fb4cd6b293b6f3cb61a2c800bfb89a8dc387dc79545e48a1de0ff6cf5313/kdbcv-1.0.0.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2024-11-04 19:55:00",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "Kaufman-Lab-Columbia",
"github_project": "k-DBCV",
"travis_ci": false,
"coveralls": false,
"github_actions": false,
"requirements": [
{
"name": "numpy",
"specs": [
[
"<",
"2"
],
[
">=",
"1.20.0"
]
]
},
{
"name": "scipy",
"specs": [
[
"<=",
"1.14.1"
],
[
">=",
"1.7.0"
]
]
}
],
"lcname": "kdbcv"
}