kDBCV

- Name: kDBCV
- Version: 1.0.0
- Home page: https://github.com/Kaufman-Lab-Columbia/k-DBCV
- Summary: Efficient implementation of DBCV with a k-dimensional tree
- Upload time: 2024-11-04 19:55:00 (UTC)
- Author: Joseph L. Hammer, Alexander J. Devanny (jhammer3018@gmail.com)
- License: MIT
- Keywords: cluster, clusters, clustering
- Requirements: numpy (>=1.20.0, <2), scipy (>=1.7.0, <=1.14.1)

# k-DBCV

k-DBCV is an efficient Python implementation, built on a k-dimensional (k-d) tree, of the density-based clustering validation (DBCV) score proposed by Moulavi et al. (2014).
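For orientation, the sketch below computes two DBCV building blocks defined by Moulavi et al. (2014): the all-points core distance of each object within its cluster, and the mutual reachability distance between two objects. It is a minimal illustration that uses SciPy's `cKDTree` for neighbor queries; the function names are hypothetical, and it is not the k-DBCV implementation or its k-d tree optimization (querying all n neighbors, as done here, costs O(n²)). It assumes a single cluster with at least two distinct points, since duplicate points give a zero distance and a division by zero.

```python
import numpy as np
from scipy.spatial import cKDTree


def all_points_core_distance(points):
    """All-points core distance of each object in one cluster (Moulavi et al., 2014).

    core(o) = (sum_{i=2}^{n} (1 / KNN(o, i))**d / (n - 1)) ** (-1 / d),
    where KNN(o, i) is the distance from o to its i-th nearest neighbor
    (i = 1 is o itself), n is the cluster size and d the dimensionality.
    Hypothetical helper for illustration; not part of the kDBCV package.
    """
    points = np.asarray(points, dtype=float)
    if points.ndim == 1:
        points = points[:, None]  # treat a 1-D array as n points in 1 dimension
    n, d = points.shape
    if n < 2:
        raise ValueError("a cluster needs at least two points")
    tree = cKDTree(points)
    # Query all n neighbors; column 0 is each point itself (distance 0).
    dists, _ = tree.query(points, k=n)
    knn = dists[:, 1:]  # distances to the other n - 1 points
    mean_inv = np.sum((1.0 / knn) ** d, axis=1) / (n - 1)
    return mean_inv ** (-1.0 / d)


def mutual_reachability(core_a, core_b, dist_ab):
    """Mutual reachability distance: max(core(a), core(b), d(a, b))."""
    return max(core_a, core_b, dist_ab)
```

For three points at 0, 1 and 2 on a line, the middle point has core distance 1 and the two end points have core distance 4/3, reflecting that the middle point sits in the densest part of the cluster.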

## Getting Started
### Dependencies
- SciPy
- NumPy
### Installation
k-DBCV can be installed via pip:
```
pip install kDBCV
```

## Usage
The usage examples below rely on the following additional libraries, which k-DBCV itself does not require:
- scikit-learn
- ClustSim

For visualization:
- matplotlib
 
### DBCV Score
#### Simple Scenario
The half-moons dataset simulated with scikit-learn is shown below:
<p align="center">
  <img width="500" height="300" src="https://github.com/user-attachments/assets/22c7c5c3-dcf1-47d4-86fd-53f428e7f87b">
</p>

```python
from kDBCV import DBCV_score  # import path assumed; adjust to your installation
score = DBCV_score(X, labels)
```
Output: 0.5068928345037831

#### Scenario II
A larger dataset of clusters simulated with Clust_Sim-SMLM is shown below:

<p align="center">
  <img width="300" height="300" src="https://github.com/user-attachments/assets/acd7adee-9416-4a61-bfa0-caebf540097b">
</p>
 
```python
score = DBCV_score(X, labels)
```
Output: 0.6171526846848352

### Extracting Individual Cluster Scores
k-DBCV can also return a score for each individual cluster. Unlike the overall DBCV score, these individual scores do not take noise points into account:

`Individual cluster score = (separation - sparseness) / max(separation, sparseness)`
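To make the formula concrete, the sketch below follows the definitions of Moulavi et al. (2014): a cluster's separation is the smallest density separation between it and any other cluster, and its sparseness is the largest weight among the internal edges of the cluster's minimum spanning tree built on mutual reachability distances. The overall DBCV score is a size-weighted sum of the cluster scores in which every weight is divided by the total number of objects, noise included; this is why noise lowers the overall score but leaves the individual scores unchanged. The function names are hypothetical and the separation and sparseness values are taken as precomputed inputs; this is not k-DBCV's code.

```python
def cluster_validity(separation, sparseness):
    """Validity of one cluster: (separation - sparseness) / max(separation, sparseness).

    For positive inputs the result lies in [-1, 1]: values near 1 mean a dense,
    well-separated cluster; negative values mean a sparse, poorly separated one.
    """
    return (separation - sparseness) / max(separation, sparseness)


def dbcv_from_clusters(cluster_scores, cluster_sizes, n_total):
    """Overall DBCV score: sum over clusters of (|C_i| / |O|) * V_C(C_i).

    n_total counts every object, noise included, so noise points pull the
    overall score toward 0 without changing any individual cluster score.
    """
    return sum(size / n_total * score
               for score, size in zip(cluster_scores, cluster_sizes))
```

For example, two clusters of 50 points, each with separation 2.0 and sparseness 1.0, both score 0.5. With no noise the overall score is 0.5; adding 100 noise points (`n_total=200`) halves it to 0.25 while each individual score stays at 0.5.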

`ind_clust_scores` defaults to `False`; set it to `True` to return the array of individual cluster scores along with the overall score:
```python
score, ind_clust_score_array = DBCV_score(X, labels, ind_clust_scores=True)
```
Individual cluster scores are displayed by color below:
<p align="center">
  <img width="350" height="300" src="https://github.com/user-attachments/assets/56cd291a-9991-45d9-8dd7-cd132ec823fb">
</p>
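One way to color points by their cluster's score, as in the figure above, is to broadcast the per-cluster array back onto the points. This sketch assumes, without confirmation from the k-DBCV documentation, that `ind_clust_score_array` is ordered like the sorted unique non-noise labels and that noise is labeled -1; `scores_per_point` is a hypothetical helper, so check the ordering against your installed version before relying on it.

```python
import numpy as np


def scores_per_point(labels, cluster_scores, noise_label=-1):
    """Return an array giving each point the score of its cluster (NaN for noise).

    Assumption: cluster_scores[k] belongs to the k-th smallest non-noise label.
    """
    labels = np.asarray(labels)
    cluster_ids = np.unique(labels[labels != noise_label])  # sorted ascending
    if len(cluster_ids) != len(cluster_scores):
        raise ValueError("expected one score per non-noise cluster")
    point_scores = np.full(labels.shape, np.nan)
    for cluster_id, score in zip(cluster_ids, cluster_scores):
        point_scores[labels == cluster_id] = score
    return point_scores
```

The result can be passed as the `c` argument of matplotlib's `scatter`, for example `plt.scatter(X[:, 0], X[:, 1], c=point_scores)`; points with NaN scores (noise) are not drawn unless `plotnonfinite=True` is set.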

### Memory cutoff
The `memory_cutoff` parameter (in GB) prevents k-DBCV from attempting to score clusters whose computation would exceed the available memory. Set it according to the machine being used; the default is 25.0 GB. If scoring would exceed the cutoff, the function returns a score of -1 together with an error message. To suppress these error messages, set `batch_mode = True` (the default is `False`).
```python
score = DBCV_score(X, labels, memory_cutoff=25.0)
```
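When many candidate clusterings are scored, for example while tuning a clustering parameter, the -1 returned on a memory-cutoff overrun needs handling. Because -1 is also the lowest possible DBCV score, a sentinel check cannot tell the two cases apart; the sketch below simply treats -1 as "not scored". `best_scoring_labels` and its `score_fn` argument are hypothetical names, not part of k-DBCV.

```python
def best_scoring_labels(candidates, score_fn, failure_score=-1):
    """Pick the highest-scoring labeling among candidates.

    candidates: iterable of (name, labels) pairs.
    score_fn: callable(labels) -> float, e.g. a wrapper around DBCV scoring.
    Scores equal to failure_score are treated as "not scored" (for example,
    a memory-cutoff overrun) and skipped.
    Returns (best_name, best_score, skipped_names); best_name is None if
    nothing could be scored.
    """
    best_name, best_score = None, None
    skipped = []
    for name, labels in candidates:
        score = score_fn(labels)
        if score == failure_score:
            skipped.append(name)
            continue
        if best_score is None or score > best_score:
            best_name, best_score = name, score
    return best_name, best_score, skipped
```

As an assumption about how you call the package, `score_fn` could be `lambda labels: DBCV_score(X, labels, memory_cutoff=25.0, batch_mode=True)`, which keeps the loop free of repeated error messages.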

## Relevant Citations
#### Density-Based Clustering Validation

Moulavi, D., Jaskowiak, P. A., Campello, R. J. G. B., Zimek, A. & Sander, J. Density-based clustering validation. SIAM Int. Conf. Data Min. 2014, SDM 2014 2, 839–847 (2014)

#### k-DBCV implementation

Hammer, J. L., Devanny, A. J. & Kaufman, L. J. Density-based optimization for unbiased, reproducible clustering applied to single molecule localization microscopy. Preprint at https://www.biorxiv.org/content/10.1101/2024.11.01.621498v1 (2024)

## License
k-DBCV is released under the MIT License. See the LICENSE file for more information.

## Referencing
In addition to citing Moulavi et al., please cite the following preprint if you use this repository:

Hammer, J. L., Devanny, A. J. & Kaufman, L. J. Density-based optimization for unbiased, reproducible clustering applied to single molecule localization microscopy. Preprint at https://www.biorxiv.org/content/10.1101/2024.11.01.621498v1 (2024)

## Contact 
kaufmangroup.rubylab@gmail.com

            
