# clusteval
<p align="center">
<a href="https://erdogant.github.io/clusteval">
<img src="https://github.com/erdogant/clusteval/blob/master/docs/figs/logo_large_2.png" width="300" />
</a>
</p>
``clusteval`` is a Python package for evaluating detected clusters and returning the cluster labels with the optimal **clustering tendency**, **number of clusters**, and **clustering quality**. Three evaluation strategies are implemented: **silhouette**, **dbindex**, and **derivative**. Four clustering methods can be used: **agglomerative**, **kmeans**, **dbscan**, and **hdbscan**.
#
**⭐️ Star this repo if you like it ⭐️**
#
### Blogs
#### [1. A step-by-step guide for clustering images](https://towardsdatascience.com/a-step-by-step-guide-for-clustering-images-4b45f9906128)
#### [2. Detection of Duplicate Images Using Image Hash Functions](https://towardsdatascience.com/detection-of-duplicate-images-using-image-hash-functions-4d9c53f04a75)
#### [3. From Data to Clusters: When is Your Clustering Good Enough?](https://towardsdatascience.com/from-data-to-clusters-when-is-your-clustering-good-enough-5895440a978a)
#### [4. From Clusters To Insights; The Next Step](https://towardsdatascience.com/from-clusters-to-insights-the-next-step-1c166814e0c6)
#
### [Documentation pages](https://erdogant.github.io/clusteval/)
On the [documentation pages](https://erdogant.github.io/clusteval/) you can find detailed information about how ``clusteval`` works, with many examples.
#
### Installation
##### It is advisable to create a new environment (e.g. with Conda).
```bash
conda create -n env_clusteval python=3.8
conda activate env_clusteval
```
##### Install from PyPI
```bash
pip install clusteval
```
##### Import library
```python
from clusteval import clusteval
```
<hr>
### Examples
A structured overview of all examples is now available on the [documentation pages](https://erdogant.github.io/clusteval/).
<hr>
* [Example: Cluster validation using Silhouette score](https://erdogant.github.io/clusteval/pages/html/Examples.html#cluster-evaluation)
<p align="left">
<a href="https://erdogant.github.io/clusteval/pages/html/Examples.html#cluster-evaluation">
<img src="https://github.com/erdogant/clusteval/blob/master/docs/figs/fig1b_sil.png" width="600" />
</a>
</p>
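
For intuition about what the silhouette score in the example above measures, here is a minimal standard-library sketch. It is not the implementation used by ``clusteval``; the function name `silhouette_score_sketch` and the toy data are hypothetical. For each sample it compares the mean distance to its own cluster (`a`) with the mean distance to the nearest other cluster (`b`) and averages `(b - a) / max(a, b)` over all samples.

```python
import math


def silhouette_score_sketch(points, labels):
    """Mean silhouette coefficient over all samples (Euclidean distance)."""
    # Group sample indices by cluster label.
    clusters = {}
    for idx, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(idx)
    if len(clusters) < 2:
        raise ValueError("the silhouette score needs at least two clusters")

    def mean_dist(i, members):
        # Mean distance from sample i to the given members, excluding i itself.
        others = [j for j in members if j != i]
        return sum(math.dist(points[i], points[j]) for j in others) / len(others)

    scores = []
    for i, lab in enumerate(labels):
        own = clusters[lab]
        if len(own) == 1:
            scores.append(0.0)  # common convention for singleton clusters
            continue
        a = mean_dist(i, own)  # cohesion: mean distance within the own cluster
        b = min(mean_dist(i, members)  # separation: nearest other cluster
                for other, members in clusters.items() if other != lab)
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)


# Toy data: two tight pairs that are far apart.
points = [(0, 0), (0, 1), (10, 10), (10, 11)]
good = silhouette_score_sketch(points, [0, 0, 1, 1])  # pairs kept together
bad = silhouette_score_sketch(points, [0, 1, 0, 1])   # pairs split apart
print(good, bad)
```

Scores near 1 indicate compact, well-separated clusters; scores below 0 suggest that samples sit closer to another cluster than to their own.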
#
* [Example: Determine the optimal number of clusters](https://erdogant.github.io/clusteval/pages/html/Plots.html#plot)
<p align="left">
<a href="https://erdogant.github.io/clusteval/pages/html/Plots.html#plot">
<img src="https://github.com/erdogant/clusteval/blob/master/docs/figs/fig1a_sil.png" width="600" />
</a>
</p>
#
* [Example: Plot the dendrogram](https://erdogant.github.io/clusteval/pages/html/Plots.html#dendrogram)
<p align="left">
<a href="https://erdogant.github.io/clusteval/pages/html/Plots.html#dendrogram">
<img src="https://github.com/erdogant/clusteval/blob/master/docs/figs/dendrogram.png" width="600" />
</a>
</p>
#
* [Example: Cluster validation using Davies-Bouldin index](https://erdogant.github.io/clusteval/pages/html/Examples.html#dbindex-method)
<p align="left">
<a href="https://erdogant.github.io/clusteval/pages/html/Examples.html#dbindex-method">
<img src="https://github.com/erdogant/clusteval/blob/master/docs/figs/dendrogram.png" width="600" />
</a>
</p>
#
* [Example: Cluster validation using Davies-Bouldin index](https://erdogant.github.io/clusteval/pages/html/Examples.html#dbindex-method)
<p align="left">
<a href="https://erdogant.github.io/clusteval/pages/html/Examples.html#dbindex-method">
<img src="https://github.com/erdogant/clusteval/blob/master/docs/figs/fig2_dbindex.png" width="600" />
</a>
</p>
#
* [Example: Cluster validation using derivative evaluation method](https://erdogant.github.io/clusteval/pages/html/Examples.html#derivative-method)
<p align="left">
<a href="https://erdogant.github.io/clusteval/pages/html/Examples.html#derivative-method">
<img src="https://github.com/erdogant/clusteval/blob/master/docs/figs/fig3_der.png" width="600" />
</a>
</p>
#
* [Example: Cluster validation using dbscan](https://erdogant.github.io/clusteval/pages/html/Examples.html#dbscan)
<p align="left">
<a href="https://erdogant.github.io/clusteval/pages/html/Examples.html#dbscan">
<img src="https://github.com/erdogant/clusteval/blob/master/docs/figs/fig5_dbscan.png" width="600" />
</a>
</p>
#
* [Example: Cluster validation using hdbscan](https://erdogant.github.io/clusteval/pages/html/Examples.html#hdbscan)
<p align="left">
<a href="https://erdogant.github.io/clusteval/pages/html/Examples.html#hdbscan">
<img src="https://github.com/erdogant/clusteval/blob/master/docs/figs/fig4a_hdbscan.png" width="600" />
<img src="https://github.com/erdogant/clusteval/blob/master/docs/figs/fig4b_hdbscan.png" width="600" />
</a>
</p>
## Citation
Please cite ``clusteval`` in your publications if it is useful for your research (see the top right of the GitHub repository page for citation details).
## Other interesting techniques/blogs
* Use ARI (Adjusted Rand Index) when the ground-truth clustering has large, equal-sized clusters
* Use AMI (Adjusted Mutual Information) when the ground-truth clustering is unbalanced and contains small clusters
* https://scikit-learn.org/stable/modules/generated/sklearn.metrics.adjusted_rand_score.html
* https://scikit-learn.org/stable/auto_examples/cluster/plot_adjusted_for_chance_measures.html#sphx-glr-auto-examples-cluster-plot-adjusted-for-chance-measures-py
* https://github.com/idealo/imagededup
* https://towardsdatascience.com/how-to-cluster-images-based-on-visual-similarity-cd6e7209fe34
* https://github.com/facebookresearch/deepcluster
* https://towardsdatascience.com/pca-on-hyperspectral-data-99c9c5178385
* https://machinelearningmastery.com/face-recognition-using-principal-component-analysis/
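
To make the ARI mentioned at the top of this list concrete, here is a minimal standard-library sketch of the pair-counting formula. The function name `adjusted_rand_index` is hypothetical and this is not part of ``clusteval``; in practice, scikit-learn's `adjusted_rand_score` (linked above) and `adjusted_mutual_info_score` are the usual choices.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_true, labels_pred):
    """Adjusted Rand Index computed from contingency counts."""
    if len(labels_true) != len(labels_pred):
        raise ValueError("both labelings must have the same length")
    n = len(labels_true)
    if n < 2:
        return 1.0  # no sample pairs to compare

    # Same-cluster pair counts per contingency cell, per true cluster,
    # and per predicted cluster.
    sum_cells = sum(comb(c, 2) for c in Counter(zip(labels_true, labels_pred)).values())
    sum_true = sum(comb(c, 2) for c in Counter(labels_true).values())
    sum_pred = sum(comb(c, 2) for c in Counter(labels_pred).values())

    expected = sum_true * sum_pred / comb(n, 2)  # agreement expected by chance
    max_index = (sum_true + sum_pred) / 2
    if max_index == expected:
        return 1.0  # degenerate case: both partitions are trivial
    return (sum_cells - expected) / (max_index - expected)


# Identical partitions up to relabeling give a score of 1.0.
same = adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0])
# A labeling that splits every true cluster scores below zero (close to -0.5 here).
split = adjusted_rand_index([0, 0, 1, 1], [0, 1, 0, 1])
print(same, split)
```

Because the score is corrected for chance, random labelings land near 0 regardless of the number of clusters, which is what makes it usable for comparing clusterings against a ground truth.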
### Maintainer
* Erdogan Taskesen, github: [erdogant](https://github.com/erdogant)
* Contributions are welcome.
* If you wish to buy me a <a href="https://erdogant.github.io/donate/?currency=USD&amount=5">Coffee</a> for this work, it is much appreciated :)
Star it if you like it!
Raw data
{
"_id": null,
"home_page": "https://erdogant.github.io/clusteval",
"name": "clusteval",
"maintainer": null,
"docs_url": null,
"requires_python": ">=3",
"maintainer_email": null,
"keywords": null,
"author": "Erdogan Taskesen",
"author_email": "erdogant@gmail.com",
"download_url": "https://files.pythonhosted.org/packages/57/1e/84ccb22be5de93ed968390b496e6512a49945c2c0183b571345cf2c4daeb/clusteval-2.2.2.tar.gz",
"platform": null,
"description": "# clusteval\r\n<p align=\"center\">\r\n <a href=\"https://erdogant.github.io/clusteval\">\r\n <img src=\"https://github.com/erdogant/clusteval/blob/master/docs/figs/logo_large_2.png\" width=\"300\" />\r\n </a>\r\n</p>\r\n\r\n[](https://img.shields.io/pypi/pyversions/clusteval)\r\n[](https://pypi.org/project/clusteval/)\r\n[](https://github.com/erdogant/clusteval/blob/master/LICENSE)\r\n[](https://www.buymeacoffee.com/erdogant)\r\n[](https://github.com/erdogant/clusteval/network)\r\n[](https://github.com/erdogant/clusteval/issues)\r\n[](http://www.repostatus.org/#active)\r\n[](https://pepy.tech/project/clusteval)\r\n[](https://pepy.tech/project/clusteval)\r\n[](https://zenodo.org/badge/latestdoi/232915924)\r\n[](https://erdogant.github.io/clusteval/)\r\n[](https://erdogant.github.io/clusteval/pages/html/Documentation.html#colab-notebook)\r\n<!---[](https://erdogant.github.io/donate/?currency=USD&amount=5)-->\r\n\r\n``clusteval`` is a python package that is developed to evaluate detected clusters and return the cluster labels that have most optimal **clustering tendency**, **Number of clusters** and **clustering quality**. Multiple evaluation strategies are implemented for the evaluation; **silhouette**, **dbindex**, and **derivative**, and four clustering methods can be used: **agglomerative**, **kmeans**, **dbscan** and **hdbscan**.\r\n\r\n\r\n# \r\n**\u2b50\ufe0f Star this repo if you like it \u2b50\ufe0f**\r\n# \r\n\r\n### Blogs\r\n#### [1. A step-by-step guide for clustering images](https://towardsdatascience.com/a-step-by-step-guide-for-clustering-images-4b45f9906128)\r\n\r\n#### [2. Detection of Duplicate Images Using Image Hash Functions](https://towardsdatascience.com/detection-of-duplicate-images-using-image-hash-functions-4d9c53f04a75)\r\n\r\n#### [3. From Data to Clusters: When is Your Clustering Good Enough?](https://towardsdatascience.com/from-data-to-clusters-when-is-your-clustering-good-enough-5895440a978a)\r\n\r\n#### [4. 
From Clusters To Insights; The Next Step](https://towardsdatascience.com/from-clusters-to-insights-the-next-step-1c166814e0c6)\r\n\r\n\r\n# \r\n\r\n### [Documentation pages](https://erdogant.github.io/clusteval/)\r\n\r\nOn the [documentation pages](https://erdogant.github.io/clusteval/) you can find detailed information about the working of the ``clusteval`` with many examples. \r\n\r\n# \r\n\r\n### Installation\r\n\r\n##### It is advisable to create a new environment (e.g. with Conda). \r\n```bash\r\nconda create -n env_clusteval python=3.8\r\nconda activate clusteval\r\n```\r\n\r\n##### Install from PyPI\r\n```bash\r\npip install clusteval\r\n```\r\n\r\n##### Import library\r\n```python\r\nfrom clusteval import clusteval\r\n```\r\n\r\n<hr>\r\n\r\n### Examples\r\nA structured overview of all examples are now available on the [documentation pages](https://erdogant.github.io/clusteval/).\r\n\r\n<hr>\r\n\r\n\r\n* [Example: Cluster validation using Silhouette score](https://erdogant.github.io/clusteval/pages/html/Examples.html#cluster-evaluation)\r\n\r\n<p align=\"left\">\r\n <a href=\"https://erdogant.github.io/clusteval/pages/html/Examples.html#cluster-evaluation\">\r\n <img src=\"https://github.com/erdogant/clusteval/blob/master/docs/figs/fig1b_sil.png\" width=\"600\" />\r\n </a>\r\n</p>\r\n\r\n\r\n#\r\n\r\n* [Example: Determine the optimal number of clusters](https://erdogant.github.io/clusteval/pages/html/Plots.html#plot)\r\n\r\n<p align=\"left\">\r\n <a href=\"https://erdogant.github.io/clusteval/pages/html/Plots.html#plot\">\r\n <img src=\"https://github.com/erdogant/clusteval/blob/master/docs/figs/fig1a_sil.png\" width=\"600\" />\r\n </a>\r\n</p>\r\n\r\n#\r\n\r\n* [Example: Plot the dendrogram](https://erdogant.github.io/clusteval/pages/html/Plots.html#dendrogram)\r\n\r\n<p align=\"left\">\r\n <a href=\"https://erdogant.github.io/clusteval/pages/html/Plots.html#dendrogram\">\r\n <img 
src=\"https://github.com/erdogant/clusteval/blob/master/docs/figs/dendrogram.png\" width=\"600\" />\r\n </a>\r\n</p>\r\n\r\n#\r\n\r\n* [Example: Cluster validation using davies-boulin index](https://erdogant.github.io/clusteval/pages/html/Examples.html#dbindex-method)\r\n\r\n<p align=\"left\">\r\n <a href=\"https://erdogant.github.io/clusteval/pages/html/Examples.html#dbindex-method\">\r\n <img src=\"https://github.com/erdogant/clusteval/blob/master/docs/figs/dendrogram.png\" width=\"600\" />\r\n </a>\r\n</p>\r\n\r\n#\r\n\r\n* [Example: Cluster validation using davies-boulin index](https://erdogant.github.io/clusteval/pages/html/Examples.html#dbindex-method)\r\n\r\n<p align=\"left\">\r\n <a href=\"https://erdogant.github.io/clusteval/pages/html/Examples.html#dbindex-method\">\r\n <img src=\"https://github.com/erdogant/clusteval/blob/master/docs/figs/fig2_dbindex.png\" width=\"600\" />\r\n </a>\r\n</p>\r\n\r\n#\r\n\r\n* [Example: Cluster validation using derivative evaluation method](https://erdogant.github.io/clusteval/pages/html/Examples.html#derivative-method)\r\n\r\n<p align=\"left\">\r\n <a href=\"https://erdogant.github.io/clusteval/pages/html/Examples.html#derivative-method\">\r\n <img src=\"https://github.com/erdogant/clusteval/blob/master/docs/figs/fig3_der.png\" width=\"600\" />\r\n </a>\r\n</p>\r\n\r\n#\r\n\r\n\r\n* [Example: Cluster validation using dbscan](https://erdogant.github.io/clusteval/pages/html/Examples.html#dbscan)\r\n\r\n<p align=\"left\">\r\n <a href=\"https://erdogant.github.io/clusteval/pages/html/Examples.html#dbscan\">\r\n <img src=\"https://github.com/erdogant/clusteval/blob/master/docs/figs/fig5_dbscan.png\" width=\"600\" />\r\n </a>\r\n</p>\r\n\r\n#\r\n\r\n* [Example: Cluster validation using hdbscan](https://erdogant.github.io/clusteval/pages/html/Examples.html#hdbscan)\r\n\r\n<p align=\"left\">\r\n <a href=\"https://erdogant.github.io/clusteval/pages/html/Examples.html#hdbscan\">\r\n <img 
src=\"https://github.com/erdogant/clusteval/blob/master/docs/figs/fig4a_hdbscan.png\" width=\"600\" />\r\n <img src=\"https://github.com/erdogant/clusteval/blob/master/docs/figs/fig4b_hdbscan.png\" width=\"600\" />\r\n </a>\r\n</p>\r\n\r\n\r\n\r\n\r\n\r\n\r\n## Citation\r\nPlease cite clusteval in your publications if this is useful for your research (see right top for citation).\r\n\r\n## Other interesting techniques/blogs\r\n* Use ARI when the ground truth clustering has large equal sized clusters\r\n* Usa AMI when the ground truth clustering is unbalanced and there exist small clusters\r\n* https://scikit-learn.org/stable/modules/generated/sklearn.metrics.adjusted_rand_score.html\r\n* https://scikit-learn.org/stable/auto_examples/cluster/plot_adjusted_for_chance_measures.html#sphx-glr-auto-examples-cluster-plot-adjusted-for-chance-measures-py\r\n* https://github.com/idealo/imagededup\r\n* https://towardsdatascience.com/how-to-cluster-images-based-on-visual-similarity-cd6e7209fe34\r\n* https://github.com/facebookresearch/deepcluster\r\n* https://towardsdatascience.com/pca-on-hyperspectral-data-99c9c5178385\r\n* https://machinelearningmastery.com/face-recognition-using-principal-component-analysis/\r\n\r\n### Maintainer\r\n* Erdogan Taskesen, github: [erdogant](https://github.com/erdogant)\r\n* Contributions are welcome.\r\n* If you wish to buy me a <a href=\"https://erdogant.github.io/donate/?currency=USD&amount=5\">Coffee</a> for this work, it is very appreciated :)\r\n\tStar it if you like it!\r\n",
"bugtrack_url": null,
"license": null,
"summary": "clusteval is a python package for unsupervised cluster validation.",
"version": "2.2.2",
"project_urls": {
"Download": "https://github.com/erdogant/clusteval/archive/2.2.2.tar.gz",
"Homepage": "https://erdogant.github.io/clusteval"
},
"split_keywords": [],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "992c1d2ba13fac34564b335b50b74949615aad3d35ba63b1c75ab3b8b7937ca9",
"md5": "1925ed9bcb29bdfe883ed33a18805982",
"sha256": "28104f94abfe884107fff25de7b3c058aeb1157b2a644be8075b8faa178e6098"
},
"downloads": -1,
"filename": "clusteval-2.2.2-py3-none-any.whl",
"has_sig": false,
"md5_digest": "1925ed9bcb29bdfe883ed33a18805982",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3",
"size": 42740,
"upload_time": "2024-10-02T13:05:20",
"upload_time_iso_8601": "2024-10-02T13:05:20.432345Z",
"url": "https://files.pythonhosted.org/packages/99/2c/1d2ba13fac34564b335b50b74949615aad3d35ba63b1c75ab3b8b7937ca9/clusteval-2.2.2-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "571e84ccb22be5de93ed968390b496e6512a49945c2c0183b571345cf2c4daeb",
"md5": "7f91301862f53ff1a7454bc8e282ba70",
"sha256": "0f543272bcd630e85b5ea5f3c55ca8c703e264d9a4b4b41b44bc009364643c2a"
},
"downloads": -1,
"filename": "clusteval-2.2.2.tar.gz",
"has_sig": false,
"md5_digest": "7f91301862f53ff1a7454bc8e282ba70",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3",
"size": 36663,
"upload_time": "2024-10-02T13:05:21",
"upload_time_iso_8601": "2024-10-02T13:05:21.762481Z",
"url": "https://files.pythonhosted.org/packages/57/1e/84ccb22be5de93ed968390b496e6512a49945c2c0183b571345cf2c4daeb/clusteval-2.2.2.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2024-10-02 13:05:21",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "erdogant",
"github_project": "clusteval",
"travis_ci": false,
"coveralls": false,
"github_actions": true,
"requirements": [],
"lcname": "clusteval"
}