# The PyClustKit Module: All about clustering in a single Python Module!
PyClustKit is built on top of several libraries, including NumPy, SciPy and scikit-learn, to support a range of clustering operations.
It currently focuses on clustering evaluation and meta-learning.
# Table of Contents
- [Installation Instructions](#installation-instructions)
- [Useful Links](#useful-links)
- [Usage Examples](#usage-examples)
- [Cite Us!](#citing-this-work)
# Installation Instructions
PyClustKit is available on PyPI and can be installed with pip:
```commandline
pip install pyclustkit
```
# Useful Links
- [GitHub repository](https://github.com/yannispoulakis/pyclustkit)
# Usage Examples
## Calculating Internal Cluster Validity Indices (CVI)
PyClustKit comes with an evaluation suite of 46 internal cluster validity indices (CVIs), each implemented on top of NumPy.
To speed up the computation of several CVIs at once, the module tracks the intermediate processes the indices share, so
that common results are computed once and reused.
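```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from pyclustkit.eval import CVIToolbox
# X: feature matrix; y: cluster labels to evaluate (here produced by k-means on toy data)
X, _ = make_blobs(n_samples=300, centers=4, random_state=0)
y = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(X)
ct = CVIToolbox(X, y)
ct.calculate_icvi(cvi=["dunn", "silhouette"])  # if no CVI are specified, it defaults to 'all'
print(ct.cvi_results)
```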
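To illustrate the idea of sharing intermediate results across indices, here is a minimal NumPy/SciPy sketch. It is not PyClustKit's internal implementation: the `SharedCache` class and the `dunn` and `silhouette` functions are hypothetical, and they assume Euclidean distances. The pairwise distance matrix is computed once and then reused by both indices.

```python
import numpy as np
from scipy.spatial.distance import cdist


class SharedCache:
    """Hypothetical helper: stores intermediate results so each is computed once."""

    def __init__(self, X, labels):
        self.X = np.asarray(X, dtype=float)
        self.labels = np.asarray(labels)
        self._store = {}
        self.computed = []  # names of intermediates that were actually computed

    def get(self, name, compute):
        if name not in self._store:
            self._store[name] = compute()
            self.computed.append(name)
        return self._store[name]

    def pairwise(self):
        # Full Euclidean distance matrix, shared by every index that needs it
        return self.get("pairwise", lambda: cdist(self.X, self.X))


def dunn(cache):
    D, lab = cache.pairwise(), cache.labels
    clusters = np.unique(lab)
    # Largest within-cluster diameter
    max_diam = max(D[np.ix_(lab == c, lab == c)].max() for c in clusters)
    # Smallest distance between two points in different clusters
    min_sep = min(D[np.ix_(lab == a, lab == b)].min()
                  for i, a in enumerate(clusters) for b in clusters[i + 1:])
    return min_sep / max_diam


def silhouette(cache):
    D, lab = cache.pairwise(), cache.labels
    clusters = np.unique(lab)
    scores = np.zeros(len(lab))
    for i in range(len(lab)):
        same = lab == lab[i]
        if same.sum() == 1:
            continue  # singleton clusters get a silhouette of 0
        a = D[i, same].sum() / (same.sum() - 1)  # mean distance to own cluster, excluding i
        b = min(D[i, lab == c].mean() for c in clusters if c != lab[i])
        scores[i] = (b - a) / max(a, b)
    return scores.mean()


X_demo = np.array([[0.0, 0.0], [0.0, 1.0], [5.0, 0.0], [5.0, 1.0]])
labels_demo = np.array([0, 0, 1, 1])
cache = SharedCache(X_demo, labels_demo)
d, s = dunn(cache), silhouette(cache)
print(d, s, cache.computed)  # cache.computed lists 'pairwise' only once
```

Both indices call `cache.pairwise()`, but the distance matrix is built only on the first call; every later request is served from the cache.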
## Meta Learning
### Meta-Feature Extraction
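To illustrate what dataset meta-features are, the following minimal NumPy/SciPy sketch computes a handful of simple statistical descriptors of a dataset. It is not PyClustKit's meta-feature API; the function name `simple_meta_features` and the choice of features are assumptions made for illustration.

```python
import numpy as np
from scipy.stats import kurtosis, skew


def simple_meta_features(X):
    """Hypothetical example: a few dataset-level statistics used as meta-features."""
    X = np.asarray(X, dtype=float)
    n, d = X.shape
    # Eigenvalues of the feature covariance matrix (eigvalsh returns them in ascending order)
    eig = np.linalg.eigvalsh(np.atleast_2d(np.cov(X, rowvar=False)))
    return {
        "n_samples": n,
        "n_features": d,
        "log_samples_per_feature": float(np.log(n / d)),
        "mean_abs_skewness": float(np.mean(np.abs(skew(X, axis=0)))),
        "mean_kurtosis": float(np.mean(kurtosis(X, axis=0))),  # Fisher (excess) kurtosis
        # Share of total variance captured by the first principal component
        "pca_first_component_ratio": float(eig[-1] / eig.sum()),
    }


X_demo = np.array([[1.0, 2.0], [2.0, 4.0], [3.0, 6.0], [4.0, 8.0]])
mf = simple_meta_features(X_demo)
print(mf)
```

In a meta-learning setting, a vector like this would describe each dataset so that a model can learn which clustering configurations tend to work well on datasets with similar characteristics.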
# Citing This Work
<details>
<summary>List of Implemented CVI with citations</summary>
The collection currently consists of the following internal CVIs. Note that R does not provide the GDI variants 61, 62 and 63, because they rely on the Hausdorff distance:
1. **ball_hall**: <i> G. H. Ball and D. J. Hall. Isodata: A novel method of data analysis and pattern
classification. Menlo Park: Stanford Research Institute. (NTIS No. AD 699616),1965.</i>
2. **banfeld_raftery**: <i> J.D. Banfield and A.E. Raftery. Model-based gaussian and non-gaussian clustering. Biometrics,
49:803–821, 1993. </i>
3. **c_index**: <i> Hubert, Lawrence & Levin, Joel. (1976). A general statistical framework for assessing categorical
clustering in free recall. Psychological Bulletin. 83. 1072-1080. 10.1037/0033-2909.83.6.1072. </i>
4. **CDbw** : <i>Halkidi, M., & Vazirgiannis, M. (2008). A density-based cluster validity approach using
multi-representatives. Pattern Recognit. Lett., 29, 773-786. </i>
5. **det_ratio** : <i> A. J. Scott and M. J. Symons. Clustering methods based on likelihood ratio criteria. Biometrics,
27:387–397, 1971.</i>
6. **Dunn Index** : <i>J. Dunn. Well separated clusters and optimal fuzzy partitions. Journal of Cybernetics, 4:95–104,
1974. </i>
7. **GDI [11,21,31,41,51,61][12,22,32,42,52,62][13,23,33,43,53,63]**: <i>J. C. Bezdek and N. R. Pal. Some new indexes of
cluster validity. IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, 28(3):301–315, 1998.</i>
8. **ksq_detw**: <i> F. H. B. Marriot. Practical problems in a method of cluster analysis. Biometrics,
27:456–460, 1975. </i>
9. **log_det_ratio**: <i> Halkidi et al. On clustering validation techniques. J. Intell. Inf. Syst., 2001. </i>
10. **log_ss_ratio**: <i> J. A. Hartigan. Clustering algorithms. New York: Wiley, 1975. </i>
11. **McClain_Rao**: <i> J. O. McClain and V. R. Rao. Clustisz: A program to test for the quality of
clustering of a set of objects. Journal of Marketing Research, 12:456–460, 1975.</i>
12. trace_w Index
13. Friedman-Rudin 1 Index
14. Friedman-Rudin 2 Index
15. **S_dbw**: <i> M. Halkidi and M. Vazirgiannis, "Clustering validity assessment: finding the optimal partitioning of a
data set," Proceedings 2001 IEEE International Conference on Data Mining. </i>
16. **sd_dis Index**: <i>Halkidi et al. On clustering validation techniques. J. Intell. Inf. Syst., 2001.</i>
17. **sd_scat Index**: <i>Halkidi et al. On clustering validation techniques. J. Intell. Inf. Syst., 2001.</i>
18. **pbm**: <i> Bandyopadhyay S. Pakhira M. K. and Maulik U. Validity index for crisp and fuzzy clusters. Pattern
Recognition, 2004. </i>
19. ratkowsky_lance
20.
21. **ray_turi**: <i> Ray et al. Determination of number of clusters in k-means clustering and application in colour
image segmentation. 4th International Conference on Advances in Pattern Recognition and Digital
Techniques, 1999. </i>
22. wemmert_gancarski
23. **xie_beni**: <i> X.L. Xie and G. Beni. A validity measure for fuzzy clustering. IEEE Transactions on Pattern
Analysis and Machine Intelligence, 1991. </i>
24.
25. banfeld_raftery
26. trace_wib
27.
28. log_det_ratio
29.
30. point_biserial
31. calinski_harabasz
32. silhouette
33. davies_bouldin
34. scott_symons

</details>
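For orientation, here is a minimal NumPy sketch of two of the indices listed above, `ball_hall` and `calinski_harabasz`, following their standard textbook definitions. It is not PyClustKit's implementation, and the function names are hypothetical. Ball-Hall averages, over clusters, the mean squared distance of points to their centroid. Calinski-Harabasz divides the between-cluster dispersion $B/(K-1)$ by the within-cluster dispersion $W/(N-K)$, where $K$ is the number of clusters and $N$ the number of points.

```python
import numpy as np


def ball_hall(X, labels):
    """Mean, over clusters, of the mean squared distance to the cluster centroid."""
    X, labels = np.asarray(X, dtype=float), np.asarray(labels)
    per_cluster = []
    for c in np.unique(labels):
        pts = X[labels == c]
        per_cluster.append(np.mean(np.sum((pts - pts.mean(axis=0)) ** 2, axis=1)))
    return float(np.mean(per_cluster))


def calinski_harabasz(X, labels):
    """(B / (K - 1)) / (W / (N - K)) with B, W the between/within dispersions."""
    X, labels = np.asarray(X, dtype=float), np.asarray(labels)
    n, clusters = len(X), np.unique(labels)
    k, overall = len(clusters), X.mean(axis=0)
    between = within = 0.0
    for c in clusters:
        pts = X[labels == c]
        centroid = pts.mean(axis=0)
        between += len(pts) * np.sum((centroid - overall) ** 2)  # size-weighted centroid spread
        within += np.sum((pts - centroid) ** 2)  # squared distances to own centroid
    return float((between / (k - 1)) / (within / (n - k)))


X_demo = np.array([[0.0, 0.0], [0.0, 2.0], [4.0, 0.0], [4.0, 2.0]])
labels_demo = np.array([0, 0, 1, 1])
bh, ch = ball_hall(X_demo, labels_demo), calinski_harabasz(X_demo, labels_demo)
print(bh, ch)
```

On this toy example each point lies at squared distance 1 from its centroid, so the Ball-Hall value is 1.0, while the two well-separated centroids give a Calinski-Harabasz value of 8.0.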
Raw data
{
"_id": null,
"home_page": "https://github.com/yannispoulakis/pyclustkit",
"name": "pyclust-evl",
"maintainer": null,
"docs_url": null,
"requires_python": ">=3.6",
"maintainer_email": null,
"keywords": "Clustering, Meta-Learning, Meta-Features, Evaluation",
"author": "Yannis Poulakis",
"author_email": "giannispoy@gmail.com",
"download_url": "https://files.pythonhosted.org/packages/dc/d4/b0ba4712fc323ee7379648fabee8f6fe62ac4976050cca4c1584dfe5fe4c/pyclust_evl-0.1.0.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": null,
"summary": "A Python library for clustering operations. Evaluation and meta-feature generation.",
"version": "0.1.0",
"project_urls": {
"Homepage": "https://github.com/yannispoulakis/pyclustkit"
},
"split_keywords": [
"clustering",
" meta-learning",
" meta-features",
" evaluation"
],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "b64b539bc1e8206dcb69df9a75e9c81e832a5796b3ca66463b384a882f91dc3b",
"md5": "e8cdac33a2e0e9cc51a432f26250420a",
"sha256": "24433de6f41346fe55b27229c0b83ea7ed678b3f78d30652151ae3c8438a9be4"
},
"downloads": -1,
"filename": "pyclust_evl-0.1.0-py3-none-any.whl",
"has_sig": false,
"md5_digest": "e8cdac33a2e0e9cc51a432f26250420a",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.6",
"size": 54492,
"upload_time": "2024-12-09T11:51:50",
"upload_time_iso_8601": "2024-12-09T11:51:50.406448Z",
"url": "https://files.pythonhosted.org/packages/b6/4b/539bc1e8206dcb69df9a75e9c81e832a5796b3ca66463b384a882f91dc3b/pyclust_evl-0.1.0-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "dcd4b0ba4712fc323ee7379648fabee8f6fe62ac4976050cca4c1584dfe5fe4c",
"md5": "c773c2e7cd6b7c6c8924715887cb9a41",
"sha256": "678981305d8fdbdae2f4333bdb1a2f263ed239c64c77f68627f8652d6002dde9"
},
"downloads": -1,
"filename": "pyclust_evl-0.1.0.tar.gz",
"has_sig": false,
"md5_digest": "c773c2e7cd6b7c6c8924715887cb9a41",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.6",
"size": 44672,
"upload_time": "2024-12-09T11:51:55",
"upload_time_iso_8601": "2024-12-09T11:51:55.027476Z",
"url": "https://files.pythonhosted.org/packages/dc/d4/b0ba4712fc323ee7379648fabee8f6fe62ac4976050cca4c1584dfe5fe4c/pyclust_evl-0.1.0.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2024-12-09 11:51:55",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "yannispoulakis",
"github_project": "pyclustkit",
"travis_ci": false,
"coveralls": false,
"github_actions": false,
"requirements": [
{
"name": "dgl",
"specs": [
[
"==",
"2.2.1"
]
]
},
{
"name": "gensim",
"specs": [
[
"==",
"4.3.3"
]
]
},
{
"name": "matplotlib",
"specs": [
[
"==",
"3.9.2"
]
]
},
{
"name": "networkx",
"specs": [
[
"==",
"3.4.2"
]
]
},
{
"name": "numpy",
"specs": [
[
"==",
"1.26.4"
]
]
},
{
"name": "pandas",
"specs": [
[
"==",
"2.2.3"
]
]
},
{
"name": "Pillow",
"specs": [
[
"==",
"11.0.0"
]
]
},
{
"name": "psutil",
"specs": [
[
"==",
"6.1.0"
]
]
},
{
"name": "scikit_learn",
"specs": [
[
"==",
"1.5.2"
]
]
},
{
"name": "scipy",
"specs": [
[
"==",
"1.13.1"
]
]
},
{
"name": "setuptools",
"specs": [
[
"==",
"75.3.0"
]
]
},
{
"name": "six",
"specs": [
[
"==",
"1.16.0"
]
]
},
{
"name": "torch",
"specs": [
[
"==",
"2.3.0"
]
]
}
],
"lcname": "pyclust-evl"
}