pyclust-evl

- Name: pyclust-evl
- Version: 0.1.0
- Home page: https://github.com/yannispoulakis/pyclustkit
- Summary: A Python library for clustering operations. Evaluation and meta-feature generation.
- Upload time: 2024-12-09 11:51:55
- Author: Yannis Poulakis
- Maintainer: None
- Requires Python: >=3.6
- License: None
- Keywords: clustering, meta-learning, meta-features, evaluation
- Requirements: dgl, gensim, matplotlib, networkx, numpy, pandas, Pillow, psutil, scikit_learn, scipy, setuptools, six, torch

# The PyClustKit Module: All about clustering in a single Python Module!

The pyclustkit module is built on top of several libraries (including NumPy, scikit-learn, and PyTorch) to support a range of clustering operations.
Currently, the module focuses on clustering evaluation and meta-learning.

# Table of Contents
- [Installation Instructions](#installation-instructions)
- [Useful Links](#useful-links)
- [Usage Examples](#usage-examples)
- [Cite Us!](#citing-this-work)

# Installation Instructions

pyclustkit is available on PyPI and can be installed with pip:

```commandline
pip install pyclustkit
```
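
To confirm that the installation succeeded, a quick check along the following lines can help. It assumes only that the package is importable as `pyclustkit`, as the usage examples in this README suggest; the helper name `is_installed` is made up for illustration and is not part of the library.

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if a top-level module with this name can be located."""
    return importlib.util.find_spec(module_name) is not None


# "pyclustkit" is the import name used in the usage examples of this README.
print("pyclustkit importable:", is_installed("pyclustkit"))
```

If this prints `False`, the package was most likely installed into a different Python environment than the one running the check.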


# Useful Links


# Usage Examples
## Calculating Internal Cluster Validity Indices (CVI) 

PyClustKit comes with an evaluation suite of 46 internal validity indices. Each index is implemented on top of NumPy,
and the module speeds up the execution of multiple CVIs by tracking the processes they share.


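```{python}
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

from pyclustkit.eval import CVIToolbox

# Example data (an assumption for illustration): X is the feature matrix,
# y holds the cluster labels to be evaluated.
X, _ = make_blobs(n_samples=300, centers=3, random_state=0)
y = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)

ct = CVIToolbox(X, y)
ct.calculate_icvi(cvi=["dunn", "silhouette"])  # if no CVI are specified, it defaults to 'all'
print(ct.cvi_results)
```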
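
For readers unfamiliar with internal CVIs, the following is a minimal NumPy sketch of what two of them measure and of how an intermediate result can be shared between indices. It is not PyClustKit's implementation: the function names are hypothetical, and reusing one pairwise distance matrix is only one plausible reading of the shared process tracking mentioned above.

```python
import numpy as np


def pairwise_distances(X):
    """Euclidean distance matrix, computed once and reused by both indices."""
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def dunn_index(D, labels):
    """Smallest between-cluster distance divided by the largest cluster diameter."""
    clusters = np.unique(labels)
    max_diameter = max(D[np.ix_(labels == c, labels == c)].max() for c in clusters)
    min_separation = min(
        D[np.ix_(labels == a, labels == b)].min()
        for i, a in enumerate(clusters)
        for b in clusters[i + 1:]
    )
    return min_separation / max_diameter


def silhouette(D, labels):
    """Mean silhouette width over all points."""
    clusters = np.unique(labels)
    scores = np.zeros(len(labels))
    for i in range(len(labels)):
        own = labels == labels[i]
        n_own = own.sum()
        if n_own == 1:
            continue  # singleton clusters keep a silhouette of 0 by convention
        a = D[i, own].sum() / (n_own - 1)  # mean distance to own cluster, excluding self
        b = min(D[i, labels == c].mean() for c in clusters if c != labels[i])
        scores[i] = (b - a) / max(a, b)
    return scores.mean()


# Two well-separated toy clusters on a line.
X = np.array([[0.0], [1.0], [10.0], [11.0]])
labels = np.array([0, 0, 1, 1])

D = pairwise_distances(X)  # shared intermediate result
results = {"dunn": dunn_index(D, labels), "silhouette": silhouette(D, labels)}
print(results)
```

Both indices read from the same matrix `D`, so the quadratic-cost distance computation happens only once, however many distance-based indices are requested.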
## Meta Learning 

### Meta-Feature Extraction




# Citing This Work


<details>
<summary>List of Implemented CVI with citations</summary>
R does not compute GDI 61, 62, and 63 because of the Hausdorff distance. The collection currently consists of the following internal CVIs:

1. **ball_hall**: <i> G. H. Ball and D. J. Hall. Isodata: A novel method of data analysis and pattern
                      classification. Menlo Park: Stanford Research Institute. (NTIS No. AD 699616),1965.</i>
2. **banfeld_raftery**: <i> J.D. Banfield and A.E. Raftery. Model-based gaussian and non-gaussian clustering. Biometrics,
                        49:803–821, 1993. </i>
3. **c_index**: <i> Hubert, Lawrence & Levin, Joel. (1976). A general statistical framework for assessing categorical 
clustering in free recall. Psychological Bulletin. 83. 1072-1080. 10.1037/0033-2909.83.6.1072. </i>
4. **CDbw** : <i>Halkidi, M., & Vazirgiannis, M. (2008). A density-based cluster validity approach using 
multi-representatives. Pattern Recognit. Lett., 29, 773-786.  </i>
5. **det_ratio** : <i> A. J. Scott and M. J. Symons. Clustering methods based on likelihood ratio criteria. Biometrics, 
                27:387–397, 1971.</i>
6. **Dunn Index** : <i>J. Dunn. Well separated clusters and optimal fuzzy partitions. Journal of Cybernetics, 4:95–104, 
                    1974. </i>

7. **GDI [11,21,31,41,51,61][12,22,32,42,52,62][13,23,33,43,53,63]**: <i>J. C. Bezdek and N. R. Pal. Some new indexes of
cluster validity. IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, 28, no.3:301–315, 1998.</i>
8. **ksq_detw**: <i> F. H. B. Marriot. Practical problems in a method of cluster analysis. Biometrics,
27:456–460, 1975. </i>
9. **log_det_ratio**: <i> Halkidi et al. On clustering validation techniques. J. Intell. Inf. Syst., 2001. </i>
10. **log_ss_ratio**: <i> J. A. Hartigan. Clustering algorithms. New York: Wiley, 1975. </i>
11. **McClain_Rao**: <i> J. O. McClain and V. R. Rao. Clustisz: A program to test for the quality of
                         clustering of a set of objects. Journal of Marketing Research, 12:456–460, 1975.</i>
12. **trace_w Index**
13. **Friedman-Rubin 1 Index**
14. **Friedman-Rubin 2 Index**
15. **S_dbw**: <i> M. Halkidi and M. Vazirgiannis, "Clustering validity assessment: finding the optimal partitioning of a 
data set," Proceedings 2001 IEEE International Conference on Data Mining. </i>
16. **sd_dis Index**: <i>Halkidi et al. On clustering validation techniques. J. Intell. Inf. Syst., 2001.</i>
17. **sd_scat Index**: <i>Halkidi et al. On clustering validation techniques. J. Intell. Inf. Syst., 2001.</i>
18. **pbm**: <i> Bandyopadhyay S. Pakhira M. K. and Maulik U. Validity index for crisp and fuzzy clusters. Pattern 
             Recognition, 2004. </i>
19. **ratkowsky_lance**
20. **ray_turi**: <i> Ray et al. Determination of number of clusters in k-means clustering and application in colour 
                  image segmentation. 4th International Conference on Advances in Pattern Recognition and Digital 
                  Techniques, 1999. </i>
21. **wemmert_gancarski**
22. **xie_beni**: <i> X.L. Xie and G. Beni. A validity measure for fuzzy clustering. IEEE Transactions on Pattern 
                  Analysis and Machine Intelligence, 1991. </i>
23. **trace_wib**
24. **point_biserial**
25. **calinski_harabasz**
26. **silhouette**
27. **davies_bouldin**
28. **scott_symons**

</details>

            

Raw data

            {
    "_id": null,
    "home_page": "https://github.com/yannispoulakis/pyclustkit",
    "name": "pyclust-evl",
    "maintainer": null,
    "docs_url": null,
    "requires_python": ">=3.6",
    "maintainer_email": null,
    "keywords": "Clustering, Meta-Learning, Meta-Features, Evaluation",
    "author": "Yannis Poulakis",
    "author_email": "giannispoy@gmail.com",
    "download_url": "https://files.pythonhosted.org/packages/dc/d4/b0ba4712fc323ee7379648fabee8f6fe62ac4976050cca4c1584dfe5fe4c/pyclust_evl-0.1.0.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": null,
    "summary": "A Python library for clustering operations. Evaluation and meta-feature generation.",
    "version": "0.1.0",
    "project_urls": {
        "Homepage": "https://github.com/yannispoulakis/pyclustkit"
    },
    "split_keywords": [
        "clustering",
        " meta-learning",
        " meta-features",
        " evaluation"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "b64b539bc1e8206dcb69df9a75e9c81e832a5796b3ca66463b384a882f91dc3b",
                "md5": "e8cdac33a2e0e9cc51a432f26250420a",
                "sha256": "24433de6f41346fe55b27229c0b83ea7ed678b3f78d30652151ae3c8438a9be4"
            },
            "downloads": -1,
            "filename": "pyclust_evl-0.1.0-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "e8cdac33a2e0e9cc51a432f26250420a",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.6",
            "size": 54492,
            "upload_time": "2024-12-09T11:51:50",
            "upload_time_iso_8601": "2024-12-09T11:51:50.406448Z",
            "url": "https://files.pythonhosted.org/packages/b6/4b/539bc1e8206dcb69df9a75e9c81e832a5796b3ca66463b384a882f91dc3b/pyclust_evl-0.1.0-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "dcd4b0ba4712fc323ee7379648fabee8f6fe62ac4976050cca4c1584dfe5fe4c",
                "md5": "c773c2e7cd6b7c6c8924715887cb9a41",
                "sha256": "678981305d8fdbdae2f4333bdb1a2f263ed239c64c77f68627f8652d6002dde9"
            },
            "downloads": -1,
            "filename": "pyclust_evl-0.1.0.tar.gz",
            "has_sig": false,
            "md5_digest": "c773c2e7cd6b7c6c8924715887cb9a41",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.6",
            "size": 44672,
            "upload_time": "2024-12-09T11:51:55",
            "upload_time_iso_8601": "2024-12-09T11:51:55.027476Z",
            "url": "https://files.pythonhosted.org/packages/dc/d4/b0ba4712fc323ee7379648fabee8f6fe62ac4976050cca4c1584dfe5fe4c/pyclust_evl-0.1.0.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2024-12-09 11:51:55",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "yannispoulakis",
    "github_project": "pyclustkit",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": false,
    "requirements": [
        {
            "name": "dgl",
            "specs": [
                [
                    "==",
                    "2.2.1"
                ]
            ]
        },
        {
            "name": "gensim",
            "specs": [
                [
                    "==",
                    "4.3.3"
                ]
            ]
        },
        {
            "name": "matplotlib",
            "specs": [
                [
                    "==",
                    "3.9.2"
                ]
            ]
        },
        {
            "name": "networkx",
            "specs": [
                [
                    "==",
                    "3.4.2"
                ]
            ]
        },
        {
            "name": "numpy",
            "specs": [
                [
                    "==",
                    "1.26.4"
                ]
            ]
        },
        {
            "name": "pandas",
            "specs": [
                [
                    "==",
                    "2.2.3"
                ]
            ]
        },
        {
            "name": "Pillow",
            "specs": [
                [
                    "==",
                    "11.0.0"
                ]
            ]
        },
        {
            "name": "psutil",
            "specs": [
                [
                    "==",
                    "6.1.0"
                ]
            ]
        },
        {
            "name": "scikit_learn",
            "specs": [
                [
                    "==",
                    "1.5.2"
                ]
            ]
        },
        {
            "name": "scipy",
            "specs": [
                [
                    "==",
                    "1.13.1"
                ]
            ]
        },
        {
            "name": "setuptools",
            "specs": [
                [
                    "==",
                    "75.3.0"
                ]
            ]
        },
        {
            "name": "six",
            "specs": [
                [
                    "==",
                    "1.16.0"
                ]
            ]
        },
        {
            "name": "torch",
            "specs": [
                [
                    "==",
                    "2.3.0"
                ]
            ]
        }
    ],
    "lcname": "pyclust-evl"
}
        