cvi


- **Name**: cvi
- **Version**: 0.6.0
- **home_page**: None
- **Summary**: A Python package for both batch and incremental cluster validity indices.
- **upload_time**: 2024-06-13 15:16:41
- **maintainer**: None
- **docs_url**: None
- **author**: None
- **requires_python**: `>=3.6`
- **license**: MIT License Copyright (c) 2022 Sasha Petrenko Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
- **keywords**: cluster validity indices, cluster validity index, incremental cluster validity indices, incremental cluster validity index, cluster validation, cvi, icvi
- **requirements**: No requirements were recorded.
- **Travis-CI**: No Travis.
- **coveralls**: test coverage

[![cvi-header](https://github.com/AP6YC/FileStorage/blob/main/cvi/header.png?raw=true)][docs-dev-url]

A Python package implementing both batch and incremental cluster validity indices (CVIs).

| **Stable Docs**  | **Dev Docs** | **Build Status** | **Coverage** |
|:----------------:|:------------:|:----------------:|:------------:|
| [![Stable][docs-stable-img]][docs-stable-url] | [![Dev][docs-dev-img]][docs-dev-url]| [![Build Status][ci-img]][ci-url] | [![Codecov][codecov-img]][codecov-url] |
| **Version** | **Issues** | **Downloads** | **Zenodo DOI** |
| [![version][version-img]][version-url] | [![issues][issues-img]][issues-url] | [![Downloads][downloads-img]][downloads-url] |  [![DOI][zenodo-img]][zenodo-url] |

[downloads-img]: https://static.pepy.tech/badge/cvi
[downloads-url]: https://pepy.tech/project/cvi

[zenodo-img]: https://zenodo.org/badge/526280198.svg
[zenodo-url]: https://zenodo.org/badge/latestdoi/526280198

[docs-stable-img]: https://img.shields.io/badge/docs-stable-blue.svg
[docs-stable-url]: https://AP6YC.github.io/cvi/main

[docs-dev-img]: https://img.shields.io/badge/docs-dev-blue.svg
[docs-dev-url]: https://AP6YC.github.io/cvi/develop

[ci-img]: https://github.com/AP6YC/cvi/actions/workflows/Test.yml/badge.svg
[ci-url]: https://github.com/AP6YC/cvi/actions/workflows/Test.yml

[codecov-img]: https://codecov.io/gh/AP6YC/cvi/branch/main/graph/badge.svg
[codecov-url]: https://codecov.io/gh/AP6YC/cvi

[version-img]: https://img.shields.io/pypi/v/cvi.svg
[version-url]: https://pypi.org/project/cvi

[issues-img]: https://img.shields.io/github/issues/AP6YC/cvi?style=flat
[issues-url]: https://github.com/AP6YC/cvi/issues

## Table of Contents

- [Table of Contents](#table-of-contents)
- [Cluster Validity Indices](#cluster-validity-indices)
- [Installation](#installation)
- [Usage](#usage)
  - [Quickstart](#quickstart)
  - [Detailed Usage](#detailed-usage)
- [Implemented CVIs](#implemented-cvis)
- [History](#history)
- [Acknowledgements](#acknowledgements)
  - [Derivation](#derivation)
  - [Authors](#authors)
  - [Related Projects](#related-projects)
  - [Assets](#assets)
    - [Fonts](#fonts)
    - [Icons](#icons)

## Cluster Validity Indices

Say you have a clustering algorithm that clusters a set of samples containing features of some kind and some dimensionality.
Great!
That was a lot of work, and you should feel accomplished.
But how do you know that the algorithm performed _well_?
By definition, you wouldn't have the _true_ label belonging to each sample (if one could even exist in your context), just the label prescribed by your clustering algorithm.

**Enter Cluster Validity Indices (CVIs)**.

CVIs are metrics of cluster partitioning when true cluster labels are unavailable.
Each operates on only the information available (i.e., the provided samples of features and the labels prescribed by the clustering algorithm) and produces a _metric_, a number that goes up or down according to how well the CVI believes the clustering algorithm appears to, well, _cluster_.
Clustering well in this context means correctly partitioning (i.e., separating) the data rather than prescribing too many different clusters (over-partitioning) or too few (under-partitioning).
Each CVI also differs in the range and scale of the values it produces.
**Furthermore, each CVI has an original batch implementation and incremental implementation that are equivalent**.

The `cvi` Python package contains a variety of these batch and incremental CVIs.
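
To make the idea of a criterion value concrete, the sketch below computes the Calinski-Harabasz index directly from its textbook definition (between-cluster dispersion over within-cluster dispersion, each normalized by its degrees of freedom) using only NumPy. The function name `calinski_harabasz` is hypothetical and not part of `cvi`; it illustrates what a CVI measures and is not the package's implementation.

```python
import numpy as np


def calinski_harabasz(samples, labels):
    """Textbook Calinski-Harabasz index (illustrative, not the cvi implementation)."""
    samples = np.asarray(samples, dtype=float)
    labels = np.asarray(labels)
    n_samples = samples.shape[0]
    clusters = np.unique(labels)
    n_clusters = len(clusters)
    if n_clusters < 2 or n_clusters >= n_samples:
        raise ValueError("need 2 <= n_clusters < n_samples")
    overall_mean = samples.mean(axis=0)
    between = 0.0  # between-cluster dispersion
    within = 0.0   # within-cluster dispersion
    for c in clusters:
        members = samples[labels == c]
        centroid = members.mean(axis=0)
        between += len(members) * np.sum((centroid - overall_mean) ** 2)
        within += np.sum((members - centroid) ** 2)
    return (between / (n_clusters - 1)) / (within / (n_samples - n_clusters))


# Four points forming two tight, well-separated pairs
points = np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 0.0], [10.0, 1.0]])
good = calinski_harabasz(points, np.array([0, 0, 1, 1]))  # labels follow the pairs
bad = calinski_harabasz(points, np.array([0, 1, 0, 1]))   # labels cut across the pairs
print(good > bad)  # True
```

For this index a larger value indicates a better partition; other CVIs may be minimized instead, which is one reason their values are not directly comparable.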

## Installation

The `cvi` package is listed on PyPI, so you may install the latest version with

```shell
pip install cvi
```

You can also specify a version to install in the usual way with

```shell
pip install cvi==0.6.0
```

Alternatively, you can manually install a release from the [releases page](https://github.com/AP6YC/cvi/releases) on GitHub.

## Usage

### Quickstart

Create a CVI object and compute the criterion value in batch with `get_cvi`:

```python
# Import the library
import cvi
# Create a Calinski-Harabasz (CH) CVI object
my_cvi = cvi.CH()
# Load some data from some clustering algorithm
samples, labels = load_some_clustering_data()
# Compute the final criterion value in batch
criterion_value = my_cvi.get_cvi(samples, labels)
```

or do it incrementally, also with `get_cvi`:

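```python
# Datasets are numpy arrays
import numpy as np
# Use a fresh CVI object: mixing batch and incremental updates on one object is not supported
my_icvi = cvi.CH()
# Create a container for criterion values
n_samples = len(labels)
criterion_values = np.zeros(n_samples)
# Iterate over the data, storing the criterion value after each sample
for ix in range(n_samples):
    criterion_values[ix] = my_icvi.get_cvi(samples[ix, :], labels[ix])
```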
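
`load_some_clustering_data` in the quickstart is a placeholder. For experimenting without a real clustering algorithm, a stand-in dataset can be sketched with NumPy alone. The helper name `make_labeled_blobs` and its parameters are hypothetical, and the labels here come from the generator rather than from a clustering algorithm.

```python
import numpy as np


def make_labeled_blobs(n_per_cluster=50,
                       centers=((0.0, 0.0), (5.0, 5.0), (0.0, 5.0)),
                       scale=0.5,
                       seed=0):
    """Draw Gaussian blobs and return (samples, labels) in row-major layout."""
    rng = np.random.default_rng(seed)
    centers = np.asarray(centers, dtype=float)
    # One block of n_per_cluster rows per center, stacked vertically
    samples = np.vstack([
        rng.normal(loc=center, scale=scale, size=(n_per_cluster, centers.shape[1]))
        for center in centers
    ])
    # Integer label i for every row drawn around centers[i]
    labels = np.repeat(np.arange(len(centers)), n_per_cluster)
    return samples, labels


samples, labels = make_labeled_blobs()
print(samples.shape, labels.shape)  # (150, 2) (150,)
```

The resulting arrays can be passed to `get_cvi` in place of the output of `load_some_clustering_data()`.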

### Detailed Usage

The `cvi` package contains a set of implemented CVIs with batch and incremental update methods.
Each CVI is a standalone stateful object inheriting from a base class `CVI`, and all `CVI` functions are object methods, such as those that update parameters and return the criterion value.

Instantiate a CVI of your choice with the default constructor:

```python
# Import the package
import cvi
# Import numpy for some data handling
import numpy as np

# Instantiate a Calinski-Harabasz (CH) CVI object
my_cvi = cvi.CH()
```

CVIs are instantiated by their acronyms; the full list of implemented CVIs is in the [Implemented CVIs](#implemented-cvis) section.

A batch of data is assumed to be a numpy array of samples and a numpy vector of integer labels.

```python
# Load some data
samples, labels = my_clustering_alg(some_data)
```

> **NOTE**:
>
> The `cvi` package assumes the NumPy **row-major** convention where rows are individual samples and columns are features.
> A batch dataset then has shape `[n_samples, n_features]`, and its corresponding labels have shape `[n_samples]`.
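
A quick check of these shapes before calling `get_cvi` can catch a transposed dataset early. The helper `check_batch` below is a hypothetical convenience and not part of the `cvi` API:

```python
import numpy as np


def check_batch(samples, labels):
    """Validate the [n_samples, n_features] / [n_samples] layout described above."""
    samples = np.asarray(samples)
    labels = np.asarray(labels)
    if samples.ndim != 2:
        raise ValueError(f"samples must be 2-D, got {samples.ndim}-D")
    if labels.ndim != 1:
        raise ValueError(f"labels must be 1-D, got {labels.ndim}-D")
    if samples.shape[0] != labels.shape[0]:
        raise ValueError(
            f"{samples.shape[0]} samples but {labels.shape[0]} labels; "
            "is the dataset transposed?"
        )
    if not np.issubdtype(labels.dtype, np.integer):
        raise TypeError("labels must be integers")
    return samples, labels


# 3 samples with 2 features each, one integer label per sample
samples, labels = check_batch([[0.1, 0.2], [0.3, 0.4], [0.9, 0.8]], [0, 0, 1])

# Data stored as [n_features, n_samples] must be transposed before use
features_by_samples = np.array([[0.1, 0.3, 0.9], [0.2, 0.4, 0.8]])
samples_t, _ = check_batch(features_by_samples.T, labels)
```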

You may compute the final criterion value with a batch update all at once with `CVI.get_cvi`:

```python
# Get the final criterion value in batch mode
criterion_value = my_cvi.get_cvi(samples, labels)
```

or you may compute criterion values incrementally with the same method by passing just a single NumPy vector of features and a single integer label.
The incremental method is selected automatically based on the dimensions of the data passed.

```python
# Create a container for the criterion value after each sample
n_samples = len(labels)
criterion_values = np.zeros(n_samples)

# Iterate across the data and store the criterion value over time
for ix in range(n_samples):
    sample = samples[ix, :]
    label = labels[ix]
    criterion_values[ix] = my_cvi.get_cvi(sample, label)
```

> **NOTE**:
>
> Currently, only using _either_ batch _or_ incremental methods on a given CVI object is supported; switching from batch to incremental updates with the same object is not yet implemented.
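
Incremental computation of this kind rests on running statistics that can be updated one sample at a time. As an illustration of that idea only (not the internals of `cvi`), the sketch below maintains per-cluster counts and centroids incrementally and checks that they match the batch means. The class name `RunningCentroids` is hypothetical.

```python
import numpy as np


class RunningCentroids:
    """Per-cluster sample counts and centroids, updated one sample at a time."""

    def __init__(self):
        self.counts = {}
        self.centroids = {}

    def update(self, sample, label):
        sample = np.asarray(sample, dtype=float)
        label = int(label)
        if label not in self.counts:
            # First sample of a new cluster becomes its centroid
            self.counts[label] = 1
            self.centroids[label] = sample.copy()
        else:
            self.counts[label] += 1
            # Incremental mean: new = old + (x - old) / n
            self.centroids[label] += (sample - self.centroids[label]) / self.counts[label]


rng = np.random.default_rng(0)
samples = rng.normal(size=(100, 3))
labels = rng.integers(0, 4, size=100)

running = RunningCentroids()
for sample, label in zip(samples, labels):
    running.update(sample, label)

# The incremental centroids agree with the batch means up to floating-point error
for label, centroid in running.centroids.items():
    assert np.allclose(centroid, samples[labels == label].mean(axis=0))
```

Because the state lives in the object, feeding the same object a full batch afterwards would mix two update paths, which is the situation the note above warns about.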

## Implemented CVIs

The following CVIs have been implemented as of the latest version of `cvi`:

- **CH**: Calinski-Harabasz
- **cSIL**: Centroid-based Silhouette
- **DB**: Davies-Bouldin
- **GD43**: Generalized Dunn's Index 43
- **GD53**: Generalized Dunn's Index 53
- **PS**: Partition Separation
- **rCIP**: (Renyi's) representative Cross Information Potential
- **WB**: WB-index
- **XB**: Xie-Beni

## History

- 8/18/2022: Initialize project.
- 9/8/2022: First release on PyPI and initiate GitFlow.
- 8/10/2023: v0.5.1 released.
- 5/31/2024: Updated documentation.

## Acknowledgements

### Derivation

The incremental and batch CVI implementations in this package are largely derived from the following Julia language implementation by the same authors:

- [ClusterValidityIndices.jl](https://github.com/AP6YC/ClusterValidityIndices.jl)

### Authors

The principal authors of the `cvi` package are:

- Sasha Petrenko <petrenkos@mst.edu>
- Nik Melton <nmmz76@mst.edu>

### Related Projects

If this package is missing something that you need, feel free to check out some related Python cluster validity packages:

- [validclust](https://github.com/crew102/validclust)
- [clusterval](https://github.com/Nuno09/clusterval)

### Assets

#### Fonts

The following font is used in the logo:

- [Ethnocentric Font Family](https://www.1001fonts.com/ethnocentric-font.html)

#### Icons

The icon for the project is taken from:

- [Cluster computing icons created by IconBaandar - Flaticon](https://www.flaticon.com/free-icons/cluster-computing) ([cluster-5464694](https://www.flaticon.com/free-icon/cluster_5464694))

            

Raw data

{
    "_id": null,
    "home_page": null,
    "name": "cvi",
    "maintainer": null,
    "docs_url": null,
    "requires_python": ">=3.6",
    "maintainer_email": "Sasha Petrenko <petrenkos@mst.edu>",
    "keywords": "cluster validity indices, cluster validity index, incremental cluster validity indices, incremental cluster validity index, cluster validation, cvi, icvi",
    "author": null,
    "author_email": "Sasha Petrenko <petrenkos@mst.edu>",
    "download_url": "https://files.pythonhosted.org/packages/b2/ba/a5c843cbc6e3ebc0676f927f592080a3aeb54ccd9cb25d177e00873792fd/cvi-0.6.0.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "MIT License  Copyright (c) 2022 Sasha Petrenko  Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the \"Software\"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:  The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.  THE SOFTWARE IS PROVIDED \"AS IS\", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. ",
    "summary": "A Python package for both batch and incremental cluster validity indices.",
    "version": "0.6.0",
    "project_urls": {
        "Bug Reports": "https://github.com/AP6YC/cvi/issues",
        "Documentation": "https://cluster-validity-indices.readthedocs.io/",
        "Homepage": "https://github.com/AP6YC/cvi",
        "Source": "https://github.com/AP6YC/cvi/"
    },
    "split_keywords": [
        "cluster validity indices",
        " cluster validity index",
        " incremental cluster validity indices",
        " incremental cluster validity index",
        " cluster validation",
        " cvi",
        " icvi"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "e209c5f92c6b294dd43053efebfd3e294cfa3ffbbb511a471fcde860da2491b7",
                "md5": "97628eba79c484f83fe4289be087efe5",
                "sha256": "e3ad11df01cf8e7dbcbabf5d2e501b8d83cd964bde0f9bf95d5c2e2ad5621e62"
            },
            "downloads": -1,
            "filename": "cvi-0.6.0-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "97628eba79c484f83fe4289be087efe5",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.6",
            "size": 33161,
            "upload_time": "2024-06-13T15:16:38",
            "upload_time_iso_8601": "2024-06-13T15:16:38.558743Z",
            "url": "https://files.pythonhosted.org/packages/e2/09/c5f92c6b294dd43053efebfd3e294cfa3ffbbb511a471fcde860da2491b7/cvi-0.6.0-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "b2baa5c843cbc6e3ebc0676f927f592080a3aeb54ccd9cb25d177e00873792fd",
                "md5": "be4061f09eb283bca142c023a01cc260",
                "sha256": "202514286645b48750c8eba9b5aee135aaf7a59c1ce270301ccc7101e4a47968"
            },
            "downloads": -1,
            "filename": "cvi-0.6.0.tar.gz",
            "has_sig": false,
            "md5_digest": "be4061f09eb283bca142c023a01cc260",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.6",
            "size": 27888,
            "upload_time": "2024-06-13T15:16:41",
            "upload_time_iso_8601": "2024-06-13T15:16:41.380606Z",
            "url": "https://files.pythonhosted.org/packages/b2/ba/a5c843cbc6e3ebc0676f927f592080a3aeb54ccd9cb25d177e00873792fd/cvi-0.6.0.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2024-06-13 15:16:41",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "AP6YC",
    "github_project": "cvi",
    "travis_ci": false,
    "coveralls": true,
    "github_actions": true,
    "lcname": "cvi"
}
        