Name | zadu |
Version | 0.2.0 |
home_page | https://github.com/hj-n/zadu |
Summary | A Python Toolkit for Evaluating the Reliability of Dimensionality Reduction Embeddings |
upload_time | 2024-07-31 13:49:59 |
maintainer | None |
docs_url | None |
author | Hyeon Jeon |
requires_python | >=3.9.0 |
license | None |
<p align="center">
<h2 align="center">ZADU</h2>
<p align="center"><b>A</b>-to-<b>Z</b> python library for eval<b>U</b>ating <b>D</b>imensionality reduction</p>
</p>
---
ZADU is a Python library that provides distortion measures for evaluating and analyzing dimensionality reduction (DR) embeddings. The library supports a diverse set of local, cluster-level, and global distortion measures, allowing users to assess DR techniques from various structural perspectives. By offering optimized execution and pointwise local distortions, ZADU enables efficient, in-depth analysis of DR embeddings.
## Installation
You can install ZADU via `pip`:
```bash
pip install zadu
```
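To confirm the installation from Python, the following minimal sketch queries the installed version through the standard library's `importlib.metadata`. The helper name `installed_version` is an illustrative assumption and is not part of ZADU.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Prints the installed version string, or None when the package is not installed.
print(installed_version("zadu"))
```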
## Supported Distortion Measures
ZADU currently supports a total of 18 distortion measures, including:
- 7 local measures
- 5 cluster-level measures
- 6 global measures
For a complete list of supported measures, refer to [measures](/src/zadu/measures). The library provided 17 measures when it was first introduced in our academic paper; one more measure (label trustworthiness & continuity) has since been added.
## How To Use ZADU
ZADU provides two different interfaces for executing distortion measures.
You can either use the main class that wraps the measures, or directly access and invoke the functions that define each distortion measure.
### Using the Main Class
Use the main class of ZADU to compute distortion measures.
This approach benefits from ZADU's execution optimization, which makes it faster when several measures are executed together.
```python
from zadu import zadu
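# load_datasets() is a placeholder: supply your own high-dimensional data (hd)
# and its low-dimensional embedding (ld) as NumPy arrays
hd, ld = load_datasets()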
spec = [{
"id" : "tnc",
"params": { "k": 20 },
}, {
"id" : "snc",
"params": { "k": 30, "clustering_strategy": "dbscan" }
}]
scores = zadu.ZADU(spec, hd).measure(ld)
print("T&C:", scores[0])
print("S&C:", scores[1])
```
`hd` is the high-dimensional data and `ld` is its low-dimensional embedding.
## ZADU Class
The ZADU class provides the main interface for the ZADU library, allowing users to evaluate and analyze dimensionality reduction (DR) embeddings effectively and reliably.
### Class Constructor
The ZADU class constructor has the following signature:
```python
class ZADU(spec: List[Dict[str, Union[str, dict]]], hd: np.ndarray, return_local: bool = False)
```
### Parameters:
#### `spec`
A list of dictionaries that define the distortion measures to execute and their hyperparameters.
Each dictionary must contain the following keys:
* `"id"`: The identifier of the distortion measure, such as `"tnc"` or `"snc"`.
* `"params"`: A dictionary containing hyperparameters specific to the chosen distortion measure.
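As a sanity check before handing a `spec` to the class, the two required keys can be verified up front. The sketch below is a hypothetical helper (`validate_spec` is not provided by ZADU) that only checks the structure described above. It does not know which IDs or hyperparameters are valid.

```python
from typing import Any, Dict, List


def validate_spec(spec: List[Dict[str, Any]]) -> None:
    """Raise ValueError if an entry lacks a string "id" or a dict "params"."""
    for i, entry in enumerate(spec):
        if not isinstance(entry.get("id"), str):
            raise ValueError(f"spec[{i}] needs a string 'id'")
        if not isinstance(entry.get("params"), dict):
            raise ValueError(f"spec[{i}] needs a dict 'params'")


spec = [
    {"id": "tnc", "params": {"k": 20}},
    {"id": "stress", "params": {}},  # a measure without hyperparameters gets an empty dict
]
validate_spec(spec)  # passes silently when every entry is well-formed
```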
#### List of ID/Parameters for Each Function
***Warning***: When using `dsc`, `ivm`, `c_evm`, `nh`, and `ca_tnc`, please be aware that these measures assume that class labels are *well-separated* in the original high-dimensional space. If the class labels are not well-separated, the measures may produce unreliable results. Use these measures only if you are confident that the class labels are well-separated. Please refer to the related [academic paper](https://www.hyeonjeon.com/assets/pdf/jeon23tvcg.pdf) for more detail.
> ##### Local Measures
>
> | Measure | ID | Parameters | Range | Optimum |
> |---------|----|------------|-------|---------|
> | Trustworthiness & Continuity | tnc | `k=20` | [0.5, 1] | 1 |
> | Mean Relative Rank Errors | mrre | `k=20` | [0, 1] | 1 |
> | Local Continuity Meta-Criteria | lcmc | `k=20` | [0, 1] | 1 |
> | Neighborhood hit | nh | `k=20` | [0, 1] | 1 |
> | Neighbor Dissimilarity | nd | `k=20` | R+ | 0 |
> | Class-Aware Trustworthiness & Continuity | ca_tnc | `k=20` | [0.5, 1] | 1 |
> | Procrustes Measure | proc | `k=20` | R+ | 0 |
>
> ##### Cluster-level Measures
>
> | Measure | ID | Parameters | Range | Optimum |
> |---------|----|------------|-------|---------|
> | Steadiness & Cohesiveness | snc | `iteration=150, walk_num_ratio=0.3, alpha=0.1, k=50, clustering_strategy="dbscan"` | [0, 1] | 1 |
> | Distance Consistency | dsc | | [0.5, 1] | 0.5 |
> | Internal Validation Measures | ivm | `measure="silhouette"` | Depends on IVM | Depends on IVM |
> | Clustering + External Clustering Validation Measures | c_evm | `measure="arand", clustering="kmeans", clustering_args=None` | Depends on EVM | Depends on EVM |
> | Label Trustworthiness & Continuity | l_tnc | `cvm="dsc"` | [0, 1] | 1 |
>
> ##### Global Measures
>
> | Measure | ID | Parameters | Range | Optimum |
> |---------|----|------------|-------|---------|
> | Stress | stress | | R+ | 0 |
> | Kullback-Leibler Divergence | kl_div | `sigma=0.1` | R+ | 0 |
> | Distance-to-Measure | dtm | `sigma=0.1` | R+ | 0 |
> | Topographic Product | topo | `k=20` | R | 0 |
> | Pearson’s correlation coefficient | pr | | [-1, 1] | 1 |
> | Spearman’s rank correlation coefficient | srho | | [-1, 1] | 1 |
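
Because some measures are best at 1 and others at 0, comparing two embeddings requires knowing each measure's optimum. The sketch below transcribes the Optimum column of the tables into a lookup and compares two scalar scores by their distance to that optimum. The names `OPTIMUM`, `gap_to_optimum`, and `closer_to_optimum` are illustrative assumptions, not part of ZADU, and the helper handles one scalar value at a time.

```python
# Optimum values transcribed from the tables above (ivm and c_evm are omitted
# because their optimum depends on the chosen validation measure).
OPTIMUM = {
    "tnc": 1.0, "mrre": 1.0, "lcmc": 1.0, "nh": 1.0, "nd": 0.0,
    "ca_tnc": 1.0, "proc": 0.0, "snc": 1.0, "dsc": 0.5, "l_tnc": 1.0,
    "stress": 0.0, "kl_div": 0.0, "dtm": 0.0, "topo": 0.0,
    "pr": 1.0, "srho": 1.0,
}


def gap_to_optimum(measure_id: str, score: float) -> float:
    """Absolute distance between a scalar score and the tabulated optimum."""
    return abs(score - OPTIMUM[measure_id])


def closer_to_optimum(measure_id: str, score_a: float, score_b: float) -> str:
    """Return "a", "b", or "tie" depending on which score is nearer the optimum."""
    gap_a = gap_to_optimum(measure_id, score_a)
    gap_b = gap_to_optimum(measure_id, score_b)
    if gap_a == gap_b:
        return "tie"
    return "a" if gap_a < gap_b else "b"


print(closer_to_optimum("stress", 0.12, 0.30))  # lower stress is nearer 0
print(closer_to_optimum("pr", 0.81, 0.93))      # higher correlation is nearer 1
```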
#### `hd`
A high-dimensional dataset (numpy array) to register and reuse during the evaluation process.
#### `return_local`
A boolean flag that, when set to `True`, enables the computation of local pointwise distortions for each data point. The default value is `False`.
### Directly Accessing Functions
You can also directly access and invoke the functions defining each distortion measure for greater flexibility.
```python
from zadu.measures import *
mrre = mean_relative_rank_error.measure(hd, ld, k=20)
pr = pearson_r.measure(hd, ld)
# `label` holds the class label of each data point
nh = neighborhood_hit.measure(ld, label, k=20)
```
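To give a sense of what a global measure such as `pr` summarizes, the sketch below correlates pairwise distances in the original and embedded spaces using only the standard library. It assumes Euclidean distances and illustrates the general idea rather than ZADU's implementation. `pairwise_distances`, `pearson`, and the toy data are hypothetical.

```python
import math
from itertools import combinations
from typing import List, Sequence


def pairwise_distances(points: Sequence[Sequence[float]]) -> List[float]:
    """Euclidean distances for every unordered pair of points."""
    return [math.dist(p, q) for p, q in combinations(points, 2)]


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Toy data: the 2-D embedding drops the third coordinate, which is constant,
# so pairwise distances are preserved and the correlation is 1 up to rounding.
hd_toy = [(0.0, 0.0, 5.0), (1.0, 0.0, 5.0), (0.0, 2.0, 5.0), (3.0, 1.0, 5.0)]
ld_toy = [(x, y) for x, y, _ in hd_toy]
score = pearson(pairwise_distances(hd_toy), pairwise_distances(ld_toy))
print(round(score, 6))
```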
## Advanced Features
### Optimizing the Execution
ZADU automatically optimizes the execution of multiple distortion measures. It minimizes the computational overhead associated with preprocessing stages such as pairwise distance calculation, pointwise distance ranking determination, and k-nearest neighbor identification.
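The idea of sharing preprocessing across measures can be sketched as follows: the pairwise distance matrix is built once, cached, and reused for every later neighbor query. This is a simplified illustration with hypothetical names (`SharedPreprocessing`, `distance_calls`), not ZADU's internal code.

```python
import math
from functools import cached_property
from typing import List, Sequence


class SharedPreprocessing:
    """Builds the distance matrix once and derives neighbor lists from it."""

    def __init__(self, points: Sequence[Sequence[float]]) -> None:
        self.points = points
        self.distance_calls = 0  # counts how often the matrix is actually built

    @cached_property
    def distance_matrix(self) -> List[List[float]]:
        self.distance_calls += 1
        return [[math.dist(p, q) for q in self.points] for p in self.points]

    def knn(self, k: int) -> List[List[int]]:
        """Indices of the k nearest neighbors of each point (self excluded)."""
        neighbors = []
        for i, row in enumerate(self.distance_matrix):
            order = sorted((j for j in range(len(row)) if j != i), key=row.__getitem__)
            neighbors.append(order[:k])
        return neighbors


points = [(0.0, 0.0), (1.0, 0.0), (0.0, 3.0), (5.0, 5.0)]
shared = SharedPreprocessing(points)
print(shared.knn(1))          # nearest neighbor of each point
print(shared.knn(2))          # the second request reuses the cached matrix
print(shared.distance_calls)  # the matrix was built only once
```

Two measures that need different values of `k` would both call `knn` on the same object, so the quadratic distance computation is paid once rather than per measure.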
### Computing Pointwise Local Distortions
Users can obtain local pointwise distortions by setting the `return_local` flag to `True`. If a specified distortion measure produces pointwise distortions as an intermediate result, `measure` then also returns them, in a list ordered like `spec`.
```python
from zadu import zadu
spec = [{
"id" : "dtm",
"params": {}
}, {
"id" : "mrre",
"params": { "k": 30 }
}]
zadu_obj = zadu.ZADU(spec, hd, return_local=True)
global_, local_ = zadu_obj.measure(ld)
print("MRRE local distortions:", local_[1])
```
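Once pointwise values are available, a common next step is to locate the most distorted points. The sketch below assumes the local output for one measure has already been reduced to a flat sequence with one value per data point. `worst_points` and the sample values are hypothetical and not part of ZADU.

```python
from typing import List, Sequence


def worst_points(local_scores: Sequence[float], n: int,
                 higher_is_better: bool = True) -> List[int]:
    """Indices of the n most distorted points, most distorted first."""
    order = sorted(range(len(local_scores)), key=local_scores.__getitem__,
                   reverse=not higher_is_better)
    return order[:n]


# Stand-in values; in practice these would come from the local output of measure().
local_scores = [0.95, 0.40, 0.88, 0.62, 0.99]
print(worst_points(local_scores, 2))
```

Set `higher_is_better=False` for measures whose optimum is 0, so that the largest values are reported as the most distorted.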
### Visualizing Local Distortions
With the pointwise local distortions obtained from ZADU, users can visualize the distortions using various distortion visualizations. We provide ZADUVis, a Python library that renders two distortion visualizations: [CheckViz](https://onlinelibrary.wiley.com/doi/full/10.1111/j.1467-8659.2010.01835.x) and the [Reliability Map](https://arxiv.org/abs/2107.07859).

```python
from zadu import zadu
from zaduvis import zaduvis
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE
from sklearn.datasets import fetch_openml
hd = fetch_openml('mnist_784', version=1, cache=True).data.to_numpy()[::7]
ld = TSNE().fit_transform(hd)
# Compute local pointwise distortions
spec = [{
"id": "tnc",
"params": {"k": 25}
},{
"id": "snc",
"params": {"k": 50}
}]
zadu_obj = zadu.ZADU(spec, hd, return_local=True)
scores, local_list = zadu_obj.measure(ld)
tnc_local = local_list[0]
snc_local = local_list[1]
local_trustworthiness = tnc_local["local_trustworthiness"]
local_continuity = tnc_local["local_continuity"]
local_steadiness = snc_local["local_steadiness"]
local_cohesiveness = snc_local["local_cohesiveness"]
fig, ax = plt.subplots(1, 4, figsize=(50, 12.5))
zaduvis.checkviz(ld, local_trustworthiness, local_continuity, ax=ax[0])
zaduvis.reliability_map(ld, local_trustworthiness, local_continuity, k=10, ax=ax[1])
zaduvis.checkviz(ld, local_steadiness, local_cohesiveness, ax=ax[2])
zaduvis.reliability_map(ld, local_steadiness, local_cohesiveness, k=10, ax=ax[3])
plt.show()  # display the four panels
```
The above code snippet demonstrates how to visualize local pointwise distortions using CheckViz and Reliability Map plots.

## Documentation
For more information about the available distortion measures, their use cases, and examples, please refer to our paper (IEEE VIS 2023 Short).
## Citation
> Hyeon Jeon, Aeri Cho, Jinhwa Jang, Soohyun Lee, Jake Hyun, Hyung-Kwon Ko, Jaemin Jo, and Jinwook Seo. Zadu: A python library for evaluating the reliability of dimensionality reduction embeddings. In 2023 IEEE Visualization and Visual Analytics (VIS), 2023. to appear.
```bib
@inproceedings{jeon23vis,
author={Jeon, Hyeon and Cho, Aeri and Jang, Jinhwa and Lee, Soohyun and Hyun, Jake and Ko, Hyung-Kwon and Jo, Jaemin and Seo, Jinwook},
booktitle={2023 IEEE Visualization and Visual Analytics (VIS)},
title={ZADU: A Python Library for Evaluating the Reliability of Dimensionality Reduction Embeddings},
year={2023},
volume={},
number={},
pages={},
doi={},
note={to appear}
}
```
Raw data
{
"_id": null,
"home_page": "https://github.com/hj-n/zadu",
"name": "zadu",
"maintainer": null,
"docs_url": null,
"requires_python": ">=3.9.0",
"maintainer_email": null,
"keywords": null,
"author": "Hyeon Jeon",
"author_email": "hj@hcil.snu.ac.kr",
"download_url": "https://files.pythonhosted.org/packages/c9/92/bed2e3f0f38f815c171da5784c03843f895ee83dcb33613393285873719c/zadu-0.2.0.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": null,
"summary": "A Python Toolkit for Evaluating the Reliability of Dimensionality Reduction Embeddings",
"version": "0.2.0",
"project_urls": {
"Homepage": "https://github.com/hj-n/zadu"
},
"split_keywords": [],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "a653ef56f972eaa15c0cc4f35d7b8846e896024c18bb5924700314d8f29c4f21",
"md5": "929f1e1b1b39ca0d205d9d12652bfa78",
"sha256": "0f8d83390616ea0dbd64c7f9e8e87bdbddd870640547f1cfcd24e5e250ed2981"
},
"downloads": -1,
"filename": "zadu-0.2.0-py3-none-any.whl",
"has_sig": false,
"md5_digest": "929f1e1b1b39ca0d205d9d12652bfa78",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.9.0",
"size": 26387,
"upload_time": "2024-07-31T13:49:57",
"upload_time_iso_8601": "2024-07-31T13:49:57.311378Z",
"url": "https://files.pythonhosted.org/packages/a6/53/ef56f972eaa15c0cc4f35d7b8846e896024c18bb5924700314d8f29c4f21/zadu-0.2.0-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "c992bed2e3f0f38f815c171da5784c03843f895ee83dcb33613393285873719c",
"md5": "1afb9eea072ea401d81d8d5b562ab0bc",
"sha256": "e15d07d7201041a7487193dbefcde6cdc5656650d8865617f9b2f8b13e6a3d08"
},
"downloads": -1,
"filename": "zadu-0.2.0.tar.gz",
"has_sig": false,
"md5_digest": "1afb9eea072ea401d81d8d5b562ab0bc",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.9.0",
"size": 20819,
"upload_time": "2024-07-31T13:49:59",
"upload_time_iso_8601": "2024-07-31T13:49:59.230094Z",
"url": "https://files.pythonhosted.org/packages/c9/92/bed2e3f0f38f815c171da5784c03843f895ee83dcb33613393285873719c/zadu-0.2.0.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2024-07-31 13:49:59",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "hj-n",
"github_project": "zadu",
"travis_ci": false,
"coveralls": false,
"github_actions": false,
"lcname": "zadu"
}