topomap


| Field | Value |
| --- | --- |
| Name | topomap |
| Version | 0.0.3 |
| Home page | https://github.com/VIDA-NYU/TopoMap-pp |
| Summary | TopoMap++: A faster and more space efficient technique to compute projections with topological guarantees |
| Upload time | 2024-08-09 17:00:11 |
| Author | Vitoria Guardieiro, Felipe Inagaki de Oliveira, Harish Doraiswamy, Luis Gustavo Nonato, Claudio Silva |
| Maintainer | None |
| Docs URL | None |
| Requires Python | None |
| License | None |
| Keywords | topological data analysis, computational topology, high-dimensional data, projection |
| Requirements | No requirements were recorded. |

# TopoMap++

GitHub repository for our paper, *TopoMap++: A faster and more space efficient technique to compute projections with topological guarantees*. TopoMap++ is a layout improvement scheme to highlight important structures in the TopoMap layout. This new approach maintains local topological guarantees and introduces an interactive, TreeMap-based visualization for easier analysis of high-dimensional datasets. Additionally, we propose an efficient approximation scheme inspired by ANNS algorithms to compute the Rips filtration, drastically reducing computational costs while preserving data topology.

TopoMap++ improves upon TopoMap, which was originally implemented in C++ [here](https://github.com/harishd10/TopoMap) and outlined in the paper:

> Harish Doraiswamy, Julien Tierny, Paulo J. S. Silva, Luis Gustavo Nonato, and Claudio Silva. [TopoMap: A 0-dimensional Homology Preserving Projection of High-Dimensional Data](https://arxiv.org/abs/2009.01512), IEEE Transactions on Visualization and Computer Graphics (IEEE SciVis '20), 2020.

And TopoMap++ is outlined in:

> Vitoria Guardieiro, Felipe Inagaki de Oliveira, Harish Doraiswamy, Luis Gustavo Nonato, and Claudio Silva. TopoMap++: A faster and more space efficient technique to compute projections with topological guarantees, IEEE Transactions on Visualization and Computer Graphics (IEEE VIS '24), 2024.

## Usage

This version was implemented with Python 3.11.7, and all required packages are listed in `requirements.txt`.

### TopoMap

To run `TopoMap`, pass your data points as a NumPy array (`X` in the following example):

```
from TopoMap import TopoMap

topomap = TopoMap()
proj = topomap.fit_transform(X)
```

The output `proj` is also a NumPy array, with the same number of rows as `X` and two columns.

To use an approximate but much faster version, pass `approach="ANN"` when creating the object: `TopoMap(approach="ANN")`. Note that this requires the data to have dtype `np.float32`, `np.uint8`, or `np.int8`.
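
Since NumPy creates `float64` arrays by default, the data usually needs a cast before the ANN approach can be used. The following is a minimal sketch of such a cast; the helper name `to_ann_dtype` is hypothetical and not part of the package, and the final call to `TopoMap` is left as a comment because it assumes the package is installed.

```python
import numpy as np

# dtypes the README lists as accepted by the ANN approach
ANN_DTYPES = (np.float32, np.uint8, np.int8)

def to_ann_dtype(X):
    """Return X unchanged if its dtype is already accepted, else a float32 copy.

    Hypothetical helper for illustration; not part of the topomap package.
    """
    X = np.asarray(X)
    if X.dtype.type in ANN_DTYPES:
        return X
    return X.astype(np.float32)

# Synthetic data; NumPy produces float64 here
X = np.random.default_rng(0).normal(size=(100, 8))
X_ann = to_ann_dtype(X)
print(X.dtype, "->", X_ann.dtype)

# With the package installed, the converted array would then be passed on:
# proj = TopoMap(approach="ANN").fit_transform(X_ann)
```

Casting to `float32` loses some precision, so whether this is acceptable depends on the dataset.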

### TopoTree


To run `TopoTree`, you also pass your data points as a NumPy array. `TopoTree` additionally accepts the optional parameter `min_box_size`, the minimum number of points a component must contain to be represented in the tree. In the following example, we set `min_box_size` to 5% of the data points:

```
from TopoTree import TopoTree

topotree = TopoTree(min_box_size=0.05*X.shape[0])
comp_info = topotree.fit(X) 
```

The output `comp_info` is a list in which each element is a dictionary corresponding to a component and containing information such as its id, size, and list of data points.
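
As an illustration of working with this structure, the sketch below ranks components by size and returns the ids of the largest ones. The `comp_info` list here is a hand-written stand-in, and the key names `id`, `size`, and `points` are assumptions based on the description above; check the actual keys in the output of `fit` before relying on them.

```python
# Hypothetical stand-in for TopoTree's output; the key names are assumed.
comp_info = [
    {"id": 0, "size": 120, "points": list(range(0, 120))},
    {"id": 1, "size": 45, "points": list(range(120, 165))},
    {"id": 2, "size": 300, "points": list(range(165, 465))},
]

def largest_components(comp_info, k):
    """Return the ids of the k largest components, biggest first."""
    ranked = sorted(comp_info, key=lambda comp: comp["size"], reverse=True)
    return [comp["id"] for comp in ranked[:k]]

top_ids = largest_components(comp_info, k=2)
print(top_ids)  # [2, 0]
```

Whether ids selected this way can be passed directly to other classes in the package is an assumption to verify against the example notebooks.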

To visualize the components as a tree, we provide the `plot_hierarchical_treemap` function in `visualizations.py`. To do so, you will need to pass the components' information as a pandas DataFrame:

```
import pandas as pd

from visualizations import plot_hierarchical_treemap

# Convert the list of component dictionaries into a DataFrame
df_comp = pd.DataFrame.from_dict(comp_info)
fig = plot_hierarchical_treemap(df_comp)
fig.show()
```

### Hierarchical TopoMap

To run `HierarchicalTopoMap`, you also pass your data points as a NumPy array. You must also indicate either which components to scale (by providing a list of component ids) or how the components should be selected:

```
from HierarchicalTopoMap import HierarchicalTopoMap

hier_topomap = HierarchicalTopoMap(components_to_scale=components_to_scale)
proj = hier_topomap.fit_transform(X)
```

## Examples and Case Studies

In the "examples" folder, we provide example notebooks that illustrate the use of TopoMap, TopoTree, and Hierarchical TopoMap. These notebooks also demonstrate how to compute the approximate minimum spanning tree and compare it with the original.
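
To give a sense of what such a comparison involves, here is a self-contained sketch that is not taken from the notebooks or the package. It computes the exact Euclidean minimum spanning tree with Kruskal's algorithm over all pairs, then repeats the computation on a k-nearest-neighbor graph as a stand-in for an approximate scheme. The brute-force neighbor search, the function names, and the choice `k=10` are all assumptions made for illustration; a real ANN index would replace `knn_edges`.

```python
import math
import random

def kruskal(n, edges):
    """Minimum spanning forest via Kruskal's algorithm with union-find.

    edges: iterable of (weight, u, v). Returns the list of chosen edges.
    """
    parent = list(range(n))

    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]  # path halving
            a = parent[a]
        return a

    chosen = []
    for w, u, v in sorted(edges):
        ru, rv = find(u), find(v)
        if ru != rv:  # edge joins two different components
            parent[ru] = rv
            chosen.append((w, u, v))
    return chosen

def all_edges(points):
    """Every pairwise edge of the complete Euclidean graph."""
    n = len(points)
    return [(math.dist(points[i], points[j]), i, j)
            for i in range(n) for j in range(i + 1, n)]

def knn_edges(points, k):
    """Edges from each point to its k nearest neighbors (brute force here)."""
    edges = set()
    for i, p in enumerate(points):
        nearest = sorted(
            (math.dist(p, q), j) for j, q in enumerate(points) if j != i
        )[:k]
        for d, j in nearest:
            edges.add((d, min(i, j), max(i, j)))
    return list(edges)

random.seed(0)
points = [(random.random(), random.random(), random.random())
          for _ in range(200)]

exact = kruskal(len(points), all_edges(points))
approx = kruskal(len(points), knn_edges(points, k=10))

exact_weight = sum(w for w, _, _ in exact)
approx_weight = sum(w for w, _, _ in approx)
print(f"exact MST: {len(exact)} edges, total weight {exact_weight:.4f}")
print(f"kNN-based: {len(approx)} edges, total weight {approx_weight:.4f}")
```

If the neighbor graph is connected, the second result is a spanning tree whose total weight is at least that of the exact tree; if it is disconnected, the result is a forest with fewer edges, which is one of the things such a comparison can reveal.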

Additionally, we offer notebooks that reproduce the case studies section of our paper.

To reproduce the outputs in the notebooks, please place our [data](https://drive.google.com/file/d/1unPHq1-wc_nODQP2igb-28peXbyFn-NN/view?usp=sharing) in the "data" folder at the root of the project. The StreetAware data can be found [here](https://drive.google.com/drive/folders/1nkmWsjCDIDws4zL7WMRcLiOn2qqroRsE?usp=sharing).

## App

We also provide a simple interface for iteratively exploring the connection between a Hierarchical TopoMap and a TopoTree. To use it, please install Flask and Fastparquet.

We provide an [app_data](https://drive.google.com/file/d/1RdLbcOBsedBO6LQ9u8IX_-SNC72VOFlS/view?usp=sharing) folder containing the necessary structure and data examples. To use your own data, add your dataset as a NumPy array in the "numpy_datasets_app" subfolder and your dataframe of features as a CSV in the "features_app" subfolder. Then, update the `datasets`, `features`, and `available_columns` dictionaries in `app.py` accordingly.

            
