persistable-clustering


| Field | Value |
| --- | --- |
| Name | persistable-clustering |
| Version | 0.5.1 |
| Home page | https://github.com/LuisScoccola/persistable |
| Summary | Density-based clustering for exploratory data analysis based on multi-parameter persistence |
| Upload time | 2023-09-22 15:59:36 |
| Maintainer | Luis Scoccola |
| License | 3-clause BSD |
| Keywords | clustering, density, hierarchical, persistence, tda |
| Requirements | numpy, scipy, scikit-learn, cython, plotly, dash, diskcache, multiprocess, psutil |

[![PyPI](https://img.shields.io/pypi/v/persistable-clustering?color=green)](https://pypi.org/project/persistable-clustering)
[![Downloads](https://static.pepy.tech/personalized-badge/persistable-clustering?period=total&units=international_system&left_color=grey&right_color=green&left_text=Downloads)](https://pepy.tech/project/persistable-clustering)
[![tests](https://github.com/LuisScoccola/persistable/actions/workflows/run_tests.yaml/badge.svg)](https://github.com/LuisScoccola/persistable/actions/workflows/run_tests.yaml)
[![coverage](https://codecov.io/gh/LuisScoccola/persistable/branch/main/graph/badge.svg)](https://codecov.io/gh/LuisScoccola/persistable)
[![docs](https://readthedocs.org/projects/persistable/badge/?version=latest)](https://persistable.readthedocs.io/)
[![status](https://joss.theoj.org/papers/63d612cd4730c3aa708e3a47eb2c50f3/status.svg)](https://joss.theoj.org/papers/63d612cd4730c3aa708e3a47eb2c50f3)
[![license](https://img.shields.io/github/license/LuisScoccola/persistable)](https://github.com/LuisScoccola/persistable/blob/main/LICENSE)
---

<p align="center">
    <img src="https://raw.githubusercontent.com/LuisScoccola/persistable/main/docs/pictures/logo.svg" width="550">
</p>

Persistent and stable clustering (Persistable) is a density-based clustering algorithm intended for exploratory data analysis.
What distinguishes Persistable from other clustering algorithms is its visualization capabilities.
Persistable's interactive mode lets you visualize multi-scale and multi-density cluster structure present in the data.
This is used to guide the choice of parameters that lead to the final clustering.


## Usage

Here is a brief outline of the main functionality; see the [documentation](https://persistable.readthedocs.io/) for details, including the [API reference](https://persistable.readthedocs.io/en/latest/api.html).

In order to run Persistable's interactive mode from a Jupyter notebook, run the following in a Jupyter cell:

```python
import persistable
from sklearn.datasets import make_blobs

X = make_blobs(2000, centers=4, random_state=1)[0]

p = persistable.Persistable(X)
pi = persistable.PersistableInteractive(p)
pi.start_ui()
```

The last command returns the port on `localhost` that serves the UI, which is `8050` by default.
Now open `localhost:8050` in your web browser to access the graphical user interface:

![Persistable graphical user interface](https://raw.githubusercontent.com/LuisScoccola/persistable/main/docs/pictures/GUI.png)
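
If the page does not load, it can help to check that something is actually listening on the port. The following is a minimal sketch using only the Python standard library; the helper name `ui_is_reachable` is hypothetical and not part of Persistable, and `8050` is simply the default port mentioned above.

```python
import socket


def ui_is_reachable(port, host="localhost", timeout=1.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out, or host not resolvable.
        return False


# 8050 is the default port; in practice, use the value returned by start_ui().
print(ui_is_reachable(8050))
```

This only confirms that some server accepts connections on that port, not that the server is Persistable's UI.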

After choosing your parameters using the user interface, you can get your clustering in another Jupyter cell by running:

```python
clustering_labels = pi.cluster()
```
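
Once you have the labels, a quick summary of cluster sizes is often useful. The sketch below assumes that the labels are one integer per data point and that noise points are marked with `-1` (a common convention for density-based clustering; please check the API reference to confirm it for your version). The helper `summarize_labels` and the stand-in label list are hypothetical and only serve as illustration.

```python
from collections import Counter


def summarize_labels(labels, noise_label=-1):
    """Count points per cluster, reporting noise points separately."""
    counts = Counter(int(label) for label in labels)
    # Assumption: noise points carry the label given by noise_label.
    n_noise = counts.pop(noise_label, 0)
    return dict(sorted(counts.items())), n_noise


# Stand-in labels for illustration; in practice, pass clustering_labels.
example_labels = [0, 0, 1, -1, 1, 1, -1]
cluster_sizes, n_noise = summarize_labels(example_labels)
print(cluster_sizes, n_noise)  # {0: 2, 1: 3} 2
```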

**Note:** You may use `pi.start_ui(jupyter_mode="inline")` to display the graphical user interface directly in the Jupyter notebook.


## Installing

Make sure you are using Python 3.
Persistable depends on the following Python packages, which are installed automatically when you install with `pip`:
`numpy`, `scipy`, `scikit-learn`, `cython`, `plotly`, `dash`, `diskcache`, `multiprocess`, `psutil`.
To install from PyPI, run the following:

```bash
pip install persistable-clustering
```
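
To confirm that the installation succeeded, you can query the installed version from Python. This is a minimal sketch based on the standard library's `importlib.metadata` (available from Python 3.8); the helper name `installed_version` is hypothetical and not part of Persistable.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(distribution):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(distribution)
    except PackageNotFoundError:
        return None


# Prints the version string if the package is installed, otherwise None.
print(installed_version("persistable-clustering"))
```

Note that the name on PyPI is `persistable-clustering`, while the module you import is `persistable`.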


## Documentation and support

You can find the documentation at [persistable.readthedocs.io](https://persistable.readthedocs.io/).
If you have further questions, please [open an issue](https://github.com/LuisScoccola/persistable/issues/new) and we will do our best to help you.
Please include as much information as possible, including your system's information, warnings, logs, screenshots, and anything else you think may be of use.
If you do not wish to open an issue, you are also welcome to contact [Luis Scoccola](https://luisscoccola.github.io/) directly.
Please be patient if it takes us a bit to get back to you.



## Running the tests

You can run the tests with the following commands, executed from the root directory of a clone of this repository.
If a test fails, please [report a bug](https://github.com/LuisScoccola/persistable/issues/new) and include as much information as possible: your system's information, warnings, logs, screenshots, and anything else you think may be of use.

```bash
pip install pytest playwright pytest-playwright
python -m playwright install --with-deps
pip install -r requirements.txt
python -m setup build_ext --inplace
pytest .
```


## Details about theory and implementation

Persistable is based on multi-parameter persistence [[4]](#4), a method from topological data analysis.
The theory behind Persistable is developed in [[1]](#1), while this implementation uses the high-performance algorithms for density-based clustering developed in [[2]](#2) and implemented in [[3]](#3).
Persistable's interactive mode is inspired by RIVET [[5]](#5) and is implemented in [Dash](https://dash.plotly.com/).


## Contributing

To contribute, you can fork the project, make your changes, and submit a pull request.
You may want to contact [Luis Scoccola](https://luisscoccola.github.io/) first, to make sure your work does not overlap with ongoing work.


## Authors

[Luis Scoccola](https://luisscoccola.github.io/) and [Alexander Rolle](https://alexanderrolle.github.io/).

## Citing

If you use this package in your work, you may cite the corresponding paper using the following BibTeX entry:

```
@article{Scoccola2023,
    doi = {10.21105/joss.05022},
    url = {https://doi.org/10.21105/joss.05022},
    year = {2023},
    publisher = {The Open Journal},
    volume = {8},
    number = {83},
    pages = {5022},
    author = {Luis Scoccola and Alexander Rolle},
    title = {Persistable: persistent and stable clustering},
    journal = {Journal of Open Source Software}
}
```

## References

<a id="1">[1]</a> 
*Stable and consistent density-based clustering*. A. Rolle and L. Scoccola. [arXiv:2005.09048](https://arxiv.org/abs/2005.09048)

<a id="2">[2]</a> 
*Accelerated Hierarchical Density Based Clustering*. L. McInnes, J. Healy. 2017 IEEE International Conference on Data Mining Workshops (ICDMW), IEEE, pp 33-42. 2017

<a id="3">[3]</a> 
*hdbscan: Hierarchical density based clustering*. L. McInnes, J. Healy, S. Astels. Journal of Open Source Software, The Open Journal, volume 2, number 11. 2017

<a id="4">[4]</a> 
*An Introduction to Multiparameter Persistence*. M. B. Botnan, M. Lesnick. Proceedings of the 2020 International Conference on Representations of Algebras. 2022

<a id="5">[5]</a> 
*RIVET*. The RIVET Developers. [[Git]](https://github.com/rivetTDA/rivet) [[docs]](https://rivet.readthedocs.io/en/latest/index.html)

<!---
<a id="4">[4]</a> 
*Density-based clustering based on hierarchical density estimates*. R. J. G. B. Campello, D. Moulavi, and J. Sander. Advances in Knowledge Discovery and Data Mining, volume 7819 of Lecture Notes in Computer Science, pp. 160-172. Springer, 2013.
-->


## License

This software is published under the 3-clause BSD license.

            

Raw data

            {
    "_id": null,
    "home_page": "https://github.com/LuisScoccola/persistable",
    "name": "persistable-clustering",
    "maintainer": "Luis Scoccola",
    "docs_url": null,
    "requires_python": "",
    "maintainer_email": "luis.scoccola@gmail.com",
    "keywords": "clustering density hierarchical persistence TDA",
    "author": "",
    "author_email": "",
    "download_url": "https://files.pythonhosted.org/packages/44/72/e10f37811bf20bc8486d65db0ebdb563a39000f588696953cd2457af8b20/persistable-clustering-0.5.1.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "3-clause BSD",
    "summary": "Density-based clustering for exploratory data analysis based on multi-parameter persistence",
    "version": "0.5.1",
    "project_urls": {
        "Homepage": "https://github.com/LuisScoccola/persistable"
    },
    "split_keywords": [
        "clustering",
        "density",
        "hierarchical",
        "persistence",
        "tda"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "9ef96fb14185db9d64902e2e3c74d09778ce7d368d1d91ba744008cc537b6bdb",
                "md5": "299ca40867d33e940c43b038d559e8c9",
                "sha256": "2bb9c32c39c434e77540007e028b1ea12f79880f0b406c80671018d7a73a8ca3"
            },
            "downloads": -1,
            "filename": "persistable_clustering-0.5.1-cp311-cp311-macosx_10_9_universal2.whl",
            "has_sig": false,
            "md5_digest": "299ca40867d33e940c43b038d559e8c9",
            "packagetype": "bdist_wheel",
            "python_version": "cp311",
            "requires_python": null,
            "size": 1253845,
            "upload_time": "2023-09-22T16:00:56",
            "upload_time_iso_8601": "2023-09-22T16:00:56.657616Z",
            "url": "https://files.pythonhosted.org/packages/9e/f9/6fb14185db9d64902e2e3c74d09778ce7d368d1d91ba744008cc537b6bdb/persistable_clustering-0.5.1-cp311-cp311-macosx_10_9_universal2.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "39c07f63515821001ca9b4180467803de7cd0c495d222a1056be6808332e8f28",
                "md5": "92532a08d8d82c4beefe1fa156de26e5",
                "sha256": "cde9a044a1e01dbc0d680dce16f8dd75658f521b0258d30149336d3d39b52fe4"
            },
            "downloads": -1,
            "filename": "persistable_clustering-0.5.1-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl",
            "has_sig": false,
            "md5_digest": "92532a08d8d82c4beefe1fa156de26e5",
            "packagetype": "bdist_wheel",
            "python_version": "cp311",
            "requires_python": null,
            "size": 3590287,
            "upload_time": "2023-09-22T15:59:39",
            "upload_time_iso_8601": "2023-09-22T15:59:39.998396Z",
            "url": "https://files.pythonhosted.org/packages/39/c0/7f63515821001ca9b4180467803de7cd0c495d222a1056be6808332e8f28/persistable_clustering-0.5.1-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "44d3857199ee6233c822e3d762b885f33068c9ed8aac94055265810493f218da",
                "md5": "a77c0c2bd628246d0a25f37aef3fca35",
                "sha256": "48483e9c197395b698d432e46bc1dffcce4b62f53089bdee9a8b79648923ee6c"
            },
            "downloads": -1,
            "filename": "persistable_clustering-0.5.1-cp311-cp311-win_amd64.whl",
            "has_sig": false,
            "md5_digest": "a77c0c2bd628246d0a25f37aef3fca35",
            "packagetype": "bdist_wheel",
            "python_version": "cp311",
            "requires_python": null,
            "size": 587216,
            "upload_time": "2023-09-22T16:03:02",
            "upload_time_iso_8601": "2023-09-22T16:03:02.538684Z",
            "url": "https://files.pythonhosted.org/packages/44/d3/857199ee6233c822e3d762b885f33068c9ed8aac94055265810493f218da/persistable_clustering-0.5.1-cp311-cp311-win_amd64.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "4472e10f37811bf20bc8486d65db0ebdb563a39000f588696953cd2457af8b20",
                "md5": "850dedf60bcddc6fa59ee9eccfa99490",
                "sha256": "0729302faf1122e7db19787ed631cf1d73cba8ff23ee2d136045131bc9a7b727"
            },
            "downloads": -1,
            "filename": "persistable-clustering-0.5.1.tar.gz",
            "has_sig": false,
            "md5_digest": "850dedf60bcddc6fa59ee9eccfa99490",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": null,
            "size": 68228,
            "upload_time": "2023-09-22T15:59:36",
            "upload_time_iso_8601": "2023-09-22T15:59:36.107383Z",
            "url": "https://files.pythonhosted.org/packages/44/72/e10f37811bf20bc8486d65db0ebdb563a39000f588696953cd2457af8b20/persistable-clustering-0.5.1.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2023-09-22 15:59:36",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "LuisScoccola",
    "github_project": "persistable",
    "travis_ci": false,
    "coveralls": true,
    "github_actions": true,
    "requirements": [
        {
            "name": "numpy",
            "specs": [
                [
                    ">=",
                    "1.21"
                ]
            ]
        },
        {
            "name": "scipy",
            "specs": [
                [
                    ">=",
                    "1.7"
                ]
            ]
        },
        {
            "name": "scikit-learn",
            "specs": [
                [
                    ">=",
                    "1.3"
                ]
            ]
        },
        {
            "name": "cython",
            "specs": [
                [
                    "<",
                    "3"
                ],
                [
                    ">=",
                    "0.29"
                ]
            ]
        },
        {
            "name": "plotly",
            "specs": [
                [
                    ">=",
                    "5.10"
                ]
            ]
        },
        {
            "name": "dash",
            "specs": [
                [
                    ">=",
                    "2.11"
                ]
            ]
        },
        {
            "name": "diskcache",
            "specs": [
                [
                    ">=",
                    "5.4"
                ]
            ]
        },
        {
            "name": "multiprocess",
            "specs": [
                [
                    ">=",
                    "0.70"
                ]
            ]
        },
        {
            "name": "psutil",
            "specs": [
                [
                    ">=",
                    "5.9"
                ]
            ]
        }
    ],
    "lcname": "persistable-clustering"
}
        