<div align="center">
<img src="https://github.com/kavehmahdavi/kavica/raw/main/doc/web/icon.png"><br>
</div>
-----------------
# KAVICA: Powerful Python Cluster Analysis and Inference Toolkit
[![PyPI Latest Release](https://img.shields.io/pypi/v/pandas.svg)](https://pypi.org/project/kavica/)
[![Conda Latest Release](https://anaconda.org/conda-forge/pandas/badges/version.svg)](https://anaconda.org/anaconda/pandas/)
[![Package Status](https://img.shields.io/pypi/status/pandas.svg)](https://pypi.org/project/kavica/)
[![License](https://img.shields.io/pypi/l/pandas.svg)](https://github.com/kavehmahdavi/kavica_container/blob/main/LICENSE)
[![Downloads](https://pepy.tech/badge/kavica/month)](https://pepy.tech/project/kavica)
[![Stack Overflow](https://img.shields.io/badge/stackoverflow-Ask%20questions-blue.svg)](https://stackoverflow.com/questions/tagged/kavica)
## What is it?
**kavica** is a Python package that provides semi-automated, flexible, and expressive clustering
analysis designed to make working with "unlabeled" data easy and intuitive.
It aims to be the fundamental high-level building block for doing practical, **real world** cluster analysis in Python.
Additionally, it has the broader goal of becoming **a powerful and flexible open-source AutoML unsupervised / clustering
analysis tool and pipeline**. It is already well on its way toward this goal.
## Main Features
Here are just a few of the things that kavica does well:
- Intelligent [**Density
Mapping**](https://github.com/kavehmahdavi/kavica/tree/main/kavica/cluster_inference_system/space_curvature_map.py)
to model the density structure of the data in analogy to
[Einstein's theory of relativity](https://www.space.com/17661-theory-general-relativity.html),
and automated [**Density
Homogenizing**](https://github.com/kavehmahdavi/kavica/tree/main/kavica/cluster_inference_system/bilinear_transformation.py)
to prepare the
data for density-based clustering (e.g., DBSCAN)
- Automatic and powerful [**Organization Component
Analysis**](https://github.com/kavehmahdavi/kavica/tree/main/kavica/cluster_inference_system/OCA.py) to interpret
the clustering result by understanding the topological structure of each cluster
- Topological and powerful [**Self-Organizing Maps Inference
System**](https://github.com/kavehmahdavi/kavica/tree/main/kavica/cluster_inference_system/somis.py) to
use the self-learning ability of the SOM to understand the topological structure of the data
- Automated and Bayesian-based [**DBSCAN Hyper-parameter
Tuner**](https://github.com/kavehmahdavi/kavica/tree/main/kavica/tuner) to select the optimal
hyper-parameter configuration of the DBSCAN clustering algorithm
- Efficient handling of [**feature
selection**](https://github.com/kavehmahdavi/kavica/tree/main/kavica/feature_selection/) in potentially
high-dimensional and
massive datasets
- Gravitational implementation of Kohonen [**Generational Self-Organizing Maps
(GSOM)**](https://github.com/kavehmahdavi/kavica/tree/main/kavica/cluster_inference_system/som.py), useful
for unsupervised learning and super-clustering, with rich graphics, plots, and animation features
- Computational geometric model [**Polygonal
Cage**](https://github.com/kavehmahdavi/kavica/tree/main/kavica/cluster_inference_system/polygon_cage.py) to transfer
feature vectors from a curved non-Euclidean feature space to a new Euclidean one
- Robust [**factor analysis**](https://github.com/kavehmahdavi/kavica/tree/main/kavica/factor_analysis;) to reduce a
large number of variables to a smaller number of factors
- Easy handling of [**missing data**](https://github.com/kavehmahdavi/kavica/tree/main/kavica/imputation/) (represented
as `NaN`, `NA`, or `NaT`) in floating point
as well as non-floating point data
- Flexible implementation of directed and undirected [**graph data
structures**](https://github.com/kavehmahdavi/kavica/tree/main/kavica/graph_data_structur.py) and
algorithms
- Intuitive [**resampling**](https://github.com/kavehmahdavi/kavica/tree/main/kavica/resampling/) of data sets
- Powerful, flexible [**parser**](https://github.com/kavehmahdavi/kavica/tree/main/kavica/parser) functionality to
perform parsing, manipulating, and generating
operations on flat, massive and unstructured [Traces](https://tools.bsc.es/paraver/trace_generation) datasets
which are generated by [MareNostrum](https://www.bsc.es/marenostrum/marenostrum)
- [**Utilities**](https://github.com/kavehmahdavi/kavica/tree/main/kavica/utils) functionality: intuitive exploratory
data analysis, plotting, loading and generating
data, and more
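To illustrate why density homogenizing matters before density-based clustering, here is a minimal sketch that uses only the Python standard library and does not call kavica. It draws two synthetic Gaussian clusters and compares their mean nearest-neighbor distances. The cluster sizes, spreads, and the helper names `mean_nn_distance` and `gaussian_cluster` are assumptions made for illustration. DBSCAN uses a single global radius `eps`, so clusters whose typical neighbor distances differ widely are hard to capture with one setting.

```python
import math
import random


def mean_nn_distance(points):
    """Return the mean distance from each point to its nearest neighbor."""
    total = 0.0
    for i, p in enumerate(points):
        total += min(math.dist(p, q) for j, q in enumerate(points) if j != i)
    return total / len(points)


def gaussian_cluster(rng, center, sigma, n):
    """Draw n 2D points from an isotropic Gaussian around center."""
    return [(rng.gauss(center[0], sigma), rng.gauss(center[1], sigma)) for _ in range(n)]


rng = random.Random(0)  # fixed seed so the sketch is reproducible
dense = gaussian_cluster(rng, (0.0, 0.0), 0.1, 100)     # hypothetical dense cluster
sparse = gaussian_cluster(rng, (10.0, 10.0), 1.0, 100)  # hypothetical sparse cluster

dense_nn = mean_nn_distance(dense)
sparse_nn = mean_nn_distance(sparse)

# A radius tuned for the dense cluster is too small for the sparse one,
# and a radius tuned for the sparse cluster is too coarse for the dense one.
print(f"dense cluster mean NN distance:  {dense_nn:.3f}")
print(f"sparse cluster mean NN distance: {sparse_nn:.3f}")
```

Rescaling the feature space so that both clusters have comparable neighbor distances is the kind of preparation the density homogenizing feature is described as providing.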
## Examples
- [**Feature Space Curvature Map**](https://github.com/kavehmahdavi/kavica/tree/main/kavica/cluster_inference_system/space_curvature_map.py)
<div align="center">
<img src="https://github.com/kavehmahdavi/kavica/raw/main/doc/web/circel.gif" width="800"><br>
</div>
- [**Density
Homogenizing**](https://github.com/kavehmahdavi/kavica/tree/main/kavica/cluster_inference_system/bilinear_transformation.py)
![](https://github.com/kavehmahdavi/kavica/raw/main/doc/web/hemo.png)
Application of the Feature Space Curvature Map (FSCM) to Synt10, a multi-density 2D dataset containing ten clusters.
(a) A scatter plot of clusters with varied densities. The legend shows the size/N(μ,σ<sup>2</sup>) per cluster, the
colors represent the original labels of the data, and the
red lines draw the initial Feature Space Fabric (FSF). (b) The Feature Space Curvature (FSC) model computed with our
FSCM method; the red lines show the deformation of the FSF. (c) A scatter plot of the data from (a) after projection
through the model in (b).
As a result, the variation in cluster density is scaled appropriately to achieve better density-based clustering
performance.
![](https://github.com/kavehmahdavi/kavica/raw/main/doc/web/hemo_exam.png)
- [**Polygonal Cage**](https://github.com/kavehmahdavi/kavica/tree/main/kavica/cluster_inference_system/polygon_cage.py)
Multilinear transformation
Feature Space Curvature | Feature Space Fabric
--- |---
![](https://github.com/kavehmahdavi/kavica/raw/main/doc/web/BLT1.png) | ![](https://github.com/kavehmahdavi/kavica/raw/main/doc/web/BLT00.png)
Data point transformation between a bent FSC (a) and a regular FSF (b), based on the multilinear transformation in
R<sup>2</sup>.
- [**Organization Component Analysis**](https://github.com/kavehmahdavi/kavica/tree/main/kavica/cluster_inference_system/OCA.py)
<div align="center">
<img src="https://github.com/kavehmahdavi/kavica/raw/main/doc/web/oca.png" width="800"><br>
</div>
Application of the OCA on the Iris dataset
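
The Polygonal Cage example above moves points between a regular cell and a bent one. As a rough illustration of the underlying idea only, the sketch below implements textbook bilinear interpolation over a single quadrilateral in R<sup>2</sup>. It is not taken from kavica's `polygon_cage.py`; the function name `bilinear_map`, the corner ordering, and the sample cell are assumptions.

```python
def bilinear_map(u, v, corners):
    """Map (u, v) in the unit square onto a quadrilateral.

    corners are given as (p00, p10, p11, p01), i.e. the images of
    (0, 0), (1, 0), (1, 1) and (0, 1) respectively.
    """
    p00, p10, p11, p01 = corners
    # Each corner is weighted by the area of the opposite sub-rectangle.
    w00 = (1 - u) * (1 - v)
    w10 = u * (1 - v)
    w11 = u * v
    w01 = (1 - u) * v
    x = w00 * p00[0] + w10 * p10[0] + w11 * p11[0] + w01 * p01[0]
    y = w00 * p00[1] + w10 * p10[1] + w11 * p11[1] + w01 * p01[1]
    return (x, y)


# A hypothetical bent cell: the unit square with one corner pulled outward.
bent_cell = ((0.0, 0.0), (1.0, 0.0), (2.0, 2.0), (0.0, 1.0))

print(bilinear_map(1.0, 1.0, bent_cell))  # a corner of the square maps to a corner of the cell
print(bilinear_map(0.5, 0.5, bent_cell))  # the center maps to the mean of the four corners
```

Applying such a map cell by cell is one way to carry points from a regular grid onto a deformed one; going in the opposite direction requires inverting the map for each cell.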
## Video
<div align="center">
<a href="https://www.youtube.com/watch?v=lxL3niQmBcU&t=27s">
<img src="https://github.com/kavehmahdavi/kavica/raw/main/doc/web/OCA_presentation.png" width="600">
</a>
</div>
## Where to get it
The source code is currently hosted on GitHub at: [kavica](https://github.com/kavehmahdavi/kavica)
Binary installers for the latest released version are available at the
[Python Package Index (PyPI)](https://pypi.org/project/KAVICA/) and on [Conda](https://docs.conda.io/en/latest/).
The recommended way to install kavica is to use:
```sh
# PyPI
pip install kavica
```
But it can also be installed using:
```sh
# or conda
conda config --add channels conda-forge
conda install kavica
```
To verify your setup, start Python from the command line and run the following:
```python
import kavica
```
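Beyond a bare import, it can help to confirm which release is installed. The sketch below uses the standard-library `importlib.metadata` module (available from Python 3.8). The helper name `installed_version` is an assumption, and the distribution name `KAVICA` is taken from the PyPI link above.

```python
from importlib import metadata


def installed_version(distribution):
    """Return the installed version string, or None if it is not installed."""
    try:
        return metadata.version(distribution)
    except metadata.PackageNotFoundError:
        return None


version = installed_version("KAVICA")
if version is None:
    print("kavica is not installed in this environment")
else:
    print(f"kavica {version} is installed")
```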
## Dependencies
See [requirements.txt](/requirements.txt) for the required packages, which can be installed with:
```sh
pip install -r requirements.txt
```
## Publications
[Unsupervised Feature Selection for Noisy Data](https://doi.org/10.1007/978-3-030-35231-8_6)
[Organization Component Analysis: The method for extracting insights from the shape of cluster](https://doi.org/10.1109/IJCNN52387.2021.9533650)
[Feature Space Curvature Map: A Method To Homogenize Cluster Densities](https://doi.org/10.1109/IJCNN55064.2022.9892921)
## Issue tracker
If you find a bug, please help us solve it by [filing a report](https://github.com/kavehmahdavi/kavica/issues).
## Contributing
If you want to contribute, check out the
[contribution guidelines](https://kavehmahdavi.github.io/kavica/main/contributions.html).
## License
The main library of **kavica** is
[released under the BSD 3-Clause License](https://kavehmahdavi.github.io/kavica/main/license.html).
Raw data
{
"_id": null,
"home_page": "https://github.com/kavehmahdavi/kavica",
"name": "KAVICA",
"maintainer": "",
"docs_url": null,
"requires_python": ">=3",
"maintainer_email": "",
"keywords": "Cluster Inference System,Feature Selection,Factor Analysis,Parser,Clustering,Unsupervised,Self-organizing map,Organization Component Analysis,Feature Space Curvature Map,Multiline Transformation",
"author": "Kaveh Mahdavi",
"author_email": "kavehmahdavi74@yahoo.com",
"download_url": "https://files.pythonhosted.org/packages/d9/12/fe2854a76f83bd31dab3ddb24e8e00b57a3f95c10ce718d9b15feb68cfc6/KAVICA-1.3.4.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": "",
"summary": "KAVICA: Powerful Python Cluster Analysis and Inference Toolkit",
"version": "1.3.4",
"split_keywords": [
"cluster inference system",
"feature selection",
"factor analysis",
"parser",
"clustering",
"unsupervised",
"self-organizing map",
"organization component analysis",
"feature space curvature map",
"multiline transformation"
],
"urls": [
{
"comment_text": "",
"digests": {
"md5": "0d4f84ac50ba4209aa5e38a6a9b6d042",
"sha256": "e94bda9414d7ca3e32a5067aa1609e5a78d68e8a561ac5b48053469ac7bd7461"
},
"downloads": -1,
"filename": "KAVICA-1.3.4.tar.gz",
"has_sig": false,
"md5_digest": "0d4f84ac50ba4209aa5e38a6a9b6d042",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3",
"size": 155907,
"upload_time": "2022-12-09T19:36:22",
"upload_time_iso_8601": "2022-12-09T19:36:22.825738Z",
"url": "https://files.pythonhosted.org/packages/d9/12/fe2854a76f83bd31dab3ddb24e8e00b57a3f95c10ce718d9b15feb68cfc6/KAVICA-1.3.4.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2022-12-09 19:36:22",
"github": true,
"gitlab": false,
"bitbucket": false,
"github_user": "kavehmahdavi",
"github_project": "kavica",
"travis_ci": false,
"coveralls": false,
"github_actions": false,
"requirements": [
{
"name": "setuptools",
"specs": [
[
"~=",
"59.6.0"
]
]
},
{
"name": "numpy",
"specs": [
[
"~=",
"1.19.5"
]
]
},
{
"name": "pandas",
"specs": [
[
"==",
"0.25.3"
]
]
},
{
"name": "mpi4py",
"specs": [
[
"==",
"3.0.0"
]
]
},
{
"name": "scipy",
"specs": [
[
"~=",
"1.5.4"
]
]
},
{
"name": "matplotlib",
"specs": [
[
"==",
"3.1.2"
]
]
},
{
"name": "networkx",
"specs": [
[
"==",
"2.2"
]
]
},
{
"name": "h5py",
"specs": [
[
"~=",
"3.1.0"
]
]
},
{
"name": "terminaltables",
"specs": [
[
"==",
"3.1.0"
]
]
},
{
"name": "psutil",
"specs": [
[
"==",
"5.4.8"
]
]
},
{
"name": "Cython",
"specs": [
[
"==",
"0.28.5"
]
]
},
{
"name": "seaborn",
"specs": [
[
"==",
"0.9.0"
]
]
},
{
"name": "pandas_profiling",
"specs": [
[
"==",
"2.5.0"
]
]
},
{
"name": "scikit-learn",
"specs": [
[
"~=",
"0.24.1"
]
]
},
{
"name": "alphashape",
"specs": [
[
"==",
"1.0.2"
]
]
},
{
"name": "statsmodels",
"specs": [
[
"==",
"0.10.2"
]
]
},
{
"name": "vg",
"specs": [
[
"==",
"1.8.0"
]
]
},
{
"name": "shapely",
"specs": [
[
"==",
"1.7.0"
]
]
},
{
"name": "tabulate",
"specs": [
[
"==",
"0.8.6"
]
]
},
{
"name": "six",
"specs": [
[
"==",
"1.12.0"
]
]
},
{
"name": "missingno",
"specs": [
[
"~=",
"0.4.2"
]
]
},
{
"name": "pyqtgraph",
"specs": [
[
"==",
"0.10.0"
]
]
},
{
"name": "descartes",
"specs": [
[
"==",
"1.1.0"
]
]
},
{
"name": "cryptography",
"specs": [
[
"~=",
"2.7"
]
]
},
{
"name": "plotly",
"specs": [
[
"~=",
"4.14.3"
]
]
},
{
"name": "imageio",
"specs": [
[
"~=",
"2.9.0"
]
]
},
{
"name": "celluloid",
"specs": [
[
"~=",
"0.2.0"
]
]
},
{
"name": "sympy",
"specs": [
[
"~=",
"1.8"
]
]
},
{
"name": "ggplot",
"specs": [
[
"~=",
"0.11.5"
]
]
},
{
"name": "Pillow",
"specs": [
[
"~=",
"8.1.0"
]
]
},
{
"name": "sklearn",
"specs": [
[
"~=",
"0.0"
]
]
},
{
"name": "twine",
"specs": [
[
"~=",
"3.8.0"
]
]
}
],
"lcname": "kavica"
}
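
The `requirements` entries in the raw metadata above pair each package name with a list of `[operator, version]` specs. As a small sketch of how such entries translate into `requirements.txt`-style lines, the following uses only the standard library. The helper name `to_requirement_line` is an assumption, and the sample is limited to three entries copied from the listing.

```python
import json

# Three entries copied from the "requirements" array; the rest are omitted.
sample = json.loads("""
[
  {"name": "numpy", "specs": [["~=", "1.19.5"]]},
  {"name": "pandas", "specs": [["==", "0.25.3"]]},
  {"name": "networkx", "specs": [["==", "2.2"]]}
]
""")


def to_requirement_line(entry):
    """Join a package name with its comma-separated version specifiers."""
    specifiers = ",".join(op + version for op, version in entry["specs"])
    return entry["name"] + specifiers


lines = [to_requirement_line(entry) for entry in sample]
print("\n".join(lines))
```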