NNSOM

Name	NNSOM
Version	1.8.2
home_page	None
Summary	A SOM package
upload_time	2024-08-27 13:51:20
maintainer	None
docs_url	None
author	Dr. Martin Hagan, Dr. Amir Jafari, Lakshmi Sravya Chalapati, Ei Tanaka
requires_python	>=3.8
license	None
keywords	clustering machine learning neural network som unsupervised learning
VCS
bugtrack_url
requirements	NNSOM numpy scipy datetime matplotlib networkx scikit-learn pandas sphinx sphinx-gallery pydata-sphinx-theme ghp-import nbsphinx nbsphinx_link pandoc
Travis-CI	No Travis.
coveralls test coverage	No coveralls.

# NNSOM

## Self-Organizing Maps

NNSOM is a Python library that provides an implementation of Self-Organizing Maps (SOM) using NumPy and CuPy.
SOM is a type of Artificial Neural Network that can transform complex, nonlinear statistical relationships between high-dimensional data into simple topological relationships on a low-dimensional display (typically 2-dimensional).

The library is designed with two main goals in mind:

- Extensibility: NNSOM aims to provide a solid foundation for researchers to build upon and extend its functionality according to their specific requirements.
- Educational Value: The implementation is structured in a way that allows students to quickly understand the inner workings of SOM, fostering a better grasp of the algorithm's details.

With NNSOM, researchers and students alike can leverage the power of SOM for various applications, such as data visualization, clustering, and dimensionality reduction, while benefiting from the flexibility and educational value offered by this library.

## Installation

Install NNSOM with pip:

```bash
pip install NNSOM
```

## How to use it

A complete example using the Iris dataset is available as a Jupyter notebook [here](https://github.com/amir-jafari/SOM/blob/main/examples/Tabular/Iris/notebook/iris_training.ipynb).

### Data Preparation
To use the NNSOM library effectively, format your data as a NumPy matrix where each row is an observation. 
```python
import numpy as np
np.random.seed(42)
data = np.random.rand(3000, 10)
```

Alternatively, you can provide the data as a list of lists, following this structure:
```python
data = [
  [value1, value2, value3, ..., valueN], # Observation 1
  [value1, value2, value3, ..., valueN], # Observation 2
  ...,
  [value1, value2, value3, ..., valueN], # Observation M
]
```
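If you would rather work with an array explicitly, a list of lists can be converted with NumPy before it is passed on. The sketch below uses made-up values and only checks that rows are observations and columns are features.

```python
import numpy as np

# Three observations with four features each (made-up values).
rows = [
    [5.1, 3.5, 1.4, 0.2],
    [4.9, 3.0, 1.4, 0.2],
    [6.3, 3.3, 6.0, 2.5],
]

data = np.asarray(rows, dtype=float)

# Rows are observations, columns are features.
n_observations, n_features = data.shape
print(n_observations, n_features)
```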

### Customize Your Normalization
Depending on your data's specific characteristics, you may opt to define a custom normalization function. 
Here's how to normalize your data using sklearn's MinMaxScaler:
```python
from sklearn.preprocessing import MinMaxScaler
scaler = MinMaxScaler(feature_range=(-1, 1))
norm_func = scaler.fit_transform
```
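A hand-written normalization function can take the same role. The sketch below assumes that `norm_func` may be any callable that accepts the data array and returns a scaled array of the same shape, as passing `scaler.fit_transform` suggests; the name `minmax_neg1_pos1` is hypothetical.

```python
import numpy as np

def minmax_neg1_pos1(x):
    """Scale each column of x linearly to the range [-1, 1]."""
    x = np.asarray(x, dtype=float)
    col_min = x.min(axis=0)
    col_max = x.max(axis=0)
    span = col_max - col_min
    # Avoid division by zero for constant columns; they map to -1.
    span[span == 0] = 1.0
    return 2.0 * (x - col_min) / span - 1.0

sample = np.array([[0.0, 10.0], [5.0, 20.0], [10.0, 30.0]])
scaled = minmax_neg1_pos1(sample)
print(scaled)
```

Under that assumption, the function would be passed as `norm_func=minmax_neg1_pos1` in place of the scaler method.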

### Configure the SOM Grid Parameters
Configure the SOM grid parameters as follows:
```python
SOM_Row_Num = 4  # The number of rows in the SOM grid
SOM_Col_Num = 4  # The number of columns in the SOM grid
Dimensions = (SOM_Row_Num, SOM_Col_Num) # The two-dimensional layout of the SOM grid 
```
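To make the grid size concrete, the sketch below enumerates one (row, column) position per neuron for a 4x4 grid. It assumes a plain rectangular, row-major layout purely for illustration; the layout NNSOM uses internally may differ.

```python
import numpy as np

som_row_num, som_col_num = 4, 4

# Row and column index of every neuron, in row-major order (assumed layout).
rows, cols = np.meshgrid(
    np.arange(som_row_num), np.arange(som_col_num), indexing="ij"
)
positions = np.column_stack([rows.ravel(), cols.ravel()])

# One (row, col) pair per neuron: 4 * 4 = 16 neurons in total.
print(positions.shape)
```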

### Configure the Training Parameters
Next, configure the training parameters as follows:
```python
Epochs = 200  # The total number of training epochs 
Steps = 100  #  The granularity of the weight update process within each epoch.
Init_neighborhood = 3 # Initial size of the neighborhood radius   
```

### Train the SOM
Train the SOM as follows:
```python
from NNSOM.plots import SOMPlots
som = SOMPlots(Dimensions)  # Create a 4x4 SOM
som.init_w(data, norm_func=norm_func)  # Initialize the weights
som.train(data, Init_neighborhood, Epochs, Steps)
```
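For readers who want to see what a training loop of this kind does, here is a minimal NumPy sketch of a batch SOM update with a Gaussian neighborhood. It is not NNSOM's implementation: the rectangular grid, the linear radius schedule, the number of steps, and all variable names are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.random((200, 3))  # 200 observations, 3 features

# Neuron coordinates on a 4x4 rectangular grid (assumed layout).
grid = np.array([(r, c) for r in range(4) for c in range(4)], dtype=float)

# Start each weight vector at a randomly chosen observation.
weights = data[rng.choice(len(data), size=len(grid), replace=False)].copy()

n_steps = 20
for step in range(n_steps):
    # Neighborhood radius shrinks linearly from 3 to 1 (assumed schedule).
    radius = 3.0 - 2.0 * step / (n_steps - 1)

    # Best matching unit (closest weight vector) for every observation.
    d = np.linalg.norm(data[:, None, :] - weights[None, :, :], axis=2)
    bmu = d.argmin(axis=1)

    # Gaussian neighborhood between every neuron and every observation's BMU.
    grid_dist = np.linalg.norm(grid[:, None, :] - grid[bmu][None, :, :], axis=2)
    h = np.exp(-(grid_dist ** 2) / (2.0 * radius ** 2))

    # Batch update: each weight becomes a neighborhood-weighted mean of the data.
    weights = (h @ data) / h.sum(axis=1, keepdims=True)

print(weights.shape)
```

Because every updated weight is a weighted average of observations, the weights always stay inside the range spanned by the data.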

### Export a SOM and load it again
A model can be saved using pickle as follows:
```python
file_name = "..."
model_path = ".../"

som.save_pickle(file_name, model_path)
```
and can be loaded as follows:
```python
from NNSOM.plots import SOMPlots
som = SOMPlots(Dimensions)  # Use the same dimensions as the stored model.
som = som.load_pickle(file_name, model_path)
```
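The general save-and-restore pattern behind pickle can be sketched with the standard library alone. The dictionary below is a hypothetical stand-in for a trained model, and the file name is made up; nothing here reflects how `save_pickle` or `load_pickle` are written internally.

```python
import os
import pickle
import tempfile

# Hypothetical stand-in for a trained model's state.
state = {"dimensions": (4, 4), "weights": [[0.1, 0.2], [0.3, 0.4]]}

with tempfile.TemporaryDirectory() as model_path:
    file_path = os.path.join(model_path, "tiny_model.pkl")

    # Write the object to disk in binary mode.
    with open(file_path, "wb") as f:
        pickle.dump(state, f)

    # Read it back; the restored object is an equal copy.
    with open(file_path, "rb") as f:
        restored = pickle.load(f)

print(restored["dimensions"])
```

Pickle files can execute code when loaded, so only load models from sources you trust.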

### Post-Training Data Clustering with NNSOM
After training SOM with NNSOM, you can leverage the trained model to cluster new or existing data. 
```python
clust, dist, mdist, clusterSizes = som.cluster_data(data)
```
- clust: This is a list where each sublist contains the indices of data points that are assigned to the same cluster.
- dist: This list mirrors the structure of the "clust" list, with each sublist containing the distances of the corresponding data points in "clust" from their Best Matching Unit.
- mdist: An array where each element represents the maximum distance between a SOM neuron and the data points assigned to it.
- clusterSizes: An array listing the number of data points in each cluster.
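The four outputs described above can be mimicked on a toy example with plain NumPy. This sketch follows the descriptions in the list, not the library's code; the weights and data are made up, and details such as the ordering inside each sublist or the value reported for empty clusters are assumptions.

```python
import numpy as np

weights = np.array([[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]])  # 3 neurons
data = np.array([[0.1, 0.0], [0.9, 1.0], [1.2, 1.0], [0.0, 0.3]])

# Distance from every data point to every neuron, then the closest one (BMU).
all_dist = np.linalg.norm(data[:, None, :] - weights[None, :, :], axis=2)
bmu = all_dist.argmin(axis=1)
bmu_dist = all_dist.min(axis=1)

clust, dist = [], []
for neuron in range(len(weights)):
    members = np.flatnonzero(bmu == neuron)   # indices of points in this cluster
    clust.append(members.tolist())
    dist.append(bmu_dist[members].tolist())   # their distances to the BMU

# Largest member distance per neuron (0.0 for empty clusters, by assumption).
mdist = np.array([max(d) if d else 0.0 for d in dist])
cluster_sizes = np.array([len(c) for c in clust])

print(clust, cluster_sizes)
```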

### Error Analysis
NNSOM offers comprehensive tools to assess the quality and reliability of the trained SOM through various error metrics. 
Understanding these errors can help refine the SOM's configuration and interpret its performance effectively. 
Below are the three types of error measures provided by NNSOM:

#### 1. Quantization Error
Quantization error measures the average distance between each data point and its Best Matching Unit (BMU). This error provides insight into the SOM's ability to accurately represent the data space. A lower quantization error generally indicates a better representation.

Example:
```python
# Find quantization error
clust, dist, mdist, clusterSizes = som.cluster_data(data)
quant_err = som.quantization_error(dist)
print('Quantization error: ' + str(quant_err))
```
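As a worked illustration of the definition, the average BMU distance can be computed directly with NumPy. The weights and data below are made up, and the calculation is a sketch of the definition rather than of `quantization_error` itself.

```python
import numpy as np

weights = np.array([[0.0, 0.0], [1.0, 1.0]])            # 2 neurons
data = np.array([[0.0, 0.3], [1.0, 0.9], [0.4, 0.0]])   # 3 observations

# Distance from each observation to each neuron.
all_dist = np.linalg.norm(data[:, None, :] - weights[None, :, :], axis=2)

# Quantization error: mean distance to the closest neuron (the BMU).
quant_err = all_dist.min(axis=1).mean()
print(quant_err)
```

Here the BMU distances are 0.3, 0.1, and 0.4, so the mean is 0.8 / 3.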

#### 2. Topological Error
Topological error evaluates the SOM's preservation of the data's topological structure. It is calculated by checking if adjacent data points in the input space are mapped to adjacent neurons in the SOM. This metric is split into two:

- Topological Error (1st neighbor): Measures the proportion of data points whose first nearest neighbor in the input space is not their neighbor on the map.
- Topological Error (1st and 2nd neighbor): Extends this to the first and second nearest neighbors.

Example:
```python
# Find topological error
top_error_1, top_error_1_2 = som.topological_error(data)
print('Topological Error (1st neighbor) = ' + str(top_error_1) + '%')
print('Topological Error (1st and 2nd neighbor) = ' + str(top_error_1_2) + '%')
```
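A closely related and widely used measure, usually called topographic error, counts the data points whose best and second-best matching units are not adjacent on the map. The sketch below implements that common definition on a deliberately folded 1x3 map; it may differ from what `topological_error` computes, and the grid, weights, and adjacency rule (one grid step apart) are assumptions.

```python
import numpy as np

# Three neurons in a row on the map, with made-up 1-D weight vectors.
# Neurons 0 and 2 are close in input space but two steps apart on the map.
grid = np.array([[0, 0], [0, 1], [0, 2]])
weights = np.array([[0.0], [2.0], [0.5]])
data = np.array([[0.2], [1.8], [0.6], [1.5]])

# Best and second-best matching unit for every observation.
all_dist = np.linalg.norm(data[:, None, :] - weights[None, :, :], axis=2)
order = np.argsort(all_dist, axis=1)
first, second = order[:, 0], order[:, 1]

# Number of grid steps between the two units; more than 1 means not adjacent.
grid_steps = np.abs(grid[first] - grid[second]).sum(axis=1)
topo_err = 100.0 * np.mean(grid_steps > 1)

print(topo_err)
```

Two of the four points land on non-adjacent units, so the error is 50 percent.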

#### 3. Distortion Error
Distortion error calculates the total distance between each data point and its corresponding BMU, scaled by the data density around each BMU. This error helps show how well the SOM covers the distribution of the dataset and helps identify areas where the map might be over- or underfitting.

Example:
```python
# Find Distortion Error
som.distortion_error(data)
```

### Visualize the SOM

To effectively understand and interpret the results of your SOM training, visualizing the SOM grid is crucial.
The NNSOM library offers a variety of plotting functions that allow you to visualize different aspects of the SOM and the training process.

#### The Generic Plot Function [[source]](https://github.com/amir-jafari/SOM/blob/main/src/NNSOM/plots.py#L1391)
This generic plot function can be used to generate multiple types of visualizations depending on the specified plot type.

Usage of the Plot Function:
```python
som.plot('plot_type', data_dict=None, ind=None, target_class=None, use_add_array=False)
```

Parameters:
- plot_type: A string indicating the type of plot to generate. Options include 'top', 'neuron_dist', 'hit_hist', etc.
- data_dict: Optional dictionary containing data needed for specific plots.
- ind: Optional index for targeted plotting.
- target_class: Optional parameter to specify a target class for the plot.
- use_add_array: Boolean flag to indicate whether additional arrays in data_dict should be used.

Structure of data_dict:

The data_dict parameter should be structured as follows to provide necessary data for the plots:
```python
data_dict = {
  "data": data,          # Main dataset used in SOM training, or new input data
  "target": y,           # Target variable, if applicable
  "clust": clust,        # Clustering results from SOM
  "add_1d_array": [],    # Additional 1D arrays for enhanced plotting
  "add_2d_array": [],    # Additional 2D arrays for enhanced plotting
}
```

The source code for the plot function can be found [here](https://github.com/amir-jafari/SOM/blob/main/src/NNSOM/plots.py#L1391).

#### Examples of Common Visualizations

1. Topological Grid
    
    Visualize the topological grid of the SOM to understand the layout and structure of the neurons.
```python
import matplotlib.pyplot as plt
fig, ax, patches = som.plot('top')
plt.show()
```

2. Neuron Distance Map (U-Map)
    
    Display a distance map (U-Map) to see the distances between neighboring neurons, highlighting potential clusters.
```python
fig, ax, patches = som.plot('neuron_dist')
plt.show()
```

3. Hit Histogram

    Generate a hit histogram to visualize the frequency of each neuron being the best matching unit.
```python
fig, ax, patches, text = som.plot('hit_hist', data_dict)
plt.show()
```
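The numbers behind a hit histogram are simply counts of how often each neuron is the best matching unit. A minimal sketch, assuming the BMU index of each data point is already known and that neurons are numbered in row-major order on a 2x2 grid (both made up for illustration):

```python
import numpy as np

n_neurons = 4
# BMU index for each of six data points (made-up values).
bmu = np.array([0, 2, 2, 3, 2, 0])

# Count the hits per neuron; minlength keeps neurons with zero hits.
hits = np.bincount(bmu, minlength=n_neurons)

# Arrange the counts on the 2x2 grid (row-major order, by assumption).
hit_grid = hits.reshape(2, 2)
print(hit_grid)
```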

Raw data

            {
    "_id": null,
    "home_page": null,
    "name": "NNSOM",
    "maintainer": null,
    "docs_url": null,
    "requires_python": ">=3.8",
    "maintainer_email": null,
    "keywords": "Clustering, Machine Learning, Neural Network, SOM, Unsupervised Learning",
    "author": "Dr. Martin Hagan, Dr. Amir Jafari, Lakshmi Sravya Chalapati, Ei Tanaka",
    "author_email": null,
    "download_url": "https://files.pythonhosted.org/packages/8a/91/f0e30837e9e6cdd448328e622d37f6dfe64f7894595ffac9476c6ba62a28/nnsom-1.8.2.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": null,
    "summary": "A SOM package",
    "version": "1.8.2",
    "project_urls": {
        "Documenation": "https://amir-jafari.github.io/SOM/",
        "Issues": "https://github.com/amir-jafari/SOM/issues",
        "Repository": "https://github.com/amir-jafari/SOM"
    },
    "split_keywords": [
        "clustering",
        " machine learning",
        " neural network",
        " som",
        " unsupervised learning"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "f97e2062fe8e4f6b2cd156170e15d8ab00fb9a9ad15b79043355f869cd062039",
                "md5": "04dc5d4e51b48e9a77ddcc942f2e8991",
                "sha256": "9c7d017fc25a88dcabcbbb97d966edd057a178972af8321aa58fea394da4e360"
            },
            "downloads": -1,
            "filename": "nnsom-1.8.2-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "04dc5d4e51b48e9a77ddcc942f2e8991",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.8",
            "size": 37424,
            "upload_time": "2024-08-27T13:51:19",
            "upload_time_iso_8601": "2024-08-27T13:51:19.239176Z",
            "url": "https://files.pythonhosted.org/packages/f9/7e/2062fe8e4f6b2cd156170e15d8ab00fb9a9ad15b79043355f869cd062039/nnsom-1.8.2-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "8a91f0e30837e9e6cdd448328e622d37f6dfe64f7894595ffac9476c6ba62a28",
                "md5": "5e53715c5000b7285bebe68cb1a92b59",
                "sha256": "c78a3a2d8b1a186308f60fd3c60410155e59209f93068e288a96dc5ca011357e"
            },
            "downloads": -1,
            "filename": "nnsom-1.8.2.tar.gz",
            "has_sig": false,
            "md5_digest": "5e53715c5000b7285bebe68cb1a92b59",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.8",
            "size": 38159,
            "upload_time": "2024-08-27T13:51:20",
            "upload_time_iso_8601": "2024-08-27T13:51:20.318261Z",
            "url": "https://files.pythonhosted.org/packages/8a/91/f0e30837e9e6cdd448328e622d37f6dfe64f7894595ffac9476c6ba62a28/nnsom-1.8.2.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2024-08-27 13:51:20",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "amir-jafari",
    "github_project": "SOM",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": true,
    "requirements": [
        {
            "name": "NNSOM",
            "specs": [
                [
                    "~=",
                    "1.8.1"
                ]
            ]
        },
        {
            "name": "numpy",
            "specs": [
                [
                    "~=",
                    "1.25.1"
                ]
            ]
        },
        {
            "name": "scipy",
            "specs": [
                [
                    "~=",
                    "1.9.3"
                ]
            ]
        },
        {
            "name": "datetime",
            "specs": [
                [
                    "~=",
                    "5.4"
                ]
            ]
        },
        {
            "name": "matplotlib",
            "specs": [
                [
                    "~=",
                    "3.5.2"
                ]
            ]
        },
        {
            "name": "networkx",
            "specs": [
                [
                    "~=",
                    "3.1"
                ]
            ]
        },
        {
            "name": "scikit-learn",
            "specs": [
                [
                    "~=",
                    "1.0.2"
                ]
            ]
        },
        {
            "name": "pandas",
            "specs": [
                [
                    "~=",
                    "1.4.4"
                ]
            ]
        },
        {
            "name": "sphinx",
            "specs": [
                [
                    "~=",
                    "7.2.6"
                ]
            ]
        },
        {
            "name": "sphinx-gallery",
            "specs": [
                [
                    "~=",
                    "0.15.0"
                ]
            ]
        },
        {
            "name": "pydata-sphinx-theme",
            "specs": [
                [
                    "~=",
                    "0.15.2"
                ]
            ]
        },
        {
            "name": "ghp-import",
            "specs": [
                [
                    "~=",
                    "2.1.0"
                ]
            ]
        },
        {
            "name": "nbsphinx",
            "specs": [
                [
                    "~=",
                    "0.9.3"
                ]
            ]
        },
        {
            "name": "nbsphinx_link",
            "specs": [
                [
                    "~=",
                    "1.3.0"
                ]
            ]
        },
        {
            "name": "pandoc",
            "specs": [
                [
                    "~=",
                    "2.3"
                ]
            ]
        }
    ],
    "lcname": "nnsom"
}
