CSGT


- Name: CSGT
- Version: 1.0.3
- Home page: https://github.com/MM21B038/CSGO
- Summary: A deep learning library for Self-Organizing Maps (SOM) with clustering and gradient optimization.
- Upload time: 2024-08-17 23:34:05
- Maintainer: None
- Docs URL: None
- Author: Manav Gupta
- Requires Python: >=3.10
- License: None
- Requirements: No requirements were recorded.

# CSGT - Cluster-Sort-Gradient-Tuning

**CSGT** (Cluster-Sort-Gradient-Tuning) is a Python library for training Self-Organizing Maps (SOMs), an unsupervised learning method that uses competitive learning for dimensionality reduction and data clustering. The library focuses on gradient-based optimization and provides U-Matrix and hit map visualizations, along with quantization and topographic error metrics.

## Key Features

- **Self-Organizing Map (SOM) Implementation**: Trains SOMs with a customizable grid size, learning rate, neighborhood function, and training algorithm.
- **Gradient-Based Optimization**: Adjusts the learning rate and neighborhood size dynamically using a choice of decay functions, giving flexible control over model convergence.
- **Distance Metrics**: Supports Euclidean, Manhattan (L1), and cosine distances for neuron weight updates and winner selection.
- **Error Metrics**: Calculates quantization and topographic errors to assess the quality of the SOM.
- **Visualization Tools**: Generates U-Matrix and hit map plots to help interpret the SOM and identify data clusters and relationships.

## Installation

You can install the package directly from PyPI:

```bash
pip install CSGT
```

## Getting Started

### Importing the Library
```python
from CSGT import CSGT
import numpy as np
```
### Initializing the CSGT Model
```python
# Sample data: 100 random 3-dimensional vectors
data = np.random.random((100, 3))

# Initialize the CSGT model with a 10x10 grid and 3-dimensional input data
model = CSGT(x=10, y=10, input_len=3)
```
### Training the Model
```python
# Train the SOM for 10,000 epochs
model.train(data, epoch=10000)
```
### Visualizing the U-Matrix
```python
# Plot the U-Matrix to visualize the topological relationships of the neurons
model.plot_u_matrix(data)
```
### Visualizing the Hit Map
```python
# Plot the hit map to visualize neuron activation frequencies
model.plot_hit_map(data)
```
## CSGT Class and Methods
### Initialization: CSGT.__init__()
```python
CSGT(x, y, input_len, sigma=1.0, learning_rate=0.5, norm='L1', decay_function='g', factor=None, random_state=None, metric='euclidean', train_type='hard')
```
#### Parameters:
1. `x, y`: Dimensions of the SOM grid.
2. `input_len`: Length of the input vectors.
3. `sigma`: Initial neighborhood radius, controlling the spread of the influence of the BMU.
4. `learning_rate`: Initial learning rate for updating the neurons' weights.
5. `norm`: Normalization type for neuron weights ('L1' or 'L2').
6. `decay_function`: Function to decay learning rate and neighborhood radius. Options:
   - ``g``: Linear decay (Default)
   - ``e``: Exponential decay
   - ``s_e``: Scaled exponential decay
   - ``l``: Linear decay with a different formulation
   - ``i``: Inverse decay
   - ``p``: Polynomial decay
7. `factor`: Additional factor for the decay function (used in 's_e' and 'p' decay).
8. `random_state`: Seed for random number generation, ensuring reproducibility.
9. `metric`: Distance metric to calculate distances between input vectors and neuron weights ('euclidean', 'manhattan', or 'cosine').
10. `train_type`: Type of neighborhood function to be used during training. Options:
    - ``hard``: Quantized neighborhood function.
    - ``gaussian``: Gaussian neighborhood function.
    - ``comb``: Combination of hard and Gaussian functions.
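
The `train_type` options are described only by name. As a rough illustration of the difference between a hard and a Gaussian neighborhood, the sketch below computes both on a small grid. The function names and formulas are assumptions for illustration and are not taken from CSGT's source.

```python
import numpy as np

def grid_distances(x, y, bmu):
    """Euclidean distance on the grid from every neuron to the BMU at index (i, j)."""
    ii, jj = np.meshgrid(np.arange(x), np.arange(y), indexing="ij")
    return np.sqrt((ii - bmu[0]) ** 2 + (jj - bmu[1]) ** 2)

def hard_neighborhood(dist, sigma):
    """1 inside the radius, 0 outside (an assumed reading of 'hard')."""
    return (dist <= sigma).astype(float)

def gaussian_neighborhood(dist, sigma):
    """Smooth influence that decays with grid distance from the BMU."""
    return np.exp(-(dist ** 2) / (2.0 * sigma ** 2))

dist = grid_distances(5, 5, bmu=(2, 2))
hard = hard_neighborhood(dist, sigma=1.0)
gauss = gaussian_neighborhood(dist, sigma=1.0)
```

With `sigma=1.0` the hard variant updates only the BMU and its four direct neighbors, while the Gaussian variant gives every neuron a weight that shrinks with distance.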

### Weight Initialization: CSGT.initialize_weight()
Initializes the neuron weight vectors based on the input length and normalization type.

### Distance Calculation: CSGT.calculate_distance()
Calculates the distance between two vectors using the specified metric.
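
For reference, the three supported metrics can be written in NumPy as follows. This is a standalone sketch; the function name `distance` and its signature are hypothetical and may differ from `CSGT.calculate_distance()`.

```python
import numpy as np

def distance(a, b, metric="euclidean"):
    """Distance between two vectors under one of three common metrics."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    if metric == "euclidean":
        return float(np.linalg.norm(a - b))
    if metric == "manhattan":
        return float(np.sum(np.abs(a - b)))
    if metric == "cosine":
        denom = np.linalg.norm(a) * np.linalg.norm(b)
        if denom == 0.0:
            raise ValueError("cosine distance is undefined for zero vectors")
        # Cosine distance is one minus the cosine similarity.
        return float(1.0 - np.dot(a, b) / denom)
    raise ValueError(f"unknown metric: {metric!r}")
```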

### Best Matching Unit (BMU): CSGT.bestMatchingNeuron()
Identifies the neuron on the grid that best matches the current input vector based on the minimum distance.
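
A minimal sketch of a BMU search, assuming the weights are stored as an array of shape `(x, y, input_len)` and the Euclidean metric is used. The storage layout and the function name are assumptions, not confirmed details of CSGT.

```python
import numpy as np

def best_matching_unit(weights, sample):
    """Return the (i, j) grid index of the neuron closest to `sample`."""
    dists = np.linalg.norm(weights - sample, axis=-1)  # shape (x, y)
    return np.unravel_index(np.argmin(dists), dists.shape)

weights = np.zeros((3, 3, 2))
weights[1, 2] = [1.0, 1.0]
bmu = best_matching_unit(weights, np.array([0.9, 1.1]))
```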

### Decay Function: CSGT.decay()
Applies the selected decay function to adjust the learning rate and neighborhood radius over time.
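
The exact formulas behind each `decay_function` option are not documented here. The sketch below shows common textbook forms of exponential, inverse, and polynomial decay to give a sense of how such schedules behave. The function names, the `power` parameter, and the formulas themselves are assumptions and may not match what CSGT implements.

```python
import math

def exponential_decay(initial, t, max_iter):
    """Shrinks by a constant ratio per step."""
    return initial * math.exp(-t / max_iter)

def inverse_decay(initial, t, max_iter):
    """Falls quickly at first, then flattens out."""
    return initial / (1.0 + t / max_iter)

def polynomial_decay(initial, t, max_iter, power=2.0):
    """Reaches exactly zero at t == max_iter."""
    return initial * (1.0 - t / max_iter) ** power

# Learning rate at the start, midpoint, and end of a 100-step run.
schedule = [exponential_decay(0.5, t, 100) for t in (0, 50, 100)]
```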

### Training the SOM: CSGT.train()
Trains the SOM over a specified number of epochs, adjusting neuron weights based on the input data.

### U-Matrix Calculation: CSGT.distance_map()
Generates the U-Matrix, a matrix that visualizes the distances between the neuron weights, helping to identify clusters and topological structures.
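
One common way to build a U-Matrix is to average, for each neuron, the distance to its directly adjacent neurons. The sketch below does this for a weight array of shape `(x, y, input_len)` using 4-connected neighbors and Euclidean distance. These choices are assumptions and may differ from `CSGT.distance_map()`.

```python
import numpy as np

def u_matrix(weights):
    """Mean distance from each neuron's weights to its 4-connected grid neighbors."""
    x, y, _ = weights.shape
    um = np.zeros((x, y))
    for i in range(x):
        for j in range(y):
            neighbors = [(i - 1, j), (i + 1, j), (i, j - 1), (i, j + 1)]
            d = [np.linalg.norm(weights[i, j] - weights[a, b])
                 for a, b in neighbors if 0 <= a < x and 0 <= b < y]
            um[i, j] = np.mean(d)
    return um

weights = np.zeros((3, 3, 2))
weights[:, 2] = 1.0  # the last column sits far from the rest in weight space
um = u_matrix(weights)
```

High values mark borders between clusters: here the last two columns get non-zero entries because they face each other across the gap. The sketch assumes a grid with at least two neurons.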

### Plotting the U-Matrix: CSGT.plot_u_matrix()
Displays the U-Matrix using a heatmap to represent the distances between neighboring neurons.

### Plotting the Hit Map: CSGT.plot_hit_map()
Generates and displays a hit map that shows how frequently each neuron has been the BMU for the input vectors.
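
The counts behind a hit map can be sketched as below, assuming weights of shape `(x, y, input_len)` and Euclidean BMU selection. The helper name `hit_map` is hypothetical, and the plotting step is left out.

```python
import numpy as np

def hit_map(weights, data):
    """Count how many samples select each neuron as their BMU."""
    hits = np.zeros(weights.shape[:2], dtype=int)
    for sample in data:
        dists = np.linalg.norm(weights - sample, axis=-1)
        hits[np.unravel_index(np.argmin(dists), dists.shape)] += 1
    return hits

weights = np.array([[[0.0], [1.0]]])  # 1x2 grid of 1-dimensional neurons
data = np.array([[0.1], [0.9], [0.8]])
hits = hit_map(weights, data)
```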

### Quantization Error: CSGT.quantization_error()
Calculates the quantization error, which measures the average distance between the input vectors and their corresponding BMUs. Lower quantization errors indicate a better fit of the SOM to the input data.
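
As a standalone illustration of the definition, the quantization error can be computed with NumPy as the mean distance from each sample to its nearest neuron. The sketch assumes Euclidean distance and weights of shape `(x, y, input_len)`; it is not CSGT's own implementation.

```python
import numpy as np

def quantization_error(weights, data):
    """Mean Euclidean distance between each sample and its BMU."""
    flat = weights.reshape(-1, weights.shape[-1])  # (n_neurons, input_len)
    # Pairwise distances, shape (n_samples, n_neurons).
    d = np.linalg.norm(data[:, None, :] - flat[None, :, :], axis=-1)
    return float(d.min(axis=1).mean())

weights = np.array([[[0.0, 0.0], [1.0, 1.0]]])  # 1x2 grid
data = np.array([[0.0, 0.0], [1.0, 0.0]])
qe = quantization_error(weights, data)
```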

### Topographic Error: CSGT.topographic_error()
Calculates the topographic error, which measures the proportion of input vectors for which the first and second BMUs are not adjacent. Lower topographic errors indicate a better preservation of the input data topology.
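
The definition can be sketched as follows. Treating two neurons as adjacent when they are 8-connected on the grid is an assumption made here, as are the Euclidean metric and the `(x, y, input_len)` weight layout; CSGT may define adjacency differently.

```python
import numpy as np

def topographic_error(weights, data):
    """Fraction of samples whose best and second-best units are not grid-adjacent."""
    x, y, n = weights.shape
    flat = weights.reshape(-1, n)
    errors = 0
    for sample in data:
        order = np.argsort(np.linalg.norm(flat - sample, axis=1))[:2]
        # Convert flat indices back to (row, column) grid coordinates.
        (i1, j1), (i2, j2) = (divmod(int(k), y) for k in order)
        # Assumed adjacency: 8-connected, i.e. Chebyshev distance of 1.
        if max(abs(i1 - i2), abs(j1 - j2)) > 1:
            errors += 1
    return errors / len(data)

weights = np.array([[[0.0], [1.0], [0.1]]])  # 1x3 grid
data = np.array([[0.04], [0.9]])
te = topographic_error(weights, data)
```

In this toy grid the first sample's two closest neurons sit at opposite ends of the row, so it counts as an error, while the second sample's two closest neurons are side by side.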

### Winning Neuron Map: CSGT.win_map()
Returns a map of neurons with the corresponding input vectors that each neuron has won during training.
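
A win map of this kind can be sketched as a dictionary keyed by grid coordinates. The return type and helper name are assumptions; the structure returned by `CSGT.win_map()` may differ.

```python
from collections import defaultdict

import numpy as np

def win_map(weights, data):
    """Group samples by the grid coordinates of their BMU."""
    wins = defaultdict(list)
    for sample in data:
        dists = np.linalg.norm(weights - sample, axis=-1)
        i, j = np.unravel_index(np.argmin(dists), dists.shape)
        wins[(int(i), int(j))].append(sample)
    return wins

weights = np.array([[[0.0], [1.0]]])  # 1x2 grid
data = np.array([[0.1], [0.9], [0.8]])
wins = win_map(weights, data)
```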

### Neighbor Retrieval: CSGT.get_neighbors()
Returns the list of neighbors for a specified neuron based on the current neighborhood radius.
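
A neighbor lookup on a rectangular grid might look like the sketch below, which collects every grid cell within a Euclidean radius of a given neuron. The signature and the choice to exclude the neuron itself are assumptions for illustration.

```python
def get_neighbors(i, j, x, y, radius):
    """Grid cells within `radius` of neuron (i, j) on an x-by-y grid, excluding (i, j)."""
    r = int(radius)
    out = []
    # Only scan the bounding box around (i, j), clipped to the grid.
    for a in range(max(0, i - r), min(x, i + r + 1)):
        for b in range(max(0, j - r), min(y, j + r + 1)):
            if (a, b) != (i, j) and (a - i) ** 2 + (b - j) ** 2 <= radius ** 2:
                out.append((a, b))
    return out

corner = get_neighbors(0, 0, 5, 5, radius=1)
center = get_neighbors(2, 2, 5, 5, radius=1.5)
```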

## Mathematical Background
### Self-Organizing Maps (SOM)
SOMs are a type of artificial neural network introduced by Teuvo Kohonen in the 1980s. They use competitive learning to project high-dimensional data onto a lower-dimensional (usually 2D) grid, preserving the topological relationships of the input data. Each neuron in the SOM corresponds to a weight vector, and during training, the neurons compete to be the best matching unit (BMU) for each input vector. The BMU and its neighboring neurons have their weights updated to become more similar to the input vector.
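
The competition and update described above can be sketched as a single training step in NumPy. This is a generic SOM update with a Gaussian neighborhood, written under the assumption that weights have shape `(x, y, input_len)`; it is not CSGT's training loop.

```python
import numpy as np

def train_step(weights, sample, learning_rate, sigma):
    """Move the BMU and its grid neighbors toward `sample` in place; return the BMU index."""
    x, y, _ = weights.shape
    dists = np.linalg.norm(weights - sample, axis=-1)
    bmu = np.unravel_index(np.argmin(dists), dists.shape)
    ii, jj = np.meshgrid(np.arange(x), np.arange(y), indexing="ij")
    grid_d2 = (ii - bmu[0]) ** 2 + (jj - bmu[1]) ** 2
    h = np.exp(-grid_d2 / (2.0 * sigma ** 2))  # neighborhood strength per neuron
    weights += learning_rate * h[..., None] * (sample - weights)
    return bmu

rng = np.random.default_rng(0)
weights = rng.random((4, 4, 3))
sample = np.array([0.2, 0.5, 0.8])
before = np.linalg.norm(weights - sample, axis=-1)
bmu = train_step(weights, sample, learning_rate=0.5, sigma=1.0)
after = np.linalg.norm(weights - sample, axis=-1)
```

Every neuron moves toward the sample by a fraction `learning_rate * h`, so the BMU (where `h` is 1) halves its distance here, and neurons farther away on the grid move less.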

### Quantization Error
Quantization error is a crucial metric in evaluating SOMs. It quantifies the error introduced when representing high-dimensional data using the discrete grid of neurons in the SOM. Mathematically, it is defined as the average Euclidean distance between the input vectors and their BMUs.

### U-Matrix
The U-Matrix (Unified Distance Matrix) is a visualization tool used in SOMs to represent the distances between neighboring neurons. It helps in identifying clusters and understanding the topological structure of the SOM.

## Example Use Cases
- **Clustering**: Grouping high-dimensional data into clusters for pattern recognition and data analysis.
- **Dimensionality Reduction**: Projecting high-dimensional data onto a 2D grid while preserving the relationships among data points.
- **Visualization**: Understanding and interpreting the structure and relationships in complex datasets through U-Matrix and hit maps.

## License
This project is licensed under the MIT License - see the LICENSE file for details.

## Author
- Manav Gupta
- Email: manav26102002@gmail.com

            
