ParametricSpectralClustering


-   Name: ParametricSpectralClustering
-   Version: 0.0.4
-   Summary: A library for users to use parametric spectral clustering
-   Upload time: 2024-05-09 13:44:56
-   Author: Ivy Chang, Hsin Ju Tai
-   Requires Python: >=3.8
-   License: MIT
-   Keywords: spectral clustering, incremental clustering, online clustering, non-linear clustering
<!-- Parametric Spectral Clustering -->

# Parametric Spectral Clustering

This repository provides a PyTorch implementation of the **Parametric Spectral Clustering** (PSC) algorithm, an alternative to traditional spectral clustering. PSC addresses three limitations of the traditional algorithm: its computational cost, its memory usage, and its lack of online learning (new data points cannot be assigned without recomputing the whole clustering). This makes PSC a practical framework for applying spectral clustering to large datasets.
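As a rough illustration of the idea behind PSC, and not this package's implementation, the sketch below computes a classical spectral embedding on a random subset of the data, fits a small neural-network regressor that maps raw features to that embedding, and then clusters the predicted embeddings of all points. It uses scikit-learn's `SpectralEmbedding`, `MLPRegressor`, and `KMeans` as stand-ins for the package's own components; the function name `parametric_spectral_sketch` and all parameter values are hypothetical.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.manifold import SpectralEmbedding
from sklearn.neural_network import MLPRegressor


def parametric_spectral_sketch(X, n_clusters, train_ratio=0.3, n_neighbors=10, seed=0):
    """Simplified illustration of parametric spectral clustering (hypothetical, not the PSC API).

    Returns cluster labels for every row of X, plus the fitted mapper and k-means model
    so that new points can be assigned without recomputing the spectral embedding.
    Assumes n_clusters >= 2.
    """
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    # Train on a random subset; keep at least n_neighbors + 1 points so the kNN graph is defined.
    n_train = min(len(X), max(n_neighbors + 1, int(train_ratio * len(X))))
    train_idx = rng.choice(len(X), size=n_train, replace=False)
    X_train = X[train_idx]

    # 1. Classical spectral embedding of the subset, built from a k-nearest-neighbor graph.
    embedder = SpectralEmbedding(
        n_components=n_clusters,
        affinity="nearest_neighbors",
        n_neighbors=n_neighbors,
        random_state=seed,
    )
    Y_train = embedder.fit_transform(X_train)

    # 2. Fit a small neural network that maps raw features to the spectral embedding.
    mapper = MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=2000, random_state=seed)
    mapper.fit(X_train, Y_train)

    # 3. Embed every point with the learned map and cluster in the embedding space.
    Y_all = mapper.predict(X).reshape(len(X), -1)
    kmeans = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed)
    labels = kmeans.fit_predict(Y_all)
    return labels, mapper, kmeans
```

Because the mapper is an ordinary fitted model, new points can be assigned with `kmeans.predict(mapper.predict(X_new))` without rebuilding the affinity graph or solving a new eigenproblem, which is the kind of online use the traditional algorithm does not support.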

<!-- PREREQUISITES -->

# Installation

## Dependencies

Parametric Spectral Clustering requires:

-   Python (>= 3.8)
-   NumPy (>= 1.26.4)
-   SciPy (>= 1.13.0)
-   PyTorch (>= 2.2.2)
-   scikit-learn (>= 1.4.2)
-   Pandas (>= 2.2.2)
-   Matplotlib (3.8.4)
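To check which versions of these dependencies are installed, a short script based on the standard-library `importlib.metadata` module (available since Python 3.8) can help. The minimum versions are copied from the list above; treating Matplotlib's `3.8.4` as a minimum, the distribution names (such as `torch` for PyTorch), and the simple numeric comparison, which ignores pre-release tags, are all assumptions of this sketch. The helper names are hypothetical.

```python
import re
from importlib.metadata import PackageNotFoundError, version

# Minimum versions copied from the dependency list.
# Assumption: Matplotlib's "3.8.4" is also treated as a minimum.
MIN_VERSIONS = {
    "numpy": "1.26.4",
    "scipy": "1.13.0",
    "torch": "2.2.2",
    "scikit-learn": "1.4.2",
    "pandas": "2.2.2",
    "matplotlib": "3.8.4",
}


def numeric_version(text):
    """Leading numeric part of a version string, e.g. '2.2.2+cu121' -> (2, 2, 2)."""
    match = re.match(r"\d+(?:\.\d+)*", text)
    if match is None:
        raise ValueError(f"cannot parse version: {text!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def at_least(installed, minimum):
    """Compare versions numerically, padding with zeros so '2.0' equals '2.0.0'."""
    a, b = numeric_version(installed), numeric_version(minimum)
    width = max(len(a), len(b))
    return a + (0,) * (width - len(a)) >= b + (0,) * (width - len(b))


def check_dependencies(min_versions=MIN_VERSIONS):
    """Map each distribution name to 'ok', 'too old (<installed>)', or 'missing'."""
    report = {}
    for name, minimum in min_versions.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            report[name] = "missing"
            continue
        report[name] = "ok" if at_least(installed, minimum) else f"too old ({installed})"
    return report


if __name__ == "__main__":
    for name, status in check_dependencies().items():
        print(f"{name}: {status}")
```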

---

<!-- INSTALLATION -->

## User installation

To install from a local copy of the repository, use setup.py:

```sh
python setup.py install
```

To install the released package from PyPI, use pip:

```sh
pip install ParametricSpectralClustering
```

<!-- SAMPLE USAGE -->

## Sample Usage

The following example clusters the UCI ML hand-written digits dataset.

```python
>>> from ParametricSpectralClustering import PSC, Four_layer_FNN
>>> from sklearn.datasets import load_digits
>>> from sklearn.cluster import KMeans
>>> digits = load_digits()
>>> X = digits.data / 16  # scale pixel values from [0, 16] to [0, 1]
>>> # Clustering method applied to the learned embedding
>>> cluster_method = KMeans(n_clusters=10, init="k-means++", n_init=1, max_iter=100, algorithm='elkan')
>>> # Layer sizes: 64 input features (8x8 pixels) -> 128 -> 256 -> 64 -> 10
>>> model = Four_layer_FNN(64, 128, 256, 64, 10)
>>> psc = PSC(model=model, clustering_method=cluster_method, n_neighbor=10, sampling_ratio=0, batch_size_data=1797)
>>> psc.fit(X)
>>> psc.save_model("model")
>>> cluster_idx = psc.predict(X)
```
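Because the digits dataset also ships ground-truth labels (`digits.target`), one way to gauge the result is to compare `cluster_idx` with them using scikit-learn's `adjusted_rand_score` and `normalized_mutual_info_score`. Both scores ignore how the cluster IDs are numbered, so a perfect clustering scores 1.0 even if its labels are permuted. The helper `clustering_scores` is a hypothetical name used only for this sketch.

```python
import numpy as np
from sklearn.metrics import adjusted_rand_score, normalized_mutual_info_score


def clustering_scores(labels_true, labels_pred):
    """Compare predicted cluster indices with ground-truth labels (invariant to label permutation)."""
    labels_true = np.asarray(labels_true)
    labels_pred = np.asarray(labels_pred)
    if labels_true.shape != labels_pred.shape:
        raise ValueError("labels_true and labels_pred must have the same shape")
    return {
        "ARI": adjusted_rand_score(labels_true, labels_pred),
        "NMI": normalized_mutual_info_score(labels_true, labels_pred),
    }


# Toy check: the same partition with renamed cluster IDs scores (up to rounding) 1.0 on both.
print(clustering_scores([0, 0, 1, 1, 2, 2], [2, 2, 0, 0, 1, 1]))
```

With the sample above, `clustering_scores(digits.target, cluster_idx)` reports both scores for the PSC result.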

<!-- COMMAND LINE TOOL -->

## Command line tool

After installation, you can run the following script directly:

```sh
python bin/run.py [data] [rate] [n_cluster] [model_path] [cluster_result_format]
```

The `[data]` argument is the input data file; .txt, .csv, and .npy formats are accepted.

The `[rate]` argument is a float between 0.0 and 1.0. It represents the proportion of the input data reserved for training the mapping function from the original feature space to the spectral embedding.

The `[n_cluster]` argument is the number of clusters to partition the data into. It must be lower than the total number of data points in the dataset.

The `[model_path]` argument is the path where the trained model is saved.

The `[cluster_result_format]` argument is the file format of the cluster result, either .txt or .csv.
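Before calling `bin/run.py`, it can help to write the data in one of the accepted formats and check the arguments against the constraints described above. This is an illustration only: the function names are hypothetical, the file layout (one sample per row, one feature per column, no header) is an assumption about what the script reads, and the checks mirror the documented rules rather than the script's actual validation code.

```python
import os

import numpy as np


def write_cli_input(X, path):
    """Save a 2-D array as .npy, .csv, or .txt for bin/run.py (hypothetical helper).

    Assumed layout: one sample per row, one feature per column, no header.
    Returns the number of samples written.
    """
    X = np.asarray(X, dtype=float)
    if X.ndim != 2:
        raise ValueError("expected a 2-D array of shape (n_samples, n_features)")
    ext = os.path.splitext(path)[1].lower()
    if ext == ".npy":
        np.save(path, X)
    elif ext == ".csv":
        np.savetxt(path, X, delimiter=",")
    elif ext == ".txt":
        np.savetxt(path, X)  # whitespace-separated columns
    else:
        raise ValueError(f"unsupported data format: {ext!r}")
    return X.shape[0]


def check_cli_args(rate, n_cluster, n_samples, result_format):
    """Raise ValueError if the arguments violate the documented constraints."""
    if not 0.0 <= float(rate) <= 1.0:
        raise ValueError("rate must be a float between 0.0 and 1.0")
    if not 0 < int(n_cluster) < n_samples:
        raise ValueError("n_cluster must be positive and lower than the number of data points")
    # Accept both "csv" and ".csv": the exact spelling the script expects is an assumption.
    if result_format.lower().lstrip(".") not in ("txt", "csv"):
        raise ValueError("cluster_result_format must be .txt or .csv")
```

For the digits data, `n = write_cli_input(digits.data / 16, "digits.npy")` followed by `check_cli_args(0.5, 10, n, ".csv")` passes these checks.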

<!-- EXPERIMENT-->

# Experiment

The `JSS_Experiments` directory contains the code for the experiments described in the paper "PSC: a Python Package for Parametric Spectral Clustering", including scripts for experiments on the Firewall, NIDS, and Synthesis datasets.

Before running these scripts, download the required datasets. They can be obtained from the following sources:

-   NIDS Dataset: https://www.kaggle.com/datasets/aryashah2k/nfuqnidsv2-network-intrusion-detection-dataset

Place the downloaded datasets in the `JSS_Experiments/datasets` directory, then run:

```sh
cd JSS_Experiments
python run.py
```
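Before running the commands above, a quick pre-flight check can catch a misplaced download before a long experiment starts. This sketch only verifies that the `datasets` directory exists and contains at least one file; the expected file names are not documented here, and `dataset_dir_problems` is a hypothetical helper. Run it from the repository root, or pass `root="."` when already inside `JSS_Experiments`.

```python
from pathlib import Path


def dataset_dir_problems(root="JSS_Experiments"):
    """Return a list of problems with <root>/datasets; an empty list means it looks ready."""
    datasets = Path(root) / "datasets"
    if not datasets.is_dir():
        return [f"directory not found: {datasets}"]
    # Look for any file, including ones inside subdirectories.
    files = [p for p in datasets.rglob("*") if p.is_file()]
    if not files:
        return [f"directory is empty: {datasets}"]
    return []


if __name__ == "__main__":
    problems = dataset_dir_problems()
    for problem in problems:
        print(problem)
    if not problems:
        print("datasets directory looks ready")
```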

<!-- Test -->

# Test

To run the test suite, use the following command:

```sh
pytest tests
```

<!-- LICENSE -->

# License

Distributed under the MIT License. See `LICENSE.txt` for more information.

<!-- CONTACT -->

# Contact

| Author | Ivy Chang           | Hsin Ju Tai         |
| ------ | ------------------- | ------------------- |
| E-mail | ivy900403@gmail.com | hsinjutai@gmail.com |

Project Link: [Parametric Spectral Clustering](https://github.com/IvyChang04/PSC_library)


Change Log
==========

2024/03/26
First published

2024/04/19
Update requirements

2024/05/01
Update requirements (add Matplotlib)

            
