featclus

- name: featclus
- version: 0.1.2
- home_page: https://github.com/sebassaras02/featclus
- summary: This library is built to perform feature selection in clustering models
- upload_time: 2024-10-13 00:07:25
- maintainer: None
- docs_url: None
- author: Sebastian Sarasti
- requires_python: >=3.12.4
- license: MIT License Copyright (c) 2024 Sebastian Sarasti Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
- keywords: unsupervised learning, machine learning, clustering
- requirements: No requirements were recorded.
- Travis-CI: No Travis.
- coveralls test coverage: No coveralls.

# 📊 FeatureClus: Feature Selection for Clustering Models

Welcome to **FeatureClus**, a Python library designed to simplify **feature selection** for **clustering models**. It helps you identify the features that contribute most to clustering quality, which mitigates the "curse of dimensionality" and makes your clustering models more efficient and interpretable. 🧠

## 🔍 How It Works

The feature selection process is driven by evaluating how each feature impacts the clustering results. **FeatureClus** uses an isolated data shift for each feature to assess its importance. The process follows these steps:

1. **MinMaxScaler**: First, we scale the features using MinMaxScaler to normalize the data.
2. **PCA (80% variance)**: Next, we apply Principal Component Analysis (PCA) to reduce dimensionality, retaining 80% of the variance.
3. **DBSCAN Clustering**: After reducing the dimensionality, DBSCAN is used to perform clustering.
4. **Silhouette Score Calculation**: We calculate the silhouette score of the resulting clusters to evaluate their quality. The silhouette score measures how similar each point is to its own cluster compared to other clusters.
5. **Data Shift and Feature Importance**: By applying isolated shifts to each feature and recalculating the silhouette score, we measure how the score changes. The absolute difference in the silhouette score after shifting each feature is used to rank the features by importance.
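
For reference, the silhouette coefficient used in step 4 is conventionally defined per sample $i$ as shown below, where $a(i)$ is the mean distance from $i$ to the other points in its own cluster and $b(i)$ is the mean distance from $i$ to the points of the nearest other cluster. The overall score is the mean of $s(i)$ over all samples. The second expression is only one reading of step 5: $I_j$, $s_{\text{base}}$, and $s_{\text{shift}}^{(j)}$ are illustrative symbols, not names taken from the library.

$$
s(i) = \frac{b(i) - a(i)}{\max\{a(i),\, b(i)\}}, \qquad I_j = \left| s_{\text{base}} - s_{\text{shift}}^{(j)} \right|
$$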

This method ensures that the features are evaluated for their individual contribution to the clustering process, allowing you to focus on the most impactful features.
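
The steps above can be sketched with scikit-learn. This is a minimal illustration under stated assumptions, not the library's actual implementation: the helper names `clustering_score` and `shift_importance`, the DBSCAN parameters, and the reading of a "shift" as rolling one column by a number of rows are all assumptions made for this example.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_blobs
from sklearn.decomposition import PCA
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import MinMaxScaler


def clustering_score(X, eps=0.3, min_samples=5):
    """Scale, reduce with PCA (80% variance), cluster with DBSCAN, then score."""
    X_scaled = MinMaxScaler().fit_transform(X)
    X_reduced = PCA(n_components=0.8).fit_transform(X_scaled)
    labels = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(X_reduced)
    mask = labels != -1  # ignore points DBSCAN marked as noise
    n_clusters = len(set(labels[mask]))
    # The silhouette score is undefined with fewer than two clusters.
    if n_clusters < 2 or mask.sum() <= n_clusters:
        return 0.0
    return float(silhouette_score(X_reduced[mask], labels[mask]))


def shift_importance(X, shift=25):
    """Absolute change in silhouette score after shifting each feature alone."""
    baseline = clustering_score(X)
    importances = []
    for j in range(X.shape[1]):
        X_shifted = X.copy()
        # Assumption: a "shift" rolls one column by `shift` rows, which breaks
        # that feature's alignment with the remaining features.
        X_shifted[:, j] = np.roll(X_shifted[:, j], shift)
        importances.append(abs(baseline - clustering_score(X_shifted)))
    return np.array(importances)


X, _ = make_blobs(n_samples=300, centers=3, n_features=4, random_state=0)
importances = shift_importance(X)
ranking = np.argsort(importances)[::-1]  # feature indices, most important first
```

Returning `0.0` when DBSCAN finds fewer than two clusters is a design choice of this sketch; the library may handle that case differently.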

## 🚀 Key Features
- 🔍 **Feature Ranking**: Ranks features based on the absolute change in silhouette score after applying isolated shifts to each feature.
- 📈 **Cluster Evaluation Metrics**: Calculates the silhouette score to assess the clustering quality and the influence of each feature.
- 💻 **Easy-to-Use API**: A simple, intuitive API that can be easily integrated into your machine learning pipeline.


## 📦 Installation

To install the library, run the following command:

```bash
pip install featclus
```

## 📊 Example

Here is a quick example of how to use **FeatureClus** on a synthetic dataset:

```python
import pandas as pd
from featclus import FeatureSelection  # import name assumed to match the PyPI package name
from sklearn.datasets import make_blobs

# Build a sample DataFrame with 15 features drawn from 7 blobs
data, labels = make_blobs(n_samples=10000, centers=7, n_features=15, random_state=42)
df = pd.DataFrame(data, columns=[f"Feature_{i}" for i in range(15)])

# Initialize the FeatureSelection with the shifts to apply to each feature
model = FeatureSelection(data=df, shifts=[1, 25, 50, 75, 100], n_jobs=-1)

# Compute the metrics that show how important each feature is
metrics = model.get_metrics()
```

## 🛠️ Methods

### `get_metrics()`
Returns metrics that assess how each feature contributes to clustering.

### `plot_results(n_features)`
Selects the top `n_features` features by their importance to the clustering results and plots them.
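
The return type of `get_metrics()` is not documented here, so the following sketch only illustrates the idea of keeping the top `n_features` features. It assumes the importance scores are available as a pandas Series indexed by feature name; the scores and the tiny DataFrame are hypothetical values made up for the example.

```python
import pandas as pd

# Hypothetical importance scores; real values would come from the library.
importance = pd.Series(
    {"Feature_0": 0.02, "Feature_1": 0.31, "Feature_2": 0.11, "Feature_3": 0.27}
)
# Small placeholder dataset with one column per feature.
df = pd.DataFrame({name: [0.0, 1.0, 2.0] for name in importance.index})

n_features = 2
# Keep the names of the n_features highest-scoring features.
top_features = importance.nlargest(n_features).index.tolist()
df_selected = df[top_features]
```

The reduced `df_selected` can then be passed to whichever clustering model you use downstream.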


## ☕ Support the Project

If you find this feature selection library helpful and would like to support its continued development, consider buying me a coffee. Your support helps maintain and improve this project!

[![Buy Me A Coffee](https://www.buymeacoffee.com/assets/img/custom_images/orange_img.png)](https://www.paypal.com/paypalme/sebassarasti)

### Other Ways to Support
- ⭐ Star this repository
- 🍴 Fork it and contribute
- 📢 Share it with others who might find it useful
- 🐛 Report issues or suggest new features

Your support, in any form, is greatly appreciated! 🙏

## 📝 License

This project is licensed under the MIT License. See the [LICENSE](LICENSE) file for more details.

---

Happy clustering! 🎉

            

        