cluster-over-sampling


| Field           | Value |
| --------------- | ----- |
| Name            | cluster-over-sampling |
| Version         | 0.5.0 |
| Summary         | A general interface for clustering based over-sampling algorithms. |
| Upload time     | 2023-03-16 11:38:41 |
| Requires Python | >=3.10, <3.12 |
| License         | MIT |
| Keywords        | machine learning, imbalanced learning, oversampling |
| Requirements    | No requirements were recorded. |

[scikit-learn]: <http://scikit-learn.org/stable/>
[imbalanced-learn]: <http://imbalanced-learn.org/stable/>
[SMOTE]: <https://arxiv.org/pdf/1106.1813.pdf>
[SOMO]: <https://www.sciencedirect.com/science/article/abs/pii/S0957417417302324>
[KMeans-SMOTE]: <https://www.sciencedirect.com/science/article/abs/pii/S0020025518304997>
[G-SOMO]: <https://www.sciencedirect.com/science/article/abs/pii/S095741742100662X>
[black badge]: <https://img.shields.io/badge/%20style-black-000000.svg>
[black]: <https://github.com/psf/black>
[docformatter badge]: <https://img.shields.io/badge/%20formatter-docformatter-fedcba.svg>
[docformatter]: <https://github.com/PyCQA/docformatter>
[ruff badge]: <https://img.shields.io/endpoint?url=https://raw.githubusercontent.com/charliermarsh/ruff/main/assets/badge/v1.json>
[ruff]: <https://github.com/charliermarsh/ruff>
[mypy badge]: <http://www.mypy-lang.org/static/mypy_badge.svg>
[mypy]: <http://mypy-lang.org>
[mkdocs badge]: <https://img.shields.io/badge/docs-mkdocs%20material-blue.svg?style=flat>
[mkdocs]: <https://squidfunk.github.io/mkdocs-material>
[version badge]: <https://img.shields.io/pypi/v/cluster-over-sampling.svg>
[pythonversion badge]: <https://img.shields.io/pypi/pyversions/cluster-over-sampling.svg>
[downloads badge]: <https://img.shields.io/pypi/dd/cluster-over-sampling>
[gitter]: <https://gitter.im/cluster-over-sampling/community>
[gitter badge]: <https://badges.gitter.im/join%20chat.svg>
[discussions]: <https://github.com/georgedouzas/cluster-over-sampling/discussions>
[discussions badge]: <https://img.shields.io/github/discussions/georgedouzas/cluster-over-sampling>
[ci]: <https://github.com/georgedouzas/cluster-over-sampling/actions?query=workflow>
[ci badge]: <https://github.com/georgedouzas/cluster-over-sampling/actions/workflows/ci.yml/badge.svg>
[doc]: <https://github.com/georgedouzas/cluster-over-sampling/actions?query=workflow>
[doc badge]: <https://github.com/georgedouzas/cluster-over-sampling/actions/workflows/doc.yml/badge.svg?branch=master>

# cluster-over-sampling

[![ci][ci badge]][ci] [![doc][doc badge]][doc]

| Category          | Tools    |
| ------------------| -------- |
| **Development**   | [![black][black badge]][black] [![ruff][ruff badge]][ruff] [![mypy][mypy badge]][mypy] [![docformatter][docformatter badge]][docformatter] |
| **Package**       | ![version][version badge] ![pythonversion][pythonversion badge] ![downloads][downloads badge] |
| **Documentation** | [![mkdocs][mkdocs badge]][mkdocs]|
| **Communication** | [![gitter][gitter badge]][gitter] [![discussions][discussions badge]][discussions] |

## Introduction

A general interface for clustering-based over-sampling algorithms.

## Installation

`cluster-over-sampling` is available on PyPI, and you can install it via `pip`:

```bash
pip install cluster-over-sampling
```

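The SOM clusterer requires optional dependencies. The quotes keep shells such as zsh from interpreting the square brackets:

```bash
pip install "cluster-over-sampling[som]"
```

The Geometric SMOTE oversampler likewise requires optional dependencies:

```bash
pip install "cluster-over-sampling[gsmote]"
```

You can also install both sets of optional dependencies at once:

```bash
pip install "cluster-over-sampling[all]"
```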

## Usage

All classes in `cluster-over-sampling` follow the [imbalanced-learn] API and build on the functionality of its base
oversampler. Following the [scikit-learn] convention, the data are represented as follows:

- Input data `X`: 2D array-like or sparse matrix.
- Targets `y`: 1D array-like.

The clustering-based oversamplers implement a `fit` method to learn from `X` and `y`:

```python
clustering_based_oversampler.fit(X, y)
```

They also implement a `fit_resample` method to resample `X` and `y`:

```python
X_resampled, y_resampled = clustering_based_oversampler.fit_resample(X, y)
```
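To make the `fit` / `fit_resample` contract concrete, here is a minimal, self-contained sketch. It does not use the package: `ToyClusterOversampler`, its parameters, and its internal logic are hypothetical stand-ins written with the standard library only, and the crude one-step clustering is an assumption made for brevity rather than what the library's oversamplers do.

```python
import random
from collections import Counter


class ToyClusterOversampler:
    """Hypothetical stand-in, NOT a class from cluster-over-sampling."""

    def __init__(self, n_clusters=2, random_state=0):
        self.n_clusters = n_clusters
        self.random_state = random_state

    def fit(self, X, y):
        # Learn how many new samples each class needs to match the majority.
        counts = Counter(y)
        n_majority = max(counts.values())
        self.sampling_strategy_ = {
            label: n_majority - n for label, n in counts.items() if n < n_majority
        }
        return self

    def fit_resample(self, X, y):
        self.fit(X, y)
        rng = random.Random(self.random_state)
        X_res, y_res = [list(row) for row in X], list(y)
        for label, n_new in self.sampling_strategy_.items():
            members = [row for row, target in zip(X, y) if target == label]
            clusters = self._cluster(members, rng)
            for _ in range(n_new):
                # Interpolate between two points drawn from the same cluster.
                cluster = rng.choice(clusters)
                a, b = rng.choice(cluster), rng.choice(cluster)
                t = rng.random()
                X_res.append([ai + t * (bi - ai) for ai, bi in zip(a, b)])
                y_res.append(label)
        return X_res, y_res

    def _cluster(self, rows, rng):
        # One nearest-seed assignment step around randomly chosen seed rows.
        seeds = rng.sample(rows, min(self.n_clusters, len(rows)))
        clusters = [[] for _ in seeds]
        for row in rows:
            dists = [sum((r - s) ** 2 for r, s in zip(row, seed)) for seed in seeds]
            clusters[dists.index(min(dists))].append(row)
        return [cluster for cluster in clusters if cluster]


# X is 2D array-like, y is 1D array-like; class 1 is the minority class.
X = [[0.0, 0.0], [0.1, 0.2], [0.2, 0.1], [0.3, 0.3], [5.0, 5.0], [5.2, 5.1]]
y = [0, 0, 0, 0, 1, 1]

X_resampled, y_resampled = ToyClusterOversampler().fit_resample(X, y)
class_counts = Counter(y_resampled)
```

In this sketch the original samples are returned first and the synthetic minority samples are appended after them, so both classes end up with four samples each. Restricting interpolation to points within one cluster is the idea that clustering-based oversampling adds on top of plain interpolation.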

## References

If you use `cluster-over-sampling` in a scientific publication, we would appreciate citations to any of the following papers:

[^1]: [G. Douzas, F. Bacao, "Self-Organizing Map Oversampling (SOMO) for imbalanced data set learning", Expert Systems with
    Applications, vol. 82, pp. 40-52, 2017.][SOMO]
[^2]: [G. Douzas, F. Bacao, F. Last, "Improving imbalanced learning through a heuristic oversampling method based on k-means and
    SMOTE", Information Sciences, vol. 465, pp. 1-20, 2018.][KMeans-SMOTE]
[^3]: [G. Douzas, F. Bacao, F. Last, "G-SOMO: An oversampling approach based on self-organized maps and geometric SMOTE", Expert
    Systems with Applications, vol. 183, 115230, 2021.][G-SOMO]


            

Raw data

{
    "_id": null,
    "home_page": "",
    "name": "cluster-over-sampling",
    "maintainer": "",
    "docs_url": null,
    "requires_python": ">=3.10, <3.12",
    "maintainer_email": "",
    "keywords": "machine learning,imbalanced learning,oversampling",
    "author": "",
    "author_email": "Georgios Douzas <gdouzas@icloud.com>",
    "download_url": "https://files.pythonhosted.org/packages/b9/4e/226bc1fa39fd9ec029cb3e5314e84b92fbcfca6e1fad98213537899ec6c1/cluster-over-sampling-0.5.0.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "MIT",
    "summary": "A general interface for clustering based over-sampling algorithms.",
    "version": "0.5.0",
    "split_keywords": [
        "machine learning",
        "imbalanced learning",
        "oversampling"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "0089217a17ba7e773415069614116477ee05a932ec998a2290aaa2b6cdffdb3b",
                "md5": "673a64bb241e864e5461ce0cf35a3f4f",
                "sha256": "fac4f7496102de9c1ba1e66cc35a31029e756fa5ac29a145351f243f6e45690e"
            },
            "downloads": -1,
            "filename": "cluster_over_sampling-0.5.0-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "673a64bb241e864e5461ce0cf35a3f4f",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.10, <3.12",
            "size": 24673,
            "upload_time": "2023-03-16T11:38:39",
            "upload_time_iso_8601": "2023-03-16T11:38:39.417390Z",
            "url": "https://files.pythonhosted.org/packages/00/89/217a17ba7e773415069614116477ee05a932ec998a2290aaa2b6cdffdb3b/cluster_over_sampling-0.5.0-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "b94e226bc1fa39fd9ec029cb3e5314e84b92fbcfca6e1fad98213537899ec6c1",
                "md5": "a70c46ecb16b5dd43ad13fbe012133de",
                "sha256": "7edf4b43d36398936cadee26172d275c03a09a164a6abada43f0ec6d0a73aec9"
            },
            "downloads": -1,
            "filename": "cluster-over-sampling-0.5.0.tar.gz",
            "has_sig": false,
            "md5_digest": "a70c46ecb16b5dd43ad13fbe012133de",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.10, <3.12",
            "size": 24794,
            "upload_time": "2023-03-16T11:38:41",
            "upload_time_iso_8601": "2023-03-16T11:38:41.759786Z",
            "url": "https://files.pythonhosted.org/packages/b9/4e/226bc1fa39fd9ec029cb3e5314e84b92fbcfca6e1fad98213537899ec6c1/cluster-over-sampling-0.5.0.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2023-03-16 11:38:41",
    "github": false,
    "gitlab": false,
    "bitbucket": false,
    "lcname": "cluster-over-sampling"
}
        