scikit-query


- Name: scikit-query
- Version: 0.3
- Home page: https://github.com/aymericb213/scikit-query
- Summary: scikit-query is a Python library for active query strategies in constrained clustering on top of SciPy and scikit-learn.
- Upload time: 2023-09-05 07:44:14
- Author: Aymeric Beauchamp
- Requires Python: >=3.10
- License: BSD 3-Clause License
- Keywords: active clustering, semi-supervised clustering, constrained clustering, pattern recognition, machine learning, artificial intelligence

[![Documentation Status](https://readthedocs.org/projects/scikit-query/badge/?version=latest)](https://scikit-query.readthedocs.io/en/latest/?badge=latest)
[![version](https://img.shields.io/pypi/v/scikit-query)](https://pypi.org/project/scikit-query)
[![Python](https://img.shields.io/pypi/pyversions/scikit-query)]()
[![codecov](https://codecov.io/github/aymericb213/scikit-query/branch/main/graph/badge.svg?token=ZU4OEZKSP9)](https://codecov.io/github/aymericb213/scikit-query)
[![license](https://img.shields.io/pypi/l/scikit-query)](https://choosealicense.com/licenses/bsd-3-clause)
[![Downloads](https://static.pepy.tech/badge/scikit-query)](https://pepy.tech/project/scikit-query)

# scikit-query

Unlike classification, clustering aims to group data without the help of labels.
A well-known shortcoming of clustering algorithms is that they rely on an objective function geared toward 
specific types of clusters (convex, dense, well-separated), and hyperparameters that are hard to tune.
Semi-supervised clustering mitigates these problems by injecting background knowledge in order to guide the clustering.
Active clustering algorithms analyze the data to select interesting points to ask the user about, generating constraints
that allow fast convergence towards a user-specified partition.

**scikit-query** is a library of active query strategies for constrained clustering inspired by [scikit-learn](https://scikit-learn.org)
and the now inactive [active-semi-supervised-clustering](https://github.com/datamole-ai/active-semi-supervised-clustering) library by Jakub Švehla.

It is focused on algorithm-agnostic query strategies, 
i.e. methods that do not rely on a particular clustering algorithm. 
From an input dataset, they produce a set of constraints by making insightful queries to an oracle.
A variant for incremental constrained clustering is provided for applicable algorithms,
taking a data partition into account. 
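
To make the query/oracle interaction concrete, here is a minimal, dependency-free sketch. It is not the library's implementation: the class `BudgetedPairOracle` and its `query` method are hypothetical names chosen for illustration. It assumes the oracle knows the true labels and answers each pairwise query with must-link (ML) or cannot-link (CL) until its budget is spent.

```python
class BudgetedPairOracle:
    """Hypothetical oracle answering pairwise queries from known labels."""

    def __init__(self, truth, budget):
        self.truth = list(truth)
        self.budget = budget
        self.asked = 0

    def query(self, i, j):
        """Return True for must-link, False for cannot-link."""
        if self.asked >= self.budget:
            raise RuntimeError("query budget exhausted")
        self.asked += 1
        return self.truth[i] == self.truth[j]


labels = [0, 0, 1, 1, 2]
oracle = BudgetedPairOracle(truth=labels, budget=3)

# Collect the answers as two lists of index pairs.
constraints = {"ml": [], "cl": []}
for i, j in [(0, 1), (1, 2), (3, 4)]:
    key = "ml" if oracle.query(i, j) else "cl"
    constraints[key].append((i, j))

print(constraints)  # {'ml': [(0, 1)], 'cl': [(1, 2), (3, 4)]}
```

A query strategy differs from this loop only in how it chooses which pairs to ask about; the budget is what makes that choice matter.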

In typical *scikit* fashion, the library is used by instantiating a class and calling its *fit* method.

``` python
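from sklearn.datasets import load_iris

from skquery.pairwise import AIPC
from skquery.oracle import MLCLOracle

# Example data (an assumption for illustration): any feature matrix with known labels would do
dataset, labels = load_iris(return_X_y=True)

qs = AIPC()  # query strategy
oracle = MLCLOracle(truth=labels, budget=10)  # oracle built from the true labels, with a budget of 10
constraints = qs.fit(dataset, oracle)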
```

## Algorithms

| Algorithm       | Description                            | Constraint type | Works in incremental setting? | Source                                                                                  | Date |
|-----------------|----------------------------------------|-----------------|-------------------------------|-----------------------------------------------------------------------------------------|------|
| Random sampling |                                        | ML/CL, triplet  | :heavy_check_mark:            |                                                                                         |      |
| FFQS            | Neighborhood-based                     | ML/CL           | :heavy_check_mark:            | [Basu et al.](https://epubs.siam.org/doi/10.1137/1.9781611972740.31)                    | 2004 |
| MMFFQS (MinMax) | Neighborhood-based, similarity         | ML/CL           | :heavy_check_mark:            | [Mallapragada et al.](https://ieeexplore.ieee.org/document/4761792)                     | 2008 |
| NPU             | Neighborhood-based, information theory | ML/CL           | :heavy_check_mark:            | [Xiong et al.](https://dl.acm.org/doi/10.1109/TKDE.2013.22)                             | 2013 |
| SASC            | SVDD, greedy approach                  | ML/CL           |                               | [Abin & Beigy](https://www.sciencedirect.com/science/article/abs/pii/S0031320313004068) | 2014 |
| AIPC            | Fuzzy clustering, information theory   | ML/CL           |                               | [Zhang et al.](https://ieeexplore.ieee.org/document/8740960)                            | 2019 |
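
As an illustration of the simplest entry in the table, the sketch below shows what random sampling of ML/CL constraints could look like. It is a standalone approximation, not the code shipped in the library: the function name `random_pairwise_queries` and the `ask` callback (standing in for an oracle) are assumptions made for this example.

```python
import random


def random_pairwise_queries(n_points, ask, budget, seed=0):
    """Query `budget` distinct random pairs and sort the answers into ML/CL lists.

    `ask(i, j)` is a hypothetical oracle callback returning True for must-link.
    """
    rng = random.Random(seed)
    # Never ask for more pairs than exist.
    budget = min(budget, n_points * (n_points - 1) // 2)
    seen = set()
    constraints = {"ml": [], "cl": []}
    while len(seen) < budget:
        i, j = sorted(rng.sample(range(n_points), 2))
        if (i, j) in seen:
            continue  # pair already queried, draw again
        seen.add((i, j))
        key = "ml" if ask(i, j) else "cl"
        constraints[key].append((i, j))
    return constraints


labels = [0, 0, 1, 1, 2, 2]
result = random_pairwise_queries(
    n_points=len(labels),
    ask=lambda i, j: labels[i] == labels[j],
    budget=5,
)
print(result)
```

The other strategies in the table replace the uniform draw with an informed choice of pairs, which is how they aim to reach a good partition with fewer queries.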

## Dependencies

scikit-query is developed on Python >= 3.10 and requires the following libraries:
- pandas>=2.0.1
- matplotlib>=3.7.1
- numpy>=1.24.3
- scikit-learn>=1.2.2
- cvxopt>=1.3.1
- scikit-fuzzy>=0.4.2
- scipy>=1.10.1
- plotly>=5.14.1
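
To see which of these libraries are present in the current environment, a short check with the standard library's `importlib.metadata` is enough. This is a convenience sketch only; the helper name `installed_versions` is made up for this example, and it reports versions without comparing them against the minimums listed above.

```python
from importlib.metadata import PackageNotFoundError, version

REQUIRED = [
    "pandas", "matplotlib", "numpy", "scikit-learn",
    "cvxopt", "scikit-fuzzy", "scipy", "plotly",
]


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if missing."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


for name, ver in installed_versions(REQUIRED).items():
    print(f"{name}: {ver or 'not installed'}")
```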

## Contributors

FFQS, MinMax and NPU are based upon Jakub Švehla's implementation. 
Other algorithms were implemented by Aymeric Beauchamp or his students from the University of Orléans:
- Salma Badri, Elis Ishimwe, Brice Jacquesson, Matthéo Pailler (2023)

            
