autocluster


- Name: autocluster
- Version: 0.5.3
- Home page: https://github.com/wywongbd/autocluster
- Summary: Automated machine learning toolkit for performing clustering tasks.
- Upload time: 2023-01-09 06:27:58
- Author: Wong Wen Yan
- License: BSD-3-clause
- Keywords: automl, clustering, bayesian-optimization, hyperparameter-optimization
- Requirements: No requirements were recorded.

## autocluster
``autocluster`` is an automated machine learning (AutoML) toolkit for performing clustering tasks.   

The project [report](https://github.com/wywongbd/autocluster/blob/master/reports/autocluster_report.pdf) and [presentation slides](https://github.com/wywongbd/autocluster/blob/master/reports/autocluster_ppt.pdf) are available in the repository.

## Prerequisites
- Python 3.5 or above
- Linux, or Windows through [WSL](https://docs.microsoft.com/en-us/windows/wsl/about)

## How to get started?
1. First, install [SMAC](https://automl.github.io/SMAC3/stable/installation.html):
  - ``sudo apt-get install build-essential swig``
  - ``conda install gxx_linux-64 gcc_linux-64 swig``
  - ``pip install smac==0.8.0``
2. ``pip install autocluster``
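
To confirm that both packages ended up in the active environment, a small check like the one below can help. It is only a sketch: it relies on the standard-library ``importlib.metadata`` module, which needs Python 3.8 or newer (later than the minimum listed above), and the helper name ``installed_version`` is made up for this example.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string of a distribution, or None if it is missing."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# The pinned SMAC version comes from the install steps; autocluster is not pinned.
for package, expected in [("smac", "0.8.0"), ("autocluster", None)]:
    found = installed_version(package)
    if found is None:
        print(f"{package}: not installed")
    elif expected is not None and found != expected:
        print(f"{package}: found {found}, expected {expected}")
    else:
        print(f"{package}: {found}")
```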

## How does it work?
- ``autocluster`` automatically optimizes the *configuration* of a clustering problem. A *configuration* consists of:
    - the choice of dimension reduction algorithm
    - the choice of clustering model
    - the hyperparameter settings of the dimension reduction algorithm
    - the hyperparameter settings of the clustering model

- ``autocluster`` provides three approaches to optimizing the configuration, listed in order of increasing complexity:
    - random optimization
    - Bayesian optimization
    - Bayesian optimization + meta-learning (warmstarting)
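
The idea of a configuration, and the simplest of the three approaches (random optimization), can be sketched in plain Python. This is not ``autocluster``'s API: the search space, the hyperparameter ranges, and the names ``SEARCH_SPACE``, ``sample_configuration``, ``random_search`` and ``toy_cost`` are all assumptions made for illustration. In practice the cost function would fit the chosen models on the data and return a clustering quality metric.

```python
import random

# Hypothetical search space. Algorithm names mirror scikit-learn estimators,
# but the hyperparameter ranges are illustrative, not autocluster's own.
SEARCH_SPACE = {
    "dim_reduction": {
        "TruncatedSVD": {"n_components": [2, 4, 8, 16]},
        "PCA": {"n_components": [2, 4, 8, 16]},
    },
    "clustering": {
        "KMeans": {"n_clusters": list(range(2, 21))},
        "Birch": {"n_clusters": list(range(2, 21)), "threshold": [0.1, 0.5, 1.0]},
    },
}


def sample_configuration(rng):
    """Draw one configuration: an algorithm and its hyperparameters for each stage."""
    config = {}
    for stage, algorithms in SEARCH_SPACE.items():
        name = rng.choice(sorted(algorithms))
        params = {p: rng.choice(values) for p, values in algorithms[name].items()}
        config[stage] = {"algorithm": name, "params": params}
    return config


def random_search(evaluate, n_trials=50, seed=0):
    """Return the lowest-cost configuration among n_trials random draws."""
    rng = random.Random(seed)
    best_config, best_cost = None, float("inf")
    for _ in range(n_trials):
        config = sample_configuration(rng)
        cost = evaluate(config)
        if cost < best_cost:
            best_config, best_cost = config, cost
    return best_config, best_cost


def toy_cost(config):
    # Stand-in objective: pretend the data has 16 clusters and penalize the gap.
    return abs(config["clustering"]["params"]["n_clusters"] - 16)


best_config, best_cost = random_search(toy_cost, n_trials=200, seed=0)
print(best_config, best_cost)
```

Bayesian optimization keeps the same loop but replaces the random draw with a proposal from a model fitted to the costs seen so far. Warmstarting additionally seeds that model with configurations that performed well on similar datasets.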

## Algorithms/Models supported
- List of dimension reduction algorithms in ``sklearn`` supported by ``autocluster``'s optimizer.
<img src="images/dim_reduction_algorithms.png" width="600">

- List of clustering models in ``sklearn`` supported by ``autocluster``'s optimizer.
<img src="images/clustering_algorithms.png" width="600">

## Examples
Examples are available in these [notebooks](/autocluster/examples/).

## Experimental results
- This dataset comprises 16 Gaussian clusters in 128-dimensional space with ``N = 1024`` points. The best configuration found by ``autocluster`` (SMAC + warmstarting) consists of a Truncated SVD dimension reduction model followed by a Birch clustering model.
<img src="images/clustering_result_dim128.png" width="600">

- This dataset comprises 15 Gaussian clusters in 2-dimensional space with ``N = 5000`` points. The best configuration found by ``autocluster`` (SMAC + warmstarting) consists of a TSNE dimension reduction model followed by an Agglomerative clustering model.
<img src="images/clustering_result_s2.png" width="600">
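
For readers who want to see what a configuration like the first one looks like in code, here is a minimal ``sklearn`` sketch that chains Truncated SVD and Birch by hand. It does not use ``autocluster`` and does not reproduce the experiment: the data is a synthetic stand-in generated with ``make_blobs``, the hyperparameters (``n_components=8``, ``n_clusters=16``) are assumed values rather than the ones the optimizer selected, and the silhouette score is just one possible quality metric.

```python
from sklearn.cluster import Birch
from sklearn.datasets import make_blobs
from sklearn.decomposition import TruncatedSVD
from sklearn.metrics import silhouette_score

# Synthetic stand-in for the first dataset: 16 Gaussian clusters,
# 128 dimensions, 1024 points.
X, _ = make_blobs(n_samples=1024, n_features=128, centers=16, random_state=0)

# Stage 1: dimension reduction (n_components is an assumed value).
reducer = TruncatedSVD(n_components=8, random_state=0)
X_reduced = reducer.fit_transform(X)

# Stage 2: clustering on the reduced data (n_clusters is an assumed value).
model = Birch(n_clusters=16)
labels = model.fit_predict(X_reduced)

# Score the result on the reduced data; higher silhouette is better.
score = silhouette_score(X_reduced, labels)
print(f"clusters found: {len(set(labels))}, silhouette: {score:.3f}")
```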


## Links  
- [autocluster](https://pypi.org/project/autocluster/) on PyPI.
- A great [writeup](http://krasserm.github.io/2018/03/21/bayesian-optimization/) on Bayesian optimization by Martin Krasser.

## Disclaimer
The project is experimental and still under development.



            

Raw data

            {
    "_id": null,
    "home_page": "https://github.com/wywongbd/autocluster",
    "name": "autocluster",
    "maintainer": "",
    "docs_url": null,
    "requires_python": "",
    "maintainer_email": "",
    "keywords": "automl,clustering,bayesian-optimization,hyperparameter-optimization",
    "author": "Wong Wen Yan",
    "author_email": "wywongbd@gmail.com",
    "download_url": "https://files.pythonhosted.org/packages/c6/1d/47352630c57a530bdfa6e1db427890df51fb0858ae99b02cb611ec429a91/autocluster-0.5.3.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "BSD-3-clause",
    "summary": "Automated machine learning toolkit for performing clustering tasks.",
    "version": "0.5.3",
    "split_keywords": [
        "automl",
        "clustering",
        "bayesian-optimization",
        "hyperparameter-optimization"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "efa280b92e402899623f53f24116c793a14a2e3133c3a5d5221ee97335d05abf",
                "md5": "00fdeec0405ce734842d1f6a5e9ba2c9",
                "sha256": "548b2c02ca8d402314677a27bcf949520b0f67b6edb8a6bff3bb9aece0a23e09"
            },
            "downloads": -1,
            "filename": "autocluster-0.5.3-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "00fdeec0405ce734842d1f6a5e9ba2c9",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": null,
            "size": 27863,
            "upload_time": "2023-01-09T06:27:56",
            "upload_time_iso_8601": "2023-01-09T06:27:56.390845Z",
            "url": "https://files.pythonhosted.org/packages/ef/a2/80b92e402899623f53f24116c793a14a2e3133c3a5d5221ee97335d05abf/autocluster-0.5.3-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "c61d47352630c57a530bdfa6e1db427890df51fb0858ae99b02cb611ec429a91",
                "md5": "969b9ac11fa08603585bafec8ee5f708",
                "sha256": "c2842b656e3c2a6e177224194e48b23a8bcc0b46a9b6215f680dea2f7d85a009"
            },
            "downloads": -1,
            "filename": "autocluster-0.5.3.tar.gz",
            "has_sig": false,
            "md5_digest": "969b9ac11fa08603585bafec8ee5f708",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": null,
            "size": 23448,
            "upload_time": "2023-01-09T06:27:58",
            "upload_time_iso_8601": "2023-01-09T06:27:58.083477Z",
            "url": "https://files.pythonhosted.org/packages/c6/1d/47352630c57a530bdfa6e1db427890df51fb0858ae99b02cb611ec429a91/autocluster-0.5.3.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2023-01-09 06:27:58",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "github_user": "wywongbd",
    "github_project": "autocluster",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": false,
    "requirements": [],
    "lcname": "autocluster"
}
        