## autocluster
``autocluster`` is an automated machine learning (AutoML) toolkit for performing clustering tasks.
The project [report](https://github.com/wywongbd/autocluster/blob/master/reports/autocluster_report.pdf) and [presentation slides](https://github.com/wywongbd/autocluster/blob/master/reports/autocluster_ppt.pdf) are available on GitHub.
## Prerequisites
- Python 3.5 or above
- Linux (on Windows, [WSL](https://docs.microsoft.com/en-us/windows/wsl/about) also works)
## Getting started
1. First, install [SMAC](https://automl.github.io/SMAC3/stable/installation.html):
   - ``sudo apt-get install build-essential swig``
   - ``conda install gxx_linux-64 gcc_linux-64 swig``
   - ``pip install smac==0.8.0``
2. Then install ``autocluster``: ``pip install autocluster``
## How it works
- ``autocluster`` automatically optimizes the *configuration* of a clustering pipeline. By *configuration*, we mean:
  - the choice of dimension reduction algorithm
  - the choice of clustering model
  - the hyperparameter settings of the dimension reduction algorithm
  - the hyperparameter settings of the clustering model
- ``autocluster`` provides three approaches to optimizing the configuration, in order of increasing complexity:
  - random optimization
  - Bayesian optimization
  - Bayesian optimization + meta-learning (warmstarting)
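The notion of a configuration and the simplest of the three approaches, random optimization, can be sketched in plain Python. This is an illustration under stated assumptions, not ``autocluster``'s implementation: `CONFIG_SPACE`, `sample_configuration`, `random_search` and `toy_objective` are hypothetical names, the hyperparameter ranges are made up, and the toy objective stands in for a real clustering quality score. Note that each algorithm's hyperparameters are only sampled when that algorithm is chosen. The `initial_configs` argument loosely mimics warmstarting by evaluating seed configurations (for example, ones that did well on similar datasets) before any random sampling; in ``autocluster``, warmstarting is paired with Bayesian optimization, which additionally uses past evaluations to decide which configuration to try next.

```python
import random
from typing import Callable, Dict, List, Optional, Tuple

Config = Dict[str, dict]

# Hypothetical search space: each algorithm maps to its own hyperparameters,
# given as (type, low, high). Names mirror sklearn estimators; the ranges are
# illustrative only and are NOT autocluster's actual search space.
CONFIG_SPACE = {
    "dim_reduction": {
        "PCA": {"n_components": ("int", 2, 10)},
        "TruncatedSVD": {"n_components": ("int", 2, 10)},
    },
    "clustering": {
        "KMeans": {"n_clusters": ("int", 2, 20)},
        "Birch": {"n_clusters": ("int", 2, 20), "threshold": ("float", 0.1, 1.0)},
        "DBSCAN": {"eps": ("float", 0.1, 2.0), "min_samples": ("int", 2, 20)},
    },
}


def sample_configuration(rng: random.Random) -> Config:
    """Pick one algorithm per stage, then sample only that algorithm's hyperparameters."""
    config = {}
    for stage, algorithms in CONFIG_SPACE.items():
        name = rng.choice(sorted(algorithms))
        params = {}
        for param, (kind, low, high) in algorithms[name].items():
            params[param] = rng.randint(low, high) if kind == "int" else rng.uniform(low, high)
        config[stage] = {"algorithm": name, "params": params}
    return config


def random_search(
    objective: Callable[[Config], float],
    n_trials: int,
    seed: int = 0,
    initial_configs: Optional[List[Config]] = None,
) -> Tuple[Optional[Config], float]:
    """Minimise `objective` over n_trials configurations.

    Any `initial_configs` are evaluated first (a loose stand-in for
    warmstarting); the remaining trials are sampled at random.
    """
    rng = random.Random(seed)
    seeds = list(initial_configs or [])
    best_config, best_score = None, float("inf")
    for trial in range(n_trials):
        config = seeds[trial] if trial < len(seeds) else sample_configuration(rng)
        score = objective(config)
        if score < best_score:  # keep the first configuration with the lowest score
            best_config, best_score = config, score
    return best_config, best_score


# Hypothetical toy objective (lower is better). A real objective would fit the
# dimension reduction + clustering pipeline on the data and score the labels.
def toy_objective(config: Config) -> float:
    clustering = config["clustering"]
    penalty = 0.0 if clustering["algorithm"] == "Birch" else 1.0
    return abs(clustering["params"].get("n_clusters", 0) - 8) + penalty


best, score = random_search(toy_objective, n_trials=50, seed=0)
print(best, score)
```

Because the loop only needs a sampler and an objective, replacing `sample_configuration` with a model-guided proposal step is, roughly, what distinguishes Bayesian optimization from this random baseline.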
## Algorithms/Models supported
- List of dimension reduction algorithms in ``sklearn`` supported by ``autocluster``'s optimizer.
<img src="images/dim_reduction_algorithms.png" width="600">
- List of clustering models in ``sklearn`` supported by ``autocluster``'s optimizer.
<img src="images/clustering_algorithms.png" width="600">
## Examples
Examples are available in these [notebooks](/autocluster/examples/).
## Experimental results
- This dataset comprises 16 Gaussian clusters in 128-dimensional space with ``N = 1024`` points. The best configuration found by ``autocluster`` (SMAC + warmstarting) consists of a Truncated SVD dimension reduction model followed by a Birch clustering model.
<img src="images/clustering_result_dim128.png" width="600">
- This dataset comprises 15 Gaussian clusters in 2-dimensional space with ``N = 5000`` points. The best configuration found by ``autocluster`` (SMAC + warmstarting) consists of a t-SNE dimension reduction model followed by an Agglomerative clustering model.
<img src="images/clustering_result_s2.png" width="600">
## Links
- [``autocluster`` on PyPI](https://pypi.org/project/autocluster/)
- A great [write-up](http://krasserm.github.io/2018/03/21/bayesian-optimization/) on Bayesian optimization by Martin Krasser
## Disclaimer
The project is experimental and still under development.