# ParTree - Data Partitioning through Tree-based Clustering Method
Most existing clustering methods only assign records to clusters without justifying the partitioning. We propose tree-based clustering methods that offer interpretable data partitioning through a shallow decision tree.
These decision trees explain cluster assignments through short, easy-to-understand split conditions.
In experiments on synthetic and real datasets, the proposed methods proved more effective than both traditional and interpretable clustering approaches in terms of standard evaluation measures and runtime.
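To make the idea of explaining cluster assignments through split conditions concrete, here is a minimal sketch in plain Python. It is not ParTree's implementation: the `Node` class, the `assign` helper, the hand-built tree, and the feature names `age` and `glucose` are all hypothetical and serve only to show how a shallow tree of axis-parallel splits yields both a cluster id and a short rule for each record.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Node:
    """A node of a toy partitioning tree (illustrative, not a ParTree class)."""
    feature: Optional[int] = None      # index of the feature tested at this node
    threshold: Optional[float] = None  # records with value <= threshold go left
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    cluster: Optional[int] = None      # set only on leaves


def assign(node: Node, record: Sequence[float], names: Sequence[str]):
    """Return (cluster id, list of split conditions) for one record."""
    conditions = []
    while node.cluster is None:
        if record[node.feature] <= node.threshold:
            conditions.append(f"{names[node.feature]} <= {node.threshold}")
            node = node.left
        else:
            conditions.append(f"{names[node.feature]} > {node.threshold}")
            node = node.right
    return node.cluster, conditions


# Hand-built depth-2 tree over two hypothetical features.
tree = Node(
    feature=0, threshold=30.0,
    left=Node(cluster=0),
    right=Node(feature=1, threshold=120.0,
               left=Node(cluster=1), right=Node(cluster=2)),
)
names = ["age", "glucose"]

cluster, rule = assign(tree, [45.0, 150.0], names)
print(cluster, " AND ".join(rule))
```

Because the tree is shallow, every rule is a conjunction of at most two conditions, which is what keeps the explanation short.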
## Setup
### Using PyPI
```bash
pip install partree
```
### Manual Setup
```bash
git clone https://github.com/cri98li/ParTree
cd ParTree
pip install -e .
```
## Running the code
```python
import pandas as pd
from ParTree import PrincipalParTree, print_rules

# load the data (`dataset` is a placeholder: point it at your own CSV file)
dataset = "data.csv"
df = pd.read_csv(dataset)
X = df.values

# train the model
partree = PrincipalParTree()
partree.fit(X)

# extract the cluster label assigned to each row
labels = partree.labels_

# get the row explanations in a dictionary-like structure and print them
rules = partree.get_rules()
print(print_rules(rules, None, feature_names=df.columns))
```
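Once the labels are extracted, a quick sanity check is to look at how many records fall into each cluster. The sketch below assumes that `labels_` holds one integer cluster id per row, as is conventional for scikit-learn style estimators; the `labels` list here is a made-up stand-in so the snippet runs without a fitted model.

```python
from collections import Counter

# Stand-in for `partree.labels_`; assumed to hold one integer cluster id per row.
# With a real model you could use `labels = list(partree.labels_)` instead.
labels = [0, 2, 1, 0, 0, 2]

# Count how many records were assigned to each cluster.
cluster_sizes = Counter(labels)

for cluster_id, size in sorted(cluster_sizes.items()):
    share = size / len(labels)
    print(f"cluster {cluster_id}: {size} records ({share:.0%})")
```

Very small or very large clusters are often the first ones worth inspecting alongside their rules.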
You can find the software documentation in the `/docs/` folder, and
a PowerPoint presentation on Geolet can be found [here]().
You can cite this work with:
```
TODO
```
## Additional Material
Clustering logic visualizations for the diabetes dataset. From left to right: ParTree, k-Means, and hierarchical clustering:
<p float="left">
<img src="img/tree_dendo_parallel_DEF.jpg" width="33%" />
<img src="img/tree_dendo_parallel_DEF3.jpg" width="33%" />
<img src="img/tree_dendo_parallel_DEF5.jpg" width="33%" />
</p>