knac-toolkit


| Field | Value |
| --- | --- |
| Name | knac-toolkit |
| Version | 1.0.2 |
| Summary | Knowledge Augmented Clustering |
| Upload time | 2024-02-29 10:37:49 |
| Requires Python | >=3.8 |
| Keywords | xai, clustering, explainability, model-agnostic, rule-based |
| Requirements | No requirements were recorded. |

# Google Colab Tutorial
Please follow the tutorial on using KnAC for text clustering and fill in the survey at the end: [Colab Notebook](https://colab.research.google.com/drive/1SJaG_wW0h1_JaPk40vPNP3dpTJGa1xXG)
# Knowledge Augmented Clustering (KnAC)
KnAC is a toolkit for extending expert knowledge with the help of automatic clustering algorithms.
It refines expert-based labeling by recommending splits and merges of expert clusters, with each recommendation accompanied by an explanation.
The explanations are formulated as rules, so they can be easily interpreted and combined with expert knowledge.

Possible integration with [CLAMP](https://github.com/sbobek/clamp) and [LUX](https://github.com/sbobek/lux) is currently under development.

The overall workflow of KnAC is shown in the figure below:
![Workflow for KnAC](https://raw.githubusercontent.com/sbobek/knac/main/pix/workflow.png "Title")


## Install
KnAC can be installed either from [PyPI](https://pypi.org/project/knac) or directly from the source code on [GitHub](https://github.com/sbobek/knac).

To install from PyPI:

```
pip install knac-toolkit
```

To install from source:

```
git clone https://github.com/sbobek/knac
cd knac
pip install .
```
After that you can install and run `jupyter lab` and navigate to the `examples` directory to run the notebooks.
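
To confirm that the installation succeeded, the installed version can be queried from Python. The following is a minimal sketch that uses only the standard library (`importlib.metadata`, available from Python 3.8). The helper name `installed_version` is hypothetical and not part of KnAC.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(distribution="knac-toolkit"):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(distribution)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version()
    print(found if found is not None else "knac-toolkit is not installed")
```

The lookup uses the distribution name (`knac-toolkit`), which is the name passed to `pip install`.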

## Splitting example
A synthetic dataset with a cluster to split is presented below. The columns in the figure show the clustering based on expert knowledge, the automated clustering, and the $H^{split}$ matrix. In this example, the expert clustering defines cluster 1, which according to the data should in fact be split.

![](https://raw.githubusercontent.com/sbobek/knac/main/pix/split-toy-example.png)

For such a case we get the following KnAC recommendations:

``` python
knac_splits = KnacSplits(confidence_threshold=0.9, silhouette_weight=0.2)
knac_splits_recoms = knac_splits.fit_transform(confusion_matrix,
                                               y=None, data=data,
                                               labels_automatic=data['Automatic_Clusters'].astype(str),
                                               labels_expert=XX2['Expert_Clusters'])

# Output:
Expert_Clusters
1    [(1, 2), 0.8332849823568992]
```

This should be read as: split expert cluster 1 into clusters 1 and 2, with a confidence of 0.83.
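
The reading rule above can be sketched in a few lines of plain Python. The recommendation is copied from the output shown above into an ordinary dictionary, which is an assumption made for illustration; the object actually returned by `fit_transform` may be of a different type. The helper name `describe_split` is hypothetical.

```python
def describe_split(expert_cluster, recommendation):
    """Turn one split recommendation into a human-readable sentence."""
    # Each recommendation pairs the target clusters with a confidence score.
    targets, confidence = recommendation
    target_text = " and ".join(str(t) for t in targets)
    return (f"Split expert cluster {expert_cluster} into clusters "
            f"{target_text} with confidence {confidence:.2f}")


# Values copied from the output above; the dict wrapper is an assumption.
split_recommendations = {1: [(1, 2), 0.8332849823568992]}

for cluster, recommendation in split_recommendations.items():
    print(describe_split(cluster, recommendation))
```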

For this recommendation, the following justifications are generated. They describe the differences between the clusters resulting from the split and show that the most important difference lies in the **x1** variable, at a value of around -0.9, which is consistent with what we can see in the plot above.

``` python
justify_splits_tree(expert_to_split=expert_to_split,
                    split_recoms=split_recoms,
                    data=data,
                    features=features,
                    target_automatic='Automatic_Clusters')

# Output:
['if (x1 > -0.903) then class: 2 (proba: 100.0%) | based on 100 samples',
 'if (x1 <= -0.903) then class: 1 (proba: 100.0%) | based on 100 samples']
```
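
The two rules above have the form of a one-level decision tree on a single feature. As an illustration of how such a threshold can arise, the sketch below scans candidate thresholds on one feature and keeps the one that misclassifies the fewest samples. This is not KnAC's implementation, which is not shown here; the function name `best_threshold` and the toy values are hypothetical.

```python
def best_threshold(values, labels):
    """Find the single-feature split with the fewest misclassified samples.

    Returns (threshold, errors); samples with value <= threshold go left.
    """
    pairs = sorted(zip(values, labels))
    classes = sorted(set(labels))
    best = (None, len(pairs) + 1)
    for i in range(1, len(pairs)):
        low, high = pairs[i - 1][0], pairs[i][0]
        if low == high:
            continue  # no threshold fits between identical values
        threshold = (low + high) / 2
        left = [label for value, label in pairs if value <= threshold]
        right = [label for value, label in pairs if value > threshold]
        # Each side predicts its majority class; the rest are errors.
        errors = (len(left) - max(left.count(c) for c in classes)
                  + len(right) - max(right.count(c) for c in classes))
        if errors < best[1]:
            best = (threshold, errors)
    return best


# Hypothetical x1 values with their automatic cluster labels.
x1 = [-2.0, -1.5, -1.0, -0.8, 0.0, 0.5]
clusters = [1, 1, 1, 2, 2, 2]
print(best_threshold(x1, clusters))
```

With these toy values the selected threshold lies midway between -1.0 and -0.8 and separates the two clusters without errors, mirroring the shape of the rules above.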

## Merging example
A synthetic dataset with clusters to merge is presented below. The columns in the figure show the clustering based on expert knowledge, the automated clustering, and the $H^{merge}$ matrix. In this example, the expert clustering defines clusters 0 and 3, which according to the data should in fact be merged.

![](https://raw.githubusercontent.com/sbobek/knac/main/pix/merge-toy-example.png)

For such a case we get the following KnAC recommendations:

``` python
knac1_merges = KnacMerges(confidence_threshold=0.9,
                          metric='centroids_link',
                          metric_weight=0.2)
knac_merges_recoms = knac1_merges.fit_transform(confusion_matrix,
                                                data=data[['x1', 'x2']].values,
                                                labels_expert=data['Expert_Clusters'])

# Output:
C1  C2  similarity
0   3   0.958983
```

This should be read as: expert clusters C1 and C2 (here 0 and 3) should be merged, because their similarity (a combination of the chosen link metric and the similarity in distribution between expert clusters) is equal to 0.96.
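
The metric name `centroids_link` suggests a centroid-linkage measure, which in its standard form is the distance between the centroids of two clusters. The sketch below computes that distance in plain Python. How KnAC turns it into a similarity and weights it with `metric_weight` is not described here, so that step is deliberately left out. The helper names and the points are hypothetical.

```python
from math import dist
from statistics import fmean


def centroid(points):
    """Return the coordinate-wise mean of a list of equally sized points."""
    return tuple(fmean(axis) for axis in zip(*points))


def centroid_link_distance(cluster_a, cluster_b):
    """Euclidean distance between the centroids of two clusters."""
    return dist(centroid(cluster_a), centroid(cluster_b))


# Hypothetical (x1, x2) points for two expert clusters.
expert_cluster_0 = [(0.0, 0.0), (2.0, 0.0)]
expert_cluster_3 = [(1.0, 3.0), (1.0, 5.0)]
print(centroid_link_distance(expert_cluster_0, expert_cluster_3))
```

A small centroid distance relative to the spread of the clusters is what would make two expert clusters candidates for merging under this kind of measure.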


For this recommendation, the following justifications are generated. They describe the differences between the expert clusters to merge and show that the most important difference lies in the **x2** variable, at a value of around -5. It is the expert's role to decide, taking domain knowledge into account, whether this difference is significant (in this case one can assume that the difference described by the rules is not relevant for distinguishing two separate clusters).

``` python
justify_merges_tree(merge_recoms=merge_recoms, data=data, features=features, target_expert='Expert_Clusters')

# Output:
['if (x2 <= -5.065) then class: 0 (proba: 98.21%) | based on 56 samples',
 'if (x2 > -5.065) then class: 3 (proba: 97.78%) | based on 45 samples']
```
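
The `proba` and sample counts in these rules can be read like a decision-tree leaf, assuming the usual convention that `proba` is the share of the majority class among the samples that satisfy the condition. The sketch below reproduces the numbers under that assumption. The class counts (55 of 56 and 44 of 45) are inferred from the percentages, not reported by KnAC, and `leaf_summary` is a hypothetical helper.

```python
from collections import Counter


def leaf_summary(labels):
    """Return (majority class, majority share in percent, sample count)."""
    counts = Counter(labels)
    majority, hits = counts.most_common(1)[0]
    return majority, round(100.0 * hits / len(labels), 2), len(labels)


# Inferred label counts for the samples on each side of x2 = -5.065.
below = [0] * 55 + [3]      # samples with x2 <= -5.065
above = [3] * 44 + [0]      # samples with x2 > -5.065

print(leaf_summary(below))  # (0, 98.21, 56)
print(leaf_summary(above))  # (3, 97.78, 45)
```

Under this reading, only one sample on each side of the threshold belongs to the other expert cluster, which is the kind of evidence the expert weighs when deciding whether the two clusters are genuinely distinct.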


# Cite this work

```
@Article{bobek2022knac,
  author="Bobek, Szymon
  and Kuk, Micha{\l}
  and Brzegowski, Jakub
  and Brzychczy, Edyta
  and Nalepa, Grzegorz J.",
  title="KnAC: an approach for enhancing cluster analysis with background knowledge and explanations",
  journal="Applied Intelligence",
  year="2022",
  month="Nov",
  day="23",
  issn="1573-7497",
  doi="10.1007/s10489-022-04310-9",
  url="https://doi.org/10.1007/s10489-022-04310-9"
}
```


            

Raw data

            {
    "_id": null,
    "home_page": "",
    "name": "knac-toolkit",
    "maintainer": "",
    "docs_url": null,
    "requires_python": ">=3.8",
    "maintainer_email": "",
    "keywords": "xai,clustering,explainability,model-agnostic,rule-based",
    "author": "",
    "author_email": "Szymon Bobek <szymon.bobek@gmail.com>",
    "download_url": "https://files.pythonhosted.org/packages/f8/8d/0f3c9bc364747fab481e61e4c5679a8b3f51775bd1873e536ccdd5bd8050/knac-toolkit-1.0.2.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "",
    "summary": "Knowledge Augmented Clustering",
    "version": "1.0.2",
    "project_urls": {
        "Documentation": "https://knac-toolkit.readthedocs.org",
        "Homepage": "https://github.com/sbobek/knac",
        "Issues": "https://github.com/sbobek/knac/issues"
    },
    "split_keywords": [
        "xai",
        "clustering",
        "explainability",
        "model-agnostic",
        "rule-based"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "d7ae028d91cdad8f64d96311a335d2d23d0d8e86aaf16a8eae0584b052d65fe9",
                "md5": "8f4a6e58caa09a0e192d2151e04c4ad5",
                "sha256": "ae154d5ceb7a8173ba3db49e6b166949e9c0b69a379037cbe8d62b2f014b9266"
            },
            "downloads": -1,
            "filename": "knac_toolkit-1.0.2-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "8f4a6e58caa09a0e192d2151e04c4ad5",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.8",
            "size": 8900,
            "upload_time": "2024-02-29T10:37:47",
            "upload_time_iso_8601": "2024-02-29T10:37:47.664213Z",
            "url": "https://files.pythonhosted.org/packages/d7/ae/028d91cdad8f64d96311a335d2d23d0d8e86aaf16a8eae0584b052d65fe9/knac_toolkit-1.0.2-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "f88d0f3c9bc364747fab481e61e4c5679a8b3f51775bd1873e536ccdd5bd8050",
                "md5": "1df527ce5ed7ac5525d77309e5b02bb9",
                "sha256": "e9475ff21ab756321da573cdfc4bfc17f48f141812785dec1d6ea9eff6ea758a"
            },
            "downloads": -1,
            "filename": "knac-toolkit-1.0.2.tar.gz",
            "has_sig": false,
            "md5_digest": "1df527ce5ed7ac5525d77309e5b02bb9",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.8",
            "size": 10733,
            "upload_time": "2024-02-29T10:37:49",
            "upload_time_iso_8601": "2024-02-29T10:37:49.611888Z",
            "url": "https://files.pythonhosted.org/packages/f8/8d/0f3c9bc364747fab481e61e4c5679a8b3f51775bd1873e536ccdd5bd8050/knac-toolkit-1.0.2.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2024-02-29 10:37:49",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "sbobek",
    "github_project": "knac",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": false,
    "requirements": [],
    "lcname": "knac-toolkit"
}
        