# classgraphic
[![made-with-python](https://img.shields.io/badge/Made%20with-Python-1f425f.svg)](https://www.python.org/)
[![image](https://img.shields.io/pypi/v/classgraphic.svg)](https://pypi.python.org/pypi/classgraphic)
![Dev](https://github.com/dionresearch/classgraphic/actions/workflows/dev.yml/badge.svg)
[![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/dionresearch/classgraphic/HEAD?labpath=notebooks%2FClassGraphic_iris_demo.ipynb)
Interactive classification diagnostic plots for scikit-learn.
![coin sorting machine](https://github.com/dionresearch/classgraphic/raw/main/docs/source/sorter_patent.jpg)
> We classify things for the purpose of doing something to them. Any classification which does not assist manipulation is worse than useless. - Randolph S. Bourne,
"Education and Living", The Century Co (April 1917)
# Major features:
Plotly based tables for:
- class_imbalance_table
- classification_table
- confusion_matrix_table
- describe (dataframe stats)
- prediction_table
- table
And the following charts:
- class_imbalance
- class_error
- det
- feature_importance
- missing
- precision_recall
- roc
- prediction_histogram
- threshold
For clustering:
- Delaunay triangulations
- Voronoi tessellations
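
As a rough illustration of what a Voronoi tessellation encodes (this is not classgraphic's own implementation): every point of the plane belongs to the cell of its nearest seed. The sketch below uses only the standard library; the `nearest_seed` helper and the sample coordinates are hypothetical.

```python
import math


def nearest_seed(point, seeds):
    """Return the index of the seed closest to `point` (Euclidean distance)."""
    return min(range(len(seeds)), key=lambda i: math.dist(point, seeds[i]))


# Hypothetical cluster centers acting as Voronoi seeds.
seeds = [(0.0, 0.0), (4.0, 0.0), (2.0, 3.0)]

# Each query point falls in the Voronoi cell of its nearest seed.
queries = [(0.5, 0.5), (3.5, 0.2), (2.0, 2.5)]
cells = [nearest_seed(q, seeds) for q in queries]
print(cells)
```

A Delaunay triangulation is the dual structure: it connects the seeds whose Voronoi cells share an edge.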
# Try it
[![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/dionresearch/classgraphic/HEAD?labpath=notebooks%2FClassGraphic_iris_demo.ipynb)
By trying it on Binder, you'll see all the details and interactivity. The quickstart below
has static images, but if you run these commands in a Jupyter notebook, IPython or an IDE, you will
be able to interact with them.
# Quickstart
```python
from classgraphic.essential import *
# loading the data
df = px.data.iris()
# let's see what kind of data we have
describe(df, transpose=True).show()
```
![dataframe describe table](https://github.com/dionresearch/classgraphic/raw/main/docs/source/describe.png)
```python
# any missing?
missing(df)
```
![missing values chart](https://github.com/dionresearch/classgraphic/raw/main/docs/source/missing.png)
```python
# features
X = df.drop(columns=["species", "species_id"])
# target
y = df["species"]
# Let's check our classes we will be training on and predicting
class_imbalance_table(y, condition="all")
```
![class imbalance table](https://github.com/dionresearch/classgraphic/raw/main/docs/source/imbalance_table.png)
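
For intuition, the kind of per-class summary such a table reports can be approximated with the standard library. This is only a sketch, not how `class_imbalance_table` is implemented; the `class_counts` helper and the label list are made up for illustration.

```python
from collections import Counter


def class_counts(labels):
    """Return {label: (count, percentage)} ordered by descending count."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {
        label: (n, round(100 * n / total, 1))
        for label, n in counts.most_common()
    }


# Hypothetical, deliberately imbalanced labels (not the iris data).
labels = ["setosa"] * 6 + ["versicolor"] * 3 + ["virginica"] * 1
summary = class_counts(labels)
for label, (n, pct) in summary.items():
    print(f"{label:<12}{n:>4}{pct:>7}%")
```

A strongly skewed percentage column is the signal to look for before training, since it affects how accuracy-style metrics should be read.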
```python
# train / test split
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.5, random_state=random_state
)
# we want to see total count for each, default for bars is to be stacked, so that works
# we could also pass to class_imbalance barmode="overlay" if we prefer
class_imbalance(y_train, y_test, condition="train,test")
```
![class imbalance chart](https://github.com/dionresearch/classgraphic/raw/main/docs/source/class_imbalance.png)
```python
# model
model = LogisticRegression(max_iter=max_iter, random_state=random_state)
model.fit(X_train, y_train)
# predictions
y_score = model.predict_proba(X_test)
y_pred = model.predict(X_test)
confusion_matrix_table(model, y_test, y_pred).show()
classification_table(model, y_test, y_pred)
```
![confusion matrix table](https://github.com/dionresearch/classgraphic/raw/main/docs/source/confusion.png)
![classification table](https://github.com/dionresearch/classgraphic/raw/main/docs/source/classification_table.png)
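
The numbers behind a confusion matrix, and the per-class precision and recall a classification report usually lists, can be sketched in plain Python. This is a minimal illustration under assumed inputs, not classgraphic's code; the helper names and the demo labels are hypothetical and are not output from the quickstart model.

```python
from collections import Counter


def confusion_counts(y_true, y_pred):
    """Count (true label, predicted label) pairs."""
    return Counter(zip(y_true, y_pred))


def precision_recall(counts, label):
    """Per-class precision and recall computed from the pair counts."""
    tp = counts[(label, label)]
    predicted = sum(n for (_, p), n in counts.items() if p == label)
    actual = sum(n for (t, _), n in counts.items() if t == label)
    precision = tp / predicted if predicted else 0.0
    recall = tp / actual if actual else 0.0
    return precision, recall


# Hypothetical labels for illustration only.
y_true_demo = ["a", "a", "a", "b", "b", "c"]
y_pred_demo = ["a", "a", "b", "b", "b", "c"]
counts = confusion_counts(y_true_demo, y_pred_demo)
print(precision_recall(counts, "b"))
```

Here one `a` sample is predicted as `b`, so class `b` keeps perfect recall but loses precision, which is the kind of asymmetry the two tables make visible.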
```python
feature_importance(model, y, transpose=True)
```
![dataframe describe tale](https://github.com/dionresearch/classgraphic/raw/main/docs/source/feature.png)
This concludes the quickstart. There are many more visualizations and tables to explore.
See the `notebooks` and `docs` folders on [GitHub](https://github.com/dionresearch/classgraphic) and the documentation
[website](https://dionresearch.github.io/classgraphic/) for more information.
# Requirements
- Python 3.8 or later
- numpy
- pandas
- plotly>=5.0
- scikit-learn
- nbformat
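
A quick way to check an environment against this list is sketched below, using only the standard library. The `missing_modules` helper is hypothetical, and the check only confirms that each module can be found; it does not verify the `plotly>=5.0` version constraint.

```python
import importlib.util
import sys

# Import names for the packages listed above; note that scikit-learn
# is imported as `sklearn`.
REQUIRED_MODULES = ["numpy", "pandas", "plotly", "sklearn", "nbformat"]


def missing_modules(names):
    """Return the module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


python_ok = sys.version_info >= (3, 8)
print("Python 3.8+:", python_ok)
print("Missing modules:", missing_modules(REQUIRED_MODULES) or "none")
```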
# Install
If you use conda, create an environment named `classgraphic`, then activate it:
- In Linux:
`source activate classgraphic`
- In Windows:
`conda activate classgraphic`
If you use another environment manager, create and activate your environment
using its normal steps.
Then execute:
```sh
python setup.py install
```
or for installing in [development mode](https://pip.pypa.io/en/latest/cli/pip_install/#install-editable):
```sh
python -m pip install -e . --no-build-isolation
```
or alternatively
```sh
python setup.py develop
```
To install from GitHub instead:
```shell
pip install git+https://github.com/dionresearch/classgraphic
```
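
Whichever install route is used, the result can be confirmed from Python. The sketch below relies on the standard library's `importlib.metadata` (available in Python 3.8 and later); the `installed_version` helper is a hypothetical name, not part of classgraphic.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(distribution):
    """Return the installed version string, or None if it is not installed."""
    try:
        return version(distribution)
    except PackageNotFoundError:
        return None


print("classgraphic:", installed_version("classgraphic") or "not installed")
```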
# See also
- [stemgraphic](https://github.com/dionresearch/stemgraphic) python package for visualization of data and text
- [Hotelling](https://github.com/dionresearch/hotelling) one and two sample Hotelling T2 tests, T2 and f statistics and univariate and multivariate control charts and anomaly detection
# History
## 0.3.1 (2023-09-20)
* bugfix for describe with pandas 2.x
## 0.3.0 (2023-05-01)
* added 2D clustering visualization
* defaults to Voronoi tessellation
* optional Delaunay triangulation
## 0.2.1 (2022-09-20)
* fixed image not showing on pypi
* fixed feature importance error
* fixed `warning = False` not preventing the warning from showing
## 0.2.1 (2022-09-19)
* added binary classification notebook example
* fixed issue with non dataframe binary classification
## 0.2.0 (2022-09-18)
The previous version was a first step toward a public release. This
release:
* added documentation
* updated the code to be in line with plotly 5.x
It was released to [GitHub](https://github.com/dionresearch/classgraphic) and PyPI.
## 0.1.0 (2019-10-27)
* First private release
## Origins
Inspired by Dion Research LLC's internal EDA/anomaly and end-to-end data science platform.
A dozen charts and tables were initially designed to provide better diagnostic reporting.
Some can also be used for exploratory or explanatory purposes.
See:
https://blog.dionresearch.com/2019/10/visualizations-explanatory-exploratory.html