hiclass


- Name: hiclass
- Version: 4.13.3
- Summary: Hierarchical Classification Library.
- Upload time: 2024-12-06 21:35:53
- Authors: Fabio Malcher Miranda, Niklas Koehnecke
- Requires Python: `<3.13,>=3.8`
- License: BSD 3-Clause
- Keywords: hierarchical classification

# HiClass

HiClass is an open-source Python library for hierarchical classification compatible with scikit-learn.

[![Deploy PyPI](https://github.com/scikit-learn-contrib/hiclass/actions/workflows/deploy-pypi.yml/badge.svg?event=push)](https://github.com/scikit-learn-contrib/hiclass/actions/workflows/deploy-pypi.yml) [![Documentation Status](https://readthedocs.org/projects/hiclass/badge/?version=latest)](https://hiclass.readthedocs.io/en/latest/?badge=latest) [![codecov](https://codecov.io/gh/scikit-learn-contrib/hiclass/branch/main/graph/badge.svg?token=PR8VLBMMNR)](https://codecov.io/gh/scikit-learn-contrib/hiclass) [![Downloads PyPI](https://static.pepy.tech/personalized-badge/hiclass?period=total&units=international_system&left_color=grey&right_color=brightgreen&left_text=pypi)](https://pypi.org/project/hiclass/) [![Downloads Conda](https://img.shields.io/conda/dn/conda-forge/hiclass?label=conda)](https://anaconda.org/conda-forge/hiclass) [![License](https://img.shields.io/badge/License-BSD_3--Clause-blue.svg)](https://opensource.org/licenses/BSD-3-Clause) [![Code style: black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/psf/black)

✨ Here is a **demo** that shows HiClass in action on hierarchical data:

- Classify a consumer complaints dataset from the Consumer Financial Protection Bureau: [consumer-complaints](https://colab.research.google.com/drive/1rQTDxWcck-PH4saKzrofQ7Sg9W23lYZv?usp=sharing)

## Quick links

- [Features](#features)
- [Benchmarks](#benchmarks)
- [Roadmap](#roadmap)
- [Who is using HiClass?](#who-is-using-hiclass)
- [Install](#install)
- [Quick start](#quick-start)
- [Explaining Hierarchical Classifiers](#explaining-hierarchical-classifiers)
- [Step-by-step walk-through](#step-by-step-walk-through)
- [API documentation](#api-documentation)
- [FAQ](#faq)
- [Support](#support)
- [Contributing](#contributing)
- [Getting the latest updates](#getting-the-latest-updates)
- [Citation](#citation)

## Features

- **Python lists and NumPy arrays:** Handles Python lists and NumPy arrays elegantly, out-of-the-box.
- **Pandas Series and DataFrames:** If you prefer pandas, HiClass also works with pandas Series and DataFrames.
- **Sparse matrices:** HiClass also supports features (X_train and X_test) built with sparse matrices, both for training and predicting, which can save you heaps of memory.
- **[Parallel training](https://hiclass.readthedocs.io/en/latest/auto_examples/plot_parallel_training.html):** The local classifiers of a hierarchical model can be trained in parallel, so training is parallelized even when the underlying scikit-learn estimator has no parallel implementation.
- **[Build pipelines](https://hiclass.readthedocs.io/en/latest/auto_examples/plot_pipeline.html):** Since the hierarchical classifiers inherit from the BaseEstimator of scikit-learn, pipelines can be built to automate machine learning workflows.
- **[Hierarchical metrics](https://hiclass.readthedocs.io/en/latest/api/utilities.html#hierarchical-metrics):** HiClass supports the computation of hierarchical precision, recall and f-score, which are more appropriate for hierarchical data than traditional metrics.
- **[Compatible with pickle](https://hiclass.readthedocs.io/en/latest/auto_examples/plot_model_persistence.html):** Easily store trained models on disk for future use.
- **[BERT sklearn](https://hiclass.readthedocs.io/en/latest/auto_examples/plot_bert.html):** Compatible with the library [BERT sklearn](https://github.com/charles9n/bert-sklearn).
- **[Hierarchical Explainability](https://hiclass.readthedocs.io/en/latest/algorithms/explainer.html):** HiClass allows explaining hierarchical models using the [SHAP](https://github.com/shap/shap) package.

**Any feature missing on this list?** Search our [issue tracker](https://github.com/scikit-learn-contrib/hiclass/issues) to see if someone has already requested it and add a comment to it explaining your use-case. Otherwise, please open a new issue describing the requested feature and possible use-case scenario. We prioritize our roadmap based on user feedback, so we would love to hear from you.
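
To make the hierarchical metrics mentioned in the feature list more concrete, here is a minimal sketch of the usual set-based definition of hierarchical precision, recall and f-score, written in plain Python. The function name `hierarchical_scores` is hypothetical and is not part of the HiClass API; HiClass ships its own implementation, which may differ in details such as averaging.

```python
def hierarchical_scores(y_true, y_pred):
    """Return (precision, recall, f-score) over root-to-leaf label paths.

    Hypothetical helper for illustration only, not the HiClass API.
    Each row is treated as the set of labels on the path from root to leaf.
    """
    overlap = predicted = actual = 0
    for true_path, pred_path in zip(y_true, y_pred):
        true_set, pred_set = set(true_path), set(pred_path)
        overlap += len(true_set & pred_set)  # labels predicted correctly
        predicted += len(pred_set)           # all predicted labels
        actual += len(true_set)              # all true labels
    precision = overlap / predicted if predicted else 0.0
    recall = overlap / actual if actual else 0.0
    total = precision + recall
    f_score = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_score


y_true = [["Animal", "Mammal", "Sheep"], ["Animal", "Reptile", "Snake"]]
y_pred = [["Animal", "Mammal", "Cow"], ["Animal", "Reptile", "Snake"]]

precision, recall, f_score = hierarchical_scores(y_true, y_pred)
print(precision, recall, f_score)
```

In this toy example the first prediction misses the leaf but gets both ancestors right, so it still earns partial credit, whereas a flat exact-match metric would count it as entirely wrong.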

## Benchmarks

### Consumer complaints dataset with ~600K training examples

This first benchmark was executed on Google Colab with only 1 core, using Logistic Regression as the base classifier.

|Classifier|Training Time (hh:mm:ss)|Memory Usage (GB)|Disk Usage (MB)|F-score|
|----------|:-----------------------:|:---------------:|:-------------:|:-----:|
|[Local Classifier per Parent Node](https://colab.research.google.com/drive/1yZlQ9UnBEGdkIpnJ3pBwvbZ-U0SXL-UG?usp=sharing)|00:52:58|5.28|121|**0.7689**|
|[Local Classifier per Node](https://colab.research.google.com/drive/1rQTDxWcck-PH4saKzrofQ7Sg9W23lYZv?usp=sharing)|**00:33:02**|**4.87**|123|0.7647|
|[Local Classifier per Level](https://colab.research.google.com/drive/1b_Qb2d6RhSO7ICYTIsxH6ZqCVgeKWmll?usp=sharing)|04:14:45|10.71|123|0.7684|
|[Flat Classifier](https://colab.research.google.com/drive/10jgzA65WaoTc7tFfrlKlhlwPBs3PFy9m?usp=sharing)|03:20:26|9.57|**107**|0.7636|
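
As a small aid for reading the table above, the following sketch converts the reported training times to seconds and relates each one to the flat classifier. The helper `to_seconds` is a hypothetical name introduced here; the times are copied from the table.

```python
def to_seconds(hms):
    """Convert an 'hh:mm:ss' string to a number of seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


# Training times copied from the first benchmark (1 core, logistic regression)
training_times = {
    "Local Classifier per Parent Node": "00:52:58",
    "Local Classifier per Node": "00:33:02",
    "Local Classifier per Level": "04:14:45",
    "Flat Classifier": "03:20:26",
}

flat = to_seconds(training_times["Flat Classifier"])
for name, hms in training_times.items():
    seconds = to_seconds(hms)
    # A ratio above 1 means the model trained faster than the flat classifier
    print(f"{name}: {seconds} s, ratio to flat: {flat / seconds:.2f}")
```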

This second benchmark is similar to the first one, except that it was executed on multiple cluster nodes running GNU/Linux with 512 GB of physical memory and 128 cores provided by two AMD EPYC™ 7742 processors; each model had 12 cores available for training.

|Classifier|Training Time (hh:mm:ss)|Memory Usage (GB)|Disk Usage (MB)|F-score|
|----------|:-----------------------:|:---------------:|:-------------:|:-----:|
|Local Classifier per Parent Node|00:32:05|9.30|122|**0.7798**|
|Local Classifier per Node|**00:04:05**|21.01|123|0.7763|
|Local Classifier per Level|02:24:44|11.45|124|0.7795|
|Flat Classifier|00:57:16|**3.15**|**108**|0.7748|

This third benchmark was executed on the same cluster node as the previous benchmark, again with 12 cores provided for each model, but with LightGBM as the base classifier.

|Classifier|Training Time (hh:mm:ss)|Memory Usage (GB)|Disk Usage (MB)|F-score|
|----------|:-----------------------:|:---------------:|:-------------:|:-----:|
|Local Classifier per Parent Node|**00:28:00**|9.00|77|0.7531|
|Local Classifier per Node|00:55:55|31.92|412|**0.7901**|
|Local Classifier per Level|01:35:26|9.04|36|0.6854|
|Flat Classifier|01:11:24|**4.54**|**30**|0.3710|

Lastly, this fourth benchmark was executed on the same cluster node as the previous benchmarks, again with 12 cores provided for each model, but with random forest as the base classifier.

|Classifier|Training Time (hh:mm:ss)|Memory Usage (GB)|Disk Usage (GB)|F-score|
|----------|:-----------------------:|:---------------:|:-------------:|:-----:|
|Local Classifier per Parent Node|07:34:47|**48.30**|**24**|0.7407|
|Local Classifier per Node|06:50:17|55.19|27|**0.7668**|
|Local Classifier per Level|09:45:18|191.39|96|0.7383|
|Flat Classifier|**01:26:55**|162.40|81|0.6672|

For reproducibility, a Snakemake pipeline was created. Instructions on how to run it and source code are available at [https://github.com/scikit-learn-contrib/hiclass/tree/main/benchmarks/consumer_complaints](https://github.com/scikit-learn-contrib/hiclass/tree/main/benchmarks/consumer_complaints).

We would love to benchmark with larger datasets, if we can find them in the public domain. If you have any suggestions for hierarchical datasets that are public, please let us know by opening an issue. We would also be delighted if you are able to share benchmarks from your own large datasets. Please send us a pull request.

## Roadmap

Here is our public roadmap: https://github.com/scikit-learn-contrib/hiclass/projects/1.

We do just-in-time planning and tend to reprioritize based on your feedback, so items on this roadmap are subject to change. We prioritize features that many people ask for, features and fixes that are small enough to be addressed while we work on related features, features and fixes that improve stability and relevance, and features that address interesting use cases that excite us! If you would like a request prioritized, please add a detailed use case for it, either as a comment on an existing issue (in addition to a thumbs-up) or in a new issue. The detailed context helps.


## Who is using HiClass?

HiClass is currently being used in [HiTaC](https://gitlab.com/dacs-hpi/hitac), a hierarchical taxonomic classifier for fungal ITS sequences.

If you use HiClass in one of your projects and would like to have it listed here, please send us a pull request or contact fabio.malchermiranda@hpi.de.

## Install

### Option 1: Pip


HiClass and its dependencies can be easily installed with pip:

```shell
pip install hiclass
```

If you need additional functionality, you can install extra dependencies using the following syntax:
```shell
pip install hiclass"[<extra_name>]"
```
Replace `<extra_name>` with one of the following options:

- ray: Installs the ray package, which is required for parallel processing support.
- xai: Installs the shap and xarray packages, which are required for explaining HiClass predictions.

### Option 2: Conda

Alternatively, HiClass and its dependencies can also be installed with conda:

```shell
conda install -c conda-forge hiclass
```

Further installation instructions are available in our [getting started guide](https://hiclass.readthedocs.io/en/latest/get_started/index.html). The guide walks you through setting up an isolated Python virtual environment with conda, venv or pipenv before installing hiclass with conda or pip, and shows how to verify a successful installation.
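
One possible way to check the installation from Python is sketched below. It relies only on the standard library module `importlib.metadata`; the helper name `installed_version` is hypothetical, and the getting started guide may recommend a different check.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version of `package`, or None if it is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


version = installed_version("hiclass")
print("hiclass is not installed" if version is None else f"hiclass {version}")
```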

## Quick start

Here's a quick example showcasing how you can train and predict using a local classifier per node, with a `RandomForestClassifier` for each node:

```python
from hiclass import LocalClassifierPerNode
from sklearn.ensemble import RandomForestClassifier

# Define data
X_train = [[1], [2], [3], [4]]
X_test = [[4], [3], [2], [1]]
Y_train = [
    ['Animal', 'Mammal', 'Sheep'],
    ['Animal', 'Mammal', 'Cow'],
    ['Animal', 'Reptile', 'Snake'],
    ['Animal', 'Reptile', 'Lizard'],
]

# Use random forest classifiers for every node
rf = RandomForestClassifier()
classifier = LocalClassifierPerNode(local_classifier=rf)

# Train local classifier per node
classifier.fit(X_train, Y_train)

# Predict
predictions = classifier.predict(X_test)
```

HiClass can also be used in scikit-learn pipelines and fully supports sparse matrices as input. The following example demonstrates both features:

```python
from hiclass import LocalClassifierPerParentNode
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

# Define data
X_train = [
    'Struggling to repay loan',
    'Unable to get annual report',
]
X_test = [
    'Unable to get annual report',
    'Struggling to repay loan',
]
Y_train = [
    ['Loan', 'Student loan'],
    ['Credit reporting', 'Reports']
]
```

Now, let's build a pipeline that will use `CountVectorizer` and `TfidfTransformer` to extract features as sparse matrices:

```python
# Use logistic regression classifiers for every parent node
lr = LogisticRegression()
pipeline = Pipeline([
    ('count', CountVectorizer()),
    ('tfidf', TfidfTransformer()),
    ('lcppn', LocalClassifierPerParentNode(local_classifier=lr)),
])
```

Finally, let's train and predict with the pipeline we just created:

```python
# Train local classifier per parent node
pipeline.fit(X_train, Y_train)

# Predict
predictions = pipeline.predict(X_test)
```
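
To see what the hierarchical classifier receives in the pipeline above, the sketch below runs only the two feature-extraction steps and inspects the result. It assumes SciPy is available (scikit-learn depends on it) and reuses the training texts from the example; the variable name `features` is introduced here for illustration.

```python
from scipy.sparse import issparse
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.pipeline import Pipeline

X_train = [
    'Struggling to repay loan',
    'Unable to get annual report',
]

# Only the feature-extraction steps, without the final classifier
features = Pipeline([
    ('count', CountVectorizer()),
    ('tfidf', TfidfTransformer()),
]).fit_transform(X_train)

print(issparse(features))  # whether the features are a SciPy sparse matrix
print(features.shape)      # (number of documents, vocabulary size)
```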

## Explaining Hierarchical Classifiers

Hierarchical classifiers can provide additional insights when combined with explainability methods. HiClass allows explaining hierarchical models using SHAP values. Different hierarchical models yield different insights. More information on explaining [Local classifier per parent node](https://colab.research.google.com/drive/1rVlYuRU_uO1jw5sD6qo2HoCpCz6E6z5J?usp=sharing), [Local classifier per node](https://colab.research.google.com/drive/1wqSl1t_Qn2f62WNZQ48mdB0mNeu1XSF1?usp=sharing), and [Local classifier per level](https://colab.research.google.com/drive/1VnGlJu-1wSG4wxHXL0Ijf2a7Pu3kklT-?usp=sharing) is available on [Read the Docs](https://hiclass.readthedocs.io/en/latest/algorithms/explainer.html).

## Step-by-step walk-through

A step-by-step walk-through is available in our documentation hosted on [Read the Docs](https://hiclass.readthedocs.io/en/latest/index.html).

This will guide you through the process of installing hiclass within a virtual environment, training, predicting, persisting models and much more.

## API documentation

Here's our official API documentation, available on [Read the Docs](https://hiclass.readthedocs.io/en/latest/api/index.html).

If you notice any issues with the documentation or walk-through, please let us know by opening an issue here: [https://github.com/scikit-learn-contrib/hiclass/issues](https://github.com/scikit-learn-contrib/hiclass/issues).

## FAQ

### How do the hierarchical classifiers work?

A detailed description of how the classifiers work is available in the [Algorithms Overview](https://hiclass.readthedocs.io/en/latest/algorithms/index.html) section on Read the Docs.
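
As a rough intuition before reading that section, the sketch below counts how many local models each strategy would need for the toy hierarchy from the quick start, following the usual definitions of the three local approaches. This is plain Python for illustration, not HiClass code; it assumes every label name is unique within the hierarchy and ignores any artificial root node a library may add internally.

```python
Y_train = [
    ['Animal', 'Mammal', 'Sheep'],
    ['Animal', 'Mammal', 'Cow'],
    ['Animal', 'Reptile', 'Snake'],
    ['Animal', 'Reptile', 'Lizard'],
]

nodes = set()    # every label that appears anywhere in the hierarchy
parents = set()  # labels that have at least one child
for path in Y_train:
    nodes.update(path)
    parents.update(path[:-1])
levels = max(len(path) for path in Y_train)

# Local classifier per node: one binary classifier for each node
print(len(nodes))
# Local classifier per parent node: one multi-class classifier for each parent
print(len(parents))
# Local classifier per level: one multi-class classifier for each level
print(levels)
```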

## Support

If you run into any problems or issues, please create a [GitHub issue](https://github.com/scikit-learn-contrib/hiclass/issues) and we'll try our best to help.

We strive to provide good support through our issue tracker on GitHub. However, if you'd like to receive private support, such as:

- Phone / video calls to discuss your specific use case and get recommendations
- Private discussions over Slack or Mattermost

please reach out to fabio.malchermiranda@hpi.de.

## Contributing

We are a small team on a mission to democratize hierarchical classification, and we will take all the help we can get! If you would like to get involved, here is information on [contribution guidelines and how to test the code locally](https://github.com/scikit-learn-contrib/hiclass/blob/main/CONTRIBUTING.md).

You can contribute in multiple ways, e.g., reporting bugs, writing or translating documentation, reviewing or refactoring code, requesting or implementing new features, etc.

## Getting the latest updates

If you'd like to get updates when we release new versions, please click the "Watch" button at the top of the GitHub repository page and select "Releases only". GitHub will then send you notifications along with a changelog for each new release.

## Citation

If you use HiClass in your research, please cite our [paper](https://jmlr.org/papers/v24/21-1518.html):

> Miranda, F.M., Köhnecke, N. and Renard, B.Y. (2023) 'HiClass: a Python Library for Local Hierarchical Classification Compatible with Scikit-learn', Journal of Machine Learning Research, 24(29), pp. 1–17. Available at: https://jmlr.org/papers/v24/21-1518.html.

```latex
@article{JMLR:v24:21-1518,
  author  = {F{\'a}bio M. Miranda and Niklas K{\"o}hnecke and Bernhard Y. Renard},
  title   = {HiClass: a Python Library for Local Hierarchical Classification Compatible with Scikit-learn},
  journal = {Journal of Machine Learning Research},
  year    = {2023},
  volume  = {24},
  number  = {29},
  pages   = {1--17},
  url     = {http://jmlr.org/papers/v24/21-1518.html}
}
```

**Note**: If you use HiClass in your GitHub projects, please add `hiclass` to your `requirements.txt`.

In addition, we would like to list publications that use HiClass to solve hierarchical problems. If you would like your manuscript to be added to this list, please email the reference, the name of your lab, department and institution to fabio.malchermiranda@hpi.de.

            
