* **Name:** arm-preprocessing
* **Version:** 0.2.2
* **Home page:** https://github.com/firefly-cpp/arm-preprocessing
* **Summary:** Implementation of several preprocessing techniques for Association Rule Mining (ARM)
* **Upload time:** 2024-04-02 08:27:12
* **Author:** Tadej Lahovnik
* **Requires Python:** `<3.13,>=3.9`
* **Keywords:** association rule mining, data science, preprocessing

<p align="center">
  <img alt="logo" width="300" src=".github/images/logo_black.png">
</p>

# arm-preprocessing
![PyPI Version](https://img.shields.io/pypi/v/arm-preprocessing.svg)
[![arm-preprocessing](https://github.com/firefly-cpp/arm-preprocessing/actions/workflows/test.yml/badge.svg)](https://github.com/firefly-cpp/arm-preprocessing/actions/workflows/test.yml)
[![Documentation Status](https://readthedocs.org/projects/arm-preprocessing/badge/?version=latest)](https://arm-preprocessing.readthedocs.io/en/latest/?badge=latest)
![Repository size](https://img.shields.io/github/repo-size/firefly-cpp/arm-preprocessing)
[![Downloads](https://static.pepy.tech/badge/arm-preprocessing)](https://pepy.tech/project/arm-preprocessing)
![License](https://img.shields.io/github/license/firefly-cpp/arm-preprocessing.svg)
![GitHub commit activity](https://img.shields.io/github/commit-activity/w/firefly-cpp/arm-preprocessing.svg)
![Open issues](https://isitmaintained.com/badge/open/firefly-cpp/arm-preprocessing.svg)
[![Average time to resolve an issue](http://isitmaintained.com/badge/resolution/firefly-cpp/arm-preprocessing.svg)](http://isitmaintained.com/project/firefly-cpp/arm-preprocessing "Average time to resolve an issue")

* **Free software:** MIT license
* **Documentation:** [http://arm-preprocessing.readthedocs.io](http://arm-preprocessing.readthedocs.io)
* **Python:** 3.9.x, 3.10.x, 3.11.x, 3.12.x
* **Tested OS:** Windows, Ubuntu, Fedora, Alpine, Arch, macOS. **Other operating systems are untested but may still work.**

## About 📋
arm-preprocessing is a lightweight Python library supporting several key steps involving data preparation, manipulation, and discretisation for Association Rule Mining (ARM). 🧠 Embrace its minimalistic design that prioritises simplicity. 💡 The framework is intended to be fully extensible and offers seamless integration with related ARM libraries (e.g., [NiaARM](https://github.com/firefly-cpp/NiaARM)). 🔗

## Why arm-preprocessing?
While numerous libraries facilitate data mining preprocessing tasks, this library is designed to integrate seamlessly with association rule mining. It harmonises well with the NiaARM library, a robust numerical association rule mining framework. The primary aim is to bridge the gap between preprocessing and rule mining, simplifying the workflow/pipeline. Additionally, its design allows for the effortless incorporation of new preprocessing methods and fast benchmarking.

## Key features ✨
- Loading various formats of datasets (CSV, JSON, TXT, TCX) 📊
- Converting datasets to different formats 🔄
- Loading different types of datasets (numerical dataset, discrete dataset, time-series data, text, etc.) 📉
- Dataset identification (which type of dataset) 🔍
- Dataset statistics 📈
- Discretisation methods 📏
- Data squashing methods 🤏
- Feature scaling methods ⚖️
- Feature selection methods 🎯

## Installation 📦
### pip
To install ``arm-preprocessing`` with pip, use:
```bash
pip install arm-preprocessing
```

### Alpine Linux
To install ``arm-preprocessing`` on Alpine Linux, use:
```sh
$ apk add py3-arm-preprocessing
```

### Arch Linux
To install ``arm-preprocessing`` on Arch Linux, use an [AUR helper](https://wiki.archlinux.org/title/AUR_helpers):
```sh
$ yay -Syyu python-arm-preprocessing
```
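
After installing, it can be useful to confirm which version ended up in the active environment. The following is a minimal sketch using only the standard library's `importlib.metadata`; the helper name `installed_version` is an assumption made for this example and is not part of arm-preprocessing.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(distribution: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(distribution)
    except PackageNotFoundError:
        return None


# Prints the version string if the package is installed, otherwise None
print(installed_version("arm-preprocessing"))
```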

## Usage 🚀
### Data loading
The following example demonstrates how to load a dataset from a file (CSV, JSON, or TXT). More examples can be found in the [examples/data_loading](./examples/data_loading/) directory:
- [Loading a dataset from a CSV file](./examples/data_loading/load_dataset_csv.py)
- [Loading a dataset from a JSON file](./examples/data_loading/load_dataset_json.py)
- [Loading a dataset from a TCX file](./examples/data_loading/load_dataset_tcx.py)
- [Loading a time-series dataset](./examples/data_loading/load_dataset_timeseries.py)

```python
from arm_preprocessing.dataset import Dataset

# Initialise dataset with filename (without file extension) and format (csv, json, txt)
dataset = Dataset('path/to/datasets', format='csv')

# Load dataset
dataset.load_data()
df = dataset.data
```
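
The comment above says the filename is given without its format. The sketch below illustrates that convention in plain Python, assuming the format is simply appended as the file extension. It is an illustration, not the library's loader: `resolve_path` and `load_rows` are hypothetical helpers, only CSV is handled, and rows are returned as dictionaries of strings.

```python
import csv
import tempfile
from pathlib import Path


def resolve_path(filename: str, fmt: str) -> Path:
    """Append the format as the file extension, e.g. ('a/b', 'csv') -> a/b.csv."""
    return Path(f"{filename}.{fmt}")


def load_rows(filename: str, fmt: str = "csv") -> list[dict[str, str]]:
    """Read a CSV file into a list of row dictionaries (hypothetical helper)."""
    if fmt != "csv":
        raise ValueError(f"this sketch only handles csv, got {fmt!r}")
    with resolve_path(filename, fmt).open(newline="") as handle:
        return list(csv.DictReader(handle))


# Write a tiny CSV file to a temporary directory and load it back
with tempfile.TemporaryDirectory() as tmp:
    base = str(Path(tmp) / "toy")
    Path(base + ".csv").write_text("duration,calories\n30,250\n45,400\n")
    print(load_rows(base, fmt="csv"))
```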

### Missing values
The following example demonstrates how to handle missing values in a dataset using imputation. More examples can be found in the [examples/missing_values](./examples/missing_values) directory:
- [Handling missing values in a dataset using row deletion](./examples/missing_values/missing_values_rows.py)
- [Handling missing values in a dataset using column deletion](./examples/missing_values/missing_values_columns.py)
- [Handling missing values in a dataset using imputation](./examples/missing_values/missing_values_impute.py)

```python
from arm_preprocessing.dataset import Dataset

# Initialise dataset with filename and format
dataset = Dataset('examples/missing_values/data', format='csv')
dataset.load_data()

# Impute missing data
dataset.missing_values(method='impute')
```
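
To show what imputation does conceptually, here is a minimal pure-Python sketch that fills missing values in one numerical column with the mean of the observed values. Mean imputation is an assumption chosen for illustration; the strategy used by `method='impute'` in the library may differ, and `impute_mean` is a hypothetical name.

```python
from statistics import mean


def impute_mean(rows: list[dict], column: str) -> list[dict]:
    """Return copies of the rows with None in `column` replaced by the column mean."""
    observed = [row[column] for row in rows if row[column] is not None]
    if not observed:
        raise ValueError(f"column {column!r} has no observed values")
    fill = mean(observed)
    return [
        {**row, column: fill if row[column] is None else row[column]}
        for row in rows
    ]


rows = [{"calories": 100.0}, {"calories": None}, {"calories": 300.0}]
print(impute_mean(rows, "calories"))
```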

### Data discretisation
The following example demonstrates how to discretise a dataset using the equal width method. More examples can be found in the [examples/discretisation](./examples/discretisation) directory:
- [Discretising a dataset using the equal width method](./examples/discretisation/equal_width_discretisation.py)
- [Discretising a dataset using the equal frequency method](./examples/discretisation/equal_frequency_discretisation.py)
- [Discretising a dataset using k-means clustering](./examples/discretisation/kmeans_discretisation.py)

```python
from arm_preprocessing.dataset import Dataset

# Initialise dataset with filename (without file extension) and format (csv, json, txt)
dataset = Dataset('datasets/sportydatagen', format='csv')
dataset.load_data()

# Discretise dataset using equal width discretisation
dataset.discretise(method='equal_width', num_bins=5, columns=['calories'])
```
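
The idea behind equal-width discretisation is to split the range between the minimum and maximum value into `num_bins` intervals of identical width and assign each value to the interval it falls in. The following pure-Python sketch illustrates this. It is not the library's implementation; the function name `equal_width_bins` and the choice to return zero-based bin indices are assumptions made for this example.

```python
def equal_width_bins(values: list[float], num_bins: int) -> list[int]:
    """Assign each value a zero-based bin index using intervals of equal width."""
    if num_bins < 1:
        raise ValueError("num_bins must be at least 1")
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values are identical, so they share a single bin
        return [0 for _ in values]
    width = (hi - lo) / num_bins
    # The maximum value would land in bin `num_bins`, so clamp it to the last bin
    return [min(int((v - lo) / width), num_bins - 1) for v in values]


calories = [100, 250, 400, 550, 600]
print(equal_width_bins(calories, num_bins=5))  # [0, 1, 3, 4, 4]
```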

### Data squashing
The following example demonstrates how to squash a dataset using Euclidean similarity. More examples can be found in the [examples/squashing](./examples/squashing) directory:
- [Squashing a dataset using Euclidean similarity](./examples/squashing/squash_euclidean.py)
- [Squashing a dataset using cosine similarity](./examples/squashing/squash_cosine.py)

```python
from arm_preprocessing.dataset import Dataset

# Initialise dataset with filename and format
dataset = Dataset('datasets/breast', format='csv')
dataset.load_data()

# Squash dataset
dataset.squash(threshold=0.75, similarity='euclidean')
```
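
Data squashing reduces a dataset by merging rows that are sufficiently similar. The sketch below shows one simple way this could work, under several assumptions that may not match the library: similarity is defined as `1 / (1 + Euclidean distance)`, rows are grouped greedily against the first row of each group, and each group is replaced by its column-wise mean. The names `euclidean_similarity` and `squash_rows` are hypothetical.

```python
import math


def euclidean_similarity(a: list[float], b: list[float]) -> float:
    """Map Euclidean distance to a similarity in (0, 1]; identical rows give 1.0."""
    return 1.0 / (1.0 + math.dist(a, b))


def squash_rows(rows: list[list[float]], threshold: float) -> list[list[float]]:
    """Greedily group similar rows and replace each group by its column-wise mean."""
    groups: list[list[list[float]]] = []
    for row in rows:
        for group in groups:
            # Compare against the first row of the group (its representative)
            if euclidean_similarity(group[0], row) >= threshold:
                group.append(row)
                break
        else:
            # No sufficiently similar group exists, so start a new one
            groups.append([row])
    return [[sum(col) / len(group) for col in zip(*group)] for group in groups]


rows = [[1.0, 1.0], [1.0, 1.25], [5.0, 5.0]]
print(squash_rows(rows, threshold=0.75))  # [[1.0, 1.125], [5.0, 5.0]]
```

With this definition, a higher threshold merges fewer rows, so the squashed dataset stays closer to the original.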

### Feature scaling
The following example demonstrates how to scale the dataset's features. More examples can be found in the [examples/scaling](./examples/scaling) directory:
- [Scale features using normalisation](./examples/scaling/normalisation.py)
- [Scale features using standardisation](./examples/scaling/standardisation.py)

```python
from arm_preprocessing.dataset import Dataset

# Initialise dataset with filename and format
dataset = Dataset('datasets/Abalone', format='csv')
dataset.load_data()

# Scale dataset using normalisation
dataset.scale(method='normalisation')
```
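
Normalisation is commonly understood as min-max scaling, which maps each feature to the range [0, 1]. The following pure-Python sketch shows that calculation for a single column. It assumes min-max scaling is what is meant here, which may not match the library's behaviour exactly, and `normalise` is a hypothetical helper.

```python
def normalise(values: list[float]) -> list[float]:
    """Min-max scale values to [0, 1]; a constant column maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Avoid division by zero when every value is the same
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


print(normalise([2.0, 4.0, 6.0]))  # [0.0, 0.5, 1.0]
```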

### Feature selection
The following example demonstrates how to select features from a dataset. More examples can be found in the [examples/feature_selection](./examples/feature_selection) directory:
- [Select features using the Kendall Tau correlation coefficient](./examples/feature_selection/feature_selection.py)

```python
from arm_preprocessing.dataset import Dataset

# Initialise dataset with filename and format
dataset = Dataset('datasets/sportydatagen', format='csv')
dataset.load_data()

# Feature selection
dataset.feature_selection(
    method='kendall', threshold=0.15, class_column='calories')
```
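
Correlation-based feature selection keeps the features whose correlation with the class column is strong enough. The sketch below illustrates this with a plain Kendall tau-a coefficient (no tie correction) and keeps features whose absolute correlation reaches the threshold. Both choices are assumptions for illustration: the library may use a tie-corrected variant and a different comparison rule. `kendall_tau` and `select_features` are hypothetical names.

```python
from itertools import combinations
from typing import Sequence


def kendall_tau(x: Sequence[float], y: Sequence[float]) -> float:
    """Kendall tau-a: (concordant - discordant) pairs over all pairs."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length of at least 2")
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        sign = (x1 - x2) * (y1 - y2)
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    return (concordant - discordant) / (n * (n - 1) / 2)


def select_features(columns: dict, class_column: str, threshold: float) -> list[str]:
    """Keep features whose |tau| with the class column is at least the threshold."""
    target = columns[class_column]
    return [
        name
        for name, values in columns.items()
        if name != class_column and abs(kendall_tau(values, target)) >= threshold
    ]


columns = {
    "duration": [10, 20, 30, 40],
    "noise": [1, 2, 2, 1],
    "calories": [100, 200, 300, 400],
}
print(select_features(columns, class_column="calories", threshold=0.15))  # ['duration']
```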

## Related frameworks 🔗

[1] [NiaARM: A minimalistic framework for Numerical Association Rule Mining](https://github.com/firefly-cpp/NiaARM)

[2] [uARMSolver: universal Association Rule Mining Solver](https://github.com/firefly-cpp/uARMSolver)

## References 📚

[1] I. Fister, I. Fister Jr., D. Novak and D. Verber, [Data squashing as preprocessing in association rule mining](https://iztok-jr-fister.eu/static/publications/300.pdf), 2022 IEEE Symposium Series on Computational Intelligence (SSCI), Singapore, Singapore, 2022, pp. 1720-1725, doi: 10.1109/SSCI51031.2022.10022240.

[2] I. Fister Jr. and I. Fister, [A brief overview of swarm intelligence-based algorithms for numerical association rule mining](https://arxiv.org/abs/2010.15524), arXiv preprint arXiv:2010.15524, 2020.

## License

This package is distributed under the MIT License. This license can be found online
at <http://www.opensource.org/licenses/MIT>.

## Disclaimer

This framework is provided as-is, and there are no guarantees that it fits your purposes or that it is bug-free. Use it at your own risk!

            
