arulespy


| Field           | Value                                    |
|:----------------|:-----------------------------------------|
| Name            | arulespy                                 |
| Version         | 0.1.4                                    |
| Home page       | https://github.com/mhahsler/arulespy     |
| Summary         | Python interface to the R package arules |
| Upload time     | 2023-09-12 18:49:12                      |
| Author          | Michael Hahsler                          |
| Requires Python | >=3.8                                    |
| Keywords        | association rules, frequent itemsets     |

# Python interface to the R package arules

[![PyPI
package](https://img.shields.io/badge/pip%20install-arulespy-brightgreen)](https://pypi.org/project/arulespy/)
[![version
number](https://img.shields.io/pypi/v/arulespy?color=green&label=version)](https://github.com/mhahsler/arulespy/releases)
[![Actions
Status](https://github.com/mhahsler/arulespy/workflows/Test/badge.svg)](https://github.com/mhahsler/arulespy/actions)
[![License](https://img.shields.io/github/license/mhahsler/arulespy)](https://github.com/mhahsler/arulespy/blob/main/LICENSE)

`arulespy` is a Python module available from [PyPI](https://pypi.org/project/arulespy/).
The `arules` module in `arulespy` provides an easy-to-install Python interface to the
[R package arules](https://github.com/mhahsler/arules) for association rule mining.
The interface is built with [`rpy2`](https://pypi.org/project/rpy2/).

The R arules package implements a comprehensive infrastructure for representing,
manipulating and analyzing transaction data and patterns using frequent itemsets and association rules.
The package also provides a wide range of interest measures and mining algorithms, including
Christian Borgelt’s popular and efficient C implementations of the association mining algorithms
Apriori and Eclat, as well as optimized C/C++ code for mining and manipulating association rules
using a sparse matrix representation.
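
To make the idea behind Apriori concrete, here is a minimal pure-Python sketch of level-wise frequent itemset mining. It is an illustration only: it is not Borgelt's implementation and not the code used by arules, and the function name `frequent_itemsets` and the toy data are assumptions made for this example.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Return {itemset (frozenset): support} for all itemsets meeting min_support."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def support(itemset):
        # Fraction of transactions that contain every item of the itemset.
        return sum(itemset <= t for t in transactions) / n

    # Level 1: frequent single items.
    items = sorted({item for t in transactions for item in t})
    current = {}
    for item in items:
        s = support(frozenset([item]))
        if s >= min_support:
            current[frozenset([item])] = s

    result = dict(current)
    k = 2
    while current:
        # Join step: unite pairs of frequent (k-1)-itemsets into k-itemsets.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in current for sub in combinations(c, k - 1))
        }
        current = {}
        for c in candidates:
            s = support(c)
            if s >= min_support:
                current[c] = s
        result.update(current)
        k += 1
    return result


# Toy data (hypothetical): five transactions over the items A, B and C.
toy = [{"A", "B", "C"}, {"A"}, {"A", "B", "C"}, {"A"}, {"A", "B", "C"}]
freq = frequent_itemsets(toy, min_support=0.5)
for itemset, s in sorted(freq.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), s)
```

The prune step is what distinguishes Apriori from brute-force enumeration: a candidate is only counted if all of its subsets are already known to be frequent.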

The `arulesViz` module provides `plot()` for visualizing association rules using
the [R package arulesViz](https://github.com/mhahsler/arulesViz).

`arulespy` provides the following Python classes, all of which support Python-style slicing and `len()`:

-   `Transactions`: Convert pandas dataframes into transaction data
-   `Rules`: Association rules
-   `Itemsets`: Itemsets
-   `ItemMatrix`: Sparse matrix representation of sets of items
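
As a rough illustration of what these classes represent, the following plain-Python sketch turns a small boolean table into transactions (sets of items) and into a sparse form that stores only the positions of the items present. It does not use `arulespy` or R; the variable names and data are hypothetical and chosen only for this example.

```python
# Plain-Python illustration only; arulespy's classes wrap R objects instead.
columns = ["A", "B", "C"]
rows = [
    [True, True, True],
    [True, False, False],
    [False, True, True],
]

# Each transaction is the set of items whose column is True in that row.
transactions = [
    {item for item, present in zip(columns, row) if present} for row in rows
]

# Sparse form: store only the column indices of the items that are present.
sparse_rows = [[j for j, present in enumerate(row) if present] for row in rows]

print(transactions)
print(sparse_rows)

# Slicing and len() on a plain list, analogous in spirit to the classes above.
print(len(transactions), transactions[0:2])
```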

Most arules functions are interfaced as methods of these four classes, with results converted
from the R data structures to Python.
Documentation is available in Python via `help()`. Detailed online documentation
for the R package is available [here](https://mhahsler.r-universe.dev/arules/doc/manual.html). 

Low-level `arules` functions can also be used directly in the form 
`R.<arules R function>()`. The result will be an `rpy2` data type.
Transactions, itemsets and rules can be converted manually to Python
classes using the helper function `a2p()`.

To cite the Python module ‘arulespy’ in publications use:

> Michael Hahsler. ARULESPY: Exploring association rules and frequent itemsets in Python. arXiv:2305.15263 [cs.DB], May 2023. DOI: [10.48550/arXiv.2305.15263](https://doi.org/10.48550/arXiv.2305.15263)


## Installation

`arulespy` is based on the Python package `rpy2`, which requires an R installation. Here are the installation steps:

1. Install the latest version of R (>4.0) from https://www.r-project.org/

2. Install required libraries on your OS:
   - libcurl is needed by R package [curl](https://cran.r-project.org/web/packages/curl/index.html).
      - Ubuntu: `sudo apt-get install libcurl4-openssl-dev`
      - macOS: `brew install curl`
      - Windows: no installation necessary, but read the Windows section below.

3. Install `arulespy` which will automatically install `rpy2` and `pandas`.
    ``` sh
    pip install arulespy
    ```

4. Optional: Set the environment variable `R_LIBS_USER` to decide where R packages are stored 
    (see [libPaths()](https://stat.ethz.ch/R-manual/R-devel/library/base/html/libPaths.html) for details). If not set then R will determine a suitable location.

5. Optional: `arulespy` will install the needed R packages when it is imported for the first time.
    This may take a while. R packages can also be preinstalled. Start R and run 
    `install.packages(c("arules", "arulesViz"))`
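
The optional step 4 can also be done from within Python, as long as the variable is set before R is started, that is, before the first import from `arulespy`. The following is a minimal sketch under that assumption; the folder name `r-library` is a hypothetical choice and should be replaced with any writable location you prefer. The folder is created first because R ignores library paths that do not exist.

```python
import os
from pathlib import Path

# Hypothetical location for R packages: a folder next to the current script or notebook.
lib_dir = Path("r-library").resolve()

# R only uses library folders that already exist, so create it if necessary.
lib_dir.mkdir(parents=True, exist_ok=True)

# Set the variable before anything from arulespy (and therefore R) is imported.
os.environ["R_LIBS_USER"] = str(lib_dir)
print("R_LIBS_USER =", os.environ["R_LIBS_USER"])
```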


The most likely issue is that `rpy2` cannot find R or R's shared library. 
In that case, the Python kernel dies or exits without explanation when the package `arulespy` is imported.
Run `python -m rpy2.situation` to check whether R and R's libraries are found.
If you use IPython/Jupyter notebooks, you can include the following code block in your notebook to check:
```python
from rpy2 import situation

for row in situation.iter_info():
    print(row)
```

The output should include a line saying `Loading R library from rpy2: OK`.
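
If even this check fails, a look at the environment with the standard library alone (so that R is never loaded) may help narrow things down. The sketch below assumes that R is typically located through the `R_HOME` environment variable or an `R` executable on the `PATH`; this should be verified against the `rpy2` documentation. The helper name `r_discovery_hints` is hypothetical.

```python
import os
import shutil


def r_discovery_hints():
    """Collect basic hints about whether an R installation is discoverable."""
    return {
        # Explicitly configured R location, or None if the variable is not set.
        "R_HOME": os.environ.get("R_HOME"),
        # Full path of the R executable found on PATH, or None if there is none.
        "R on PATH": shutil.which("R"),
    }


for name, value in r_discovery_hints().items():
    print(f"{name}: {value if value else 'not found'}")
```

If both entries are reported as not found, installing R or setting `R_HOME` is a reasonable first thing to try.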

### Note for Windows users
`rpy2` currently does not fully support Windows, and the installation can be tricky. I was able to use it with the following setup:

* Windows 10
* rpy2 version 3.5.14
* Python version 3.10.12
* R version 4.3.1

I use the following code to set the environment variables needed by Windows 
before I import from `arulespy`:
```python
from rpy2 import situation
import os

r_home = situation.r_home_from_registry()
r_bin = r_home + '\\bin\\x64\\'
os.environ['R_HOME'] = r_home
os.environ['PATH'] =  r_bin + ";" + os.environ['PATH']
os.add_dll_directory(r_bin)

for row in situation.iter_info():
    print(row)
```

The output should include a line saying `Loading R library from rpy2: OK`.

More information on installing `rpy2` can be found [here](https://pypi.org/project/rpy2/).


## Example

```python
from arulespy.arules import Transactions, apriori, parameters
import pandas as pd

# define the data as a pandas dataframe (one row per transaction, one column per item)
df = pd.DataFrame(
    [
        [True, True, True],
        [True, False, False],
        [True, True, True],
        [True, False, False],
        [True, True, True]
    ],
    columns=list('ABC'))

# convert dataframe to transactions (note the capitalized class name)
trans = Transactions.from_df(df)

# mine association rules with minimum support 0.1 and minimum confidence 0.8
rules = apriori(trans,
                parameter=parameters({"supp": 0.1, "conf": 0.8}),
                control=parameters({"verbose": False}))

# display the rules as a pandas dataframe
rules.as_df()
```

The table below illustrates the output format. Its values come from a different, ten-transaction data set (a count of 8 at a support of 0.8), so they do not match the five-row dataframe defined above.

|    | LHS   | RHS   |   support |   confidence |   coverage |   lift |   count |
|---:|:------|:------|----------:|-------------:|-----------:|-------:|--------:|
|  1 | {}    | {A}   |       0.8 |          0.8 |        1   |   1    |       8 |
|  2 | {}    | {C}   |       0.8 |          0.8 |        1   |   1    |       8 |
|  3 | {B}   | {A}   |       0.4 |          0.8 |        0.5 |   1    |       4 |
|  4 | {B}   | {C}   |       0.5 |          1   |        0.5 |   1.25 |       5 |
|  5 | {A,B} | {C}   |       0.4 |          1   |        0.4 |   1.25 |       4 |
|  6 | {B,C} | {A}   |       0.4 |          0.8 |        0.5 |   1    |       4 |
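
The quality columns of such a rules table are standard interest measures. As a worked example, the following plain-Python sketch computes them for a single rule on the five transactions encoded by the dataframe defined in the example. It uses the usual definitions (coverage is the support of the LHS, lift is confidence divided by the support of the RHS) and does not call `arulespy`; the function name `rule_measures` is hypothetical.

```python
# The five transactions encoded by the dataframe in the example:
# A occurs in every row, B and C occur together in rows 1, 3 and 5.
transactions = [{"A", "B", "C"}, {"A"}, {"A", "B", "C"}, {"A"}, {"A", "B", "C"}]


def rule_measures(lhs, rhs, transactions):
    """Compute standard interest measures for the rule lhs -> rhs.

    Assumes that lhs occurs in at least one transaction.
    """
    n = len(transactions)
    count = sum((lhs | rhs) <= t for t in transactions)  # contain LHS and RHS
    lhs_count = sum(lhs <= t for t in transactions)      # contain LHS
    rhs_count = sum(rhs <= t for t in transactions)      # contain RHS
    confidence = count / lhs_count
    return {
        "support": count / n,
        "confidence": confidence,
        "coverage": lhs_count / n,
        "lift": confidence / (rhs_count / n),
        "count": count,
    }


measures = rule_measures({"B"}, {"C"}, transactions)
print(measures)
```

For the rule {B} → {C}, both sides occur together in 3 of the 5 transactions, and every transaction containing B also contains C, so the confidence is 1 and the lift is greater than 1.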

Complete examples:
  * [Using arules](https://mhahsler.github.io/arulespy/examples/arules.html)
  * [Using arulesViz](https://mhahsler.github.io/arulespy/examples/arulesViz.html)


## References

- Michael Hahsler. [ARULESPY: Exploring association rules and frequent itemsets in
  Python.](https://doi.org/10.48550/arXiv.2305.15263) arXiv:2305.15263 [cs.DB], May 2023.
  DOI: 10.48550/arXiv.2305.15263
- Michael Hahsler, Sudheer Chelluboina, Kurt Hornik, and Christian
  Buchta. [The arules R-package ecosystem: Analyzing interesting
  patterns from large transaction
  datasets.](https://jmlr.csail.mit.edu/papers/v12/hahsler11a.html)
  *Journal of Machine Learning Research,* 12:1977-1981, 2011.
- Michael Hahsler, Bettina Grün, and Kurt Hornik. [arules - A
  Computational Environment for Mining Association Rules and Frequent
  Item Sets.](https://doi.org/10.18637/jss.v014.i15) *Journal of
  Statistical Software,* 14(15), 2005. DOI: 10.18637/jss.v014.i15
- Michael Hahsler. [A Probabilistic Comparison of Commonly Used
  Interest Measures for Association
  Rules.](https://mhahsler.github.io/arules/docs/measures) 2015. URL:
  <https://mhahsler.github.io/arules/docs/measures>
- Michael Hahsler. [An R Companion for Introduction to Data Mining:
  Chapter
  5.](https://mhahsler.github.io/Introduction_to_Data_Mining_R_Examples/book/association-analysis-basic-concepts-and-algorithms.html)
  2021. URL:
  <https://mhahsler.github.io/Introduction_to_Data_Mining_R_Examples/book/>


            

Raw data

            {
    "_id": null,
    "home_page": "https://github.com/mhahsler/arulespy",
    "name": "arulespy",
    "maintainer": "",
    "docs_url": null,
    "requires_python": ">=3.8",
    "maintainer_email": "",
    "keywords": "association rules,frequent itemsets",
    "author": "Michael Hahsler",
    "author_email": "mhahsler@lyle.smu.edu",
    "download_url": "https://files.pythonhosted.org/packages/90/66/c5299e22dd45654c82fa5899b4921e4517e6f7de254f52a4a875aad8d4f1/arulespy-0.1.4.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "",
    "summary": "Python interface to the R package arules",
    "version": "0.1.4",
    "project_urls": {
        "Bug Reports": "https://github.com/mhahsler/arulespy/issues",
        "Documentation": "https://github.com/mhahsler/arulespy",
        "Homepage": "https://github.com/mhahsler/arulespy",
        "Source Code": "https://github.com/mhahsler/arulespy"
    },
    "split_keywords": [
        "association rules",
        "frequent itemsets"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "9f1c68cd8fb16ccc8f53656b58395c08348878ac6a10a19ea8375a784c300bd3",
                "md5": "0f8607e97b50be6bdf2f2435b4bcc302",
                "sha256": "758a79d177deb7ad2985c9f78e629be9369cf1294ff9251a38ba604083fb8aab"
            },
            "downloads": -1,
            "filename": "arulespy-0.1.4-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "0f8607e97b50be6bdf2f2435b4bcc302",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.8",
            "size": 21414,
            "upload_time": "2023-09-12T18:49:10",
            "upload_time_iso_8601": "2023-09-12T18:49:10.285456Z",
            "url": "https://files.pythonhosted.org/packages/9f/1c/68cd8fb16ccc8f53656b58395c08348878ac6a10a19ea8375a784c300bd3/arulespy-0.1.4-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "9066c5299e22dd45654c82fa5899b4921e4517e6f7de254f52a4a875aad8d4f1",
                "md5": "3613cca8e360f84997cce98eacbd3243",
                "sha256": "fcbc7c8a3571d03fb9482bd5aa8517bb9975e48a793058f3b381d46d2b0778ab"
            },
            "downloads": -1,
            "filename": "arulespy-0.1.4.tar.gz",
            "has_sig": false,
            "md5_digest": "3613cca8e360f84997cce98eacbd3243",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.8",
            "size": 24382,
            "upload_time": "2023-09-12T18:49:12",
            "upload_time_iso_8601": "2023-09-12T18:49:12.157364Z",
            "url": "https://files.pythonhosted.org/packages/90/66/c5299e22dd45654c82fa5899b4921e4517e6f7de254f52a4a875aad8d4f1/arulespy-0.1.4.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2023-09-12 18:49:12",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "mhahsler",
    "github_project": "arulespy",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": true,
    "tox": true,
    "lcname": "arulespy"
}
        