# Python interface to the R package arules
`arulespy` is a Python module available from [PyPI](https://pypi.org/project/arulespy/).
The `arules` module in `arulespy` provides an easy-to-install Python interface, built
with [`rpy2`](https://pypi.org/project/rpy2/), to the
[R package arules](https://github.com/mhahsler/arules) for association rule mining.
The R arules package implements a comprehensive
infrastructure for representing, manipulating and analyzing transaction data and patterns using frequent itemsets and association rules.
The package also provides a wide range of interest measures and mining algorithms. These include Christian Borgelt’s popular
and efficient C implementations of the association mining algorithms Apriori and Eclat,
as well as optimized C/C++ code for
mining and manipulating association rules using a sparse matrix representation.
The `arulesViz` module provides `plot()` for visualizing association rules using
the [R package arulesViz](https://github.com/mhahsler/arulesViz).
`arulespy` provides the following Python classes, all of which support Python-style slicing and `len()`:
- `Transactions`: transaction data, which can be created from pandas dataframes
- `Rules`: association rules
- `Itemsets`: itemsets
- `ItemMatrix`: sparse matrix representation of sets of items
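
To make the idea behind transaction data and a sparse representation of item sets more concrete, here is a minimal pure-Python sketch. It only illustrates the concept: the helper names `rows_to_itemsets()` and `rows_to_sparse()` are hypothetical, and this is not the storage format used by `arulespy` or the R package.

```python
# Illustration only: hypothetical helpers, not part of arulespy.
item_labels = ["A", "B", "C"]
rows = [
    [True, True, True],
    [True, False, False],
    [True, True, True],
]


def rows_to_itemsets(rows, labels):
    """Turn one-hot boolean rows into sets of item labels."""
    return [
        {label for label, present in zip(labels, row) if present}
        for row in rows
    ]


def rows_to_sparse(rows):
    """Keep only the column indices of the items present in each row."""
    return [[j for j, present in enumerate(row) if present] for row in rows]


itemsets = rows_to_itemsets(rows, item_labels)
sparse = rows_to_sparse(rows)

print(len(itemsets))  # number of transactions: 3
print(itemsets[1])    # {'A'}
print(sparse[0:2])    # [[0, 1, 2], [0]]
```

Storing only the indices of the items that are present is what keeps large, mostly empty transaction data compact.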
Most arules functions are
interfaced as methods of these four classes, with results converted from R data structures to Python.
Documentation is available in Python via `help()`. Detailed online documentation
for the R package is available [here](https://mhahsler.r-universe.dev/arules/doc/manual.html).
Low-level `arules` functions can also be directly used in the form
`R.<arules R function>()`. The result will be an `rpy2` data type.
Transactions, itemsets and rules can manually be converted to Python
classes using the helper function `a2p()`.
To cite the Python module ‘arulespy’ in publications use:
> Michael Hahsler. ARULESPY: Exploring association rules and frequent itemsets in Python. arXiv:2305.15263 [cs.DB], May 2023. DOI: [10.48550/arXiv.2305.15263](https://doi.org/10.48550/arXiv.2305.15263)
## Installation
`arulespy` is based on the Python package `rpy2`, which requires an R installation. Here are the installation steps:
1. Install the latest version of R (>4.0) from https://www.r-project.org/
2. Install required libraries on your OS:
   - libcurl is needed by the R package [curl](https://cran.r-project.org/web/packages/curl/index.html).
     - Ubuntu: `sudo apt-get install libcurl4-openssl-dev`
     - macOS: `brew install curl`
     - Windows: no installation necessary, but read the Windows section below.
3. Install `arulespy`, which will automatically install `rpy2` and `pandas`.
``` sh
pip install arulespy
```
4. Optional: Set the environment variable `R_LIBS_USER` to decide where R packages are stored
(see [libPaths()](https://stat.ethz.ch/R-manual/R-devel/library/base/html/libPaths.html) for details). If it is not set, R will determine a suitable location.
5. Optional: `arulespy` will install the needed R packages when it is imported for the first time.
This may take a while. The R packages can also be preinstalled. To do so, start R and run
`install.packages(c("arules", "arulesViz"))`
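
The optional step 4 can also be done from within Python, as long as the variable is set before R is started, that is, before the first import from `arulespy`. The following is a minimal sketch under that assumption; `use_r_library()` is a hypothetical helper and the directory name in the usage comment is only an example.

```python
import os
from pathlib import Path


def use_r_library(path):
    """Create `path` if needed and point R_LIBS_USER at it.

    Hypothetical helper for illustration; call it before importing arulespy.
    """
    lib_dir = Path(path).expanduser()
    lib_dir.mkdir(parents=True, exist_ok=True)
    os.environ["R_LIBS_USER"] = str(lib_dir)
    return lib_dir


# Example usage (the directory is an assumption, pick any writable location):
# use_r_library("~/R/arulespy-library")
# from arulespy.arules import Transactions
```

Setting the variable in the shell or in the system settings has the same effect and avoids depending on import order.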
The most likely issue is that `rpy2` does not find R or R's shared library.
This causes the Python kernel to die or exit without explanation when the package `arulespy` is imported.
Run `python -m rpy2.situation` to check whether R and R's libraries are found.
If you use Jupyter (IPython) notebooks, you can include the following code block in your notebook to check:
```python
from rpy2 import situation
for row in situation.iter_info():
    print(row)
```
The output should include a line saying `Loading R library from rpy2: OK`.
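
If `rpy2` itself cannot be imported, a quick check with the standard library can show whether an R installation is visible to Python at all. The sketch below only looks at the `R_HOME` variable and at the `PATH`, which is an assumption about what usually matters; `find_r()` is a hypothetical helper, and `python -m rpy2.situation` remains the more complete check.

```python
import os
import shutil


def find_r():
    """Report where (if anywhere) an R installation is visible to this process."""
    return {
        "R_HOME": os.environ.get("R_HOME"),  # None if the variable is unset
        "R on PATH": shutil.which("R"),      # None if no R executable is found
    }


for name, value in find_r().items():
    print(f"{name}: {value if value else 'not found'}")
```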
### Note for Windows users
`rpy2` currently does not fully support Windows and the installation is somewhat tricky. I was able to use it with the following setup:
* Windows 10
* rpy2 version 3.5.14
* Python version 3.10.12
* R version 4.3.1
I use the following code to set the environment variables needed by Windows
before I import from `arulespy`:
```python
from rpy2 import situation
import os
r_home = situation.r_home_from_registry()
r_bin = r_home + '\\bin\\x64\\'
os.environ['R_HOME'] = r_home
os.environ['PATH'] = r_bin + ";" + os.environ['PATH']
os.add_dll_directory(r_bin)
for row in situation.iter_info():
    print(row)
```
The output should include a line saying `Loading R library from rpy2: OK`.
More information on installing `rpy2` can be found [here](https://pypi.org/project/rpy2/).
## Example
```python
from arulespy.arules import Transactions, apriori, parameters
import pandas as pd

# define the data as a pandas dataframe
df = pd.DataFrame(
    [
        [True, True, True],
        [True, False, False],
        [True, True, True],
        [True, False, False],
        [True, True, True],
    ],
    columns=list("ABC"),
)

# convert dataframe to transactions
trans = Transactions.from_df(df)

# mine association rules
rules = apriori(
    trans,
    parameter=parameters({"supp": 0.1, "conf": 0.8}),
    control=parameters({"verbose": False}),
)

# display the rules as a pandas dataframe
rules.as_df()
```
```
The rules are returned as a table with the columns shown below. The counts and supports in this table correspond to ten transactions (count divided by support is 10), so the values will differ for the five-row example above.
| | LHS | RHS | support | confidence | coverage | lift | count |
|---:|:------|:------|----------:|-------------:|-----------:|-------:|--------:|
| 1 | {} | {A} | 0.8 | 0.8 | 1 | 1 | 8 |
| 2 | {} | {C} | 0.8 | 0.8 | 1 | 1 | 8 |
| 3 | {B} | {A} | 0.4 | 0.8 | 0.5 | 1 | 4 |
| 4 | {B} | {C} | 0.5 | 1 | 0.5 | 1.25 | 5 |
| 5 | {A,B} | {C} | 0.4 | 1 | 0.4 | 1.25 | 4 |
| 6 | {B,C} | {A} | 0.4 | 0.8 | 0.5 | 1 | 4 |
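
The measure columns can be reproduced by hand. The following sketch recomputes support, confidence, coverage, lift and count for a single rule in plain Python, using the five-row data frame defined in the example code written as item sets. `rule_measures()` is a hypothetical helper for illustration, not an `arulespy` function.

```python
# The five rows of the example data frame, written as sets of items.
transactions = [
    {"A", "B", "C"},
    {"A"},
    {"A", "B", "C"},
    {"A"},
    {"A", "B", "C"},
]


def rule_measures(lhs, rhs, transactions):
    """Compute basic interest measures for the rule lhs => rhs.

    Assumes that lhs and rhs each occur in at least one transaction.
    """
    n = len(transactions)
    count_lhs = sum(lhs <= t for t in transactions)
    count_rhs = sum(rhs <= t for t in transactions)
    count_both = sum((lhs | rhs) <= t for t in transactions)

    confidence = count_both / count_lhs
    return {
        "support": count_both / n,    # share of transactions with LHS and RHS
        "confidence": confidence,     # share of LHS transactions that also contain RHS
        "coverage": count_lhs / n,    # support of the LHS alone
        "lift": confidence / (count_rhs / n),
        "count": count_both,          # absolute number of supporting transactions
    }


measures = rule_measures({"B"}, {"C"}, transactions)
print({name: round(value, 3) for name, value in measures.items()})
# {'support': 0.6, 'confidence': 1.0, 'coverage': 0.6, 'lift': 1.667, 'count': 3}
```

A lift above 1 means that the items on both sides of the rule occur together more often than would be expected if they were independent.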
Complete examples:
* [Using arules](https://mhahsler.github.io/arulespy/examples/arules.html)
* [Using arulesViz](https://mhahsler.github.io/arulespy/examples/arulesViz.html)
## References
- Michael Hahsler. [ARULESPY: Exploring association rules and frequent itemsets in
Python.](http://dx.doi.org/10.48550/arXiv.2305.15263) arXiv:2305.15263 [cs.DB], May 2023.
DOI: 10.48550/arXiv.2305.15263
- Michael Hahsler, Sudheer Chelluboina, Kurt Hornik, and Christian
Buchta. [The arules R-package ecosystem: Analyzing interesting
patterns from large transaction
datasets.](https://jmlr.csail.mit.edu/papers/v12/hahsler11a.html)
*Journal of Machine Learning Research,* 12:1977-1981, 2011.
- Michael Hahsler, Bettina Grün and Kurt Hornik. [arules - A
Computational Environment for Mining Association Rules and Frequent
Item Sets.](https://dx.doi.org/10.18637/jss.v014.i15) *Journal of
Statistical Software,* 14(15), 2005. DOI: 10.18637/jss.v014.i15
- Michael Hahsler. [A Probabilistic Comparison of Commonly Used
Interest Measures for Association
Rules](https://mhahsler.github.io/arules/docs/measures), 2015, URL:
<https://mhahsler.github.io/arules/docs/measures>.
- Michael Hahsler. [An R Companion for Introduction to Data Mining:
Chapter
5](https://mhahsler.github.io/Introduction_to_Data_Mining_R_Examples/book/association-analysis-basic-concepts-and-algorithms.html),
2021, URL:
<https://mhahsler.github.io/Introduction_to_Data_Mining_R_Examples/book/>
Raw data
{
"_id": null,
"home_page": "https://github.com/mhahsler/arulespy",
"name": "arulespy",
"maintainer": "",
"docs_url": null,
"requires_python": ">=3.8",
"maintainer_email": "",
"keywords": "association rules,frequent itemsets",
"author": "Michael Hahsler",
"author_email": "mhahsler@lyle.smu.edu",
"download_url": "https://files.pythonhosted.org/packages/90/66/c5299e22dd45654c82fa5899b4921e4517e6f7de254f52a4a875aad8d4f1/arulespy-0.1.4.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": "",
"summary": "Python interface to the R package arules",
"version": "0.1.4",
"project_urls": {
"Bug Reports": "https://github.com/mhahsler/arulespy/issues",
"Documentation": "https://github.com/mhahsler/arulespy",
"Homepage": "https://github.com/mhahsler/arulespy",
"Source Code": "https://github.com/mhahsler/arulespy"
},
"split_keywords": [
"association rules",
"frequent itemsets"
],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "9f1c68cd8fb16ccc8f53656b58395c08348878ac6a10a19ea8375a784c300bd3",
"md5": "0f8607e97b50be6bdf2f2435b4bcc302",
"sha256": "758a79d177deb7ad2985c9f78e629be9369cf1294ff9251a38ba604083fb8aab"
},
"downloads": -1,
"filename": "arulespy-0.1.4-py3-none-any.whl",
"has_sig": false,
"md5_digest": "0f8607e97b50be6bdf2f2435b4bcc302",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.8",
"size": 21414,
"upload_time": "2023-09-12T18:49:10",
"upload_time_iso_8601": "2023-09-12T18:49:10.285456Z",
"url": "https://files.pythonhosted.org/packages/9f/1c/68cd8fb16ccc8f53656b58395c08348878ac6a10a19ea8375a784c300bd3/arulespy-0.1.4-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "9066c5299e22dd45654c82fa5899b4921e4517e6f7de254f52a4a875aad8d4f1",
"md5": "3613cca8e360f84997cce98eacbd3243",
"sha256": "fcbc7c8a3571d03fb9482bd5aa8517bb9975e48a793058f3b381d46d2b0778ab"
},
"downloads": -1,
"filename": "arulespy-0.1.4.tar.gz",
"has_sig": false,
"md5_digest": "3613cca8e360f84997cce98eacbd3243",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.8",
"size": 24382,
"upload_time": "2023-09-12T18:49:12",
"upload_time_iso_8601": "2023-09-12T18:49:12.157364Z",
"url": "https://files.pythonhosted.org/packages/90/66/c5299e22dd45654c82fa5899b4921e4517e6f7de254f52a4a875aad8d4f1/arulespy-0.1.4.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2023-09-12 18:49:12",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "mhahsler",
"github_project": "arulespy",
"travis_ci": false,
"coveralls": false,
"github_actions": true,
"tox": true,
"lcname": "arulespy"
}