ruleminer


:Name: ruleminer
:Version: 0.1.26
:Home page: https://github.com/wjwillemse/ruleminer
:Summary: Python package to mine association rules in datasets
:Upload time: 2024-04-22 10:24:39
:Author: Willem Jan Willemse
:Requires Python: >=3.6
:License: MIT license
:Keywords: ruleminer

=========
ruleminer
=========

.. image:: https://readthedocs.org/projects/ruleminer/badge/?version=latest
        :alt: ReadTheDocs
        :target: https://ruleminer.readthedocs.io/en/latest/

.. image:: https://img.shields.io/pypi/v/ruleminer.svg
        :target: https://pypi.python.org/pypi/ruleminer

.. image:: https://img.shields.io/badge/License-MIT-yellow.svg
        :target: https://opensource.org/licenses/MIT
        :alt: License: MIT

.. image:: https://img.shields.io/badge/code%20style-black-000000.svg
        :target: https://github.com/psf/black
        :alt: Code style: black


Python package to discover association rules in Pandas DataFrames. 

This package contains the code for the paper `Discovering and ranking validation rules in supervisory data <https://github.com/wjwillemse/ruleminer/tree/main/docs/paper.pdf>`_.

The documentation can be found `here <https://ruleminer.readthedocs.io/en/latest/>`_. 

Here is what the package does:

* Generate human-readable validation rules from a Pandas DataFrame, using rule templates that contain regular expressions

  - available functions: min, max, abs, quantile, sum, substr, split, count, sumif and countif
  - including parameters for metric filters and rule precisions (including XBRL tolerances)

* Evaluate rules and calculate association rules metrics

  - available metrics: abs support, abs exceptions, confidence, support, added value, causal confidence, causal support, conviction, lift and rule power factor
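
As a rough illustration of what several of these metrics measure, the sketch below computes them for a single if-then rule using the common textbook definitions. It is not ruleminer's implementation: the helper ``rule_metrics`` and the small dataset are hypothetical, and the package's exact definitions may differ in details.

```python
from typing import Any, Callable, Dict, List

Row = Dict[str, Any]


def rule_metrics(
    rows: List[Row],
    antecedent: Callable[[Row], bool],
    consequent: Callable[[Row], bool],
) -> Dict[str, float]:
    """Textbook association-rule metrics for 'if antecedent then consequent'.

    Hypothetical helper for illustration; assumes a non-empty list of rows.
    """
    n = len(rows)
    n_if = sum(1 for r in rows if antecedent(r))
    n_then = sum(1 for r in rows if consequent(r))
    n_both = sum(1 for r in rows if antecedent(r) and consequent(r))

    p_then = n_then / n
    confidence = n_both / n_if if n_if else float("nan")
    return {
        # rows that satisfy both the if part and the then part
        "abs support": n_both,
        # rows that satisfy the if part but violate the then part
        "abs exceptions": n_if - n_both,
        "support": n_both / n,
        "confidence": confidence,
        "added value": confidence - p_then,
        "lift": confidence / p_then if p_then else float("nan"),
        "conviction": (
            (1 - p_then) / (1 - confidence) if confidence != 1 else float("inf")
        ),
        "rule power factor": (n_both / n) * confidence,
    }


# Hypothetical data: three life insurers (one without TP-life) and two non-life insurers.
rows = [
    {"Type": "life_insurer", "TP-life": 100.0, "TP-nonlife": 0.0},
    {"Type": "life_insurer", "TP-life": 250.0, "TP-nonlife": 0.0},
    {"Type": "life_insurer", "TP-life": 0.0, "TP-nonlife": 0.0},
    {"Type": "non-life_insurer", "TP-life": 0.0, "TP-nonlife": 80.0},
    {"Type": "non-life_insurer", "TP-life": 0.0, "TP-nonlife": 120.0},
]

# Rule: if ({"Type"} == "life_insurer") then ({"TP-life"} > 0)
metrics = rule_metrics(
    rows,
    antecedent=lambda r: r["Type"] == "life_insurer",
    consequent=lambda r: r["TP-life"] > 0,
)
for name, value in metrics.items():
    print(f"{name}: {value}")
```

In this made-up dataset the rule holds for two of the three life insurers, so it has an absolute support of 2, one exception and a confidence of 2/3.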

Here are some examples of rule templates containing regexes, from which you can generate validation rules:

- if ({"Type"} == ".*") then ({".*"} > 0)

- if ({".*"} > 0) then (({".*"} == 0) & ({".*"} > 0))

- (({".*"} + {".*"} + {".*"}) == {".*"})

- ({"Own funds"} <= quantile({"Own funds"}, 0.95))

- (substr({"Type"}, 0, 1) in ["a", "b"])

The first template generates (with the dataset described in the Usage section) rules such as:

- if ({"Type"} == "non-life_insurer") then ({"TP-nonlife"} > 0)
- if ({"Type"} == "life_insurer") then ({"TP-life"} > 0)

These generated validation rules can then be used to validate new datasets.
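
To make the idea of template expansion concrete, here is a minimal sketch in plain Python of how a template like the first one could be turned into candidate rules and then used to check new data. It is an illustration under stated assumptions, not how ruleminer works internally: the functions ``expand_template``, ``as_rule`` and ``find_exceptions`` and both datasets are hypothetical, only the ``if (... == ...) then (... > 0)`` shape is handled, and only rules without any exception are kept (the package itself offers metric filters for this).

```python
import re
from typing import Any, Dict, List, Tuple

Row = Dict[str, Any]


def expand_template(
    rows: List[Row], if_column: str, value_regex: str, then_regex: str
) -> List[Tuple[str, str]]:
    """Return (value, column) pairs for which
    'if if_column == value then column > 0' holds on every matching row."""
    # Column names that match the regex in the then part.
    columns = [c for c in rows[0] if c != if_column and re.fullmatch(then_regex, c)]
    # Distinct values of the if column that match the value regex.
    values = sorted(
        {str(r[if_column]) for r in rows if re.fullmatch(value_regex, str(r[if_column]))}
    )
    candidates = []
    for value in values:
        subset = [r for r in rows if str(r[if_column]) == value]
        for column in columns:
            if all(r[column] > 0 for r in subset):
                candidates.append((value, column))
    return candidates


def as_rule(if_column: str, value: str, column: str) -> str:
    """Format a candidate in the rule syntax shown in this README."""
    return f'if ({{"{if_column}"}} == "{value}") then ({{"{column}"}} > 0)'


def find_exceptions(
    rows: List[Row], if_column: str, candidates: List[Tuple[str, str]]
) -> List[Tuple[int, str]]:
    """Return (row index, rule) for every row that violates a rule."""
    return [
        (index, as_rule(if_column, value, column))
        for index, r in enumerate(rows)
        for value, column in candidates
        if str(r[if_column]) == value and not r[column] > 0
    ]


# Hypothetical dataset used to generate the rules.
data = [
    {"Type": "life_insurer", "TP-life": 100.0, "TP-nonlife": 0.0},
    {"Type": "life_insurer", "TP-life": 250.0, "TP-nonlife": 0.0},
    {"Type": "non-life_insurer", "TP-life": 0.0, "TP-nonlife": 80.0},
    {"Type": "non-life_insurer", "TP-life": 0.0, "TP-nonlife": 120.0},
]

# Template: if ({"Type"} == ".*") then ({".*"} > 0)
candidates = expand_template(data, "Type", ".*", ".*")
rules = [as_rule("Type", value, column) for value, column in candidates]
for rule in rules:
    print(rule)

# Hypothetical new dataset: the first row breaks the life insurer rule.
new_data = [
    {"Type": "life_insurer", "TP-life": 0.0, "TP-nonlife": 0.0},
    {"Type": "non-life_insurer", "TP-life": 0.0, "TP-nonlife": 50.0},
]
exceptions = find_exceptions(new_data, "Type", candidates)
print(exceptions)
```

Keeping the candidates as ``(value, column)`` pairs and formatting them only for display avoids having to parse rule strings again when validating new data; the real package evaluates the rule expressions themselves.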


=======
History
=======

0.1.0 (2021-11-21)
------------------

* First release on PyPI.

0.1.1 (2021-11-23)
------------------

* Added more documentation to the README text

0.1.2 (2022-1-20)
-----------------

* Bug fixes for some complex expressions

0.1.3 (2022-1-26)
-----------------

* Optimized rule generation process

0.1.4 (2022-1-26)
-----------------

* Columns evaluated in the then part now depend on the if part of the rule

0.1.5 (2022-1-30)
-----------------

* Rules with quantiles added (including evaluating intermediate results)

0.1.6 and 0.1.7 (2022-2-1)
--------------------------

* A number of optimizations in the rule generation process

0.1.8 (2022-2-3)
----------------

* Rule power factor metric added

0.1.12 (2022-5-11)
------------------

* Optimizations: metric calculations are done with boolean masks of DataFrame

0.1.14 (2023-4-17)
------------------

* Nested functions added
* substr and in operators added

0.1.16 (2023-8-3)
-----------------

* Templates now do not necessarily have to contain a regex
* Bug fix when evaluating rules that contain columns that do not exist
* Templates now can start with 'if () then'

0.1.17 (2023-8-8)
-----------------

* Rule generation now runs without specified data

0.1.18 (2023-8-8)
-----------------

* Dedicated function added for template to rule conversion without data
* Exp sign changed from ^ to **

0.1.19 (2023-8-27)
------------------

* Small fixes to rule conversion without data

0.1.20 (2023-8-29)
------------------

* Small fixes in evaluating rules with syntax errors

0.1.21 (2023-10-11)
-------------------

* changed sum to nansum
* added tolerance functionality for ==

0.1.22 (2023-10-17)
-------------------

* added tolerance functionality for !=, <, <=, > and >=
* updated docs

0.1.23 (2023-10-18)
-------------------

* added nested conditions in functions

0.1.24 (2023-10-25)
-------------------

* added sumif and improved tolerance functionality

0.1.26 (2024-4-22)
-------------------

* added additional arguments estimate, base and sample_weights to fit_ensemble_and_extract_expressions function to use more than AdaBoost
* added decision tree functions to __init__.py

            

Raw data

{
    "_id": null,
    "home_page": "https://github.com/wjwillemse/ruleminer",
    "name": "ruleminer",
    "maintainer": null,
    "docs_url": null,
    "requires_python": ">=3.6",
    "maintainer_email": null,
    "keywords": "ruleminer",
    "author": "Willem Jan Willemse",
    "author_email": "w.j.willemse@xs4all.nl",
    "download_url": null,
    "platform": null,
    "bugtrack_url": null,
    "license": "MIT license",
    "summary": "Python package to mine association rules in datasets",
    "version": "0.1.26",
    "project_urls": {
        "Homepage": "https://github.com/wjwillemse/ruleminer"
    },
    "split_keywords": [
        "ruleminer"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "0d17dfd0b8b1f1bf7dd3e1a434a7537b1fb9bb532182f1930ca63ba48b982f0e",
                "md5": "8f0152707d589f9e5d27c571e3950a88",
                "sha256": "b688258550f83827d8b08745191cbdf2b94bbff43b9b6115a78ecf4b0d5125cf"
            },
            "downloads": -1,
            "filename": "ruleminer-0.1.26-py2.py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "8f0152707d589f9e5d27c571e3950a88",
            "packagetype": "bdist_wheel",
            "python_version": "py2.py3",
            "requires_python": ">=3.6",
            "size": 21502,
            "upload_time": "2024-04-22T10:24:39",
            "upload_time_iso_8601": "2024-04-22T10:24:39.143317Z",
            "url": "https://files.pythonhosted.org/packages/0d/17/dfd0b8b1f1bf7dd3e1a434a7537b1fb9bb532182f1930ca63ba48b982f0e/ruleminer-0.1.26-py2.py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2024-04-22 10:24:39",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "wjwillemse",
    "github_project": "ruleminer",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": false,
    "tox": true,
    "lcname": "ruleminer"
}
        