regex-learner

- Name: regex-learner
- Version: 0.0.4
- Home page: https://github.com/IBM/regex-learner
- Summary: The project provides a tool/library implementing an automated regular expression building mechanism.
- Upload time: 2023-08-28 15:18:24
- Author: Stefano Braghin, Liubov Nedoshivina
- Requires Python: >=3.8
- License: Apache License 2.0

# Regex-learner

This project provides a tool/library implementing an automated regular expression building mechanism.

This project takes inspiration from the paper by Ilyas et al. [1]:

[Ilyas, Andrew, M. F. da Trindade, Joana, Castro Fernandez, Raul and Madden, Samuel. 2018. "Extracting Syntactical Patterns from Databases."](https://hdl.handle.net/1721.1/137774)

This repository contains code and examples for learning regular expressions from columns of data.

This is a basic README; it will be expanded as the prototype grows.

# Installation

The project can be installed via pip:
```bash
pip install regex-learner
```

# Examples of usage

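The following example learns a date pattern from 100 randomly sampled dates in the DD-MM-YYYY format. It uses the `faker` package to generate the sample data; install it with `pip install faker` if it is not already available.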

```python
from faker import Faker
from xsystem import XTructure

fake = Faker()
x = XTructure()  # Create a basic XTructure instance

for _ in range(100):
    d = fake.date(pattern="%d-%m-%Y")  # Generate a random date in the format DD-MM-YYYY
    x.learn_new_word(d)  # Add the example to XTructure and learn from it

# The learned expression depends on the sampled dates, e.g.:
# ([0312][0-9])(-)([01][891652073])(-)([21][09][078912][0-9])
print(x)
```
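The printed expression appears to contain one parenthesized group per token (the runs of digits and the `-` separators), with one character class per position listing the characters observed there. The sketch below reproduces that shape on a small scale to make the idea concrete. It is a simplified illustration under that assumption, not XTructure's actual algorithm: the helpers `tokenize`, `position_class`, and `learn_pattern` are hypothetical, the sketch requires every value to share the same token structure, and it performs none of the branching that the CLI options (such as `--max-branch`) control.

```python
import re
from typing import List

# The ten ASCII digits; a position that saw all of them is collapsed to [0-9].
ALL_DIGITS = set("0123456789")


def tokenize(value: str) -> List[str]:
    """Split a value into runs of ASCII letters/digits and single other characters."""
    return re.findall(r"[A-Za-z0-9]+|[^A-Za-z0-9]", value)


def position_class(chars: List[str]) -> str:
    """Render the distinct characters seen at one position as a regex fragment."""
    if set(chars) == ALL_DIGITS:
        return "[0-9]"  # every digit was observed: use a range
    if len(chars) == 1:
        return re.escape(chars[0])  # a single literal character
    return "[" + "".join(re.escape(c) for c in chars) + "]"


def learn_pattern(values: List[str]) -> str:
    """Hypothetical learner: one group per token, one class per position.

    Assumes every value has the same token structure (same number of
    tokens with the same lengths) and raises ValueError otherwise.
    """
    shapes = {tuple(len(t) for t in tokenize(v)) for v in values}
    if len(shapes) != 1:
        raise ValueError("values do not share a single token structure")
    (shape,) = shapes
    tokenized = [tokenize(v) for v in values]

    groups = []
    for t_idx, length in enumerate(shape):
        fragments = []
        for pos in range(length):
            seen: List[str] = []
            for tokens in tokenized:
                ch = tokens[t_idx][pos]
                if ch not in seen:
                    seen.append(ch)  # keep characters in first-seen order
            fragments.append(position_class(seen))
        groups.append("(" + "".join(fragments) + ")")
    return "".join(groups)


dates = ["23-05-1999", "01-12-2004", "17-08-1987"]
pattern = learn_pattern(dates)
print(pattern)  # ([201][317])(\-)([01][528])(\-)([12][90][908][947])
```

Because each fragment matches exactly one character that was observed at that position, every training value is matched in full by the result (for example, `re.fullmatch(pattern, "23-05-1999")` succeeds). Unlike the XTructure output above, this sketch escapes the separator as `\-`, which matches the same literal `-`.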

Similarly, the tool can be used directly from the command line via the `regex-learner` command installed with the package.

The tool has several options, as described by the help message:

```
> regex-learner -h
usage: regex-learner [-h] [-i INPUT] [-o OUTPUT] [--max-branch MAX_BRANCH] [--alpha ALPHA] [--branch-threshold BRANCH_THRESHOLD]

A simple tool to learn human readable a regular expression from examples

options:
  -h, --help            show this help message and exit
  -i INPUT, --input INPUT
                        Path to the input source, defaults to stdin
  -o OUTPUT, --output OUTPUT
                        Path to the output file, defaults to stdout
  --max-branch MAX_BRANCH
                        Maximum number of branches allowed, defaults to 8
  --alpha ALPHA         Weight for fitting tuples, defaults to 1/5
  --branch-threshold BRANCH_THRESHOLD
                        Branching threshold, defaults to 0.85, relative to the fitting score alpha
```

Assuming the examples to learn from are stored in a file called `EXAMPLE_FILE`, and that a very simple regular expression is wanted (here, one with at most 2 branches), the tool can be used as follows:

```bash
cat EXAMPLE_FILE | regex-learner --max-branch 2
```
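When the examples come from a Python program, the same invocation can be scripted with the standard `subprocess` module. The wrapper below is a hypothetical convenience: `build_command` and `learn_from_lines` are not part of regex-learner. It only uses the flags listed in the help message above, and it assumes that the CLI reads one example per line from stdin, writes the learned expression to stdout, and accepts decimal values such as `0.2` for `--alpha`.

```python
import shutil
import subprocess
from typing import Iterable, List, Optional


def build_command(
    input_path: Optional[str] = None,
    output_path: Optional[str] = None,
    max_branch: Optional[int] = None,
    alpha: Optional[float] = None,
    branch_threshold: Optional[float] = None,
) -> List[str]:
    """Translate keyword arguments into regex-learner CLI flags.

    Options left as None are omitted, so the CLI's own defaults apply.
    """
    cmd = ["regex-learner"]
    if input_path is not None:
        cmd += ["--input", input_path]
    if output_path is not None:
        cmd += ["--output", output_path]
    if max_branch is not None:
        cmd += ["--max-branch", str(max_branch)]
    if alpha is not None:
        cmd += ["--alpha", str(alpha)]
    if branch_threshold is not None:
        cmd += ["--branch-threshold", str(branch_threshold)]
    return cmd


def learn_from_lines(lines: Iterable[str], **options) -> str:
    """Pipe examples to regex-learner on stdin (assumed one per line) and return stdout."""
    result = subprocess.run(
        build_command(**options),
        input="\n".join(lines) + "\n",
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit status
    )
    return result.stdout.strip()


# Only call the CLI if it is actually installed on PATH.
if __name__ == "__main__" and shutil.which("regex-learner"):
    print(learn_from_lines(["23-05-1999", "01-12-2004"], max_branch=2))
```

Calling `learn_from_lines(examples, max_branch=2)` corresponds to the `cat EXAMPLE_FILE | regex-learner --max-branch 2` pipeline above. Keeping command construction in a separate function makes the flag mapping easy to check without running the tool.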

## Note
This project is not based on the original implementation of the paper, which is available at [2].

## References
1. Ilyas, Andrew, et al. "Extracting syntactical patterns from databases." 2018 IEEE 34th International Conference on Data Engineering (ICDE). IEEE, 2018.
2. XSystem, the original implementation of [1]: https://github.com/mitdbg/XSystem

            

Raw data

            {
    "_id": null,
    "home_page": "https://github.com/IBM/regex-learner",
    "name": "regex-learner",
    "maintainer": "",
    "docs_url": null,
    "requires_python": ">=3.8",
    "maintainer_email": "",
    "keywords": "",
    "author": "Stefano Braghin, Liubov Nedoshivina",
    "author_email": "\"Liubov Nedoshivia\" <liubov.nedoshivina@ibm.com>",
    "download_url": "https://files.pythonhosted.org/packages/d1/4f/a0f85e09fdfa431080d97949a2e71e26e2a01a08b48eb684d06726adfce3/regex-learner-0.0.4.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "Apache License 2.0",
    "summary": "The project provides a tool/library implementing an automated regular expression building mechanism.",
    "version": "0.0.4",
    "project_urls": {
        "Homepage": "https://github.com/IBM/regex-learner"
    },
    "split_keywords": [],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "015e78c0f07a08f285f3fdadf55288ea30966c61006970a64ffabedf695abc60",
                "md5": "6ba07de961480c7c6ccb9cb7d8068736",
                "sha256": "2a46f7983421a73faf2d80de1a57a1100b34126b043a8a029d516e9451fe86e0"
            },
            "downloads": -1,
            "filename": "regex_learner-0.0.4-py2.py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "6ba07de961480c7c6ccb9cb7d8068736",
            "packagetype": "bdist_wheel",
            "python_version": "py2.py3",
            "requires_python": ">=3.8",
            "size": 10932,
            "upload_time": "2023-08-28T15:18:24",
            "upload_time_iso_8601": "2023-08-28T15:18:24.051976Z",
            "url": "https://files.pythonhosted.org/packages/01/5e/78c0f07a08f285f3fdadf55288ea30966c61006970a64ffabedf695abc60/regex_learner-0.0.4-py2.py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "d14fa0f85e09fdfa431080d97949a2e71e26e2a01a08b48eb684d06726adfce3",
                "md5": "d3b376a32ec88ed598e17ec872a963dc",
                "sha256": "f92d9d918616bcf360f64aecb0384c39701886c514f4d548735aa31497a4bee8"
            },
            "downloads": -1,
            "filename": "regex-learner-0.0.4.tar.gz",
            "has_sig": false,
            "md5_digest": "d3b376a32ec88ed598e17ec872a963dc",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.8",
            "size": 10417,
            "upload_time": "2023-08-28T15:18:24",
            "upload_time_iso_8601": "2023-08-28T15:18:24.969913Z",
            "url": "https://files.pythonhosted.org/packages/d1/4f/a0f85e09fdfa431080d97949a2e71e26e2a01a08b48eb684d06726adfce3/regex-learner-0.0.4.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2023-08-28 15:18:24",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "IBM",
    "github_project": "regex-learner",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": true,
    "tox": true,
    "lcname": "regex-learner"
}
        