# dfa-identify
Python library for identifying (learning) minimal DFAs from labeled examples
by reduction to SAT.
[](https://cloud.drone.io/mvcisback/dfa-identify)
[](https://badge.fury.io/py/dfa-identify)
[](https://opensource.org/licenses/MIT)
**Table of Contents**
- [Installation](#installation)
- [Usage](#usage)
- [Encoding](#encoding)
- [Goals and related libraries](#goals-and-related-libraries)
# Installation
If you just need to use `dfa`, you can just run:
`$ pip install dfa`
For developers, note that this project uses the
[poetry](https://poetry.eustace.io/) python package/dependency
management tool. Please familarize yourself with it and then
run:
`$ poetry install`
# Usage
`dfa_identify` is centered around the `find_dfa` and `find_dfas` function. Both take in
sequences of accepting and rejecting "words", where are word is a
sequence of arbitrary python objects.
1. `find_dfas` returns all minimally sized (no `DFA`s exist of size
smaller) consistent with the given labeled data.
2. `find_dfa` returns an arbitrary (first) minimally sized `DFA`.
The returned `DFA` object is from the [dfa](https://github.com/mvcisback/dfa) library.
```python
from dfa_identify import find_dfa
accepting = ['a', 'abaa', 'bb']
rejecting = ['abb', 'b']
my_dfa = find_dfa(accepting=accepting, rejecting=rejecting)
assert all(my_dfa.label(x) for x in accepting)
assert all(not my_dfa.label(x) for x in rejecting)
```
Because words are sequences of arbitrary python objects, the
identification problem, with `a` ↦ 0 and `b` ↦ 1, is given below:
```python
accepting = [[0], [0, 'z', 0, 0], ['z', 'z']]
rejecting = [[0, 'z', 'z'], ['z']]
my_dfa = find_dfa(accepting=accepting, rejecting=rejecting)
```
# Active learning
There are also active variants of `find_dfa` and `find_dfas` called
`find_dfa_active` and `find_dfas_active` resp.
An example from the unit tests:
```python
import dfa
from dfa_identify import find_dfa_active
def oracle(word):
return sum(word) % 2 == 0
lang = find_dfa_active(alphabet=[0, 1],
oracle=oracle,
n_queries=20)
assert lang == dfa.DFA(
inputs=[0,1],
label=lambda s: s,
transition=lambda s, c: s ^ bool(c),
start=True
)
# Can also send in positive and negative examples:
lang = find_dfa_active(alphabet=[0, 1],
positive=[(0,), (0,0)],
negative=[(1,), (1,0)],
oracle=oracle,
n_queries=20)
```
# Learning Decomposed DFAs
The is also support for learning decomposed DFAs following,
[Learning Deterministic Finite Automata Decompositions
from Examples and Demonstrations. FMCAD`22](
https://doi.org/10.34727/2022/isbn.978-3-85448-053-2_39).
These are tuples of DFAs whose labels are combined to determine
the acceptence / rejection of string, e.g., via conjunction or
disjunction.
Similar to learning a monolithic dfa, this functionality can
be accessed using `find_decomposed_dfas`. For example:
```python
from dfa_identify import find_decomposed_dfas
accepting = ['y', 'yy', 'gy', 'bgy', 'bbgy', 'bggy']
rejecting = ['', 'r', 'ry', 'by', 'yr', 'gr', 'rr', 'rry', 'rygy']
# --------------------------------------
# 1. Learn a disjunctive decomposition.
# --------------------------------------
gen_dfas = find_decomposed_dfas(accepting=accepting,
rejecting=rejecting,
n_dfas=2,
order_by_stutter=True,
decompose_via="disjunction")
dfas = next(gen_dfas)
monolithic = dfas[0] | dfas[1] # Build DFA that is the union of languages.
assert all(monolithic.label(w) for w in accepting)
assert not any(monolithic.label(w) for w in rejecting)
# Each dfa must reject a rejecting string.
assert all(all(~d.label(w) for d in dfas) for w in rejecting)
# At least one dfa must accept an accepting string.
assert all(any(d.label(w) for d in dfas) for w in accepting)
# --------------------------------------
# 2. Learn a conjunctive decomposition.
# --------------------------------------
gen_dfas = find_decomposed_dfas(accepting=accepting,
rejecting=rejecting,
n_dfas=2,
order_by_stutter=True,
decompose_via="conjunction")
dfas = next(gen_dfas)
monolithic = dfas[0] & dfas[1] # Build DFA that is the union of languages.
assert all(monolithic.label(w) for w in accepting)
assert not any(monolithic.label(w) for w in rejecting)
```
# Minimality
There are two forms of "minimality" supported by `dfa-identify`.
1. By default, dfa-identify returns DFAs that have the minimum
number of states required to seperate the accepting and
rejecting set.
2. If the `order_by_stutter` flag is set to `True`, then the
`find_dfas` (lazily) orders the DFAs so that the number of
self loops (stuttering transitions) appearing the DFAs decreases.
`find_dfa` thus returns a DFA with the most number of self loops
given the minimal number of states.
# Encoding
This library currently uses the encodings outlined in [Heule, Marijn JH, and Sicco Verwer. "Exact DFA identification using SAT solvers." International Colloquium on Grammatical Inference. Springer, Berlin, Heidelberg, 2010.](https://link.springer.com/chapter/10.1007/978-3-642-15488-1_7) and [Ulyantsev, Vladimir, Ilya Zakirzyanov, and Anatoly Shalyto. "Symmetry Breaking Predicates for SAT-based DFA Identification."](https://arxiv.org/abs/1602.05028).
The key difference is in the use of the symmetry breaking clauses. Two kinds are exposed.
1. clique (Heule 2010): Partially breaks symmetries by analyzing
conflict graph.
2. bfs (Ulyantsev 2016): Breaks all symmetries so that each model corresponds to a unique DFA.
# Goals and related libraries
There are many other python libraries that
perform DFA and other automata inference.
1. [DFA-Inductor-py](https://github.com/ctlab/DFA-Inductor-py) - State of the art passive inference via reduction to SAT (as of 2019).
2. [z3gi](https://gitlab.science.ru.nl/rick/z3gi): Uses SMT backed passive learning algorithm.
3. [lstar](https://pypi.org/project/lstar/): Active learning algorithm based L* derivative.
The primary goal of this library is to loosely track the state of the art in passive SAT based inference while providing a simple implementation and API.
Raw data
{
"_id": null,
"home_page": "https://github.com/mvcisback/dfa-identify",
"name": "dfa_identify",
"maintainer": "",
"docs_url": null,
"requires_python": ">=3.9,<4.0",
"maintainer_email": "",
"keywords": "",
"author": "Marcell Vazquez-Chanlatte",
"author_email": "mvc@linux.com",
"download_url": "https://files.pythonhosted.org/packages/2a/3d/a6ff9c528291772412fc55abfa793cac5500aba741de2a62c6033287b4e2/dfa_identify-3.13.0.tar.gz",
"platform": null,
"description": "# dfa-identify\nPython library for identifying (learning) minimal DFAs from labeled examples\nby reduction to SAT.\n\n[](https://cloud.drone.io/mvcisback/dfa-identify)\n[](https://badge.fury.io/py/dfa-identify)\n[](https://opensource.org/licenses/MIT)\n\n**Table of Contents**\n\n- [Installation](#installation)\n- [Usage](#usage)\n- [Encoding](#encoding)\n- [Goals and related libraries](#goals-and-related-libraries)\n\n# Installation\n\nIf you just need to use `dfa`, you can just run:\n\n`$ pip install dfa`\n\nFor developers, note that this project uses the\n[poetry](https://poetry.eustace.io/) python package/dependency\nmanagement tool. Please familarize yourself with it and then\nrun:\n\n`$ poetry install`\n\n# Usage\n\n`dfa_identify` is centered around the `find_dfa` and `find_dfas` function. Both take in\nsequences of accepting and rejecting \"words\", where are word is a\nsequence of arbitrary python objects. \n\n1. `find_dfas` returns all minimally sized (no `DFA`s exist of size\nsmaller) consistent with the given labeled data.\n\n2. `find_dfa` returns an arbitrary (first) minimally sized `DFA`.\n\nThe returned `DFA` object is from the [dfa](https://github.com/mvcisback/dfa) library.\n\n```python\nfrom dfa_identify import find_dfa\n\n\naccepting = ['a', 'abaa', 'bb']\nrejecting = ['abb', 'b']\n \nmy_dfa = find_dfa(accepting=accepting, rejecting=rejecting)\n\nassert all(my_dfa.label(x) for x in accepting)\nassert all(not my_dfa.label(x) for x in rejecting)\n```\n\nBecause words are sequences of arbitrary python objects, the\nidentification problem, with `a` \u21a6 0 and `b` \u21a6 1, is given below:\n\n\n```python\naccepting = [[0], [0, 'z', 0, 0], ['z', 'z']]\nrejecting = [[0, 'z', 'z'], ['z']]\n\nmy_dfa = find_dfa(accepting=accepting, rejecting=rejecting)\n```\n\n# Active learning\n\nThere are also active variants of `find_dfa` and `find_dfas` called\n`find_dfa_active` and `find_dfas_active` resp. \n\nAn example from the unit tests:\n\n```python\nimport dfa\nfrom dfa_identify import find_dfa_active\n\n\ndef oracle(word):\n return sum(word) % 2 == 0\n\nlang = find_dfa_active(alphabet=[0, 1],\n oracle=oracle,\n n_queries=20)\nassert lang == dfa.DFA(\n inputs=[0,1],\n label=lambda s: s,\n transition=lambda s, c: s ^ bool(c),\n start=True\n)\n\n# Can also send in positive and negative examples:\nlang = find_dfa_active(alphabet=[0, 1],\n positive=[(0,), (0,0)],\n negative=[(1,), (1,0)],\n oracle=oracle,\n n_queries=20)\n\n```\n\n# Learning Decomposed DFAs\n\nThe is also support for learning decomposed DFAs following,\n[Learning Deterministic Finite Automata Decompositions\n from Examples and Demonstrations. FMCAD`22](\n https://doi.org/10.34727/2022/isbn.978-3-85448-053-2_39).\nThese are tuples of DFAs whose labels are combined to determine\nthe acceptence / rejection of string, e.g., via conjunction or\ndisjunction.\n\nSimilar to learning a monolithic dfa, this functionality can\nbe accessed using `find_decomposed_dfas`. For example:\n\n```python\nfrom dfa_identify import find_decomposed_dfas\naccepting = ['y', 'yy', 'gy', 'bgy', 'bbgy', 'bggy']\nrejecting = ['', 'r', 'ry', 'by', 'yr', 'gr', 'rr', 'rry', 'rygy']\n\n# --------------------------------------\n# 1. Learn a disjunctive decomposition.\n# --------------------------------------\ngen_dfas = find_decomposed_dfas(accepting=accepting,\n rejecting=rejecting,\n n_dfas=2,\n order_by_stutter=True,\n decompose_via=\"disjunction\")\ndfas = next(gen_dfas)\n\nmonolithic = dfas[0] | dfas[1] # Build DFA that is the union of languages.\nassert all(monolithic.label(w) for w in accepting)\nassert not any(monolithic.label(w) for w in rejecting)\n\n# Each dfa must reject a rejecting string.\nassert all(all(~d.label(w) for d in dfas) for w in rejecting)\n# At least one dfa must accept an accepting string.\nassert all(any(d.label(w) for d in dfas) for w in accepting)\n\n# --------------------------------------\n# 2. Learn a conjunctive decomposition.\n# --------------------------------------\ngen_dfas = find_decomposed_dfas(accepting=accepting,\n rejecting=rejecting,\n n_dfas=2,\n order_by_stutter=True,\n decompose_via=\"conjunction\")\ndfas = next(gen_dfas)\n\nmonolithic = dfas[0] & dfas[1] # Build DFA that is the union of languages.\nassert all(monolithic.label(w) for w in accepting)\nassert not any(monolithic.label(w) for w in rejecting)\n\n```\n\n# Minimality\n\nThere are two forms of \"minimality\" supported by `dfa-identify`.\n\n1. By default, dfa-identify returns DFAs that have the minimum\n number of states required to seperate the accepting and\n rejecting set.\n2. If the `order_by_stutter` flag is set to `True`, then the\n `find_dfas` (lazily) orders the DFAs so that the number of\n self loops (stuttering transitions) appearing the DFAs decreases.\n `find_dfa` thus returns a DFA with the most number of self loops\n given the minimal number of states.\n\n# Encoding\n\nThis library currently uses the encodings outlined in [Heule, Marijn JH, and Sicco Verwer. \"Exact DFA identification using SAT solvers.\" International Colloquium on Grammatical Inference. Springer, Berlin, Heidelberg, 2010.](https://link.springer.com/chapter/10.1007/978-3-642-15488-1_7) and [Ulyantsev, Vladimir, Ilya Zakirzyanov, and Anatoly Shalyto. \"Symmetry Breaking Predicates for SAT-based DFA Identification.\"](https://arxiv.org/abs/1602.05028).\n\nThe key difference is in the use of the symmetry breaking clauses. Two kinds are exposed.\n\n1. clique (Heule 2010): Partially breaks symmetries by analyzing\n conflict graph.\n2. bfs (Ulyantsev 2016): Breaks all symmetries so that each model corresponds to a unique DFA.\n\n# Goals and related libraries\n\nThere are many other python libraries that \nperform DFA and other automata inference.\n\n1. [DFA-Inductor-py](https://github.com/ctlab/DFA-Inductor-py) - State of the art passive inference via reduction to SAT (as of 2019).\n2. [z3gi](https://gitlab.science.ru.nl/rick/z3gi): Uses SMT backed passive learning algorithm.\n3. [lstar](https://pypi.org/project/lstar/): Active learning algorithm based L* derivative.\n\nThe primary goal of this library is to loosely track the state of the art in passive SAT based inference while providing a simple implementation and API.\n",
"bugtrack_url": null,
"license": "MIT",
"summary": "Python library for identifying (learning) DFAs (automata) from labeled examples.",
"version": "3.13.0",
"project_urls": {
"Homepage": "https://github.com/mvcisback/dfa-identify",
"Repository": "https://github.com/mvcisback/dfa-identify"
},
"split_keywords": [],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "7770022ca336f0a029ddb2525fe3de5f3f3cfdcf7a4800872d8c42d26f3b5566",
"md5": "871aa4b7d0f36b820898c3bb3a9b835d",
"sha256": "4e701e25782d87ccf9c0d68f7d4bdac0f123d2bbed4c8dee1c0a2c706f5f6038"
},
"downloads": -1,
"filename": "dfa_identify-3.13.0-py3-none-any.whl",
"has_sig": false,
"md5_digest": "871aa4b7d0f36b820898c3bb3a9b835d",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.9,<4.0",
"size": 19704,
"upload_time": "2024-03-18T18:37:40",
"upload_time_iso_8601": "2024-03-18T18:37:40.904925Z",
"url": "https://files.pythonhosted.org/packages/77/70/022ca336f0a029ddb2525fe3de5f3f3cfdcf7a4800872d8c42d26f3b5566/dfa_identify-3.13.0-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "2a3da6ff9c528291772412fc55abfa793cac5500aba741de2a62c6033287b4e2",
"md5": "5c6644c435883e17640dc60154890221",
"sha256": "8beec037b81e6c0f40c67fbf1c565dd0a3d4373300315eccee032744e0e6e13c"
},
"downloads": -1,
"filename": "dfa_identify-3.13.0.tar.gz",
"has_sig": false,
"md5_digest": "5c6644c435883e17640dc60154890221",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.9,<4.0",
"size": 17933,
"upload_time": "2024-03-18T18:37:43",
"upload_time_iso_8601": "2024-03-18T18:37:43.059980Z",
"url": "https://files.pythonhosted.org/packages/2a/3d/a6ff9c528291772412fc55abfa793cac5500aba741de2a62c6033287b4e2/dfa_identify-3.13.0.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2024-03-18 18:37:43",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "mvcisback",
"github_project": "dfa-identify",
"travis_ci": false,
"coveralls": false,
"github_actions": false,
"lcname": "dfa_identify"
}