Name | comber JSON |
Version |
1.0.1
JSON |
| download |
home_page | None |
Summary | Parser combinator library with a BNF-esque interface |
upload_time | 2025-09-07 03:41:12 |
maintainer | None |
docs_url | None |
author | None |
requires_python | >=3.10 |
license | None |
keywords |
parser
combinator
parsing
compiler
|
VCS |
 |
bugtrack_url |
|
requirements |
No requirements were recorded.
|
Travis-CI |
No Travis.
|
coveralls test coverage |
No coveralls.
|
""""""
Comber
""""""
Comber is a parser combinator library for creating text parsers in plain Python with a BNF flavor.
For instance, we could define a simple grammar for calling functions on integer values like this:
.. code-block:: python
from comber import C, rs, inf
keyword = rs(r'[_a-zA-Z][_a-zA-Z0-9]*')@('keyword')
number = rs(r'[0-9]+')@('number', int)
package = keyword[1, inf, ',']
import_statement = C+ 'import' + package
value = keyword | number
function_call = keyword + '(' + value**',' + ')'
assignment = C+ 'let' + keyword + '=' + (function_call | value)
grammar = (import_statement | assignment | function_call)[0, inf]
Our toy grammar can handle simple import statements, variable assignment, and function calling. For example::
import math
import sys.io
let user_number = read()
let double = multiple(user_number, 2)
print(double)
To use our parser, we simply pass a string to it:
.. code-block:: python
from comber import ParseError
code = "add(17, 3)"
try:
parseState = grammar(code)
print('Parsed tokens: %s' % parseState.tree)
except ParseError as ex:
print(ex)
The ``grammar`` call returns the final parser state on a successful parse, from which you can retrieve the final parse
tree. In our case, it is effectively a list::
['add', '(', 17, 3, ')']
Our grammar only supports variables and integers. Suppose we tried to assign pythong-style string to a variable::
let text = "Here's some text"
If we tried to parse this, we'd get an exception like::
1:12: Unexpected Text: "Here's som. Expected one of: @keyword, @number
----------
Installing
----------
You can get the latest release from pip::
$ pip install comber
or from the source::
$ pip install .
=============
Documentation
=============
.. contents:: Contents
:depth: 3
================
Building Parsers
================
The most basic parsers are string literals (which match exactly), ``rs`` for matching a regular expression, and ``cs`` for
matching any of the strings of an iterable.
The ``rs`` parser can be seen in our toy grammar above. Standard Python regular expressions are supported. The entire
match is used as the result of the parse, regardless of groupings etc.
The ``cs`` parse takes any iterable of strings. The first string that exactly matches (the start of) the input text. If
none of the strings match, the parse fails. Since strings are themselves iterables, we could add addition and
subtraction to our toy grammar like this:
.. code-block:: python
from comber import C, cs
addition = value + cs('+-') + value
assignment = C+ 'let' + keyword + '=' + (addition | function_call | value)
String literals can only be used in combination with other parsers. If a string literal would start a sequence, or
otherwise appear alone, make use of the C parser.
------------
The C Parser
------------
The C parser, on its own, consumes no text, produces no tokens, and always succeeds. It's most useful for starting a
parser that would otherwise begin with a string literal. E.g. this:
.. code-block:: python
'let' + keyword + '=' + (function_call | value)
would actually throw a Python error because 'let' isn't *really* a parser - yet! That's where ``C`` comes in:
.. code-block:: python
C+ 'let' + keyword + '=' + (function_call | value)
``C`` starts off the sequence, so we can use any combination of parsers and string literals we like from there. It works
similarly with alternatives, so if we wanted to allow ``set`` to be used as a synonym for ``let``, we might do:
.. code-block:: python
(C| 'let'|'set') + keyword + '=' + (function_call | value)
``C`` can also be used to wrap a parser to protect it from optimization; for instance, embedding one sequence or
alternative set inside another. If, for instance, we extended our grammar to allow a bare value to be a whole statement:
.. code-block:: python
value = (keyword | number)@'value'
grammar = (import_statement | assignment | function_call | value)[0, inf]
-----------------
Basic Combinators
-----------------
Parsers can be combined in series with ``+``:
.. code-block:: python
name + address + pet
A sequence of parsers is evaluated left to right, each consuming text before the next is evaluated. If at any point in
the sequence a parser fails, the entire sequence fails.
A set of alternatives is built with ``|``:
.. code-block:: python
name | idnumber | location
Alternatives are considered left to right, with the first successful match being the match for the entire set. Be
careful! This means that for some sets of alternatives, the "obvious" parser may not be the one used, simply because it
came after another match.
Both sequences and alternatives will flatten like combinators, such that:
.. code-block:: python
name = firstname + lastname
salutation = C+ 'Hello' + name + '!'
is equivalent to:
.. code-block:: python
salutation = C+ 'Hello' + firstname + lastname + '!'
If you need to mantain the logical separation (to parse correctly, or maintain the name of a subparser), wrap the
subparser with ``C``:
.. code-block:: python
name = C(firstname + lastname)
salutation = C+ 'Hello' + name + '!'
----------
Repetition
----------
The most flexible option for specifying repetition is brackets:
.. code-block:: python
keyword[0, 10, ',']
The above would parse ``keyword`` zero to ten times, separated by a comma. The separator is optional - without it, the
result would simply parse ``keyword`` zero to ten times.
We could also specify parsing an exact number of times:
.. code-block:: python
keyword[10]
Or, with a separator:
.. code-block:: python
keyword[10, None, ',']
Infinity - ``math.inf`` - is a valid maximum value. For convenience, it can be imported directly from Comber:
.. code-block:: python
from comber import inf
param_list = keyword[0, inf, ',']
There are several convenience combinators for common types of repetition.
For zero or more with a separator, using ``*``:
.. code-block:: python
parser*','
Or zero or more without a separator, using the unary ``+``:
.. code-block:: python
+parser
You can declare a parser as *optional* with ``~``:
.. code-block:: python
~parser
------------------
Recursive Grammars
------------------
Consider a program in our toy grammar::
let word = exp(2, 16)
let maxint = minus(word, 1)
It'd be nice to simplify this by eliminating the variable "word" and pass the exp() call directly to minus(), but to
allow that, we need to extend our grammar to consider a function call to be a value so we can use one as a function
argument. But to do that, we'd need to use ``function_call`` in our definition of ``value`` - but ``function_call`` likewise
needs to reference ``value``.
To solve this issue, we can define one of them as ``defer``.
.. code-block:: python
from comber import C, rs, inf, defer
...
value = defer()
function_call = keyword + '(' + value**',' + ')'
value.fill(keyword | number | function_call)
This way, we can refer to ``value`` wherever we want, and only define its meaning when we're ready. We can safely build
fairly complex grammars this way, but be wary of performance.
-----------------
Controlling Space
-----------------
Before each subparser is run, Comber consumes leading whitespace. By default, this is spaces, tabs, and newlines. You
can change this in two ways: set the ``whitespace`` property on the outermost parser to a string containing the new
whitespace characters, or likewise passing a string as the second argument to the parser. In either case, the value
``None`` will disable the feature altogether.
====================
Building Parse Trees
====================
When you call a parser on a some text, they return a ``State`` object containing the resultant parse tree (``State.tree``).
By default, parsers output a "flat" tree - a list of strings parsed by the "leaf" parsers (i.e. string literals,
``rs``, and ``cs``).
To build a more useful parse tree, you have to provide *emitters*. Our toy grammar contains a simple one that converts
integers in the input to Python ``int`` values:
.. code-block:: Python
number = rs(r'[0-9]+')@('number', int)
So if we ran our parser on the input "let foo = 5" the resulting state's ``tree`` property would be ``["let", "foo", "=",
5]``. But it'd be more useful if it resulted in some kind of "let" object (that could execute the assignment, or be fed
to a VM, or whatever else). We could define one we can use as a emitter for our let statement like this:
.. code-block:: Python
class Let:
def __init__(self, let, variable, eq, value):
self.variable = variable
self.value = value
# We can ignore `let` and `eq` (which will always contain "let" and "=", respectively).
def __repr__(self):
return f'Let({self.variable}, {self.value})'
and redefine the assignment rule like:
.. code-block:: Python
assignment = (C+ 'let' + keyword + '=' + (function_call | value))@Let
Now if rerun the parser on "let foo = 5", we get ``[Let(foo, 5)]``.
As we did with number, you can also combine an emitter with a name, to improve error messages:
.. code-block:: Python
assignment = (C+ 'let' + keyword + '=' + (function_call | value))@("assignment", Let)
===========
Performance
===========
Under the covers, Comber is essentially a recursive descent parser. It's best suited for relatively shallow grammars
parsing small amounts of text.
You can gain a bit more speed by calling ``analyze()`` on the top-level/outermost parser. This modifies the parser (and
subparsers) in place:
.. code-block:: Python
grammar.analyze()
grammar('max(1, 4)')
====
TODO
====
-------------------
Available Operators
-------------------
Operators that Python allows to overridden
======== ============== ===========
Operator Method Current use
======== ============== ===========
\+ __add__ sequences
\| __or__ selection
[ ] __getitem__ repeat
@ __matmul__ names and internalization
< __lt__
> __gt__
<= __le__
>= __ge__
== __eq__
!= __ne__
is _is
is not is_not
\- __sub__
% __mod__
\* __mul__ zero or more, with provided separator
\** __pow__
/ __truediv__
// __floordiv__
& __and__
^ __xor__
<< __lshift__
>> __rshift__
in __contains__
======== ============== ===========
Unary operators:
======== =========== ===========
Operator Method Current use
======== =========== ===========
~ __invert__ optional
not __not__
\- __neg__
\+ __pos__ zero or more
======== =========== ===========
And::
() __call__ parse a string
=========
Copyright
=========
Comber copyright 2025 Ray Wallace III
This library is free software; you can redistribute it and/or modify it under the terms of the GNU Lesser General Public License as published by the Free Software Foundation, **version 2.1**.
This library is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more details.
You should have received a copy of the GNU Lesser General Public License along with this library; if not, write to the Free Software Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
Raw data
{
"_id": null,
"home_page": null,
"name": "comber",
"maintainer": null,
"docs_url": null,
"requires_python": ">=3.10",
"maintainer_email": null,
"keywords": "parser, combinator, parsing, compiler",
"author": null,
"author_email": "Ray Wallace III <rwallace3@proton.me>",
"download_url": "https://files.pythonhosted.org/packages/6a/8b/1f2db267eeea9029ec8a9e793d8ca6850d9516f5776c3ff3da19c9b8d7d1/comber-1.0.1.tar.gz",
"platform": null,
"description": "\"\"\"\"\"\"\nComber\n\"\"\"\"\"\"\n\nComber is a parser combinator library for creating text parsers in plain Python with a BNF flavor.\n\nFor instance, we could define a simple grammar for calling functions on integer values like this:\n\n.. code-block:: python\n\n from comber import C, rs, inf\n \n keyword = rs(r'[_a-zA-Z][_a-zA-Z0-9]*')@('keyword')\n number = rs(r'[0-9]+')@('number', int)\n package = keyword[1, inf, ',']\n import_statement = C+ 'import' + package\n value = keyword | number\n function_call = keyword + '(' + value**',' + ')'\n assignment = C+ 'let' + keyword + '=' + (function_call | value)\n grammar = (import_statement | assignment | function_call)[0, inf]\n\nOur toy grammar can handle simple import statements, variable assignment, and function calling. For example::\n\n import math\n import sys.io\n \n let user_number = read()\n let double = multiple(user_number, 2)\n \n print(double)\n\nTo use our parser, we simply pass a string to it:\n\n.. code-block:: python\n\n from comber import ParseError\n\n code = \"add(17, 3)\"\n try:\n parseState = grammar(code)\n print('Parsed tokens: %s' % parseState.tree)\n except ParseError as ex:\n print(ex)\n\nThe ``grammar`` call returns the final parser state on a successful parse, from which you can retrieve the final parse\ntree. In our case, it is effectively a list::\n\n ['add', '(', 17, 3, ')']\n\nOur grammar only supports variables and integers. Suppose we tried to assign pythong-style string to a variable::\n \n let text = \"Here's some text\"\n\nIf we tried to parse this, we'd get an exception like::\n\n 1:12: Unexpected Text: \"Here's som. Expected one of: @keyword, @number\n\n----------\nInstalling\n----------\n\nYou can get the latest release from pip::\n\n $ pip install comber\n\nor from the source::\n\n $ pip install .\n\n\n=============\nDocumentation\n=============\n\n.. contents:: Contents\n :depth: 3\n\n\n================\nBuilding Parsers\n================\n\nThe most basic parsers are string literals (which match exactly), ``rs`` for matching a regular expression, and ``cs`` for\nmatching any of the strings of an iterable.\n\nThe ``rs`` parser can be seen in our toy grammar above. Standard Python regular expressions are supported. The entire\nmatch is used as the result of the parse, regardless of groupings etc.\n\nThe ``cs`` parse takes any iterable of strings. The first string that exactly matches (the start of) the input text. If\nnone of the strings match, the parse fails. Since strings are themselves iterables, we could add addition and\nsubtraction to our toy grammar like this:\n\n.. code-block:: python\n\n from comber import C, cs\n\n addition = value + cs('+-') + value\n assignment = C+ 'let' + keyword + '=' + (addition | function_call | value)\n\nString literals can only be used in combination with other parsers. If a string literal would start a sequence, or\notherwise appear alone, make use of the C parser.\n\n------------\nThe C Parser\n------------\n\nThe C parser, on its own, consumes no text, produces no tokens, and always succeeds. It's most useful for starting a\nparser that would otherwise begin with a string literal. E.g. this:\n\n.. code-block:: python\n\n 'let' + keyword + '=' + (function_call | value)\n\nwould actually throw a Python error because 'let' isn't *really* a parser - yet! That's where ``C`` comes in:\n\n.. code-block:: python\n\n C+ 'let' + keyword + '=' + (function_call | value)\n\n``C`` starts off the sequence, so we can use any combination of parsers and string literals we like from there. It works\nsimilarly with alternatives, so if we wanted to allow ``set`` to be used as a synonym for ``let``, we might do:\n\n.. code-block:: python\n\n (C| 'let'|'set') + keyword + '=' + (function_call | value)\n\n``C`` can also be used to wrap a parser to protect it from optimization; for instance, embedding one sequence or\nalternative set inside another. If, for instance, we extended our grammar to allow a bare value to be a whole statement:\n\n.. code-block:: python\n\n value = (keyword | number)@'value'\n grammar = (import_statement | assignment | function_call | value)[0, inf]\n\n\n-----------------\nBasic Combinators\n-----------------\n\nParsers can be combined in series with ``+``:\n\n.. code-block:: python\n\n name + address + pet\n\nA sequence of parsers is evaluated left to right, each consuming text before the next is evaluated. If at any point in\nthe sequence a parser fails, the entire sequence fails.\n\nA set of alternatives is built with ``|``:\n\n.. code-block:: python\n\n name | idnumber | location\n\nAlternatives are considered left to right, with the first successful match being the match for the entire set. Be\ncareful! This means that for some sets of alternatives, the \"obvious\" parser may not be the one used, simply because it\ncame after another match. \n\nBoth sequences and alternatives will flatten like combinators, such that:\n\n.. code-block:: python\n\n name = firstname + lastname\n salutation = C+ 'Hello' + name + '!'\n\nis equivalent to:\n\n.. code-block:: python\n\n salutation = C+ 'Hello' + firstname + lastname + '!'\n\nIf you need to mantain the logical separation (to parse correctly, or maintain the name of a subparser), wrap the\nsubparser with ``C``:\n\n.. code-block:: python\n\n name = C(firstname + lastname)\n salutation = C+ 'Hello' + name + '!'\n\n----------\nRepetition\n----------\n\nThe most flexible option for specifying repetition is brackets:\n\n.. code-block:: python\n\n keyword[0, 10, ',']\n\nThe above would parse ``keyword`` zero to ten times, separated by a comma. The separator is optional - without it, the\nresult would simply parse ``keyword`` zero to ten times.\n\nWe could also specify parsing an exact number of times:\n\n.. code-block:: python\n\n keyword[10]\n\nOr, with a separator: \n\n.. code-block:: python\n\n keyword[10, None, ',']\n\nInfinity - ``math.inf`` - is a valid maximum value. For convenience, it can be imported directly from Comber:\n\n.. code-block:: python\n\n from comber import inf\n\n param_list = keyword[0, inf, ',']\n\nThere are several convenience combinators for common types of repetition.\n\nFor zero or more with a separator, using ``*``:\n\n.. code-block:: python\n\n parser*','\n\nOr zero or more without a separator, using the unary ``+``:\n\n.. code-block:: python\n\n +parser\n\nYou can declare a parser as *optional* with ``~``:\n\n.. code-block:: python\n\n ~parser\n\n------------------\nRecursive Grammars\n------------------\n\nConsider a program in our toy grammar::\n\n let word = exp(2, 16)\n let maxint = minus(word, 1)\n\nIt'd be nice to simplify this by eliminating the variable \"word\" and pass the exp() call directly to minus(), but to\nallow that, we need to extend our grammar to consider a function call to be a value so we can use one as a function\nargument. But to do that, we'd need to use ``function_call`` in our definition of ``value`` - but ``function_call`` likewise\nneeds to reference ``value``.\n\nTo solve this issue, we can define one of them as ``defer``.\n\n.. code-block:: python\n\n from comber import C, rs, inf, defer\n\n ...\n\n value = defer()\n function_call = keyword + '(' + value**',' + ')'\n value.fill(keyword | number | function_call)\n\nThis way, we can refer to ``value`` wherever we want, and only define its meaning when we're ready. We can safely build\nfairly complex grammars this way, but be wary of performance.\n\n-----------------\nControlling Space\n-----------------\n\nBefore each subparser is run, Comber consumes leading whitespace. By default, this is spaces, tabs, and newlines. You\ncan change this in two ways: set the ``whitespace`` property on the outermost parser to a string containing the new\nwhitespace characters, or likewise passing a string as the second argument to the parser. In either case, the value\n``None`` will disable the feature altogether.\n\n====================\nBuilding Parse Trees\n====================\n\nWhen you call a parser on a some text, they return a ``State`` object containing the resultant parse tree (``State.tree``).\n\nBy default, parsers output a \"flat\" tree - a list of strings parsed by the \"leaf\" parsers (i.e. string literals,\n``rs``, and ``cs``).\n\nTo build a more useful parse tree, you have to provide *emitters*. Our toy grammar contains a simple one that converts\nintegers in the input to Python ``int`` values:\n\n.. code-block:: Python\n\n number = rs(r'[0-9]+')@('number', int)\n\nSo if we ran our parser on the input \"let foo = 5\" the resulting state's ``tree`` property would be ``[\"let\", \"foo\", \"=\",\n5]``. But it'd be more useful if it resulted in some kind of \"let\" object (that could execute the assignment, or be fed\nto a VM, or whatever else). We could define one we can use as a emitter for our let statement like this:\n\n.. code-block:: Python\n\n class Let:\n def __init__(self, let, variable, eq, value):\n self.variable = variable\n self.value = value\n # We can ignore `let` and `eq` (which will always contain \"let\" and \"=\", respectively).\n\n def __repr__(self):\n return f'Let({self.variable}, {self.value})'\n\nand redefine the assignment rule like:\n\n.. code-block:: Python\n\n assignment = (C+ 'let' + keyword + '=' + (function_call | value))@Let\n\nNow if rerun the parser on \"let foo = 5\", we get ``[Let(foo, 5)]``.\n\nAs we did with number, you can also combine an emitter with a name, to improve error messages:\n\n.. code-block:: Python\n\n assignment = (C+ 'let' + keyword + '=' + (function_call | value))@(\"assignment\", Let)\n\n===========\nPerformance\n===========\n\nUnder the covers, Comber is essentially a recursive descent parser. It's best suited for relatively shallow grammars\nparsing small amounts of text.\n\nYou can gain a bit more speed by calling ``analyze()`` on the top-level/outermost parser. This modifies the parser (and\nsubparsers) in place:\n\n.. code-block:: Python\n\n grammar.analyze()\n grammar('max(1, 4)')\n\n\n====\nTODO\n====\n\n-------------------\nAvailable Operators\n-------------------\n\nOperators that Python allows to overridden\n\n\n======== ============== ===========\nOperator Method Current use\n======== ============== ===========\n\\+ __add__ sequences\n\\| __or__ selection\n[ ] __getitem__ repeat\n@ __matmul__ names and internalization\n< __lt__\n> __gt__\n<= __le__\n>= __ge__\n== __eq__\n!= __ne__\nis _is\nis not is_not\n\\- __sub__\n% __mod__\n\\* __mul__ zero or more, with provided separator\n\\** __pow__\n/ __truediv__\n// __floordiv__\n& __and__\n^ __xor__\n<< __lshift__\n>> __rshift__\nin __contains__\n======== ============== ===========\n\n\nUnary operators:\n\n======== =========== ===========\nOperator Method Current use\n======== =========== ===========\n~ __invert__ optional\nnot __not__\n\\- __neg__\n\\+ __pos__ zero or more\n======== =========== ===========\n\nAnd::\n\n () __call__ parse a string\n\n\n=========\nCopyright\n=========\n\nComber copyright 2025 Ray Wallace III\n\nThis library is free software; you can redistribute it and/or modify it under the terms of the GNU Lesser General Public License as published by the Free Software Foundation, **version 2.1**.\n \nThis library is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more details.\n \nYou should have received a copy of the GNU Lesser General Public License along with this library; if not, write to the Free Software Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA\n",
"bugtrack_url": null,
"license": null,
"summary": "Parser combinator library with a BNF-esque interface",
"version": "1.0.1",
"project_urls": {
"Homepage": "https://github.com/faboo/comber",
"Repository": "https://github.com/faboo/comber.git"
},
"split_keywords": [
"parser",
" combinator",
" parsing",
" compiler"
],
"urls": [
{
"comment_text": null,
"digests": {
"blake2b_256": "261cc22ef98af967fdffb18b0289c232b6833375fb62a7a2438918e51fd93ad0",
"md5": "f629864b46d6faf8f5825065a92dd136",
"sha256": "fd11c5a373005879382b72d27e8c66153ffd5914b8bbacaa6ab142c971ad3c0b"
},
"downloads": -1,
"filename": "comber-1.0.1-py3-none-any.whl",
"has_sig": false,
"md5_digest": "f629864b46d6faf8f5825065a92dd136",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.10",
"size": 22144,
"upload_time": "2025-09-07T03:41:11",
"upload_time_iso_8601": "2025-09-07T03:41:11.045283Z",
"url": "https://files.pythonhosted.org/packages/26/1c/c22ef98af967fdffb18b0289c232b6833375fb62a7a2438918e51fd93ad0/comber-1.0.1-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": null,
"digests": {
"blake2b_256": "6a8b1f2db267eeea9029ec8a9e793d8ca6850d9516f5776c3ff3da19c9b8d7d1",
"md5": "b8a051e2ae17f3a21334f56df32216e2",
"sha256": "bbd196c355f086ce3009c80d796b4d4d01c19c3abc9b6295074ecae3553ba4ca"
},
"downloads": -1,
"filename": "comber-1.0.1.tar.gz",
"has_sig": false,
"md5_digest": "b8a051e2ae17f3a21334f56df32216e2",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.10",
"size": 28166,
"upload_time": "2025-09-07T03:41:12",
"upload_time_iso_8601": "2025-09-07T03:41:12.953978Z",
"url": "https://files.pythonhosted.org/packages/6a/8b/1f2db267eeea9029ec8a9e793d8ca6850d9516f5776c3ff3da19c9b8d7d1/comber-1.0.1.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2025-09-07 03:41:12",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "faboo",
"github_project": "comber",
"travis_ci": false,
"coveralls": false,
"github_actions": true,
"tox": true,
"lcname": "comber"
}