# ReBNF
<div>
<a href="#"><img src="https://img.shields.io/badge/%F0%9F%94%96%20Version-0.9-ec3832.svg?color=ec3832&style=flat"/></a>
<a href="https://opsocket.com" style="text-decoration: none;">
<img alt="opsocket" height="42" src="https://gitlab.com/opsocket/rebnf/-/raw/main/docs/assets/imgs/logo.svg" loading="lazy" />
</a>
</div>
**ReBNF** (*Regexes for Extended Backus-Naur Form*) is a notation used to define the
syntax of a language using regular expressions.
It is an extension of the EBNF (Extended Backus-Naur Form) notation, allowing
for more flexibility and ease of use.
```
ooooooooo. oooooooooo. ooooo ooo oooooooooooo
`888 `Y88. `888' `Y8b `888b. `8' `888' `8
888 .d88' .ooooo. 888 888 8 `88b. 8 888
888ooo88P' d88' `88b 888oooo888' 8 `88b. 8 888oooo8
888`88b. 888ooo888 888 `88b 8 `88b.8 888 "
888 `88b. 888 .o 888 .88P 8 `888 888
o888o o888o `Y8bod8P' o888bood8P' o8o `8 o888o
```
## Table of Contents
- [Syntax](#syntax)
- [Example](#example)
- [Usage](#usage)
- [Contributing](#contributing)
- [License](#license)
## Syntax
The **ReBNF** notation uses regular expressions to define the structure of a
language. Each *rule* consists of a *left-hand side* (non-terminal) and a
*right-hand side* separated by an **assignment operator** (either `::=`, `:=` or `=`).
The general syntax of a **ReBNF** rule is as follows:
```
<alnum> ::= r"[a-zA-Z0-9]" ; # any alphanumeric character
```
The alphanumeric set is composed of all uppercase and lowercase letters and all
digits, which adds up to 62 characters.
The **EBNF** syntax requires *quotes* and `|` operators between characters to
define the `alnum` identifier as matching any alphanumeric character, which adds up to 247 characters.
Using **ReBNF**, a single regex is required such as `r"[a-zA-Z0-9]"`, which sums up to 14 characters.
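The character counts above can be reproduced with a short Python script. It assumes the EBNF version lists each character as a double-quoted terminal joined by `|` with no surrounding spaces. Under that assumption, a case-insensitive 36-character set gives 143 characters, while the 62 characters actually matched by `[a-zA-Z0-9]` give 247.

```python
import re
import string

# The ReBNF terminal, written exactly as in the rule above.
REGEX_TERMINAL = 'r"[a-zA-Z0-9]"'


def ebnf_alternation(chars):
    # Assumption: each character becomes a quoted terminal, joined by '|' with no spaces.
    return "|".join(f'"{c}"' for c in chars)


# Printable ASCII characters matched by the regex body [a-zA-Z0-9].
matched = [c for c in string.printable if re.fullmatch(r"[a-zA-Z0-9]", c)]

case_insensitive = string.ascii_lowercase + string.digits  # 36 characters
full_set = string.ascii_letters + string.digits  # 62 characters

print(len(matched))  # 62
print(len(ebnf_alternation(case_insensitive)))  # 143
print(len(ebnf_alternation(full_set)))  # 247
print(len(REGEX_TERMINAL))  # 14
```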
### Identifiers
The angle brackets `<` and `>` enclosing an identifier are optional, so the same rule can also be written as:
```
alnum = r"[a-zA-Z0-9]" # shorter definition
```
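Both rule forms shown so far can be split into a name, an assignment operator and a right-hand side with a single regular expression. The sketch below is hypothetical and is not the `rebnf` package's parser; `RULE_RE` and `split_rule` are illustrative names, and the handling of the optional `;` and trailing `#` comment is an assumption based on the examples.

```python
import re

# Hypothetical rule pattern (illustration only, not the rebnf package's parser).
# Limitation: a '#' inside a quoted terminal would be mistaken for a comment.
RULE_RE = re.compile(
    r"""
    \s*<?(?P<name>[a-z0-9_]+)>?   # identifier, optionally wrapped in <...>
    \s*(?P<op>::=|:=|=)\s*        # one of the three assignment operators
    (?P<rhs>.*?)                  # right-hand side, matched lazily
    \s*;?\s*(?:\#.*)?             # optional ';' and trailing '#' comment
    """,
    re.VERBOSE,
)


def split_rule(line):
    """Return (name, operator, right-hand side) for a single-line rule."""
    match = RULE_RE.fullmatch(line)
    if match is None:
        raise SyntaxError(f"not a rule: {line!r}")
    return match["name"], match["op"], match["rhs"]


print(split_rule('<alnum> ::= r"[a-zA-Z0-9]" ; # any alphanumeric character'))
print(split_rule('alnum = r"[a-zA-Z0-9]" # shorter definition'))
```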
To improve readability and consistency, *identifiers* cannot contain spaces;
the **snake_case** naming convention is used instead.
Snake case identifiers consist of **lowercase letters**, **digits**,
and **underscores**.
The naming convention also dictates that each word within an identifier is
separated by an underscore.
This convention makes a clear distinction between individual words and
ensures that identifiers are easily recognizable.
For example, an identifier `non-terminal symbol` would have to be written as `non_terminal_symbol`.
By adhering to the snake case convention, ReBNF identifiers maintain a
standardized and consistent style throughout the notation, enabling easier
comprehension and usage.
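The convention can be checked and applied mechanically. This is a minimal sketch with hypothetical helpers `is_snake_case` and `to_snake_case`; it assumes words are non-empty runs of lowercase letters and digits joined by single underscores, which is stricter than the description above (leading, trailing or doubled underscores are rejected).

```python
import re

# Assumption: words are runs of [a-z0-9] joined by single underscores.
SNAKE_CASE = re.compile(r"[a-z0-9]+(?:_[a-z0-9]+)*")


def is_snake_case(name: str) -> bool:
    return SNAKE_CASE.fullmatch(name) is not None


def to_snake_case(name: str) -> str:
    # Lowercase, then collapse every run of other characters into one underscore.
    # Note: CamelCase words are not split (e.g. "NonTerminal" -> "nonterminal").
    return re.sub(r"[^a-z0-9]+", "_", name.lower()).strip("_")


print(to_snake_case("non-terminal symbol"))  # non_terminal_symbol
print(is_snake_case("non_terminal_symbol"))  # True
print(is_snake_case("non-terminal symbol"))  # False
```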
### Modularity
In **ReBNF**, `import` statements are used to bring in *grammar rules* defined
in separate specification files. This enables the reuse of existing rules and
promotes modular design in grammar specifications.
As a result, we can organize grammar rules into separate `.rebnf`
specification files, making it easier to manage and maintain complex
grammars. This allows for better code organization, reuse of common rules,
and separation of concerns.
To import rules from another specification file, we can use the `import`
statement followed by the dotted path to a specification file or the `from`
statement to import only specific items. This enables us to selectively use
and reference rules defined in other files.
Given a folder hierarchy such as:
```
grammar/
├── common.rebnf
└── spec.rebnf
```
For example, `spec.rebnf` can import every rule defined in `common.rebnf` with:
```
from common import *
```
Using modularity in **ReBNF** files can lead to more maintainable and scalable
grammar specifications.
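How a tool might resolve these statements is sketched below in Python. Everything here is an assumption for illustration: the Python-like `import a.b` and `from a.b import x, y` forms, the mapping of a dotted path `a.b.c` to `<root>/a/b/c.rebnf`, and the names `parse_import` and `resolve_spec`.

```python
import re
from pathlib import Path

# Assumed forms: "import a.b" and "from a.b import x, y" (or "import *").
IMPORT_RE = re.compile(
    r"\s*(?:from\s+(?P<source>[a-z0-9_.]+)\s+import\s+"
    r"(?P<names>\*|[a-z0-9_]+(?:\s*,\s*[a-z0-9_]+)*)"
    r"|import\s+(?P<module>[a-z0-9_.]+))\s*"
)


def parse_import(line):
    """Return (dotted path, imported names or None for a whole-file import)."""
    match = IMPORT_RE.fullmatch(line)
    if match is None:
        raise SyntaxError(f"not an import statement: {line!r}")
    if match["module"]:
        return match["module"], None
    names = [name.strip() for name in match["names"].split(",")]
    return match["source"], names


def resolve_spec(dotted, root):
    # Hypothetical mapping: "a.b.c" -> <root>/a/b/c.rebnf
    *packages, module = dotted.split(".")
    return root.joinpath(*packages, f"{module}.rebnf")


source, names = parse_import("from common import *")
print(source, names)  # common ['*']
print(resolve_spec(source, Path("grammar")).as_posix())  # grammar/common.rebnf
```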
### Optional groups
In **ReBNF**, square brackets `[ ]` define optional groups rather than bounded
repetition. In **EBNF**, `3 * [aa]` denotes up to three occurrences of `aa`
(none, A, AA or AAA), whereas in **ReBNF**, `[aa]` denotes an optional group
that can occur *zero or one* times.
In **EBNF**:
```
aa = "A";
bb = 3 * aa, "B";
cc = 3 * [aa], "C";
```
Which means:
- `aa`: A
- `bb`: AAAB
- `cc`: C, AC, AAC, AAAC
In **ReBNF**:
```
aa = "A";
bb = 3 * aa "B";
cc = 3 * [aa] "C";
```
Which means:
- `aa`: A
- `bb`: AAAB
- `cc`: AAAC
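To make the difference concrete, the rules can be hand-translated into Python regular expressions. The translation is an assumption for illustration: EBNF `n * x` becomes `(?:x){n}` and `n * [x]` becomes `(?:x){0,n}`, while the ReBNF patterns simply encode the results listed above rather than being derived from the notation.

```python
import re

# EBNF readings (ISO-style bounded repetition of an optional group).
EBNF = {
    "bb": re.compile(r"(?:A){3}B"),
    "cc": re.compile(r"(?:A){0,3}C"),
}
# ReBNF readings, taken directly from the list above.
REBNF = {
    "bb": re.compile(r"(?:A){3}B"),
    "cc": re.compile(r"(?:A){3}C"),
}


def accepted(rules, name, candidates):
    """Return the candidate strings that the named rule matches in full."""
    return [s for s in candidates if rules[name].fullmatch(s)]


candidates = ["C", "AC", "AAC", "AAAC", "AAAAC"]
print(accepted(EBNF, "cc", candidates))  # ['C', 'AC', 'AAC', 'AAAC']
print(accepted(REBNF, "cc", candidates))  # ['AAAC']
```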
### Concatenation
**ReBNF** also introduces a change in concatenation.
In **EBNF**, explicit concatenation is required using a comma `,` between two
identifiers.
However, in **ReBNF**, identifiers cannot contain spaces (snake_case is
enforced), so concatenation is implicit: adjacent terminals or identifiers are
concatenated.
This is why the **ReBNF** rules above drop the commas used in their **EBNF**
counterparts, e.g. `3 * aa "B"` instead of `3 * aa, "B"`.
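Because identifiers never contain spaces, whitespace alone is enough to separate the operands of a sequence. The hypothetical tokenizer below (not the `rebnf` package's lexer) illustrates this; its token classes are assumptions, and quoted literals are matched without escape handling.

```python
import re

# Hypothetical token pattern, tried left to right at each position:
#   r"..." / "..." / '...' literals (no escape handling),
#   snake_case identifiers, or any other single non-space character.
RHS_TOKEN_RE = re.compile(r"""r?"[^"]*"|'[^']*'|[a-z0-9_]+|\S""")


def tokenize(rhs):
    # findall skips the whitespace between tokens, since no alternative matches it.
    return RHS_TOKEN_RE.findall(rhs)


# Three adjacent operands: an implicit concatenation.
print(tokenize('non_terminal_symbol r"[a-z]+" "C"'))
# If spaces were allowed in identifiers, this would be ambiguous:
print(tokenize("non terminal symbol"))  # ['non', 'terminal', 'symbol']
```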
## Example
Here's a short example of a **ReBNF** definition for a simple arithmetic
expression language:
```
expression = term { ('+' | '-') term }
term = factor { ('*' | '/') factor }
factor = number | '(' expression ')'
number = r'\d+'
```
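A grammar in this shape maps directly onto a recursive-descent evaluator, with one method per rule and each `{ ... }` repetition becoming a loop. The Python sketch below is an illustration only, not part of the `rebnf` package. It reads `factor` as `number | '(' expression ')'`, since a bare `expression` alternative would recurse without consuming input, and it reuses the `number` rule's `\d+` as a token pattern.

```python
import operator
import re

# One token: an integer (the number rule's \d+), an operator or a parenthesis.
TOKEN_RE = re.compile(r"\s*(\d+|[-+*/()])")
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul, "/": operator.truediv}


def tokenize(text):
    tokens, pos, text = [], 0, text.rstrip()
    while pos < len(text):
        match = TOKEN_RE.match(text, pos)
        if match is None:
            raise SyntaxError(f"unexpected character at offset {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


class Parser:
    def __init__(self, text):
        self.tokens = tokenize(text)
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def take(self):
        token = self.peek()
        if token is None:
            raise SyntaxError("unexpected end of input")
        self.pos += 1
        return token

    # expression = term { ('+' | '-') term }
    def expression(self):
        value = self.term()
        while self.peek() in ("+", "-"):
            op = OPS[self.take()]
            value = op(value, self.term())
        return value

    # term = factor { ('*' | '/') factor }
    def term(self):
        value = self.factor()
        while self.peek() in ("*", "/"):
            op = OPS[self.take()]
            value = op(value, self.factor())
        return value

    # factor = number | '(' expression ')'
    def factor(self):
        token = self.take()
        if token == "(":
            value = self.expression()
            if self.take() != ")":
                raise SyntaxError("expected ')'")
            return value
        if token.isdigit():
            return int(token)
        raise SyntaxError(f"unexpected token {token!r}")


def evaluate(text):
    parser = Parser(text)
    value = parser.expression()
    if parser.peek() is not None:
        raise SyntaxError(f"unexpected token {parser.peek()!r}")
    return value


print(evaluate("2 + 3 * (4 - 1)"))  # 11
```

The loops make `+`, `-`, `*` and `/` left-associative, so `10 - 2 - 3` evaluates as `(10 - 2) - 3`.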
## Usage
**ReBNF** notation is used to define the syntax of programming languages,
configuration file formats, or any other formal language.
It provides a concise and powerful way to express language structures with the
addition of regular expressions.
> Note that the functions in this module are only designed to parse
> syntactically valid **ReBNF** code (code that does not raise an exception
> when parsed with `parse()`). Their behavior is undefined when given invalid
> **ReBNF** code and may change at any time.
## Contributing
Contributions are welcome! If you have suggestions, improvements, or new ideas
related to the **ReBNF** notation, please feel free to open an issue or
submit a pull request.
## License
This project is licensed under the [GPLv3][#gplv3] license - see [LICENSE.md][#license] for details.
[#gplv3]: https://www.gnu.org/licenses/gpl-3.0.html
[#license]: https://gitlab.com/opsocket/rebnf/-/blob/main/LICENSE.md
Raw data
{
"_id": null,
"home_page": "",
"name": "rebnf",
"maintainer": "",
"docs_url": null,
"requires_python": "",
"maintainer_email": "",
"keywords": "bnf,ebnf,rebnf,regex,regexes,backus-naur form,extended backus-naur form,syntax,metasyntax,lexer,parser,context-free grammar,formal language,grammar,language,text processing",
"author": "",
"author_email": "opsocket <opsocket@pm.me>",
"download_url": "https://files.pythonhosted.org/packages/e8/6d/0bd5d68761ae00a51cec827c90e36f4280a6b22f41c530eadc7b368a1b2a/rebnf-0.9.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": "",
"summary": "ReBNF: Regexes for Extended Backus-Naur Form (EBNF)",
"version": "0.9",
"project_urls": {
"Repository": "https://gitlab.com/opsocket/rebnf.git"
},
"split_keywords": [
"bnf",
"ebnf",
"rebnf",
"regex",
"regexes",
"backus-naur form",
"extended backus-naur form",
"syntax",
"metasyntax",
"lexer",
"parser",
"context-free grammar",
"formal language",
"grammar",
"language",
"text processing"
],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "1cfb60b79ca51e83934025c8f620889de15488bdc3269e50f26ae179c8bf6b9d",
"md5": "88f70c7d3e0ba9438921093868c93365",
"sha256": "b554ec454a5d26ac8728b8b1b24d5a1d0c455a26d2ee421c480716eb74f4be4e"
},
"downloads": -1,
"filename": "rebnf-0.9-py2.py3-none-any.whl",
"has_sig": false,
"md5_digest": "88f70c7d3e0ba9438921093868c93365",
"packagetype": "bdist_wheel",
"python_version": "py2.py3",
"requires_python": null,
"size": 25964,
"upload_time": "2023-06-19T14:03:49",
"upload_time_iso_8601": "2023-06-19T14:03:49.263621Z",
"url": "https://files.pythonhosted.org/packages/1c/fb/60b79ca51e83934025c8f620889de15488bdc3269e50f26ae179c8bf6b9d/rebnf-0.9-py2.py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "e86d0bd5d68761ae00a51cec827c90e36f4280a6b22f41c530eadc7b368a1b2a",
"md5": "63bc5793d704d47aa0579953cbc93fa5",
"sha256": "40a60e13aa777ddb38b0a49588d8d99fc5a507e91f4b9dd6efe6e2abee28900b"
},
"downloads": -1,
"filename": "rebnf-0.9.tar.gz",
"has_sig": false,
"md5_digest": "63bc5793d704d47aa0579953cbc93fa5",
"packagetype": "sdist",
"python_version": "source",
"requires_python": null,
"size": 24888,
"upload_time": "2023-06-19T14:03:50",
"upload_time_iso_8601": "2023-06-19T14:03:50.759459Z",
"url": "https://files.pythonhosted.org/packages/e8/6d/0bd5d68761ae00a51cec827c90e36f4280a6b22f41c530eadc7b368a1b2a/rebnf-0.9.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2023-06-19 14:03:50",
"github": false,
"gitlab": true,
"bitbucket": false,
"codeberg": false,
"gitlab_user": "opsocket",
"gitlab_project": "rebnf",
"lcname": "rebnf"
}