pegen

Name	pegen
Version	0.3.0
Summary	CPython's PEG parser generator
upload_time	2023-11-14 12:02:21
maintainer
docs_url	None
author	Guido van Rossum
requires_python	<4,>=3.8
license	MIT License Copyright (c) 2021 we-like-parsers Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
keywords	parser cpython peg pegen

            <p align="center">
<img src="https://github.com/we-like-parsers/pegen/raw/main/media/logo.svg" width="70%">
</p>

-----------------------------------

[![Downloads](https://pepy.tech/badge/pegen/month)](https://pepy.tech/project/pegen)
[![PyPI version](https://badge.fury.io/py/pegen.svg)](https://badge.fury.io/py/pegen)
![CI](https://github.com/we-like-parsers/pegen/actions/workflows/test.yml/badge.svg)

# What is this?

Pegen is the parser generator that CPython uses to produce the interpreter's parser. It lets you
produce PEG parsers from a formal grammar description.

# Installing

Install with `pip` or your favorite PyPI package manager.

```
pip install pegen
```

# Documentation

The documentation is available [here](https://we-like-parsers.github.io/pegen/).

# How to generate a parser

Given a grammar file compatible with `pegen` (you can write your own or start with one in the [`data`](data) directory), you
can easily generate a parser by running:

```
python -m pegen <path-to-grammar-file> -o parser.py
```

This generates a file called `parser.py` in the current directory. You can then use it to parse code written in the language
that the grammar describes:

```
python parser.py <file-with-code-to-parse>
```
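
If you would rather drive the generator from a script than from the shell, the same command can be wrapped with `subprocess`. This is only a minimal sketch: the helper names `pegen_command` and `generate_parser` are made up for illustration, and it relies on nothing beyond the command line shown above.

```python
import subprocess
import sys
from pathlib import Path
from typing import List


def pegen_command(grammar: Path, output: Path) -> List[str]:
    """Build the command shown above, using the interpreter that runs this script."""
    return [sys.executable, "-m", "pegen", str(grammar), "-o", str(output)]


def generate_parser(grammar: Path, output: Path) -> None:
    """Run the generator; raises CalledProcessError if the command fails."""
    subprocess.run(pegen_command(grammar, output), check=True)


# Hypothetical file names, only to show the resulting argument list.
command = pegen_command(Path("my_grammar.gram"), Path("parser.py"))
```

Calling `generate_parser(Path("my_grammar.gram"), Path("parser.py"))` would then run the generator, provided `pegen` is installed in the same environment.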

As a demo, generate a Python parser from `data/python.gram` and use it to parse and run `tests/demo.py`:

```
make demo
```


# How to contribute

See the instructions in the [CONTRIBUTING.md](CONTRIBUTING.md) file.

# Differences with CPython's Pegen

This repository exists to distribute a version of the Python PEG parser generator used by CPython that can be installed via PyPI,
with some improvements. Although the official PEG generator included in CPython can generate both Python and C code, this distribution
of the generator can only generate Python code. This is because the C code produced by the generator included
in CPython relies on many implementation details and private headers that are not available for general use.

The official PEG generator for Python 3.9 and later is now included in the CPython repo under
[Tools/peg_generator/](https://github.com/python/cpython/tree/master/Tools/peg_generator). We aim to keep this repo in sync with the
Python generator from that version of `pegen`.

See also [PEP 617](https://www.python.org/dev/peps/pep-0617/).

# Repository structure

* The `src` directory contains the `pegen` source (the package itself).
* The `tests` directory contains the test suite for the `pegen` package.
* The `data` directory contains some example grammars compatible with `pegen`. This
  includes a [pure-Python version of the Python grammar](data/python.gram).
* The `docs` directory contains the documentation for the package.
* The `scripts` directory contains some useful scripts that can be used for visualizing
  grammars, benchmarking and other usages relevant to the development of the generator itself.
* The `stories` directory contains the backing files and examples for
  [Guido's series on PEG parsing](https://medium.com/@gvanrossum_83706/peg-parsing-series-de5d41b2ed60).


# Quick syntax overview

The grammar consists of a sequence of rules of the form:

```
    rule_name: expression
```

Optionally, a type can be included right after the rule name, which
specifies the return type of the Python function corresponding to
the rule:

```
    rule_name[return_type]: expression
```

If the return type is omitted, it defaults to ``Any``.

## Grammar Expressions

### `# comment`

Python-style comments.

### `e1 e2`

Match e1, then match e2.

```
    rule_name: first_rule second_rule
```

### `e1 | e2`

Match e1 or e2.

The first alternative can also appear on the line after the rule name
for formatting purposes. In that case, a `|` must be used before the
first alternative, like so:

```
    rule_name[return_type]:
        | first_alt
        | second_alt
```
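
In a PEG, alternatives are ordered: they are tried left to right from the same input position, and the first one that succeeds is used. The sketch below illustrates sequence and ordered choice with small hand-written combinators over a list of tokens. It is not the code that `pegen` generates, and the names `lit`, `seq` and `choice` are invented for this example.

```python
from typing import Callable, List, Optional, Tuple

Tokens = List[str]
# A parser takes (tokens, position) and returns (value, new_position), or None on failure.
Parser = Callable[[Tokens, int], Optional[Tuple[object, int]]]


def lit(text: str) -> Parser:
    """Match a single token equal to `text`."""
    def parse(tokens: Tokens, pos: int):
        if pos < len(tokens) and tokens[pos] == text:
            return text, pos + 1
        return None
    return parse


def seq(*parsers: Parser) -> Parser:
    """`e1 e2`: match each parser in turn; fail as soon as one of them fails."""
    def parse(tokens: Tokens, pos: int):
        values = []
        for parser in parsers:
            result = parser(tokens, pos)
            if result is None:
                return None
            value, pos = result
            values.append(value)
        return values, pos
    return parse


def choice(*parsers: Parser) -> Parser:
    """`e1 | e2`: try each alternative from the same position; the first success wins."""
    def parse(tokens: Tokens, pos: int):
        for parser in parsers:
            result = parser(tokens, pos)
            if result is not None:
                return result
        return None
    return parse


# rule_name: 'a' 'b' | 'a'
rule = choice(seq(lit("a"), lit("b")), lit("a"))
first = rule(["a", "b"], 0)   # the first alternative matches both tokens
second = rule(["a", "c"], 0)  # the first alternative fails, so the second is tried
```

Because the choice is ordered, writing the shorter alternative first (`'a' | 'a' 'b'`) would make the longer one unreachable for this input.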

### `( e )`

Match e.

```
    rule_name: (e)
```

A slightly more complex and useful example includes using the grouping
operator together with the repeat operators:

```
    rule_name: (e1 e2)*
```

### `[ e ]` or `e?`

Optionally match e.


```
    rule_name: [e]
```

A more useful example includes defining that a trailing comma is
optional:

```
    rule_name: e (',' e)* [',']
```

### `e*`

Match zero or more occurrences of e.

```
    rule_name: (e1 e2)*
```

### `e+`

Match one or more occurrences of e.

```
    rule_name: (e1 e2)+
```

### `s.e+`

Match one or more occurrences of e, separated by s. The generated parse
tree does not include the separator. This is otherwise identical to
``(e (s e)*)``.

```
    rule_name: ','.e+
```
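
To make the "separator is dropped" behaviour concrete, here is a hand-written sketch of what `','.e+` means, with `e` simplified to "any token that is not the separator". The function name `parse_separated` is hypothetical and the code is an illustration of the semantics, not generated parser code.

```python
from typing import List, Optional, Tuple


def parse_separated(
    tokens: List[str], pos: int, sep: str
) -> Optional[Tuple[List[str], int]]:
    """Sketch of `sep.e+`: one or more elements separated by `sep`."""

    def element(p: int) -> Optional[Tuple[str, int]]:
        # Simplified `e`: any single token other than the separator.
        if p < len(tokens) and tokens[p] != sep:
            return tokens[p], p + 1
        return None

    first = element(pos)
    if first is None:
        return None  # at least one element is required
    items = [first[0]]
    pos = first[1]
    # Equivalent of `(s e)*`: keep going while a separator is followed by an element.
    while pos < len(tokens) and tokens[pos] == sep:
        following = element(pos + 1)
        if following is None:
            break  # a dangling separator is left unconsumed
        items.append(following[0])
        pos = following[1]
    return items, pos


matched = parse_separated(["a", ",", "b", ",", "c"], 0, ",")
trailing = parse_separated(["a", ",", "b", ","], 0, ",")
```

Only the elements end up in the result list; the separators are consumed but not returned.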

### `&e`

Succeed if e can be parsed, without consuming any input.

### `!e`

Fail if e can be parsed, without consuming any input.

An example taken from the Python grammar specifies that a primary
consists of an atom, which is not followed by a ``.`` or a ``(`` or a
``[``:

```
    primary: atom !'.' !'(' !'['
```
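
The key property of both lookaheads is that the input position is left unchanged whether they succeed or fail. The following hand-written sketch mirrors the `primary` rule above, with `atom` simplified to a single identifier token. The function names are invented for illustration and do not come from `pegen`.

```python
from typing import List, Optional, Tuple


def positive_lookahead(tokens: List[str], pos: int, expected: str) -> bool:
    """`&e`: succeed if the next token is `expected`; never moves the position."""
    return pos < len(tokens) and tokens[pos] == expected


def negative_lookahead(tokens: List[str], pos: int, expected: str) -> bool:
    """`!e`: succeed if the next token is NOT `expected`; never moves the position."""
    return not positive_lookahead(tokens, pos, expected)


def parse_primary(tokens: List[str], pos: int) -> Optional[Tuple[str, int]]:
    """primary: atom !'.' !'(' !'['   (atom simplified to one identifier)."""
    if pos >= len(tokens) or not tokens[pos].isidentifier():
        return None
    after_atom = pos + 1
    for forbidden in (".", "(", "["):
        if not negative_lookahead(tokens, after_atom, forbidden):
            return None
    # Only the atom was consumed; the lookaheads did not advance the position.
    return tokens[pos], after_atom


plain = parse_primary(["x", "+", "y"], 0)
call = parse_primary(["x", "(", ")"], 0)
```

Here `plain` succeeds and stops right after `x`, while `call` fails because the atom is followed by `(`.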

### `~`

Commit to the current alternative, even if it fails to parse.

```
    rule_name: '(' ~ some_rule ')' | some_alt
```

In this example, if a left parenthesis is parsed, then the other
alternative won't be considered, even if `some_rule` or `')'` fails to be
parsed.
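
The effect of the cut is easiest to see next to the same rule without it. The sketch below hand-writes `'(' ~ some_rule ')' | some_alt`, assuming for illustration that `some_rule` is a single identifier and `some_alt` matches any single token. The `use_cut` flag is an invention of this example so that both behaviours can be compared.

```python
from typing import List, Optional, Tuple


def parse_rule(
    tokens: List[str], pos: int, use_cut: bool = True
) -> Optional[Tuple[str, int]]:
    """rule_name: '(' ~ some_rule ')' | some_alt"""
    committed = False
    # First alternative: '(' ~ some_rule ')'
    if pos < len(tokens) and tokens[pos] == "(":
        committed = use_cut  # the cut is reached: no other alternative may be tried
        p = pos + 1
        if p < len(tokens) and tokens[p].isidentifier():  # simplified some_rule
            name = tokens[p]
            p += 1
            if p < len(tokens) and tokens[p] == ")":
                return name, p + 1
    if committed:
        return None  # the first alternative failed after the cut
    # Second alternative: some_alt (simplified to any single token)
    if pos < len(tokens):
        return tokens[pos], pos + 1
    return None


ok = parse_rule(["(", "x", ")"], 0)
with_cut = parse_rule(["(", "1", ")"], 0)
without_cut = parse_rule(["(", "1", ")"], 0, use_cut=False)
```

With the cut, the malformed input fails outright. Without it, the parser silently falls back to `some_alt`, which is usually not what you want when reporting errors.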

## Left recursion

PEG parsers normally do not support left recursion, but Pegen implements a
technique that allows left recursion using the memoization cache. This allows
us to write not only simple left-recursive rules but also more complicated
rules that involve indirect left recursion, such as:

```
  rule1: rule2 | 'a'
  rule2: rule3 | 'b'
  rule3: rule1 | 'c'
```

and "hidden left-recursion" like:

```
  rule: 'optional'? rule '@' some_other_rule
```
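
One common way to support left recursion with a memoization cache is to seed the cache with a failure and then re-run the rule repeatedly, keeping each longer match. The sketch below applies that idea by hand to the directly left-recursive rule `expr: expr '-' NUMBER | NUMBER`. It is an illustration under that assumption, not a copy of Pegen's implementation, and the class and method names are made up.

```python
from typing import Dict, List, Optional, Tuple

Result = Optional[Tuple[object, int]]  # (tree, end position), or None on failure


class LeftRecursiveParser:
    """Parses  expr: expr '-' NUMBER | NUMBER  over a list of tokens."""

    def __init__(self, tokens: List[str]) -> None:
        self.tokens = tokens
        self.memo: Dict[int, Result] = {}

    def expr(self, pos: int) -> Result:
        if pos in self.memo:
            return self.memo[pos]
        # Seed the cache with a failure so the recursive call bottoms out.
        self.memo[pos] = None
        best: Result = None
        while True:
            result = self._expr_body(pos)
            # Stop as soon as an attempt no longer consumes more input.
            if result is None or (best is not None and result[1] <= best[1]):
                break
            best = result
            self.memo[pos] = best  # grow the seed for the next attempt
        return best

    def _expr_body(self, pos: int) -> Result:
        # Alternative 1: expr '-' NUMBER (the recursive call hits the cache)
        left = self.expr(pos)
        if left is not None:
            tree, p = left
            tokens = self.tokens
            if p + 1 < len(tokens) and tokens[p] == "-" and tokens[p + 1].isdigit():
                return ("-", tree, int(tokens[p + 1])), p + 2
        # Alternative 2: NUMBER
        if pos < len(self.tokens) and self.tokens[pos].isdigit():
            return int(self.tokens[pos]), pos + 1
        return None


tree, end = LeftRecursiveParser("7 - 2 - 1".split()).expr(0)
```

The resulting tree is nested to the left, `("-", ("-", 7, 2), 1)`, which is the grouping you want for a left-associative operator such as subtraction.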

## Variables in the Grammar

A sub-expression can be named by preceding it with an identifier and an
``=`` sign. The name can then be used in the action (see below), like this:

```
    rule_name[return_type]: '(' a=some_other_rule ')' { a }
```

## Grammar actions

To avoid the intermediate steps that obscure the relationship between the
grammar and the AST generation, the PEG parser allows directly generating AST
nodes for a rule via grammar actions. Grammar actions are language-specific
expressions that are evaluated when a grammar rule is successfully parsed. These
expressions can be written in Python. As an example of a grammar with Python actions,
the part of the parser generator that parses grammar files is bootstrapped from a
meta-grammar file whose Python actions build the grammar tree as a result
of the parsing.

In the specific case of the PEG grammar for Python, having actions allows
directly describing how the AST is composed in the grammar itself, making it
clearer and more maintainable. This AST generation process is supported by the use
of some helper functions that factor out common AST object manipulations and
some other required operations that are not directly related to the grammar.

To indicate these actions, each alternative can be followed by the action code
inside curly braces, which specifies the return value of the alternative:

```
    rule_name[return_type]:
        | first_alt1 first_alt2 { first_alt1 }
        | second_alt1 second_alt2 { second_alt1 }
```

If the action is omitted, a default action is generated:

* If there's a single name in the rule, it gets returned.

* If there is more than one name in the rule, a collection with all parsed
  expressions gets returned.

This default behaviour is primarily made for very simple situations and for
debugging purposes.

As an illustrative example, this simple grammar file allows directly
generating a full parser that can parse simple arithmetic expressions and that
returns a valid Python AST:


```
    start[ast.Module]: a=expr_stmt* ENDMARKER { ast.Module(body=a or []) }
    expr_stmt: a=expr NEWLINE { ast.Expr(value=a, EXTRA) }

    expr:
        | l=expr '+' r=term { ast.BinOp(left=l, op=ast.Add(), right=r, EXTRA) }
        | l=expr '-' r=term { ast.BinOp(left=l, op=ast.Sub(), right=r, EXTRA) }
        | term

    term:
        | l=term '*' r=factor { ast.BinOp(left=l, op=ast.Mult(), right=r, EXTRA) }
        | l=term '/' r=factor { ast.BinOp(left=l, op=ast.Div(), right=r, EXTRA) }
        | factor

    factor:
        | '(' e=expr ')' { e }
        | atom

    atom:
        | NAME
        | NUMBER
```
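
To see what "a valid Python AST" buys you, the sketch below builds by hand the tree that the actions above would plausibly return for the input `1 + 2 * 3`, then compiles and evaluates it with the standard `ast` module. Two assumptions are made here that the grammar does not spell out: atoms are represented as `ast.Constant` nodes, and `ast.fix_missing_locations` stands in for the position information that `EXTRA` presumably supplies.

```python
import ast

# term:  l=term '*' r=factor  ->  ast.BinOp(..., op=ast.Mult(), ...)
product = ast.BinOp(
    left=ast.Constant(value=2), op=ast.Mult(), right=ast.Constant(value=3)
)
# expr:  l=expr '+' r=term  ->  ast.BinOp(..., op=ast.Add(), ...)
total = ast.BinOp(left=ast.Constant(value=1), op=ast.Add(), right=product)

# start / expr_stmt: wrap the expression in ast.Expr inside ast.Module.
module = ast.Module(body=[ast.Expr(value=total)], type_ignores=[])
ast.fix_missing_locations(module)  # stand-in for the locations EXTRA would add
code = compile(module, "<sketch>", "exec")  # raises if the tree is not valid

# To get a value back, wrap the same expression in ast.Expression and eval it.
expression = ast.fix_missing_locations(ast.Expression(body=total))
result = eval(compile(expression, "<sketch>", "eval"))
```

Because `term` sits below `expr` in the grammar, the multiplication is nested inside the addition, so `result` is `7` rather than `9`.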

Raw data

            {
    "_id": null,
    "home_page": "",
    "name": "pegen",
    "maintainer": "",
    "docs_url": null,
    "requires_python": "<4,>=3.8",
    "maintainer_email": "\"Matthieu C. Dartiailh\" <m.dartiailh@gmail.com>",
    "keywords": "parser,CPython,PEG,pegen",
    "author": "Guido van Rossum",
    "author_email": "Pablo Galindo <pablogsal@gmail.com>, Lysandros Nikolaou <lisandrosnik@gmail.com>",
    "download_url": "https://files.pythonhosted.org/packages/af/02/e275ed3cc1692dc7b882eb14b6726db8d0810f19da83756b3168034a6843/pegen-0.3.0.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "MIT License  Copyright (c) 2021 we-like-parsers  Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the \"Software\"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:  The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.  THE SOFTWARE IS PROVIDED \"AS IS\", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. ",
    "summary": "CPython's PEG parser generator",
    "version": "0.3.0",
    "project_urls": {
        "bug_reports": "https://github.com/we-like-parsers/pegen/issues",
        "changelog": "https://github.com/we-like-parsers/pegen/releasenotes.rst",
        "documentation": "https://we-like-parsers.github.io/pegen/",
        "homepage": "https://github.com/we-like-parsers/pegen",
        "source": "https://github.com/we-like-parsers/pegen"
    },
    "split_keywords": [
        "parser",
        "cpython",
        "peg",
        "pegen"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "2b20fcf4e7b9e36e1b45e04632d771b1f36fce7c8ce37e151b1bac0916c6b469",
                "md5": "f25e63dc6d495ca394337ca5396f57d9",
                "sha256": "69253c196ea425828a6ca01cec1aadca415cd5c789b4d00d653962234497071a"
            },
            "downloads": -1,
            "filename": "pegen-0.3.0-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "f25e63dc6d495ca394337ca5396f57d9",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": "<4,>=3.8",
            "size": 35345,
            "upload_time": "2023-11-14T12:02:19",
            "upload_time_iso_8601": "2023-11-14T12:02:19.450236Z",
            "url": "https://files.pythonhosted.org/packages/2b/20/fcf4e7b9e36e1b45e04632d771b1f36fce7c8ce37e151b1bac0916c6b469/pegen-0.3.0-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "af02e275ed3cc1692dc7b882eb14b6726db8d0810f19da83756b3168034a6843",
                "md5": "34157c01a76d3c5f2987635cd5fa73fb",
                "sha256": "8cb30cee508a95c573aa256ed2cfa80ef8f561b90264c70e7de5afc50b4ac87d"
            },
            "downloads": -1,
            "filename": "pegen-0.3.0.tar.gz",
            "has_sig": false,
            "md5_digest": "34157c01a76d3c5f2987635cd5fa73fb",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": "<4,>=3.8",
            "size": 3803936,
            "upload_time": "2023-11-14T12:02:21",
            "upload_time_iso_8601": "2023-11-14T12:02:21.081751Z",
            "url": "https://files.pythonhosted.org/packages/af/02/e275ed3cc1692dc7b882eb14b6726db8d0810f19da83756b3168034a6843/pegen-0.3.0.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2023-11-14 12:02:21",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "we-like-parsers",
    "github_project": "pegen",
    "travis_ci": false,
    "coveralls": true,
    "github_actions": true,
    "tox": true,
    "lcname": "pegen"
}
