# More Parsing!
[![PyPI Latest Release](https://img.shields.io/pypi/v/mo-parsing.svg)](https://pypi.org/project/mo-parsing/)
[![Build Status](https://github.com/klahnakoski/mo-parsing/actions/workflows/build.yml/badge.svg?branch=master)](https://github.com/klahnakoski/mo-parsing/actions/workflows/build.yml)
[![Coverage Status](https://coveralls.io/repos/github/klahnakoski/mo-parsing/badge.svg?branch=master)](https://coveralls.io/github/klahnakoski/mo-parsing?branch=master)
A fork of [pyparsing](https://github.com/pyparsing/pyparsing) for faster parsing.
## Installation
This is a PyPI package:

    pip install mo-parsing

## Usage
This module allows you to define a PEG parser using predefined patterns and Python operators. Here is an example:
```
>>> from mo_parsing import Word
>>> from mo_parsing.utils import alphas
>>>
>>> greet = Word(alphas)("greeting") + "," + Word(alphas)("person") + "!"
>>> result = greet.parse_string("Hello, World!")
```
The `result` can be accessed as a nested list:
```
>>> list(result)
['Hello', ',', 'World', '!']
```
The `result` can also be accessed as a dictionary:
```
>>> dict(result)
{'greeting': 'Hello', 'person': 'World'}
```
Read the [pyparsing documentation](https://github.com/pyparsing/pyparsing/#readme) for more details.
### The `Whitespace` Context
`mo_parsing.whitespaces.CURRENT` is used during parser creation: it effectively defines what "whitespace" to skip during parsing, with additional features to simplify the language definition. You declare "standard" `Whitespace` like so:

    with Whitespace() as whitespace:
        # PUT YOUR LANGUAGE DEFINITION HERE (space, tab and CR are "whitespace")

If you are declaring a large language, want to minimize indentation, and are careful to call `release()` when you are done, you may also use this pattern:

    whitespace = Whitespace().use()
    # PUT YOUR LANGUAGE DEFINITION HERE
    whitespace.release()

The `whitespace` object can be used to set global parsing parameters, such as:
* `set_whitespace()` - set the ignored characters (default: `"\t\n "`)
* `add_ignore()` - include whole patterns that are ignored (like comments)
* `set_literal()` - Set the definition for what `Literal()` means
* `set_keyword_chars()` - For default `Keyword()` (important for defining word boundary)
### Navigating ParseResults
The results of parsing are returned as `ParseResults`, which form an n-ary tree with the children found in `ParseResults.tokens`. Each `ParseResults.type` points to the `ParserElement` that made it. In general, if you want to get fancy with post-processing (or in a `parse_action`), you will need to navigate the raw `tokens` to generate a final result.

There are some convenience methods:
* `__iter__()` - allows you to iterate through parse results in **depth first search**. Empty results are skipped, and `Group`ed results are treated as atoms (which can be further iterated if required)
* `name` is a convenient property for `ParseResults.type.token_name`
* `__getitem__()` - allows you to jump into the parse tree to the given `name`. This is blocked by any names found inside `Group`ed results (because groups are considered atoms).
### Parse Actions
Parse actions are functions that run after a `ParserElement` finds a match.

* Parameters must be accepted in `(tokens, index, string)` order (the opposite of pyparsing)
* Parse actions are wrapped to ensure the output is a legitimate `ParseResults`
  * If your parse action returns `None`, then the result is the original `tokens`
  * If your parse action returns an object, list, or tuple, then it will be packaged in a `ParseResults` with the same type as `tokens`
  * If your parse action returns a `ParseResults`, then it is accepted ***even if it belongs to some other pattern***

#### Simple example:
```
from mo_parsing import Word

# convert the matched digits to an int
integer = Word("0123456789").add_parse_action(lambda t, i, s: int(t[0]))
result = integer.parse_string("42")
assert result[0] == 42
```
For a slightly shorter specification, you may use the `/` operator and declare only the parameters you need:
```
integer = Word("0123456789") / (lambda t: int(t[0]))
result = integer.parse_string("42")
assert (result[0] == 42)
```
### Debugging
The PEG style of mo-parsing (inherited from pyparsing) makes for a very expressive and readable specification, but debugging a parser is still hard. To look deeper into what the parser is doing, use the `Debugger`:
```
with Debugger():
    expr.parse_string("my new language")
```
The debugger will print out details of what's happening:

* each attempt, and whether it matched or failed
* a small number of bytes to show you the current position
* location, line and column for more info about the current position
* whitespace indicating stack depth
* a printout of the `ParserElement` performing the attempt

This should help to isolate the exact position where your grammar is failing.
### Regular Expressions
`mo-parsing` can parse and generate regular expressions. `ParserElement` has a `__regex__()` function that returns the regular expression for the given grammar; this works up to a limit, and is used internally to accelerate parsing. The `Regex` class parses regular expressions into a grammar; it is used to optimize parsing, and you may find it useful for decomposing regular expressions that look like line noise.
## Differences from PyParsing
This fork was originally created to support faster parsing for [mo-sql-parsing](https://github.com/klahnakoski/moz-sql-parser). Since then it has deviated sufficiently to be its own collection of parser specification functions. Here are the differences:
* Added `Whitespace`, which controls parsing context and whitespace. It replaces the whitespace modifying methods of pyparsing
* the wildcard ("`*`") could be used in pyparsing to indicate multi-values are expected; this is not allowed in `mo-parsing`: all values are multi-values
* ParserElements are static: for example, `expr.add_parse_action(action)` creates a new ParserElement, so it must be assigned to a variable or it is lost. **This is the biggest source of bugs when converting from pyparsing**
* removed all backward-compatibility settings
* no support for binary serialization (no pickle)

Faster Parsing

* faster infix operator parsing (main reason for this fork)
* ParseResults point to ParserElement for reduced size
* regex used to reduce the number of failed parse attempts
* packrat parser is not needed
* less stack used

## Contributing
If you plan to extend or enhance this code, please [see the README in the tests directory](https://github.com/klahnakoski/mo-parsing/blob/dev/tests/README.md).