mo-parsing


* Name: mo-parsing
* Version: 8.654.24251
* Summary: Another PEG Parsing Tool
* Home page: https://github.com/klahnakoski/mo-parsing
* Author: Various (kyle@lahnakoski.com)
* License: MIT
* Uploaded: 2024-09-07 12:28:59

# More Parsing!

[![PyPI Latest Release](https://img.shields.io/pypi/v/mo-parsing.svg)](https://pypi.org/project/mo-parsing/)
 [![Build Status](https://github.com/klahnakoski/mo-parsing/actions/workflows/build.yml/badge.svg?branch=master)](https://github.com/klahnakoski/mo-parsing/actions/workflows/build.yml)
[![Coverage Status](https://coveralls.io/repos/github/klahnakoski/mo-parsing/badge.svg?branch=master)](https://coveralls.io/github/klahnakoski/mo-parsing?branch=master)

A fork of [pyparsing](https://github.com/pyparsing/pyparsing), focused on faster parsing.


## Installation

This package is available on PyPI:

    pip install mo-parsing
    
## Usage

This module allows you to define a PEG parser using predefined patterns and Python operators. Here is an example:

```
>>> from mo_parsing import Word
>>> from mo_parsing.utils import alphas
>>>
>>> greet = Word(alphas)("greeting") + "," + Word(alphas)("person") + "!"
>>> result = greet.parse_string("Hello, World!")
```

The `result` can be accessed as a nested list

```
>>> list(result)
['Hello', ',', 'World', '!']
```

The `result` can also be accessed as a dictionary

```
>>> dict(result)
{'greeting': 'Hello', 'person': 'World'}
```

Read the [pyparsing documentation](https://github.com/pyparsing/pyparsing/#readme) for more details.

### The `Whitespace` Context

The `mo_parsing.whitespaces.CURRENT` context is used during parser creation: it defines what "whitespace" to skip during parsing, and provides additional features that simplify the language definition. You declare "standard" `Whitespace` like so:

    from mo_parsing.whitespaces import Whitespace

    with Whitespace() as whitespace:
        # PUT YOUR LANGUAGE DEFINITION HERE (space, tab and newline are "whitespace")
        ...

If you are declaring a large language and want to minimize indentation, you may use the following pattern instead; be careful to call `release()` when you are done:

    whitespace = Whitespace().use()
    # PUT YOUR LANGUAGE DEFINITION HERE
    whitespace.release()
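Why the `use()`/`release()` pattern needs care can be illustrated with a toy context stack. This is a minimal sketch, assuming only that the active context is held in a global `CURRENT` while the grammar is defined (as the name `mo_parsing.whitespaces.CURRENT` suggests); `ToyWhitespace` is hypothetical and is not mo-parsing's implementation. The `with` form restores the previous context automatically, even when an exception is raised, while the manual form relies on you calling `release()`.

```python
class ToyWhitespace:
    """Hypothetical stand-in for Whitespace; illustrates context handling only."""

    CURRENT = None  # the context active while a grammar is being defined

    def __init__(self, white="\t\n "):
        self.white = white
        self.previous = None

    def use(self):
        # remember the outer context so release() can restore it
        self.previous = ToyWhitespace.CURRENT
        ToyWhitespace.CURRENT = self
        return self

    def release(self):
        ToyWhitespace.CURRENT = self.previous

    def __enter__(self):
        return self.use()

    def __exit__(self, exc_type, exc, tb):
        self.release()
        return False  # do not swallow exceptions


outer = ToyWhitespace().use()
try:
    with ToyWhitespace(" ") as inner:
        assert ToyWhitespace.CURRENT is inner
    assert ToyWhitespace.CURRENT is outer  # restored when the with-block exits
finally:
    outer.release()  # the manual form: release even if the definition fails
```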

The `whitespace` object can be used to set global parsing parameters, such as:

* `set_whitespace()` - set the ignored characters (default: `"\t\n "`)
* `add_ignore()` - include whole patterns that are ignored (like comments)
* `set_literal()` - Set the definition for what `Literal()` means
* `set_keyword_chars()` - set the characters used by the default `Keyword()` (important for defining word boundaries)
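Conceptually, the whitespace characters and the ignored patterns are skipped before each match is attempted. The sketch below illustrates that idea with a hypothetical `skip_ignored()` helper built on Python's `re` module; it is an assumption about the behavior, not mo-parsing's code, and the SQL-style `--` comment pattern is only an example of something you might pass to `add_ignore()`.

```python
import re


def skip_ignored(text, loc, whitespace="\t\n ", ignore=()):
    """Advance `loc` past whitespace characters and ignored patterns (hypothetical helper)."""
    while True:
        start = loc
        # skip individual whitespace characters
        while loc < len(text) and text[loc] in whitespace:
            loc += 1
        # skip whole ignored patterns, such as comments
        for pattern in ignore:
            match = pattern.match(text, loc)
            if match and match.end() > loc:
                loc = match.end()
        # stop once a full pass makes no progress
        if loc == start:
            return loc


comment = re.compile(r"--[^\n]*")  # example: SQL-style line comments
text = "  -- a comment\n  SELECT"
print(text[skip_ignored(text, 0, ignore=[comment]):])  # SELECT
```

Looping until a full pass makes no progress handles whitespace and comments that alternate in any order.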


### Navigating ParseResults

The results of parsing are `ParseResults` objects, which form an n-ary tree whose children are found in `ParseResults.tokens`. Each `ParseResults.type` points to the `ParserElement` that made it. In general, if you want to do non-trivial post-processing (or work inside a `parse_action`), you will need to navigate the raw `tokens` to generate a final result.

There are some convenience methods:

* `__iter__()` - iterate through the parse results in **depth-first** order. Empty results are skipped, and `Group`ed results are treated as atoms (which can be iterated further if required)
* `name` - a convenience property for `ParseResults.type.token_name`
* `__getitem__()` - jump into the parse tree to the given `name`. The search does not enter `Group`ed results (because groups are considered atoms), so names inside a group are hidden.
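The iteration and lookup rules above can be modeled with a small tree. `Node` is a hypothetical, simplified stand-in for `ParseResults` (its `name` plays the role of `ParseResults.type.token_name`); it shows depth-first iteration that skips empty results and stops at groups, and a name lookup that does not look inside groups. The real class carries more state than this.

```python
class Node:
    """Hypothetical, simplified stand-in for ParseResults."""

    def __init__(self, name, tokens, group=False):
        self.name = name      # plays the role of type.token_name
        self.tokens = tokens  # children: plain values or Node objects
        self.group = group    # Grouped results are treated as atoms

    def __iter__(self):
        # depth-first traversal that yields leaves and whole groups
        for t in self.tokens:
            if not isinstance(t, Node):
                yield t
            elif not t.tokens:
                continue      # empty results are skipped
            elif t.group:
                yield t       # a group is an atom; iterate it separately if needed
            else:
                yield from t

    def __getitem__(self, name):
        # find the first descendant with this name, without entering groups
        for t in self.tokens:
            if isinstance(t, Node):
                if t.name == name:
                    return t
                if not t.group:
                    try:
                        return t[name]
                    except KeyError:
                        pass
        raise KeyError(name)


tree = Node(None, [
    Node("greeting", ["Hello"]),
    ",",
    Node("args", [Node("person", ["World"])], group=True),
    Node(None, []),  # an empty result
    "!",
])
print([t if isinstance(t, str) else t.name for t in tree])  # ['Hello', ',', 'args', '!']
print(tree["args"]["person"].tokens)                        # ['World']
```

Here `tree["person"]` raises `KeyError`, because `person` is only reachable through the `args` group.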

### Parse Actions

Parse actions are functions that run after a `ParserElement` finds a match.

* Parameters must be accepted in `(tokens, index, string)` order (the opposite of pyparsing)
* Parse actions are wrapped to ensure the output is a legitimate `ParseResults`
  * If your parse action returns `None`, then the result is the original `tokens`
  * If your parse action returns an object, a list, or a tuple, then it is packaged in a `ParseResults` with the same type as `tokens`
  * If your parse action returns a `ParseResults`, then it is accepted ***even if it belongs to some other pattern***
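The three wrapping rules can be expressed as a small higher-order function. This is a sketch under stated assumptions: `Results` is a hypothetical stand-in for `ParseResults` holding just a `type` and `tokens`, and packaging a list or tuple as the new token list (and a single object as a one-element list) is an assumption about how the second rule behaves.

```python
class Results:
    """Hypothetical stand-in for ParseResults: tokens plus the element that made them."""

    def __init__(self, type_, tokens):
        self.type = type_
        self.tokens = list(tokens)


def wrap_action(action):
    """Wrap a (tokens, index, string) action so it always returns a Results."""

    def wrapped(tokens, index, string):
        out = action(tokens, index, string)
        if out is None:
            return tokens                      # rule 1: keep the original tokens
        if isinstance(out, Results):
            return out                         # rule 3: accepted as-is, whatever its type
        if isinstance(out, (list, tuple)):
            return Results(tokens.type, out)   # rule 2: package a list or tuple
        return Results(tokens.type, [out])     # rule 2: package a single object

    return wrapped


matched = Results("integer", ["42"])
to_int = wrap_action(lambda t, i, s: int(t.tokens[0]))
result = to_int(matched, 0, "42")
print(result.type, result.tokens)  # integer [42]
```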
  
#### Simple example:

```
from mo_parsing import Word

# convert the matched digits to an int
integer = Word("0123456789").add_parse_action(lambda t, i, s: int(t[0]))
result = integer.parse_string("42")
assert result[0] == 42
```

For a shorter specification, you may use the `/` operator, and your function only needs to accept the parameters it uses:

```
from mo_parsing import Word

integer = Word("0123456789") / (lambda t: int(t[0]))
result = integer.parse_string("42")
assert result[0] == 42
```
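One way a library can let you pass only the parameters you need is to inspect the function's signature and trim the argument list. The sketch below does that with Python's standard `inspect` module; `adapt()` is a hypothetical helper, and mo-parsing's `/` operator may use a different mechanism.

```python
import inspect


def adapt(action):
    """Return a (tokens, index, string) callable that passes only what `action` accepts."""
    params = list(inspect.signature(action).parameters.values())
    if any(p.kind == inspect.Parameter.VAR_POSITIONAL for p in params):
        count = 3  # *args accepts everything
    else:
        positional = (inspect.Parameter.POSITIONAL_ONLY, inspect.Parameter.POSITIONAL_OR_KEYWORD)
        count = min(3, sum(1 for p in params if p.kind in positional))

    def call(tokens, index, string):
        # pass the first `count` arguments, in (tokens, index, string) order
        return action(*(tokens, index, string)[:count])

    return call


short = adapt(lambda t: int(t[0]))
print(short(["42"], 0, "42"))  # 42
```

Because mo-parsing puts `tokens` first, the most commonly needed argument survives any amount of trimming.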

### Debugging

The PEG style of mo-parsing (inherited from pyparsing) makes for an expressive and readable specification, but debugging a parser is still hard. To see what the parser is doing, use the `Debugger`:

```
from mo_parsing.debug import Debugger

with Debugger():
    expr.parse_string("my new language")  # `expr` is your grammar
```

The debugger prints details of what is happening:

* Each attempt, and whether it matched or failed
* A few characters of the input at the current position
* The location, line, and column of the current position
* Indentation indicating the stack depth
* The `ParserElement` performing the attempt

This should help you isolate the exact position where your grammar fails.
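The position details in that list are straightforward to compute. The hypothetical helpers below show how a location can be turned into a line, a column, and a snippet, and how indentation can stand in for stack depth; the output format is an assumption and does not reproduce the `Debugger`'s actual output.

```python
def describe_position(string, loc, width=10):
    """Return the 1-based (line, column) of `loc` and a snippet of the input there."""
    line = string.count("\n", 0, loc) + 1
    column = loc - (string.rfind("\n", 0, loc) + 1) + 1
    snippet = string[loc:loc + width]
    return line, column, snippet


def trace(depth, string, loc, element, matched):
    """Format one parse attempt; indentation stands in for stack depth (hypothetical format)."""
    line, column, snippet = describe_position(string, loc)
    status = "matched" if matched else "failed"
    return f"{'  ' * depth}{element} @ {loc} (line {line}, col {column}) {snippet!r}: {status}"


print(trace(1, "SELECT a\nFROM b", 9, "Keyword('FROM')", True))
```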

### Regular Expressions

`mo-parsing` can parse and generate regular expressions. `ParserElement` has a `__regex__()` method that returns a regular expression for the given grammar; this works only up to a limit, and is used internally to accelerate parsing. The `Regex` class parses regular expressions into a grammar; it is used to optimize parsing, and you may also find it useful for decomposing regular expressions that look like line noise.
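The idea of deriving a regular expression from a grammar can be sketched with three toy element classes. `ToyLiteral`, `ToyAnd`, and `ToyMatchFirst` are hypothetical and ignore whitespace entirely; they only show how each element can contribute a fragment through a `__regex__()` method, and how the compiled pattern can reject input cheaply before a full parse is attempted. Recursive grammars cannot be flattened into a single regular expression this way, which is one reason such a conversion only works up to a limit.

```python
import re


class ToyLiteral:
    def __init__(self, text):
        self.text = text

    def __regex__(self):
        return re.escape(self.text)  # match the text exactly


class ToyAnd:
    def __init__(self, *exprs):
        self.exprs = exprs

    def __regex__(self):
        # concatenate the parts, each in a non-capturing group
        return "".join(f"(?:{e.__regex__()})" for e in self.exprs)


class ToyMatchFirst:
    def __init__(self, *exprs):
        self.exprs = exprs

    def __regex__(self):
        # alternatives, tried left to right
        return "|".join(f"(?:{e.__regex__()})" for e in self.exprs)


grammar = ToyAnd(ToyLiteral("SELECT "), ToyMatchFirst(ToyLiteral("*"), ToyLiteral("1")))
pattern = re.compile(grammar.__regex__())
for text in ["SELECT *", "SELECT 1", "SELECT x"]:
    # a failed regex match means a full parse attempt can be skipped
    print(text, bool(pattern.fullmatch(text)))
```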


## Differences from PyParsing

This fork was originally created to support faster parsing for [mo-sql-parsing](https://github.com/klahnakoski/moz-sql-parser). Since then it has deviated enough to become its own collection of parser specification functions. Here are the differences:

* Added `Whitespace`, which controls the parsing context and whitespace. It replaces the whitespace-modifying methods of pyparsing
* In pyparsing, the wildcard ("`*`") could be used to indicate that multiple values are expected; this is not allowed in `mo-parsing`, because all values are multi-values
* ParserElements are static: for example, `expr.add_parse_action(action)` creates a new ParserElement, so the result must be assigned to a variable or it is lost. **This is the biggest source of bugs when converting from pyparsing**
* Removed all backward-compatibility settings
* No support for binary serialization (no pickle)
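The "static" rule can be illustrated with a hypothetical immutable element class. `Element` is not mo-parsing's `ParserElement`; it only mimics the documented behavior that `add_parse_action()` returns a new element instead of modifying the original, which is why forgetting the assignment silently drops the action.

```python
class Element:
    """Hypothetical immutable element; mimics only the 'static' behavior described above."""

    def __init__(self, name, actions=()):
        self.name = name
        self.actions = tuple(actions)

    def add_parse_action(self, action):
        # build and return a NEW element; self is never modified
        return Element(self.name, self.actions + (action,))


integer = Element("integer")
integer.add_parse_action(int)            # bug: the new element is discarded
print(integer.actions)                   # ()

integer = integer.add_parse_action(int)  # correct: rebind the name to the new element
print(integer.actions)                   # (<class 'int'>,)
```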

### Faster Parsing

* Faster infix operator parsing (the main reason for this fork)
* `ParseResults` point to their `ParserElement`, which reduces their size
* Regular expressions are used to reduce the number of failed parse attempts
* A packrat parser is not needed
* Less stack is used



## Contributing

If you plan to extend or enhance this code, please [see the README in the tests directory](https://github.com/klahnakoski/mo-parsing/blob/dev/tests/README.md).

            
