parmancer

Name	parmancer
Version	0.2.1
home_page	None
Summary	Parse structured data from text using parser combinators
upload_time	2025-07-19 00:16:39
maintainer	None
docs_url	None
author	None
requires_python	>=3.9
license	None
keywords	parser, parser combinator, parsing
requirements	No requirements were recorded.

# Parmancer

Parse text into **structured data types** with **parser combinators**.

Parmancer has **type annotations** for parsers and intermediate results.
Using a type checker with Parmancer gives immediate feedback about parser result types, and gives type errors when creating invalid combinations of parsers.

## Installation

```sh
pip install parmancer
```

## Documentation

https://parmancer.com

## Introductory example

This example shows a parser which can parse text like `"Hello World! 1 + 2 + 3"` to extract the name in `Hello <name>!` and find the sum of the numbers which come after it:

```python
from parmancer import regex, digits, seq, string

# A parser which extracts a name from a greeting using a regular expression
greeting = regex(r"Hello (\w+)! ", group=1)

# A parser which takes integers separated by ` + `,
# converts them to `int`s, and sums them.
adder = digits.map(int).sep_by(string(" + ")).map(sum)

# The `greeting` and `adder` parsers are combined in sequence
parser = seq(greeting, adder)
# The type of `parser` is `Parser[tuple[str, int]]`, meaning it's a parser which
# will return a `tuple[str, int]` when it parses text.

# Now the parser can be applied to the example string, or other strings following the
# same pattern.
result = parser.parse("Hello World! 1 + 2 + 3")

# The result is a tuple containing the `greeting` result followed by the `adder` result
assert result == ("World", 6)

# Parsing different text which matches the same structure:
assert parser.parse("Hello Example! 10 + 11") == ("Example", 21)
```

Type checkers such as `mypy` and `Pylance`'s type checker help during development by revealing type information and catching type errors.

Here the in-line types are displayed automatically with VSCode's Python extension and the 'Inlay Hints' setting:

![Type annotations for Parmancer parsers](docs/intro_example.gif)

When the type of a parser doesn't match what's expected, such as in the following example, a type error reveals the problem as soon as the code is type checked, without having to run the code.
In this example, the `Parser.unpack` method unpacks the result tuple of type `(str, int)` into a function which expects arguments of type `(str, str)`, which is a type incompatibility:

![Type mismatch for the unpack method](docs/type_mismatch.png)

## Dataclass parsers

A key feature of Parmancer is the ability to create parsers which return dataclass instances using a short syntax where parsers are directly associated with each field of a dataclass.

Each dataclass field has a parser associated with it using the `take` field descriptor instead of the usual `dataclasses.field`.

The entire dataclass parser is then **combined** using the `gather` function, creating a parser which sequentially applies each field's parser, assigning each result to the dataclass field it is associated with.

```python
from dataclasses import dataclass
from parmancer import regex, string, take, gather

# Example text which a sensor might produce
sample_text = """Device: SensorA
ID: abc001
Readings (3:01 PM)
300.1, 301, 300
Readings (3:02 PM)
302, 1000, 2500
"""

numeric = regex(r"\d+(\.\d+)?").map(float)
any_text = regex(r"[^\n]+")
line_break = string("\n")


# Define parsers for the sensor readings and device information
@dataclass
class Reading:
    # Matches text like `Readings (3:01 PM)`
    timestamp: str = take(regex(r"Readings \(([^)]+)\)", group=1) << line_break)
    # Matches text like `300.1, 301, 300`
    values: list[float] = take(numeric.sep_by(string(", ")) << line_break)


@dataclass
class Device:
    # Matches text like `Device: SensorA`
    name: str = take(string("Device: ") >> any_text << line_break)
    # Matches text like `ID: abc001`
    id: str = take(string("ID: ") >> any_text << line_break)
    # Matches the entire `Reading` dataclass parser 0, 1 or many times
    readings: list[Reading] = take(gather(Reading).many())


# Gather the fields of the `Device` dataclass into a single combined parser
# Note the `Device.readings` field parser uses the `Reading` dataclass parser
parser = gather(Device)

# The result of the parser is a nicely structured `Device` dataclass instance,
# ready for use in the rest of the code with minimal boilerplate to get this far
assert parser.parse(sample_text) == Device(
    name="SensorA",
    id="abc001",
    readings=[
        Reading(timestamp="3:01 PM", values=[300.1, 301, 300]),
        Reading(timestamp="3:02 PM", values=[302, 1000, 2500]),
    ],
)
```
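
To give a feel for how a field descriptor and a gathering function can fit together, here is a minimal standard-library sketch. It is an assumption-laden illustration, not Parmancer's implementation: `take_sketch`, `gather_sketch` and `pattern` are hypothetical names, and a parser is modelled as a plain function from `(text, position)` to `(value, new_position)`, or `None` on failure.

```python
import re
from dataclasses import dataclass, field, fields
from typing import Any, Callable, Optional, Tuple

# Hypothetical parser shape for this sketch only
ParseFn = Callable[[str, int], Optional[Tuple[Any, int]]]


def pattern(regex: str, group: int = 0) -> ParseFn:
    """Match a regular expression at the current position."""
    compiled = re.compile(regex)

    def run(text: str, pos: int) -> Optional[Tuple[Any, int]]:
        match = compiled.match(text, pos)
        if match is None:
            return None
        return match.group(group), match.end()

    return run


def take_sketch(parser: ParseFn) -> Any:
    """Attach a parser to a dataclass field through the field's metadata."""
    return field(metadata={"parser": parser})


def gather_sketch(cls: type) -> ParseFn:
    """Build a parser which runs each field's parser in declaration order."""

    def run(text: str, pos: int) -> Optional[Tuple[Any, int]]:
        values = {}
        for f in fields(cls):
            out = f.metadata["parser"](text, pos)
            if out is None:
                return None
            values[f.name], pos = out
        return cls(**values), pos

    return run


@dataclass
class Device:
    name: str = take_sketch(pattern(r"Device: ([^\n]+)\n", group=1))
    id: str = take_sketch(pattern(r"ID: ([^\n]+)\n", group=1))


text = "Device: SensorA\nID: abc001\n"
device, end = gather_sketch(Device)(text, 0)
assert device == Device(name="SensorA", id="abc001")
assert end == len(text)
```

The sketch only covers the sequencing idea: each field's parser runs where the previous one stopped, and the results become constructor arguments. Type checking of field parsers, error reporting and nesting are left out.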

Dataclass parsers come with type annotations which make it easy to write them with hints from an IDE.
For example, a dataclass field of type `str` cannot be associated with a parser of type `Parser[int]` - the parser has to produce a string (`Parser[str]`) for it to be compatible, and a type checker can reveal this while writing code in an IDE:

![Dataclass field parser type error](docs/dataclass_type_mismatch.png)

## Why use Parmancer?

- **Simple construction**: Simple parsers can be defined concisely and independently, and then combined with short, understandable **combinator** functions and methods which replace the usual branching and sequencing boilerplate of parsers written in vanilla Python.
- **Modularity, testability, maintainability**: Each intermediate parser component is a complete parser in itself, which means it can be understood, tested and modified in isolation from the rest of the parser.
- **Regular Python**: Some approaches to parsing use a separate grammar definition outside of Python which goes through a compilation or generation step before it can be used in Python, which can lead to black boxes. Parmancer parsers are defined as Python code rather than a separate grammar syntax.
- **Combination features**: Parmancer comes with standard parser combinator methods and functions such as: combining parsers in sequence; matching alternative parsers until one matches; making a parser optional; repeatedly matching a parser until it no longer matches; mapping a parsing result through a function, and more.
- **Type checking**: Parmancer has a lot of type information which makes it easier to use with IDEs and type checkers.
- **Debug mode**: Built-in debug mode (`parser.parse(text, debug=True)`) provides detailed parse tree visualization including failures to help understand and fix parsing issues.
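
To make the combinators listed above concrete, here is a minimal plain-Python sketch of the general parser combinator idea. It is not Parmancer's implementation or API: the names `literal`, `pattern`, `sequence`, `either`, `many` and `mapped` are hypothetical, and a parser is modelled as a function from `(text, position)` to `(value, new_position)`, or `None` on failure.

```python
import re
from typing import Any, Callable, List, Optional, Tuple

# Hypothetical parser shape for this sketch only
ParseFn = Callable[[str, int], Optional[Tuple[Any, int]]]


def literal(expected: str) -> ParseFn:
    """Match an exact string."""
    def run(text: str, pos: int) -> Optional[Tuple[Any, int]]:
        if text.startswith(expected, pos):
            return expected, pos + len(expected)
        return None
    return run


def pattern(regex: str) -> ParseFn:
    """Match a regular expression at the current position."""
    compiled = re.compile(regex)

    def run(text: str, pos: int) -> Optional[Tuple[Any, int]]:
        match = compiled.match(text, pos)
        return None if match is None else (match.group(0), match.end())
    return run


def sequence(*parsers: ParseFn) -> ParseFn:
    """Run parsers one after another; fail if any of them fails."""
    def run(text: str, pos: int) -> Optional[Tuple[Any, int]]:
        values: List[Any] = []
        for parser in parsers:
            out = parser(text, pos)
            if out is None:
                return None
            value, pos = out
            values.append(value)
        return tuple(values), pos
    return run


def either(*parsers: ParseFn) -> ParseFn:
    """Try alternatives in order and keep the first success."""
    def run(text: str, pos: int) -> Optional[Tuple[Any, int]]:
        for parser in parsers:
            out = parser(text, pos)
            if out is not None:
                return out
        return None
    return run


def many(parser: ParseFn) -> ParseFn:
    """Repeat a parser until it fails or stops consuming input."""
    def run(text: str, pos: int) -> Optional[Tuple[Any, int]]:
        values: List[Any] = []
        while True:
            out = parser(text, pos)
            if out is None or out[1] == pos:
                return values, pos
            value, pos = out
            values.append(value)
    return run


def mapped(parser: ParseFn, func: Callable[[Any], Any]) -> ParseFn:
    """Pass a successful result through a function."""
    def run(text: str, pos: int) -> Optional[Tuple[Any, int]]:
        out = parser(text, pos)
        return None if out is None else (func(out[0]), out[1])
    return run


number = mapped(pattern(r"\d+"), int)
sign = either(literal("+"), literal("-"))
signed_terms = many(sequence(sign, number))

assert signed_terms("+1-22+3", 0) == ([("+", 1), ("-", 22), ("+", 3)], 7)
```

Each helper returns a complete parser, so `number`, `sign` and `signed_terms` can all be called and tested on their own.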

Parmancer is not designed for high-performance parsing; its speed is similar to that of other pure-Python parsing libraries.
Its purpose is to create understandable, testable and maintainable parsers.

Parmancer is in development so its public API is not stable.
Please leave feedback and suggestions in the GitHub issue tracker.

Parmancer is based on [Parsy](https://parsy.readthedocs.io/en/latest/overview.html) (and [typed-parsy](https://github.com/python-parsy/typed-parsy)), an excellent parsing library.

## Debug mode

When developing parsers, it can be helpful to understand why a parser fails on certain input. Parmancer includes a debug mode that provides detailed information about parser execution when parsing fails.

To enable debug mode, pass `debug=True` to the `parse()` method:

```python
from parmancer import string, regex, seq, ParseError

# Create a simple parser that expects a greeting followed by a number
parser = seq(string("Hello "), regex(r"\d+"))

# This will fail - let's see why
try:
    parser.parse("Hello world", debug=True)
except ParseError as e:
    print(e)
```

The debug output shows a parse tree indicating which parsers succeeded and which failed:

```
failed with '\d+'
Furthest parsing position:
Hello world
~~~~~~^

Debug information:
==================
Parse tree:
Parser
└─KeepOne
  └─sequence
    ├─'Hello ' = 'Hello '
    └─\d+ X (failed)
```

This shows that the `'Hello '` parser succeeded, but the `\d+` regex parser failed when it encountered `"world"` instead of digits.

Debug mode is useful during development but has performance overhead, so it should be disabled in production code.
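
One way to keep debug mode out of production is to switch it on from the environment. The following is a minimal sketch, not part of Parmancer: the helper name `parse_with_env_debug` and the variable `PARMANCER_DEBUG` are assumptions made for this example, and the only Parmancer API it relies on is `parse(text, debug=...)` as described above.

```python
import os
from typing import Any


def parse_with_env_debug(parser: Any, text: str) -> Any:
    """Parse `text`, enabling debug mode only when PARMANCER_DEBUG=1 is set.

    `PARMANCER_DEBUG` is a hypothetical variable name chosen for this sketch;
    Parmancer itself does not read it.
    """
    debug_enabled = os.environ.get("PARMANCER_DEBUG") == "1"
    return parser.parse(text, debug=debug_enabled)
```

With this in place, a developer could run `PARMANCER_DEBUG=1 python script.py` locally to see parse trees on failure, while deployments which leave the variable unset avoid the overhead.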

## API documentation and examples

The API docs include minimal examples of each parser and combinator.

The [GitHub repository](https://github.com/parmancer/parmancer) has an `examples` folder containing larger examples which use multiple features.

Raw data

{
    "_id": null,
    "home_page": null,
    "name": "parmancer",
    "maintainer": null,
    "docs_url": null,
    "requires_python": ">=3.9",
    "maintainer_email": null,
    "keywords": "parser, parser combinator, parsing",
    "author": null,
    "author_email": "Rob Hornby <robjhornby@gmail.com>",
    "download_url": "https://files.pythonhosted.org/packages/34/ef/4be612e1ada5dce4c6a8f1031ae1a97e72e69fdf4b0cb932b1e5f739c103/parmancer-0.2.1.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": null,
    "summary": "Parse structured data from text using parser combinators",
    "version": "0.2.1",
    "project_urls": {
        "Documentation": "https://parmancer.com",
        "Repository": "https://github.com/parmancer/parmancer"
    },
    "split_keywords": [
        "parser",
        " parser combinator",
        " parsing"
    ],
    "urls": [
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "f2cdd20e2d201060bad31f04d2b5ada32c1d00de036ff186e7a8c383d7ef4e46",
                "md5": "2c5d3fed3df63f6d96663c5b53ddf2ce",
                "sha256": "acecefca57566761af1ab702f5307a032f741e66ee2c8ece9921b4927a6635b7"
            },
            "downloads": -1,
            "filename": "parmancer-0.2.1-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "2c5d3fed3df63f6d96663c5b53ddf2ce",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.9",
            "size": 27394,
            "upload_time": "2025-07-19T00:16:37",
            "upload_time_iso_8601": "2025-07-19T00:16:37.472466Z",
            "url": "https://files.pythonhosted.org/packages/f2/cd/d20e2d201060bad31f04d2b5ada32c1d00de036ff186e7a8c383d7ef4e46/parmancer-0.2.1-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "34ef4be612e1ada5dce4c6a8f1031ae1a97e72e69fdf4b0cb932b1e5f739c103",
                "md5": "335b82be0bc393993fc53012edc7854c",
                "sha256": "4133b1a99e5d8ea0eecc4e684b7c04df2993cbe5b951f9c463cd1c8cd5f66ee2"
            },
            "downloads": -1,
            "filename": "parmancer-0.2.1.tar.gz",
            "has_sig": false,
            "md5_digest": "335b82be0bc393993fc53012edc7854c",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.9",
            "size": 25209,
            "upload_time": "2025-07-19T00:16:39",
            "upload_time_iso_8601": "2025-07-19T00:16:39.086878Z",
            "url": "https://files.pythonhosted.org/packages/34/ef/4be612e1ada5dce4c6a8f1031ae1a97e72e69fdf4b0cb932b1e5f739c103/parmancer-0.2.1.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2025-07-19 00:16:39",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "parmancer",
    "github_project": "parmancer",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": true,
    "tox": true,
    "lcname": "parmancer"
}
