parmancer


- Name: parmancer
- Version: 0.1.1
- Home page: https://github.com/parmancer/parmancer
- Summary: Parse structured data from text using parser combinators
- Upload time: 2024-09-15 02:30:10
- Author: Rob Hornby
- Requires Python: >=3.8
- License: MIT
- Keywords: parser, parsing, parser combinator
- Requirements: none recorded

# Parmancer

Parse text into **structured data types** with **parser combinators**.

Parmancer has rich **type annotations** for parsers and intermediate results.
A type checker used with Parmancer gives immediate feedback about parser result types and reports type errors for invalid combinations of parsers.
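To illustrate how result types can flow through combinators, here is a minimal standard-library sketch. It is not Parmancer's implementation: `ToyParser`, `take_digits`, `toy_digits` and `toy_number` are hypothetical names invented for this example, and it only assumes that a parser can be modelled as a generic wrapper around a function.

```python
from dataclasses import dataclass
from typing import Callable, Generic, Tuple, TypeVar

T = TypeVar("T")
U = TypeVar("U")


@dataclass(frozen=True)
class ToyParser(Generic[T]):
    """Hypothetical stand-in for a typed parser (not Parmancer's class)."""

    # run(text, index) returns the parsed value and the index where parsing stopped
    run: Callable[[str, int], Tuple[T, int]]

    def map(self, fn: Callable[[T], U]) -> "ToyParser[U]":
        # The result type changes from T to U, and a type checker can track that
        def run(text: str, index: int) -> Tuple[U, int]:
            value, end = self.run(text, index)
            return fn(value), end

        return ToyParser(run)


def take_digits(text: str, index: int) -> Tuple[str, int]:
    # Consume one or more decimal digits starting at `index`
    end = index
    while end < len(text) and text[end].isdigit():
        end += 1
    if end == index:
        raise ValueError(f"expected a digit at index {index}")
    return text[index:end], end


toy_digits: ToyParser[str] = ToyParser(take_digits)
# A type checker infers ToyParser[int] here because `int` maps str -> int
toy_number: ToyParser[int] = toy_digits.map(int)

# A type checker would flag `toy_digits.map(len).map(str.upper)`,
# because `str.upper` does not accept the `int` produced by `len`.
```

With annotations like these, a mistake in a chain of combinators can surface in the editor before any text is parsed.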

## Installation

```sh
pip install parmancer
```

## Documentation

https://parmancer.github.io/parmancer

## Introductory example

This example shows a parser which can parse text like `"Hello World! 1 + 2 + 3"` to extract the name in `Hello <name>!` and find the sum of the numbers which come after it:

```python
from parmancer import regex, digits, seq, string

# A parser which extracts a name from a greeting using a regular expression
greeting = regex(r"Hello (\w+)! ", group=1)

# A parser which takes integers separated by ` + `,
# converts them to `int`s, and sums them.
adder = digits.map(int).sep_by(string(" + ")).map(sum)

# The `greeting` and `adder` parsers are combined in sequence
parser = seq(greeting, adder)

# Now the parser can be applied to the example string, or other strings following the
# same pattern.
result = parser.parse("Hello World! 1 + 2 + 3")

# The result is a tuple containing the `greeting` result followed by the `adder` result
assert result == ("World", 6)

# Parsing different text which matches the same structure:
assert parser.parse("Hello Example! 10 + 11") == ("Example", 21)
```

## Dataclass parsers

A key feature of Parmancer is the ability to create parsers which return dataclass instances using a short syntax where parsers are directly associated with each field of a dataclass.

Each dataclass field has a parser associated with it using the `take` field descriptor instead of the usual `dataclasses.field`.

The entire dataclass parser is then **combined** using the `gather` function, creating a parser which sequentially applies each field's parser, assigning each result to the dataclass field it is associated with.

```python
from dataclasses import dataclass
from parmancer import regex, string, take, gather

# Example text which a sensor might produce
sample_text = """Device: SensorA
ID: abc001
Readings (3:01 PM)
300.1, 301, 300
Readings (3:02 PM)
302, 1000, 2500
"""

numeric = regex(r"\d+(\.\d+)?").map(float)
any_text = regex(r"[^\n]+")
line_break = string("\n")

# Define parsers for the sensor readings and device information
@dataclass
class Reading:
    # Matches text like `Readings (3:01 PM)`
    timestamp: str = take(regex(r"Readings \(([^)]+)\)", group=1) << line_break)
    # Matches text like `300.1, 301, 300`
    values: list[float] = take(numeric.sep_by(string(", ")) << line_break)

@dataclass
class Device:
    # Matches text like `Device: SensorA`
    name: str = take(string("Device: ") >> any_text << line_break)
    # Matches text like `ID: abc001`
    id: str = take(string("ID: ") >> any_text << line_break)
    # Matches the entire `Reading` dataclass parser 0, 1 or many times
    readings: list[Reading] = take(gather(Reading).many())

# Gather the fields of the `Device` dataclass into a single combined parser
# Note the `Device.readings` field parser uses the `Reading` dataclass parser
parser = gather(Device)

# The result of the parser is a nicely structured `Device` dataclass instance,
# ready for use in the rest of the code with minimal boilerplate to get this far
assert parser.parse(sample_text) == Device(
    name="SensorA",
    id="abc001",
    readings=[
        Reading(timestamp="3:01 PM", values=[300.1, 301, 300]),
        Reading(timestamp="3:02 PM", values=[302, 1000, 2500]),
    ],
)
```
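The idea of applying each field's parser in sequence can be sketched with the standard library alone. This is a conceptual toy, not how Parmancer implements `take` or `gather`: `toy_regex`, `toy_take`, `toy_gather` and `Pair` are hypothetical names, and the sketch assumes the field parser can be stored in the dataclass field metadata.

```python
import re
from dataclasses import dataclass, field, fields
from typing import Any, Callable, Tuple, Type, TypeVar

T = TypeVar("T")

# A toy parser takes the text and a start index and returns (value, next index)
ToyParser = Callable[[str, int], Tuple[Any, int]]


def toy_regex(pattern: str, group: int = 0) -> ToyParser:
    compiled = re.compile(pattern)

    def parse(text: str, index: int) -> Tuple[Any, int]:
        match = compiled.match(text, index)
        if match is None:
            raise ValueError(f"expected {pattern!r} at index {index}")
        return match.group(group), match.end()

    return parse


def toy_take(parser: ToyParser) -> Any:
    # Attach the parser to the field so that toy_gather can find it later
    return field(default=None, metadata={"parser": parser})


def toy_gather(cls: Type[T]) -> Callable[[str, int], Tuple[T, int]]:
    def parse(text: str, index: int) -> Tuple[T, int]:
        values = {}
        # Apply each field's parser in declaration order, threading the index through
        for f in fields(cls):
            values[f.name], index = f.metadata["parser"](text, index)
        return cls(**values), index

    return parse


@dataclass
class Pair:
    # Matches text like `width=` and keeps only the key
    key: str = toy_take(toy_regex(r"(\w+)=", group=1))
    # Matches the digits which follow the `=`
    value: str = toy_take(toy_regex(r"\d+"))


pair, end = toy_gather(Pair)("width=42", 0)
```

Here `pair` is `Pair(key="width", value="42")` and `end` is the index just past the consumed text.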

## Why use Parmancer?

- **Simple construction**: Simple parsers can be defined concisely and independently, and then combined with short, understandable **combinator** functions and methods which replace the usual branching and sequencing boilerplate of parsers written in vanilla Python.
- **Modularity, testability, maintainability**: Each intermediate parser component is a complete parser in itself, which means it can be understood, tested and modified in isolation from the rest of the parser.
- **Regular Python**: Some other approaches to parsing use a separate grammar definition outside of Python which goes through a compilation or generation step before it can be used in Python, which can lead to black boxes. Parmancer parsers are defined as Python code rather than a separate grammar syntax.
- **Combination features**: Parmancer comes with standard parser combinator methods and functions, such as combining parsers in sequence, trying alternative parsers until one matches, making a parser optional, repeatedly matching a parser until it no longer matches, mapping a parsing result through a function, and more.
- **Type checking**: Parmancer has a lot of type information which makes it easier to use with IDEs and type checkers.
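For contrast, here is a rough hand-written equivalent of the introductory example using only the standard library `re` module. It is a sketch to show the kind of manual branching and position tracking that combinators replace; the function name `parse_greeting_and_sum` is hypothetical, and unlike the combinator version this sketch requires at least one number.

```python
import re
from typing import Tuple

DIGITS = re.compile(r"\d+")
SEPARATOR = " + "


def parse_greeting_and_sum(text: str) -> Tuple[str, int]:
    # Step 1: match the greeting and remember where it ends
    match = re.match(r"Hello (\w+)! ", text)
    if match is None:
        raise ValueError("expected a greeting like 'Hello <name>! '")
    name = match.group(1)
    index = match.end()

    # Step 2: read integers separated by " + ", tracking the position by hand
    total = 0
    while True:
        number = DIGITS.match(text, index)
        if number is None:
            raise ValueError(f"expected digits at index {index}")
        total += int(number.group())
        index = number.end()
        if text.startswith(SEPARATOR, index):
            index += len(SEPARATOR)
        else:
            break

    # Step 3: reject any trailing text which was not consumed
    if index != len(text):
        raise ValueError(f"unexpected text at index {index}")
    return name, total


assert parse_greeting_and_sum("Hello World! 1 + 2 + 3") == ("World", 6)
```

The greeting, the number list and the end-of-input check are tangled together in one function here, whereas the combinator version keeps `greeting` and `adder` as separate parsers that can be tested on their own.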

Parmancer is not intended for creating high-performance parsers: its speed is similar to that of other pure Python parsing libraries.
Its purpose is to create understandable, testable and maintainable parsers.

Parmancer is in development so its public API is not stable.
Please leave feedback and suggestions in the GitHub issue tracker.

Parmancer is based on [Parsy](https://parsy.readthedocs.io/en/latest/overview.html) (and [typed-parsy](https://github.com/python-parsy/typed-parsy)), which is an excellent parsing library.

## API documentation and examples

The API docs include minimal examples of each parser and combinator.

The GitHub repository has an `examples` folder containing larger examples which use multiple features.


            
