caustic.lexer

- Name: caustic.lexer
- Version: 1.3.0
- Summary: Grammar compilation for Caustic
- Author: Shae.c32
- Requires Python: >=3.12
- Keywords: caustic, language, parser, lexer
- Upload time: 2024-04-15 16:48:34
- Homepage: https://codeberg.org/Caustic/CausticLexer
- Issues: https://codeberg.org/Caustic/CausticLexer/issues

Caustic's lexing/grammar framework

The `basic_compiler` module is a less advanced compiler, but is used to
bootstrap the `Compiler`

The `Compiler` class compiles grammars from Caustic grammar (`.cag`) files into nodes,
using its own grammar, which is written in the Caustic grammar format and compiled with the `basic_compiler` module

The `Compiler` is loaded through the `load_compiler()` function in the package,
and can be cached to the disk using the `save_compiler()` function

The `nodes` module provides the nodes themselves, and allows manually building grammar by
supplying nodes

The `serialize` module provides functions for serializing and deserializing nodes

The `util` module provides small utilities

# The `.cag` specification

## Pragmas
Pragmas are special directives embedded in the grammar  
These are only supported by the bootstrapped `compiler` module

### Include
> `$include [path]`

Allows combining multiple grammar files

Relative paths provided as `[path]` will be checked against the following
directories, in order:
- The path of the includer/importer (if possible)
- The `builtin_path` of the `compiler` module (the location of `compiler.py`)
- The current directory

## Comments
Comments may start with a `#`

## Statements
A statement begins with an [identifier](#identifier), followed by an `=`,
then an [expression](#expression), and finally a `;`

### Identifier
An identifier is a sequence of alphanumeric characters, underscores, and periods

Note: `basic_compiler` will not accept identifiers with periods
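The identifier rules can be approximated with regular expressions. This sketch assumes "alphanumeric" means ASCII letters and digits; `IDENTIFIER`, `BASIC_IDENTIFIER` and `is_identifier()` are hypothetical names, not part of the package.

```python
import re

# Full compiler: letters, digits, underscores, and periods (ASCII assumed)
IDENTIFIER = re.compile(rb"[A-Za-z0-9_.]+")
# basic_compiler: the same set, minus periods
BASIC_IDENTIFIER = re.compile(rb"[A-Za-z0-9_]+")


def is_identifier(name: bytes, basic: bool = False) -> bool:
    """Return True if the whole of `name` is a valid identifier."""
    pattern = BASIC_IDENTIFIER if basic else IDENTIFIER
    return pattern.fullmatch(name) is not None
```

For example, `b"expr.atom"` is accepted by the full compiler's rule but rejected by the `basic_compiler` rule.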

## Expression
Expressions consist of nodes, which range from something as simple as a [string](#string) to something as complex as a [group](#group)

### Naming
> `nodes.Node.name`

Named nodes are denoted by a name (alphanumeric, underscores, and periods), followed
by a `:`, and then the node/expression  
This controls the return value of containing groups

Note: `basic_compiler` will not accept node names with periods

#### Anonymous
"Anonymous" named nodes are expressions prefixed with `:`, but with
no leading name

#### Unpack
"Unpack" nodes are expressions prefixed with `^:`

Note: `basic_compiler` will not accept unpack nodes

### Group
> `nodes.NodeGroup`

The top level of an expression is implicitly grouped

A simple group node is opened by `(` and closed by `)`  
Groups match the nodes inside of them in a sequence in order  
The return value of this group will be dependent on its contents' [naming](#naming):

- A group containing no named nodes will return a list of its nodes' results
- A group containing nodes with "[anonymous](#anonymous)" names returns the return value of the last matched anonymous node
- A group containing [named](#naming) nodes returns a dict containing a mapping of the names to the nodes' results
- Any [unpack](#unpack) nodes splice their result into the surrounding group's result: a sequence contributes its elements, and a mapping contributes its names and values

Mixing anonymous and named expressions in a single group will result in an error

#### Whitespace sensitive group
> `nodes.NodeGroup`, `keep_whitespace=True`

A whitespace sensitive group is opened by `{` and closed by `}`  
The only difference between this type of group and a normal group is that it does not implicitly
discard whitespace between its nodes

#### Union
> `nodes.UnionNode`

A union is opened by `[` and closed by `]`  
Unions match any of their contained nodes

#### Range
> `nodes.NodeRange`

Can be created in the following ways:
- ` - [node]`: Matches any number of `[node]`
- ` x- [node]`: Matches `x` or more of `[node]`
- ` -x [node]`: Matches up to (but not including) `x` of `[node]`
- ` a-b [node]`: Matches between `a` (inclusive) and `b` (exclusive) of `[node]`

Note that a range should be placed *after* a [name](#naming)

### Real
Real nodes are nodes that actually match content, such as strings or patterns

#### String
> `nodes.StringNode`

The simplest node, denoted either by single quotes (`''`) or double quotes (`""`)  
Supports escape characters

> Note: despite the name of this node, it is important to remember that the nodes only match bytes!

#### Pattern
> `nodes.PatternNode`

Matches a regular expression, denoted by slashes (`/`) in the following syntax:  
> [target group](#target-group) `/` pattern `/` [flags](#flags)

##### Target Group
In a pattern, if a target group is given (as an integer), the result of this
node will be the bytes of that group instead of the entire match

##### Flags
Supports these common RegEx flags:
- `i`: ignore case / case insensitive
- `m`: multiline - `^` matches beginning of line or string, `$` matches end of either
- `s`: single-line / "dotall" - `.` matches newlines as well

### Meta
"Meta" nodes don't actually match any input, but can change some context

#### Stealer
> `nodes.Stealer`

A "stealer" node is denoted by a `!`, and is only acceptable in a group

If a [group](#group) reaches a "stealer" node, then the group will raise an exception
if any of the subsequent nodes fail

#### Context
> `nodes.Context`

A context is created with an opening `<` and closing `>`  
Context nodes always match, with the result being the (string) contents

Context nodes should contain either a [string](#string),
or a short sequence of alphanumeric characters and underscores

#### Node Reference
> `nodes.NodeRef`

Denoted by an `@`, followed by a node name (as a string of alphanumeric characters, underscores, and periods)

Matches the targeted node and returns its result

Must be bound, either manually with its `.bind()` method or automatically by the
default compilers

Note: `basic_compiler` will not accept node references with periods


#### Lookahead
> `nodes.Lookahead`

Denoted by a `&` (positive lookahead) or `&!` (negative lookahead), this node will match its
target node, but will not consume any of the buffer

If the lookahead is negative, it returns `True` when its node fails to match,
and fails to match otherwise


# Changelog

## 0.2.0
- Implemented node saving and loading through the `serialize` module
- Moved `compiler.bind_nodes()` to `util.bind_nodes()`

## 1.0.0
- Completely reworked compiler caching
- Removed `$import` pragma
- Moved `WHITESPACE_PATT` to `.util`
- Changed `nodes.Node.NO_RETURN` to singleton(ish) `util.NO_MATCH`

### 1.0.0-1
- Fixed an inaccuracy in README

## 1.0.1
- Added builtin `grammar.cag` to package
- Added precompiled `precompiled_nodes.pkl` to package

## 1.0.2
- Fixed error caused by `compiler.py` `Compiler.compile_buffermatcher()` passing an unneeded kwarg to `.pre_process()`
- Made `NodeSyntaxError` self-formatting also include exception notes

## 1.1.0
- Added support for periods in node names
- Fixed `Compiler.post_process_compile()` not actually doing anything

## 1.2.0
- Implemented [unpacking](#unpack) nodes

## 1.2.1
- Fixed several nodes improperly stripping whitespace

## 1.2.2
- Fixed unpacking never triggering
- Fixed `NodeRange`s raising exceptions upon backtracking

## 1.3.0
- Implemented lookaheads
