Caustic's lexing/grammar framework
The `basic_compiler` module is a less advanced compiler, but is used to
bootstrap the `Compiler`
The `Compiler` class compiles grammars from Caustic grammar (`.cag`) files into nodes,
and uses a grammar system built in Caustic grammar format and compiled with the `basic_compiler` module
The `Compiler` is loaded through the `load_compiler()` function in the package,
and can be cached to the disk using the `save_compiler()` function
The `nodes` module provides the nodes themselves, and allows manually building grammar by
supplying nodes
The `serialize` module provides functions for serializing and deserializing nodes
The `util` module provides small utilities
# The `.cag` specification
## Pragmas
Pragmas are special directives embedded in the grammar
These are only supported on the bootstrapped `compiler` module
### Include
> `$include [path]`
Allows putting multiple grammar files together
Relative paths provided as `[path]` will be checked against the following
directories, in order:
- The path of the includer/importer (if possible)
- The `builtin_path` of the `compiler` module (the location of `compiler.py`)
- The current directory
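
The search order above can be sketched in plain Python. This is only an illustration of "first existing candidate wins", not the package's own code: `resolve_include` and its parameters are hypothetical names, and the handling of absolute paths is an assumption.

```python
from pathlib import Path
from typing import Optional, Sequence


def resolve_include(path: str, search_dirs: Sequence[Path]) -> Optional[Path]:
    """Return the first existing candidate for `path`, or None.

    `search_dirs` is expected in priority order, e.g. the includer's
    directory, then the compiler's builtin path, then the current directory.
    """
    target = Path(path)
    if target.is_absolute():
        # Assumption of this sketch: absolute paths skip the search.
        return target if target.exists() else None
    for base in search_dirs:
        candidate = base / target
        if candidate.exists():
            return candidate
    return None
```

A caller would pass something like `[includer_dir, builtin_dir, Path.cwd()]`, where the first two names stand for whatever directories apply.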
## Comments
Comments may start with a `#`
## Statements
A statement begins with an [identifier](#identifier), followed by an `=`,
then an [expression](#expression), and finally a `;`
### Identifier
An identifier is a sequence of alphanumeric characters, underscores, and periods
Note: `basic_compiler` will not accept identifiers with periods
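
As a rough illustration of the identifier rule and the `basic_compiler` restriction, the check below uses two regular expressions. It assumes "alphanumeric" means ASCII letters and digits, which the text above does not state, and `is_identifier` is a hypothetical helper rather than part of the package.

```python
import re

# Alphanumerics, underscores, and periods, per the description above.
IDENT_FULL = re.compile(rb"[A-Za-z0-9_.]+")
# The same set without periods, modelling the basic_compiler restriction.
IDENT_BASIC = re.compile(rb"[A-Za-z0-9_]+")


def is_identifier(text: bytes, allow_periods: bool = True) -> bool:
    """True if the whole of `text` is a valid identifier under this sketch."""
    pattern = IDENT_FULL if allow_periods else IDENT_BASIC
    return pattern.fullmatch(text) is not None
```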
## Expression
Expressions consist of nodes, where a node can range from something as simple as a [string](#string) to something as complex as a [group](#group)
### Naming
> `nodes.Node.name`
Named nodes are denoted by a name (alphanumeric, underscores, and periods), followed
by a `:`, and then the node/expression
This controls the return value of containing groups
Note: `basic_compiler` will not accept node names with periods
#### Anonymous
"Anonymous" named nodes are expressions prefixed with `:`, but with
no leading name
#### Unpack
"Unpack" nodes are expressions prefixed with `^:`
Note: `basic_compiler` will not accept unpack nodes
### Group
> `nodes.NodeGroup`
The top level of an expression is implicitly grouped
A simple group node is opened by `(` and closed by `)`
Groups match the nodes inside of them in a sequence in order
The return value of this group will be dependent on its contents' [naming](#naming):
- A group containing no named nodes will return a list of its nodes' results
- A group containing nodes with "[anonymous](#anonymous)" names returns the last matched anonymous node's return value
- A group containing [named](#naming) nodes returns a dict containing a mapping of the names to the nodes' results
- Any [unpack](#unpack) nodes will unpack either their elements (sequence) or their names and values into the surrounding group's result
Mixing anonymous and named expressions in a single group will result in an error
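
The return-value rules above can be modelled with a small standalone function. This is a sketch under stated assumptions, not the behaviour of `nodes.NodeGroup` itself: the `(kind, name, value)` tuples and `group_result` are invented for illustration, unnamed results are assumed to be dropped from dict results, and unpack nodes are assumed to be ignored when an anonymous node decides the result.

```python
from typing import Any, List, Optional, Tuple

# A child is (kind, name, value); kind is "plain", "anon", "named" or "unpack".
Child = Tuple[str, Optional[str], Any]


def group_result(children: List[Child]) -> Any:
    kinds = {kind for kind, _, _ in children}
    if "anon" in kinds and "named" in kinds:
        raise ValueError("cannot mix anonymous and named nodes in one group")

    if "anon" in kinds:
        # The last matched anonymous node decides the result.
        return [value for kind, _, value in children if kind == "anon"][-1]

    # Assumption: an unpacked mapping also forces a dict result.
    wants_dict = "named" in kinds or any(
        kind == "unpack" and isinstance(value, dict) for kind, _, value in children
    )
    if wants_dict:
        result = {}
        for kind, name, value in children:
            if kind == "named":
                result[name] = value
            elif kind == "unpack" and isinstance(value, dict):
                result.update(value)  # merge names and values
        return result

    items = []
    for kind, _, value in children:
        if kind == "unpack":
            items.extend(value)  # splice a sequence into the list
        else:
            items.append(value)
    return items
```

For example, `group_result([("plain", None, b"("), ("anon", None, b"x"), ("plain", None, b")")])` evaluates to `b"x"`, which is the usual way to discard surrounding punctuation.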
#### Whitespace sensitive group
> `nodes.NodeGroup`, `keep_whitespace=True`
A whitespace sensitive group is opened by `{` and closed by `}`
The only difference between this type of group and a normal group is that it does not implicitly
discard whitespace between its nodes
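
The difference between the two group types can be sketched with literal byte strings standing in for nodes. The helper below is hypothetical, and it assumes whitespace is skipped before every node, including the first, which the text above does not specify.

```python
import re
from typing import Optional, Sequence

WHITESPACE = re.compile(rb"\s*")


def match_sequence(buffer: bytes, pos: int, literals: Sequence[bytes],
                   keep_whitespace: bool = False) -> Optional[int]:
    """Match `literals` in order; return the end position, or None on failure."""
    for text in literals:
        if not keep_whitespace:
            # A normal group silently discards whitespace before each node.
            pos = WHITESPACE.match(buffer, pos).end()
        if not buffer.startswith(text, pos):
            return None
        pos += len(text)
    return pos
```

With the input `b"a  b"`, the sequence `[b"a", b"b"]` matches in the default mode but fails with `keep_whitespace=True`, because the two spaces are no longer skipped.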
#### Union
> `nodes.UnionNode`
A union is opened by `[` and closed by `]`
Unions match any of their contained nodes
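
A union can be pictured as trying each alternative at the same position. The sketch below assumes alternatives are tried in the order written and that the first success wins; the text above only says that any contained node may match, so treat the ordering as an assumption. `literal` and `match_union` are hypothetical helpers.

```python
from typing import Callable, Optional, Sequence

# A matcher takes (buffer, pos) and returns the new position, or None on failure.
Matcher = Callable[[bytes, int], Optional[int]]


def literal(text: bytes) -> Matcher:
    return lambda buffer, pos: pos + len(text) if buffer.startswith(text, pos) else None


def match_union(buffer: bytes, pos: int, alternatives: Sequence[Matcher]) -> Optional[int]:
    for alternative in alternatives:
        new_pos = alternative(buffer, pos)
        if new_pos is not None:
            return new_pos  # first successful alternative wins
    return None
```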
#### Range
> `nodes.NodeRange`
Can be created in the following ways:
- ` - [node]`: Matches any amount of `[node]`
- ` x- [node]`: Matches `x` or more of `[node]`
- ` -x [node]`: Matches up to (but not including) `x` of `[node]`
- ` a-b [node]`: Matches between `a` (inclusive) and `b` (exclusive) of `[node]`
Note that this should be placed *after* a [name](#naming)
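
The bounds listed above are easy to get wrong by one, so here is the same rule as a small predicate: the lower bound is inclusive and the upper bound is exclusive. `count_in_range` is a hypothetical helper used only to spell out the arithmetic.

```python
from typing import Optional


def count_in_range(count: int, lower: Optional[int] = None,
                   upper: Optional[int] = None) -> bool:
    """True if `count` repetitions satisfy a range with the given bounds.

    `lower` is inclusive and defaults to 0; `upper` is exclusive and
    defaults to unbounded, following the list above.
    """
    if count < (lower or 0):
        return False
    return upper is None or count < upper
```

Under this reading, a range written as `1-3` accepts one or two repetitions, but not three.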
### Real
Real nodes are nodes that actually match content, such as strings or patterns
#### String
> `nodes.StringNode`
The simplest node, denoted either by single quotes (`''`) or double quotes (`""`)
Supports escape characters
> Note: despite the name of this node, it is important to remember that the nodes only match bytes!
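
To make the point about bytes concrete, the sketch below matches a literal against a `bytes` buffer. `match_literal` is a hypothetical stand-in for what a string node does, and the UTF-8 encoding is an assumption of the example.

```python
from typing import Optional


def match_literal(buffer: bytes, pos: int, literal: bytes) -> Optional[int]:
    """Return the position after `literal` if it occurs at `pos`, else None."""
    if buffer.startswith(literal, pos):
        return pos + len(literal)
    return None


source = "é=1".encode("utf-8")  # "é" occupies two bytes in UTF-8
end = match_literal(source, 0, "é".encode("utf-8"))
```

Here `end` is `2` rather than `1`: positions count bytes, not characters.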
#### Pattern
> `nodes.PatternNode`
Matches a regular expression, denoted by slashes (`/`) in the following syntax:
> [target group](#target-group) `/` pattern `/` [flags](#flags)
##### Target Group
In a pattern, if a target group is given (as an integer), the result of this
node will be the bytes of that group instead of the entire match
##### Flags
Supports these common RegEx flags:
- `i`: ignore case / case insensitive
- `m`: multiline - `^` matches beginning of line or string, `$` matches end of either
- `s`: single-line / "dotall" - `.` matches newlines as well
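
The target group and the flags map naturally onto Python's `re` module working on bytes. The function below is a sketch of that mapping, not the code behind `nodes.PatternNode`; `match_pattern` and `FLAG_MAP` are hypothetical names, and consuming the whole match while returning only the target group is an assumption.

```python
import re
from typing import Optional, Tuple

FLAG_MAP = {"i": re.IGNORECASE, "m": re.MULTILINE, "s": re.DOTALL}


def match_pattern(buffer: bytes, pos: int, pattern: bytes,
                  flags: str = "", group: int = 0) -> Optional[Tuple[bytes, int]]:
    """Match `pattern` at `pos`; return (result bytes, new position) or None."""
    compiled_flags = 0
    for letter in flags:
        compiled_flags |= FLAG_MAP[letter]
    found = re.compile(pattern, compiled_flags).match(buffer, pos)
    if found is None:
        return None
    # The whole match is consumed, but only the target group is returned.
    return found.group(group), found.end()
```

For instance, `match_pattern(b"Name = x", 0, rb"([a-z]+)\s*=", flags="i", group=1)` yields `(b"Name", 6)`: six bytes are consumed, but the result is only the first group.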
### Meta
"Meta" nodes are nodes that don't actually match anything themselves, but can change some context
#### Stealer
> `nodes.Stealer`
A "stealer" node is denoted by a `!`, and is only acceptable in a group
If a [group](#group) reaches a "stealer" node, then the group will raise an exception
if any of the subsequent nodes fail
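
The effect of a stealer is that a group stops failing quietly once it has passed the `!`. The sketch below models this with a commit flag; `STEALER`, `CommittedMatchError`, `literal`, and `match_group` are all hypothetical names, whitespace handling is left out, and the real exception type is not shown in this document.

```python
from typing import Callable, List, Optional

# A matcher takes (buffer, pos) and returns the new position, or None on failure.
Matcher = Callable[[bytes, int], Optional[int]]
STEALER = object()  # stand-in marker for the `!` node


class CommittedMatchError(Exception):
    """Raised when a node fails after the group passed a stealer."""


def literal(text: bytes) -> Matcher:
    return lambda buffer, pos: pos + len(text) if buffer.startswith(text, pos) else None


def match_group(buffer: bytes, pos: int, nodes: List[object]) -> Optional[int]:
    committed = False
    for node in nodes:
        if node is STEALER:
            committed = True  # from here on, failure is an error
            continue
        new_pos = node(buffer, pos)
        if new_pos is None:
            if committed:
                raise CommittedMatchError(f"expected node failed at byte {pos}")
            return None  # ordinary failure: the caller may try alternatives
        pos = new_pos
    return pos


# Once "if" has matched, a missing "(" is reported instead of silently failing.
IF_HEAD = [literal(b"if"), STEALER, literal(b"(")]
```

With `IF_HEAD`, the input `b"while("` simply fails to match, while `b"if["` raises, which tends to give more useful error locations.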
#### Context
> `nodes.Context`
A context is created with an opening `<` and closing `>`
Context nodes always match, with the result being the (string) contents
Context nodes should contain either a [string](#string),
or a short sequence of alphanumeric characters and underscores
#### Node Reference
> `nodes.NodeRef`
Denoted by an `@`, followed by a node name (as a string of alphanumeric characters, underscores, and periods)
Matches the targeted node, and returns that node's result
Must be bound using either its `.bind()` method, or automatically through the
default compilers
Note: `basic_compiler` will not accept node references with periods
#### Lookahead
> `nodes.Lookahead`
Denoted by a `&` (positive lookahead) or `&!` (negative lookahead), this node will match its
target node, but will not consume any of the buffer
If the lookahead is negative, then it will return `True` if its target node fails to match,
and will fail to match if its target node matches
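
The defining property of a lookahead is that the position never advances. The sketch below models only that property and the positive/negative polarity; return values such as `True` are left out, and `lookahead` and `digit` are hypothetical helpers rather than the `nodes.Lookahead` API.

```python
from typing import Callable, Optional

# A matcher takes (buffer, pos) and returns the new position, or None on failure.
Matcher = Callable[[bytes, int], Optional[int]]


def lookahead(target: Matcher, negative: bool = False) -> Matcher:
    """Wrap `target` so that it is tested without consuming any bytes."""
    def matcher(buffer: bytes, pos: int) -> Optional[int]:
        matched = target(buffer, pos) is not None
        # Succeed when the outcome agrees with the polarity; never advance.
        return pos if matched != negative else None
    return matcher


def digit(buffer: bytes, pos: int) -> Optional[int]:
    """Example target: consume a single ASCII digit."""
    return pos + 1 if buffer[pos:pos + 1].isdigit() else None
```

`lookahead(digit)(b"1a", 0)` returns `0`, the unchanged position, while `lookahead(digit, negative=True)(b"1a", 0)` returns `None`.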
# Changelog
## 0.2.0
- Implemented node saving and loading through the `serialize` module
- Moved `compiler.bind_nodes()` to `util.bind_nodes()`
## 1.0.0
- Completely reworked compiler caching
- Removed `$import` pragma
- Moved `WHITESPACE_PATT` to `.util`
- Changed `nodes.Node.NO_RETURN` to singleton(ish) `util.NO_MATCH`
### 1.0.0-1
- Fixed an inaccuracy in README
## 1.0.1
- Added builtin `grammar.cag` to package
- Added precompiled `precompiled_nodes.pkl` to package
## 1.0.2
- Fixed error caused by `compiler.py` `Compiler.compile_buffermatcher()` passing unneeded kwarg to `.pre_process()`
- Made `NodeSyntaxError` self-formatting also include exception notes
## 1.1.0
- Added support for periods in node names
- Fixed `Compiler.post_process_compile()` not actually doing anything
## 1.2.0
- Implemented [unpacking](#unpack) nodes
## 1.2.1
- Fixed several nodes improperly stripping whitespace
## 1.2.2
- Fixed unpacking never triggering
- Fixed `NodeRange`s raising exceptions upon backtracking
## 1.3.0
- Implemented lookaheads
Raw data
{
"_id": null,
"home_page": null,
"name": "caustic.lexer",
"maintainer": null,
"docs_url": null,
"requires_python": ">=3.12",
"maintainer_email": null,
"keywords": "caustic, language, parser, lexer",
"author": "Shae.c32",
"author_email": null,
"download_url": "https://files.pythonhosted.org/packages/6f/82/e26bcb68b969ad0994d630ac43bfbb95344f629298b3595d37511e490bdd/caustic.lexer-1.3.0.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": null,
"summary": "Grammar compilation for Caustic",
"version": "1.3.0",
"project_urls": {
"Homepage": "https://codeberg.org/Caustic/CausticLexer",
"Issues": "https://codeberg.org/Caustic/CausticLexer/issues"
},
"split_keywords": [
"caustic",
" language",
" parser",
" lexer"
],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "ff9224beaa51fa161b74d80def69c5d207a7244e61d479cf57f5d8c70ddc8cb1",
"md5": "afb3efe1c65fedd2e41f10b787796633",
"sha256": "0f61724a0dd7b3f576467c7c506bf0860da81ec74025940597b88be10f8fa71f"
},
"downloads": -1,
"filename": "caustic.lexer-1.3.0-py3-none-any.whl",
"has_sig": false,
"md5_digest": "afb3efe1c65fedd2e41f10b787796633",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.12",
"size": 20822,
"upload_time": "2024-04-15T16:48:31",
"upload_time_iso_8601": "2024-04-15T16:48:31.508165Z",
"url": "https://files.pythonhosted.org/packages/ff/92/24beaa51fa161b74d80def69c5d207a7244e61d479cf57f5d8c70ddc8cb1/caustic.lexer-1.3.0-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "6f82e26bcb68b969ad0994d630ac43bfbb95344f629298b3595d37511e490bdd",
"md5": "d01926a010d0bc824dcd7bf40c98d019",
"sha256": "333343000bfc11b5a8cc2b3ff0c7b93c6795f06a3b50b078b1c8df788810cc95"
},
"downloads": -1,
"filename": "caustic.lexer-1.3.0.tar.gz",
"has_sig": false,
"md5_digest": "d01926a010d0bc824dcd7bf40c98d019",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.12",
"size": 21113,
"upload_time": "2024-04-15T16:48:34",
"upload_time_iso_8601": "2024-04-15T16:48:34.047982Z",
"url": "https://files.pythonhosted.org/packages/6f/82/e26bcb68b969ad0994d630ac43bfbb95344f629298b3595d37511e490bdd/caustic.lexer-1.3.0.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2024-04-15 16:48:34",
"github": false,
"gitlab": false,
"bitbucket": false,
"codeberg": true,
"codeberg_user": "Caustic",
"codeberg_project": "CausticLexer",
"lcname": "caustic.lexer"
}