magelang

Name: magelang
Version: 0.1.1
Summary: A modern lexer/parser generator for a growing number of languages
Author: Sam Vervaeck <samvv@pm.me>
Homepage: https://github.com/samvv/mage
Uploaded: 2024-11-20 18:10:13
Requires Python: >=3.8
License: MIT (Copyright 2024 Sam Vervaeck)
Keywords: text-analysis, scanner, lexer, parser, code-generator
Mage: Text Analysis Made Easy
=============================

Mage is an experimental tool for performing text analysis. It does so by
generating a _lexer_, _parser_ and _parse tree_ for you. Whether it is a piece
of programming code or some tabular data in a fringe format, Mage has got you
covered!

**Features**

 - ✅ A simple yet expressive DSL to write your grammars in
 - ✅ Full support for Python type hints. Avoid runtime errors while building your language!
 - 🚧 Lots of unit tests to ensure your code does what you expect it to do
 - 🚧 An intermediate language that makes it easy to add support for other programming languages

👀 Mage is written in itself. Check out the [generated code][1] for part of our Python generator!

**Implementation Status**

| Feature | Python  | Rust | C  | C++ | JavaScript |
|---------|---------|------|----|-----|------------|
| CST     | ✅      | ⏳   | ⏳ | ⏳  | ⏳         |
| AST     | ⏳      | ⏳   | ⏳ | ⏳  | ⏳         |
| Lexer   | 🚧      | ⏳   | ⏳ | ⏳  | ⏳         |
| Parser  | ⏳      | ⏳   | ⏳ | ⏳  | ⏳         |
| Emitter | ⏳      | ⏳   | ⏳ | ⏳  | ⏳         |

[1]: https://github.com/samvv/mage/blob/main/src/magelang/lang/python/cst.py

## Installation

```
$ pip3 install --user -U magelang
```

## Usage

Mage currently requires Python 3.12 or later to run.

### `mage generate <lang> <filename>`

Generate a parser for the given grammar in a language that you specify.

**Example**

```
mage generate python foo.mage --prefix foo --out-dir src/foolang
```

### 🚧 `mage test <filename..>`

> [!WARNING]
>
> This command is under construction.

Run all tests inside the documentation of the given grammar.

## Grammar

### `pub <name> = <expr>`

Define a new node or token that must be parsed according to the given expression.

You can use both inline rules and other node rules inside `expr`. When
referring to another node, that node will become a field in the node that
referred to it. Nodes that have no fields are converted to a special token type
that is more efficient to represent.

```
pub var_decl = 'var' name:ident '=' type_expr
```

### `<name> = <expr>`

Define a new inline rule that can be used inside other rules.

As the name suggests, this type of rule is merely syntactic sugar and gets
inlined whenever it is referred to inside another rule.

```
digits = [0-9]+
```

### `extern <name>`

Declares a new parsing rule that is implemented somewhere else, possibly in a different language.
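For example, a grammar might delegate a rule to a hand-written parser. The rule names below are purely illustrative:

```
extern json_value

pub config_entry
  = name:ident '=' json_value
```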

### `extern token <name>`

Declares a new lexing rule that is implemented somewhere else, possibly in a different language.

### `pub token <name> = <expr>`

Like `pub <name> = <expr>` but forces the rule to be a token.

Mage will show an error when the rule cannot be converted to a token rule.
This usually means the rule references another rule that is `pub`.

```
pub token float_expression
  = digits? '.' digits
```

### `expr1 expr2`

First parse `expr1` and continue to parse `expr2` immediately after it.

```
pub two_column_csv_line
  = text ',' text '\n'
```

### `expr1 | expr2`

First try to parse `expr1`. If that fails, try to parse `expr2`. If neither
expression matches, the parser fails.

```
pub declaration
  = function_declaration
  | let_declaration
  | const_declaration
```

### `expr?`

Parse or skip the given expression, depending on whether the expression can be
parsed.

```
pub singleton_or_pair
  = value (',' value)?
```

### `expr*`

Parse the given expression zero or more times, as many times as possible.

```
skip = (multiline_comment | whitespace)*
```

### `expr+`

Parse the given expression one or more times.

For example, in Python, there must always be at least one statement in the body of a class or function:

```
body = stmt+
```

### `\expr`

Escape an expression by making it hidden. The expression will be parsed, but
not be visible in the resulting CST/AST.
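For example, the following rule (borrowed from the `@noskip` example later in this document) parses the indentation tokens without storing them as fields:

```
pub body
  = ':' \indent stmt* \dedent
```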

### `expr{n,m}`

Parse the expression at least `n` times and at most `m` times.

```
unicode_char = 'U+' hex_digit{4,4}
```

### `@keyword`

Treat the given rule as being a potential source for keywords.

String literals matching this rule will get the special `_keyword`-suffix
during transformation. The lexer will also take into account that the rule
conflicts with keywords and generate code accordingly.

```
@keyword
pub token ident
  = [a-zA-Z_] [a-zA-Z_0-9]*
```

### `@skip`

Register the chosen rule as a special rule that the lexer uses to lex 'gibberish'.

The rule remains available to other rules, e.g. when `@noskip` is added.

```
@skip
whitespace = [\n\r\t ]*
```

### 🚧 `@noskip`

> [!WARNING]
>
> This decorator is under construction.

Disable automatic injection of the `@skip` rule for the chosen rule.

This can be useful for e.g. parsing indentation in a context where whitespace
is normally discarded.

```
@skip
__ = [\n\r\t ]*

@noskip
pub body
  = ':' __ stmt
  | ':' \indent stmt* \dedent
```

### `@wrap`

Adding this decorator to a rule ensures that a real CST node is emitted for
that rule, instead of possibly a variant.

This decorator makes the CST heavier, but this might be warranted in the name
of robustness and forward compatibility. Use this decorator if you plan to add
more fields to the rule.

```
@wrap
pub lit_expr
   = literal:(string | integer | boolean)
```

### `keyword`

A special rule that matches **any keyword present in the grammar**.

The generated CST will contain predicates to check for a keyword:

```py
print_bold = False
if is_py_keyword(token):
    print_bold = True
```

### `token`

A rule that matches **any token in the grammar**.

```
pub macro_call
  = name:ident '{' token* '}'
```

### `node`

A special rule that matches **any parseable node in the grammar**, excluding tokens.

### `syntax`

A special rule that matches **any rule in the grammar**, including tokens.

## Python API

This section documents the API that is generated by taking a Mage grammar as
input and specifying `python` as the output language.

In what follows, `Node` is the name of an arbitrary CST node (such as
`PyReturnStmt` or `MageRepeatExpr`) and `foo` and `bar` are the names of fields
of such a node. Examples of field names are `expr`, `return_keyword`, `min`,
`max`, and so on.

### Node(...)

Construct a node with the fields specified in the `...` part of the expression.

Required fields come first, i.e. those that were not suffixed with `?`, `*`,
or similar in the grammar. They may be specified as positional or as keyword
arguments.

Next come all optional fields. They **must** be specified as keyword
arguments. When omitted, the corresponding fields are either set to `None` or a
new empty token/node is created.

#### Examples

**Creating a new CST node by providing positional arguments for required fields:**
```py
PyInfixExpr(
    PyNamedExpr('value'),
    PyIsKeyword(),
    PyNamedExpr('None')
)
```

**The same example but now with keyword arguments:**
```py
PyInfixExpr(
    left=PyNamedExpr('value'),
    op=PyIsKeyword(),
    right=PyNamedExpr('None')
)
```

**Omitting fields that are trivial to construct:**
```py
# Note that `return_keyword` is not specified
stmt = PyReturnStmt(expr=PyConstExpr(42))

# stmt.return_keyword was automatically created
assert isinstance(stmt.return_keyword, ReturnKeyword)
```

### Node.count_foos() -> int

This member is generated when field `foo` holds a repetition, such as the Mage
expression `'.'+`.

It returns the number of elements actually present in the CST node.
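As an illustration, here is a hand-written sketch of what such a generated counter conceptually does. All names here are hypothetical, not Mage's actual output:

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical stand-in for a generated node whose `dots` field
# came from a repetition such as `'.'+` in the grammar.
@dataclass
class DottedPath:
    dots: List[str] = field(default_factory=list)

    def count_dots(self) -> int:
        # The generated counter simply reports how many
        # elements were parsed into the repeated field.
        return len(self.dots)

path = DottedPath(dots=['.', '.', '.'])
assert path.count_dots() == 3
```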

## FAQ

### What is a CST, AST, visitor, and so on?

A **CST** is a collection of structures and enumerations that completely
represent the source code that needs to be parsed/emitted.

An **AST** is an abstract representation of the CST. Mage can automatically
derive a good AST from a CST.

A **visitor** is (usually) a function that traverses the AST/CST in a
particular way. It is useful for various things, such as code analysis and
evaluation.

A **rewriter** is similar to a visitor in that it traverses the AST/CST, but
it also creates new nodes during the traversal.
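As a rough illustration of the difference, here is a toy visitor and rewriter over a hand-written expression tree (this is not the Mage-generated API):

```python
from dataclasses import dataclass

# A tiny hand-written tree, purely for illustration.
@dataclass
class Lit:
    value: int

@dataclass
class Add:
    left: object
    right: object

def count_literals(node) -> int:
    """A visitor: traverses the tree and collects information."""
    if isinstance(node, Lit):
        return 1
    return count_literals(node.left) + count_literals(node.right)

def double_literals(node):
    """A rewriter: traverses the tree and builds new nodes."""
    if isinstance(node, Lit):
        return Lit(node.value * 2)
    return Add(double_literals(node.left), double_literals(node.right))

tree = Add(Lit(1), Add(Lit(2), Lit(3)))
assert count_literals(tree) == 3
assert double_literals(tree) == Add(Lit(2), Add(Lit(4), Lit(6)))
```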

A **lexer** or scanner is at its core a program that splits the input stream
into separate _tokens_ that are easy for the parser to digest.

A **parser** converts a stream of tokens into AST/CST nodes. Which parts of the
input stream are converted to which nodes usually depends on how the parser is invoked.
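To make the lexing step concrete, here is a minimal hand-written tokenizer, unrelated to Mage's generated lexers:

```python
import re

# One alternative per token kind; the `ws` group plays the role
# of a @skip rule and is discarded.
TOKEN_RE = re.compile(
    r'(?P<num>\d+)|(?P<ident>[a-zA-Z_]\w*)|(?P<op>[+*])|(?P<ws>\s+)'
)

def tokenize(text: str):
    """Split `text` into (kind, text) token pairs, dropping whitespace."""
    tokens = []
    for m in TOKEN_RE.finditer(text):
        if m.lastgroup != 'ws':
            tokens.append((m.lastgroup, m.group()))
    return tokens

assert tokenize('x + 42') == [('ident', 'x'), ('op', '+'), ('num', '42')]
```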

### How do I assign a list of nodes to another node in Python without type errors?

This is probably due to [this feature](https://mypy.readthedocs.io/en/stable/common_issues.html#invariance-vs-covariance)
in the Python type checker, which prevents subclasses from being assigned to a more general type.

For small lists, we recommend making a copy of the list, like so:

```py
defn = PyFuncDef(body=list([ ... ]))
```

See also [this issue](https://github.com/microsoft/pyright/issues/130) in the Pyright repository.
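A minimal self-contained sketch of the problem and the workaround (the class and function names here are hypothetical, not part of the generated API):

```python
from typing import List

class Stmt: ...
class PassStmt(Stmt): ...

def set_body(body: List[Stmt]) -> None:
    # Runtime stand-in; only the type checker cares about the annotation.
    assert all(isinstance(s, Stmt) for s in body)

stmts: List[PassStmt] = [PassStmt()]

# set_body(stmts)      # rejected by mypy/pyright: List is invariant
set_body(list(stmts))  # accepted: the copy is inferred as List[Stmt]
```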

## Contributing

Run the following command in a terminal to link the `mage` command to your checkout:

```
pip3 install -e '.[dev]'
```

## License

This code is generously licensed under the MIT license.


            
