<!--TOC-->
- [Generic Tree-Sitter AST API](#generic-tree-sitter-ast-api)
- [Quick Intro](#quick-intro)
- [Extended Tutorial](#extended-tutorial)
- [AST Creation](#ast-creation)
- [Constructor](#constructor)
- [From String](#from-string)
- [AST Templates](#ast-templates)
- [AST Copy](#ast-copy)
- [AST Methods](#ast-methods)
- [Common Operations](#common-operations)
- [Source Locations](#source-locations)
- [Functions](#functions)
- [Function Callsites](#function-callsites)
- [AST Traversal](#ast-traversal)
- [AST Manipulation](#ast-manipulation)
- [Mutation Primitives](#mutation-primitives)
- [Transformers](#transformers)
- [Architecture](#architecture)
- [FAQ](#faq)
- [License](#license)
<!--TOC-->
# Generic Tree-Sitter AST API
The ASTs package provides a Python API into GrammaTech's Software
Evolution Library ([SEL][]) for source code manipulation. SEL
generalizes over GitHub's [tree-sitter][] parsing libraries providing
a uniform interface over multiple programming languages (primarily
Python, JavaScript/TypeScript, and C/C++), and providing additional
functionality for software inspection and modification.
# Quick Intro
Here's how to create and perform some common operations on an AST:
```python3
$ python3
Python 3.8.5
Type "help", "copyright", "credits" or "license" for more information.
>>> import asts
>>> root = asts.AST.from_string("x + 88", language=asts.ASTLanguage.Python)
>>> root.children
[<asts.types.PythonExpressionStatement0 0x2>]
>>> root.children[0].children
[<asts.types.PythonBinaryOperator 0x3>]
>>> root.children[0].children[0].children
[<asts.types.PythonIdentifier 0x4>,
<asts.types.PythonAdd 0x5>,
<asts.types.PythonInteger 0x6>]
>>> root.children[0].children[0].children[0].source_text
'x'
>>> root.children[0].children[0].children[1].source_text
'+'
>>> root.children[0].children[0].children[2].source_text
'88'
>>> root.children[0].children[0].source_text
'x + 88'
>>> root.children[0].children[0].child_slots()
[('LEFT', 1), ('OPERATOR', 1), ('RIGHT', 1), ('CHILDREN', 0)]
>>> list(map(lambda x:x.source_text, root.children[0].children[0].children))
['x', '+', '88']
```
# Extended Tutorial
The following examples assume you have imported the asts library using
`import asts`. See the methods provided by asts.py for more
information.
<!-- TODO: Setup automatic documentation building. -->
## AST Creation
### Constructor
The default `AST` constructor should not be invoked directly by clients.
Instead, the static factory methods described below should be utilized.
### From String
ASTs may be created from source text using the `AST.from_string`
factory method in the Python API. Using this method requires source
text and (optionally) a language enumeration indicating the source
text language. The example below shows the creation of a simple AST:
```python
>>> root = asts.AST.from_string("x + 88", language=asts.ASTLanguage.Python)
```
Language enumerations exist for `C`, `Cpp`, `Java`, `Javascript`, `Python`,
`Rust`, `TypescriptTs`, and `TypescriptTsx`.
For larger examples where the language may be inferred, the language
parameter may optionally be elided. For instance:
```python
>>> text = """
... import sys
...
... def fib(n: int) -> int:
... if n < 2:
... return n
... else:
... return fib(n - 1) + fib(n - 2)
...
... def main():
... if len(sys.argv) == 1:
... print(fib(int(sys.argv[1])))
...
... if __name__ == '__main__':
... main()
... """
>>> root = asts.AST.from_string(text)
```
If tree-sitter cannot parse the source text into an AST, an error or text
fragment AST node will be created starting at the location of the invalid
parse. By default, the children of this node will be the best-effort but
potentially incorrect parse tree received from tree-sitter with all terminal
tokens excepting zero-width ones included. If instead of the tree you wish
to have a flat, text representation child, you may pass `error_tree=False`
as a keyword argument.
Finally, by default, the AST returned is the top-level, module node of
the parsed AST. However, in some cases, we may wish get the deepest
subnode still encompassing all of the given source text. This allows
us to create statement AST nodes, for instance. To do so, clients
may use the `deepest` keyword, as shown below:
```python
>>> root = asts.AST.from_string(
... "x + 88",
... language=asts.ASTLanguage.Python,
... deepest=True
... )
>>> type(root)
<class 'asts.types.PythonBinaryOperator'>
```
### AST Templates
#### Templates for building ASTs
ASTs may also be created using the AST template mechanism. For
instance, the following snippet creates an AST equivalent to
`asts.AST.from_string("x = 2", language=asts.ASTLanguage.Python, deepest=True)`:
```python
>>> root = asts.AST.ast_template("$ID = 2", asts.ASTLanguage.Python, id="x")
>>> root.source_text
'x = 2'
```
Metavariable names (e.g. `$ID` above) may contain uppercase characters,
digits, or underscores. Metavariables may be scalars (e.g. `$`) or
lists (e.g. `@`), as shown below:
```python
>>> root = asts.AST.ast_template("fn(@ARGS)", asts.ASTLanguage.Python, args=[1,2,3])
>>> root.source_text
'fn(1, 2, 3)'
```
Metavariables may also be positional arguments (e.g. `$1`, `$2`), as
shown below:
```python
>>> root = asts.AST.ast_template("$1 = 2", asts.ASTLanguage.Python, "x")
>>> root.source_text
'x = 2'
```
However, you may not combine positional (e.g. `$1`) and keyword
(e.g. `$ID`) metavariables in a single template. The corresponding
metavariable values passed in as arguments to `ast_template` may be
ASTs, literals, or lists.
#### Templates for building and destructuring ASTs
ASTs may also be directly created for the metavariables in an AST
template. For instance, in the template `"$1 = $2"`, we may create
ASTs for `$1` and `$2` using `asts_from_template`, as shown below:
```python
>>> asts = asts.AST.asts_from_template("$1 = $2", asts.ASTLanguage.Python, "x", 1)
>>> len(asts)
2
>>> asts[0].source_text
'x'
>>> asts[1].source_text
'1'
```
For now, only the position syntax (e.g. `$1`, `$2`) is supported by
`asts_from_template`. One AST is returned per metavariable, in
numeric order.
#### More information
More information on AST templates may be found in the SEL
[template documentation][].
### AST Copy
Copies of an AST may created using `AST.copy` or the python `copy`
module, as shown below:
```python
>>> root = asts.AST.from_string("x + 1", asts.ASTLanguage.Python, deepest=True)
>>> copy = asts.AST.copy(root)
>>> copy.source_text
'x + 1'
```
In addition, clients may set child slots in the copy by passing in new
ASTs for each slot as keyword arguments, as shown below:
```python
>>> root = asts.AST.from_string("x + 1", asts.ASTLanguage.Python, deepest=True)
>>> y = asts.AST.from_string("y", asts.ASTLanguage.Python, deepest=True)
>>> copy = asts.AST.copy(root, left=y)
>>> copy.source_text
'y + 1'
```
In addition to ASTs, clients may also pass in literals (e.g. strings,
code snippets, numbers) as keyword values, as shown below. These are
automatically parsed into an AST to be inserted.
```python
>>> root = asts.AST.from_string("x + 1", asts.ASTLanguage.Python, deepest=True)
>>> copy = asts.AST.copy(root, left=1)
>>> copy.source_text
'1 + 1'
>>> copy = asts.AST.copy(root, left="y")
>>> copy.source_text
'y + 1'
```
To view the names of an AST's child slots, you may use the
`child_slots` method, as shown below:
```python
>>> root = asts.AST.from_string("x + 1", asts.ASTLanguage.Python, deepest=True)
>>> root.child_slots()
[('LEFT', 1), ('OPERATOR', 1), ('RIGHT', 1), ('CHILDREN', 0)]
```
Alternatively, you may use the `dir` built-in to inspect an AST object
and find its child slots as python properties, as shown below:
```python
>>> root = asts.AST.from_string("x + 1", asts.ASTLanguage.Python, deepest=True)
>>> dir(root)
['__class__', ..., 'left', ..., 'operator', ..., 'right', ..., 'traverse']
```
Beyond these named child slots storing ASTs which are parsed from the language
grammar, internally, each AST has several additional slots which store ASTs
outside of the grammar rules, such as comments or error nodes. These include
`before-asts`, `after-asts`, and (optionally) `internal-asts-[0-9]+` storing
nodes such as comments and errors before the AST, after the AST, and between
implicit terminal tokens within the AST respectively. The names of these slots
can be accessed using the `child_slots` method and passing `internal=True`, as
shown below. These slot names may also be used `child_slot`/`lookup` to
retrieve their current value and and with `copy` to modify the existing value.
```python
>>> text = """
... # This is a comment
... x = x + 1
... """
>>> root = asts.AST.from_string(text, asts.ASTLanguage.Python)
>>> stmt = root.children[-1]
>>> stmt.source_text
'x = x + 1'
>>> stmt.child_slots(internal=True)
[('BEFORE-ASTS', 0), ('CHILDREN', 0), ('AFTER-ASTS', 0)]
>>> stmt.child_slot("BEFORE-ASTS")
[<asts.types.PythonComment 0x3>]
```
## AST Methods
### Common Operations
In practice, most clients will interact with ASTs using the AST's type and
retrieving the AST's source text, parent AST, and child ASTs. All of
these operations are supported by the python API. To begin, let's examine
using AST types.
ASTs are embedded in a rich type hierarchy generated by the tree-sitter
library; this type hierarchy is augmented by generic mixins in SEL for
common AST types across languages. For instance, `IfAST`,
`StatementAST`, and `IdentifierAST` classes abstract common language
constructs and allow identification of these types of ASTs regardless
of the language of the underlying program. As with other objects,
clients may use `isinstance` to test if an AST is an instance or
subclass of a given type, as shown below:
```python
>>> root = asts.AST.from_string(
... "print(x)",
... language=asts.ASTLanguage.Python,
... deepest=True
... )
>>> isinstance(root, asts.types.PythonCall)
True
>>> isinstance(root, asts.types.CallAST)
True
>>> isinstance(root, asts.types.PythonAST)
True
>>> isinstance(root, asts.types.CStringLiteral)
False
```
Beyond AST types, accessing the source text of an AST is
another common operation. This may be accomplished using
the `source_text` property on ASTs, as shown below:
```python
>>> root = asts.AST.from_string(
... "print(x)",
... language=asts.ASTLanguage.Python,
... deepest=True
... )
>>> root.source_text
'print(x)'
```
Additionally, to view a list of the immediate children
of a particular AST, the `children` property may be used,
as shown below:
```python
>>> root = asts.AST.from_string(
... "print(x)",
... language=asts.ASTLanguage.Python,
... deepest=True
... )
>>> root.children
[<asts.types.PythonIdentifier 0x4>, <asts.types.PythonArgumentList1 0x5>]
>>> root.children[0].source_text
'print'
>>> identifier = root.children[1].children[0]
>>> identifier.source_text
'x'
```
An AST is composed of various child slots which are concatenated together
when accessed via the `children` property. To view the child slots for
a particular AST you may use the `child_slots()` method, which returns
a list of slot-name, arity pairs. An arity of one indicates the slot
is a single AST, while an arity of zero indicates the slot is composed
of zero or more ASTs. The AST(s) comprising a given slot may be
accessed using the slot name as a python property accessor or by using
`child_slot` method, as shown below:
```python
>>> root = asts.AST.from_string(
... "print(x)",
... language=asts.ASTLanguage.Python,
... deepest=True
... )
>>> root.child_slots()
[('FUNCTION', 1), ('ARGUMENTS', 1), ('CHILDREN', 0)]
>>> root.function.source_text
'print'
>>> root.child_slot("FUNCTION").source_text
'print'
```
Beyond direct child lookup, an AST path composed of the names of the
child slots and, when the child slot is a list, a position in this list
may be used to lookup ASTs at arbritrary depths in the tree using the
`lookup` method. Additionally, an AST's path in a tree may be found
directly using the `ast_path` method, as shown below:
```python
>>> root = asts.AST.from_string(
... "print(x)",
... language=asts.ASTLanguage.Python,
... deepest=True
... )
>>> x = root.lookup(['ARGUMENTS', ('CHILDREN', 0)])
>>> x.source_text
'x'
>>> root.ast_path(x)
['ARGUMENTS', ('CHILDREN', 0)]
```
Finally, parent trees may be accessed using the `parent` and `parents`
methods, as shown below. Please note that these methods require the
root of the subtree as a parameter.
```python
>>> root = asts.AST.from_string(
... "print(x)",
... language=asts.ASTLanguage.Python,
... deepest=True
... )
>>> root.children
[<asts.types.PythonIdentifier 0x4>, <asts.types.PythonArgumentList1 0x5>]
>>> root.children[0].source_text
'print'
>>> identifier = root.children[1].children[0]
>>> identifier.source_text
'x'
>>> identifier.parent(root).source_text
'(x)'
>>> [p.source_text for p in identifier.parents(root)]
['(x)', 'print(x)']
```
#### Pattern Matching
In Python 3.10+, AST types and properties may be used in
[pattern matching][] for conditional logic and to destructure
an AST into its components parts. For instance, to match
against assignments where the left-hand side of the assignment
is the variable "x", the following match clause may be used:
```python
>>> root = asts.AST.from_string(
... "x = 1",
... language=asts.ASTLanguage.Python,
... deepest=True
... )
>>> match root:
... case asts.types.PythonAssignment(left=asts.types.IdentifierAST(source_text="x")):
... print("Match found")
...
Match found
```
As another example, to destructure the left-hand and right-hand sides
of an assignment into separate variables, the following match
clause may be used:
```python
>>> root = asts.AST.from_string(
... "x = 1",
... language=asts.ASTLanguage.Python,
... deepest=True
... )
>>> match root:
... case asts.types.PythonAssignment(left=lhs, right=rhs):
... [lhs, rhs]
...
[<asts.types.PythonIdentifier 0x4>, <asts.types.PythonInteger 0x5>]
```
### Source Locations
For some applications, it may be useful to know the start/end locations
of each AST or retrieve the AST at a given location. Clients may do
both using the `ast_source_ranges` and `ast_at_point` methods
respectively, as shown below. Please note that for each method the
lines and columns are 1-indexed.
```python
>>> root = asts.AST.from_string(
... "print(x)",
... language=asts.ASTLanguage.Python,
... deepest=True
... )
>>> root.ast_source_ranges()
[(<asts.types.PythonCall 0x3>, ((1, 1), (1, 9))),
(<asts.types.PythonIdentifier 0x4>, ((1, 1), (1, 6))),
(<asts.types.PythonArgumentList1 0x5>, ((1, 6), (1, 9))),
(<asts.types.PythonIdentifier 0x6>, ((1, 7), (1, 8)))]
>>> root.ast_at_point(1, 7).source_text
'x'
```
### Functions
Function ASTs have special consideration in the python API, and clients
may retrieve various function attributes, such as name, parameters, and
body, using the respective AST methods, `function_name`,
`function_parameters`, and `function_body`, as shown below:
```python
>>> root = asts.AST.from_string(
... "def foo(bar: int) -> int:\n return bar / 2",
... language=asts.ASTLanguage.Python,
... deepest=True
... )
>>> root.function_name()
'foo'
>>> [param.source_text for param in root.function_parameters()]
['bar: int']
>>> root.function_body().source_text
' return bar / 2'
```
### Function Callsites
In addition to function ASTs, function callsites also have special
consideration in the python API. Clients may query for the library
containing the function implementation (`provided_by`), the function
portion of the callsite (`call_function`), and the callargs
(`call_arguments`). An example incorporating these methods is shown
below:
```python
>>> root = asts.AST.from_string(
... "import json\njson.dumps({})",
... language=asts.ASTLanguage.Python
... )
>>> callsite = root.children[-1].children[-1]
>>> callsite.provided_by(root)
'json'
>>> callsite.call_function().source_text
'json.dumps'
>>> [callarg.source_text for callarg in callsite.call_arguments()]
['{}']
```
## AST Traversal
ASTs may be explictly traversed in pre-order using the `traverse` method
which creates a generator that may be used anywhere a python `iterable`
is required. An example usage is shown below:
```python
>>> root = asts.AST.from_string("x + 88", language=asts.ASTLanguage.Python)
>>> for a in root.traverse():
... print(a)
<asts.types.PythonModule 0x1>
<asts.types.PythonExpressionStatement0 0x2>
<asts.types.PythonBinaryOperator 0x3>
<asts.types.PythonIdentifier 0x4>
<asts.types.PythonAdd 0x5>
<asts.types.PythonInteger 0x6>
```
Additionally, AST objects are themselves iterators and may be used
anywhere a python `iterable` is required, as shown below:
```python
>>> root = asts.AST.from_string("x + 88", language=asts.ASTLanguage.Python)
>>> for a in root:
... print(a)
<asts.types.PythonModule 0x1>
<asts.types.PythonExpressionStatement0 0x2>
<asts.types.PythonBinaryOperator 0x3>
<asts.types.PythonIdentifier 0x4>
<asts.types.PythonAdd 0x5>
<asts.types.PythonInteger 0x6>
```
As expected, ASTs may be also be used in list comprehensions as shown:
```python
>>> root = asts.AST.from_string("x + 88", language=asts.ASTLanguage.Python)
>>> ids = [a for a in root if isinstance(a, asts.types.PythonIdentifier)]
>>> len(ids)
1
```
## AST Manipulation
ASTs may also be manipulated (mutated) using the python API. Mutation
operations create a new AST distinct from the inputs. The input ASTs
may continue to be used as before; however, they are unchanged objects
distinct from the AST(s) created.
### Mutation Primitives
Currently, clients may cut, insert, or replace AST subtrees, as shown. The
operations are based on AST serial numbers; we find the AST in the root node
passed as the first parameter with the same serial number as the point
modify and perform the corresponding operation.
CUT:
```python
>>> root = asts.AST.from_string("x = 2\n", language=asts.ASTLanguage.Python)
>>> stmt = root.children[0]
>>> root = asts.AST.cut(root, stmt)
>>> root.source_text
''
```
INSERT:
```python
>>> root = asts.AST.from_string(
... "y = 3\n",
... language=asts.ASTLanguage.Python
... )
>>> stmt = root.children[0]
>>> new_stmt = asts.AST.from_string(
... "x = 2\n",
... language=asts.ASTLanguage.Python,
... deepest=True
... )
>>> root = asts.AST.insert(root, stmt, new_stmt)
>>> root.source_text
'x = 2\ny = 3\n'
```
REPLACE:
```python
>>> root = asts.AST.from_string(
... "x = 2\n",
... language=asts.ASTLanguage.Python
... )
>>> literal = root.children[0].children[0].children[-1]
>>> new_literal = asts.AST.from_string(
... "3",
... language=asts.ASTLanguage.Python,
... deepest=True
... )
>>> root = asts.AST.replace(root, literal, new_literal)
>>> root.source_text
"x = 3\n"
```
As a useful shortcut, for small mutations, literals may be passed as the
values for insertion and replacement, as shown below:
```python
>>> root = asts.AST.from_string(
... "x = 2\n",
... language=asts.ASTLanguage.Python
... )
>>> literal = root.children[0].children[0].children[-1]
>>> root = asts.AST.replace(root, literal, 3)
>>> root.source_text
"x = 3\n"
```
Finally, AST paths may be used as mutation point parameters to cut,
insert, and replace, as shown below:
```python
>>> root = asts.AST.from_string(
... "x = 2\n",
... language=asts.ASTLanguage.Python
... )
>>> literal_path = [('CHILDREN', 0), ('CHILDREN', 0), 'RIGHT']
>>> root = asts.AST.replace(root, literal_path, 3)
>>> root.source_text
"x = 3\n"
```
### Transformers
In addition to simple mutation primitives, the API also supports walking
the AST tree and performing transformations on the nodes within. This
mode of mutation requires the client to define a *transformer* function
which takes an AST node parameter as input and (optionally) returns an
AST or literal which should replace that node in the new tree. If no
new node is provided by the transformer function, the node is not
replaced in the newly created tree.
For instance, consider a scenario where you wish to rewrite an AST,
replacing all `x` identifiers with `y`. To begin, you would define
an `x_to_y` transformer function, as shown below:
```python
>>> def x_to_y(ast: asts.AST) -> Optional[asts.LiteralOrAST]:
... """Convert 'x' identifier ASTs to 'y'."""
... if isinstance(ast, asts.types.IdentifierAST) and "x" == ast.source_text:
... return asts.AST.from_string("y", ast.language, deepest=True)
...
```
As you can see, this function returns a `y` identifier when it encounters
an `x` identifier. To use the transformer to replace nodes in an AST tree,
you would use the `AST.transform` function, as shown below:
```python
>>> text = """
... x = 1
... print(x)
... """
>>> root = asts.AST.from_string(text, asts.ASTLanguage.Python)
>>> print(root.source_text.strip())
x = 1
print(x)
>>> transformed = asts.AST.transform(root, x_to_y)
>>> print(transformed.source_text.strip())
y = 1
print(y)
```
In the example above, the `x_to_y` transformer returned an AST. The
transformer function may also return a literal value, which will be
parsed as an AST. For instance, the below `x_to_y` transformer is
functionally equivalent to the example above:
```python
>>> def x_to_y(ast: asts.AST) -> Optional[asts.LiteralOrAST]:
... """Convert 'x' identifier ASTs to 'y'."""
... if isinstance(ast, asts.types.IdentifierAST) and "x" == ast.source_text:
... return "y"
...
```
Additionally, when using Python 3.10+, you may use [pattern matching][]
to further simplify the implementation of the `x_to_y` transformer,
as shown below:
```python
>>> def x_to_y(ast: asts.AST) -> Optional[asts.LiteralOrAST]:
... """Convert 'x' identifier ASTs to 'y'."""
... match ast:
... case asts.types.IdentifierAST(source_text="x"):
... return "y"
...
```
Transformers may implement more complicated logic than the simple
`x_to_y` transform above. For instance, one may wish to convert
`x` identifiers to `y`, but only for the left-hand side of assignment
statements, as shown below:
```python
>>> def x_to_y_assignment_lhs(ast: asts.AST) -> Optional[asts.LiteralOrAST]:
... """Convert 'x' identifier ASTs to 'y' on the lhs of assignments."""
... if (
... isinstance(ast, asts.types.PythonAssignment)
... and "x" == ast.left.source_text
... ):
... return asts.AST.copy(ast, left="y")
...
>>> text = """
... x = 1
... print(y)
... """
>>> root = asts.AST.from_string(text, asts.ASTLanguage.Python)
>>> transformed = asts.AST.transform(root, x_to_y_assignment_lhs)
>>> print(transformed.source_text.strip())
y = 1
print(y)
```
As before, the `x_to_y_assignment_lhs` transformer may be simplified
using [pattern matching][] in Python 3.10+, as shown below:
```python
>>> def x_to_y_assignment_lhs(ast: asts.AST) -> Optional[asts.LiteralOrAST]:
... """Convert 'x' identifier ASTs to 'y' on the lhs of assignments."""
... match ast:
... case asts.types.PythonAssignment(left=asts.types.IdentifierAST(source_text="x")):
... return asts.AST.copy(ast, left="y")
...
```
For these more complicated transforms, you may need to mutate the parent
of the node you are interested in. For instance, above we create a
new python assignment node with the left-hand side modified to the
value we want.
Tying everything together, consider the case where wish to delete
all `print` statements from an AST. This may be accomplished by
first defining a predicate for print statements, as shown below:
```python
>>> def is_print_statement(ast: asts.AST) -> bool:
... """Return TRUE if AST is an statement calling the print function."""
... if isinstance(ast, asts.types.ExpressionStatementAST):
... fn_calls = [c.call_function().source_text for c in asts.utility.call_asts(ast)]
... return "print" in fn_calls
... return False
...
```
Once this predicate is in place, we may use it to define a transformer which
returns a node with the `print` statements immediately below it elided:
```python
>>> def delete_print_statements(ast: asts.AST) -> Optional[asts.LiteralOrAST]:
... """Delete all print statements from the children of AST."""
... if isinstance(ast, (asts.types.RootAST, asts.types.CompoundAST)):
... # Build a list of new, non-comment children directly under the AST,
... # eliding print statements.
... new_children = [
... c for c in ast.child_slot("children")
... if not is_print_statement(c)
... ]
...
... # Special case; if no children remain, add a "pass" statement nop to
... # avoid syntax errors.
... new_children = new_children if new_children else ["pass\n"]
...
... # Create the new node with print statements removed from the children.
... return asts.AST.copy(ast, children=new_children)
...
```
Finally, as before, we may use the `delete_print_statements` transformer
to mutate an AST, as shown below:
```
>>> text = """
... x = 1
... y = 2
... if x > 1:
... x = x ** y
... print("Test One: %d" % x)
... else:
... print("Test Two: %d" % x)
... print("y = %d", y)
... """
>>> root = asts.AST.from_string(text, asts.ASTLanguage.Python)
>>> transformed = asts.AST.transform(root, delete_print_statements)
>>> print(transformed.source_text.strip())
x = 1
y = 2
if x > 1:
x = x ** y
else:
pass
```
# Architecture
The python library is a thin wrapper around a Common Lisp program named
`tree-sitter-interface` which calls the required pieces of the
Software Evolution Library ([SEL][]). Most API calls are delegated to
this interface which we communicate with using JSON formatted input/
output over stdio/stdout or a socket.
The python AST objects contain a oid attribute representing an
object id (oid) on the Common Lisp side of the interface; in essence,
the python ASTs are pointers to Common Lisp memory locations. When
calling a python AST method, the oid is serialized to the Common Lisp
side of the interface where the underlying AST object is found
(dereferenced) and the operation performed. You may get the object id
using the `oid` property on python ASTs; to test for python AST
equality, we check to see if the ASTs point to the same object using
the oids.
Additionally, python AST objects contain a serial number attribute
representing a serial number on the Common Lisp side of the interface.
This serial number is used when performing modifications (mutations)
to an AST such as cut/replace/insert. To perform these modifications,
we find the AST in the root with the same serial number as the
point to modify and perform the operation. Serial numbers are stable
in the unmodified, original tree across operations; however, the
serial numbers of values inserted into the tree will differ from the
AST given to insert.
To allow for garbage collection, the ASTs are manually reference
counted. Whenever a python AST (pointer) is created, the reference
count for the associated Common Lisp AST is incremented. Similarly,
as python ASTs are garbage collected the reference counter is
decremented. When the reference counter reaches zero, the Common Lisp
AST is released and may be garbage collected at the appropriate time.
To get the reference count for a particular python AST, you may use
the `ast_ref_count` method.
The underlying Common Lisp ASTs are themselves treated as immutable.
Therefore, when performing mutation operations (e.g. cut, replace,
insert), new ASTs are created in the process.
# FAQ
#### ASTs package does not faithfully reproduce original text
The source text property of an AST created using the ASTs package
may not match the original text given to the parser, even if no
mutations are performed. An automatic source code formatter, such
as `black` or `clang-format`, may be used on the original source
text and the source text attribute of the parsed AST before printing
to ensure consistency if this is required. This flexibility in
parsing allows for us to perform structured mutation operations -
for instance, automatically inserting a comma when an item is
added to an initializer list in C++.
#### ASTs package inserts/deletes characters upon mutation.
As part of a mutation, the ASTs package may insert or delete whitespace
or separator characters (e.g. commas) between ASTs. One common
idiom is to use an automatic source code formatter, such as `black`
or `clang-format`, on the source text after mutation to ensure
consistency before printing to screen or disk.
#### ASTs package throws "Unable to match ... on on AST of type ..."
This error occurs after a mutation when an AST is inserted or
replaces another and the type of the new AST does not match that
expected by the tree-sitter grammer. These errors are often subtle
and may be tricky to diagnose. As an example, consider the following:
```python
>>> func = asts.AST.from_string("print(a)", asts.ASTLanguage.Python, deepest=True)
>>> type(func)
<class 'asts.types.PythonCall'>
```
In the above snippet, we create a `func` AST representing a call to the print
function. Consider now that we wish to replace the `print` call with something
different, say `foo`. We may believe the following will perform
the operation:
```python
>>> foo = asts.AST.from_string("foo", asts.ASTLanguage.Python)
>>> new_func = asts.AST.copy(func, function=foo)
>>> new_func.source_text
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File ".../Python-3.8.12/lib/python3.8/functools.py", line 967, in __get__
val = self.func(instance)
File ".../python/asts/asts.py", line 228, in source_text
return _interface.dispatch(AST.source_text.func.__name__, self)
File ".../python/asts/asts.py", line 589, in dispatch
return deserialize(handle_errors(json.loads(response.decode())))
File ".../python/asts/asts.py", line 539, in handle_errors
raise ASTException(data["error"])
asts.asts.ASTException: An unhandled error condition has been signalled: Unable to match
(SEQ (FIELD PYTHON-FUNCTION PYTHON-PRIMARY-EXPRESSION)
(FIELD PYTHON-ARGUMENTS PYTHON-GENERATOR-EXPRESSION PYTHON-ARGUMENT-LIST))
on AST of type
PYTHON-CALL
```
However, this does not work properly. If we look at the type of `foo` and the type
of the function child slot of `new_func`, we can see as to why.
```python
>>> type(foo)
<class 'asts.types.PythonModule'>
>>> type(new_func.child_slot("function"))
<class 'asts.types.PythonModule'>
```
In this case, we attempt to insert a `PythonModule` AST as the function child
slot of `func`. A `PythonModule` AST is a "root" AST type, associated with an
entire python file, replete with imports, functions, etc. This AST type does
not make sense as the "function" portion of a call AST; in this case,
we want an identifier AST. We can get the identifier by passing the `deepest`
keyword during AST creation, retrieving the deepest subnode still
encompassing all of the given source text, as shown below:
```python
>>> foo = asts.AST.from_string("foo", asts.ASTLanguage.Python, deepest=True)
>>> type(foo)
<class 'asts.types.PythonIdentifier'>
>>> new_func = asts.AST.copy(func, function=foo)
>>> new_func.source_text
'foo(a)'
```
#### What is the difference between the "children" property and the "children" child slot?
An AST is composed of several child slots, the arity of which may
be 0 (zero or more) or 1 (single AST). For instance, in the AST
for `print(a)`, there are two child slots, `[['FUNCTION', 1], ['ARGUMENTS', 1]]`,
allowing for destructuring into component parts of the AST.
In addition, internally, each AST has private before/after/internal-AST
slots for storing comments, errors, and internal whitespace that appears
in the source code. For instance, in the AST for the function body below,
the block AST has the comment `# This is a comment` stored in the internal
before-AST slot.
```python
def foo():
# This is a comment
return 0
```
The `children` property on AST will return the concatenated list of
component slots of an AST and any comments included in each AST's
internal before or after AST slot. The order of the ASTs will
match the order they appear in the source text.
The `children` child slot on an AST will return all ASTs not
assigned to an explicit slot (e.g. 'FUNCTION', 'ARGUMENTS' above)
and will NOT include any comment ASTs stored internally in an
AST's before/internal/after-AST slots.
When using the AST `copy` method, care should be taken when
using the `children` key (e.g. `AST.copy(ast, children=...)`).
In this case, we are setting the `children` slot and not the
property. Therefore, in almost all cases, we will want to
use the children slot, not property, of another AST to populate
the keyword argument (e.g. `AST.copy(ast, children=other.child_slot("children"))`).
If the property is utilized and a comment is added to the copy
AST's children slot, the error described above in "ASTs package
throws 'Unable to match...'" will occur.
#### Why does every AST has a child slot named "children"?
To begin, lets consider the AST for `print(a)`; for this AST there
are two explicit child slots not named `children`:
`[['FUNCTION', 1], ['ARGUMENTS', 1]]`. All ASTs not assigned to an
explicit child slot on an AST (e.g. "FUNCTION" or "ARGUMENTS"
previously) are assigned to the "catchall" slot named "children".
Additionally, for ASTs with no named slots (e.g. function
bodies), all of the ASTs are stored in the slot named "children".
In essence, the slot serves as a fall-thru place to store all
ASTs not assigned to an explicit, named slot.
#### ASTs package throws "tree-sitter-interface crashed"
This error is raised when the Common Lisp binary (`tree-sitter-interface`)
backing the ASTs python package crashes. If available, the error will
report the standard output and standard error streams for the
`tree-sitter-interface` process prior to the crash.
#### What is the difference between object ID (oid) and serial number?
Python AST objects contain a oid attribute representing an
object id (oid) on the Common Lisp side of the interface; in essence,
the python ASTs are pointers to Common Lisp memory locations. When
calling a python AST method, the oid is serialized to the Common Lisp
side of the interface where the underlying AST object is found
(dereferenced) and the operation performed.
Additionally, python AST objects contain a serial number attribute
representing a serial number on the Common Lisp side of the interface.
This serial number is used when performing modifications (mutations)
to an AST such as cut/replace/insert. To perform these modifications,
we find the AST in the root with the same serial number as the
point to modify and perform the operation. Serial numbers are stable
in the unmodified, original tree across operations; however, the
serial numbers of values inserted into the tree will differ from the
AST given to insert.
Two objects with the same oid will always the same serial number.
The converse does not hold.
# License
GPLv3+
[tree-sitter]: https://tree-sitter.github.io/tree-sitter/
[SEL]: https://grammatech.github.io/sel/index.html#Software-Evolution-Library
[template documentation]: https://grammatech.github.io/sel/Templates.html#Templates
[pattern matching]: https://www.python.org/dev/peps/pep-0636/
Raw data
{
"_id": null,
"home_page": "https://github.com/grammatech/sel",
"name": "asts",
"maintainer": null,
"docs_url": null,
"requires_python": ">=3.6.0",
"maintainer_email": null,
"keywords": "software-engineering, source, program-synthesis",
"author": "Eric Schulte and GrammaTech",
"author_email": null,
"download_url": null,
"platform": null,
"description": "<!--TOC-->\n\n- [Generic Tree-Sitter AST API](#generic-tree-sitter-ast-api)\n- [Quick Intro](#quick-intro)\n- [Extended Tutorial](#extended-tutorial)\n - [AST Creation](#ast-creation)\n - [Constructor](#constructor)\n - [From String](#from-string)\n - [AST Templates](#ast-templates)\n - [AST Copy](#ast-copy)\n - [AST Methods](#ast-methods)\n - [Common Operations](#common-operations)\n - [Source Locations](#source-locations)\n - [Functions](#functions)\n - [Function Callsites](#function-callsites)\n - [AST Traversal](#ast-traversal)\n - [AST Manipulation](#ast-manipulation)\n - [Mutation Primitives](#mutation-primitives)\n - [Transformers](#transformers)\n- [Architecture](#architecture)\n- [FAQ](#faq)\n- [License](#license)\n\n<!--TOC-->\n\n# Generic Tree-Sitter AST API\n\nThe ASTs package provides a Python API into GrammaTech's Software\nEvolution Library ([SEL][]) for source code manipulation. SEL\ngeneralizes over GitHub's [tree-sitter][] parsing libraries providing\na uniform interface over multiple programming languages (primarily\nPython, JavaScript/TypeScript, and C/C++), and providing additional\nfunctionality for software inspection and modification.\n\n# Quick Intro\n\nHere's how to create and perform some common operations on an AST:\n\n```python3\n$ python3\nPython 3.8.5\nType \"help\", \"copyright\", \"credits\" or \"license\" for more information.\n>>> import asts\n>>> root = asts.AST.from_string(\"x + 88\", language=asts.ASTLanguage.Python)\n>>> root.children\n[<asts.types.PythonExpressionStatement0 0x2>]\n>>> root.children[0].children\n[<asts.types.PythonBinaryOperator 0x3>]\n>>> root.children[0].children[0].children\n[<asts.types.PythonIdentifier 0x4>,\n <asts.types.PythonAdd 0x5>,\n <asts.types.PythonInteger 0x6>]\n>>> root.children[0].children[0].children[0].source_text\n'x'\n>>> root.children[0].children[0].children[1].source_text\n'+'\n>>> root.children[0].children[0].children[2].source_text\n'88'\n>>> root.children[0].children[0].source_text\n'x + 88'\n>>> root.children[0].children[0].child_slots()\n[('LEFT', 1), ('OPERATOR', 1), ('RIGHT', 1), ('CHILDREN', 0)]\n>>> list(map(lambda x:x.source_text, root.children[0].children[0].children))\n['x', '+', '88']\n```\n\n# Extended Tutorial\n\nThe following examples assume you have imported the asts library using\n`import asts`. See the methods provided by asts.py for more\ninformation.\n\n<!-- TODO: Setup automatic documentation building. -->\n\n## AST Creation\n\n### Constructor\n\nThe default `AST` constructor should not be invoked directly by clients.\nInstead, the static factory methods described below should be utilized.\n\n### From String\n\nASTs may be created from source text using the `AST.from_string`\nfactory method in the Python API. Using this method requires source\ntext and (optionally) a language enumeration indicating the source\ntext language. The example below shows the creation of a simple AST:\n\n```python\n>>> root = asts.AST.from_string(\"x + 88\", language=asts.ASTLanguage.Python)\n```\n\nLanguage enumerations exist for `C`, `Cpp`, `Java`, `Javascript`, `Python`,\n`Rust`, `TypescriptTs`, and `TypescriptTsx`.\n\nFor larger examples where the language may be inferred, the language\nparameter may optionally be elided. For instance:\n\n```python\n>>> text = \"\"\"\n... import sys\n...\n... def fib(n: int) -> int:\n... if n < 2:\n... return n\n... else:\n... return fib(n - 1) + fib(n - 2)\n...\n... def main():\n... if len(sys.argv) == 1:\n... print(fib(int(sys.argv[1])))\n...\n... if __name__ == '__main__':\n... main()\n... \"\"\"\n>>> root = asts.AST.from_string(text)\n```\n\nIf tree-sitter cannot parse the source text into an AST, an error or text\nfragment AST node will be created starting at the location of the invalid\nparse. By default, the children of this node will be the best-effort but\npotentially incorrect parse tree received from tree-sitter with all terminal\ntokens excepting zero-width ones included. If instead of the tree you wish\nto have a flat, text representation child, you may pass `error_tree=False`\nas a keyword argument.\n\nFinally, by default, the AST returned is the top-level, module node of\nthe parsed AST. However, in some cases, we may wish get the deepest\nsubnode still encompassing all of the given source text. This allows\nus to create statement AST nodes, for instance. To do so, clients\nmay use the `deepest` keyword, as shown below:\n\n```python\n>>> root = asts.AST.from_string(\n... \"x + 88\",\n... language=asts.ASTLanguage.Python,\n... deepest=True\n... )\n>>> type(root)\n<class 'asts.types.PythonBinaryOperator'>\n```\n\n### AST Templates\n\n#### Templates for building ASTs\n\nASTs may also be created using the AST template mechanism. For\ninstance, the following snippet creates an AST equivalent to\n`asts.AST.from_string(\"x = 2\", language=asts.ASTLanguage.Python, deepest=True)`:\n\n```python\n>>> root = asts.AST.ast_template(\"$ID = 2\", asts.ASTLanguage.Python, id=\"x\")\n>>> root.source_text\n'x = 2'\n```\n\nMetavariable names (e.g. `$ID` above) may contain uppercase characters,\ndigits, or underscores. Metavariables may be scalars (e.g. `$`) or\nlists (e.g. `@`), as shown below:\n\n```python\n>>> root = asts.AST.ast_template(\"fn(@ARGS)\", asts.ASTLanguage.Python, args=[1,2,3])\n>>> root.source_text\n'fn(1, 2, 3)'\n```\n\nMetavariables may also be positional arguments (e.g. `$1`, `$2`), as\nshown below:\n\n```python\n>>> root = asts.AST.ast_template(\"$1 = 2\", asts.ASTLanguage.Python, \"x\")\n>>> root.source_text\n'x = 2'\n```\n\nHowever, you may not combine positional (e.g. `$1`) and keyword\n(e.g. `$ID`) metavariables in a single template. The corresponding\nmetavariable values passed in as arguments to `ast_template` may be\nASTs, literals, or lists.\n\n#### Templates for building and destructuring ASTs\n\nASTs may also be directly created for the metavariables in an AST\ntemplate. For instance, in the template `\"$1 = $2\"`, we may create\nASTs for `$1` and `$2` using `asts_from_template`, as shown below:\n\n```python\n>>> asts = asts.AST.asts_from_template(\"$1 = $2\", asts.ASTLanguage.Python, \"x\", 1)\n>>> len(asts)\n2\n>>> asts[0].source_text\n'x'\n>>> asts[1].source_text\n'1'\n```\n\nFor now, only the position syntax (e.g. `$1`, `$2`) is supported by\n`asts_from_template`. One AST is returned per metavariable, in\nnumeric order.\n\n#### More information\n\nMore information on AST templates may be found in the SEL\n[template documentation][].\n\n### AST Copy\n\nCopies of an AST may created using `AST.copy` or the python `copy`\nmodule, as shown below:\n\n```python\n>>> root = asts.AST.from_string(\"x + 1\", asts.ASTLanguage.Python, deepest=True)\n>>> copy = asts.AST.copy(root)\n>>> copy.source_text\n'x + 1'\n```\n\nIn addition, clients may set child slots in the copy by passing in new\nASTs for each slot as keyword arguments, as shown below:\n\n```python\n>>> root = asts.AST.from_string(\"x + 1\", asts.ASTLanguage.Python, deepest=True)\n>>> y = asts.AST.from_string(\"y\", asts.ASTLanguage.Python, deepest=True)\n>>> copy = asts.AST.copy(root, left=y)\n>>> copy.source_text\n'y + 1'\n```\n\nIn addition to ASTs, clients may also pass in literals (e.g. strings,\ncode snippets, numbers) as keyword values, as shown below. These are\nautomatically parsed into an AST to be inserted.\n\n```python\n>>> root = asts.AST.from_string(\"x + 1\", asts.ASTLanguage.Python, deepest=True)\n>>> copy = asts.AST.copy(root, left=1)\n>>> copy.source_text\n'1 + 1'\n>>> copy = asts.AST.copy(root, left=\"y\")\n>>> copy.source_text\n'y + 1'\n```\n\nTo view the names of an AST's child slots, you may use the\n`child_slots` method, as shown below:\n\n```python\n>>> root = asts.AST.from_string(\"x + 1\", asts.ASTLanguage.Python, deepest=True)\n>>> root.child_slots()\n[('LEFT', 1), ('OPERATOR', 1), ('RIGHT', 1), ('CHILDREN', 0)]\n```\n\nAlternatively, you may use the `dir` built-in to inspect an AST object\nand find its child slots as python properties, as shown below:\n\n```python\n>>> root = asts.AST.from_string(\"x + 1\", asts.ASTLanguage.Python, deepest=True)\n>>> dir(root)\n['__class__', ..., 'left', ..., 'operator', ..., 'right', ..., 'traverse']\n```\n\nBeyond these named child slots storing ASTs which are parsed from the language\ngrammar, internally, each AST has several additional slots which store ASTs\noutside of the grammar rules, such as comments or error nodes. These include\n`before-asts`, `after-asts`, and (optionally) `internal-asts-[0-9]+` storing\nnodes such as comments and errors before the AST, after the AST, and between\nimplicit terminal tokens within the AST respectively. The names of these slots\ncan be accessed using the `child_slots` method and passing `internal=True`, as\nshown below. These slot names may also be used `child_slot`/`lookup` to\nretrieve their current value and and with `copy` to modify the existing value.\n\n```python\n>>> text = \"\"\"\n... # This is a comment\n... x = x + 1\n... \"\"\"\n>>> root = asts.AST.from_string(text, asts.ASTLanguage.Python)\n>>> stmt = root.children[-1]\n>>> stmt.source_text\n'x = x + 1'\n>>> stmt.child_slots(internal=True)\n[('BEFORE-ASTS', 0), ('CHILDREN', 0), ('AFTER-ASTS', 0)]\n>>> stmt.child_slot(\"BEFORE-ASTS\")\n[<asts.types.PythonComment 0x3>]\n```\n\n## AST Methods\n\n### Common Operations\n\nIn practice, most clients will interact with ASTs using the AST's type and\nretrieving the AST's source text, parent AST, and child ASTs. All of\nthese operations are supported by the python API. To begin, let's examine\nusing AST types.\n\nASTs are embedded in a rich type hierarchy generated by the tree-sitter\nlibrary; this type hierarchy is augmented by generic mixins in SEL for\ncommon AST types across languages. For instance, `IfAST`,\n`StatementAST`, and `IdentifierAST` classes abstract common language\nconstructs and allow identification of these types of ASTs regardless\nof the language of the underlying program. As with other objects,\nclients may use `isinstance` to test if an AST is an instance or\nsubclass of a given type, as shown below:\n\n```python\n>>> root = asts.AST.from_string(\n... \"print(x)\",\n... language=asts.ASTLanguage.Python,\n... deepest=True\n... )\n>>> isinstance(root, asts.types.PythonCall)\nTrue\n>>> isinstance(root, asts.types.CallAST)\nTrue\n>>> isinstance(root, asts.types.PythonAST)\nTrue\n>>> isinstance(root, asts.types.CStringLiteral)\nFalse\n```\n\nBeyond AST types, accessing the source text of an AST is\nanother common operation. This may be accomplished using\nthe `source_text` property on ASTs, as shown below:\n\n```python\n>>> root = asts.AST.from_string(\n... \"print(x)\",\n... language=asts.ASTLanguage.Python,\n... deepest=True\n... )\n>>> root.source_text\n'print(x)'\n```\n\nAdditionally, to view a list of the immediate children\nof a particular AST, the `children` property may be used,\nas shown below:\n\n```python\n>>> root = asts.AST.from_string(\n... \"print(x)\",\n... language=asts.ASTLanguage.Python,\n... deepest=True\n... )\n>>> root.children\n[<asts.types.PythonIdentifier 0x4>, <asts.types.PythonArgumentList1 0x5>]\n>>> root.children[0].source_text\n'print'\n>>> identifier = root.children[1].children[0]\n>>> identifier.source_text\n'x'\n```\n\nAn AST is composed of various child slots which are concatenated together\nwhen accessed via the `children` property. To view the child slots for\na particular AST you may use the `child_slots()` method, which returns\na list of slot-name, arity pairs. An arity of one indicates the slot\nis a single AST, while an arity of zero indicates the slot is composed\nof zero or more ASTs. The AST(s) comprising a given slot may be\naccessed using the slot name as a python property accessor or by using\n`child_slot` method, as shown below:\n\n```python\n>>> root = asts.AST.from_string(\n... \"print(x)\",\n... language=asts.ASTLanguage.Python,\n... deepest=True\n... )\n>>> root.child_slots()\n[('FUNCTION', 1), ('ARGUMENTS', 1), ('CHILDREN', 0)]\n>>> root.function.source_text\n'print'\n>>> root.child_slot(\"FUNCTION\").source_text\n'print'\n```\n\nBeyond direct child lookup, an AST path composed of the names of the\nchild slots and, when the child slot is a list, a position in this list\nmay be used to lookup ASTs at arbritrary depths in the tree using the\n`lookup` method. Additionally, an AST's path in a tree may be found\ndirectly using the `ast_path` method, as shown below:\n\n```python\n>>> root = asts.AST.from_string(\n... \"print(x)\",\n... language=asts.ASTLanguage.Python,\n... deepest=True\n... )\n>>> x = root.lookup(['ARGUMENTS', ('CHILDREN', 0)])\n>>> x.source_text\n'x'\n>>> root.ast_path(x)\n['ARGUMENTS', ('CHILDREN', 0)]\n```\n\nFinally, parent trees may be accessed using the `parent` and `parents`\nmethods, as shown below. Please note that these methods require the\nroot of the subtree as a parameter.\n\n```python\n>>> root = asts.AST.from_string(\n... \"print(x)\",\n... language=asts.ASTLanguage.Python,\n... deepest=True\n... )\n>>> root.children\n[<asts.types.PythonIdentifier 0x4>, <asts.types.PythonArgumentList1 0x5>]\n>>> root.children[0].source_text\n'print'\n>>> identifier = root.children[1].children[0]\n>>> identifier.source_text\n'x'\n>>> identifier.parent(root).source_text\n'(x)'\n>>> [p.source_text for p in identifier.parents(root)]\n['(x)', 'print(x)']\n```\n\n#### Pattern Matching\n\nIn Python 3.10+, AST types and properties may be used in\n[pattern matching][] for conditional logic and to destructure\nan AST into its components parts. For instance, to match\nagainst assignments where the left-hand side of the assignment\nis the variable \"x\", the following match clause may be used:\n\n```python\n>>> root = asts.AST.from_string(\n... \"x = 1\",\n... language=asts.ASTLanguage.Python,\n... deepest=True\n... )\n>>> match root:\n... case asts.types.PythonAssignment(left=asts.types.IdentifierAST(source_text=\"x\")):\n... print(\"Match found\")\n...\nMatch found\n```\n\nAs another example, to destructure the left-hand and right-hand sides\nof an assignment into separate variables, the following match\nclause may be used:\n\n```python\n>>> root = asts.AST.from_string(\n... \"x = 1\",\n... language=asts.ASTLanguage.Python,\n... deepest=True\n... )\n>>> match root:\n... case asts.types.PythonAssignment(left=lhs, right=rhs):\n... [lhs, rhs]\n...\n[<asts.types.PythonIdentifier 0x4>, <asts.types.PythonInteger 0x5>]\n```\n\n### Source Locations\n\nFor some applications, it may be useful to know the start/end locations\nof each AST or retrieve the AST at a given location. Clients may do\nboth using the `ast_source_ranges` and `ast_at_point` methods\nrespectively, as shown below. Please note that for each method the\nlines and columns are 1-indexed.\n\n```python\n>>> root = asts.AST.from_string(\n... \"print(x)\",\n... language=asts.ASTLanguage.Python,\n... deepest=True\n... )\n>>> root.ast_source_ranges()\n[(<asts.types.PythonCall 0x3>, ((1, 1), (1, 9))),\n (<asts.types.PythonIdentifier 0x4>, ((1, 1), (1, 6))),\n (<asts.types.PythonArgumentList1 0x5>, ((1, 6), (1, 9))),\n (<asts.types.PythonIdentifier 0x6>, ((1, 7), (1, 8)))]\n>>> root.ast_at_point(1, 7).source_text\n'x'\n```\n\n### Functions\n\nFunction ASTs have special consideration in the python API, and clients\nmay retrieve various function attributes, such as name, parameters, and\nbody, using the respective AST methods, `function_name`,\n`function_parameters`, and `function_body`, as shown below:\n\n```python\n>>> root = asts.AST.from_string(\n... \"def foo(bar: int) -> int:\\n return bar / 2\",\n... language=asts.ASTLanguage.Python,\n... deepest=True\n... )\n>>> root.function_name()\n'foo'\n>>> [param.source_text for param in root.function_parameters()]\n['bar: int']\n>>> root.function_body().source_text\n' return bar / 2'\n```\n\n### Function Callsites\n\nIn addition to function ASTs, function callsites also have special\nconsideration in the python API. Clients may query for the library\ncontaining the function implementation (`provided_by`), the function\nportion of the callsite (`call_function`), and the callargs\n(`call_arguments`). An example incorporating these methods is shown\nbelow:\n\n```python\n>>> root = asts.AST.from_string(\n... \"import json\\njson.dumps({})\",\n... language=asts.ASTLanguage.Python\n... )\n>>> callsite = root.children[-1].children[-1]\n>>> callsite.provided_by(root)\n'json'\n>>> callsite.call_function().source_text\n'json.dumps'\n>>> [callarg.source_text for callarg in callsite.call_arguments()]\n['{}']\n```\n\n## AST Traversal\n\nASTs may be explictly traversed in pre-order using the `traverse` method\nwhich creates a generator that may be used anywhere a python `iterable`\nis required. An example usage is shown below:\n\n```python\n>>> root = asts.AST.from_string(\"x + 88\", language=asts.ASTLanguage.Python)\n>>> for a in root.traverse():\n... print(a)\n<asts.types.PythonModule 0x1>\n<asts.types.PythonExpressionStatement0 0x2>\n<asts.types.PythonBinaryOperator 0x3>\n<asts.types.PythonIdentifier 0x4>\n<asts.types.PythonAdd 0x5>\n<asts.types.PythonInteger 0x6>\n```\n\nAdditionally, AST objects are themselves iterators and may be used\nanywhere a python `iterable` is required, as shown below:\n\n```python\n>>> root = asts.AST.from_string(\"x + 88\", language=asts.ASTLanguage.Python)\n>>> for a in root:\n... print(a)\n<asts.types.PythonModule 0x1>\n<asts.types.PythonExpressionStatement0 0x2>\n<asts.types.PythonBinaryOperator 0x3>\n<asts.types.PythonIdentifier 0x4>\n<asts.types.PythonAdd 0x5>\n<asts.types.PythonInteger 0x6>\n```\n\nAs expected, ASTs may be also be used in list comprehensions as shown:\n\n```python\n>>> root = asts.AST.from_string(\"x + 88\", language=asts.ASTLanguage.Python)\n>>> ids = [a for a in root if isinstance(a, asts.types.PythonIdentifier)]\n>>> len(ids)\n1\n```\n\n## AST Manipulation\n\nASTs may also be manipulated (mutated) using the python API. Mutation\noperations create a new AST distinct from the inputs. The input ASTs\nmay continue to be used as before; however, they are unchanged objects\ndistinct from the AST(s) created.\n\n### Mutation Primitives\n\nCurrently, clients may cut, insert, or replace AST subtrees, as shown. The\noperations are based on AST serial numbers; we find the AST in the root node\npassed as the first parameter with the same serial number as the point\nmodify and perform the corresponding operation.\n\nCUT:\n```python\n>>> root = asts.AST.from_string(\"x = 2\\n\", language=asts.ASTLanguage.Python)\n>>> stmt = root.children[0]\n>>> root = asts.AST.cut(root, stmt)\n>>> root.source_text\n''\n```\n\nINSERT:\n```python\n>>> root = asts.AST.from_string(\n... \"y = 3\\n\",\n... language=asts.ASTLanguage.Python\n... )\n>>> stmt = root.children[0]\n>>> new_stmt = asts.AST.from_string(\n... \"x = 2\\n\",\n... language=asts.ASTLanguage.Python,\n... deepest=True\n... )\n>>> root = asts.AST.insert(root, stmt, new_stmt)\n>>> root.source_text\n'x = 2\\ny = 3\\n'\n```\n\nREPLACE:\n```python\n>>> root = asts.AST.from_string(\n... \"x = 2\\n\",\n... language=asts.ASTLanguage.Python\n... )\n>>> literal = root.children[0].children[0].children[-1]\n>>> new_literal = asts.AST.from_string(\n... \"3\",\n... language=asts.ASTLanguage.Python,\n... deepest=True\n... )\n>>> root = asts.AST.replace(root, literal, new_literal)\n>>> root.source_text\n\"x = 3\\n\"\n```\n\nAs a useful shortcut, for small mutations, literals may be passed as the\nvalues for insertion and replacement, as shown below:\n\n```python\n>>> root = asts.AST.from_string(\n... \"x = 2\\n\",\n... language=asts.ASTLanguage.Python\n... )\n>>> literal = root.children[0].children[0].children[-1]\n>>> root = asts.AST.replace(root, literal, 3)\n>>> root.source_text\n\"x = 3\\n\"\n```\n\nFinally, AST paths may be used as mutation point parameters to cut,\ninsert, and replace, as shown below:\n\n```python\n>>> root = asts.AST.from_string(\n... \"x = 2\\n\",\n... language=asts.ASTLanguage.Python\n... )\n>>> literal_path = [('CHILDREN', 0), ('CHILDREN', 0), 'RIGHT']\n>>> root = asts.AST.replace(root, literal_path, 3)\n>>> root.source_text\n\"x = 3\\n\"\n```\n\n### Transformers\n\nIn addition to simple mutation primitives, the API also supports walking\nthe AST tree and performing transformations on the nodes within. This\nmode of mutation requires the client to define a *transformer* function\nwhich takes an AST node parameter as input and (optionally) returns an\nAST or literal which should replace that node in the new tree. If no\nnew node is provided by the transformer function, the node is not\nreplaced in the newly created tree.\n\nFor instance, consider a scenario where you wish to rewrite an AST,\nreplacing all `x` identifiers with `y`. To begin, you would define\nan `x_to_y` transformer function, as shown below:\n\n```python\n>>> def x_to_y(ast: asts.AST) -> Optional[asts.LiteralOrAST]:\n... \"\"\"Convert 'x' identifier ASTs to 'y'.\"\"\"\n... if isinstance(ast, asts.types.IdentifierAST) and \"x\" == ast.source_text:\n... return asts.AST.from_string(\"y\", ast.language, deepest=True)\n...\n```\n\nAs you can see, this function returns a `y` identifier when it encounters\nan `x` identifier. To use the transformer to replace nodes in an AST tree,\nyou would use the `AST.transform` function, as shown below:\n\n```python\n>>> text = \"\"\"\n... x = 1\n... print(x)\n... \"\"\"\n>>> root = asts.AST.from_string(text, asts.ASTLanguage.Python)\n>>> print(root.source_text.strip())\nx = 1\nprint(x)\n>>> transformed = asts.AST.transform(root, x_to_y)\n>>> print(transformed.source_text.strip())\ny = 1\nprint(y)\n```\n\nIn the example above, the `x_to_y` transformer returned an AST. The\ntransformer function may also return a literal value, which will be\nparsed as an AST. For instance, the below `x_to_y` transformer is\nfunctionally equivalent to the example above:\n\n```python\n>>> def x_to_y(ast: asts.AST) -> Optional[asts.LiteralOrAST]:\n... \"\"\"Convert 'x' identifier ASTs to 'y'.\"\"\"\n... if isinstance(ast, asts.types.IdentifierAST) and \"x\" == ast.source_text:\n... return \"y\"\n...\n```\n\nAdditionally, when using Python 3.10+, you may use [pattern matching][]\nto further simplify the implementation of the `x_to_y` transformer,\nas shown below:\n\n```python\n>>> def x_to_y(ast: asts.AST) -> Optional[asts.LiteralOrAST]:\n... \"\"\"Convert 'x' identifier ASTs to 'y'.\"\"\"\n... match ast:\n... case asts.types.IdentifierAST(source_text=\"x\"):\n... return \"y\"\n...\n```\n\nTransformers may implement more complicated logic than the simple\n`x_to_y` transform above. For instance, one may wish to convert\n`x` identifiers to `y`, but only for the left-hand side of assignment\nstatements, as shown below:\n\n```python\n>>> def x_to_y_assignment_lhs(ast: asts.AST) -> Optional[asts.LiteralOrAST]:\n... \"\"\"Convert 'x' identifier ASTs to 'y' on the lhs of assignments.\"\"\"\n... if (\n... isinstance(ast, asts.types.PythonAssignment)\n... and \"x\" == ast.left.source_text\n... ):\n... return asts.AST.copy(ast, left=\"y\")\n...\n>>> text = \"\"\"\n... x = 1\n... print(y)\n... \"\"\"\n>>> root = asts.AST.from_string(text, asts.ASTLanguage.Python)\n>>> transformed = asts.AST.transform(root, x_to_y_assignment_lhs)\n>>> print(transformed.source_text.strip())\ny = 1\nprint(y)\n```\n\nAs before, the `x_to_y_assignment_lhs` transformer may be simplified\nusing [pattern matching][] in Python 3.10+, as shown below:\n\n```python\n>>> def x_to_y_assignment_lhs(ast: asts.AST) -> Optional[asts.LiteralOrAST]:\n... \"\"\"Convert 'x' identifier ASTs to 'y' on the lhs of assignments.\"\"\"\n... match ast:\n... case asts.types.PythonAssignment(left=asts.types.IdentifierAST(source_text=\"x\")):\n... return asts.AST.copy(ast, left=\"y\")\n...\n```\n\nFor these more complicated transforms, you may need to mutate the parent\nof the node you are interested in. For instance, above we create a\nnew python assignment node with the left-hand side modified to the\nvalue we want.\n\nTying everything together, consider the case where wish to delete\nall `print` statements from an AST. This may be accomplished by\nfirst defining a predicate for print statements, as shown below:\n\n```python\n>>> def is_print_statement(ast: asts.AST) -> bool:\n... \"\"\"Return TRUE if AST is an statement calling the print function.\"\"\"\n... if isinstance(ast, asts.types.ExpressionStatementAST):\n... fn_calls = [c.call_function().source_text for c in asts.utility.call_asts(ast)]\n... return \"print\" in fn_calls\n... return False\n...\n```\n\nOnce this predicate is in place, we may use it to define a transformer which\nreturns a node with the `print` statements immediately below it elided:\n\n```python\n>>> def delete_print_statements(ast: asts.AST) -> Optional[asts.LiteralOrAST]:\n... \"\"\"Delete all print statements from the children of AST.\"\"\"\n... if isinstance(ast, (asts.types.RootAST, asts.types.CompoundAST)):\n... # Build a list of new, non-comment children directly under the AST,\n... # eliding print statements.\n... new_children = [\n... c for c in ast.child_slot(\"children\")\n... if not is_print_statement(c)\n... ]\n...\n... # Special case; if no children remain, add a \"pass\" statement nop to\n... # avoid syntax errors.\n... new_children = new_children if new_children else [\"pass\\n\"]\n...\n... # Create the new node with print statements removed from the children.\n... return asts.AST.copy(ast, children=new_children)\n...\n```\n\nFinally, as before, we may use the `delete_print_statements` transformer\nto mutate an AST, as shown below:\n\n```\n>>> text = \"\"\"\n... x = 1\n... y = 2\n... if x > 1:\n... x = x ** y\n... print(\"Test One: %d\" % x)\n... else:\n... print(\"Test Two: %d\" % x)\n... print(\"y = %d\", y)\n... \"\"\"\n>>> root = asts.AST.from_string(text, asts.ASTLanguage.Python)\n>>> transformed = asts.AST.transform(root, delete_print_statements)\n>>> print(transformed.source_text.strip())\nx = 1\ny = 2\nif x > 1:\n x = x ** y\nelse:\n pass\n```\n\n# Architecture\n\nThe python library is a thin wrapper around a Common Lisp program named\n`tree-sitter-interface` which calls the required pieces of the\nSoftware Evolution Library ([SEL][]). Most API calls are delegated to\nthis interface which we communicate with using JSON formatted input/\noutput over stdio/stdout or a socket.\n\nThe python AST objects contain a oid attribute representing an\nobject id (oid) on the Common Lisp side of the interface; in essence,\nthe python ASTs are pointers to Common Lisp memory locations. When\ncalling a python AST method, the oid is serialized to the Common Lisp\nside of the interface where the underlying AST object is found\n(dereferenced) and the operation performed. You may get the object id\nusing the `oid` property on python ASTs; to test for python AST\nequality, we check to see if the ASTs point to the same object using\nthe oids.\n\nAdditionally, python AST objects contain a serial number attribute\nrepresenting a serial number on the Common Lisp side of the interface.\nThis serial number is used when performing modifications (mutations)\nto an AST such as cut/replace/insert. To perform these modifications,\nwe find the AST in the root with the same serial number as the\npoint to modify and perform the operation. Serial numbers are stable\nin the unmodified, original tree across operations; however, the\nserial numbers of values inserted into the tree will differ from the\nAST given to insert.\n\nTo allow for garbage collection, the ASTs are manually reference\ncounted. Whenever a python AST (pointer) is created, the reference\ncount for the associated Common Lisp AST is incremented. Similarly,\nas python ASTs are garbage collected the reference counter is\ndecremented. When the reference counter reaches zero, the Common Lisp\nAST is released and may be garbage collected at the appropriate time.\nTo get the reference count for a particular python AST, you may use\nthe `ast_ref_count` method.\n\nThe underlying Common Lisp ASTs are themselves treated as immutable.\nTherefore, when performing mutation operations (e.g. cut, replace,\ninsert), new ASTs are created in the process.\n\n# FAQ\n\n#### ASTs package does not faithfully reproduce original text\n\nThe source text property of an AST created using the ASTs package\nmay not match the original text given to the parser, even if no\nmutations are performed. An automatic source code formatter, such\nas `black` or `clang-format`, may be used on the original source\ntext and the source text attribute of the parsed AST before printing\nto ensure consistency if this is required. This flexibility in\nparsing allows for us to perform structured mutation operations -\nfor instance, automatically inserting a comma when an item is\nadded to an initializer list in C++.\n\n#### ASTs package inserts/deletes characters upon mutation.\n\nAs part of a mutation, the ASTs package may insert or delete whitespace\nor separator characters (e.g. commas) between ASTs. One common\nidiom is to use an automatic source code formatter, such as `black`\nor `clang-format`, on the source text after mutation to ensure\nconsistency before printing to screen or disk.\n\n#### ASTs package throws \"Unable to match ... on on AST of type ...\"\n\nThis error occurs after a mutation when an AST is inserted or\nreplaces another and the type of the new AST does not match that\nexpected by the tree-sitter grammer. These errors are often subtle\nand may be tricky to diagnose. As an example, consider the following:\n\n```python\n>>> func = asts.AST.from_string(\"print(a)\", asts.ASTLanguage.Python, deepest=True)\n>>> type(func)\n<class 'asts.types.PythonCall'>\n```\n\nIn the above snippet, we create a `func` AST representing a call to the print\nfunction. Consider now that we wish to replace the `print` call with something\ndifferent, say `foo`. We may believe the following will perform\nthe operation:\n\n```python\n>>> foo = asts.AST.from_string(\"foo\", asts.ASTLanguage.Python)\n>>> new_func = asts.AST.copy(func, function=foo)\n>>> new_func.source_text\nTraceback (most recent call last):\n File \"<stdin>\", line 1, in <module>\n File \".../Python-3.8.12/lib/python3.8/functools.py\", line 967, in __get__\n val = self.func(instance)\n File \".../python/asts/asts.py\", line 228, in source_text\n return _interface.dispatch(AST.source_text.func.__name__, self)\n File \".../python/asts/asts.py\", line 589, in dispatch\n return deserialize(handle_errors(json.loads(response.decode())))\n File \".../python/asts/asts.py\", line 539, in handle_errors\n raise ASTException(data[\"error\"])\nasts.asts.ASTException: An unhandled error condition has been signalled: Unable to match\n(SEQ (FIELD PYTHON-FUNCTION PYTHON-PRIMARY-EXPRESSION)\n (FIELD PYTHON-ARGUMENTS PYTHON-GENERATOR-EXPRESSION PYTHON-ARGUMENT-LIST))\non AST of type\nPYTHON-CALL\n```\n\nHowever, this does not work properly. If we look at the type of `foo` and the type\nof the function child slot of `new_func`, we can see as to why.\n\n```python\n>>> type(foo)\n<class 'asts.types.PythonModule'>\n>>> type(new_func.child_slot(\"function\"))\n<class 'asts.types.PythonModule'>\n```\n\nIn this case, we attempt to insert a `PythonModule` AST as the function child\nslot of `func`. A `PythonModule` AST is a \"root\" AST type, associated with an\nentire python file, replete with imports, functions, etc. This AST type does\nnot make sense as the \"function\" portion of a call AST; in this case,\nwe want an identifier AST. We can get the identifier by passing the `deepest`\nkeyword during AST creation, retrieving the deepest subnode still\nencompassing all of the given source text, as shown below:\n\n\n```python\n>>> foo = asts.AST.from_string(\"foo\", asts.ASTLanguage.Python, deepest=True)\n>>> type(foo)\n<class 'asts.types.PythonIdentifier'>\n>>> new_func = asts.AST.copy(func, function=foo)\n>>> new_func.source_text\n'foo(a)'\n```\n\n#### What is the difference between the \"children\" property and the \"children\" child slot?\n\nAn AST is composed of several child slots, the arity of which may\nbe 0 (zero or more) or 1 (single AST). For instance, in the AST\nfor `print(a)`, there are two child slots, `[['FUNCTION', 1], ['ARGUMENTS', 1]]`,\nallowing for destructuring into component parts of the AST.\n\nIn addition, internally, each AST has private before/after/internal-AST\nslots for storing comments, errors, and internal whitespace that appears\nin the source code. For instance, in the AST for the function body below,\nthe block AST has the comment `# This is a comment` stored in the internal\nbefore-AST slot.\n\n```python\ndef foo():\n # This is a comment\n return 0\n```\n\nThe `children` property on AST will return the concatenated list of\ncomponent slots of an AST and any comments included in each AST's\ninternal before or after AST slot. The order of the ASTs will\nmatch the order they appear in the source text.\n\nThe `children` child slot on an AST will return all ASTs not\nassigned to an explicit slot (e.g. 'FUNCTION', 'ARGUMENTS' above)\nand will NOT include any comment ASTs stored internally in an\nAST's before/internal/after-AST slots.\n\nWhen using the AST `copy` method, care should be taken when\nusing the `children` key (e.g. `AST.copy(ast, children=...)`).\nIn this case, we are setting the `children` slot and not the\nproperty. Therefore, in almost all cases, we will want to\nuse the children slot, not property, of another AST to populate\nthe keyword argument (e.g. `AST.copy(ast, children=other.child_slot(\"children\"))`).\nIf the property is utilized and a comment is added to the copy\nAST's children slot, the error described above in \"ASTs package\nthrows 'Unable to match...'\" will occur.\n\n#### Why does every AST has a child slot named \"children\"?\n\nTo begin, lets consider the AST for `print(a)`; for this AST there\nare two explicit child slots not named `children`:\n`[['FUNCTION', 1], ['ARGUMENTS', 1]]`. All ASTs not assigned to an\nexplicit child slot on an AST (e.g. \"FUNCTION\" or \"ARGUMENTS\"\npreviously) are assigned to the \"catchall\" slot named \"children\".\nAdditionally, for ASTs with no named slots (e.g. function\nbodies), all of the ASTs are stored in the slot named \"children\".\nIn essence, the slot serves as a fall-thru place to store all\nASTs not assigned to an explicit, named slot.\n\n#### ASTs package throws \"tree-sitter-interface crashed\"\n\nThis error is raised when the Common Lisp binary (`tree-sitter-interface`)\nbacking the ASTs python package crashes. If available, the error will\nreport the standard output and standard error streams for the\n`tree-sitter-interface` process prior to the crash.\n\n#### What is the difference between object ID (oid) and serial number?\n\nPython AST objects contain a oid attribute representing an\nobject id (oid) on the Common Lisp side of the interface; in essence,\nthe python ASTs are pointers to Common Lisp memory locations. When\ncalling a python AST method, the oid is serialized to the Common Lisp\nside of the interface where the underlying AST object is found\n(dereferenced) and the operation performed.\n\nAdditionally, python AST objects contain a serial number attribute\nrepresenting a serial number on the Common Lisp side of the interface.\nThis serial number is used when performing modifications (mutations)\nto an AST such as cut/replace/insert. To perform these modifications,\nwe find the AST in the root with the same serial number as the\npoint to modify and perform the operation. Serial numbers are stable\nin the unmodified, original tree across operations; however, the\nserial numbers of values inserted into the tree will differ from the\nAST given to insert.\n\nTwo objects with the same oid will always the same serial number.\nThe converse does not hold.\n\n# License\n\nGPLv3+\n\n[tree-sitter]: https://tree-sitter.github.io/tree-sitter/\n[SEL]: https://grammatech.github.io/sel/index.html#Software-Evolution-Library\n[template documentation]: https://grammatech.github.io/sel/Templates.html#Templates\n[pattern matching]: https://www.python.org/dev/peps/pep-0636/\n",
"bugtrack_url": null,
"license": "GPLv3+",
"summary": "A library for programmatic software modification",
"version": "0.9.10",
"project_urls": {
"Bug Reports": "https://github.com/grammatech/sel/issues",
"Homepage": "https://github.com/grammatech/sel",
"Source": "https://github.com/grammatech/sel/"
},
"split_keywords": [
"software-engineering",
" source",
" program-synthesis"
],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "6827b863d279361920d8d7f6240ca40b4390037a2979066cdc5a440fd0a63d39",
"md5": "ff68e8949e44ef700d0881242be4b2ee",
"sha256": "5aab24866246a4bde32b44add8ba1eb742abee34812d7b06487b058364a1aeb7"
},
"downloads": -1,
"filename": "asts-0.9.10-py3-none-manylinux_2_28_x86_64.whl",
"has_sig": false,
"md5_digest": "ff68e8949e44ef700d0881242be4b2ee",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.6.0",
"size": 36253719,
"upload_time": "2024-09-30T17:10:23",
"upload_time_iso_8601": "2024-09-30T17:10:23.863675Z",
"url": "https://files.pythonhosted.org/packages/68/27/b863d279361920d8d7f6240ca40b4390037a2979066cdc5a440fd0a63d39/asts-0.9.10-py3-none-manylinux_2_28_x86_64.whl",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2024-09-30 17:10:23",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "grammatech",
"github_project": "sel",
"travis_ci": false,
"coveralls": false,
"github_actions": false,
"lcname": "asts"
}