LARA

Concrete vs Abstract Syntax Trees

Concrete Syntax Tree

Given a grammar, a concrete syntax tree is a directed tree with ordered children, given like this:

  • leaves are labeled by terminals (tokens)
  • internal nodes are labelled by non-terminals
    • if a node is labelled by non-terminal $X$, then
      • there must be a production $X ::= q_1 \ldots q_n$ such that
      • the children of the node are labelled by $q_1$, …, $q_n$ (in that order)
  • root is the start symbol

Example grammar, with start symbol 'term':

term ::= factor | term "+" factor 
factor ::= INTLITERAL | IDENTIFIER

An example tree:

term(term(term(factor(IDENTIFIER("xx"))),
          "+",
          INTLITERAL(42)),
     "+",
     factor(IDENTIFIER("yy"))

This tree shows that

xx + 42 + yy

can be generated by the grammar.

The yield of the concrete syntax tree is the sequence of terminals that we obtain by traversing the tree in infix order.

The parse tree “proves” that the yield can be generated by the grammar.

Parse tree is more abstract than derivation: there are several derivation that result in the same concrete syntax tree, depending on the order in which we expand the non-terminals.

Abstract Syntax Tree (AST)

These trees typically ignore simple tokens such as

  • keywords
  • parentheses
  • operator symbols

Instead, the names of syntax tree nodes and their content carry sufficient information

abstract class Expression
case class IntegerLiteral(v : Int) extends Expressioon
case class Variable(id : String) extends Expression
case class Plus(e1 : Expression, e2 : Expression) extends Expression
case class Minus(e1 : Expression, e2 : Expression) extends Expression

The Ordering of Operators

The order of operators in AST determines the meaning. Compare:

Minus(Minus(100,20),5)   --> 75
Minus(100,Minus(20,5))   --> 85

Must be careful to ensure the desired priority and the order of application of operators.

left-associative operators:

def parseTerm : Expression = {
  var e = parseFactor
  while (lex.token = MINUS) {
    lex.next
    e = Minus(e, parseFactor)
  }
  e
}

right-associative operator:

def parseTerm : Expression = {
  val e1 = parseFactor
  if (lex.token = EXPONENT) {
    lex.next
    val e2 = parseTerm
    Exponent(e1,e2)
  } else {
    e1
  }
}