Notion of Syntax Trees

We have seen how to build a parser

  • complains if the input is wrong
  • does absolutely nothing if input is correct :)

Next step: have parser create representation of the program

  • this representation is abstract syntax tree

Concrete syntax tree

  • has a leaf for each terminal
  • has an internal node for each terminal
  • usually not built explicitly

Abstract Syntax Tree

  • similar to concrete syntax tree
  • removes unnecessary tokens (parentheses, keywords)
  • keeps enough information to compile or interpret program

Example: trees for x + y * z

Example definitions: tree.scala, syntax tree for while language (for which you built an interpreter)

Specifying trees:

  • sometimes we use rules as in context-free grammar to specify above
  • but the goal is then to describe trees, not sequence of strings
  • keywords are not that important
  • if it was interpreted as a grammars for strings, it would be ambiguous


The description

E ::= E + E | E * E | ID

is not good as a grammar for LL(1) parsing, but is good as description of abstract syntax such as this:

sealed abstract class Tree
case class Plus(left : Tree, right : Tree) extends Tree
case class Times(left : Tree, right : Tree) extends Tree
case class Ident(id : String) extends Tree