Semantic Actions in a Compiler

Parser as Recognizer

In the theory of formal languages, a parser is simply an algorithm with:

INPUT:

context-free grammar $G$ with a start symbol $S$
a word $w$ (sequence of terminal symbols)

OUTPUT:

yes, if $S \Rightarrow^* w$ in the grammar $G$
no, otherwise

Example of calculator language:

Grammar:

term ::= factor ("+" factor)*
factor ::= INTLITERAL | IDENTIFIER

Recognizer:

def parseTerm : Unit = {
  parseFactor
  while (lex.token == PLUS) {
    lex.next
    parseFactor
  }
}
def parseFactor : Unit = {
  if (lex.current == INTLITERAL) {
    lex.next
  } else if (lex.current == IDENTIFIER) {
    lex.next
  } else {
    syntaxError("Expected identifier or integer literal")
  }
}

Parser as a Compiler Phase

In compilers we are interested in generating code, and not just saying whether the program is syntactically correct.

To do so, we extend the “yes/no” parser with semantic actions, which do something with the program content, using the information on *how* the program was parsed.

Pretty Printing

Automatically format program according to some conventions (indentation, etc)

def parseTerm : Unit = {
  parseFactor
  while (lex.token == PLUS) {
    lex.next
    print("\n  + ") // put each factor in new line
    parseFactor
  }
}
def parseFactor : Unit = {
  if (lex.current == INTLITERAL) {
    val v = lex.getIntValue
    printInt(v) // will print all constants in decimal, even if they were parsed as hex
    lex.next
  } else if (lex.current == IDENTIFIER) {
    val name = lex.getIdent
    print(name)
    lex.next
  } else {
    syntaxError("Expected identifier or integer literal")
  }
}

Interpreting (Evaluating) Program

If we built an interpreter for very simple language and did interpretation while parsing (not very efficient if language has loops)

def parseTerm : Int = {
  var sum = parseFactor
  while (lex.token == PLUS) {
    lex.next
    sum = sum + parseFactor
  }
  sum
}
def parseFactor : Int = {
  if (lex.current == INTLITERAL) {
    val res = lex.getIntValue
    lex.next
    res
  } else if (lex.current == IDENTIFIER) {
    val name = lex.getIdent
    lex.next
    lookup(name)
  } else {
    syntaxError("Expected identifier or integer literal")
  }
}

Emitting Code

In simple compilers, semantic actions can already emit the code.

def parseTerm : List[Instruction] = {
  var instrs = parseFactor
  while (lex.token == PLUS) {
    lex.next
    instrs = instrs ::: parseFactor :: List(AddInstruction)
  }
  instrs
}
def parseFactor : List[Instruction] = {
  if (lex.current == INTLITERAL) {
    val v = lex.getIntValue
    lex.next
    List(PushConstantInstruction(v))
  } else if (lex.current == IDENTIFIER) {
    val name = lex.getIdent
    lex.next
    List(LoadInstruction(lookupAddress(name)))
  } else {
    syntaxError("Expected identifier or integer literal")
  }
}

Building Abstract Syntax Tree

For all more complex actions, semantic actions build abstract syntax tree, a representation of the program that has all relevant information about how the program should execution (but not necessarily how the program is written e.g. where the comments were, etc.).

Here is a tiny syntax tree:

sealed abstract class Expression
case class IntegerLiteral(v : Int) extends Expressioon
case class Variable(id : String) extends Expression
case class Plus(e1 : Expression, e2 : Expression) extends Expression

Here is the version of the parser that builds the syntax tree:

def parseTerm : Expression = {
  var e = parseFactor
  while (lex.token == PLUS) {
    lex.next
    e = Plus(e, parseFactor)
  }
  e
}
def parseFactor : Expression = {
  if (lex.current == INTLITERAL) {
    val v = lex.getIntValue
    lex.next
    IntegerLiteral(v)
  } else if (lex.current == IDENTIFIER) {
    val name = lex.getIdent
    lex.next
    Variable(name)
  } else {
    syntaxError("Expected identifier or integer literal")
  }
}

When we run the code above on

xx + 42 + yy

we obtain the tree:

Plus(Plus("xx", IntegerLiteral(42)),
     Variable("yy"))