Semantic Actions in a Compiler
Parser as Recognizer
In the theory of formal languages, a parser is simply an algorithm with:
INPUT:
- context-free grammar with a start symbol
- a word (sequence of terminal symbols)
OUTPUT:
- yes, if in the grammar
- no, otherwise
Example of calculator language:
Grammar:
term ::= factor ("+" factor)* factor ::= INTLITERAL | IDENTIFIER
Recognizer:
def parseTerm : Unit = { parseFactor while (lex.token == PLUS) { lex.next parseFactor } } def parseFactor : Unit = { if (lex.current == INTLITERAL) { lex.next } else if (lex.current == IDENTIFIER) { lex.next } else { syntaxError("Expected identifier or integer literal") } }
Parser as a Compiler Phase
In compilers we are interested in generating code, and not just saying whether the program is syntactically correct.
To do so, we extend the “yes/no” parser with semantic actions, which do something with the program content, using the information on *how* the program was parsed.
Pretty Printing
Automatically format program according to some conventions (indentation, etc)
def parseTerm : Unit = { parseFactor while (lex.token == PLUS) { lex.next print("\n + ") // put each factor in new line parseFactor } } def parseFactor : Unit = { if (lex.current == INTLITERAL) { val v = lex.getIntValue printInt(v) // will print all constants in decimal, even if they were parsed as hex lex.next } else if (lex.current == IDENTIFIER) { val name = lex.getIdent print(name) lex.next } else { syntaxError("Expected identifier or integer literal") } }
Interpreting (Evaluating) Program
If we built an interpreter for very simple language and did interpretation while parsing (not very efficient if language has loops)
def parseTerm : Int = { var sum = parseFactor while (lex.token == PLUS) { lex.next sum = sum + parseFactor } sum } def parseFactor : Int = { if (lex.current == INTLITERAL) { val res = lex.getIntValue lex.next res } else if (lex.current == IDENTIFIER) { val name = lex.getIdent lex.next lookup(name) } else { syntaxError("Expected identifier or integer literal") } }
Emitting Code
In simple compilers, semantic actions can already emit the code.
def parseTerm : List[Instruction] = { var instrs = parseFactor while (lex.token == PLUS) { lex.next instrs = instrs ::: parseFactor :: List(AddInstruction) } instrs } def parseFactor : List[Instruction] = { if (lex.current == INTLITERAL) { val v = lex.getIntValue lex.next List(PushConstantInstruction(v)) } else if (lex.current == IDENTIFIER) { val name = lex.getIdent lex.next List(LoadInstruction(lookupAddress(name))) } else { syntaxError("Expected identifier or integer literal") } }
Building Abstract Syntax Tree
For all more complex actions, semantic actions build abstract syntax tree, a representation of the program that has all relevant information about how the program should execution (but not necessarily how the program is written e.g. where the comments were, etc.).
Here is a tiny syntax tree:
sealed abstract class Expression case class IntegerLiteral(v : Int) extends Expressioon case class Variable(id : String) extends Expression case class Plus(e1 : Expression, e2 : Expression) extends Expression
Here is the version of the parser that builds the syntax tree:
def parseTerm : Expression = { var e = parseFactor while (lex.token == PLUS) { lex.next e = Plus(e, parseFactor) } e } def parseFactor : Expression = { if (lex.current == INTLITERAL) { val v = lex.getIntValue lex.next IntegerLiteral(v) } else if (lex.current == IDENTIFIER) { val name = lex.getIdent lex.next Variable(name) } else { syntaxError("Expected identifier or integer literal") } }
When we run the code above on
xx + 42 + yy
we obtain the tree:
Plus(Plus("xx", IntegerLiteral(42)), Variable("yy"))