Labs 04: Name Analysis

This week you will add name analysis to your Tool compiler. This will considerably ease the task of type checking that you will start next week or the week after.

The description of this assignment is rather long. Don't panic :) Most of it is to help you avoid forgetting important points. Please read it carefully. The assignment itself is one of the longer ones, but you have three weeks to work on it, and we give you a substantial amount of code.

The goal of name analysis is twofold: we want to reject programs which contain certain types of errors, and we want to associate symbols to all identifiers.

Symbols

Symbols are values which uniquely identify all class, method and variable names in a program, by mapping their (multiple) occurrences to their (unique) definition. Additionally, they map any occurrence of this to the class it refers to. Identifiers are already present in the AST and contain names as well, but these are not sufficient to distinguish between a class member and a local variable with the same name, for instance. This week we will add this missing information to the ASTs.

Rejecting erroneous programs

In the process of mapping occurrences to definitions, we will be able to detect the following kinds or errors and reject programs containing them:

a class is defined more than once
a class uses the same name as the main object
a class or the main object is named Object
two class members in a class have the same name
a class member has the same name as one in an inherited class (field overriding is forbidden in Tool)
a method is overloaded (forbidden in Tool)
a method overrides another with a different number of parameters
in a method, two parameters/local variables have the same name.
a class name is used (as parent class or type, for instance) but is not declared
an identifier is used as a variable but not is declared
the inheritance graph has a cycle (eg. “class A extends ; B {} class B extends A {}”)
this is referenced from the main object

Note: We cannot check for the following errors yet, because we need type checking for them:

Method calls correspond to methods defined in the proper class.
Each overriding method has compatible argument and return types with the method it is overriding.

Implementation

In order to attach symbols to trees, we define a new trait, Symbolic, and a new set of classes for the symbols. The Symbolic trait is parametrized by a class name which allows us to define the kind of symbols which can be attached to each kind of AST node (see Symbols.scala and Trees.scala later for examples).

You need to write your analyzer such that two nodes referencing the same symbol have the same symbol class instance attached to them (that is, reference equality, structural equality is not enough). We defined the Symbol class such that symbols automatically get a unique identifier attached to them at creation. This will allow you to check that you are attaching symbols correctly: we have added an option to your pretty-printer to be able to print these unique numbers along with the identifiers where they occur.

Note that Symbols are also Positional objects, which means you have to set them a correct position. The position of the symbol should be its declaration position. This is necessary to produce meaningful error messages such as “error: class Cl is defined twice. First definition here: …”.

Internal errors

When your compiler encounters an internal error (for example, a scope is not initialized as you expected, a symbol is null, etc.), you should not use the methods from the reporter trait. This will be counted as a mistake. You must use sys.error instead, which will throw an exception and show you the stack trace. The reason is that you shouldn't blame the user for internal errors. In fact, the user should never encounter an internal error. Of course, writing a bug-free compiler is hard…

Symbols as scopes

We will take advantage of the fact that scopes in Tool are only of three kinds:

the global scope (the set of declared classes)
the class scopes (all members and methods within a class, plus the global scope)
the method scopes (the parameters and local variables, plus the corresponding class scope)

This in fact defines a hierarchy among symbols:

all class symbols are defined in the global scope
all methods are defined in a class scope
variables are defined either in a method scope, as arguments or locals, or in a class scope

We encoded this hierarchy in the symbol classes. Therefore, if we have access to a class symbol, for instance, and all symbols were correctly set, we can access from there all method symbols and member variable symbols. This will allow us to easily check if a variable was declared, for instance.

Declarations and Identifiers

For declarations such as MethodDecl, you will notice that not only the declaration itself extends Symbolic, but also its Identifier field. Generally, you must set the (same) symbol both on declarations and their identifiers. Note that since the MethodDecl extends Symbolic[MethodSymbol], you have the guarantee that setting/getting a symbol here will always be a method symbol.

Your task

Here is how we recommend you proceed for the implementation:

First, collect all symbols: create the symbol class instances, and attach them to definitions of the main object, classes, methods, variables and formal parameters.
Implement the second phase of your analyzer which consists in attaching the proper symbol to the occurrences of the identifiers and this. Make sure to attach symbols to all identifiers, whether they are in expressions, statements, or type trees. Make sure to use the lookup methods in the various Symbol subclasses!
You can use your pretty-printer to make sure you attached symbols correctly.
Make sure that you emit errors and warnings when appropriate (using ctx.reporter). Unlike the parser, where it is hard to continue parsing when there is an error in the program, here we can almost always continue analyzing when we encounter a user error, and possibly report more errors later in the program (much like during lexing). So do not use fatal for this phase; instead prefer error.

Execution example

When analyzing the following file:

program Example {
  println(new B().foo());
}
 
class B extends A {
  def foo() : Int = {
    value = 42;
    return value;
  }
}
 
class A {
  var value : Int;
  def foo() : Int = {
    var value : Bool;
    value = false;
    return 41;
  }
}

The (updated) pretty-printer would output something like:

program Example#0 {
  println(new B#1().foo#??());
}
 
class B#1 extends A#2 {
  def foo#6() : Int = {
    value#3 = 42;
    return value#3;
  }
}
 
class A#2 {
  var value#3 : Int;
  def foo#4() : Int = {
    var value#5 : Bool;
    value#5 = false;
    return 41;
  }
}

Note that:

Overriding methods have a different symbol than their overridden counterparts.
Method names in method calls are unresolved symbols.

Constraints

Here are all the constraints that your analyzer should enforce:¹⁾

Variable declarations

No two variables can have the same name in the same scope, unless one of the two cases of shadowing occurs.
All variables used must be declared.

Shadowing

Shadowing can occur in two different situations:

a local variable in a method can shadow a class member
a method parameter can shadow a class member

All other types of shadowing are not allowed in Tool.

Classes

Classes must be defined only once.
When a class is declared as extending another one, the other class must be declared and cannot be the main object.
The transitive closure of the “extends” relation must be irreflexive (no cycles in the inheritance graph).
When a class name is used as a type, the class must be declared. The main object cannot be used as a type.
A class or the main object cannot be named Object.

Overloading

Overloading is not permitted:
- In a given class, no two methods can have the same name.
- In a given class, no method can have the same name as another method defined in a super class, unless overriding applies.

Overriding

A method in a given class overrides another one in a super class if they have the same name and the same number of arguments.²⁾
Fields can not be overridden.

Reference to This

this is only referred to from a class method (as opposed to the main method)

Stubs

Merge the stubs of the parser into your main branch by typing

git fetch git merge origin/Lab04

(assuming that origin is the name of your remote repository).

It should merge the following files in your project. If you get conflicts, don't panic, they should be relatively easy to resolve.

analyzer/Symbols.scala contains the definition of symbols, GlobalScope and the Symbolic trait.
analyzer/NameAnalysis.scala contains a stub for the name analyzer.
ast/Trees.scala contains an updated version of trees with symbols.
ast/Printer.scala contains an updated printer which can print unique symbols.
Main.scala contains code to run the analyzer after the parser.

toolc
  ├── Main.scala          (updated this week)
  │
  ├── lexer
  │    ├── Lexer.scala      
  │    └── Tokens.scala
  │
  ├── eval                     
  │    └── Evaluator.scala 
  │
  ├── analyzer
  │     ├── Symbols.scala       (new)
  │     └── NameAnalysis.scala  (new)
  │
  ├── ast
  │    ├── ASTConstructor.scala
  │    ├── ASTConstructorLL1.scala
  │    ├── Parser.scala
  │    ├── Printer.scala   (updated this week)
  │    └── Trees.scala     (updated this week)
  │
  └── utils
       ├── Context.scala
       ├── Positioned.scala
       ├── Reporter.scala
       └── Pipeline.scala

Deliverable

Please choose a commit from your git repository as a deliverable on our server before Tuesday, Nov. 15th, 23h59.

¹⁾

Note that this is simply a reformulation of the types of errors we want to catch.

²⁾

Of course this constraint will be tightened once we start checking types.