LARA

Funcheck and ScalaCheck

ScalaCheck is a testing framework for automatic test case generation. The interesting part about it is that test cases are defined through properties that resemble logic statements. For instance we can write properties such as

$\forall xs: List[Int].ys: List[Int]. xs.size + ys.size = (xs ::: ys).size$

Or, using ScalaCheck syntax

  forAll[List[Int],List[Int]] { (l1, l2) => l1.size + l2.size == (l1 ::: l2).size }

And ScalaCheck will take care of generating input test case with the intent of invalidating the stated property. Naturally, if ScalaCheck is not able to find a counter-example for the specified property, this does not (evidently) imply that the property holds in general.

Instead, this is exactly what FunCheck is meant to be doing. FunCheck will be using a very similar syntax to ScalaCheck for performing static analysis and prove the correctness of the declared properties. The above property on list can be written using the FunCheck Library as follow:

  forAll[(List[Int],List[Int]])] { p => p._1.size + p._2.size == (p._1 ::: p._2).size }

The property will then be translated (by the FunCheck plugin) into a logic statement that will then be fed to a SMT-solver (such as Z3) for proving whether it is satisfiable.

Interfacing FunCheck and ScalaCheck

When a program containing a property such as forAll[(T1,T2,..,TN)]( p ⇒ f(p)) is compiled with the usual scalac environment, the runtime behavior will certainly not be the one expected. This because the provided implementation of the FunCheck Library does not compute any smart algorithm (you might want to take a look at the library now). However, programs compiled with the FunCheck plugin will be transformed so that when the program is run under the scala environment, every declared forAll property will be checked against a number of randomly generated input tests. The engine that is responsible of generating the input tests is ScalaCheck. This binding is performed when the program is compiled using our FunCheck plugin. the The magic consists in intercepting all places where the (FunCheck) library method forAll is called and replace these calls with the ScalaCheck library method forAll. This requires some slight adaptation but the translation can be automatized. The nice part about this is that sources compiled with the FunCheck plugin will be using ScalaCheck at runtime to automatically generate input tests.

ScalaCheck Generators

ScalaCheck expects that one generator is defined for each type used in a property. For instance, in the above defined property we used List[Int], therefore we should declare a generator for Int and a generator for List[T]. Fortunately, ScalaCheck provides generators for basic types such as Int, Double or String and for some immutable data structures, e.g., Map, Set or List (a complete list of all built-in generators is given later in this page). Therefore, for the above property we will not have to create a generator to get the job done.

Generators for user-defined datatypes

So far so good. But what if we want to generate input tests for a user-defined data structure, e.g. a binary tree. Let's take a rather simple (case) class hierarchy (written using the Pure Scala subset):

  sealed abstract class Tree
  case class Node(left: Tree, right: Tree, v: Int) extends Tree
  case class Leaf() extends Tree

In this case the user would need to manually create generators for each declared class. That would be something on the line

  def genTree: Gen[Tree] = Gen.oneOf(genLeaf,genNode)      // takes one among the wish list
  def genLeaf: Gen[Leaf] = Gen.value(Leaf())               // generate a value
  def genNode: Gen[Node] = for { 
                             left <- genTree               // note that this is recursive, so this is why we will get different input trees
                             right <- genTree
                             v <- Arbitrary.arbitrary[Int] // built-in generator for integer numbers. Technically, generators are wrapped into Arbitrary objects since forAll take these as parameters (but this is a detail ...)
                           } yield Node(left,right,v)  

The process of defining generic generators for such recursive structures (case classes) can be automatized. By instrumenting the code with a @generator annotation next to the case class declaration the FunCheck plugin will automagically define the above generators and inject them in the program that is going to be compiled. Hence, the next snippet of code will automatically contains generators for the user-defined datatypes

  @generator abstract class Tree
  case class Node(left: Tree, right: Tree, v: Int) extends Tree
  case class Leaf() extends Tree

There are some rule that defines how we process the @generator annotation, depending on the AST element that is associated with. If @generator appears in the

  1. definition of an abstract class, then a default generator is created for the current abstract class and for all the children of the class (recursively).
  2. definition of a class, then a default generator is created only for the current class.
  3. definition of a method, then a generator is created for the type of elements that is returned by the method (we will see an example of this shortly).
  4. if the @generator annotation is not used, then no default generator is created.
  5. the exception to the above rule is that for every declared abstract class we always create its generator. This is a good idea in practice since it reduces the need of explicit annotations.

So, we said that the @generator annotation can be used to adorn methods (point 3), let's see how this work in practice and how generators are created in this case (and also how we solve potential conflicts among multiple generators for the same type T).

We will take a Leftist Heap (priority queue) to exemplify the problem. A correct leftist heap respects the invariant that the node storing the minimum element is always the root. In order to maintain this invariant the user should always create new nodes through the insert(h: HeapTree, e: Elem) method. Let's look at the code

   object LeftistHeap {
     @generator abstract class HeapTree
     case class T(val rank: Int, val el: Elem, val left: HeapTree,val right: HeapTree) extends HeapTree
     case class E() extends HeapTree
     
     @generator case class Elem(v: Int)
     
     @generator def insert(h: HeapTree, e: Elem) = merge(h,T(1,x,E,E)) 
     
     def merge(that: HeapTree): HeapTree = //... code for merging two heaps
   }

In this case FunCheck will inject the following generators:

   // generator due to @generator appearing in the case class definition
   def genElem: Gen[Elem] = for(v <- Arbitrary.arbitrary[Int]) yield Elem(v)
   
   // generators due to @generator appearing in the abstract class definition
   def genHeap1: Gen[HeapTree] = Gen.oneOf(genE,genT)
   def genE: Gen[E] = Gen.value(E())
   def genT: Gen[T] = for {
                           rank  <- Arbitrary.arbitrary[Int]
                           el    <- genElem
                           left  <- genHeap1
                           right <- genHeap1
                         } yield T(rank,el,left,right)
   
   // generator due to @generator appearing in the method declaration (note that the parametric type fof the Generator is the returned method's type)
   def genHeap2: Gen[HeapTree] = for {
                               e <- genElem
                               h <- genHeap2
                             } yield insert(h,e)

   // since there are multiple generators for the type **HeapTree**, we create a new one that combines the others
   def genMultipleHeap: Gen[HeapTree] = Gen.oneOf(genHeap1, genHeap2)

However, following this definition it is possible to create heaps that do not respect the invariant. Specifically, if the generator genHeap1 is called, the heap is created by directly using the case class constructor instead of the insert method. Depending on what properties the user want to test, it might be inconvenient to generate incorrect heaps. This can be avoided by modifying how the @generator annotation is used.

  object LeftistHeap {
    abstract class HeapTree
    case class T(val rank: Int, val el: Elem, val left: HeapTree,val right: HeapTree) extends HeapTree
    @generator case class E() extends HeapTree
    
    @generator case class Elem(v: Int)
     
    @generator def insert(h: HeapTree, e: Elem) = merge(h,T(1,x,E,E)) 
     
    def merge(that: HeapTree): HeapTree = //... code for merging two heaps
  }

Will cause the following generators to be injected

   
   def genElem: Gen[Elem] = for(v <- Arbitrary.arbitrary[Int]) yield Elem(v)

   def genE: Gen[E] = Gen.value(E())
   
   def genHeap: Gen[HeapTree] = for {
                               e <- genElem
                               h <- Gen.oneOf(genE,genHeap)
                             } yield insert(h,e)

Which will ensure that only conform heaps will be produced as input test.

Generators for recursive data-types

One problem that we encountered while automatically injecting generators for user-defined, recursive, data-types is the risk of incurring in a StackOverflow error while we try to evaluate the concrete generator. For a simple example take the generator of Node in the above discussed Tree example. It is easy to see that the generator might be looping forever if it tries to evaluate every single function call (remember that evaluation in scala is by value, and not by name!).

Fixing this is simple and what we do currently is that every created generator is wrapped into a lazy generator which will prevent him from being evaluated too soon.

// Each generator is wrapped into a (built-in) lazy generator
def genTree: Gen[Tree] = Gen.lzy(Gen.oneOf(genLeaf,genNode))    
def genLeaf: Gen[Leaf] = Gen.lzy(Gen.value(Leaf()))               
def genNode: Gen[Node] = Gen.lzy(for { 
                           left  <- genTree 
                           right <- genTree
                           v     <- Arbitrary.arbitrary[Int] 
                         })

Admittedly, we could be smart and wrap into a lazy generator only the generator that are used to create recursive data-types (e.g., like the Node data-type, but not the Leaf one). But this was a quick fix which worked and we had (at the time) more important concerns to have the framework working by the deadline. hough, this is something that should be fixed in the future.

ScalaCheck built-in Generators

The following is a comprehensive list of the available generators in ScalaCheck.

  • Int
  • Double
  • String
  • Date
  • Long
  • Throwable
  • Char
  • Byte
  • Option
  • Either
  • Map
  • List
  • Stream
  • Array
  • Set
  • ArrayList (this is java.util.ArrayList)
  • FunctionN (with arity N from 1 to 5, i.e., Function1, Function2, …, Function5)
  • TupleN (with arity N from 2 to 8)

FIXME Will have to see how we can inject a generator for Multiset (since this is not a user-defined data type)

Example of Input to Output Transformation

A complete example of the transformation that the FunCheck will perform on a source is the following. Input, is the program written using the FunCheck library and by instrumenting the source with the @generator annotation. Output, shows how the program would look like after the FunCheck transformation. Note how the property definition is changed and the generators (and Arbitrary objects, which are containers for generators and are passed in arguments to properties) have been injected in the source.