LARA

Old Projects

Static code analysis for real-world Scala code base

Enterprises want to “manage” code quality, enforce code standards, flag dubious code (find bugs). Gist

Create an extensible static analysis tool based on the Scala front-end. Expose a simple API for developers to add their own “bug finding” strategies. Use case: once a bug was found (for example, memory leak through outer pointer), it should be easy to make the tool find all occurrences of that pattern in code.

The tool will be based on attributed Scala syntax trees (as defined by Scala reflection and macros). A recipe takes a Tree and some context (for instance, the enclosing definition) and produces a number of warnings/errors. Iteration is encapsulated in the tool (so that, ideally, a single pass on the program can collect all errors from all recipes). As an alternative to ASTs, new quasiquote support in Scala 2.11 should be evaluated.

Examples

Application specific

* [akka] mutable message send * [akka/play] blocking calls (future, synchronous message sends) * [akka] sending closures as messages * [play] too much logic in play templates accesses of public mutable fields * [slick/specs]

vs

* [specs] specific quick fixes for compilation errors (forgot 'in')

General recipes

Avoid capturing outer pointers in closures (memory leak potential) Inner extends Outer class (potential memory leak through outer) Use before initialize (eval order in constructor) Avoid using self-types instead of inheritance (unless the dependency is circular, wen you need selfies) Avoid threading values using implicits (allow specific uses, like Akka timeout values) Use val instead of var Unused local/private def unguarded recursion string interpolation syntax without interpolator no return type for public method

Requirements

Integrate with all IDEs work on the command line (integrate with build tools) easy to share recipes: they should be simple snippets of Scala code each recipe should be independently turned on/off, either from the command line or in the code (for instance, using a comment)

In a first step, analysis would use limited context (type-checking information). As the tool evolves, we can integrate data-flow or control-flow analysis (flag dead-code, potential race conditions, etc.)

Milestones

M1

define the Rules interface try the quasiquote support in 2.11 implement 4-5 strategies from the list above, embedding the Scala compiler in the tool

M2

integrate with Eclipse integrate with Sbt

M3

explore support for multiple versions of the Scala compiler

Random Testing for Leon

Extend Leon tool with random testing capabilities. In the process, support general Scala functions, treated as a black box.

Consult, among others, the following research papers:

Verification of Functions with Trigonometry

The Rosa tool can verify various properties of numerical functions, by using the Z3 SMT solver as a backend (https://github.com/malyzajko/rosa). In particular, Rosa is using Z3's nonlinear solver for nonlinear real arithmetic, which is a decidable theory. However, many numerical computations require trigonometric functions such as sine and cosine. As this theory is undecidable in general, we cannot expect to find a solver that is complete, but there are already techniques and implementations that usually define (un)satisfiability up to some uncertainty, for example RSolver (http://rsolver.sourceforge.net/) or dReal (http://dreal.cs.cmu.edu/#!index.md). The goal of this project is to identify a suitable solver and adapt it so that it can be used within the Rosa tool to verify properties of numerical functions with trigonometry.

Incremental Loader for InSynth

Loader extracts declarations (fields and methods) from Scala ASTs and loads them into InSynth. A context is a set of all declarations InSynth sees.

The task is to make the loader more efficient. Now, every time InSynth is invoked, the loader loads all declarations from the context. However, between two consecutive invocations, the context changes a little (only in a few declarations). Thus, one idea is to make an incremental loader, that only loads a difference between two “consecutive” contexts. Thus, one will need to identify the change in the context the moment it occurs. Then, one will update the context accordingly. The entire process will be done in the background, while a user edits the code. Once he/she invokes InSynth all declarations will be already loaded. This way we will drastically reduce the time we spend on loading.

A student will need to implement the engine that will observe the changes and update the context.

Expressions as Input in InSynth

InSynth currently extracts the environment automatically and uses all available declarations (fields and methods) for synthesis. The idea is to allow a user to specify a set of expressions (and declarations) that should/must appear in the snippets. This is also related to fixing the broken expressions.

Generics in InSynth

InSynth supports only ground (simple and function types) in Scala, and synthesizes expressions that contain only declarations (fields and methods) with such types. We would like to extend it such that it supports polymorphic types (generics). This will include reasoning about: bounded, covariant and contravariant type parameters. A student will need to formulate the theory and the algorithm, and incorporate this into InSynth.

Because, this is a very important project for us, a student will have a great help during the development, including the help in understanding the current state of InSynth, the help in theory and algorithm formulation, and of course implementation. This will include development and programming on our side as well.

Offline Machine Learning in InSynth

InSynth uses weights to rank final snippets and to guide the synthesis. At the moment, we use two simple machine learning techniques for assigning weights to declarations, based on proximity of the cursor and the frequency in the code corpus. We would like to experiment with more sophisticated machine learning (ML) techniques, hoping that this will give us better results.

The learning can be done offline (before InSynth is used) based on examples in a corpus.

A student will need to experiment with different ML techniques, like the ones presented in Learning from Examples to Improve Code Completion Systems paper. This will include searching for the related work and finding the techniques that will suit our problem the best. Next, implementing (a prototype) for a few candidate techniques (if there are many). Then selecting the best one and implementing it in InSynth.

Implementation will include both:

  • building the corpus
  • implementation in InSynth

Because, this is very important project for us, a student will have a great help during the development, including the help in understanding the current state of InSynth and the help in identifying useful ML techniques, and of course help during implementation. This will include development and programming on our side as well.

Scala Tutor

Add capabilities of http://pythontutor.com/ to a Scala IDE, either the official one ( http://scala-ide.org/ ) or the Leon one.

BAPA Logic in Leon

Do verification and/or synthesis for BAPA or ordered BAPA.

There can be at least 2-3 projects here. Please contact Viktor Kuncak.

SMT-Based Synthesis

Implement synthesis based on capturing quantifier instantiations.

Integrate Insynth completion to Web IDE for Leon

Integrate code-completion from Insynth within the Leon Web IDE.

Improve Web IDE for Leon

Add features such as sharing and exchanging programs, continuous error reporting, counterexample visualisation, specification debugging. Consider comparison to: http://alloy.mit.edu/alloy/

Leon front ends

Add a front-end (lexer+parser) for Leon for:

Scala System for Conference Management

(This project is initially software design and software development in Scala, but connects to ideas in software synthesis and implicit programming, as well as the Scalatex project. It permits for multiple people to take part in the project.)

Understand existing 'content management' or related systems built on top of mainstream Scala libraries and extend them to build working conference management and shared calendar, and shared TODO systems. Focus on systematically mapping data structures and databases onto editable forms, so that creating a new system of this form becomes very easy. Possible extensions:

  • push one of the applications into a usable open-source application
  • design a graphical/web front end to create such systems without necessarily writing lines of code
  • design a domain-specific language for writing such applications

See also:

Synthesis

Implicit Programming

Explore innovative mechanisms to help people (not necessarily programming experts) express their requirements in intuitive ways, and compile such descriptions into executable programs. This is a part of the Implicit Programming agenda:

http://lara.epfl.ch/w/impro

We are looking at approaches that are based on compiling specifications, domain-specific reasoning, as well as novel user interfaces and methods to automatically repair incorrect requirements into meaningful requirements.

Explore Connections with Autobayes

Verification with Leon

Leon, http://lara.epfl.ch/w/leon, is a system for verification and synthesis of programs in a simple, mostly functional, subset of Scala. To learn more about Leon, see Satisfiability Modulo Recursive Programs.

Verification of Functional Scala Code

Extend the Leon verifier by defining a robust interface that approximates any Scala code with its functional subset. Use this approximation feature to apply Leon to substantial Scala code base. Please contact Viktor Kuncak and Philippe Suter for additional details and background.

Verification of Linked Structures using Leon

Extend Leon to support verification of linked structures. See the above item, as well as the Jahob system, and this paper on decision procedure for tree reachability.

Extend Leon with Parametric Polymorphism

Add a simple form of type parameters to Leon. This will enable defining generic lists and similarly reusable algebraic data types.

This requires enhancing the front end, the formula trees, and connection to Z3.

For some theoretical implications, please see:

but soundness for proofs should be achievable by mostly considering unknown type parameters to denote (countably) infinite sets.

Add Global Variables to Leon extension

Leon has an extension that supports imperative programs, but it currently only supports local mutable variables. Add support for global variables declared in an object. Translate them by introducing them as function parameters and results.

To help with modularity of the transformed programs, introduce a non-recursive algebraic data type storing all global variables.

Global variables should support any of the existing types, such as integers, algebraic data types, sets, maps.

Time permitting, consider allowing mutable fields in classes and transforming them into global maps.

Translate Leon programs into Horn clauses

Currently Leon mostly requires precise postconditions to verify recursive functions. The goal of this project is to improve the automation in Leon. Leon will therefore automatically infer some of the postconditions using static analysis.

The approach explored here is to develop a translation from Leon into Horn clauses and then run one of the tools that work on Horn clauses.

One such tool is Eldarica. See Eldarica page and this paper,

Another tool is HSF.

Bounded Universal Quantifiers

Leon uses only quantifier-free constraints so far, which allows it to predictably find counterexamples. Bounded universal quantifiers can in some cases be expanded into recursive functions, but can be handled more efficiently in many cases. To improve the efficiency of reasoning and the convenience, the goal of this project is to add bounded universal quantifiers to Leon.

One example approach is to allow

 e.forAll 
 e.exists

where e is of type Set[A].

This will be useful for especially for stating properties of maps and arrays. Existing Leon benchmarks provide a good example where several Boolean functions can be expressed using quantifiers.

Automated Discovery of Algebraic Specifications

Automatically generate candidate specifications of functions (for example, associativity of a binary operation, monotonicity, etc.). Then test candidate hypothesis and introduce them as Leon lemmas.

Consult Philippe Suter and Viktor Kuncak

Partially addressed

Comparison of Leon, Why3, and Dafny

Compare three related verification tools:

Translate benchmarks from one tool to another when this is natural, and understand reasons in verification technology that contribute to differences in the success for verification.

Optionally, you may choose to transfer some of the important advantages of Why3 or Dafny to Leon.

String and List Concatenation

Extend Leon to support theory of string and list concatentations. Study the existential word problem and context unification problems and examine verification conditions that arise in practice.

Extend Leon with Higher-Order Functions

This extension will allow Leon to support higher-order functions such as map and fold. We can therefore prove interesting lemmas about them, such as:

(map(f) andThen map(g))(x) == map(f compose g)(x)

and do reasoning at a higher level of abstraction.

Ideally, the project will support the

(x => E) 

notation for anonymous functions, functions as parameters, representation of function parameters as uninterpreted functions/maps, and, during reasoning, replacement of function parameters with concrete anonymous or non-anonymous functions.

Templates in Function Postconditions

Allow postconditions to use unknown parameters and develop techniques to synthesize them.

This can be used in conjunction with specifying running times of functions. Instead of saying that the running time of a method is 10*x+2, we can write, e.g. 'a*x+'b.

See, for example, this paper:

Proof Carrying Code

The idea of proof carrying code is that, just as we compile code, we can also compile Hoare logic proofs for this code to assembly level. The advantage of doing that is that we can go much further than Java bytecode verifiers. We can have lower-level code (like binaries) and prove higher-level properties (like assertions or postconditions), if we ship binaries with additional annotations.

The proof carrying code model is that a high-level language is verified, and then compiled together with the information about the result of the verification. When the client downloads the code, a low-level code verifier examines the binary and the associated proof. Without trusting whoever produced the binary or the proof, it checks whether the binary satisfies the shipped properties. The properties are required to be detailed enough, so that checking this is fast. In that way, we do not need to trust the high-level language compiler or verifier, as long as the low-level code verifier is correct. A broken verifier or compiler will generate binaries that will be refused by low-level verifier.

In this project, you will explore building a basic form of proof carrying code for Leon. You can choose either to work with the existing compiler to bytecodes in Leon, or compile to a concrete or abstract assembly language.

For more information, please see:

Online Machine Learning in InSynth

Machine learning can be done online (while InSynth is used) as well. Here, a student will implement an engine that will learn directly from the InSynth user and his/her code. This includes collecting and reasoning about recently used snippets and similar statistics, plus, data we can derive from the user's code. Note that the latter is quite similar to the previous project, except that is done in an online setting.

More verification

Verifying Security Properties

Background: Spi-calculus is a formalism designed to represent cryptographic protocols. In a spi-calculus model there is a set of processes running in parallel. The processes can send and receive messages through channels. There is an automatic verification tool called ProVerif for verification of models in spi-calculus. The input language of ProVerif is based on spi-calculus. ProVerif translates the spi-calculus description into a set of Horn clauses for internal representation. It can also directly analyze Horn clause description of security protocols.

Problem description: There are some predicate abstraction tools (HSF, Eldarica) that can handle Horn clauses. The goal of this project is to verify some of the security benchmarks using the the predicate engines. Second (optional), we would like to find some completeness grantees for security protocols. This means to find some basic classes of protocols for which we ensure that the predicate abstraction engine can finally decide if the program is correct or not.

Proving Correctness of Scala Data Structures

Use verification tools to do correctness proofs for several data structures in Scala. See e.g. Jahob verification system.

Integrate SWI Prolog or CVC4

Explore integration of “JPL, a bidirectional interface between SWI-Prolog” into the Kaplan constraint programming system. Similarly, explore CVC4 as an additional solver for Kaplan. See the Constraints as Control paper.

Lists with Concatenation

Implement a theorem prover for constraints including concatenation of sequences, possibly extended with the length constraints.

See Satisfiability of Word Equations with Constants Is in PSPACE

Shape Analyzer for Scala

Design and implement a shape analysis tool for Scala. The tool will perform static, dynamic, and syntactic analysis of data structures in Scala programs and determine the classes of shapes that these data structures can take. It will determine whether data structures can have cycles, whether they are tree-like, DAG-like, or list-like. The tool can be used to check program invariants of interest or aid in building parallel or concurrent Scala applications.

For more information, please consult e.g. a paper on Bohne symbolic shape analysis.

Decision Procedure for Integer Linear Arithmetic

Implement an algorithm that, given a formula involving linear arithmetic expressions (+,-, multiplication by constants, logical operators) determines the truth value of the formula.

Such algorithms have applications in software verification and theorem proving, see Presburger arithmetic in the Software Analysis and Verification course.

Parallel Theorem Proving Infrastructure

Extend our theorem proving infrastructure to dispatch formulas in parallel to external tools, automatically detecting which ones are suitable, and/or by applying validity-preserving rewrites first. Deploy the implemented tool on our server and write a client to query it. An understanding of first-order logic and good Scala programming skills are required. Knowledge of higher-order logic (lambda calculus) is a strong plus.

Mathematical computation

Approximating nonlinear functions

Evaluating nonlinear functions can be expensive, especially if done repeatedly, as functions like sin, cos, log, etc. are evaluated correct up to the last digit. However, such precision is not always necessary, e.g. if the rest of the computation is imprecise anyway, or only a small number of decimal digits is needed to make a decision. This project will explore ways of automatically approximating such nonlinear functions with guaranteed error bounds by using linear approximations. The identification of realistic examples and evaluation of the precision and efficiency of the method is part of the project. This project complements approaches that determine suitable combinations of approximations to achieve an overall precision, but require the user to provide the initial approximations manually, e.g.:
Z.A. Zhu, S. Misailovic, J.A. Kelner, M. Rinard, Randomized accuracy-aware program transformations for efficient approximate computations. POPL 2012

Explore Connections with Octave

Computer Algebra and Scala

Write a small but extensible computer algebra system in Scala.

See the functionality of systems like Axiom_(computer_algebra_system), SAGE_(computer_algebra_system), and REDUCE.

Alternatively, integrate Scala with an open-source Java-based computer algebra system (see Comparison of computer algebra systems), such as

Rendering:

Possible realization: a library that overloads mathematical operators to manipulate syntax trees, and can be used from within the 'scala>' prompt.

More fun projects

Inference of Regular Tree Constraints

Design and implement an inference engine that will compute regular tree grammars that describe subsets of algebraic data types that functions manipulate.

For example, for a recursive function that removes certain types of nodes from an abstract syntax tree, the result will be a grammar describing a subset of trees that it returns (omitting the removed node types).

Probabilistic Programming in Scala

Develop an extension of Scala for manipulating probabilistic distributions, building distributions from execution statistics, and using them to direct program execution.

Integrate a PHP Analysis Tool into Eclipse

See

Scala as a Proof Language

Explore the effectiveness of Scala as a language for writing down formal proofs, such as the proofs used in Isabelle or Coq systems.

Semantic document processing

The overall idea of this broad research direction is to design techniques for analyzing semantic information present in documents such as software documentation, web pages, lectures notes. It therefore spans the fields such as automated reasoning, programming languages, natural language processing, knowledge bases, machine learning, and semantic web.

Here are some relevant links.

An Empirical Study of Code Readability for Scala

In a programming language such as Scala, there are often many more than one way of doing things. Consider for instance that:

var i = 0
while(i < 10) {
  println(2 * i)
  i += 1
}
for(i <- 0 to 9) {
  println(2 * i)
}

and

(0 to 9).foreach(i => println(2 * i))

all do the exact same thing. The goal of this project is to find out what makes one variant more readable than another to a particular programmer. The project would be carried out as a combination of a Scala compiler plugin and a user-study, most likely in the form of a web application, and advertised to the Scala community.

Music Generation in Scala

(Music theory background preferred.) Create a library for generation of music by programming in Scala. Focus on concisely encoding and specifying existing musical styles or even variations of a given song. Support a full range from creating own compositions to creating programs that accept parameters and generate a variations of a given song following the parameters, while preserving the quality of the melody. Application to education.

A LaTeX-like Document Description Language Embedded within Scala

The idea of this project is to use Scala's support for string interpolation to develop a domain-specific embedded language inside Scala that supports writing documents similar as in LaTeX. The resulting system would allow writing a text document as a Scala string with embedded invocations to Scala functions that replace LaTeX macros. Running the Scala program would produce an output as a PDF file (initially going through LaTeX first).