LARA

Available Student Projects in LARA - EPFL Internal Page

© 2016-2019 Laboratory for Automated Reasoning and Analysis. All Rights Reserved. Any reproduction of the parts of this page in printed or electronic form requires a prior written permission from Viktor Kunčak.

These projects are suitable for both BSc and MSc projects (including master's thesis projects) for students in the IC school of EPFL, and are suitable for both computer science and communication systems students. Each project is for one student in a given semester. Students report directly to Prof. Viktor Kunčak, even if PhD students or post-doctoral researchers are involved to help with various aspects of the project.

Nature of these projects

Below we list several projects reflecting the current interests of our research group. All projects can be adapted to fit the scope of a semester project (bachelor & master) or master's project. Typically we expect the knowledge of courses such as Computer Language Processing and, at the MSc level, of Formal Verification.

For MSc and PhD projects, please consult useful guidelines in the student projects repository.

Licence for code: Please note that we release software under open source licenses and we request our contributors to agree that EPFL LARA holds copyright of the code that you contribute.

Disclaimer: If you have your own idea, feel free to check with us, but please note that sometimes we cannot accommodate your own project suggestions even if the student is brilliant and the project is excellent. Moreover, even if you select one of the projects listed below, we reserve the right to not offer you the project. By deciding to talk to us you agree that you will not require detailed justification on why we made our negative decision.

Several example projects follow.

Relating Natural Languages to Programming and Specification Languages

Evaluating semantic relatedness of sentences in program documentation

We are interested in processing program source comments and matching sentences that refer to same entities or topics. Our primary concern are doc comments for methods of Java programs. Each such comment has two parts: free-form natural language description of method functionality and a set of sentences starting with an @-tag which refer to specific parts of method declaration and behavior. We would like to find likely matches between sentences from the two parts. For example, in the comment below from the ChatNoir OSS project we would like to establish correspondence between sentences {1,3,4}, {2,6} and leave {5} as a singleton.

/**
      * Parse and return document with the given name. Returns a map with a "content" key for the parsed contents
      * and further parameters from the YAML front matter, which can be passed to the template.
      *
      * Parsed documents are cached.
      *
      * @param docPath document path
      * @param request HTTP request for URL rewriting
      * @return map containing parsed document and meta data, null if document does not exist
      */

In this project we would first need to address tasks of document formatting cleanup (doc comments are frequently written in HTML markdown), breaking comment blocks into sentences, word stemming and POS-tagging for syntactic similarity evaluation. Then we would proceed to semantic similarity evaluation, example tasks being synonym matching applied to software engineering domain (e.g., quit-close becomes likely synonym pair) and identifier matching.

The data will be taken from FunCom and CodeSearchNet parallel corpora of documented Java methods.

Analyzing Word Use in Program Documentation

Stop word removal is one of the most common operations in processing natural language texts. However, such lists are often created in a way that they include semantically valid words in the context of program documentation texts (e.g., the word “to” in the sentence “This method adds an Object to this queue”). We would like to explore stop word lists generation in the context of program documentation, more specifically for Javadoc of the software libraries such as Apache Commons and Google Guava.

Grammatical Framework Controlled Natural Language for Method Return Statements

Use Grammatical Framework to design a Controlled Natural Language (a subset of English) that is suitable for describing conditions of multiple return statements of Java methods in a single sentence. The data will be taken from FunCom and CodeSearchNet parallel corpora of method-comment pairs. The designed grammar should generate (simplified) sentences such as “@return changes or null if the input is empty” for cases like one below (FunCom entry 1339885):

/**
     * Draw a line and add the changes to the undo queue
     * @param line
     * @return the changes.
     */
 
public ChangeSet draw(Collection<LineSegment> line) {
        if (line.isEmpty()) {
            return null;
        }
        ChangeSet changes = new ChangeSet(this);
        for (LineSegment ls : line) {
            addSegment(ls, changes);
        }
        this.changes.add(changes);
        return changes;
    }

Grammatical Framework Controlled Natural Language for Data Structure Operations

Use Grammatical Framework to design a Controlled Natural Language (a subset of English) that is suitable for writing JavaDoc-like descriptions of operations on data structure implementations (such as lists, stacks, queues, hash tables, etc.). For example, the grammar should accept (simplified versions of) descriptions such as this one (from MinMaxPriorityQueue of Guava:

/**
   * Adds the given element to this queue. If this queue has a maximum size, after adding {@code
   * element} the queue will automatically evict its greatest element (according to its comparator),
   * which may be {@code element} itself.
   */
  @CanIgnoreReturnValue
  @Override
  public boolean offer(E element) 

Note that we would like to represent the structure of natural language sentences (such as “adds the given element to this queue”), in addition to method signature, modifiers and @code elements.

Parsing and Printing Combinators

LARA has developed the ScaLL1on parsing framework: https://github.com/epfl-lara/scallion/ that is used in several course as well as in our research. We have several projects related to advancing this framework, including:

  • Implementing new parsing algorithms (such as SLR, LALR, CYK or Earley's algorithm) and benchmarking their performance.
  • Offering a unified API for parsing and lexing, such that both the parser and lexer can be described in a single step.
  • Working on the pretty printing capabilities, implementing strategies to produce nicely indented and spaced pretty printed code.

Further Projects

Verifying JavaScript

Design a translation of a clean subset of JavaScript including higher-order functions and (mostly immutable) objects into Stainless. Map numbers into Double (apply overflow checks as needed), and enable annotations or special function calls to describe types and contracts.

Note that `(f:TObj) ⇒ e` can be encoded as `TObj(x ⇒ e)`.

Verified Parameterized-Length Bitvectors in Stainless

Define and prove correct a library for performing arithmetic (+, -, *, /, %) on modular arithmetic with large integers. Such operations are analogous to Int in that they perform modular arithmetic (unlike the unbounded BigInt), but the number of bits can be user-specified (unlike Int).

Verify Security Protocol

Propose a way to model security protocols in Stainless and perform a case study to verify example protocols.

Libraries and Applications in Stainless

Develop and specify additional applications and/or libraries for Stainless system, along with their specifications. Examples of applications include parsing algorithms, compilers, email clients, web applications (e.g. a news portal). Examples of libraries include data structures, linear algebra libraries, user interface libraries, as well as sound and graphics processing libraries.

Among possible directions is to use http://www.scala-js.org/ to generate Javascript from Stainless. To make such code useful, explore building Stainless libraries that interface to Javascript APIs (focusing to the extent possible on stateless APIs) and build Leon specifications and stubs that approximate the behavior of these APIs to enable verification and synthesis.

Low-Level Back-End for a Stainless Subset

Port the GenC back end from Leon to Stainless. This generates C code from a subset of Leon, and evaluate it on case studies for embedded code, IoT code, or fragments of OS code functinality.

Embedding Scala into Isabelle

Develop an embedding of a subset of Scala into Isabelle and enables its verification using Isabelle/HOL, as in the following work:

Contact us

If one of the projects lists sounds interesting and you believe you are a suitable candidate, please do not hesitate to contact us. In your email, please include your CV, the list of your grades, and a description of projects that you did at EPFL or outside of EPFL.