Natural Language Generation from Program Specifications
General idea Specifications for methods and classes, if they are to be used by theorem provers like Jahob or Isabelle, have to be written in some formal logical language. This can be, for a new or unfamiliar user, quite hard to understand or very easy to misunderstand. If one only wants to get an overview of what the method or class does, it would be more helpful to have it written in natural text. The expert user, on the other hand, could use the generated text to check if the specs he/she has just written also reflect what was meant. There are and have been projects concerned with natural language generation, but mostly the purpose was to translate everything more or less literally. What this project would want to achieve is to be able to extract the most important aspects of the formal specification in some form of a summary.
Previous and Related Work
- Attempto Controlled English :
Formally defined syntax for unambiguous use of natural text. Can be translated to and from FOL.
- Grammatical Framework (GF) :
specifies abstract and concrete grammars for different languages, which can then be used to create linguistically correct text.
- Translating OCL specs from JavaCard API to natural text :
GF has been used successfully to generate 'good' and readable natural text description for existing Java Card API specification. The generation was literal, i.e. translating everything as is, only adapting it to English syntax. Additionally, in this example, formatting was used to increase readability.
- Verification of informal proofs :
The papers describes how to process informal proofs (e.g. proofs how you find them in textbooks) and what is involved. For example, it outlines, which information is implied in 'human' proofs and must be added for a computer to be able to process it.
Looking at method specifications
- postconditions (the most interesting)
- refer to 'old' values, e.g. x=old(x)+1
Techniques and Ideas:
- different forms of FOL or other logics may be more suitable for text generation
- approximation of HOL with FOL
- zoom level: have a parameter that determines how concise the specification should be (heuristics)
- include domain specific implied knowledge, e.g. “index” is always an integer
- use concept of informativity [5, chapter 1] (maybe): if a formula is valid, it contains to 'useful' or 'new' information. If we could add domain specific knowledge to existing formulas and show that the resulting formula is valid, then we could assume that the original formula does not have to be translated, since it does not contain information the user does not know anyway. (May not work, since we need to check for validity…)
- interesting naming conventions in Java : Infer meaning or functionality from method names. Maybe create a Lexicon with meaning of certain high-level words:
- insert, remove, add
- increase (x=old(x)+1)
- start, stop
Since a parser and an AST implementation are already available in Scala (part of Guru project), we'd like to reuse those. The svn repository is here: svn co https://larasvn.epfl.ch/svn-repos/projects/guru/trunk
The Higher-Order Logic syntax tree can be found here: vepar/src/guru/hol/ast
and the pretty-printer is here: vepar/src/guru/hol/printers/isabelle
- manage Scala and understand structure of Guru project
- look at printing FOL in English
- look at examples:
- JML examples from Dutch and Irish voting system (UCD Projects)
- consider a generator for English
 Attempto Controlled English, Attempto Homepage
 Grammatical Framework (GF), http://www.cs.chalmers.se/~aarne/GF/
 Translating Formal Software Specifications to Natural Language, by David A. Burke and Kristofer Johannisson, http://dx.doi.org/10.1007/11422532_4
 Supporting the formal verification of mathematical texts , by Claus Zinn, http://dx.doi.org/10.1016/j.jal.2005.10.010
 Representation and Inference for Natural Language, A First Course in Computational Semantics, by Patrick Blackburn and Johan Bos, http://homepages.inf.ed.ac.uk/jbos/comsem/book1.html
 PhD project from Einar W. Host (several papers), http://www.nr.no/~einarwh/