Natural Language Generation from Program Specifications

General idea Specifications for methods and classes, if they are to be used by theorem provers like Jahob or Isabelle, have to be written in some formal logical language. This can be, for a new or unfamiliar user, quite hard to understand or very easy to misunderstand. If one only wants to get an overview of what the method or class does, it would be more helpful to have it written in natural text. The expert user, on the other hand, could use the generated text to check if the specs he/she has just written also reflect what was meant. There are and have been projects concerned with natural language generation, but mostly the purpose was to translate everything more or less literally. What this project would want to achieve is to be able to extract the most important aspects of the formal specification in some form of a summary.

  • Attempto Controlled English [1]:
    Formally defined syntax for unambiguous use of natural text. Can be translated to and from FOL.
  • Grammatical Framework (GF) [2]:
    specifies abstract and concrete grammars for different languages, which can then be used to create linguistically correct text.
  • Translating OCL specs from JavaCard API to natural text [3]:
    GF has been used successfully to generate 'good' and readable natural text description for existing Java Card API specification. The generation was literal, i.e. translating everything as is, only adapting it to English syntax. Additionally, in this example, formatting was used to increase readability.
  • Verification of informal proofs [4]:
    The papers describes how to process informal proofs (e.g. proofs how you find them in textbooks) and what is involved. For example, it outlines, which information is implied in 'human' proofs and must be added for a computer to be able to process it.

This project

Looking at method specifications

  • signature
  • preconditions
  • postconditions (the most interesting)
    • refer to 'old' values, e.g. x=old(x)+1

Techniques and Ideas:

  • different forms of FOL or other logics may be more suitable for text generation
  • approximation of HOL with FOL
  • zoom level: have a parameter that determines how concise the specification should be (heuristics)
  • include domain specific implied knowledge, e.g. “index” is always an integer
  • use concept of informativity [5, chapter 1] (maybe): if a formula is valid, it contains to 'useful' or 'new' information. If we could add domain specific knowledge to existing formulas and show that the resulting formula is valid, then we could assume that the original formula does not have to be translated, since it does not contain information the user does not know anyway. (May not work, since we need to check for validity…)
  • interesting naming conventions in Java [6]: Infer meaning or functionality from method names. Maybe create a Lexicon with meaning of certain high-level words:
    • insert, remove, add
    • increase (x=old(x)+1)
    • update
    • start, stop


Since a parser and an AST implementation are already available in Scala (part of Guru project), we'd like to reuse those. The svn repository is here: svn co

The Higher-Order Logic syntax tree can be found here: vepar/src/guru/hol/ast

and the pretty-printer is here: vepar/src/guru/hol/printers/isabelle

Next steps

  • manage Scala and understand structure of Guru project
  • look at printing FOL in English
  • look at examples:
  • consider a generator for English


[1] Attempto Controlled English, Attempto Homepage

[2] Grammatical Framework (GF),

[3] Translating Formal Software Specifications to Natural Language, by David A. Burke and Kristofer Johannisson,

[4] Supporting the formal verification of mathematical texts , by Claus Zinn,

[5] Representation and Inference for Natural Language, A First Course in Computational Semantics, by Patrick Blackburn and Johan Bos,

[6] PhD project from Einar W. Host (several papers),