Rich-Model Toolkit: An Infrastructure for Reliable Computer Systems

ABSTRACT

Automated tools that can improve reliability of computer systems have made tremendous progress in the last decade, but their impact is limited by their specialized nature and the lack of interoperability. To address these problems, we propose to develop an infrastructure for automated analysis and synthesis of Rich Models of computer systems. Our Rich Models will have the expressive power of a large fragment of formalizable mathematics, enabling specification and automated analysis of software, hardware, and embedded systems. They will support modeling at a wide range of abstraction levels, from high-level information system architecture to complete software source code and detailed hardware design. The central result of the proposed action will be Rich-Model Toolkit, an infrastructure that precisely defines the meaning of Rich Models, standardizes their representation formats, then develops and deploys tools for automated analysis and synthesis of such models.

BACKGROUND, PROBLEMS

We have recently witnessed the development of a number of very useful tools for automated analysis of particular classes of models: 1) static analysis tools can analyze software source code, automatically constructing mathematical models of millions of pages length, proving non-interference of system components and detecting safety and security errors; 2) description logic reasoners analyze relationships between tens of thousands of terms in medical ontologies and verify their consistency; 3) hardware manufacturers are using SAT solvers, model checkers, and theorem provers to identify and correct subtle errors that could have enormous financial consequences; 4) software vendors are using static analyses supported by state-of-the art automated theorem provers and constraint solvers to prevent software crashes; 5) aircraft manufacturers and space agencies are using analysis tools based on abstract interpretation to eliminate errors in aircraft control software.

Despite these successes, today's automated analysis methods are not widespread in engineering practice, and therefore have a limited economic impact. One factor contributing to this state of affairs are the limitations of the tools themselves: a narrow range of applicability, the lack of automation, specialized input formats, and a passive role of tools in the development process, resulting in an unattractive cost/benefit ratio. Another factor are the social circumstances, such as the lack of quality standards that differentiate formally verified computer systems from systems without formal assurance guarantees.

To address these problems, we must make a conscious effort to unify current specialized algorithms and hide their internal complexity from the users. We therefore propose the notion of Rich Model as a precise universal representation of different types of computer systems and of different levels of detail (from design to implementation, from functional correctness to timing and performance). Driven by this notion, we will coordinate the development of automated analysis and synthesis tools, ensuring that the tools can accept models expressed in the same general-purpose language and enabling communication between tools. We will develop a collection of relevant Rich Models serving as benchmarks for comparing different tools.

Given the complexity of automated analysis and synthesis tasks, it is essential to drive tool development using benchmarks of practical interest. Such initiatives have previously proved successful in advancing SAT solving, constraint solving, satisfiability modulo theory proving, first-order theorem proving, and description logic reasoners. For example, the recent SMT-LIB initiative defined standard input/output formats and interfaces for SMT solvers. Over a period of several years it became supported by a number of SMT solvers, and a number of formal methods client applications. The initiative runs an influential yearly competition, SMT-COMP, and contains a growing repository of over 40,000 benchmarks from academia and industry. Based on these past success stories, we expect Rich Models to lead to more efficient analyses, increase tool interoperability, bridge the gaps between different specialized algorithms, and lead to a much larger class of automatically analyzable problems.

Developing tools that combine different specialized techniques poses non-trivial theoretical, algorithmic, and engineering challenges, such as understanding completeness and complexity, developing efficient search techniques and data structures, and defining interfaces between different components. A collection of Rich Models and tools that communicate using standard Rich Model language will result in Rich-Model Toolkit, an infrastructure for analysis and synthesis that helps developer productivity and the reliability of constructed systems.

The research we plan to coordinate is currently supported by over 10 independent national programs. Ongoing EU efforts that are synergistic with the proposed activities include MOBIUS, High Integrity Java, GAMES ESF Research Networking Programme, and the COST Action IC0701. These activities will be a source of particular classes of models and specialized algorithms, but none of them proposes such a general notion of models and aims at unifying such a broad set of automated reasoning, analysis, and synthesis techniques.

Past EU projects on integration of reasoning techniques included the PROSPER toolkit, funded under the ESPRIT program. Relevant activities in the United States include the integration of formal method tools at SRI and the Bandera tool set at KSU. An emerging world-wide initiative that could build directly on our work is the Verifying Compiler Grand Challenge for Computing Research.

BENEFITS

The Rich-Model Language will be more expressive than any of the languages used to describe analyzable models. It will enable researchers and developers to directly represent the implementations of software and hardware systems, avoiding manual abstraction and lowering the expertise needed to construct sophisticated analysis and synthesis tools. The rich-model language will also foster research in analysis and synthesis algorithms because it will enable researchers to compare a wide range of techniques on a collection of models of practical interest, leading to the exchange of fruitful ideas across different approaches.

Developers and designers of computer systems will directly benefit from the sophisticated tools developed as part of the Rich-Model Toolkit. These tools will detect errors in their designs and implementations, repair errors, and synthesize new implementations from specifications. The developers using such tools will be more productive and will be able to focus more on the creative aspect of the work.

The automation of tools in Rich-Model Toolkit will make tool adoption cost-effective, resulting in higher-quality computer systems, and improving the safety and availability of information technology used by all members of the society. These efforts will reduce the probability of future disasters, such as the 1996 Ariane 5 rocket explosion, 2005 Cryosat satellite crash, and automobile recalls due to software bugs.

OBJECTIVES, DELIVERABLES AND EXPECTED SCIENTIFIC IMPACT

The objective of the COST action is to develop the foundations and tools of the Rich-Model Toolkit infrastructure. Examples of deliverables expected include: 1) the syntax and semantics of the rich-model language; 2) translations between rich-model language and languages Isabelle/HOL, SMT-LIB, TPTP, and OWL; 3) tools that extract Rich Models from software source code, software bytecode, and hardware designs; 4) a collection of Rich Model benchmarks; 5) tools that prove validity of Rich Models, find counterexamples, perform mixed execution, solve optimization problems, synthesize inductive invariants, synthesize realizable models from specifications, and repair models.

Our results will be publicized in leading computer science conferences and journals. We plan to organize a workshop and a competition of automated model analysis, synthesis, and certification tools, coordinating it with current competitions at conferences CADE (CASC), CAV (SMT-COMP), SAT, and MEMOCODE.

SCIENTIFIC PROGRAMME AND INNOVATION

Our research towards the above-mentioned objectives will examine a broad range of problems in automated reasoning, formal methods, programming languages and software engineering. It will include 1) the design of the Rich Model Language and benchmarks, 2) the development of automated reasoning tools that operate on Rich Models, and 3) the development of optimization and synthesis tools for Rich Models.

The design of Rich-Model Language will be inspired by the expressiveness of Isabelle/HOL, while ensuring the efficiency of processing present in more specialized languages, and establishing precise semantic connections with specialized languages. We will also develop formats for expressing manually and automatically constructed proofs of properties of Rich-Models.

To automate reasoning about Rich Models we will develop and implement decision procedures for sets, relations, arrays, and multisets, study their worst-case complexity and usability in practice. We will combine these decision procedures and integrate them with proof search algorithms. In addition to Rich Models as logic fragments, we will consider Rich Models representing infinite-state transition systems, replacing compiler representations with models having precise formal meaning. To analyze such models we will deploy counterexample-driven abstraction refinement and constraint solving techniques. We will also apply description logic reasoners to large models, exploring synergy with program analysis and SMT techniques.

In general, the tools in Rich-Model Toolkit will accept a set of Rich Models and produce a new set of Rich Models (with output in a formally verifiable relationship to the input). This general view will allow us to explore a) optimization problems: combining techniques from SMT, operations research, and computer algebra, and b) repair and synthesis problems: developing game-based synthesis algorithms, solving quantitative synthesis problems, and introducing techniques to improve scalability and usability.

ORGANISATION

The strong need for coordination as well as the fundamental and broad nature of the research involved means that no research program other than COST is appropriate to support our effort at this stage. COST offers an excellent framework for organizing this activity. A number of national programs already support research activities of members. Currently, however, these research activities are carried independently and do not benefit from timely exchanges of ideas that would be provided by the proposed COST action.

Formats of COST activities are suitable for the proposed coordination. The development of representation formats for Rich Models requires discussions of the parties involved, which is best done through joint COST meetings. The core technical work will be done in individual research groups; short-term scientific missions including junior researchers are ideal for fostering intensive interaction in this context.

We will organize our activities within work groups following the thrusts of the scientific program. We will create new work groups in the course of the action, as needed. We are planning the activities for the period of 4 years.