Program Reversal

Note that this version is obsolete, and contains many errors.

Please see Ersoy Bayramoglu's master's thesis.

Introduction

The undo operation, mostly popularized by text editors, goes back to the ENIAC[] times. It has important uses in areas like debugging, databases, programming language environments, fault tolerant systems, and error recovery.

The goal of this project is to provide memory efficient reverse execution mechanisms. The central ideas are making use of the run-time environment, data structure persistence, and programmer annotations.

Repository for the code and the thesis: https://svn.epfl.ch/svn/reversal/

alternatively: https://svn.epfl.ch/svn/undo/

Here is an introductory presentation: https://documents.epfl.ch/users/b/ba/bayramog/public/talk.pdf

Preliminaries

An undo model is generally defined by the combination of commands, history of these commands, and the undo operator that works to manipulate the history. (The history does not have to be there physically, but even in that case it is good to think in terms of a history.) Different undo models have been proposed over the years. One property that distinguishes different undo models is the representation of the undo operation itself. Some systems like the emacs editor[], provide undo as a primitive operation, that is the undo operation itself is added to the history and can be undone. In this setting, there is no need for an explicit redo operation. Other systems like the COPE interactive program editor[], provide undo as a meta-command. In this model, undo manipulates the history, but never appears in it. These systems typically have explicit redo meta-commands.

The second distinguishing feature is whether the undo model provides a linear undo operation or a selective one. Linear undo can only undo the effects of the most recently performed operation in the history file. In order to undo the effects of another operation, the user has to undo all the operations that were performed after that. In a selective undo model, a user can undo the effects of an arbitrary operation in the history without dealing with the following operations. One important difference between different types of selective undo models is how they deal with dependencies between operations. Some models ignore the dependencies. COPE's script model[] is one of them. Programs in the script model are a sequence of scripts. The user can selectively undo the effects of any script executed before. COPE does not undo the effects of he scripts that depend on the undone script. Other selective undo models advocate undoing the effects all of the subsequent operations that depend on the undone operation.

Undo can help users recover from their errors in interactive systems. Brown[] argues that application level undo offers a very limited form of error recovery. It cannot account for problems like operator errors made during upgrades and reconfiguration, external viruses and hacker attacks. To address these issues he offers system wide undo, which can restore system-level state including operating system and application configuration, the executable binaries for the OS and application software, and software patches, libraries and modules.

Applications

Debugging: While debugging, it may be desirable to go few steps back without running the whole program again. Usually the errors cannot be spotted easily by just setting a break point, and in that case being able to go back comes handy.

The best paper in the area: Instruction-level Reverse Execution for Debugging

Extended Technical Report Tankut Akgul's PhD Thesis

Later, they augmented their algorithm with dynamic slicing in order to reverse just the instructions related to the bug in consideration: A Fast Assembly Level Reverse Execution Method via Dynamic Slicing

Stateful Interactive Applications: In principle, users of any interactive application containing state would benefit from having an undo option, so that they can recover from their errors. There has been some general work done in the area like Archer et al[] and Leeman[] but they fail to give any examples of such applications other than the well known editors.

Backtracking in Constraint Programming: Constraint Programming heavily relies on search and backtracking in given domains. Custom solvers are able to handle it all; however, when a programmer implements his own constraint he has to handle backtracking in his data structures himself. Usually there are many constraints interacting with each other, so getting backtracking right is often tricky. It would be good if backtracking was done automatically, and the programmers had means of expressing that some variables and structures will be handled by the solver, but the others have to be taken care of.

Source Language

e := n | x | e₁ ; e₂ | x : T = e | x := e
| new C{f₁ = e₁, . . . , f_n = e_n} | e.f_i
| e₁.f_i := e₂ | e.m(e₁, . . . , e_n)
| new[e₀, . . . , e_n-1] of T | e₁[e₂] := e₃
| e₁[e₂] | cast[C](e) | cast[C[ ]](e)
| true | false | e₁ op e₂ | while(e₁) e₂ | if (e₁) then e₂ else e₃

op := < | > | == | ⇐ | >= | + | - | * | / | && | ||

mdecl := T m(x₁ : T₁, . . . , x_n : T_n) = e

cdecl := C extends B{f₁ : T₁, . . . , f_n : T_n, mdecl₁, . . . ,mdecl_k}

prog := cdecl₁, . . . , cdecl_n in e

Note: I will change the syntax to make it a strict subset of Java. (and I may add the keyword “super”)

Persistence

A data structure is called partially persistent if all versions can be accessed, but only the latest version can be modified. It is called fully persistent if the earlier versions can also be modified. Driscoll et al.[] proposed methods for transforming any linked data structure into its persistent version. The easiest method they proposed is the fat node method. The main idea is to allow nodes to become arbitrarily “fat,” i.e., to hold an arbitrary number of values of each field. Each fat node will contain the same information and pointer fields as an ephemeral node (holding original field values), along with space for an arbitrary number of extra field values. Each extra field value has an associated field name and a version stamp. The version stamp indicates the version in which the named field was changed to have the specified value. In addition, each fat node has its own version stamp, indicating the version in which the node was created.

Ephemeral update steps on the fat node structure are simulated as follows. Consider update operation i. When an ephemeral update step creates a new node, a corresponding new fat node, with version stamp i, containing the appropriate original values of the information and pointer fields is created. When an ephemeral update step changes a field value in a node, the corresponding new value to the corresponding fat node are added, along with the name of the field being changed and a version stamp of i. For each field in a node, only one value per version is stored; when storing a field value, if there is already a value of the same field with the same version stamp the old value is overwritten. Original field values are regarded as having the version stamp of the node containing them.

An auxiliary data structure to store the access pointers of the various versions is also needed. This structure consists of an array of pointers for each access pointer name. After update operation i, the current values of the access pointers are stored in the ith positions of these access arrays. With this structure, initiating access into any version takes O(1) time.

Arrays

Persistent arrays are problematic, because accessing the elements of an array takes O(log log m) time, where m is the number of updates on an array. For our purposes even partial persistence seems too strong. Because we need an earlier version of a data structure only when an operation is undone. In an alternative approach, we can simply implement array[T] as array[stack<T>]. Reading an element from the array will be reading the top of the stack of the indexed element (which takes O(1) time), and updating a field will be just pushing the element to the stack of the indexed element (again O(1) time).

Undoing an operation performed on a persistent structure

In a stack based approach going back to a previous version takes a linear scan and modifications on the data structure. In the persistent setting using a previous version of an application does not require these. With some cooperation from the garbage collector the undone versions can be collected efficiently. Therefore, especially for linked structures whose update and access operations can be implemented with constant slowdown, a persistent implementation can be preferable. (Note that the fat node method is not suitable for this. Because in the best case, the fat node will be a tree and there will be logarithmic slow down for accesses)

Line Editor

Node.java

public class Node {
 
	public String line;
	public Node nextLine;
 
	public Node(String ln, Node Next)
	{
		line=ln;
		nextLine=Next;
	}
 
}

Document.java

import java.io.*;
 
public class Document {
 
	private Node Doc;
	private int cur_line;
	private int size;
 
 
	public Document()
	{
		Doc = new Node(null,null);
		cur_line=0;
		size=0;
	}
 
	public Document(Node D, int s)
	{
		Doc = D;
		cur_line=0;
		size = s;
	}
 
	public void insert(Document new_Doc, int line)
	{
		int ctr=1;
		Node temp=Doc;
 
		while(ctr<line)
		{
			temp= temp.nextLine;
			ctr = ctr + 1;
		}
		Node rest = temp.nextLine;
		temp.nextLine = new_Doc.Doc;
		if(rest!=null)
		{
			temp = new_Doc.Doc;
			while(temp.nextLine!=null)
			{
				temp = temp.nextLine;
			}
			temp.nextLine = rest;
		}
		size= size + new_Doc.size;
		cur_line =line + new_Doc.size-1;
	}

After the transformation a new class is created to have versioned entrys in a stack based fat node.

VersionedEntry.java

public class VersionedEntry<T> {
 
	public T entry;
	public int version;
 
	public VersionedEntry(T e, int v)
	{
		entry = e;
		version = v;
	}
}

The transformed Node.java

import java.util.Stack;
 
public class Node {
 
	public int current_version=0;
 
	public Stack<VersionedEntry<String>> line= new Stack<VersionedEntry<String>>();
	public Stack<VersionedEntry<Node>> nextLine=new Stack<VersionedEntry<Node>>();
 
	public Node(String ln, Node Next)
	{
		// This is simplified. Normally the code related to the version number would be more complicated
		// unless we restrict the counstructors with initialization
		line.push(new VersionedEntry<String>(ln,0));
		nextLine.push(new VersionedEntry<Node>(Next,0));
	}
 
}

Transformed Document.java

import java.util.*;
 
public class Document {
 
	public int current_version = 0;
 
	private Stack<VersionedEntry<Node>> Doc;
	private Stack<VersionedEntry<Integer>> cur_line;
	private Stack<VersionedEntry<Integer>> size;
 
	public Document()
	{
		Doc = new Stack<VersionedEntry<Node>>();
		Doc.push(new VersionedEntry<Node>(new Node(null,null),0));
		cur_line=new Stack<VersionedEntry<Integer>>();
		cur_line.push(new VersionedEntry<Integer>(0,0));
		size= new Stack<VersionedEntry<Integer>>();
		size.push(new VersionedEntry<Integer>(0,0));
	}
 
	public Document(Node D, int s)
	{
		Doc = new Stack<VersionedEntry<Node>>();
		Doc.push(new VersionedEntry<Node>(D,0));
		cur_line=new Stack<VersionedEntry<Integer>>();
		cur_line.push(new VersionedEntry<Integer>(0,0));
		size= new Stack<VersionedEntry<Integer>>();
		size.push(new VersionedEntry<Integer>(s,0));
	}
 
	public void insert(Document new_Doc, int line)
	{
		int /*ephemeral*/ prev_version = current_version;
 
		int /*ephemeral*/ ctr = 1;
		Node temp=Doc.peek().entry;
 
		while(ctr<line)
		{
			temp = temp.nextLine.peek().entry;
			ctr = ctr + 1;
		}
		Node rest = temp.nextLine.peek().entry;
 
		temp.nextLine.push(new VersionedEntry<Node>(new_Doc.Doc.peek().entry,prev_version+1)); 
 
		if(rest!=null)
		{
			temp = new_Doc.Doc.peek().entry;
			while(temp.nextLine.peek()!=null)
			{
				temp = temp.nextLine.peek().entry;
			}
			temp.nextLine.push(new VersionedEntry<Node>(rest,prev_version+1));
		}
 
		size.push(new VersionedEntry<Integer>(size.peek().entry + new_Doc.size.peek().entry,prev_version+1));
		cur_line.push(new VersionedEntry<Integer>(line + (new_Doc.size.peek().entry - 1),prev_version+1));
		current_version = current_version + 1;
	}
 
	public void undo()
	{
		if(cur_line.peek().version==current_version)
			cur_line.pop();
		if(size.peek().version==current_version)
			size.pop();
		current_version = current_version - 1;
 
		// travers Doc and pop the entrys with the the version stamp 
		// equal to the current one.
	}
 
}

Related Work

Leeman[] gives a very good overview of the earlier work on undo. Tolmach's PhD. thesis[] has lengthy section on previous time travel mechanisms.

References

[] H. Agrawal, R. A. DeMillo, and E. H. Spafford. An execution backtracking approach to program debugging. IEEE Software, May 1991.

[] T. Akgul. Assembly Instruction Level Reverse Execution for Debugging. PhD. Thesis, Georgia Institute of Technology, Spring 2004.

[] T. Akgul and V. J. Mooney III. Instruction-level reverse execution for debugging. Workshop on Program Analysis For Software Tools and Engineering, 2002.

[] T. Akgul and V. J. Mooney III. Assembly instruction level reverse execution for debugging. ACM Transactions on Software Engineering and Methodology, 13(2):149–198, April 2004.

[] T. Akgul, V. J. Mooney III, and S. Pande. A fast assembly level reverse execution method via dynamic slicing. 26th International Conference on Software Engineering (ICSE '04), 2004.

[] J. E. J. Archer, R. Conway, and F. B. Schneider. User recovery and reversal in interactive systems. ACM Transactions on Programming Languages and Systems, 6(1):1–19, Jan. 1984.

[] T. Berlage. A selective undo mechanism for graphical user interfaces based on command objects. ACM Transactions on Computer-Human Interaction, 1(3):269–294, 1994.

[] A. B. Brown. A Recovery-Oriented Approach to Dependable Services: Repairing Past Errors With System-Wide Undo. PhD. Thesis, UC Berkeley, Dec. 2003.

[] A. B. Brown and D. A. Patterson. Rewind, Repair, Replay: Three R's to Dependability. 10th ACM SIGOPS European Workshop, September 2002.

[] A. B. Brown and D. A. Patterson. Undo for Operators: Building an Undoable E-mail Store. USENIX Annual Technical Conference, June 2003.

[] J. R. Driscoll, N. Sarnak, D. D. Sleator, and R. E. Tarjan. Making data structures persistent. 18th ACM Symposium on Theory of Computing, May 1986.

[] D. Gries. The Science of Programming. Springer-Verlag New York, 1987.

[] G. F. Johnson and D. Duggan. Stores and partial continuations as first-class objects in a language and environment. 15th ACM SIGACT-SIGPLAN Symposium on Principles of Programming Languages, Jan. 1988

[] J. Lee. Dynamic Reverse Code Generation for Backward Execution. Electronic Notes in Theoretical Computer Science, 174(4):37-54, 2007.

[] G. B. Leeman, Jr. A formal approach to undo operations in programming languages. ACM Transactions on Programming Languages and Systems, 8(1):50-87, Jan. 1986.

[] J. G. Morrisett. Refining First Class Stores. ACM SIGPLAN Workshop on State in Programming Languages, 1993.

[] C. Okasaki. Purely Functional Data Structures. Cambridge University Press, 1998.

[] R. M. Stallman. GNU Emacs Manual, 15th Ed. Boston, MA: GNU Press, 2002.

[] T. Teitelbaum and T. Reps. The Cornell Program Synthesizer: A syntax-directed programming environment. Communications of the ACM, 24(9):563-573, Sept. 1981.

[] W. Teitelman. Interlisp Reference Manual. Xerox Palo Alto Research Center, 1978.

[] A. P. Tolmach. Debugging Standard ML. PhD. Thesis, Princeton University, Oct. 1992.

[] A. P. Tolmach and A.W. Appel. Debugging Standard ML without reverse engineering. ACM Conference on Lisp and Functional Programming, June 1990.

TO DO

Write in document:

related work

Examples:

take example of line editor (ledit) with undo (print the line after each operation
vector graphics editor (without graphical display)
look into DPLL(T)