Last Update: 12 Oct 2016, 14:30 CET: Added a minor tip for low-configuration machines
Virtual disk image containing the artifact: popl-paper-184-artifact.tar.gz md5
The above VDI image comes pre-installed with the following artifacts:
Below we provide instructions for running our resource verification system (Leon/Orb) on the benchmarks, and for reproducing the results of the paper. Our tool can also be tried online on some benchmarks at leondev.epfl.ch (Memresource section). To learn more about our tool and to get started writing and verifying new programs with Leon/Orb, please refer to the documentation at http://leondev.epfl.ch/doc/resourcebounds.html.
The following command can be used to run individual source files. However, to reproduce the results presented in the paper we recommend using the scripts described in the subsequent sections.
$ cd ~/leon
$ ./leon --mem --benchmark --timeout=<secs> path-to-file
E.g., try ./leon --mem --benchmark --timeout=60 ./testcases/benchmarks/steps/LazySelectionSort.scala. The tool prints some log messages and the inferred bounds to the console, and dumps the final output and some statistics of the evaluation to a file <filename>-stats.txt in the directory from which the tool was run. For a short description of the command-line options, use leon --help.
As shown in Figure 9 of the paper, a total of 17 benchmarks are used in the evaluation. Each benchmark has two versions: one with a steps bound, which denotes the number of evaluation steps, and another with an alloc bound, which denotes the number of heap-allocated objects. The versions can be found in ~/leon/testcases/benchmarks/steps and ~/leon/testcases/benchmarks/alloc respectively. Each benchmark has a brief description or pointers to the algorithm that it implements. The results of running the tool on these benchmarks are available in the folders inside the directory ~/leon/results/. The folder ~/leon/results/server-results has the results of running the benchmarks on a server with the configuration presented in the paper, and provides more accurate data for the time taken by the tool to analyze a benchmark.
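For orientation, each benchmark function carries its resource bound as a contract containing unknown coefficients (the template) that the tool infers. The fragment below is only an illustrative sketch of the shape of such a contract, not an actual benchmark file: imports are elided, size stands for a user-defined size function, and the steps expression and the ? holes are meant to follow the template syntax covered by the documentation linked above; consult it for the exact form.

// Illustrative sketch of a steps template: the ? marks are the coefficients that Leon/Orb infers.
def reverseRec(l1: List[BigInt], l2: List[BigInt]): List[BigInt] = {
  l1 match {
    case Nil()       => l2
    case Cons(x, xs) => reverseRec(xs, Cons(x, l2))
  }
} ensuring (res => steps <= ? * size(l1) + ?)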
To infer the steps bounds of the benchmarks (shown in Figure 9), use the following commands:
$ cd ~/leon/results
$ mkdir steps; cd steps
$ ../../scripts/steps.sh
To infer the alloc bounds of the benchmarks, replace steps by alloc in the above commands. The script may take a few tens of minutes (depending on the VM configuration) to verify all benchmarks. For slower machines, it is possible to comment out some of the larger benchmarks from the scripts. The benchmarks LazyNumericalRep.scala and Conqueue.scala are among the largest and most time-consuming.
Running the script produces two files for each benchmark: <benchmarkname>-stats.txt and <benchmarkname>.out. The *-stats file provides several statistics in the form of key : value pairs, and contains the bounds inferred for every function (that has a template) in the benchmark. Note that Figure 9 of the paper shows only the bounds inferred for a couple of selected functions in each benchmark (for each resource), whereas *-stats displays every inferred bound. Below we list the functions in each benchmark whose bounds were presented in Figure 9. Reviewers may restrict their attention to these functions in all of the results that follow.
kthMin
PrimesUntilN
nthFib
nthHammingNumber
isPrefixOf
enqueue, dequeue
kthMin
cons, tail and reverse
incAndPay
pushLeftAndPay and concatNonEmpty
lcsSols
levDistSols
hammingList
schedBU
knapsack
parse
viterbiSols
At the end of each stats file there are two tables: Resource Verification Summary and State Verification Summary. The former shows the inferred bounds for every function (including those presented in Figure 9), and the latter shows the result of verifying the correctness invariants needed for proving the resource bounds, which may depend on the state of memoization. All invariants in all benchmarks are verified by the tool and marked as valid. The table also shows which of the SMT solvers (CVC4 or Z3) first succeeded in deciding the generated verification conditions.
Most of the key-value pairs in the stats file present details on the internals of the algorithm. The most relevant entries among these are Total-Time (the column AT of Figure 9), State-Verification-Time and Resource-Verification-Time.
Even though the tool makes a best effort to enforce determinism, minor variations across different runs (although rare) are possible. This is because of the incompleteness of the minimization problem in the presence of nonlinearity and recursive functions, and the nondeterminism in SMT solvers. This means that occasionally the constants inferred by the tool may vary slightly compared to the ones presented in Figure 9. We observed a difference greater than +/- 1 in the inferred constants only in two benchmarks: PackratParsing and Deque (for the steps resource). In both cases the tool computed a more precise bound than the one presented in Figure 9, as the algorithm was able to perform more minimization.
To measure the accuracy of the inferred bounds, we instrument the benchmarks to track the resources (using the ./leon --instrument option), and add drivers that run the benchmarks with inputs that exercise the worst-case behavior. All such instrumented benchmarks (having an executable main function) can be found in the folder ~/leon/RuntimeEvaluation/src/main/scala/steps (or alloc).
The main function of an instrumented benchmark invokes the key functions on several inputs, computes the dynamic resource usage, and compares it against the statically inferred bounds, as described in Section 5 of the paper.
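To give a feel for what these drivers do, the following is a hypothetical, simplified sketch in plain Scala (the object, the worst-case input generator, the toy sorting function, the step counter and the bound below are all invented for illustration; the actual instrumented benchmarks are produced with the help of --instrument and differ in detail):

// Hypothetical driver sketch: run a function on inputs of increasing size, measure its
// resource usage with an explicit counter, and print it next to a statically inferred
// bound, in the <size, static, dynamic> format of the .data files described below.
object ExampleDriver {
  var steps: Long = 0 // stand-in for the counter that instrumentation threads through the code

  // a descending list is a worst-case input for this insertion sort
  def worstCaseInput(n: Int): List[Int] = (n to 1 by -1).toList

  def sort(l: List[Int]): List[Int] = { steps += 1; l match {
    case Nil     => Nil
    case x :: xs => insert(x, sort(xs))
  }}
  def insert(x: Int, l: List[Int]): List[Int] = { steps += 1; l match {
    case Nil     => List(x)
    case y :: ys => if (x <= y) x :: l else y :: insert(x, ys)
  }}

  // invented quadratic bound standing in for the coefficients inferred by the tool
  def staticBound(n: Long): Long = 2 * n * n + 3 * n + 1

  def main(args: Array[String]): Unit =
    for (n <- 1000 to 5000 by 1000) {
      steps = 0
      sort(worstCaseInput(n))
      println(s"$n ${staticBound(n)} $steps")
    }
}

Use the following commands to run the instrumented benchmarks.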
$ cd ~/leon/RuntimeEvaluation/
$ sbt
Once inside the sbt (Scala build tool) console, use the following command:
> run
The run command lists all the available benchmarks that can be executed. Choose the benchmark to run by entering its index. Running a benchmark will produce a set of .data and .report files in the directory ~/leon/RuntimeEvaluation/results/steps/<Benchmarkname>/ (steps can be replaced by alloc). The contents of these files are described shortly.
To compute the summary statistics shown in Figure 10, run the StatsCollector program (listed first) from the sbt console, which outputs the averages shown in Figure 10 to the file ~/leon/RuntimeEvaluation/Figure-10-data.
Running an instrumented benchmark outputs the following files in the directory ~/leon/RuntimeEvaluation/results/steps/<Benchmarkname>/:
dynamic<benchmark-tag>.data
paretoanalysis<benchmark-tag>.report
pareto<index><benchmark-tag>.data
Each .data file contains a list of triples: (a) the size (or another measure) of the input, (b) the statically inferred bound for the input, and (c) the dynamic resource usage or the pareto optimal resource usage. For example, the file dynamicmsort.data has the following entries:
1000 57318 54546
2000 113426 108612
3000 169534 162656
4000 225534 216678
5000 281642 270722
6000 337642 324722
7000 393642 378744
8000 449642 432744
9000 505750 486788
10000 561750 540788
The first line of the file indicates that for an input of size 1000, the statically inferred steps count was 57318 and the runtime steps count was 54546. Using these data for each benchmark, the program StatsCollector computes a summary statistic using the formula shown on page 10 of the paper. This quantity, referred to as dynamic/static * 100, is output by StatsCollector (for each benchmark and resource) to the file ~/leon/RuntimeEvaluation/Figure-10-data.
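For instance, for the first entry above, dynamic/static * 100 = 54546/57318 * 100 ≈ 95.2%. A minimal sketch of this computation in plain Scala, assuming the summary is simply the average of the per-entry ratios over the whitespace-separated triples shown above (the actual StatsCollector may aggregate differently), is:

import scala.io.Source

// Compute the average of dynamic/static * 100 over all <size, static, dynamic> entries.
object RatioSketch {
  def dynamicByStatic(dataFile: String): Double = {
    val ratios = Source.fromFile(dataFile).getLines().toSeq.collect {
      case line if line.trim.nonEmpty =>
        val cols = line.trim.split("\\s+").map(_.toDouble)
        cols(2) / cols(1) * 100 // dynamic / static
    }
    ratios.sum / ratios.size
  }

  def main(args: Array[String]): Unit =
    println(f"dynamic/static * 100 = ${dynamicByStatic(args(0))}%.2f")
}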
As explained in Section 5 of the paper, we also compare the statically inferred bounds to a set of pareto optimal bounds with respect to the dynamic execution. Each pareto optimal bound is obtained by reducing a coefficient in the inferred bound, while preserving the other coefficients, until the bound is violated by a concrete input. The file paretoanalysis<benchmark-tag>.report reports the pareto optimal bound for each benchmark. For instance, the file paretoanalysishams.report (in the folder ~/leon/RuntimeEvaluation/results/steps/CyclicHammingStream/) has the following content:
Statically inferred formula: 129*n + 4
Reducing coeff 0 starting from 4
Lowest possible value for the coefficient is 4
Pareto optimal template: 129*n + 4
First counter-example for 129*n + 3 is at the point 0
Reducing coeff 1 starting from 129
Lowest possible value for the coefficient is 124
Pareto optimal template: 124*n + 4
First counter-example for 123*n + 4 is at the point 8000
As explained in the report, for this benchmark the constant coefficient cannot be reduced any further, and the coefficient of n could at best be reduced to 124 (without increasing the other coefficients) for the given dynamic runs. The report also shows that the data point 8000 violates the bound if that coefficient is reduced to 123.
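The reduction itself amounts to lowering one coefficient while keeping the others fixed, as long as the lowered bound still dominates every dynamically observed usage. A simplified sketch in plain Scala, assuming an integer linear template a*n + b and using toy data mirroring the report above (the actual analysis in the artifact may differ, for example in its search strategy), is:

// Lower the linear coefficient of the bound a*n + b, keeping b fixed, until some
// dynamically observed usage exceeds the bound; return the lowest value that still works.
object ParetoSketch {
  type Runs = Seq[(Long, Long)] // (input size n, dynamically observed usage)

  def holds(a: Long, b: Long, runs: Runs): Boolean =
    runs.forall { case (n, dyn) => a * n + b >= dyn }

  def reduceLinearCoeff(a: Long, b: Long, runs: Runs): Long = {
    var cur = a
    while (cur > 0 && holds(cur - 1, b, runs)) cur -= 1
    cur
  }

  def main(args: Array[String]): Unit = {
    // toy runs chosen so that 124*n + 4 holds but 123*n + 4 fails at n = 8000
    val runs: Runs = Seq((1000L, 123000L), (8000L, 123L * 8000 + 10))
    println(reduceLinearCoeff(129, 4, runs)) // prints 124
  }
}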
The files pareto0hams.data and pareto1hams.data display the statically inferred bound against the first and the second pareto optimal bound (respectively), for each input. Using these data for each benchmark, the program StatsCollector computes the metric optimal/static * 100 (for each benchmark and resource) and outputs it to the file ~/leon/RuntimeEvaluation/Figure-10-data.
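For example, with the static template 129*n + 4 and the pareto optimal template 124*n + 4 from the report above, the per-input ratio (124*n + 4)/(129*n + 4) * 100 is roughly 124/129 * 100 ≈ 96.1% for large n, so (assuming the metric is averaged over inputs in the same way as dynamic/static * 100) one would expect an optimal/static value of about 96% for this benchmark and resource.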
Even though most of the percentages in the Figure-10-data file are identical to the data shown in Figure 10 of the paper, for a few benchmarks there is a slight difference (approximately 1 to 2 percentage points in most such benchmarks) from the data shown in Figure 10. This is because the results in the paper are more exhaustive and include more top-level functions, whose evaluation requires more complex drivers and manual effort. For example, for ease of artifact evaluation, the provided instrumented BottomUpMergeSort benchmark fixes the value of k to a constant and varies only the size of the input (see Figure 9 for the bound of kthMin). However, in the results presented in the paper, k is also varied. The exact data used in the paper are also provided in the directory ~/leon/RuntimeEvaluation/results/archives/.
We have also provided graphical plots of selected functions of the benchmarks, which visualize the relationship between the dynamic resource usage and the statically inferred bounds, in the folder ~/leon/RuntimeEvaluation/results/graphs/. The graphs PackratParsing-pAdd and deque-rotateDrop help visualize the behavior of inner functions, which was summarized in Figure 11 of the paper. The lines in the graphs denote the statically inferred bounds and the dots denote the runtime resource usage.