Experiments
How to read the tables?
Meaning of the model-checker's resultsModel-checker on the instrumented program | Observed state in Litmus | Picture |
VERIFICATION FAILED | YES | |
VERIFICATION SUCCESSFUL | NO |
Reading the results in the experiments tables
The table below is an excerpt of former results we obtained for x86/podwr experiments.
- In the first column, we can read the name of the original Litmus test (with a link to the file), and the expected violation cycle. The second column gathers the results of the model-checkers for SC. As we are interested in WMM bugs, the verification for SC should be .
- The three next columns concern TSO model. The last column tells which result we should get for this program under TSO, namely . We see in the third column that Satabs indeed finds the correct result when checking the instrumented program. ESBMC in this example is missing the counterexample. Threader is either returning an error, or killed by timeout (here, it is an error).
- Note that there are two columns for TSO results: in the first one, we instrument all the delays; in the second one, we instrument only one delay per cycle.
- We also experiment under PSO, RMO and Power, and with additional model-checkers: additional columns of results (and expected results) appear after TSO's columns.
Name | SC | TSO (all delay/cyc) | TSO (1 delay/cyc) | Expected (TSO) | ... |
x86/podwr000 Fre PodWR Fre PodWR |
SatAbs: 0.60sec ESBMC: 0.30sec Threader: 7.92sec ... |
Cycles SatAbs: 3.94sec (x6.56) ESBMC: 8.34sec Threader: ERROR/TIMEOUT[1] 491.99sec ... |
Cycles SatAbs: 2.43sec (x4.05) ESBMC: 7.58sec Threader: ERROR/TIMEOUT[1] 182.60sec ... |
... |
Instrumentation strategies:
- all delays/cyc: instruments all the unsafe delay pairs per cycle
- 1 delay/cyc: instruments one unsafe delay pair per cycle
- for e1 -po-> e2: instrument e1
- for e1 -rfe-> e2: instrument e1 and e2
- instrumented events in red boxes
- only non-uniproc non-thin-air cycles with instrumented events drawn
- an empty graph means that there is nothing to instrument
49551096 kB
Linux 2.6.32-37-server x86_64
x86 tests
x86/podwr
^ Overviewx86/rfi
^ Overviewx86/mix
This family corresponds to litmus tests involving either write-read pairs or internal read-from, to exercise typical x86/TSO scenarios. Therefore they should answer Yes in litmus terms, or Verification Failed in verification terms, as soon as the model is x86/TSO or a weaker one. It differs from x86/podwr and x86/rfi in that it involves several diy cycles instead of one. |
x86/safe
This family corresponds to litmus tests involving only safe x86 relations. Therefore they should answer No in litmus terms, or Verification Successful in verification terms, if the model is x86/TSO. |
x86/thin
^ Overviewppc tests
ppc/plain/podrw
^ Overviewppc/plain/podwr
^ Overviewppc/plain/podww
^ Overviewppc/plain/podrr
^ Overviewppc/plain/posrr
^ Overviewppc/nc/lwdwr
^ Overviewppc/nc/lwswr
^ Overviewppc/plain/rfe
^ Overviewppc/plain/rfi
^ Overviewppc/plain/podrwposwr
^ Overviewppc/ac/aclwdrr
^ Overviewppc/ac/aclwsrr
^ Overviewppc/bc/bclwdww
^ Overviewppc/bc/bclwsww
^ Overviewppc/safe
This family corresponds to litmus tests involving only safe Power relations. Therefore they should answer No in litmus terms, or Verification Successful in verification terms, if the model is Power. |
ppc/cross
This family corresponds to litmus tests involving pairs exercising typical Power scenarios. Therefore they should answer Yes in litmus terms, or Verification Failed in verification terms, as soon as the model is Power. It involves several diy cycles instead of one. |
ppc/mix
This family corresponds to litmus tests involving pairs exercising typical Power scenarios. Therefore they should answer Yes in litmus terms, or Verification Failed in verification terms as soon as the model is Power. It involves several diy cycles instead of one. |
ppc/thin
^ OverviewC source
For the following set of experiments, which use non-generated code the origin of which is explained for each group below, the verification results even for the SC case must be taken with a grain of salt: the truth values of properties embedded in the code may be affected by the way integer variables are handled, i.e., whether boundedness and overflow are taken into account or not. We tried to work around these inconsistencies by adding additional assumptions, but no such change will eliminate the potential efficiency gains of (unsoundly) treating C integers and mathematical (unbounded) integers. Therefore, again, we solely focus on the soundness of tools, but not their relative efficiency.Standard C algorithms
This family consists of Dekker, Szymanski, Lamport, Peterson's mutual exclusion algorithms, and Lamport's bakery algorithm. They should answer Yes in litmus terms, or Verification Failed in verification terms, starting from x86/TSO. |
Borrowed C examples
This family contains the bug found by Sebastian Burckhart and Madan Musuvathi with their Sober tool, presented in their CAV 2008 paper, and our own fix for it. It should answer Yes in litmus terms, or Verification Failed in verification terms, starting from x86/TSO. Its fixed version should answer No in litmus terms, or Verification Successful in verification terms, regardless of the model. |
Other C programs
On the one hand, this family contains the PostgreSQL bug, the fix proposed by PostgreSQL developers and our own fix for it. It should answer Yes in litmus terms, or Verification Failed in verification terms, starting from RMO. The fix proposed by the developers is not enough, hence should answer Yes in litmus terms, or Verification Failed in verification terms, starting from RMO. Our fix should answer No in litmus terms and Verification Successful in verification terms, regardless of the model. On the other hand, this family contains our experiments with RCU and Apache. It shows all the cycles (i.e. potential bugs) exhibited by RCU, and verifies Apache. |