Experiments

How to read the tables?

Meaning of the model-checker's results

Model-checker on the instrumented program	Observed state in Litmus	Picture
VERIFICATION FAILED	YES
VERIFICATION SUCCESSFUL	NO

Reading the results in the experiments tables
The table below is an excerpt of former results we obtained for x86/podwr experiments.

In the first column, we can read the name of the original Litmus test (with a link to the file), and the expected violation cycle. The second column gathers the results of the model-checkers for SC. As we are interested in WMM bugs, the verification for SC should be .
The three next columns concern TSO model. The last column tells which result we should get for this program under TSO, namely . We see in the third column that Satabs indeed finds the correct result when checking the instrumented program. ESBMC in this example is missing the counterexample. Threader is either returning an error, or killed by timeout (here, it is an error).
Note that there are two columns for TSO results: in the first one, we instrument all the delays; in the second one, we instrument only one delay per cycle.
We also experiment under PSO, RMO and Power, and with additional model-checkers: additional columns of results (and expected results) appear after TSO's columns.

Name	SC	TSO (all delay/cyc)	TSO (1 delay/cyc)	Expected (TSO)	...
x86/podwr000 Fre PodWR Fre PodWR	SatAbs: 0.60sec ESBMC: 0.30sec Threader: 7.92sec ...	Cycles SatAbs: 3.94sec (x6.56) ESBMC: 8.34sec Threader: ERROR/TIMEOUT[1] 491.99sec ...	Cycles SatAbs: 2.43sec (x4.05) ESBMC: 7.58sec Threader: ERROR/TIMEOUT[1] 182.60sec ...		...

Instrumentation strategies:

all delays/cyc: instruments all the unsafe delay pairs per cycle
1 delay/cyc: instruments one unsafe delay pair per cycle

Delays instrumentation:

for e1 -po-> e2: instrument e1
for e1 -rfe-> e2: instrument e1 and e2

Graphs:

instrumented events in red boxes
only non-uniproc non-thin-air cycles with instrumented events drawn
an empty graph means that there is nothing to instrument

Experiments setting: Intel(R) Xeon(R) CPU X5667 @ 3.07GHz x8
49551096 kB
Linux 2.6.32-37-server x86_64

Contents

Litmus tests for X86/TSO:	Litmus tests for Power/ARM:		C code:
x86/podwr x86/rfi x86/mix x86/safe x86/thin	ppc/podrw ppc/podwr ppc/podww ppc/podrr ppc/posrr ppc/rfi ppc/rfe ppc/lwswr ppc/lwdwr	ppc/podrwposwr ppc/aclwdrr ppc/aclwsrr ppc/bclwdww ppc/bclwsww ppc/cross ppc/mix ppc/safe ppc/thin	standard C programs borrowed C programs real-life C programs

x86 tests

x86/podwr

This family corresponds to litmus tests involving write-read pairs, to exercise typical x86 reordering of write-read pairs scenarios. Therefore they should answer Yes in litmus terms, or Verification Failed in verification terms, as soon as the model is x86/TSO or a weaker one.

x86/rfi

This family corresponds to litmus tests involving internal read-from pairs, to exercise typical x86 store buffering scenarios. Therefore they should answer Yes in litmus terms, or Verification Failed in verification terms, as soon as the model is x86/TSO or a weaker one.

x86/mix

This family corresponds to litmus tests involving either write-read pairs or internal read-from, to exercise typical x86/TSO scenarios. Therefore they should answer Yes in litmus terms, or Verification Failed in verification terms, as soon as the model is x86/TSO or a weaker one. It differs from x86/podwr and x86/rfi in that it involves several diy cycles instead of one.

x86/safe

This family corresponds to litmus tests involving only safe x86 relations. Therefore they should answer No in litmus terms, or Verification Successful in verification terms, if the model is x86/TSO.

x86/thin

This family corresponds to litmus tests involving thin-air (or causal loops) cycles. Therefore they should answer No in litmus terms, or Verification Successful in verification terms, regardless of the model.

ppc tests

ppc/plain/podrw

This family corresponds to litmus tests involving pairs showing that Power can reorder read-write pairs. Therefore they should answer Yes in litmus terms, or Verification Failed in verification terms as soon as the model is Power.

ppc/plain/podwr

This family corresponds to litmus tests involving pairs showing that Power can reorder write-read pairs. Therefore they should answer Yes in litmus terms, or Verification Failed in verification terms as soon as the model is Power.

ppc/plain/podww

This family corresponds to litmus tests involving pairs showing that Power can reorder write-write pairs. Therefore they should answer Yes in litmus terms, or Verification Failed in verification terms as soon as the model is Power.

ppc/plain/podrr

This family corresponds to litmus tests involving pairs showing that Power can reorder read-read pairs. Therefore they should answer Yes in litmus terms, or Verification Failed in verification terms as soon as the model is Power.

ppc/plain/posrr

This family corresponds to litmus tests involving pairs showing that Power can reorder read-read pairs. Therefore they should answer Yes in litmus terms, or Verification Failed in verification terms as soon as the model is Power.

ppc/nc/lwdwr

This family corresponds to litmus tests involving pairs showing that Power's lwsync barrier cannot enforce any ordering when placed between a write and a read. Therefore they should answer Yes in litmus terms, or Verification Failed in verification terms as soon as the model is Power.

ppc/nc/lwswr

This family corresponds to litmus tests involving pairs showing that Power's lwsync barrier cannot enforce any ordering when placed between a write and a read. Therefore they should answer Yes in litmus terms, or Verification Failed in verification terms as soon as the model is Power.

ppc/plain/rfe

This family corresponds to litmus tests showing that Power relaxes the atomicity of writes. Therefore they should answer Yes in litmus terms, or Verification Failed in verification terms as soon as the model is Power.

ppc/plain/rfi

This family corresponds to litmus tests showing that Power authorises store buffering. Therefore they should answer Yes in litmus terms, or Verification Failed in verification terms as soon as the model is Power.

ppc/plain/podrwposwr

This family corresponds to litmus tests involving pairs showing that Power can reorder read-read pairs. Therefore they should answer Yes in litmus terms, or Verification Failed in verification terms as soon as the model is Power.

ppc/ac/aclwdrr

This family corresponds to litmus tests involving pairs showing that Power's lwsync barrier cannot enforce an A-cumulative ordering when placed between two reads. Therefore they should answer Yes in litmus terms, or Verification Failed in verification terms as soon as the model is Power.

ppc/ac/aclwsrr

This family corresponds to litmus tests involving pairs showing that Power's lwsync barrier cannot enforce an A-cumulative ordering when placed between two reads. Therefore they should answer Yes in litmus terms, or Verification Failed in verification terms as soon as the model is Power.

ppc/bc/bclwdww

This family corresponds to litmus tests involving pairs showing that Power's lwsync barrier cannot enforce a B-cumulative ordering when placed between two writes. Therefore they should answer Yes in litmus terms, or Verification Failed in verification terms as soon as the model is Power.

ppc/bc/bclwsww

This family corresponds to litmus tests involving pairs showing that Power's lwsync barrier cannot enforce a B-cumulative ordering when placed between two writes. Therefore they should answer Yes in litmus terms, or Verification Failed in verification terms as soon as the model is Power.

ppc/safe

This family corresponds to litmus tests involving only safe Power relations. Therefore they should answer No in litmus terms, or Verification Successful in verification terms, if the model is Power.

ppc/cross

This family corresponds to litmus tests involving pairs exercising typical Power scenarios. Therefore they should answer Yes in litmus terms, or Verification Failed in verification terms, as soon as the model is Power. It involves several diy cycles instead of one.

ppc/mix

This family corresponds to litmus tests involving pairs exercising typical Power scenarios. Therefore they should answer Yes in litmus terms, or Verification Failed in verification terms as soon as the model is Power. It involves several diy cycles instead of one.

ppc/thin

This family corresponds to litmus tests involving thin-air (or causal loops) cycles. Therefore they should answer No in litmus terms, or Verification Successful in verification terms, regardless of the model.

C source

For the following set of experiments, which use non-generated code the origin of which is explained for each group below, the verification results even for the SC case must be taken with a grain of salt: the truth values of properties embedded in the code may be affected by the way integer variables are handled, i.e., whether boundedness and overflow are taken into account or not. We tried to work around these inconsistencies by adding additional assumptions, but no such change will eliminate the potential efficiency gains of (unsoundly) treating C integers and mathematical (unbounded) integers. Therefore, again, we solely focus on the soundness of tools, but not their relative efficiency.

Standard C algorithms

This family consists of Dekker, Szymanski, Lamport, Peterson's mutual exclusion algorithms, and Lamport's bakery algorithm. They should answer Yes in litmus terms, or Verification Failed in verification terms, starting from x86/TSO.

Borrowed C examples

This family contains the bug found by Sebastian Burckhart and Madan Musuvathi with their Sober tool, presented in their CAV 2008 paper, and our own fix for it.

It should answer Yes in litmus terms, or Verification Failed in verification terms, starting from x86/TSO. Its fixed version should answer No in litmus terms, or Verification Successful in verification terms, regardless of the model.

Other C programs

On the one hand, this family contains the PostgreSQL bug, the fix proposed by PostgreSQL developers and our own fix for it. It should answer Yes in litmus terms, or Verification Failed in verification terms, starting from RMO. The fix proposed by the developers is not enough, hence should answer Yes in litmus terms, or Verification Failed in verification terms, starting from RMO. Our fix should answer No in litmus terms and Verification Successful in verification terms, regardless of the model.

On the other hand, this family contains our experiments with RCU and Apache. It shows all the cycles (i.e. potential bugs) exhibited by RCU, and verifies Apache.