Incremental Update of Datalog Materialisation:
The Backward/Forward Algorithm
This is the webpage in support of the AAAI 2015 submission "
Incremental
Update of Datalog Materialisation: The Backward/Forward Algorithm".
We have extended
RDFox
with a variant of the well known DRed algorithm and with the
Backward/Forward Algorithm (B/F) presented in our submission.
RDFox is an open-source, RAM-based datalog system developed in our
group. A Linux
version of our extension is downloadable as
RDFox-linux.zip usable under Oxford's
licence agreement.
Test Cases and data Sets
In our
paper, we conducted our evaluation on four different test setups, the first two
involving the benchmark data sets
LUBM and
UOBM an the latter two involving the cultural database with two different datalog programs,
Claros-L and
Claros-LE.
For explanations of the directory structure please have a look at further
down.
Our Test Results
We provide the results of our tests as separate pdf documents and csv files:
LUBM-L (
pdf,
csv), and
UOBM-U (
pdf,
csv),
Claros-L (
pdf,
csv),
Claros-LE (
pdf,
csv),
Reproducing Test Results
On each test setup 5 different tests were run where we measured the time, the
number of derivations and the number of checked triples and deleted triples,
respectively for the three different approaches:
Algorithm | required Datasets |
---|
rematerialisation | (E\E–) |
incremental deletion by the DRed algorithm | E,E– |
incremental deletion by the B/F algorithm. | E,E– |
where
E is the original data set and
E–
is a subset of
E that is to be incrementally deleted.
Note that derivations and number of triples need to be measured separately
from the times as the statistical subroutines introduce an asymmetric
overhead.
In order to obtain
E– and
E\E–
we provide the Java utility
FileSplit which is used as follows
java <utility> -cp uk/ac/ox/cs/util/FileSplit -s=27012015 -l=<k> -f=<E>
-o1=<E–>
-o2=<(E\E–)>
-n=<n>
where
- <k> the number of lines to be extracted
- <E> the .ttl file from which the lines are to be extracted. File must be present.
- <E–> the facts that need to be deleted. File will be created.
- <(E\E–)> contains all lines from <E>
except those lines contained in E–. File will be created.
- <n> number of lines contained in <E>
We fixed the parameter -s as given above for our tests. It is the seed used in the
random number generator to obtain reproducible results, when extracting randomly lines
from <
E>. The table below shows the parameter <k> used for the different test cases
and the parameter <n> which only depends on the original data file <E>. Note that the
data files called Claros.ttl for Claros-L and Claros-LE are the same.
Test name | 25% | 50% | 75% | 100% | -n |
LUBM | 2500000 | 5000000 | 7500000 | 10000000 | 133573856 |
UOBM | 17000000 | 34000000 | 51000000 | 68000000 | 254753783 |
Claros-L | 575000 | 1150000 | 1725000 | 2300000 | 18793277 |
Claros-LE | 25000 | 50000 | 75000 | 100000 | 18793277 |
Setting up the Test Environment
To reproduce tests, please extract the archive and use the executable CppRDFox under ../RDFox/lib/
RDFox requires for each test case a specific
directory structure, where the sub-directory
- facts contains the original data set with extension .ttl
- dlog contains the ruleset with extension .dlog
- scripts contains the start-up scripts for the RDFox shell.
- Create the directories dlog, facts and scripts.
- Copy the .dlog file into dlog and the .ttl file into facts.
- Use FileSplit.class and create the files <E–> and
<(E\E–)> in the facts sub-directory.
- Download one of the following script files into the scripts sub-directory and adapt to your needs.
- Start RDFox using the following command from the Linux shell (command line)
<RDFox> -shell <absScript>
where <RDFox> is a suitable path to the executable CppRDFox and
<absScript> is the absolute filename of the script you are starting.
Evaluating the Statistical Output
Times need to be evaluated without statistics! They can then be found in case of
- remat.rdfox as
Materializing rules on 1 thread.
Materialization time: 66.058 s.
- dred.rdfox or bf.rdfox as
Materializing rules incrementally on 1 thread and the backward/forward algorithm.
Materialization time: 0.006 s.
and "…the DRed algorithm." for the DRed algorithm respectively.
Number of Derivations and Number of Checked Facts and Deleted Facts can be found only when statistics
are turned on. In the case of
- remat.rdfox the number of derivations can be found in the line
"… : Derivations"
- dred.rdfox the number of Deleted Facts can be found the second line of
---------------- BACKWARD/FORWARD DELETION: forward reasoning ----------------
3979494 : Non-merged IDB tuples extracted from proved list
the number of Backward Derivatons (Bwd) can be found in
---------------- BACKWARD/FORWARD DELETION: forward reasoning ----------------
⋮
590286 : Non-delayed derivations
the number of Forward Derivations (Fwd) is the sum of the following numbers
---------------- BACKWARD/FORWARD DELETION: deletion propagation ----------------
⋮
59875463 : Derivations
---------------- TUPLE INSERTION ----------------
⋮
29301013 : Matched rule instances
- bf.rdfox the number of Checked Facts can be found in
---------------- BACKWARD/FORWARD DELETION ----------------
⋮
1462837 : Tuples added to the checked list
the number of Backward Derivations can be found in
---------------- BACKWARD/FORWARD DELETION ----------------
⋮
22072685 : Rule instances checked during backward reasoning
the number of Forward Derivations is the sum of the following numbers
---------------- BACKWARD/FORWARD DELETION: forward reasoning ----------------
⋮
7112373 : Matched rule instances during tuple extraction
---------------- BACKWARD/FORWARD DELETION: deletion propagation ----------------
⋮
30574450 : Derivations