We analyze which of the 22 TPC-H queries can benefit from our
low-level operator for exact confidence computation on
tuple-independent probabilistic databases. For each TPC-H query, we
consider its largest subquery without aggregations and inequality
joins, but with the conf()..group by aggregation in the
outermost select clause. A query with conf()..group by
returns the distinct answer tuples together with their exact
probability.
We consider two flavours of each of these queries: A version with the
original (TPC-H) selection attributes, and a version where we drop all
the selection attributes (a Boolean version). For each of the two
versions, we also consider the case when the existing TPC-H functional
dependencies (FDs) are used to derive a better query signature.
For each query we also state the cost for the evaluation of the
aggregation conf()..group by in terms of secondary-storage
sorting and scans.
The following shorthand notations are used in the query signatures:
Item: lineitem    
Part: part    
Sup: supplier    
PS: partsupp    
Nat: nation    
Reg: region    
Ord: orders    
Cust: customer    
In the query signatures we
use the names of tables to denote the variable columns that originate
from the corresponding tables in the answer to the plain queries
without the conf construct. The computation necessary for
conf uses these variable columns and does not propagate them
in the final results consisting of the distinct tuples with exact
confidences.
TPC-H Query No | TPC-H Query (without aggregations and theta joins) | Query Signature | Cost for conf()..group by |
---|---|---|---|
1 |
Non-Boolean Versionselect l_returnflag, l_linestatus, conf() from lineitem where l_shipdate <= date '1998-09-01' group by l_returnflag, l_linestatus; |
Item* | Sorting on [l_returnflag,l_linestatus] plus one scan. |
TPC-H FDs do not change the signature. | |||
Boolean Versionselect conf() from lineitem where l_shipdate <= date '1998-09-01'; |
Item* | One scan. | |
TPC-H FDs do not change the signature. | |||
2 |
Non-Boolean Versionselect s_acctbal, s_name, n_name, p_partkey, p_mfgr, s_address, s_phone, s_comment, conf() from part, supplier, partsupp, nation, region where p_partkey = ps_partkey and s_suppkey = ps_suppkey and p_size = 15 and p_type like '%BRASS' and s_nationkey = n_nationkey and n_regionkey = r_regionkey and r_name = 'EUROPE' group by s_acctbal, s_name, n_name, p_partkey, p_mfgr, s_address, s_phone, s_comment; |
No hierarchy between subgoals(suppkey)={PS,Sup} and subgoals(nationkey)={Nat,Sup}. | |
Under TPC-H FDs: Part(Reg(Nat(Sup(PS)*)*)) | Sorting on [s_acctbal, s_name, n_name, p_partkey, p_mfgr, s_address, s_phone, s_comment, Part, Reg, Nat, Sup] plus one scan. | ||
The Boolean version is not hierarchical: PS has no key. | |||
3 |
Non-Boolean Versionselect l_orderkey, o_orderdate, o_shippriority, conf() from customer, orders, lineitem where c_mktsegment = 'BUILDING' and c_custkey = o_custkey and l_orderkey = o_orderkey and o_orderdate < date '1992-01-10' and l_shipdate > date '1992-01-10' group by l_orderkey, o_orderdate, o_shippriority; |
(Cust*(Ord*(Item)*)*)* | Sorting on [l_orderkey, o_orderdate, o_shippriority, Cust, Ord] plus three scans. |
Under TPC-H FDs: Cust(Ord(Item)*). | Sorting on [l_orderkey, o_orderdate, o_shippriority, Cust, Ord] plus one scan. | ||
Boolean Versionselect conf() from customer, orders, lineitem where c_mktsegment = 'BUILDING' and c_custkey = o_custkey and l_orderkey = o_orderkey and o_orderdate < date '1992-01-10' and l_shipdate > date '1992-01-10'; |
No hierarchy between subgoals(orderkey)={Ord,Item} and subgoals(custkey)={Ord,Cust}. | ||
Under TPC-H FDs: (Cust(Ord(Item*))*)* | Sorting on [Cust, Ord] plus one scan. | ||
4 |
Non-Boolean Versionselect o_orderpriority, conf() from orders, lineitem where o_orderdate >= date '1993-07-01' and o_orderdate < date '1993-10-01' and l_orderkey = o_orderkey and l_commitdate < l_receiptdate group by o_orderpriority; |
(Ord*Item*)* | Sorting on [o_orderpriority, Ord] plus two scans. |
Under TPC-H FDs: (Ord(Item)*)*. | Sorting on [o_orderpriority, Ord] plus one scan. | ||
Boolean Versionselect conf() from orders, lineitem where o_orderdate >= date '1993-07-01' and o_orderdate < date '1993-10-01' and l_orderkey = o_orderkey and l_commitdate < l_receiptdate; |
(Ord*Item*)* | Sorting on [Ord] plus two scans. | |
Under TPC-H FDs: (Ord(Item)*)*. | Sorting on [Ord] plus one scan. | ||
6 |
Non-Boolean VersionThe same as the Boolean version. |
||
Boolean Versionselect conf() from lineitem where l_shipdate >= '1994-01-01' and l_shipdate < '1995-01-01' and l_discount >= 0.05 and l_discount <= 0.07 and l_quantity < 24; |
Item* | One scan. | |
TPC-H FDs do not change the signature. | |||
7 |
Non-Boolean Versionselect s_suppkey, n1.n_name, conf() from supplier, lineitem, orders, customer, nation n1, nation n2 where s_suppkey = l_suppkey and o_orderkey = l_orderkey and c_custkey = o_custkey and s_nationkey = n1.n_nationkey and c_nationkey = n2.n_nationkey and ((n1.n_name = 'GERMANY' and n2.n_name = 'FRANCE') or (n1.n_name = 'FRANCE' and n2.n_name = 'GERMANY')) and l_shipdate between '1995-01-01' and '1996-12-31' group by s_suppkey, n1.n_name; |
(Nat1*(Sup*(Nat2*(Cust*(Ord*Item*)*)*)*)*)*
Remark: Self-join on Nation, but the two copies of Nation use distinct tuples to create an answer tuple. |
Sorting on [s_suppkey, n1.n_name, Nat1, Sup, Nat2, Cust, Ord] plus six scans. |
Under TPC-H FDs: Nat1(Sup(Nat2(Cust(Ord(Item)*)*)*)*) | Sorting on [s_suppkey, n1.n_name, Nat1, Sup, Nat2, Cust, Ord] plus one scan. | ||
The Boolean version is not hierarchical. | |||
10 |
Non-Boolean Versionselect c_custkey, c_name, c_acctbal, n_name, c_address, c_phone, c_comment, conf() from customer, orders, lineitem, nation where c_custkey = o_custkey and l_orderkey = o_orderkey and o_orderdate >= '1993-01-01' and o_orderdate < '1993-1-10' and l_returnflag = 'N' and c_nationkey = n_nationkey group by c_custkey, c_name, c_acctbal, n_name, c_address, c_phone, c_comment; |
(Nat*(Cust*(Ord*Item*)*)*)* | Sorting on [c_custkey, c_name, c_acctbal, n_name, c_address, c_phone, Nat, Cust, Ord] plus four scans. |
Under TPC-H FDs: Nat(Cust(Ord(Item)*)*). | Sorting on [c_custkey, c_name, c_acctbal, n_name, c_address, c_phone, Nat, Cust, Ord] plus one scan. | ||
Boolean Versionselect conf() from customer, orders, lineitem, nation where c_custkey = o_custkey and l_orderkey = o_orderkey and o_orderdate >= '1993-01-01' and o_orderdate < '1993-1-10' and l_returnflag = 'N' and c_nationkey = n_nationkey; |
No hierarchy between subgoals(orderkey)={Ord,Item} and subgoals(custkey)={Ord,Cust}. | ||
Under TPC-H FDs: (Nat(Cust(Ord(Item)*)*)*)* | Sorting on [Nat, Cust, Ord] plus one scan. | ||
11 |
Non-Boolean Versionselect ps_partkey, conf() from partsupp, supplier, nation where ps_suppkey = s_suppkey and s_nationkey = n_nationkey and n_name = 'GERMANY' group by ps_partkey; |
No hierarchy between subgoals(suppkey)={PS,Sup} and subgoals(nationkey)={Nat,Sup}. | |
Under TPC-H FDs: (Nat(Sup(PS)*)*)*. | Sorting on [ps_partkey, Nat, Sup] plus one scan. | ||
Boolean Versionselect conf() from partsupp, supplier, nation where ps_suppkey = s_suppkey and s_nationkey = n_nationkey and n_name = 'GERMANY'; |
No hierarchy between subgoals(suppkey)={PS,Sup} and subgoals(nationkey)={Nat,Sup}. | ||
Under TPC-H FDs: (Nat(Sup(PS)*)*)*. | Sorting on [ps_partkey, Nat, Sup] plus one scan. | ||
12 |
Non-Boolean Versionselect l_shipmode, conf() from orders, lineitem where orders.o_orderkey = lineitem.l_orderkey and (l_shipmode = 'MAIL' or l_shipmode = 'SHIP') and l_commitdate < l_receiptdate and l_shipdate < l_commitdate and l_receiptdate >= '1992-01-01' and l_receiptdate < '1999-01-01' group by l_shipmode; |
(Ord*(Item)*)* | Sorting on [l_shipmode, Ord] plus two scans. |
Under TPC-H FDs: (Ord(Item)*)*. | Sorting on [l_shipmode, Ord] plus one scan. | ||
Boolean Versionselect conf() from orders, lineitem where orders.o_orderkey = lineitem.l_orderkey and (l_shipmode = 'MAIL' or l_shipmode = 'SHIP') and l_commitdate < l_receiptdate and l_shipdate < l_commitdate and l_receiptdate >= '1992-01-01' and l_receiptdate < '1999-01-01'; |
(Ord*(Item)*)* | Sorting on [l_shipmode, Ord] plus two scans. | |
Under TPC-H FDs: (Ord(Item)*)*. | Sorting on [l_shipmode, Ord] plus one scan. | ||
14 |
Non-Boolean VersionThe same as the Boolean version. |
||
Boolean Versionselect conf() from lineitem, part where l_partkey = p_partkey and l_shipdate >= date '1995-09-01' and l_shipdate < date '1995-10-01'; |
(Part*Item*)* | Sorting on [Part] plus two scans. | |
Under TPC-H FDs: (Part Item*)* | Sorting on [Part] plus one scan. | ||
15 |
Non-Boolean Versionselect s_suppkey, s_name, s_address, s_phone, conf() from supplier, lineitem where s_suppkey = l_suppkey and l_shipdate >= date '1991-10-10' and l_shipdate < date '1992-01-10' group by s_suppkey, s_name, s_address, s_phone; |
(Sup*Item*)* | Sorting on [s_suppkey, s_name, s_address, s_phone, Sup] plus two scans. |
Under TPC-H FDs: Sup(Item)*. | Sorting on [s_suppkey, s_name, s_address, s_phone, Sup] plus one scan. | ||
Boolean Versionselect conf() from supplier, lineitem where s_suppkey = l_suppkey and l_shipdate >= date '1991-10-10' and l_shipdate < date '1992-01-10'; |
(Sup*Item*)* | Sorting on [Sup] plus two scans. | |
Under TPC-H FDs: (Sup(Item)*)* | Sorting on [Sup] plus one scan. | ||
16 |
Non-Boolean Versionselect p_brand, p_type, p_size, conf() from partsupp, part where p_partkey = ps_partkey and p_brand = 'Brand#45' and p_type like 'MEDIUM POLISHED%' and p_size in ( 49, 14, 23, 45, 19, 3, 36, 9 ) group by p_brand, p_type, p_size; |
(Part*PS*)* | Sorting on [p_brand, p_type, p_size, Part] plus two scans. |
Under TPC-H FDs: (Part(PS)*)*. | Sorting on [p_brand, p_type, p_size, Part] plus one scan. | ||
Boolean Versionselect conf() from partsupp, part where p_partkey = ps_partkey and p_brand <> 'Brand#45' and p_type like 'MEDIUM POLISHED%' and p_size in ( 49, 14, 23, 45, 19, 3, 36, 9 ); |
(Part*PS*)* | Sorting on [Part] plus two scans. | |
Under TPC-H FDs: (Part(PS)*)* | Sorting on [Part] plus one scan. | ||
17 |
Non-Boolean VersionThe same as the Boolean version. |
||
Boolean Versionselect conf() from lineitem, part where p_partkey = l_partkey and p_brand = 'Brand#23' and p_container = 'MED BOX'; |
(Part*Item*)* | Sorting on [Part] plus 2 scans. | |
Under TPC-H FDs: (Part(Item)*)* | Sorting on [Part] plus 1 scans. | ||
18 |
Non-Boolean Versionselect c_custkey, c_name, o_orderkey, o_orderdate, o_totalprice, conf() from customer, orders, lineitem where c_custkey = o_custkey and o_orderkey = l_orderkey and c_name = 'Customer#000148949' group by c_custkey, c_name, o_orderkey, o_orderdate, o_totalprice; |
(Cust*(Ord*Item*)*)* | Sorting on [c_custkey, c_name, o_orderkey, o_orderdate, o_totalprice, Cust, Ord] plus three scan. |
Under TPC-H FDs: (Cust(Ord(Item)*)*)* | Sorting on [c_custkey, c_name, o_orderkey, o_orderdate, o_totalprice, Cust, Ord] plus one scan. | ||
Boolean Versionselect conf() from customer, orders, lineitem where c_custkey = o_custkey and o_orderkey = l_orderkey and c_name = 'Customer#000148949'; |
No hierarchy between subgoals(orderkey)={Ord,Item} and subgoals(custkey)={Ord,Cust}. | ||
Under TPC-H FDs: (Cust(Ord(Item)*)*)* | Sorting on [Cust, Ord] plus one scan. | ||
19 |
Non-Boolean VersionThe same as the Boolean version. |
||
Boolean Versionselect conf() from lineitem, part where p_partkey = l_partkey and ( ( p_brand = 'Brand#12' and p_container in ('SM CASE','SM BOX','SM PACK','SM PKG') and l_quantity >= 1 and l_quantity <= 1 + 10 and p_size between 1 and 5 and l_shipmode in ('AIR', 'AIR REG') and l_shipinstruct = 'DELIVER IN PERSON' ) or ( p_brand = 'Brand#23' and p_container in ('MED BAG','MED BOX','MED PKG','MED PACK') and l_quantity >= 10 and l_quantity <= 10 + 10 and p_size between 1 and 10 and l_shipmode in ('AIR', 'AIR REG') and l_shipinstruct = 'DELIVER IN PERSON') or ( p_brand = 'Brand#34' and p_container in ('LG CASE','LG BOX','LG PACK','LG PKG') and l_quantity >= 20 and l_quantity <= 20 + 10 and p_size between 1 and 15 and l_shipmode in ('AIR', 'AIR REG') and l_shipinstruct = 'DELIVER IN PERSON')); |
(Part*Item*)* Remark: The three clauses in the disjunction are mutually exclusive. |
Sorting on [Part] plus two scans. | |
Under TPC-H FDs: (Part(Item)*)* | Sorting on [Part] plus one scan. | ||
20 |
Non-Boolean Versionselect s_suppkey, s_name, s_address, conf() from supplier, nation, partsupp, part where s_suppkey = ps_suppkey and p_partkey = ps_partkey and p_name like 'forest%' and s_nationkey = n_nationkey and n_name = 'CANADA' group by s_suppkey, s_name, s_address; |
(Nat*Sup*)* | Sorting on [s_suppkey, s_name, s_address, Nat] plus four scans. |
Under TPC-H FDs: NatSup | Sorting on [s_suppkey, s_name, s_address, Nat] plus one scan. | ||
The Boolean version is not hierarchical. | |||
21 |
Non-Boolean Versionselect s_suppkey, conf() from supplier, lineitem, orders, nation where s_suppkey = l_suppkey and o_orderkey = l_orderkey and o_orderstatus = 'F' and l_receiptdate > l_commitdate and s_nationkey = n_nationkey and n_name = 'SAUDI ARABIA' group by s_suppkey; |
(Nat*(Sup*(Ord*Item*)*)*)* | Sorting on [s_suppkey, Nat, Sup, Ord] plus four scans. |
Under TPC-H FDs: Nat(Sup(Ord(Item)*)*) | Sorting on [s_suppkey, Nat, Sup, Ord] plus one scan. | ||
The Boolean version is not hierarchical. |
Last update: Jul 2009.