Query Rewriting for Expressive Ontology Languages
24th September 2010 to 31st December 2013
Background and Motivation:
Critical decisions in industry, science, government,
and healthcareare are increasingly based on information derived from data sources whose number, size and heterogeneity continue
to grow at a phenomenal rate. Improved access to and exploitation of such data is thus becoming critical to both our future
competitiveness and our quality of life. One such example in the telecoms industry relates to the organisation and management
of global communications networks, which has become highly complex, and involves the adjustment in real time of numerous parameters.
Decisions on how to adjust these parameters can have a huge impact on performance and on profitability. These decisions are
based on information derived from a large number of data sources, ranging from historical data to data streaming from monitoring
equipment.
In order to meet the challenges presented by this kind of application, a new generation
of Ontology-Based Information Systems (OBISs) is beginning to emerge. These systems aim to offer high performance and large
storage capacities, while at the same time exploiting rich knowledge about the application domain captured in ontologies;
this knowledge is used to present a unified view over the data sources, provide a domain-centric vocabulary for use in queries,
deal with incomplete and semi-structured data, and enrich query answers with implicit information. The hope is that
OBISs can be realised via a principled synthesis of ontological reasoning and database management techniques in a flexible
architecture that will enable them to access data directly from different kinds of data repository, exploit different kinds
of query answering technology, and adapt to a wide range of different application settings, while still providing formal guarantees
about the quality of query answers.
Currently, all practical approaches to building OBISs rely
on query answering via query rewriting: answers to a query q over an ontology O and a database instance DB are computed by
first rewriting q (using O) into another query q' and then evaluating q' over DB. Query rewriting means that query evaluation
can be delegated to existing information systems and thus exploit the capabilities that they provide, including robust scalability
and the ability to handle rapidly changing data. However, existing query rewriting techniques have focused on reducing the
expressivity of the ontology language in order to obtain formal tractability guarantees; this seriously restricts the modelling
capabilities of the system, while not necessarily delivering robust scalability in practice.
The
goal of the project is to design practically effective query answering algorithms for more expressive ontology languages with
features such as transitivity and equality, and demonstrating that OBISs based on such languages can be effectively deployed
in applications. In achieving these goals we will work closely with Alcatel-Lucent Bell Labs,
who have extensive technical expertise in this area, as well as motivating applications in the telecoms domain.
Technical
Details:
Due to the fundamental trade-off between ontology language expressivity and query
answering complexity, the currently prevalent view is that Ontology-Based Information Systems (OBISs) should use ontology
languages that allow for first-order reducibility of query answering, with the low data complexity of the resulting algorithm
often being given as a supporting argument. Reasoning with individual equality, transitivity, recursive axioms, or any kind
of disjunctive information is LogSpace-hard in data complexity, which prevents first-order reducibility, so these features
were excluded from OBIS languages such as OWL 2 QL. Such features, however, are often needed in practice; for example, transitivity
is needed to describe part-whole relationships, and equality is critical in information integration. Furthermore, even
with very restricted ontology languages, the existing query rewriting algorithms produce rewritings whose size is of the order
((|O| . |q|)^|q|), where |q| and |O| are the sizes of the original query and the ontology, respectively. Data complexity is
useful only when the size of the query is negligible w.r.t. the size of the data; but then, since the rewritings can be large,
the low data complexity of query answering via rewriting does not provide a realistic estimate of performance in practice.
Thus, while seriously restricting the ontology language, first-order reducibility does not provide a practical scalability
guarantee.
The technical challenge of this project is to develop a new approach to query answering
in OBISs. On the one hand, in order to accommodate the expressivity required in practice, more expressive languages should
be considered as targets for query rewriting. On the other hand, finer-grained complexity measures are needed in order to
identify assumptions about the content and usage of the OBIS under which scalability can still be guaranteed.
Regarding more expressive targets for query rewriting, first steps in this direction have already been taken:
if an ontology O is expressed in the DL ELHIO, a conjunctive query q over O can be rewritten into a Datalog query q'. The
polynomial data complexity of Datalog may be too high in general; however, little is known about using languages with data
complexity between AC0 and PTime as targets for query rewriting. Regular path queries (RPQs) are a promising candidate since
they provide a limited form of recursion. RPQs were used as the basis for the query languages in semi-structured databases,
so a rewriting approach based on RPQs should allow for ontology-based access to semi-structured data.
Regarding
finer-grained complexity measures, in related fields such as propositional satisfiability, constraint satisfaction and conjunctive
query answering, measures based on (hyper)treewidth have proved very effective. Adapting such measures to OBISs, and developing
query answering algorithms that exploit them, is thus an important research problem. It is currently unknown whether assumptions
about the structure of data (rather than on the structure of the ontology and/or the query) may be used to obtain practically
effective query answering algorithms. Furthermore, determining classes of ontology languages and/or queries that admit `small'
rewritings is an important and practically relevant open problem.
Some usage scenarios may also
allow alternative architectures for OBISs to be considered. For example, if the ontology and the data change relatively
infrequently, ontology-based ISs could be based on the "combined" query answering techniques, where some entailed
facts are materialised in the data. It is, however, currently unclear whether these can be extended to expressive languages
such as Horn-SHIQ and the related dependency classes. Furthermore, if the data changes more frequently, the cost of recomputing
the preprocessed database instance may be prohibitive; in such cases, however, an incremental maintenance approach may be
efficacious.