A Massively Scalable Intelligent Information Infrastructure
Ontology-based Data Management Systems (ODMSs) are a new kind of data
management system specifically designed to
manage large semi-structured data
sets needed to power modern intelligent applications. ODMS data is typically
expressed
using formalisms such as the Resource Description Framework (RDF),
the Web Ontology Language (OWL), and the Semantic
Web Rule Language (SWRL).
The main task of an ODMS is to answer queries over a given ontology and data
set, with queries commonly expressed in the SPARQL language.
Reasoning plays a key role in ODMSs, and modern intelligent
applications
commonly require an integration of taxonomic, spatio-temporal, mereological,
and other kinds of reasoning.
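
As a rough illustration of why reasoning matters for query answering, the following minimal sketch applies two RDFS-style taxonomic rules to a toy set of triples and then evaluates a single triple pattern. All names in it (TRIPLES, materialise, instances_of, and the example classes and individuals) are hypothetical and chosen only for illustration; it is a naive fixpoint computation, not the approach of any particular ODMS.

```python
# Toy data set of (subject, predicate, object) triples.
# All identifiers here are hypothetical, chosen for illustration only.
TRIPLES = {
    ("Student", "subClassOf", "Person"),
    ("PhDStudent", "subClassOf", "Student"),
    ("alice", "type", "PhDStudent"),
    ("bob", "type", "Person"),
}


def materialise(triples):
    """Apply two RDFS-style rules until no new triples are derived."""
    closed = set(triples)
    while True:
        sub = [(s, o) for s, p, o in closed if p == "subClassOf"]
        new = set()
        for c, d in sub:
            # Rule 1: subClassOf is transitive.
            for d2, e in sub:
                if d == d2:
                    new.add((c, "subClassOf", e))
            # Rule 2: an instance of a subclass is an instance of the superclass.
            for s, p, o in closed:
                if p == "type" and o == c:
                    new.add((s, "type", d))
        if new <= closed:
            return closed
        closed |= new


def instances_of(triples, cls):
    """Answer the pattern `?x type cls`, roughly
    SELECT ?x WHERE { ?x rdf:type cls } in SPARQL."""
    return sorted(s for s, p, o in triples if p == "type" and o == cls)


print(instances_of(TRIPLES, "Person"))               # explicit data only
print(instances_of(materialise(TRIPLES), "Person"))  # with taxonomic reasoning
```

Over the explicit data alone the query returns only bob; after materialisation it also returns alice, whose membership in Person follows from the class hierarchy. The nested loops rescan the whole triple set on every pass, which hints at why naive approaches do not scale to large data sets.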
ODMSs can and do exploit implementation techniques described in the database
literature. The computational problems
that such systems need to solve,
however, are very hard, so developing robustly scalable systems is extremely
challenging,
usually requiring a combination of heuristics and careful
engineering. Although significant progress has been made and state-of-the-art
ODMSs can now deal with nontrivial data sets, their performance still falls
far short of what is required
by modern "data-hungry" applications. This is
partly due to the sheer size of the data sets that need to be processed,
but
also partly due to the complexity of the computational tasks that need to be
performed.
The main
hypothesis of this project is that the robust scalability required by
modern ODMS applications can only be achieved through
the principled
application of techniques that provide provable performance and/or
tractability guarantees. The use
of such techniques will not only allow for
better and more consistent performance, but will also help ODMS users to
better understand and thus avoid performance bottlenecks. This is to be
achieved by a synthesis of techniques from
three distinct fields:
knowledge representation will provide the necessary reasoning algorithms,
databases will
provide the techniques for scalable data storage and analysis
of the query structure, and mathematical network theory
will provide the
techniques for describing the statistical properties of ontology data.
Combining all of these techniques
with insightful engineering and extensive
optimisation will enable the implementation of a new ODMS with scalability
surpassing that of existing systems by several orders of magnitude. This
project thus aims to lay both the theoretical
and the practical foundations
for a massively scalable intelligent information infrastructure capable of
powering
modern data-intensive applications.