HermiT: Reasoning With Large Ontologies

An ISG project in Knowledge Representation and Reasoning

Description

Ontologies are formal vocabularies of terms, often shared by a community of users. Medicine and the life sciences are among the most prominent application areas of ontologies. For example, the Systematised Nomenclature of Medicine Clinical Terms (SNOMED CT) is a clinical ontology that is being used in the UK National Health Service's National Programme for Information Technology (NPfIT). Other examples include GALEN, the Foundational Model of Anatomy (FMA), the National Cancer Institute (NCI) Thesaurus, and the OBO Foundry, a repository containing about 80 biomedical ontologies.

These ontologies are gradually superseding existing medical classifications and will provide the future platforms for gathering and sharing medical knowledge. Capturing medical records using ontologies will reduce the risk of data misinterpretation and will enable information exchange between different applications and institutions.

Medical ontologies are closely related to description logics (DLs), which provide the formal basis for many ontology languages, most notably the W3C-standardised Web Ontology Language (OWL). All of the ontologies mentioned above are now available in OWL and, therefore, in a description logic. The developers of medical ontologies have recognised the numerous benefits of using DLs, such as the clear and unambiguous semantics of the different modelling constructs, the well-understood tradeoffs between expressivity and computational complexity, and the availability of provably correct reasoners and tools.

The development and application of ontologies crucially depend on reasoning. Ontology classification, i.e., organising classes into a specialisation/generalisation hierarchy, is a reasoning task that plays a major role during ontology development: it helps detect potential modelling errors such as inconsistent class descriptions and missing sub-class relationships. For example, about 180 missing sub-class relationships were detected when the version of SNOMED CT used by the NHS was classified using the DL reasoner FaCT++. Query answering is another reasoning task, used mainly in ontology-based information retrieval; e.g., in clinical applications query answering might be used to retrieve "all patients who suffer from nut allergies".
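The two reasoning tasks described above can be illustrated with a deliberately tiny sketch. It assumes a toy language in which every class is just a conjunction of primitive names, which is far less expressive than OWL, and it is not the algorithm used by HermiT or FaCT++. All class names, primitive names, and patient data below are hypothetical and chosen only for illustration.

```python
# Toy sketch only: classes are conjunctions of primitive names. This is far
# simpler than OWL and is not the algorithm used by HermiT or FaCT++.

# Hypothetical implications between primitive names (child -> parents).
PRIMITIVE_PARENTS = {
    "PeanutTrigger": {"NutTrigger"},
    "NutTrigger": {"FoodTrigger"},
}

# Hypothetical class definitions: each class is a conjunction of primitives.
# No sub-class relationship between these classes is stated explicitly.
DEFINITIONS = {
    "FoodAllergy": {"Allergy", "FoodTrigger"},
    "NutAllergy": {"Allergy", "NutTrigger"},
    "PeanutAllergy": {"Allergy", "PeanutTrigger"},
}

# Hypothetical data: the class asserted for each patient.
PATIENT_DIAGNOSES = {
    "alice": "PeanutAllergy",
    "bob": "FoodAllergy",
    "carol": "NutAllergy",
}


def closure(primitives):
    """Saturate a set of primitives under PRIMITIVE_PARENTS."""
    result = set(primitives)
    frontier = list(primitives)
    while frontier:
        for parent in PRIMITIVE_PARENTS.get(frontier.pop(), ()):
            if parent not in result:
                result.add(parent)
                frontier.append(parent)
    return result


def is_subclass(sub, sup):
    """sub is subsumed by sup if sub entails every conjunct of sup."""
    return DEFINITIONS[sup] <= closure(DEFINITIONS[sub])


def classify():
    """Return the direct (non-redundant) superclasses of every class."""
    ancestors = {
        c: {d for d in DEFINITIONS if d != c and is_subclass(c, d)}
        for c in DEFINITIONS
    }
    # Drop a superclass d if another superclass e already lies below d.
    return {
        c: {d for d in sups
            if not any(d in ancestors[e] for e in sups if e != d)}
        for c, sups in ancestors.items()
    }


def patients_with(cls):
    """Query answering: patients whose asserted class is subsumed by cls."""
    return sorted(p for p, diag in PATIENT_DIAGNOSES.items()
                  if is_subclass(diag, cls))


hierarchy = classify()
nut_patients = patients_with("NutAllergy")
```

In this sketch, classification infers that PeanutAllergy sits directly below NutAllergy, which in turn sits below FoodAllergy, even though neither relationship was stated. The query for NutAllergy then also returns the patient recorded only as having a peanut allergy. Real ontologies use much richer constructs, which is what makes these tasks computationally hard.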

Despite the impressive state of the art, modern medical ontologies pose significant challenges to both the theory and practice of DL-based languages. Existing reasoners can efficiently deal with some large ontologies, such as NCI, but many important ontologies are still beyond the reach of available tools. For example, none of the existing reasoners can successfully classify either GALEN or FMA.

Applications currently need to work around these limitations, e.g., by using subsets of ontologies that can be successfully processed. For example, the version of GALEN typically used in practice contains only about 20% of the axioms of the full version; this reduces the interaction between concepts and thus makes the ontology "processable". This is, however, highly undesirable in practice, because it reduces coverage, weakens the conceptualisation of the domain and may prevent the detection of modelling errors.

Furthermore, the amount of data used with ontologies can be orders of magnitude larger than the ontology itself. For example, the annotation of patients' medical records in a single hospital can easily produce data consisting of hundreds of millions of facts, and aggregation at a national level might produce billions of facts. Existing reasoners cannot cope with such data volumes, especially when ontologies such as GALEN and FMA are used as schemata.

The goal of this project is to develop scalable reasoning algorithms and a prototype implementation that can efficiently deal with large and complex ontologies and large data sets. Developing such a reasoner will be critical to the success of many ontology-based applications.

Support

HermiT is sponsored by the UK Engineering and Physical Sciences Research Council (EPSRC).

Project Summary

Duration

August 2008 to May 2011

Researchers

Ian Horrocks, Boris Motik, Birte Glimm, and Rob Shearer

Sponsors

UK Engineering and Physical Sciences Research Council

Links

The HermiT OWL Reasoner

Key Publications

Rob Shearer, Boris Motik, and Ian Horrocks. HermiT: A Highly-Efficient OWL Reasoner. Published at OWLED 2008 EU, 2008.

Boris Motik, Rob Shearer, and Ian Horrocks. Hypertableau Reasoning for Description Logics. J. of Artificial Intelligence Research, 36:165-228, 2009.

Rob Shearer, Ian Horrocks, and Boris Motik. Exploiting Partial Information in Taxonomy Construction. In Proc. of the 2009 Description Logic Workshop (DL 2009), volume 477 of CEUR (http://ceur-ws.org/), 2009.

Birte Glimm, Ian Horrocks, Boris Motik, and Giorgos Stoilos. Optimising Ontology Classification. In Proc. of the 9th International Semantic Web Conference (ISWC 2010), 2010.

Birte Glimm, Ian Horrocks, and Boris Motik. Optimized Description Logic Reasoning via Core Blocking. In Jürgen Giesl and Reiner Hähnle, editors, Proc. of the Int. Joint Conf. on Automated Reasoning (IJCAR 2010), volume 6173 of Lecture Notes in Artificial Intelligence, pages 457-471. Springer, 2010.