Aggregating Semantic Annotators
Luying Chen, Stefano Ortona, Giorgio Orsi and Michael Benedikt
Abstract
A growing number of resources are available for extending documents with semantic annotations. While originally focused on a few standard classes of annotations, the annotator ecosystem is becoming increasingly diverse. There are high- and low-level concepts that are supported by multiple annotators, in addition to a range of annotators with highly distinct vocabularies but many semantic interconnections. We will show that both the overlap and the diversity in annotator vocabularies motivate the need for semantic annotation integration: middleware that produces a unified annotation on top of diverse semantic annotators. On the one hand, the diversity of vocabularies allows applications to benefit from the much richer vocabulary that integration makes available. On the other hand, we present evidence that the most widely used individual annotators suffer from serious accuracy deficiencies: the overlap in vocabularies from individual annotators could allow an integrated annotator to boost accuracy by exploiting inter-annotator agreement and disagreement. The integration of semantic annotations leads to new challenges, both compared to usual data integration scenarios and to standard aggregation of machine learning tools. We give an overview of an approach to these challenges that performs ontology-aware aggregation. We introduce an approach that requires no training data, making use of ideas from database repair. We experimentally compare this with a supervised approach, which adapts maximum entropy Markov models to the setting of ontology-based annotations. We further experimentally compare both of these approaches against ontology-unaware supervised approaches and against individual annotators.