The UMLS Metathesaurus [1] (version 2009AA) has been selected as the basis for the track reference alignments. Although the standard UMLS distribution does not directly provide sets of "alignments" (in the OAEI sense) between the integrated ontologies (e.g. FMA, NCI and SNOMED CT), it is relatively straightforward to extract alignment sets from the information provided in the MRCONSO.RRF distribution file (see [2] for details).
However, integrating the (formally represented) UMLS alignments with the input ontologies was found to produce numerous unsatisfiable classes [2].
Since alignment coherence is an aspect of ontology matching that we aim to promote in the Large BioMed track, in previous editions we provided coherent reference alignments by refining the UMLS mappings using the Alcomo (mapping) debugging system [3], LogMap's (mapping) repair facility [4], or both.
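As a rough illustration of the extraction step, the sketch below pairs codes from two source vocabularies whenever they share a UMLS concept identifier (CUI). It is a simplification under stated assumptions, not the procedure of [2]: the function name `extract_mappings` is hypothetical, the column positions (CUI, SAB, CODE) are assumed from the pipe-delimited MRCONSO.RRF layout, and the step that turns source codes into ontology entity IRIs is not covered.

```python
from collections import defaultdict
from itertools import product

# Assumed positions of the relevant columns in a pipe-delimited MRCONSO.RRF row.
CUI, SAB, CODE = 0, 11, 13


def extract_mappings(lines, source_a, source_b):
    """Return (code_a, code_b) pairs whose entries share a CUI.

    lines: iterable of MRCONSO.RRF rows (strings).
    source_a, source_b: source abbreviations as they appear in the SAB column.
    """
    # For each CUI, collect the codes contributed by each of the two sources.
    by_cui = defaultdict(lambda: (set(), set()))
    for line in lines:
        fields = line.rstrip("\n").split("|")
        if len(fields) <= CODE:
            continue  # skip malformed or truncated rows
        if fields[SAB] == source_a:
            by_cui[fields[CUI]][0].add(fields[CODE])
        elif fields[SAB] == source_b:
            by_cui[fields[CUI]][1].add(fields[CODE])

    # Every cross-source pair under the same CUI becomes a candidate mapping.
    mappings = set()
    for codes_a, codes_b in by_cui.values():
        mappings.update(product(codes_a, codes_b))
    return mappings
```

With a real distribution, the function could be fed an open file handle, e.g. `extract_mappings(open("MRCONSO.RRF", encoding="utf-8"), "FMA", "NCI")`; the exact source abbreviations depend on the UMLS release.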
However, concerns were raised about the validity and fairness of applying automated mapping repair techniques to make reference alignments coherent [5].
It is clear that using the original (incoherent) UMLS alignments would penalize ontology matching systems that perform mapping repair. However, using automatically repaired alignments would penalize systems that do not perform mapping repair, as well as systems that employ a repair strategy different from the one used on the reference alignments [5].
Thus, for this year's edition of the Large BioMed track we arrived at a compromise solution that should be fair to all ontology matching systems. Instead of repairing the reference alignments in the usual way, by removing mappings, we flagged the incoherence-causing mappings in the alignments by setting the relation to "?" (unknown). These "?" mappings will be considered neither positive nor negative when evaluating the participating ontology matching systems; they will simply be ignored. This way, systems that do not perform mapping repair are not penalized for finding mappings that (despite causing incoherences) may or may not be correct, and systems that do perform mapping repair are not penalized for removing such mappings.
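The effect of ignoring "?" mappings on precision and recall can be sketched as follows. This is a minimal illustration, not the track's evaluation code: the function name `evaluate` and the data layout (a set of pairs for the system output, a dictionary from pair to relation for the reference) are assumptions made for the example.

```python
def evaluate(system, reference):
    """Compute precision, recall and F-measure while ignoring "?" mappings.

    system: set of (source_entity, target_entity) pairs found by a matcher.
    reference: dict mapping each reference pair to its relation, "=" or "?".
    """
    # Flagged mappings count neither for nor against the system.
    flagged = {pair for pair, rel in reference.items() if rel == "?"}
    positives = set(reference) - flagged
    considered = system - flagged

    true_pos = considered & positives
    precision = len(true_pos) / len(considered) if considered else 0.0
    recall = len(true_pos) / len(positives) if positives else 0.0
    total = precision + recall
    f_measure = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_measure
```

Under this scheme, a system that outputs a flagged mapping and a system that omits it obtain the same scores, all else being equal.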
To ensure that this solution was as fair as possible to all mapping repair strategies, we flagged as unknown all mappings repaired by any of Alcomo, LogMap or AgreementMakerLight [6], as well as all mappings repaired in the reference alignments of last year's edition (using Alcomo and LogMap combined).
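The flagging rule described above amounts to taking the union of the mappings removed by each repair. A minimal sketch, assuming each repair result is available as the set of mappings it kept (the function name `flag_reference` and this data layout are hypothetical):

```python
def flag_reference(original, repaired_versions):
    """Flag as "?" every mapping removed by at least one repair.

    original: set of (source_entity, target_entity) pairs before repair.
    repaired_versions: iterable of sets, each the mappings kept by one repair.
    Returns a dict from pair to relation: "=" if kept by all repairs, else "?".
    """
    removed_by_any = set()
    for repaired in repaired_versions:
        # Mappings in the original but missing after this repair were removed.
        removed_by_any |= original - repaired
    return {pair: "?" if pair in removed_by_any else "=" for pair in original}
```

The resulting dictionary has the same shape as a reference alignment in which removed mappings are retained but marked as unknown.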
These refined UMLS alignments, with repaired mappings flagged as unknown (i.e. "?"), will be the reference alignments for the 2014 edition of the Large BioMed track.
The refined and flagged UMLS-based reference alignment for the OAEI 2014 campaign can be downloaded as a zip file (RDF format): oaei2014_umls_flagged_reference.zip
Please consider citing [1-6] when you use the refined UMLS-based reference alignments.
[1] O. Bodenreider: The Unified Medical Language System (UMLS): integrating biomedical terminology. Nucleic acids research 32 (2004) [url]
[2] E. Jimenez-Ruiz, B. Cuenca Grau, I. Horrocks, and R. Berlanga: Logic-based assessment of the compatibility of UMLS ontology sources. J Biomed. Sem. 2 (2011) [url]
[3] Christian Meilicke. Alignment Incoherence in Ontology Matching. University of Mannheim, Chair of Artificial Intelligence (2011) [url]
[4] E. Jimenez-Ruiz, B. Cuenca Grau: LogMap: Logic-based and scalable ontology matching. In: 10th International Semantic Web Conference, 273-288 (2011) [url]
[5] Catia Pesquita, Daniel Faria, Emanuel Santos, Francisco M. Couto. To repair or not to repair: reconciling correctness and coherence in ontology reference alignments. In OM 2013 workshop. [pdf]
[6] Emanuel Santos, Daniel Faria, Catia Pesquita, Francisco M. Couto. Ontology alignment repair through modularization and confidence-based heuristics. arXiv:1307.5322