Ontology Alignment Evaluation Initiative - OAEI-2013 Campaign

Large BioMed Track


General description

This track consists of finding alignments between the Foundational Model of Anatomy (FMA), SNOMED CT, and the National Cancer Institute Thesaurus (NCI). These ontologies are semantically rich and contain tens of thousands of classes.

UMLS Metathesaurus has been selected as the basis for the track reference alignments (see oaei2013_umls_reference for details). UMLS is currently the most comprehensive effort for integrating independently-developed medical thesauri and ontologies, including FMA, SNOMED CT, and NCI.

The complete datasets for the OAEI 2013 campaign can be downloaded as a zip file (LargeBioMed_dataset_oaei2013.zip [20Mb]) or accessed via the SEALS platform (see dataset identifiers).


Modalities and SEALS support

This track has two main objectives. On the one hand, it intends to evaluate the performance of matching systems when matching real large-scale ontologies (Modality 1). On the other hand, it also aims at creating an error-free (and as complete as possible) reference alignment; to this end, mapping repair systems are also welcome to participate in the track (Modality 2).

Additionally, the outputs of the matching systems participating in Modality 1 will be "harmonised" to create a "silver standard" reference alignment. Participant outputs will also be compared against the silver standard to analyse how much each differs from the outputs of the other systems. See the 2012 harmonisation results. The harmonised (i.e. voted) mapping sets can be downloaded as a zip file (RDF, OWL and TXT formats): voted_mappings_harmo_2012.zip [6.0Mb].
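The voting idea behind the harmonisation can be sketched as follows. This is only an illustrative sketch: the function name harmonise, the min_votes threshold and the toy mappings are assumptions made for the example, and the actual procedure and threshold used by the track organisers are not stated here.

```python
from collections import Counter

def harmonise(system_outputs, min_votes=2):
    """Keep the mappings proposed by at least `min_votes` systems.

    `system_outputs` maps a system name to a set of (source, target) pairs.
    The threshold is a hypothetical parameter, not the track's official value.
    """
    votes = Counter()
    for mappings in system_outputs.values():
        # A set ensures each system votes at most once per mapping.
        votes.update(set(mappings))
    return {mapping for mapping, count in votes.items() if count >= min_votes}

# Toy outputs from three hypothetical systems.
outputs = {
    "sysA": {("fma:Heart", "nci:Heart"), ("fma:Lung", "nci:Lung")},
    "sysB": {("fma:Heart", "nci:Heart"), ("fma:Liver", "nci:Liver")},
    "sysC": {("fma:Heart", "nci:Heart"), ("fma:Lung", "nci:Lung")},
}
silver = harmonise(outputs, min_votes=2)
```

With these toy inputs, the Heart and Lung mappings receive at least two votes and are kept, while the Liver mapping, proposed by a single system, is dropped.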

Regarding the use of background knowledge, the OAEI rules state that a resource (i.e. a third biomedical ontology) especially designed for the test is not allowed. In particular, matching systems using UMLS as background knowledge will have an advantage since the reference alignment is also based on UMLS. Nevertheless, it will be interesting to evaluate the performance of a system with and without specialised background knowledge. Moreover, matching systems using UMLS may be especially helpful in the creation of the proposed "silver standard" reference alignment.

Modality 1: standard matching

For this modality, the generated alignment should be an optimal solution to the matching problem with respect to both recall and precision against the reference alignment. This year we aim to pay special attention to the number of unsatisfiabilities caused by the generated mappings. Thus, we encourage system developers to implement mapping debugging techniques or reuse state-of-the-art techniques.
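As a reminder of how precision and recall relate a generated alignment to a reference alignment, here is a minimal sketch using the standard set-based definitions. The function name evaluate and the toy mappings are assumptions for illustration; this is not the evaluation code run on the SEALS platform.

```python
def evaluate(system, reference):
    """Return (precision, recall, f_measure) for two sets of mappings.

    Each mapping is a hashable item, e.g. a (source, target) pair.
    """
    correct = len(system & reference)
    precision = correct / len(system) if system else 0.0
    recall = correct / len(reference) if reference else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure

# Toy data: the system finds 3 mappings, 2 of which are in the reference.
reference = {("a1", "b1"), ("a2", "b2"), ("a3", "b3"), ("a4", "b4")}
system = {("a1", "b1"), ("a2", "b2"), ("a9", "b9")}
precision, recall, f_measure = evaluate(system, reference)
```

In this toy case precision is 2/3 and recall is 2/4, which shows why both measures are needed: a system can be precise while still missing half of the reference mappings.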

The evaluation of Modality 1 will be run with support of SEALS. This requires that you wrap your matching system in a way that allows us to execute it on the SEALS platform (see OAEI 2013 evaluation details).

Modality 2: mapping repair (optional)

Mapping repair systems are also welcome to provide a revised version of the original UMLS mappings, similar to the refinement currently provided.

The original UMLS-based alignments, which lead to many logical errors, can be downloaded as a zip file (RDF, OWL and TXT formats): oaei2013_umls_original_mappings.zip [4.4Mb].

Modality 2 will be optional and will be run in an 'off-line' way.


Data sets

The Large BioMed Track consists of several matching tasks involving different fragments of the FMA, NCI and SNOMED CT ontologies. The complete datasets for the OAEI 2013 campaign can be downloaded as a zip file [20Mb]. Information about the reference alignment can be found here.

Note that the ontologies have been normalised for the OAEI; as a result, the synonyms of concept names are provided as "rdfs:label" annotations.
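A matcher can collect these synonyms by reading the "rdfs:label" annotations of each class. The sketch below uses only the Python standard library and assumes an RDF/XML serialisation in which labels appear as child elements of owl:Class; the sample document, the example.org IRI and the function name labels_by_class are hypothetical, and the actual track files may be laid out differently.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
OWL = "http://www.w3.org/2002/07/owl#"

# Hypothetical fragment, not taken from the track datasets.
SAMPLE = """<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
    xmlns:owl="http://www.w3.org/2002/07/owl#">
  <owl:Class rdf:about="http://example.org/onto#Heart">
    <rdfs:label>Heart</rdfs:label>
    <rdfs:label>Cardiac organ</rdfs:label>
  </owl:Class>
</rdf:RDF>"""

def labels_by_class(rdf_xml):
    """Map each named class IRI to the list of its rdfs:label values."""
    root = ET.fromstring(rdf_xml)
    result = {}
    for cls in root.iter(f"{{{OWL}}}Class"):
        iri = cls.get(f"{{{RDF}}}about")
        if iri is None:
            continue  # skip anonymous class expressions
        labels = [
            label.text.strip()
            for label in cls.findall(f"{{{RDFS}}}label")
            if label.text
        ]
        result.setdefault(iri, []).extend(labels)
    return result

synonyms = labels_by_class(SAMPLE)
```

For the full-size ontologies a dedicated OWL library would likely be a better fit than plain XML parsing; the sketch only shows where the synonyms live.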

Required input for the SEALS OMT client:


FMA-NCI matching problem

Task 1: FMA-NCI small fragments

This task consists of matching two (relatively) small fragments of FMA and NCI. The FMA fragment contains 3,696 classes (5% of FMA), while the NCI fragment contains 6,488 classes (10% of NCI).

Task 2: FMA-NCI whole ontologies

This task consists of matching the whole FMA and NCI ontologies, which contain 78,989 and 66,724 classes, respectively.


FMA-SNOMED matching problem

Task 3: FMA-SNOMED small fragments

This task consists of matching two (relatively) small fragments of FMA and SNOMED. The FMA fragment contains 10,157 classes (13% of FMA), while the SNOMED fragment contains 13,412 classes (5% of SNOMED).

Task 4: FMA whole ontology with SNOMED large fragment

This task consists of matching the whole FMA, which contains 78,989 classes, with a large SNOMED fragment that contains 122,464 classes (40% of SNOMED).


SNOMED-NCI matching problem

Task 5: SNOMED-NCI small fragments

This task consists of matching two (relatively) small fragments of SNOMED and NCI. The SNOMED fragment contains 51,128 classes (17% of SNOMED), while the NCI fragment contains 23,958 classes (36% of NCI).

Task 6: NCI whole ontology with SNOMED large fragment

This task consists of matching the whole NCI, which contains 66,724 classes, with a large SNOMED fragment that contains 122,464 classes (40% of SNOMED).