A Hybrid Method for Integrating Multiple Ontologies
Trong Hai Duong 1, Ngoc Thanh Nguyen 2, and Geun Sik Jo 1
1 School of Computer and Information Engineering, Inha University, Korea
[email protected], [email protected]
2 Institute of Computer Science, Wroclaw University of Technology, Poland
[email protected]
Abstract. While a variety of research has focused on ontology integration based on simple techniques (e.g.,
element- or structure-level techniques), hybrid approaches combining these simple techniques have not been
explored. In this paper, we describe a hybrid method to integrate multiple ontologies at several levels, such as the
element level, internal structure, and relational structure. A semantic supporting environment (SSE) combining
special domains (e.g., WordNet) and a text corpus is defined in the proposed approach. An enriched ontology model
(EOM) is proposed to reduce the initial complexity of the process of ontology integration. Subsequently, a
semantic network called OnConceptSNet is provided. The relations between the concepts in the OnConceptSNet are
derived from the SSE. An Enhanced Algorithm (EA) is proposed to enhance OnConceptSNet.
Keywords. Knowledge integration, Ontology integration, Semantic network, Meta-rules.
1 Introduction
Ontology has become a “buzz word” in the semantic web and semantic data processing, and its importance is being
recognized in a multiplicity of research fields and application areas, such as knowledge engineering, database design and
integration, information retrieval and extraction, standard search (e.g., Yahoo and Lycos), e-commerce (e.g., Amazon and
eBay), configuration (e.g., Dell and PC-Order), and government intelligence (e.g., DARPA’s High Performance Knowledge
Base (HPKB) program). Ontologies play a central role in facilitating data exchange between several sources.
In general, the problem of ontology integration can be formulated as follows: for given ontologies $O_1, \dots, O_n$, one
should determine an ontology $O$ which could replace them (Gangemi et al. 1998, Pinto and Martins 2001). Ontology
integration is a complex task, since ontologies have various characteristics and forms: their languages, domains, and
structures may differ from each other. Therefore, the authors of (Lee et al. 2006) have suggested an ontology-based
architecture, which provides a solid basis for existing studies on the ontology integration task. Pinto and Martins (2001)
identified the activities which should be performed in the ontology integration process. Recently, there has been an
increased interest in creating tools that support ontology integration. PROMPT (Noy and Musen 2003) is a
semi-automatic, interactive tool for ontology mapping, alignment, versioning, and merging, based on the
Frame paradigm. Noy and Musen have also developed ANCHORPROMPT (2001) for ontology mapping and PROMPTDIFF
(2002) for ontology merging. The limitation of PROMPT is that the two ontologies taking part in the mapping (and merging)
process must be different versions of the same ontology. MAFRA (Maedche et al. 2002) is an ontology mapping
framework using a Semantic Bridge Ontology (SBO). In MAFRA, the similarity between two concepts is calculated mainly by
lexical analysis via WordNet, domain glossaries, bilingual dictionaries, and corpora. No explicit deterministic
heuristics other than lexical ones (e.g., synonyms) are used in the semantic bridge construction. ONION (Mitra and Wiederhold
2002) is a heuristic-based ontology composition system to resolve the terminological heterogeneity using two matching
approaches: linguistic matching via WordNet and instance-based matching via databases. Chimaera (McGuinness et al. 2000)
is an ontology merging and diagnosis tool developed by the Stanford University Knowledge Systems Laboratory (KSL).
With this tool, two semantically identical terms from different ontologies are coalesced so that they are referred to by the
same name in the resulting ontology; the tool then identifies the terms that should be related to each other by subsumption,
disjointness, or instance relationships and provides support for introducing those relationships. GLUE (Doan et al. 2001,
Doan et al. 2002) is a system that employs a multi-strategy machine learning technique based on joint probability distributions.
First, GLUE identifies the similarities of instances. Second, it compares the relations, based on the
instance similarity results. GLUE uses two kinds of base learners: a name learner and a number of content learners.
The purpose of the above mapping tools is not to create a new ontology from multiple ontologies. In this paper, we
propose a new method to integrate multiple ontologies. Our main contributions consist of the following elements:
- An Enriched Ontology Model (EOM) is proposed to enrich the semantics of the concepts in ontologies. It reduces the
initial complexity by matching directly between concepts of the same type, instead of matching blindly or
exhaustively among all concepts.
- A Semantic Supporting Environment (SSE) is defined. It not only provides the semantic relations between
concepts, acquired by combining the knowledge of a special domain (e.g., WordNet) with text corpus discovery,
but also enhances the special domain itself, for example by supplementing it with new relations between concepts.
Moreover, the similarity analysis techniques used in the SSE are combined with instance-based similarity.
- A semantic network called OnConceptSNet is also provided. It allows two concepts to have many relations during the
process of ontology integration. The OnConceptSNet provides a rich semantic environment in which the relations
between concepts enhance themselves.
- An Enhanced Algorithm (EA) is proposed, in which OnConceptSNet is initiated by the static rules and the
knowledge included in the SSE, next enhanced by the meta-rules, and finally reduced by the dynamic rules. The final
OnConceptSNet is the one representing the candidate ontologies.
2 Basic Notions
We assume a real world $(A, V)$ where $A$ is the finite set of attributes and $V$ is the domain of $A$. $V$ can also be seen
as the set of values of the attributes, $V = \bigcup_{a \in A} V_a$ (where $V_a$ is the domain of attribute $a$). In this
paper, we accept the following assumptions:
Definition 1 (Ontology). An ontology is a quintuplet:
$O = (C, \Sigma, I, R, Z)$
where,
– $C$: set of concepts (the classes);
– $I$: set of instances of the concepts;
– $R$: set of binary relations between the concepts from $C$, or between the concepts from $C$ and values
defined in a standard or user-defined data type;
– $Z$: set of axioms, which can be interpreted as integrity constraints or relationships between instances
and concepts. This means that $Z$ is a set of restrictions or conditions (necessary and sufficient) to define
the concepts in $C$;
– $\langle C, \Sigma \rangle$: the taxonomic structure of the concepts from $C$, where $\Sigma$ is the collection of
subsumption relationships ($\sqsubseteq$) between any two concepts from $C$. For two concepts $c_1$ and $c_2 \in C$,
$c_1 \sqsubseteq c_2$ if and only if all instances that are members of $c_1$ are also members of $c_2$, and not vice versa.
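The definition can be made concrete with a small data structure. The following is a minimal sketch, not the authors' implementation: the class name `Ontology`, its field names, and the choice to test subsumption purely on recorded instances are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Ontology:
    """Hypothetical container for the five components of Definition 1."""
    concepts: set = field(default_factory=set)      # the concepts (classes)
    taxonomy: set = field(default_factory=set)      # subsumption pairs (sub, super)
    instances: dict = field(default_factory=dict)   # concept -> set of its instances
    relations: set = field(default_factory=set)     # (name, domain, range) triples
    axioms: list = field(default_factory=list)      # kept opaque in this sketch

    def members(self, concept):
        """Return the instances recorded for a concept (empty set if none)."""
        return self.instances.get(concept, set())

    def is_subsumed_by(self, sub, sup):
        """Extensional test: members of `sub` are members of `sup`, not vice versa."""
        return self.members(sub) < self.members(sup)  # strict subset

    def derive_taxonomy(self):
        """Fill `taxonomy` with every concept pair that passes the extensional test."""
        self.taxonomy = {
            (a, b)
            for a in self.concepts
            for b in self.concepts
            if a != b and self.is_subsumed_by(a, b)
        }
        return self.taxonomy


# Illustrative data only: two concepts with one shared instance.
example = Ontology(
    concepts={"Person", "Student"},
    instances={"Person": {"ann", "bob"}, "Student": {"ann"}},
)
print(example.derive_taxonomy())  # prints {('Student', 'Person')}
```

The strict-subset comparison encodes the "not vice versa" condition: two concepts with identical instance sets are not considered to subsume each other in this sketch.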
$R$ is known as the set of properties. For every $p \in R$, there is a specific domain $D$ and range $\mathcal{R}$ such that
$p: D \to \mathcal{R}$, where $D \subseteq C$. If $\mathcal{R} \subseteq C$ then $p$ is called an object property; if
$\mathcal{R}$ is a set of standard or user-defined data types then $p$ is called a data type property. We assume that
concepts $c$ and $c'$ correspond to the domain and range of property $p$ respectively, where p
The similarity between two concepts $c$ and $c'$ is defined as follows:
( ,)= (, )+ , + (( , )+ ( , )) where 0≤ , , ≤1, + + =1and (, )is the similarity between two labels of concepts ,and
Combining Acquisition Algorithm. The representation of the OnConceptSNet is built or extended, as the initial step,
by acquiring knowledge from WordNet-based and text corpus discovery. We suppose that a relation $R(c, c')$
exists between two concepts $c$ and $c'$ that come from the OnConceptSNet. When comparing a result $R(c, c')$ with the
WordNet-based knowledge, three cases are possible:
1. Both concepts $c$ and $c'$ are in WordNet, and their relation $R(c, c')$ is already in the WordNet database; it is
suggested to update the OnConceptSNet.
2. Both concepts $c$ and $c'$ are in WordNet, but their relation $R(c, c')$ is not; it is suggested to update the
OnConceptSNet and WordNet.
3. The concepts $c$ and $c'$ are not present in WordNet; it is suggested to add these concepts and the corresponding
relation $R(c, c')$ to the knowledge base of Assistant WordNet and to update the OnConceptSNet (just the relation $R(c, c')$).
Here we sketch the collaborative acquisition algorithm, which combines WordNet-based and text corpus discovery to find new
relations between the entities of ontologies for ontology integration tasks (see Figure 2):
- Knowledge of Assistance WordNet is a Concept Net based on the ontology, with the relations is kind of and is equivalent of. It
receives messages from the Feedback component, then updates the relations between the entities of ontologies which is not
- Mining from Text Corpus is the procedure described in (Duong et al. 2008a). It discovers new relations between
the entities of ontologies through the text corpus.
- Ontology Integration Task, presented in the next section, receives the relation $R(c, c')$ and updates
OnConceptSNet.
- Feedback is a cache of new relations and marks (a mark identifies a new relation which should be updated in
Knowledge of Assistance WordNet or WordNet-based).
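The three-case comparison against WordNet can be sketched as a small routing function. This is an illustrative sketch under stated assumptions: WordNet and the Assistant WordNet knowledge base are modelled as plain dictionaries rather than real lexical databases, the function name `acquire_relation` and its parameters are hypothetical, updates are applied directly instead of being queued through the Feedback component, and case 3 is taken to cover any pair where at least one concept is missing from WordNet.

```python
def acquire_relation(c1, c2, relation, wordnet_terms, wordnet_rels, assistant_rels, net):
    """Route a discovered relation R(c1, c2) and return the case number (1, 2 or 3).

    wordnet_terms  -- set of concepts known to the (stand-in) WordNet
    wordnet_rels   -- dict mapping (c1, c2) to a relation name in WordNet
    assistant_rels -- dict standing in for the Assistant WordNet knowledge base
    net            -- dict standing in for the relations of OnConceptSNet
    """
    pair = (c1, c2)
    both_known = c1 in wordnet_terms and c2 in wordnet_terms

    if both_known and wordnet_rels.get(pair) == relation:
        case = 1                            # relation already in WordNet
    elif both_known:
        wordnet_rels[pair] = relation       # case 2: supplement WordNet
        case = 2
    else:
        assistant_rels[pair] = relation     # case 3: store in the assistant base
        case = 3

    net[pair] = relation                    # OnConceptSNet is updated in every case
    return case


# Illustrative data only.
wordnet_terms = {"car", "vehicle", "automobile"}
wordnet_rels = {("car", "vehicle"): "is_kind_of"}
assistant_rels, net = {}, {}

cases = [
    acquire_relation("car", "vehicle", "is_kind_of",
                     wordnet_terms, wordnet_rels, assistant_rels, net),
    acquire_relation("car", "automobile", "is_equivalent_of",
                     wordnet_terms, wordnet_rels, assistant_rels, net),
    acquire_relation("hatchback", "car", "is_kind_of",
                     wordnet_terms, wordnet_rels, assistant_rels, net),
]
print(cases)  # prints [1, 2, 3]
```

Keeping the network update outside the branches reflects that all three cases end with an update of OnConceptSNet; only the destination of the new knowledge differs.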
5 Ontology Integration Strategies
5.1 The OnConceptSNet
In this section, we present a semantic network of the ontologies’ concepts, called OnConceptSNet, which serves to integrate
multiple ontologies and reconcile semantic conflicts between them. The OnConceptSNet builds or extends the
concept representations by acquiring knowledge from WordNet-based, the text corpus, and meta-rules. This knowledge may
change the existing network by adding or deleting nodes and arcs, or by modifying the relations between nodes or the
numerical values, called weights, associated with the arcs.
An OnConceptSNet is a directed loop graph given by a quadruple:
$G = (C^*, R^*, N, M)$
where,
– $C^*$ is a set of nodes representing the concepts that come from $O_1, \dots, O_n$,
– $R^*$ is a set of arcs representing the relations between concepts: semantic equivalence ($\Leftrightarrow$), more general
($\sqsubseteq$), disjoint ($\perp$), overlap ($\asymp$). Each arc is associated with a numerical value, the weight ($w$) of the
relation represented by that arc.
– $N$ is the adjacency matrix of $G$, written $N(G)$, an $n$-by-$n$ matrix in which $n$ is the number of nodes in $G$. Entry
$n_{ij}$ is the number of arcs in $G$ with endpoints $(c_i, c_j)$, $i \neq j$; otherwise entry $n_{ii}$ is used to distinguish
$c_i$ and its corresponding ontology.
– $M$ is the incidence matrix of $G$, written $M(G)$, an $n$-by-$m$ matrix in which $m$ is the number of edges (relations) in
$G$. If $c_i$ is the start point of $r_j$, entry $m_{ij}$ is equal to $-1$. If $c_i$ is the second point of $r_j$, entry
$m_{ij}$ is equal to $w$ ($w > 0$), the weight of the arc; in all other cases it is equal to 0.
– If vertex $v$ is a start point of edge $e$, then $v$ and $e$ are incident.
– The degree of vertex $v$, written $d(v)$, is the number of edges incident to $v$.
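The two matrices can be illustrated on a toy network. The sketch below is an assumption-laden illustration rather than the paper's implementation: the function names `build_matrices` and `degree` are hypothetical, arcs are given as (start, end, weight) triples, loop arcs are rejected for simplicity, and the diagonal of the adjacency matrix (which the text reserves for marking the source ontology) is simply left at 0.

```python
def build_matrices(nodes, arcs):
    """Build the adjacency matrix N and incidence matrix M of a weighted digraph.

    nodes -- list of concept names
    arcs  -- list of (start, end, weight) triples with weight > 0
    """
    index = {name: i for i, name in enumerate(nodes)}
    n, m = len(nodes), len(arcs)
    adjacency = [[0] * n for _ in range(n)]
    incidence = [[0.0] * m for _ in range(n)]

    for j, (start, end, weight) in enumerate(arcs):
        if weight <= 0:
            raise ValueError("arc weights must be positive")
        if start == end:
            raise ValueError("loop arcs are not handled in this sketch")
        s, e = index[start], index[end]
        adjacency[s][e] += 1          # count arcs from start to end
        incidence[s][j] = -1.0        # start point of arc j
        incidence[e][j] = weight      # second point carries the weight
    return adjacency, incidence


def degree(incidence, i):
    """Degree as defined in the text: arcs for which node i is the start point."""
    return sum(1 for entry in incidence[i] if entry == -1.0)


# Illustrative data only: three concepts and three weighted relations.
nodes = ["Car", "Vehicle", "Boat"]
arcs = [("Car", "Vehicle", 0.9), ("Boat", "Vehicle", 0.8), ("Car", "Boat", 0.1)]
N, M = build_matrices(nodes, arcs)
print(N)                 # prints [[0, 1, 1], [0, 0, 0], [0, 1, 0]]
print(degree(M, 0))      # prints 2
```

Because a start point is stored as -1 and weights are strictly positive, a row of the incidence matrix alone is enough to tell outgoing arcs from incoming ones.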
Note that the above-mentioned meta-rules are just some examples of equality meta-rules. Other meta-rules such as the
subsumption, overlap and disjoint meta-rules are not presented here. Moreover, because these meta-rules enhance relations
between the concepts of OnConceptSNet by analyzing relation structure between the concepts, this approach is called
relation structure-based similarity.
6 Multiple Ontologies Integration Progress
Figure 4 below illustrates the ontology integration process, most of whose components have already been presented in the
previous sections. Therefore, in this section, we only discuss how to recognize the identity of concepts, which is the clue for
classifying concepts in the EOM.
Here we show some methods to recognize identities. We assume that all candidate ontologies are transformed to OWL
ontologies. First, we collect all the necessary and sufficient properties of the concept. Second, we represent an identity as a
property of the concept and distinguish it from other properties by its one-to-one functional characteristic between its
domain and range, implementing two different methods as follows:
1. As we know, the identities can be written in OWL by using owl:DatatypeProperty with three restrictions:
owl:FunctionalProperty, owl:InverseFunctionalProperty, and owl:cardinality = 1. Here, we use the following
heuristic to distinguish a DC : If a concept is the one of top-most taxonomy in a given ontology and it contains at