Motivation 2: Schemas coming from different languages
A river is a natural stream of water, usually freshwater, flowing toward an ocean, a lake, or another stream. In some cases a river flows into the ground or dries up completely before reaching another body of water. Usually larger streams are called rivers while smaller streams are called creeks, brooks, rivulets, rills, and many other terms, but there is no general rule that defines what can be called a river. Sometimes a river is said to be larger than a creek,[1] but this is not always the case.[2]
(French, translated) A rivière is a watercourse that flows under the effect of gravity and empties into another rivière or into a fleuve, unlike a fleuve, which itself empties into the sea or the ocean.
(Dutch, translated) A rivier is a more or less natural stream of water. We distinguish oceanic rivieren (in Belgium also called stroom), which flow into a sea or ocean, and continental rivieren, which flow into a lake, a marsh, or a desert. A beek is the term for a small rivier. Between beek and rivier there is usually a bijrivier (tributary).
combining heterogeneous data sources under a single query interface
A federated database system is a type of meta-database management system (DBMS) which transparently integrates multiple autonomous database systems into a single federated database.
The constituent databases are interconnected via a computer network, and may be geographically decentralized.
Since the constituent database systems remain autonomous, a federated database system is a contrastable alternative to the (sometimes daunting) task of merging together several disparate databases.
A federated database (or virtual database) is the fully-integrated, logical composite of all constituent databases in a federated database system.
In both GaV and LaV systems, a user poses conjunctive queries over a virtual schema represented by a set of views, or "materialized" conjunctive queries.
Integration seeks to rewrite the queries represented by the views to make their results equivalent or maximally contained by our user's query.
This corresponds to the problem of answering queries using views.
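To make the idea of a single query interface over autonomous sources concrete, here is a minimal GaV-style sketch. It assumes two hypothetical sources with different local schemas and one global relation listing(address, price) defined as a view over them; all names and data are invented for illustration, and real mediators rewrite declarative queries rather than run Python functions.

```python
# Two autonomous sources with different local schemas (hypothetical data).
SOURCE_A = [{"addr": "12 Elm St", "price_usd": 250000}]
SOURCE_B = [{"location": "7 Oak Ave", "cost": 310000}]


def global_listings():
    """GaV: the global relation listing(address, price) is defined
    as a view (here: a union) over the source relations."""
    for row in SOURCE_A:
        yield {"address": row["addr"], "price": row["price_usd"]}
    for row in SOURCE_B:
        yield {"address": row["location"], "price": row["cost"]}


def query_cheaper_than(limit):
    """A selection query posed on the global schema; it is answered
    by unfolding the view definition down to the sources."""
    return [r["address"] for r in global_listings() if r["price"] < limit]


print(query_cheaper_than(300000))
```

The user only sees address and price; the mapping from addr/price_usd and location/cost to the global attributes is exactly what schema matching has to discover.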
Software that creates and manipulates data is the same
All data follows the same structure and data model and is part of a single universe of discourse
Different levels of heterogeneity:
Different languages to write applications
Different query languages
Different models
Different DBMSs
Different file systems
Semantic heterogeneity
etc.
Kinds of relationships between matched elements:
ER-style relationships (is-a, part-of, ...)
Set-oriented relationships (overlaps, contains, ...)
Any other terms that are defined in the expression language used
Semantics of the involved elements often need to be inferred
Often need to base (heuristic) solutions on cues in schema and data, which are unreliable
e.g., homonyms (area), synonyms (area, location)
Schema and data clues are often incomplete, e.g., date: date of what?
Global nature of matching: to choose one matching possibility, must typically exclude all others as worse
Matching is often subjective and/or context-dependent, e.g., does house-style match house-description or not?
Extremely laborious and error-prone process, e.g., Li & Clifton 200: project at GTE telecommunications:
40 databases, 27K elements, no access to the original developers of the DB
Estimated time for just finding and documenting the matches: 12 person years
For target attribute T.LISTINGS.agent-address:
Examine attributes and concatenations of attributes from S
Restrict examined set by analyzing textual properties
Data type information in schema, heuristics (proportion of non-numeric characters etc.)
Evaluate match candidates based on data correspondences, prune inferior candidates
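The search steps above can be sketched as follows. This is a minimal illustration and not the published algorithm: the source columns, the sample values, the 0.5 threshold, the ", " separator, and the exact-overlap score are all assumptions made for the example.

```python
from itertools import permutations

# Hypothetical source schema S (column name -> sample values) and
# sample values of the target attribute T.LISTINGS.agent-address.
SOURCE = {
    "agent-city": ["Athens", "Raleigh"],
    "agent-state": ["GA", "NC"],
    "price": ["250000", "310000"],
}
TARGET = ["Athens, GA", "Raleigh, NC"]


def non_numeric_ratio(values):
    """Heuristic textual property: proportion of non-digit characters."""
    chars = "".join(values)
    if not chars:
        return 0.0
    return sum(not c.isdigit() for c in chars) / len(chars)


def candidates(source, min_text_ratio=0.5):
    """Yield single textual attributes and ordered pairwise concatenations."""
    textual = [name for name, col in source.items()
               if non_numeric_ratio(col) >= min_text_ratio]
    for name in textual:
        yield (name,), source[name]
    for a, b in permutations(textual, 2):
        yield (a, b), [x + ", " + y for x, y in zip(source[a], source[b])]


def score(values, target_values):
    """Data correspondence: fraction of candidate values found in the target."""
    target = set(target_values)
    return sum(v in target for v in values) / len(values)


def best_match(source, target_values):
    """Evaluate all candidates and keep the highest-scoring one."""
    scored = [(score(vals, target_values), names)
              for names, vals in candidates(source)]
    return max(scored)


print(best_match(SOURCE, TARGET))
```

The numeric price column is excluded before any concatenation is tried, which is the point of restricting the examined set: the number of concatenations grows quickly with the number of attributes.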
What is ontology matching (relative to schema matching)?
same basic idea
but works on ontologies that are conceptual models (not on logical schemas such as relational tables or XML trees)
emphasizes that concepts and relations need to be matched and mapped, and may treat these differently
(Note: in the schema matching literature, it is not always clearly laid out whether the matched items come from a conceptual or a logical model; the toy examples above in particular are also conceptual)
In practice, some ontology matching tasks in fact work on such simple models (or simple subparts of models) that they do not differ at all from what we have seen so far
example: Anatomy task, see below in evaluation
Terminology: Also known as ontology alignment. See (Shvaiko & Euzenat, 2005) for more details.
How could this give rise to a mapping/matching problem?
A river is a natural stream of water, usually freshwater, flowing toward an ocean, a lake, or another stream. In some cases a river flows into the ground or dries up completely before reaching another body of water. Usually larger streams are called rivers while smaller streams are called creeks, brooks, rivulets, rills, and many other terms, but there is no general rule that defines what can be called a river. Sometimes a river is said to be larger than a creek,[1] but this is not always the case.[2]
(French, translated) A rivière is a watercourse that flows under the effect of gravity and empties into another rivière or into a fleuve, unlike a fleuve, which itself empties into the sea or the ocean.
(Dutch, translated) A rivier is a more or less natural stream of water. We distinguish oceanic rivieren (in Belgium also called stroom), which flow into a sea or ocean, and continental rivieren, which flow into a lake, a marsh, or a desert. A beek is the term for a small rivier. Between beek and rivier there is usually a bijrivier (tributary).
Sometimes a class needs to restrict the range of a property
[Figure: class hierarchy diagram with the classes NaturallyOccurringWaterSource, BodyOfWater, Stream, Ocean, Lake, Sea, River, Tributary, Brook, Rivulet, Fleuve]
Properties: emptiesInto: BodyOfWater
Since Fleuve is a subclass of River, it inherits emptiesInto. The range for emptiesInto is any BodyOfWater. However, the definition of a Fleuve (French) is: "a River which emptiesInto a Sea". Thus, in the context of the Fleuve class we want the range of emptiesInto restricted to Sea.
Note for nerds: Why does this use „rdf:ID“ and not „rdf:about“ (as FOAF does)?
“As for choosing between rdf:ID and rdf:about, you will most likely want to use the former if you are describing a resource that doesn't really have a meaningful location outside the RDF file that describes it. Perhaps it is a local or convenience record, or even a proxy for an abstraction or real-world object (although I recommend you take great care describing such things in RDF as it leads to all sorts of metaphysical confusion; I have a practice of only using RDF to describe records that are meaningful to a computer). rdf:about is usually the way to go when you are referring to a resource with a globally well-known identifier or location.“ (http://www.ibm.com/developerworks/xml/library/x-tiprdfai.html)
rdfs:range imposes a global restriction on the emptiesInto property, i.e., the rdfs:range value applies to River and all subclasses of River.
As we have seen, in the context of the Fleuve class, we would like the emptiesInto property to have its range restricted to just the Sea class. Thus, for the Fleuve class we want a local definition of emptiesInto.
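The difference between the global range and a local definition can be sketched with a toy model. The dictionaries and function names below are hypothetical, Lake being a BodyOfWater is an assumption taken from the class names, and the check treats the range as a validation rule for simplicity. In OWL, such a local definition is usually written as an owl:Restriction on the class with owl:allValuesFrom, and a reasoner uses it for inference rather than validation.

```python
# Toy class hierarchy: child -> parent (only the parts needed here).
SUPERCLASS = {
    "Fleuve": "River",
    "Sea": "BodyOfWater",
    "Lake": "BodyOfWater",  # assumption for the example
}

# Global range, as rdfs:range would state it for the property.
GLOBAL_RANGE = {"emptiesInto": "BodyOfWater"}

# Local definition: applies only in the context of one class.
LOCAL_RANGE = {("Fleuve", "emptiesInto"): "Sea"}


def ancestors(cls):
    """Yield cls and then all of its superclasses."""
    while cls is not None:
        yield cls
        cls = SUPERCLASS.get(cls)


def effective_range(cls, prop):
    """Nearest local restriction on cls or a superclass, else the global range."""
    for c in ancestors(cls):
        if (c, prop) in LOCAL_RANGE:
            return LOCAL_RANGE[(c, prop)]
    return GLOBAL_RANGE[prop]


def is_valid(subject_cls, prop, object_cls):
    """True if object_cls is the effective range or a subclass of it."""
    return effective_range(subject_cls, prop) in ancestors(object_cls)


print(effective_range("River", "emptiesInto"))   # BodyOfWater
print(effective_range("Fleuve", "emptiesInto"))  # Sea
print(is_valid("Fleuve", "emptiesInto", "Lake"))  # False
```

A River may still empty into a Lake, while a Fleuve may not, without changing the global definition of emptiesInto.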
Example: Tasks 2009 (various are re-used; 2011 is currently running) (excerpt; from http://oaei.ontologymatching.org/2009/)
Expressive ontologies
anatomy: The anatomy real world case is about matching the Adult Mouse Anatomy (2744 classes) and the NCI Thesaurus (3304 classes) describing the human anatomy.
conference: Participants will be asked to find all correct correspondences (equivalence and/or subsumption correspondences) and/or 'interesting correspondences' within a collection of ontologies describing the domain of organising conferences (the domain being well understandable for every researcher). Results will be evaluated a posteriori in part manually and in part by data-mining techniques and logical reasoning techniques. There will also be evaluation against a reference mapping based on a subset of the whole collection.
Directories and thesauri
fishery gears: Features four different classification schemes, expressed in OWL, adopted by different fishery information systems in the FIM division of FAO. An alignment performed on these 4 schemes should be able to spot equivalence, or a degree of similarity, between the fishing gear types and the groups of gears, so as to enable a future exercise of data aggregation across systems.
Oriented matching: This track focuses on the evaluation of alignments that contain other mapping relations than equivalences.
Instance matching
very large crosslingual resources: The purpose of this task (vlcr) is to match the Thesaurus of the Netherlands Institute for Sound and Vision (called GTAA, see below for more information) to two other resources: the English WordNet from Princeton University and DBpedia.
Matching task and evaluation approach (http://oaei.ontologymatching.org/2007/anatomy/)
We would like to gratefully thank Martin Ringwald and Terry Hayamizu (Mouse Genome Informatics - http://www.informatics.jax.org/), who provided us with a reference mapping for these ontologies.
The reference mapping contains only equivalence correspondences between concepts of the ontologies. No correspondences between properties (roles) are specified.
If your system also creates correspondences between properties or correspondences that describe subsumption relations, these results will not influence the evaluation (but can nevertheless be part of your submitted results).
The results of your matching system will be compared to this reference alignment. Therefore, all of the results have to be delivered in the format specified here.
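Comparing a system's output to a reference alignment can be sketched as below. The sketch assumes precision, recall and F-measure as the comparison measures, which the text above does not spell out; the concept identifiers and the "=" / "<" relation symbols are made up for the example. Non-equivalence correspondences are simply ignored, matching the statement that they do not influence the evaluation.

```python
def evaluate(found, reference):
    """Compare found correspondences with a reference alignment.

    found:     iterable of (source_concept, target_concept, relation)
    reference: iterable of (source_concept, target_concept) equivalences
    Returns (precision, recall, f_measure).
    """
    # Only equivalence correspondences take part in the comparison.
    found_eq = {(a, b) for a, b, rel in found if rel == "="}
    ref = set(reference)
    correct = found_eq & ref
    precision = len(correct) / len(found_eq) if found_eq else 0.0
    recall = len(correct) / len(ref) if ref else 0.0
    total = precision + recall
    f_measure = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_measure


# Hypothetical identifiers, for illustration only.
REFERENCE = [
    ("mouse:heart", "nci:Heart"),
    ("mouse:lung", "nci:Lung"),
    ("mouse:liver", "nci:Liver"),
    ("mouse:kidney", "nci:Kidney"),
]
FOUND = [
    ("mouse:heart", "nci:Heart", "="),
    ("mouse:lung", "nci:Lung", "="),
    ("mouse:tail", "nci:Limb", "="),    # wrong equivalence
    ("mouse:liver", "nci:Organ", "<"),  # subsumption, ignored
]

precision, recall, f_measure = evaluate(FOUND, REFERENCE)
print(round(precision, 3), round(recall, 3), round(f_measure, 3))
```

Here two of the three submitted equivalences are correct and two of the four reference correspondences are found, so precision is 2/3 and recall is 1/2.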
Dhamankar, R., Lee, Y., Doan, A., Halevy, A., & Domingos, P. (2004). iMAP: Discovering complex semantic matches between database schemas. In Proc. of SIGMOD 2004.
Also interesting: Noy, N. (2004). Semantic integration: A survey of ontology-based approaches. SIGMOD Record, 33(3). http://www.dit.unitn.it/~p2p/RelatedWork/Matching/13.natasha-10.pdf
Do, H.-H., Melnik, S., & Rahm, E. (2003). Comparison of schema matching evaluations. In Web, Web-Services, and Database Systems: NODe 2002, Web- and Database-Related Workshops, Erfurt, Germany, October 7-10, 2002. Revised Papers (pp. 221-237). Springer.
McCann, R., Doan, A., Varadarajan, V., & Kramnik, A. (2003). Building data integration systems via mass collaboration. In Proc. International Workshop on the Web and Databases (WebDB).