Schema Matching
Christiano Santiago, Mar 27, 2008
Matching Large XML Schemas. Erhard Rahm, Hong-Hai Do, Sabine Maßmann
Putting Context into Schema Matching. Philip Bohannon, Eiman Elnahrawy, Wenfei Fan, Michael Flaster
COMA - A System for Flexible Combination of Schema Matching Approaches. Hong-Hai Do, Erhard Rahm
Mar 27, 2008 Christiano Santiago 1
Schema Matching
Matching Large XML Schemas. Erhard Rahm, Hong-Hai Do, Sabine Maßmann
Putting Context into Schema Matching. Philip Bohannon, Eiman Elnahrawy, Wenfei Fan, Michael Flaster
COMA - A System for Flexible Combination of Schema Matching Approaches. Hong-Hai Do, Erhard Rahm
Mar 27, 2008 Christiano Santiago 2
Goals
Introductory concepts of schema matching
Context-sensitive versus context-insensitive matching
Complexity of XSD schemas
Mar 27, 2008 Christiano Santiago 3
Agenda
Terminology
Different Approaches
XML Schema Definition
Context-Insensitive
Context-Sensitive
Q&A
Mar 27, 2008 Christiano Santiago 4
Terminology
Schema matching: the process of identifying that two objects are semantically related (meaning).
Mapping: the transformations between the matched objects (conversion).
Mar 27, 2008 Christiano Santiago 5
Terminology
Student(Name, SSN, Level, Major, Marks)
GradStudent(Name, ID, Major, Grades)
Student.Name ≈ GradStudent.Name (match)
Student.SSN ≈ GradStudent.ID (match)
Student.Marks ≈ GradStudent.Grades (transformation)
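The difference between a match and a mapping in the Student / GradStudent example can be sketched in Python. This is only an illustration: the grade boundaries in `marks_to_grade` and the sample record are assumptions, not part of the slides.

```python
# Matches: pairs of elements judged to be semantically related.
CORRESPONDENCES = [
    ("Student.Name", "GradStudent.Name"),
    ("Student.SSN", "GradStudent.ID"),
    ("Student.Marks", "GradStudent.Grades"),
]


def marks_to_grade(marks: int) -> str:
    """Hypothetical transformation from numeric marks to a letter grade."""
    if marks >= 80:
        return "A"
    if marks >= 70:
        return "B"
    if marks >= 60:
        return "C"
    return "F"


def map_student(student: dict) -> dict:
    """Mapping: turns a Student record into a GradStudent record."""
    return {
        "Name": student["Name"],                    # copied as-is
        "ID": student["SSN"],                       # renamed only
        "Major": student["Major"],                  # copied as-is
        "Grades": marks_to_grade(student["Marks"]), # needs a conversion
    }


print(map_student(
    {"Name": "Ann", "SSN": "123", "Level": "Grad", "Major": "CS", "Marks": 85}
))
```

The correspondence list only says which elements are related; `map_student` is what actually converts the data.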
Mar 27, 2008 Christiano Santiago 6
Schema Matching
Mar 27, 2008 Christiano Santiago 7
Context
Context-insensitive
Context-sensitive
Mar 27, 2008 Christiano Santiago 8
Different Approaches
Schema-level matchers
Instance-level matchers
Hybrid matchers
Reusing matching information
Mar 27, 2008 Christiano Santiago 9
Schema-Level Matchers
Only consider schema information:
Name
Description
Data type
Relationship
Constraints
Number of nesting levels
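A schema-level matcher could be sketched as follows. It uses only two of the listed properties (name and data type), and the equal weighting of the two is an assumption made for illustration.

```python
from difflib import SequenceMatcher


def name_similarity(a: str, b: str) -> float:
    """Case-insensitive string similarity in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def schema_level_score(elem_a: dict, elem_b: dict) -> float:
    """Score two schema elements from schema information only.

    Assumption: name similarity and data type equality count equally.
    """
    name_score = name_similarity(elem_a["name"], elem_b["name"])
    type_score = 1.0 if elem_a["type"] == elem_b["type"] else 0.0
    return 0.5 * name_score + 0.5 * type_score


ssn = {"name": "SSN", "type": "string"}
ident = {"name": "ID", "type": "string"}
print(schema_level_score(ssn, ident))
```

Here SSN and ID share no characters, so the score comes entirely from the matching data type. This shows why name-based matching alone would miss that correspondence.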
Mar 27, 2008 Christiano Santiago 10
Instance-Level Matchers
Use instance-level data to gather insight into the content and meaning of schema elements
Linguistic: Dept, DeptName, EmpName
Constraints: 416-7362100 (phone number), M3J1P3 (postal code)
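The constraint-based idea can be sketched as a pattern check over sample values. The two regular expressions are assumptions derived only from the two example values on the slide, and the names `PATTERNS`, `infer_pattern`, and `instance_match` are hypothetical.

```python
import re

# Hypothetical value patterns, modeled on the slide's two examples.
PATTERNS = {
    "phone": re.compile(r"^\d{3}-\d{7}$"),
    "postal_code": re.compile(r"^[A-Z]\d[A-Z]\d[A-Z]\d$"),
}


def infer_pattern(values):
    """Return the name of the first pattern every value matches, else None."""
    for name, pattern in PATTERNS.items():
        if values and all(pattern.match(v) for v in values):
            return name
    return None


def instance_match(values_a, values_b) -> bool:
    """Two columns are match candidates if their values follow the same pattern."""
    kind_a = infer_pattern(values_a)
    return kind_a is not None and kind_a == infer_pattern(values_b)


print(infer_pattern(["416-7362100"]))
print(infer_pattern(["M3J1P3"]))
print(instance_match(["416-7362100"], ["M3J1P3"]))
```

The column names are never consulted. Only the shape of the data decides whether two columns are candidates.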
Mar 27, 2008 Christiano Santiago 11
Hybrid-Level Matchers
Combine more than one matching approach
Mar 27, 2008 Christiano Santiago 12
Reusing Matching Information
Use previous matching information for future matching tasks
Structures or substructures often repeat
Caution: Salary and Income may be equivalent in a Payroll application but not in Tax Reporting
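One way to reuse earlier results is to compose two stored match results that share a schema. The sketch below is a simplification: multiplying the two similarity scores is an assumption, and the intermediate element `Pay` is a hypothetical name.

```python
def compose(match_ab: dict, match_bc: dict) -> dict:
    """Derive A-to-C matches from stored A-to-B and B-to-C matches.

    Each input maps an (element, element) pair to a similarity in [0, 1].
    Assumption: the derived similarity is the product of the two scores.
    """
    result = {}
    for (a, b), s1 in match_ab.items():
        for (b2, c), s2 in match_bc.items():
            if b == b2:
                score = s1 * s2
                # Keep the best score if several intermediate elements apply.
                result[(a, c)] = max(score, result.get((a, c), 0.0))
    return result


previous_ab = {("Salary", "Pay"): 1.0}
previous_bc = {("Pay", "Income"): 0.5}
print(compose(previous_ab, previous_bc))
```

The caution on the slide applies here. A derived pair such as Salary and Income is only as trustworthy as the application context in which the stored matches were made.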
Mar 27, 2008 Christiano Santiago 13
XML Schema Definition (XSD)
Data types:
19 built-in primitive data types
25 built-in derived data types
User-defined complex types
Mar 27, 2008 Christiano Santiago 14
Complex type definition: <complexType name="myNewNameType">
Match system approaches:
COMA: path-based
Cupid: materialized
Scalability issue: the XCBL Order schema contains 1451 components, including 91 shared types. After resolving the shared components, 26000+ nodes/paths were identified.
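The growth from components to paths can be illustrated on a toy schema. The element names below are hypothetical, and the sketch assumes the schema graph is acyclic (a recursive type would never terminate).

```python
from functools import lru_cache


def count_paths(children: dict, root: str) -> int:
    """Count root-to-node paths in an acyclic schema graph.

    Every path becomes its own node once shared components are resolved,
    so this is the size of the fully expanded tree.
    """
    @lru_cache(maxsize=None)
    def paths_from(node: str) -> int:
        # One path ends at this node, plus every path continuing below it.
        return 1 + sum(paths_from(child) for child in children.get(node, ()))

    return paths_from(root)


# Hypothetical schema: Address is a shared type used by BillTo and ShipTo.
schema = {
    "Order": ("BillTo", "ShipTo"),
    "BillTo": ("Address",),
    "ShipTo": ("Address",),
    "Address": ("Street", "City"),
}
components = set(schema) | {c for kids in schema.values() for c in kids}
print(len(components), count_paths(schema, "Order"))
```

With a single shared type, 6 components already expand to 9 paths. Nesting shared types inside other shared types compounds the effect.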
Mar 27, 2008 Christiano Santiago 17
XML Schema Definition (XSD)
Distributed schemas: XSD allows a schema to be distributed over several schema documents (.xsd files) and namespaces.
Mar 27, 2008 Christiano Santiago 18
XML Schema Definition (XSD)
Determining similarity between and matching complex types can be as difficult as matching two complete schemas.
Mar 27, 2008 Christiano Santiago 19
Standard Schema Matching: Context-Insensitive
Matchers: matching algorithms compute similarity scores between a pair of attributes
Weights: scores are weighted
Confidence scores are identified based on standard statistical techniques
Selection of best matches
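The pipeline of matchers, weights, and selection can be sketched as below. This is a simplification under stated assumptions: the statistical confidence step is not modeled, the weighted average stands in for it, and the matcher names, weights, scores, and threshold are all hypothetical.

```python
def combine(scores_by_matcher: dict, weights: dict) -> dict:
    """Weighted average of per-matcher similarity scores for each pair."""
    total = sum(weights.values())
    combined = {}
    for matcher, scores in scores_by_matcher.items():
        for pair, score in scores.items():
            combined[pair] = combined.get(pair, 0.0) + weights[matcher] * score / total
    return combined


def select_best(combined: dict, threshold: float) -> dict:
    """Keep, per source attribute, the best target at or above the threshold."""
    best = {}
    for (source, target), score in combined.items():
        if score >= threshold and score > best.get(source, (None, -1.0))[1]:
            best[source] = (target, score)
    return best


scores = {
    "name": {("SSN", "ID"): 0.0, ("SSN", "Name"): 0.25},
    "type": {("SSN", "ID"): 1.0, ("SSN", "Name"): 0.5},
}
weights = {"name": 0.5, "type": 0.5}
combined = combine(scores, weights)
print(select_best(combined, threshold=0.4))
```

Each attribute pair is scored independently of where the attributes occur, which is what makes this standard pipeline context-insensitive.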