Integrating Multiple Data Sources using a Standardized XML Dictionary
Ramon Lawrence, TR Labs - Winnipeg, University of Manitoba
Supervisor: Dr. Ken Barker
Page 1
Integrating Multiple Data Sources using a Standardized XML Dictionary Ramon Lawrence
Integrating Multiple Data Sources using a Standardized XML Dictionary
Ramon Lawrence
University of Manitoba
[email protected]
Supervisor: Dr. Ken Barker
Page 2
Outline
• Introduction, Motivation, and Background
• Integration architecture components
• Integration architecture
• Example integration
• Applications to the WWW
• Future work and conclusions
• Demonstration of Unity
Page 3
Introduction
Integration of data is required when accessing multiple databases within an organization or on the WWW.
Our focus is automatically combining database schemas using schema integration.
Schema integration requires knowledge of data semantics and use of metadata.
Page 4
Motivation
Organizations have several database systems which must interoperate.
Users often access multiple Web databases whose knowledge must be integrated and presented in a useful form.
Data warehouses and OLAP systems require data semantics to be understood and data to be cleansed and summarized.
Page 5
Background
Schema integration involves combining diverse database schemas into an integrated view by resolving conflicts.
Schema conflicts include naming, structural, and semantic conflicts.
Schema integration is required for database interoperability, but it is currently a manual process.
Page 6
MDBS Architecture
Global Transaction Manager (GTM)
• processes global transactions
• ensures information in all LDBSs is consistent
• submits subtransactions to the GTSs for each LDBS
Global Transaction Servers (GTSs)
• one for each LDBS
• convert subtransactions from the GTM into a form usable by the LDBS and vice versa
Local Database Systems (LDBSs)
• databases combined into the MDBS
• unchanged, as they still process local transactions
[Figure: MDBS architecture. Labels: Global Transactions, GTM, subtransactions, GTS (three), LDBS (three), Local Transactions]
Page 7
Previous Work
Research systems:
• integrating systems by logical rules (Sheth)
• defining global dictionaries (Castano)
• Carnot Project using the Cyc knowledge base
Industrial systems and standards:
• Metadata Interchange Specification (MDIS)
• XML, BizTalk, E-commerce portals
Page 8
Architecture Objective
The objective of our architecture is to provide a system for automatically integrating diverse relational schemas into a multidatabase.
Desirable properties:
• individual mappings - information sources integrated one-at-a-time and independently
• global view constructed for query transparency
• handles schema conflicts - including semantic, structural, and naming conflicts
• automated global integration - global view constructed efficiently and automatically
Page 9
The Idea
The major idea is that schema conflicts can be resolved if we:
• eliminate all naming conflicts
• define a language capable of determining schema equivalence and performing transformations
With these two properties, schema conflicts can be resolved automatically at the global level.
Page 10
Architecture Components: The Global Dictionary
A global dictionary (GD) provides standardized terms to capture data semantics.
• Hierarchy of terms related by IS-A or Has-A links
• Contains base set of common database concepts, but new concepts can be added
A GD term is a single, unambiguous semantic definition.
Several GD entries for a single English word are required if the word has multiple definitions.
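The dictionary structure described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class names, method names, and the example terms and definitions are hypothetical and do not come from the Unity system or from any real global dictionary.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Term:
    """One GD entry: a single, unambiguous semantic definition."""
    name: str
    definition: str
    is_a: Optional["Term"] = None                       # parent term (IS-A link)
    has_a: List["Term"] = field(default_factory=list)   # component terms (Has-A links)


class GlobalDictionary:
    """Hypothetical container for GD terms, keyed by unique term name."""

    def __init__(self) -> None:
        self.terms: Dict[str, Term] = {}

    def add(self, name: str, definition: str, is_a: Optional[str] = None) -> Term:
        # New concepts can be added, but each name carries exactly one meaning.
        if name in self.terms:
            raise ValueError(f"term {name!r} already defined")
        parent = self.terms[is_a] if is_a is not None else None
        term = Term(name, definition, parent)
        self.terms[name] = term
        return term

    def link_has_a(self, whole: str, part: str) -> None:
        self.terms[whole].has_a.append(self.terms[part])

    def is_kind_of(self, name: str, ancestor: str) -> bool:
        # Walk the IS-A chain upward from `name` looking for `ancestor`.
        term: Optional[Term] = self.terms[name]
        while term is not None:
            if term.name == ancestor:
                return True
            term = term.is_a
        return False


# Illustrative terms only. The English word "claim" gets two separate entries.
gd = GlobalDictionary()
gd.add("Person", "a human being")
gd.add("Claimant", "a person who files a claim", is_a="Person")
gd.add("Claim (insurance)", "a request for payment under a policy")
gd.add("Claim (assertion)", "a statement that something is true")
gd.link_has_a("Claim (insurance)", "Claimant")
```

Keying the dictionary by a unique term name, rather than by the English word, is what forces a word with several meanings to appear as several entries.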
Page 11
Architecture Components: Using the Global Dictionary
GD terms are used to build semantic names to describe the semantics of schema elements.
Semantic names have the form:
  semantic name = "[" CT [[;CT] | [,CT]] "]" CN
  CT = context term, CN = concept name
  each CT and CN is a single term from the GD
Semantic names are included in specifications describing a data source.
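A small parser makes the semantic-name form concrete. The sketch below rests on several assumptions that the slide does not state: the bracketed context may hold any number of terms separated by ";" or ",", the separators are recorded without interpreting them, and the concept name may be empty. The function name and the example names are hypothetical, and terms are not checked against a global dictionary.

```python
import re
from typing import List, NamedTuple


class SemanticName(NamedTuple):
    context: List[str]      # context terms (CT), in order
    separators: List[str]   # the ";" or "," found between context terms
    concept: str            # concept name (CN); "" if absent (assumption)


# "[" context "]" followed by an optional concept name.
_PATTERN = re.compile(r"^\[([^\[\]]+)\]\s*(.*)$")


def parse_semantic_name(text: str) -> SemanticName:
    match = _PATTERN.match(text.strip())
    if match is None:
        raise ValueError(f"not a semantic name: {text!r}")
    # Splitting on a capturing group keeps the separators in the result,
    # so terms sit at even indexes and separators at odd indexes.
    parts = re.split(r"([;,])", match.group(1))
    context = [part.strip() for part in parts[0::2]]
    separators = parts[1::2]
    if any(not term for term in context):
        raise ValueError(f"empty context term in: {text!r}")
    return SemanticName(context, separators, match.group(2).strip())


name = parse_semantic_name("[Claim;Claimant] Name")
```

Here `name.context` holds the two context terms, `name.separators` holds the single ";", and `name.concept` holds the concept name.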
Page 12
Architecture Components: X-Specs
Database metadata and semantic names are combined into specifications called X-Specs, which:
• are stored and transmitted using XML
• contain information on a relational schema organized into database, table, and field levels
• store semantic names to describe and integrate schema elements
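To show what such a specification might look like, the sketch below builds a three-level XML document with Python's standard `xml.etree.ElementTree`. The element names (`XSpec`, `Database`, `Table`, `Field`), the `semanticName` attribute, and the semantic names themselves are assumptions made for illustration; the actual X-Spec format is not given here. The table and its two fields are the ones visible in the integration example.

```python
import xml.etree.ElementTree as ET


def build_xspec(database, tables):
    """Build a hypothetical X-Spec tree.

    tables maps table name -> (table semantic name, {field name: field semantic name}).
    """
    root = ET.Element("XSpec")
    db = ET.SubElement(root, "Database", name=database)
    for table_name, (table_semantic, fields) in tables.items():
        table = ET.SubElement(db, "Table", name=table_name, semanticName=table_semantic)
        for field_name, field_semantic in fields.items():
            ET.SubElement(table, "Field", name=field_name, semanticName=field_semantic)
    return root


# Semantic names below are illustrative guesses, not taken from the source.
xspec = build_xspec(
    "ABC Company",
    {
        "Claims_tb": (
            "[Claim]",
            {"claim_id": "[Claim] Id", "claimant": "[Claim;Claimant] Name"},
        )
    },
)
xml_text = ET.tostring(xspec, encoding="unicode")
```

Because the result is plain XML text, it can be stored in a file or sent to an integration site without any knowledge of the source database system.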
Page 13
Architecture Components: Integrating X-Specs
Each database to be integrated is described using an X-Spec.
Identical concepts in different databases are identified by similar semantic names.
Concepts with identical (or hierarchically related) semantic names are combined regardless of their physical representation in the individual databases.
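The matching step can be illustrated with a minimal sketch that groups schema elements from several databases by semantic name. It handles only identical names; combining hierarchically related names would need the global dictionary and is left out. The function name, database names, and system names are hypothetical, and this is not the integration algorithm used by the authors.

```python
from collections import defaultdict


def integrate(xspecs):
    """Group schema elements by semantic name.

    xspecs maps database name -> {system name: semantic name}.
    Returns {semantic name: [(database, system name), ...]}.
    """
    view = defaultdict(list)
    for database, elements in xspecs.items():
        for system_name, semantic_name in elements.items():
            key = " ".join(semantic_name.split())   # normalize whitespace only
            view[key].append((database, system_name))
    return dict(view)


# Two hypothetical databases whose physical names differ.
global_view = integrate(
    {
        "db_a": {"Claims_tb.claim_id": "[Claim] Id", "Claims_tb.claimant": "[Claim;Claimant] Name"},
        "db_b": {"T1.id": "[Claim] Id", "T1.amt": "[Claim] Amount"},
    }
)
```

In this example the two differently named identifier columns end up under the same key because their semantic names are equal, while the remaining elements stay separate.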
Page 14
Integration Architecture
Our integration architecture consists of two separate phases:
• capture process: X-Specs are constructed for each data source independently
• integration process: X-Specs are combined using the integration algorithm, which matches semantic names using the global dictionary
Page 15
Integration Architecture: The Capture Process
Capture process involves:
• automatically extracting the schema information and metadata using a specification editor
• assigning semantic names to each schema element (tables and fields) to capture their semantics
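The two capture steps can be sketched as follows, using SQLite from the Python standard library as a stand-in for whatever relational system is being captured. The helper names, the sample table definition, and the semantic-name assignment are assumptions for illustration and do not describe the actual specification editor.

```python
import sqlite3


def extract_schema(conn):
    """Step 1: read table and column metadata automatically."""
    schema = {}
    tables = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    for (table,) in tables:
        # PRAGMA table_info rows are (cid, name, type, notnull, dflt_value, pk).
        columns = conn.execute(f'PRAGMA table_info("{table}")').fetchall()
        schema[table] = [(column[1], column[2]) for column in columns]
    return schema


def annotate(schema, assignments):
    """Step 2: attach the semantic names chosen by the DBA.

    assignments maps (table, column) -> semantic name; columns without an
    assignment get None so that they can be flagged for review.
    """
    return {
        table: {column: assignments.get((table, column)) for column, _ in columns}
        for table, columns in schema.items()
    }


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Claims_tb (claim_id INTEGER, claimant TEXT)")
schema = extract_schema(conn)
annotated = annotate(schema, {("Claims_tb", "claim_id"): "[Claim] Id"})
```

Only the second step needs a person: extraction is mechanical, while choosing the semantic name requires knowing what the column means.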
Page 16
Integration Architecture: The Capture Process
[Figure: capture process. Labels: Relational Schema, Automatic Extraction, Specification Editor, DBA, Lookup of terms, Global Dictionary, X-Spec]
Page 17
Integration Architecture: The Integration Process
Integration process involves:
• automatically identifying identical concepts by matching semantic names
• constructing a global view of database concepts consisting of a hierarchy of concept terms
• resolving structural differences during query generation and submission (e.g. a concept may be represented as a table in one database and a field (attribute) in another)
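The last point can be illustrated with a rough sketch of query generation in which one concept maps to a whole table in one database and to a single field in another. The mapping layout, the function name, and all database, table, and column names are hypothetical; a real system would also have to handle joins, type conversion, and merging of results.

```python
def generate_subqueries(semantic_name, mappings):
    """Produce one SQL string per database that stores the concept.

    mappings maps database -> {semantic name: location}, where a location is
    ("table", table_name) or ("field", table_name, column_name).
    """
    queries = {}
    for database, located in mappings.items():
        location = located.get(semantic_name)
        if location is None:
            continue                       # this database lacks the concept
        if location[0] == "table":
            queries[database] = f"SELECT * FROM {location[1]}"
        else:
            queries[database] = f"SELECT {location[2]} FROM {location[1]}"
    return queries


mappings = {
    "db_a": {"[Claimant]": ("field", "Claims_tb", "claimant")},
    "db_b": {"[Claimant]": ("table", "Claimants")},
    "db_c": {},
}
queries = generate_subqueries("[Claimant]", mappings)
```

The user asks for the concept by its semantic name only; the choice between selecting a column and selecting a table is made per database when the subqueries are generated.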
Page 18
Integration Architecture: The Integration Process
[Figure: integration process. Labels: Client, Integration Site, X-Spec, Subtransactions, RDBMS]
Page 19
Integration Architecture Benefits
The benefits of the two-phase architecture are:
• Dynamic integration: schemas are integrated as needed
• X-Specs are constructed only once and independently of each other
• Automatic conflict resolution by integrating based on semantic names rather than physical structure
• Users are isolated from system names and organization by querying through a global view using semantic names for concepts
Page 20
Integration Example
Two claims databases to be integrated: ABC Company: Claims_tb(claim_id, claimant,