Lecture 13: Database Heterogeneity

Debriefing Project Phase 2
Outline

• Database Integration

• Wrappers

• Mediators

• Schema Integration

Book Section

Database Integration

• How to build applications using multiple DBs?

[Figure: example sources IMDB, Amazon, and eBay DVD orders, stored in Oracle, PointBase, MySQL, and IBM DB2 and holding movie, order, and order-status data]

Problem Dimensions

Distribution

Autonomy

Heterogeneity

How to Deal with Distribution?

• Problems

• Solutions

How to Deal with Autonomy?

• Problems

• Solutions

How to Deal with Heterogeneity?

• Problems

• Solutions

Solution Variants

• General issues
  – Bottom-up vs. top-down engineering
  – Virtual vs. materialized integration
  – Read-only vs. read-write access
  – Transparency: language, schema, location

• What did you do?

A Generic System Architecture

• Wrapper-Mediator architecture

[Figure: DB1 (Oracle), DB2 (PointBase), DB3 (MySQL), and DB4 (IBM DB2), each accessed through its own wrapper; a single mediator on top of the wrappers serves application 1, application 2, and application 3]

• mediators integrate the data from the DBs

• wrappers convert to a common representation
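A minimal Python sketch of the wrapper-mediator split. It assumes the common representation is a list of dicts with the keys `title` and `year`. The wrapper functions, the `Mediator` class, and the sample rows are hypothetical. Each wrapper only converts its source's format, and the mediator only combines the wrappers' output at query time, which makes this a virtual integration.

```python
from typing import Callable, Dict, Iterable, List

# Assumed common representation: dicts with the hypothetical keys "title", "year".
Record = Dict[str, object]


def wrap_source_a(rows: Iterable[tuple]) -> List[Record]:
    """Hypothetical wrapper for a source that returns (title, year-as-text) tuples."""
    return [{"title": t, "year": int(y)} for t, y in rows]


def wrap_source_b(rows: Iterable[dict]) -> List[Record]:
    """Hypothetical wrapper for a source that uses different field names."""
    return [{"title": r["name"], "year": int(r["released"])} for r in rows]


class Mediator:
    """Integrates records that the wrappers have already put into the common form."""

    def __init__(self, sources: List[Callable[[], List[Record]]]):
        self.sources = sources

    def query(self, predicate: Callable[[Record], bool]) -> List[Record]:
        # Virtual integration: every wrapper is called when the query runs.
        result: List[Record] = []
        for fetch in self.sources:
            result.extend(r for r in fetch() if predicate(r))
        return result


# Made-up sample data standing in for two sources.
source_a = [("Alien", "1979"), ("Heat", "1995")]
source_b = [{"name": "Fargo", "released": "1996"}]

mediator = Mediator([
    lambda: wrap_source_a(source_a),
    lambda: wrap_source_b(source_b),
])
print(mediator.query(lambda r: r["year"] > 1990))
```

Adding a fifth database in this design means writing one more wrapper. The mediator and the applications stay unchanged.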

A Closer Look at Data Models

• Data model used by sources
  – relational? HTML? XML? Text?

• Data model used by integrated DB
  – canonical data model (e.g. relational, XML)

• Query models
  – structured queries, retrieval queries, data mining (statistics)

A Generic Wrapper Architecture

[Figure: layered wrapper, from top to bottom]
request/query in, result/data out
Compensation for missing processing capabilities
Transformation of data model
Communication interface
Source data, metadata, integrity constraints
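The "compensation" layer can be illustrated with a small Python sketch. It assumes a hypothetical source that, like a web form, answers only "orders with status = X". All names and data are made up. The wrapper sends the supported part of the request to the source and evaluates the rest of the predicate itself.

```python
from typing import Dict, List

# Hypothetical source data; the source itself can only filter by status.
ORDERS = [
    {"id": 1, "status": "shipped", "amount": 20.0},
    {"id": 2, "status": "shipped", "amount": 75.5},
    {"id": 3, "status": "open", "amount": 12.0},
]


def source_lookup_by_status(status: str) -> List[Dict]:
    """Communication interface: the only query the source supports."""
    return [dict(o) for o in ORDERS if o["status"] == status]


def wrapper_query(status: str, min_amount: float) -> List[Dict]:
    """Push the supported predicate down and evaluate the rest in the wrapper."""
    candidates = source_lookup_by_status(status)  # answered by the source
    # Compensation: the source cannot filter by amount, so the wrapper does it.
    return [o for o in candidates if o["amount"] >= min_amount]


print(wrapper_query("shipped", 50))
```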

Wrapper Tasks

• Data model consists of
  – Data types
  – Integrity constraints
  – Operations (e.g. query language)

• Translate among different data models

• Overcome other "syntactic" heterogeneity

Which was the task?

How was it implemented?

Example: Wrapping Relational Data in XML/HTML

• Data types
  – trivial

• Integrity constraints (e.g. primary keys)
  – requires XML Schema

• Operations
  – none in HTML

Where did this play a role?
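A Python sketch of wrapping a relational table as XML, using the standard `sqlite3` and `xml.etree.ElementTree` modules. The `movie` table and its rows are hypothetical. The sketch shows why the data-type mapping is "trivial": every value becomes element text. It also shows what gets lost. The primary key on `id` does not appear in the output, and carrying it over would need an XML Schema (for example an `xs:key`).

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical relational source.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE movie (id INTEGER PRIMARY KEY, title TEXT, year INTEGER)")
conn.executemany("INSERT INTO movie VALUES (?, ?, ?)", [(1, "Alien", 1979), (2, "Heat", 1995)])


def table_to_xml(conn: sqlite3.Connection, table: str) -> ET.Element:
    """Wrap every row as an element; the table name is trusted in this sketch."""
    cur = conn.execute(f"SELECT * FROM {table}")
    columns = [d[0] for d in cur.description]
    root = ET.Element(table + "s")
    for row in cur:
        elem = ET.SubElement(root, table)
        for col, value in zip(columns, row):
            # Data types: each value is serialized as text; the SQL type and the
            # PRIMARY KEY constraint are not represented in the XML itself.
            ET.SubElement(elem, col).text = str(value)
    return root


xml_text = ET.tostring(table_to_xml(conn, "movie"), encoding="unicode")
print(xml_text)
```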

Example: Wrapping XML/HTML into Relational

• Data types
  – which difficulties?

• Integrity constraints
  – none in HTML

• Operations
  – generally require XQuery
  – form fields can be considered hard-coded queries
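Going the other way, a wrapper has to decide on types and constraints itself. In this Python sketch over a hypothetical XML document, the wrapper converts text to `INTEGER`/`REAL`, maps a missing `<price>` element to `NULL`, and declares a primary key that the XML source never stated. These choices are assumptions made for illustration, and they are exactly the "difficulties" with data types that the slide asks about.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical XML source; the second order has no <price> element.
XML_DOC = """
<orders>
  <order id="17"><title>Alien</title><price>9.99</price></order>
  <order id="18"><title>Heat</title></order>
</orders>
"""


def xml_to_rows(xml_text: str):
    rows = []
    for order in ET.fromstring(xml_text).iter("order"):
        price_text = order.findtext("price")  # None if the element is missing
        rows.append((
            int(order.get("id")),                                    # text -> INTEGER
            order.findtext("title"),
            float(price_text) if price_text is not None else None,  # text -> REAL or NULL
        ))
    return rows


conn = sqlite3.connect(":memory:")
# The primary key is the wrapper's assumption, not something the XML declared.
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, title TEXT, price REAL)")
conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", xml_to_rows(XML_DOC))
print(conn.execute("SELECT * FROM orders ORDER BY id").fetchall())
```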

A Closer Look at Schemas

• Tight vs. loose integration
  – Is there a global schema?

• Support for semantic integration
  – collection, fusion, abstraction

Schema Architecture for Federated DBMS

[Figure: schema architecture of a federated DBMS, from top to bottom]
View 1 | View 2 | View 3
Integrated Schema
Import Schema (one per source)
Export Schema (one per source)
Relational DBMS | Object-oriented DBMS | File System | Web Server | ...

• accepted model for integrated database systems with integrated schema

• 5-level architecture

• data independence
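The five levels can be mimicked with stacked SQL views. The following sketch uses Python's `sqlite3` with two hypothetical sources. Every table, view, and column name is made up, and both amounts are assumed to be in the same currency. Each layer is defined only in terms of the layer directly below it. That is where the data independence comes from: a change to a source table needs to be absorbed only by its export view.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Local schemas of two hypothetical sources (same currency assumed)
CREATE TABLE shop_a_orders (ord_no INTEGER, cust TEXT, total_cents INTEGER, internal_flag INTEGER);
CREATE TABLE shop_b_sales  (sale_id INTEGER, customer TEXT, amount REAL);

-- Export schemas: what each source chooses to expose
CREATE VIEW export_a AS SELECT ord_no, cust, total_cents FROM shop_a_orders;
CREATE VIEW export_b AS SELECT sale_id, customer, amount FROM shop_b_sales;

-- Import schemas: the wrappers map each export schema to the common model
CREATE VIEW import_a AS
  SELECT ord_no AS order_id, cust AS customer, total_cents / 100.0 AS amount FROM export_a;
CREATE VIEW import_b AS
  SELECT sale_id AS order_id, customer, amount FROM export_b;

-- Integrated schema: the mediator combines the import schemas
CREATE VIEW integrated_orders AS
  SELECT 'A' AS source, order_id, customer, amount FROM import_a
  UNION ALL
  SELECT 'B' AS source, order_id, customer, amount FROM import_b;

-- Application view: only what one application needs
CREATE VIEW big_orders AS
  SELECT customer, amount FROM integrated_orders WHERE amount > 50;
""")
conn.execute("INSERT INTO shop_a_orders VALUES (1, 'ann', 7500, 0)")
conn.execute("INSERT INTO shop_b_sales VALUES (9, 'bob', 12.5)")
print(conn.execute("SELECT * FROM big_orders").fetchall())
```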

Export Schema

• provided by data source

• source DB can change w/o changing export schema

Which was the export schema?

Import Schema

• provided by wrapper

• export schema can change w/o changing import schema

Which was the import schema?

Integrated Schema

• provided by mediator

• import schemas can change w/o changing integrated schema

Which was the integrated schema?

Application View

• provided by application

• integrated DB can change w/o changing application (code)

Which were the application views?

Mediator Tasks

• Integrate data with the same "real-world meaning" but different representations
  – integration mapping (schema integration)
  – can be implemented, e.g., as a database view

• Decompose queries against the integrated schema into queries against the source DBs
  – only for virtual integration
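Query decomposition for virtual integration is sketched below in Python. Two separate in-memory `sqlite3` databases stand in for autonomous sources. The integrated relation `orders(order_id, customer)`, the per-source SQL mappings, and the data are hypothetical. The mediator rewrites one selection on the integrated schema into one query per source, pushes the selection down, and unions the answers.

```python
import sqlite3

# Two hypothetical, separately managed sources with different schemas.
src_a = sqlite3.connect(":memory:")
src_a.execute("CREATE TABLE orders (ord_no INTEGER, cust TEXT)")
src_a.executemany("INSERT INTO orders VALUES (?, ?)", [(1, "ann"), (2, "bob")])

src_b = sqlite3.connect(":memory:")
src_b.execute("CREATE TABLE sales (sale_id INTEGER, customer TEXT)")
src_b.executemany("INSERT INTO sales VALUES (?, ?)", [(7, "ann"), (8, "cy")])

# Mapping of the integrated relation orders(order_id, customer) onto each source.
SOURCE_MAPPINGS = [
    (src_a, "SELECT ord_no, cust FROM orders WHERE cust = ?"),
    (src_b, "SELECT sale_id, customer FROM sales WHERE customer = ?"),
]


def orders_of(customer: str):
    """Decompose a query on the integrated schema into one query per source."""
    result = []
    for conn, sql in SOURCE_MAPPINGS:
        # The selection is evaluated by each source; the mediator unions the results.
        result.extend(conn.execute(sql, (customer,)).fetchall())
    return result


print(orders_of("ann"))
```

With materialized integration this step disappears: the data would be copied into the integrated database ahead of time and queried there directly.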

Schema Integration

• Standard methodology
  – Schema translation (wrapper)
  – Correspondence investigation
  – Conflict resolution and schema integration

Identifying Schema Correspondences

• Sources of information
  – source schema
  – source database
  – source application
  – database administrator, developer, user

Which were your information sources?

Identifying Schema Correspondences

• Semantic correspondences
  – e.g. related names

• Structural correspondences
  – reachability by paths

• Data analysis
  – distribution of values

Can you give examples?
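Two of these signals can be computed directly, as in the Python sketch below. Name similarity uses the standard `difflib.SequenceMatcher`, and value overlap is a Jaccard coefficient. The schemas and values are hypothetical. `zip` and `postcode` score low on names but share values, which is the kind of case where data analysis finds a correspondence that name matching misses.

```python
from difflib import SequenceMatcher

# Hypothetical attributes with sample values from two sources.
schema_a = {"cust_name": ["ann", "bob", "cy"], "zip": ["10115", "80331"]}
schema_b = {"customer": ["bob", "cy", "dee"], "postcode": ["80331", "50667"]}


def name_similarity(a: str, b: str) -> float:
    """Semantic signal: string similarity of attribute names (0..1)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def value_overlap(xs, ys) -> float:
    """Data-analysis signal: Jaccard overlap of the value sets (0..1)."""
    xs, ys = set(xs), set(ys)
    return len(xs & ys) / len(xs | ys)


for a, a_vals in schema_a.items():
    for b, b_vals in schema_b.items():
        print(f"{a:10} {b:10} name={name_similarity(a, b):.2f} "
              f"values={value_overlap(a_vals, b_vals):.2f}")
```

In practice such scores only propose candidate correspondences. A developer or administrator still has to confirm them.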

Conflicts

• What types of problems did you encounter integrating corresponding data?

Types of Conflicts

• Schema level
  – Naming conflicts
  – Structural conflicts
  – Classification conflicts
  – Constraint and behavioral conflicts

• Data level
  – Identification conflicts
  – Representational conflicts
  – Data errors

Conflict Resolution

• Depends on type of conflict

• Requires construction of mappings

• Mappings might be complex, e.g. not expressible as SQL views

Naming Conflicts

• Homonyms (give example)
  – same name used for different concepts
  – Resolution: introduce prefixes to distinguish the names

• Synonyms (give example)
  – different names for the same concept
  – Resolution: introduce a mapping to a common name
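Both resolutions fit in a few lines of Python. In this sketch the sources, attribute names, and the choice of which names count as homonyms or synonyms are all hypothetical. `title` is treated as a homonym (a movie title in one source, a salutation in the other) and gets a source prefix. `yr` and `client` are synonyms mapped to the common names `year` and `customer`.

```python
# Hypothetical source records. "title" is a homonym: a movie title in
# source "imdb", a person's salutation in source "crm".
SOURCES = {
    "imdb": {"title": "Alien", "yr": 1979},
    "crm": {"title": "Dr.", "client": "ann"},
}

# Synonyms: different names for the same concept, mapped to one common name.
SYNONYMS = {"yr": "year", "client": "customer"}

# Homonyms: same name for different concepts, disambiguated with a source prefix.
HOMONYMS = {"title"}


def integrate_names(source: str, record: dict) -> dict:
    out = {}
    for name, value in record.items():
        if name in HOMONYMS:
            name = f"{source}_{name}"
        else:
            name = SYNONYMS.get(name, name)
        out[name] = value
    return out


for src, rec in SOURCES.items():
    print(integrate_names(src, rec))
```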

Structural Conflicts

• Different, non-corresponding attributes
  – Resolution: create a relation with the union of the attributes

• Different datatypes
  – Resolution: build a mapping function

• Different data model constructs
  – e.g. attribute vs. relation
  – Resolution: requires higher-order mappings
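The first two resolutions are sketched below in Python over hypothetical rows. The integrated relation takes the union of both attribute sets and fills missing values with `None` (NULL). A mapping function resolves the datatype conflict on `released`: an ISO date string in one source, an integer year in the other. Mapping to the year is a lossy choice made for this sketch, because it is the only representation both sources can supply. The third case (attribute vs. relation) is not covered, since it needs higher-order mappings.

```python
from datetime import date

# Hypothetical sources: A stores the release date as an ISO string and has
# "genre"; B stores only the year as an integer and has "studio".
rows_a = [{"title": "Alien", "released": "1979-05-25", "genre": "sci-fi"}]
rows_b = [{"title": "Heat", "released": 1995, "studio": "Warner"}]

# Union of the attributes of both relations.
ALL_ATTRS = sorted({k for r in rows_a + rows_b for k in r})


def to_year(value) -> int:
    """Mapping function resolving the datatype conflict on 'released'."""
    if isinstance(value, int):
        return value
    return date.fromisoformat(value).year


def unify(row: dict) -> dict:
    out = {attr: row.get(attr) for attr in ALL_ATTRS}  # missing attribute -> None
    out["released"] = to_year(row["released"])
    return out


integrated = [unify(r) for r in rows_a + rows_b]
print(integrated)
```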

Classification Conflicts

• Relations can have different coverage (inclusion, non-empty intersection)
  – Resolution: build generalization hierarchies

• Additional problem
  – Identification of corresponding data instances
  – "real-world" correspondence is application dependent
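A generalization hierarchy for two overlapping relations can be sketched as a SQL view, here with Python's `sqlite3`. The `employee` and `student` tables, the data, and the use of `ssn` to identify the same real-world person are all assumptions. The view `person` generalizes both classes, and membership flags record which specialization each instance belongs to. The instance-identification problem shows up in the view definition: `UNION` merges `bob` only because both rows agree on `(ssn, name)`. If the names differed, he would appear twice.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical relations with overlapping coverage (bob is in both)
CREATE TABLE employee (ssn TEXT, name TEXT);
CREATE TABLE student  (ssn TEXT, name TEXT);
INSERT INTO employee VALUES ('111', 'ann'), ('222', 'bob');
INSERT INTO student  VALUES ('222', 'bob'), ('333', 'cy');

-- Generalization: person covers both classes; UNION removes exact duplicates
CREATE VIEW person AS
  SELECT ssn, name FROM employee
  UNION
  SELECT ssn, name FROM student;
""")

# Membership in each specialization, assuming ssn identifies the person.
ROLE_QUERY = """
SELECT ssn, name,
       ssn IN (SELECT ssn FROM employee) AS is_employee,
       ssn IN (SELECT ssn FROM student)  AS is_student
FROM person ORDER BY ssn
"""
print(conn.execute(ROLE_QUERY).fetchall())
```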

Data Correspondences

• Corresponding data instances
  – similar to naming conflicts at schema level
  – Resolution: mapping tables and functions
  – Similarity functions

• Corresponding data values, data conflicts
  – of corresponding data instances
  – Resolution: mapping tables and functions
  – Prefer data from the more trusted data source
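The sketch below applies all three resolutions in Python to hypothetical customer records that share no key. A similarity function on names (standard `difflib.SequenceMatcher`, with an assumed threshold of 0.9) identifies corresponding instances. A mapping table normalizes a known spelling conflict. For conflicting values, the source assumed to be more trusted (`crm`) wins, and the other source only fills gaps.

```python
from difflib import SequenceMatcher

# Hypothetical records from two sources without a shared key.
crm = [{"name": "Ann Smith", "city": "Berlin"}, {"name": "Bob Jones", "city": None}]
shop = [{"name": "ann smith", "city": "Potsdam"}, {"name": "Bob Jones", "city": "Muenchen"}]

# Mapping table for a known value conflict (city spellings).
CITY_MAP = {"Muenchen": "Munich"}


def similar(a: str, b: str, threshold: float = 0.9) -> bool:
    """Similarity function identifying corresponding instances (threshold assumed)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio() >= threshold


def fuse(trusted_rows, other_rows):
    fused = []
    for t in trusted_rows:
        # Find the corresponding instance in the less trusted source, if any.
        o = next((o for o in other_rows if similar(t["name"], o["name"])), {})
        other_city = CITY_MAP.get(o.get("city"), o.get("city"))
        # Data conflict: prefer the trusted value, fall back to the other source.
        city = t["city"] if t["city"] is not None else other_city
        fused.append({"name": t["name"], "city": city})
    return fused


print(fuse(crm, shop))
```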

Constraint and Behavioral Conflicts

• Cardinality conflicts
  – different types of cardinality constraints on relationships
  – Resolution: use the more general constraint

• Behavioral conflicts for relation update
  – e.g. cascading vs. non-cascading delete
  – Resolution: add missing behavior at global level
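"Add missing behavior at global level" can be sketched as a mediator-side delete that performs the cascade the source lacks. This Python `sqlite3` example uses a hypothetical schema that declares no `ON DELETE CASCADE`. SQLite also does not enforce foreign keys unless `PRAGMA foreign_keys` is enabled. The function name `global_delete_customer` is made up for the sketch. Both deletes run in one transaction, so a failure cannot leave orphaned orders behind.

```python
import sqlite3

# Hypothetical source without cascading delete.
src = sqlite3.connect(":memory:")
src.executescript("""
CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER REFERENCES customer(id));
INSERT INTO customer VALUES (1, 'ann'), (2, 'bob');
INSERT INTO orders VALUES (10, 1), (11, 1), (12, 2);
""")


def global_delete_customer(conn: sqlite3.Connection, customer_id: int) -> None:
    """Cascading delete implemented at the global (mediator) level."""
    with conn:  # one transaction: both deletes commit or roll back together
        conn.execute("DELETE FROM orders WHERE customer_id = ?", (customer_id,))
        conn.execute("DELETE FROM customer WHERE id = ?", (customer_id,))


global_delete_customer(src, 1)
print(src.execute("SELECT id FROM orders").fetchall())
```

If the dependent rows lived in a different source than the customers, the two deletes would span two databases, and a single local transaction would no longer be enough.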

More?

• Security
  – protecting data

• Data quality
  – actively managing data quality

• Integration as agreement process
  – "emergent semantics"