Heterogeneous / Federated / Multi-Database Systems

Vera Goebel, Department of Informatics, University of Oslo, 2011

Mar 19, 2018




Contents: Heterogeneous DBSs

• Motivation

– Applications for Heterogeneous Database Systems (HDBS)

• What is a HDBMS?

• Architectures for HDBS

• Main Problems:

- Defining a Global Data Model

- Query Processing & Optimization

- Transaction Management

• Summary and Conclusion


Applications

• Multitude of extensive, isolated data agglomerations managed by different DBMSs or file systems

[Figure: two example settings. Multi-product Customer Support: three customer information systems (CIS) for Electricity, Nat. Gas, and Oil, each with Billing and Cust. Support, alongside Suppliers, Accting, and Deliveries. Extended CAD App: CAD Parts Library, Simulation, Design Creation, Supplier Parts DB, Payment Accting, Manufac. DB, Line Analysis, and Equip Inven.]

• Extension of data and management software because of new and/or extended applications

• Heterogeneous application domains (e.g., CIM, CAD, Biz-mgmt, …)

– Similar data

• Ex: 3 Customer Info Systems

– Dissimilar data

• Ex: Extended CAD Application


Heterogeneous Database Systems (HDBS)

[Figure: global applications access the HDBS, whose integration layer and HDBS metadata sit on top of DBMS 1 (Inventory), DBMS 2 (Accounts), and DBMS 3 (Shipping). A local application accesses a local DBMS.]


Requirements for HDBS

• Properties known from homogeneous DBS:
-> global data model, transactions, recovery, distribution transparency, ...

• Integration of heterogeneous data stores
-> queries across HDBs (combine heterogeneous data)
-> heterogeneous information structures
-> avoid redundancy
-> access (query) language transparency

• “Open” system support for integration of existing data models and DBSs, as well as their schemas and DBs

• Constraints
-> retain autonomy of the DBSs to be integrated
-> avoid modifications of existing local applications
-> define a viable global data model for global applications


Definition - Heterogeneous DBS (HDBS)

A HDBS comprises a software layer (integration layer) and multiple DBSs and/or file systems to be integrated.

Users can transparently access the integrated DBSs and/or file systems via the interface provided by the integration layer.

• Defines a global data model
• Supports a Data Definition Language (DDL)
• Supports a Data Manipulation Language (DML)
• Distributed Transaction Management
• Transparent integration of the underlying, disparate DBSs

The integrated, local DBSs are autonomous and can also be used as stand-alone systems.

Local applications are unchanged and unknown to the HDBS.


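Layers of Transparency

[Figure: nested layers of transparency. From the innermost layer outward: Data, Data Independence, Data Fragmentation Transparency, Data Replication Transparency, Network/Distribution Transparency, Data Modelling Language Transparency, Access Language Transparency. The layers are grouped by the kind of system that provides them: Single site DBMS, Homogeneous Distributed DBMS, Heterogeneous DBMS.]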


Abstraction Levels [Christmann et al. 87]

Abstraction Level             Supported By                        Objects

Global Abstractions
access & data model lang      global conceptual schema            relations or objects
replication transparency      replication schema                  multiple copies of fragments of rels/objs
fragmentation transparency    fragmentation schema                fragments of rels/objs
network transparency          remote communication services       remotely located multiple copies of fragments

Local Abstractions
logical data independence     local conceptual schema             local relations/objects
storage and I/O system        disk storage definitions            tracks, physical blocks
physical data independence    physical schema                     records, access paths
file system                   file definitions and                physical records, pages
                              buffer management


DBMS Implementation Alternatives

[Figure: classification cube with the axes Distribution, Heterogeneity, and Autonomy. Labelled points: Centralized Homogeneous DBMS, Distributed Homogeneous DBMS, Centralized Heterogeneous DBMS, Distributed Heterogeneous DBMS, Centralized Homog. Federated DBMS, Distributed Homog. Federated DBMS, Centralized Heterog. Federated DBMS, Distributed Heterog. Federated DBMS, Centralized Multi-DBMS, Distributed Multi-DBMS, Centralized Heterog. Multi-DBMS, Distributed Heterog. Multi-DBMS.]


Heterogeneous Database Systems (HDBS)

[Figure: global applications access the HDBS, whose integration layer and HDBS metadata sit on top of DBMS 1 (Inventory), DBMS 2 (Accounts), and DBMS 3 (Shipping). A local application accesses a local DBMS. The HDBS is annotated as: a Multi-Database or a Federated Database System.]


Components of a Multi-DBMS

[Figure: the USER sends User Requests to, and receives System Responses from, the Multi-DBMS Layer. Below the Multi-DBMS Layer are several component DBMSs (• • •), each consisting of a Query Processor, Transaction Manager, Scheduler, Recovery Manager, and Runtime Support Processor.]


Components of a Distributed Multi-DBMS

[Figure: two Multi-DBMS sites. At each site a USER sends User Requests to, and receives System Responses from, a Multi-DBMS Layer that sits on top of several component DBMSs (• • •). Each DBMS consists of a Query Processor, Transaction Manager, Scheduler, Recovery Manager, and Runtime Support Processor.]

Multi-DB Integration layers act as peers in a homogeneous distributed database system

- Use the global data model and global access language

- Distributed control over transaction execution

- Users submit queries to any Multi-DB site


HDBS Architecture

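[Figure: HDBS (federation). Global applications use the global integration layer, which relies on HDBS metadata and reaches the local systems through Export Schema1, Export Schema2, and Export Schema3. Local systems 1, 2, ..., n each consist of a DBMS (DBMS 1, DBMS 2, ..., DBMS n) and its database (DB 1, DB 2, ..., DB n). A local application accesses a local system.]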


Abstract Component Architecture of HDBS

[Figure: the HDBMS (the DBMS software of the HDBS) comprises the global integration layer, its metadata, and DB-model-specific coupling software. Below it are the local DBSs: DBMS 1, DBMS 2, ..., DBMS n with DB 1, DB 2, ..., DB n.]

Coupling software can be partitioned into processes (or agents) that execute on HDBMS hosts and on local DB hosts.


Toolkits for HDBMS – an implementation approach

[Figure: an Integration Toolkit with components labelled DBS T1 through DBS T5 is used to build a Multi-DB Layer over DBMS 1, DBMS 4, and DBMS 5 (with DB 1, DB 4, and DB 5).]


Heterogeneous Database Systems (fully auton. HDBS)

[Figure: global applications access the HDBS integration layer, which keeps HDBS metadata. DBMS 1, DBMS 2, and DBMS 3 (over DB 1, DB 2, and DB 3) are reached through Export Schema1, Export Schema2, and Export Schema3. A local application accesses a local DBMS.]

HDBS Server or HDBS Proxy

- Runs on the local DB site

- Typically includes some code that is specific to the local DB type


Information Integration Architecture “Multiple, legacy data sources”

[Figure: a Web Browser sends a Query to the Information Mediator, which sends Queries to Wrapper #1, Wrapper #2, . . . Each wrapper sits in front of one legacy data source (Legacy Data Source #1, Legacy Data Source #2).]

Information Mediator
- Global Data Dictionary
- Decompose Query
- Manage Query Exec
- Compute Final Results

Wrapper #1, Wrapper #2, . . .
- Local Data Dictionary
- Parse SubQuery
- Create & Exec Call Sequence
- Convert & Return Results as Tuples
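The mediator and wrapper roles can be illustrated with a minimal Python sketch. All class names, the in-memory "legacy sources", and the dictionary layouts are assumptions made for illustration; the slides do not prescribe an implementation.

```python
class Wrapper:
    """Hypothetical wrapper around one legacy source (here: a list of dicts)."""

    def __init__(self, records, local_dictionary):
        self.records = records
        # Local data dictionary: global attribute name -> local field name.
        self.local_dictionary = local_dictionary

    def execute(self, attributes, predicate):
        results = []
        for record in self.records:  # the "call sequence" is a plain scan here
            # Parse the subquery: resolve global names to local fields.
            row = {a: record[self.local_dictionary[a]] for a in attributes}
            if predicate(row):
                # Convert and return results as tuples.
                results.append(tuple(row[a] for a in attributes))
        return results


class Mediator:
    """Hypothetical mediator that fans a query out to all suitable wrappers."""

    def __init__(self, wrappers):
        self.wrappers = wrappers
        # Global data dictionary: source name -> global attributes it can serve.
        self.global_dictionary = {
            name: set(w.local_dictionary) for name, w in wrappers.items()
        }

    def query(self, attributes, predicate):
        partial = []
        for name, wrapper in self.wrappers.items():
            # Decompose: only ask sources that export every requested attribute.
            if set(attributes) <= self.global_dictionary[name]:
                partial.extend(wrapper.execute(attributes, predicate))
        # Compute final results: duplicate-free union in sorted order.
        return sorted(set(partial))


source1 = Wrapper([{"cust_name": "Ann", "town": "Oslo"}],
                  {"name": "cust_name", "city": "town"})
source2 = Wrapper([{"n": "Bob", "c": "Oslo"}, {"n": "Cy", "c": "Bergen"}],
                  {"name": "n", "city": "c"})
mediator = Mediator({"legacy1": source1, "legacy2": source2})
oslo_customers = mediator.query(["name", "city"], lambda row: row["city"] == "Oslo")
```

Each wrapper hides the field names of its source, so the mediator only ever sees global attribute names and tuples.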


CORBA Objects for HDBS – an implementation approach

Use distributed object managers (DOMs) to realize HDBSs -> CORBA

[Figure: clients a, b, and c and Data Source X and Data Source Y are connected through DOM 1 to DOM 4 and LAI 1 to LAI 3. One annotation reads "Like the HDBMS Proxy", another "Like the Integration Layer".]

LAI - local application interface

DOM – distributed object manager


Concepts in the Integration Layer

• Global data model

• Global schema and meta data management

• Distributed query processing and optimization

• Distributed transaction management

• Extensible software construction (to allow the “easy” integration of additional system components)


Data Model

• Local data models: any kind of data model possible, e.g., object-oriented, relational, entity-relationship, hierarchical, network-oriented, flat files, ...

• Global data model: must comprise modeling concepts and mechanisms to express the features of the local data models

– When integrating N local data models, use the “richest” model of the N models you are integrating

– Object-oriented data models

• Provide user-defined data types and methods

• Are often used as the global (integration) data model

Goals - To define a data model that:

1) Is a complete, minimal, and understandable data model for the union of the data stored in the set of local databases (application development time)

2) Supports application queries that can be satisfied by retrieving data from the set of local databases (application runtime)


Schema Architecture of HDBS

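[Figure: local schema 1 ... local schema n, each expressed in its local data model, are mapped to export schema 1 ... export schema n, expressed in the global data model. Schema integration combines the export schemas into the global/federated schema, also expressed in the global data model. The figure labels the process homogenization.]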


Schema Architecture of HDBS - 2

5-layer schema architecture

[Figure: five schema layers: local schemas (in the local data models), component schemas (in the global data model), export schemas, federated schemas, and external schemas, with auxiliary schemas alongside. Steps labelled in the figure: Translation, Integration, Global View Defn, App View Defn. Annotations: Multi-lingual, Multiple Views, Multi-Use.]


Schema Homogenization

• Schema Translation

– Map each local schema to the language of the global data model

• Ex: a Relational schema to an Object-oriented schema

Note: Adequate design tools are not available

• Schema Integration

– For N translated, local schemas

• Pairwise integration, X-at-a-time integration, One-step integration

– Determine “common semantics” of the schemas

– Make the “same things” be “one thing” in the integrated schema

– Resolve conflicts

• structural and semantic


Schema Conflicts

• Name

– Different names for equivalent entities, attributes, relationships, etc.

– Same name for different entities, attributes, …

• Structure

– Missing attributes

– Missing but implicit attributes

• Relationship

– One-to-many, many-to-many

• Entity versus Attribute (inclusion)

– One attribute or several attributes

• Behavior

– Different integrity constraints

• Ex: automatic update, delete a project when the last engineer is moved to another project

[Figure: E-R schemas illustrating the conflicts. Labels include the entities Engr, Emp, Proj, Cost Center, and Comp Pkg; the relationships works-in, works-on, and earns with cardinalities 1, N, and M; the attributes name, title, rank, and salary; and the markers C1 and C2. A second example shows the same info modelled as Name (as an entity) with Fname, Lname, Nickname, and Init versus Name (as an attribute).]


Data Representation Conflicts

• Different representation for equivalent data

– Different units

• Celsius ↔ Fahrenheit; Kilograms ↔ Pounds; Liters ↔ Gallons

– Different levels of precision

• 4 decimal digits versus 2 decimal digits

• Floating point versus integer

– Different expression denoting same information

• Enumerated value sets that are not one-to-one

– {good, ok, bad} versus {one, two, three, four, five}

How to Resolve Schema Conflicts? Can Object-Oriented Models Help?
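One possible way to hide such representation conflicts is a set of small conversion functions applied when local data is mapped into the global schema. The sketch below is illustrative only: the function names, the choice of two decimal digits, and the five-to-three rating mapping are assumptions, not part of the slides.

```python
# Exact by definition of the international avoirdupois pound.
KG_PER_POUND = 0.45359237


def celsius_to_fahrenheit(celsius: float) -> float:
    """Unit conflict: convert a local Celsius value to Fahrenheit."""
    return celsius * 9.0 / 5.0 + 32.0


def pounds_to_kilograms(pounds: float) -> float:
    """Unit conflict: convert a local value in pounds to kilograms."""
    return pounds * KG_PER_POUND


def to_global_precision(value: float, digits: int = 2) -> float:
    """Precision conflict: round to the precision assumed for the global schema."""
    return round(value, digits)


# Value sets that are not one-to-one: several local values collapse onto one
# global value, so this mapping cannot be inverted without losing information.
# The particular assignment below is a hypothetical choice.
FIVE_TO_THREE = {
    "one": "bad",
    "two": "bad",
    "three": "ok",
    "four": "good",
    "five": "good",
}


def rating_to_global(local_rating: str) -> str:
    """Map a five-valued local rating onto the three-valued global rating."""
    return FIVE_TO_THREE[local_rating]
```

Going from the three-valued set back to the five-valued one would need an extra convention (for example, a default representative per global value), which is why such mappings are usually stored explicitly rather than derived.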


Suitability of OO Data Models as Global Data Models

• Rich set of type constructors

-> easy representation of other data models

• Extensibility (user-defined types + type specific operators) &

Encapsulation

-> representation of “foreign” types/systems

-> hiding heterogeneity (concrete storage) in a natural way

• Inheritance (generalization) & computational completeness

-> schema integration

- factor out common properties of similar types

- thereby “arbitrary” computations possible


Use of Generalization & Comp. Completeness (Example)

local data models

DBS1
    class Employee
        name: string,
        address: Address,
        salary: float,
        course-given: set (Courses);

DBS2
    class Student
        name: string,
        address: Address,
        grant: float,
        course-enroll: set (Courses);

global data model (Employee is_a Person, Student is_a Person)

    class Person (
        name: string,
        address: Address)
        method net-income(): float;

    class Employee (
        salary: float,
        course-given: set (Courses),
        tax-rate: float)
        method net-income (): float
            return (self->salary * (1-self->tax-rate));

    class Student (
        grant: float,
        course-enroll: set (Courses))
        method net-income (): float
            return (self->grant);
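The same generalization can be written as a short Python sketch. It mirrors the class names and the two net-income computations from the example; the use of dataclasses, the plain-string address, and the omission of the course sets are simplifications assumed here for brevity.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass
class Person(ABC):
    """Virtual global class that factors out the common attributes."""
    name: str
    address: str  # simplified: the slide uses an Address type

    @abstractmethod
    def net_income(self) -> float:
        """Each subclass computes the generalized attribute its own way."""


@dataclass
class Employee(Person):
    salary: float
    tax_rate: float

    def net_income(self) -> float:
        return self.salary * (1 - self.tax_rate)


@dataclass
class Student(Person):
    grant: float

    def net_income(self) -> float:
        return self.grant


def total_net_income(people) -> float:
    """A global query over Person that works across both local sources."""
    return sum(p.net_income() for p in people)


people = [
    Employee("Ann", "Oslo", salary=1000.0, tax_rate=0.25),
    Student("Bob", "Bergen", grant=200.0),
]
```

A global application can call `total_net_income(people)` without knowing whether an object came from DBS1 or DBS2, which is the point of the generalization.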


Conflict Resolution

• Renaming entities and attributes

– Pick one name for the same things

– Use unique prefixes for different things

• Homogenizing representations

– Use conversions and mappings

• stored programs in relational systems

• methods in OO systems

• auxiliary schemas to store conversion rules/code

• Homogenizing attributes

– Use type coercion (e.g., integer to float)

– Attribute concatenation (e.g., first name || last name)

– For missing attributes, assign default values

• Homogenizing an attribute and an entity

– Extract an attribute from the entity

• Ex: Project department name from the Dept entity to create a virtual attribute (e.g., Emp->Dept.name)

– Create an entity from the attribute

• Ex: Define default values and behavior for all other attributes of the Dept entity

[Figure: E-R fragments for the attribute-versus-entity case. Labels include Engr and Emp, the attribute D-Name, the relationship Member-of with cardinalities N and 1, and the entity Dept with attributes D-Name, Bldg, …]
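The attribute-level techniques (type coercion, concatenation, default values) can be sketched as a single mapping function. The field names and the default value "UNKNOWN" are hypothetical and chosen only for illustration.

```python
def homogenize_employee(local_row: dict) -> dict:
    """Map one row of a hypothetical local schema into a global Emp record."""
    return {
        # Attribute concatenation: first name || last name.
        "name": f"{local_row['first_name']} {local_row['last_name']}",
        # Type coercion: the local schema stores an integer salary.
        "salary": float(local_row["salary"]),
        # Missing attribute: assign a default value.
        "dept_name": local_row.get("dept_name", "UNKNOWN"),
    }


global_row = homogenize_employee(
    {"first_name": "Ada", "last_name": "Lovelace", "salary": 5000}
)
```

In a relational system this logic might live in a stored program, in an OO system in a method, and in either case the rule itself could be kept in an auxiliary schema.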


Conflict Resolution

• Horizontal joins

– Union compatible

• For missing attributes, assign default values or compute implicit values

– Extended union compatible

• Use generalization

– Define a virtual class containing common attributes

• Subclasses of the generalization

– Provide specialized values and compute attribute values for generalized attributes

• See earlier example

– class Person generalizes class Student and class Employee

• Vertical joins

– Many and many to one

• Mixed Joins

– Vertical and horizontal joins in combination

[Figure: small example tables over the attributes A to F. Tables with the same or overlapping columns (e.g., A B C and A B) are combined by Union; tables sharing a column (e.g., A B C and A D E F) are combined by Join; the mixed case combines Union and Join into one table A B C D E F.]
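A minimal sketch of the horizontal and vertical cases, assuming rows are Python dictionaries and missing attributes receive a default value. The function names are illustrative, not taken from the slides.

```python
def outer_union(rows_a, rows_b, default=None):
    """Horizontal join: union of two row sets whose columns only partly overlap.

    Missing attributes are filled with `default`.
    """
    all_rows = list(rows_a) + list(rows_b)
    columns = []
    for row in all_rows:
        for col in row:
            if col not in columns:
                columns.append(col)
    return [{c: row.get(c, default) for c in columns} for row in all_rows]


def vertical_join(rows_a, rows_b, key):
    """Vertical join: combine rows that agree on the shared attribute `key`."""
    index = {}
    for row in rows_b:
        index.setdefault(row[key], []).append(row)
    return [{**a, **b} for a in rows_a for b in index.get(a[key], [])]


unioned = outer_union([{"A": 1, "B": 2, "C": 3}], [{"A": 4, "B": 5}])
joined = vertical_join([{"A": 1, "B": 2}], [{"A": 1, "D": 7}], key="A")
```

A mixed join would apply `vertical_join` to the results of one or more `outer_union` calls, or the other way round.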


Conflict Resolution involving a Database Key

• Entity-Attribute Conflicts where the Attribute is a DB key in one local schema

• Example:

– The global schema defines Attr1 as an entity

– Attr1 is a DB key for instances of LDB2-E

• If Attr1 is a complete DB key in LDB2, then in the global schema

– Define entities E and D and relationship Rel

– Define a new DB key attribute that will be used to uniquely identify instances of LDB2-E when they are accessed through GDB-E and GDB-D

[Figure: E-R fragments for the two local schemas and the global schema. Labels: LDB1-E and LDB1-D connected by Rel (N:1), with attributes Attr1 … AttrN and D; LDB2-E with attribute Attr1; GDB-E and GDB-D connected by Rel (N:1), with attributes Attr1 … AttrN and N-key.]


Conflict Resolution involving a Partial Database Key

• Entity-Attribute Conflicts where the Attribute is a partial DB key in one local schema

• Example:

– The global schema defines Attr1 as an entity

– Attr1 is a partial DB key for instances of LDB2-E

• If Attr1 is a partial DB key in LDB2

– Define the entities E and D, and relationship Rel

– Define a new attribute as a partial DB key

– Add partial DB key LDB2-Attr1 as an attribute only

– Add the other partial key attributes from LDB2 as partial keys

[Figure: E-R fragments for the two local schemas and the global schema. Labels: LDB1-E and LDB1-D connected by Rel (N:1), with attributes Attr1 … AttrN and D; LDB2-E with attributes Attr1 and Key2; GDB-E and GDB-D connected by Rel (N:1), with attributes Attr1 … AttrN, N-key, and Key2.]


Global Schema Management

• HDBS manages the global schema = (all local exported schemas)

• Global schema definition facilities provide mechanisms for handling the full spectrum of schematic differences that may exist among the heterogeneous local schemata.

– Can use an Auxiliary Schema to store mappers, translators, and converters.

• Data is stored in the local component systems.

• Global dictionary information is used to query and manipulate the data. The global language statements are translated into equivalent statements of the local languages supported by the local systems.


Query Processing and Optimization

• The HDBMS has

– A global Data Definition Language (DDL)

– A global Data Manipulation Language (DML)

– A set of local DMLs

• The HDBMS Query Processing Goal:

– Given a query stated in the global query language (DML), execute that query, in an optimal manner, using the local database management systems


Query Planning and Optimization in a Distributed Multi-DBMS

[Figure: a global query passes through query localization, which yields localized multi-DB queries 1 ... m (some of them for another Multi-DBMS). Each localized query passes through query fragmentation and global optimization, which yields SQ 1 ... SQ n and PQ 1 ... PQ k. Query translators 1 ... n turn SQ 1 ... SQ n into TQ 1 ... TQ n, which run against DB 1 ... DB n. The remaining work is labelled "Joining intermediate results" and "Sorting and unioning result data".]


[Figure: steps of global query processing.

Input: Global Query on Multiple Databases at Multiple Sites

1. Localization (Control Site)
   Output: { Subqueries, each on a single Multi-DB } and { Post-processing Queries }

2. Fragmentation & Global Opt (Multi-DB Manager)
   Output: { Subqueries, each on a single local DBMS } and { Post-processing Queries }

3. Translation
   Output: { Queries, that can be processed by local DBMS }

4. Local DBMS Decomposition & Local Optimization
   Output: Optimized Local Execution Plan

Information Supporting Query Planning & Optimization: Data Allocation, Data Directory, Export & Aux Schema, Local Schema & Access Paths.]


Query Fragmentation

• Similar to query fragmentation problem for homogeneous distributed DBSs

• But … complicating factors:

– Autonomy

• Little information about “how” the subquery will be executed by the Local DBS

– Heterogeneous Data Definition Languages

• Weaker modeling languages do not support the same manipulation “features”

• Must use multiple techniques in order to define a consistent global data model

• Query fragmentation must produce a set of subqueries that reverse the operations used to create/define the global schema

• Processing Steps:

(1) Replace names from the global schema with “fullnames” from the export schemas

(2) If a subquery involves multiple export schemas, then break the query into queries that operate on one export schema and insert data communication operators to exchange intermediate results between local database systems
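The two processing steps can be sketched in a few lines of Python. The name mapping, the export schema names, and the `SHIP` operator are hypothetical; a real HDBMS would derive the mapping from its global dictionary and would pick the transfer target during optimization.

```python
# Hypothetical mapping: global name -> (export schema, full name in that schema).
FULLNAMES = {
    "Customer.name": ("export1", "export1.CUST.cname"),
    "Customer.city": ("export1", "export1.CUST.town"),
    "Invoice.amount": ("export2", "export2.BILLING.amt"),
}


def fragment(global_attributes):
    """Return (subqueries per export schema, data communication operators)."""
    # Step 1: replace global names with full names from the export schemas.
    resolved = [FULLNAMES[attr] for attr in global_attributes]

    # Step 2: split into subqueries that each touch a single export schema.
    subqueries = {}
    for schema, fullname in resolved:
        subqueries.setdefault(schema, []).append(fullname)

    # Insert data communication operators. Shipping everything to the first
    # schema's site is an arbitrary choice made for this sketch.
    schemas = sorted(subqueries)
    transfers = [("SHIP", source, schemas[0]) for source in schemas[1:]]
    return subqueries, transfers


subqueries, transfers = fragment(["Customer.name", "Invoice.amount"])
```

A query that touches only one export schema produces a single subquery and no transfer operators.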


Global Query Optimization

• Primary Considerations:

– Post-processing Strategy

– Parallel Execution Possibilities

– Global Cost Function/Estimation

• Similar to global query optimization for homogeneous distributed DBSs (many algorithms can be used directly)

• But only possible under the following assumptions:

– No data inconsistency (the global schema correctly represents the semantics of disjoint, overlapping, and conflicting data)

– Know the characteristics of local DBSs

• e.g., statistical info on data cardinalities and selectivities is available

– Can transfer partial data results between different local DBSs

• Major impact on post-processing plans


Post-Processing Strategies

• Three Strategies:

1) Control site performs all intermediate and post-processing operations (I&PP-ops)

• Heavy work load; minimal parallelism

2) Control site performs I&PP-ops for multi-DB results; Multi-DB managers, and HDBMS agents on the local database sites perform I&PP-ops for DBSs within one multi-DB environment

• Better work load balance; more parallelism

3) Use strategy #2 and use “pushdown” to get the local database systems to perform I&PP-ops

• Possible if local DBMS can read intermediate results from external sources, and sort, join, etc. can be directly invoked


Parallel Execution Strategies

• Join operations are slow → speedup with parallel execution?

• Traditional query plans use left linear join trees

– One of the operands is always a base relation

• Have good info on cardinality and selectivity for the base

– Used even in homogeneous distributed DBSs because cooperative nodes can pipeline the sequence of joins

• Bushy join trees provide parallel execution in heterogeneous multi-DB environments

– Convert a left linear join tree into a (balanced?) bushy join tree

[Figure: a left linear join tree over R1 ... R5 and a bushy join tree over the same relations that pairs R1 with R2 and R3 with R4.]
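The difference between the two tree shapes can be shown with a small sketch that builds both over the same list of relations. It ignores join predicates and costs entirely (a real optimizer must check that every join in the bushy tree has a usable predicate), and the nested-tuple representation is an assumption made here.

```python
def left_linear(relations):
    """Build ((((R1, R2), R3), R4), R5): every join has a base relation on the right."""
    tree = relations[0]
    for relation in relations[1:]:
        tree = (tree, relation)
    return tree


def bushy(relations):
    """Build a roughly balanced tree by splitting the list in halves."""
    if len(relations) == 1:
        return relations[0]
    mid = len(relations) // 2
    return (bushy(relations[:mid]), bushy(relations[mid:]))


def depth(tree):
    """Number of join levels on the longest path; a base relation has depth 0."""
    if not isinstance(tree, tuple):
        return 0
    return 1 + max(depth(tree[0]), depth(tree[1]))


RELATIONS = ["R1", "R2", "R3", "R4", "R5"]
linear_tree = left_linear(RELATIONS)
bushy_tree = bushy(RELATIONS)
```

Joins in different subtrees of the bushy tree do not depend on each other, so they can run in parallel on different sites; the lower depth is a rough indicator of the shorter critical path.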


Global Cost Estimation

• Differs from cost estimation in homogeneous distributed DBSs

– Little (or no) info on QP algorithms and data statistics in local DBS

• Cost Estimation Function

– Cost to execute each subquery on the local DBMSs

– Cost to execute all I&PP-ops

• via pushdown or by any HDBMS agent/service

• Use a simplified cost function

Cost = Initialization cost
     + cost to retrieve a set of objects
     + cost to process a set of objects

• Run test queries on the local DBSs to get time estimates for ops

– Selection, with and without an index

– Join (testing for different algorithms: sort, hash, or indexed based algorithms)
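A minimal sketch of how the simplified cost function might be calibrated from test queries. It assumes that retrieval and processing costs grow linearly with the number of objects, which the slides do not state; the function names and the two-probe calibration are likewise illustrative.

```python
def calibrate(n1, t1, n2, t2):
    """Estimate (initialization cost, per-object cost) from two timed test queries.

    n1, n2: number of objects returned by each test query
    t1, t2: measured elapsed times for those queries
    """
    per_object = (t2 - t1) / (n2 - n1)
    initialization = t1 - per_object * n1
    return initialization, per_object


def estimate_cost(initialization, retrieve_per_object, process_per_object, n_objects):
    """Cost = initialization + cost to retrieve + cost to process a set of objects."""
    return (initialization
            + retrieve_per_object * n_objects
            + process_per_object * n_objects)


# Two hypothetical probes against a local DBS: 100 objects in 1.5 s, 300 in 2.5 s.
init_cost, retrieve_cost = calibrate(100, 1.5, 300, 2.5)
predicted = estimate_cost(init_cost, retrieve_cost, process_per_object=0.001, n_objects=1000)
```

Separate probes (selection with and without an index, each join algorithm) would give separate coefficient sets, one per operation type.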


Query Translation

When a query language of a local DBS is different from the global query language, each export schema subquery for the local DB needs to be translated from the global language to the target language.

Weaker target languages do not support the same operations, so emulate required operations in post-processing

Ex: retrieve more data than requested by the query and then post-process that data to compute the correct response to the query

Reduce the number of language mappings using the Entity-Relationship Query Language as an intermediary language

[Figure: global languages (Object-oriented, Relational, . . .) and local languages (Object-oriented, Relational, Hierarchical, Network-oriented) are mapped through ERQL. Labelled languages and interfaces: QUEL, SQL, OQL, CODASYL, Access Funcs, DB/2, Func I/F.]
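The post-processing idea can be illustrated with a hypothetical weak source that supports only equality lookups. The class, field names, and query are assumptions made for this sketch.

```python
class KeyLookupSource:
    """Hypothetical weak local system: only equality lookup on a single field."""

    def __init__(self, rows):
        self._rows = rows

    def lookup(self, field, value):
        return [row for row in self._rows if row[field] == value]


def people_in_city_older_than(source, city, min_age):
    """Global query: city = ? AND age >= ? ORDER BY age."""
    # Push down the part the local system can execute. This retrieves more
    # data than the global query asks for.
    candidates = source.lookup("city", city)
    # Emulate the unsupported range predicate and the sort in post-processing.
    matching = [row for row in candidates if row["age"] >= min_age]
    return sorted(matching, key=lambda row: row["age"])


source = KeyLookupSource([
    {"name": "Ann", "city": "Oslo", "age": 52},
    {"name": "Bob", "city": "Oslo", "age": 30},
    {"name": "Cy", "city": "Bergen", "age": 60},
    {"name": "Di", "city": "Oslo", "age": 41},
])
result = people_in_city_older_than(source, "Oslo", 40)
```

The cost of this emulation is the extra data shipped from the local system, which is why the choice of what to push down matters for global optimization.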


Query Translation - 2

(a) global query

“select all car company presidents that are 52 years old and own a car that is built in their hometown”

(b) relational predicate graph

Nodes: Car1, Company, City1, People (age = 52), City2, Car2 (color = red)
Edge labels: (1), (2), (3), (2), (5), (4)

Join Predicates:
(1) Company-OID (2) City-OID
(3) People-OID (4) Car-OID
(5) City1.name = City2.name

(c) object-oriented local schema <4 classes>

Car             Company         People       City
OID             OID             OID          OID
color           name            name         name
manufacturer    profit          hometown     state
                headquarter     car          population
                president       age

(d) object-oriented predicate graph

Nodes: Car1, Company, City1, People (age = 52), City2, Car2 (color = red)
Edge labels: (1), (2), (3), (5), (6), (4)

Object References (implicit & explicit joins):
(1) manufacturer (2) headquarter
(3) president (4) car
(5) hometown (6) City1.name = City2.name
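To make the object references concrete, the sketch below evaluates the global query by following references instead of joining on OIDs. The class layout is a simplified reading of the four-class schema, and two points are assumptions: "built in their hometown" is interpreted as the manufacturer's headquarter city having the same name as the hometown, and every company in the input is taken to be a car company.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class City:
    name: str


@dataclass
class Company:
    name: str
    headquarter: City
    president: Optional["Person"] = None


@dataclass
class Car:
    color: str
    manufacturer: Company


@dataclass
class Person:
    name: str
    age: int
    hometown: City
    car: Optional[Car] = None


def presidents_with_hometown_car(companies, age=52):
    """Follow president -> car -> manufacturer -> headquarter, then compare city names."""
    names = []
    for company in companies:
        president = company.president
        if president is None or president.age != age or president.car is None:
            continue
        built_in = president.car.manufacturer.headquarter
        if built_in.name == president.hometown.name:  # explicit join on City.name
            names.append(president.name)
    return names


oslo, bergen = City("Oslo"), City("Bergen")
acme = Company("Acme Motors", headquarter=oslo)
fjord = Company("Fjord Cars", headquarter=bergen)
acme.president = Person("Ann", 52, hometown=oslo, car=Car("red", acme))
fjord.president = Person("Bob", 52, hometown=oslo, car=Car("blue", fjord))
answer = presidents_with_hometown_car([acme, fjord])
```

Only the final comparison of city names is an explicit join; the other steps are implicit joins expressed as reference traversals, which a translator to a relational target would have to turn into OID joins.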


HDBS Transaction Model

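[Figure: global transactions GTi and GTj are submitted to the GTM (global transaction manager), which handles the global subtransactions { GSTi1, GSTj1, GSTi2, GSTj2 }. At each local site a server (proxy for the GTM) passes the subtransactions to the local DBMS: GSTi1 and GSTj1 to DBMS 1, GSTi2 and GSTj2 to DBMS n. Local transactions (LTk, LTl, LTm, LTn) are submitted to the local DBMSs.]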


Transaction Management

• Local transactions: access data at a single site outside of the global HDBS control.

• Global transactions: are executed under the HDBS control.

Local DBMSs have three types of autonomy:

Design autonomy
  Definition: No changes can be made to the local DBMS software to support the HDBMS
  Resulting problem: Non-serializable schedule for global transactions

Execution autonomy
  Definition: Each local DBMS controls execution of global subtransactions and local transactions (the commit/abort decision)
  Resulting problem: Non-atomic & non-durable global transactions

Communication autonomy
  Definition: Local DBMSs do not communicate with each other and they do not exchange execution control information
  Resulting problem: Distributed deadlock cannot be detected


Global Serializability Problem

• GTM is responsible for

– A serializable schedule for the set of global transactions

– Coordination of submission and execution of global subtransactions among the local DBMSs

• Serializing the global schedule? (→ means "is serialized before")

– If GST11 → GST22 at site DBMS-1 (order GT1 → GT2), then it must be the case that GST12 → GST23 at site DBMS-2

– If GST23 → GST12 at site DBMS-2 (order GT2 → GT1): a non-serializable schedule!

[Figure: GT1 consists of the subtransactions GST11 and GST12; GT2 consists of GST21, GST22 and GST23. The subtransactions execute at Local DBMS-1, Local DBMS-2 and Local DBMS-3.]

[Roadmap: Global Serializability, Atomicity & Durability, Distributed Deadlock]


Local Transactions and the Global Serializable Schedule

• Local transactions execute outside the control of the GTM

• Local transactions create indirect conflicts with global transactions

• GTM is not aware of local transactions and these indirect conflicts

• In general, the GTM cannot ensure global serializability

Example: LDBMS-1 stores the items a and b; LDBMS-2 stores the items c and d.

GT1: r1(a) r1(c)

GT2: r2(b) r2(d)

LT3: w3(a) w3(b)

LT4: w4(c) w4(d)

LDBMS-1: r1(a) c1 w3(a) w3(b) c3 r2(b) c2

=> LDBMS-1: GT1 → LT3 → GT2

LDBMS-2: w4(c) r1(c) c1 r2(d) c2 w4(d) c4

=> LDBMS-2: GT2 → LT4 → GT1
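
The example on this slide can be checked mechanically. The following is a minimal sketch, assuming schedules are written in the r/w/c notation used above; the helper names `conflict_edges` and `has_cycle` are illustrative and not part of the lecture. It derives the conflict (precedence) edges of each local schedule and shows that each site is serializable on its own, while the combined graph contains a cycle that runs through the local transactions.

```python
import re

def conflict_edges(schedule):
    """Return the set of (earlier, later) transaction pairs that conflict
    in one local schedule such as "r1(a) c1 w3(a)". Commits are ignored."""
    ops = re.findall(r"([rw])(\d+)\((\w+)\)", schedule)
    edges = set()
    for i, (kind_a, txn_a, item_a) in enumerate(ops):
        for kind_b, txn_b, item_b in ops[i + 1:]:
            # Two operations conflict if they belong to different
            # transactions, touch the same item, and one is a write.
            if txn_a != txn_b and item_a == item_b and "w" in (kind_a, kind_b):
                edges.add((txn_a, txn_b))
    return edges

def has_cycle(edges):
    """Depth-first search for a cycle in the directed graph given by edges."""
    graph = {}
    for src, dst in edges:
        graph.setdefault(src, set()).add(dst)
        graph.setdefault(dst, set())
    state = {}  # node -> "active" (on the DFS stack) or "done"

    def visit(node):
        state[node] = "active"
        for nxt in graph[node]:
            if state.get(nxt) == "active":
                return True
            if nxt not in state and visit(nxt):
                return True
        state[node] = "done"
        return False

    return any(node not in state and visit(node) for node in graph)

# Transactions 1 and 2 are GT1 and GT2; 3 and 4 are LT3 and LT4.
ldbms1 = conflict_edges("r1(a) c1 w3(a) w3(b) c3 r2(b) c2")
ldbms2 = conflict_edges("w4(c) r1(c) c1 r2(d) c2 w4(d) c4")
combined = ldbms1 | ldbms2

# Edges that connect two global transactions directly (what a GTM could see).
direct_global = {(a, b) for a, b in combined if {a, b} <= {"1", "2"}}
```

Here `has_cycle(ldbms1)` and `has_cycle(ldbms2)` are both false, `has_cycle(combined)` is true, and `direct_global` is empty: the cycle exists only through LT3 and LT4, which the GTM never observes.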


Controlling the Execution Order of Global Subtransactions

• Three Strategies:

1) Execute global transactions serially

• No concurrent execution for global transactions!

• Does not solve indirect conflicts with local transactions

2) Relax the serializability/consistency requirement

• Use “strong correctness” instead

• Most indirect conflicts have no effect on correctness

3) Define a specific order over the global transactions and use the concurrency control mechanism of each local DBMS to enforce that order

• Use a local database “ticket”


Alternative Consistency Notions

• Local serializability: In some HDBS applications there may be no global constraints because each DBS is quite independent from the others and may wish to remain that way. => No global concurrency control mechanism is needed. That is, local serializability is sufficient to ensure strong correctness of global executions.

– Example application: travel reservation service for planes, trains, ferries, hotels, etc.

Constraint-based strategies

Non-constraint-based strategies

• Handling global constraints: In some applications we need global constraints. However, it may still be possible to enforce them without the full generality of globally serializable schedules (two-level serializability, 2LSR). The data that can be involved in global constraints are limited. Two types of data: global and local data. Global constraints may only span global data, and local transactions may not write to global data.

– Artificial solution: local site has no autonomy over global data; master-slave relationship.

• Other approaches: extend the allowable schedules beyond global serializability, e.g., epsilon serializability (schedule can have a limited number of nonserializable conflicts), or define sets of compatible transactions that are known to be interleavable.
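
The two data-partitioning rules stated for two-level serializability can be written down as a simple validation step. This is only a sketch of those two rules, not an implementation of 2LSR scheduling; the function name `check_two_level_rules` and all item, constraint and transaction names are hypothetical.

```python
def check_two_level_rules(global_items, constraints, local_write_sets):
    """Return a list of rule violations.

    global_items:     set of data items classified as global data
    constraints:      {constraint name: set of items the global constraint spans}
    local_write_sets: {local transaction name: set of items it writes}
    """
    problems = []
    # Rule 1: global constraints may only span global data.
    for name, items in constraints.items():
        outside = items - global_items
        if outside:
            problems.append(f"constraint {name} spans local data: {sorted(outside)}")
    # Rule 2: local transactions may not write to global data.
    for txn, writes in local_write_sets.items():
        touched = writes & global_items
        if touched:
            problems.append(f"local transaction {txn} writes global data: {sorted(touched)}")
    return problems

# Hypothetical classification: g1, g2 are global data; l1, l2 are local data.
problems = check_two_level_rules(
    global_items={"g1", "g2"},
    constraints={"C1": {"g1", "g2"}, "C2": {"g1", "l1"}},
    local_write_sets={"LT1": {"l1"}, "LT2": {"g2", "l2"}},
)
```

In this made-up configuration, constraint C2 and local transaction LT2 are reported, because C2 reaches into local data and LT2 writes a global item.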


Global Serializability Schemes

Failure-free environment where the local DBMSs cannot unilaterally abort transactions (unrealistic case, but we can relax some of these conditions later).

• Rigorous DBMSs: scenario where the GTM knows that all local DBMSs use the rigorous (strict) two-phase locking protocol (R2PL). With local R2PL, global serializability can be ensured as long as the GTM does not issue any commits for a transaction until all its actions have been completed.

• Unknown DBMSs: the GTM ensures that all global transactions will conflict at every site where they execute together. If a pair of transactions does not naturally conflict, then the GTM modifies them so that they do conflict. Each local site has a special data item (called a ticket). Every subtransaction reads and writes the ticket:

GT1: r1(a) w1(a)

GT2: r2(b) w2(b)

newGT1: r1(ticketS1) r1(a) w1(a) w1(ticketS1) c1

newGT2: r2(ticketS1) r2(b) w2(b) w2(ticketS1) c2

• Means GT1 and GT2 will be correctly serialized with respect to all global transactions and all local transactions executed by the local DBMS at S1

Severe performance issues with these approaches
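
The ticket rewriting shown for GT1 and GT2 on this slide can be sketched in a few lines. The helper names `with_ticket` and `conflict` are illustrative assumptions, and operations are plain strings in the slide's r/w/c notation. The sketch only shows that the rewritten subtransactions are forced into a direct conflict at site S1; it does not model the local concurrency control that then orders them.

```python
import re

def with_ticket(txn_id, operations, site):
    """Wrap a subtransaction in a read and a write of the site's ticket
    item and append its commit, as in newGT1 and newGT2."""
    ticket = f"ticket{site}"
    return ([f"r{txn_id}({ticket})"] + list(operations)
            + [f"w{txn_id}({ticket})", f"c{txn_id}"])

def conflict(ops_a, ops_b):
    """True if the two operation lists access a common item and at least
    one of the two accesses is a write. Commit operations are skipped."""
    def accesses(ops):
        matches = (re.fullmatch(r"([rw])\d+\((\w+)\)", op) for op in ops)
        return [(m.group(1), m.group(2)) for m in matches if m]

    for kind_a, item_a in accesses(ops_a):
        for kind_b, item_b in accesses(ops_b):
            if item_a == item_b and "w" in (kind_a, kind_b):
                return True
    return False

gt1 = ["r1(a)", "w1(a)"]
gt2 = ["r2(b)", "w2(b)"]
new_gt1 = with_ticket(1, gt1, "S1")
new_gt2 = with_ticket(2, gt2, "S1")
```

`conflict(gt1, gt2)` is false because the two transactions touch different items, whereas `conflict(new_gt1, new_gt2)` is true: both now read and write `ticketS1`, so the local DBMS at S1 has to serialize them.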


Global Atomicity and Recovery Problem

• The GTM must guarantee that a global transaction commits at all sites or aborts at all sites

• Local DBMSs wish to preserve their execution autonomy

– May not implement or export a prepare-to-commit interface

• A local DBMS can unilaterally abort a subtransaction anytime

– Results in non-atomic global transactions and incorrect global schedules

– Local transactions and global subtransactions see committed partial results

[Figure: the GTM runs 2PC with a GTM proxy at each site, but there is no 2PC between a proxy and its LDBMS. For GT1 with subtransactions GST11 and GST12, one LDBMS aborts GST11 while the other commits GST12.]

Note: The first heterogeneous systems did not support update transactions!


Approaches to Achieve Atomicity and Durability

[Figure: 2PC between the GTM and the GTM proxy; no 2PC between the GTM proxy and the LDBMS.]

• If all LDBMSs export a “prepare-to-commit” interface, then use 2PC between the proxy and the LDBMS

• If some LDBMSs do not export “prepare-to-commit”, then three approaches:

1) Modify each global subtransaction to “callback to the proxy” just before local commit

• Blocks the global subtransaction until GTM completes 2PC with proxies

• Possible only if the LDBMS supports a client callback service

• Fails if the LDBMS is running optimistic concurrency control

– If any global subtransaction aborts:

2) Attempt to REDO that global subtransaction

– Other transactions see inconsistent data until the redo is successful

3) Execute compensating transactions to UNDO the committed global subtransactions

– Other transactions see inconsistent data until the undo is completed
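
The interplay of 2PC and compensation can be illustrated with a toy coordinator. This is a sketch under stated assumptions, not the lecture's algorithm: the `Proxy` class, its `will_fail` flag (which simulates a unilateral local abort) and the order in which sites are committed are all hypothetical choices made for illustration. Sites that export prepare-to-commit take part in 2PC; sites that do not are committed directly and are compensated if a later site fails.

```python
class Proxy:
    """Hypothetical GTM proxy in front of one local DBMS."""

    def __init__(self, name, supports_prepare, will_fail=False):
        self.name = name
        self.supports_prepare = supports_prepare
        self.will_fail = will_fail  # simulates a unilateral local abort
        self.state = "idle"

    def prepare(self):
        # Only meaningful when the LDBMS exports prepare-to-commit.
        self.state = "aborted" if self.will_fail else "prepared"
        return self.state == "prepared"

    def commit(self):
        self.state = "aborted" if self.will_fail else "committed"
        return self.state == "committed"

    def abort(self):
        self.state = "aborted"

    def compensate(self):
        # Semantic UNDO of a subtransaction that has already committed.
        self.state = "compensated"


def run_global_transaction(proxies):
    """Return True if the global transaction commits at all sites."""
    two_pc = [p for p in proxies if p.supports_prepare]
    direct = [p for p in proxies if not p.supports_prepare]

    # Phase 1: collect votes from the sites that can prepare.
    if not all(p.prepare() for p in two_pc):
        for p in two_pc:
            p.abort()
        return False

    # Sites without prepare commit directly. If one of them aborts,
    # compensate those that already committed and abort the prepared ones.
    committed = []
    for p in direct:
        if p.commit():
            committed.append(p)
        else:
            for done in committed:
                done.compensate()
            for q in two_pc:
                q.abort()
            return False

    # Phase 2: every site has succeeded so far, so prepared sites commit.
    for p in two_pc:
        p.commit()
    return True


good = [Proxy("S1", True), Proxy("S2", False)]
bad = [Proxy("S1", True), Proxy("S2", False), Proxy("S3", False, will_fail=True)]
good_result = run_global_transaction(good)
bad_result = run_global_transaction(bad)
```

In the failing run, S2 has already committed when S3 aborts, so S2 ends in the state `compensated`. Between its commit and its compensation, other transactions could have read its results, which is the inconsistency window noted on the slide.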


Global Deadlock Problem

• Same problem as in distributed homogeneous DBMSs

• We solved the problem by exchanging lock information to construct the global “waits-for” graph

– This violates design autonomy and communication autonomy

• Therefore the GTM will be unaware of a global deadlock.

• There are no complete solutions to the global deadlock problem for autonomous multi-database systems.

[Figure: subtransactions of T1 and T2 at Site X and Site Y. T1x holds lock Lx and T2y holds lock Ly; one subtransaction waits for T2y to release Ly and another waits for T1x to release Lx. Also shown: T1y holds lock La and T2x holds lock Lb; T2y needs b and waits for T2x to complete; T1x needs a and waits for T1y to complete.]
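
Why the deadlock stays invisible can be shown with a small waits-for graph. The sketch below assumes one plausible reading of the figure (T2 waits for T1 at Site X, T1 waits for T2 at Site Y); the edge sets and the name `has_deadlock` are illustrative. Cycle detection uses `graphlib.TopologicalSorter` from the Python standard library (Python 3.9 or later), which raises `CycleError` when no topological order exists.

```python
from graphlib import CycleError, TopologicalSorter

def has_deadlock(waits_for):
    """waits_for is a set of (waiter, holder) edges; a cycle means deadlock."""
    graph = {}
    for waiter, holder in waits_for:
        graph.setdefault(waiter, set()).add(holder)
        graph.setdefault(holder, set())
    try:
        # Consuming static_order() raises CycleError if the graph is cyclic.
        list(TopologicalSorter(graph).static_order())
        return False
    except CycleError:
        return True

# Assumed local views: each site only knows about its own lock waits.
site_x = {("T2", "T1")}  # at Site X, T2 waits for T1 to release Lx
site_y = {("T1", "T2")}  # at Site Y, T1 waits for T2 to release Ly

local_x = has_deadlock(site_x)
local_y = has_deadlock(site_y)
global_view = has_deadlock(site_x | site_y)
```

Each local graph is acyclic, so neither site reports a deadlock. Only the union of both graphs contains the cycle, and building that union requires exactly the exchange of lock information that autonomous local DBMSs do not provide.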


Status: Transaction Management for HDBS

• Transaction management for HDBSs is a very active research area.

• Distributed transactions over the Internet define new semantic possibilities, allowing development of new solutions.

Open issues:

• What can be done if some of the local subsystems (e.g., file systems) do not support transaction management?

• Performance implications of transaction management strategy?

• Handling of different degrees of consistency?


Conclusions

HDBS allows a uniform view on the combination of data maintained by different autonomous database systems.

• available: prototypes & commercial products with a set of fixed / specific drivers (so-called gateways) for existing, widely used data management systems (conventional DBS and file systems)

• missing: systematic support for individual integration of arbitrary data management systems (especially modern DBS)