1 Lecture 9: Distributed Databases – Principles and Architectures Advanced Databases CG096 Nick Rossiter.

1

Lecture 9: Distributed Databases – Principles and Architectures

Advanced Databases CG096

Nick Rossiter

2

Distributed Database (DDB) -- Definition

A logically interrelated collection of shared data physically distributed over a computer network.

Implies data description at two levels:
- Global (the view of the whole)
- Local (where data is actually held)

3

Distributed DBMS (DDBMS)
- The software system that permits the management of the distributed database and makes the distribution transparent to users
- Transparent: users are unaware of the underlying local structure
  - Data requests do not specify distribution sites
  - But users may notice performance differences (e.g. if local data is moved to another site with a slow line)

4

Characteristics of DDB
- Collection of logically related shared data
- Data is split into a number of fragments: horizontal (select) or vertical (project)
- Fragments may be replicated
- Fragments/replicas are allocated to sites
- Fragments are in effect views
- Replicas are duplicates, only acceptable if redundancy is controlled
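The fragmentation idea on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the EMPLOYEE relation, its columns and the site names are hypothetical and do not come from the lecture. Horizontal fragments are produced by a select on rows and rebuilt by union; vertical fragments are produced by a project on columns (keeping the key) and rebuilt by a join on that key.

```python
# Hypothetical relation: names and rows are illustrative only.
EMPLOYEE = [
    {"emp_id": 1, "name": "Ann", "site": "Leeds", "salary": 30000},
    {"emp_id": 2, "name": "Bob", "site": "York", "salary": 28000},
    {"emp_id": 3, "name": "Cy", "site": "Leeds", "salary": 35000},
]


def horizontal_fragment(rows, predicate):
    """Select: keep whole rows that satisfy the predicate."""
    return [dict(r) for r in rows if predicate(r)]


def vertical_fragment(rows, key, columns):
    """Project: keep the key plus the chosen columns of every row."""
    return [{c: r[c] for c in (key, *columns)} for r in rows]


def rebuild_from_horizontal(*fragments):
    """Union of horizontal fragments, ordered by the key emp_id."""
    return sorted((r for f in fragments for r in f), key=lambda r: r["emp_id"])


def rebuild_from_vertical(left, right, key):
    """Join two vertical fragments on their shared key."""
    right_by_key = {r[key]: r for r in right}
    return [{**l, **right_by_key[l[key]]} for l in left]


# Horizontal fragments, one per site.
leeds = horizontal_fragment(EMPLOYEE, lambda r: r["site"] == "Leeds")
york = horizontal_fragment(EMPLOYEE, lambda r: r["site"] == "York")

# Vertical fragments, both carrying the key so they can be rejoined.
personal = vertical_fragment(EMPLOYEE, "emp_id", ["name", "site"])
payroll = vertical_fragment(EMPLOYEE, "emp_id", ["salary"])
```

Both rebuild functions return the original relation, which is the sense in which fragments behave like views over the global data.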

5

Why distribute?
- Natural match of data with location
  - Each division, department or office can hold its own data with some degree of autonomy
  - Autonomy: to have control (self-determination, self-rule)
- Users can decide policies locally (devolved)
- Still need a global DBA to ensure the entire system works

6

Why distribute? (continued)
- More flexible operation
- Improved availability
  - One node failure does not bring the whole system down
- Improved reliability
  - Replication ensures that copies of data are still available if a node fails
- Improved performance
  - Accessing most data locally reduces network overheads
- Readily handle expansion
  - Can add new nodes with local schema
  - Followed by simple adjustments to global definition

7

Problems with distribution
- Complexity
  - Global and local schemas must be integrated
  - Design techniques involve more stages
  - Replication must be rigorously handled
  - Network must be made robust
- Costs
  - Although it is cheaper to buy power with smaller machines rather than larger ones
  - More people effort is needed in the distributed than in the centralised approach, to handle the complexity

8

Problems with distribution (continued)
- Security
  - Many more potential access points for would-be violators
- Integrity
  - Need to ensure that the combination of local and global constraints gives the required effect
- Experience
  - Fairly immature technology
  - Not yet translated to standards

9

Homogeneous and Heterogeneous DDBMS

Homogeneous DDBMS
- uses the same database product at all sites

Heterogeneous DDBMS
- uses different database products across its sites
- may arise from corporate mergers

10

Degrees of heterogeneity vary
- Same software, different hardware: can be handled fairly easily
- Oracle 9i : Oracle 8i - differences slight
- Oracle 9i : SQL Server - same underlying relational model, different syntax in places
- Oracle 9i : Objectivity - object-relational (SQL-1999) to ODMG, different underlying model

11

Interoperability
- Ability to work with each other.
- In a loosely coupled environment:
  - full details of each system are not needed
  - BUT interfaces are needed for reliably exchanging messages without error or misunderstanding
- Solutions:
  - standardized specifications
  - mediation
- Differences in implementation may still lead to breakdowns in communication

12

Simple Problem in Interoperability

Two schemas in SQL-1999:

    Schema A                    Schema B
    author varchar2(50),        author_surname varchar2(40),
                                author_initials varchar2(10),
    title varchar2(300),        title varchar2(200),
    keyword1 varchar2(30),      keywd keywordarr;
    keyword2 varchar2(30);

    CREATE TYPE keywordarr AS VARRAY(8) OF varchar2(30);

Note: homogeneous model (both SQL-1999), but difficulties remain.

13

Different Standards
For example, names:
- Person(surname, first_name, ..) or
- Person(first_name, surname, …) or
- Person(name, …)

First two may easily be made equivalent but convention in third needs to be understood.

Note also possibilities of A.N.Other, AN Other, A N Other.

14

Possible Solutions
- In schema B, define a function which amalgamates the two parts of author into one value.
- Will need to look manually at the format of author in schema A.
  - If the format is inconsistent, some pre-processing is needed.
- Other inconsistencies require decisions:
  - Fixed two entries for keyword versus array dimension 8.
  - Different name for keyword attribute.
  - Different size for title fields (presumably adopt the higher).
- In a heterogeneous environment, also need to relate schema constructions. Is a class the same as a table?
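A minimal sketch of the amalgamating function and the pre-processing step might look as follows. Several things here are assumptions rather than part of the lecture: the column name author_initials, the choice of 'A.N.Other' as the target format, and the simplification that the last token of a schema A value is the surname and everything before it consists of initials. Real data would need the manual inspection that the slide calls for.

```python
import re


def initials_from(text):
    """Normalise 'A.N.', 'AN' or 'A N' to the list ['A', 'N']."""
    return re.findall(r"[A-Z]", text.upper())


def merge_author(author_surname, author_initials):
    """Schema B: amalgamate the two parts into one value, e.g. 'A.N.Other'."""
    dotted = "".join(f"{i}." for i in initials_from(author_initials))
    return f"{dotted}{author_surname.strip()}"


def normalise_author(author):
    """Schema A: pre-process 'AN Other', 'A N Other' or 'A.N.Other'.

    Assumes the last token is the surname and the rest are initials.
    """
    parts = author.replace(".", " ").split()
    surname, initial_parts = parts[-1], parts[:-1]
    return merge_author(surname, "".join(initial_parts))


def keywords_to_array(*keywords, limit=8):
    """Map schema A's fixed keyword columns onto an array of at most `limit`."""
    kept = [k for k in keywords if k]
    if len(kept) > limit:
        raise ValueError(f"more than {limit} keywords")
    return kept
```

With both schemas reduced to the same author format, values such as merge_author("Other", "AN") and normalise_author("A N Other") can be compared directly.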

15

Simple Problem in Interoperability 2
Homogeneous Models
- The same information may be held as an attribute name, a relation name or a value in different databases
- e.g. fines in a library could be held:
  - in a dedicated relation Fine(amount, borrowed_id)
  - or as an attribute Loan(id, isbn, date_out, fine)
  - or as a value Charge(1.25, ‘fine’)
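The three placements of "fine" can be made concrete with a small sketch. Only the relation shapes come from the slide; the rows, the ISBN placeholders and the column names amount and kind for Charge are hypothetical. Each mapping function has to know where its database keeps the concept, which is exactly the knowledge an integrator must supply.

```python
# Hypothetical rows; only the relation shapes come from the slide.
FINE = [{"amount": 1.25, "borrowed_id": 7}]
LOAN = [
    {"id": 7, "isbn": "isbn-a", "date_out": "2004-03-01", "fine": 1.25},
    {"id": 8, "isbn": "isbn-b", "date_out": "2004-03-02", "fine": None},
]
CHARGE = [
    {"amount": 1.25, "kind": "fine"},
    {"amount": 0.50, "kind": "reservation"},
]


def fines_from_relation(rows):
    """'Fine' is a relation name: every row is a fine."""
    return [r["amount"] for r in rows]


def fines_from_attribute(rows):
    """'fine' is an attribute name: keep the rows where it is set."""
    return [r["fine"] for r in rows if r["fine"] is not None]


def fines_from_value(rows):
    """'fine' is a data value: filter on it."""
    return [r["amount"] for r in rows if r["kind"] == "fine"]
```

All three functions yield the same list of amounts, even though no single query text would work unchanged against the three databases.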

16

Architectures for Interoperability 1
1. Global schema integration
- Produces a single new schema (C) for the different information systems with schemas (A, B).

[Diagram: schemas A and B integrated into a single schema C]

17

Global Schema Integration
- Advantages
  - Transparent to end users: appears as a single information system
- Disadvantages
  - Difficult: needs human understanding to perform integration
  - Local autonomy lost
  - Static: does not evolve automatically
  - Tightly-coupled

18

Architectures for Interoperability 2
2. Federated Database Systems
- Less tightly coupled schema (than in 1)
- Each service, through an export schema, specifies sharable objects
- Common data model
- Internal command language
- Decentralised control (local autonomy)
- Five-level architecture for federated system
- e.g. Objectivity as Federated OODBMS

19

Federated DBMS - five-level Architecture

[Diagram: five schema levels. Each local DB has a Local IS, a Local CS and a Local ES; these are brought together in one Global CS, which supports the Global ES.]

20

Terminology FDBMS
- IS is Internal Schema, defining the layout on disk of a conceptual schema
- CS is Conceptual Schema, defining the logical database (e.g. relational: tables, attributes, domains)
- ES is External Schema, defining views on the conceptual schema

21

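Federated Databases: Loosely-coupled
- Created by users.
- AE, BE are export schemas. V is a view.
- A, B are base schemas; autonomy is retained over that part of the schema not exported.

[Diagram: base schemas A and B, their export schemas AE and BE, and the view V defined over AE and BE]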
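The relationship between base schemas, export schemas and a user-built view can be sketched as below. The data, the column names and the choice of which columns each site exports are all hypothetical; the point is only that V sees AE and BE, never the unexported parts of A and B.

```python
# Base schemas A and B (hypothetical data); each site owns all of its columns.
BASE_A = [{"title": "Databases", "author": "A.N.Other", "internal_cost": 12.0}]
BASE_B = [{"title": "Networks", "author": "B.Smith", "shelf_code": "Q7"}]


def export_schema(rows, shared_columns):
    """Export schema: expose only the columns a site agrees to share."""
    return [{c: r[c] for c in shared_columns} for r in rows]


def view_v():
    """View V, built by a user over the export schemas AE and BE only."""
    ae = export_schema(BASE_A, ["title", "author"])
    be = export_schema(BASE_B, ["title", "author"])
    return ae + be
```

Site A can rename or drop internal_cost without affecting V, which is the autonomy retained over the unexported part of the schema.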

22

Federated Databases: Tightly-Coupled
- Created by administrators
- Global schema integration on all export schemas
- More formal than loosely-coupled
- Much effort to resolve semantic inconsistencies

23

Federated Database Systems - General Advantages
- Preserves local autonomy
- Not all data needs to be integrated
- Provides metadata structures for views (external and export schema, data dictionary)

24

Federated Database Systems - Disadvantages by Approach
- Tightly-coupled: similar to global schema integration
  1) complex, difficult to make changes dynamically
  2) much effort in resolving semantic inconsistencies
- Loosely-coupled
  - duplication by different users in building views
  - updating data defined in views can be difficult

25

Multidatabase Language Approach
- No attempt at schema integration
- All sites maintain complete autonomy
- The various schemas in the services provided can be heterogeneous and inconsistent, and can duplicate information in different ways.
- Language (e.g. MSQL) is used to integrate databases at run time.
- Relational data model used as Common Data Model

26

Multidatabase Language Approach - Diagram

[Diagram: A and B are schemas; MSQL is the runtime language operating over both]

27

Multidatabase Language Approach - Advantages
- No preparatory work to understand semantics of schema
- Dynamic: access latest versions
- Very skilled users can succeed in reaching their goals
- Interesting work on multidatabase dependencies

28

Example Multidatabase Language
- MSQL (Multidatabase SQL)
  - Biased towards relational model
  - Illustrates problems
- Consider 2 databases
  - Each on publications of a computing society
  - And query: “What is the name, email, title for each publication of an author appearing in both of the society’s databases?”

29

MSQL - Schema

Schema 1 (for AIIA Database):
    Contacts (PersonID, Name, Email, …)
    Conference (Name, Type, …)
    Attendees(ID, Conf_ID, Speaker, …)
    Publ_Papers(P_ID, Title, Author_ID, …)

Schema 2 (for IFIP Database):
    Member_Socs(Soc_Name, …)
    Conf (Conf_ID, …)
    Publ_Papers(P_Ref, Title, Conf_Ref, …)
    Authors(Name, Email, Paper_ID, …)

Underlined attributes are primary key; attributes in italics are foreign key.

30

MSQL for Query

    USE AIIA, IFIP
    SELECT Name, Email, Title
    FROM Authors, IFIP.Publ_Papers IFIP_Paper,
         Contacts, AIIA.Publ_Papers AIIA_Paper
    WHERE Authors.Name = Contacts.Name
    AND Contacts.PersonID = AIIA_Paper.Author_ID
    AND Authors.Paper_ID = IFIP_Paper.P_Ref;

The USE statement declares the multidatabases, which are aliased in the FROM clause to distinguish tables with the same name. The query retrieves Name, Email and Title from both databases.
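MSQL itself is not widely available, so the following is only a rough emulation of the cross-database join, using Python's sqlite3 module with two attached in-memory databases standing in for AIIA and IFIP. The sample rows and email addresses are invented, the elided columns of each table are omitted, and the select list is qualified and returns both titles because plain SQL does not accept the ambiguous unqualified names that the MSQL version uses.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Each ATTACH of ':memory:' creates a separate in-memory database.
conn.execute("ATTACH DATABASE ':memory:' AS aiia")
conn.execute("ATTACH DATABASE ':memory:' AS ifip")

conn.executescript("""
CREATE TABLE aiia.Contacts (PersonID INTEGER PRIMARY KEY, Name TEXT, Email TEXT);
CREATE TABLE aiia.Publ_Papers (P_ID INTEGER PRIMARY KEY, Title TEXT, Author_ID INTEGER);
CREATE TABLE ifip.Publ_Papers (P_Ref INTEGER PRIMARY KEY, Title TEXT, Conf_Ref INTEGER);
CREATE TABLE ifip.Authors (Name TEXT, Email TEXT, Paper_ID INTEGER);

-- Invented sample data: only A.N.Other appears in both databases.
INSERT INTO aiia.Contacts VALUES (1, 'A.N.Other', 'ano@example.org');
INSERT INTO aiia.Contacts VALUES (2, 'B.Smith', 'bs@example.org');
INSERT INTO aiia.Publ_Papers VALUES (10, 'Schema Mediation', 1);
INSERT INTO aiia.Publ_Papers VALUES (11, 'Query Routing', 2);
INSERT INTO ifip.Publ_Papers VALUES (100, 'Federated Views', NULL);
INSERT INTO ifip.Authors VALUES ('A.N.Other', 'a.other@example.org', 100);
INSERT INTO ifip.Authors VALUES ('C.Jones', 'cj@example.org', 100);
""")

# Same join conditions as the MSQL query, with every column qualified.
QUERY = """
SELECT a.Name, a.Email, ifip_paper.Title, aiia_paper.Title
FROM ifip.Authors a, ifip.Publ_Papers ifip_paper,
     aiia.Contacts c, aiia.Publ_Papers aiia_paper
WHERE a.Name = c.Name
  AND c.PersonID = aiia_paper.Author_ID
  AND a.Paper_ID = ifip_paper.P_Ref
"""
rows = conn.execute(QUERY).fetchall()
```

The join on Name is an exact string comparison, so an author stored as 'AN Other' in one database and 'A.N.Other' in the other would silently drop out of the result.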

31

Potential Problems with MSQL
- Are domains on name comparable?
- Can use the LET command to create equivalencies of names, but this does not solve domain mismatch.
- What if one schema is not relational?
  - Entity-Relationship model often used as neutral schema for translation and comparison of heterogeneous features

32

Multidatabase Language - Disadvantages in General
- Distribution is not transparent
- Users must resolve inconsistencies themselves
- Common language may restrict scope of heterogeneity (relational bias)
- Local autonomous system may change schema freely (so that existing queries fail)

33

Comparison of Approaches
- By coupling: how tightly the interoperable system is connected to its underlying systems
- By adaptability: the ability of the interoperable system to evolve in line with the underlying schema
- By transparency: the need for the end-user to understand the underlying schema

34

Comparison of Approaches

    Approach                     Coupling   Adaptability   Transparency
    Global Schema Integration    Tight      Low            High
    Federated Databases          Medium     Medium         Medium
    Multidatabase Languages      Low        High           Low

35

Summary
Trend: from Global Schema Integration, through Federated Databases, to Multidatabase Languages, coupling becomes lower, adaptability higher and transparency lower.

36

Further Reading
Elmagarmid, Ahmed; Rusinkiewicz, Marek; Sheth, Amit.
Management of Heterogeneous and Autonomous Database Systems.
Morgan Kaufmann, 1999.
