Distributed Databases: An Overview

Unit-1

Contents
UNIT – I
Chapter 1.
1.0 What is a Distributed Database [DDB]
1.1 Features of Distributed versus Centralized Databases

Chapter 3. Levels of Distribution Transparency
3.1 Reference Architecture for Distributed Databases
3.2 Types of Data Fragmentation
3.6 Integrity Constraints in Distributed Databases

Book-1: Distributed Databases, by Stefano Ceri and Giuseppe Pelagatti, Tata McGraw-Hill edn, 2008. Sections 1.1; 3.1, 3.2, 3.6

1.1 Features of Distributed versus Centralized Databases

What is a Distributed Database [DDB]?
A simple definition:

A collection of data which belong to the same enterprise and are spread over the sites of a computer network.

The two important aspects of a DDB are:
• Distribution [of data]: in a centralized database, by contrast, the data is at a single site [host].
• Logical correlation: how exactly the data at different sites are related.

Illustration of DDB through example:

Different Scenarios of DB applications

• Personal computer: one DB application on one computer

• One or more application(s) on a single computer with multiple [dumb] terminals / users

• Multiple networked computers, each with its own local DB, local application and local users

• Multiple networked computers, each with its own local DB and local users, with a global application accessing data from these sites

• Multiple networked computers each with its own local DB and local users with multiple global applications, each accessing data from these multiple sites

Example 1
A bank with 3 branches at different locations. At each branch, a computer controls the teller terminals of the branch and the account database of the branch.

Each branch with its local database constitutes one site of the distributed database.

The computers are connected by a communication network.

Each site handles only local applications: operations requested from a terminal that access the DB of that branch.

Does the logical correlation property hold here?
Should this be considered an example of a DDB or a set of local DBs?

A global application, e.g. an application that transfers funds from one site to another, is what makes this a DDB.
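The difference between a local and a global application in Example 1 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: each branch site is simulated by an in-memory SQLite database, and the table name `account`, the account ids, and the helper names `make_branch`, `local_balance` and `transfer` are all hypothetical. A real DDBMS would coordinate the two sites with an atomic commit protocol rather than two separate commits.

```python
import sqlite3

def make_branch(accounts):
    """Create an in-memory database standing in for one branch site."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE account (id TEXT PRIMARY KEY, balance INTEGER)")
    db.executemany("INSERT INTO account VALUES (?, ?)", accounts)
    db.commit()
    return db

def local_balance(site, account_id):
    """Local application: touches only the database of one branch."""
    row = site.execute(
        "SELECT balance FROM account WHERE id = ?", (account_id,)
    ).fetchone()
    return row[0]

def transfer(src_site, src_id, dst_site, dst_id, amount):
    """Global application: updates data stored at two different sites.

    Both sites commit only if both updates succeed; otherwise both
    roll back. This is a simplification of a distributed commit.
    """
    try:
        cur = src_site.execute(
            "UPDATE account SET balance = balance - ? "
            "WHERE id = ? AND balance >= ?",
            (amount, src_id, amount),
        )
        if cur.rowcount != 1:
            raise ValueError("unknown source account or insufficient funds")
        cur = dst_site.execute(
            "UPDATE account SET balance = balance + ? WHERE id = ?",
            (amount, dst_id),
        )
        if cur.rowcount != 1:
            raise ValueError("unknown destination account")
        src_site.commit()
        dst_site.commit()
    except Exception:
        src_site.rollback()
        dst_site.rollback()
        raise

# Two branch sites, each with its own local database (illustrative data).
branch1 = make_branch([("A1", 500)])
branch2 = make_branch([("B1", 100)])
transfer(branch1, "A1", branch2, "B1", 200)
```

`local_balance` needs a connection to one site only, whereas `transfer` cannot run without access to both sites; that is the sense in which it is a global application.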

Example 2
Same as the previous Example 1, except that the computers and their respective DBs have now been moved from the branches to a common building and are connected by a high-bandwidth local network.

Tellers are connected to their respective computers by telephone lines.

Each processor and its DB constitute a site of the local computer network.
Should this be considered an example of a DDB or a set of local DBs?

Fig 1.2

Same as example 1 except for the geographical distribution of the computers

What are the major differences between the two from the view point of functioning and performance?

Example 3
• Here the data of the different branches are distributed on three “backend” computers, which perform the DBMS functions.

• The application programs are executed by a different computer [front-end], which requests database access services from the backends when necessary.

Fig 1.3 A multiprocessor system (computer center)

Should this be considered an example of a DDB or a set of local DBs?
No. Though the data is distributed, its distribution is not relevant from the application's point of view. What is missing here is the local application.

1.1 Features of Distributed versus Centralized Databases

From the examples we arrive at the following working definition of a Distributed Database [DDB].
A DDB is an integrated database which is built on top of a computer network rather than on a single computer. The data which constitute the database are stored at the different sites of the computer network, and the application programs which are run by the computers access data at different sites.

Taxonomy of DDS

Homogeneous Distributed Databases

In a homogeneous distributed database:
• All sites have identical software
• Sites are aware of each other and agree to cooperate in processing user requests
• Each site surrenders part of its autonomy in terms of the right to change schemas or software
• The system appears to the user as a single system

Architecture of Homogeneous DDBMS

Schema Architecture of a Homogeneous DDBMS

Heterogeneous Distributed Databases

In a heterogeneous distributed database:
• Different sites may use different schemas and software
– Difference in schema is a major problem for query processing
– Difference in software is a major problem for transaction processing
• Sites may not be aware of each other and may provide only limited facilities for cooperation in transaction processing

Overall Architecture of multidatabase Systems

1. Distributed Database System

• Tightly Coupled
• Loosely Coupled

Schema Architecture of Tightly-Coupled MDBS

• Advantages of Replication
– Availability: failure of a site containing relation r does not result in unavailability of r if replicas exist.
– Parallelism: queries on r may be processed by several nodes in parallel.
– Reduced data transfer: relation r is available locally at each site containing a replica of r.
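The availability and reduced-data-transfer points above can be illustrated with a small Python sketch. It is a toy model, not a real replication protocol: the `Site` class, the `SiteDown` exception and the `read_relation` helper are hypothetical names introduced only for this example, and sites are plain in-memory objects.

```python
class SiteDown(Exception):
    """Raised when a site cannot be reached."""

class Site:
    def __init__(self, name, relations, up=True):
        self.name = name
        self.relations = relations  # relation name -> list of tuples
        self.up = up

    def read(self, relation):
        if not self.up:
            raise SiteDown(self.name)
        return self.relations[relation]

def read_relation(relation, replica_sites, local_site=None):
    """Read a relation from the local replica if possible,
    otherwise from the first other replica site that is up."""
    # The local site (if any) is tried first; the rest keep their order.
    ordered = sorted(replica_sites, key=lambda s: s is not local_site)
    for site in ordered:
        try:
            return site.name, site.read(relation)
        except SiteDown:
            continue  # fall back to the next replica
    raise SiteDown(f"no replica of {relation!r} is available")
```

With replicas of r at two sites, a read still succeeds when one of them has failed; when the local site holds a replica and is up, the read is served locally and nothing crosses the network.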

– ri = Ri (r)

1. Distributed Database System

• Loosely Coupled
• A distributed database system consists of loosely coupled sites that share no physical component

• Database systems that run on each site are independent of each other

• Transactions may access data at one or more sites

Loosely Coupled MDBS with Export Schema

Loosely Coupled MDBS with No Export Schema

DBS Architectures

DBS-Architecture

Features of Centralized DBs vs DDBs

Centralized vs DDBs
Review:

What is a centralized DB? Traditional databases

What is a DDB?

Features that characterize a centralized DB:
• Centralized control
• Data independence
• Reduction of redundancy
• Complex physical structures and efficient access
• Integrity, recovery and concurrency control
• Privacy and security

Centralized vs DDBs: Centralized Control

CDB
• One-point control of the entire DB
• Single Database Administrator [DBA]

DDB
• Multi-point (source) control
• Global Database Administrator [GDBA] and Local Database Administrators [LDBA]
• “Site / Local Autonomy” decides the freedom of the local administrator

Centralized vs DDBs: Data Independence

What is data independence?
• The organization of data (physical storage of data in a DB) is transparent to the application developer

How is it achieved?
• Layered design / levels of abstraction
– Logical level [conceptual design: schemas, tuples, attributes]
– Physical level [how data is stored on the hard disc]

Benefit
• Application developers need not know how data is stored in the database

In CDB
• Allows the two layers to be designed independently
• How does this help? Each can be designed / changed independent of the other.

Centralized vs DDBs: Data Independence (contd.)

In DDB
• Also provides data independence, with an additional feature called Distribution Transparency
• Application programmers need not know
– how the data is stored, nor
– on which site it is stored.
• Thus we have here, in addition to the traditional
– Conceptual Schema
– Storage Schema, we have
– External Schema

Centralized vs DDBs: Redundancy Reduction

In CDB
• Redundancy (repetition of data) is reduced as much as possible for TWO reasons:
– To avoid inconsistencies
– To minimize the storage required
• It is one of the main concerns; normalization is used

In DDB
• Redundancy is allowed ...

Centralized vs DDBs: Redundancy Reduction (contd.)

In DDB
• Redundancy is allowed
• Reasons
– Faster access [local data can be accessed faster]
» Higher throughput
» Higher availability
» More fault tolerance
• Drawback: it makes design, development and data modification complex.

Centralized vs DDBs: Complex Physical Structure & Efficient Access

In CDB
• Uses indexing, hashing, interfile chains and so on
• Purpose: faster / efficient access

In DDB
• Complex structures alone cannot solve access problems
• Efficient access is still an issue
• Complex structures at the local level alone [local optimization] are not enough. Network delays dominate disc access delays.
• A global optimization is necessary; it includes the local optimization plan plus an additional “network access plan”
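The idea that a global plan combines local processing cost with a network access cost can be sketched as a toy cost model. Everything here is an assumption made for illustration: the `Plan` fields, the linear cost formula (latency plus bytes divided by bandwidth) and the numbers in the usage note are hypothetical and are not taken from any particular optimizer.

```python
from dataclasses import dataclass

@dataclass
class Plan:
    name: str
    local_cost: float    # estimated disc/CPU time at the sites, in seconds
    bytes_shipped: int   # amount of data moved between sites

def total_cost(plan, bandwidth_bytes_per_s, latency_s, messages=1):
    """Global cost = local processing cost + network access cost."""
    network_cost = messages * latency_s + plan.bytes_shipped / bandwidth_bytes_per_s
    return plan.local_cost + network_cost

def choose_plan(plans, bandwidth_bytes_per_s, latency_s):
    """Pick the plan with the lowest combined (global) cost."""
    return min(
        plans,
        key=lambda p: total_cost(p, bandwidth_bytes_per_s, latency_s),
    )
```

For instance, with an assumed 1,000,000 bytes/s link, a plan that is cheaper locally but ships 50,000,000 bytes loses to a plan that is more expensive locally but ships only 1,000,000 bytes. Optimizing each site in isolation would pick the wrong plan.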

Centralized vs DDBs: Integrity, Recovery & Concurrency Control

In CDB
• Integrity: requires enforcing ACID properties
• Integrity in a concurrent environment
• Concurrency control
– Various protocols: two-phase, time-stamp, tree, etc.
• Recovery
– Log-based approach, checkpointing, etc.

In DDB
• All these are enforced
• Distribution of data makes these protocols more complex.

Centralized vs DDBs: Privacy & Security

In CDB
• DBA ensures authorized access to data
• Also requires additional specialized control

In DDB
• Has similar problems, in addition to threats over the network
• Local autonomy helps the local DBA to enforce security
• Additional security measures are required for global / over-the-network threats.

Why DDBS?
• Organizational & economic reasons
• Interconnection of existing DBs
• Incremental growth
• Reduced communication overhead
• Performance considerations
• Reliability & availability

All these problems are not new. Why then has the development of DDBSs taken this long?
• First, it had to await the development of inexpensive, powerful small computers
• Second, for want of the necessary network, middleware & DB technologies

DDBMS
Distributed Database Management Systems

• They support the creation & maintenance of DDBSs
• They contain additional components which extend the capabilities of CDBMSs. The typical such software components are:
– The database management component (DB)
– The data communication component (DC): ODBC, JDBC, TCP/IP
– The data dictionary (DD), extended to include information about the distribution of data over the network: fragmentation schema & allocation schema
– The distributed database component (DDB)
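What the extended data dictionary has to record can be sketched as two small lookup tables. This is a minimal illustration, assuming fragments are defined by a predicate on rows; the relation name `account`, the fragment names, the site names and the `sites_for` helper are all hypothetical.

```python
# Fragmentation schema: global relation -> its fragments,
# each defined here by a predicate over a row (an assumption
# made for this sketch).
FRAGMENTATION = {
    "account": {
        "account_1": lambda row: row["branch"] == 1,
        "account_2": lambda row: row["branch"] == 2,
    }
}

# Allocation schema: fragment -> sites holding a copy of it.
# A fragment listed at more than one site is replicated.
ALLOCATION = {
    "account_1": ["site1"],
    "account_2": ["site2", "site1"],
}

def sites_for(relation, row):
    """Return the fragment that holds `row` and the sites storing it."""
    for fragment, predicate in FRAGMENTATION[relation].items():
        if predicate(row):
            return fragment, ALLOCATION[fragment]
    raise LookupError("row matches no fragment of " + relation)
```

The distributed database component would consult such a dictionary to decide which sites a request must be sent to, so that the application can refer to `account` without naming a site.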

Components of a commercial DDBMS

[Figure: Site 1 and Site 2, each with DB, DC, DD and DDB components and its own local database (Local database-1, Local database-2)]

Components of a commercial DDBMS
Services supported by the above systems
• Remote database access by an application: RPC, ODBC, JDBC, TCP/IP, named pipes
• Some degree of distribution transparency
• Support for database administration & control
• Some support for concurrency control

Assignment-1

1. List all the key words introduced in this chapter and write a brief definition/explanation for each of them.

2. Select any TWO commercial DBMSs of your choice and describe their salient features as DDBMSs.

DUE: next week, the same hour.
Questions:
1. What are the different types of DDBS? Explain them briefly.
2. What are the major differences between CDB & DDB? Explain.

Seminars Sai sandeepShekun Bee IndexingRamya KrishnaSwathi GSwathi CSameeraRajeswriSharon SamuelSri RamyaSravanthi

Seminars

Naga subramanyamGiridharSyed AbdullaBhaskar Aunusha-1Najma KanamAmruthaAnusha-2

JAI SAI RAM
