Chapter 13: Distributed Databases
Modern Database Management, 7th Edition
Jeffrey A. Hoffer, Mary B. Prescott, Fred R. McFadden
Chapter 13 2005 by Prentice Hall

Objectives
- Definition of terms
- Explain business conditions driving distributed databases
- Describe salient characteristics of distributed database environments
- Explain advantages and risks of distributed databases
- Explain strategies and options for distributed database design
- Discuss synchronous and asynchronous data replication and partitioning
- Discuss optimized query processing in distributed databases
- Explain salient features of several distributed database management systems

Definitions
- Distributed Database: A single logical database that is spread physically across computers in multiple locations that are connected by a data communications link
- Decentralized Database: A collection of independent databases on non-networked computers
- They are NOT the same thing!

Reasons for Distributed Database
- Business unit autonomy and distribution
- Data sharing
- Data communication costs and reliability
- Multiple application vendors
- Database recovery
- Transaction and analytic processing

Figure 13-1: Distributed database environments (adapted from Bell and Grimson, 1992)

Distributed Database Options
- Homogeneous - Same DBMS at each node
  - Autonomous - Independent DBMSs
  - Non-autonomous - Central, coordinating DBMS
  - Easy to manage, difficult to enforce
- Heterogeneous - Different DBMSs at different nodes
  - Systems - With full or partial DBMS functionality
  - Gateways - Simple paths are created to other databases without the benefits of one logical database
  - Difficult to manage, preferred by independent organizations

Distributed Database Options (cont.)
- Systems - Supports some or all functionality of one logical database
  - Full DBMS Functionality - All distributed DB functions
  - Partial-Multidatabase - Some distributed DB functions
    - Federated - Supports local databases for unique data requests
      - Loose Integration - Local databases have their own schemas
      - Tight Integration - Local databases use a common schema
    - Unfederated - Requires all access to go through a central, coordinating module

Homogeneous, Non-Autonomous Database
- Data is distributed across all the nodes
- Same DBMS at each node
- All data is managed by the distributed DBMS (no exclusively local data)
- All access is through one global schema
- The global schema is the union of all the local schemas

Figure 13-2: Homogeneous Database
Source: adapted from Bell and Grimson, 1992.

Typical Heterogeneous Environment
- Data distributed across all the nodes
- Different DBMSs may be used at each node
- Local access is done using the local DBMS and schema
- Remote access is done using the global schema

Figure 13-3: Typical Heterogeneous Environment
Source: adapted from Bell and Grimson, 1992.

Major Objectives
- Location Transparency
  - User does not have to know the location of the data
  - Data requests automatically forwarded to appropriate sites
- Local Autonomy
  - Local site can operate with its database when network connections fail
  - Each site controls its own data, security, logging, recovery

Significant Trade-Offs
- Synchronous Distributed Database
  - All copies of the same data are always identical
  - Data updates are immediately applied to all copies throughout the network
  - Good for data integrity
  - High overhead, so slow response times
- Asynchronous Distributed Database
  - Some data inconsistency is tolerated
  - Data update propagation is delayed
  - Lower data integrity
  - Less overhead, so faster response time
- NOTE: all this assumes replicated data (to be discussed later)
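
The trade-off above can be sketched in a few lines of Python. This is a toy model, not something from the chapter: the class name ReplicatedValue, its methods, and the site names are all hypothetical, and the network is replaced by an in-memory queue.

```python
from collections import deque


class ReplicatedValue:
    """Toy model of one data item copied to several sites (hypothetical names)."""

    def __init__(self, sites, initial=0):
        self.copies = {site: initial for site in sites}
        self.pending = deque()  # queued (site, value) updates not yet applied

    def write_sync(self, value):
        # Synchronous: every copy is updated before the write returns.
        for site in self.copies:
            self.copies[site] = value

    def write_async(self, origin, value):
        # Asynchronous: only the originating copy changes now;
        # the other copies are brought up to date later by propagate().
        self.copies[origin] = value
        for site in self.copies:
            if site != origin:
                self.pending.append((site, value))

    def propagate(self):
        # Apply all delayed updates in the order they were queued.
        while self.pending:
            site, value = self.pending.popleft()
            self.copies[site] = value

    def consistent(self):
        # True when every copy holds the same value.
        return len(set(self.copies.values())) == 1


item = ReplicatedValue(["A", "B", "C"])
item.write_sync(5)        # all copies identical immediately
item.write_async("A", 9)  # copies at B and C are stale until propagate()
stale = not item.consistent()
item.propagate()
```

The synchronous write touches every site before returning, which is where the overhead comes from; the asynchronous write returns after one local update and leaves a window in which the copies disagree.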

Advantages of Distributed Database over Centralized Databases
- Increased reliability/availability
- Local control over data
- Modular growth
- Lower communication costs
- Faster response for certain queries

Disadvantages of Distributed Database Compared to Centralized Databases
- Software cost and complexity
- Processing overhead
- Data integrity exposure
- Slower response for certain queries

Options for Distributing a Database
- Data replication
  - Copies of data distributed to different sites
- Horizontal partitioning
  - Different rows of a table distributed to different sites
- Vertical partitioning
  - Different columns of a table distributed to different sites
- Combinations of the above

Data Replication
- Advantages:
  - Reliability
  - Fast response
  - May avoid complicated distributed transaction integrity routines (if replicated data is refreshed at scheduled intervals)
  - Decouples nodes (transactions proceed even if some nodes are down)
  - Reduced network traffic at prime time (if updates can be delayed)

Data Replication (cont.)
- Disadvantages:
  - Additional requirements for storage space
  - Additional time for update operations
  - Complexity and cost of updating
  - Integrity exposure: users may get incorrect data if replicated data is not updated simultaneously
- Therefore, replication works better for non-volatile (read-only) data

Types of Data Replication
- Push Replication: the updating site sends changes to other sites
- Pull Replication: receiving sites control when update messages will be processed

Types of Push Replication
- Snapshot Replication
  - Changes periodically sent to master site
  - Master collects updates in log
  - Full or differential (incremental) snapshots
  - Dynamic vs. shared update ownership
- Near Real-Time Replication
  - Broadcast update orders without requiring confirmation
  - Done through use of triggers
  - Update messages stored in message queue until processed by receiving site
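
The difference between a full and a differential (incremental) snapshot can be illustrated with a minimal Python sketch. It assumes a table is a dict keyed by primary key, and the function names are hypothetical; a real DBMS would typically derive the differences from its log rather than by comparing whole tables.

```python
def full_snapshot(master):
    """Return a complete copy of the master table (dict keyed by primary key)."""
    return dict(master)


def differential_snapshot(master, previous):
    """Return only what changed since `previous`: new or changed rows, and deleted keys."""
    upserts = {k: v for k, v in master.items()
               if k not in previous or previous[k] != v}
    deletes = [k for k in previous if k not in master]
    return upserts, deletes


def apply_differential(replica, upserts, deletes):
    """Bring a replica up to date using a differential snapshot."""
    replica.update(upserts)
    for key in deletes:
        replica.pop(key, None)
    return replica


master_v1 = {1: "a", 2: "b", 3: "c"}
replica = full_snapshot(master_v1)            # initial load ships every row

master_v2 = {1: "a", 2: "B", 4: "d"}          # row 2 changed, 3 deleted, 4 added
upserts, deletes = differential_snapshot(master_v2, master_v1)
apply_differential(replica, upserts, deletes)  # ships only the changes
```

A full snapshot is simpler but moves the whole table on every refresh; the differential form moves only the changed rows, at the cost of tracking what the replica last received.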

Issues for Data Replication
- Data timeliness: high tolerance for out-of-date data may be required
- DBMS capabilities: if the DBMS cannot support multi-node queries, replication may be necessary
- Performance implications: refreshing may cause performance problems for busy nodes
- Network heterogeneity: complicates replication
- Network communication capabilities: complete refreshes place heavy demand on telecommunications

Horizontal Partitioning
- Different rows of a table at different sites
- Advantages
  - Data stored close to where it is used: efficiency
  - Local access optimization: better performance
  - Only relevant data is available: security
  - Unions across partitions: ease of query
- Disadvantages
  - Accessing data across partitions: inconsistent access speed
  - No data replication: backup vulnerability

Vertical Partitioning
- Different columns of a table at different sites
- Advantages and disadvantages are the same as for horizontal partitioning, except that combining data across partitions is more difficult because it requires joins (instead of unions)
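
To make the union-versus-join point concrete, here is a small Python sketch of both kinds of partitioning. The sample rows, column names, and helper functions are invented for illustration, and the sketch assumes each vertical fragment repeats the primary key so the fragments can be joined back together.

```python
def horizontal_partition(rows, key):
    """Group whole rows by the value of `key` (for example, one group per site)."""
    parts = {}
    for row in rows:
        parts.setdefault(row[key], []).append(row)
    return parts


def vertical_partition(rows, pk, column_groups):
    """Split columns into fragments; every fragment keeps the primary key `pk`."""
    return [[{col: row[col] for col in (pk, *cols)} for row in rows]
            for cols in column_groups]


def union(parts):
    """Recombine horizontal partitions by concatenating their rows."""
    return [row for part in parts for row in part]


def join(fragments, pk):
    """Recombine vertical fragments by matching rows on the primary key."""
    merged = {}
    for fragment in fragments:
        for row in fragment:
            merged.setdefault(row[pk], {}).update(row)
    return list(merged.values())


customers = [
    {"id": 1, "name": "Ann", "region": "East", "balance": 100},
    {"id": 2, "name": "Bob", "region": "West", "balance": 250},
    {"id": 3, "name": "Cy", "region": "East", "balance": 75},
]

by_region = horizontal_partition(customers, "region")
fragments = vertical_partition(customers, "id", [("name", "region"), ("balance",)])

rebuilt_h = sorted(union(by_region.values()), key=lambda r: r["id"])
rebuilt_v = join(fragments, "id")
```

Rebuilding from horizontal partitions is a plain concatenation, while rebuilding from vertical fragments needs a key match per row, which is the extra work the slide refers to.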

Figure 13-6: Distributed processing system for a manufacturing company

Five Distributed Database Organizations
- Centralized database, distributed access
- Replication with periodic snapshot update
- Replication with near real-time synchronization of updates
- Partitioned, one logical database
- Partitioned, independent, nonintegrated segments

Factors in Choice of Distributed Strategy
- Funding, autonomy, security
- Site data referencing patterns
- Growth and expansion needs
- Technological capabilities
- Costs of managing complex technologies
- Need for reliable service
Table 13-1: Distributed Design Strategies

Distributed DBMS
- Distributed database requires distributed DBMS
- Functions of a distributed DBMS:
  - Locate data with a distributed data dictionary
  - Determine location from which to retrieve data and process query components
  - DBMS translation between nodes with different local DBMSs (using middleware)
  - Data consistency (via multiphase commit protocols)
  - Global primary key control
  - Scalability
  - Security, concurrency, query optimization, failure recovery

Figure 13-10: Distributed DBMS architecture

Local Transaction Steps
1. Application makes request to distributed DBMS
2. Distributed DBMS checks distributed data repository for location of data. Finds that it is local
3. Distributed DBMS sends request to local DBMS
4. Local DBMS processes request
5. Local DBMS sends results to application

Figure 13-10: Distributed DBMS architecture (cont.) (showing local transaction steps)
Local transaction: all data stored locally

Global Transaction Steps
1. Application makes request to distributed DBMS
2. Distributed DBMS checks distributed data repository for location of data. Finds that it is remote
3. Distributed DBMS routes request to remote site
4. Distributed DBMS at remote site translates request for its local DBMS if necessary, and sends request to local DBMS
5. Local DBMS at remote site processes request
6. Local DBMS at remote site sends results to distributed DBMS at remote site
7. Remote distributed DBMS sends results back to originating site
8. Distributed DBMS at originating site sends results to application
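
The local and global transaction steps can be traced with a minimal Python sketch. Everything here is a hypothetical stand-in: the Site class, the dict used as the distributed data repository, and the dict used in place of the communications link. The translation step for heterogeneous local DBMSs is only marked with a comment.

```python
class Site:
    """One node: a distributed DBMS layer in front of a local DBMS (hypothetical model)."""

    def __init__(self, name, tables, repository, network):
        self.name = name
        self.tables = tables          # data held by this site's local DBMS
        self.repository = repository  # distributed data repository: table -> site name
        self.network = network        # site name -> Site; stands in for the comms link
        network[name] = self

    def local_dbms(self, table):
        # The local DBMS processes the request against its own data.
        return list(self.tables[table])

    def request(self, table):
        """Entry point for an application at this site; returns (rows, path of sites)."""
        owner = self.repository[table]  # check the repository for the data's location
        if owner == self.name:          # local transaction: data is stored here
            return self.local_dbms(table), [self.name]
        rows = self.network[owner].serve_remote(table)  # global: route to remote site
        return rows, [self.name, owner, self.name]      # results return via origin

    def serve_remote(self, table):
        # A real system might translate the request for a different local DBMS here.
        return self.local_dbms(table)


repository = {"Customer": "NY", "Part": "LA"}
network = {}
ny = Site("NY", {"Customer": ["c1", "c2"]}, repository, network)
la = Site("LA", {"Part": ["p1"]}, repository, network)

local_rows, local_path = ny.request("Customer")  # stays at NY
remote_rows, remote_path = ny.request("Part")    # NY -> LA -> NY
```

The application calls the same request method in both cases and never names a site, which is the location transparency the chapter lists as a major objective.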

Figure 13-10: Distributed DBMS architecture (cont.) (showing global transaction steps)
Global transaction: some data is at remote site(s)

Distributed DBMS Transparency Objectives
- Location Transparency
  - User/application does not need to know where data resides
- Replication Transparency
  - User/application does not need to know about duplication
- Failure Transparency
  - Either all or none of the actions of a transaction are committed
  - Each site has a transaction manager
    - Logs transactions and before and after images
    - Concurrency control scheme to ensure data integrity
  - Requires special commit protocol

Two-Phase Commit
- Prepare Phase
  - Coordinator receives a commit request
  - Coordinator instructs all resource managers to get ready to go either way on the transaction. Each resource manager writes all updates from that transaction to its own physical log
  - Coordinator receives replies from all resource managers. If all are OK, it writes commit to its own log; if not, it writes rollback to its log

Two-Phase Commit (cont.)
- Commit Phase
  - Coordinator then informs each resource manager of its decision and broadcasts a message to either commit or rollback (abort). If the message is commit, then each resource manager transfers the update from its log to its database
  - A failure during the commit phase puts a transaction in limbo. This has to be tested for and handled with timeouts or polling
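
The two phases can be sketched in Python as follows. This is a simplified, single-process illustration with hypothetical class and method names; it does not model network failures, timeouts, or the limbo state mentioned above, and the `can_commit` flag is an invented way to simulate a resource manager that cannot prepare.

```python
class ResourceManager:
    """A participant with its own log and database (hypothetical, in-memory)."""

    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit  # assumption: simulates a site unable to prepare
        self.log = []                 # (txn_id, updates) entries written during prepare
        self.database = {}

    def prepare(self, txn_id, updates):
        # Prepare phase: write the updates to the log and vote yes, or vote no.
        if not self.can_commit:
            return False
        self.log.append((txn_id, dict(updates)))
        return True

    def commit(self, txn_id):
        # Commit phase: transfer the logged updates to the database.
        for logged_id, updates in self.log:
            if logged_id == txn_id:
                self.database.update(updates)

    def rollback(self, txn_id):
        # Discard the logged updates for this transaction.
        self.log = [entry for entry in self.log if entry[0] != txn_id]


class Coordinator:
    def __init__(self, managers):
        self.managers = managers
        self.log = []

    def run(self, txn_id, updates_by_manager):
        # Phase 1: ask every resource manager to get ready to go either way.
        votes = [m.prepare(txn_id, updates_by_manager.get(m.name, {}))
                 for m in self.managers]
        decision = "commit" if all(votes) else "rollback"
        self.log.append((txn_id, decision))  # coordinator logs its decision
        # Phase 2: broadcast the decision to every resource manager.
        for m in self.managers:
            if decision == "commit":
                m.commit(txn_id)
            else:
                m.rollback(txn_id)
        return decision


a, b = ResourceManager("A"), ResourceManager("B")
ok = Coordinator([a, b]).run("t1", {"A": {"x": 1}, "B": {"y": 2}})

c, d = ResourceManager("C"), ResourceManager("D", can_commit=False)
failed = Coordinator([c, d]).run("t2", {"C": {"x": 1}, "D": {"y": 2}})
```

In the second run, one "no" vote is enough to roll back at every site, so no database is changed; that all-or-nothing outcome is what failure transparency requires.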

Concurrency Control
- Concurrency Transparency
  - Design goal for distributed database
- Timestamping
  - Concurrency control mechanism
  - Alternative to locks in distributed databases
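
The slide does not say which timestamping scheme is meant. As one common formulation, the sketch below shows basic timestamp ordering in Python: each data item remembers the largest timestamps that have read and written it, and an operation from an older transaction that arrives too late is rejected instead of being made to wait on a lock. The class name and the integer timestamps are assumptions for illustration.

```python
class TimestampedItem:
    """A data item guarded by basic timestamp ordering (illustrative sketch)."""

    def __init__(self, value=None):
        self.value = value
        self.read_ts = 0   # largest timestamp of any transaction that read the item
        self.write_ts = 0  # largest timestamp of any transaction that wrote the item

    def read(self, ts):
        # Reject a read that would see a value written by a younger transaction.
        if ts < self.write_ts:
            return False, None
        self.read_ts = max(self.read_ts, ts)
        return True, self.value

    def write(self, ts, value):
        # Reject a write that a younger transaction has already read past or overwritten.
        if ts < self.read_ts or ts < self.write_ts:
            return False
        self.value = value
        self.write_ts = ts
        return True


item = TimestampedItem(10)
first_read = item.read(2)        # transaction 2 reads the initial value
late_write = item.write(1, 99)   # transaction 1 is older than the reader: rejected
good_write = item.write(3, 20)   # transaction 3 is newer: accepted
late_read = item.read(2)         # transaction 2 is now older than the writer: rejected
```

A rejected transaction would normally be restarted with a new timestamp. No site ever holds a lock, but the scheme depends on timestamps being unique and comparable across sites.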

Query Optimization
- In a query involving a multi-site join and, possibly, a distributed database with replicated files, the distributed DBMS must decide where to access the data and how to proceed with the join.
- Three-step process:
  - Query decomposition - rewritten and simplified
  - Data localization - query fragmented so that fragments reference data at only one site
  - Global optimization
    - Order in which to execute query fragments
    - Data movement between sites
    - Where parts of the query will be executed
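
As a rough illustration of the "data movement between sites" and "where parts of the query will be executed" decisions, the following Python sketch compares candidate execution sites for a two-table join by the number of bytes each choice would ship. The cost model (whole tables are shipped, only transfer volume counts) and all table names and sizes are assumptions made for the example; real optimizers weigh many more factors.

```python
def transfer_cost(execution_site, table_sites, table_bytes, result_bytes, result_site):
    """Bytes moved if the whole join runs at `execution_site` (simplified model)."""
    # Every table stored elsewhere must be shipped to the execution site.
    shipped = sum(table_bytes[t] for t, site in table_sites.items()
                  if site != execution_site)
    # The result must also be shipped if it is needed at a different site.
    if execution_site != result_site:
        shipped += result_bytes
    return shipped


def cheapest_site(table_sites, table_bytes, result_bytes, result_site):
    """Return the candidate site with the lowest transfer cost, plus all costs."""
    candidates = set(table_sites.values()) | {result_site}
    costs = {site: transfer_cost(site, table_sites, table_bytes,
                                 result_bytes, result_site)
             for site in candidates}
    best = min(sorted(costs), key=costs.get)  # sorted() makes ties deterministic
    return best, costs


table_sites = {"Customer": "A", "Order": "B"}       # where each table is stored
table_bytes = {"Customer": 1_000, "Order": 50_000}  # hypothetical table sizes
best, costs = cheapest_site(table_sites, table_bytes,
                            result_bytes=5_000, result_site="C")
```

With these numbers, running the join at site B ships only the small Customer table plus the result, which is far less than moving the large Order table to either of the other sites.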

Evolution of Distributed DBMS
- Unit of Work - All of a transaction's steps
- Remote Unit of Work
  - SQL statements originated at one location can be executed as a single unit of work on a single remote DBMS

Evolution of Distributed DBMS (cont.)
- Distributed Unit of Work
  - Different statements in a unit of work may refer to different remote sites
  - All databases in a single SQL statement must be at a single site
- Distributed Request
  - A single SQL statement may refer to tables in more than one remote site
  - May not support replication transparency or failure transparency