Chapter 13: Distributed Databases
Modern Database Management, 7th Edition
Jeffrey A. Hoffer, Mary B. Prescott, Fred R. McFadden
© 2005 by Prentice Hall

Objectives
- Definition of terms
- Explain business conditions driving distributed databases
- Describe salient characteristics of distributed database environments
- Explain advantages and risks of distributed databases
- Explain strategies and options for distributed database design
- Discuss synchronous and asynchronous data replication and partitioning
- Discuss optimized query processing in distributed databases
- Explain salient features of several distributed database management systems

Definitions
- Distributed database: a single logical database that is spread physically across computers in multiple locations that are connected by a data communications link
- Decentralized database: a collection of independent databases on non-networked computers
- They are NOT the same thing!

Reasons for Distributed Database
- Business unit autonomy and distribution
- Data sharing
- Data communication costs
- Data communication reliability and costs
- Multiple application vendors
- Database recovery
- Transaction and analytic processing

Figure 13-1: Distributed database environments (adapted from Bell and Grimson, 1992)

Distributed Database Options
- Homogeneous: same DBMS at each node
  - Autonomous: independent DBMSs
  - Non-autonomous: central, coordinating DBMS
  - Easy to manage, difficult to enforce
- Heterogeneous: different DBMSs at different nodes
  - Systems: with full or partial DBMS functionality
  - Gateways: simple paths are created to other databases without the benefits of one logical database
  - Difficult to manage, preferred by independent organizations

Distributed Database Options (cont.)
- Systems: support some or all functionality of one logical database
  - Full DBMS functionality: all distributed DB functions
  - Partial (multidatabase): some distributed DB functions
    - Federated: supports local databases for unique data requests
      - Loose integration: local DBs have their own schemas
      - Tight integration: local DBs use a common schema
    - Unfederated: requires all access to go through a central, coordinating module

Homogeneous, Non-Autonomous Database
- Data is distributed across all the nodes
- Same DBMS at each node
- All data is managed by the distributed DBMS (no exclusively local data)
- All access is through one global schema
- The global schema is the union of all the local schemas

Figure 13-2: Homogeneous Database
Source: adapted from Bell and Grimson, 1992.

Typical Heterogeneous Environment
- Data distributed across all the nodes
- Different DBMSs may be used at each node
- Local access is done using the local DBMS and schema
- Remote access is done using the global schema

Figure 13-3: Typical Heterogeneous Environment
Source: adapted from Bell and Grimson, 1992.

Major Objectives
- Location transparency
  - User does not have to know the location of the data
  - Data requests are automatically forwarded to the appropriate sites
- Local autonomy
  - Local site can operate with its database when network connections fail
  - Each site controls its own data, security, logging, and recovery

Significant Trade-Offs
- Synchronous distributed database
  - All copies of the same data are always identical
  - Data updates are immediately applied to all copies throughout the network
  - Good for data integrity
  - High overhead, slow response times
- Asynchronous distributed database
  - Some data inconsistency is tolerated
  - Data update propagation is delayed
  - Lower data integrity
  - Less overhead, faster response times
- NOTE: all this assumes replicated data (to be discussed later)
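
The trade-off can be illustrated with a minimal Python sketch. It assumes each replica is just an in-memory dictionary and that a hypothetical propagate() call stands in for whatever delayed mechanism a real DBMS would use; it does not describe any particular product. The synchronous version writes every copy before returning, while the asynchronous version updates one copy immediately and queues the change for the others.

```python
from collections import deque


class Replica:
    """One site's copy of the replicated data (modelled as a dict)."""

    def __init__(self, name):
        self.name = name
        self.data = {}


class SynchronousReplication:
    """Apply each update to every copy before returning."""

    def __init__(self, replicas):
        self.replicas = replicas

    def update(self, key, value):
        for replica in self.replicas:
            replica.data[key] = value


class AsynchronousReplication:
    """Apply each update to the primary copy now; queue it for the others."""

    def __init__(self, primary, others):
        self.primary = primary
        self.others = others
        self.pending = deque()  # updates not yet propagated

    def update(self, key, value):
        self.primary.data[key] = value
        self.pending.append((key, value))

    def propagate(self):
        """Hypothetical delayed step: push queued updates, oldest first."""
        while self.pending:
            key, value = self.pending.popleft()
            for replica in self.others:
                replica.data[key] = value


if __name__ == "__main__":
    a, b = Replica("A"), Replica("B")
    SynchronousReplication([a, b]).update("part-100", 25)
    print(a.data == b.data)          # True: copies identical at once

    c, d = Replica("C"), Replica("D")
    rep = AsynchronousReplication(c, [d])
    rep.update("part-100", 25)
    print(d.data.get("part-100"))    # None: D has not seen the update
    rep.propagate()
    print(d.data.get("part-100"))    # 25
```

Between update() and propagate() in the asynchronous case, site D holds stale data; that window is the tolerated inconsistency listed on the slide.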

Advantages of Distributed Database over Centralized Databases
- Increased reliability/availability
- Local control over data
- Modular growth
- Lower communication costs
- Faster response for certain queries

Disadvantages of Distributed Database Compared to Centralized Databases
- Software cost and complexity
- Processing overhead
- Data integrity exposure
- Slower response for certain queries

Options for Distributing a Database
- Data replication: copies of data distributed to different sites
- Horizontal partitioning: different rows of a table distributed to different sites
- Vertical partitioning: different columns of a table distributed to different sites
- Combinations of the above

Data Replication
Advantages:
- Reliability
- Fast response
- May avoid complicated distributed transaction integrity routines (if replicated data is refreshed at scheduled intervals)
- Decouples nodes (transactions proceed even if some nodes are down)
- Reduced network traffic at prime time (if updates can be delayed)

Data Replication (cont.)
Disadvantages:
- Additional requirements for storage space
- Additional time for update operations
- Complexity and cost of updating
- Integrity exposure of getting incorrect data if replicated data is not updated simultaneously
Therefore, replication is better suited to non-volatile (read-only) data.

Types of Data Replication
- Push replication: the updating site sends changes to other sites
- Pull replication: receiving sites control when update messages will be processed

Types of Push Replication
- Snapshot replication
  - Changes periodically sent to master site
  - Master collects updates in log
  - Full or differential (incremental) snapshots
  - Dynamic vs. shared update ownership
- Near real-time replication
  - Broadcast update orders without requiring confirmation
  - Done through use of triggers
  - Update messages stored in message queue until processed by receiving site
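
Full and differential snapshots, and the trigger-filled log that near real-time replication relies on, can be sketched with Python's built-in sqlite3 module. This is a minimal sketch under several assumptions: two in-memory SQLite databases stand in for the site where updates are logged and a receiving site, the table part and the log table change_log are hypothetical names, and part_id is assumed never to be updated. Triggers record every change in the log. A full snapshot copies the whole table, while a differential snapshot applies only log entries newer than the last one applied. Draining the same log frequently approximates the message queue used for near real-time replication.

```python
import sqlite3


def make_sites():
    """Create hypothetical source (logging) and target (receiving) sites."""
    source = sqlite3.connect(":memory:")
    target = sqlite3.connect(":memory:")
    source.executescript("""
        CREATE TABLE part (part_id INTEGER PRIMARY KEY, qty INTEGER);
        -- Log of changes, filled by triggers; doubles as a message queue.
        CREATE TABLE change_log (
            seq INTEGER PRIMARY KEY AUTOINCREMENT,
            op TEXT, part_id INTEGER, qty INTEGER);
        CREATE TRIGGER part_ins AFTER INSERT ON part BEGIN
            INSERT INTO change_log (op, part_id, qty)
            VALUES ('UPSERT', NEW.part_id, NEW.qty);
        END;
        -- Assumes part_id itself is never updated.
        CREATE TRIGGER part_upd AFTER UPDATE ON part BEGIN
            INSERT INTO change_log (op, part_id, qty)
            VALUES ('UPSERT', NEW.part_id, NEW.qty);
        END;
        CREATE TRIGGER part_del AFTER DELETE ON part BEGIN
            INSERT INTO change_log (op, part_id, qty)
            VALUES ('DELETE', OLD.part_id, NULL);
        END;
    """)
    target.execute("CREATE TABLE part (part_id INTEGER PRIMARY KEY, qty INTEGER)")
    return source, target


def full_snapshot(source, target):
    """Full snapshot: replace the target copy with every source row."""
    rows = source.execute("SELECT part_id, qty FROM part").fetchall()
    with target:  # commits on success
        target.execute("DELETE FROM part")
        target.executemany("INSERT INTO part VALUES (?, ?)", rows)


def differential_snapshot(source, target, last_seq):
    """Differential snapshot: apply log entries after last_seq; return new mark."""
    changes = source.execute(
        "SELECT seq, op, part_id, qty FROM change_log WHERE seq > ? ORDER BY seq",
        (last_seq,),
    ).fetchall()
    with target:
        for seq, op, part_id, qty in changes:
            if op == "DELETE":
                target.execute("DELETE FROM part WHERE part_id = ?", (part_id,))
            else:
                target.execute(
                    "INSERT OR REPLACE INTO part (part_id, qty) VALUES (?, ?)",
                    (part_id, qty),
                )
            last_seq = seq
    return last_seq


if __name__ == "__main__":
    source, target = make_sites()
    source.executemany("INSERT INTO part VALUES (?, ?)", [(1, 10), (2, 20)])
    full_snapshot(source, target)
    mark = source.execute("SELECT MAX(seq) FROM change_log").fetchone()[0]
    source.execute("UPDATE part SET qty = 15 WHERE part_id = 1")
    source.execute("DELETE FROM part WHERE part_id = 2")
    mark = differential_snapshot(source, target, mark)
    print(target.execute("SELECT * FROM part ORDER BY part_id").fetchall())  # [(1, 15)]
```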

Issues for Data Replication
- Data timeliness: high tolerance for out-of-date data may be required
- DBMS capabilities: if the DBMS cannot support multi-node queries, replication may be necessary
- Performance implications: refreshing may cause performance problems for busy nodes
- Network heterogeneity: complicates replication
- Network communication capabilities: complete refreshes place heavy demand on telecommunications

Horizontal Partitioning
- Different rows of a table at different sites
- Advantages
  - Data stored close to where it is used: efficiency
  - Local access optimization: better performance
  - Only relevant data is available: security
  - Unions across partitions: ease of query
- Disadvantages
  - Accessing data across partitions: inconsistent access speed
  - No data replication: backup vulnerability
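
Horizontal partitioning, and the union that reassembles the logical table, can be sketched with sqlite3. The sketch assumes an attached second in-memory database can stand in for a second site (a real distributed DBMS would place it on another server); the customer table and the rule that keeps "East" rows at site A are hypothetical.

```python
import sqlite3


def build_partitioned_sites():
    """Site A is the main in-memory database; an attached one plays site B."""
    conn = sqlite3.connect(":memory:")
    conn.execute("ATTACH DATABASE ':memory:' AS site_b")
    for schema in ("main", "site_b"):
        conn.execute(
            f"CREATE TABLE {schema}.customer "
            "(customer_id INTEGER PRIMARY KEY, name TEXT, region TEXT)"
        )
    return conn


def insert_customer(conn, customer_id, name, region):
    """Hypothetical rule: East rows live at site A, all others at site B."""
    schema = "main" if region == "East" else "site_b"
    conn.execute(
        f"INSERT INTO {schema}.customer VALUES (?, ?, ?)",
        (customer_id, name, region),
    )


def all_customers(conn):
    """Reassemble the logical table; UNION ALL suffices as partitions are disjoint."""
    return conn.execute(
        "SELECT customer_id, name, region FROM main.customer "
        "UNION ALL "
        "SELECT customer_id, name, region FROM site_b.customer "
        "ORDER BY customer_id"
    ).fetchall()


if __name__ == "__main__":
    conn = build_partitioned_sites()
    insert_customer(conn, 1, "Acme", "East")
    insert_customer(conn, 2, "Brill", "West")
    insert_customer(conn, 3, "Cole", "East")
    print(conn.execute("SELECT COUNT(*) FROM site_b.customer").fetchone()[0])  # 1
    print(all_customers(conn))
    # [(1, 'Acme', 'East'), (2, 'Brill', 'West'), (3, 'Cole', 'East')]
```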

Vertical Partitioning
- Different columns of a table at different sites
- Advantages and disadvantages are the same as for horizontal partitioning, except that combining data across partitions is more difficult because it requires joins (instead of unions)
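
For contrast, here is a vertical split under the same assumptions (an attached in-memory SQLite database plays site B; the part_desc and part_cost tables are hypothetical). Each fragment repeats the primary key, because without it the columns could not be rejoined, and reassembly needs a join rather than a union.

```python
import sqlite3


def build_vertical_sites():
    """Site A is the main in-memory database; an attached one plays site B."""
    conn = sqlite3.connect(":memory:")
    conn.execute("ATTACH DATABASE ':memory:' AS site_b")
    # Site A holds the descriptive column, site B the cost column;
    # both fragments repeat the primary key so rows can be rejoined.
    conn.execute("CREATE TABLE main.part_desc "
                 "(part_id INTEGER PRIMARY KEY, description TEXT)")
    conn.execute("CREATE TABLE site_b.part_cost "
                 "(part_id INTEGER PRIMARY KEY, unit_cost REAL)")
    return conn


def insert_part(conn, part_id, description, unit_cost):
    """Split one logical row into its two column fragments."""
    with conn:
        conn.execute("INSERT INTO main.part_desc VALUES (?, ?)",
                     (part_id, description))
        conn.execute("INSERT INTO site_b.part_cost VALUES (?, ?)",
                     (part_id, unit_cost))


def all_parts(conn):
    """Reassemble the logical table with a join on the shared key."""
    return conn.execute(
        "SELECT d.part_id, d.description, c.unit_cost "
        "FROM main.part_desc AS d "
        "JOIN site_b.part_cost AS c ON c.part_id = d.part_id "
        "ORDER BY d.part_id"
    ).fetchall()


if __name__ == "__main__":
    conn = build_vertical_sites()
    insert_part(conn, 1, "bolt", 0.25)
    insert_part(conn, 2, "nut", 0.10)
    print(all_parts(conn))  # [(1, 'bolt', 0.25), (2, 'nut', 0.1)]
```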

Figure 13-6: Distributed processing system for a manufacturing company

Five Distributed Database Organizations
1. Centralized database, distributed access
2. Replication with periodic snapshot update
3. Replication with near real-time synchronization of updates
4. Partitioned, one logical database
5. Partitioned, independent, nonintegrated segments

Factors in Choice of Distributed Strategy
- Funding, autonomy, security
- Site data referencing patterns
- Growth and expansion needs
- Technological capabilities
- Costs of managing complex technologies
- Need for reliable service

Table 13-1: Distributed Design Strategies

Distributed DBMS
- A distributed database requires a distributed DBMS
- Functions of a distributed DBMS:
  - Locate data with a distributed data dictionary
  - Determine location from which to retrieve data and process query components
  - DBMS translation between nodes with different local DBMSs (using middleware)
  - Data consistency (via multiphase commit protocols)
  - Global primary key control
  - Scalability
  - Security, concurrency, query optimization, failure recovery

Figure 13-10: Distributed DBMS architecture

Local Transaction Steps
1. Application makes request to distributed DBMS
2. Distributed DBMS checks distributed data repository for location of data; finds that it is local
3. Distributed DBMS sends request to local DBMS
4. Local DBMS processes request
5. Local DBMS sends results to application

Figure 13-10: Distributed DBMS architecture (cont.), showing local transaction steps
Local transaction: all data stored locally

Global Transaction Steps
1. Application makes request to distributed DBMS
2. Distributed DBMS checks distributed data repository for location of data; finds that it is remote
3. Distributed DBMS routes request to remote site
4. Distributed DBMS at remote site translates request for its local DBMS if necessary, and sends request to local DBMS
5. Local DBMS at remote site processes request
6. Local DBMS at remote site sends results to distributed DBMS at remote site
7. Remote distributed DBMS sends results back to originating site
8. Distributed DBMS at originating site sends results to application
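
The local and global transaction steps differ only in what the data dictionary lookup returns. A minimal sketch, assuming each site's local DBMS can be modelled as a dictionary of table rows and omitting the translation between different local DBMSs; the Site and DistributedDBMS classes and their method names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Site:
    """A site whose local DBMS is modelled as table name -> list of rows."""
    name: str
    tables: dict = field(default_factory=dict)

    def local_dbms_process(self, table):
        # Stand-in for the local DBMS executing the request.
        return list(self.tables[table])


@dataclass
class DistributedDBMS:
    """Routes requests via a distributed data dictionary (table -> site name)."""
    home: Site
    sites: dict       # site name -> Site
    dictionary: dict  # table name -> name of the site holding it

    def request(self, table):
        location = self.dictionary[table]   # check the data repository
        if location == self.home.name:
            # Local transaction: the local DBMS answers directly.
            return "local", self.home.local_dbms_process(table)
        # Global transaction: route to the remote site, whose local DBMS
        # processes the request; results come back to the originating site.
        # (Translation between different DBMSs is omitted in this sketch.)
        remote = self.sites[location]
        return "global", remote.local_dbms_process(table)


if __name__ == "__main__":
    a = Site("A", {"customer": [(1, "Acme")]})
    b = Site("B", {"part": [(100, "bolt")]})
    ddbms = DistributedDBMS(a, {"A": a, "B": b}, {"customer": "A", "part": "B"})
    print(ddbms.request("customer"))  # ('local', [(1, 'Acme')])
    print(ddbms.request("part"))      # ('global', [(100, 'bolt')])
```

The application passes only a table name, never a site, which is the location transparency listed among the major objectives.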

Figure 13-10: Distributed DBMS architecture (cont.), showing global transaction steps
Global transaction: some data is at remote site(s)

Distributed DBMS Transparency Objectives
- Location transparency
  - User/application does not need to know where data resides
- Replication transparency
  - User/application does not need to know about duplication
- Failure transparency
  - Either all or none of the actions of a transaction are committed
  - Each site has a transaction manager
    - Logs transactions and before and after images
    - Concurrency control scheme to ensure data integrity
  - Requires special commit protocol

Two-Phase Commit
Prepare phase
- Coordinator receives a commit request
- Coordinator instructs all resource managers to get ready to "go either way" on the transaction; each resource manager writes all updates from that transaction to its own physical log
- Coordinator receives replies from all resource managers; if all are OK, it writes commit to its own log; if not, it writes rollback to its log

Two-Phase Commit (cont.)
Commit phase
- Coordinator then informs each resource manager of its decision and broadcasts a message to either commit or roll back (abort); if the message is commit, each resource manager transfers the update from its log to its database
- A failure during the commit phase puts a transaction "in limbo"; this has to be tested for and handled with timeouts or polling
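
The two phases can be traced in a small simulation. This sketch assumes reliable, in-process messages; the ResourceManager and Coordinator classes and the fail_prepare flag are hypothetical, and the in-limbo case (a failure during the commit phase, handled with timeouts or polling) is not modelled.

```python
from enum import Enum


class Vote(Enum):
    READY = "ready"
    ABORT = "abort"


class ResourceManager:
    """A participant site; fail_prepare simulates a site that cannot get ready."""

    def __init__(self, name, fail_prepare=False):
        self.name = name
        self.fail_prepare = fail_prepare
        self.log = []        # this site's own physical log
        self.database = {}   # committed data
        self.pending = {}    # updates held until the decision arrives

    def prepare(self, updates):
        # Phase 1: write the transaction's updates to the log, then vote.
        if self.fail_prepare:
            return Vote.ABORT
        self.pending = dict(updates)
        self.log.append(("prepared", dict(updates)))
        return Vote.READY

    def commit(self):
        # Phase 2: transfer the logged updates into the database.
        self.database.update(self.pending)
        self.log.append(("commit",))
        self.pending = {}

    def rollback(self):
        # Phase 2: discard the logged updates.
        self.log.append(("rollback",))
        self.pending = {}


class Coordinator:
    def __init__(self, participants):
        self.participants = participants
        self.log = []  # the coordinator's own log

    def run(self, updates_by_site):
        # Prepare phase: collect a vote from every resource manager.
        votes = [rm.prepare(updates_by_site.get(rm.name, {}))
                 for rm in self.participants]
        decision = "commit" if all(v is Vote.READY for v in votes) else "rollback"
        self.log.append(decision)  # decision logged before it is broadcast
        # Commit phase: broadcast the decision to every resource manager.
        for rm in self.participants:
            if decision == "commit":
                rm.commit()
            else:
                rm.rollback()
        return decision


if __name__ == "__main__":
    a, b = ResourceManager("A"), ResourceManager("B", fail_prepare=True)
    print(Coordinator([a, b]).run({"A": {"x": 1}, "B": {"y": 2}}))  # rollback
    print(a.database)  # {}: A's prepared update was discarded
```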

Concurrency Control
- Concurrency transparency
  - Design goal for distributed database
- Timestamping
  - Concurrency control mechanism
  - Alternative to locks in distributed databases
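
The slide does not say which timestamping scheme is meant; a common textbook choice is basic timestamp ordering, sketched here under the assumption that every transaction receives a unique, increasing timestamp (in a distributed system this is often a local clock value combined with a site ID). Each data item remembers the largest read and write timestamps it has seen, and an operation that arrives "too late" is rejected instead of waiting for a lock; the rejected transaction would restart with a new timestamp. The class and function names are hypothetical.

```python
class TimestampOrderingError(Exception):
    """The operation arrived too late; the transaction must restart."""


class DataItem:
    def __init__(self, value):
        self.value = value
        self.read_ts = 0   # largest timestamp of any transaction that read it
        self.write_ts = 0  # timestamp of the transaction that last wrote it


def read(item, ts):
    # Reject a read of a value already written by a younger transaction.
    if ts < item.write_ts:
        raise TimestampOrderingError(f"read by T{ts} rejected")
    item.read_ts = max(item.read_ts, ts)
    return item.value


def write(item, ts, value):
    # Reject a write if a younger transaction already read or wrote the item.
    if ts < item.read_ts or ts < item.write_ts:
        raise TimestampOrderingError(f"write by T{ts} rejected")
    item.value = value
    item.write_ts = ts


if __name__ == "__main__":
    x = DataItem(100)
    read(x, ts=5)                  # T5 reads; read_ts becomes 5
    try:
        write(x, ts=3, value=90)   # older T3 writes after younger T5 read
    except TimestampOrderingError as err:
        print(err)                 # write by T3 rejected
    write(x, ts=7, value=80)       # T7 is younger than all readers/writers
    print(x.value)                 # 80
```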

Query Optimization
- In a query involving a multi-site join and, possibly, a distributed database with replicated files, the distributed DBMS must decide where to access the data and how to proceed with the join
- Three-step process:
  1. Query decomposition: the query is rewritten and simplified
  2. Data localization: the query is fragmented so that fragments reference data at only one site
  3. Global optimization:
     - Order in which to execute query fragments
     - Data movement between sites
     - Where parts of the query will be executed
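
The data-movement part of global optimization can be illustrated with a toy cost model. This sketch assumes the only cost is bytes shipped between sites, ignores local processing and semijoin strategies, and uses hypothetical relation sizes; it tries each candidate join site and ranks the plans.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Relation:
    name: str
    site: int
    size_bytes: int


def rank_join_sites(r, s, result_bytes, query_site):
    """Return (bytes shipped, plan) for each candidate join site, cheapest first."""
    plans = []
    for join_site in sorted({r.site, s.site, query_site}):
        shipped = 0
        for rel in (r, s):
            if rel.site != join_site:
                shipped += rel.size_bytes   # move this operand to the join site
        if join_site != query_site:
            shipped += result_bytes         # move the join result to the user
        plans.append((shipped, f"join at site {join_site}"))
    return sorted(plans)


if __name__ == "__main__":
    # Hypothetical sizes: R is small, S is large, the result is smaller still.
    r = Relation("R", site=1, size_bytes=1_000_000)
    s = Relation("S", site=2, size_bytes=10_000_000)
    for cost, plan in rank_join_sites(r, s, result_bytes=200_000, query_site=3):
        print(cost, plan)
    # 1200000 join at site 2
    # 10200000 join at site 1
    # 11000000 join at site 3
```

With R small and S large, joining at S's site and shipping only the result moves far less data than shipping both relations to the query site.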

Evolution of Distributed DBMS
- "Unit of work": all of a transaction's steps
- Remote unit of work
  - SQL statements originated at one location can be executed as a single unit of work on a single remote DBMS

Evolution of Distributed DBMS (cont.)
- Distributed unit of work
  - Different statements in a unit of work may refer to different remote sites
  - All databases in a single SQL statement must be at a single site
- Distributed request
  - A single SQL statement may refer to tables in more than one remote site
  - May not support replication transparency or failure transparency
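
The three capability levels can be summarized as a rule on which sites each SQL statement touches. A small sketch, assuming a transaction is described as a list of statements, each given only as the set of site names it references (the function name and site names are hypothetical):

```python
def required_level(transaction):
    """Return the least capable level (in the slides' order) that can run it.

    transaction: list of SQL statements, each given as the set of sites it references.
    """
    if any(len(sites) > 1 for sites in transaction):
        return "distributed request"        # one statement spans several sites
    if len(set().union(*transaction)) > 1:
        return "distributed unit of work"   # one site per statement, sites differ
    return "remote unit of work"            # every statement uses the same site


if __name__ == "__main__":
    print(required_level([{"Chicago"}, {"Chicago"}]))  # remote unit of work
    print(required_level([{"Chicago"}, {"Denver"}]))   # distributed unit of work
    print(required_level([{"Chicago", "Denver"}]))     # distributed request
```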