8/18/2019 L15-16_PPT_IVSem
1/55
Lecture 15-16
Database System Architectures
Distributed Database
Parallel Database
Database System Concepts, 5th Edition — ©Silberschatz, Korth and Sudarshan
Introduction
A distributed database system consists of physically separate components that together constitute a single logical database system.
Scope
Distributed databases are widely used in large-scale data processing: today's world runs on the Internet, e-banking systems, weather forecasting, and other applications where large amounts of data are processed. When data is stored and processed at many places, that is distributed data processing; the scope of parallel databases and distributed data processing is therefore very bright.
Research
A great deal of research is ongoing in parallel and distributed databases.
Database System Architectures
Centralized and Client-Server Systems
Server System Architectures
Parallel Systems
Distributed Systems
Network Types
Centralized Systems
Run on a single computer system and do not interact with other computer systems.
General-purpose computer system: one to a few CPUs and a number of device controllers that are connected through a common bus that provides access to shared memory.
Single-user system (e.g., personal computer or workstation): desktop unit, single user, usually has only one CPU and one or two hard disks; the OS may support only one user.
Multi-user system: more disks, more memory, multiple CPUs, and a multi-user OS. Serves a large number of users who are connected to the system via terminals. Often called server systems.
A Centralized Computer System
Client-Server Systems
Server systems satisfy requests generated at m client systems, whose general structure is shown below:
Client-Server Systems (Cont.)
Database functionality can be divided into:
Back-end: manages access structures, query evaluation and optimization, concurrency control, and recovery.
Front-end: consists of tools such as forms, report writers, and graphical user interface facilities.
The interface between the front-end and the back-end is through SQL or through an application program interface.
Client-Server Systems (Cont.)
Advantages of replacing mainframes with networks of workstations or personal computers connected to back-end server machines:
flexibility in locating resources and expanding facilities
better user interfaces
easier maintenance
Server System Architecture
Server systems can be broadly categorized into two kinds:
transaction servers, which are widely used in relational database systems, and
data servers, used in object-oriented database systems
Transaction Servers
Clients send requests to the server.
Transactions are executed at the server.
Results are shipped back to the client.
Requests are specified in SQL, and communicated to the server through a remote procedure call (RPC) mechanism.
Transactional RPC allows many RPC calls to collectively form a transaction.
Open Database Connectivity (ODBC) is a C language application program interface standard from Microsoft for connecting to a server, sending SQL requests, and receiving results.
The JDBC standard is similar to ODBC, for Java.
Transaction Server Process Structure
A typical transaction server consists of multiple processes accessing data in shared memory.
Server processes
These receive user queries (transactions), execute them, and send results back.
Processes may be multithreaded, allowing a single process to execute several user queries concurrently.
Typically there are multiple multithreaded server processes.
Lock manager process
Database writer process
Transaction Server Processes (Cont.)
Log writer process
Server processes simply add log records to log record buffer
Log writer process outputs log records to stable storage.
Checkpoint process
Performs periodic checkpoints
Process monitor process
Monitors other processes, and takes recovery actions if any of the other processes fail
E.g., aborting any transactions being executed by a server process and restarting the process
Transaction System Processes (Cont.)
Transaction System Processes (Cont.)
Shared memory contains shared data:
Buffer pool
Lock table
Log buffer
Cached query plans (reused if the same query is submitted again)
To ensure that no two processes access the same data structure at the same time, database systems implement mutual exclusion using either:
Operating system semaphores
Atomic instructions such as test-and-set
To avoid the overhead of interprocess communication for lock request/grant, each database process operates directly on the lock table, instead of sending requests to the lock manager process.
Lock manager process still used for deadlock detection
Data Servers
Used in high-speed LANs, in cases where:
The clients are comparable in processing power to the server
The tasks to be executed are computation intensive
Data are shipped to clients, where processing is performed; results are then shipped back to the server.
This architecture requires full back-end functionality at the clients.
Used in many object-oriented database systems.
Issues:
Page-shipping versus item-shipping
Locking
Data caching
Lock caching
Data Servers (Cont.)
Page-shipping versus item-shipping:
Smaller unit of shipping ⇒ more messages
Worth prefetching related items along with the requested item
Locking:
Overhead of requesting and getting locks from the server is high due to message delays
Can grant locks on requested and prefetched items; with page shipping, a transaction is granted a lock on the whole page
Locks on a prefetched item can be called back by the server, and returned by the client transaction if the prefetched item has not been used
Locks on the page can be deescalated to locks on items in the page when there are lock conflicts; locks on unused items can then be returned to the server
Data Servers (Cont.)
Data Caching:
Data can be cached at the client, even in between transactions
But must check that data is up-to-date before it is used (cache coherency)
Check can be done when requesting a lock on the data item
Lock Caching:
Transactions can acquire cached locks locally, without contacting the server
Locks can be called back by the server due to a conflicting lock request; the client returns the lock once no local transaction is using it
Similar to deescalation, but across transactions
Parallel Systems
Parallel database systems consist of multiple processors and multiple disks connected by a fast interconnection network.
A coarse-grain parallel machine consists of a small number of powerful processors.
A massively parallel or fine-grain parallel machine utilizes thousands of smaller processors.
Two main performance measures:
throughput --- the number of tasks that can be completed in a given time interval
response time --- the amount of time it takes to complete a single task from the time it is submitted
Speed-Up and Scale-Up
Speedup: a fixed-sized problem executing on a small system is given to a system which is N-times larger.
Measured by:
speedup = small system elapsed time / large system elapsed time
Speedup is linear if the equation equals N.
Scaleup: increase the size of both the problem and the system.
An N-times larger system is used to perform an N-times larger job.
Measured by:
scaleup = small system, small problem elapsed time / big system, big problem elapsed time
Scaleup is linear if the equation equals 1.
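The two ratios above can be checked directly. A minimal sketch with hypothetical elapsed times (the numbers are illustrative, not measurements from the text):

```python
def speedup(small_elapsed, large_elapsed):
    """Speedup = elapsed time on the small system / elapsed time on the N-times larger system."""
    return small_elapsed / large_elapsed

def scaleup(small_sys_small_prob, large_sys_large_prob):
    """Scaleup = small-system/small-problem time / large-system/large-problem time."""
    return small_sys_small_prob / large_sys_large_prob

# Hypothetical: a query takes 100 s on 1 processor, 26 s on 4 processors.
s = speedup(100.0, 26.0)
print(f"speedup = {s:.2f} (linear speedup on 4 processors would be 4)")

# Hypothetical: a 4x problem on a 4x system takes 110 s vs 100 s originally.
sc = scaleup(100.0, 110.0)
print(f"scaleup = {sc:.2f} (linear scaleup would be exactly 1)")
```

Both results are sublinear, which the "Factors Limiting Speedup and Scaleup" slide explains.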
Speedup
Scaleup
Batch and Transaction Scaleup
Batch scaleup:
A single large job; typical of most decision-support queries and scientific simulation.
Use an N-times larger computer on an N-times larger problem.
Transaction scaleup:
Numerous small queries submitted by independent users to a shared database; typical of transaction processing and timesharing systems.
N-times as many users submit requests (hence, N-times as many requests) to an N-times larger database, on an N-times larger computer.
Well-suited to parallel execution.
Factors Limiting Speedup and Scaleup
Speedup and scaleup are often sublinear due to:
Startup costs: Cost of starting up multiple processes may dominate computation time, if the degree of parallelism is high.
Interference: Processes accessing shared resources (e.g., system bus, disks, or locks) compete with each other, thus spending time waiting on other processes rather than performing useful work.
Skew: Increasing the degree of parallelism increases the variance in service times of parallelly executing tasks. Overall execution time is determined by the slowest of the parallelly executing tasks.
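The skew effect is easy to see numerically: since the parallel elapsed time is the maximum of the per-task times, one overloaded task caps the speedup. A small sketch with hypothetical task sizes:

```python
# 100 units of work split across 10 tasks; parallel elapsed time = max per-task time.
work = 100.0
balanced = [work / 10] * 10            # 10 units each
skewed   = [19.0] + [9.0] * 9          # one task received ~2x its fair share

seq_time = work                         # time on a single processor
print("balanced speedup:", seq_time / max(balanced))  # 10.0 -- linear
print("skewed speedup:  ", seq_time / max(skewed))    # ~5.26 -- far below 10
```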
Interconnection Network Architectures
Bus. System components send data on, and receive data from, a single communication bus.
Does not scale well with increasing parallelism.
Mesh. Components are arranged as nodes in a grid, and each component is connected to all adjacent components.
Communication links grow with the growing number of components, and so scales better.
But may require 2√n hops to send a message to a node (or √n with wraparound connections at the edge of the grid).
Hypercube. Components are numbered in binary; components are connected to one another if their binary representations differ in exactly one bit.
n components are connected to log(n) other components and can reach each other via at most log(n) links; reduces communication delays.
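The hypercube connectivity rule above maps directly onto bit operations: neighbors differ in one bit, and the minimum hop count between two nodes is the Hamming distance of their labels. A short sketch:

```python
def hypercube_neighbors(i, dim):
    """Nodes adjacent to node i in a dim-dimensional hypercube: flip each of its bits."""
    return [i ^ (1 << b) for b in range(dim)]

def hops(i, j):
    """Minimum hops between nodes i and j = Hamming distance of their binary labels."""
    return bin(i ^ j).count("1")

# n = 8 components -> 3-dimensional hypercube: each node has log2(8) = 3 neighbors.
print(hypercube_neighbors(0b000, 3))   # [1, 2, 4]
print(hops(0b000, 0b111))              # 3 hops -- the maximum for n = 8
```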
Interconnection Architectures
Parallel Database Architectures
Shared memory -- processors share a common memory
Shared disk -- processors share a common disk
Shared nothing -- processors share neither a common memory nor a common disk
Hierarchical -- hybrid of the above architectures
Parallel Database Architectures
Shared Memory
Processors and disks have access to a common memory, typically via a bus or through an interconnection network.
Extremely efficient communication between processors — data in shared memory can be accessed by any processor without having to move it using software.
Downside – the architecture is not scalable beyond 32 or 64 processors, since the bus or the interconnection network becomes a bottleneck.
Widely used for lower degrees of parallelism (4 to 8).
Shared Disk
All processors can directly access all disks via an interconnection network, but the processors have private memories.
The architecture provides a degree of fault-tolerance — if a processor fails, the other processors can take over its tasks, since the database is resident on disks that are accessible from all processors.
Examples: IBM Sysplex and DEC clusters (now part of Compaq) running Rdb (now Oracle Rdb) were early commercial users.
Downside: the bottleneck now occurs at the interconnection to the disk subsystem.
Shared-disk systems can scale to a somewhat larger number of processors, but communication between processors is slower.
Shared Nothing
A node consists of a processor, memory, and one or more disks. Processors at one node communicate with processors at another node using an interconnection network. A node functions as the server for the data on the disk or disks the node owns.
Examples: Teradata, Tandem, Oracle-nCUBE
Data accessed from local disks (and local memory accesses) do not pass through the interconnection network, thereby minimizing the interference of resource sharing.
Shared-nothing multiprocessors can be scaled up to thousands of processors without interference.
Main drawback: cost of communication and of non-local disk access; sending data involves software interaction at both ends.
Hierarchical
Combines characteristics of shared-memory, shared-disk, and shared-nothing architectures.
Top level is a shared-nothing architecture – nodes connected by an interconnection network, which do not share disks or memory with each other.
Each node of the system could be a shared-memory system with a few processors.
Alternatively, each node could be a shared-disk system, and each of the systems sharing a set of disks could be a shared-memory system.
Distributed virtual-memory architectures reduce the complexity of programming such systems.
Also called non-uniform memory architecture (NUMA).
Distributed Systems
Data spread over multiple machines (also referred to as sites or nodes).
A network interconnects the machines.
Data shared by users on multiple machines.
Distributed Databases
Homogeneous distributed databases
Same software/schema on all sites; data may be partitioned among sites
Goal: provide a view of a single database, hiding details of distribution
Heterogeneous distributed databases
Different software/schema on different sites
Goal: integrate existing databases to provide useful functionality
Differentiate between local and global transactions
A local transaction accesses data only in the single site at which the transaction was initiated.
A global transaction either accesses data in a site different from the one at which the transaction was initiated, or accesses data in several different sites.
Trade-offs in Distributed Systems
Sharing data – users at one site are able to access data residing at other sites.
Autonomy – each site is able to retain a degree of control over data stored locally.
Higher system availability through redundancy — data can be replicated at remote sites, and the system can function even if a site fails.
Disadvantage: added complexity required to ensure proper coordination among sites:
Software development cost
Greater potential for bugs
Increased processing overhead
Implementation Issues for Distributed Databases
Atomicity is needed even for transactions that update data at multiple sites.
The two-phase commit protocol (2PC) is used to ensure atomicity:
Basic idea: each site executes the transaction until just before commit, and then leaves the final decision to a coordinator.
Each site must follow the decision of the coordinator, even if there is a failure.
2PC is not always appropriate: other transaction models, based on persistent messaging and workflows, are also used.
Distributed concurrency control (and deadlock detection) are required.
Data items may be replicated to improve data availability.
Details of the above in Chapter 22.
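The coordinator's decision rule in 2PC can be sketched in a few lines. This is a toy model of only the decision logic (site names and the vote strings are hypothetical; real 2PC also logs each step to stable storage for recovery):

```python
def two_phase_commit(site_votes):
    """Phase 1: each site executes up to just before commit and votes.
    Phase 2: the coordinator commits only if EVERY site voted 'ready';
    all sites must then follow the coordinator's decision."""
    if all(vote == "ready" for vote in site_votes.values()):
        decision = "commit"
    else:
        decision = "abort"
    return {site: decision for site in site_votes}

print(two_phase_commit({"site1": "ready", "site2": "ready"}))   # both commit
print(two_phase_commit({"site1": "ready", "site2": "failed"}))  # both abort
```

A single dissenting or failed site forces a global abort, which is what gives the protocol its atomicity guarantee.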
Network Types
Local-area networks (LANs) – composed of processors that are distributed over small geographical areas, such as a single building or a few adjacent buildings.
Wide-area networks (WANs) – composed of processors distributed over a large geographical area.
Network Types (Cont.)
WANs with continuous connection (e.g., the Internet) are needed for implementing distributed database systems.
In WANs with discontinuous connection:
Data is replicated.
Updates are propagated to replicas periodically.
Copies of data may be updated independently.
Non-serializable executions can thus result. Resolution is application dependent.
Parallel Databases
Introduction
I/O Parallelism
Interquery Parallelism
Intraquery Parallelism
Intraoperation Parallelism
Interoperation Parallelism
Design of Parallel Systems
Introduction
Parallel machines are becoming quite common and affordable:
Prices of microprocessors, memory, and disks have dropped sharply.
Recent desktop computers feature multiple processors, and this trend is projected to accelerate.
Databases are growing increasingly large:
Large volumes of transaction data are collected and stored for later analysis.
Multimedia objects like images are increasingly stored in databases.
Large-scale parallel database systems are increasingly used for:
storing large volumes of data
processing time-consuming decision-support queries
providing high throughput for transaction processing
Parallelism in Databases
Data can be partitioned across multiple disks for parallel I/O.
Individual relational operations (e.g., sort, join, aggregation) can be executed in parallel:
data can be partitioned, and each processor can work independently on its own partition.
Queries are expressed in a high-level language (SQL, translated to relational algebra), which makes parallelization easier.
Different queries can be run in parallel with each other. Concurrency control takes care of conflicts.
Thus, databases naturally lend themselves to parallelism.
I/O Parallelism
Reduce the time required to retrieve relations from disk by partitioning the relations across multiple disks.
Horizontal partitioning – tuples of a relation are divided among many disks such that each tuple resides on one disk.
Partitioning techniques (number of disks = n):
Round-robin:
Send the i-th tuple inserted in the relation to disk i mod n.
Hash partitioning:
Choose one or more attributes as the partitioning attributes.
Choose a hash function h with range 0…n - 1.
Let i denote the result of hash function h applied to the partitioning attribute value of a tuple. Send the tuple to disk i.
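The two rules above are one-liners in code. A minimal sketch (Python's built-in `hash` stands in for the hash function h; the attribute value is an illustrative placeholder):

```python
def round_robin_disk(tuple_index, n):
    """Round-robin: the i-th tuple inserted goes to disk i mod n."""
    return tuple_index % n

def hash_disk(partitioning_value, n):
    """Hash partitioning: apply a hash function with range 0..n-1 to the
    partitioning attribute value and send the tuple to that disk."""
    return hash(partitioning_value) % n

n = 4
print([round_robin_disk(i, n) for i in range(6)])   # [0, 1, 2, 3, 0, 1]
print(0 <= hash_disk("account-1042", n) < n)        # True: always a valid disk
```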
I/O Parallelism (Cont.)
Partitioning techniques (cont.):
Range partitioning:
Choose an attribute as the partitioning attribute.
A partitioning vector [v0, v1, ..., vn-2] is chosen.
Let v be the partitioning attribute value of a tuple. Tuples such that vi ≤ v < vi+1 go to disk i + 1. Tuples with v < v0 go to disk 0, and tuples with v ≥ vn-2 go to disk n - 1.
E.g., with partitioning vector [5,11], a tuple with partitioning attribute value 2 will go to disk 0, a tuple with value 8 will go to disk 1, while a tuple with value 20 will go to disk 2.
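The range-partitioning rule is exactly a binary search over the partitioning vector; a sketch reproducing the slide's [5,11] example:

```python
import bisect

def range_disk(v, vector):
    """Range partitioning: with vector [v0, ..., vn-2], tuples with v < v0 go
    to disk 0, vi <= v < vi+1 go to disk i+1, and v >= vn-2 go to disk n-1."""
    return bisect.bisect_right(vector, v)

vector = [5, 11]               # the slide's example partitioning vector
print(range_disk(2, vector))   # 0  (2 < 5)
print(range_disk(8, vector))   # 1  (5 <= 8 < 11)
print(range_disk(20, vector))  # 2  (20 >= 11)
```

Note `bisect_right` puts a boundary value such as v = 5 on disk 1, matching the vi ≤ v < vi+1 convention.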
Comparison of Partitioning Techniques
Evaluate how well the partitioning techniques support the following types of data access:
1. Scanning the entire relation.
2. Locating a tuple associatively – point queries.
E.g., r.A = 25.
3. Locating all tuples such that the value of a given attribute lies within a specified range – range queries.
E.g., 10 ≤ r.A < 25.
Comparison of Partitioning Techniques (Cont.)
Round-robin:
Advantages
Best suited for sequential scan of the entire relation on each query.
All disks have almost an equal number of tuples; retrieval work is thus well balanced between disks.
Disadvantages
Range queries are difficult to process.
No clustering -- tuples are scattered across all disks.
Comparison of Partitioning Techniques (Cont.)
Hash partitioning:
Good for sequential access:
Assuming the hash function is good, and the partitioning attributes form a key, tuples will be equally distributed between disks.
Retrieval work is then well balanced between disks.
Good for point queries on the partitioning attribute:
Can look up a single disk, leaving the others available for answering other queries.
Index on the partitioning attribute can be local to a disk, making lookup and update more efficient.
No clustering, so difficult to answer range queries.
Comparison of Partitioning Techniques (Cont.)
Range partitioning:
Provides data clustering by partitioning attribute value.
Good for sequential access.
Good for point queries on the partitioning attribute: only one disk needs to be accessed.
For range queries on the partitioning attribute, one to a few disks may need to be accessed:
Remaining disks are available for other queries.
Good if result tuples are from one to a few blocks.
If many blocks are to be fetched, they are still fetched from one to a few disks, and the potential parallelism in disk access is wasted:
Example of execution skew.
Partitioning a Relation across Disks
If a relation contains only a few tuples that will fit into a single disk block, then assign the relation to a single disk.
Large relations are preferably partitioned across all the available disks.
If a relation consists of m disk blocks and there are n disks available in the system, then the relation should be allocated min(m, n) disks.
Handling of Skew
The distribution of tuples to disks may be skewed — that is, some disks have many tuples, while others may have fewer tuples.
Attribute-value skew:
Some values appear in the partitioning attributes of many tuples; all tuples with the same value for the partitioning attribute end up in the same partition.
Can occur with range-partitioning and hash-partitioning.
Partition skew:
With range-partitioning, a badly chosen partition vector may assign too many tuples to some partitions and too few to others.
Less likely with hash-partitioning if a good hash function is chosen.
Handling Skew in Range-Partitioning
To create a balanced partitioning vector (assuming the partitioning attribute forms a key of the relation):
Sort the relation on the partitioning attribute.
Construct the partition vector by scanning the relation in sorted order as follows:
After every 1/n-th of the relation has been read, the value of the partitioning attribute of the next tuple is added to the partition vector.
Here n denotes the number of partitions to be constructed.
Duplicate entries or imbalances can result if duplicates are present in the partitioning attributes.
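The construction above can be sketched directly (the key values are illustrative; real systems would scan the sorted relation rather than sort an in-memory list):

```python
def build_partition_vector(values, n):
    """Build a range-partitioning vector for n partitions: sort on the
    partitioning attribute, then after every 1/n-th of the tuples record the
    next tuple's value (assumes the attribute is a key of the relation)."""
    values = sorted(values)
    step = len(values) // n
    return [values[i * step] for i in range(1, n)]

# Hypothetical key values; 3 partitions -> a 2-entry vector.
keys = [3, 9, 1, 14, 7, 22, 5, 18, 11]
print(build_partition_vector(keys, 3))   # [7, 14]
```

Each resulting range ([min, 7), [7, 14), [14, max]) then holds roughly a third of the tuples.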
Handling Skew Using Virtual Processor Partitioning
Skew in range partitioning can be handled elegantly using virtual processor partitioning:
Create a large number of partitions (say 10 to 20 times the number of processors).
Assign virtual processors to partitions, either in round-robin fashion or based on the estimated cost of processing each virtual partition.
Basic idea:
If any normal partition would have been skewed, it is very likely the skew is spread over a number of virtual partitions.
Skewed virtual partitions get spread across a number of processors, so work gets distributed evenly!
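The round-robin variant of the assignment step can be sketched in a few lines (the partition and processor counts are illustrative):

```python
def assign_virtual_partitions(num_virtual, num_real):
    """Round-robin assignment of virtual partitions to real processors, so a
    run of skewed virtual partitions ends up spread across processors."""
    return {v: v % num_real for v in range(num_virtual)}

# 10-20x more virtual partitions than processors; here 16 virtual, 4 real.
mapping = assign_virtual_partitions(16, 4)
per_proc = [sum(1 for p in mapping.values() if p == r) for r in range(4)]
print(per_proc)   # [4, 4, 4, 4] -- each real processor handles 4 virtual partitions
```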
Interquery Parallelism
Queries/transactions execute in parallel with one another.
Increases transaction throughput; used primarily to scale up a transaction-processing system to support a larger number of transactions per second.
Easiest form of parallelism to support, particularly in a shared-memory parallel database, because even sequential database systems support concurrent processing.
More complicated to implement on shared-disk or shared-nothing architectures:
Locking and logging must be coordinated by passing messages between processors.
Data in a local buffer may have been updated at another processor.
Cache coherency has to be maintained — reads and writes of data in buffer must find the latest version of the data.
Cache Coherency Protocol
Example of a cache coherency protocol for shared-disk systems:
Before reading/writing to a page, the page must be locked in shared/exclusive mode.
On locking a page, the page must be read from disk.
Before unlocking a page, the page must be written to disk if it was modified.
More complex protocols with fewer disk reads/writes exist.
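The three rules of the simple shared-disk protocol can be sketched as a toy model (the `disk` dict and all names are hypothetical in-memory stand-ins for the lock manager and disk; lock conflicts are ignored):

```python
disk = {"page1": "v0"}   # stand-in for the shared disk

class CachedPage:
    def __init__(self, page_id):
        self.page_id, self.data = page_id, None
        self.dirty, self.locked = False, False

    def lock(self):
        # Rule 2: on locking a page, it must be (re)read from disk.
        self.locked = True
        self.data = disk[self.page_id]

    def write(self, value):
        # Rule 1: reads/writes happen only while the page is locked.
        assert self.locked
        self.data, self.dirty = value, True

    def unlock(self):
        # Rule 3: before unlocking, flush the page to disk if modified.
        if self.dirty:
            disk[self.page_id] = self.data
            self.dirty = False
        self.locked = False

p = CachedPage("page1")
p.lock(); p.write("v1"); p.unlock()
print(disk["page1"])    # 'v1' -- the next processor to lock the page sees the update
```

Re-reading on every lock is what makes stale cached copies harmless, at the cost of extra disk reads, which the more complex protocols mentioned above avoid.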
Cache coherency protocols for shared-nothing systems are similar: each database page is assigned a home processor, and requests to fetch the page or write it to disk are sent to the home processor.
Intraquery Parallelism
Execution of a single query in parallel on multiple processors/disks; important for speeding up long-running queries.
Two complementary forms of intraquery parallelism:
Intraoperation parallelism – parallelize the execution of each individual operation in the query.
Interoperation parallelism – execute the different operations in a query expression in parallel.
The first form scales better with increasing parallelism, because the number of tuples processed by each operation is typically more than the number of operations in a query.