Motivation for Parallel Databases (The Database Problem)
☞ I/O bottleneck (memory-access bottleneck):
• speed(disk) < speed(RAM) < speed(microprocessor)
☞ Predictions
• (Micro-)processor speed growth: 50% per year
• DRAM capacity growth: 4× every three years
• Disk throughput: 2× in the last ten years
☞ Conclusion: the I/O bottleneck worsens
Spring, 2006 Arturas Mazeika Page 3
Parallel Databases
☞ Introduction, General Idea, Motivation
☞ Parallel Architectures
☞ Parallel DBMS Techniques
Motivation for Parallel Databases (The Solution)
☞ Increase the I/O bandwidth
• Data partitioning
• Parallel data access
☞ Origins (1980’s): database machines
• Hardware-oriented (failed due to bad cost-performance)
☞ 1990’s: the same solution, but using standard hardware components integrated in a multiprocessor
• Software-oriented
• Standards essential to exploit continuing technology improvements
Spring, 2006 Arturas Mazeika Page 4
The Target Architecture
Shared Memory (SM) Shared Disk (SD)
Shared Nothing (SN)
Spring, 2006 Arturas Mazeika Page 2
Parallel Data Processing (Data Based Parallelism)
☞ Databases achieve parallelism through inter-operation and intra-operation parallelism
☞ Inter-operation parallelism: one query is broken into a number of operations that are run in parallel
☞ Intra-operation parallelism: the same operation is run on different portions of the data in parallel
Spring, 2006 Arturas Mazeika Page 7
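As a toy illustration of intra-operation parallelism (a hypothetical table and a Python thread pool standing in for the nodes; not from the slides):

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical (id, salary) table, horizontally split into three partitions,
# as if each partition lived on its own node.
partitions = [
    [(1, 900), (2, 1500)],
    [(3, 2100), (4, 700)],
    [(5, 1800), (6, 1200)],
]

def select_high_earners(rows, threshold=1000):
    # The *same* operation (a selection) applied to one partition.
    return [r for r in rows if r[1] > threshold]

# Intra-operation parallelism: one operator, many partitions, run concurrently.
with ThreadPoolExecutor() as pool:
    per_partition = list(pool.map(select_high_earners, partitions))

# The final answer is the union of the per-partition results.
answer = [row for part in per_partition for row in part]
```

Inter-operation parallelism would instead run *different* operators of the same query (e.g., two independent selections feeding a join) concurrently.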
Motivation for Parallel Databases (Multiprocessor Objectives)
☞ High-performance with better cost-performance than mainframe or vector supercomputer
☞ Use many nodes, each with good cost-performance, communicating through a network
• Good cost via high-volume components
• Good performance via bandwidth
☞ Trends
• Microprocessor and memory (DRAM): off-the-shelf
• Network (multiprocessor edge): custom
☞ The real challenge is to parallelize applications to run with good load balancing
Spring, 2006 Arturas Mazeika Page 5
Parallel Data Processing (Objectives of Parallel DBMS 1/3)
☞ High-performance through parallelism
☞ High availability and reliability by exploiting data replication
☞ Extensibility with the ideal goals
• Linear speed-up
• Linear scale-up
Spring, 2006 Arturas Mazeika Page 8
Parallel Data Processing (General Idea)
☞ Three ways of exploiting high-performance multiprocessor systems:
• Automatically detect parallelism in sequential programs (e.g., Fortran)
• Augment an existing language with parallel constructs (e.g., C, Fortran90)
• Offer a new language in which parallelism can be expressed or automatically inferred
☞ Critique
• (1) Parallelizing compilers are hard to develop, and the resulting speed-up is limited
• (2) Lets the programmer express parallel computations, but is too low-level
• (3) A new language can combine the advantages of both (1) and (2)
Spring, 2006 Arturas Mazeika Page 6
Parallel Data Processing (Barriers to Parallelism)
☞ Startup
• The time needed to start a parallel operation may dominate the actual computation time (especially for small queries)
☞ Interference
• When accessing shared resources, each new process slows down the others (e.g., two processes accessing the same data item)
☞ Skew
• The response time of a set of parallel processes is the time of the slowest one (one process has lots of work, while the others are idle)
Spring, 2006 Arturas Mazeika Page 11
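A back-of-the-envelope model of the startup and skew barriers (illustrative numbers, not from the slides):

```python
# The elapsed time of a set of parallel processes is the startup cost plus
# the time of the *slowest* process, not the average.
def parallel_response_time(startup, per_process_work):
    return startup + max(per_process_work)

balanced = [10, 10, 10, 10]   # 40 units of work, evenly spread
skewed = [37, 1, 1, 1]        # the same 40 units, badly skewed

# With skew, three processes sit idle while one does almost all the work:
# the response time more than triples despite identical total work.
```

Here `parallel_response_time(2, balanced)` gives 12, while `parallel_response_time(2, skewed)` gives 39.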
Parallel Data Processing (Objectives of Parallel DBMS 2/3: Linear Speed-Up)
☞ Every added system component (processor, memory, disk) increases the performance by a constant factor (the performance is proportional to the number of components)
Spring, 2006 Arturas Mazeika Page 9
Parallel Architectures (Functional Architecture of Parallel Database System 1/2)
Spring, 2006 Arturas Mazeika Page 12
Parallel Data Processing (Objectives of Parallel DBMS 3/3: Linear Scale-Up)
☞ The performance stays the same as the size of the database and the number of components of the system increase proportionally
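Stated as formulas (the standard definitions, where T denotes elapsed time):

```latex
\mathrm{speedup}(n) = \frac{T(\text{problem},\ 1\ \text{node})}{T(\text{problem},\ n\ \text{nodes})},
\qquad \text{linear speed-up: } \mathrm{speedup}(n) = n

\mathrm{scaleup}(n) = \frac{T(\text{problem},\ 1\ \text{node})}{T(n \times \text{problem},\ n\ \text{nodes})},
\qquad \text{linear scale-up: } \mathrm{scaleup}(n) = 1
```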
Parallel Architectures (Hierarchical Architecture)
☞ Combines the good load balancing of shared-memory with the extensibility of shared-nothing
☞ Alternatives
• A limited number of large nodes, e.g., 4 × 16-processor nodes
• A high number of small nodes, e.g., 16 × 4-processor nodes, has much better cost-performance (can be a cluster of workstations)
☞ Advantages: better load balancing
☞ Disadvantages: More complicated design
Spring, 2006 Arturas Mazeika Page 22
Data Placement (Main Implementations 2/2)
☞ Each relation is divided into n partitions (sub-relations), where n is a function of relation size and access frequency
☞ Data partitioning supports good load balancing
☞ Three implementations are common:
• Round-robin
– Maps the i-th element to node i mod n
– Simple, but only exact-match queries
– Updates should be handled carefully
• B-tree index
– Supports range queries but large index
• Hash function
– Only exact-match queries but small index
Spring, 2006 Arturas Mazeika Page 27
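The three schemes can be sketched as placement functions (a toy sketch with an assumed node count and split points; the range variant stands in for what a B-tree index supports):

```python
import bisect

n = 4  # number of nodes (assumed)

def round_robin_node(i):
    # Round-robin: the i-th inserted tuple goes to node i mod n.
    return i % n

def hash_node(key):
    # Hashing: an exact-match lookup goes straight to one node; the
    # "index" is just the hash function itself, so it is small.
    return hash(key) % n

# Range partitioning: node i holds keys in [bounds[i-1], bounds[i]).
# The split points below are assumed for illustration.
bounds = [100, 200, 300]

def range_node(key):
    return bisect.bisect_right(bounds, key)

def nodes_for_range(lo, hi):
    # A range query only touches the nodes whose intervals overlap
    # [lo, hi] -- this is what the (large) B-tree-style index buys you.
    return list(range(range_node(lo), range_node(hi) + 1))
```

For example, a range query over [150, 250] touches only nodes 1 and 2, while hash and round-robin placement would force it to visit every node.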
Parallel DBMS Techniques
Parallel DBMS techniques rely on four inter-dependent concepts:
☞ Data placement
• Physical placement and replication of the DB onto multiple nodes
☞ Parallel data processing
• The main issue is to process joins, selects are easy
☞ Parallel query optimization
• Automatic parallelization of the queries and load balancing are main issues
• Choice of the best parallel execution plans
☞ Transaction management
• Similar to distributed transaction management
Spring, 2006 Arturas Mazeika Page 25
Data Placement (Issues in Data Partitioning)
☞ Small relations should be partitioned separately from the large ones (it does not make sense to partition a few blocks across a thousand places)
☞ Large relations also should not be fragmented everywhere in large parallel systems. Consider a binary join with full fragmentation on a 1024-CPU parallel architecture: the number of messages between nodes would be 1024² (over a million).
☞ The data should be periodically repartitioned to support balanced loads and changes indata distributions (due to updates)
Spring, 2006 Arturas Mazeika Page 28
Data Placement (Main Implementations 1/2)
Spring, 2006 Arturas Mazeika Page 26
Data Placement (Data Replication, Interleaved Partitioning)
☞ Fragment the data of each node, and mirror each fragment to a different node
☞ The system is unrecoverable if two nodes fail
Spring, 2006 Arturas Mazeika Page 31
Data Placement (Data Replication)
☞ High-availability requires data replication
• Mirror disks
• Interleaved partitioning
• Chained partitioning
Spring, 2006 Arturas Mazeika Page 29
Data Placement (Data Replication, Chained Partitioning)
☞ A copy is stored on two adjacent nodes
☞ The probability that two adjacent nodes fail is much lower than the probability that any two nodes fail
☞ Simple strategy to mirror nodes
Spring, 2006 Arturas Mazeika Page 32
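A sketch of chained placement on a ring of nodes (node count and layout assumed for illustration):

```python
n = 4  # nodes arranged in a ring (assumed)

def chained_placement(i):
    # Fragment i: primary copy on node i, backup on the adjacent node i+1
    # (wrapping around at the end of the chain).
    primary = i % n
    backup = (primary + 1) % n
    return primary, backup

def data_survives(failed_nodes):
    # Data is lost only when some fragment loses both copies, i.e. only
    # when two *adjacent* nodes fail -- much less likely than any two.
    failed = set(failed_nodes)
    return all(not ({i, (i + 1) % n} <= failed) for i in range(n))
```

With this layout, losing nodes 0 and 2 is survivable, but losing the adjacent nodes 1 and 2 is not.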
Data Placement (Data Replication, Mirroring)
☞ Keep the entire copy of one node at another node
☞ Simple strategy
☞ Hurts load balancing if one node fails
Spring, 2006 Arturas Mazeika Page 30
Join Processing (Parallel Associative Join)
☞ Assumptions: equi-join, partition according to the join attribute (S is partitioned according to the hash function h)
☞ Send Ri = R|h(R.A) to Si
☞ Join Ri with Si
☞ Return ∪i=1..k (Ri ⋈ Si)
Spring, 2006 Arturas Mazeika Page 35
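The three steps can be sketched in Python (toy relations; h and the pre-partitioning of S are assumed, matching the slide's precondition):

```python
from collections import defaultdict

k = 3                       # nodes holding S (assumed)
def h(key):                 # the hash function S is already partitioned by
    return key % k

# S_i on node i: every tuple (A, payload) in S_parts[i] has h(A) == i.
S_parts = [
    [(0, "s0"), (3, "s3")],
    [(4, "s4")],
    [(2, "s2"), (5, "s5")],
]
R = [(2, "ra"), (3, "rb"), (7, "rc")]

# Step 1: send R_i = {r in R : h(r.A) == i} to the node holding S_i.
R_parts = defaultdict(list)
for key, payload in R:
    R_parts[h(key)].append((key, payload))

# Steps 2-3: each node joins R_i with its local S_i; union the results.
result = []
for i, S_i in enumerate(S_parts):
    local = dict(S_i)
    for key, r_payload in R_parts[i]:
        if key in local:
            result.append((key, r_payload, local[key]))
```

Only R moves over the network; S stays put because it is already partitioned on the join attribute.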
Join Processing
☞ Three basic algorithms for intra-operator parallelism
• Parallel nested loop join: no special assumption
• Parallel associative join: one relation is de-clustered on join attribute and equi-join
• Parallel hash join: equi-join
☞ They also apply to other complex operators such as duplicate elimination, union, intersection, etc. with minor adaptations
☞ Let
• R1, R2, . . . , Rm be fragments of R
• S1, S2, . . . , Sk be fragments of S
• JP be the join predicate
Spring, 2006 Arturas Mazeika Page 33
Join Processing (Parallel Hash Join)
☞ Generalizes associative join, does not require any specific partitioning
☞ Partition both R and S into mutually exclusive sets: Ri = R|h(R.A), Si = S|h(S.A)
☞ Send Si and Ri to the same node
☞ Return ∪i=1..p (Ri ⋈ Si)
Spring, 2006 Arturas Mazeika Page 36
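A minimal sketch of the scheme (toy relations; p and h are assumed):

```python
from collections import defaultdict

p = 4                       # number of join nodes (assumed)
def h(key):
    return key % p

def partition(rel):
    # Split a relation into mutually exclusive buckets by h(join attribute).
    parts = defaultdict(list)
    for key, payload in rel:
        parts[h(key)].append((key, payload))
    return parts

R = [(1, "r1"), (5, "r5"), (6, "r6")]
S = [(5, "s5"), (6, "s6"), (9, "s9")]

# Both relations are repartitioned with the *same* h, so matching tuples
# always land on the same node -- no prior partitioning is required.
R_parts, S_parts = partition(R), partition(S)

result = []
for i in range(p):
    build = dict(S_parts[i])            # build phase: hash the local S_i
    for key, r_payload in R_parts[i]:   # probe phase: scan the local R_i
        if key in build:
            result.append((key, r_payload, build[key]))
```

Unlike the associative join, both R and S are shipped, but each tuple goes to exactly one node.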
Join Processing (Parallel Nested Loop)
☞ Send the whole R to all sites with Si
☞ Join R with Si at all sites
☞ Return ∪i=1..k (R ⋈ Si)
Spring, 2006 Arturas Mazeika Page 34
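The broadcast scheme in miniature (toy fragments of S; JP is shown as an equality predicate only for brevity, since nested loop makes no assumption about it):

```python
k = 2                       # nodes holding fragments of S (assumed)
S_parts = [
    [(1, "s1"), (4, "s4")],
    [(2, "s2"), (8, "s8")],
]
R = [(1, "r1"), (2, "r2"), (9, "r9")]

def JP(r_key, s_key):
    # The join predicate -- arbitrary here; nested loop assumes nothing.
    return r_key == s_key

# Send the whole R to every node; each node nested-loop joins R with its S_i.
result = []
for S_i in S_parts:
    for r_key, r_val in R:
        for s_key, s_val in S_i:
            if JP(r_key, s_key):
                result.append((r_key, r_val, s_val))
```

The price of generality is shipping all of R to all k nodes and doing |R| × |Si| comparisons at each.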
Parallel Query Optimization (Search Space)
☞ Two subsequent operations can be executed through a pipeline
☞ One operand of a pipelined operator usually has to be stored (the hash table, in the case of a join)
☞ Build and probe are the typical phases of such a pipeline (first the hash table is built, then it is probed for joining)
☞ The parallel hierarchical architecture is based on the following concepts
• Divide the query tree into activations, and assign each activation to a shared-memory (SM) machine. An activation is the smallest unit of sequential processing that cannot be further partitioned
• Make sure that there are more fragments than CPUs, so that each SM machine can still exploit parallelism
• Run more threads in one SM machine than there are CPUs; the CPUs then stay busy even though some threads are blocked on unavailable resources
• Maximize parallelism within one SM machine, minimize parallelism between different SM machines
• Allow load sharing between SM machines, so busy SMs give work to idle SMs; compute the benefit of transferring the work. As a rule of thumb, send builds to idle SMs rather than the data