CS 600.416 Transaction Processing Lecture 18 Parallelism
Motivation for Parallel Databases
• Extremely large data sets
– Special application needs: computer-aided design, World Wide Web
• Queries that have large data requirements
– Decision support systems, statistical analysis
• Inherent parallelism in data
– Set-oriented nature of relations
• Commoditization of parallel computers
– 2- and 4-way SMPs are commonplace
– Clustering software for multiple SMPs is freely available
– Weak point in the argument in light of mainframe OSes
Motivation for Parallel Databases
• Two major reasons for parallel databases, from the previous slide
• Large data sets, applications, and queries
– Because we need it
• Parallel computers and a feasible application domain
– Because we can
Motivation Reality Check
• We have always needed parallel DBs
– DBs have always stretched the capabilities of computer architectures
– Enterprises have always grown to a DB's capabilities
• Distribution cannot really solve the problem
– Replication and latency concerns
• As we learned from the paper last week
– Isolation problems
• Fault and performance isolation
• One big computer is more powerful than 2 equivalent small computers
– Parallel machines look like 1 big computer from the outside
Parallelism
Theoretically, executing a task T on an n-processor system should be n times faster than executing it on a single processor P1.
[Figure: task T divided into equal-sized subtasks T1, T2, …, Tn, one per processor in an n-processor system P1, P2, …, Pn]
Parallelism
• Hardware parallelism
– Parallelism "available" as a result of the existing resources
– E.g., multiprocessors, RAID arrays, etc.
• Software parallelism
– Parallelism that could be "discovered" in an application
– E.g., parallel algorithms, programming style, compiler optimization
Speedup, Efficiency, and Scaleup
• Definition:
– T(p,N) = time to solve a problem of size N on p processors
• Speedup:
– S(p,N) = T(1,N) / T(p,N)
– Compute the same problem with more processors in a shorter time
• Efficiency:
– E(p,N) = S(p,N) / p
• Scaleup:
– Sc(p,N) = N / n, where n satisfies T(1,n) = T(p,N)
– Compute a larger problem with more processors in the same time
• Problems:
– Is S(p,N) close to p, or far less? S(p,N) < p is sub-linear speedup
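A hypothetical worked example of these definitions (all timing values are invented for illustration):

```python
# Hypothetical timing table T(p, N): seconds to solve a problem of size N
# on p processors. All values are invented for illustration.
T = {
    (1, 1000): 100.0,
    (4, 1000): 30.0,    # same problem, 4 processors
    (4, 4000): 100.0,   # 4x the problem, 4 processors, same elapsed time
}

def speedup(p, N):
    """S(p, N) = T(1, N) / T(p, N)."""
    return T[(1, N)] / T[(p, N)]

def efficiency(p, N):
    """E(p, N) = S(p, N) / p."""
    return speedup(p, N) / p

# Scaleup: Sc = N / n where n satisfies T(1, n) = T(p, N).
# Here T(1, 1000) == T(4, 4000) == 100 s, so Sc = 4000 / 1000 = 4 (linear).
scaleup = 4000 / 1000

print(speedup(4, 1000))     # 100/30 ≈ 3.33: sub-linear speedup
print(efficiency(4, 1000))  # ≈ 0.83
print(scaleup)              # 4.0: linear scaleup
```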
Scale up
• Two kinds:
– Batch scaleup
• The size of the task increases
• E.g., the database grows, and a sequential scan is proportionately longer
– Transaction scaleup
• The rate of task submission increases
• Each task may still be short-lived
Scaleup, Speedup
[Figure: two plots. Left: speedup vs. resources, contrasting linear speedup with sublinear speedup. Right: scaleup (Ts/Tl) vs. problem size, contrasting linear scaleup with sublinear scaleup]
Factors against parallelism
• Startup
– With thousands of processes, startup costs become significant
• Interference
– Synchronization and communication costs
– Even 1% contention limits speedup to about 37x
• Skew
– Efficient load balancing is difficult
– At fine granularity, the variance can exceed the mean time to finish one parallel step
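The 37x figure can be reproduced with a simple contention model: assume each added process slows every process by 1%, so S(p) = p / 1.01^p. The model itself is an assumption here (the slides do not name one), but it peaks near p = 100 at roughly 37x:

```python
def speedup_with_contention(p, c=0.01):
    """Speedup on p processors when each added process slows every
    process by a fraction c (here 1%): S(p) = p / (1 + c)**p.
    This particular contention model is an illustrative assumption."""
    return p / (1 + c) ** p

# S(p) grows, peaks, then collapses: past the peak, adding processors hurts.
best_p = max(range(1, 1000), key=speedup_with_contention)
print(best_p)                                      # peak is near p = 100
print(round(speedup_with_contention(best_p), 1))   # ≈ 37x maximum speedup
```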
When is parallelism available?
• Good if:
– Operations access a significant amount of data, e.g., joins of large tables, bulk inserts, aggregation, copying, queries, etc.
– Symmetric multiprocessors
– Sufficient I/O bandwidth; under-utilized or intermittently used CPUs
• Bad if:
– Query executions/transactions are short-lived
– CPU, memory, and I/O resources are heavily utilized
• Software parallelism should utilize hardware parallelism
Parallel Architectures
• Stonebraker’s simple taxonomy for parallel architectures:
– Shared memory: processors share common memory
– Shared disk/clusters: processors share a common set of disks
– Shared nothing: processors share only the network
– Hierarchical: hybrid of the architectures above
Shared Memory
[Figure: processors P sharing a common memory M]
Processors share common memory
Common in SMP systems
Shared Nothing
• Pros
– Cost
• Uses inexpensive computers to build such a system
– Extensibility
• Promotes incremental growth
– Availability
• Redundancy can be introduced by replication of data
• Cons
– Complexity
• Distributed database concepts in a parallel setup
– Difficult to achieve load balancing
• Relies on software parallelism
Shared Disk
[Figure: processors P, each with its own memory M, all attached to a common set of disks]
Processors share a common set of disks
Common in clusters
Network-attached I/O protocols make this architecture more readily available
Shared Disk
• Features
– Shared disk access but exclusive memory access
– Global locking protocols are needed
• Pros
– Cost
• Lower, as standard I/O interconnects can be used
– Extensibility
• Interference is minimized by exclusive memory caches
– Availability
• A degree of fault tolerance in both the processor subsystem and the disks
• Cons
– Highly complex
– The shared disk is a potential bottleneck
Shared Nothing
[Figure: nodes, each with its own processor P, memory M, and disk, connected only by a network]
Network sharing only
Parallelism available without any special hardware support
Shared Memory
• Pros
– Fast processor-to-processor communication
• No software primitives required
– Simplicity
• Meta and control information shared by all
• Cons
– Cost
• Expensive interconnect
– Limited extensibility
• The shared memory soon becomes a bottleneck
• Limited to 10-20 processors
– Cache coherency
– Low availability
• Availability depends on the robustness of the memory
So far…
• Parallelism and its measures
• Problems with parallelism
• Parallel architectures
I/O and Databases
• What’s important about I/O
– Reminder: the performance measure for all DBs is the number of I/Os
– For the most part, it is the only thing that matters
• Why is I/O inherently parallel?
– Even a machine with 1 processor has multiple disks
– Placement of data on these disks greatly affects performance
• What does this tell us about parallel DBs?
– Parallelism is not necessarily about supercomputers; it occurs at many levels in computer systems
– Every system has some degree of parallelism, even in scheduling the different processing units within a CPU
I/O or Disk Parallelism
• Partition data onto multiple disks
– Most frequently horizontal partitioning
– Conduct I/O to all disks at the same time
• Techniques
– Round-robin: send the ith tuple to disk i mod n in an n-disk system
– Hash partitioning: send tuple t to disk f(t), where f is a uniformly distributed hash function
– Range partitioning: break tuples into contiguous ranges of keys; requires a key that can be linearly ordered
– Multi-dimensional partitioning strategies: used for spatial data, images, and other multi-dimensional sets; much recent work here
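The first three strategies can be sketched in a few lines of Python. The disk count, the range vector, and the choice of SHA-1 as the uniform hash are all illustrative assumptions:

```python
import hashlib

N_DISKS = 4  # illustrative disk count

def round_robin(i, tup):
    """Round-robin: send the i-th tuple to disk i mod n."""
    return i % N_DISKS

def hash_partition(tup):
    """Hash: send a tuple to disk f(key) for a uniformly distributed f."""
    digest = hashlib.sha1(str(tup[0]).encode()).hexdigest()
    return int(digest, 16) % N_DISKS

# Range: n-1 split points divide the key space into n contiguous ranges.
RANGE_VECTOR = [100, 200, 300]  # illustrative split points

def range_partition(tup):
    """Send a tuple to the first range whose upper bound exceeds its key."""
    for disk, bound in enumerate(RANGE_VECTOR):
        if tup[0] < bound:
            return disk
    return N_DISKS - 1  # key >= last split point

tuples = [(17, "a"), (150, "b"), (250, "c"), (999, "d")]
print([round_robin(i, t) for i, t in enumerate(tuples)])  # [0, 1, 2, 3]
print([range_partition(t) for t in tuples])               # [0, 1, 2, 3]
```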
Workloads
• Several important/expected workloads
– Scanning the entire relation
– Locating a tuple (identity query)
– Locating a set of tuples based on attribute value
• Range query, e.g., 100 < a < 200
• Find all people whose names start with A
– Note this is not an identity query
Range partitioning
• Partitioning requires a partitioning attribute A, usually the primary key
• A vector of n-1 split points partitions A into n ranges
– Vector {v0, v1, …, vn-2}
• Each tuple t goes into:
– Partition 0 if t[A] < v0
– Partition n-1 if t[A] >= vn-2
– Partition k if vk-1 <= t[A] < vk, for 1 <= k <= n-2
• Simple range partitioning: #disks = #partitions
• Combined with round-robin: #disks * k = #partitions
– Helps avoid variance concentrating in any one partition
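The assignment rule above, and one possible reading of the round-robin combination (each range spread over k disks), might be sketched as follows; the range vector and k are hypothetical values:

```python
V = [10, 20, 30]  # hypothetical range vector {v0, v1, v2}: n = 4 partitions
K = 2             # sub-disks per range when combining with round-robin

def range_of(a):
    """Partition 0 if a < v0; partition k if v_{k-1} <= a < v_k;
    partition n-1 if a >= v_{n-2}."""
    for k, bound in enumerate(V):
        if a < bound:
            return k
    return len(V)  # last partition

counters = {}  # per-range round-robin counters

def disk_of(a):
    """Combined scheme: n*k disks, dealing tuples round-robin within
    each range so no single disk absorbs a whole range's load."""
    r = range_of(a)
    c = counters.get(r, 0)
    counters[r] = c + 1
    return r * K + (c % K)

print([range_of(a) for a in (5, 10, 25, 99)])  # [0, 1, 2, 3]
print([disk_of(a) for a in (5, 5, 5)])         # alternates: [0, 1, 0]
```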
Some Practicalities
• Disk blocks are what we partition
– Block size is generally a tradeoff between I/O performance and space utilization
• Bigger blocks are better for performance: more data moved per I/O
• Bigger blocks fragment data, leading to poor space utilization
– Blocks are generally set to the page size
• Bigger than we would like
• Often lots of space is fragmented (> 50% in file systems)
• What is the problem with larger blocks?
– Small relations don’t get placed on as many disks, so less parallelism
• What is the problem with smaller blocks?
– Pages are what OSes read
– Performance suffers
• Some applications with known large data use larger block sizes
– Particularly scientific applications
Workloads Round Robin
• Ups
– Good for scans: sequential, parallel, and entirely load-balanced
– What about unfairness in the tail (if you always start on the same block)?
• Randomize the start block
• Use a next-block policy
• Downs
– Identity queries search n blocks (n/2 on average if the item exists and is a key; n blocks if it is not a key, or to establish that it is not found)
– Range queries search n blocks; there is no relationship between key value and placement
Workloads Hash Partition
• Ups
– Good for identity queries
• Isolates the query to a single disk
– Good for sequential scans
• Low variance in hashing: Ω(log t) for relations with cardinality t
• Essentially d (the number of disks) times speedup over a single-disk system (actually d / (1 + Ω(log t)))
• Downs
– Bad for range queries: search n blocks
– Bad for identity queries on non-partitioning attributes
• E.g., partition/hash on SS# and look up by last name
Workloads Range Partition
• Ups
– Good for identity queries
• Isolates the query to a single “data” disk/block
• Must generally read another block for the range information, which hash partitioning does not require
– Indices can be large
• Ambiguous
– Range queries
• Good performance when queries access few items
– Isolates queries to one or a few disks
– Allows other queries to run in parallel on the other disks
• Bad when accessing lots of data items
– Can localize traffic to a few disks, creating a hot spot
• Really, the good outweighs the bad here
Handling Skew
• Attribute value skew: many tuples clustered around the same (or nearly the same) value
– Occurs in range partitioning
– Imagine a relation with 2 values of an attribute and k disks
• Only two of the disks will be used
• Partition skew: load imbalance even when there is no attribute skew
– O(log t) for t tuples in hash partitioning: not a problem
– Arises from a poorly constructed range vector
Constructing a Range Vector
• A balanced range-partitioning vector can be constructed by:
– Sorting existing tuples: incurs I/O costs, and does not keep the partitioning balanced as new inserts arrive
– Using a B-tree: limits the occupancy of tuples in disk blocks, which ultimately limits I/O performance
– Statistics (histograms): keep counts of values in buckets, but this has problems with AV skew within buckets and with estimation
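The statistics approach can be sketched by taking quantiles of a sample of attribute values; this is a simplified stand-in for a real histogram, which would also have to cope with AV skew inside buckets:

```python
def balanced_range_vector(sample, n_partitions):
    """Pick n_partitions - 1 split points at sample quantiles, so each
    partition receives roughly the same number of sampled values."""
    s = sorted(sample)
    return [s[(i * len(s)) // n_partitions] for i in range(1, n_partitions)]

# Illustrative sample of attribute values.
sample = list(range(1000))
vec = balanced_range_vector(sample, 4)
print(vec)  # [250, 500, 750]: one quarter of the sample per partition
```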
Virtual Processor Technique
• Create many virtual processors and map ranges to virtual processors
• Assign the virtual processors to real processors
– This mitigates skew, because each real processor handles many virtual processors, whose combined load is likely to be close to the mean
• Allows a system to use a “poor” range partition without suffering from skew
– Generally, DBs use histograms together with VPs
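A small simulation of the idea (the load distribution and the virtual/real counts are invented for illustration): highly skewed loads on many virtual partitions even out when dealt round-robin onto a few real processors:

```python
import random

N_VIRTUAL = 64   # many virtual processors (partitions)
N_REAL = 4       # few real processors

random.seed(0)
# Skewed per-virtual-partition loads (exponential: high variance).
vload = [random.expovariate(1.0) for _ in range(N_VIRTUAL)]

# Assign virtual processors to real processors round-robin.
rload = [0.0] * N_REAL
for v, load in enumerate(vload):
    rload[v % N_REAL] += load

# Each real processor's load is a sum over 16 virtual partitions, so it
# lands near the mean even though individual partition loads vary widely.
spread = max(rload) / min(rload)
print(round(spread, 2))
```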
Lessons Learned
• Parallelism is important, even for single machines
• Disk-based parallelism is the most important kind of parallelism
– I/O is the bottleneck in databases
– Not entirely true anymore: networking is starting to be the bottleneck in distributed TP applications
• Know thy data