Network Computing Laboratory
Dynamic Load Distribution in the Borealis Stream Processor
Ying Xing, Stan Zdonik, Jeong-Heon Hwang
Brown Univ.
ICDE 2005
One line comment
Proposes an algorithm that balances load dynamically by distributing operators under highly fluctuating data rates, in the context of a clustered Borealis (CQE) system
Problem
Cluster of Borealis nodes
•In a push-based (CQE) system, load fluctuates with the input data rate
•A temporary load spike can significantly increase data processing latency
We should avoid temporary overload as much as possible!
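Challenges
[Figure: two operator mapping plans on a cluster of Borealis nodes. Operators A and B process stream S1 (rate r); operators C and D process stream S2 (rate 2r). Connected Plan: node loads 2cr and 4cr. A Better Plan: node loads 3cr and 3cr.]
What operator mapping plan can balance the load best?
How should we rearrange the plan dynamically as the load changes?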
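The node loads of the two plans can be checked with a short sketch. It assumes, as the figure labels suggest, that every operator costs c CPU time per tuple, that A and B see rate r, and that C and D see rate 2r. The exact operator assignment of the better plan is not preserved in the transcript, so the split used here (one operator from each chain per node) is an assumption, and the values of c and r are hypothetical.

```python
# Hypothetical values for the per-tuple cost c and the base rate r.
c, r = 1.0, 10.0

# Tuple rate seen by each operator: A, B on S1 (rate r); C, D on S2 (rate 2r).
rate = {"A": r, "B": r, "C": 2 * r, "D": 2 * r}

def node_loads(plan):
    """Return the load of each node: sum of cost * rate over its operators."""
    return [sum(c * rate[op] for op in node) for node in plan]

connected_plan = [["A", "B"], ["C", "D"]]  # each chain stays on one node
better_plan = [["A", "C"], ["B", "D"]]     # assumed: each chain is split across nodes

print(node_loads(connected_plan))  # [20.0, 40.0], i.e. 2cr and 4cr
print(node_loads(better_plan))     # [30.0, 30.0], i.e. 3cr and 3cr
```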
Solution approach
Busy all together, idle all together
Find the operators that are busy at the same time
Calculate the correlation of the operators
Distribute busy operators
Move operators from a heavily loaded machine to an underloaded machine
Perform the above operations periodically
Propose a two-phase operator distribution algorithm
Load time series of operators or nodes
Load of an operator
# of tuples arrived * CPU time required for a tuple
Load of a machine
Sum of the loads of its operators
Only keep the most recent K statistics
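A minimal sketch of these definitions, assuming load is sampled once per statistics period. The class name `OperatorLoad`, the helper `machine_series`, and the window size K = 4 are hypothetical choices for illustration, not names from the slides.

```python
from collections import deque

K = 4  # hypothetical window size: number of recent statistics kept

class OperatorLoad:
    """Keeps the K most recent load samples of one operator."""

    def __init__(self, cpu_time_per_tuple, k=K):
        self.cpu_time_per_tuple = cpu_time_per_tuple
        self.series = deque(maxlen=k)  # older samples are dropped automatically

    def record(self, tuples_arrived):
        # Load in one period = # of tuples arrived * CPU time per tuple.
        self.series.append(tuples_arrived * self.cpu_time_per_tuple)

def machine_series(operators):
    """Load time series of a machine: element-wise sum over its operators."""
    return [sum(samples) for samples in zip(*(op.series for op in operators))]

a = OperatorLoad(cpu_time_per_tuple=2)
b = OperatorLoad(cpu_time_per_tuple=1)
for n_a, n_b in [(1, 5), (2, 4), (3, 3), (4, 2), (5, 1)]:
    a.record(n_a)
    b.record(n_b)

print(list(a.series))          # [4, 6, 8, 10] (the oldest sample was dropped)
print(machine_series([a, b]))  # [8, 9, 10, 11]
```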
Average load of a machine X1
Average of load time series S1 = (s1, s2, …, sk)
Correlation of operators X1, X2
Correlation of load time series S_X1, S_X2
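Both quantities can be computed directly from the stored series. The sketch below assumes the correlation meant here is the Pearson correlation coefficient, which the slide does not state explicitly. Treating a constant series as uncorrelated is also an assumption, made to avoid a division by zero.

```python
from math import sqrt

def mean(s):
    """Average load of a time series."""
    return sum(s) / len(s)

def correlation(s1, s2):
    """Pearson correlation coefficient of two equal-length load time series."""
    m1, m2 = mean(s1), mean(s2)
    cov = sum((x - m1) * (y - m2) for x, y in zip(s1, s2))
    var1 = sum((x - m1) ** 2 for x in s1)
    var2 = sum((y - m2) ** 2 for y in s2)
    if var1 == 0 or var2 == 0:
        return 0.0  # assumption: a constant series is treated as uncorrelated
    return cov / sqrt(var1 * var2)

s_x1 = [1, 2, 3, 4]
s_x2 = [2, 4, 6, 8]  # busy and idle together with s_x1
s_x3 = [4, 3, 2, 1]  # busy when s_x1 is idle

print(mean(s_x1))               # 2.5
print(correlation(s_x1, s_x2))  # 1.0
print(correlation(s_x1, s_x3))  # -1.0
```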
Ideal state of the cluster
The average loads of all machines are equal
Minimize the average of the machines' load variances
Make the lower bound of the average variance as small as possible
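A small worked example may help show why equal average load alone is not enough. The four load series below are hypothetical. Both plans give every machine the same average load, but only the plan that co-locates anti-correlated operators brings the average variance down.

```python
from statistics import mean, pvariance

def machine_series(op_series):
    """Element-wise sum of the load series of the operators on one machine."""
    return [sum(x) for x in zip(*op_series)]

def average_variance(plan):
    """Average, over all machines, of the variance of each machine's load series."""
    return mean([pvariance(machine_series(node)) for node in plan])

# Hypothetical load series of four operators.
o1 = [1, 3, 1, 3]
o2 = [1, 3, 1, 3]  # perfectly correlated with o1
o3 = [3, 1, 3, 1]  # anti-correlated with o1
o4 = [3, 1, 3, 1]

plan_a = [[o1, o2], [o3, o4]]  # correlated operators share a machine
plan_b = [[o1, o3], [o2, o4]]  # anti-correlated operators share a machine

for plan in (plan_a, plan_b):
    averages = [mean(machine_series(node)) for node in plan]
    print(averages, average_variance(plan))
```

Under these assumed series, every machine averages 4 in both plans, while the average variance is 4 for `plan_a` and 0 for `plan_b`.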
Pair-wise Load Distribution Algorithm
One-way
[Figure: operators 1 to 9 on machines M1 and M2]
Select the operators with the greatest score until the load of the selected operators exceeds (L1-L2)/2
Score function: Co(O1, M1) - Co(O1, M2)
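The one-way step can be sketched as follows, under several assumptions that the slide does not spell out: Co(O, M) is taken to be the Pearson correlation between the operator's load series and the summed series of the other operators on M, L1 and L2 are the average loads of M1 and M2, scores are computed once rather than after every move, and selection stops as soon as the moved load reaches (L1-L2)/2. All function names are hypothetical.

```python
from math import sqrt

def correlation(s1, s2):
    """Pearson correlation; 0.0 if either series is constant (assumption)."""
    n = len(s1)
    m1, m2 = sum(s1) / n, sum(s2) / n
    cov = sum((x - m1) * (y - m2) for x, y in zip(s1, s2))
    v1 = sum((x - m1) ** 2 for x in s1)
    v2 = sum((y - m2) ** 2 for y in s2)
    return cov / sqrt(v1 * v2) if v1 and v2 else 0.0

def total(series_list, k):
    """Element-wise sum of load series (all zeros if the list is empty)."""
    return [sum(s[i] for s in series_list) for i in range(k)]

def avg(series):
    return sum(series) / len(series)

def one_way(m1, m2, k):
    """Pick operators to move from the more loaded m1 to m2.

    m1 and m2 map operator names to load series of length k.
    """
    m2_total = total(list(m2.values()), k)
    target = (avg(total(list(m1.values()), k)) - avg(m2_total)) / 2

    def score(op):
        # Co(O, M1) - Co(O, M2): high when O is busy together with M1
        # and idle when M2 is busy.
        others = [s for name, s in m1.items() if name != op]
        return correlation(m1[op], total(others, k)) - correlation(m1[op], m2_total)

    moved, moved_load = [], 0.0
    for op in sorted(m1, key=score, reverse=True):
        if moved_load >= target:
            break
        moved.append(op)
        moved_load += avg(m1[op])
    return moved

m1 = {"a": [4, 0, 4, 0], "b": [4, 0, 4, 0], "c": [0, 2, 0, 2]}
m2 = {"d": [0, 2, 0, 2]}
print(one_way(m1, m2, k=4))  # ['a']
```

In this hypothetical input, moving `a` leaves both machines with the series [4, 2, 4, 2], so the average loads become equal and each machine's variance drops.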
Pair-wise Load Distribution Algorithm
Two-way
[Figure: operators 1 to 9 redistributed between machines M1 and M2]
•Redistribute all movable operators
•The node with the lower load is selected
•Operators are assigned one by one
•The operator with the highest score is selected
Global Operator Distribution
[Figure: operators assigned to machines M1 and M2]
•Redistribute all movable operators after a warm-up period
•The node with the lowest load is selected
•Operators are assigned one by one
•The operator with the highest score is selected
Score function: [formula not preserved in the transcript]
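The greedy loop described by these bullets can be sketched as below. The slide's score function is not preserved in the transcript, so this sketch swaps in a stand-in: the negated Pearson correlation between the operator's series and the selected node's current load series. That stand-in, the function names, and the example series are all assumptions for illustration only.

```python
from math import sqrt

def correlation(s1, s2):
    """Pearson correlation; 0.0 if either series is constant (assumption)."""
    n = len(s1)
    m1, m2 = sum(s1) / n, sum(s2) / n
    cov = sum((x - m1) * (y - m2) for x, y in zip(s1, s2))
    v1 = sum((x - m1) ** 2 for x in s1)
    v2 = sum((y - m2) ** 2 for y in s2)
    return cov / sqrt(v1 * v2) if v1 and v2 else 0.0

def global_distribution(operators, n_nodes):
    """Assign every operator to a node; returns one list of names per node.

    operators maps operator names to equal-length load series.
    """
    k = len(next(iter(operators.values())))
    plan = [[] for _ in range(n_nodes)]
    node_series = [[0.0] * k for _ in range(n_nodes)]
    unassigned = dict(operators)
    while unassigned:
        # The node with the lowest load is selected.
        i = min(range(n_nodes), key=lambda j: sum(node_series[j]))
        # Stand-in score (assumption): prefer the operator least correlated
        # with the load already placed on node i.
        best = max(unassigned,
                   key=lambda op: -correlation(unassigned[op], node_series[i]))
        series = unassigned.pop(best)
        plan[i].append(best)
        node_series[i] = [x + y for x, y in zip(node_series[i], series)]
    return plan

ops = {
    "o1": [1, 3, 1, 3],
    "o2": [1, 3, 1, 3],
    "o3": [3, 1, 3, 1],
    "o4": [3, 1, 3, 1],
}
print(global_distribution(ops, n_nodes=2))  # [['o1', 'o3'], ['o2', 'o4']]
```

With this input each node ends up with one operator from each anti-correlated pair, so both node load series are flat at 4.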
Experimental results
1. Computation overhead of the algorithms
2. Effectiveness of the global algorithm
Strong points
Balances load according to changes in the input data rate (data pushed into the system)
A simple algorithm using correlation
Weak points
Unrealistic workload (operator chains, input streams)
Hard to define the parameters of the statistics measurement
Load collection period, score threshold, # of time series, …
They must be changed depending on the workload
If an input fluctuation has no historical pattern, the effect will be limited
Doesn't consider dynamic changes of the operator network (query addition, deletion)
Parameters for Experiments (supplementary)
Independent linear operator chain(10