Providing Resiliency to Load Variations in Distributed Stream Processing Ying Xing, Jeong-Hyon Hwang, Ugur Cetintemel, Stan Zdonik Brown University.

Jan 12, 2016

Page 1: Providing Resiliency to Load Variations in Distributed Stream Processing

Ying Xing, Jeong-Hyon Hwang, Ugur Cetintemel, Stan Zdonik

Brown University

Page 2: Stream Processing

Jeong-Hyon Hwang ([email protected])

Monitoring Apps

Financial Data Streams

Surveillance

Network Monitoring

Click Stream Analysis

Traffic Monitoring

Sensor Network

Page 3: Distributed Stream Processing

Page 4: Roadmap

• Problem Statement
• Linear Load Model
• Feasible Set
• The Algorithm
• Extensions
  - Lower Bound of Input Rates
  - Non-linear Load Model
  - Network Bandwidth / Communication Overhead
• Experimental Results
• Related Work
• Conclusions

Page 5: Problem Statement

• Goal

• Find an operator distribution with the largest feasible set size

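$$\arg\max_{A:\{o_k\}\to\{N_i\}} \int\!\cdots\!\int_{F(A)} 1 \; dr_1 \cdots dr_d, \quad \text{where } F(A) = \{(r_1, r_2, \ldots, r_d) : l^{A}(N_i) \le C_i\}.$$

[Figure: Operator Distribution and Input Rate Space (axes r1, r2). The rate space is divided into a feasible region, the Feasible Set, and an infeasible region.]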

Page 6: Linear Load Model

• $r_j$ - input rate of input j (tuples/sec)
• $c_k$ - processing cost of operator $o_k$ (CPU cycles/tuple)
• $l(o_k)$ - the processing load of operator $o_k$ (CPU cycles/sec)
• $s_k$ - selectivity of operator $o_k$ ([# of output tuples] / [# of input tuples])

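[Figure: o1 feeds o2 on input r1, with rate $s_1 r_1$ between them; o3 feeds o4 on input r2, with rate $s_3 r_2$ between them. Node N1 runs o1 and o3; node N2 runs o2 and o4.]

Operator loads:

$$l(o_1) = c_1 r_1, \qquad l(o_2) = c_2 s_1 r_1, \qquad l(o_3) = c_3 r_2, \qquad l(o_4) = c_4 s_3 r_2$$

Node loads:

$$l(N_1) = l(o_1) + l(o_3) = c_1 r_1 + c_3 r_2$$

$$l(N_2) = l(o_2) + l(o_4) = c_2 s_1 r_1 + c_4 s_3 r_2$$

In general, the load of node $N_i$ is linear in the input rates and bounded by the node capacity:

$$l(N_i) = l_{i1} r_1 + l_{i2} r_2 + \cdots + l_{id} r_d \le C_i$$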
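The linear load model above can be sketched in a few lines of Python. This is only an illustration: the cost and selectivity values, the dictionary layout, and the helper names `node_load_coefficients` and `is_feasible` are assumptions made for the example, not values or code from the slides.

```python
import numpy as np

def node_load_coefficients(op_coeffs, assignment, num_nodes):
    """Sum per-operator load coefficients into the node matrix [l_ij]."""
    d = len(next(iter(op_coeffs.values())))
    L = np.zeros((num_nodes, d))
    for op, node in assignment.items():
        L[node] += op_coeffs[op]
    return L

def is_feasible(L, C, r):
    """True if no node exceeds its capacity at input rate point r."""
    return bool(np.all(L @ r <= C))

# Hypothetical costs (CPU cycles/tuple) and selectivities.
c = {"o1": 4.0, "o2": 6.0, "o3": 9.0, "o4": 5.0}
s = {"o1": 0.5, "o3": 0.4}

# Each operator's load as a linear function of (r1, r2).
op_coeffs = {
    "o1": np.array([c["o1"], 0.0]),             # l(o1) = c1 r1
    "o2": np.array([c["o2"] * s["o1"], 0.0]),   # l(o2) = c2 s1 r1
    "o3": np.array([0.0, c["o3"]]),             # l(o3) = c3 r2
    "o4": np.array([0.0, c["o4"] * s["o3"]]),   # l(o4) = c4 s3 r2
}

# N1 (index 0) runs o1 and o3; N2 (index 1) runs o2 and o4.
assignment = {"o1": 0, "o3": 0, "o2": 1, "o4": 1}
L = node_load_coefficients(op_coeffs, assignment, num_nodes=2)

C = np.array([100.0, 100.0])  # assumed node capacities (CPU cycles/sec)
print(L)
print(is_feasible(L, C, np.array([5.0, 5.0])))
```

Row i of `L` holds the coefficients $l_{i1}, \ldots, l_{id}$, so the feasibility check is the matrix inequality `L @ r <= C`.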

Page 7: Example Feasible Sets

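[Figure: three example distributions of operators o1, o2, o3, o4 over two nodes, each with its feasible set plotted in the (r1, r2) plane.]

Example node constraints:

$$12 r_1 + 9 r_2 \le C_1$$

$$8 r_1 + 11 r_2 \le C_2$$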
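As a rough illustration of "feasible set size", the area of the region defined by the two constraints above can be estimated by Monte Carlo sampling. The capacities `C1 = C2 = 1`, the sampling box, the sample count, and the function name `feasible_fraction` are assumptions chosen for this sketch.

```python
import numpy as np

def feasible_fraction(L, C, upper, n_samples=200_000, seed=0):
    """Fraction of uniform samples in the box [0, upper] that overload no node."""
    rng = np.random.default_rng(seed)
    r = rng.uniform(0.0, upper, size=(n_samples, L.shape[1]))
    feasible = np.all(r @ L.T <= C, axis=1)
    return feasible.mean()

L = np.array([[12.0, 9.0],    # 12 r1 +  9 r2 <= C1
              [8.0, 11.0]])   #  8 r1 + 11 r2 <= C2
C = np.array([1.0, 1.0])      # assumed capacities

# Summing both constraints gives 20 r1 + 20 r2 <= 2, so every feasible
# point lies inside the box [0, 0.1] x [0, 0.1].
upper = np.array([0.1, 0.1])

area = feasible_fraction(L, C, upper) * np.prod(upper)
print(f"estimated feasible set area: {area:.5f}")
```

For these assumed capacities the region is the quadrilateral with vertices (0, 0), (1/12, 0), (1/30, 1/15), (0, 1/11), whose exact area is 17/3960, so the estimate can be checked by hand.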

Page 8: “Ideal” Feasible Set

•Theorem 1. Feasible Set is maximized when load coefficients of each input are perfectly balanced over all nodes (relative to their capacities)

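$$l(N_1) = l_{11} r_1 + l_{12} r_2 + \cdots + l_{1d} r_d \le C_1$$

$$l(N_2) = l_{21} r_1 + l_{22} r_2 + \cdots + l_{2d} r_d \le C_2$$

$$\vdots$$

$$l(N_n) = l_{n1} r_1 + l_{n2} r_2 + \cdots + l_{nd} r_d \le C_n$$

Perfect balance condition:

$$\frac{l_{ij}}{\sum_k l_{kj}} = \frac{C_i}{\sum_k C_k}$$

Example: the distribution of o1, o2, o3, o4 with constraints

$$12 r_1 + 9 r_2 \le C_1$$

$$8 r_1 + 11 r_2 \le C_2$$

has total load coefficient 20 for each input, so the ideal constraints are

$$\frac{20 C_1}{C_1 + C_2} r_1 + \frac{20 C_1}{C_1 + C_2} r_2 \le C_1$$

$$\frac{20 C_2}{C_1 + C_2} r_1 + \frac{20 C_2}{C_1 + C_2} r_2 \le C_2$$

Both ideal boundaries coincide and cross each axis at $\frac{C_1 + C_2}{20}$.

[Figure: the actual feasible set and the ideal feasible set in the (r1, r2) plane.]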
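The balance condition of Theorem 1 can be turned into a short computation. The sketch below redistributes each input's total load coefficient across nodes in proportion to capacity and computes the volume of the resulting ideal feasible set, which is a simplex. The function names and the capacities `C1 = C2 = 1` are assumptions for illustration, and the volume formula assumes every input has a positive total coefficient.

```python
import numpy as np
from math import factorial

def ideal_coefficients(L, C):
    """Ideal l_ij = (C_i / sum_k C_k) * sum_k l_kj."""
    column_totals = L.sum(axis=0)   # sum_k l_kj for each input j
    share = C / C.sum()             # C_i / sum_k C_k for each node i
    return np.outer(share, column_totals)

def ideal_feasible_volume(L, C):
    """Volume of {r >= 0 : sum_j (sum_k l_kj) r_j <= sum_k C_k}."""
    column_totals = L.sum(axis=0)
    d = L.shape[1]
    return C.sum() ** d / (factorial(d) * np.prod(column_totals))

L = np.array([[12.0, 9.0], [8.0, 11.0]])
C = np.array([1.0, 1.0])  # assumed capacities

L_ideal = ideal_coefficients(L, C)
volume = ideal_feasible_volume(L, C)
print(L_ideal)
print(volume)
```

With equal capacities each node receives half of the total coefficient 20 per input, and the simplex has intercepts (C1 + C2)/20 on both axes.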

Page 9: Resilient Operator Distribution Algorithm

1. Compute the Ideal Feasible Set

2. Sort Operators based on Load Coefficients

3. For each operator, determine the destination server

[Figure: the Ideal Feasible Set in the (r1, r2) plane.]
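The slide does not spell out how the destination server is chosen in step 3, so the following is only a greedy sketch of steps 2 and 3 under a stated assumption: operators are sorted by the norm of their load coefficient vector, and each operator goes to the node that keeps the closest feasibility boundary as far from the origin as possible. This placement rule, the function name `rod_sketch`, and the sample coefficients are hypothetical and are not the authors' algorithm.

```python
import numpy as np

def rod_sketch(op_coeffs, C):
    """Greedy placement: heaviest operators first, assumed max-min distance rule."""
    # Step 2: sort operators by the norm of their load coefficient vector.
    names = sorted(op_coeffs, key=lambda k: -np.linalg.norm(op_coeffs[k]))
    n = len(C)
    d = len(next(iter(op_coeffs.values())))
    L = np.zeros((n, d))
    assignment = {}
    # Step 3 (assumed rule): try each node, keep the one whose resulting
    # placement maximizes the smallest origin-to-boundary distance.
    for name in names:
        best_node, best_score = None, -np.inf
        for i in range(n):
            trial = L.copy()
            trial[i] += op_coeffs[name]
            norms = np.linalg.norm(trial, axis=1)
            safe = np.where(norms > 0, norms, 1.0)
            # Nodes with no load impose no boundary (infinite distance).
            dist = np.where(norms > 0, C / safe, np.inf)
            score = dist.min()
            if score > best_score:
                best_node, best_score = i, score
        L[best_node] += op_coeffs[name]
        assignment[name] = best_node
    return assignment, L

# Hypothetical per-operator load coefficients over inputs (r1, r2).
op_coeffs = {
    "o1": np.array([4.0, 0.0]),
    "o2": np.array([3.0, 0.0]),
    "o3": np.array([0.0, 9.0]),
    "o4": np.array([0.0, 2.0]),
}
C = np.array([1.0, 1.0])  # assumed capacities

assignment, L = rod_sketch(op_coeffs, C)
print(assignment)
print(L)
```

Ties are broken in favor of the lower node index, and step 1 (the ideal feasible set) is not used by this simplified rule.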

Page 10: Result: R.O.D. vs Load Balancing

10 nodes

5 input streams

Page 11: Result: Latency of a Network Monitoring Query

Page 12: Extension: Network Bandwidth & Comm. Overhead

•Network Bandwidth

•Comm. Overhead

Page 13: Extension: Nonlinear Load Model

•Add an artificial variable

[Figure: an operator chain o1 … ou, ou+1 … om on input r1, shown before and after an artificial input variable r2 is introduced at ou+1.]

Page 14: Extension: Lower Bound of Input Rates

•Use the lower bound instead of the origin

[Figure: two plots of the (r1, r2) rate space.]
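One way to read "use the lower bound instead of the origin" is as a change of reference point: if the input rates never fall below a bound B, the load at B is always present, and only the remaining capacity matters. The sketch below computes that remaining capacity. This reading, the name `shifted_capacities`, and the numeric values are assumptions.

```python
import numpy as np

def shifted_capacities(L, C, B):
    """Capacity left on each node once the lower-bound load L @ B is subtracted."""
    return C - L @ B

L = np.array([[12.0, 9.0], [8.0, 11.0]])
C = np.array([1.0, 1.0])     # assumed capacities
B = np.array([0.01, 0.02])   # assumed lower bound on (r1, r2)

C_shifted = shifted_capacities(L, C, B)
print(C_shifted)
```

With r' = r - B, the constraints become `L @ r' <= C_shifted` for r' >= 0, which has the same form as the original problem measured from the origin.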

Page 15: Related Work

• Traditional Distributed Systems
  - Load balancing and load sharing [Shivaratri92] [Diekmann97]
  - Parallel query processing [DeWitt92]
  - Graph partitioning [Walshaw97] [Schloegel00]
• Stream Processing Systems
  - Load management
    • Flux [Shah03] – data partitioning based parallel continuous query processing
    • Medusa [Balazinska04] – federated distributed stream processing

Page 16: Conclusion

•Distributed Stream Processing

•Resilient Operator Distribution

- Maximize feasible set size

•Performance

- Much better than conventional load distribution algorithms

Page 17: Backup Slides

Page 18: Computation Complexity

Computation time is determined by
• n – number of nodes
• m – number of operators
• d – number of system input streams
• k – number of samples in load time series

Static operator distribution

Dynamic operator distribution

Page 19: Heuristics

• Heuristic #1

• Choose the case where feasibility boundaries are close on each axis

• Heuristic #2

• Choose the case where all the feasibility boundaries are far from the origin.

[Figure: four plots of feasibility boundaries in the (r1, r2) plane illustrating the two heuristics.]
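Both heuristics can be quantified from the node constraints. The sketch below computes where each node's feasibility boundary crosses each axis (relevant to Heuristic #1) and the distance from the origin to each boundary (relevant to Heuristic #2). The function names and capacities are assumptions, and the coefficients reuse the 12/9 and 8/11 example.

```python
import numpy as np

def axis_intercepts(L, C):
    """Entry (i, j) is C_i / l_ij, where node i's boundary crosses axis r_j."""
    with np.errstate(divide="ignore"):
        return C[:, None] / L   # inf where a node carries no load for input j

def plane_distances(L, C):
    """Distance from the origin to each boundary l_i . r = C_i."""
    return C / np.linalg.norm(L, axis=1)

L = np.array([[12.0, 9.0], [8.0, 11.0]])
C = np.array([1.0, 1.0])  # assumed capacities

intercepts = axis_intercepts(L, C)
distances = plane_distances(L, C)
print(intercepts)
print(distances)
```

Heuristic #1 favors placements whose columns of `intercepts` have similar entries; Heuristic #2 favors placements whose smallest entry in `distances` is large.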

Page 20: Resilient vs. Optimal

2 nodes

4 input streams

Page 21: Varying Bandwidth Constraints

Resilient vs. Connected-Load-Balancing

Page 22: Varying Data Communication CPU Overhead

Resilient vs. Connected-Load-Balancing