Page 1:

Applying Control Theory to Stream Processing Systems

Wei Xu (xuw@cs.berkeley.edu)
Bill Kramer (kramer@lbl.gov)
Joe Hellerstein (hellers@us.ibm.com)

Page 2:

Description of the system

[Diagram labels: Data Source, Input Buffer, TCQ (complex internal structure)]

TCQ silently drops tuples if the result queue is full
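
The silent-drop behavior can be illustrated with a minimal sketch. This is not TCQ code: the class `BoundedResultQueue`, its `offer` method, and the `dropped` counter are hypothetical names, and the counter exists only to make the loss visible (the slide says the real drops are silent).

```python
from collections import deque


class BoundedResultQueue:
    """Hypothetical stand-in for a fixed-capacity result queue."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.items = deque()
        # Assumption for illustration: we count drops so the demo can show them.
        self.dropped = 0

    def offer(self, item):
        """Enqueue if there is room; otherwise discard without raising."""
        if len(self.items) >= self.capacity:
            self.dropped += 1
            return False
        self.items.append(item)
        return True

    def free_space(self):
        """Remaining slots, the quantity a queue-length monitor could report."""
        return self.capacity - len(self.items)


def demo():
    queue = BoundedResultQueue(capacity=3)
    for i in range(5):
        queue.offer(i)  # the producer gets no exception when a tuple is lost
    return list(queue.items), queue.dropped, queue.free_space()
```

Because the producer sees no error, the only way to avoid loss in this sketch is to watch `free_space()` and slow down before it reaches zero.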

Page 3:

Why do we need control?

• The data source does not provide an accurate data rate

[Figure: desired load vs. actual load. x-axis: time (ms, ×10^5); y-axis: number of tuples per sec]

Page 4:

Why do we need control?

• The TCQ node drops tuples when the result queue fills up

[Figure, three panels over time (ms, ×10^5); diagram labels: Source, Buffer, TCQ, Result Q]
 - source data rate and end-to-end drop rate (tuples per sec)
 - output rate of buffer (tuples per sec)
 - free space (KB)

Page 5:

Control Problems

• Providing an accurate data source
  – Get the actual data rate

• Regulating queue length on the TCQ node
  – Prevent dropping tuples
  – Maximize throughput (and adapt when a disturbance happens)

Page 6:

System with Control

[Diagram labels: Controlled Data Source, Output Rate Controller, Queue Length Monitor]

Page 7:

The Control Architecture

[Diagram labels: P Controller, PI Controller]

Comment (Joseph L Hellerstein): I don't understand the direction of the arrows for the reference, error, and control inputs
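
To make the signal flow concrete, here is a minimal sketch of P and PI control loops. It assumes a first-order plant of the form y(k+1) = a y(k) + b u(k); the plant parameters `A`, `B`, the gains `KP`, `KI`, and all function names are illustrative assumptions, not values or code from the talk. In each step the reference and the measured output produce an error, the controller turns the error into a control input u, and the plant turns u into the next output.

```python
A, B = 0.5, 0.5    # hypothetical plant parameters
KP, KI = 0.4, 0.4  # hypothetical controller gains


def simulate(controller, reference=1000.0, a=A, b=B, steps=100):
    """Run the closed loop and return the output y at each step."""
    y = 0.0
    history = []
    for _ in range(steps):
        u = controller(reference, y)  # control input from reference and measurement
        y = a * y + b * u             # assumed first-order plant
        history.append(y)
    return history


def make_p(kp, precomp=1.0):
    """Proportional control; precomp scales the reference before the error."""
    def control(reference, measured):
        return kp * (precomp * reference - measured)
    return control


def make_pi(kp, ki):
    """Proportional-integral control: the integral term accumulates error."""
    integral = 0.0

    def control(reference, measured):
        nonlocal integral
        error = reference - measured
        integral += error
        return kp * error + ki * integral
    return control


p_only = simulate(make_p(KP))
# Pre-compensation gain chosen so the P loop's steady state equals the reference.
p_precomp = simulate(make_p(KP, precomp=(1 - A + B * KP) / (B * KP)))
pi = simulate(make_pi(KP, KI))
```

With these numbers the plain P loop settles well below the reference, while both the pre-compensated P loop and the PI loop settle at it. The pre-compensation gain depends on the model parameters, so it is only as good as the model; the integral term does not need them.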

Page 8:

Result – An accurate data source

[Figure, two panels: "P Controller with Pre-compensation" and "PI Controller". Each plots desired load and actual load; x-axis: time (ms, ×10^5); y-axis: tuples per sec]

Comment (Joseph L Hellerstein): But it's a P controller with precompensation

Page 9:

Result – regulating queue length

[Figure, three panels over time (ms, ×10^5); diagram labels: Source, Buffer, TCQ, Result Q]
 - source data rate and end-to-end drop rate (tuples per sec)
 - output rate of buffer (tuples per sec)
 - free space (KB)

Comment (Joseph L Hellerstein): Can you explain the spikes?

Page 10:

Result – Under CPU Contention

[Figure, three panels over time (ms, ×10^5); diagram labels: Source, Buffer, TCQ, Result Q]
 - source data rate and end-to-end drop rate (tuples per sec)
 - output rate of buffer (tuples per sec)
 - free space (KB)

Comment (Joseph L Hellerstein): It would be good to go back to the control diagram to show how CPU contention relates to disturbances
Comment (Joseph L Hellerstein): I'm not sure I fully understand what happened here.

Page 11:

Why is theory useful?

• One of my implementations .. What happened?

[Figure, three panels over time (ms, ×10^5); diagram labels: Source, Buffer, TCQ, Result Q]
 - source data rate and end-to-end drop rate (tuples per sec)
 - output rate of buffer (tuples per sec)
 - free space (KB)

Page 12:

What is going on?

[Diagram labels: Desired Queue Length, Queue Length Controller, Data Rate to TCQ, Controlled Output Thread, Actual Queue Length]

(Code Reuse)

Page 13:

Theory meets reality

[Figure: output y from simulation. x-axis: time; y-axis: queue length]

Page 14:

Tricky part of parameter estimation

Model evaluation – Making the system operate in the desired range

Easy for the data source, but queue length ..

[Figures]
 - scatter plots of u (number of tuples per sec) against y (free space on queue), axes scaled ×10^5
 - free space (KB) over time (ms, ×10^5)
 - labels on the slide: Non-Linear range, Free Space, Data rate vs free space

Comment (Joseph L Hellerstein): Need a plot that shows why this is non-linear due to the threshold effects

Page 15:

Settling time and overshoot matter

A lot of small disturbances in a Java program
Incremental garbage collection

[Figure, two zoomed panels: "P Controller" and "PI Controller". Each plots desired load and actual load; x-axis: time (ms, ×10^5); y-axis: number of tuples per sec]
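
As a rough illustration of the two metrics named on this slide, the sketch below measures settling time and overshoot from a recorded step response. The function names, the 2% settling band, and the toy response (a first-order loop with a single closed-loop pole) are assumptions chosen for the example, not measurements from the system.

```python
def step_metrics(response, final_value, band=0.02):
    """Return (settling_step, overshoot) for a step response.

    settling_step: first index after which every sample stays within
    band * |final_value| of final_value. It equals len(response) if the
    last sample is still outside the band.
    overshoot: peak excess over final_value, as a fraction of final_value.
    """
    overshoot = max(0.0, (max(response) - final_value) / final_value)
    settling_step = 0
    for k, y in enumerate(response):
        if abs(y - final_value) > band * abs(final_value):
            settling_step = k + 1
    return settling_step, overshoot


def first_order_step(pole, reference=1.0, steps=30):
    """Toy closed-loop response y(k+1) = pole*y(k) + (1 - pole)*reference."""
    y = 0.0
    response = [y]
    for _ in range(steps):
        y = pole * y + (1 - pole) * reference
        response.append(y)
    return response


smooth = step_metrics(first_order_step(0.5), 1.0)      # positive pole: no overshoot
ringing = step_metrics(first_order_step(-0.5), 1.0)    # negative pole: oscillates
```

In this toy case both responses settle in the same number of steps, but the loop with the negative pole overshoots the target by half its value on the first step. When small disturbances arrive frequently, a loop that settles slowly or overshoots may not recover from one before the next arrives.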

Page 16:

Conclusion

• Advantages of feedback control
  – Makes the system more robust under disturbance
  – Treats complex systems as black boxes
    • Cope with the system characteristics instead of having to change them
  – Encourages reporting system statistics
  – Implementation is easy and has theoretical guarantees

Page 17:

Future Work

• Load balancer

• Smaller sample time to reduce disturbance caused by Java GC?

• Controller on scheduling of system shared by multiple streams

Page 18:

Backup Slides

Page 19:

Outline

• Problems and Motivation

• Controller design

• Result

• Discussion

Page 20:

Description of the System (Revised)

[Diagram labels: Data Source, Tuple Blocks, Load Splitter (Input Buffer, Routing Logic), Tuples, TCQ Node (×2), Queue length]

Operation of Load Splitter
1. Arriving blocks wait in Input Buffer
2. Tuples are routed to balance TCQ queue lengths
3. Stop routing if queue length is too large to avoid tuple discards
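
The three steps above can be sketched as a single routing pass. This is a minimal illustration under stated assumptions, not the talk's implementation: the function `route`, the shortest-queue rule used to "balance" lengths, and the `max_queue_length` threshold are all hypothetical.

```python
from collections import deque


def route(input_buffer, queue_lengths, max_queue_length):
    """Route buffered tuples to TCQ nodes, shortest queue first.

    input_buffer: deque of waiting tuples (step 1).
    queue_lengths: current queue length per node, updated in place.
    Returns a list of (tuple, node_index) assignments.
    """
    assignments = []
    while input_buffer:
        # Step 2: pick the node with the shortest queue (ties go to the lowest index).
        target = min(range(len(queue_lengths)), key=queue_lengths.__getitem__)
        # Step 3: if even the shortest queue is at the limit, stop routing and
        # leave the remaining tuples in the buffer instead of risking discards.
        if queue_lengths[target] >= max_queue_length:
            break
        assignments.append((input_buffer.popleft(), target))
        queue_lengths[target] += 1
    return assignments


buffer = deque(range(6))
lengths = [2, 0]
routed = route(buffer, lengths, max_queue_length=3)
```

Here four tuples are routed before both queues reach the limit, and the last two stay in the input buffer until the nodes drain.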

Page 21:

Compare to Open Loop Control

We know y(k), and we know what we want y(k+1) to be. Use the transfer function to solve for u(k).

(Expected result – accuracy and disturbance) -- to be done
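
One reading of this slide, sketched under the assumption that the plant follows the first-order model y(k+1) = a y(k) + b u(k): invert the model to get the input that should produce the desired next output. The parameter values and function names below are illustrative only.

```python
def open_loop_input(a, b, y_now, y_target):
    """Solve y_target = a*y_now + b*u for u."""
    if b == 0:
        raise ValueError("b must be non-zero: the input has no effect on the output")
    return (y_target - a * y_now) / b


def plant_step(a, b, y_now, u, disturbance=0.0):
    """One step of the assumed model, with an optional additive disturbance."""
    return a * y_now + b * u + disturbance


a, b = 0.5, 0.5  # hypothetical model parameters
u = open_loop_input(a, b, y_now=800.0, y_target=1000.0)
undisturbed = plant_step(a, b, 800.0, u)
disturbed = plant_step(a, b, 800.0, u, disturbance=-120.0)
```

When the model is exact and nothing interferes, the target is hit in one step. Because nothing in this calculation corrects for error, a disturbance (or a wrong a or b) shows up in the output at full size, which is the accuracy and disturbance comparison the slide leaves to be done.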

Page 22:

Estimation of the transfer function

y(k+1) = a y(k) + b u(k)

Regression

[Figure: scatter plot. x-axis: u, desired tuples per sec; y-axis: y, actual tuples per sec; both axes span -3000 to 3000]

Comment (Joseph L Hellerstein): Make sure to explain the experimental setup
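
A minimal sketch of how the regression for y(k+1) = a y(k) + b u(k) could be carried out with ordinary least squares, written in plain Python. The function names and the synthetic data are assumptions for illustration; the slides do not say which tool or data-preparation steps were used.

```python
def fit_first_order(u, y):
    """Least-squares estimate of (a, b) in y(k+1) = a*y(k) + b*u(k)."""
    s_yy = s_yu = s_uu = s_yt = s_ut = 0.0
    for k in range(len(y) - 1):
        s_yy += y[k] * y[k]
        s_yu += y[k] * u[k]
        s_uu += u[k] * u[k]
        s_yt += y[k] * y[k + 1]  # regressor y(k) against target y(k+1)
        s_ut += u[k] * y[k + 1]  # regressor u(k) against target y(k+1)
    # Solve the 2x2 normal equations with Cramer's rule.
    det = s_yy * s_uu - s_yu * s_yu
    if det == 0:
        raise ValueError("data cannot separate a and b; vary the input more")
    a = (s_yt * s_uu - s_ut * s_yu) / det
    b = (s_ut * s_yy - s_yt * s_yu) / det
    return a, b


def synthesize(a, b, u, y0=0.0):
    """Generate noise-free outputs from the model, for checking the fit."""
    y = [y0]
    for u_k in u:
        y.append(a * y[-1] + b * u_k)
    return y


inputs = [100.0, -200.0, 300.0, 50.0, -150.0, 250.0, 0.0, 400.0]
outputs = synthesize(0.4, 0.6, inputs)
a_hat, b_hat = fit_first_order(inputs, outputs)
```

On noise-free synthetic data the fit recovers the parameters that generated it. On real measurements the estimate is only meaningful if the input was varied enough during the experiment and the system stayed in a range where the linear model holds.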

Page 23:

Tricky part of parameter estimation

Model evaluation – A data rate that makes it operate in the linear range

[Figure, three panels over time (ms, ×10^5)]
 - desired load, actual load, end drop, and tcq drop (number of tuples per sec)
 - block response time
 - free space on result queue