Lecture slides from last week are posted on course web page

1

2

•  Lecture slides from last week are posted on course web page.

•  Project suggestions & deadlines are posted on web page

•  Reading list is posted.
–  Volunteer now! :-)
–  Need one presenter for next week …

3

•  Sept. 30: Project proposal
•  Oct. 7: Related work
•  Oct. 28: Status report I
•  Nov. 20: Status report II
•  Dec. 20: Final report

4

•  Two case studies from my own research
•  Some project suggestions
•  A few words about paper presentations

•  Probably next week:
–  Queueing terminology
–  First operational laws
–  Little’s law

5

6

[Figure: two server diagrams, each with sockets 1 to 3. Standard web server: PS (timesharing). SRPT web server (kernel-level implementation): SRPT (shortest-remaining-time), with the sockets labelled S, M, and L.]

Size-based scheduling for better response times.

7

[Figure: two plots of response time (ms, axis 100 to 300) versus load (0 to 1), one for workload generator 1 and one for workload generator 2, each comparing PS and SRPT.]

WHY?
•  Mean file size
•  File size distribution
•  Access pattern
•  Request rate
•  CPU utilization
•  Bandwidth
•  Network effects
ALL THE SAME!

Tuning knob: request rate
Tuning knob: number of users, think time

8

[Figure: two plots of response time (sec, axis 0 to 10) versus load (0 to 1) for a setup of web server, app server, and database (PostgreSQL), one driven by TPC-W generator X and one by TPC-W generator Y.]

9

•  Based on trace from top-10 online auctioning site.

[Figure: two plots of response time (sec, axis 0 to 20) versus load (0 to 1), one from Simulator 1 and one from Simulator 2, each comparing PLJF, PS, and SRPT.]

10

•  Simulation based on trace from Pittsburgh Supercomputing Center.

[Figure: two plots of response time (min, log scale from 10^2 to 10^5) versus load (0.1 to 1), one from Simulator A and one from Simulator B, with curves for 10, 100, and 1000 clients. Labels: FCFS; Cray J90/C90.]

11

[Figure: plot legends only. PLJF, PS, PSJF; 10, 100, 1000 clients; PS, SRPT.]

12

Model of user behavior

User requests web page, receives page, reads page, clicks on new link

•  Arrivals triggered by completions.

•  Fixed number of users, called the Multi-Programming-Level (MPL)

[Figure: closed-system user cycle: think, send, receive.]
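The closed model above can be sketched as a small simulation. This is a minimal illustration, not the generator or simulator used in the case study: it assumes a single FCFS server and exponentially distributed think and service times, and the function name `simulate_closed` and its parameters are hypothetical.

```python
import heapq
import random


def simulate_closed(mpl, mean_think, mean_service, num_jobs, seed=0):
    """Mean response time of a closed system with one FCFS server.

    Assumptions (not from the slides): exponential think and service
    times, and a single server that handles requests in submit order.
    """
    rng = random.Random(seed)
    # Every one of the MPL users starts by thinking, then submits a request.
    pending = [(rng.expovariate(1.0 / mean_think), user) for user in range(mpl)]
    heapq.heapify(pending)
    server_free_at = 0.0
    total_response = 0.0
    for _ in range(num_jobs):
        # The earliest outstanding submission is the next one served (FCFS).
        submit, user = heapq.heappop(pending)
        start = max(submit, server_free_at)
        finish = start + rng.expovariate(1.0 / mean_service)
        server_free_at = finish
        total_response += finish - submit
        # Arrivals are triggered by completions: the same user thinks,
        # then submits its next request.
        next_submit = finish + rng.expovariate(1.0 / mean_think)
        heapq.heappush(pending, (next_submit, user))
    return total_response / num_jobs


if __name__ == "__main__":
    for mpl in (1, 10, 50):
        print(mpl, simulate_closed(mpl, mean_think=10.0, mean_service=1.0,
                                   num_jobs=20000))
```

The tuning knobs here are the number of users (`mpl`) and the think time, which matches the second kind of workload generator described earlier.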

13

[Figure: open system. New arrivals join the queue at the server; the next arrival time is taken from a trace or a probability distribution.]

•  Arrivals are independent of completions

•  There is no max number of simultaneous users
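The open model above can be sketched in the same style. This is a minimal illustration under stated assumptions: a single FCFS server, Poisson arrivals instead of a trace, and exponential service times. The name `simulate_open` is hypothetical. Waiting times follow the Lindley recursion, where each job's wait depends on the previous job's wait, its service time, and the gap to the next arrival.

```python
import random


def simulate_open(arrival_rate, mean_service, num_jobs, seed=0):
    """Mean response time of an open system with one FCFS server.

    Assumptions (not from the slides): Poisson arrivals with the given
    rate and exponential service times, i.e. an M/M/1 queue.
    """
    rng = random.Random(seed)
    wait = 0.0            # queueing delay of the current job
    total_response = 0.0
    for _ in range(num_jobs):
        service = rng.expovariate(1.0 / mean_service)
        total_response += wait + service
        # The next arrival time does not depend on any completion.
        gap = rng.expovariate(arrival_rate)
        # Lindley recursion for the next job's queueing delay.
        wait = max(0.0, wait + service - gap)
    return total_response / num_jobs


if __name__ == "__main__":
    for load in (0.25, 0.5, 0.75, 0.9):
        print(load, simulate_open(arrival_rate=load, mean_service=1.0,
                                  num_jobs=200000))
```

The only tuning knob is the request rate, and nothing limits how many jobs can be in the system at once.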

14

Workload generators: Surge, SPECWeb, TPC-W, Sclient, RUBiS, WebBench, Webjamma

•  Generators for same purpose use different models!

•  Often not clear which model generators use!

15

•  Very little …
•  Limited to FCFS single server queue.
–  Response times under open system higher than under closed [Bondi and Whitt 1986].
–  For MPL → ∞, closed system converges to open system [Schatte83, Schatte84].

16

–  What is the magnitude of the difference in response times?
–  What is the speed of convergence?
–  How does variability (heavy tails) affect results?
–  How are different scheduling disciplines affected?
–  … in practice?

17

•  What is the magnitude of the difference in response times?
–  Orders of magnitude!

[Figure (analysis): mean response time (log scale, 10 to 1000) versus load (0 to 1), comparing Open with Closed (MPL=50).]

•  Why?
–  Bounded number of jobs in closed system.
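A back-of-the-envelope version of this argument, as a sketch under assumptions the slides do not state: a single FCFS server with exponential service at rate $\mu$, Poisson arrivals at rate $\lambda$ in the open case, and load $\rho = \lambda/\mu$. The open-system expression is the standard M/M/1 result.

```latex
% Open system (M/M/1): the number of jobs in the system is unbounded,
% so the mean response time grows without limit as the load approaches 1.
\[
  E[T_{\text{open}}] = \frac{1/\mu}{1-\rho} \;\longrightarrow\; \infty
  \quad \text{as } \rho \to 1 .
\]
% Closed system: a submitted request finds at most MPL - 1 other requests
% at the server, each with mean (remaining) service time 1/\mu, so
\[
  E[T_{\text{closed}}] \le \frac{\text{MPL}}{\mu}
  \quad \text{at any load.}
\]
```

With MPL=50 the closed mean response time is capped at 50 mean service times, while the open one can exceed any bound near full load.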

18

•  How does variability affect open/closed response times?
–  Huge effect on open, limited effect on closed system.

[Figure (analysis): mean response time (axis 500 to 1500) from low variability to high variability, for Open, Closed (MPL=50), Closed (MPL=100), and Closed (MPL=1000). Annotation: Web Workloads.]

•  Why?
–  Dependency between completions and arrivals in closed system reduces burstiness.

19

•  Can we make closed look like open, by increasing MPL?

[Figure: mean response time (axis 500 to 1500) from low variability to high variability, for Open, Closed (MPL=50), Closed (MPL=100), and Closed (MPL=1000). Annotation: Web Workloads.]

20

•  What is the impact of scheduling?
–  Huge in open system, almost none in closed system.

[Figure (analysis): two plots, each comparing PLJF, FCFS, PS, and PSJF.]

•  Why?
–  Scheduling takes advantage of variability in the system.
–  Closed systems reduce the effect of variability.

21

1.  Is there a more realistic model?

2.  What’s most representative of real systems?

22

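[Figure: partly-open system. New arrivals join the queue at the server. After each request (think, send, receive), the user returns to the system with probability q and otherwise leaves the system.]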
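The partly-open model can be sketched by combining the two pure models. This is a minimal illustration under stated assumptions: a single FCFS server, Poisson arrivals of new users, and exponential think and service times. The name `simulate_partly_open` and its parameters are hypothetical.

```python
import heapq
import random


def simulate_partly_open(session_rate, q, mean_think, mean_service,
                         num_requests, seed=0):
    """Mean response time of a partly-open system with one FCFS server.

    Assumptions (not from the slides): new users arrive as a Poisson
    process; think and service times are exponential.
    """
    rng = random.Random(seed)
    next_session = rng.expovariate(session_rate)  # next brand-new user
    returning = []         # min-heap of submit times of returning users
    server_free_at = 0.0
    total_response = 0.0
    for _ in range(num_requests):
        # Serve whichever request is submitted first: a returning user's
        # follow-up request or a new user's first request.
        if returning and returning[0] <= next_session:
            submit = heapq.heappop(returning)
        else:
            submit = next_session
            next_session += rng.expovariate(session_rate)
        start = max(submit, server_free_at)
        finish = start + rng.expovariate(1.0 / mean_service)
        server_free_at = finish
        total_response += finish - submit
        # With probability q the user thinks and comes back;
        # otherwise the user leaves the system.
        if rng.random() < q:
            think = rng.expovariate(1.0 / mean_think)
            heapq.heappush(returning, finish + think)
    return total_response / num_requests


if __name__ == "__main__":
    # Same offered load (0.5) with few and with many requests per visit.
    print(simulate_partly_open(0.5, 0.0, 10.0, 1.0, 200000))
    print(simulate_partly_open(0.05, 0.9, 10.0, 1.0, 200000))
```

Setting `q=0` makes every request a fresh arrival, so the sketch reduces to the open model; as `q` grows, more of the arrivals are triggered by completions.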

23

[Figure: mean response time (axis 0 to 300) versus mean think time (1 to 1000, log scale), comparing SRPT and PS.]

24

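•  q → 1: number of requests per visit ↑ → ?
•  q → 0: number of requests per visit ↓ → ?

[Figure: partly-open system. New arrivals join the queue at the server. After each request (think, send, receive), the user returns to the system with probability q and otherwise leaves the system.]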
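The link between q and the number of requests per visit can be made explicit. As a sketch, assume each return decision is independent and uses the same probability $q$ (an assumption; the slides do not state it). The number of requests per visit, $N$, is then geometrically distributed:

```latex
\[
  P(N = n) = q^{\,n-1}(1-q), \quad n = 1, 2, \dots
  \qquad\Longrightarrow\qquad
  E[N] = \frac{1}{1-q} .
\]
% q -> 0: E[N] -> 1, every request is a new arrival.
% q -> 1: E[N] -> infinity, almost every request is triggered by a completion.
```

For example, a mean of 2.4 requests per visit corresponds to $q = 1 - 1/2.4 \approx 0.58$ under this assumption.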

25

[Figure: mean response time (axis 0 to 300) versus mean number of requests per visit (0 to 20) for PS and SRPT, with reference levels for PS open and PS closed and regions labelled OPEN and CLOSED.]

26

Open or Closed? Use partly-open system to decide

Real web workloads (#req. / visit):
•  A site being “Slashdotted” (1.2)
•  Financial service provider (1.4)
•  CMU web server (1.8)
•  Kasparov vs Deep Blue (2.4)
•  Large corporate web site (2.4)
•  Science Institute USGS (3.6)
•  Online dept. store (5.4)
•  Supercomp. site (6.0)
•  World cup site (11.6)
•  Online gaming site (12.9)

27

28

Storage system (RAID)

•  Depends on probability that after one drive fails, a second drive fails while reconstructing data.

29

•  Need probability of second failure during reconstruction

[Figure: probability (%) of a second failure, scale ×10^-3 (axis 0 to 6), for a 1 hour reconstruction time. Shown: standard approach (datasheet MTTF and exponential distribution).]
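The standard approach can be sketched as follows. With exponentially distributed lifetimes, the chance that a drive fails within a window of length t is 1 - exp(-t / MTTF), whatever the drive's age. The function name, the `remaining_drives` parameter, and the numbers in the usage example are hypothetical, chosen only for illustration.

```python
import math


def p_second_failure_exponential(mttf_hours, rebuild_hours, remaining_drives=1):
    """Probability that at least one remaining drive fails during the rebuild.

    Assumes independent drives with exponential lifetimes, so the time to
    the next failure among the remaining drives is exponential with rate
    remaining_drives / MTTF.
    """
    rate = remaining_drives / mttf_hours
    # -expm1(-x) equals 1 - exp(-x) and stays accurate for tiny x.
    return -math.expm1(-rate * rebuild_hours)


if __name__ == "__main__":
    # Hypothetical datasheet MTTF of 1,000,000 hours and a 1 hour rebuild.
    p = p_second_failure_exponential(mttf_hours=1_000_000, rebuild_hours=1.0)
    print(f"{100 * p:.3e} %")
```

Because the exponential distribution is memoryless, this estimate ignores drive age and any correlation between failures.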

30

[Figure: probability (%) of a second failure, scale ×10^-3 (axis 0 to 6). Shown: standard approach (datasheet MTTF and exponential distribution); estimate based on data.]

31

[Figure: probability (%) of a second failure, scale ×10^-3 (axis 0 to 6). Shown: standard approach (datasheet MTTF and exponential distribution); measured MTTF and exponential distribution; estimate based on data.]

32

[Figure: probability (%) of a second failure, scale ×10^-3 (axis 0 to 6). Shown: standard approach (datasheet MTTF and exponential distribution); measured MTTF and exponential distribution; measured MTTF and Weibull distribution; estimate based on data.]
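The Weibull variant can be sketched in the same way. Unlike the exponential case, the result depends on how old the surviving drive is. This is a minimal illustration: the function name and all parameter values are hypothetical, and the shape parameter would have to be fitted to measured failure data.

```python
import math


def p_second_failure_weibull(mttf_hours, shape, age_hours, rebuild_hours):
    """Probability that a drive of the given age fails during the rebuild.

    Assumes Weibull lifetimes with survival function
    S(x) = exp(-(x / scale) ** shape), where the scale is chosen so that
    the mean lifetime equals the given MTTF.
    """
    # Mean of a Weibull distribution is scale * Gamma(1 + 1/shape).
    scale = mttf_hours / math.gamma(1.0 + 1.0 / shape)
    hazard_start = (age_hours / scale) ** shape
    hazard_end = ((age_hours + rebuild_hours) / scale) ** shape
    # Conditional failure probability: 1 - S(age + rebuild) / S(age).
    return -math.expm1(-(hazard_end - hazard_start))


if __name__ == "__main__":
    # Hypothetical values: shape below 1 means a decreasing hazard rate.
    for age in (10.0, 1_000.0, 10_000.0):
        p = p_second_failure_weibull(mttf_hours=1_000_000, shape=0.7,
                                     age_hours=age, rebuild_hours=1.0)
        print(age, f"{100 * p:.3e} %")
```

With `shape=1.0` the Weibull distribution reduces to the exponential one, so the two models agree in that special case.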

33

[Figure: probability (%) of a second failure, scale ×10^-2 (axis 0 to 1.6), versus reconstruction time. Shown: standard approach (datasheet MTTF and exponential distribution); measured MTTF and exponential distribution; measured MTTF and Weibull distribution; estimate based on data.]

34

•  Intuition is not always good enough
•  Need back-of-the-envelope calculations and analytical tools to answer questions.
•  Workload / fault load matters hugely
•  Important to understand what the real world looks like!
