Lecture slides from last week are posted on course web page

Feb 03, 2022

dariahiddleston
Transcript
Page 1

Page 2

•  Lecture slides from last week are posted on course web page.

•  Project suggestions & deadlines are posted on web page.

•  Reading list is posted.

•  Volunteer now! :-)

•  Need one presenter for next week …

Page 3

•  Sept. 30: Project proposal
•  Oct. 7: Related work
•  Oct. 28: Status report I
•  Nov. 20: Status report II
•  Dec. 20: Final report

Page 4

•  Two case studies from my own research
•  Some project suggestions
•  A few words about paper presentations

•  Probably next week:
   –  Queueing terminology
   –  First operational laws
   –  Little’s law

Page 5

Page 6

[Figure: two web server designs, each serving Socket 1, Socket 2, and Socket 3. The standard web server shares the sockets with PS (timesharing). The SRPT web server (kernel-level implementation) serves them by SRPT (shortest-remaining-time), with the sockets labeled S, M, and L.]

Size-based scheduling for better response times.

Page 7

[Figure: response time (ms) versus load (0 to 1) for PS and SRPT, one panel for Workload generator 1 and one for Workload generator 2; y-axis ticks at 100, 200, 300.]

WHY?
•  Mean file size
•  File size distribution
•  Access pattern
•  Request rate
•  CPU utilization
•  Bandwidth
•  Network effects
ALL THE SAME!

Tuning knob - Request rate

Tuning knob - Number of users - Think time

Page 8

[Figure: response time (sec) versus load (0 to 1), one panel for TPC-W generator X and one for TPC-W generator Y; y-axis ticks at 0, 5, 10. System under test: Web server, App server, Database (PostgreSQL).]

Tuning knob - Request rate

Tuning knob - Number of users - Think time

Page 9

•  Based on trace from top-10 online auctioning site.

[Figure: response time (sec) versus load (0 to 1) for PLJF, PS, and SRPT, one panel for Simulator 1 and one for Simulator 2; y-axis from 0 to 20.]

Tuning knob - Request rate

Tuning knob - Number of users - Think time

Page 10

•  Simulation based on trace from Pittsburgh Supercomputing Center.

[Figure: response time (min) versus load (0.1 to 1) with 10, 100, and 1000 clients, one panel for Simulator A and one for Simulator B; y-axis ticks at 10^2, 10^3, 10^4, 10^5. Labels: FCFS, Cray J90/C90.]

Tuning knob - Request rate

Tuning knob - Number of users - Think time

Page 11

[Figure: several plots side by side; legends: PS, SRPT; PS, SRPT; PLJF, PS, PSJF; PLJF, PS, PSJF; 10, 100, 1000 clients.]

Tuning knob - Request rate

Tuning knob - Number of users - Think time

Page 12

Model of user behavior

User requests web page, receives page, reads page, clicks on new link.

•  Arrivals triggered by completions.

•  Fixed number of users, called the Multi-Programming-Level (MPL).

[Figure: loop in which each user cycles through think, send, receive.]
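The user model on this slide can be sketched as a small event-driven simulation. This is a minimal illustration, not the lecture's own code: it assumes a single FCFS server and exponentially distributed think and service times, and the names `simulate_closed`, `mpl`, `mean_think`, and `mean_service` are hypothetical.

```python
import heapq
import random
from collections import deque


def simulate_closed(mpl, mean_think, mean_service, num_completions, seed=0):
    """Closed system sketch: `mpl` users cycle think -> send -> receive.

    Assumptions (not from the slides): one FCFS server, exponential think
    and service times. Returns (mean response time, max requests at server).
    """
    rng = random.Random(seed)
    # Every user starts out thinking; heap entries are (think end time, user).
    thinking = [(rng.expovariate(1.0 / mean_think), u) for u in range(mpl)]
    heapq.heapify(thinking)
    at_server = deque()   # (arrival time, user) in FCFS order, head is in service
    service_end = None    # completion time of the request in service, if any
    total_response, done, max_at_server = 0.0, 0, 0

    while done < num_completions:
        next_arrival = thinking[0][0] if thinking else float("inf")
        if service_end is not None and service_end <= next_arrival:
            # Completion: only now does this user start thinking again,
            # so the next arrival is triggered by this completion.
            now = service_end
            arrived, user = at_server.popleft()
            total_response += now - arrived
            done += 1
            heapq.heappush(thinking, (now + rng.expovariate(1.0 / mean_think), user))
            service_end = (now + rng.expovariate(1.0 / mean_service)
                           if at_server else None)
        else:
            # A user finishes thinking and sends a request to the server.
            now, user = heapq.heappop(thinking)
            at_server.append((now, user))
            max_at_server = max(max_at_server, len(at_server))
            if service_end is None:
                service_end = now + rng.expovariate(1.0 / mean_service)

    return total_response / done, max_at_server
```

Because a user cannot send a new request before the previous one completes, the number of requests at the server in this sketch can never exceed `mpl`.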

Page 13

[Figure: new arrivals join the queue at the server; arrival times come from a trace or a probability distribution (next arrival time from trace).]

•  Arrivals are independent of completions.

•  There is no max number of simultaneous users.
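For contrast, the model on this slide can be sketched in a few lines. This is a hedged illustration only: it assumes Poisson arrivals drawn from a probability distribution rather than a trace, a single FCFS server, and exponential service times; `simulate_open` and its parameters are hypothetical names.

```python
import random
from collections import deque


def simulate_open(arrival_rate, mean_service, num_jobs, seed=0):
    """Open system sketch: arrivals follow their own clock (Poisson here).

    Assumptions (not from the slides): one FCFS server, exponential service.
    Returns (mean response time, max jobs in system seen at an arrival).
    """
    rng = random.Random(seed)
    now = 0.0
    last_departure = 0.0
    in_system = deque()   # departure times of jobs still in the system
    total_response = 0.0
    max_in_system = 0

    for _ in range(num_jobs):
        # The next arrival time ignores what the server is doing.
        now += rng.expovariate(arrival_rate)
        while in_system and in_system[0] <= now:
            in_system.popleft()
        # FCFS: service starts when the job arrives or when the previous
        # job departs, whichever is later.
        start = max(now, last_departure)
        last_departure = start + rng.expovariate(1.0 / mean_service)
        in_system.append(last_departure)
        total_response += last_departure - now
        max_in_system = max(max_in_system, len(in_system))

    return total_response / num_jobs, max_in_system
```

Nothing in this loop caps the length of `in_system`, which is the code-level counterpart of "no max number of simultaneous users".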

Page 14

Workload generators shown: Surge, SPECWeb, TPC-W, Sclient, RUBiS, WebBench, Webjamma.

•  Generators for same purpose use different models!

•  Often not clear which model generators use!

Page 15

•  Very little …
•  Limited to FCFS single server queue.

   –  Response times under open system higher than under closed [Bondi and Whitt 1986].

   –  For MPL → ∞, closed system converges to open system [Schatte83, Schatte84].

Page 16

–  What is the magnitude of the difference in response times?
–  What is the speed of convergence?
–  How does variability (heavy tails) affect results?
–  How are different scheduling disciplines affected?
–  … in practice?

Page 17

•  What is the magnitude of the difference in response times?
   –  Orders of magnitude!

[Figure (ANALYSIS): mean response time (log scale, ticks at 10, 100, 1000) versus load (0 to 1) for Open and Closed (MPL=50).]

•  Why?
   –  Bounded number of jobs in closed system.
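One way to see the effect of the bound is to compare textbook formulas for the two models at the same utilization. The sketch below is an assumption-laden illustration, not the analysis behind the slide: it uses the M/M/1 mean response time for the open system and exact mean value analysis (MVA) for a closed system with one exponential FCFS server plus a think station. The function names and parameters are hypothetical.

```python
def closed_response_time(mpl, think_time, service_time):
    """MVA for one exponential FCFS server plus a think (delay) station.

    Returns (mean response time, server utilization) for `mpl` users.
    Requires mpl >= 1.
    """
    queue_len = 0.0
    for n in range(1, mpl + 1):
        # An arriving request sees the queue of a system with n - 1 users.
        response = service_time * (1.0 + queue_len)
        throughput = n / (think_time + response)
        queue_len = throughput * response   # Little's law at the server
    return response, throughput * service_time


def open_response_time(load, service_time):
    """M/M/1 mean response time at utilization `load` (0 <= load < 1)."""
    return service_time / (1.0 - load)


if __name__ == "__main__":
    # Hypothetical parameters: MPL = 50, mean think time 100, mean service 1.
    resp_closed, load = closed_response_time(50, 100.0, 1.0)
    print("load:", load)
    print("closed:", resp_closed, "open:", open_response_time(load, 1.0))
```

In the closed model a request can never wait behind more than `mpl - 1` others, so its mean response time is at most `mpl * service_time`, while the open formula grows without bound as the load approaches 1.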

Page 18

•  How does variability affect open/closed response times?
   –  Huge effect on open, limited effect on closed system.

[Figure (ANALYSIS): mean response time (ticks at 500, 1000, 1500) from low variability to high variability. Labels: Open, Closed (MPL=50), Closed (MPL=100), Closed (MPL=1000), Web Workloads.]

•  Why?
   –  Dependency between completions and arrivals in closed system reduces burstiness.

Page 19

•  Can we make closed look like open, by increasing MPL?

[Figure: mean response time (ticks at 500, 1000, 1500) from low variability to high variability. Labels: Open, Closed (MPL=50), Closed (MPL=100), Closed (MPL=1000), Web Workloads.]

Page 20

•  What is the impact of scheduling?
   –  Huge in open system, almost none in closed system.

[Figure (ANALYSIS): two panels, each comparing PLJF, FCFS, PS, and PSJF.]

•  Why?
   –  Scheduling takes advantage of variability in the system.
   –  Closed systems reduce the effect of variability.
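The point that scheduling takes advantage of variability can be illustrated with a deliberately simplified case that is not taken from the slides: a batch of jobs that are all present at time zero on one server. In that setting, serving the shortest job first is what SRPT reduces to. The helper names below are hypothetical.

```python
from itertools import accumulate


def mean_response(sizes):
    """Mean completion time when jobs run back to back in the given order."""
    completions = list(accumulate(sizes))
    return sum(completions) / len(completions)


def fcfs_vs_sjf(sizes):
    """Return (mean response in arrival order, mean response shortest-first)."""
    return mean_response(sizes), mean_response(sorted(sizes))


if __name__ == "__main__":
    print(fcfs_vs_sjf([10, 1, 1]))   # one large job ahead of two small ones
    print(fcfs_vs_sjf([4, 4, 4]))    # same total work, no variability
```

For sizes [10, 1, 1], arrival order gives a mean of 11 and shortest-first gives 5; for [4, 4, 4] both give 8. With no variability in job sizes, the ordering has nothing to exploit.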

Page 21

1.  Is there a more realistic model?

2.  What’s most representative of real systems?

Page 22

[Figure: new arrivals join the queue at the server. After a request (think, send, receive), the user returns to the system with probability q; otherwise the user leaves the system.]

Page 23

[Figure: mean response time (0 to 300) versus mean think time (1, 10, 100, 1000) for PS and SRPT.]

Page 24

[Figure: new arrivals join the queue at the server. After a request (think, send, receive), the user returns to the system with probability q; otherwise the user leaves the system.]

•  q → 1: number of requests per visit ↑ ?
•  q → 0: number of requests per visit ↓ ?
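The link between q and the number of requests per visit can be written out under one assumption that the slides do not state explicitly: after each request, the user decides independently, and always with the same probability q, whether to return. The number of requests per visit, called N here for illustration, is then geometric:

```latex
\[
  P(N = k) = q^{\,k-1}(1-q), \quad k = 1, 2, \dots
  \qquad\Longrightarrow\qquad
  E[N] = \sum_{k=1}^{\infty} k\, q^{\,k-1}(1-q) = \frac{1}{1-q}.
\]
```

Under this assumption, q near 0 gives about one request per visit, and q near 1 gives a mean number of requests per visit that grows without bound.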

Page 25

[Figure: mean response time (0 to 300) versus mean number of requests per visit (0 to 20) for PS and SRPT, with reference levels labeled PS open and PS closed. The axis is annotated OPEN on one side and CLOSED on the other.]

Page 26

Open or Closed? Use partly-open system to decide.

Real web workloads                 #req. / visit
•  A site being “Slashdotted”      (1.2)
•  Financial service provider      (1.4)
•  CMU web server                  (1.8)
•  Kasparov vs Deep Blue           (2.4)
•  Large corporate web site        (2.4)
•  Science Institute USGS          (3.6)
•  Online dept. store              (5.4)
•  Supercomp. site                 (6.0)
•  World cup site                  (11.6)
•  Online gaming site              (12.9)

Page 27

Page 28

Storage system (RAID)

•  Depends on probability that after one drive fails, a second drive fails while reconstructing data.

Page 29

•  Need probability of second failure during reconstruction

[Figure: probability (%) of a second failure for a 1 hour reconstruction time; y-axis from 0 to 6 x 10^-3. Estimate shown: Standard approach: Use datasheet MTTF and exponential distr.]
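The standard approach can be written as a short calculation. This is a sketch under stated assumptions: drive lifetimes are independent and exponentially distributed with mean MTTF, and the numbers in the usage lines (a 1,000,000 hour datasheet MTTF, 7 surviving drives) are hypothetical values, not figures from the slides.

```python
import math


def prob_second_failure_exponential(mttf_hours, rebuild_hours, surviving_drives=1):
    """P(at least one surviving drive fails during the rebuild window).

    Assumes independent, exponentially distributed lifetimes with mean
    `mttf_hours`, so the result does not depend on the age of the drives.
    """
    # -expm1(-x) equals 1 - exp(-x) but stays accurate for very small x.
    return -math.expm1(-surviving_drives * rebuild_hours / mttf_hours)


if __name__ == "__main__":
    # Hypothetical inputs: datasheet MTTF of 1,000,000 h, 1 h reconstruction.
    print(prob_second_failure_exponential(1_000_000, 1.0))
    print(prob_second_failure_exponential(1_000_000, 1.0, surviving_drives=7))
```

With a rebuild window that is tiny relative to the MTTF, the result is approximately `surviving_drives * rebuild_hours / mttf_hours`.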

Page 30

•  Need probability of second failure during reconstruction

[Figure: probability (%) of a second failure for a 1 hour reconstruction time; y-axis from 0 to 6 x 10^-3. Estimates compared:
–  Standard approach: Use datasheet MTTF and exponential distr.
–  Estimate based on data]

Page 31

•  Need probability of second failure during reconstruction

[Figure: probability (%) of a second failure for a 1 hour reconstruction time; y-axis from 0 to 6 x 10^-3. Estimates compared:
–  Standard approach: Use datasheet MTTF and exponential distr.
–  Use measured MTTF and exponential distribution
–  Estimate based on data]

Page 32

•  Need probability of second failure during reconstruction

[Figure: probability (%) of a second failure for a 1 hour reconstruction time; y-axis from 0 to 6 x 10^-3. Estimates compared:
–  Standard approach: Use datasheet MTTF and exponential distr.
–  Use measured MTTF and exponential distribution
–  Use measured MTTF and Weibull distribution
–  Estimate based on data]

Page 33

•  Need probability of second failure during reconstruction

[Figure: probability (%) of a second failure versus reconstruction time; y-axis from 0 to 1.6 x 10^-2. Estimates compared:
–  Standard approach: Use datasheet MTTF and exponential distr.
–  Use measured MTTF and exponential distribution
–  Use measured MTTF and Weibull distribution
–  Estimate based on data]
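The Weibull variant named above can be sketched in the same way. The slides do not give fitted parameters, so the shape value and drive ages in the usage lines are hypothetical, as are the function names. Unlike the exponential case, the probability of failing within the reconstruction window now depends on how old the drive already is.

```python
import math


def weibull_scale_from_mttf(mttf_hours, shape):
    """Scale that yields the given mean: mean = scale * Gamma(1 + 1/shape)."""
    return mttf_hours / math.gamma(1.0 + 1.0 / shape)


def prob_failure_weibull(age_hours, window_hours, shape, scale_hours):
    """P(drive fails within `window_hours` | it has survived `age_hours`).

    Uses the Weibull cumulative hazard H(t) = (t / scale) ** shape.
    """
    h_start = (age_hours / scale_hours) ** shape
    h_end = ((age_hours + window_hours) / scale_hours) ** shape
    return -math.expm1(-(h_end - h_start))


if __name__ == "__main__":
    # Hypothetical inputs: MTTF 1,000,000 h, shape 0.7, 1 h reconstruction.
    scale = weibull_scale_from_mttf(1_000_000, 0.7)
    for age in (0.0, 1_000.0, 10_000.0):
        print(age, prob_failure_weibull(age, 1.0, 0.7, scale))
```

With shape equal to 1 the Weibull distribution is the exponential distribution and the age dependence disappears; a shape below 1 makes the hazard rate decrease with age, and a shape above 1 makes it increase.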

Page 34

•  Intuition is not always good enough
•  Need back-of-the-envelope calculations and analytical tools to answer questions.
•  Workload / fault load matters hugely

•  Important to understand what the real world looks like!