Adaptive Overload Control for Busy Internet Servers Matt Welsh and David Culler USENIX Symposium on Internet Technologies and Systems (USITS) 2003 Alex Cheung Nov 13, 2006 ECE1747

Adaptive Overload Control for Busy Internet Servers

Matt Welsh and David Culler
USENIX Symposium on Internet Technologies and Systems (USITS) 2003

Alex Cheung
Nov 13, 2006

ECE1747


Outline
• Motivation
• Goal
• Methodology
  • Detection
  • Overload control
• Experiments
• Comments


Motivation
1. Internet services becoming important to our daily lives:
  • Email
  • News
  • Trading
2. Services becoming more complex
  • Large dynamic content
  • Requires high computation and I/O
  • Hard to predict load requirements of requests
3. Withstand peak load that is 1000x the norm without over-provisioning
  • Solve CNN’s problem on 9/11


Goal
• Adaptive overload control scheme at node level by maintaining:
  • Response time
  • Throughput
  • QoS & Availability


Methodology - Detection

1. Look at the 90th percentile response time
2. Compare with threshold and decide what to do

Weaker alternatives:
• 100th percentile: does not capture “shape” of response time curve
• Throughput: does not capture user-perceived performance of the system

I ask:
• What makes 90th percentile so great?
• Why not 95th? 80th? 70th?
• No supporting micro-experiment

[Figure: requests served, numbered 1 to 10; examine the 90th-percentile response time]
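
The detection step above can be sketched in a few lines. This is a minimal illustration, assuming a nearest-rank percentile and a hypothetical window of ten samples; the function names and the sample values are not from the paper.

```python
def percentile(samples, pct):
    """Nearest-rank percentile of a non-empty list (pct is an integer 1..100)."""
    if not samples:
        raise ValueError("need at least one response-time sample")
    ordered = sorted(samples)
    rank = -(-pct * len(ordered) // 100)  # ceil(pct * n / 100) in integer math
    return ordered[max(rank, 1) - 1]


def is_overloaded(response_times_ms, threshold_ms, pct=90):
    """Flag overload when the chosen percentile exceeds the threshold."""
    return percentile(response_times_ms, pct) > threshold_ms


# Hypothetical window of ten response times (ms) with a single slow outlier.
window = [20, 25, 30, 35, 40, 45, 50, 55, 60, 900]
p90 = percentile(window, 90)    # ignores the single outlier
p100 = percentile(window, 100)  # the maximum, dominated by the outlier
```

With this window the 90th percentile stays below a 100 ms threshold while the 100th percentile does not, which is one way to read the "shape" argument on the slide. Changing `pct` to 95, 80, or 70 makes it easy to try the alternatives the slide asks about.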


Methodology – Overload Control
• If response time is higher than threshold:
  1. Limit service rate by rejecting selected requests
    • Extension: Differentiate requests with classes/priority levels and reject lower class/priority requests first
  2. Quality/service degradation
  3. Back pressure
    1. Queue explosion at 1st stage (they say)
      • Solved by rejecting requests at 1st stage
    2. Breaks the loose-coupling modular design of SEDA with out-of-band notification scheme (I say)


Methodology – Overload Control
4. Forward rejected request to another “more available” server.
  • “more available” – server with the most of a particular resource:
    • CPU, network, I/O, hard disk
  • Make decision using centralized or distributed algorithm
  • Reliable state migration, possibly transactional

My take:
• More complex, interesting, and actually solves CNN’s problem with a cluster of servers!
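
A minimal sketch of the centralized variant of the "more available" decision, assuming each server reports the free fraction of each resource. The server names, resource keys, and numbers are hypothetical, and state migration is not modeled.

```python
def pick_more_available(servers, resource):
    """Return the name of the server with the most free capacity of one resource.

    `servers` maps a server name to a dict of resource name -> free fraction (0..1).
    """
    if not servers:
        raise ValueError("no candidate servers")
    return max(servers, key=lambda name: servers[name][resource])


# Hypothetical cluster snapshot (free fraction of each resource).
cluster = {
    "node-a": {"cpu": 0.10, "network": 0.60, "io": 0.40, "disk": 0.70},
    "node-b": {"cpu": 0.55, "network": 0.20, "io": 0.35, "disk": 0.50},
    "node-c": {"cpu": 0.30, "network": 0.80, "io": 0.90, "disk": 0.20},
}
cpu_target = pick_more_available(cluster, "cpu")
net_target = pick_more_available(cluster, "network")
```

A distributed version would have to exchange these snapshots between nodes, which is where the complexity mentioned on the slide comes in.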


Rate Limit

[Figure annotations: smoothed; multiplicative decrease; additive increase]

Just like TCP!

10 fine-tuned parameters per stage.
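
The additive-increase, multiplicative-decrease idea can be sketched as follows. This is an illustration only: the class name, the parameter names, and every default value are assumptions, and it uses far fewer knobs than the ten per stage mentioned on the slide.

```python
class AimdRateController:
    """Adjust an admission rate from smoothed 90th-percentile response times."""

    def __init__(self, target_ms, rate=100.0, min_rate=1.0, max_rate=5000.0,
                 add_step=2.0, mult_factor=1.2, alpha=0.7):
        self.target_ms = target_ms      # response-time threshold
        self.rate = rate                # current admission rate (requests/s)
        self.min_rate = min_rate
        self.max_rate = max_rate
        self.add_step = add_step        # additive increase per observation
        self.mult_factor = mult_factor  # divisor for multiplicative decrease
        self.alpha = alpha              # weight on the previous smoothed value
        self.smoothed = None

    def observe(self, p90_ms):
        """Feed one 90th-percentile measurement and return the new rate."""
        if self.smoothed is None:
            self.smoothed = p90_ms
        else:
            # Exponentially weighted moving average of the measurements.
            self.smoothed = self.alpha * self.smoothed + (1 - self.alpha) * p90_ms
        if self.smoothed > self.target_ms:
            # Over target: back off quickly.
            self.rate = max(self.min_rate, self.rate / self.mult_factor)
        else:
            # Under target: probe for more capacity slowly.
            self.rate = min(self.max_rate, self.rate + self.add_step)
        return self.rate
```

Usage would be one `observe` call per measurement window, with the returned rate applied to whatever admits requests into the stage (for example a token bucket).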


Rate Limit With Class/Priority

Class/priority assignment based on:
• IP address, header information, HTTP cookies

I ask:
• Where is the priority assignment module implemented?
• Should priority assignment be a stage of its own?
• Is it not shown because it complicates the diagram and makes the stage design not “clean”?
• How to classify which requests are potentially “bottleneck” requests? Application dependent?
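
One possible shape for a priority assignment module, written as a standalone function so it could sit in a stage of its own. Everything here is hypothetical: the `Request` type, the `tier` cookie, the `X-Priority` header, and the premium address list are illustrative names, not part of SEDA or the paper.

```python
from dataclasses import dataclass, field


@dataclass
class Request:
    ip: str
    headers: dict = field(default_factory=dict)
    cookies: dict = field(default_factory=dict)


PREMIUM_IPS = frozenset({"10.0.0.1"})  # hypothetical allow-list


def classify(req):
    """Map a request to a class: 2 = high, 1 = medium, 0 = low."""
    if req.cookies.get("tier") == "gold" or req.ip in PREMIUM_IPS:
        return 2
    if req.headers.get("X-Priority") == "high":
        return 1
    return 0


def admit(req, min_class):
    """Reject lower classes first: only classes >= min_class are admitted."""
    return classify(req) >= min_class
```

Under rising load the controller would raise `min_class`, so class 0 requests are rejected before class 1 and class 2. Identifying "bottleneck" requests would need application-specific rules beyond these three fields.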


Quality/Service Degradation
• Notify application via signal to DO service degradation.
• Application does service degradation, not SEDA

Questions:
• How is the signaling implemented?
  • Out of band?
• Is it possible to signal previous stages in the pipeline? Will this break SEDA’s loose-coupling design?

[Figure: pipeline stages “Attach image” and “Send response”, with a “signal” label]


Experiments


Experiments - Setup

• Arashi email server (realistic experiment)
  • Real access workload
  • Real email content
  • Admission control
• Web server benchmark
  • Service degradation + 1-class admission control


Experiments – Admission Rate

Controller response time is not as fast.

[Figure annotations: additive increase; multiplicative decrease]


Experiments – Response Time

[Figure annotations: two points on the plot marked “Why?”]


Experiments – Massive Load Spike

Not fair! SEDA’s parameters were fine-tuned. Apache can be tuned to stay flat too.


Experiments – Service Degradation

Service degradation and admission control kick in at roughly the same time


Experiments – Service Differentiation

• Average reject rates without service differentiation:
  • Low-priority: 55.5%
  • High-priority: 57.6%
• With service differentiation:
  • Low-priority: 87.9% (+32.4 percentage points)
  • High-priority: 48.8% (-8.8 percentage points)

Question:
• Why is the drop rate for high-priority requests reduced so little with service differentiation? Workload dependent?


Comments


Comments
• No idea what the controller’s overhead is
• Overload control at node level is not good:
  • Node level is inefficient
    • Late rejection
  • Node level is not user friendly:
    • All session state is gone if you get a reject out of the blue ← comes without warning
  • Need global level overload control scheme
• Idea/concept is explained in 2.5 pages


Comments
• Rejected requests:
  • Instead of TCP timeout, send static page.
    • (Paper says) this is better
    • (I say) This is worse because it leads to an out-of-memory crash down the road:
      • Saturated output bandwidth
      • Boundless queue at reject handler
• Parameters:
  • How to tune them? How difficult to tune?
  • May be tedious tuning each stage manually.
  • Given a 1M stage application, need to configure all 1M stage thresholds manually?
  • Automated tuning with control theory?
• Methodology of adding extensions is not shown in any figures.


Comments
• Experiment is not entirely realistic:
  • Inter-request think time is 20ms
    • realistic?
  • Rejected users have to re-login after 5 min:
    • All state information is gone
    • Frustrated users
• Two drawbacks of using response time for load detection…


Comments

1. No idea which resource is the bottleneck: CPU? I/O? Network? Memory?
  • SEDA can only either:
    • Do admission control
      • Reduces throughput
    • Tell application to degrade overall service


Comments

Default admission control:

[Figure: resource utilization of CPU, I/O, Network, and Memory against a threshold; pipeline stages “Attach image” and “Send response”; labels “OVERLOADED!!!” and “Reject requests”]

… and piss off some users.


Comments

Service degradation WITH bottleneck intelligence:

[Figure: resource utilization of CPU, I/O, Network, and Memory against a threshold]

Network is the bottleneck, so expend some CPU and memory to reduce fidelity and size of images to reduce bandwidth consumption WITHOUT reducing admission rate.
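
A small sketch of what "bottleneck intelligence" could look like, assuming per-resource utilization is available as fractions between 0 and 1. The resource keys, the 0.9 threshold, and the action strings are all hypothetical; the paper's controller only sees response time.

```python
def choose_action(utilization, threshold=0.9):
    """Pick a response to overload based on which resources exceed the threshold."""
    over = sorted(name for name, used in utilization.items() if used > threshold)
    if not over:
        return "serve normally"
    if over == ["network"]:
        # Only the network is saturated, so CPU and memory have headroom:
        # spend them on reducing image fidelity and size instead of rejecting.
        return "shrink images"
    # Several resources (or a non-network one) are saturated: fall back to
    # the default behaviour of rejecting requests.
    return "reject requests"


# Hypothetical snapshot matching the slide: only the network is saturated.
snapshot = {"cpu": 0.40, "io": 0.30, "network": 0.97, "memory": 0.50}
action = choose_action(snapshot)
```

The point of the sketch is that the admission rate stays untouched in the network-only case, which a response-time-only signal cannot distinguish from the other cases.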


Comments
2. The response time index is lagging by at least the magnitude of the response time
  • 50 requests come in all at once
  • nreq = 100
  • timeout = 10s
  • target = 20s
  • Processing time per request = 1s
  • Detects overload after 30s

Solution:
• Compare enqueue VS dequeue rate
• Overload occurs when enqueue rate > dequeue rate
• Detects overload after 10s
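
The 30s versus 10s figures can be checked with a short simulation. It is a simplified model under stated assumptions: all 50 requests arrive at t = 0, a single server handles them one at a time, the controller only runs on the 10s timeout (nreq = 100 is never reached with 50 requests), and the 90th percentile is computed by nearest rank over the requests completed in each window.

```python
def detect_by_response_time(n_requests=50, service_time=1.0,
                            timeout=10.0, target=20.0):
    """Time at which the windowed 90th-percentile response time first exceeds target."""
    # Request k arrives at t=0 and completes at k * service_time, so its
    # response time equals its completion time.
    completions = [k * service_time for k in range(1, n_requests + 1)]
    t = timeout
    while t <= completions[-1]:
        window = sorted(c for c in completions if t - timeout < c <= t)
        if window:
            rank = (9 * len(window) + 9) // 10  # ceil(0.9 * n), nearest rank
            if window[rank - 1] > target:
                return t
        t += timeout
    return None


def detect_by_queue_rates(n_requests=50, service_time=1.0, timeout=10.0):
    """Time at which the enqueue rate first exceeds the dequeue rate in a window."""
    t = timeout
    enq_seen = deq_seen = 0
    while t <= n_requests * service_time:
        enqueued = n_requests  # everything arrived at t=0
        dequeued = min(n_requests, int(t // service_time))
        enq_rate = (enqueued - enq_seen) / timeout
        deq_rate = (dequeued - deq_seen) / timeout
        if enq_rate > deq_rate:
            return t
        enq_seen, deq_seen = enqueued, dequeued
        t += timeout
    return None
```

Under these assumptions the response-time check first fires at the third timeout and the rate comparison at the first, which matches the two figures on the slide.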


Questions?