EVERYBODY LIES TOMASZ KOWALCZEWSKI
Page 1: Everybody Lies

EVERYBODY LIES, TOMASZ KOWALCZEWSKI

Page 2: Everybody Lies

CARGO CULT

During the Middle Ages there were all kinds of crazy ideas, such as that a piece of rhinoceros horn would increase potency. Then a method was discovered for separating the ideas - which was to try one to see if it worked, and if it didn't work, to eliminate it. This method became organized, of course, into science. And it developed very well, so that we are now in the scientific age. It is such a scientific age, in fact, that we have difficulty in understanding how witch doctors could ever have existed, when nothing that they proposed ever really worked - or very little of it did.

Richard Feynman

From a Caltech commencement address given in 1974

Page 3: Everybody Lies

WHY BOTHER?

• You get what you measure

- Ineffective optimisations that complicate code

+ Numbers to convince management to do refactoring or migration to Java 8!

Page 4: Everybody Lies

WHY BOTHER?

• Predictable is better than fast

• One page display requires multiple calls (static and dynamic resources)

• Multiple microservices are called to generate response

• During a session user may do hundreds of displays of your webpages

Page 5: Everybody Lies

WHY DO THIS?

• Every 100 ms increase in load time of Amazon.com decreased sales by 1% [1]

• Increasing web search latency 100 to 400 ms reduces the daily searches per user by 0.2% to 0.6%. Furthermore, users do fewer searches the longer they are exposed. For longer delays, the loss of searches persists for a time even after latency returns to previous levels. [2]

[1] Kohavi and Longbotham 2007

[2] Brutlag 2009

Page 6: Everybody Lies

SURVEY

• Do you…

Page 11: Everybody Lies

SURVEY

• Use graphite?

• Feed it with Coda Hale/Dropwizard metrics?

• Modify their source? Use nonstandard options?

• Graph average? Median?

• Percentiles?

Page 12: Everybody Lies

(c) xkcd.com

Page 13: Everybody Lies

WHAT METRICS CAN WE USE?

graphite.send(prefix(name, "max"), ...);
graphite.send(prefix(name, "mean"), ...);
graphite.send(prefix(name, "min"), ...);
graphite.send(prefix(name, "stddev"), ...);
graphite.send(prefix(name, "p50"), ...);
graphite.send(prefix(name, "p75"), ...);
graphite.send(prefix(name, "p95"), ...);
graphite.send(prefix(name, "p98"), ...);
graphite.send(prefix(name, "p99"), ...);
graphite.send(prefix(name, "p999"), ...);

Page 14: Everybody Lies

DON'T LOOK AT MEAN

• 1000 queries - 0ms latency, 100 queries - 5s latency

• Average is ~454.5ms

• 1000 queries - 1ms latency, 100 queries - 5s latency

• Average is ~455ms

• Does not help to quantify lags users will experience
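The two scenarios above can be checked with a few lines of Java. This is a minimal sketch; the class and method names (`MeanHidesTail`, `sample`, `fractionSlowerThan`) are made up for illustration and are not part of any metrics library.

```java
import java.util.Arrays;

final class MeanHidesTail {
    /** Arithmetic mean of latencies given in milliseconds. */
    static double mean(long[] latenciesMs) {
        return Arrays.stream(latenciesMs).average().orElse(Double.NaN);
    }

    /** Builds a sample with fastCount requests at fastMs and slowCount requests at slowMs. */
    static long[] sample(int fastCount, long fastMs, int slowCount, long slowMs) {
        long[] values = new long[fastCount + slowCount];
        Arrays.fill(values, 0, fastCount, fastMs);
        Arrays.fill(values, fastCount, values.length, slowMs);
        return values;
    }

    /** Fraction (0..1) of requests slower than thresholdMs. */
    static double fractionSlowerThan(long[] latenciesMs, long thresholdMs) {
        long slow = Arrays.stream(latenciesMs).filter(v -> v > thresholdMs).count();
        return (double) slow / latenciesMs.length;
    }

    public static void main(String[] args) {
        long[] a = sample(1000, 0, 100, 5000); // 1000 queries at 0 ms, 100 at 5 s
        long[] b = sample(1000, 1, 100, 5000); // 1000 queries at 1 ms, 100 at 5 s
        System.out.printf("mean A = %.1f ms, mean B = %.1f ms%n", mean(a), mean(b));
        System.out.printf("requests slower than 1 s: %.1f%%%n", 100 * fractionSlowerThan(a, 1000));
    }
}
```

Both means land around 455 ms, a latency that no request in either sample actually had, while roughly 9% of requests took 5 s.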

Page 15: Everybody Lies

– ANSCOMBE'S QUARTET BY FRANCIS ANSCOMBE

These four data sets all have nearly identical mean, variance, correlation, and linear regression line, yet look completely different when plotted

Page 16: Everybody Lies

PLOTTING MEAN IS FOR SHOWING OFF TO MANAGEMENT

Page 17: Everybody Lies

MAYBE MEDIAN THEN?

• What is the probability of an end user encountering latency worse than the median?

• Remember: usually multiple requests are needed to respond to an API call (e.g. N microservices, N resource requests per page)

Probability (in %) that all N requests are better than the median:

$\left(\frac{1}{2}\right)^{N} \cdot 100$
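A small sketch of that probability for N = 1 to 10. It assumes the N requests are independent, which is a simplification; the class and method names are hypothetical.

```java
final class AllRequestsFast {
    /**
     * Probability (0..1) that all n independent requests are faster than the
     * given percentile, e.g. 50 for the median or 99 for the 99th percentile.
     */
    static double probabilityAllBetterThan(double percentile, int n) {
        return Math.pow(percentile / 100.0, n);
    }

    public static void main(String[] args) {
        for (int n = 1; n <= 10; n++) {
            System.out.printf("N=%2d  all better than median: %6.2f%%%n",
                    n, 100 * probabilityAllBetterThan(50, n));
        }
    }
}
```

With five requests per page, the chance that every one of them beats the median is already down to about 3%.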

Page 18: Everybody Lies

PROBABILITY OF EXPERIENCING LATENCY BETTER THAN MEDIAN AS A FUNCTION OF MICROSERVICES INVOLVED

[Plot: probability in percent (y-axis, 10 to 100) against number of microservices involved (x-axis, 0 to 10)]

Page 19: Everybody Lies

WHICH PERCENTILE IS RELEVANT TO YOU?

• Is the 99th percentile a demanding constraint?

• In an application serving 1000 qps, latency worse than the 99th percentile happens ten times per second.

• A user who needs to navigate through several web pages will most probably experience it

• What is the probability of encountering latency better than the 99th percentile on all N requests?

$\left(\frac{99}{100}\right)^{N} \cdot 100$
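To put a number on "most probably", the sketch below finds how many requests it takes before a user is more likely than not to hit at least one response slower than the 99th percentile. It assumes independent requests; `TailExposure` and its methods are illustrative names only.

```java
final class TailExposure {
    /** Probability (0..1) that at least one of n independent requests is slower than the given percentile. */
    static double probabilityAtLeastOneWorse(double percentile, int n) {
        return 1.0 - Math.pow(percentile / 100.0, n);
    }

    /** Smallest n for which probabilityAtLeastOneWorse(percentile, n) reaches the target probability. */
    static int requestsUntilLikely(double percentile, double target) {
        int n = 1;
        while (probabilityAtLeastOneWorse(percentile, n) < target) {
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        int n = requestsUntilLikely(99, 0.5);
        System.out.printf("After %d requests the chance of a worse-than-p99 response is %.1f%%%n",
                n, 100 * probabilityAtLeastOneWorse(99, n));
    }
}
```

Under these assumptions the threshold is 69 requests, which a page with many static and dynamic resources can reach within a handful of page views.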

Page 20: Everybody Lies

PROBABILITY OF EXPERIENCING LATENCY BETTER THAN 99TH PERCENTILE AS A FUNCTION OF MICROSERVICES INVOLVED

[Plot: probability in percent (y-axis, 0 to 100) against number of microservices involved (x-axis, 0 to 100)]

Page 21: Everybody Lies

DO NOT AVERAGE PERCENTILES

Example scenario:

1. Load balancer splits traffic unevenly (ELB anyone?)

2. Server S1 has 1 qps over measured time with 95%’ile == 1ms

3. Server S2 has 100 qps over measured time with 95%’ile == 10s

4. Average of the two 95%’iles is ~5s.

5. What does that tell us?

6. Did we satisfy SLA if it says “95%’ile must be below 8s”?

7. Actual 95%’ile over all requests is ~10s
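The scenario can be reproduced with synthetic data. The sketch below assumes a 60 second measurement window and, for simplicity, that every request on a server has the same latency; the nearest-rank percentile helper and all names are illustrative, not taken from any library.

```java
import java.util.Arrays;

final class DoNotAveragePercentiles {
    /** Nearest-rank percentile of the given values (percentile in (0, 100]). */
    static long percentile(long[] values, double percentile) {
        long[] sorted = values.clone();
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(percentile / 100.0 * sorted.length);
        return sorted[Math.max(rank, 1) - 1];
    }

    /** Array of `count` identical latencies. */
    static long[] constant(int count, long valueMs) {
        long[] values = new long[count];
        Arrays.fill(values, valueMs);
        return values;
    }

    /** Concatenates the raw samples of two servers. */
    static long[] concat(long[] a, long[] b) {
        long[] merged = Arrays.copyOf(a, a.length + b.length);
        System.arraycopy(b, 0, merged, a.length, b.length);
        return merged;
    }

    public static void main(String[] args) {
        long[] s1 = constant(60, 1);          // S1: 1 qps for 60 s, every request 1 ms
        long[] s2 = constant(6000, 10_000);   // S2: 100 qps for 60 s, every request 10 s
        double averaged = (percentile(s1, 95) + percentile(s2, 95)) / 2.0;
        long actual = percentile(concat(s1, s2), 95);
        System.out.printf("average of p95s = %.1f ms, p95 of all requests = %d ms%n", averaged, actual);
    }
}
```

The averaged figure passes an 8 s SLA check while the percentile computed over the merged samples does not, because averaging ignores how many requests each server handled.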

Page 22: Everybody Lies

– ALICE'S ADVENTURES IN WONDERLAND

“If there's no meaning in it,” said the King, “that saves a world of trouble, you know, as we needn't try to find any.”

Page 23: Everybody Lies

Every time you average max values, someone in the world starts a new JavaScript framework

Page 24: Everybody Lies

Demo time

Page 25: Everybody Lies

metricRegistry.timer("2015.standardTimer");

Standard timer will over- or under-report actual percentiles at will.

Green line represents actual MAX values.

Page 27: Everybody Lies

TIMER’S HISTOGRAM RESERVOIR

• Backing storage for Timer’s data

• Contains a “statistically representative reservoir of a data stream”

• Default is ExponentiallyDecayingReservoir, which has many drawbacks and is the source of most inaccuracies observed throughout this presentation

• Others include

• UniformReservoir, SlidingTimeWindowReservoir, SlidingWindowReservoir

Page 28: Everybody Lies

EXPONENTIALLY DECAYING RESERVOIR

• Stores 1028 random samples by default

• Assumes normal distribution of recorded values

• Many statistical tools applied in computer systems monitoring will assume normal distribution

• Be suspicious of such tools

• Why is that a bad idea?

Page 29: Everybody Lies

NORMAL DISTRIBUTION - WHY SO USEFUL?

• Central limit theorem

• Chebyshev's inequality

$f(x, \mu, \sigma) = \frac{1}{\sigma\sqrt{2\pi}} e^{-\frac{(x-\mu)^2}{2\sigma^2}}$

[Plot: normal distribution density curve; x-axis from -2.4 to 2.4, y-axis from 0.5 to 3]

Page 30: Everybody Lies

CALCULATE 95%’ILE BASED ON MEAN AND STD. DEV.

• IFF latency values were normally distributed, we could calculate any percentile from the mean and standard deviation

µ = 10 ms, σ = 1 ms

• Look up the standard normal (Z) table

• 95%’ile is located 1.65 std. dev. above the mean

• Result is 11.65 ms

[Plot: normal distribution curve; x-axis from 10 to 12, y-axis from -0.25 to 1]
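Written out as a worked equation, the lookup amounts to the following. The z-score 1.645 is the usual table value for the 95th percentile of the standard normal distribution; the slide rounds it to 1.65.

```latex
\[
p_{95} = \mu + z_{0.95}\,\sigma
       = 10\,\text{ms} + 1.645 \cdot 1\,\text{ms}
       = 11.645\,\text{ms} \approx 11.65\,\text{ms}
\]
```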

Page 31: Everybody Lies

Latency profile resembling normal distribution…

Page 32: Everybody Lies

Add spikes due to young gen GC pauses

Page 33: Everybody Lies

Add spikes due to old gen GC pauses

Page 34: Everybody Lies

Add spikes due to calling other services (like DB)

Page 35: Everybody Lies

Add spikes due to lost TCP packet retransmission, disk swapping, kernel bookkeeping, etc.

Page 36: Everybody Lies

NORMAL DISTRIBUTION - WHY NOT APPLICABLE?

• The value of the normal distribution is practically zero when the value x lies more than a few standard deviations away from the mean.

• It may not be an appropriate model when one expects a significant fraction of outliers

• […] other statistical inference methods that are optimal for normally distributed variables often become highly unreliable when applied to such data. [1]

$f(x, \mu, \sigma) = \frac{1}{\sigma\sqrt{2\pi}} e^{-\frac{(x-\mu)^2}{2\sigma^2}}$

[Plot: normal distribution density curve; x-axis from -2.4 to 2.4, y-axis from 0.5 to 3]

[1] All quotes on this slide from Wikipedia
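A sketch of how far off a normal-distribution estimate can be once pause spikes are present. The data set is hypothetical (900 requests at 10 ms plus 100 requests stalled at 200 ms), and the class and method names are invented for the example.

```java
import java.util.Arrays;

final class NormalAssumptionCheck {
    /** Hypothetical sample: 900 requests at 10 ms and 100 requests hit by a 200 ms pause. */
    static long[] spikySample() {
        long[] values = new long[1000];
        Arrays.fill(values, 0, 900, 10);
        Arrays.fill(values, 900, 1000, 200);
        return values;
    }

    static double mean(long[] values) {
        return Arrays.stream(values).average().orElse(Double.NaN);
    }

    /** Population standard deviation. */
    static double stdDev(long[] values) {
        double mean = mean(values);
        double sumSq = 0;
        for (long v : values) {
            sumSq += (v - mean) * (v - mean);
        }
        return Math.sqrt(sumSq / values.length);
    }

    /** 95th percentile as it would be derived from mean and std. dev. under a normality assumption. */
    static double normalP95Estimate(long[] values) {
        return mean(values) + 1.645 * stdDev(values);
    }

    /** 95th percentile taken directly from the sorted data (nearest-rank). */
    static long actualP95(long[] values) {
        long[] sorted = values.clone();
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(0.95 * sorted.length);
        return sorted[rank - 1];
    }

    public static void main(String[] args) {
        long[] data = spikySample();
        System.out.printf("normal-based p95 estimate: %.1f ms, actual p95: %d ms%n",
                normalP95Estimate(data), actualP95(data));
    }
}
```

For this sample the estimate comes out near 123 ms, a latency that never occurred, while the 95th percentile read from the data is 200 ms.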

Page 37: Everybody Lies

Blue line represents metric reported from Timer class. Green line represents request rate.

Page 38: Everybody Lies

TIMER, TIMER NEVER CHANGES…

• Timer values decay exponentially

• giving artificial smoothing of values for server behaviour that may be long gone

• Timer that is not updated does not decay

• If Timer is not updated (e.g. subprocess failed and we stopped sending requests to it) its values will remain constant

• Check this post for potential solutions: taint.org/2014/01/16/145944a.html

Page 39: Everybody Lies

HDRHISTOGRAM

• Supports recording and analysis of sampled data across configurable range with configurable accuracy

• Provides compact representation of data while retaining high resolution

• Allows configurable tradeoffs between space and accuracy

• Very fast, allocation free, not thread safe for maximum speed (thread safe versions available)

• Created by Gil Tene of Azul Systems

Page 40: Everybody Lies

RECORDER

• Uses HdrHistogram to store values

• Supports concurrent recording of values

• Recording is lock-free, and also wait-free on most architectures (those that support lock xadd)

• Reading is not lock-free but does not stall writers (writer-reader phaser)

• Check out Marshall Pierce’s library for using it as a Reservoir implementation

Page 41: Everybody Lies

SOLUTIONS

• Always instantiate Timer with custom reservoir

• new ExponentiallyDecayingReservoir(LARGE_NUMBER)

• new SlidingTimeWindowReservoir(1, MINUTES)

• new HdrHistogramResetOnSnapshotReservoir()

• Only the last one is safe and accurate, and will not report stale values if no updates were made

Page 42: Everybody Lies

JMH benchmarks (from my laptop, caveat emptor!)

Page 43: Everybody Lies

SMOKING BENCHMARKING IS THE LEADING CAUSE OF STATISTICS IN THE WORLD

Page 44: Everybody Lies

COORDINATED OMISSION

• As formulated by Gil Tene of Azul Systems

• When the load driver is plotting with the system under test to deceive you

• Most tools do this

• Most benchmarks do this

• Yahoo Cloud Serving Benchmark had that problem [1]

[1] Recently fixed by Nitsan Wakart, see psy-lob-saw.blogspot.com/2015/03/fixing-ycsb-coordinated-omission.html

Page 45: Everybody Lies

[Plot: latency (y-axis) against request arrival time (x-axis), with the application pause time marked]

Requests according to test plan. Only the red one will be sent. The others will be missing from the test.

Page 46: Everybody Lies

– CREATED WITH GIL TENE'S HDRHISTOGRAM PLOTTING SCRIPT

Effects on benchmarks at high percentiles are spectacular

Page 47: Everybody Lies

COORDINATED OMISSION SOLUTIONS

1. Ignore the problem!

Perfectly fine for non-interactive systems where only throughput matters

Page 48: Everybody Lies

COORDINATED OMISSION SOLUTIONS

2. Correct it mathematically in sampling mechanism

HdrHistogram can correct CO with these methods (choose one!):

histogram.recordValueWithExpectedInterval( value, expectedIntervalBetweenSamples );

histogram.copyCorrectedForCoordinatedOmission( expectedIntervalBetweenSamples );
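The idea behind this kind of correction can be sketched without the library: when a recorded value is larger than the expected interval between samples, the requests that should have been sent during the stall are filled in with linearly decreasing latencies. The code below is a standalone illustration of that idea under this assumption, not HdrHistogram's implementation; the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

final class CoordinatedOmissionCorrection {
    /**
     * Returns the recorded value plus synthetic samples for the requests that
     * would have been issued, one per expected interval, while the long
     * request was still outstanding.
     */
    static List<Long> withExpectedInterval(long value, long expectedInterval) {
        List<Long> samples = new ArrayList<>();
        samples.add(value);
        if (expectedInterval <= 0) {
            return samples; // nothing to correct without a positive interval
        }
        for (long missing = value - expectedInterval;
             missing >= expectedInterval;
             missing -= expectedInterval) {
            samples.add(missing);
        }
        return samples;
    }

    public static void main(String[] args) {
        // One request took 1000 ms while the test plan called for a request every 100 ms.
        System.out.println(withExpectedInterval(1000, 100));
    }
}
```

A single 1000 ms observation becomes ten samples (1000, 900, ..., 100 ms), which is why the high percentiles shift so much after correction.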

Page 49: Everybody Lies

COORDINATED OMISSION SOLUTIONS

3. Correct it on load driver side

By noticing pauses between sent requests.

A newly issued request will have a timer that starts counting from the time it should have been sent but wasn't
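A deterministic simulation can show the difference between timing from the actual send and timing from the intended send. The sketch assumes a single-threaded, closed-loop load driver with a fixed request interval and one server pause; all names and numbers are hypothetical.

```java
final class IntendedStartTimeDriver {
    /**
     * Simulates a closed-loop driver. Returns two arrays: latencies measured
     * from the actual send time (index 0) and from the intended send time (index 1).
     */
    static long[][] simulate(int requests, long intervalMs, long serviceMs,
                             long pauseStartMs, long pauseEndMs) {
        long[] naive = new long[requests];
        long[] corrected = new long[requests];
        long previousCompletion = 0;
        for (int i = 0; i < requests; i++) {
            long intendedStart = i * intervalMs;
            // The driver cannot send before the previous response has arrived.
            long actualSend = Math.max(intendedStart, previousCompletion);
            boolean stalled = actualSend >= pauseStartMs && actualSend < pauseEndMs;
            long processingStart = stalled ? pauseEndMs : actualSend;
            long completion = processingStart + serviceMs;
            naive[i] = completion - actualSend;
            corrected[i] = completion - intendedStart;
            previousCompletion = completion;
        }
        return new long[][] {naive, corrected};
    }

    /** Number of values strictly above the threshold. */
    static long countAbove(long[] values, long threshold) {
        long count = 0;
        for (long v : values) {
            if (v > threshold) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // 20 requests planned every 100 ms, 1 ms service time, server paused from 0 to 1000 ms.
        long[][] result = simulate(20, 100, 1, 0, 1000);
        System.out.println("slow requests, timed from actual send:   " + countAbove(result[0], 10));
        System.out.println("slow requests, timed from intended send: " + countAbove(result[1], 10));
    }
}
```

Timed from the actual send, only the one request caught in the pause looks slow; timed from the intended send, eleven of the twenty do.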

Page 50: Everybody Lies

COORDINATED OMISSION SOLUTIONS

4. Fail the test

For hard real-time systems where a pause causes human casualties (brakes, pacemakers, Phalanx system)

Page 51: Everybody Lies

COORDINATED OMISSION

• Mathematical solutions can overcorrect when the load driver has pauses (e.g. GC).

• They do not account for the fact that after a pause the server in the test has no work to do, instead of N more requests waiting to be executed

• In the real world it might never have recovered

• Most tools ignore the problem

• Notable exception: Twitter Iago

Page 52: Everybody Lies

– LOAD DRIVER MOTTO

“Do not bend to the tyranny of reality”

Page 53: Everybody Lies

SUMMARY

• Measure what is meaningful not just what is measurable

• Set SLA before testing and creating dashboards

• Do not trust the Timer class; use custom reservoirs, HdrHistogram, Recorder; never trust EWMA for request rate

• Do not average percentiles unless you need a random number generator

• Do not plot averages unless you just want to look good on dashboards

• When load testing be aware of coordinated omission

Page 54: Everybody Lies

SOURCES, THANK YOUS AND RECOMMENDED FOLLOW UPS

• Coda Hale for great metrics library

• Gil Tene

• latencytipoftheday.blogspot.de

• www.infoq.com/presentations/latency-pitfalls

• github.com/HdrHistogram/HdrHistogram

• Nitsan Wakart

• psy-lob-saw.blogspot.de/2015/03/fixing-ycsb-coordinated-omission.html

• and whole blog

• Martin Thompson et al.

• groups.google.com/forum/#!forum/mechanical-sympathy

Page 55: Everybody Lies

RECOMMENDED

Great introduction to statistics and queueing theory.

Performance Modeling and Design of Computer Systems: Queueing Theory in Action

Prof. Mor Harchol-Balter

Page 56: Everybody Lies

FEEDBACK KINDLY REQUESTED

https://www.surveymonkey.com/s/B5KGWWN