Understanding Microservice Performance Rob Harrop

Jan 08, 2017

Page 1: Understanding Microservice Performance

Understanding Microservice Performance

Rob Harrop

Page 2: Understanding Microservice Performance

The performance of a distributed system is the combined performance of its collaborating services and their communication links

Page 3: Understanding Microservice Performance

Services and aggregations of services

Page 4: Understanding Microservice Performance

Who am I?

▸ CTO @ Skipjaq
  ▸ ML-driven performance optimisation

▸ Co-founder of SpringSource

▸ Once upon a time I…
  ▸ Contributed to Spring Framework
  ▸ Wrote a book about Spring
  ▸ Talked a lot about Spring

Page 5: Understanding Microservice Performance

Who am I?

▸ I’m on Twitter:
  ▸ @robertharrop

▸ I’m on Github:
  ▸ github.com/robharrop

▸ I write about maths and performance
  ▸ https://robharrop.github.io

If you have questions after the session, {grab, tweet} me.

Page 6: Understanding Microservice Performance

Agenda

Page 7: Understanding Microservice Performance

After this talk you will know how to:

▸ Measure performance correctly

▸ Find potential performance disasters

▸ Identify the best candidates for optimisation

▸ Model complex microservice systems

▸ Forecast system scalability

Page 8: Understanding Microservice Performance
Page 9: Understanding Microservice Performance

What is performance?

Page 10: Understanding Microservice Performance

What do we mean by performance?

How fast can I do a thing, and how many things can I do every period?

Page 11: Understanding Microservice Performance

What measures performance?

Latency (how fast) and Throughput (how many)

Page 12: Understanding Microservice Performance

Throughput

▸ The rate of processing: x per y

▸ Requests per second

▸ Records per minute

▸ Messages per second

▸ Tasks per day

Page 13: Understanding Microservice Performance

Latency

▸ Time taken… for something
  ▸ Service time?
  ▸ First byte?
  ▸ First response complete?
  ▸ Last byte?
  ▸ Render?

▸ Moral of the story: define what you mean by latency

Page 14: Understanding Microservice Performance

Measuring

Page 15: Understanding Microservice Performance

This is where everything goes wrong

Page 16: Understanding Microservice Performance

Crib Sheet

▸ Record timestamped requests with observed latency and success/error

▸ Throughput
  ▸ Min, max, mean
  ▸ Varying time windows (10s, 30s, 1m, 5m, …)

▸ Latency
  ▸ Min, max, 95th, 99th, 99.9th and other tail percentiles
  ▸ Mean just means meaningless
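
A minimal sketch of the crib sheet in Python. The records, the field layout and the helper names (`percentile`, `throughput_per_window`) are all hypothetical, and the nearest-rank percentile is just one of several common definitions.

```python
import math
from collections import Counter

# Hypothetical records: (timestamp in seconds, latency in ms, success flag).
records = [
    (0.2, 12.0, True), (0.9, 15.0, True), (1.4, 11.0, True),
    (1.5, 250.0, False), (2.1, 13.0, True), (2.8, 14.0, True),
    (3.3, 12.5, True), (3.7, 900.0, True), (4.0, 13.5, True),
    (4.6, 12.2, True),
]


def percentile(values, pct):
    """Nearest-rank percentile: smallest value with at least pct% of samples at or below it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def throughput_per_window(records, window_s):
    """Requests per second in each fixed-size time window, in time order."""
    counts = Counter(int(ts // window_s) for ts, _, _ in records)
    n_windows = max(counts) + 1
    return [counts.get(w, 0) / window_s for w in range(n_windows)]


latencies = [latency for _, latency, _ in records]
rates = throughput_per_window(records, window_s=2)
error_rate = sum(1 for _, _, ok in records if not ok) / len(records)

print("throughput min/max/mean:", min(rates), max(rates), sum(rates) / len(rates))
print("latency min/max:", min(latencies), max(latencies))
print("latency p50/p95/p99:", [percentile(latencies, p) for p in (50, 95, 99)])
print("latency mean:", sum(latencies) / len(latencies))
print("error rate:", error_rate)
```

In this made-up sample the mean latency sits far above the median because two slow requests dominate it, which is the point of "mean just means meaningless".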

Page 17: Understanding Microservice Performance

We need to talk about latency

▸ Latency isn’t exponentially-distributed
  ▸ And it certainly isn’t normally-distributed

▸ Latency distributions have heavy tails

▸ Latency distributions are multi-modal

▸ Customers see tail latencies way more than you think
  ▸ Don’t let percentiles trick you

▸ Understand what latency means to your business

Page 18: Understanding Microservice Performance
Page 19: Understanding Microservice Performance

Tail Latencies

Page 20: Understanding Microservice Performance

Is my customer getting good service?

Page 21: Understanding Microservice Performance

What does this mean in reality?

Page 22: Understanding Microservice Performance

Is my customer getting good service?

95th percentile

42 requests

Page 23: Understanding Microservice Performance

Most customers will see tail latencies
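
A rough sketch of the arithmetic behind this claim. It assumes the "42 requests" on the previous slide is the number of requests one customer makes, and that each request independently has a 95% chance of beating the 95th percentile; both are assumptions, and the function name is hypothetical.

```python
def prob_sees_tail(percentile, requests):
    """Chance that at least one of `requests` independent requests is slower than the given percentile."""
    p_fast = percentile / 100
    return 1 - p_fast ** requests


print(f"p95, 42 requests: {prob_sees_tail(95, 42):.1%}")
print(f"p99, 42 requests: {prob_sees_tail(99, 42):.1%}")
```

Under those assumptions roughly 88% of customers hit at least one request slower than the 95th percentile, and about a third hit one slower than the 99th.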

Page 24: Understanding Microservice Performance

Visualising Tail Latencies

Page 25: Understanding Microservice Performance

Are latency and throughput useful when considered in isolation?

Page 26: Understanding Microservice Performance

Attribution: http://www.cowboyjedi.com/comics/2010-03-10-i-made-the-kessel.gif

Page 27: Understanding Microservice Performance

Little’s Law

Page 28: Understanding Microservice Performance

Queueing Theory - Little’s Law
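
The formula on this slide did not survive extraction. Little’s Law in its standard form is L = λW: the mean number of requests in the system equals the arrival rate times the mean time each request spends in the system. A small sketch follows; the function names and the example figures are hypothetical.

```python
def mean_in_flight(throughput_per_s, mean_latency_s):
    """Little's Law: L = lambda * W."""
    return throughput_per_s * mean_latency_s


def max_throughput(concurrency_limit, mean_latency_s):
    """Little's Law rearranged: lambda = L / W, for a fixed concurrency limit."""
    return concurrency_limit / mean_latency_s


# Hypothetical service: 200 requests/s at a mean latency of 50 ms.
print("requests in flight:", mean_in_flight(200, 0.050))
# Hypothetical pool of 16 workers, each request taking 50 ms on average.
print("throughput ceiling (req/s):", max_throughput(16, 0.050))
```

The law holds for long-run averages of a stable system and says nothing about percentiles.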

Page 29: Understanding Microservice Performance

Queueing Theory - M/M/1 Queue

Page 30: Understanding Microservice Performance

What can we conclude?

▸ Latency and throughput work in tandem

▸ At high throughput, latency degrades considerably
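
A sketch of why latency degrades, using the textbook M/M/1 result that mean time in system is R = S / (1 − ρ), where S is the mean service time and ρ = λS is the utilisation. The 10 ms service time is a made-up figure and the function name is hypothetical.

```python
def mm1_residence_time(service_time_s, arrival_rate_per_s):
    """Mean time in an M/M/1 queue (waiting plus service): R = S / (1 - rho)."""
    utilisation = arrival_rate_per_s * service_time_s
    if utilisation >= 1:
        raise ValueError("queue is unstable at utilisation >= 1")
    return service_time_s / (1 - utilisation)


SERVICE_TIME_S = 0.010  # hypothetical 10 ms mean service time

for utilisation in (0.5, 0.8, 0.9, 0.95, 0.99):
    arrival_rate = utilisation / SERVICE_TIME_S
    latency_ms = mm1_residence_time(SERVICE_TIME_S, arrival_rate) * 1000
    print(f"utilisation {utilisation:.0%}: mean latency {latency_ms:.0f} ms")
```

In this model mean latency is twice the service time at 50% utilisation and grows without bound as utilisation approaches 100%. Real services are rarely M/M/1, so treat the curve as a shape, not a prediction.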

Page 31: Understanding Microservice Performance

High utilisation is an early-warning sign

Page 32: Understanding Microservice Performance

Latency Stacking

Page 33: Understanding Microservice Performance

Service latencies stack

▸ For simple cases (feed-forward networks), latencies are additive
  ▸ Analytical models are available
  ▸ http://robharrop.github.io/maths/performance/2016/03/15/queue-networks.html

▸ For most interesting cases this cannot be assumed
  ▸ Simulation is the best option
  ▸ Pretty Damn Quick (PDQ) is a great tool, but requires a chunk of effort
  ▸ Guesstimate is great for quick and dirty models
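
A quick-and-dirty Monte Carlo sketch of latency stacking in plain Python (not PDQ or Guesstimate). The three services, their lognormal latency parameters and the independence between them are all assumptions chosen for illustration.

```python
import math
import random


def percentile(values, pct):
    """Nearest-rank percentile of a list of samples."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def simulate(n_trials=10_000, seed=1):
    """Draw per-service latencies and combine them serially and as a parallel fan-out."""
    rng = random.Random(seed)
    # Hypothetical services: (median latency in ms, lognormal sigma).
    services = {"auth": (5.0, 0.5), "catalog": (20.0, 0.8), "pricing": (10.0, 1.0)}
    samples = {name: [] for name in services}
    serial, fan_out = [], []
    for _ in range(n_trials):
        draw = {name: rng.lognormvariate(math.log(median), sigma)
                for name, (median, sigma) in services.items()}
        for name, value in draw.items():
            samples[name].append(value)
        serial.append(sum(draw.values()))   # calls made one after another
        fan_out.append(max(draw.values()))  # calls made in parallel, wait for the slowest
    return samples, serial, fan_out


samples, serial, fan_out = simulate()
for name, values in samples.items():
    print(f"{name:8s} p50={percentile(values, 50):7.1f}  p99={percentile(values, 99):7.1f}")
print(f"serial   p50={percentile(serial, 50):7.1f}  p99={percentile(serial, 99):7.1f}")
print(f"fan-out  p50={percentile(fan_out, 50):7.1f}  p99={percentile(fan_out, 99):7.1f}")
```

The sketch ignores queueing and correlation between services, which is where tools such as PDQ earn their effort.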

Page 34: Understanding Microservice Performance

Analytical Stacking

Page 35: Understanding Microservice Performance

Simulating Latency Stacking

Page 36: Understanding Microservice Performance
Page 37: Understanding Microservice Performance

Amdahl’s Law

Page 38: Understanding Microservice Performance

How much improvement can we get from an optimisation?

Page 39: Understanding Microservice Performance

Amdahl’s Law

▸ Theoretical improvement in latency given a fixed workload

$$S = \frac{1}{(1 - p) + \frac{p}{s}}$$

▸ $S$: theoretical max system speedup

▸ $s$: speedup of part under optimisation

▸ $p$: percentage of execution time in part under optimisation
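
A small sketch of Amdahl’s Law used to compare optimisation candidates. It uses the standard form, overall speedup = 1 / ((1 − p) + p / s), with p the fraction of execution time in the part being optimised and s the speedup of that part. The two candidate services and their numbers are hypothetical.

```python
def amdahl_speedup(fraction, part_speedup):
    """Overall speedup when `fraction` of execution time is made `part_speedup` times faster."""
    return 1 / ((1 - fraction) + fraction / part_speedup)


# Hypothetical candidates: (share of request time, achievable speedup of that part).
candidates = {
    "big service, modest win": (0.60, 1.5),
    "small service, huge win": (0.10, 10.0),
}

for name, (fraction, part_speedup) in candidates.items():
    print(f"{name}: overall speedup {amdahl_speedup(fraction, part_speedup):.2f}x")
```

With these made-up numbers, a 1.5x improvement to the service holding 60% of the time beats a 10x improvement to the service holding 10%.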

Page 40: Understanding Microservice Performance

Amdahl’s Law in the Limit

▸ Theoretical max is limited by parts of the system not under improvement

$$\lim_{s \to \infty} S = \frac{1}{1 - p}$$

▸ $S$: theoretical max system speedup

▸ $s$: speedup of part under optimisation

▸ $p$: percentage of execution time in part under optimisation

Page 41: Understanding Microservice Performance

Thinking about Amdahl’s Law

Page 42: Understanding Microservice Performance

[Figure: maximum theoretical system speedup; labelled values 1.11, 1.25, 1.43, 1.67]
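
The four labelled values are consistent with the limiting form of Amdahl’s Law, 1 / (1 − p), for parts taking 10%, 20%, 30% and 40% of execution time. Which part of the original figure carried which fraction is not recoverable, so the mapping below is an assumption.

```python
def amdahl_limit(fraction):
    """Upper bound on overall speedup when the optimised part becomes infinitely fast."""
    return 1 / (1 - fraction)


for fraction in (0.1, 0.2, 0.3, 0.4):
    print(f"part takes {fraction:.0%} of the time: at most {amdahl_limit(fraction):.2f}x overall")
```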

Page 43: Understanding Microservice Performance

Drive optimisation choices by service utilisation

Page 44: Understanding Microservice Performance

Universal Scalability Law

Page 45: Understanding Microservice Performance

Can increasing capacity reduce performance?

Page 46: Understanding Microservice Performance

TL;DR - Yes

Page 47: Understanding Microservice Performance

Universal Scalability Law

$$C(N) = \frac{N}{1 + \alpha (N - 1) + \beta N (N - 1)}$$

▸ $C(N)$: relative capacity

▸ $N$: number of users

▸ $\alpha$: contention overhead

▸ $\beta$: crosstalk overhead
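
A sketch of the Universal Scalability Law in its usual form, C(N) = N / (1 + α(N − 1) + βN(N − 1)), with α the contention overhead and β the crosstalk overhead. The coefficient values are invented for illustration and the function names are hypothetical.

```python
import math


def usl_capacity(n, contention, crosstalk):
    """Relative capacity at n users under the Universal Scalability Law."""
    return n / (1 + contention * (n - 1) + crosstalk * n * (n - 1))


def peak_users(contention, crosstalk):
    """Load at which capacity peaks: N* = sqrt((1 - alpha) / beta)."""
    return math.sqrt((1 - contention) / crosstalk)


ALPHA, BETA = 0.05, 0.001  # hypothetical overheads

print(f"capacity peaks near N = {peak_users(ALPHA, BETA):.1f}")
for n in (1, 10, 31, 60, 100):
    print(f"N = {n:3d}: relative capacity {usl_capacity(n, ALPHA, BETA):.2f}")
```

Past the peak, adding users (or nodes) lowers capacity, which is the "yes" to the question of whether more capacity can reduce performance. With β = 0 the curve only flattens and never turns down.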

Page 48: Understanding Microservice Performance

Coherence in Action

Page 49: Understanding Microservice Performance

Visualising the USL

Page 50: Understanding Microservice Performance

Measure crosstalk and target it for optimisation
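
One simple way to estimate crosstalk from load-test data, offered as a sketch and not as the method used in the talk. It assumes the usual USL form C(N) = N / (1 + α(N − 1) + βN(N − 1)), which rearranges to (N / C − 1) / (N − 1) = α + βN for N > 1, so a straight-line fit gives α as the intercept and β as the slope. The measurements here are synthetic and the function names are hypothetical.

```python
def usl_capacity(n, contention, crosstalk):
    """Relative capacity at n users under the Universal Scalability Law."""
    return n / (1 + contention * (n - 1) + crosstalk * n * (n - 1))


def fit_usl(measurements):
    """Least-squares fit of (n, relative_capacity) pairs; returns (contention, crosstalk)."""
    points = [(n, (n / c - 1) / (n - 1)) for n, c in measurements if n > 1]
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in points)
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Synthetic "measurements" generated from known overheads, standing in for load-test results.
# Relative capacity means throughput at N users divided by throughput at 1 user.
measurements = [(n, usl_capacity(n, 0.03, 0.0005)) for n in (1, 2, 4, 8, 16, 32, 64)]

contention, crosstalk = fit_usl(measurements)
print(f"contention = {contention:.4f}, crosstalk = {crosstalk:.6f}")
```

Real measurements are noisy, and this linearised fit gives extra weight to low-N points, so a proper nonlinear regression may be preferable in practice.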

Page 51: Understanding Microservice Performance
Page 52: Understanding Microservice Performance
Page 53: Understanding Microservice Performance

Summary

▸ Measurements are critical
  ▸ Garbage in, garbage out

▸ Monitor utilisation for early-warning of disaster
  ▸ Little’s Law

▸ Monitor latency per-user, not just per-request

▸ Select optimisation targets carefully
  ▸ Amdahl’s Law

▸ Monitor crosstalk to forecast scalability
  ▸ Universal Scalability Law

Page 54: Understanding Microservice Performance

Reading List and Q&A

▸ Release It! - Michael Nygard

▸ Systems Performance - Brendan Gregg

▸ Guerrilla Capacity Planning - Dr. Neil Gunther

▸ Practical Scalability Analysis - Baron Schwartz