Soroban: Attributing latency in virtualized environments
Lucian Carata, James Snee, Oliver R.A. Chick, Ripduman Sohan, Ramsey Faragher, Andrew Rice, Andy Hopper
Mar 28, 2018
Tail latency increases when you virtualize software.
But who's responsible?
1 / 10
Attribution as a cloud problem
● Typical issues:
– tail latency
– perf variability & isolation
– workload colocation side-effects
● Measurement only tells half of the story
Attribution gives an in-depth view of the time spent on an action, for individual requests. Example breakdown of one 15ms request:
- 7ms due to workload interference (cloud)
- 5ms due to concurrent requests (VM)
- 3ms processing the request (VM)
2 / 10
● Attribution needs:
- Precise measurements
- Context (relating different measurements)
- Inference
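The 15ms breakdown above is additive: the per-cause components sum to the measured latency, and each one is tagged with the layer responsible for it. A small Python sketch of that bookkeeping (the record layout is hypothetical, not Soroban's data format):

```python
# Hypothetical per-request attribution record, using the example numbers
# from the slide: (cause, responsible layer, milliseconds).
breakdown = [
    ("workload interference", "cloud", 7.0),
    ("concurrent requests", "VM", 5.0),
    ("processing request", "VM", 3.0),
]

def summarize(breakdown):
    """Return the total latency (ms) and each cause's share of it."""
    total = sum(ms for _, _, ms in breakdown)
    shares = {cause: ms / total for cause, _, ms in breakdown}
    return total, shares

def by_layer(breakdown):
    """Sum attributed latency per layer (e.g. cloud vs. VM)."""
    layers = {}
    for _, layer, ms in breakdown:
        layers[layer] = layers.get(layer, 0.0) + ms
    return layers

total, shares = summarize(breakdown)
print(total)                 # 15.0
print(by_layer(breakdown))   # {'cloud': 7.0, 'VM': 8.0}
print(round(shares["workload interference"], 3))  # 0.467
```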
Attribution based on measurement
● Setup:
– Lighttpd (instrumented with the Soroban API)
– 16 concurrent VMs, low contention (periodic CPU load, low network load)
● Goal:
– determine how much the concurrent workload slows down each request
● Start from a simple idea (Soroban-enabled measurement):
– look at the time the lighttpd VM was scheduled out during each request
[Figure: requests S1, S2, S3]
3 / 10
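The simple idea on this slide amounts to intersecting each request's time interval with the periods during which the VM was descheduled. A minimal Python sketch, assuming a hypothetical trace of request start/end times and non-overlapping scheduled-out intervals (all in ms):

```python
from typing import List, Tuple

Interval = Tuple[float, float]  # (start, end) in ms

def overlap(a: Interval, b: Interval) -> float:
    """Length of the intersection of two intervals (0 if disjoint)."""
    return max(0.0, min(a[1], b[1]) - max(a[0], b[0]))

def descheduled_time(requests: List[Interval], sched_out: List[Interval]) -> List[float]:
    """For each request, total time the VM was scheduled out while it was in flight.

    Assumes the scheduled-out intervals do not overlap each other.
    """
    return [sum(overlap(req, off) for off in sched_out) for req in requests]

# Hypothetical trace: requests S1, S2, S3 and the periods in which the
# hypervisor had the lighttpd VM descheduled.
requests = [(0.0, 4.0), (5.0, 9.0), (10.0, 12.0)]
sched_out = [(2.0, 3.0), (6.0, 10.5)]
print(descheduled_time(requests, sched_out))  # [1.0, 3.0, 0.5]
```

Concurrent (overlapping) requests are each charged the full overlap, which is one reason this per-request number alone does not say who caused the delay.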
Soroban
● Precise measurement API
– Activity boundaries
● Kernel module
– Provides measurement context
– Aggregation
● Inference
– Machine learning to determine attribution model
5 / 10
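To make "activity boundaries" and "aggregation" concrete, here is a user-space Python sketch. `ActivityRecorder`, `activity()` and `aggregate()` are hypothetical names, not the Soroban API; the real kernel module collects much richer per-activity data (scheduling, event channels, PMU counters) than the wall-clock latency and caller-supplied counters recorded here.

```python
import time
from collections import defaultdict
from contextlib import contextmanager

class ActivityRecorder:
    """Hypothetical stand-in for an activity-boundary API plus aggregation."""

    def __init__(self):
        self.records = []  # one dict per completed activity

    @contextmanager
    def activity(self, name):
        # Entering the block marks the activity start, leaving it marks the end.
        counters = defaultdict(float)
        start = time.perf_counter()
        try:
            yield counters  # caller may attach per-activity measurements
        finally:
            lat_ms = (time.perf_counter() - start) * 1000.0
            self.records.append({"name": name, "lat_ms": lat_ms, **counters})

    def aggregate(self, key):
        """Mean of `key` over all recorded activities that report it."""
        vals = [r[key] for r in self.records if key in r]
        return sum(vals) / len(vals) if vals else None

rec = ActivityRecorder()
for i in range(3):
    with rec.activity("GET /index.html") as c:
        c["events"] += i  # hypothetical per-activity counter
print(len(rec.records), rec.aggregate("events"))  # 3 1.0
```

Making boundaries explicit in the application is what lets low-level measurements be grouped per request rather than per time window.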
Soroban
[Figure: Soroban architecture, training phase]
● Measurement
– The application marks activity boundaries through the Soroban API
– The Soroban kernel module (Soroban .ko, in the Linux kernel) collects data from Xen / containers (Sched (virt), event channels, yields / blocks) and from the kernel (Sched (kernel), Perf (PMU), Subsys, kernel data)
– Output: per-activity data (sched, ev, yield, …, lat)
● Training
– Compare a reference latency profile (bare metal) with measurements in the virtualized environment using the quantile-quantile difference
– Example: at the 75th %tile, 23 ms virtualized vs. 10 ms on bare metal gives a virt-influence of 13 ms
– The labelled data (sched, ev, yield, …, lat, virt-influence) trains a Gaussian Process regression model
6 / 10
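The quantile-quantile difference labels each virtualized measurement with a virt-influence value: a sample at quantile q of the virtualized latency distribution is compared with the bare-metal latency at the same q. A stdlib-only Python sketch, assuming linear interpolation between order statistics and the plotting position q = rank/(n-1); both are choices made for this sketch, not necessarily Soroban's.

```python
import math
from typing import List, Sequence

def quantile(sorted_xs: Sequence[float], q: float) -> float:
    """Linearly interpolated quantile of an already-sorted sample, 0 <= q <= 1."""
    pos = q * (len(sorted_xs) - 1)
    lo, hi = math.floor(pos), math.ceil(pos)
    return sorted_xs[lo] + (sorted_xs[hi] - sorted_xs[lo]) * (pos - lo)

def qq_difference(virt: Sequence[float], bare: Sequence[float]) -> List[float]:
    """Label each virtualized latency with its virt-influence estimate.

    Labels are returned in the same order as `virt`.
    """
    bare_sorted = sorted(bare)
    n = len(virt)
    order = sorted(range(n), key=lambda i: virt[i])  # indices by latency
    labels = [0.0] * n
    for rank, i in enumerate(order):
        q = rank / (n - 1) if n > 1 else 0.5
        labels[i] = virt[i] - quantile(bare_sorted, q)
    return labels

# Hypothetical latency samples (ms)
bare = [4.0, 6.0, 8.0, 10.0, 12.0]
virt = [23.0, 11.0, 40.0, 15.0, 19.0]
print(qq_difference(virt, bare))  # [13.0, 7.0, 28.0, 9.0, 11.0]
```

With these samples the 75th-percentile virtualized latency (23 ms) is paired with the bare-metal 75th percentile (10 ms), giving the 13 ms virt-influence from the slide.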
Soroban
[Figure: Soroban architecture, attribution phase]
● Measurement: the Soroban API and kernel module produce per-activity data (sched, ev, yield, …, lat) in the virtualized environment
● Attribution: the trained Gaussian Process regression model predicts (virt-influence, confidence) for each activity, separating:
– hypervisor-induced latency
– containerisation-induced latency
7 / 10
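Gaussian Process regression returns both a posterior mean and a variance, which is one natural source of a (virt-influence, confidence) pair. The sketch below is a from-scratch GP with a fixed RBF kernel (the hyperparameters `length`, `scale` and `noise` are picked by hand, not fitted), plus a hypothetical `attribute()` helper that splits a measured latency using the prediction. Each feature vector holds a single hypothetical feature (descheduled ms); Soroban's features are (sched, ev, yield, …).

```python
import math
from typing import List, Sequence, Tuple

Vec = Sequence[float]

def rbf(a: Vec, b: Vec, length: float, scale: float) -> float:
    """Squared-exponential (RBF) kernel."""
    d2 = sum((x - y) ** 2 for x, y in zip(a, b))
    return scale ** 2 * math.exp(-d2 / (2.0 * length ** 2))

def cholesky(A: List[List[float]]) -> List[List[float]]:
    """Lower-triangular L with A = L L^T (A symmetric positive definite)."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(A[i][i] - s) if i == j else (A[i][j] - s) / L[j][j]
    return L

def solve_lower(L, b):
    """Solve L x = b by forward substitution."""
    x: List[float] = []
    for i in range(len(b)):
        x.append((b[i] - sum(L[i][k] * x[k] for k in range(i))) / L[i][i])
    return x

def solve_upper_t(L, b):
    """Solve L^T x = b by back substitution."""
    n = len(b)
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (b[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x

class GPRegressor:
    def __init__(self, X, y, length=1.0, scale=5.0, noise=0.1):
        self.X, self.length, self.scale = X, length, scale
        self.y_mean = sum(y) / len(y)  # the GP models deviations from the mean
        K = [[rbf(a, b, length, scale) + (noise ** 2 if i == j else 0.0)
              for j, b in enumerate(X)] for i, a in enumerate(X)]
        self.L = cholesky(K)
        centred = [v - self.y_mean for v in y]
        self.alpha = solve_upper_t(self.L, solve_lower(self.L, centred))  # K^-1 (y - mean)

    def predict(self, x: Vec) -> Tuple[float, float]:
        """Posterior mean and std of the latent function at x (noise excluded)."""
        k = [rbf(x, xi, self.length, self.scale) for xi in self.X]
        mean = self.y_mean + sum(ki * ai for ki, ai in zip(k, self.alpha))
        v = solve_lower(self.L, k)
        var = rbf(x, x, self.length, self.scale) - sum(vi * vi for vi in v)
        return mean, math.sqrt(max(var, 0.0))

def attribute(lat_ms: float, virt_mean: float, virt_std: float) -> dict:
    """Hypothetical helper: split a latency into virt-induced and remaining time."""
    def clamp(t: float) -> float:
        return min(max(t, 0.0), lat_ms)  # attribution must lie in [0, lat]
    virt = clamp(virt_mean)
    return {
        "virt_ms": virt,
        "other_ms": lat_ms - virt,
        # approximate 95% interval under the GP's Gaussian posterior
        "virt_interval": (clamp(virt_mean - 1.96 * virt_std),
                          clamp(virt_mean + 1.96 * virt_std)),
    }

# Hypothetical training set: descheduled time (ms) -> virt-influence label (ms)
X = [[0.0], [1.0], [2.0], [3.0]]
y = [0.5, 3.0, 6.0, 9.5]
gp = GPRegressor(X, y)
m, s = gp.predict([1.5])
print(attribute(20.0, m, s))
```

Far from the training data the predicted std grows back towards `scale`, so low-confidence attributions can be flagged rather than reported as exact.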
Limitations
● Needs applications to disclose activity boundaries
● Requires a training phase (bare-metal reference latency profile)
● Depends on virtualization isolation properties
10 / 10
Discussion
● Moving to multi-hop requests / actions
● Automating app instrumentation
● Cloud provider transparency
● Finer-grained charging?
www.cl.cam.ac.uk/rscfl