Thunderbolt: Throughput-Optimized, QoS-Aware Power Capping at Scale
Motivation: power oversubscription and capping
$200+B worldwide spend on data centers
Power oversubscription: more capacity without construction
Power capping: protective system that shaves power spikes
Motivation: task QoS differentiation
Google’s cluster scheduler (task-level QoS)
Requires task-level control
Goal: Task QoS-aware capping that gently throttles throughput-oriented tasks and exempts latency-sensitive tasks
Prior industry solutions did not meet our needs
Either task QoS-aware but with a disruptive capping action...
...or gentle throttling but with coarser-grained QoS differentiation
Thunderbolt’s contributions
01 Power safety with minimized performance degradation
02 Task-level QoS differentiation
03 Hardware platform independence
04 Tolerance of power telemetry unavailability
Architecture
“Reactive capping” primary subsystem: used when power telemetry is available.
“Proactive capping” failover subsystem: used when power telemetry is unavailable.
Architecture
[Figure: architecture diagram. Components: Thunderbolt service, meters, node controllers, meter watcher, power notifier, risk assessor, machine manager. Data stores: power topology data, power history data. Labeled flows: power readings, power topology, power history, risk assessment, throttling decision, throttling RPCs.]
“Reactive capping” primary subsystem: “load shaping” policy with “CPU bandwidth control” mechanism
Architecture
“Proactive capping” failover subsystem: “CPU jailing” mechanism
Mechanism and policy details
Reactive capping mechanism: CPU bandwidth control
Linux kernel feature
Task-level CPU cap
Example machine (period = 100 ms)
cgroup 1 (quota = 70 ms): task 1
cgroup 2 (quota = 90 ms): task 2
If the machine has 2 logical CPUs, then its CPU utilization is capped at
(70 + 90) / (100 * 2) = 80%
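The arithmetic on this slide can be reproduced with a minimal sketch. The function name `machine_cpu_cap` and its parameters are hypothetical and chosen only for illustration. The sketch assumes each cgroup's quota is the CPU time it may consume per period, summed across all logical CPUs.

```python
def machine_cpu_cap(quotas_ms, period_ms, num_logical_cpus):
    """Upper bound on machine CPU utilization implied by per-cgroup quotas.

    quotas_ms: CPU time each cgroup may use per period (hypothetical input).
    period_ms: length of the enforcement period.
    num_logical_cpus: logical CPUs on the machine.
    """
    if period_ms <= 0 or num_logical_cpus <= 0:
        raise ValueError("period and CPU count must be positive")
    # Total CPU time the machine can supply in one period.
    capacity_ms = period_ms * num_logical_cpus
    # A single cgroup cannot consume more than the whole machine.
    usable_ms = sum(min(q, capacity_ms) for q in quotas_ms)
    return min(1.0, usable_ms / capacity_ms)


# The example machine from the slide: two cgroups, 100 ms period, 2 logical CPUs.
cap = machine_cpu_cap([70, 90], period_ms=100, num_logical_cpus=2)
print(f"{cap:.0%}")  # 80%
```

In Linux cgroup v1, the corresponding per-cgroup settings are `cpu.cfs_quota_us` and `cpu.cfs_period_us`, expressed in microseconds rather than milliseconds.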
Why not RAPL or DVFS?
RAPL
DVFS
CPU bandwidth control’s native task-level control and platform independence are vital for scalability (DVFS may be added for future efficiency optimization where per-core control is supported)
CPU bandwidth control, DVFS, RAPL on Intel Skylake CPU
CPU power and set point
CPU power and throughput
Reactive capping policy: load shaping
Randomized unthrottling, multiplicative decrease
Two thresholds with two multipliers
QoS differentiation: exempting latency tasks
quota (ms) - 2 - 80 56
usage (ms) 200 2 100 70 50
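One way to read the policy above is sketched below. This is only an illustration under stated assumptions, not the production implementation. The names `next_quota_ms` and `unthrottle_delay_s`, the threshold parameters, and the multiplier values 0.8 and 0.01 are all hypothetical. The multipliers were chosen because they are consistent with the quota and usage figures on this slide (200 → 2, 100 → 80, 70 → 56). The sketch also assumes that the higher threshold triggers the harsher multiplier.

```python
import random


def next_quota_ms(usage_ms, power, low_threshold, high_threshold,
                  latency_sensitive=False, soft_mult=0.8, hard_mult=0.01):
    """Return the next CPU quota for a task, or None to leave it unthrottled.

    Multiplicative decrease: the new quota is the task's recent CPU usage
    scaled by a multiplier that depends on which power threshold is crossed.
    """
    if latency_sensitive:
        return None  # QoS differentiation: latency tasks are exempt.
    if power >= high_threshold:
        return usage_ms * hard_mult  # Assumed: aggressive cut near the limit.
    if power >= low_threshold:
        return usage_ms * soft_mult  # Assumed: gentle cut at the lower threshold.
    return None


def unthrottle_delay_s(max_delay_s, rng=random):
    """Randomized unthrottling: pick a random delay before lifting the quota,
    so that throttled tasks do not all resume at the same moment."""
    return rng.uniform(0.0, max_delay_s)


# Power above the high threshold, then between the two thresholds twice.
print(round(next_quota_ms(200, power=1.02, low_threshold=0.95, high_threshold=1.0), 6))
print(round(next_quota_ms(100, power=0.97, low_threshold=0.95, high_threshold=1.0), 6))
print(round(next_quota_ms(70, power=0.97, low_threshold=0.95, high_threshold=1.0), 6))
```

Basing the quota on recent usage rather than on the previous quota means that a task already running below its quota is still cut further while power stays above a threshold.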
Load shaping on a production cluster
Production cluster
Power utilization pattern
Load shaping on a production cluster
Failure of affected tasks
99th-percentile read latency of exempt storage service
Proactive capping mechanism: CPU jailing
Deterministic machine CPU cap
Relaxed QoS differentiation
[Figure: CPU mask of tasks with 20% CPU jailing (J = 0.2), drawn as 10 logical CPUs with a green/gray legend]
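The CPU mask in this example can be sketched as follows. The function name `allowed_cpus`, the rounding rule, and the choice to jail the highest-numbered CPUs are assumptions made for illustration. The slide does not say which CPUs are withheld.

```python
def allowed_cpus(num_logical_cpus, jail_fraction):
    """Return the logical CPU ids that tasks may run on under CPU jailing.

    jail_fraction (J) is the share of logical CPUs withheld from tasks,
    so machine CPU utilization is capped at roughly 1 - J.
    """
    if not 0.0 <= jail_fraction < 1.0:
        raise ValueError("jail_fraction must be in [0, 1)")
    # Assumption: round to the nearest whole CPU.
    num_jailed = round(num_logical_cpus * jail_fraction)
    # Assumption: the highest-numbered CPUs are the jailed ones.
    return list(range(num_logical_cpus - num_jailed))


mask = allowed_cpus(num_logical_cpus=10, jail_fraction=0.2)
print(mask)             # [0, 1, 2, 3, 4, 5, 6, 7]
print(len(mask) / 10)   # 0.8
```

On Linux, a mask like this could be applied to a process with `os.sched_setaffinity(pid, mask)`, although a production system would more likely go through its own machine-management layer.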
20% CPU jailing on a production cluster
Production cluster
Task failures
99th-percentile read latency of storage service
Proactive capping policy: risk assessment
Risk assessment using a probabilistic model
Assesses risk of reaching power limit
Deployed in logs processing clusters
Summary
Thank you