Variation Aware Application Scheduling and Power Management for Chip Multiprocessors
Radu Teodorescu* and Josep Torrellas
Computer Science Department, University of Illinois at Urbana-Champaign
http://iacoma.cs.uiuc.edu
*now at Ohio State University
Radu Teodorescu Variation-Aware Application Scheduling and Power Management
Technology scaling continues

[Figure: transistor size keeps shrinking while transistor counts keep growing - Pentium 3, Pentium 4, Core 2 Duo, quad-core Opteron]
Variation in transistor parameters

• Transistor parameters vary around their nominal values
• This affects frequency (switching speed), power (leakage power), and reliability
Variation components

• Die-to-die: whole dies differ - one die may have slower, less leaky transistors, another fast, leaky transistors
• Within-die: transistor parameters vary across a single die
Effects of within-die variation

• We model a 20-core CMP at 32nm
[Figure: 20-core CMP floorplan - cores C1-C20 with two shared L2 caches]

• CMPs: significant core-to-core variation in frequency and power
• On average, fastest vs. slowest core:
  • Frequency: 30%
  • Total power: 40%
  • Leakage power: 2X
• Design-identical cores will have significantly different properties
How can we exploit this variation?

• Current CMPs run at the frequency of the slowest core
• We can instead run each core at the maximum frequency it can achieve
  • 15% average frequency increase
  • Support present in AMD’s 4-core Opteron
• The result is a heterogeneous system
[Figure: CMP floorplans - the slowest core limits the whole chip today vs. per-core maximum frequencies]
Contributions
• Expose variation in core frequency and power to the OS
• Variation-aware application scheduling algorithms
• Variation-aware global power management subsystem
• On-line optimization algorithm that maximizes system performance at a power budget
• 12-17% CMP throughput improvement at the same power
Outline
• Variation-aware scheduling
• Variation-aware power management
• Defining the optimization problem
• Implementation
• Evaluation
• Conclusions
Variation-aware scheduling

Additional information to guide scheduling decisions:
• Per-core frequency and static power
• Application behavior:
  • Dynamic power consumption
  • Compute intensity (IPC)

Multiple possible goals:
• Reduce power
• Improve performance
Variation-aware scheduling

When the goal is to reduce power consumption:
• VarP: assign applications to low static power cores first
• VarP&AppP: assign applications with high dynamic power to low static power cores
[Figure: applications ranked by dynamic power (low to high) mapped onto the CMP floorplan]
Variation-aware scheduling

When the goal is to improve performance:
• VarF: assign applications to high frequency cores first
• VarF&AppIPC: assign high IPC applications to high frequency cores
[Figure: applications ranked by IPC (low to high) mapped onto the CMP floorplan]
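All four policies reduce to ranking cores by one metric, ranking applications by another, and pairing them off. A minimal sketch of that idea (the core frequencies, static powers, and per-application metrics below are invented for illustration, not numbers from the talk):

```python
def var_schedule(core_metric, app_metric, cores_ascending):
    """Pair the application with the highest metric to the 'best' core.
    cores_ascending=True  -> best core has the LOWEST core metric
                             (VarP&AppP: low static power first)
    cores_ascending=False -> best core has the HIGHEST core metric
                             (VarF&AppIPC: high frequency first)
    Assumes no more applications than cores; extras are dropped by zip."""
    ranked_cores = sorted(core_metric, key=core_metric.get,
                          reverse=not cores_ascending)
    ranked_apps = sorted(app_metric, key=app_metric.get, reverse=True)
    return dict(zip(ranked_apps, ranked_cores))

# VarF&AppIPC: high-IPC applications onto high-frequency cores
freq = {"C1": 3.6, "C2": 4.1, "C3": 3.9}        # GHz, made up
ipc  = {"gcc": 1.4, "mcf": 0.5, "bzip2": 1.1}   # made-up IPCs
assignment = var_schedule(freq, ipc, cores_ascending=False)
# gcc -> C2 (fastest), bzip2 -> C3, mcf -> C1 (slowest)
```

The same function covers the power-oriented policies by flipping the core ranking, e.g. `var_schedule(static_power, dynamic_power, cores_ascending=True)` for VarP&AppP.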
Variation-aware global power management

• Per-core dynamic voltage and frequency scaling (DVFS)
• Challenge: find the best (V,F) for each core
  • Core-level decisions are less effective in large CMPs
  • A global (CMP-wide) power management solution is needed
• Variation makes the problem more difficult
[Figure: a CMP power manager assigns a (V,F) pair to each of the 20 cores]
DVFS under variation

[Figure: normalized total power vs. normalized frequency as Vdd scales from 1V down to 0.6V]
Optimization problem

Given a mapping of threads to cores (variation-aware), find the best (Vi,Fi) for each core:
• Goal: maximize system throughput (MIPS)
• Constraint: keep total power below the budget (e.g. 50W, 75W, or 100W)
• Runtime system adaptation
Possible solutions

• Exhaustive search: too expensive
• Simulated annealing (SAnn): not practical at runtime
• Linear programming (LinOpt): simpler and faster, but requires some approximations
LinOpt problem definition

• Linear programming:
  • Maximize an objective function f(x1,...,xn), with x1,...,xn independent
  • Subject to constraints such as g(x1,...,xn) ≤ C
  • f and g are linear functions
• Unknowns: voltages V1,...,Vn for all cores
• Objective function: maximize CMP throughput
  • Throughput (MIPS) = Frequency × IPC = f(V1,...,Vn)
• Constraint: keep power under Ptarget
  • Power = g(V1,...,Vn)
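Because throughput and power are both expressed as linear functions of the per-core voltages, this LP has one global power constraint plus per-core voltage bounds. For that special shape the optimum can be computed greedily (a fractional-knapsack argument): fix every core at its minimum voltage, then spend the remaining budget on the cores with the best throughput-per-watt slope. A sketch with invented coefficients (the real LinOpt solves a general LP, with the approximations the talk mentions):

```python
def linopt_sketch(mips_per_volt, watts_per_volt, power_budget,
                  vmin=0.6, vmax=1.0):
    """Maximize sum(m[i]*V[i]) subject to sum(w[i]*V[i]) <= power_budget
    and vmin <= V[i] <= vmax.  With a single linear power constraint
    plus box bounds, the LP optimum is greedy: start every core at
    vmin, then raise voltage on the cores with the best MIPS-per-watt."""
    n = len(mips_per_volt)
    v = [vmin] * n
    budget = power_budget - vmin * sum(watts_per_volt)
    assert budget >= 0, "budget cannot sustain even Vmin on all cores"
    # Cores that buy the most throughput per watt go first.
    order = sorted(range(n),
                   key=lambda i: mips_per_volt[i] / watts_per_volt[i],
                   reverse=True)
    for i in order:
        headroom = (vmax - v[i]) * watts_per_volt[i]  # watts to reach vmax
        spend = min(headroom, budget)
        v[i] += spend / watts_per_volt[i]
        budget -= spend
    return v

# 4 cores; faster cores (more MIPS per volt) also burn more watts per volt
v = linopt_sketch(mips_per_volt=[900, 850, 800, 700],
                  watts_per_volt=[10, 9, 8, 7],
                  power_budget=30)
```

The worst MIPS-per-watt core is left at its minimum voltage once the budget runs out, which is exactly the behavior the variation-aware formulation is after: power goes where it buys the most throughput.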
LinOpt implementation

• LinOpt works together with the OS scheduler
  • The OS scheduler maps applications to cores (e.g. VarF&AppIPC)
  • LinOpt then finds the (V,F) settings for each core
• LinOpt runs periodically as a system process, on a core or on a power management unit (PMU) such as Foxton
• LinOpt uses profile information as input
LinOpt implementation

• Inputs:
  • Post-manufacturing profiling - per core: frequency, static power
  • Dynamic profiling - per application: dynamic power, IPC
  • Goal: the power target
• Output: the best (Vi,Fi) for each core
• LinOpt runs once per OS scheduling interval (10ms)
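One way to picture how these pieces fit together (every name here is a placeholder I introduced, not part of the talk): each scheduling interval merges the fixed post-manufacturing profile with the latest dynamic profile and hands the result to the optimizer, which returns per-core settings.

```python
def interval_step(static_profile, dynamic_profile, power_target, optimizer):
    """One 10ms power-management interval: combine the per-core static
    profile (frequency, static power) with the latest per-core dynamic
    profile (dynamic power, IPC) and ask the optimizer for (V, F)."""
    inputs = {
        core: {**static_profile[core], **dynamic_profile.get(core, {})}
        for core in static_profile
    }
    return optimizer(inputs, power_target)

def fixed_setting(inputs, power_target):
    # Toy stand-in for the real optimizer: same (V, F) everywhere.
    return {core: (0.8, 3.2) for core in inputs}

settings = interval_step(
    static_profile={"C1": {"fmax": 4.1, "pstatic": 2.0},
                    "C2": {"fmax": 3.7, "pstatic": 1.1}},
    dynamic_profile={"C1": {"pdyn": 3.0, "ipc": 1.2}},
    power_target=75.0,
    optimizer=fixed_setting)
```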
Evaluation infrastructure
• Process variation model: VARIUS [IEEE TSM’08]
  • Monte Carlo simulations for 200 chips
• SESC: cycle-accurate microarchitectural simulator
• SPICE model: leakage power
• HotSpot: temperature estimation
Evaluation infrastructure

• 20-core CMP
  • 2-issue, out-of-order cores
  • Shared L2 cache
  • 32nm technology, 4GHz
• Multiprogrammed workloads drawn from a pool of SPECint and SPECfp benchmarks
Variation-aware scheduling results

Goal: improve CMP throughput

[Figure: throughput (normalized MIPS) of Naive, VarF, and VarF&AppIPC for 2, 4, 8, 16, and 20 threads]

• VarF: up to 9% throughput improvement over Naive (9%, 7%, 5%, 2%, and 0% across the thread counts)
• VarF&AppIPC scales better with the number of threads: 5-10% improvement over Naive
Variation-aware power management

Global power management algorithms compared:
• Foxton+: baseline
• LinOpt: proposed scheme
• SAnn: approximate upper bound

• Goal: maximize throughput
• Constraint: keep power below the budget (75W)
Variation-aware power management results

[Figure: throughput (normalized MIPS) of Foxton+, LinOpt, and SAnn for 4, 8, 16, and 20 threads]

• LinOpt: 12-17% throughput improvement over Foxton+, at the same power
• LinOpt is within 2% of SAnn
• 30-38% reduction in ED²
Time overhead of LinOpt

[Figure: LinOpt runtime (microseconds) vs. number of threads (1-20), for 50W, 75W, and 100W power budgets]

• Low overhead even for large problem sizes
  • Up to 6 µs for 20 threads
• LinOpt runs on a core every 1-10 ms, so its impact is negligible
Conclusions

• We showed the value of exposing variation in core frequency and power to the OS
• Proposed a set of scheduling algorithms that:
  • reduce CMP power consumption (2-16%)
  • improve CMP throughput (5-10%)
• Proposed a power management algorithm that:
  • improves CMP throughput for a given power budget (12-17%)