Exploiting Dynamic Phase Distance Mapping for Phase-based Tuning of Embedded Systems
Tosiron Adegbija and Ann Gordon-Ross+
Department of Electrical and Computer Engineering, University of Florida, Gainesville, Florida, USA
+ Also affiliated with NSF Center for High-Performance Reconfigurable Computing
This work was supported by National Science Foundation (NSF) grant CNS-0953447
Introduction and Motivation
• Embedded systems are pervasive and have stringent design constraints
  – Constraints: energy, size, real time, cost, etc.
• System optimization is challenging due to numerous tunable parameters
  – Tunable parameters: parameters that can be changed
    • E.g., cache size, associativity, line size, clock frequency, etc.
  – Many combinations → large design space
  – Multicore systems result in an exponential increase in design space
  – Tradeoff design constraints
    • Design constraints (e.g., energy/performance)
    • Many Pareto optimal designs
• Tuner evaluates different configurations to determine the best configuration
[Figure: design tradeoff plot; labels: "Lowest energy", "Execution time"]
Dynamic Tuning
• Advantages
  – Adapts to the runtime operating environment and input stimuli
  – Specializes configurations to executing applications
• Disadvantages
  – Large design space
    • Challenging to determine the best configuration
  – Runtime tuning overhead (e.g., energy, power, performance)
    • Energy consumed and additional execution time incurred during tuning
[Figure: microprocessor with tunable cache and tuning hardware; labels: repeated "TC" blocks, "Download application"]
Phase-based Tuning
[Figure: Time-varying behavior of IPC, level one data cache hits, branch predictor hits, and power consumption for SPEC2000 gcc (using the integrate input set)]
[Figure: Energy consumption over time for base energy, application-tuned, and phase-tuned execution; markers show where the configuration changes]
Greater savings when tuning is phase-based, rather than application-based
• Applications have dynamic requirements during execution
  – Different phases of execution
• Tune when the phase changes, rather than when the application changes
  – Phase = a length of execution where application characteristics are relatively stable
• Phase tuning architecture consists of phase characterization hardware
  – Phase characterization determines phase’s best configuration
  – Phase characterization hardware orchestrates PDM
Contributions
• We present DynaPDM: dynamic phase distance mapping
  – Alleviates design time effort
    • Shorter time-to-market
  – Maximizes energy delay product (EDP) savings
    • Defines distance windows during runtime
    • Dynamically designates base phase and calculates configuration distances
    • Specializes distance windows to dynamic system and application behavior
  – Low overhead, dynamic method for determining phase’s best configuration
• DynaPDM evaluation
  – DynaPDM effectively determines best configuration with minimal designer effort
  – Applicable to general purpose embedded systems with disparate unknown applications (e.g., smartphones, tablets, etc.)
• DynaPDM compared to PDM
  – Quantify DynaPDM’s EDP improvement over PDM
    • Achieves higher EDP savings than PDM without extensive a priori application analysis
DynaPDM: Dynamic Phase Distance Mapping
Phase Characterization
[Figure: Phase characterization flow. Phase classification produces phases/phase characteristics. When phase P1 is executed, the phase history table (phase → configuration) is checked for P1's configuration, CP1. For a new phase, P1, DynaPDM determines the P1 configuration, CP1, which is stored in the table alongside phases P2 to P4 and the base phase, Pb, with its configuration, CPb.]
• DynaPDM is part of phase characterization
  – Determines a phase, Pi’s best configuration
• Base phase, Pb: used for comparison; best configuration determined a priori or at runtime
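DynaPDM
[Figure: DynaPDM flow. For a new phase, P1, the phase distance d(Pb, P1) is looked up in the distance window table, which maps distance windows (DW1, DW2, DW3) to configuration distances (CD1, CD2, CD3). The matching entry is used to calculate ConfigPi, which is stored in the phase history table (phase → configuration), and P1 executes in ConfigP1. If no window matches, a new DW is created and initialized.]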
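The flow in the figure can be sketched in Python as follows. This is only an illustration under stated assumptions, not the authors' implementation (the slides state that Perl scripts were used). The function name `configure_phase`, the dictionary-based tables, the integer window key, and the representation of a configuration distance as a per-parameter difference from the base configuration are all hypothetical. The callables `phase_distance` and `tune` are placeholders supplied by the caller.

```python
import math

def configure_phase(phase, history, windows, base_phase, base_config,
                    phase_distance, tune, s_d=0.5):
    """Return a configuration for `phase`, characterizing it if it is new."""
    if phase in history:
        # Known phase: reuse the configuration from the phase history table.
        return history[phase]

    # New phase: compute its distance from the base phase, d(Pb, Pi).
    d = phase_distance(base_phase, phase)
    # Hypothetical table key: index of the window of width s_d containing d.
    index = math.floor(d / s_d)

    if index not in windows:
        # No matching window: create it and initialize it by tuning.
        tuned = tune(phase)
        # Assumed representation of a configuration distance.
        windows[index] = {k: tuned[k] - base_config[k] for k in base_config}

    # Calculate the phase's configuration from the base configuration.
    cd = windows[index]
    config = {k: base_config[k] + cd[k] for k in base_config}
    history[phase] = config
    return config
```

In this sketch, a second phase whose distance falls into an already initialized window reuses that window's configuration distance and is never tuned, which is where the runtime overhead savings would come from.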
DynaPDM
• Creating new distance windows
  – Windows defined by upper bound WinU and lower bound WinL
    • Phase distance D maps to a distance window if: WinL < D < WinU
    • Distance window size Sd = WinU – WinL
  – Empirically determined Sd = 0.5 as generally suitable for embedded processor applications
    • If D < Sd, WinL = 0, and WinU = Sd
  – WinUmax determines maximum number of new distance windows
    • If D > WinUmax, D maps to WinUmax < D < ∞
    • Maximum number of distance windows = WinUmax/Sd
  – If Sd < D < WinUmax,
    • WinL = x | x ≤ D, x mod Sd = 0, x + Sd > D
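The window rules above amount to rounding D down to a multiple of Sd, with one open-ended window beyond WinUmax. A minimal Python sketch follows, using WinU = WinL + Sd, which follows from the definition of Sd. The function name `distance_window` and the default WinUmax value of 4.0 are assumptions for illustration only; the slides give no value for WinUmax. The slides also leave the boundary cases D = Sd and D = WinUmax unspecified; here D = WinUmax is placed in the open-ended window.

```python
import math

def distance_window(d, s_d=0.5, win_u_max=4.0):
    """Return (WinL, WinU) for phase distance d.

    win_u_max=4.0 is a hypothetical value chosen for illustration.
    """
    if d < 0:
        raise ValueError("phase distance must be non-negative")
    if d >= win_u_max:
        # Beyond the last regular window: WinUmax < D < infinity.
        return (win_u_max, math.inf)
    # Largest multiple of s_d that is <= d; this also yields 0 when d < s_d.
    win_l = math.floor(d / s_d) * s_d
    return (win_l, win_l + s_d)

def max_windows(s_d=0.5, win_u_max=4.0):
    """Maximum number of regular distance windows, WinUmax / Sd."""
    return round(win_u_max / s_d)
```

With these assumed defaults, a distance of 0.3 maps to the window (0.0, 0.5), a distance of 1.2 maps to (1.0, 1.5), and at most 8 regular windows can exist.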
DynaPDM
• Initializing distance windows (DW)
[Figure: Flowchart for initializing a new DW. New DW? → set the most recently used configuration as the first phase’s initial configuration → tune cache parameters (tune cache size, tune associativity, tune line size, each in power-of-two increments until EDP increases) → execution j ≤ n? Yes: continue tuning. No: stop tuning and store the lowest EDP configuration, which defines the DW’s configuration distance.]
• n allows a limited number of phase executions to hone configurations closer to the optimal
  – Trades off improved configuration efficacy for tuning overhead
  – Depends on system’s application/phase persistence
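The tuning loop in the flowchart can be sketched in Python as below. This is a hedged illustration, not the authors' method. The parameter ranges come from the experimental design space in the slides, but the function name `tune_phase`, the `measure_edp` callable, the choice to sweep each parameter upward from its smallest value, and the `budget` argument are assumptions. In particular, the slides do not say how many configurations are evaluated per phase execution, so `budget` simply caps the number of EDP measurements as a stand-in for the limit n.

```python
# Parameter ranges (min, max) from the slides' design space; names are hypothetical.
DESIGN_SPACE = {
    "size_kb": (2, 8),    # cache size in kB
    "assoc": (1, 4),      # 1 = direct-mapped
    "line_b": (16, 64),   # line size in bytes
}

def tune_phase(initial, measure_edp, budget):
    """Greedy per-parameter tuning; returns (best_config, best_edp)."""
    best = dict(initial)              # start from the most recently used config
    best_edp = measure_edp(best)
    evals = 1
    for param, (low, high) in DESIGN_SPACE.items():
        value, last_edp = low, None
        while value <= high and evals < budget:
            candidate = {**best, param: value}
            edp = measure_edp(candidate)
            evals += 1
            if last_edp is not None and edp > last_edp:
                break                 # EDP increased: stop tuning this parameter
            last_edp = edp
            if edp < best_edp:        # remember the lowest EDP configuration
                best, best_edp = candidate, edp
            value *= 2                # power-of-two increment
    return best, best_edp
```

A small budget returns a configuration close to the starting point, while a larger one lets the sweep reach more of the design space, which mirrors the tradeoff between configuration efficacy and tuning overhead that n controls.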
Experimental Results
Experimental Setup
• Experiments closely modeled on PDM
• Design space
  – Level 1 (L1) instruction and data caches: cache size (2 kB to 8 kB); line size (16 B to 64 B); associativity (direct-mapped to 4-way)
  – Base cache configuration:
    • L1 instruction and data caches: cache size (8 kB); line size (64 B); associativity (4-way)
• 16 workloads from EEMBC MultiBench Suite
  – Image processing, MD5 checksum, networking, Huffman decoding
  – Each workload represented a phase
• Simulations
  – GEM5 generated cache miss rates
  – McPAT calculated power consumption
• Energy delay product (EDP) as evaluation metric
  – EDP = system_power * (total_phase_cycles/system_frequency)^2
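The EDP metric above can be computed directly. The sketch below is a minimal Python illustration; the function names and the unit choices (watts, hertz) are assumptions, and `edp_savings` expresses the "savings with respect to the base configuration" used in the results as one minus the normalized EDP.

```python
def energy_delay_product(system_power_w, total_phase_cycles, system_frequency_hz):
    """EDP = power * delay^2, where delay = cycles / frequency."""
    delay_s = total_phase_cycles / system_frequency_hz
    return system_power_w * delay_s ** 2

def edp_savings(edp, base_edp):
    """Fractional EDP savings relative to the base configuration."""
    return 1.0 - edp / base_edp
```

For example, a configuration whose EDP is 0.72 of the base configuration's EDP corresponds to savings of about 28%.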
Experimental Setup
• Workloads
  – Distribution of application phases used in experimental setup

Application Type    Phases                  Number of phases
Image processing    1, 5, 13, 14, 15, 16    6
Networking          2, 3, 4, 8, 9           5
MD5 Checksum        10, 11, 12              3
Huffman Decoding    7                       1
Empty               6                       1

• Optimal configurations determined by exhaustive search
  – Used to compare DynaPDM and PDM results
• Perl scripts implemented DynaPDM and PDM
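An exhaustive search over the stated L1 design space can be sketched as follows. This is illustrative only: the slides report Perl scripts, not this code. It assumes that each parameter takes power-of-two values across its stated range and that every combination is a valid cache configuration, which the slides do not confirm. The names `all_configs`, `exhaustive_search`, and `measure_edp` are hypothetical.

```python
from itertools import product

SIZES_KB = (2, 4, 8)      # cache size, 2 kB to 8 kB
ASSOCS = (1, 2, 4)        # direct-mapped (1) to 4-way
LINES_B = (16, 32, 64)    # line size, 16 B to 64 B

def all_configs():
    """Enumerate every combination in the assumed design space."""
    return [
        {"size_kb": s, "assoc": a, "line_b": l}
        for s, a, l in product(SIZES_KB, ASSOCS, LINES_B)
    ]

def exhaustive_search(measure_edp):
    """Return the configuration with the lowest EDP."""
    return min(all_configs(), key=measure_edp)
```

Under these assumptions the space holds 27 configurations per cache, small enough to evaluate offline as a reference point but costly to explore at runtime for every phase.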
Results
- Used rotate-16x4Ms32w8 as base phase
- EDP savings calculated with respect to base configuration
- DynaPDM achieved 28% average EDP savings overall
- Savings as high as 47% for 64M-rotatew2
- On average, within 1% of optimal
- EDP improved over PDM by 8%
[Figure: EDP normalized to the base cache configuration (0 to 1) for Optimal, PDM, and DynaPDM. Workloads: rotate-16x4Ms32w8 (base phase), rotate-16x4Ms4w8, 64M-rotatew2, rotate-4Ms4w1, rotate-520k-270deg, rotate-color-4M-90degw1, 4M-check, 4M-reassembly, 4M-tcp-mixed, ippktcheck-8x4M-4Worker, ipres-6M4worker, md5-128M4worker, md5-32M4worker, md5-4M, empty-wld, huffde-all, and Average. Annotations: 28%; 47%.]
DynaPDM improved over PDM and eliminated design-time effort!
DynaPDM’s configurations within 1% of the optimal!
Results
- Effects of n = 3 on EDP normalized to optimal cache configurations
- n = maximum number of phase executions
[Figure: EDP normalized to the optimal cache configuration (0 to 1.6) for Optimal, j = 1, j = 2, and j = 3, where j is the number of phase executions. Workloads: 64M-rotatew2, rotate-520k-270deg, rotate-color-4M-90degw1, 4M-check, 4M-reassembly, ippktcheck-8x4M-4Worker, ipres-6M4worker, md5-128M4worker, md5-32M4worker, empty-wld. Annotation: DynaPDM determined optimal configurations.]
Determined optimal configurations in ≤ 3 executions!
Results
- Tradeoffs of Sd with percentage of phases tuned at runtime
[Figure, two panels: number of distance windows (0 to 10) versus distance window size, Sd (0.2 to 1); percentage EDP savings (25% to 30%) versus distance window size, Sd (0.2 to 1)]