High-Level Synthesis, TLM Power State Machines, and advanced tracing for Virtual Platforms
Philipp A. Hartmann, OFFIS Institute for Information Technology, R&D Division Transportation
Quo Vadis, Virtual Platforms? QVVP’2012, Dresden, 16 March 2012
- Extra-functional model for timing and power
  - Explicit separation of functional and extra-functional model
  - Activity model for power
  - Scalable physical/technology power model for frequency, supply voltage, and temperature
- Automatic timing and power annotation techniques
  - Embedded Software: Timing & Power annotation based on cross-compiled binary
  - Custom Hardware: Timing & Power annotation from power-aware HL-synthesis
  - Black-Box Hardware IP: Power State Machines instead of power annotation
- Scalable Timing and Power Tracing infrastructure
  - Timing and Power Tracing Streams per observable VP component
  - Processing through filters (e.g. aggregation, averaging, selection)
  - Dynamic granularity (e.g. area of interest)
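To illustrate how an activity model can be kept separate from a scalable technology model, the following minimal sketch uses the textbook CMOS relation P_dyn = C_sw · V_dd² · f. The relation and the names `OperatingPoint` and `dynamic_power` are assumptions made for illustration; they are not taken from the presented tooling, and temperature dependence is left out.

```cpp
#include <cassert>

// Hypothetical operating point of a component (illustrative names).
struct OperatingPoint {
    double vdd_volt;  // supply voltage in V
    double freq_hz;   // clock frequency in Hz
};

// Textbook CMOS dynamic power: switched capacitance per cycle * Vdd^2 * f.
// The activity model supplies switched_cap_farad, the technology model
// supplies the operating point. Temperature effects are not modelled here.
inline double dynamic_power(double switched_cap_farad, const OperatingPoint& op) {
    assert(switched_cap_farad >= 0.0 && op.vdd_volt >= 0.0 && op.freq_hz >= 0.0);
    return switched_cap_farad * op.vdd_volt * op.vdd_volt * op.freq_hz;
}
```

Because the activity (switched capacitance) is an input, the same recorded activity can be re-evaluated for another operating point, for example `dynamic_power(10e-12, {0.5, 50e6})`, without re-running the functional model.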
Slide 5: Virtual Platform Power and Timing Annotation Flow / General Overview

[Figure: annotation flow. Executable specification: (a) parallel application description, (b) application scenario input stimuli, (c) user-constrained HW/SW separation & mapping, (d) architecture/platform description. Estimation & model generation: (e) Hardware/Software task separation, (f) Hardware & Software estimation, quick synthesis, and functional, power & timing model generation for HW tasks and SW tasks (BAC++), (g) pre-existing IP & virtual component models (with PSM), (h) virtual system generator with TLM2 interface synthesis, using the mapping information. Simulation: (i) timing & power aware executable virtual system prototype in SystemC, (j) dynamic power over time.]

- Executable specification
  a) Task graph model of application
  b) Application scenario stimuli
  c) Task to platform resource mapping
  d) Processing, Communication and Memory blocks
- Estimation & model generation
  e) Extraction of task’s behaviour
  f) Power & Timing Estimation and back-annotation to input model
  g) HW IP components with power and timing information
  h) Assemble Virtual Platform
- Simulation
  i) Executable power-aware VP
  j) Configurable tracing
Slide 12: Characterisation of individual control-steps (II) / Hardware Basic Blocks

Dynamic power model
- Simplest (and fastest) dynamic HBB power model uses average activity
- Based on stimuli provided during synthesis

[Figure: RT data path with registers R1 to R8, functional units (x, -, +, +), a multiplexer (mux) and a controller; legend: active, inactive, extra func.]

- Control-step activity is assumed to be the sum of the average switched capacity of each activated component:

  $$A = \sum_{n=1}^{N} \frac{1}{M_n} \sum_{m=1}^{M_n-1} \alpha(\nu_n, \mathit{pattern}_{m-1}, \mathit{pattern}_m)$$

  with
  - Active components $\nu_1, \dots, \nu_N$
  - $M_n$ is the number of stimuli applied to $\nu_n$
  - $\alpha(\nu_n, \dots)$ is the switched capacity by applying given patterns consecutively
- On-going research is evaluating probabilistic models to address internal ...
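The activity formula of this slide can be evaluated directly once a model for α is available. The sketch below is only an illustration: it assumes that α is the number of toggled bits between two consecutive patterns times a per-bit capacitance, which is a placeholder and not the characterisation used by the author. `Component`, `alpha` and `control_step_activity` are hypothetical names.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// One activated component nu_n with the stimuli patterns applied to it.
struct Component {
    double cap_per_toggle;                // assumed capacitance per toggled bit
    std::vector<std::uint32_t> patterns;  // pattern_0 .. pattern_{M_n - 1}
};

// Placeholder for alpha(nu_n, pattern_{m-1}, pattern_m): toggled bits times
// a per-bit capacitance. A characterised component would use its own model.
inline double alpha(const Component& c, std::uint32_t prev, std::uint32_t cur) {
    return c.cap_per_toggle *
           static_cast<double>(std::bitset<32>(prev ^ cur).count());
}

// A = sum_n (1/M_n) * sum_{m=1}^{M_n-1} alpha(nu_n, pattern_{m-1}, pattern_m)
inline double control_step_activity(const std::vector<Component>& active) {
    double a = 0.0;
    for (const Component& c : active) {
        assert(!c.patterns.empty());  // M_n >= 1
        double switched = 0.0;
        for (std::size_t m = 1; m < c.patterns.size(); ++m)
            switched += alpha(c, c.patterns[m - 1], c.patterns[m]);
        a += switched / static_cast<double>(c.patterns.size());
    }
    return a;
}
```

Since the result depends only on the stimuli provided during synthesis, it can be computed once per control step and reused as a constant during simulation.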
Slide 13: Back-Annotation and Model Generation / Power-Aware High-Level Synthesis

- Generated module container for generic white-box IP blocks
  - Functional and power models
  - Communication & Tracing
- The functional model implements the function-call of the original module
  - Execute the behaviour
  - Perform (extra-functional) simulation steps, until functional model finishes
- The functional model consists of hierarchical, plain C++ “processes”
  - RT data path (hardware basic blocks)
  - Controller (switch statement)
  - Possibly sub-processes
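The structure described on this slide (function-call interface, switch-based controller, hardware basic blocks) might look roughly like the following sketch. It is hand-written for illustration and not actual generator output: the class `MacProcess`, the two-step schedule for `y = a * b + c` and the per-step activity values are made-up assumptions.

```cpp
#include <cassert>

// Extra-functional bookkeeping filled in while the behaviour executes.
struct ExtraFunctional {
    unsigned cycles = 0;      // elapsed control steps
    double   activity = 0.0;  // accumulated activity (arbitrary units)
};

// Hypothetical generated process for y = a * b + c, scheduled into two
// control steps. The activity numbers stand in for back-annotated values.
class MacProcess {
public:
    // Function-call interface of the original module: runs simulation
    // steps until the functional model has finished.
    int operator()(int a, int b, int c, ExtraFunctional& xf) {
        state_ = State::Mul;
        while (state_ != State::Done) step(a, b, c, xf);
        return r2_;
    }

private:
    enum class State { Mul, Add, Done };

    // Controller: one switch case per control step.
    void step(int a, int b, int c, ExtraFunctional& xf) {
        switch (state_) {
        case State::Mul:  // hardware basic block 0: multiplier
            r1_ = a * b;
            xf.activity += 4.0;
            state_ = State::Add;
            break;
        case State::Add:  // hardware basic block 1: adder
            r2_ = r1_ + c;
            xf.activity += 1.0;
            state_ = State::Done;
            break;
        case State::Done:
            assert(false && "step() called after completion");
            break;
        }
        ++xf.cycles;
    }

    State state_ = State::Done;
    int r1_ = 0, r2_ = 0;  // data-path registers
};
```

Calling `MacProcess{}(3, 4, 5, xf)` returns the functional result and leaves the number of control steps and the accumulated activity in `xf`, which keeps the functional and the extra-functional part separate.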
Slide 16: Observation of TLM-2 communication / Non-invasive TLM Power State Machines

- Approximate internal state by observing the interaction with environment
- Generation of transparent PSM wrapper with special observable sockets
  - Transaction forwarding to/from component
  - Bookkeeping and protocol handling (LT/AT)

[Figure: wrapper around the IP component, with four blocks labelled BP and a block labelled TxnMgmnt.]

- Two new convenience socket types
  - tlm_utils::tlm_observable_initiator_socket<BUSWIDTH>
  - tlm_utils::tlm_observable_target_socket<BUSWIDTH>
- Observer infrastructure to register and trigger Protocol State Machine transition conditions
  - Other use-cases supported as well
  - Details presented at ESCUG24 @ FDL’2011
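The idea of a transparent wrapper that forwards transactions and lets observers drive a power state machine can be sketched without SystemC as follows. This is a simplified stand-in under stated assumptions: `Txn`, `ObservablePort` and `PowerStateMachine` are hypothetical names and do not reproduce the interface of the observable sockets named on this slide.

```cpp
#include <cassert>
#include <functional>
#include <utility>
#include <vector>

// Stand-in for a TLM-2 transaction; real code would use tlm::tlm_generic_payload.
struct Txn { bool is_write; };

enum class Phase { Begin, End };

// Hypothetical observable forwarding point: notifies observers, then
// forwards the call unchanged to the wrapped (black-box) IP component.
class ObservablePort {
public:
    using Observer = std::function<void(const Txn&, Phase)>;
    using Target   = std::function<void(Txn&)>;

    explicit ObservablePort(Target target) : target_(std::move(target)) {
        assert(static_cast<bool>(target_));
    }
    void register_observer(Observer o) { observers_.push_back(std::move(o)); }

    void transport(Txn& txn) {
        notify(txn, Phase::Begin);
        target_(txn);
        notify(txn, Phase::End);
    }

private:
    void notify(const Txn& t, Phase p) {
        for (auto& o : observers_) o(t, p);
    }
    Target target_;
    std::vector<Observer> observers_;
};

// Power state machine driven only by the observed interaction.
class PowerStateMachine {
public:
    enum class State { Idle, Reading, Writing };

    void on_txn(const Txn& t, Phase p) {
        state_ = (p == Phase::End) ? State::Idle
               : (t.is_write ? State::Writing : State::Reading);
        history_.push_back(state_);
    }
    State state() const { return state_; }
    const std::vector<State>& history() const { return history_; }

private:
    State state_ = State::Idle;
    std::vector<State> history_;
};
```

The IP component itself is never modified: the state machine is attached with `port.register_observer(...)`, and each state could then be associated with a power value.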
Slide 20: Advanced tracing for TLM Virtual Platforms / General overview

- Problem: Flexible tracing of physical quantities not directly possible in SystemC
  - sc_core::sc_trace not flexible enough (tied to simulation time)
  - sca_util::sca_trace is SystemC AMS-specific and not widely supported
  - SCV transaction recording not really appropriate
- Goal: Enable flexible and configurable tracing of extra-functional properties in TLM-2-based virtual platforms
  - Integration with temporal-decoupling
    → Independence of current simulation time, backwards and forwards
  - Hierarchical pre-processing
  - Filtering and data reduction (aggregation, averaging, selection)
  - Collection of user-defined performance metrics
  - Run-time configurable granularity (Region of Interest)
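Filtering and data reduction on a trace can be pictured as small functions chained one after another. The sketch below shows one selection filter (region of interest) and one averaging filter. It is an assumption-laden illustration: `Sample`, `select_region` and `average_groups` are hypothetical names, and samples are assumed to be sorted and contiguous in time.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// One trace sample: value (e.g. power in W) valid for [start, start + duration).
struct Sample { double start; double duration; double value; };

// Selection filter: keep only samples that lie completely inside [from, to).
inline std::vector<Sample> select_region(const std::vector<Sample>& in,
                                         double from, double to) {
    std::vector<Sample> out;
    for (const Sample& s : in)
        if (s.start >= from && s.start + s.duration <= to) out.push_back(s);
    return out;
}

// Data reduction: merge every n consecutive samples into one sample that
// carries their time-weighted average value.
inline std::vector<Sample> average_groups(const std::vector<Sample>& in,
                                          std::size_t n) {
    assert(n > 0);
    std::vector<Sample> out;
    for (std::size_t i = 0; i < in.size(); i += n) {
        double dur = 0.0, energy = 0.0;
        for (std::size_t k = i; k < i + n && k < in.size(); ++k) {
            dur    += in[k].duration;
            energy += in[k].value * in[k].duration;
        }
        out.push_back({in[i].start, dur, dur > 0.0 ? energy / dur : 0.0});
    }
    return out;
}
```

Chaining the two, as in `average_groups(select_region(trace, 0.0, 4.0), 2)`, reduces the data volume only inside the region of interest.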
Slide 21: Stream-based tracing of extra-functional properties / Advanced tracing for TLM Virtual Platforms

- Tracing is based on (time, value) streams per component
  - Streams are hierarchically named SystemC objects
  - Strongly-typed values for type safety with support for physical units
  - Fine granular control over (local) time offset, synchronisation with a local clock
  - push APIs for both absolute (start, value, [duration]), or relative (duration, value) tuples
  - Non-overlapping tuples enforced by MergePolicy
- Hierarchy of (user-defined) stream preprocessors (and sinks)
  - Streams can be processed by an extensible set of (pre-)processors
  - Hierarchically connected during simulation
  - Sinks can write streams to storage backends for offline analysis
- Automatic separation and merging of multiple source processes (initiators) optionally supported for temporal decoupling with overlaps
  - Example: Accumulate overlapping power consumptions due to loss of time resolution
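The accumulation example for overlapping tuples can be made concrete with a small piecewise-constant stream. The following sketch assumes an accumulating merge policy and non-negative time stamps. `PowerStream`, `push` and `value_at` are hypothetical names chosen for this illustration and are not the API of the presented tracing library; the argument order of `push` is likewise an assumption.

```cpp
#include <cassert>
#include <iterator>
#include <map>

// Hypothetical (time, value) stream storing a piecewise-constant value.
// Overlapping tuples are merged by accumulation (one possible merge policy).
class PowerStream {
public:
    // Absolute push: value applies during [start, start + duration).
    void push(double start, double duration, double value) {
        assert(start >= 0.0 && duration > 0.0);
        const double end = start + duration;
        split_at(start);
        split_at(end);
        for (auto it = segs_.find(start); it->first < end; ++it)
            it->second += value;
    }

    // Relative push: appended at the stream's local time, which then advances.
    void push(double duration, double value) {
        push(local_time_, duration, value);
        local_time_ += duration;
    }

    // Accumulated value at time t.
    double value_at(double t) const {
        assert(t >= 0.0);
        return std::prev(segs_.upper_bound(t))->second;
    }

private:
    // Ensure a segment boundary exists exactly at t.
    void split_at(double t) {
        auto next = segs_.upper_bound(t);
        auto cur  = std::prev(next);
        if (cur->first != t) segs_.emplace_hint(next, t, cur->second);
    }

    std::map<double, double> segs_{{0.0, 0.0}};  // start time -> value
    double local_time_ = 0.0;
};
```

With two initiators pushing `(0.0, 4.0, 1.0)` and `(2.0, 4.0, 0.5)`, the interval from 2.0 to 4.0 carries the sum of both contributions, so no energy is lost when temporally decoupled processes report overlapping intervals.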
- COMPLEX partners work on advanced Virtual Platform technologies enabling
  - Component-level power state tracing
  - Co-analysis with software
  - Debugging with power and timing information
  - Fast design-space exploration
  - Derivation of power management strategies
- through
  - Automatic timing and power annotation techniques and tools for Virtual Platform Components (Embedded Software, Custom Hardware, and Hardware IP)
  - Extra-functional scalable executable model for timing and power
  - Scalable timing and power tracing infrastructure