1
Integrated Management of Power Aware Computing & Communication Technologies
Kickoff review meeting
Nader Bagherzadeh, Pai H. Chou, Fadi Kurdahi
University of California, Irvine, ECE Dept.
DARPA Contract F33615-00-1-1719
September 27, 2000
2
Agenda
Introduction and overview
Management status, financial, milestones, schedule.
Technical presentation: task progress in Architecture, Applications, CAD
Lessons learned, challenges, issues.
Questions + action items review.
3
Outline
Introduction: program goals, project overview
Management status: personnel and teaming plans, plans and milestones, financial information
Technical presentation: background, technical approach, status and accomplishments, current detailed schedule
Program impact and anticipated transitions
4
Introduction
5
Program Goals
Power-aware system-level design Enhance mission success (time, task) Rapid customization for different missions
Design tool Exploration & evaluation Optimization & specialization Technique integration
System architecture Statically configurable Dynamically adaptive Use COTS parts & protocols
6
Technical approach
High-level specification Separate behavior from architecture Explicit constraints (timing, power) Library characterization
System synthesis tool Source-aware power usage scheduling Bus topology transformation and communication scheduling
Configurable architecture Task migration & selective shutdown Bus segmentation and voltage scaling
Domain knowledge Encompass mechanical / thermal power Aware of power supply model
7
Quad Chart
Innovations Component-based power-aware design
Exploit off-the-shelf components & protocols Best price/performance, reliable, cheap to replace
CAD tool for global power policy optimization Optimal partitioning, scheduling, configuration Manage entire system, including mechanical & thermal
Power-aware reconfigurable architectures Reusable platform for many missions Bus segmentation, voltage / frequency scaling
Impact
Enhanced mission success More tasks for the same power Dramatic reduction in mission completion time
Cost saving over a variety of missions Reusable platform & design techniques Fast turnaround time by configuration, not redesign
Confidence in complex design points Provably correct functional/power constraints Retargetable optimization to eliminate overdesign Power protocol for massive scale
[Figure: IMPACCT design flow linking Behavior and Architecture through mapping; labels: behavioral system model, high-level components, composition operators, high-level simulation, functional partitioning & scheduling, busses and protocols, system architecture, parameterizable components, system integration & synthesis, static configuration, dynamic power management]
Timeline: 2Q 00 (kickoff), 2Q 01, 2Q 02
Year 1: static & hybrid optimizations (partitioning / allocation, scheduling, bus segmentation, voltage scaling); COTS component library; FireWire and I2C bus models; static composition authoring; architecture definition; high-level simulation; benchmark identification
Year 2: dynamic optimizations (task migration, processor shutdown, bus segmentation, frequency scaling); parameterizable components library; generalized bus models; dynamic reconfiguration authoring; architecture reconfiguration; low-level simulation; system benchmarking
8
Innovations
Component-based power-aware design Exploit off-the-shelf components & protocols COTS offer best price/performance, reliable, cheap to replace
CAD tool for global power policy optimization Optimal partitioning, scheduling, configuration Manage entire system, including mechanical & thermal
Power-aware reconfigurable architectures Reusable platform for many missions Bus segmentation, voltage / frequency scaling
9
Impact
Enhanced mission success More tasks for the same power Dramatic reduction in mission completion time
Cost saving over a variety of missions Reusable platform & design techniques Fast turnaround time by configuration, not redesign
Confidence in complex design points Provably correct functional/power constraints Retargetable optimization to eliminate overdesign Power protocol for massive scale
10
Management Status
11
Personnel & teaming plans
UC Irvine, Co-PIs - Design tools Nader Bagherzadeh Pai Chou Fadi Kurdahi
UC Irvine, research assistants Dexin Li Jinfeng Liu Afshin Niktash
USC - Component power optimization Jean-Luc Gaudiot Seong-Won Lee
JPL - Applications and benchmarking Nazeeh Aranki Nikzad “Benny” Toomarian
12
Previous work
Design tools System-level: the Chinook HW/SW codesign tool Architectural synthesis (w/ physical design considerations)
Components Reconfigurable computing: the MorphoSys Chip Parameterizable components: PCL Simultaneous MultiThreading vs. Chip MultiProcessing
Architectural platform Segmented bus X-2000, Mars Pathfinder Configurable SMP
13
Responsibilities
Bagherzadeh, Chou, Kurdahi -- co-PIs Oversee project operation Integration into curriculum and related research efforts
Li, Liu, Niktash -- RAs Development of CAD tools Modeling of demonstrator examples Authoring of component / protocol library
JPL Furnish example specifications Co-develop optimization techniques
USC Supporting link to low-level technologies
14
External collaborations
JPL X-2000 multi-mission architecture Mars Pathfinder as baseline JPL to provide COTS testbed JPL to evaluate IMPACCT optimizations
USC Parameterizable components Low-level power estimation
Consystant Design Technologies (Seattle, WA) Framework for component-based design IMPACCT plugins to support power management
15
Technical Background
16
Background: MorphoSys project
Reconfigurable processor array
MIPS-like RISC processor
High-bandwidth data interface
100 MHz clock
0.35 µm, 4-metal CMOS
Software support
Platform for dynamic power management
[Figure: MorphoSys block diagram: advanced RISC processor, instruction/data cache (L1), reconfigurable processor array, high-bandwidth data interface, system bus, external memory (e.g. SDRAM, RDRAM)]
17
RC Array and Context Memory
[Figure: 8 × 8 array of reconfigurable cells (RC) with 16-bit context buses from the column block and the row block of the context memory]
Context memory
• 2 blocks
• 8 sets in each block
• A set controls 1 row or column (SIMD)
• 16 contexts in 1 set
• Context broadcast can overlap with context reloading
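The context-memory organization above can be modeled as a small data structure. The sketch below is a hypothetical Python model, not MorphoSys code: it assumes block 0 drives columns and block 1 drives rows, and it stores each context word as a plain integer.

```python
from dataclasses import dataclass, field
from typing import List, Optional

NUM_BLOCKS = 2         # block 0: column block, block 1: row block (assumed mapping)
SETS_PER_BLOCK = 8     # one set per column (block 0) or per row (block 1)
CONTEXTS_PER_SET = 16  # context words held in each set
ARRAY_DIM = 8          # 8 x 8 RC array


@dataclass
class ContextMemory:
    # words[block][set][context] -> context word, modelled as an int
    words: List[List[List[int]]] = field(default_factory=lambda: [
        [[0] * CONTEXTS_PER_SET for _ in range(SETS_PER_BLOCK)]
        for _ in range(NUM_BLOCKS)
    ])

    def load(self, block: int, set_idx: int, ctx: int, word: int) -> None:
        """Store one context word in a set."""
        self.words[block][set_idx][ctx] = word

    def broadcast(self, block: int, set_idx: int, ctx: int) -> List[List[Optional[int]]]:
        """Return the 8 x 8 grid of words each RC receives; None means not driven."""
        word = self.words[block][set_idx][ctx]
        grid: List[List[Optional[int]]] = [[None] * ARRAY_DIM for _ in range(ARRAY_DIM)]
        for k in range(ARRAY_DIM):
            if block == 0:
                grid[k][set_idx] = word  # SIMD: every RC in column set_idx
            else:
                grid[set_idx][k] = word  # SIMD: every RC in row set_idx
        return grid
```

Each broadcast drives one full row or column with the same context word, which is what makes execution SIMD within that row or column.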
18
The M1 chip layout
19
M1 chip test fixture
20
[Figure: MorphoSys software environment: application C code (e.g. TR_app: a = b + c; p = a + 1) compiled by mcc for TinyRISC, with RC Array calls such as Z = RC_F(X) and W = RC_F(Y); RC Array functions from a context library; tools mLoad, mSched, MuLate and MorphoSim (C++, VHDL), mView; output is an executable plus configuration contexts for the MorphoSys chip]
Software environment
21
Background on USC's SMT work
High performance processors Superscalar processor (SSP) Single chip multiprocessor (CMP) Very long instruction word (VLIW) Simultaneous multithreading (SMT)
Performance and power dissipation: high performance requires high power consumption
Recent applications need low-power, high-performance processors
22
Microarchitectural tradeoffs
Power tradeoffs between different architectures SMT vs. SSP:
SMT has more modules than SSP SMT has better performance and consumes more power
SMT vs. CMP: SMT has better utilization They have similar performance, but SMT consumes less power
SMT vs. VLIW: SMT consumes more power; SMT is compatible with conventional architectures
Design of simple SMT A simplified SMT may consume less power and still have the advantage of TLP
Analysis of architectural features Power drain of modern processors (control vs. data path)
23
SMT design methodology
Measuring power consumption of a processor Checking transitions of signals and module operations Hardware implementation of the processor simulator
Measuring performance of modules The contribution of each module to the total performance Performance-power ratio of each module
Comparison between architectures
Design of a low power processor
24
Measuring performance
Finding the performance per power of each module Simulate and measure the performance without a module Calculate the performance per power for each module Classify modules if more than two modules cooperate with each other
Find a solution for a low-power, high-performance processor
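The measurement loop above (simulate without a module, then divide that module's performance contribution by its power) can be sketched as follows. This is a hypothetical illustration: `simulate` stands in for the processor simulator, and the module names and numbers in the test are made up.

```python
from typing import Callable, Dict, FrozenSet


def performance_per_power(
    modules: Dict[str, float],                    # module name -> power (W)
    simulate: Callable[[FrozenSet[str]], float],  # performance with these modules enabled
) -> Dict[str, float]:
    """Ablation estimate: (perf with all modules - perf without m) / power(m)."""
    full = frozenset(modules)
    baseline = simulate(full)
    result: Dict[str, float] = {}
    for name, power in modules.items():
        if power <= 0:
            raise ValueError(f"module {name!r} must have positive power")
        contribution = baseline - simulate(full - {name})
        result[name] = contribution / power
    return result
```

Modules with a low ratio are candidates for simplification or removal in a low-power design; cooperating modules would have to be removed together, as the slide's classification step suggests.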
25
Background: Chinook project
Component-based HW/SW codesign framework Specification, simulation, synthesis Motivated by IP reuse, system integration
Problem: IP reuse forces modification Reason: components have hardwired coordination protocols
Approach Adaptable components Separate coordination protocols from components
Benefits Reuse without modification Enable system-level optimizations
26
Example protocol: Subsumption
Must handle three cases: Subsuming, yielding, idle Hardwired protocol
Generalization: Adaptable components (by mode mapping) Separate protocols & components
[Figure: subsumption example: sensors (joystick, bumper, sonar), decision modules (escape, avoid, override) and actuator (wheels); each decision module has a subsumption interface with idle, yielding and subsuming modes, and the decision composition is drawn as a mode state machine with bump and release transitions]
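The three cases a subsumption protocol must handle (subsuming, yielding, idle) can be separated from the modules themselves, as the slide argues. The sketch below is a hypothetical priority arbiter, not Chinook output: it assumes modules are listed in decreasing priority and that a module with no command is idle.

```python
from enum import Enum
from typing import List, Optional, Tuple


class Mode(Enum):
    IDLE = "idle"
    YIELDING = "yielding"
    SUBSUMING = "subsuming"


def arbitrate(
    requests: List[Tuple[str, Optional[str]]],
) -> Tuple[Optional[str], List[Tuple[str, Mode]]]:
    """requests: (module, command or None), highest priority first.

    Returns the command forwarded to the actuator and each module's mode:
    the first module with a command subsumes, later ones with commands yield,
    and modules without a command are idle.
    """
    winner: Optional[str] = None
    modes: List[Tuple[str, Mode]] = []
    for name, cmd in requests:
        if cmd is None:
            modes.append((name, Mode.IDLE))
        elif winner is None:
            winner = cmd
            modes.append((name, Mode.SUBSUMING))
        else:
            modes.append((name, Mode.YIELDING))
    return winner, modes
```

Because the arbitration lives outside the modules, the same escape or avoid module can be reused under a different coordination protocol without modification.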
27
Architectural mapping
Single processor or multiple processors
Multiple mappings to an architecture
[Figure: mode manager and modal processes mapped onto one or more processors]
28
Distributed mode managers
Automatically partitioned among processors Synthesized control communication Comm. tradeoffs: synchronization, replication
[Figure: mode manager distributed across processors, each hosting modal processes]
29
Technical Presentation
30
[Figure: “Sojourner”, the Mars Pathfinder Microrover Flight Experiment (65 cm, 48 cm, 30 cm), showing the solar panel, UHF antenna, cameras/lasers, Warm Electronics Box (WEB) and Alpha Proton X-ray Spectrometer (APXS)]
Past missions – Mars Pathfinder
31
Application requirements
System specification: 6 wheel motors, 4 steering motors, system health check, hazard detection
Power supply: battery (non-rechargeable), solar panel
Power consumption:
Digital: computation, imaging, communication, control
Mechanical: driving, steering
Thermal: motors must be heated in a low-temperature environment
32
Energy required
Function | Time and calculation | Energy
motor heating: 1 motor at a time | 7.51 W × 1 hr | 7.51 W-hr
motor heating: 2 motors at a time | 11.26 W × 0.5 hr | 5.63 W-hr
driving (extreme terrain @ -80 degC) | 13.85 W × 0.5 hr | 6.92 W-hr
hazard detection | 7.33 W × 0.25 hr | 1.83 W-hr
imaging (3 images @ 2 min/image) | 4.5 W × 0.1 hr | 0.45 W-hr
image compression (compress 3 images @ 6 min/image) | 3.7 W × 0.3 hr | 1.2 W-hr
6 Mbit communication @ 50 min/sol | 6.27 W × 0.8 hr | 5.2 W-hr
42 health checks of 10 s each during the day | 6.27 W × 0.1 hr | 0.63 W-hr
remainder of 7 hr daytime CPU operation | 3.7 W × 4 hr | 15.0 W-hr
WEB heating (as needed) | 50 W-hr | 50 W-hr
Total | | 95 W-hr
System-level power budget
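The budget above is a sum of E = P × t terms. The sketch below recomputes it from the listed power and time factors; with these factors the products sum to about 93.9 W-hr, while the slide rounds individual rows and reports a total of 95 W-hr.

```python
# (function, power in W, duration in hr) taken from the budget table above
BUDGET = [
    ("motor heating, 1 motor at a time", 7.51, 1.0),
    ("motor heating, 2 motors at a time", 11.26, 0.5),
    ("driving, extreme terrain @ -80 degC", 13.85, 0.5),
    ("hazard detection", 7.33, 0.25),
    ("imaging, 3 images", 4.5, 0.1),
    ("image compression, 3 images", 3.7, 0.3),
    ("6 Mbit communication", 6.27, 0.8),
    ("42 health checks of 10 s", 6.27, 0.1),
    ("remaining daytime CPU operation", 3.7, 4.0),
]
WEB_HEATING_WH = 50.0  # given directly as energy, not as power x time


def daily_energy_wh(items, fixed_wh=0.0):
    """Sum E = P x t over all budget items, plus any fixed energy terms."""
    return sum(p * t for _, p, t in items) + fixed_wh


if __name__ == "__main__":
    for name, p, t in BUDGET:
        print(f"{name:38s} {p * t:6.2f} W-hr")
    print(f"{'total':38s} {daily_energy_wh(BUDGET, WEB_HEATING_WH):6.2f} W-hr")
```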
33
Design issues
Timing constraints: system health check 10 s every 10 min; heat each motor for 5 s within the 50 s prior to driving; hazard detection 10 s, then steering 5 s, then driving 10 s
Power management: low-power electronics alone cannot yield significant power savings; no system-level management tool is available
Conservative hand-crafted schedule: serialize all operations to avoid power surge, resulting in long execution time and wasted solar power
34
[Figure: Athena rover configuration: Pancam/Mini-TES, Mini-Corer, and Instrument Arm Cluster (Raman Spectrometer, Alpha-Proton-X-Ray Spectrometer (APXS), Mössbauer Spectrometer, Microscopic Imager)]
Present missions – Athena/Mars ’03 Rover configuration
35
Athena/Mars ‘03 Rovers - power subsystem
Power utilization:
38 W = 19 W (CPU & I/O) + 9 W (accel and gyro) + 10 W (wheel motors) for driving
75 W = 19 W (CPU & I/O) + 55 W (transmission) for orbiter communication
30 W = 19 W (CPU & I/O) + 10 W (transmission) for lander relay communication
55 W = 19 W (CPU & I/O) + 33 W (peak motor) for drilling
29 W = 23 W (CPU & I/O) + 6 W (cameras) required for imaging
11 W Raman, 1.4 W APXS and 2.3 W for nighttime spectrometer operation
141 Whr daily for housekeeping engineering
75 Whr limit for nighttime operations
36
Present missions – MUSES-CN Asteroid NanoRover
Completely solar powered, requiring only 1 watt, including an RF telecommunications system for communications between the rover and a lander or small-body orbiter for relay to Earth.
Power source: 500 grams of commercial, non-rechargeable, replaceable lithium batteries, with an energy density of 750 joules per gram.
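As a rough check of what the MUSES-CN battery pack represents, assume (for illustration only, since the rover is described as solar powered) that the whole 1 W load were drawn from the batteries:

```latex
E = 500\,\mathrm{g} \times 750\,\mathrm{J/g} = 3.75 \times 10^{5}\,\mathrm{J},
\qquad
t = \frac{E}{P} = \frac{3.75 \times 10^{5}\,\mathrm{J}}{1\,\mathrm{W}}
  = 3.75 \times 10^{5}\,\mathrm{s} \approx 104\,\mathrm{h}
```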
37
Power-aware designs
Subsume low power as a special case Minimize power consumption Minimal application specific knowledge, limited reconfiguration space Conservative
Make best use of available power Use MAX solar power while it's available Increase parallelism, perform more tasks, reduce mission time Both MIN and MAX power constraints
Application-specific knowledge Multiple mission requirement Adapt to run-time power supply, operating environment
38
System-level power management
Amdahl's law -- extended to power Component-level improvements must be scaled by % contributions Synergy between inter-component interactions
Scope of system power model Digital, mechanical, thermal Battery model - control power surge Renewable source - solar panel, etc
Mission-driven tradeoffs Execution time vs. power saving Adapt to operating environment
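Amdahl's law carries over to power directly: an improvement to one component only scales that component's share of the total. The sketch below assumes a single improved component and ignores interactions between components; the 30% share in the test is illustrative.

```python
def system_power_ratio(fraction: float, reduction: float) -> float:
    """New/old system power after one part of the system is improved.

    fraction:  share of total system power drawn by the improved part (0..1)
    reduction: factor by which that part's power is divided (>= 1)
    """
    if not 0.0 <= fraction <= 1.0 or reduction < 1.0:
        raise ValueError("fraction must be in [0, 1] and reduction >= 1")
    return (1.0 - fraction) + fraction / reduction
```

Halving the power of a part that draws 30% of the system total gives a ratio of 0.85, i.e. only a 15% system-level saving; even eliminating that part entirely saves at most 30%.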
39
What's needed?
Reconfigurable system architecture Statically configurable for different missions Reconfiguration for dynamic power management Support state-of-the-art power management policies
System-level design tool Support design space exploration Take full advantage of COTS components Optimize mission-specific system configuration Synthesize system-level power manager Support simulation for early validation
40
X2000 avionics system architecture
Symmetric COTS multiprocessors Low-cost components with strong commercial support Widely accepted specification, design, application and testing Reduced development cost
Dual system bus architecture High speed data rate with moderate power Low speed control with low power
Industry standard bus protocols FireWire (IEEE 1394) bus I2C bus Reconfigurable bus topology
41
PA system architecture
The NASA X2000 Avionics System
[Figure: X2000 avionics system components: symmetric multiprocessor modules, reconfigurable hardware blocks, high-rate input (camera), communication module (CDMA), bus power controller, altimeter subnet, and a microcontroller-directed subnet (power regulation & control, analog telemetry sensors, safety inhibits, valve & pyro drive), connected by a high-speed bus (e.g. IEEE 1394) and a low-speed bus (e.g. I2C)]
42
Applicable power optimizations
Application level Scheduling under timing and power constraints Task partitioning, allocation, migration Algorithm selection
Architecture level Bus segmentation / clustering Communication scheduling
Component level Voltage / frequency scaling Power down
X-2000 goals Digital electronics power: 10x decrease Analog electronics power: 2x decrease Computer performance: 10 to 20x increase
both static & dynamic versions
43
The need for a system-level CAD tool
Avoid pitfalls with manual design Overdesign (too conservative) Hardwired assumptions in implementation (hard to change/adapt) System integration (bottleneck in projects)
Scalable methodology Specification: separation of concerns
Behavior vs. architecture Policy vs. mechanism Constraint vs. implementation
Exploration Framework for technique integration Rapid feedback
Manage complexity Knowledge base for component/bus details Consistent knowledge propagation through design stages
44
Design tool
Library Components and bus protocols Provides power estimation Defines configuration space
Authoring Behavioral description, architecture description Mapping from behavior to architecture
Synthesis Scheduling, partitioning Bus segmentation, voltage scaling Synthesis of power manager with task scheduler
Simulation High-level: explore design space Detailed-level: power/performance for a given design point
45
IMPAC2T overview
[Figure: IMPAC2T design flow linking Behavior and Architecture through mapping; labels: behavioral system model, high-level components, composition operators, high-level simulation, functional partitioning & scheduling, busses and protocols, system architecture, parameterizable components, system integration & synthesis, static configuration, dynamic power management]
46
Library: low-level components
Supported components COTS Parameterizable
Levels of abstraction Parameterizable Simulatable Synthesizable Reconfigurable
[Figure: VHDL code of a parameterizable component, instantiated with bus width = 8 and bus width = 16]
47
Library: component definition
Component interface Physical: pin interface Functional: data and control interface Power, current, voltage
Power/mode characterization Mode governs power usage Restrictions on mode changes allowed High-level yet refined power estimation
Aggregation Smaller components combined into larger ones New external parameters, interfaces, modes
48
Example components
Processor: PowerPC, ARM, Pentium, MIPS
Microcontroller: StrongARM, Intel 8051, Motorola 68HC11, 68332
Bus controller/transceiver: FireWire controller & transceiver, I2C bus controller, GPIB
Memory SRAM DRAM Flash memory
49
Example component definition
FireWire bus transceiver: National Semi CS4103 Working voltage: 3.3 V Power modes
Full-on (400mW) PHY-on (150mW) Standby (50mW) CLK-disable (21mW) Crystal-disable (16mW)
FireWire bus controller: National Semi CS4210 Working voltage: 3.3 V Power modes
Full-on (300mW) Standby (17mW)
Aggregated bus transceiver/controller Up to ten working modes to play with Flexibility in power management
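Aggregating the two parts above yields the "up to ten working modes": 5 transceiver modes × 2 controller modes. The sketch below assumes the powers simply add and that every pairing is legal; in practice the mode-change restrictions mentioned in the component definition would prune some combinations.

```python
from itertools import product
from typing import Dict, Tuple

# Per-mode power in mW, from the CS4103 / CS4210 figures above
TRANSCEIVER_MW = {"Full-on": 400, "PHY-on": 150, "Standby": 50,
                  "CLK-disable": 21, "Crystal-disable": 16}
CONTROLLER_MW = {"Full-on": 300, "Standby": 17}


def aggregated_modes() -> Dict[Tuple[str, str], int]:
    """(transceiver mode, controller mode) -> combined power in mW."""
    return {
        (t_mode, c_mode): t_mw + c_mw
        for (t_mode, t_mw), (c_mode, c_mw) in product(TRANSCEIVER_MW.items(),
                                                      CONTROLLER_MW.items())
    }
```

A power manager can then pick the cheapest aggregated mode that still provides the functions a task needs, from 700 mW (both full-on) down to 33 mW.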
50
Library: bus protocols
Architecture Parallelism (parallel or serial) Topology (serial, tree, ring) Service layers (physical, link, transaction, application)
Communication Data transfer mode (asynchronous, isochronous) Data transfer speed Response mode (acknowledgement required or not) Arbitration mode
Configuration Configuration process (deterministic or random) Reconfigurability (static, hybrid, dynamic)
Power Power mode (full-on, standby, deep-sleep, shutdown) Media (cable, wireless, backplane)
51
Bus protocols exploration
Explore bus protocol dimensions Protocol simulation
Input: bus protocol model Output: sequence of events
Map events into relative power quantities Compare and tradeoff between different design points
Example: simulating FireWire bus configuration Event-driven simulator Compare two designs with different topology
Pure tree topology (acyclic) Tree topology with bus segmentation
Tree-ID process, 9 nodes Tree 37 events Segmented tree 24 events
52
Bus optimization
Bus: a significant power consumer Up to 30% to 50% of the total system power consumption [Mehra97] Bus power consumption determined by
Capacitance (load C and bus C, proportional to bus length) Voltage (bus supply voltage and swing voltage) Bus access frequency Bus signal switching activity
Why bus power optimization? System performance requirements Power constraints Adapt to execution time variations Bus segmentation for increased bandwidth Enable other novel power management techniques
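A standard first-order model of dynamic bus power (a textbook CMOS approximation, not a formula from these slides) ties the four factors above together, with α the switching activity, f the bus access frequency, and c_wire and ℓ (hypothetical symbols) the wire capacitance per unit length and the bus length:

```latex
P_{\mathrm{bus}} \approx \alpha \, C_{\mathrm{bus}} \, V_{\mathrm{swing}} \, V_{DD} \, f,
\qquad
C_{\mathrm{bus}} = C_{\mathrm{load}} + c_{\mathrm{wire}} \, \ell
```

With full-swing signalling (V_swing = V_DD) this reduces to α C V_DD² f, which is why shortening the bus (segmentation), lowering the voltage, and reducing switching activity all pay off.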
53
Bus-level optimizations
Bus encoding [Shin98][Benini97][Nakase98] Minimize switching activity on bus Makes sense mostly for parallel bus Gray code, bus-invert code, T0 code and Beach code Bus driver design
Bus clustering (segmentation) [Mehra97][Zhang98] Optimize bus topology by grouping components Divide the global bus into multiple segments Benefits:
Reduced bus capacitance (power saving) Shorter bus latency, higher throughput, increased flexibility
Partitioning [Hauck95][Yang94][Cong93] Divide tasks among components Minimize inter-cluster traffic Clustering before partitioning
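Bus-invert coding, one of the encodings listed above, trades one extra line for fewer transitions. The sketch below is the simple threshold variant (Hamming distance on the data lines only) and is illustrative rather than the exact formulation of any cited paper.

```python
from typing import List, Tuple


def popcount(x: int) -> int:
    return bin(x).count("1")


def bus_invert_encode(words: List[int], width: int) -> Tuple[List[Tuple[int, int]], int]:
    """Send ~word with INV=1 whenever more than half the data lines would toggle.

    Returns the (bus_value, inv_bit) sequence and the total number of line
    transitions, counting the extra INV line. The bus starts at all zeros.
    """
    mask = (1 << width) - 1
    prev_bus, prev_inv = 0, 0
    encoded, transitions = [], 0
    for word in words:
        word &= mask
        if popcount(word ^ prev_bus) > width // 2:
            bus, inv = (~word) & mask, 1  # inverted value toggles fewer lines
        else:
            bus, inv = word, 0
        transitions += popcount(bus ^ prev_bus) + (inv ^ prev_inv)
        encoded.append((bus, inv))
        prev_bus, prev_inv = bus, inv
    return encoded, transitions


def bus_invert_decode(encoded: List[Tuple[int, int]], width: int) -> List[int]:
    mask = (1 << width) - 1
    return [(bus ^ mask) if inv else bus for bus, inv in encoded]
```

Sending 0xFF then 0x00 on an 8-bit bus costs 16 transitions unencoded but only 2 with bus-invert (one INV toggle each way). As the slide notes, this mainly helps wide parallel buses.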
54
FireWire (IEEE 1394)
High speed serial bus 100, 200, 400 Mbps in 1394a 800 Mbps, 1.6 Gbps in 1394b
Advantages Low power Real-time bandwidth guarantee => important for media apps Isochronous and asynchronous transfer modes Hot-pluggable, self reconfiguring Supports bus segmentation
55
[Figure: Mars Rover components on a FireWire 1394 bus: CPU1, CPU2 (bus controller), RF modem, CAM, MC1, MC2, MC3, SCI1, SCI2, HD/NVM. Legend: CAM camera; MC micro controller; HD hard drive; NVM non-volatile memory; SCI scientific equipment; RF modem radio frequency modem. The I2C bus is omitted from this diagram.]
Tasks:
• MCs are responsible for sensing, drive control and steering control
• Capture pictures, compress them in CPU1, and send the data to the RF modem
• SCIs carry out scientific experiments, sending data to CPU2
• After analysis, CPU2 stores data in HD/NVM
X2000 architecture mapping
Map Mars Rover application onto X2000 architecture
56
Bottlenecks in an unsegmented architecture
Contention for bus bandwidth Camera, RF, harddisk Forces serialization of communication globally
All nodes must be kept awake Prevents component shutdown Global overhead for bus reconfiguration
Long routing path Power overhead on routing controllers
57
Segmentation example
Three bus segments
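[Figure: rover nodes (SCI1, SCI2, RF modem, CAM, MC1, MC2, MC3, HD, CPU1/DSP, CPU2/bus controller) grouped into three bus segments; functions: MC for sensing, drive control and steering control; SCI for scientific experiments; CAM for picture capture and image compression; RF for transmission]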
Suppose bus bandwidth is 100 Mbps, image size 20 Mb each, 20 pictures to work on, SCI data volume 16 kbps × 10 Ks × 2 (4 hrs a day)
Power numbers: CPU1 4.0 W, CPU2 240 mW, RF modem 1.7 W, camera 2.6 W, SCI1 0.8 W, SCI2 3.2 W
58
Bus segmentation with FireWire
Without segmentation, blue nodes cannot be disabled: all nodes’ PHY layers must remain active, and request packets are broadcast to all nodes.
With segmentation, gray nodes can be safely disabled: they are in different segments from the active ones, and request packets are broadcast only to active nodes.
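The shutdown rule above can be stated as a small set computation: a node's PHY may be disabled only if no node on its segment is active, and on an unsegmented bus nothing may be disabled. The segment grouping in the test is hypothetical.

```python
from typing import Dict, Set


def nodes_to_disable(segments: Dict[str, Set[str]], active: Set[str],
                     segmented: bool = True) -> Set[str]:
    """Nodes whose PHY can be powered down for a given set of active nodes.

    segments: segment name -> nodes on that segment (hypothetical grouping)
    active:   nodes that must communicate during the current task
    On an unsegmented bus every PHY must stay on, so nothing can be disabled.
    """
    all_nodes: Set[str] = set().union(*segments.values())
    if not segmented:
        return set()
    live: Set[str] = set()
    for members in segments.values():
        if members & active:  # this segment carries useful traffic
            live |= members
    return all_nodes - live
```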
59
Throughput improvement
[Figure: the same nodes on an unsegmented FireWire 1394 bus and on three bus segments; figure annotation: “No useful traffic”]
Unsegmented: 100 Mbps bandwidth, 9 s transfer time
Segmented: 300 Mbps, 5 s transfer time
Bus segmentation helps improve bus bandwidth.
60
Bandwidth-enabled voltage scaling
Use voltage scaling and clock scaling to decrease component power.
Bandwidth 100 Mbps: power consumption = 12.3 W
Could be 300 Mbps; keep it at 100 Mbps
Power consumption after voltage scaling = 9.2 W
61
Power/latency reduction
Before: power consumption = 12.3 W, data transfer time = 9 s, energy consumption = 111 J
After voltage scaling: power consumption = 9.2 W, data transfer time = 5 s, energy consumption = 46 J
Energy saving 58%, power saving 25%
Note: bus configuration power not counted.
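The savings above follow from E = P × t applied to both configurations. This short sketch reproduces the slide's numbers and makes explicit why the energy saving exceeds the power saving: the transfer also finishes sooner.

```python
def energy_j(power_w: float, time_s: float) -> float:
    """Energy of a transfer that draws constant power for its duration."""
    return power_w * time_s


def savings(p_before: float, t_before: float, p_after: float, t_after: float) -> dict:
    e_before, e_after = energy_j(p_before, t_before), energy_j(p_after, t_after)
    return {
        "energy_before_j": e_before,
        "energy_after_j": e_after,
        "energy_saving": 1.0 - e_after / e_before,
        "power_saving": 1.0 - p_after / p_before,
    }


result = savings(12.3, 9.0, 9.2, 5.0)  # values from the slide; config power excluded
```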
62
Segmentation-enabled shutdown
All components’ bus interfaces are active. Entire bus is hot.
Non-operating bus segments are disabled. Non-operating components are disabled.
Bus power is saved.
[Figure: task timeline: drive control (10 min.), drive control (20 min.), picture capture (6 min.), science experiment (20 min.)]
63
Combined energy savings from static techniques
Shutting down inactive nodes: 27 global bus configurations
Only 11 bus configurations: configuration energy << 165 J; transceiver energy 1962 J
Configuration energy + transceiver energy < 1962 + 165 = 2127 J
Not shutting down inactive nodes: bus transceivers active all the time
Transceiver energy: 150 mW × 10 × 3360 s = 5040 J (transceiver: National Semi CS4103, PHY-active only mode)
2.4× energy reduction!
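The static comparison above pits always-on transceivers against an upper bound for the shutdown case. Because 2127 J bounds the shutdown energy from above, the computed ratio is a lower bound on the actual reduction. The 10 nodes and 3360 s are taken from the slide's formula.

```python
PHY_ON_W = 0.150  # CS4103 PHY-on power, from the component slide


def always_on_transceiver_energy(nodes: int, duration_s: float) -> float:
    """Energy if every node's transceiver stays in PHY-on mode throughout."""
    return PHY_ON_W * nodes * duration_s


baseline_j = always_on_transceiver_energy(10, 3360)  # no shutdown
shutdown_upper_bound_j = 1962 + 165                  # transceiver energy + config bound
reduction = baseline_j / shutdown_upper_bound_j      # lower bound on the reduction
```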
64
Dynamic bus reconfiguration
[Figure: bus topology with nodes SCI1, SCI2, RF modem, CAM, MCS1, MCS2, MCS3, HD, CPU1/DSP and CPU2/bus controller, before and after reconfiguration]
Running tasks: science experiments and radio frequency data transfer, SCI + RF (20 + 60 min)
New task: send data from HD to RF modem (continuing from the previous task)
Solution: dynamically change bus topology
65
Energy savings from dynamic bus reconfiguration
Without reconfiguration: local configurations 3; global configurations none; re-segmentation none; active transceivers 7; active bus segments 2
Energy: 12.7 × 3 × 1 + 0.15 × 7 × 4800 = 5078 J
With dynamic reconfiguration: local configurations none; global configurations 1; re-segmentation 1; active transceivers 3 + 2; active bus segments 1
Energy: 23.7 × 1 × 1 + 0.15 × 3 × 4800 + 0.05 × 2 × 4800 = 2664 J
Power numbers: local config 12.7 W; global config 23.7 W; active transceiver 150 mW; segmentation: software support; bus segment: proportional to bus length
1.9× energy reduction!
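The two energy formulas above share one structure: configuration energy plus transceiver energy over the 4800 s (20 + 60 min) task window. The sketch below parameterizes that structure; the 1 s configuration time and the 0.05 W figure for the two extra transceivers are read off the slide's formula (0.05 W matches the CS4103 standby mode) and are assumptions about what those terms mean.

```python
def reconfig_energy(local_cfgs: int, global_cfgs: int, active_tx: int,
                    standby_tx: int, duration_s: float,
                    local_w: float = 12.7, global_w: float = 23.7,
                    active_w: float = 0.15, standby_w: float = 0.05,
                    cfg_time_s: float = 1.0) -> float:
    """Configuration energy plus transceiver energy over the task window."""
    config = (local_w * local_cfgs + global_w * global_cfgs) * cfg_time_s
    transceivers = (active_w * active_tx + standby_w * standby_tx) * duration_s
    return config + transceivers


WINDOW_S = 4800  # 20 + 60 min
e_static = reconfig_energy(local_cfgs=3, global_cfgs=0, active_tx=7, standby_tx=0,
                           duration_s=WINDOW_S)
e_dynamic = reconfig_energy(local_cfgs=0, global_cfgs=1, active_tx=3, standby_tx=2,
                            duration_s=WINDOW_S)
```

The one-time cost of a global reconfiguration (about 24 J) is negligible next to the transceiver energy it saves over a long task window, which is what makes dynamic re-segmentation worthwhile here.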
66
Summary of architecture optimization
Towards loose coupling Reduced bus contention Increased parallel bandwidth Enabling voltage/frequency scaling
Application-driven clustering Communication bandwidth requirements between processes Knowledge from high-level behavioral model
Static optimization 2.4x energy reduction Bus segmentation Cluster shutdown
Dynamic reclustering 1.9x energy reduction
67
Power management & optimization
Behavioral modeling Extract power related attributes of all objects
Architecture modeling Use low-power devices or devices that can operate on low-power mode
Partitioning: Migration – merge computations from under-utilized processors onto one processor to improve utilization; Segmentation – separate tightly coupled computations into clusters to localize communication
Scheduling: arrange operation sequences on multiple processors / multiple power consumers to meet both performance and power requirements
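Migration as described above is essentially bin packing: pack task loads onto as few processors as possible so the rest can be shut down. The sketch below swaps in first-fit decreasing, a standard bin-packing heuristic, as an illustration; it models only CPU utilization (in integer percent) and ignores communication cost, which segmentation would also have to consider.

```python
from typing import List


def consolidate(utilizations_pct: List[int], capacity_pct: int = 100) -> List[List[int]]:
    """First-fit decreasing: pack task utilizations onto as few processors as possible.

    Each inner list is one processor's task set; processors that receive no
    tasks are the candidates for shutdown.
    """
    processors: List[List[int]] = []
    for u in sorted(utilizations_pct, reverse=True):
        for tasks in processors:
            if sum(tasks) + u <= capacity_pct:
                tasks.append(u)
                break
        else:
            processors.append([u])
    return processors
```

For four processors each running one task at 50%, 30%, 60% and 20% utilization, the result is two processors ([60, 30] and [50, 20]), so two could be powered down.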
68
Behavioral model
Application specific knowledge Input, output and function Dependency and precedence Control and data flow Timing and sequence
Software architecture Operating system features – real-time, centralized, distributed, etc. Execution model – event driven, interrupt, distributed agent, client-server, etc. Communication model – protocol stack and specification
Power related attributes Data rate, execution time, CPU speed, memory size, communication path, etc.
69
Allocation
Map behavioral objects to hardware: group related OS, communication, control and application objects into processing nodes; extract data objects into storage nodes; allocate components/packages for each processing node; arrange data storage for data nodes and optimize storage location to reduce communication
Map communication paths to busses
Set up the working mode of each component/package to fit the behavioral requirements
Extract the attributes of each structure: function (computation, control, communication), CPU utilization, bus traffic, power consumption
70
Scheduling
Mapping of tasks to time slots Computation Communication
Mapping of power usage to time slots Mechanical devices Thermal subsystems Other electronics subsystems
Constraints Real-time deadlines, periods, min/max separation Power budget, power surge (min/max) Potentially scenario-driven
71
Scheduling techniques
Deadline based real-time scheduling on multiprocessors
Rate-monotonic scheduling – extend existing RM scheduling to multiprocessors
Timing constraint graph scheduling – multiple serializable sequences in a single heartbeat
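As a reference point for the rate-monotonic item above, the classic Liu and Layland utilization bound gives a quick sufficient test on a single processor. The multiprocessor extension mentioned on the slide is not shown; one common approach (an assumption here, not necessarily the project's) is to partition tasks across processors and apply this test per processor.

```python
from typing import List, Tuple


def rm_utilization_test(tasks: List[Tuple[float, float]]) -> Tuple[bool, float, float]:
    """Liu and Layland sufficient test for rate-monotonic scheduling on ONE processor.

    tasks: (worst-case execution time C, period T) pairs, with deadline = period.
    Returns (passes_bound, total utilization U, bound n * (2^(1/n) - 1)).
    Failing the bound does not prove the task set is unschedulable.
    """
    n = len(tasks)
    if n == 0:
        return True, 0.0, 1.0
    u = sum(c / t for c, t in tasks)
    bound = n * (2 ** (1 / n) - 1)
    return u <= bound, u, bound
```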
72
Novel IMPACCT scheduler
A novel graphical tool Timing and power constraint visualization Transforms them into graph problems Gives designers a view of power surges at run time
Complete system-level model All power sources All power consumers
Power-aware scheduling Schedule operations based on power source output Both performance requirement and power constraint Regulate power surge Optimize for power efficiency and reduce execution time
73
[Figure: a bin on a power-vs-time chart; starting and ending time set its width, power level its height, and its area is the energy consumption]
IMPACCT scheduler
Extended Gantt chart from real-time scheduling for a single processor: events are bins
Timing – horizontal size; power – vertical size; energy – area of the bin
Power surge – compacting bins downward
74
[Figure: power-vs-time chart with constant task A, periodic tasks B and C, and task D following B]
IMPACCT scheduler
Scheduling chart for multiple processors and multiple power consumers: events can overlap vertically
Multi-processor; multiple power consumers – electronics, mechanical, thermal
Power awareness – min and max power supply
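The bin view above maps directly onto a power profile: stacking overlapping bins gives the instantaneous power, and any interval outside the min/max supply is a violation to fix by sliding, squeezing or extending bins. The sketch below is a hypothetical illustration of that check, not the IMPACCT tool's code.

```python
from typing import List, Tuple

Bin = Tuple[float, float, float]  # (start_s, end_s, power_w); area = energy


def power_profile(bins: List[Bin]) -> List[Tuple[float, float, float]]:
    """Total power in each interval between consecutive bin edges (stacked bins)."""
    edges = sorted({t for start, end, _ in bins for t in (start, end)})
    profile = []
    for left, right in zip(edges, edges[1:]):
        p = sum(pw for start, end, pw in bins if start <= left and end >= right)
        profile.append((left, right, p))
    return profile


def violations(bins: List[Bin], p_min: float, p_max: float) -> List[Tuple[float, float, float]]:
    """Intervals whose stacked power falls outside [p_min, p_max]."""
    return [(left, right, p) for left, right, p in power_profile(bins)
            if p < p_min or p > p_max]


def energy(bins: List[Bin]) -> float:
    """Total energy as the sum of bin areas."""
    return sum((end - start) * pw for start, end, pw in bins)
```

A minimum bound matters as much as the maximum here: an interval below p_min means available solar power is going unused.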
75
[Figure: power-vs-time chart for tasks A to D showing the deadlines (scheduling spaces) of B and C and the min and max timing constraints (scheduling space) of D; a bin can slide within its timing space or be squeezed/extended to the available time slot]
IMPACCT scheduler
Timing constraints – bin packing problem to satisfy horizontal constraints Independent tasks – moving bins horizontally Dependent tasks – moving grouped bins horizontally Power/voltage/clock scaling – extending/squeezing bins
76
[Figure: left, manual scheduling while monitoring power surge (attacking a spike); right, automated global scheduling to meet min/max power bounds and improve utilization]
IMPACCT scheduler
Power constraints – bin packing problem to satisfy vertical constraints Automatic optimization – let the tool do everything Manual optimization – visualizing power in manual scheduling
77
Example revisited – Mars Rover
System specification: 6 wheel motors, 4 steering motors, system health check, hazard detection
Power supply: battery (non-rechargeable), solar panel
Power consumption:
Digital: computation, imaging, communication, control
Mechanical: driving, steering
Thermal: motors must be heated in a low-temperature environment
78
Timing constraints – Mars Rover
Operation | Duration | Timing constraints
Health check | 10 s | Once in every 10-minute interval
Heating steering motors | 5 s |
Heating wheel motors | 5 s |
Hazard detection | 10 s | Before steering
Steering | 5 s | Before driving; ALL four steering motors must be heated during the 50-second period prior to steering
Driving | 10 s | ALL six wheel motors must be heated during the 50-second period prior to driving
79
Scheduling method
Constraint graph construction Nodes: operations Edges: precedence relationship between operations
Resource specification Resource: an executing unit that can perform operations independently
Six thermal resources for wheel heating Four thermal resources for steer motor heating One mechanical resource for driving One mechanical resource for steering One computation resource for control
Operations on one resource must be serialized
Scheduling Primary resource selection Schedule primary resource by applying graph algorithms Auxiliary resources and power requirements are considered as scheduling constraints
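Scheduling on a constraint graph with min and max separations (the negative edge weights such as -thw on the next slides encode maximum separations) is commonly solved as a longest-path problem. The sketch below uses Bellman-Ford style relaxation on a hypothetical fragment of the rover graph. It ignores resource serialization and power, which the method above treats as additional constraints, and it is not the project's implementation.

```python
from typing import Dict, List, Optional, Tuple

# (u, v, w) encodes start(v) >= start(u) + w.
# A maximum separation start(v) - start(u) <= d becomes the edge (v, u, -d).
Edge = Tuple[str, str, float]
NEG_INF = float("-inf")


def earliest_starts(nodes: List[str], edges: List[Edge], source: str) -> Optional[Dict[str, float]]:
    """Longest-path relaxation over a timing constraint graph.

    Returns the earliest start time of every node relative to `source`, or
    None if a positive cycle makes the min/max constraints inconsistent.
    """
    dist = {n: NEG_INF for n in nodes}
    dist[source] = 0.0
    for _ in range(len(nodes) - 1):
        changed = False
        for u, v, w in edges:
            if dist[u] != NEG_INF and dist[u] + w > dist[v]:
                dist[v] = dist[u] + w
                changed = True
        if not changed:
            break
    for u, v, w in edges:
        if dist[u] != NEG_INF and dist[u] + w > dist[v]:
            return None  # still relaxable: positive cycle, no feasible schedule
    return dist


# Hypothetical fragment: heat one steering motor (5 s) at most 50 s before
# steering starts; hazard detection (10 s) must finish before steering.
NODES = ["src", "heat", "hazard", "steer"]
EDGES = [
    ("src", "heat", 0), ("src", "hazard", 0),
    ("heat", "steer", 5),     # steer starts after heating completes
    ("steer", "heat", -50),   # heating starts at most 50 s before steering
    ("hazard", "steer", 10),  # steer starts after hazard detection completes
]
```

If heating instead had to start 60 s before steering while also being at most 50 s before it, the loop steer→heat→steer would have positive weight and the function returns None, flagging the constraint set as infeasible.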
80
Constraint graph
[Figure: constraint graph with nodes System health check / Thc (two instances), Heat wheel 1 to 6 / Thw, Heat steer 1 to 4 / Ths, Hazard detection / Thd, Steer / Ts and Drive / Td; edge weights include thc, -(thc + Thc), -thw and -ths]
81
Resource specification
[Figure: operations split across computation (C), mechanical (M) and thermal (T) resources, each labeled with duration and power: Health check (C) / Thc / Phc_C; Hazard detection (C); Heat steer i (C) / Ths_C / Phs_C and (T) / Ths_T / Phs_T; Heat wheel j (C) / Thw_C / Phw_C and (T) / Thw_T / Phw_T; Steer (C) / Ts_C / Ps_C and (M) / Ts_M / Ps_M; Drive (C) / Td_C / Pd_C and (M) / Td_M / Pd_M; edge weights include thc - (thc + Thc), -ths + Ths_E and -thw + Thw_E]
82
Scheduling graph
[Figure: scheduling graph with computation as the primary resource and mechanical and thermal as auxiliary resources; heating nodes carry Ths_E / Phs_E and Thw_E / Phw_E on the computation side; edge weights include thc - (thc + Thc), -ths, -ths + Ths_E, -thw, -thw + Thw_E and -Ts_C + Ts_M]
83
Example – Mars Rover
Resource | Duration (s) | Power at -40 degC (W) | Power at -60 degC (W) | Power at -80 degC (W)
Solar power | | 14.9 | 12 | 9
Battery | | 10 max | 10 max | 10 max
Heat one motor | 5 | 5.1 | 6.2 | 7.5
Heat two motors | 5 | 7.6 | 9.5 | 11.3
Drive | 10 | 7.5 | 10.9 | 13.8
Steer | 5 | 4.3 | 6.5 | 8.1
Hazard detection | 10 | 5.1 | 6.7 | 7.3
Health check | 10 | 4.7 | 5.7 | 6.3
CPU | Constant | 2.5 | 3.5 | 3.7
Power constraints: different solar power supply over time; different power consumption over temperature/time
84
JPL Solution - High Solar Power (a)
[Figure: power (0 to 25) vs. time (5 to 85) for heat steer, heat wheel, drive, steer, hazard detection, health check and CPU]
System heartbeat - moving two steps: (a) begin with health check, (b) no health check
Previous solution by JPL
Over-constrained, conservative: serializes every operation to satisfy the power constraint, causing longer execution time and under-utilization of solar power
No scheduling tool is used – manual scheduling
Not power-aware: scheduling without considering power sources and consumers
85
Solution 1: high solar power (14.9W)
[Figure: system heart-beat, moving two steps; (a) begin with health check, (b) no health check]
Max solar power: 14.9W at noon
Improved utilization of solar power
Automated scheduling: uses scheduling tools
Aggressive: heats motors as much as possible while doing other operations
Fastest moving speed: no waiting on heating
86
Solution 2: typical solar power (12W)
[Figure: system heart-beat, moving two steps; (a) begin with health check, (b) no health check]
Moderate solar power output: 12W
Improved utilization of solar power
Automated scheduling: uses scheduling tools
Moderately aggressive: avoids exceeding the power limit
Relaxed constraint: heats motors while doing other operations
Faster moving speed: some waiting time on heating
87
Solution 3: low solar power (9W)
[Figure: system heart-beat, moving two steps; (a) begin with health check, (b) no health check]
Minimum solar power output: 9W
Restricted constraint: serializes operations
Automated scheduling: uses scheduling tools
Conservative: same as the JPL solution
Slow moving speed
Full utilization of low solar power
88
Comparison
JPL's previous solution
Conservative: long execution time, low solar power utilization
Not power-aware: same schedule for all cases
Does not intend to use battery energy
Solar power (W)   Battery energy (J)   Solar energy (J)   % of solar energy used   Time (s)   Moving distance
14.9              0                    672.5              60%                      75         2 steps (14 cm)
12                55                   817                91%                      75         2 steps (14 cm)
9                 388                  675                100%                     75         2 steps (14 cm)
Our solution
Adaptive: speeds up when the solar power supply is high
Power-aware: smart scheduling for different power supply/consumption
Uses battery energy when necessary
Solar power (W)   Battery energy (J)   Solar energy (J)   % of solar energy used   Time (s)   Moving distance
14.9              6                    604                81%                      50         2 steps (14 cm)
12                149                  720                100%                     60         2 steps (14 cm)
9                 388                  675                100%                     75         2 steps (14 cm)
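The "% of solar energy" column can be checked by dividing the solar energy used by the energy available over one heart-beat, solar power times time. The sketch assumes time in seconds and energy in joules (W·s), which the slides do not state; the ratio itself does not depend on that choice as long as the units are consistent. `solar_utilization` is a hypothetical helper name.

```python
def solar_utilization(solar_w, time_s, solar_energy_used_j):
    """Percentage of the available solar energy (power x time) actually used."""
    return 100.0 * solar_energy_used_j / (solar_w * time_s)


# (solar power W, time s, solar energy used J) from the comparison tables
jpl = [(14.9, 75, 672.5), (12, 75, 817), (9, 75, 675)]
ours = [(14.9, 50, 604), (12, 60, 720), (9, 75, 675)]
for label, rows in (("JPL", jpl), ("IMPACCT", ours)):
    print(label, [round(solar_utilization(*r)) for r in rows])
# JPL [60, 91, 100]
# IMPACCT [81, 100, 100]
```

The shorter heart-beat at 14.9W and 12W is what raises utilization: the same work is done in less time, so less of the available solar energy goes unused.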
89
Application-level evaluation
Mission description: target location 48 (distance-) steps away from the current location
Power condition: 14.9W solar power for the first 10 minutes, 12W for the next 10 minutes, 9W thereafter
Metrics: execution time; total energy drawn from the battery
Time frame (s)   Solar power (W)   JPL                                              IMPACCT
                                   Distance (steps)  Time (s)  Energy cost (J)      Distance (steps)  Time (s)  Energy cost (J)
0-610            14.9              16                610       0                    24                610       72
611-1220         12                16                610       440                  20                610       1569.5
1221-            9                 16                610       3114                 4                 160       786
Total                              48                1830      3554                 48                1380      2427.5
Improvement of IMPACCT over JPL: 24.6% in time, 31.7% in energy cost
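The totals and the improvement row follow directly from the per-frame figures. A minimal Python check, assuming time in seconds and energy cost in joules drawn from the battery (the slide gives no units); `totals` and `improvement` are hypothetical helper names:

```python
# Per time frame: (steps travelled, time used in s, battery energy in J)
JPL = [(16, 610, 0), (16, 610, 440), (16, 610, 3114)]
IMPACCT = [(24, 610, 72), (20, 610, 1569.5), (4, 160, 786)]


def totals(rows):
    """Sum steps, time and battery energy over all time frames."""
    return tuple(sum(col) for col in zip(*rows))


def improvement(baseline, improved):
    """Relative reduction in percent."""
    return 100.0 * (baseline - improved) / baseline


jpl_steps, jpl_time, jpl_energy = totals(JPL)
imp_steps, imp_time, imp_energy = totals(IMPACCT)
assert jpl_steps == imp_steps == 48  # both complete the 48-step mission
print(round(improvement(jpl_time, imp_time), 1))      # 24.6 (time)
print(round(improvement(jpl_energy, imp_energy), 1))  # 31.7 (battery energy)
```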
90
Application-level evaluation
Power-awareness: execution speed scales adaptively with the power condition
Smart schedule: maximize the best case; avoid the worst case
Tradeoff: power vs. performance; energy renewability
Application-specific: application-level knowledge; working-mode parameters of components
91
Program plans and milestones
92
Development plans
Web-based CAD tool
Perl/CGI scripts for configuration
Java applets for interactive scheduling UI
Interface with database engine
Interface with commercial CAD backend
Detailed power estimation tools
Functional simulation with proprietary models
Rationale
No software installation needed by the end user
Ready to use by everyone on the Internet
Open source, with all development tools publicly available
93
Status & accomplishments to date
Architecture      Component     Busses        Software      Example
Mars Pathfinder   completed     completed     completed     completed
X-2000            completed     completed     in progress   planned
Specification     Low-level     High-level    UI            Integration
Library           in progress   in progress   in progress   planned
Authoring         in progress   in progress   in progress   planned
Synthesis         Static        Hybrid        Dynamic       Integration
Partitioning      in progress   planned       planned       planned
Scheduling        in progress   planned       planned       planned
Bus segmentation  in progress   planned       planned       planned
Voltage scaling   planned       planned       planned       planned
Output            Low-level     High-level    UI            Integration
Simulation        planned       planned       planned       planned
Configuration     planned       planned       planned       planned
94
IMPACCT schedule
[Figure: Gantt chart from July 2000 to January 2001 with rows core tool UI, Library, Authoring, Partitioning, Scheduling, Segmentation, Voltage scaling and Simulation; bars marked planned or in progress]
95
Original schedule
[Figure: timeline from 2Q 00 (kickoff) through 2Q 01 to 2Q 02, also labeled "network option"; tasks: System modeling, Coordination synthesis, Architecture definition, Static partitioning, Component partitioning, Power aware design techniques, PCL definition, Simulatable components, Benchmark identification, Component simulator, PCL benchmarking, Synthesizable components, System benchmarking, Authoring tool v1.0, Dynamic partitioning, Simulator v1.0]
96
Updated schedule
[Timeline: 2Q 00 (kickoff), 2Q 01, 2Q 02; Year 1, Year 2, option]
Year 1
Static & hybrid optimizations: partitioning / allocation, scheduling, bus segmentation, voltage scaling
Library: COTS components, FireWire and I2C bus models
Static composition authoring
High-level simulation
Benchmark identification
Architecture definition
Year 2
Dynamic optimizations: task migration, processor shutdown, bus segmentation, frequency scaling
Library: parameterizable components, parameterizable bus models
Reconfiguration authoring
Architecture reconfiguration
Low-level simulation
System benchmarking
97
Quarterly schedule
[Figure: Gantt chart by quarter from 4Q 2000 through 2Q 2002; tasks: COTS components library, FireWire and I2C bus models, Benchmark identification, Static partitioning / allocation, Static scheduling, Static bus segmentation, Static composition authoring, Architecture definition, High-level simulation, Hybrid partitioning / allocation, Hybrid scheduling, Hybrid bus segmentation, Voltage scaling, Parameterizable components, Parameterizable bus models, Dynamic scheduling, Dynamic processor shutdown, Dynamic bus segmentation, Dynamic task migration, Frequency scaling, Dynamic reconfig. authoring, Architecture reconfiguration, Low-level simulation, System benchmarking]
98
Financial information
99
IMPACCT budget
Months 1-6     $180,000
Months 7-12    $180,000
Second year    $400,000
[Figure: bar chart of funding for months 1-6, months 7-12 and the second year]
100
Budget distribution
[Figure: pie chart of the budget distribution across Salaries, Benefits, Equipment, Travel, Other, Indirect, USC and JPL]
101
http://www.ece.uci.edu/impacct/
102
Bibliography
[Mehra97] R. Mehra, et al. "A partitioning scheme for optimizing Interconnect power", IEEE Journal of solid-state circuits, Vol. 32, No.3, March 1997
[Shin98] Y. Shin, et al. "Reduction of bus transitions with partial bus-invert coding", Electrons Letters, vol.34, No.7, IEE 2 April 1998 p. 642-3
[Benini97 ] L. Benini et al. "Asymptotic zero-transition activity encoding for address buses in low-power microprocessor-based systems", Proceedings Great Lakes Symposium on VLSI, Los Alamitos, CA, USA: IEEE Comput. Soc. Press, 1997, p.77-82
[Nakase98] Y. Nakase et al. "Complementary half-swing bus architecture and its application for wide band SRAM macros", IEE proceedings-Circuits, Devices and Systems, vol.145, No.5 IEE, Oct 1998, p337-42
[Zhang98] Y. Zhang et al. "An alternative architecture for on-chip global interconnect: segmented bus power modeling", Thirty-Second Asilomar Conference on Signals, Systems and Computers, Pacific Grove, CA, USA, 1-4 Nov. 1998.
[Kernighan70] B. Kernighan et al. “An Efficient Heuristic Procedure for Partitioning Graphs”, Bell System technical Journal Vol. 49 No.2, Feb. 1970 p291-307
[Hauck95] S. Hauck et al. “Logic Partition Orderings for Multi-FPGA Systems”, International Symposium on Field-Programmable Gate Arrays, 1995
103
Program Goals
Evaluation, exploration: power usage, performance and cost of alternative configurations and algorithms
Optimization: achieve the most effective power usage from high-level, global knowledge
Tool integration: many point tools, independent techniques
Specialization: configurable platform
Reuse: take advantage of the rich collection of COTS parts rather than re-designing from scratch
104
Technical approach
High-level abstraction: component vs. composition; separate models for architecture and behavior
Synthesis and optimization of the power manager: architecture reconfiguration; scheduling for optimal power usage, adaptable to different power management policies
Aggressive use of domain knowledge: encompass mechanical / thermal power; aware of the power supply model
105
System level modeling
Architectural modeling: COTS components, component encapsulation, bus architecture, system interconnect
Behavioral modeling: application-specific knowledge, software architecture, mission goals, high-level constraints
106
Power-aware coordination
Protocols: coordinate power usage, e.g. peak power, resource arbitration; multiple versions of a given algorithm
Components: adaptable to different power management policies, not hardwired; usable in new applications even if not designed to be power aware!
Synthesis: coordination controller (“mode manager”); optimization to minimize control dependency; optimality depends on architectural mapping
107
Measuring power consumption (1)
Different levels of analysis
By # of operations:
(+) easy to implement
(-) neglects the different sizes of modules
Appropriate for comparing two different architectures with similar modules
By # of lines of code:
(+) approximates the size of the hardware to be implemented
(-) may be too simple to estimate power consumption
Together with the number of operations, gives an indication of the power consumption of each module
By # of F/F:
(+) more accurate measure
(-) requires finding the relationship between # of F/F and # of lines of code
The number of F/F is the lowest-level hardware characteristic available in the high-level simulator
Control unit and data path have different power dissipation patterns even with the same number of gates
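Operation-count analysis, the first level above, amounts to weighting each operation class by a per-operation energy and summing. A minimal Python sketch follows; the operation classes, the per-operation energies in `ENERGY_PER_OP_NJ` and the two example architectures are hypothetical placeholders that would have to come from characterizing real modules.

```python
# Hypothetical per-operation energies (nJ); real values must be characterized.
ENERGY_PER_OP_NJ = {"alu": 0.5, "mul": 2.0, "load": 1.5, "store": 1.8}


def estimate_energy_nj(op_counts):
    """Estimate module energy from counts of each operation class."""
    return sum(ENERGY_PER_OP_NJ[op] * n for op, n in op_counts.items())


def compare(arch_a, arch_b):
    """Ratio of estimated energies; meaningful only for similar modules."""
    return estimate_energy_nj(arch_a) / estimate_energy_nj(arch_b)


# Two hypothetical architectures running the same kernel.
a = {"alu": 1000, "mul": 200, "load": 300}
b = {"alu": 800, "mul": 400, "load": 300}
print(estimate_energy_nj(a), estimate_energy_nj(b), compare(a, b))
```

As the slide notes, this ignores module size, so the ratio is a fair comparison only between architectures built from similar modules.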
108
Measuring power consumption (2)
By # of gates:
(+) makes accurate power estimation possible
(-) needs a register transfer level (RTL) description and power analysis tools
To get accurate hardware information, the RTL modules must be implemented; input/output statistics of each module are also necessary
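Gate-level estimation commonly relies on the textbook CMOS dynamic-power model P = Σ α_g C_g Vdd² f, where the per-gate switching activity α_g is what the input/output statistics provide. The slide does not name a model, so this is an assumption; the capacitances, activities, supply voltage and clock below are made-up illustration values, and `dynamic_power_w` is a hypothetical helper.

```python
def dynamic_power_w(gates, vdd, freq_hz):
    """CMOS dynamic power: sum over gates of activity * C * Vdd^2 * f.

    `gates` is a list of (switching_activity, load_capacitance_farads) pairs;
    activity is the average number of charging transitions per clock cycle.
    """
    return sum(a * c for a, c in gates) * vdd ** 2 * freq_hz


# Hypothetical netlist summary: three gates with simulated switching activity.
gates = [(0.1, 10e-15), (0.25, 20e-15), (0.5, 5e-15)]
p = dynamic_power_w(gates, vdd=3.3, freq_hz=100e6)
print(f"{p:.4e} W")
```

Because power scales with Vdd², halving the supply voltage cuts this estimate by a factor of four, which is the effect voltage scaling exploits.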
109
USC's Work in Progress
Select a processor simulator
Analyze the hardware description of each module
Estimate the power consumption of each module
Find performance-power ratio
Design a minimum power processor model
110
Program impact & transitions
Productivity: fully exploit off-the-shelf components; rapid turnaround time to architecture
Massive scalability: protocol-based power management; system architecture platform
Robust methodology: unified functional/power correctness; confidence in complex design points
111
Bus Architecture Perspectives (X)
Parallelism
Parallel: high cost, high throughput, enables design exploration
Serial: low cost, constrained throughput, simple bus interface
Locality: functional, spatial
Adaptivity: adaptive, deterministic
112
FireWire (IEEE 1394) bus
Service model: physical layer, link layer, transaction layer
Communication model: asynchronous transfer, isochronous transfer
Arbitration model: fair gap arbitration, priority arbitration
Configuration model: bus initialization, tree identification, self identification
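IEEE 1394 fair arbitration groups asynchronous traffic into fairness intervals: a node that wins arbitration may not arbitrate again until an arbitration reset gap ends the interval. The Python sketch below simulates only that rule. It replaces the real PHY-level arbitration with a simple fixed priority by distance from the root, and it omits isochronous cycles, acknowledgements and priority arbitration; node names, depths and queue lengths are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    depth: int                # hops from the root; used here as a fixed priority
    pending: int = 0          # queued asynchronous packets
    arb_enabled: bool = True  # cleared after winning, until the reset gap


def fairness_intervals(nodes):
    """Simulate fair arbitration; return the winners of each fairness interval."""
    intervals = []
    while any(n.pending for n in nodes):
        winners = []
        while True:
            contenders = [n for n in nodes if n.pending and n.arb_enabled]
            if not contenders:
                break  # bus goes idle: arbitration reset gap
            # Simplification: the contender closest to the root wins.
            winner = min(contenders, key=lambda n: n.depth)
            winner.pending -= 1
            winner.arb_enabled = False  # may not arbitrate again this interval
            winners.append(winner.name)
        intervals.append(winners)
        for n in nodes:  # the reset gap re-enables every node
            n.arb_enabled = True
    return intervals


nodes = [Node("A", 0, pending=2), Node("B", 1, pending=1), Node("C", 2, pending=3)]
print(fairness_intervals(nodes))  # [['A', 'B', 'C'], ['A', 'C'], ['C']]
```

Node C, farthest from the root, still transmits once in every interval instead of waiting until A and B have drained their queues.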
113
Architectural Model
Component: parameterized COTS
Type: processor, memory, I/O, DSP, bus, etc.
Interface: how the components can be connected to each other
Modes: operation-mode parameters such as voltage, clock speed, bandwidth, power consumption, etc.
Package: a bundle of connected components that performs a certain operation
A set of connected components
Internal/external interface: how the components are connected
Modes: configuration space of the collected components, specified by each component’s working mode and collective attributes, e.g., voltage, speed, power, etc.
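One way to encode the component/package model is sketched below in Python. Each component carries a list of working modes, a package's configuration space is the Cartesian product of its components' modes, and a collective attribute such as power is aggregated over the chosen modes. The class names, fields, example mode values and the rule that package power is the sum of component powers are assumptions for illustration, not the IMPACCT data model.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Mode:
    name: str
    voltage_v: float
    clock_mhz: float
    power_w: float


@dataclass
class Component:
    name: str
    kind: str    # processor, memory, I/O, DSP, bus, ...
    modes: list  # list of Mode


@dataclass
class Package:
    name: str
    components: list  # list of Component

    def configurations(self):
        """Configuration space: one working mode chosen for each component."""
        return list(product(*(c.modes for c in self.components)))

    @staticmethod
    def total_power(config):
        """Collective attribute (assumed additive): sum of the chosen modes' power."""
        return sum(m.power_w for m in config)


# Hypothetical components and mode values, for illustration only.
cpu = Component("cpu", "processor", [Mode("run", 3.3, 100.0, 2.0),
                                     Mode("low", 1.8, 25.0, 0.8)])
link = Component("link", "bus", [Mode("active", 3.3, 50.0, 0.5),
                                 Mode("standby", 3.3, 0.0, 0.1),
                                 Mode("off", 0.0, 0.0, 0.0)])
pkg = Package("controller", [cpu, link])
for config in pkg.configurations():
    print([m.name for m in config], Package.total_power(config))
```

The configuration space grows multiplicatively with the number of components, which is why a tool rather than manual enumeration is needed for realistic packages.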
114
Approach: system-level modeling
High-level abstractions: employ application-specific knowledge in system models; encompass multiple domains (electronics, mechanical, thermal)
System modeling
Behavioral modeling: software architecture, application-specific knowledge
Architectural modeling: hardware platform built on top of parameterized components
Partitioning: mapping behavioral objects to architectural structures
Scheduling: a valid sequence of concurrent/parallel operations on multiple processors that satisfies real-time requirements
115
Example – Mars Rover
System specification: 6 wheel motors, 4 steering motors, system health check, hazard detection
Power supply: battery (non-rechargeable), solar panel
Power consumption
Digital: computation, imaging, communication, control
Mechanical: driving, steering
Thermal: motors must be heated in a low-temperature environment
116
Scheduling example – Mars Rover
Power constraints
Solar panel: 14.9W peak power @ noon, 11W for 6hr/sol
Battery: 10W max power output, 150W-hr energy storage
CPU: 3.7W, constant for 4h/sol
Health check: 6.3W, 10s
Hazard detection: 7.3W, 10s
Heating: 7.5W (1 motor) or 11.3W (2 motors), 5s
Steering: 6.8W, 5s (7º/s)
Driving: 12.4W, 10s (7cm)
Existing solution
Serializes each operation to satisfy the power constraint
Conservative: longer execution time and under-utilization of solar power
No scheduling tool is used
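In contrast to serializing every operation, a simple power-aware alternative is greedy list scheduling under a peak-power cap: each task starts at the earliest time at which its power, added to whatever is already running, stays within the cap. The Python sketch below is that generic heuristic, not the IMPACCT scheduler. The task powers and durations come from the constraints above; the cap of 14.9 W solar peak plus the 10 W battery limit, the precedence "heat before drive" and the function name `schedule` are illustrative assumptions.

```python
def schedule(tasks, cap_w, base_w=0.0):
    """Greedy list scheduling under a peak-power cap.

    tasks: (name, power_w, duration_s, predecessors) tuples in priority order.
    base_w: constant background load, e.g. the always-on CPU.
    Returns {name: (start_s, end_s)}; raises ValueError if a task never fits.
    """
    placed = {}  # name -> (start, end, power)

    def load_at(t):
        return base_w + sum(p for (s, e, p) in placed.values() if s <= t < e)

    for name, power, dur, preds in tasks:
        ready = max((placed[q][1] for q in preds), default=0.0)
        # The earliest feasible start is either `ready` or a moment when an
        # already-placed task ends and frees power.
        candidates = sorted({ready} | {e for (_, e, _) in placed.values() if e > ready})
        for t in candidates:
            # Load only rises at task starts, so checking t and every start
            # inside (t, t + dur) covers the whole interval.
            points = {t} | {s for (s, _, _) in placed.values() if t < s < t + dur}
            # Small tolerance absorbs floating-point rounding in the sums.
            if all(load_at(x) + power <= cap_w + 1e-9 for x in points):
                placed[name] = (t, t + dur, power)
                break
        else:
            raise ValueError(f"{name} ({power} W) never fits under {cap_w} W")
    return {n: (s, e) for n, (s, e, _) in placed.items()}


# Powers and durations from the slide; the precedence is an assumption.
tasks = [
    ("heat_motor", 7.5, 5.0, []),
    ("steer", 6.8, 5.0, []),
    ("drive", 12.4, 10.0, ["heat_motor"]),
]
print(schedule(tasks, cap_w=24.9, base_w=3.7))
```

With the 24.9 W cap, heating and steering overlap at t = 0 and driving starts at 5 s, finishing at 15 s instead of 20 s in the fully serialized order. With a solar-only cap of 14.9 W, driving (12.4 W plus the 3.7 W CPU) cannot fit at all, which is where battery power becomes necessary.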
117
Scheduling techniques
Constraint logic solving: transfer all constraints into a pure mathematical form; use tools to solve the problem in the mathematical domain
Example – CLP(R)
Constraints
C1 > 3, C1 < 5, C2 > 2, C2 < 4   # two power consumers
C1 + C2 < S, S > 6, S < 12       # one power source
Inputs: C1 = 4.5, S = 7
Results: 2 < C2 < 2.5
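For this small example, the CLP(R) answer can be reproduced by plain interval reasoning: with C1 and S fixed, C1 + C2 < S gives C2 < S - C1, which is intersected with the consumer bounds 2 < C2 < 4. The Python sketch below does only that bound propagation for one unknown consumer; it is not a general linear-constraint solver like CLP(R), and `c2_bounds` is a hypothetical name.

```python
def c2_bounds(c1, s, c2_lo=2.0, c2_hi=4.0):
    """Open interval (lo, hi) for C2 once C1 and S are fixed.

    Encodes C2 > c2_lo, C2 < c2_hi and C1 + C2 < S (i.e. C2 < S - C1).
    Returns None when no value of C2 satisfies all three.
    """
    if not (3.0 < c1 < 5.0 and 6.0 < s < 12.0):
        raise ValueError("inputs violate 3 < C1 < 5 or 6 < S < 12")
    lo = c2_lo
    hi = min(c2_hi, s - c1)
    return (lo, hi) if lo < hi else None


print(c2_bounds(4.5, 7.0))  # (2.0, 2.5), matching 2 < C2 < 2.5
```

A solver such as CLP(R) generalizes this: it keeps the residual constraints symbolic, so any subset of C1, C2 and S can be left unknown.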
118
Evaluation
Application-level evaluation: metrics based on overall mission objectives; constraint-driven solutions
Power-related scenario: various power constraints (supply/consumption) over different stages of the application; power-aware adaptive scheduling for different stages