INDEX
Abstract Syntax Tree (AST), 172-173, 174 Access time, sandwich/spin tunneling cell
architecture, 23-24 Accumulator registers, 81 ACPI (Advanced Configuration and Power
Interface), 118-119, 128, 154-155, 286 Active power: see Flip-flops Activity reduction, pipeline gating as, 60-61 Adaptive bias logic, queue design, 50 Adaptive CAM array read simulation, 44-46 Adaptive filter, FORTE, 250, 252, 253 Adaptive issue queues: see Queue; Queue design Adaptive logic and processing
circuit-level evaluation of power and performance overhead, 38
dynamic: see Dynamic adaptation satellite-based parallel signal processing, 251-
252 application partitioning, 251-252 architecture, 251, 252
Address bus, transition count reduction, 213, 164 Addressing
energy exposed instruction sets: see Direct addressing, stores with
and StrongARM SA-1100 current consumption, 344-345
Admission control code, API, 161 Advanced Configuration and Power Interface
(ACPI), 118-119, 128, 154-155, 286 Agilent/HP 0.25 µm CMOS, 8-13
Aircraft, design-time optimization for, 228-233 aircraft examples and analysis, 231-233 domain considerations, 229-230 endurance, 230 endurance as function of energy conservation,
230-231 Algorithmic transformations, 212-213 Alpha 21264, 36 alpha-queue, 144-145 Annapolis Micro Systems, WILDSTAR VHDL
templates, 182, 183 Annotations to AST nodes, SUIF, 174 API: see Application programming interface,
power-aware Application-level power awareness, 227-242
design-time optimization for aircraft, 228-233 aircraft examples and analysis, 231-233 domain considerations, 229-230 endurance, 230 endurance as function of energy conservation,
230-231 dynamic energy allocation for cooperating
sensors, 234-241 application of theory, 236 behavior of system, 238-240 energy allocation and sensor measurements,
235 fusing sensor measurements, 234 minimizing variance through energy
allocation, 235-236
Application-level power awareness (cont.)
dynamic energy allocation for cooperating sensors (cont.)
parameterized sensor model, 236 two-sensor problem, 237-238
related problems, 240-242 Application partitioning, satellite-based parallel
signal processing, 251-252 Application programming interface, power-aware,
153-164 features, 156-160
interface between power manager and hardware, 157-158
interface between power manager and operating system, 158-160
future directions, 163-164 implementation, 160-161 predictive power-aware scheduling with eCos,
160-161 requirements, 155-156 results, 161-163
Application Specific ICs (ASICs), 170 Architectural innovations, 212, 213 Architectural level power modeling, 317-336
cycle simulator augmentation, 322-325 omitted details, 322-324 power estimation methodology, 324-325
future directions, 334-335 implementation of cycle-accurate power
estimator, 325-334 data structure and microarchitectural block
models, 326-328 power modeling techniques, 329-334
power metrics, 319-320 power modeling techniques, 329-334
clock distribution tree, 334-335 data path components, 332-333 memory models, 330-332 random logic and interconnections, 333
previous work, 320-322 Architecture: see also Energy-exposed instruction
sets; Microarchitecture design; RISC-accumulator architecture
HDLs: see PACT HDL instruction level parallelism, 63 power reduction approaches, 60 RISC, ARM-like, 211-223; see also ARM-like
RISC architecture sandwich/tunneling memory, 23-25, 31 satellite-based parallel signal processing, 251,
252 Architecture description file, PACT HDL
architecture independence, 180
ARM instruction set, software energy profiling, StrongARM SA-1100, 343
ARM-like RISC architecture, 211-223 benchmarks, 218-219, 220, 222 C compiler, 213 experimental setup, 213-217, 218 future prospects, 222 methodology, 218-219 metrics, 219, 222 off-chip memory and PCB bus models, 215-
217, 218 previous research, 212-213 results, 219-221, 222 scaling off-chip memory bus frequency and
voltage, 219-221, 222 Verilog simulation environment, 213-215
ARM processor, PACT ARM, 170 ARM Project Manager (APM), 341 ARM simulator, JouleTrack, 356 Array interleaving, 207-208 Arrays, PACT HDL, 174, 177 ASIC, 182, 183; see also PACT HDL Associative CAM-tag cache, 91, 92 AST (Abstract Syntax Tree), 172-173, 174 AS/X simulations, 43-44, 51-52 Average current consumption, 341
Backend, HDL AST, 181 Back-end (low-level) compiler optimization, 193-196 Balance equation, 113 Barrier instruction, 82-83 Barriers
restart analysis, 86 system calls, 85, 86
Base cost program blocks, 341 Baseline processor, energy exposed instruction
sets, 81 Batteries, FORTE, 252-253 Benchmarks, ARM-like RISC architecture, 218-
219, 220, 222 Bias logic table, 50 Bit-width analysis and reduction, 185 Block buffering, combined optimizations, 201-202 Body effect, 20 Branch instructions, restart analysis, 86 Branch prediction
confidence estimators, 61-62 pipeline gating, 64-65
Buffers, power reduction approaches, 60 Bus encoding, 164 Bus energy, compiler optimizations, 195-196 Bus modeling
ARM-like RISC architecture, 215-217, 218
Bus modeling (cont.)
ARM-like RISC architecture (cont.) frequency, 219-221, 222 power dissipation, 217, 218, 220
microarchitecture models, 326, 327-328 Bus state controller, 342 Bus transaction cycles, 323-324, 326 Bypass latches, 81, 87-90
compiler analysis, 89 evaluation, 90 ISA enhancements, 88
Cache architectural innovations, 213 block buffering, 201 CAM-tag, 91, 92 CPU cycle numbers, worst-case, 131 energy exposed instruction set techniques, 81 HDL AST optimization, 186 loop analysis, 185 loop transformations and tiling, 196-197 power reduction approaches, 60 software energy profiling, StrongARM SA-1100,
342, 343 subbanking, 201 web server workload, 269
Cache misses scheduling slack, 69, 70 compiler optimizations, 198-200
CACTI, 321, 331 Cai-Lim simulator, 319 CAM array read simulation, adaptive, 44-46 CAM/RAM queue design, 38-41, 55-56 CAM-tag cache, 81, 91, 92 Capacitance
power reduction, 60 transistor width and, 304, 305
C/C++, 171-172; see also PACT HDL ARM-like RISC architecture, 213 energy consumption estimation, JouleTrack, 356 FORTE power usage optimization, 255, 256 GNU compiler, 255, 256 tag-unchecked compiler analysis, 95
CCMOS Flip-Flop, modified, 11, 12, 13 Checkpointed state, 83 Chip memory access, ARM-like RISC
architecture, 222 Circuit blocks, SDT devices, 25 Circuit parameters
evaluation of power and performance overhead, 38
microarchitectural block models, 326 Circuit sizing, 304-306
Circuit state average current consumption measurement, 341 effects on energy, 193
Circular queue structure, 38 C Level Design System Compiler, 172 Clock-based power management policies, 102 Clock distribution tree, 334-335 Clock frequency
changes in, 139-140 software energy profiling, StrongARM SA-1100,
342 Clock gating
energy tradeoffs, 354-356 flip-flops, 4, 5 hardware innovations, 212 static flip-flops, 7
Clock speeds reverse levelization and, 185 scheduling slack, 67
Clustered voltage scaling, 61 CMOS circuit, 19; see also Flip-flops
energy and delay, 294, 295-296 power reduction, 60
C2MOS Flip-Flop, 7 CMOS level power optimization, 172 Co-Design Automation, Superlog, 172 Cold scheduling effects, 193 Common Gateway Interface (CGI) scripts, 356 Compaction, latch-based design with, 55-56 Compaq iPaq, API, 161-162 Compiler: see also C/C++; PACT HDL
ARM-like RISC architecture, 211-223; see also ARM-like RISC architecture
energy exposed instruction sets RISC accumulator architecture, 89 tag-unchecked loads and stores with direct
addressing, 94-95 software restart regions, 85-86
Compiler optimizations, 191-208 energy-aware low-level compilers, 193-196
instruction scheduling and energy, 193-194 register assignment and bus energy, 195-196
hardware-software interaction, 200-208 code transformations and power mode control
mechanisms, 202-206 data transformations and power mode control
mechanism effectiveness, 206-208 hardware, 200-201 optimizations for memory energy, 201-202
high-level loop optimizations, 196-200 cache miss rates versus energy consumption,
198-200 experimental evaluation, 197-198
Compiler optimizations (cont.)
high-level loop optimizations (cont.)
types of, 196-197 Compress program, 131 Computational energy, VLSI computations, 294 Confidence estimators
pipeline gating, 61-63 speculation control, 63-66 terminology, 60
Consolidation of web servers, 262 Contactless RF identification tags, 27 Continuous-Time Markov Decision Processes
(CTMDP), 111, 112 Control pass, SUIF to HDL translation, 176, 178-
179 Control speculation: see Microarchitecture design Conventional Flip-Flop, 7, 11, 12, 13 Conversion pass, SUIF to HDL translation, 177-
178 Core power dissipation, 220 Counter
queue, 48 reorder buffer, 49
Conventional Flip-Flop, 7 CoWare N2C layer, 171 CPU cycle numbers, worst-case, 130-131
flow control modeling, 131-132 static (off-line) power management, 134
CPU frequency scaling, web server, 265 CPU speed
power management points, 134 slack computation, 137-139 worst-case execution time (WCET), 129, 130,
139-140 time overhead, 139-140
CPU time and energy model, web server, 277-280 CSIM engine, web server, 277-280 Cubic power reduction, 133 Cumulative probability tables, decision states, 114-
115 Curie temperature (CT), 25, 26 Current
sandwich/spin tunneling cell, 20, 23 software energy profiling, 343
factors affecting consumption, 341 leakage current observations, 349-351; see
also Leakage current/power instruction current, 343-345 operating point and, 345, 346 prediction, 347-348 separation of components, 353-354 variation within instruction, 344-345
Cycle-accurate power estimator, 325-334
Cycle-accurate power estimator (cont.)
data structure and microarchitectural block models, 326-328
power modeling techniques, 329-334 Cycle partitioning, 347 Cycle simulator, 322-325
efficiency of, 318 omitted details, 322-324 power consumption, 320 power estimation methodology, 324-325
Cycle window queue, 48, 55 shutdown logic, 49-50 SimpleScalar simulation, 52-53
CynApps C/C++ extensions, 171
DA: see Direct addressing, stores with Data bus streams, cycle simulators, 323-324 Data caches, software energy profiling,
StrongARM SA-1100, 342 Data gating, 4-5, 6; see also Flip-flops Data layout transformations, 207 Data path components, architectural level power
modeling, 332-333 Data storage: see Sandwich/spin tunneling memory
device Data structure, architectural level power modeling,
326-328 Data transformation: see Compiler optimizations DBench tool, 286 Deadline management, 156, 159, 160-161 Dead state elimination, 186 DEC AXP-21164, 63 Decay, slack indicator table state, 74 Decision logic, dynamic adaptation algorithms,
46-49 Decision making, power management policies, 102 Decode cycle, pipeline gating, 63-64, 65-66 Delay: see Efficiency metric Et2; Flip-flops Delay product, low-power, flip-flop, 14-15 Demand paging, restart schemes, 85-86 Density, sandwich/spin tunneling cell, 23 Design: see also Application-level power
awareness; Microarchitecture design flip-flops, 7-8, 9, 10 power estimation methodology, 324
Design Compiler, 183 Design Manager, 182 Design Power, 183 Desktops, power performance comparisons, 119 di/dt noise, 320 Diffusion sharing, flip-flop analysis, 11 Digital Signal Processors (DSPs), 27, 339
Direct addressing, stores with, 90-97 compiler analysis, 94-95 DA register implementation, 93-94 evaluation, 95 example use, 92-93 ISA enhancements, 92
Discrete-Time Markov Decision Processes, 102, 111, 119
Display brightness, web server power management, 265
Dominating nest, 204, 205, 206 Drain Induced Barrier Lowering (DIBL)
coefficient, 351-352 DRAM
power mode control, 202-206 software energy profiling, StrongARM SA-1100, 342
DSTC, 6, 11, 12, 13 Duty cycle, and software energy estimation, 340 Dynamic adaptation, 52; see also Queue; Queue
design CAM/RAM design with, 55-56 queue design
algorithms, 46-49 issue queues, 41-46 issue queue size, 37
Dynamic circuits, microarchitectural block models, 326
Dynamic frequency scaling computation, 161 Dynamic management of power consumption, 294;
see also Application-level power awareness power-aware real-time systems, 136-139
evaluation of dynamic schemes, 146-147 reclaiming scheme, 129-130, 145
power consumption calculations, 320 system model, 104-110
hard disks, 108-109 overview, 110 portable devices, 106 queue, 110 smart badge, 107 user, 105, 106 WLAN cards, 109
techniques, 110-115 policy implementation, 114-115 results, 118-120, 123
voltage and frequency scaling, 103, 128 TISMDP, 115-118, 120-122, 123 web servers, 280-284
Dynamic power dissipation, 212 Dynamic reclaiming scheme, 129-130, 145 Dynamic reasoning, 241 Dynamic speed setting: see Power-aware real-time
systems
Earliest Deadline First (EDF) scheduling, 129, 130
Earliness slack alpha-queue, 144-145 dynamic (on-line) power management, 136-137
eCos, 157, 160-161 EDF* policy, alpha-queue, 144, 145 EEPROMS, 20, 27 Efficiency, cycle simulator, 318 Efficiency metric Et2, 293-314
comparing algorithms, 296-299 Et, 297-298 Et2, 299 e, 298
energy and delay and VLSI computation, 294-296
power rule for sequential composition, 311-312 e efficiency of design, 300-304
parallelism, 300-302 pipelining, 302-304
e rules for parallel and sequential compositions, 312-313
transistor sizing for optimal e, 304-311 Et^n with n ≠ 2, 306-307 experimental evidence, 308-309 minimum energy function, 308 multi-cycle system, 309-311 optimal energy and cycle time, 307-308
8TDFF, 6, 11, 12, 13 Embedded systems: see Application programming
interface, power-aware Empty states, HDL AST optimization, 186 Energy; see also Efficiency metric Et2
optimization: see Compiler optimizations overhead, power-aware real-time systems, 140 software, 229-258; see also Software energy
profiling transistor sizing for optimal e, 307-308 and VLSI computation, 294-296
Energy-aware applications: see Application programming interface, power-aware; Power-aware real-time systems
Energy delay metric SPEC2IW, 287 Energy efficiency, web servers, 265 Energy exposed instruction sets, 79-97
baseline processor, 81 exposing bypass latches with hybrid RISC-
accumulator architecture, 87-90 compiler analysis, 89 evaluation, 90 ISA enhancements, 88
future work, 95-97 instruction chain, 96-97
Energy exposed instruction sets (cont.)
software restart regions, 81-87 categories of machine state, 84 compiler analysis, 85-86 evaluation, 87 example use, 84-85 restart marker implementation, 83
tag-unchecked loads and stores with direct addressing, 90-97
compiler analysis, 94-95 DA register implementation, 93-94 evaluation, 95 example use, 92-93 ISA enhancements, 92
Enumerated types, SUIF, 174 Environment, web servers, 266 Esterel-C Language (ECL), 171 ET2: see Efficiency metric Et2
Event-driven power management policies, 102 Exception management
sequential instruction semantics and, 82 software restart markers and, 80, 82-83
Execution box (EBOX), 344 Execution progress information, 134 Execution time
API requirements, 156 worst-case (WCET), 129, 130
Exponential behavior, software energy profiling, 351-353
Expressions, HDL AST, 175
Factor values, queue design, 52-53, 55 False alarms, FORTE signal detection, 248 Fast-Fourier Transform
FORTE, 250 software energy profiling, 349-351, 355-356
Fast On-Orbit Recording of Transient Events: see FORTE
Fetch cycle, pipeline gating, 63-64, 65-66 File systems, flash memory, 164 Filtering, FORTE: see FORTE Finance workload, web server, 269, 270, 274-276,
279, 283 Finite State Machine (FSM) HDL AST, 173, 174-
175, 181 FIR Filter, HDL PACT compiler test, 187, 188 First-order model, software energy profiling,
345 First-use table, 37, 38 Fission, loop, 202-206 Flash memory, 20, 164 Flip-flops, 3-16
clock gating, 4, 5
Flip-flops (cont.)
comparative analysis and experimental results, 8, 10-13
data gating, 4-5, 6 design, static and dynamic, 7-8, 9, 10 pipelined multiplier design 8 x 8, 13-14, 15, 16
Floating point operations, 258-259, 340 Flow control models, power-aware real-time
systems, 131-132 FORTE: see also Satellite-based parallel signal
processing filtering, 249-250
application partitioning, 252, 253 sensor resources, 240 timing, 255, 256
goal, 244-245 hardware, 247-248
Fortran, 206 Field Programmable Gate Arrays (FPGA), 171,
182; see also PACT HDL Frame-based systems, periodic task model, 133 Frequency levels, software energy profiling, 347 Frequency scaling, 103, 115-118, 120-122, 123,
128 API, 157, 158 web servers, 280-284
Frequency-voltage tradeoff, SmartBadge, 107 FSM (Finite State Machine), 173, 174-175, 181
Gate leakage, 20 Gating: see also Flip-flops
architectural innovations, 212, 213 energy tradeoffs, 354-356 flip-flops, 4-5, 6 pipeline, 60-61; see also Pipeline gating
General purpose register (GPR) RISC architecture, 81
Global clock, software energy profiling, 343 Global register allocation, relabeling after, 195-
196 Global symbol table, SUIF, 174 GNU C compiler, 255, 256 GNU cross-assembler, 218 Go program, 131 Ground sensors, sandwich/tunneling memory
applications, 26-27
Half-select currents, 21, 23 Hamming distance, 213, 327-328 Hard disks, dynamic power management, 108-109,
118-119 portable devices, 108-109 user request arrival distribution, 105
Hardware, API power manager, 157-158 Hardware Abstraction Layer (HAL), API, 156, 157 Hardware C, 171 Hardware Description Languages (HDL): see HDL
AST; PACT HDL Hardware-software interaction
compiler optimizations, 200-208 code transformations and power mode control
mechanisms, 202-206 data transformations and power mode control
mechanism effectiveness, 206-208 hardware, 200-201 optimizations for memory energy, 201-202
interfaces: see Energy-exposed instruction sets Harvard Array of Clustered Computers (HACC),
286 Hazard information, single cycle simulation, 328-
329 HDL AST, 174-181; see also PACT HDL
backend, 181 PACT HDL, 185-186 SUIF to HDL translation, 176-179 symbols and symbol table, 175-176 target architecture independence, 179-181
High-impedance bus stream, 327-328 High-level loop optimizations, 196-200
cache miss rates versus energy consumption, 198-200
experimental evaluation, 197-198 types of, 196-197
High-Speed D Flip-Flop, 6, 11-12, 13 Hitachi SH-4 processors, 342; see also Software
energy profiling Horner's Rule, 177, 178 HSPICE, 9, 297 httperf simulation, 270-271 HTTP requests, server workload construction,
267-270 Hybrid Latch Flip-Flop (HLFF), 6, 11, 12, 13 Hybrid RISC-accumulator architecture: see Bypass
latches Hypercycle, 131
Idle mode, flip-flop designs, 6 Indicated slack, 73-74 Inductive effects, power consumption changes and,
320 Information Resource Caching (IRCache) Project,
267 Innate scheduling slack, 69-70 In-order ready queue, 37 Instruction base current cost, 341 Instruction-level parallelism (ILP), 36, 63
Instructions energy exposed instruction sets, 96-97 queue design, 36, 37
software energy profiling, StrongARM SA-1100 caches, 342 current profiles, 343-345 current variation within, 344-345
reduction in number of, 60 scheduling
compiler optimizations, 193-194 terminology, 61
sequential semantics, 82 software energy profiling, StrongARM SA-1100,
343, 347, 348 Instruction Set Architecture (ISA)
energy-exposed, 88; see also Energy-exposed instruction sets
sequential instruction semantics, 82 tag-unchecked loads and stores with direct
addressing, 92 and software energy estimation, 340
Instructions per cycle (IPC) architectural improvements, 63 issue queue adaptation, 48 queue design and, 37, 38
Instruction trace, 341 Integer queue, 36 Intel
Evaluation, API, 161-162 SpeedStep technology, 286 StrongARM processor SA-110, 216-217 XScale, 160
Intel PentiumPro, 63 Interarrival times, 102, 104, 105 Interevent time set, TISMDP, 112 Inter-instruction (circuit-state) effects
average current consumption measurement, 341
on energy, 193 Interleaving, array, 207-208 Internal cycles, software energy profiling, 347,
348 Interrupt service routines (ISRs), API, 158 Inverter feedback based flip-flops, 7 Ionospheric dispersed signals, 246-247; see also
Satellite-based parallel signal processing IPC: see Instructions per cycle ISA: see Instruction Set Architecture Issue control logic, queue scheduling, 37-38 Issue queue, 36; see also Queue; Queue design
comparison of designs, 55-56 dynamic adaptation algorithms, 46-49 size of, 37
Java, data transformations, 206 JouleTrack, 356-357, 358; see also Software
energy profiling Jump instructions, restart analysis, 86 Junction resistance, sandwich/spin tunneling cell,
22
Ko's Low-Power Flip-Flop, 7
Laplace Transform, HDL PACT compiler test, 187, 188
Laptops, 285 power performance comparisons, 119 TISMDP, LAN-attached, 119-120
Latching structure: see also Bypass latches flip-flops, 6 issue queues, 38-41 SDT devices, 24
Latency, single cycle simulation information, 328-329
Latest time, scheduling slack, 69 Leakage current/power, 20
energy tradeoffs, 354-356 flip-flop analysis, 16
data gated, 5, 6 low-leakage input vector, 5 pipelined multiplier, 14
MOS network, 352-353 power estimation methodology, 325 separation of current components, 353-354 software energy profiling, 340, 348-351
LEDA Systems HDL PACT compiler test, 187, 188 libraries, 214, 321
Linear loop transformations, 196, 201-202 Li program, 131 Locality, loop analysis, 185 Local memory, PACT HDL, 170 Logic gate, production rules, 295 LongRun, 286 Look-up tables
analytical power models, 321 power estimation methodology, 324
Loop analysis, 185 Loop and data transformations: see Compiler
optimizations Loop distribution (fission), 202-206 Loop execution, array interleaving, 207-208 Loop fission, 202-206 Loop invariant code motion, HDL AST
optimization, 186 Loop nest, dominating, 204, 205, 206 Loop optimizations, 196-200
Loop optimizations (cont.) cache miss rates versus energy consumption,
198-200 combined optimizations, 202-206 data transformations, 206 experimental evaluation, 197-198 types of, 196-197
Loop reordering, 185 Loop tiling, 196-197, 201-202 Loop transformations
combined optimizations, 202-206 linear, 196, 197, 201-202
Loop unrolling, 197, 201-202 Low-Power Flip-Flop, 7, 11, 12, 13; see also Flip-flops
Low Power Sleep mode with output Pull-down Flip-Flop (LPSDFF) design and operation, 7-8, 9, 10, 11, 12, 13
low-power delay product, 14-15 pipelined multiplier design, 13-14, 15, 16
Low Power Sleep mode with output Pull-up Flip-Flop (LPSPFF) design and operation, 7-8, 9, 10, 11, 12, 13
low-power delay product, 14-15 pipelined multiplier design, 13-14, 15, 16
Low-power systems, compiler optimizations: see Compiler optimizations
Machine state energy exposed instruction sets, 84 software restart marker, 80-81
Macros, C/C++, 171 Magnetic material hysteresis: see Sandwich/spin
tunneling memory device Magnetic tunnel junction MRAM cell, 20-21,
28 Magnetic Tunnel Junctions (MTJs), 20, 21, 28 Magnetoresistive Random Access Memory
(MRAM), 20-21, 28 Markovian randomized stationary policies,
114 MATCH, 181 Matched filter, FORTE application partitioning,
252, 253 MATCH group, 172, 174 MATLAB, 172, 174, 255 Matrix Multiplication, HDL PACT compiler test,
187, 188 Maximum likelihood fit, FORTE application
partitioning, 252, 253 Mediabench
restart analysis, 86, 87 tag-unchecked compiler analysis, 95
Memory access
ARM-like RISC architecture, 222 microprocessors versus cycle simulators, 323 PACT HDL architecture independence, 180 software energy profiling, 347, 348
address PACT HDL architecture independence, 180 transition count reduction, 213
architectural level power modeling, 330-332 ARM-like RISC architecture bus, 215, 217, 218;
see also ARM-like RISC architecture frequency, 219-221, 222 power dissipation, 215, 217-218, 220
array interleaving, 207-208 flash, 20, 164 PACT HDL, 170
allocation of, 177 architecture independence, 180 caching, HDL AST optimization, 186 local, 170 pipelining, 180-181, 184
power mode control, 202-206 restart analysis, 85-86 sandwich/tunneling, 23-25, 31; see also
Sandwich/spin tunneling memory device Memory-like microarchitectural blocks, analytical
power models, 321 Metrics, 219, 222; see also Efficiency metric Et2
Microarchitecture design, 59-77 background and terminology, 60-61 costs and benefits of slack indication table slack
detection, 73-76 pipeline gating, 61-66
confidence estimators, 61-63 speculation control with confidence
estimation, 63-66 slack detection, 70-72 slack indicator table, 72-76
using indicated slack, 73-74 slack scheduling, theoretical underpinnings, 68-
70 speculation control by exploitation of scheduling
slack, 66-68 Microarchitecture simulation-based results
block models, 326-328 queue design, 52-56
Microprocessor, hardware innovations, 212 Microsoft On-Now initiative, 286 Minimum energy function E(t), 304, 308 MIPS instruction set, restart regions, 83 MIPS-like instruction, SUIF, 174 MIPS R10000, 36
MIPS R3000, Et2 measurements, 299, 300 MIPS RISC microprocessor, 81 Miss rates, cache, 198-200 ModelSim, 182 Model Technologies synthesis flow tools, 182 Modified CCMOS Flip-Flop, 11, 12, 13 MP3 audio
dynamic voltage scaling, 120-122, 123 TISMDP, 116-117
MPEG video decoder, 120-122, 123 MRAM (Magnetoresistive Random Access
Memory), 20-21, 28
MTJs: see Magnetic Tunnel Junctions Multi-cycle system, transistor sizing for optimal
e, 309-311 Multiple program, multiple data-stream processing,
FORTE, 252, 253 Multiple program, single data stream processing,
FORTE application partitioning, 252, 254 Multiprocessor architecture, Power Aware
(PAMA), 244
Nagano Winter Olympics: see Web server National Laboratory for Applied Network
Research (NLANR), 267 Neel temperature (TN), 25, 26 Nest, dominating, 204, 205, 206 Nested loops, loop distribution/fission, 202-206 9TDFF, 6, 11, 12, 13 nMOS transistor current, 295-296 Nonstationary policies, 102 N2C layer, 171
Off-chip bus frequency, 221, 222 Off-chip bus power dissipation, 217, 218 Off-chip memory and PCB bus models, 215-217,
218 Off-chip memory bus frequency and voltage, 219-
221, 222 Off-line scheduling, 128-129 Off-line (static) power management, 130, 134-136 Olympics of 1998: see Web server Olympus Synthesis System Hardware C, 171 On-line (dynamic) power management, 130, 136-
139 On-Now initiative, 286 Operating frequency, software energy profiling,
340, 347 Operating point, and program current
consumption, 345, 346 Operating systems: see Application programming
interface, power-aware
Operating voltage and frequency, software energy estimation, 340
Operational latency, scheduling slack, 68 Operators
HDL AST, 175 production rules, 295
Optimization: see Compiler optimizations; Flip-flops; Queue; Queue design; PACT HDL
Out-of-order execution, instruction level parallelism, 63
Out-of-order processors pipeline gating, 65-66 power consumption, 36
Output control, flip-flop design and, 7
PACT HDL, 169-189 future work, 188, 189 HDL AST, 174-181
backend, 181 SUIF to HDL translation, 176-179 symbols and symbol table, 175-176 target architecture independence, 179-181
optimization for power and performance, 183-186
HDL AST, 185-186 memory pipelining, 184 SUIF AST, 184-185
results, 186-187, 188 static size arrays, 177 SUIF, 173 synthesis flow, 182-183
ASIC design flow, 183 FPGA design path, 182
PACT (Power Aware Architecture and Compilation Techniques), 170; see also PACT HDL
Parallelism FORTE application partitioning, 252, 253, 254 instruction level, speculation and out-of-order
execution, 63 issue queue adaptation, 48-49 SDT junctions, 23 shutdown logic, 52 signal processing: see Satellite-based parallel
signal processing e efficiency of design, 300-302 transistors, leakage current, 352-353 utilization-based algorithms versus, 55-56
Parameter estimation, satellite-based parallel signal processing, 249-251
Pareto distribution, user request arrivals, 105, 110 Partitioned memory architecture, array
interleaving, 207-208
Pass-gate, dynamic flip-flop design and operation, 7-8, 9, 10
PCB bus models, ARM-like RISC architecture, 215-217, 218
Peak power, power consumption calculations, 320 Performance
optimization: see PACT HDL web servers, 284-285
Periodic task model, power-aware real-time systems, 133
Perl program, 131 Personal digital assistants, 285 Personal Digital Assistants (PDAs), 285, 339 Pipeline
energy exposed instruction set techniques, 81 PACT HDL, 184 PACT HDL architecture independence, 180-181
Pipelined multiplier design 8 x 8, 13-14, 15, 16 Pipeline gating
at decode and issue stage, 65-66 goal of, 63 microarchitecture design, 61-66
confidence estimators, 61-63 speculation control with confidence
estimation, 63-66 scheduling slack, 67-68, 69, 70 terminology, 60-61
pMOS transistor, pull-up network, 295-296 Pointers, SUIF, 174 Portable computers, 285 Portable devices, 106, 285, 339; see also Laptops Port symbol table, 180 Power, total (Ptotal), flip-flop analysis, 10 Power availability, satellite-based parallel signal
processing, 252-255 Power-aware API: see Application programming
interface, power-aware Power Aware Architecture and Compilation
Techniques (PACT), 170; see also PACT HDL
Power-aware design: see Application-level power awareness
Power Aware Multiprocessor Architecture (PAMA), 244
Power-aware real-time systems, 127-149 dynamic (on-line) power management, 136-139 energy overhead, 140 evaluation of dynamic schemes, 146-147 maximizing reward while meeting time and
energy constraints, 147-148 modeling flow control, 131-132 periodic task model, 133 power consumption model, 133-134
Power-aware real-time systems (cont.) power management points, 134 speculative speed retention, 145-146 speed management overhead, 139-140 static (off-line) power management, 134-136 system level dynamic power management, 143-
145 task and systems models, 130-131 task level dynamic power management, 141-142,
143 time overhead, 139-140
Power Compiler, ARM-like RISC architecture, 214-215, 218
Power consumption management of: see Dynamic management of
power consumption power-aware real-time systems, 133-134 web servers, 271-277
measurements, 271-275 opportunities for power management, 275-
277 Power-delay product, 10, 11, 14-15 Power dissipation, 212
ARM-like RISC architecture, 217, 218 power estimation methodology, 325 SIA roadmap, 36 simulation of, 321, 322
Power estimation methodology, architectural level power modeling, 324-325
Power management API
and hardware, 157-158 and operating system, 158-160
architectural level power modeling, 319-320, 329-334
clock distribution tree, 334-335 data path components, 332-333 memory models, 330-332 random logic and interconnections, 333
dynamic: see Dynamic management of power consumption
optimization: see Flip-flops power-aware real-time systems: see Power-aware real-time systems system-level DPM, 102
Power management points, 134 Power management policy, 102 Power mode control mechanisms, compiler
optimization code transformations, 202-206 data transformations, 206-208
Power PC 603 Flip-Flop, 7 Power rule for sequential composition, 311-312
PowerScope, 287 Power state machine model, 110 Power supply voltage, ARM-like RISC
architecture, 215 Power usage, satellite-based parallel signal
processing, 255-259 PPC603 Flip-Flop, 11, 12, 13
Prediction assessment, confidence estimators, 61-62 Predictive power-aware scheduling with eCos,
160-161 PrimePower, 321 Printed Circuit Board (PCB) bus, Verilog
simulation, 213, 214 Printed Circuit Board Power model, ARM-like
RISC architecture, 216-217 Processor
frequency, power reduction, 60 PACT HDL, 170 pipeline: see Pipeline gating power: see Power-aware real-time systems
Processor-cache interface, ISA enhancements, 92 Processor states, API, 158 Process scheduling: see Application programming
interface, power-aware Producer, dynamic adaptation algorithms, 46-47 Production rules, 295 Program counter
restart, 83 sequential instruction semantics and, 82
Programmable gate arrays, magnetic memory applications, 27
Program performance completion time, ARM-like RISC architecture,
221 reducing number of instructions, 60
Proxy servers, 267 Proxy workload, web server, 270-271, 276, 279,
280, 283 Pull-down bus state, 327-328 Pull-down network, single nMOS transistor as,
295-296 Pull-up bus state, 327-328 Pulse-Triggered True Phase Flip-Flop (PTTFF), 6,
11, 12, 13 Push-Pull Flip-Flop, 7, 11, 12, 13
Quadratic power reduction, 133, 279 Quality metric, 213 Queue
dynamic power consumption management, 110 models
TISDMP, 104 web server, 277-280
Queue design, 35-57 dynamic adaptation algorithms, 46-49 dynamic adaptation in issue queues, 41-46 latch and CAM/RAM based issue queues, 38-41
microarchitecture simulation-based results, 52-56 shut-down logic, 49-52
Radiation, solar array degradation, 253-254 Radiofrequency signals: see FORTE Random access memory
DRAM power mode control, 202-206 software energy profiling, StrongARM SA-1100, 342 MRAM, 20-21, 28
Random logic and interconnections, architectural level power modeling, 333
Rate Monotonic Scheduling (RMS), 129, 161 Read access time
MRAM, 21 SDT devices, 25
Read caching, bypass latches, 88 Read circuits, sandwich/spin tunneling cell, 22 Ready queue, 37 Real-time systems: see Power-aware real-time
systems Reconfiguration elements, magnetic memory
applications, 27 RedHat eCos, 160 Register assignment, compiler optimizations, 195-
196 Register files
energy-exposed processor, 87, 88, 90, 91 extensions, slack state, 71
Register Transfer Level (RTL) models ARM-like RISC architecture, 214 HDL codes, 170 synthesis flow, 182
REL HDL codes, macros, 171 Remote Debug Interface (RDI), 357 Remote sensing, 245-248; see also Satellite-based
parallel signal processing Remote sensing applications, 245-248
FORTE hardware, 247-248 ionospheric dispersed signals, 246-247
Reorder Buffer (ROB) queue, 48-49 parallelism-based algorithm, 52 slack state, 71, 73
Replay program, web servers, 270-271 Request interarrival times, 102, 104, 105
Resistance, tunneling, 22 Resource Allocation Strategy, HDL AST
optimization, 186 Restart
machine state categories, 83
software: see Software restart regions, energy exposed instruction sets
Reverse levelization, 185 Reward-based model of power management, 147-
148 RF identification tags, 27 RISC architecture
ARM-like, 211-223; see also ARM-like RISC architecture
bypass latch exposure, 81 energy exposed instruction sets, 81, 87-90
compiler analysis, 89 evaluation, 90 ISA enhancements, 88
software restart marker, 80 ROB: see Reorder Buffer RTL (Register Transfer Level) model, 170, 182,
214 Run-time execution profile, 341 Runtime slack, 69
SA-1100: see Software energy profiling; StrongARM SA-1100
Sandwich/spin tunneling memory device, 19-32 magnetic tunnel junction MRAM cell, 20-21,
28 memory circuits/architecture, 23-25, 31 potential applications, 26-27, 29 potential higher density sandwich/tunneling
memory, 25-26, 29, 32 sandwich spin tunneling cell, 21-23, 29, 30
Satellite-based parallel signal processing, 243-259 adaptive power-aware processing, 251-252
application partitioning, 251-252 architecture, 251, 252
conventional solutions to power management, 244
FORTE goal, 244-245 power availability, 252-255 power usage, 255-259 remote sensing applications, 245-248
FORTE hardware, 247-248 ionospheric dispersed signals, 246-247
signal filters for parameter estimation, 249-251 signal filtering, 249-251 trigger and digitizer output signals, 249-251
Scaling CPU frequency, web server, 265
Scaling (cont.)
off-chip memory bus frequency and voltage, ARM-like RISC architecture, 219-221, 222
voltage: see Power-aware real-time systems Scheduling
API requirements, 154-155, 160; see also Application programming interface, power-aware
compiler optimizations, 193-194 effects on energy, 193-194 Rate Monotonic, 129 slack state, 61, 72; see also Slack static dynamic, 129 static off-line, 128-129
SDFF, 11, 12, 13 SDRAM, software energy profiling, 342 Second-order model, software energy profiling,
345-348 Segment flow graph, flow control modeling, 131-132 Selection logic, issue queue instructions, 36 Select transistor, MTJs, 21 Semantics, instruction, 82 Semi-dynamic Flip-Flop (SDFF), 6 Semi-Markov decision processes (SDMP), 111, 112 SenseAmp Flip-Flop, 6, 11, 12, 13 Sense amplifier, sandwich/spin tunneling cell
architectures, 24 Sense Current Driver, 24 Sensors
dynamic energy allocation, 234-241 application of theory, 236 behavior of system, 238-240 energy allocation and sensor measurements,
235 fusing sensor measurements, 234 minimizing variance through energy
allocation, 235-236 parameterized sensor model, 236 two-sensor problem, 237-238
remote: see Satellite-based parallel signal processing
time multiplexing, 240 Sequential memory access, software energy
profiling, 347, 348 Servers: see Web servers Short-circuit power dissipation, 212 Shutdown states
API requirements, 154-155 queue design, 49-52
SIA roadmap for power dissipation, 36 Signal detection, FORTE, 248 Signal filters
FORTE, 249
for parameter estimation, 249-251
signal filtering, 249-251
trigger and digitizer output signals, 249-251 timing, 255, 256
Signal processing: see Satellite-based parallel signal processing
SimplePower, register assignment effects on bus energy, 195, 196
SimpleScalar, 196, 322 SimpleScalar 3.0, 52-56, 131, 322 Simplescalar ARM, architectural innovations, 213 Simulator
ARM-like RISC architecture, 213-215 web servers, 277-280
Single cycle behavior, simulation of, 328-329 Single nMOS transistor, transistor current, 295-
296 Single SDT junction memory cell (UC), 23, 31 Slack
dynamic (on-line) power management, 136-137 microarchitecture design
detection, 70-72, 73-76 speculation control, 66-68 terminology, 61 theoretical underpinnings, 68-70
power management point calculation, 134 workload variation, 103
Slack indicator table (SIT), 71, 72-76 SmartBadge, 103
dynamic power consumption management, 107 dynamic voltage scaling, 122 portable devices, 106, 107 user request arrival distribution, 105
Smooth circuits, 295-296 Sobel Transform, HDL PACT compiler test, 187,
188 Software energy profiling, 339-358
energy tradeoffs, 354-357, 358 exponential behavior, explanation of, 351-353 factors affecting software energy, 340 first-order model, 345 instruction current profiles, 343-345 leakage current observations, 349-351 leakage energy measurement, 348-349 related work, 340-341 second-order model, 345-348 separation of current components, 353-354 StrongARM experimental setup, 341-342
Software restart regions, energy exposed instruction sets, 80, 81-87
categories of machine state, 84 compiler analysis, 85-86
Software restart regions, energy exposed instruction sets (cont.)
evaluation, 87 example use, 84-85 restart marker implementation, 83
Software trigger, FORTE application partitioning, 252, 253
Solar array degradation, 253-254 SPEC benchmarks, 131 SPECint95, 42, 86, 87 Speculation
microarchitecture design: see also Microarchitecture design
with confidence estimation, 61, 63-66 exploitation of scheduling slack, 66-68 terminology, 60
power-aware real-time systems, 145-146 dynamic (on-line) power management, 137-
139 speed adjustment, 130 speed reduction, 145-146 task scheduling, 129-130
SPECWeb/Watt metric, 287 Speed computation, time overhead, 139-140 Speed management overhead, power-aware real-
time systems, 139-140 Speed reduction, worst-case execution time
(WCET), 129, 130 SpeedStep technology, 286 Spin tunneling, MRAM cells, 20-21; see also
Sandwich/spin tunneling memory device Squid, 267 Stable state, 83 Standby/sleep mode
dynamic flip-flop design and operation, 7-8, 9, 10
flip-flop design and, 7 SDT devices, 25
Stanford University Intermediate Format: see SUIF State-action frequencies, TISDMP, 114 Statements
HDL AST, 175, 176 SUIF to HDL translation, 176
State Node representations, HDL AST, 175, 176 States, SUIF to HDL translation, 176 State transition decisions, power management
policy, 102 Static circuits, microarchitectural block models,
326 Static current, exponential behavior, 350 Static (off-line) power management, 128, 130,
134-136 Static reclaiming scheme, 129-130
Static scheduling, issue queue, 37 Stationary policies, 102
Stochastic power management policies, 102 Storage: see Sandwich/spin tunneling memory
device Stores with direct addressing: see Direct
addressing, stores with StrongARM
floating point operations, 258-259 SmartBadge, 107
StrongARM-l, 81 StrongARM 110 Flip-Flop, 6, 11, 12, 13 StrongARM SA-110, 216-217 StrongARM SA-1100, 339
clock frequency changes, 139-140 software energy profiling, 339, 341-342; see
also Software energy profiling StrongARM SA-1110, 158, 160, 161-162 Structs, SUIF, 174 Subbanking, combined optimizations, 202 Sub-expression elimination, 185 SUIF, 173, 174
HDL AST correlations, 175 HDL translation, 176-179 PACT HDL optimizations, 183-185 tag-unchecked compiler analysis, 95
Superlog, 172 Superscalar processors
issue queue power consumption, 36 SimpleScalar simulation, 52-56
Switching ARM-like RISC architecture, 215, 217-218 sandwich/spin tunneling cell
current, 22 mechanism, 26 speed, 23 time, 22
Switching energy energy tradeoffs, 354-356 operating voltage and frequency effects, 340
Switching power dynamic power dissipation, 212 power estimation methodology, 325
Symbol table HDL AST, 175-176 PACT HDL architecture independence, 180 SUIF, 174
Synopsys ARM-like RISC architecture, 214-215, 218 Design Compiler, 183, 214, 321 HDL PACT compiler test, 187, 188 Power Compiler, 214-215, 218 synthesis flow, 182
Synopsys (cont.) SystemC, 171-172
Synplicity Synplify, 182 Synthesis flow, PACT HDL, 182-183; see also
PACT HDL ASIC design flow, 183 FPGA design path, 182
SystemC, 171-172 System level
dynamic power management dynamic reclaiming and aggressive
scheduling, 130 power-aware real-time systems, 130-131, 143-145 power management point, 134 static (off-line) power management, 135
innovations, 212 power and energy estimates, ARM-like RISC
architecture, 221
Tag checks elimination of, 95 ISA enhancements, 92
Tag-unchecked loads and stores with direct addressing, energy exposed instruction sets, 90-97
compiler analysis, 94-95 DA register implementation, 93-94 evaluation, 95 example use, 92-93 ISA enhancements, 92
Target processor, and software energy estimation, 340
Task level power management API requirements, 154-156 power-aware real-time systems, 141-142, 143
dynamic reclaiming and aggressive scheduling, 130
models, 130-131 periodic tasks, 133 power management point, 133, 134 static (off-line) power management, 135
Task scheduling, 129-130, 154-155 Task termination, API requirements, 155-156 Temperature, and magnetic memory cells, 25-26 Temporary state, 83 Termination of task, API requirements, 155-156 Θ
comparison of algorithms, 298 efficiency of design, 300-304 rules for parallel and sequential compositions,
312-313 transistor sizing for optimizing, 304-311
Θ (cont.) transistor sizing for optimizing (cont.)
Etⁿ with n ≠ 2, 306-307 experimental evidence, 308-309 minimum energy function, 308 multi-cycle system, 309-311 optimal energy and cycle time, 307-308
Threshold levels, FORTE signal detection, 248 Threshold values, queue design, 55 Tiling, loop, 196-197, 201-202 TI LowPower DFF, 11, 12, 13 Time: see Efficiency metric Et²
Time and energy model, web server, 277-280 Time-Indexed Semi-Markov Decision Process
Model (TISDMP): see also Dynamic management of power consumption
goal of optimisation, 112-114 dynamic voltage scaling and, 115-116
Time-indexed states, TISDMP, 113 Time multiplexing of sensors, 240
Time overhead, power-aware real-time systems, 139-140
Times between arrivals of user requests, 102 Time-specific information, PACT HDL
architecture independence, 180 Timing of FORTE signal filters, 255, 256 TISMDP: see Time-Indexed Semi-Markov
Decision Process Model TLBs, ISAs and, 80, 86 Total task utilization, 128 Transistor connectivity, flip-flop analysis, 11 Transistor count
flip-flop analysis, 10, 11 shutdown logic, 51-52
Transistor current energy and delay analysis, 295-296 leakage, 20, 352-353
Transistor size flip-flop analysis, 8-9 for optimal Θ, 304-311
Etⁿ with n ≠ 2, 306-307 experimental evidence, 308-309 minimum energy function, 308 multi-cycle system, 309-311 optimal energy and cycle time, 307-308
Transition, production rules, 295 Transition count reduction, memory address bus,
213 Transition distribution models, TISDMP, 104 Translation Lookaside Buffer (TLB), 342 Transmeta Crusoe processor, 286 Transmeta TM5400, 139-140 Transmogrifier C compiler, 171
Triceps, 218 Trigger, FORTE application partitioning, 252, 253 Trimaran, 213, 218, 219, 221, 222 TSMC library, 214 TSPC (True Single Phase Clocking) Flip-Flop, 6, 11, 12, 13 Tunneling magnetoresistance, 20-21, 22; see also
Sandwich/spin tunneling memory device Two SDT junction memory cell architecture, 23-25, 31
Unmanned Airborne Vehicle (UAV) queue usage, 54-55
Unrolling, loop, 197, 201-202 User
dynamic power consumption management, 105, 106
request interarrival times, 102, 104, 105 TISDMP, 104
Utilization-based algorithm dynamic adaptation, 47-48 queue design, 55-56 shutdown logic, 52
Variable-voltage CPUs, power consumption reduction, 133
Vdd, 294, 299 Vectorsum, HDL PACT compiler test, 187, 188 Verilog simulation environment, 169; see also
PACT HDL ARM-like RISC architecture, 213-215, 218-219 HDL AST, 174 Superlog, 172
VHDL, 169; see also PACT HDL HDL AST, 174 RTL, 181, 183 synthesis flow, 182 WILDSTAR, 182, 183
Virtex FPGA, 187, 188 Voltage
gating, architectural innovations, 213 power reduction, 60 sandwich/spin tunneling cell, 22 and software energy estimation, 340
Voltage scaling: see also Power-aware real-time systems
algorithms for, 128-129 API, 157, 158 dynamic power consumption management, 115-118, 120-122, 123 terminology, 60 web servers, 280-284
Wait queue, 37, 38 Wattch simulator, 287, 319 Web servers, 261-288
consolidation of, 261-262
dynamic voltage and frequency scaling, 280-284
JouleTrack, 356-357, 358 methodology, 265-271
environment, 266 measurement system, 266-267 replay program, 270-271 workloads, 267-270
performance metrics, implication for, 284-285 power consumption, 271-277
measurements, 271-275 opportunities for power management, 275-
277 power management, 263-265
energy efficiency, 265 server loads, 263-265
related work, 285-287 simulator, 277-280
Website, Trimaran benchmarks, 219 WILDSTAR, FPGA design path, 182, 183 Window, issue queue, 36 Winter Olympics of 1998: see Web servers Wireless local area network (WLAN)
portable devices, 106, 109 TISDMP, 119-120 user request arrival distribution, 106 dynamic power consumption management, 109
Workload, 219, 221, 222 variation in, slack times, 103 web servers, 267-270 worst case, 130
Workload completion efficiency (WCE), 219, 221, 222
Workload completion rate efficiency (WCRE), 219, 221, 222
Workload completion rate (WCR), 219, 221, 222 Worst-case CPU cycle numbers, 130-131 Worst-case execution time (WCET), 103, 129, 130 Write operations, sandwich/spin tunneling cells,
21, 23, 24, 25
Xilinx Forge J HDL, 172 Xilinx Foundation Tools Design Manager, 182 Xilinx 4000 series, 171 Xilinx XCV 400, HDL PACT compiler test, 187,
188 XScale, API, 161-163
Yield problems, MRAM cells, 20