Energy Savings and Speedups from Partitioning Critical Software Loops to Hardware in Embedded Systems. Greg Stitt, Frank Vahid, Shawn Nematbakhsh. University of California, Riverside. ACM Transactions on Embedded Computing Systems, February 2004

Dec 22, 2015


Energy Savings and Speedups from Partitioning Critical

Software Loops to Hardware in Embedded Systems

Greg Stitt, Frank Vahid, Shawn Nematbakhsh

University of California, Riverside

ACM Transactions on Embedded Computing Systems, February 2004


General Idea

• Increase performance and/or save power of a single embedded system program.

• Take advantage of embedded properties:

– Fairly specific applications that rarely change.

– Small loops account for a large portion of exec time.

• Dedicate a configurable logic device or an ASIC to perform the loop function efficiently.

• For overall energy savings, the speedup must be great enough to overcome the increase in “exec” power.
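The trade-off in the last bullet can be made concrete with a little arithmetic. The sketch below is not from the paper: it assumes a simple Amdahl-style model in which a fraction `loop_fraction` of execution time sits in critical loops that hardware speeds up by `loop_speedup`, and a single average system power before and after partitioning. All names and numbers are illustrative.

```python
def overall_speedup(loop_fraction: float, loop_speedup: float) -> float:
    """Amdahl's law: whole-program speedup when only the loops get faster."""
    return 1.0 / ((1.0 - loop_fraction) + loop_fraction / loop_speedup)


def energy_ratio(speedup: float, power_before: float, power_after: float) -> float:
    """Energy after partitioning divided by energy before (E = P * t)."""
    return (power_after / power_before) / speedup


# Illustrative numbers only (not taken from the paper).
s = overall_speedup(loop_fraction=0.8, loop_speedup=10.0)
r = energy_ratio(s, power_before=1.0, power_after=1.5)
print(f"overall speedup: {s:.2f}, energy ratio: {r:.2f}")
```

Under this model energy is saved only when the ratio falls below 1, that is, when the overall speedup exceeds `power_after / power_before`.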


Study Uniqueness

• What separates this study from others?

– Simple HW/SW partitioning method (no complex search algorithms).

– Focus on embedded systems.

– Extensive evaluation of energy savings.


Critical Loops

[Table not preserved in this transcript; surviving label: Ave.]


Partitioning Methods

• Modified apps with critical loops moved to hardware using register-transfer VHDL (Synopsys tools).

• Configurable system logic (CSL) is the bus master. The CSL accesses memory directly or through DMA.

• CPU–CSL communication via shared memory (including CSL registers) and direct signals.

• When using an ASIC, no DMA.


Partitioning Methods

• Handshaking routines used for activating custom hardware (CSL or ASIC) when entering “critical” region.
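The handshaking idea can be illustrated with a small software simulation. This is only a sketch under assumptions: the slides do not describe the actual protocol, so the `start`/`done` flags, the `FakeCSL` class, and the `shared` dictionary standing in for shared memory are all hypothetical.

```python
class FakeCSL:
    """Stand-in for the custom hardware; a real design would be memory-mapped logic."""

    def __init__(self, loop_fn):
        self.loop_fn = loop_fn   # the critical loop implemented "in hardware"
        self.start = False       # hypothetical start signal raised by the CPU
        self.done = False        # hypothetical completion signal raised by the hardware
        self.shared = {}         # stands in for shared memory / CSL registers

    def step(self):
        # Hardware side: run the loop once after start is raised.
        if self.start and not self.done:
            self.shared["result"] = self.loop_fn(self.shared["args"])
            self.done = True


def run_critical_region(csl, args):
    """Software side: hand off the loop, wait for completion, read the result."""
    csl.shared["args"] = args
    csl.done = False
    csl.start = True
    while not csl.done:   # the CPU waits here instead of running the loop
        csl.step()        # in this simulation we advance the hardware ourselves
    csl.start = False
    return csl.shared["result"]


csl = FakeCSL(lambda xs: sum(x * x for x in xs))
print(run_critical_region(csl, [1, 2, 3]))
```

The software loop body is replaced by the call to `run_critical_region`; everything outside the critical region stays on the CPU.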


Speedup & E-savings (estimation)

• Software loops replaced with handshaking behavior.

• HW cycles/loop always calculated as worst-case.

• Simulated: 100 MHz MIPS, 25 MHz 8051, max possible CSL speeds after synthesis.

• Xilinx Virtex power estimator for CSL (0.18 µm FPGA, 1.8 V, XCV50E).

• Measured active/idle power in Triscend’s parts: CPUidle = 0.85*CPUactive, CSLidle = 0.125*CSLactive.

• Power of interconnect and memory gathered through physical measurement of Triscend parts.

• Total system Energy =
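The energy equation itself did not survive in this transcript. As a rough illustration of how the idle factors above could enter such an estimate, here is a minimal sketch. It is an assumption, not the paper's formula: it supposes the CPU and CSL never run concurrently, and it lumps interconnect and memory into a single hypothetical constant power `p_other`.

```python
CPU_IDLE_FACTOR = 0.85    # from the slide: CPUidle = 0.85 * CPUactive
CSL_IDLE_FACTOR = 0.125   # from the slide: CSLidle = 0.125 * CSLactive


def total_energy(t_sw, t_hw, p_cpu_active, p_csl_active, p_other=0.0):
    """Energy in joules, for times in seconds and powers in watts.

    Assumed model: the CPU is active for t_sw while the CSL idles,
    and the CSL is active for t_hw while the CPU idles.
    p_other (hypothetical) covers interconnect and memory.
    """
    e_sw = t_sw * (p_cpu_active + CSL_IDLE_FACTOR * p_csl_active)
    e_hw = t_hw * (CPU_IDLE_FACTOR * p_cpu_active + p_csl_active)
    return e_sw + e_hw + p_other * (t_sw + t_hw)


# Illustrative numbers only: software-only run vs. partitioned run.
before = total_energy(t_sw=1.0, t_hw=0.0, p_cpu_active=0.1, p_csl_active=0.0)
after = total_energy(t_sw=0.2, t_hw=0.08, p_cpu_active=0.1, p_csl_active=0.04)
print(f"before: {before:.3f} J, after: {after:.3f} J")
```

With an idle CPU still drawing 85% of active power, most of the savings in this model come from the shorter run time rather than from the CPU idling.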


Speedup & E-savings (estimation)


Speedup & E-savings (estimation)

[Table not preserved in this transcript; surviving label: Gates]


Speedup & E-savings (measurement)

• Single-chip microproc/CSL devices from Triscend:

– E5 (8051 @ 25 MHz)

– A7 (ARM7 @ 40 MHz)

• Digital multimeter used for current/voltage measurement, time with timer (!)

• Subset of benchmarks measured.

• Good speedups and energy savings; energy “estimates even look conservative” (only on MIPS).

• … But comparing a 100 MHz MIPS (sim) to 40 MHz ARM7 (measured)?
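For reference, turning multimeter and timer readings into an energy figure is a one-line calculation. The helper names below are hypothetical and the readings are made up; the sketch assumes voltage and current stay roughly constant over the timed run.

```python
def energy_joules(voltage_v: float, current_a: float, seconds: float) -> float:
    """E = V * I * t, assuming steady voltage and current during the run."""
    return voltage_v * current_a * seconds


def percent_savings(e_before: float, e_after: float) -> float:
    """Energy saved by the partitioned version, as a percentage of the original."""
    return 100.0 * (e_before - e_after) / e_before


# Made-up readings, for illustration only.
sw_only = energy_joules(voltage_v=2.0, current_a=0.5, seconds=4.0)
partitioned = energy_joules(voltage_v=2.0, current_a=0.75, seconds=2.0)
print(f"savings: {percent_savings(sw_only, partitioned):.1f}%")
```

Here the partitioned run draws more current but finishes sooner, so total energy still drops.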


Speedup & E-savings (measurement)


Speedup & E-savings (ASIC)

• Estimations of a uP and custom logic on a single ASIC.

• Synopsys synthesis and power estimation tools for 0.18 µm.

• Ave. estimated speedups increased to 4.0 from 3.2, due to increase in clock speed.

• E savings up to 50% from 34%.

• Ave # of gates down to 5,738 from 10,507.


Voltage Scaling

• Additional energy saved if voltage scaling factored in.

• Because of the increased performance, clock speed may be slowed, and voltage reduced to attain equivalent performance.

• On average, Vscaling gives an additional 14% of E-savings.
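Why slowing the clock saves energy can be seen from the standard CMOS dynamic power relation, P ≈ C·V²·f: for a fixed amount of work, dynamic energy scales with V². The sketch below is illustrative only. It assumes, as a simplification not taken from the paper, that the supply voltage can be lowered in proportion to the clock, down to a hypothetical floor `v_min_fraction`, and it ignores static power.

```python
def scaled_dynamic_energy(energy: float, clock_fraction: float,
                          v_min_fraction: float = 0.0) -> float:
    """Dynamic energy for the same work after slowing the clock.

    clock_fraction: new clock / original clock, in (0, 1].
    Assumes voltage scales linearly with clock, but not below v_min_fraction
    of the original voltage. Dynamic energy for fixed work scales with V^2.
    """
    if not 0.0 < clock_fraction <= 1.0:
        raise ValueError("clock_fraction must be in (0, 1]")
    v_fraction = max(clock_fraction, v_min_fraction)
    return energy * v_fraction ** 2


# Illustrative: a 20% clock reduction under the linear-scaling assumption.
print(f"{scaled_dynamic_energy(1.0, 0.8):.2f}")
```

The idea on the slide is that the speedup from partitioning creates slack, and that slack can be traded for a lower clock and voltage while still matching the original software-only performance.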


Voltage Scaling

[Figure not preserved in this transcript; surviving axis label: Percent Speed (clock) Reduction]


Conclusion

• Moving a small amount of critical code to hardware can provide speedups and/or energy savings.

• Single-chip CPU / config logic can give large improvements over CPU-only implementations.

• Extensive hardware/software partitioning exploration not needed – only basic profiling.


Discussion Ideas

• Can the gains seen on this benchmark suite carry over to actual applications?

• Why did they simulate a 100 MHz MIPS, but use a 40 MHz ARM?

• How would the results be different on more modern microprocs? XScale? AVR? Do these newer CPUs have much better performance/power ratios?

• Pg. 223 – “parallel execution”? Do they actually have parallel exec going on? Pg. 224 says no. How does having a DMA option allow “almost any software region” to be implemented on HW more easily?

• 85% idle power for 8051??!! (pg. 225). Obviously not “sleeping.”