
1

Energy Savings and Speedups from Partitioning Critical

Software Loops to Hardware in Embedded Systems

Greg Stitt, Frank Vahid, Shawn Nematbakhsh

University of California, Riverside

ACM Transactions on Embedded Computing Systems, February 2004

2

General Idea

• Increase performance and/or save power of a single embedded system program.

• Take advantage of embedded properties:

– Fairly specific applications that rarely change.

– Small loops account for a large portion of execution time.

• Dedicate a configurable logic device or an ASIC to perform the loop function efficiently.

• For overall power savings, speedup must be great enough to overcome the increase in “exec” power.
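
The last bullet can be made concrete with the identity energy = power × time. The sketch below is only an illustration under simplifying assumptions (one average power figure for each configuration, and a single overall speedup); the function names and the numbers in the usage note are hypothetical and do not come from the paper.

```python
def relative_energy(power_ratio, speedup):
    """Energy of the partitioned system relative to software-only.

    power_ratio: average power with hardware / average power without it.
    speedup:     software-only time / partitioned time.
    Energy = power * time, so the ratio is power_ratio / speedup.
    """
    if power_ratio <= 0 or speedup <= 0:
        raise ValueError("power_ratio and speedup must be positive")
    return power_ratio / speedup


def saves_energy(power_ratio, speedup):
    """True when the speedup outweighs the increase in execution power."""
    return relative_energy(power_ratio, speedup) < 1.0
```

For example, with a hypothetical 1.5× increase in power, a 3× speedup halves the energy (`relative_energy(1.5, 3.0)` is 0.5), while a 1.2× speedup would increase it.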

3

Study Uniqueness

• What separates this study from others?

– Simple HW/SW partitioning method (no complex search algorithms).

– Focus on embedded systems.

– Extensive evaluation of energy savings.

4

Critical Loops

[Table not preserved in this transcript; only the label “Ave.” (average) survives.]

5

Partitioning Methods

• Applications modified so that critical loops are moved to hardware, using Synopsys register-transfer VHDL.

• The configurable system logic (CSL) is a bus master. It accesses memory directly or through DMA.

• CPU–CSL communication via shared memory (including CSL registers) and direct signals.

• When an ASIC is used, there is no DMA.

6

Partitioning Methods

• Handshaking routines used for activating custom hardware (CSL or ASIC) when entering “critical” region.
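
A toy software model can show the shape of such a handshake. Everything here is an assumption made for illustration: the `start`/`done` flags, the class, and the polling loop are hypothetical and are not the routines used in the study. Real hardware would run on its own clock; the simulation advances it explicitly instead.

```python
class CriticalLoopHardware:
    """Toy stand-in for the CSL or ASIC block (hypothetical interface)."""

    def __init__(self, loop_function):
        self.loop_function = loop_function
        self.start = False   # flag written by the CPU
        self.done = False    # flag written by the hardware
        self.args = None     # models inputs placed in shared memory
        self.result = None   # models outputs placed in shared memory

    def step(self):
        """Advance the simulated hardware by one step."""
        if self.start and not self.done:
            self.result = self.loop_function(*self.args)
            self.done = True


def run_critical_region(hw, *args, max_polls=1000):
    """CPU side: pass inputs, raise start, poll done, read the result."""
    hw.args = args
    hw.done = False
    hw.start = True
    for _ in range(max_polls):
        hw.step()            # simulation only: the hardware makes progress
        if hw.done:
            hw.start = False  # acknowledge completion
            return hw.result
    raise TimeoutError("hardware did not signal completion")
```

In this model the software loop body is replaced by a single call such as `run_critical_region(hw, data)`.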

7

Speedup & E-savings (estimation)

• Software loops replaced with handshaking behavior.

• HW cycles/loop always calculated as worst case.

• Simulated: 100 MHz MIPS, 25 MHz 8051, maximum possible CSL speeds after synthesis.

• Xilinx Virtex power estimator used for the CSL (0.18 µm FPGA, 1.8 V, XCV50E).

• Measured active/idle power in Triscend’s parts: CPUidle = 0.85 × CPUactive, CSLidle = 0.125 × CSLactive.

• Power of interconnect and memory gathered through physical measurement of Triscend parts.

• Total system energy = [equation not preserved in this transcript]
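
The idle ratios above suggest how an energy total could be assembled from power and time figures. The following is a minimal sketch, not the paper's equation: it assumes the CPU idles while the CSL runs the loop and the CSL idles otherwise, and it lumps interconnect and memory into one constant power term. All parameter names are hypothetical.

```python
CPU_IDLE_RATIO = 0.85    # CPUidle = 0.85 * CPUactive (from the slide)
CSL_IDLE_RATIO = 0.125   # CSLidle = 0.125 * CSLactive (from the slide)


def partitioned_energy(p_cpu_active, p_csl_active, t_cpu, t_csl, p_other=0.0):
    """Estimate total energy (joules) for one run of a partitioned program.

    p_cpu_active, p_csl_active: active power in watts.
    t_cpu: seconds the CPU is active (CSL assumed idle).
    t_csl: seconds the CSL is active (CPU assumed idle).
    p_other: assumed constant power of interconnect and memory, in watts.
    """
    e_cpu = p_cpu_active * t_cpu + CPU_IDLE_RATIO * p_cpu_active * t_csl
    e_csl = p_csl_active * t_csl + CSL_IDLE_RATIO * p_csl_active * t_cpu
    e_other = p_other * (t_cpu + t_csl)
    return e_cpu + e_csl + e_other
```

With a CPU that idles at 85% of its active power, most of the saving in this model has to come from the shorter total run time rather than from the CPU resting while the hardware works.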

8

Speedup & E-savings (estimation)

9

Speedup & E-savings (estimation)

[Chart not preserved in this transcript; surviving label: Gates]

10

Speedup & E-savings (measurement)

• Single-chip microprocessor/CSL devices from Triscend:

– E5 (8051 @ 25 MHz)

– A7 (ARM7 @ 40 MHz)

• Digital multimeter used for current/voltage measurement, time measured with a timer (!)

• Subset of benchmarks measured.

• Good speedups and energy savings; energy “estimates even look conservative” (only on MIPS).

• … But comparing a 100 MHz MIPS (simulated) to a 40 MHz ARM7 (measured)?

11

Speedup & E-savings (measurement)

12

Speedup & E-savings (ASIC)

• Estimates for a microprocessor and custom logic on a single ASIC.

• Synopsys synthesis and power estimation tools for 0.18 µm.

• Average estimated speedup increased to 4.0 from 3.2, due to the increase in clock speed.

• Energy savings rose to 50% from 34%.

• Average number of gates down to 5,738 from 10,507.

13

Voltage Scaling

• Additional energy saved if voltage scaling factored in.

• Because of the increased performance, clock speed may be slowed, and voltage reduced to attain equivalent performance.

• On average, Vscaling gives an additional 14% of E-savings.
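
The reasoning behind these bullets can be illustrated with the usual first-order CMOS relation, in which dynamic power scales with V² × f, so the energy for a fixed number of cycles scales with V². The sketch below additionally assumes that the supply voltage can be lowered in proportion to the clock, down to an optional floor. That proportionality is a textbook simplification and not the model used in the paper, so it is not expected to reproduce the 14% figure. The function and its parameters are hypothetical.

```python
def dynamic_energy_ratio(clock_reduction, v_min_ratio=0.0):
    """Idealized dynamic energy after scaling, relative to before.

    clock_reduction: fraction by which the clock is slowed (0.2 = 20%).
    v_min_ratio:     lowest allowed voltage as a fraction of nominal.
    Assumes voltage tracks frequency linearly and energy scales with V^2.
    """
    if not 0.0 <= clock_reduction < 1.0:
        raise ValueError("clock_reduction must be in [0, 1)")
    v_ratio = max(1.0 - clock_reduction, v_min_ratio)
    return v_ratio ** 2
```

Under these assumptions a 20% clock reduction gives `dynamic_energy_ratio(0.2)`, about 0.64 of the original dynamic energy. A voltage floor limits the benefit of any further slowing.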

14

Voltage Scaling

[Chart not preserved in this transcript; surviving axis label: Percent Speed (clock) Reduction]

15

Conclusion

• Moving a small amount of critical code to hardware can provide speedups and/or energy savings.

• Single-chip CPU / configurable logic devices can give large improvements over CPU-only implementations.

• Extensive hardware/software partitioning exploration not needed – only basic profiling.
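
The first conclusion is consistent with Amdahl's law: if a small loop accounts for a large fraction of execution time, accelerating only that loop still yields a sizeable overall speedup. The sketch below uses hypothetical numbers and ignores handshaking and communication overhead, which a real partitioning would add.

```python
def overall_speedup(loop_fraction, loop_speedup):
    """Amdahl's law: whole-program speedup when only the loop is accelerated.

    loop_fraction: share of software execution time spent in the loop (0..1).
    loop_speedup:  how many times faster the loop runs in hardware.
    """
    if not 0.0 <= loop_fraction <= 1.0:
        raise ValueError("loop_fraction must be in [0, 1]")
    if loop_speedup <= 0:
        raise ValueError("loop_speedup must be positive")
    return 1.0 / ((1.0 - loop_fraction) + loop_fraction / loop_speedup)
```

For instance, if a loop took 80% of the run time and ran 10× faster in hardware, the program as a whole would run about 3.6× faster; no loop speedup could push it past 5×, because the remaining 20% still runs in software.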

16

Discussion Ideas

• Can the gains seen on this benchmark suite carry over to actual applications?

• Why did they simulate a 100 MHz MIPS but use a 40 MHz ARM?

• How would the results differ on more modern microprocessors? XScale? AVR? Do these newer CPUs have much better performance/power ratios?

• Pg. 223: “parallel execution”? Do they actually have parallel execution going on? Pg. 224 says no. How does having a DMA option allow “almost any software region” to be implemented in HW more easily?

• 85% idle power for the 8051??!! (pg. 225). Obviously not “sleeping.”
