Program design and analysis

Mar 16, 2016

Page 1: Program design and analysis

© 2005 ECNU SEI Principles of Embedded Computing System Design 1

Program design and analysis
Optimizing for execution time.
Optimizing for energy/power.
Optimizing for program size.

Page 2: Program design and analysis

Motivation (P.186)
Embedded systems must often meet deadlines.
  Faster may not be fast enough.
Need to be able to analyze execution time.
  Worst-case, not typical.
Need techniques for reliably improving execution time.

Page 3: Program design and analysis

Run times will vary (P.186)
Program execution times depend on several factors:
  Input data values.
  State of the instruction and data caches.
  Pipelining effects.

Page 4: Program design and analysis

Measuring program speed
CPU simulator.
  I/O may be hard.
  May not be totally accurate.
Hardware timer.
  Requires board, instrumented program.
Logic analyzer.
  Limited logic analyzer memory size.
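The timer approach can be sketched as an instrumented call that reads a timer before and after the code of interest. This is a minimal sketch, assuming the standard C `clock()` as a host-side stand-in for a board's hardware timer; `work` and `time_work_seconds` are hypothetical names, not from the slides.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical workload standing in for the code being measured. */
static long work(int n)
{
    volatile long sum = 0;  /* volatile keeps the loop from being optimized away */
    for (int i = 0; i < n; i++)
        sum += i;
    return sum;
}

/* Instrumented call: read the timer before and after the workload.
 * clock() is an assumption; a real board would read a timer register. */
double time_work_seconds(int n, long *result)
{
    assert(result != NULL);
    clock_t start = clock();
    *result = work(n);
    clock_t end = clock();
    return (double)(end - start) / CLOCKS_PER_SEC;
}
```

A single reading like this reflects one input and one cache state, so repeated runs with different inputs would be needed before drawing worst-case conclusions.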

Page 5: Program design and analysis

Program performance metrics
Average-case:
  For typical data values, whatever they are.
Worst-case:
  For any possible input set.
Best-case:
  For any possible input set.
Too-fast programs may cause critical races at system level.

Page 6: Program design and analysis

What data values?
What values create worst/average/best case behavior?
  analysis;
  experimentation.
Concerns:
  operations;
  program paths.

Page 7: Program design and analysis

Performance analysis (P.187)
Elements of program performance:
  execution time = program path + instruction timing
Path depends on data values. Choose which case you are interested in.
Instruction timing depends on pipelining, cache behavior.

Page 8: Program design and analysis

Programs and performance analysis
Best results come from analyzing optimized instructions, not high-level language code:
  non-obvious translations of HLL statements into instructions;
  code may move;
  cache effects are hard to predict.

Page 9: Program design and analysis

Program paths (P.188)
Consider for loop:
  for (i=0, f=0; i<N; i++)
    f = f + c[i]*x[i];
Loop initiation block executed once.
Loop test executed N+1 times.
Loop body and variable update executed N times.
[Figure: loop flowchart. initialization (i=0; f=0;), test (i<N, with Y and N branches), body (f = f + c[i]*x[i];), update (i = i+1;).]
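The execution counts above can be checked by instrumenting each block of the loop with a counter. This is a small sketch under the assumption of integer arrays; `LoopCounts` and `count_loop_blocks` are illustrative names, not part of the slides.

```c
#include <assert.h>

/* How many times each block of the loop ran. */
typedef struct {
    int init;
    int test;
    int body;
    int update;
} LoopCounts;

/* Runs f = sum of c[i]*x[i] with the loop written out block by block,
 * counting each block as it executes. */
LoopCounts count_loop_blocks(const int *c, const int *x, int n, int *f_out)
{
    assert(n >= 0);
    LoopCounts k = {0, 0, 0, 0};
    int i, f;

    i = 0; f = 0; k.init++;            /* initialization: once      */
    while (k.test++, i < n) {          /* test: N+1 times           */
        f = f + c[i] * x[i]; k.body++; /* body: N times             */
        i = i + 1; k.update++;         /* update: N times           */
    }
    *f_out = f;
    return k;
}
```

For N = 3 the counters come out as 1 initialization, 4 tests, 3 bodies and 3 updates, matching the slide.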

Page 10: Program design and analysis

Instruction timing (P.189)
Not all instructions take the same amount of time.
  Hard to get execution time data for instructions.
Instruction execution times are not independent.
Execution time may depend on operand values.

Page 11: Program design and analysis

Trace-driven performance analysis (P.189)
Trace: a record of the execution path of a program.
Trace gives execution path for performance analysis.
A useful trace:
  requires proper input values;
  is large (gigabytes).
Reference: Rotenberg, E.; Jacobson, Q.; Sazeides, Y.; Smith, J. Trace processors. Proceedings of the Thirtieth Annual IEEE/ACM International Symposium on Microarchitecture, 1-3 Dec 1997, pp. 138-148.

Page 12: Program design and analysis

Trace generation (P.190)
Hardware capture:
  logic analyzer;
  hardware assist in CPU.
Software:
  PC sampling.
  Instrumentation instructions.
  Simulation.
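Of the software methods, instrumentation is the easiest to sketch: extra statements record which basic block is entered. The following is a minimal illustration, assuming a small fixed-size in-memory buffer; `trace_block`, `trace_buf` and the block numbering are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define TRACE_CAPACITY 64

/* In-memory trace: one entry per basic block entered. */
int trace_buf[TRACE_CAPACITY];
size_t trace_len = 0;

/* Instrumentation call inserted at the start of each basic block.
 * Entries beyond the buffer capacity are silently dropped. */
static void trace_block(int block_id)
{
    if (trace_len < TRACE_CAPACITY)
        trace_buf[trace_len++] = block_id;
}

/* Example function with instrumentation added by hand. */
int traced_sum(const int *x, int n)
{
    assert(n >= 0);
    int s = 0;
    trace_block(0);              /* block 0: initialization */
    for (int i = 0; i < n; i++) {
        trace_block(1);          /* block 1: loop body      */
        s += x[i];
    }
    trace_block(2);              /* block 2: exit           */
    return s;
}
```

The fixed buffer mirrors the practical limit noted above: real traces grow large quickly, and the instrumentation itself perturbs timing.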

Page 13: Program design and analysis

Trace scheduling

[Figure: basic-block graphs (blocks 1-5, with copies 2' and 2"). Labels: "The most likely trace is optimised"; "Bookkeeping modifies less likely traces".]

Trace scheduling: the most likely path is found, and its basic blocks are merged into one. Bookkeeping is required to ensure correctness.

Page 14: Program design and analysis

Loop optimizations (P.191)
Loops are good targets for optimization.
Basic loop optimizations:
  code motion;
  induction-variable elimination;
  strength reduction (x*2 -> x<<1).
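The strength-reduction item can be illustrated by replacing a multiply by two with a left shift. This is a sketch assuming unsigned data, so the shift is well defined; the function names are illustrative, and a modern compiler may already perform this rewrite itself.

```c
#include <assert.h>

/* Original form: multiply each element by 2. */
void scale_mul(unsigned *x, int n)
{
    assert(n >= 0);
    for (int i = 0; i < n; i++)
        x[i] = x[i] * 2u;
}

/* Strength-reduced form: the multiply becomes a cheaper shift. */
void scale_shift(unsigned *x, int n)
{
    assert(n >= 0);
    for (int i = 0; i < n; i++)
        x[i] = x[i] << 1;
}
```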

Page 15: Program design and analysis

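Code motion
  for (i=0; i<N*M; i++)
    z[i] = a[i] + b[i];
[Figure: loop flowcharts before and after code motion. Before: i=0; test i<N*M; body z[i] = a[i] + b[i]; update i = i+1. After: initialization becomes i=0; X = N*M and the test becomes i<X.]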
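Written out in C, the transformation hoists the loop-invariant product out of the loop test. This is a sketch with illustrative function names; the variable `limit` plays the role of X in the flowchart.

```c
#include <assert.h>

/* Before: N*M is re-evaluated on every loop test. */
void add_arrays(int *z, const int *a, const int *b, int n, int m)
{
    assert(n >= 0 && m >= 0);
    for (int i = 0; i < n * m; i++)
        z[i] = a[i] + b[i];
}

/* After code motion: the invariant product is computed once. */
void add_arrays_hoisted(int *z, const int *a, const int *b, int n, int m)
{
    assert(n >= 0 && m >= 0);
    int limit = n * m;           /* X = N*M, moved out of the loop */
    for (int i = 0; i < limit; i++)
        z[i] = a[i] + b[i];
}
```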

Page 16: Program design and analysis

Induction variable elimination
Induction variable: loop index.
Consider loop:
  for (i=0; i<N; i++)
    for (j=0; j<M; j++)
      z[i][j] = b[i][j];
Rather than recompute i*M+j for each array in each iteration, share induction variable between arrays, increment at end of loop body. Cf. P.192
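The shared induction variable can be sketched by treating both arrays as flat row-major buffers. This is an illustration under that assumption; `zb` is a hypothetical name for the shared index.

```c
#include <assert.h>

/* Naive form: the offset i*m + j is recomputed for each array access. */
void copy_2d_naive(int *z, const int *b, int n, int m)
{
    assert(n >= 0 && m >= 0);
    for (int i = 0; i < n; i++)
        for (int j = 0; j < m; j++)
            z[i * m + j] = b[i * m + j];
}

/* Shared induction variable: one offset serves both arrays and is
 * incremented at the end of the loop body. */
void copy_2d_shared_index(int *z, const int *b, int n, int m)
{
    assert(n >= 0 && m >= 0);
    int zb = 0;
    for (int i = 0; i < n; i++)
        for (int j = 0; j < m; j++) {
            z[zb] = b[zb];
            zb++;
        }
}
```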

Page 17: Program design and analysis

Cache analysis
Loop nest: set of loops, one inside other.
  Rewrite loop nest to change the order of array accesses.
Perfect loop nest: no conditionals in nest.
Because loops use large quantities of data, cache conflicts are common.

Page 18: Program design and analysis

Array conflicts in cache (P.194)

[Figure: arrays in main memory and their mapping into the cache. Labels: a[0][0], b[0][0], addresses 1024 and 4096, pad.]

Page 19: Program design and analysis

Array conflicts, cont’d.
Array elements conflict because they are in the same line, even if not mapped to same location.
Solutions:
  move one array;
  pad array.
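The effect of padding can be checked with a little address arithmetic. This sketch assumes a direct-mapped cache with 64 lines of 16 bytes; those parameters are hypothetical and chosen only so that the addresses 1024 and 4096 from the figure collide.

```c
#include <assert.h>

/* Assumed cache geometry (not from the slides): direct-mapped,
 * 64 lines of 16 bytes each. */
#define LINE_BYTES 16u
#define NUM_LINES  64u

/* Cache line that a byte address maps to. */
unsigned cache_line_index(unsigned addr)
{
    return (addr / LINE_BYTES) % NUM_LINES;
}

/* Two addresses conflict when they map to the same cache line. */
int conflicts(unsigned a_addr, unsigned b_addr)
{
    return cache_line_index(a_addr) == cache_line_index(b_addr);
}

/* Smallest pad, in 4-byte steps, that moves b off a's cache line. */
unsigned pad_to_avoid_conflict(unsigned a_addr, unsigned b_addr)
{
    unsigned pad = 0;
    while (conflicts(a_addr, b_addr + pad))
        pad += 4;
    assert(!conflicts(a_addr, b_addr + pad));
    return pad;
}
```

With this geometry a 4-byte pad is not enough, because the padded address still falls in the same line even though it is a different location; a full line of padding is needed.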

Page 20: Program design and analysis

Performance optimization hints
Use registers efficiently.
Use page mode memory accesses.
Analyze cache behavior:
  instruction conflicts can be handled by rewriting code, rescheduling;
  conflicting scalar data can easily be moved;
  conflicting array data can be moved, padded.

Page 21: Program design and analysis

Energy/power optimization (P.195)

Energy: ability to do work.
  Most important in battery-powered systems.
Power: energy per unit time.
  Important even in wall-plug systems: power becomes heat.

Page 22: Program design and analysis

Measuring energy consumption
Execute a small loop, measure current:
  while (TRUE)
    a();
[Figure: current I measured at the CPU.]

Page 23: Program design and analysis

Sources of energy consumption

Relative energy per operation (Catthoor et al):
  memory transfer: 33
  external I/O: 10
  SRAM write: 9
  SRAM read: 4.4
  multiply: 3.6
  add: 1
Cf. Fig.5-26 P.196
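These relative figures can be used for a rough comparison of two code versions by weighting operation counts. The weighted-sum model below is an assumption for illustration, not a method given in the slides; `OpCounts` and `relative_energy` are hypothetical names, and the result is in units of one add.

```c
#include <assert.h>
#include <stddef.h>

/* Number of operations of each kind executed by a piece of code. */
typedef struct {
    long mem_transfer;
    long ext_io;
    long sram_write;
    long sram_read;
    long multiply;
    long add;
} OpCounts;

/* Weighted sum using the relative energies listed above (add = 1). */
double relative_energy(const OpCounts *n)
{
    assert(n != NULL);
    return 33.0 * n->mem_transfer
         + 10.0 * n->ext_io
         +  9.0 * n->sram_write
         +  4.4 * n->sram_read
         +  3.6 * n->multiply
         +  1.0 * n->add;
}
```

Even this crude model shows why memory behavior dominates: one memory transfer costs as much as 33 adds.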

Page 24: Program design and analysis

Cache behavior is important
Energy consumption has a sweet spot as cache size changes:
  cache too small: program thrashes, burning energy on external memory accesses;
  cache too large: cache itself burns too much power.
Cf. Fig.5-27 P.197 (cache size vs. energy; cache size vs. execution time).

Page 25: Program design and analysis

Optimizing for energy (P.198)

First-order optimization: high performance = low energy.

Not many instructions trade speed for energy.

?

Page 26: Program design and analysis

Optimizing for energy, cont’d.
Use registers efficiently.
Identify and eliminate cache conflicts.
Use page mode memory accesses.
Moderate loop unrolling eliminates some loop overhead instructions.
Eliminate pipeline stalls.
Inlining procedures may help: reduces linkage, but may increase cache thrashing.
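Moderate loop unrolling, one of the hints above, can be sketched as follows. The unroll factor of 2 and the function names are assumptions chosen for illustration; the unrolled version executes roughly half as many loop tests and index updates.

```c
#include <assert.h>

/* Rolled loop: one test and one update per element. */
int sum_rolled(const int *x, int n)
{
    assert(n >= 0);
    int s = 0;
    for (int i = 0; i < n; i++)
        s += x[i];
    return s;
}

/* Unrolled by 2: one test and one update per pair of elements,
 * plus a cleanup step when n is odd. */
int sum_unrolled2(const int *x, int n)
{
    assert(n >= 0);
    int s = 0;
    int i = 0;
    for (; i + 1 < n; i += 2) {
        s += x[i];
        s += x[i + 1];
    }
    if (i < n)
        s += x[i];
    return s;
}
```

Unrolling further would save more overhead but enlarges the code, which is why the slide says moderate.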

Page 27: Program design and analysis

Optimizing for program size
Goal:
  reduce hardware cost of memory;
  reduce power consumption of memory units.
Two opportunities:
  data;
  instructions.

Page 28: Program design and analysis

Data size minimization
Reuse constants, variables, data buffers in different parts of code.
  Requires careful verification of correctness.
  Eliminates the copy of data.
Generate data using instructions.
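Generating data with instructions means computing a value when it is needed instead of storing it. The sketch below contrasts a stored table of squares with the equivalent computation; the table contents and function names are illustrative assumptions, and whether the computed form is actually smaller depends on the target and the data.

```c
#include <assert.h>

/* Stored form: 16 bytes of constant data kept in memory. */
static const unsigned char square_table[16] = {
    0, 1, 4, 9, 16, 25, 36, 49, 64, 81, 100, 121, 144, 169, 196, 225
};

unsigned char square_stored(unsigned i)
{
    assert(i < 16u);
    return square_table[i];
}

/* Generated form: the same values produced by one multiply,
 * with no table in memory. */
unsigned char square_computed(unsigned i)
{
    assert(i < 16u);
    return (unsigned char)(i * i);
}
```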

Page 29: Program design and analysis

Reducing code size
Avoid function inlining.
Choose CPU with compact instructions.
  ARM Thumb
  MIPS-16
  Variable length of instruction
Use specialized instructions where possible.
  RPTS/RPTB
Code compression

contradiction?

Page 30: Program design and analysis

Code compression (P.199)
Use statistical compression to reduce code size, decompress on-the-fly:
[Figure: block diagram with CPU, decompressor, table, cache, and main memory; the compressed bit string 0101101 corresponds to the instruction LDR r0,[r4].]