Jan 13, 2016
William Sandqvist [email protected]
What simplifications could a compiler, or you, make without sacrificing fast execution?
5-7 Code optimization
#define MAX 10
int a[MAX], b[MAX], c[MAX], x[MAX], y[MAX];
int i, j, r, s;
. . .
int f(int a, int b)
{
    int z;
    z = 2 * a - b;
    return z;
}
int g(int a, int b, int c)
{
    int z;
    z = a * c - c * b;
    return z;
}
Two functions f and g
What code optimization can the compiler do?
-O, -O0, -O1, -O2, -O3, -Os ?
With -O or -O0 you have to do all optimizations yourself.
Optimization flags
-O, -O0  No optimization
-O1      Optimize for size
-O2      Optimize for speed and enable some optimization
-O3      Enable all optimizations as O2, and intensive loop optimizations
-Os      Optimize for speed
Default setting!
Two for loops
. . .
for(i = 0; i <= MAX - 1; i++) {
    x[i] = f(a[i], b[i]);
}
s = 2 * r;
for(j = 0; j <= MAX - 1; j++) {
    y[j] = s * g(a[j], b[j], c[j]);
}
What can be done?
We want shorter execution time without increasing the code size!
Loop integration
The two loops have the same range (0 to MAX - 1) and no data dependency (x is written only in loop 1, y only in loop 2).
The loops can be integrated, which saves loop overhead (only i is needed as a counter)!
s = 2 * r;
for(i = 0; i <= MAX - 1; i++) {
    x[i] = f(a[i], b[i]);
    y[i] = s * g(a[i], b[i], c[i]);
}
Precalculation at compile time
s = 2 * r;
for(i = 0; i <= 9; i++) {
    x[i] = f(a[i], b[i]);
    y[i] = s * g(a[i], b[i], c[i]);
}
The defined constant MAX is used only as MAX - 1 in the loop. MAX - 1 can be precalculated as 10 - 1 = 9 at compile time!
Algebraic simplification
$$z = a \cdot c - c \cdot b = c \cdot (a - b)$$
Rewriting function g can save one multiplication operation:
int g(int a, int b, int c)
{
    int z;
    z = c * (a - b);
    return z;
}
Original expression: mul, mul, sub. Rewritten expression: sub, mul.
Inlining of functions
Both functions f and g are "short", so their code can be inserted directly in the loop.
int a[10], b[10], c[10], x[10], y[10];
int i, r, s;
s = 2 * r;
for(i = 0; i <= 9; i++) {
    x[i] = 2 * a[i] - b[i];
    y[i] = s * ((a[i] - b[i]) * c[i]);
}
Loop unrolling would give shorter execution time, but it would also increase the code size, so it can't be used in this case.
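As a rough check that the transformations in this exercise preserve behaviour, the sketch below runs the original two loops and the fully optimized single loop on the same data and compares the results. The names run_original, run_optimized and same_results are hypothetical, and the test data are chosen here only for illustration.

```c
#define MAX 10

/* Original helper functions from the exercise. */
int f(int a, int b) { return 2 * a - b; }
int g(int a, int b, int c) { return a * c - c * b; }

/* Original version: two separate loops with function calls. */
void run_original(const int a[], const int b[], const int c[],
                  int r, int x[], int y[])
{
    int i, j, s;
    for (i = 0; i <= MAX - 1; i++) {
        x[i] = f(a[i], b[i]);
    }
    s = 2 * r;
    for (j = 0; j <= MAX - 1; j++) {
        y[j] = s * g(a[j], b[j], c[j]);
    }
}

/* Optimized version: one loop, constant bound, simplified and inlined bodies. */
void run_optimized(const int a[], const int b[], const int c[],
                   int r, int x[], int y[])
{
    int i, s;
    s = 2 * r;
    for (i = 0; i <= 9; i++) {
        x[i] = 2 * a[i] - b[i];
        y[i] = s * ((a[i] - b[i]) * c[i]);
    }
}

/* Returns 1 if both versions produce identical x and y for the sample data. */
int same_results(int r)
{
    int a[MAX], b[MAX], c[MAX];
    int x1[MAX], y1[MAX], x2[MAX], y2[MAX];
    int i;
    for (i = 0; i < MAX; i++) {   /* small values, so no int overflow */
        a[i] = i + 1;
        b[i] = 3 * i - 4;
        c[i] = 7 - i;
    }
    run_original(a, b, c, r, x1, y1);
    run_optimized(a, b, c, r, x2, y2);
    for (i = 0; i < MAX; i++) {
        if (x1[i] != x2[i] || y1[i] != y2[i]) return 0;
    }
    return 1;
}
```

Calling same_results with a few values of r is enough for a sanity check on this data; it is not a proof for all inputs, since very large operands could overflow int differently in the two forms of g.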
5-2 Register lifetime
A processor has this instruction type: op R1, R2, R3, where all three registers must be different. Code to run:
u = c + d; (1)
v = a - b; (2)
w = a - u; (3)
x = v + e; (4)
How many registers are needed?
Register Life Time Graph
u = c + d; (1)
v = a - b; (2)
w = a - u; (3)
x = v + e; (4)
Four registers are needed!
Data Flow Graph
A Data Flow Graph can detect data dependencies.
(1) must be before (3)
(2) must be before (4)
(2) and (3) can change execution order!
u = c + d; (1)
v = a - b; (2)
w = a - u; (3)
x = v + e; (4)
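The dependency test behind the data flow graph can be sketched in a few lines of C. This is a minimal illustration, assuming single-letter variable names; the struct stmt type, the depends_on function and the program array are hypothetical names introduced here.

```c
/* One three-address statement: dest = src1 op src2 (single-letter names). */
struct stmt {
    char dest;
    char src1;
    char src2;
};

/* Returns 1 if `later` must stay after `earlier` because of a data dependency. */
int depends_on(struct stmt earlier, struct stmt later)
{
    /* Read after write: later reads what earlier wrote. */
    if (later.src1 == earlier.dest || later.src2 == earlier.dest) return 1;
    /* Write after read: later overwrites an operand of earlier. */
    if (later.dest == earlier.src1 || later.dest == earlier.src2) return 1;
    /* Write after write: both write the same variable. */
    if (later.dest == earlier.dest) return 1;
    return 0;
}

/* The four statements from the exercise, index 0..3 for (1)..(4). */
const struct stmt program[4] = {
    { 'u', 'c', 'd' },  /* (1) u = c + d */
    { 'v', 'a', 'b' },  /* (2) v = a - b */
    { 'w', 'a', 'u' },  /* (3) w = a - u */
    { 'x', 'v', 'e' },  /* (4) x = v + e */
};
```

With these definitions, depends_on(program[0], program[2]) and depends_on(program[1], program[3]) are 1, while depends_on(program[1], program[2]) is 0, so (2) and (3) may swap places.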
New Register Life Time Graph
u = c + d; (1)
w = a - u; (2')
v = a - b; (3')
x = v + e; (4)
New instruction order
Now only 3 registers needed. Saving 25%.
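The register counts can be reproduced with a small lifetime calculation. The sketch below rests on an assumption that is not stated on the slides: a variable occupies a register from the first statement that mentions it through the last statement that mentions it, inclusive. The names struct stmt, max_live, original_order and new_order are hypothetical.

```c
/* One three-address statement: dest = src1 op src2 (single-letter names). */
struct stmt {
    char dest;
    char src1;
    char src2;
};

/* Maximum number of variables that are live at the same time.
   Assumption: a variable is live from the first statement that mentions it
   through the last statement that mentions it, inclusive. */
int max_live(const struct stmt code[], int n)
{
    int first[256], last[256];
    int v, k, max = 0;

    for (v = 0; v < 256; v++) {
        first[v] = -1;
        last[v] = -1;
    }
    /* Record the first and last statement index for every variable. */
    for (k = 0; k < n; k++) {
        unsigned char names[3];
        int m;
        names[0] = (unsigned char)code[k].dest;
        names[1] = (unsigned char)code[k].src1;
        names[2] = (unsigned char)code[k].src2;
        for (m = 0; m < 3; m++) {
            if (first[names[m]] < 0) first[names[m]] = k;
            last[names[m]] = k;
        }
    }
    /* Count the live variables at each statement and keep the maximum. */
    for (k = 0; k < n; k++) {
        int live = 0;
        for (v = 0; v < 256; v++) {
            if (first[v] >= 0 && first[v] <= k && k <= last[v]) live++;
        }
        if (live > max) max = live;
    }
    return max;
}

/* Statement order (1), (2), (3), (4). */
const struct stmt original_order[4] = {
    { 'u', 'c', 'd' }, { 'v', 'a', 'b' }, { 'w', 'a', 'u' }, { 'x', 'v', 'e' }
};

/* Statement order (1), (2'), (3'), (4). */
const struct stmt new_order[4] = {
    { 'u', 'c', 'd' }, { 'w', 'a', 'u' }, { 'v', 'a', 'b' }, { 'x', 'v', 'e' }
};
```

Under this assumption max_live(original_order, 4) gives 4 and max_live(new_order, 4) gives 3, in line with the two register life time graphs.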
5-8 CDFG
y = 0;
if(mode == 1) {
    for(i = 0; i < 5; i++) {
        y += a[i] * b[i];
    }
}
a) Control and Data Flow Graph (CDFG)
b) Multiplication takes 3 cycles, all other instructions take 1 cycle. Best/Worst execution time?
mode = 0:
TBest = 1 + 1 = 2
mode = 1:
Loop body: T = 3 + 1 = 4
TWorst = 1 + 1 + 1 + (5 + 1) + 5*4 + 5 = 34
Multiply-Accumulate operation
c) MAC-unit!
R1 = R1 + R2 * R3 in one cycle!
y += a[i] * b[i]; /* one cycle */
Loop body: T = 1
TWorst = 1 + 1 + 1 + (5 + 1) + 5*1 + 5 = 19
19/34 = 0.56. With a MAC unit the worst case takes 56% of the execution time on an ordinary processor.
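The cycle counts can be recomputed with a short function. The mapping of each term to an operation (initialisation, mode test, loop start, loop tests, loop bodies, increments) is an interpretation of the sums in this exercise, and cdfg_cycles is a hypothetical name.

```c
/* Cycle count for:
       y = 0;
       if (mode == 1) { for (i = 0; i < n; i++) { y += a[i] * b[i]; } }
   body_cycles is the cost of one loop body: 3 + 1 = 4 with a separate
   multiply and add, 1 with a MAC unit. Every other operation costs 1 cycle. */
int cdfg_cycles(int mode, int n, int body_cycles)
{
    int cycles = 0;
    cycles += 1;                /* y = 0 */
    cycles += 1;                /* test mode == 1 */
    if (mode != 1) return cycles;
    cycles += 1;                /* i = 0 */
    cycles += n + 1;            /* loop test i < n: n passes plus the exit test */
    cycles += n * body_cycles;  /* n loop bodies */
    cycles += n;                /* n increments i++ */
    return cycles;
}
```

For example, cdfg_cycles(0, 5, 4) gives the best case 2, cdfg_cycles(1, 5, 4) gives 34, and cdfg_cycles(1, 5, 1) gives 19 for the MAC case.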
Processes on a CPU
Scheduling states of process
Priority Driven Scheduling
• Each process has a fixed priority
• The ready process with the highest priority executes
• A process executes until completion or preemption by a higher priority process
Examples of sampling frequencies and execution periods.
GPS sensor: 20 Hz
Speed sensor: 1 kHz
Joystick: 500 Hz
Actuator servo: 2000 Hz
Process periods:
GPS = 1/20 = 50 ms
Speed = 1/1000 = 1 ms
Joystick = 1/500 = 2 ms
Servo = 1/2000 = 0.5 ms
RTOS
Tasks will often run periodically with different process periods.
Task Triplet
P( max execution time, period, deadline )
deadline <= period
RMS: deadline = period (simplification)
6-2 Processor utilization and feasible scheduling
Timeline = least-common multiple of process periods
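9 = 3·3, 2 = 2, 6 = 2·3; least common multiple = 3·3·2 = 18
CPU utilization:
$$U = \sum_{i=1}^{n} \frac{\text{execution time}_i}{\text{period}_i} = \frac{3}{9} + \frac{1}{2} + \frac{1}{6} = \frac{6 + 9 + 3}{18} = \frac{18}{18} = 1$$
$U \le 100\%$ ?
Task Triplet: P(execution time, period, deadline), deadline = period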
P1(3, 9, 9) P2(1, 2, 2) P3(1, 6, 6)
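The timeline length and the utilization can be computed with integer arithmetic only. The sketch below is one possible way to do it; struct task, gcd, lcm, timeline, demand and tasks_6_2 are hypothetical names introduced for illustration.

```c
/* Task triplet P(execution time, period, deadline). */
struct task {
    int exec;      /* max execution time */
    int period;
    int deadline;
};

/* Greatest common divisor (Euclid's algorithm). */
int gcd(int a, int b)
{
    while (b != 0) {
        int t = a % b;
        a = b;
        b = t;
    }
    return a;
}

/* Least common multiple of two positive integers. */
int lcm(int a, int b)
{
    return a / gcd(a, b) * b;
}

/* Timeline = least common multiple of all process periods. */
int timeline(const struct task t[], int n)
{
    int i, len = 1;
    for (i = 0; i < n; i++) {
        len = lcm(len, t[i].period);
    }
    return len;
}

/* Total execution time requested during one timeline.
   Utilization U = demand / timeline. */
int demand(const struct task t[], int n)
{
    int i, len = timeline(t, n), sum = 0;
    for (i = 0; i < n; i++) {
        sum += t[i].exec * (len / t[i].period);
    }
    return sum;
}

/* P1(3, 9, 9), P2(1, 2, 2), P3(1, 6, 6) */
const struct task tasks_6_2[3] = { { 3, 9, 9 }, { 1, 2, 2 }, { 1, 6, 6 } };
```

For tasks_6_2 both timeline and demand return 18, so U = 18/18 = 1. Keeping the ratio as two integers avoids rounding when U is compared with 1.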
Rate Monotonic Scheduling
RMS guarantee: a feasible schedule exists if
$$U \le n\,(2^{1/n} - 1)$$
n = 3: U < 0.78. In this case U = 1, so there is no guarantee!
RMS: the process with the shortest period is assigned the highest priority, and so on.
(Limit: n → ∞ gives U < 69%)
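The bound n(2^(1/n) - 1) is easy to evaluate numerically. In the sketch below the n-th root of 2 is found by bisection, so no math library is needed; nth_root_of_two, rms_bound and rms_guaranteed are hypothetical names.

```c
/* Approximates 2^(1/n) by bisection on the interval [1, 2]. */
double nth_root_of_two(int n)
{
    double lo = 1.0, hi = 2.0;
    int iter, k;
    for (iter = 0; iter < 60; iter++) {
        double mid = 0.5 * (lo + hi);
        double p = 1.0;
        for (k = 0; k < n; k++) {
            p *= mid;               /* p = mid^n */
        }
        if (p > 2.0) {
            hi = mid;
        } else {
            lo = mid;
        }
    }
    return 0.5 * (lo + hi);
}

/* Utilization bound n * (2^(1/n) - 1) for n processes. */
double rms_bound(int n)
{
    return n * (nth_root_of_two(n) - 1.0);
}

/* Returns 1 if the bound guarantees a feasible RMS schedule for utilization u. */
int rms_guaranteed(double u, int n)
{
    return u <= rms_bound(n);
}
```

rms_bound(3) is about 0.78, and the value falls toward roughly 0.69 as n grows. rms_guaranteed(1.0, 3) returns 0, which only means the test gives no guarantee; it does not by itself prove that a deadline is missed.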
RMS figure
Priorities: P2 > P3 > P1 (periods 2 < 6 < 9)
P1 misses the deadline! No feasible schedule with RMS!
Earliest Deadline First Scheduling
EDF guarantee: a feasible schedule exists if $U \le 1$.
In this case U = 1, so EDF shall produce a feasible schedule.
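Both scheduling results for exercise 6-2 can be checked with a small simulation in unit time steps. This is a simplified sketch, assuming preemption only at whole time units, deadline = period, all processes released at time 0, and ties broken in favour of the lower array index; first_miss, struct task, enum policy and tasks_6_2 are hypothetical names.

```c
#define NTASKS 3

struct task {
    int exec;      /* max execution time */
    int period;    /* deadline = period */
};

enum policy { RMS, EDF };

/* Simulates preemptive scheduling in unit time steps over `horizon` ticks.
   Returns the index of the first task that misses a deadline, or -1 if
   every deadline is met. */
int first_miss(const struct task t[NTASKS], enum policy p, int horizon)
{
    int remaining[NTASKS] = { 0 };   /* work left in the current job */
    int deadline[NTASKS] = { 0 };    /* absolute deadline of the current job */
    int now, i;

    for (now = 0; now < horizon; now++) {
        int pick = -1;
        /* Release new jobs; an unfinished previous job is a deadline miss. */
        for (i = 0; i < NTASKS; i++) {
            if (now % t[i].period == 0) {
                if (remaining[i] > 0) return i;
                remaining[i] = t[i].exec;
                deadline[i] = now + t[i].period;
            }
        }
        /* Pick the ready task with the smallest key:
           the period for RMS, the absolute deadline for EDF. */
        for (i = 0; i < NTASKS; i++) {
            int key_i, key_pick;
            if (remaining[i] == 0) continue;
            if (pick < 0) {
                pick = i;
                continue;
            }
            key_i = (p == RMS) ? t[i].period : deadline[i];
            key_pick = (p == RMS) ? t[pick].period : deadline[pick];
            if (key_i < key_pick) pick = i;
        }
        if (pick >= 0) remaining[pick]--;   /* run the chosen task one tick */
    }
    /* Jobs whose deadline coincides with the end of the simulation. */
    for (i = 0; i < NTASKS; i++) {
        if (remaining[i] > 0 && deadline[i] <= horizon) return i;
    }
    return -1;
}

/* P1(3, 9, 9), P2(1, 2, 2), P3(1, 6, 6) at index 0, 1, 2 */
const struct task tasks_6_2[NTASKS] = { { 3, 9 }, { 1, 2 }, { 1, 6 } };
```

Over one timeline of 18 ticks, first_miss(tasks_6_2, RMS, 18) returns 0 (P1 misses its deadline at time 9), while first_miss(tasks_6_2, EDF, 18) returns -1.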
6.3 Scheduling and semaphores
P(execution time, period, deadline)
P1(1, 3, 3) P2(1, 4, 4) P3(2, 6, 6)
3 = 3, 4 = 2·2, 6 = 2·3; least common multiple = 3·2·2 = 12
P1: f1() [2] accessSem1() g1() [1] releaseSem1() y1() [1]
P2: f2() [1] g2() [1] y2() [1]
P3: f3() [1] accessSem1() g3() [2] releaseSem1() y3() [1]
Sem1 is a binary semaphore. accessSem1() and releaseSem1() take 0 time.
RMS priorities: P1 > P2 > P3 (periods 3 < 4 < 6)
$$U = \frac{1}{3} + \frac{1}{4} + \frac{1}{3} = \frac{11}{12}$$
RMS with no critical sections
RMS with critical sections