Memory Bandwidth Regulation for Multiframe Task Sets · 2019. 9. 6. · Muhammad Ali Awan, Pedro F. Souto, Konstantinos Bletsas, Benny Akesson, and Eduardo Tovar Memory Bandwidth

CISTER – Research Centre in Real-Time & Embedded Computing Systems

Muhammad Ali Awan, Pedro F. Souto, Konstantinos Bletsas, Benny Akesson, and Eduardo Tovar

Memory Bandwidth Regulation for Multiframe Task Sets



Outline

Motivation and system model

Background

Our approach

Computation time

Heuristics

Evaluation

Conclusions




Motivation

App 1, App 2, App 3, App 4 → Multi-core Platform

Computation, Energy, Weight, Scalability, Cost

Resource Utilization

Timing Analysis



Task Model


- Independent sporadic tasks
- Preemptive fixed-priority scheduling (deadline monotonic)
- No migration
- Multiframe task model: the frames (Frame 1, Frame 2, Frame 3, Frame 4) together form a super frame
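The multiframe model can be sketched in Python as follows. This is a minimal illustration, assuming each frame is described by its CPU computation time Ce and its memory access time Cm (as in the later examples) and that successive jobs cycle through the frames; the class and method names are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Frame:
    ce: int  # CPU computation time of the frame
    cm: int  # memory access time of the frame

    @property
    def wcet(self) -> int:
        # Computation and memory accesses do not overlap, so C = Ce + Cm
        return self.ce + self.cm


@dataclass
class MultiframeTask:
    frames: List[Frame]
    period: int  # minimum inter-arrival time between consecutive jobs

    def frame_of_job(self, k: int, start: int = 0) -> Frame:
        # Job k (0-based), counted from frame `start`, uses the frames
        # cyclically: start, start + 1, ..., wrapping around to the first.
        return self.frames[(start + k) % len(self.frames)]

    def super_frame_wcet(self) -> int:
        # WCET of one full cycle through all frames (the super frame)
        return sum(f.wcet for f in self.frames)


# Three-frame task with F1 = (1, 0), F2 = (0, 2), F3 = (2, 1) as (Ce, Cm)
task = MultiframeTask([Frame(1, 0), Frame(0, 2), Frame(2, 1)], period=10)
print([task.frame_of_job(k).wcet for k in range(4)])  # [1, 2, 3, 1]
print(task.super_frame_wcet())  # 6
```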



Platform Model


- Identical multicore platform (Core 1, Core 2, Core 3, Core 4)
- Partitioned/private last-level cache per core
- Bus arbiter (interconnect + memory controller) connecting the cores to main memory
- Round-robin bus arbitration
- Constant memory access time
- Multiple outstanding memory requests
- Prefetchers and speculative units are disabled
- Computation and memory accesses do not overlap in time, so WCET = CPU computation + memory accesses
- Performance monitoring counters



Memory Access Regulation Model


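Memory accesses are regulated by MemGuard:
- A memory budget is assigned to each core for every regulation period
- Contention stall: a core's memory access waits while memory accesses of other cores are served
- Regulation stall: once its budget is exhausted, a core is stalled until the next regulation period
- Memory bandwidth may be distributed unevenly across cores

[Figure: memory accesses of cores a, b, c, and d within a regulation period, showing contention stalls and a regulation stall after core a exhausts its budget]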
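A simplified sketch of budget-based regulation in the spirit of MemGuard, assuming a single core that issues back-to-back memory accesses, a budget counted in accesses and replenished at every regulation period boundary, and no contention from other cores (so only regulation stalls appear). The function name and parameters are hypothetical; the actual MemGuard mechanism is not reproduced here.

```python
def regulated_completion(n_accesses: int, access_time: int,
                         budget: int, period: int) -> tuple:
    """Return (completion time, total regulation stall) for a core that
    issues n_accesses back-to-back memory accesses under a periodic budget.

    Simplifying assumptions: no CPU computation, no contention from other
    cores, and period is a multiple of access_time.
    """
    if budget <= 0:
        raise ValueError("budget must be positive")
    t = 0          # current time
    current = 0    # index of the current regulation period
    left = budget  # remaining budget (in accesses) in this period
    stall = 0      # accumulated regulation stall
    for _ in range(n_accesses):
        if t // period > current:
            # A new regulation period has started: replenish the budget
            current = t // period
            left = budget
        if left == 0:
            # Budget exhausted: stall until the next regulation period
            next_start = (current + 1) * period
            stall += next_start - t
            t = next_start
            current += 1
            left = budget
        t += access_time
        left -= 1
    return t, stall


print(regulated_completion(7, access_time=1, budget=3, period=10))  # (21, 14)
```

In the usage line, the core exhausts its budget of 3 accesses twice and waits 7 time units each time before the budget is replenished.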



Memory-aware schedulability analysis


Schedulability analysis (Yao’s approach)

Composite task: the task under analysis together with the jobs of higher or equal priority tasks that can preempt it

$\mathrm{WCRT} = \mathrm{Stall} + \text{Interference of higher priority tasks} + (C_e + C_m)$, where $C_e$ and $C_m$ are the CPU computation and memory access parts of the WCET

Stall analysis: the upper bound on the stall is derived from
- the memory access budget,
- the WCET of the task under analysis,
- the demand computed by the standard response time analysis in the previous iteration.
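A minimal sketch of the iteration behind this formula, assuming integer time units, higher-priority tasks given as single (Ce, Cm, T) triples, and a deliberately simple, hypothetical stall bound (toy_stall_bound) standing in for Yao's stall analysis, which is not reproduced here. As on the slide, the stall in each iteration is computed from the demand of the previous iteration.

```python
from typing import List, Optional, Tuple


def ceil_div(a: int, b: int) -> int:
    # Integer ceiling division
    return -(-a // b)


def toy_stall_bound(mem_demand: int, budget: int, period: int) -> int:
    # Hypothetical, deliberately simple bound (NOT Yao's stall bound):
    # each time the per-period memory budget is used up, assume the core
    # may wait up to one full regulation period.
    if mem_demand <= 0:
        return 0
    return (ceil_div(mem_demand, budget) - 1) * period


def wcrt(ce: int, cm: int, hp: List[Tuple[int, int, int]],
         budget: int, period: int, deadline: int) -> Optional[int]:
    """Fixed-point response-time iteration with a stall term.

    hp holds (Ce, Cm, T) of higher-or-equal-priority tasks on the same core.
    Returns the WCRT, or None once the iteration exceeds the deadline.
    """
    r = ce + cm
    while True:
        # Demand of the composite task in the window of the previous iteration
        jobs = [ceil_div(r, t) for (_, _, t) in hp]
        hp_ce = sum(n * c for n, (c, _, _) in zip(jobs, hp))
        hp_cm = sum(n * m for n, (_, m, _) in zip(jobs, hp))
        stall = toy_stall_bound(cm + hp_cm, budget, period)
        r_next = stall + (hp_ce + hp_cm) + (ce + cm)
        if r_next > deadline:
            return None
        if r_next == r:
            return r
        r = r_next


print(wcrt(2, 1, [(1, 1, 5)], budget=2, period=10, deadline=20))  # 5
```

With a budget of 1 instead of 2, the same task set exceeds its deadline under this toy bound and the function returns None.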



Schedulability analysis of multiframe tasks


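Maximum cumulative execution requirement (Baruah’s approach)
- For two frames = Max(a, b, c, d)
- For three frames = Max(w, x, y, z)

where a, b, c, d (respectively w, x, y, z) are the cumulative execution requirements of two (respectively three) consecutive frames, one for each starting frame.

- Compute the number of jobs of each higher priority task
- Sum their maximum cumulative execution requirements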
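A brute-force sketch of the maximum cumulative execution requirement, assuming frames are listed in release order and used cyclically; it takes the maximum over all starting frames, which is what Max(a, b, c, d) expresses for two frames. The function names are hypothetical.

```python
from typing import List, Tuple


def max_cumulative_execution(wcets: List[int], k: int) -> int:
    """Maximum cumulative execution requirement of k consecutive jobs
    of a multiframe task whose frames have the given WCETs."""
    n = len(wcets)
    best = 0
    for start in range(n):
        # Frames are used cyclically, so k may exceed the number of frames
        total = sum(wcets[(start + j) % n] for j in range(k))
        best = max(best, total)
    return best


def hp_interference(hp_tasks: List[Tuple[List[int], int]], window: int) -> int:
    """Sum, over higher priority tasks (frame WCETs, period), of the maximum
    cumulative execution requirement of the jobs released in the window."""
    total = 0
    for wcets, period in hp_tasks:
        jobs = -(-window // period)  # ceil(window / period)
        total += max_cumulative_execution(wcets, jobs)
    return total


print(max_cumulative_execution([4, 1, 3, 2], 2))  # 6, i.e. Max(5, 4, 5, 6)
print(max_cumulative_execution([4, 1, 3, 2], 3))  # 9, i.e. Max(8, 6, 9, 7)
```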



Why SOTA solutions do not work


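Task parameters: period of 10, three frames
- F1: C = 1 (Cm = 0, Ce = 1)
- F2: C = 2 (Cm = 2, Ce = 0)
- F3: C = 3 (Cm = 1, Ce = 2)

Window t = 34 (four jobs):

Sequence   Frames            Total WCET   Total memory accesses
S1         F1, F2, F3, F1    7            3
S2         F2, F3, F1, F2    8            5
S3         F3, F1, F2, F3    9            4

Window t = 5 (one job):

Sequence   Frames   Total WCET   Total memory accesses
S1         F1       1            0
S2         F2       2            2
S3         F3       3            1

$\mathrm{WCRT} = \mathrm{Stall} + \text{Interference of higher priority tasks} + (C_e + C_m)$

In both windows, the sequence with the largest total WCET (S3) is not the one with the most memory accesses (S2), so the maximum cumulative execution requirement alone does not identify the sequence that maximizes stall plus interference.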
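The table values can be re-derived with a short script, assuming (as the table does) that each unit of Cm corresponds to one memory access and that a window of length t contains ceil(t / period) jobs. The helper name is hypothetical.

```python
import math
from typing import Dict, List, Tuple

# (Ce, Cm) of frames F1, F2, F3 from the example
FRAMES: List[Tuple[int, int]] = [(1, 0), (0, 2), (2, 1)]


def sequences(frames, period: int, window: int) -> Dict[str, Tuple[int, int]]:
    """For each starting frame, return (total WCET, total memory accesses)
    of the ceil(window / period) jobs that fit in the window."""
    n_jobs = math.ceil(window / period)
    table = {}
    for start in range(len(frames)):
        picked = [frames[(start + j) % len(frames)] for j in range(n_jobs)]
        ce = sum(f[0] for f in picked)
        cm = sum(f[1] for f in picked)
        table[f"S{start + 1}"] = (ce + cm, cm)
    return table


rows = sequences(FRAMES, period=10, window=34)
print(rows)  # {'S1': (7, 3), 'S2': (8, 5), 'S3': (9, 4)}
print(max(rows, key=lambda s: rows[s][0]))  # S3: largest total WCET
print(max(rows, key=lambda s: rows[s][1]))  # S2: most memory accesses
```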



Main contributions


Worst-case memory stall for the multiframe task model

Stall-aware schedulability analysis for the multiframe task model

Five memory bandwidth and task-to-core allocation heuristics



Our proposed schedulability analysis


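$\mathrm{WCRT}_{F_1} = \mathrm{Stall} + \text{Interference of higher priority tasks} + (C_{e,F_1} + C_{m,F_1})$

$\mathrm{WCRT}_{F_2} = \mathrm{Stall} + \text{Interference of higher priority tasks} + (C_{e,F_2} + C_{m,F_2})$

$\mathrm{WCRT}_{F_n} = \mathrm{Stall} + \text{Interference of higher priority tasks} + (C_{e,F_n} + C_{m,F_n})$

$\mathrm{WCRT} = \max(\mathrm{WCRT}_{F_1}, \mathrm{WCRT}_{F_2}, \dots, \mathrm{WCRT}_{F_n})$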
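A sketch of the per-frame analysis under stated assumptions: integer time units, higher priority demand supplied through a hypothetical callable hp_demand(t) returning (Ce, Cm), and a hypothetical toy stall bound standing in for the paper's stall analysis. Each frame gets its own fixed-point iteration, and the task's WCRT is the maximum over its frames.

```python
from typing import Callable, List, Optional, Tuple


def ceil_div(a: int, b: int) -> int:
    return -(-a // b)


def toy_stall_bound(mem_demand: int, budget: int, period: int) -> int:
    # Hypothetical placeholder bound, not the paper's stall analysis
    if mem_demand <= 0:
        return 0
    return (ceil_div(mem_demand, budget) - 1) * period


def frame_wcrt(ce: int, cm: int, hp_demand: Callable[[int], Tuple[int, int]],
               budget: int, period: int, deadline: int) -> Optional[int]:
    # Fixed-point iteration for one frame of the task under analysis;
    # hp_demand(t) returns the (Ce, Cm) demand of higher priority jobs in t.
    r = ce + cm
    while True:
        hp_ce, hp_cm = hp_demand(r)
        r_next = (toy_stall_bound(cm + hp_cm, budget, period)
                  + hp_ce + hp_cm + ce + cm)
        if r_next > deadline:
            return None
        if r_next == r:
            return r
        r = r_next


def task_wcrt(frames: List[Tuple[int, int]], hp_demand, budget: int,
              period: int, deadline: int) -> Optional[int]:
    # WCRT = max over frames of the per-frame WCRT; None if any frame fails
    results = [frame_wcrt(ce, cm, hp_demand, budget, period, deadline)
               for ce, cm in frames]
    if any(r is None for r in results):
        return None
    return max(results)


def no_hp(t: int) -> Tuple[int, int]:
    return (0, 0)  # no higher priority tasks on this core


print(task_wcrt([(1, 0), (0, 2), (2, 1)], no_hp, budget=1, period=10, deadline=20))  # 12
```

For comparison under the same toy bound, merging the largest Ce (2) and the largest Cm (2) of different frames into one frame-agnostic job gives frame_wcrt(2, 2, no_hp, 1, 10, 20) == 14, larger than the per-frame result of 12.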



Example


Task A (3 frames, period = 20), window t = 25:
- S1,A = {F1,A, F2,A}
- S2,A = {F2,A, F3,A}
- S3,A = {F3,A, F1,A}

Task B (4 frames, period = 10), window t = 25:
- S1,B = {F1,B, F2,B, F3,B}
- S2,B = {F2,B, F3,B, F4,B}
- S3,B = {F3,B, F4,B, F1,B}
- S4,B = {F4,B, F1,B, F2,B}

Cartesian product of sequences =
{ (S1,A, S1,B), (S1,A, S2,B), (S1,A, S3,B), (S1,A, S4,B),
(S2,A, S1,B), (S2,A, S2,B), (S2,A, S3,B), (S2,A, S4,B),
(S3,A, S1,B), (S3,A, S2,B), (S3,A, S3,B), (S3,A, S4,B) }
= {Θ1, Θ2, Θ3, Θ4, Θ5, Θ6, Θ7, Θ8, Θ9, Θ10, Θ11, Θ12}

Select the tuple that gives the maximum memory regulation stall and interference. In the (k+1)th iteration, compute the stall and interference for each tuple Θx:

$\mathrm{WCRT}_{F_x}^{(k+1)} = \mathrm{Stall}(\Theta_x) + \text{Interference of higher priority tasks}(\Theta_x) + (C_{e,F_x} + C_{m,F_x})$, where $x \in \{1, 2, \dots, 12\}$
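The tuple enumeration can be sketched with itertools.product. The frame values of Task A and Task B below are hypothetical (the slide does not give them), as is the toy stall bound used to score each tuple; only the structure (3 × 4 = 12 tuples for t = 25) follows the example.

```python
import itertools
import math
from typing import List, Tuple

Frame = Tuple[int, int]  # (Ce, Cm) of one frame


def task_sequences(frames: List[Frame], period: int, window: int) -> List[Frame]:
    # One sequence per starting frame, each with ceil(window / period) jobs;
    # returns the (Ce, Cm) totals of every sequence.
    n_jobs = math.ceil(window / period)
    n = len(frames)
    totals = []
    for start in range(n):
        picked = [frames[(start + j) % n] for j in range(n_jobs)]
        totals.append((sum(f[0] for f in picked), sum(f[1] for f in picked)))
    return totals


def toy_stall_bound(mem_demand: int, budget: int, period: int) -> int:
    # Hypothetical placeholder, not the paper's stall bound
    if mem_demand <= 0:
        return 0
    return (math.ceil(mem_demand / budget) - 1) * period


def worst_tuple(hp_tasks, window: int, own_cm: int, budget: int, period: int):
    """Enumerate the Cartesian product of the sequences of all higher priority
    tasks; return (cost, tuple of 0-based starting frames, number of tuples)."""
    per_task = [task_sequences(frames, t, window) for frames, t in hp_tasks]
    best_cost, best_combo, count = -1, None, 0
    for combo in itertools.product(*(range(len(s)) for s in per_task)):
        count += 1
        ce = sum(per_task[i][j][0] for i, j in enumerate(combo))
        cm = sum(per_task[i][j][1] for i, j in enumerate(combo))
        # Stall depends on all memory demand; interference on Ce + Cm of hp jobs
        cost = toy_stall_bound(own_cm + cm, budget, period) + ce + cm
        if cost > best_cost:
            best_cost, best_combo = cost, combo
    return best_cost, best_combo, count


# Hypothetical frame values for Task A (3 frames) and Task B (4 frames)
task_a = ([(2, 1), (1, 3), (3, 0)], 20)
task_b = ([(1, 1), (2, 0), (0, 2), (1, 1)], 10)
print(worst_tuple([task_a, task_b], window=25, own_cm=0, budget=2, period=10))
# (43, (0, 0), 12)
```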



Reducing computational cost


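Reducing the number of tuples: for any two tuples, Θx ≥ Θy iff $C^{m}_{x} \ge C^{m}_{y}$ and $C^{e}_{x} \ge C^{e}_{y}$ and $(C^{e}_{x}, C^{m}_{x}) \neq (C^{e}_{y}, C^{m}_{y})$. Then it is sufficient to check only tuple Θx.

Reducing the number of sequences: for any two sequences, Sx ≥ Sy iff $C^{m}_{S_x} \ge C^{m}_{S_y}$ and $C^{e}_{S_x} \ge C^{e}_{S_y}$ and $(C^{e}_{S_x}, C^{m}_{S_x}) \neq (C^{e}_{S_y}, C^{m}_{S_y})$. Then it is sufficient to check only sequence Sx.

Reducing the number of frames: for any two frames, Fi ≥ Fj iff $C_{m,i} \ge C_{m,j}$ and $C_{e,i} \ge C_{e,j}$ and $(C_{e,i}, C_{m,i}) \neq (C_{e,j}, C_{m,j})$. Then it is sufficient to compute only the WCRT of Fi.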
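A sketch of dominance-based pruning, assuming that stall plus interference is non-decreasing in both Ce and Cm, which is what makes discarding a dominated candidate safe. It is applied here to the (Ce, Cm) totals of the three t = 34 sequences of the running three-frame example (CPU computation and memory accesses); the function names are hypothetical.

```python
from typing import Dict, List, Tuple


def dominates(x: Tuple[int, int], y: Tuple[int, int]) -> bool:
    # x = (Ce, Cm) dominates y if it is at least as large in both components
    # and differs from y in at least one of them.
    return x[0] >= y[0] and x[1] >= y[1] and x != y


def prune(candidates: Dict[str, Tuple[int, int]]) -> List[str]:
    """Keep only candidates (tuples, sequences or frames) that no other
    candidate dominates; identical (Ce, Cm) pairs are kept once."""
    kept: List[str] = []
    seen = set()
    for name, x in candidates.items():
        if x in seen:
            continue  # an identical (Ce, Cm) pair is already kept
        if any(dominates(y, x) for y in candidates.values()):
            continue
        seen.add(x)
        kept.append(name)
    return kept


# (Ce, Cm) of the three sequences for t = 34 in the running example
seqs = {"S1": (4, 3), "S2": (3, 5), "S3": (5, 4)}
print(prune(seqs))  # ['S2', 'S3']: S1 is dominated by S3
```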



Tightness vs computational cost


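Task parameters: period of 10, three frames
- F1: C = 1 (Cm = 0, Ce = 1)
- F2: C = 2 (Cm = 2, Ce = 0)
- F3: C = 3 (Cm = 1, Ce = 2)

Window t = 34, all sequences analyzed separately:

Sequence   Total WCET   Total memory accesses   Total CPU computation
S1         7            3                       4
S2         8            5                       3
S3         9            4                       5

S1 and S2 merged:

Sequence   Total WCET   Total memory accesses   Total CPU computation
S12        8            5                       4
S3         9            4                       5

All sequences merged:

Sequence   Total WCET   Total memory accesses   Total CPU computation
S123       9            5                       5

Each merged row takes the maximum of every column over the sequences it replaces, so fewer sequences are analyzed at the cost of tightness.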
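The merged rows S12 and S123 correspond to a component-wise maximum over the merged sequences, which can be checked with a few lines of Python (the function name is hypothetical):

```python
from typing import Iterable, Tuple

Row = Tuple[int, int, int]  # (total WCET, total memory accesses, total CPU computation)


def merge(rows: Iterable[Row]) -> Row:
    # Component-wise maximum: one virtual sequence that upper-bounds
    # every merged sequence in each column
    rows = list(rows)
    return tuple(max(r[i] for r in rows) for i in range(3))


S1, S2, S3 = (7, 3, 4), (8, 5, 3), (9, 4, 5)
print(merge([S1, S2]))      # (8, 5, 4), row S12
print(merge([S1, S2, S3]))  # (9, 5, 5), row S123
```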



Implementation details


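For each task, an N × N table indexed by (number of jobs, starting frame), where N is the number of frames:

(1,1) (1,2) (1,3) … (1,N)
(2,1) (2,2) (2,3) … (2,N)
(3,1) (3,2) (3,3) … (3,N)
…
(N,1) (N,2) (N,3) … (N,N)

Example: (2,3) → 2 jobs starting from the 3rd frame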
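A sketch of such a table, assuming each entry (i, j) stores the cumulative (Ce, Cm) of i jobs starting from frame j (the slide does not state what is stored). Longer windows are then served by whole super frames plus one lookup. The function names are hypothetical.

```python
from typing import List, Tuple

Frame = Tuple[int, int]  # (Ce, Cm)


def build_table(frames: List[Frame]) -> List[List[Frame]]:
    """table[i-1][j-1] = (Ce, Cm) totals of i jobs starting from frame j,
    for 1 <= i, j <= N (a hypothetical choice of what each entry stores)."""
    n = len(frames)
    table = [[(0, 0)] * n for _ in range(n)]
    for j in range(n):
        ce = cm = 0
        for i in range(n):
            f = frames[(j + i) % n]
            ce, cm = ce + f[0], cm + f[1]
            table[i][j] = (ce, cm)
    return table


def demand(table, frames: List[Frame], jobs: int, start: int) -> Frame:
    """(Ce, Cm) of `jobs` consecutive jobs starting from frame `start`
    (1-based): whole super frames plus one table lookup for the remainder."""
    n = len(frames)
    full, rest = divmod(jobs, n)
    ce = full * sum(f[0] for f in frames)
    cm = full * sum(f[1] for f in frames)
    if rest:
        # After whole super frames, the next job is again frame `start`
        r_ce, r_cm = table[rest - 1][start - 1]
        ce, cm = ce + r_ce, cm + r_cm
    return ce, cm


frames = [(1, 0), (0, 2), (2, 1)]
table = build_table(frames)
print(table[1][2])                     # (3, 1): entry (2,3)
print(demand(table, frames, 4, 3))     # (5, 4): four jobs from frame 3
```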



Task-to-core and Memory Bandwidth allocation


Five heuristics:

Even First-Fit
- Each core has an equal memory bandwidth share
- First-fit bin packing for task-to-core allocation

Uneven First-Fit
- Initially, each core has an equal memory bandwidth share
- Trim off memory bandwidth if tasks are not schedulable with equal memory bandwidth
- Use this trimmed bandwidth to schedule the remaining tasks

Memory density worst-fit
- Sort cores in non-increasing order of energy density
- Assign a task to the core that gives the minimum increase in memory density

Total density worst-fit
- Similar to memory density worst-fit, except that it uses total density instead of memory density

Memory-fit
- Assign a task to the core that requires the minimum memory bandwidth

Priority assignment: deadline monotonic
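A sketch of Even First-Fit. The per-core check fits() is a hypothetical placeholder (utilization at most 1 and memory bandwidth demand within the core's share) standing in for the stall-aware schedulability analysis, and tasks are reduced to (utilization, memory bandwidth demand) pairs.

```python
from typing import List, Optional, Tuple

Task = Tuple[float, float]  # (CPU utilization, memory bandwidth demand as a fraction)


def fits(core_tasks: List[Task], share: float) -> bool:
    # Hypothetical stand-in for the stall-aware schedulability test:
    # total utilization and total memory bandwidth demand within the core's limits
    return (sum(u for u, _ in core_tasks) <= 1.0
            and sum(m for _, m in core_tasks) <= share)


def even_first_fit(tasks: List[Task], n_cores: int) -> Optional[List[List[int]]]:
    share = 1.0 / n_cores  # every core gets the same memory bandwidth share
    cores: List[List[int]] = [[] for _ in range(n_cores)]
    for idx, task in enumerate(tasks):
        for core in cores:
            # First fit: place the task on the first core where it still passes
            if fits([tasks[i] for i in core] + [task], share):
                core.append(idx)
                break
        else:
            return None  # no core can accommodate this task
    return cores


tasks = [(0.5, 0.1), (0.4, 0.1), (0.3, 0.2), (0.6, 0.05)]
print(even_first_fit(tasks, 2))  # [[0, 1], [2, 3]]
```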



Experimental Setup

- Utilization: UUniFast-Discard algorithm
- Inter-arrival time: log-uniform distribution (10 ms to 1 s)
- WCET of first frame = inter-arrival time × utilization
- Implicit deadlines (though the algorithm works for constrained deadlines)
- Number of frames: selected randomly
- WCET of other frames: selected randomly with a log-uniform distribution between a user-defined value and the WCET of the first frame
- Memory accesses of each frame: selected randomly
- Memory access time: 40 ns
- Regulation period length: 100 ms
- 1000 random task sets per set point
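A sketch of the first generation steps listed above: UUniFast-Discard for utilizations, log-uniform inter-arrival times between 10 ms and 1 s, and the WCET of the first frame. Generating the remaining frames and the memory accesses is omitted, and the function names and seed handling are hypothetical.

```python
import math
import random
from typing import List, Tuple


def uunifast_discard(n: int, total_u: float, rng: random.Random) -> List[float]:
    """UUniFast-Discard: draw n utilizations summing to total_u and
    discard any vector in which a single utilization exceeds 1."""
    while True:
        utils = []
        remaining = total_u
        for i in range(1, n):
            nxt = remaining * rng.random() ** (1.0 / (n - i))
            utils.append(remaining - nxt)
            remaining = nxt
        utils.append(remaining)
        if all(u <= 1.0 for u in utils):
            return utils


def log_uniform(lo: float, hi: float, rng: random.Random) -> float:
    # Uniform in log space between lo and hi
    return math.exp(rng.uniform(math.log(lo), math.log(hi)))


def generate(n: int, total_u: float, seed: int = 0) -> List[Tuple[float, float]]:
    rng = random.Random(seed)
    tasks = []
    for u in uunifast_discard(n, total_u, rng):
        period_ms = log_uniform(10.0, 1000.0, rng)  # 10 ms to 1 s
        first_frame_wcet = period_ms * u            # inter-arrival time × utilization
        tasks.append((period_ms, first_frame_wcet))
    return tasks


print(generate(5, 2.0, seed=1))
```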




Results


MF-TA achieves up to 85% improvement in terms of schedulability success ratio

MF-FA is 11 times faster than MF-TA, with a 3.7% decrease in schedulability success ratio



Conclusions

Stall-aware schedulability analysis for multiframe task sets on multicore platforms

Provided techniques to reduce the computation time

Offers the possibility to trade off tightness against computation time

Improved schedulability success ratio by up to 85% compared to frame-agnostic stall-aware analysis

Achieved an 11-fold speed-up with a 3.7% loss in schedulability

Proposed five memory bandwidth and task-to-core allocation heuristics






Questions?




Our work

Main idea

Multiframe task model, memory regulation, multicore platforms
