CISTER – Research Centre in Real-Time & Embedded Computing Systems
Muhammad Ali Awan, Pedro F. Souto, Konstantinos Bletsas, Benny Akesson, and Eduardo Tovar
Memory Bandwidth Regulation for Multiframe Task Sets
Outline
Motivation and system model
Background
Our approach
Computation time
Heuristics
Evaluation
Conclusions
Motivation
[Figure: applications App 1 to App 4 sharing a multi-core platform; labels: resource utilization, timing analysis; computation, energy, weight, scalability, cost.]
Task Model
Independent sporadic tasks
Preemptive fixed-priority scheduling
Deadline-monotonic priority assignment
No migration
Multiframe task model
[Figure: Frame 1, Frame 2, Frame 3 and Frame 4 together form a super frame.]
Platform Model
Identical multicore platform
Partitioned/private last-level cache
Round-robin bus arbitration (interconnect + memory controller)
Constant memory access time
Multiple outstanding memory requests
Prefetchers and speculative units are disabled
Computation and memory accesses do not overlap in time
- WCET = CPU computation + memory accesses
Performance monitoring counters
[Figure: Core 1 to Core 4, each with its own cache, connected through a bus arbiter to main memory.]
Memory Access Regulation Model
Memory accesses regulated by MemGuard
Uneven memory bandwidth across cores
[Figure: timeline over one regulation period showing memory accesses of cores a, b, c and d; labels: memory budget assigned, contention stall, budget exhausted, regulation stall.]
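The regulation stall on this slide can be illustrated with a minimal sketch. It assumes a single core issuing back-to-back memory accesses, no contention from other cores, and a budget that is fully replenished at the start of every regulation period. The function names and parameters are hypothetical and are not MemGuard's interface.

```python
def regulated_finish_time(n_accesses, budget, period, access_time):
    """Time at which the last of n back-to-back accesses completes.

    Illustrative assumptions: one core, no contention stall, budget
    replenished at every period boundary, and a full budget fits in
    one period (budget * access_time <= period).
    """
    if n_accesses == 0:
        return 0
    if budget <= 0 or budget * access_time > period:
        raise ValueError("budget must be positive and fit within one period")
    # Periods in which the whole budget is consumed before the last access.
    full_periods = (n_accesses - 1) // budget
    # Accesses served in the final (possibly partial) period.
    in_last_period = (n_accesses - 1) % budget + 1
    return full_periods * period + in_last_period * access_time


def regulation_stall(n_accesses, budget, period, access_time):
    """Delay spent only on waiting for the budget to be replenished."""
    finish = regulated_finish_time(n_accesses, budget, period, access_time)
    return finish - n_accesses * access_time
```

With a budget of 2 accesses per period of 10 time units and an access time of 1, five accesses finish at time 21, of which 16 time units are regulation stall.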
Memory-aware schedulability analysis
Schedulability analysis (Yao’s approach)
- Composite task: the task under analysis together with the jobs of higher or equal priority tasks that can preempt it
- WCRT = Stall + Interference of higher priority tasks + (Ce + Cm)
Stall analysis
- Inputs: memory access budget, WCET (Ce, Cm) of the task under analysis, and the standard response time analysis demand in the previous iteration
- Output: upper bound on stall
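The recurrence WCRT = Stall + Interference + (Ce + Cm) can be sketched as a fixed-point iteration. This is a minimal sketch under stated assumptions: `stall_bound` is a hypothetical placeholder for the stall analysis (its real form is not given on the slide), it is assumed to be non-decreasing in its arguments, and each task is reduced to a single (Ce, Cm, period) triple.

```python
import math


def wcrt(ce, cm, higher_priority, stall_bound, deadline):
    """Iterate R = stall + interference + (Ce + Cm) until it stabilises.

    higher_priority: list of (ce_j, cm_j, period_j) tuples.
    stall_bound: hypothetical callable (total_ce, total_cm) -> stall,
                 standing in for the actual stall analysis.
    Returns the response time, or None if it exceeds the deadline.
    """
    response = ce + cm
    while response <= deadline:
        # Demand of the composite task: task under analysis plus
        # all higher priority jobs released in the current window.
        total_ce, total_cm = ce, cm
        for ce_j, cm_j, period_j in higher_priority:
            jobs = math.ceil(response / period_j)
            total_ce += jobs * ce_j
            total_cm += jobs * cm_j
        next_response = stall_bound(total_ce, total_cm) + total_ce + total_cm
        if next_response == response:
            return response  # fixed point reached
        response = next_response
    return None  # deadline exceeded


def no_stall(total_ce, total_cm):
    """Placeholder bound: reduces the iteration to classic response time analysis."""
    return 0
```

With `no_stall`, a task with Ce = 2 and Cm = 1 below two higher priority tasks (1, 0, 4) and (1, 1, 6) converges to a response time of 10.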
Schedulability analysis of Multiframe tasks
Maximum cumulative execution requirement (Baruah’s approach)
- For two frames = Max (a, b, c, d)
- For three frames = Max (w, x, y, z)
Compute number of jobs for each higher priority task
Sum their maximum cumulative execution requirement
[Figure: candidate windows a, b, c, d over two consecutive frames and w, x, y, z over three consecutive frames.]
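The maximum cumulative execution requirement can be sketched as a maximum over all starting frames. This is a minimal sketch that assumes frames repeat cyclically and that the number of jobs in a window is the ceiling of the window length over the period; the function names are hypothetical.

```python
import math


def jobs_in_window(window, period):
    """Number of jobs of a task that can be released in a window of the given length."""
    return math.ceil(window / period)


def max_cumulative_demand(frame_wcets, n_jobs):
    """Largest total WCET of n_jobs consecutive frames, over all starting frames."""
    n = len(frame_wcets)
    return max(
        sum(frame_wcets[(start + k) % n] for k in range(n_jobs))
        for start in range(n)
    )
```

For a task with frame WCETs 1, 2 and 3 and a period of 10, a window of length 34 contains 4 jobs, and the maximum cumulative demand is 9 (frames 3, 1, 2, 3).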
Why SOTA solutions do not work
Task parameters
- Period of 10
- Three frames
  F1: C = 1 (Cm = 0, Ce = 1)
  F2: C = 2 (Cm = 2, Ce = 0)
  F3: C = 3 (Cm = 1, Ce = 2)
t = 34
Sequence  Frames   Total WCET  Total memory accesses
S1        1 2 3 1  7           3
S2        2 3 1 2  8           5
S3        3 1 2 3  9           4
t = 5
Sequence  Total WCET  Total memory accesses
S1        1           0
S2        2           2
S3        3           1
WCRT = Stall + Interference of higher priority tasks + (Ce + Cm)
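The per-sequence totals on this slide can be recomputed with a short sketch. It assumes that sequence Sk starts at frame Fk, that frames repeat cyclically, and that a window of length t contains ceil(t / period) jobs. At t = 34 the sequence with the largest total WCET (S3) is not the one with the most memory accesses (S2).

```python
import math

# (Ce, Cm) of frames F1, F2, F3 as listed on the slide.
FRAMES = [(1, 0), (0, 2), (2, 1)]
PERIOD = 10


def sequence_totals(frames, period, window):
    """Map each sequence name to (total WCET, total memory accesses)."""
    n_jobs = math.ceil(window / period)
    totals = {}
    for start in range(len(frames)):
        seq = [frames[(start + k) % len(frames)] for k in range(n_jobs)]
        ce = sum(frame[0] for frame in seq)
        cm = sum(frame[1] for frame in seq)
        totals[f"S{start + 1}"] = (ce + cm, cm)
    return totals


at_34 = sequence_totals(FRAMES, PERIOD, 34)
at_5 = sequence_totals(FRAMES, PERIOD, 5)
```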
Main contributions
Worst-case memory stall for the multiframe task model
Stall-aware schedulability analysis for the multiframe task model
Five memory bandwidth and task-to-core allocation heuristics
Our proposed schedulability analysis
WCRTF1 = Stall + Interference of higher priority tasks + (Ce,F1 + Cm,F1)
WCRTF2 = Stall + Interference of higher priority tasks + (Ce,F2 + Cm,F2)
WCRTFn = Stall + Interference of higher priority tasks + (Ce,Fn + Cm,Fn)
WCRT = Max(WCRTF1, WCRTF2, …, WCRTFn)
Example
Window t = 25
Task A (3 frames, Period = 20)
S1,A = {F1,A, F2,A}
S2,A = {F2,A, F3,A}
S3,A = {F3,A, F1,A}
Task B (4 frames, Period = 10)
S1,B = {F1,B, F2,B, F3,B}
S2,B = {F2,B, F3,B, F4,B}
S3,B = {F3,B, F4,B, F1,B}
S4,B = {F4,B, F1,B, F2,B}
Cartesian product of sequences =
{ (S1,A, S1,B), (S1,A, S2,B), (S1,A, S3,B), (S1,A, S4,B),
(S2,A, S1,B), (S2,A, S2,B), (S2,A, S3,B), (S2,A, S4,B),
(S3,A, S1,B), (S3,A, S2,B), (S3,A, S3,B), (S3,A, S4,B) }
Or
{Θ1, Θ2, Θ3, Θ4,
Θ5, Θ6, Θ7, Θ8,
Θ9, Θ10, Θ11, Θ12}
In the (k+1)th iteration, for each tuple Θx compute stall and interference:
WCRTFx | (k+1) = Stall(Θx) + Interference of higher priority tasks(Θx) + (Ce,Fx + Cm,Fx), where x ∈ {1, 2, …, 12}
Select the tuple that gives the maximum memory regulation stall and interference.
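The twelve tuples of this example can be enumerated with a short sketch, assuming each sequence is the run of consecutive frames (cyclic) that starts at a given frame. The helper name `sequences` and the string labels are hypothetical.

```python
from itertools import product


def sequences(task, n_frames, n_jobs):
    """All frame sequences of n_jobs consecutive frames, one per starting frame."""
    return [
        tuple(f"F{(start + k) % n_frames + 1},{task}" for k in range(n_jobs))
        for start in range(n_frames)
    ]


# Task A: 3 frames, 2 jobs in the window; Task B: 4 frames, 3 jobs in the window.
seqs_a = sequences("A", n_frames=3, n_jobs=2)
seqs_b = sequences("B", n_frames=4, n_jobs=3)

# One tuple per combination of a Task A sequence with a Task B sequence.
tuples = list(product(seqs_a, seqs_b))
```

The number of tuples is the product of the frame counts of the higher priority tasks (3 × 4 = 12 here), which is why the number of candidates grows quickly with more tasks.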
Reducing computational cost
No. of tuples
For any two tuples, Θx ≥ Θy iff Cm,x ≥ Cm,y and Ce,x ≥ Ce,y
Or, in pair form, (Ce,x, Cm,x) ≥ (Ce,y, Cm,y)
It is sufficient to check only tuple Θx
No. of sequences
For any two sequences, Sx ≥ Sy iff Cm,Sx ≥ Cm,Sy and Ce,Sx ≥ Ce,Sy
Or, in pair form, (Ce,Sx, Cm,Sx) ≥ (Ce,Sy, Cm,Sy)
It is sufficient to check only sequence Sx
No. of frames
For any two frames, Fi ≥ Fj iff Cm,i ≥ Cm,j and Ce,i ≥ Ce,j
Or, in pair form, (Ce,i, Cm,i) ≥ (Ce,j, Cm,j)
It is sufficient to check only the WCRT of Fi
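The pruning idea on this slide can be sketched as a dominance filter. The sketch assumes that a candidate (tuple, sequence or frame) dominates another when both its CPU computation demand Ce and its memory access demand Cm are at least as large; the function names are hypothetical.

```python
def dominates(x, y):
    """x and y are (Ce, Cm) pairs; x dominates y if it is >= in both components."""
    return x[0] >= y[0] and x[1] >= y[1]


def prune_dominated(candidates):
    """Keep only candidates that no other candidate dominates.

    candidates: dict mapping a name to its (Ce, Cm) pair.
    Of several identical candidates, only the first is kept.
    """
    kept = {}
    for name, demand in candidates.items():
        if any(dominates(other, demand) for other in kept.values()):
            continue  # an already kept candidate covers this one
        # Drop previously kept candidates that the new one covers.
        kept = {n: d for n, d in kept.items() if not dominates(demand, d)}
        kept[name] = demand
    return kept
```

For example, with (Ce, Cm) pairs S1 = (4, 3), S2 = (3, 5) and S3 = (5, 4), S3 dominates S1, so only S2 and S3 remain to be checked.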
Tightness vs computational cost
Task parameters
- Period of 10
- Three frames
  F1: C = 1 (Cm = 0, Ce = 1)
  F2: C = 2 (Cm = 2, Ce = 0)
  F3: C = 3 (Cm = 1, Ce = 2)
t = 34, all three sequences
Sequence  Frames   Total WCET  Total memory accesses  Total CPU computation
S1        1 2 3 1  7           3                      4
S2        2 3 1 2  8           5                      3
S3        3 1 2 3  9           4                      5
t = 34, S1 and S2 combined into S12
Sequence  Total WCET  Total memory accesses  Total CPU computation
S12       8           5                      4
S3        9           4                      5
t = 34, all sequences combined into S123
Sequence  Total WCET  Total memory accesses  Total CPU computation
S123      9           5                      5
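The S12 and S123 rows are consistent with taking the column-wise maximum of the combined sequences. The sketch below assumes that this is how the combined rows are formed; the function name `merge` is hypothetical.

```python
def merge(*sequences):
    """Column-wise maximum of (total WCET, memory accesses, CPU computation) triples."""
    return tuple(max(column) for column in zip(*sequences))


# Rows of the t = 34 table: (total WCET, total memory accesses, total CPU computation).
S1, S2, S3 = (7, 3, 4), (8, 5, 3), (9, 4, 5)

S12 = merge(S1, S2)
S123 = merge(S1, S2, S3)
```

Each merge removes one candidate to check, at the price of a more pessimistic demand: S123 combines the largest memory demand (5) with the largest CPU demand (5), a combination that no single real sequence exhibits.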
Implementation details
For each task: an N × N table, where N is the number of frames
1,1 1,2 1,3 … 1,N
2,1 2,2 2,3 … 2,N
3,1 3,2 3,3 … 3,N
… … … … …
N,1 N,2 N,3 … N,N
(2,3) → 2 jobs starting from 3rd frame
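A per-task table of this shape can be precomputed once and then looked up during the analysis. This is a minimal sketch that assumes each entry stores the total (Ce, Cm) demand and that frames repeat cyclically; the function name and the dictionary layout are hypothetical.

```python
def build_demand_table(frames):
    """table[(k, s)] = total (Ce, Cm) of k jobs starting from the s-th frame.

    frames: list of (Ce, Cm) pairs, one per frame; k and s are 1-based.
    """
    n = len(frames)
    table = {}
    for s in range(1, n + 1):
        ce = cm = 0
        for k in range(1, n + 1):
            # The k-th job of the run executes frame s + k - 1 (cyclically).
            frame_ce, frame_cm = frames[(s + k - 2) % n]
            ce += frame_ce
            cm += frame_cm
            table[(k, s)] = (ce, cm)
    return table


# Hypothetical three-frame task: (Ce, Cm) per frame.
table = build_demand_table([(1, 0), (0, 2), (2, 1)])
```

Here `table[(2, 3)]` covers 2 jobs starting from the 3rd frame, that is frames 3 and 1, giving (Ce, Cm) = (3, 1).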
Task-to-core and Memory Bandwidth allocation
Five Heuristics
Even First-Fit
- Each core has an equal memory bandwidth share
- First-fit bin packing for task-to-core allocation
Uneven First-Fit
- Initially each core has an equal memory bandwidth share
- Trim off memory bandwidth if tasks are not schedulable with equal memory bandwidth
- Use this trimmed bandwidth to schedule the remaining tasks
Memory density worst-fit
- Sort cores in non-increasing order of memory density
- Assign each task to the core that gives the minimum increase in memory density
Total density worst-fit
- Similar to memory density worst-fit, except that it uses total density instead of memory density
Memory-fit
- Assign a task to a core that requires minimum memory bandwidth
Priority assignment: Deadline monotonic
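The Even First-Fit heuristic can be sketched as follows. This is a minimal sketch, not the authors' implementation: `is_schedulable` is a hypothetical callable standing in for the stall-aware schedulability test, and the utilization check used in the example is only a placeholder for it.

```python
def even_first_fit(tasks, n_cores, total_bandwidth, is_schedulable):
    """Give every core an equal bandwidth share, then place tasks first-fit.

    is_schedulable: hypothetical callable (tasks_on_core, bandwidth_share) -> bool.
    Returns one task list per core, or None if some task fits on no core.
    """
    share = total_bandwidth / n_cores
    cores = [[] for _ in range(n_cores)]
    for task in tasks:
        for assigned in cores:
            # First core on which the task set stays schedulable wins.
            if is_schedulable(assigned + [task], share):
                assigned.append(task)
                break
        else:
            return None  # the task could not be placed on any core
    return cores


def utilization_test(tasks_on_core, bandwidth_share):
    """Placeholder test: tasks are plain utilizations, the bandwidth share is ignored."""
    return sum(tasks_on_core) <= 1.0


allocation = even_first_fit([0.6, 0.5, 0.4, 0.3], 2, 1.0, utilization_test)
```

Keeping the schedulability test as a parameter lets the same loop be reused with any test that takes the tasks on a core and that core's bandwidth share.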
Experimental Setup
Utilization: UUnifast-discard algorithm
Inter-arrival time: log-uniform distribution (10 ms to 1 s)
WCET of first frame = inter-arrival time × utilization
Implicit deadlines (though the algorithm works for constrained deadlines)
Number of frames: selected randomly
WCET of other frames
- Randomly selected with a log-uniform distribution between a user-defined value and the WCET of the first frame
Memory accesses of each frame
- Selected randomly
Memory access time is 40 ns
Regulation period length is 100 ms
1000 random task sets per set point
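The first three steps of the task-set generation can be sketched as follows. This is a minimal sketch of the standard UUniFast-discard procedure and a log-uniform draw, not the authors' generator; the task count, target utilization and seed are arbitrary example values.

```python
import math
import random


def uunifast_discard(n_tasks, total_utilization, rng, max_tries=10000):
    """Draw n task utilizations summing to total_utilization, each at most 1."""
    for _ in range(max_tries):
        utils, remaining = [], total_utilization
        for i in range(1, n_tasks):
            next_remaining = remaining * rng.random() ** (1.0 / (n_tasks - i))
            utils.append(remaining - next_remaining)
            remaining = next_remaining
        utils.append(remaining)
        if all(u <= 1.0 for u in utils):
            return utils  # otherwise discard the set and try again
    raise RuntimeError("no valid utilization vector found")


def log_uniform(low, high, rng):
    """Sample a value whose logarithm is uniformly distributed in [log low, log high]."""
    return math.exp(rng.uniform(math.log(low), math.log(high)))


rng = random.Random(1)  # fixed seed, chosen only for repeatability
utils = uunifast_discard(8, 2.0, rng)
periods_ms = [log_uniform(10.0, 1000.0, rng) for _ in utils]
# WCET of first frame = inter-arrival time × utilization
first_frame_wcets_ms = [u * t for u, t in zip(utils, periods_ms)]
```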
Results
MF-TA improves the schedulability success ratio by up to 85%
MF-FA is 11 times faster than MF-TA, with a 3.7% decrease in schedulability success ratio
Conclusions
Stall-aware schedulability analysis for multiframe task sets on multicore platforms
Provided techniques to reduce the computation time
Allows trading off tightness against computation time
Improved schedulability success ratio by up to 85% compared to frame-agnostic stall-aware analysis
Achieved an 11-fold speed-up with a 3.7% loss in schedulability
Proposed five memory bandwidth and task-to-core allocation heuristics
Questions?
Our work
Main idea
Multiframe task model, memory regulation, multicore platforms