EE249 1 Scheduling
EE2492
What’s the problem?
Have some work to do
– know subtasks
Have limited resources
Have some constraints to meet
Want to optimize quality
EE2493
Outline
Overview
– shop scheduling
– data-flow scheduling
– real-time scheduling
– OS scheduling
Real-time scheduling RTOS generation
– scheduling
– Communication
Data-flow scheduling
– pure
– Petri nets
EE2494
Shop scheduling
Single job, one time
finite and known amount of work
multiple resources of different kind
often minimize lateness
– could add release, precedence, deadlines, ...
SOLUTION: compute the schedule
APPLICATION: manufacturing
EE2495
Data-flow scheduling
Single-job, repeatedly
known amount of work
– simple subtasks
multi-processor
max. throughput, min. latency
SOLUTION: code generation
APPLICATION: signal processing
EE2496
Data-flow scheduling variants
Work
– data dependent (BDF, FCPN)
Resources
– many different execution units (HLS)
Goal
– min. code, min. buffers, min. resources
EE2497
Real-time scheduling
Fixed number of repeating jobs
each job has fixed work
– job is a sub-task
processor(s)
meet individual deadlines
SOLUTION: choose policy, let RTOS implement it
APPLICATION: real-time control
EE2498
RT scheduling variants
Work
– sporadic or event-driven tasks,
– variable (data dependent) work
– coordination between tasks:
– mutual exclusion, precedence, …
Goal
– event loss, input or output correlation, freshness, soft deadlines, ...
EE2499
OS scheduling
Variable number of random tasks
know nothing about sub-tasks
processor + other computer resources
progress of all tasks, average service time
SOLUTION: OS implements time-slicing
APPLICATION: computer systems
EE24910
Outline
Overview
– shop scheduling
– data-flow scheduling
– real-time scheduling
– OS scheduling
Real-time scheduling RTOS generation
– Scheduling
– Communication
Data-flow scheduling
– pure
– Petri nets
EE24911
RTOS functions
Enable communication between software tasks, hardware and other system resources
Coordinate software tasks
– keep track of which tasks are ready to execute
– decide which one to execute: scheduling
EE24912
Outline
Implementing communication through events
Coordination:
– classic scheduling results
– reactive model of real-time systems
– conservative scheduling analysis
– priority assignment
EE24913
The scheduling problem
Given:
– estimates on execution times of each task
– timing constraints
Find:
– an execution ordering of tasks that satisfies constraints
A schedule needs to be:
– constructed
– validated
EE24914
Off-line vs. on-line scheduling
Plus side of off-line scheduling:
– simpler
– lower overhead
– highly predictable
Minus side of off-line scheduling:
– bad service to urgent tasks
– independent of actual requests
EE24915
Scheduling Algorithms
off-line (pre-run-time, static)
– round-robin, e.g.
– C1 C2 C3 C4 C1 C2 C3 C4 C1 C2 C3 C4 …
– static cyclic, e.g.
– C1 C2 C3 C2 C4 C1 C2 C3 C2 C4 C1 C2 …
on-line (run-time, dynamic)
– static priority
– dynamic priority
– preemptive or not
EE24916
Static priority scheduling
synthesis:
– priority assignment
– RMS [LL73]
analysis
– Audsley 91
EE24917
Rate Monotonic Scheduling
Liu and Layland [LL73] consider systems consisting of tasks:
– enabled periodically
– with fixed run time
– that should be executed before enabled again
– scheduled preemptively with statically assigned priorities
EE24918
Rate Monotonic Scheduling
giving higher priority to tasks with shorter period (RMS) is optimal
– if any other static priority assignment can schedule a task set, then RMS can schedule it too
define utilization as $U = \sum_i E_i / T_i$, where $E_i$ is the run time and $T_i$ the period of task i
any set of n tasks with utilization less than $n(2^{1/n} - 1)$ is schedulable
for n = 2, 3, …: $n(2^{1/n} - 1)$ = 0.83, 0.78, …, tending to ln(2) = 0.69 as n grows
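The utilization test above can be sketched in a few lines of Python. The task set is hypothetical, and the function names (`ll_bound`, `utilization`, `rms_order`) are illustrative only; each task is a pair (E, T) of run time and period.

```python
import math

def ll_bound(n):
    """Liu-Layland utilization bound n(2^(1/n) - 1) for n tasks."""
    return n * (2 ** (1.0 / n) - 1)

def utilization(tasks):
    """Sum of E/T over all tasks; tasks is a list of (E, T) pairs."""
    return sum(e / t for e, t in tasks)

def rms_order(tasks):
    """Rate-monotonic priority order: shortest period first."""
    return sorted(tasks, key=lambda et: et[1])

# Hypothetical task set, not taken from the slides.
tasks = [(2, 6), (1, 4), (1, 10)]
u = utilization(tasks)
print(rms_order(tasks))                               # [(1, 4), (2, 6), (1, 10)]
print(round(ll_bound(2), 2), round(ll_bound(3), 2))   # 0.83 0.78
print(u < ll_bound(len(tasks)))                       # True: schedulable under RMS
```

The test is sufficient but not necessary: a task set that fails it may still be schedulable, which is what the exact analysis on the following slides is for.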
EE24919
Static Priority Schedule Validation
Audsley [91]:
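for a task in Liu-Layland's model, find its worst-case execution time (WCET): the time from when the task is enabled until it completes, including delays caused by higher-priority tasks
[Figure: timeline of tasks i, k and n, marking run time i, WCET i and period i along the time axis]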
EE24920
Audsley’s algorithm
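let $E_i$ be the run times and $T_i$ the periods
how much can task i be delayed by a higher-priority task k?
– each execution of k delays it by $E_k$
– while i is executing, k will be executed $\lceil WCET_i / T_k \rceil$ times
$WCET_i = E_i + \sum_{k > i} \lceil WCET_i / T_k \rceil \, E_k$, where the sum runs over the tasks k with higher priority than i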
EE24921
Solving implicit equation
iteration
– $WCET_{i,0} = E_i$
– $WCET_{i,n+1} = E_i + \sum_{k > i} \lceil WCET_{i,n} / T_k \rceil \, E_k$
will converge if processor utilization is less than 1
EE24922
Dynamic priority
Earliest deadline first:
– at each moment, schedule the task with the least time remaining before its next occurrence (its deadline in this model)
Liu and Layland [LL73] have shown that, for their model, EDF schedules any feasible set of tasks
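As an illustration, here is a small unit-time-step EDF simulation. It is a sketch that assumes integer run times and periods, all tasks enabled at time 0, and deadlines equal to the next enabling; the function name and the task sets are hypothetical.

```python
from functools import reduce
from math import gcd

def edf_simulate(tasks):
    """tasks: list of (E, T) with integer run time E and period T.
    Returns True if no deadline is missed over one hyperperiod."""
    hyper = reduce(lambda a, b: a * b // gcd(a, b), (t for _, t in tasks))
    remaining = [0] * len(tasks)   # work left in the current instance
    deadline = [0] * len(tasks)    # absolute deadline of the current instance
    for now in range(hyper):
        for i, (e, t) in enumerate(tasks):
            if now % t == 0:               # task i is enabled again
                if remaining[i] > 0:
                    return False           # previous instance missed its deadline
                remaining[i] = e
                deadline[i] = now + t
        ready = [i for i in range(len(tasks)) if remaining[i] > 0]
        if ready:
            j = min(ready, key=lambda i: deadline[i])   # earliest deadline first
            remaining[j] -= 1
    return all(r == 0 for r in remaining)

print(edf_simulate([(2, 4), (3, 6)]))   # True: utilization is exactly 1
print(edf_simulate([(3, 4), (3, 6)]))   # False: utilization exceeds 1
```

The first task set has utilization 1, above the rate-monotonic bound of 0.83 for two tasks, yet EDF still meets every deadline.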
EE24923
What’s wrong with LL model?
Liu-Layland model yields strong results but does not model reactivity well
Our model:
– models reactivity directly
– abstracts functionality
– allows efficient conservative schedule validation
EE24924
Computation Model
System is a network of internal and external tasks
External tasks have minimum times between execution
Internal tasks have priorities and run times
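[Figure: example network with two external tasks (minimum times between executions 20 and 10) and five internal tasks labeled 1,2; 5,2; 3,2; 2,1; 4,1]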
EE24925
Computation Model
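[Figure: example network with two external tasks (minimum times between executions 20 and 10) and five internal tasks labeled 1,2; 5,2; 3,2; 2,1; 4,1]
External tasks execute at random, respecting the lower bound between executions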
Execution of a task enables all its successors
Correct if no events are lost
EE24926
Schedule Validation
To check correctness:
– check whether internal events can be lost
– priority analysis
– check whether external events can be lost
– bound WCET
EE24927
Validation for Internal Events
Simple: if the priority of i is lower than that of k, then the event (i,k) cannot be lost
More general: if the fan-ins of i form a tree such that the leaves have lower priority than the non-leaves and k, then (i,k) cannot be lost
EE24928
Validation for External Events
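Compute a bound on the length of time during which the processor executes tasks of priority i or higher (the i-busy period)
[Figure: timeline in which an i-busy period, made up of executions of task i and of tasks of priority > i, lies between executions of tasks of priority < i]
(i-busy period) > (WCET i), so a bound on the i-busy period also bounds WCET i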
EE24929
Bounding i-busy Period
i-busy period is bounded by:
– initial workload at priority level i or higher caused by execution of some task of priority lower than i
– workload at priority level i or higher caused by execution of external tasks during the i-busy period
can find (by simulation) workload at priority level i or higher caused by execution of a single task
can bound the number of occurrences of external tasks in a given period
need to solve a fixed-point equation
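One possible shape of such a fixed-point computation is sketched below. It is not the method from the slides. It assumes that an external task with minimum time `gap` between executions occurs at most ceil(B / gap) times in a busy period of length B, and that the workload caused by one execution of each task is already known (for example from simulation). All names and numbers are hypothetical.

```python
import math

def busy_period_bound(initial, external, limit=10_000):
    """Iterate B = initial + sum_e ceil(B / gap_e) * work_e to a fixed point.

    initial:  workload at priority >= i caused by one lower-priority task
    external: list of (gap, work) pairs, one per external task, where work is
              the priority >= i workload caused by a single execution
    Returns the bound, or None if the iteration grows past `limit`.
    """
    b = initial + sum(work for _, work in external)   # every task occurs once
    while b <= limit:
        nxt = initial + sum(math.ceil(b / gap) * work for gap, work in external)
        if nxt == b:
            return b
        b = nxt
    return None

# Gaps 20 and 10 as in the example network; the workloads are made up.
print(busy_period_bound(3, [(20, 5), (10, 4)]))   # 16
```

The iteration has the same structure as the one used for WCET under static priorities, with external tasks taking the place of higher-priority periodic tasks.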
EE24930
System: Network of CFSMs
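[Figure: network of three CFSMs (CFSM1, CFSM2, CFSM3) exchanging events A, B, C, F and G; transition labels shown include C=>G, C=>F, B=>C, F^(G==1), (A==0)=>B, C=>A and C=>B]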
EE24931
Implementations
CFSMs can be implemented:
– in hardware: HW-CFSMs
– in software: SW-CFSMs
– by built-in peripherals (e.g. timer): MP-CFSMs
EE24932
Events: SW to SW
for every event, RTOS maintains
– global values
– local flags
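[Figure: CFSM1 executes emit x(3); the value 3 of x is held by the RTOS, and CFSM2 and CFSM3 each have a flag for x that they check with detect x]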
EE24933
Events: atomicity problems
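without atomic detects, TASK 1 can observe y AND NOT x, a combination that is never true, since y is emitted only after x has been detected
to avoid this, detects need to be atomic
TASK 1: detect x; detect y
TASK 2: emit x
TASK 3: if detect x then emit y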
EE24934
Events: SW to SW
for atomicity:
– always read from frozen
– others always write to live
– at the beginning of execution, switch
[Figure: a CFSM with two copies of its event flags, frozen and live]
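The frozen/live scheme can be sketched as a double buffer. The class and method names are hypothetical, and discarding the old frozen copy at each switch is an assumption of this sketch (it treats events as consumed by the execution that could see them).

```python
class EventBuffer:
    """Input events of one task: emitters write `live`, the owner reads `frozen`."""

    def __init__(self):
        self.live = {}     # written by other tasks at any time
        self.frozen = {}   # read by the owning task during its execution

    def emit(self, name, value=None):
        self.live[name] = value

    def switch(self):
        # Called at the beginning of the owner's execution.
        self.frozen, self.live = self.live, {}

    def detect(self, name):
        return name in self.frozen


# TASK 1 starts, then x and y are emitted while it is still running.
buf = EventBuffer()
buf.switch()
saw_x = buf.detect("x")
buf.emit("x")              # TASK 2 emits x
buf.emit("y")              # TASK 3 detects x and emits y
saw_y = buf.detect("y")
print(saw_x, saw_y)        # False False: a consistent snapshot

buf.switch()               # next execution of TASK 1
print(buf.detect("x"), buf.detect("y"))   # True True
```

Within one execution TASK 1 sees either neither event or both, never y without x.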
EE24935
Events: HW to SW
event can be polled or can drive an interrupt
for polled events:
– allocate I/O port bits for value, occurrence and acknowledge flags
– generate the polling task that acknowledges and emits all polled events that have occurred
for events driving an interrupt:
– allocate I/O port bits for value,
– allocate an interrupt vector,
– create an interrupt service routine that emits an event
EE24936
Events: interrupts
interrupt service routine:
{ emit x }
optional interrupt service routine:
{ emit x; execute SW-CFSM }
[Figure: the HW-CFSM raises IRQ for event x; x is passed through the RTOS to the SW-CFSM]
EE24937
Events: SW to HW
allocate I/O port bits for value and occurrence flag
use existing ports or memory-mapped ports
write value to I/O port
create a pulse on occurrence flag
EE24938
Events: SW to/from MP
every peripheral must have a library with
– init function (to be called at initialization time)
– deliver function for each input (to be called by emit)
– detect function for each output (to be called by poll-taker)
– interrupt service routine (containing emit)
EE24939
Coordination
consider a SW-CFSM ready to run whenever it has some unconsumed input events
choose the next ready SW-CFSM to run:
– scheduling problem
EE24940
Experiments
dashboard
– 6 tasks, 13 events
– 0.1s (8.6s to estimate run times)
shock absorber controller
– 48 tasks, 11 events
– 0.3s (880s to estimate run times)
PATHO RTOS
– orders of magnitude faster than timed automata
– scales linearly
EE24941
Open Problems
Propagation of constraints from external I/O behavior to each CFSM
– probabilistic: Markov chains
– exact: FSM state traversal
Satisfaction of constraints within a single transition (e.g., software-driven bus interface protocol)
Automatic choice of scheduling algorithm, based on performance estimation and constraints
Scheduling for verifiability
EE24942
Outline
Overview
– shop scheduling
– data-flow scheduling
– real-time scheduling
– OS scheduling
Real-time scheduling RTOS generation
– scheduling
– Communication
Data-flow scheduling
– pure
– Petri nets
EE24943
Data-flow scheduling
Functionality usually represented with a data-flow graph
Kahn’s conditions allow scheduling freedom
– if a computation is specified with actors (operators) and data dependency, and
– every actor waits for data on all inputs before firing, and
– no data is lost
– then the firing order doesn’t matter
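A small experiment illustrates the claim. The two-actor network below is hypothetical: `add` waits for one token on each of its two inputs, `dbl` doubles whatever `add` produces, and the next actor to fire is picked at random among those with data on all inputs.

```python
import random
from collections import deque

def run(seed):
    """Fire enabled actors in a random order until none is enabled."""
    rng = random.Random(seed)
    queues = {
        "a": deque([1, 2, 3]),
        "b": deque([10, 20, 30]),
        "s": deque(),
        "out": deque(),
    }
    # actor name -> (input queues, output queue, function)
    actors = {
        "add": (["a", "b"], "s", lambda x, y: x + y),
        "dbl": (["s"], "out", lambda x: 2 * x),
    }
    while True:
        enabled = [name for name, (ins, _, _) in actors.items()
                   if all(queues[q] for q in ins)]   # data on all inputs
        if not enabled:
            return list(queues["out"])
        ins, out, fn = actors[rng.choice(enabled)]
        args = [queues[q].popleft() for q in ins]    # no data is lost
        queues[out].append(fn(*args))

print({tuple(run(seed)) for seed in range(20)})   # {(22, 44, 66)}
```

Every seed produces the same output stream, even though the interleaving of `add` and `dbl` firings differs from run to run.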
EE24944
Data-flow graphs
Schedule: a firing order that respects data-flow constraints and returns the graph to initial state
[Figure: data-flow graph with actors A, B, C, D and execution times 1, 2, 3, 1]
EE24945
Schedule implementation
Static scheduling (cyclic executive, round robin)
A, B, C, D are processes
RTOS schedules them repeatedly in order A D B C
simple, but context-switching overhead is large
[Figure: data-flow graph with actors A, B, C, D and execution times 1, 2, 3, 1]
A schedule: A D B C
EE24946
Schedule implementation
Code synthesis (OS generation)
A, B, C, D are subroutines
generate: forever{ call A; call D; call B; call C; }
less robust, lower overhead
[Figure: data-flow graph with actors A, B, C, D and execution times 1, 2, 3, 1]
A schedule: A D B C
EE24947
Schedule implementation
In-lined code synthesis
A, B, C, D are code fragments
generate: forever{A; D; B; C; }
even less robust, even lower overhead
[Figure: data-flow graph with actors A, B, C, D and execution times 1, 2, 3, 1]
A schedule: A D B C
EE24948
Data-flow scheduling
Resources
fixed or arbitrary number of processors
Goal:
max. throughput given a fixed number of processors
min. processors to achieve required throughput
EE24949
Data-flow scheduling goals
Max. throughput given a fixed number of processors
it is NP-hard to determine max. achievable throughput
Min. processors to achieve required throughput
if there are loops, then there is a fundamental upper bound on throughput
the bound is easy to compute
EE24950
Throughput bound
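$1 / \max_{\text{loops}} (\text{Time} / \text{Delay})$, where Time is the total execution time of the actors on a loop and Delay is the number of delays on that loop
[Figure: data-flow graph with actors A, B, C, D and execution times 1, 2, 3, 1]
The (N+2)nd output of A can be computed no earlier than 7 time units after the Nth, i.e. throughput is at most 2/7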
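The bound is a one-line computation once the loops are listed. The sketch below assumes each loop is given as a pair (total execution time, number of delays). The single loop used here, with time 1 + 2 + 3 + 1 = 7 and 2 delays, is read off the statement above rather than from the figure itself.

```python
from fractions import Fraction

def throughput_bound(loops):
    """Upper bound 1 / max over loops of (Time / Delay).

    loops: list of (time, delay) pairs, with delay >= 1 for every loop.
    """
    return 1 / max(Fraction(time, delay) for time, delay in loops)

# One loop through A, B, C, D (times 1, 2, 3, 1) carrying 2 delays.
print(throughput_bound([(1 + 2 + 3 + 1, 2)]))   # 2/7

# With a second, hypothetical loop the slowest one dominates.
print(throughput_bound([(7, 2), (4, 1)]))       # 1/4
```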
EE24951
Scheduling heuristics
Non-overlapped scheduling
Look at one iteration
Use list scheduling algorithm (developed for shop scheduling)
Overlapped scheduling
less developed
EE24952
Inter-iteration constraints
Remove delayed edges
List scheduling:
– maintain list of tasks that could be scheduled
– schedule one with longest path
[Figure: data-flow graph with actors A, B, C, D and execution times 1, 2, 3, 1]
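A minimal sketch of list scheduling with longest-path priority follows. The execution times come from the figure, but the edges (A to B, B to C, A to D) are a hypothetical choice made for the example, since the edges that remain after removing the delayed ones cannot be read from this text.

```python
def list_schedule(time, succ, n_proc):
    """Return a list of (task, processor, start time) for an acyclic graph."""
    pending = {v: 0 for v in time}            # unfinished predecessors
    for v in succ:
        for s in succ[v]:
            pending[s] += 1

    memo = {}
    def level(v):                             # longest path from v to a sink
        if v not in memo:
            memo[v] = time[v] + max((level(s) for s in succ.get(v, [])), default=0)
        return memo[v]

    ready = [v for v in time if pending[v] == 0]
    free_at = [0] * n_proc                    # when each processor becomes idle
    running = []                              # (finish time, task)
    schedule, now = [], 0
    while ready or running:
        idle = [p for p in range(n_proc) if free_at[p] <= now]
        ready.sort(key=level, reverse=True)   # longest path first
        while ready and idle:
            v, p = ready.pop(0), idle.pop(0)
            schedule.append((v, p, now))
            free_at[p] = now + time[v]
            running.append((now + time[v], v))
        running.sort()
        now, done = running.pop(0)            # advance to the next completion
        for s in succ.get(done, []):
            pending[s] -= 1
            if pending[s] == 0:
                ready.append(s)
    return schedule

time = {"A": 1, "B": 2, "C": 3, "D": 1}
succ = {"A": ["B", "D"], "B": ["C"]}          # hypothetical edges
print(list_schedule(time, succ, n_proc=2))
# [('A', 0, 0), ('B', 0, 1), ('D', 1, 1), ('C', 0, 3)]
```

With these assumed edges the critical path A, B, C has length 6, which is also the length of the resulting schedule, so the second processor only absorbs D.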
EE24957
Inter-iteration constraints
Unfold k iterations (e.g. k=2)
Do list scheduling
[Figure: graph unfolded for k=2, with actors A1, B1, C1, D1 and A2, B2, C2, D2 and execution times 1, 2, 3, 1 per iteration]
EE24958
List scheduling
The resulting schedule is rate optimal (not true in general)
[Figure: unfolded graph with actors A1, B1, C1, D1, A2, B2, C2, D2 and its list schedule on two processors, P1 and P2]
EE24961
Loop scheduling and code size
A (2 B (2 C))
A;
for i = 1 … 2 {
    B;
    for j = 1 … 2 {
        C;
    }
}
single appearance schedules minimize in-lined code size
[Figure: chain A → B → C; A produces 20 tokens and B consumes 10 per firing, B produces 20 and C consumes 10]
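The looped notation can be expanded mechanically into a flat firing sequence. The parser below is a sketch that assumes actor names are separated by spaces and that every parenthesized group starts with its repetition count. The function name is illustrative.

```python
import re

def expand(schedule):
    """Expand e.g. 'A (2 B (2 C))' into ['A', 'B', 'C', 'C', 'B', 'C', 'C']."""
    tokens = re.findall(r"\(|\)|\d+|[A-Za-z]\w*", schedule)
    pos = 0

    def parse_seq():
        nonlocal pos
        out = []
        while pos < len(tokens) and tokens[pos] != ")":
            if tokens[pos] == "(":
                pos += 1
                count = int(tokens[pos])   # repetition count of the loop
                pos += 1
                body = parse_seq()
                pos += 1                   # skip the closing ")"
                out.extend(body * count)
            else:
                out.append(tokens[pos])    # a single actor firing
                pos += 1
        return out

    return parse_seq()

print("".join(expand("A (2 B (2 C))")))    # ABCCBCC
print("".join(expand("A (2 B) (4 C)")))    # ABBCCCC
```

Each actor name appears once in the looped form, which is what makes it a single appearance schedule even though C fires four times.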
EE24962
Buffer size
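Schedule            buffer A→B   buffer B→C
A B C B C C C       20           30
A (2 B (2 C))       20           20
A (2 B) (4 C)       20           40
A (2 B C) (2 C)     20           30
[Figure: chain A → B → C; A produces 20 tokens and B consumes 10 per firing, B produces 20 and C consumes 10]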
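These buffer sizes can be checked by simulating each firing sequence and recording the peak number of tokens on every edge. The sketch assumes single-letter actor names and the production and consumption rates of the chain A → B → C described above. The function name is illustrative.

```python
def max_buffers(firings, edges):
    """Peak token count per edge for a flat firing sequence.

    firings: sequence of actor names, e.g. "ABCCBCC"
    edges:   {(src, dst): (tokens produced per src firing,
                           tokens consumed per dst firing)}
    """
    level = {e: 0 for e in edges}
    peak = {e: 0 for e in edges}
    for actor in firings:
        for (src, dst), (_, cons) in edges.items():
            if dst == actor:                       # consume inputs first
                if level[(src, dst)] < cons:
                    raise ValueError(f"{actor} fired without enough input")
                level[(src, dst)] -= cons
        for (src, dst), (prod, _) in edges.items():
            if src == actor:                       # then produce outputs
                level[(src, dst)] += prod
                peak[(src, dst)] = max(peak[(src, dst)], level[(src, dst)])
    return peak

edges = {("A", "B"): (20, 10), ("B", "C"): (20, 10)}
for flat in ["ABCBCCC", "ABCCBCC", "ABBCCCC"]:
    p = max_buffers(flat, edges)
    print(flat, p[("A", "B")], p[("B", "C")])
# ABCBCCC 20 30
# ABCCBCC 20 20
# ABBCCCC 20 40
```

The flat sequence A B C B C C C is also the expansion of A (2 B C) (2 C), which is why those two schedules need the same buffers.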
EE24963
Data-flow scheduling
Perfect design-time information
Fixed amount of repeating work
– data-independent
Input streams from the environment always available
Simple global constraints
data dependency => Petri nets
timing constraints => real-time scheduling
EE24964
Other scheduling models
Problem: computation result may depend on dynamic schedule
Synchronous languages: no scheduler needed
(but inapplicable to HW/SW heterogeneous systems)
Data Flow networks: deterministic computation
(but blocking read is unsuitable for reactive systems)
Can we obtain determinism without losing efficiency?